-
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper • 2512.16093 • Published • 90
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
Paper • 2511.22699 • Published • 219
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199
Sharp Monocular View Synthesis in Less Than a Second
Paper • 2512.10685 • Published • 24
Collections
Collections including paper arxiv:2512.20619
-
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 88
QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
Paper • 2512.19526 • Published • 10
MatSpray: Fusing 2D Material World Knowledge on 3D Geometry
Paper • 2512.18314 • Published • 8
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Paper • 2512.17351 • Published • 24
-
Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding
Paper • 2512.17532 • Published • 64
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding
Paper • 2512.19693 • Published • 61
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 88
EgoX: Egocentric Video Generation from a Single Exocentric Video
Paper • 2512.08269 • Published • 115
-
Guided Self-Evolving LLMs with Minimal Human Supervision
Paper • 2512.02472 • Published • 50
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
Paper • 2509.25454 • Published • 141
Video Reasoning without Training
Paper • 2510.17045 • Published • 7
Agent Learning via Early Experience
Paper • 2510.08558 • Published • 270
-
MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 114
KlingAvatar 2.0 Technical Report
Paper • 2512.13313 • Published • 40
SemanticGen: Video Generation in Semantic Space
Paper • 2512.20619 • Published • 88
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI
Paper • 2512.16676 • Published • 199
-
ARE: Scaling Up Agent Environments and Evaluations
Paper • 2509.17158 • Published • 35
ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation
Paper • 2510.08551 • Published • 33
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 23
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
Paper • 2510.12693 • Published • 27
-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 19
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48