Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published 4 days ago • 150
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation Paper • 2512.04678 • Published 4 days ago • 37
EditThinker: Unlocking Iterative Reasoning for Any Image Editor Paper • 2512.05965 • Published 3 days ago • 30
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty Paper • 2512.05927 • Published 3 days ago • 8
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning Paper • 2512.02425 • Published 7 days ago • 22
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 20 days ago • 74
Back to Basics: Let Denoising Generative Models Denoise Paper • 2511.13720 • Published 21 days ago • 64
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published 27 days ago • 40
Diffusion-SDPO: Safeguarded Direct Preference Optimization for Diffusion Models Paper • 2511.03317 • Published Nov 5 • 6
WithAnyone: Towards Controllable and ID Consistent Image Generation Paper • 2510.14975 • Published Oct 16 • 84
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17 • 89
NANO3D: A Training-Free Approach for Efficient 3D Editing Without Masks Paper • 2510.15019 • Published Oct 16 • 63
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published Oct 6 • 47
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10 • 49