Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum Paper • 2510.27571 • Published Oct 31 • 17
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published Oct 14 • 108
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20 • 52
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning Paper • 2503.07588 • Published Mar 10 • 7