MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published Oct 9 • 109
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs Paper • 2509.18056 • Published Sep 22 • 27
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation Paper • 2406.00670 • Published Jun 2, 2024
Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction Paper • 2412.06244 • Published Dec 9, 2024
A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models Paper • 2508.01548 • Published Aug 3 • 13
Revisiting Efficient Semantic Segmentation: Learning Offsets for Better Spatial and Class Feature Alignment Paper • 2508.08811 • Published Aug 12 • 2
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning Paper • 2504.07960 • Published Apr 10 • 50
ROICtrl: Boosting Instance Control for Visual Generation Paper • 2411.17949 • Published Nov 27, 2024 • 87