- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

Collections
Collections including paper arxiv:2504.16072

- Describe Anything: Detailed Localized Image and Video Captioning
  Paper • 2504.16072 • Published • 63
- EmbodiedCity: A Benchmark Platform for Embodied Agent in Real-world City Environment
  Paper • 2410.09604 • Published
- Geospatial Mechanistic Interpretability of Large Language Models
  Paper • 2505.03368 • Published • 11
- Scenethesis: A Language and Vision Agentic Framework for 3D Scene Generation
  Paper • 2505.02836 • Published • 8

- Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
  Paper • 2504.08641 • Published • 6
- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
  Paper • 2504.08791 • Published • 137
- Describe Anything: Detailed Localized Image and Video Captioning
  Paper • 2504.16072 • Published • 63
- A Survey of Interactive Generative Video
  Paper • 2504.21853 • Published • 46

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 133
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 122
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27

- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 120
- Describe Anything: Detailed Localized Image and Video Captioning
  Paper • 2504.16072 • Published • 63
- Reinforcing General Reasoning without Verifiers
  Paper • 2505.21493 • Published • 26
- REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
  Paper • 2505.24760 • Published • 74

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 36
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 88

- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 107
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 26
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 104