Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2508.15144

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
mlfoundations-cua-dev/osworld-trajectories

Viewer • Updated Nov 5 • 13.7k • 1.19k

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 263
DINOv3

Paper • 2508.10104 • Published Aug 13 • 285

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21 • 46
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 116

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14 • 44
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

continuous pretraining

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 117
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

mPLUG/GUI-Owl-7B

8B • Updated Aug 22 • 959 • 49
mPLUG/GUI-Owl-32B

33B • Updated Aug 22 • 308 • 22
mPLUG/GUI-Owl-7B-Desktop-RL

8B • Updated Sep 17 • 7 • 3
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 140
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 158

AI Paper of the Day

A collection of papers that I think are interesting, one added each day

Can Large Language Models Understand Context?

Paper • 2402.00858 • Published Feb 1, 2024 • 23
OLMo: Accelerating the Science of Language Models

Paper • 2402.00838 • Published Feb 1, 2024 • 85
Self-Rewarding Language Models

Paper • 2401.10020 • Published Jan 18, 2024 • 151
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity

Paper • 2401.17072 • Published Jan 30, 2024 • 25

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25 • 29
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
mlfoundations-cua-dev/osworld-trajectories

Viewer • Updated Nov 5 • 13.7k • 1.19k

continuous pretraining

Scaling Agents via Continual Pre-training

Paper • 2509.13310 • Published Sep 16 • 117
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51
Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 317
Qwen-Image Technical Report

Paper • 2508.02324 • Published Aug 4 • 263
DINOv3

Paper • 2508.10104 • Published Aug 13 • 285

Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Paper • 2508.15760 • Published Aug 21 • 46
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
rStar2-Agent: Agentic Reasoning Technical Report

Paper • 2508.20722 • Published Aug 28 • 116

mPLUG/GUI-Owl-7B

8B • Updated Aug 22 • 959 • 49
mPLUG/GUI-Owl-32B

33B • Updated Aug 22 • 308 • 22
mPLUG/GUI-Owl-7B-Desktop-RL

8B • Updated Sep 17 • 7 • 3
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

UI-Venus Technical Report: Building High-performance UI Agents with RFT

Paper • 2508.10833 • Published Aug 14 • 44
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 140
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 158

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs