Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2510.14545

about 6 hours ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 129
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 8 days ago • 83
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published 19 days ago • 105
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104

The official datasets and model checkpoints of AEPO

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
dongguanting/Qwen3-8B-AEPO-DeepSearch

Text Generation • 8B • Updated Oct 27 • 10 • 1
dongguanting/Qwen3-14B-AEPO-DeepSearch

Robotics • 15B • Updated Oct 21 • 8 • 1
dongguanting/Qwen2.5-7B-AEPO

Text Generation • 8B • Updated Oct 27 • 22 • 1

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 114
Kwaipilot/KAT-Dev-72B-Exp

Text Generation • 73B • Updated Oct 13 • 680 • 155
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published 22 days ago • 17

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 70
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Paper • 2509.22944 • Published Sep 26 • 79
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14 • 115
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published Oct 15 • 62
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published Oct 7 • 53

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3 • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 105
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 495
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6 • 30

Read Later Stack

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13 • 31
Self-Improving LLM Agents at Test-Time

Paper • 2510.07841 • Published Oct 9 • 9
Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6 • 22
DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published Oct 13 • 27

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Paper • 2510.18927 • Published Oct 21 • 83

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 141
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 158

about 6 hours ago

Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Paper • 2508.13167 • Published Aug 6 • 129
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 8 days ago • 83
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

Paper • 2511.16043 • Published 19 days ago • 105
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Paper • 2509.22944 • Published Sep 26 • 79
Robot Learning: A Tutorial

Paper • 2510.12403 • Published Oct 14 • 115
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE

Paper • 2510.13344 • Published Oct 15 • 62
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding

Paper • 2510.06308 • Published Oct 7 • 53

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104

Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward

Paper • 2510.03222 • Published Oct 3 • 75
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Paper • 2510.05592 • Published Oct 7 • 105
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 495
Multi-Agent Tool-Integrated Policy Optimization

Paper • 2510.04678 • Published Oct 6 • 30

The official datasets and model checkpoints of AEPO

Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
dongguanting/Qwen3-8B-AEPO-DeepSearch

Text Generation • 8B • Updated Oct 27 • 10 • 1
dongguanting/Qwen3-14B-AEPO-DeepSearch

Robotics • 15B • Updated Oct 21 • 8 • 1
dongguanting/Qwen2.5-7B-AEPO

Text Generation • 8B • Updated Oct 27 • 22 • 1

Read Later Stack

Demystifying Reinforcement Learning in Agentic Reasoning

Paper • 2510.11701 • Published Oct 13 • 31
Self-Improving LLM Agents at Test-Time

Paper • 2510.07841 • Published Oct 9 • 9
Making Mathematical Reasoning Adaptive

Paper • 2510.04617 • Published Oct 6 • 22
DocReward: A Document Reward Model for Structuring and Stylizing

Paper • 2510.11391 • Published Oct 13 • 27

FlowRL: Matching Reward Distributions for LLM Reasoning

Paper • 2509.15207 • Published Sep 18 • 114
Kwaipilot/KAT-Dev-72B-Exp

Text Generation • 73B • Updated Oct 13 • 680 • 155
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO

Paper • 2511.13288 • Published 22 days ago • 17

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
Agentic Entropy-Balanced Policy Optimization

Paper • 2510.14545 • Published Oct 16 • 104
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping

Paper • 2510.18927 • Published Oct 21 • 83

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 70
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 22
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 141
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 158

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs