-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 83 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 105 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104
Collections
Discover the best community collections!
Collections including paper arxiv:2510.14545
-
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
dongguanting/Qwen3-8B-AEPO-DeepSearch
Text Generation • 8B • Updated • 10 • 1 -
dongguanting/Qwen3-14B-AEPO-DeepSearch
Robotics • 15B • Updated • 8 • 1 -
dongguanting/Qwen2.5-7B-AEPO
Text Generation • 8B • Updated • 22 • 1
-
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114 -
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 680 • 155 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 17
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 115 -
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 62 -
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper • 2510.06308 • Published • 53
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 495 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Paper • 2510.18927 • Published • 83
-
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133 -
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 141 -
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Paper • 2508.15144 • Published • 64 -
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 158
-
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
Paper • 2508.13167 • Published • 129 -
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
Paper • 2512.01374 • Published • 83 -
Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Paper • 2511.16043 • Published • 105 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104
-
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights
Paper • 2509.22944 • Published • 79 -
Robot Learning: A Tutorial
Paper • 2510.12403 • Published • 115 -
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Paper • 2510.13344 • Published • 62 -
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
Paper • 2510.06308 • Published • 53
-
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
Paper • 2510.03222 • Published • 75 -
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Paper • 2510.05592 • Published • 105 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 495 -
Multi-Agent Tool-Integrated Policy Optimization
Paper • 2510.04678 • Published • 30
-
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
dongguanting/Qwen3-8B-AEPO-DeepSearch
Text Generation • 8B • Updated • 10 • 1 -
dongguanting/Qwen3-14B-AEPO-DeepSearch
Robotics • 15B • Updated • 8 • 1 -
dongguanting/Qwen2.5-7B-AEPO
Text Generation • 8B • Updated • 22 • 1
-
Demystifying Reinforcement Learning in Agentic Reasoning
Paper • 2510.11701 • Published • 31 -
Self-Improving LLM Agents at Test-Time
Paper • 2510.07841 • Published • 9 -
Making Mathematical Reasoning Adaptive
Paper • 2510.04617 • Published • 22 -
DocReward: A Document Reward Model for Structuring and Stylizing
Paper • 2510.11391 • Published • 27
-
FlowRL: Matching Reward Distributions for LLM Reasoning
Paper • 2509.15207 • Published • 114 -
Kwaipilot/KAT-Dev-72B-Exp
Text Generation • 73B • Updated • 680 • 155 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
Paper • 2511.13288 • Published • 17
-
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
Agentic Entropy-Balanced Policy Optimization
Paper • 2510.14545 • Published • 104 -
BAPO: Stabilizing Off-Policy Reinforcement Learning for LLMs via Balanced Policy Optimization with Adaptive Clipping
Paper • 2510.18927 • Published • 83
-
Open Data Synthesis For Deep Research
Paper • 2509.00375 • Published • 70 -
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Paper • 2509.03403 • Published • 22 -
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
Paper • 2509.03405 • Published • 23 -
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
Paper • 2509.00930 • Published • 4
-
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 133 -
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 141 -
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Paper • 2508.15144 • Published • 64 -
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 158