- VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
  Paper • 2504.05118 • Published • 26
- T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models
  Paper • 2504.04718 • Published • 42
- SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
  Paper • 2504.03561 • Published • 18
- Concept Lancet: Image Editing with Compositional Representation Transplant
  Paper • 2504.02828 • Published • 16
Collections including paper arxiv:2503.01307

- Training Software Engineering Agents and Verifiers with SWE-Gym
  Paper • 2412.21139 • Published • 24
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 48
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 117

- Self-Taught Self-Correction for Small Language Models
  Paper • 2503.08681 • Published • 15
- Self-Improving Robust Preference Optimization
  Paper • 2406.01660 • Published • 20
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
  Paper • 2503.00735 • Published • 23
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
  Paper • 2407.19594 • Published • 21

- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
  Paper • 2502.05171 • Published • 151
- Agency Is Frame-Dependent
  Paper • 2502.04403 • Published • 23
- Distillation Scaling Laws
  Paper • 2502.08606 • Published • 47
- LLM Pretraining with Continuous Concepts
  Paper • 2502.08524 • Published • 29

- Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
  Paper • 2412.18319 • Published • 39
- Token-Budget-Aware LLM Reasoning
  Paper • 2412.18547 • Published • 46
- Efficiently Serving LLM Reasoning Programs with Certaindex
  Paper • 2412.20993 • Published • 37
- B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
  Paper • 2412.17256 • Published • 47

- Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?
  Paper • 2502.19361 • Published • 28
- Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning
  Paper • 2502.17407 • Published • 26
- Small Models Struggle to Learn from Strong Reasoners
  Paper • 2502.12143 • Published • 40
- Language Models can Self-Improve at State-Value Estimation for Better Search
  Paper • 2503.02878 • Published • 10

- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 123
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4