Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Paper • 2502.17387 • Published Feb 24 • 7
RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems Paper • 2510.02263 • Published Oct 2 • 8
Just Enough Thinking: Efficient Reasoning with Adaptive Length Penalties Reinforcement Learning Paper • 2506.05256 • Published Jun 5 • 2
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models Paper • 2502.17387 • Published Feb 24 • 7
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 38
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs Paper • 2503.01307 • Published Mar 3 • 38
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models Paper • 2407.07086 • Published Jul 9, 2024
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 99
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 99
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though Paper • 2501.04682 • Published Jan 8 • 99