On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral • Paper • 2512.04220 • Published 3 days ago • 8
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment • Paper • 2512.02807 • Published 4 days ago • 7
Guided Self-Evolving LLMs with Minimal Human Supervision • Paper • 2512.02472 • Published 5 days ago • 47
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models • Paper • 2512.02556 • Published 4 days ago • 165
Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models • Paper • 2511.18890 • Published 12 days ago • 29
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning • Paper • 2511.22570 • Published 9 days ago • 63
Think Visually, Reason Textually: Vision-Language Synergy in ARC • Paper • 2511.15703 • Published 17 days ago • 8
ROOT: Robust Orthogonalized Optimizer for Neural Network Training • Paper • 2511.20626 • Published 11 days ago • 169
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms • Paper • 2511.17592 • Published 19 days ago • 118
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs • Paper • 2511.16664 • Published 16 days ago • 24
Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance • Paper • 2511.13254 • Published 19 days ago • 134
P1: Mastering Physics Olympiads with Reinforcement Learning • Paper • 2511.13612 • Published 19 days ago • 132
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling • Paper • 2511.11793 • Published 22 days ago • 158
Agent READMEs: An Empirical Study of Context Files for Agentic Coding • Paper • 2511.12884 • Published 20 days ago • 5
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning • Paper • 2511.14460 • Published 18 days ago • 17