TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows Paper • 2512.05150 • Published 8 days ago • 62
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads Paper • 2511.06209 • Published Nov 9 • 17
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28 • 173
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26 • 70
Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning Paper • 2502.11962 • Published Feb 17 • 38
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published Jun 2 • 52
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published Jun 2 • 52 • 2
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187
Sherlock: Self-Correcting Reasoning in Vision-Language Models Paper • 2505.22651 • Published May 28 • 49
Reinforcement Fine-Tuning Powers Reasoning Capability of Multimodal Large Language Models Paper • 2505.18536 • Published May 24 • 18
Optimizing Anytime Reasoning via Budget Relative Policy Optimization Paper • 2505.13438 • Published May 19 • 36