Interactive Training: Feedback-Driven Neural Network Optimization Paper • 2510.02297 • Published Oct 2 • 42
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1 • 75
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31 • 84
BrowseComp-Plus: A More Fair and Transparent Evaluation Benchmark of Deep-Research Agent Paper • 2508.06600 • Published Aug 8 • 41
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published Jun 4 • 26
MixEval-X: Any-to-Any Evaluations from Real-World Data Mixtures Paper • 2410.13754 • Published Oct 17, 2024 • 75