The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution Paper • 2510.25726 • Published Oct 29
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory Paper • 2406.12375 • Published Jun 18, 2024