Collections including paper arxiv:2511.19399

Models and data associated with DR Tulu, http://allenai-web/papers/drtulu

DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 11 days ago • 54
rl-research/DR-Tulu-8B

Text Generation • 8B • Updated 4 days ago • 1.38k • 68
rl-research/DR-Tulu-SFT-8B

Text Generation • 8B • Updated 7 days ago • 308 • 5
rl-research/dr-tulu-sft-data

Viewer • Updated 11 days ago • 13.1k • 711 • 24

Parallel-R1: Towards Parallel Thinking via Reinforcement Learning

Paper • 2509.07980 • Published Sep 9 • 101
Robot Learning from a Physical World Model

Paper • 2511.07416 • Published 25 days ago • 28
MathSE: Improving Multimodal Mathematical Reasoning via Self-Evolving Iterative Reflection and Reward-Guided Fine-Tuning

Paper • 2511.06805 • Published 26 days ago • 12
GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published 19 days ago • 118

Reinforcement learning

Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning

Paper • 2407.20798 • Published Jul 30, 2024 • 24
Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • 2412.16145 • Published Dec 20, 2024 • 38
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

Paper • 2502.18449 • Published Feb 25 • 75

General Agentic Memory Via Deep Research

Paper • 2511.18423 • Published 13 days ago • 154
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research

Paper • 2511.19399 • Published 11 days ago • 54

microsoft/bitnet-b1.58-2B-4T

Text Generation • 0.8B • Updated May 1 • 7.51k • 1.22k
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Paper • 2504.10449 • Published Apr 14 • 15
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct

Text Generation • 8B • Updated Apr 17 • 139 • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

Paper • 2504.11536 • Published Apr 15 • 63
