Jian Hu

chuyi777

https://hujian.website

hijkzzz

AI & ML interests

Reinforcement Learning

Recent Activity

upvoted a paper about 2 months ago

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published Oct 16 • 15

upvoted a paper 2 months ago

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published Oct 1 • 18

liked 2 models 3 months ago

moonshotai/Kimi-K2-Instruct-0905

Text Generation • 1T • Updated about 1 month ago • 28.9k • 627

nvidia/NVIDIA-Nemotron-Nano-12B-v2

Text Generation • 12B • Updated 13 days ago • 27.1k • 140

updated a dataset 3 months ago

OpenRLHF/gem_guess_game

Viewer • Updated Aug 30 • 2.05k • 39 • 1

published a dataset 3 months ago

OpenRLHF/gem_guess_game

Viewer • Updated Aug 30 • 2.05k • 39 • 1

New activity in nvidia/NVIDIA-Nemotron-Nano-9B-v2 3 months ago

some problem when I asked the model: "Who are you?"

🤯 2

#8 opened 4 months ago by wenzel94

upvoted a paper 4 months ago

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11 • 49

liked 2 models 4 months ago

openai/gpt-oss-20b

Text Generation • 22B • Updated Aug 26 • 7.95M • 4.03k

mistralai/Devstral-Small-2505

24B • Updated Aug 18 • 11.2k • 857

liked a dataset 4 months ago

MegaScience/MegaScience

Viewer • Updated Jul 24 • 1.25M • 9.35k • 118

updated a model 4 months ago

OpenRLHF/Llama-3-8b-rm-700k

Text Ranking • 8B • Updated Jul 28 • 106 • 3

liked 3 datasets 5 months ago

upvoted an article 6 months ago

Learn the Hugging Face Kernel Hub in 5 Minutes

Article • Jun 12 • 151

liked a dataset 6 months ago

nvidia/Llama-Nemotron-Post-Training-Dataset

Viewer • Updated May 8 • 3.91M • 6.39k • 611

liked a model 6 months ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated 16 days ago • 5.05k • 235

authored a paper 6 months ago

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 142

upvoted a paper 6 months ago