
panjinhao

ishaqsaviani
  • ishaqsaviani590

AI & ML interests

NLP, DL, RL, ML

Organizations

MLX Community

upvoted a collection 8 months ago

Gemma 3 QAT

Collection
Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated Jul 10, 2025 • 212
upvoted an article 8 months ago
Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022 • 387
upvoted an article 10 months ago
Article

You could have designed state of the art positional encoding

Nov 25, 2024 • 426
upvoted 3 papers 10 months ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15, 2024 • 60

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 211

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 431