M Saad Salman

MSS444

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 1 day ago

On GRPO Collapse in Search-R1: The Lazy Likelihood-Displacement Death Spiral

Paper • 2512.04220 • Published 3 days ago • 8

upvoted 4 papers 3 days ago

SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment

Paper • 2512.02807 • Published 4 days ago • 7

PretrainZero: Reinforcement Active Pretraining

Paper • 2512.03442 • Published 4 days ago • 39

Guided Self-Evolving LLMs with Minimal Human Supervision

Paper • 2512.02472 • Published 5 days ago • 47

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 4 days ago • 165

upvoted 3 papers 5 days ago

Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models

Paper • 2511.18890 • Published 12 days ago • 29

DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Paper • 2511.22570 • Published 9 days ago • 63

Rectifying LLM Thought from Lens of Optimization

Paper • 2512.01925 • Published 5 days ago • 23

upvoted 6 papers 9 days ago

Think Visually, Reason Textually: Vision-Language Synergy in ARC

Paper • 2511.15703 • Published 17 days ago • 8

HunyuanOCR Technical Report

Paper • 2511.19575 • Published 12 days ago • 19

ROOT: Robust Orthogonalized Optimizer for Neural Network Training

Paper • 2511.20626 • Published 11 days ago • 169

GigaEvo: An Open Source Optimization Framework Powered By LLMs And Evolution Algorithms

Paper • 2511.17592 • Published 19 days ago • 118

NVIDIA Nemotron Parse 1.1

Paper • 2511.20478 • Published 11 days ago • 20

Latent Collaboration in Multi-Agent Systems

Paper • 2511.20639 • Published 11 days ago • 111

upvoted a paper 16 days ago

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

Paper • 2511.16664 • Published 16 days ago • 24

upvoted 5 papers 17 days ago

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance

Paper • 2511.13254 • Published 19 days ago • 134

P1: Mastering Physics Olympiads with Reinforcement Learning

Paper • 2511.13612 • Published 19 days ago • 132

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Paper • 2511.11793 • Published 22 days ago • 158

Agent READMEs: An Empirical Study of Context Files for Agentic Coding

Paper • 2511.12884 • Published 20 days ago • 5

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

Paper • 2511.14460 • Published 18 days ago • 17