Probing-RM Collection Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models • 2 items • Updated Nov 20, 2025
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning Paper • 2509.07980 • Published Sep 9, 2025 • 101
GRAM-R^2: Self-Training Generative Foundation Reward Models for Reward Reasoning Paper • 2509.02492 • Published Sep 2, 2025 • 1
GRAM: A Generative Foundation Reward Model for Reward Generalization Paper • 2506.14175 • Published Jun 17, 2025 • 1
GRAM-RR Collection Self-Training Generative Foundation Reward Models for Reward Reasoning • 4 items • Updated Nov 8, 2025
GRAM Collection Generative Foundation Reward Models for Reward Generalization • 8 items • Updated Jun 19, 2025 • 1