cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesepoch_reward_modeling_anthropic_hh 1B • Updated Sep 14, 2024 • 8
cj453/dense_reward_trainer_final_opt__NumTrainEpochs2_SaveStrategiesno_reward_modeling_anthropic_hh 1B • Updated Sep 15, 2024 • 7
cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesepoch_reward_modeling_anthropic_hh 1B • Updated Sep 16, 2024 • 6
cj453/dense_reward_trainer_final_opt__NumTrainEpochs5_SaveStrategiesno_reward_modeling_anthropic_hh 1B • Updated Sep 16, 2024 • 4