Correction in DPO Beta intuition. Changed 'higher' to 'lower' and vice versa in line 3995.
app/src/content/article.mdx
CHANGED
@@ -3992,7 +3992,7 @@ Our recommendation for your training runs is to run scans of your learning rate
 
 **Tune your β**
 
-The experiments we ran for the ß parameter ranged from 0.01 to 0.99 to explore values that encourage different degrees of alignment to the reference model. As a reminder,
+The experiments we ran for the ß parameter ranged from 0.01 to 0.99 to explore values that encourage different degrees of alignment to the reference model. As a reminder, higher values of beta encourage staying close to the reference model while lower values allow the model to match the preference data more closely. The model performance for β=0.1 is the highest for both reasoning modes and improves compared to the metrics from the SFT checkpoint. Using a low beta value hurts model performance and results in a worse model than the SFT checkpoint, while performance remains stable across multiple ß values without extended thinking.
 
 These results suggest that values greater than 0.1 are preferable for preference optimisation, and that aligning the model with the preference data is more beneficial than staying close to the reference model. However, we suggest exploring ß values in the range 0.01 and 0.5. Higher values may erase capabilities from the SFT checkpoint that we might not be capturing in the evals shown on the plot.
 
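
Reviewer note: the corrected β intuition in the changed line can be checked with a small numeric sketch. This is not the article's training code. It assumes the standard DPO formulation, where the per-pair loss is -log σ(β · margin) and the KL-regularised optimum is proportional to π_ref · exp(r / β). The function names (`dpo_loss`, `optimal_policy`, `kl_divergence`) and the toy probabilities and rewards are hypothetical, chosen only for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Per-pair DPO loss: -log sigmoid(beta * margin)."""
    # margin = chosen log-ratio minus rejected log-ratio (policy vs reference)
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), written in a numerically stable form
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def optimal_policy(ref_probs, rewards, beta):
    """Closed-form optimum of the KL-regularised objective:
    pi*(y) is proportional to pi_ref(y) * exp(r(y) / beta)."""
    weights = [p * math.exp(r / beta) for p, r in zip(ref_probs, rewards)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy setup (assumed values): three candidate responses.
ref_probs = [0.2, 0.5, 0.3]   # reference model probabilities
rewards = [1.0, 0.0, -1.0]    # implicit preference reward per response

for beta in (0.01, 0.1, 0.5, 0.99):
    pi_star = optimal_policy(ref_probs, rewards, beta)
    drift = kl_divergence(pi_star, ref_probs)
    print(f"beta={beta:<4}  KL(pi* || ref)={drift:.4f}")
```

In this toy setting the divergence from the reference shrinks as β grows, which matches the corrected wording: higher β keeps the policy closer to the reference model, lower β lets it follow the preference signal more closely.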