# How to Run an Annealing Experiment
This guide explains how to evaluate dataset quality using microannealing experiments, similar to techniques used in the Llama-3, Olmo-2, and Mosaic papers. The process involves cooldown training a pretrained model on a specific mixture of datasets and measuring the mixture's impact on benchmark performance. For example, we could do a cooldown on FineMath and expect improvements on math tasks like MATH and GSM8K.
## Prerequisites

- A pretrained model (the default is a Marin Tootsie 8B model trained for ~800B tokens)
- A dataset or mixture of datasets you want to evaluate
## Overview

- Configure the Dataset Mixture: Use `LMMixtureDatasetConfig` to set up your dataset mixture.
- Set Up a Control Model (optional): Create a control model for baseline comparison.
- Run the Annealing Experiment: Use the `default_anneal` function to execute your experiment.
This tutorial is based on the FineMath annealing experiment.
## Steps
### 1. Configure the Dataset Mixture

First, create an `AnnealConfig` with your desired dataset mixture, using `lm_mixture_data_config` (a helper that builds an `LMMixtureDatasetConfig`). For example, to evaluate the impact of FineMath, mix 30% FineMath with 70% DCLM:
```python
finemath_anneal_config = AnnealConfig(
    dataset_config=lm_mixture_data_config(
        components={
            # Tokenized dataset steps from the experiments package;
            # see experiments.exp722_anneal.py for where these are defined.
            "finemath": finemath_3_plus_tokenized,
            "dolmino/dclm": dolmino_dclm,
        },
        # Relative sampling weights: 30% FineMath, 70% DCLM.
        weights={"finemath": 0.3, "dolmino/dclm": 0.7},
    ),
)
```
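Since the interesting question is often *how much* of the candidate dataset to mix in, you can build several configs in one sweep. A minimal sketch, reusing the names above (the weight values are illustrative):

```python
# Sweep the FineMath proportion; each config can be passed to default_anneal (step 3).
finemath_sweep_configs = {
    weight: AnnealConfig(
        dataset_config=lm_mixture_data_config(
            components={
                "finemath": finemath_3_plus_tokenized,
                "dolmino/dclm": dolmino_dclm,
            },
            weights={"finemath": weight, "dolmino/dclm": 1.0 - weight},
        ),
    )
    for weight in (0.1, 0.3, 0.5)
}
```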
### 2. Set Up a Control Model (Optional)

To establish a baseline, you can create a control model using only one dataset:
```python
control_dataset_config = lm_mixture_data_config(
    components={
        "dolmino/dclm": dolmino_dclm,
    },
    weights={"dolmino/dclm": 1.0},
)

control_dclm_anneal_config = AnnealConfig(
    dataset_config=control_dataset_config,
)
```
### 3. Run the Annealing Experiment

Import the `default_anneal` function and run your experiment:
```python
from experiments.defaults import default_anneal

control_model = default_anneal(
    name="llama-8b-anneal-dclm",
    anneal_config=control_dclm_anneal_config,
)
```
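The FineMath run is launched the same way. Marin experiment scripts typically finish by handing their steps to the executor; the `executor_main` import below follows that pattern but is an assumption here, so confirm the exact entry point against the example experiment files:

```python
from experiments.defaults import default_anneal

# Assumed entry point; check experiments.exp722_anneal.py for the exact import.
from marin.execution.executor import executor_main

finemath_model = default_anneal(
    name="llama-8b-anneal-finemath",
    anneal_config=finemath_anneal_config,
)

if __name__ == "__main__":
    # Execute both the control and treatment anneals as pipeline steps.
    executor_main(steps=[control_model, finemath_model])
```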
## Configuration Options

### Training Duration

You can adjust the number of tokens used for annealing via the `num_anneal_training_tokens` parameter in `AnnealConfig`.
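For example, a minimal sketch that anneals on the FineMath mixture for 50B tokens (the value is illustrative; the default lives in `AnnealConfig`):

```python
finemath_anneal_config_50b = AnnealConfig(
    dataset_config=lm_mixture_data_config(
        components={
            "finemath": finemath_3_plus_tokenized,
            "dolmino/dclm": dolmino_dclm,
        },
        weights={"finemath": 0.3, "dolmino/dclm": 0.7},
    ),
    num_anneal_training_tokens=50_000_000_000,  # 50B tokens, illustrative
)
```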
### Model Checkpoint

The default checkpoint path points to a Marin Tootsie 8B model trained for approximately 800B tokens. You can change this to use your own model or a different checkpoint.
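A sketch of pointing the anneal at your own checkpoint. Both the field name `initialize_from_checkpoint_path` and the bucket path are assumptions, so check the `AnnealConfig` definition for the real field:

```python
my_anneal_config = AnnealConfig(
    dataset_config=control_dataset_config,
    # Hypothetical field name and path; confirm against AnnealConfig's definition.
    initialize_from_checkpoint_path="gs://my-bucket/checkpoints/my-8b-model/step-200000",
)
```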
## Example Experiments

For reference, see these example files:

- FineMath: `experiments.exp722_anneal.py`
- Dolmino: `experiments.dolmino.dolmino_anneal.py`
- Dolma: `experiments.dolma.dolma_anneal.py`
- Cooldown: `experiments.cooldown_anneal.py`
## Troubleshooting
If you encounter issues:
- Verify your dataset configurations are correct
- Check that your model checkpoint path is valid
- Ensure you have sufficient computational resources for the experiment
## Next Steps
After running your annealing experiment:
- Evaluate the model on relevant benchmarks
- Compare results with your control model
- Analyze the impact of different dataset mixtures