Experiments

At the infrastructure level, an experiment is simply a DAG (directed acyclic graph) of steps to be executed. Conceptually, however, an experiment is a unit of inquiry with a particular hypothesis or goal. Each experiment is tracked by a GitHub issue with the experiments label (e.g., #72).

An experiment might test whether one optimizer outperforms another in a controlled setting, or try out different tokenizers or data-quality filtering schemes. Whatever its goal, an experiment is expressed as a set of steps whose dependencies form the DAG described above.
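To make the "DAG of steps" idea concrete, here is a minimal sketch using only the Python standard library. The Step class and the execution_order helper are hypothetical illustrations, not the executor's actual API. The sketch assumes step names are unique and shows how an optimizer comparison (one shared tokenization step feeding two training runs, followed by a comparison step) resolves into a valid execution order.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class Step:
    """Hypothetical stand-in for one unit of work in an experiment."""
    name: str                                           # assumed to be unique within an experiment
    deps: list["Step"] = field(default_factory=list)   # steps that must finish before this one


def execution_order(final_steps: list[Step]) -> list[str]:
    """Walk the dependency graph and return step names in a dependency-respecting order."""
    graph: dict[str, set[str]] = {}
    stack = list(final_steps)
    while stack:
        step = stack.pop()
        if step.name in graph:
            continue  # already visited (shared dependency)
        graph[step.name] = {dep.name for dep in step.deps}
        stack.extend(step.deps)
    # TopologicalSorter takes a mapping of node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical optimizer comparison: one tokenization step shared by two training runs.
tokenize = Step("tokenize")
train_adam = Step("train_adam", deps=[tokenize])
train_lion = Step("train_lion", deps=[tokenize])
compare = Step("compare", deps=[train_adam, train_lion])

order = execution_order([compare])
print(order)
```

Because the graph is acyclic, the two training steps may run in either order (or in parallel), but tokenize always comes first and compare always comes last.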

To promote reproducibility, we record every experiment in the experiments directory. Each file in that directory (e.g., exp934_hq_vs_pt.py) corresponds to one experiment, and its file name includes the number of the corresponding GitHub issue.
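Because the issue number is embedded in the file name, it can be recovered mechanically, for example to link a file back to its issue. The sketch below assumes the pattern "exp" + issue number + "_" + short description + ".py", inferred only from the single example exp934_hq_vs_pt.py; the issue_number helper is hypothetical and not part of the repository.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming pattern, inferred from exp934_hq_vs_pt.py: exp<issue>_<description>.py
_EXPERIMENT_FILE = re.compile(r"^exp(\d+)_\w+\.py$")


def issue_number(path: str) -> Optional[int]:
    """Return the GitHub issue number encoded in an experiment file name, or None if it does not match."""
    match = _EXPERIMENT_FILE.match(Path(path).name)
    return int(match.group(1)) if match else None


print(issue_number("experiments/exp934_hq_vs_pt.py"))  # 934
```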

Running an experiment produces an experiment JSON file (see the executor documentation), which the data browser renders with a dedicated experiment view. From that experiment page in the data browser, you can follow links to the Iris dashboard and to wandb (for training steps).