Selected Experiment Reports¶
This is a semi-automatically generated list of experiment issues and reports. It is periodically updated by a script, and then curated by hand.
This page includes only experiments that have at least one run or report.
Marin 8B Base¶
Cooldowns¶
- Try deepening the cooldown of "monumental-jellyfish" (tootsie 8b cooldown 1) to see if it improves SFT
    - not-quite-so-deep cooldown (Spoonbill)
    - GitHub Issue #916
    - WandB Report
    - Data Browser
    - Conclusion: Exploding logits in the deep part of the cooldown can be mitigated by z-loss (see the sketch after this list).
- Tootsie Phoenix Cooldown (sensible-starling)
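As background on the z-loss mentioned in the Spoonbill conclusion, here is a minimal sketch of an auxiliary z-loss term (in the style of PaLM). The function name and the `1e-4` weight are illustrative, not taken from the Levanter implementation:

```python
import jax.numpy as jnp
from jax.scipy.special import logsumexp


def cross_entropy_with_z_loss(logits, targets, z_loss_weight=1e-4):
    """Token-level cross-entropy plus a z-loss penalty on the log-partition.

    logits: [batch, seq, vocab] floats; targets: [batch, seq] int token ids.
    The z-loss term penalizes log(Z)^2, which discourages the softmax
    normalizer (and hence the raw logits) from growing without bound.
    """
    log_z = logsumexp(logits, axis=-1)                 # [batch, seq]
    log_probs = logits - log_z[..., None]              # log-softmax
    nll = -jnp.take_along_axis(log_probs, targets[..., None], axis=-1)[..., 0]
    z_penalty = z_loss_weight * jnp.square(log_z)
    return jnp.mean(nll + z_penalty)
```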
Big Runs¶
Modeling¶
- Pick Tokenizer type
    - GitHub Issue #524
    - WandB Report
    - Conclusion: The Llama 3 tokenizer is the best.
- Default z-loss?
    - GitHub Issue #935
    - WandB Report
    - Conclusion: z-loss does not seem harmful. We'll use it.
- Figuring out learning rate schedule!
    - GitHub Issue #764
    - WandB Report
    - Conclusion: Cosine is best. High LR is important. WSD (warmup-stable-decay) isn't terrible. (See the schedule sketch after this list.)
- Mixture of Experts
- Hybrid Norm and Input Embedding Norm
- OLMoE replication - MoE vs dense
    - GitHub Issue #1183
    - WandB Report
    - Conclusion: Despite having a lower MFU, the MoE outperforms a similarly sized dense model in both training and evaluation.
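To make the schedules from the learning-rate experiment concrete, here is a sketch of a cosine schedule and a WSD (warmup-stable-decay) schedule built with optax. The step counts and peak LR are placeholders, not the values used in the experiment:

```python
import optax

TOTAL_STEPS = 100_000
WARMUP = 1_000
PEAK_LR = 3e-3  # placeholder; the experiment found that a high peak LR matters

# Cosine: warm up to the peak LR, then cosine-decay to ~10% of peak.
cosine = optax.warmup_cosine_decay_schedule(
    init_value=0.0,
    peak_value=PEAK_LR,
    warmup_steps=WARMUP,
    decay_steps=TOTAL_STEPS,
    end_value=0.1 * PEAK_LR,
)

# WSD: warm up, hold flat for most of training, then a short linear decay.
DECAY = int(0.1 * TOTAL_STEPS)
wsd = optax.join_schedules(
    schedules=[
        optax.linear_schedule(0.0, PEAK_LR, WARMUP),
        optax.constant_schedule(PEAK_LR),
        optax.linear_schedule(PEAK_LR, 0.0, DECAY),
    ],
    boundaries=[WARMUP, TOTAL_STEPS - DECAY],
)

print(cosine(0), cosine(TOTAL_STEPS), wsd(TOTAL_STEPS // 2))  # LR at a given step
```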
Training and Performance¶
- INT8 training in Levanter
    - GitHub Issue #620
    - WandB Report
    - Conclusion: INT8 training is much faster on the right hardware, but except in the early stages of training it may lead to worse time-to-loss.
- MuP for scaling laws
    - GitHub Issue #621
    - WandB Report
    - Conclusion: Not worth it compared to our heuristic version.
- Try out different remat strategies to get the 70b working on fewer slices
    - GitHub Issue #906
    - WandB Report
    - Data Browser
    - Conclusion: Substantial performance hit, but helpful for fitting on fewer slices; still needs iteration. (See the sketch below.)
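For flavor, here is a minimal sketch of what a rematerialization (activation checkpointing) strategy looks like in JAX. The toy block, sizes, and the particular policy are illustrative, not the configurations explored in the issue:

```python
import jax
import jax.numpy as jnp


def block(x, w):
    """Stand-in for one transformer layer (hypothetical toy compute)."""
    return jnp.tanh(x @ w)


# Rematerialize the block: its activations are recomputed during the backward
# pass instead of being stored, trading extra FLOPs for lower memory.
# The policy saves matmul outputs, a middle ground between "save everything"
# and "save nothing"; swapping policies is one way to explore remat strategies.
remat_block = jax.checkpoint(block, policy=jax.checkpoint_policies.dots_saveable)


def loss_fn(ws, x):
    for w in ws:
        x = remat_block(x, w)
    return jnp.mean(x ** 2)


ws = [jnp.eye(128) for _ in range(4)]
x = jnp.ones((8, 128))
grads = jax.grad(loss_fn)(ws, x)  # backward pass recomputes block activations
```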
Data Experiments¶
Datashop¶
- MEDU filtering across MMLU subsets
    - GitHub Issue #923
    - WandB Report
    - Data Browser
- Reproduce FineMath performance using Datashop
    - GitHub Issue #939
    - WandB Report
High Quality Data Ablations¶
- Ablations on Cooldown for Markdownified Wikipedia
    - GitHub Issue #845
    - WandB Report
    - Data Browser
    - Conclusion: No major improvement compared to control.
- Ablations on Cooldown for Markdownified Arxiv
    - GitHub Issue #846
    - WandB Report
    - Data Browser
    - Conclusion: No major improvement compared to control.
- Ablations on Cooldown for Markdownified StackExchange
    - GitHub Issue #847
    - WandB Report
    - Data Browser
    - Conclusion: No major improvement compared to control.
- Mixture of Formats Training on Wikipedia and Arxiv
    - GitHub Issue #818
    - WandB Report
    - Conclusion: No major difference observed; switch to @Helw150's annealing setup for evaluations.
- High Quality Many Epochs vs. Low Quality Few Epochs
    - GitHub Issue #636
    - WandB Report
    - Conclusion: There's no data like more data.
Data Filtering¶
- Stack Exchange Quality Classifier
    - GitHub Issue #596
    - WandB Report
    - Conclusion: Seems to lead to better loss than using Reddit ELI5 or OpenHermes. (See the classifier sketch after this list.)
    - NOTE: this seems like a loose end; we should pursue it further.
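A quality-classifier recipe generally looks like the sketch below: train a lightweight text classifier with high-quality documents (e.g., Stack Exchange answers) as positives and generic web text as negatives, then keep documents the classifier scores highly. The fastText choice, file name, labels, and threshold are illustrative and not necessarily what issue #596 used:

```python
import fasttext

# train.txt holds one example per line in fastText format, e.g.
#   __label__hq  <a Stack Exchange answer, whitespace-flattened>
#   __label__lq  <a random web-crawl document>
# (file name and label names are hypothetical)
model = fasttext.train_supervised(
    input="train.txt",
    lr=0.1,
    epoch=3,
    wordNgrams=2,
)

# Score a candidate document and apply a simple keep/drop rule.
labels, probs = model.predict("How do I invert a matrix in numpy?", k=1)
keep = labels[0] == "__label__hq" and probs[0] > 0.5

model.save_model("quality_classifier.bin")
```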
Text Extraction and Formatting¶
- Compare HTML -> text methods
    - GitHub Issue #246
    - WandB Report
    - Data Browser
    - Conclusion: Some amount of format preservation is helpful for loss on Paloma. (See the sketch after this list.)
- Wikipedia Training Runs with DOLMA source substitution
- Ar5iv Training Runs with DOLMA source substitution
- Markdownification Processing Report
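To illustrate the "format preservation" point, here is a sketch comparing three common HTML-to-text approaches on the same page. The libraries shown are stand-ins for the kinds of methods compared in the issue, not a claim about which tools were actually used:

```python
from bs4 import BeautifulSoup          # pip install beautifulsoup4
import trafilatura                     # pip install trafilatura
from markdownify import markdownify    # pip install markdownify

html = (
    "<html><body><h1>Title</h1>"
    "<p>Some <b>bold</b> text.</p>"
    "<ul><li>item one</li><li>item two</li></ul></body></html>"
)

# 1. Plain text: drops all structure (headings, lists, emphasis).
plain = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

# 2. Main-content extraction: removes boilerplate, mostly plain text.
#    (trafilatura may return None for very short pages like this toy example.)
extracted = trafilatura.extract(html)

# 3. Markdownification: keeps headings/lists/emphasis as Markdown markup.
markdown = markdownify(html, heading_style="ATX")

for name, text in [("plain", plain), ("trafilatura", extracted), ("markdown", markdown)]:
    print(f"--- {name} ---\n{text}\n")
```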
Supervised Fine Tuning¶
- Reproduce OLMo 2 SFT
- Create Mixture from All SFT Datasets
- SFT on further cooled-down tootsie checkpoints
- SFT Deeper Starling
    - GitHub Issue #1237
    - WandB Run: deeper_mixture_sft_starling_1e-4-longer-2
Scaling Laws¶
- Scaling laws to predict tootsie performance
- Optimizer Scaling Law Part 1: AdamW
    - GitHub Issue #725
    - WandB Report
    - Conclusion: After sweeping, we found that the (near-)optimal AdamW hyperparameters remain surprisingly stable across three settings. (See the sketch after this list.)
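For concreteness, the kind of sweep described above looks roughly like this with optax. The grid values are placeholders, not the ones from issue #725:

```python
import itertools

import optax

# Hypothetical sweep grid; the actual values live in the issue and report.
learning_rates = [1e-3, 3e-3, 1e-2]
betas2 = [0.95, 0.98, 0.999]
weight_decays = [0.0, 0.1]


def make_optimizer(lr, b2, wd):
    """AdamW as configured for one sweep point, with global-norm clipping."""
    return optax.chain(
        optax.clip_by_global_norm(1.0),
        optax.adamw(learning_rate=lr, b1=0.9, b2=b2, weight_decay=wd),
    )


# One optimizer per point on the grid; each would be trained and compared.
sweep = [
    make_optimizer(lr, b2, wd)
    for lr, b2, wd in itertools.product(learning_rates, betas2, weight_decays)
]
```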
Baselines and Reproductions¶
Other Projects¶
Compel¶
Uncategorized¶
(This is for experiments that have been added via the script but have not yet been curated.)