# Getting Started with Marin
In this tutorial, you will install Marin on your local machine.
## Prerequisites
Before you begin, ensure you have the following installed:
- Python 3.10 or higher
- pip (Python package manager)
- Git
- A Weights & Biases account for experiment tracking (optional but recommended)
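You can quickly verify the first three from a terminal:

```bash
python --version   # should report 3.10 or higher
pip --version
git --version
```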
This document focuses on basic setup and usage of Marin. If you're on a GPU machine, see Local GPU Setup for a GPU-specific getting-started walkthrough. If you want to set up a TPU cluster, see TPU Setup.
## Installation
1. Clone the repository:
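
    A minimal sketch, assuming the upstream repository lives at `marin-community/marin` (swap in your fork's URL if you use one):

    ```bash
    git clone https://github.com/marin-community/marin.git
    cd marin
    ```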
2. Create and activate a virtual environment:
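
    For example, with Python's built-in `venv` module:

    ```bash
    python -m venv .venv          # create the environment in .venv/
    source .venv/bin/activate     # on Windows: .venv\Scripts\activate
    ```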
    or with conda:
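
    ```bash
    # The environment name and exact Python version are arbitrary choices (any 3.10+ works).
    conda create -n marin python=3.11
    conda activate marin
    ```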
3. Install the package (this might take a while, which is something we should fix):
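
    A standard editable install from the repository root (a sketch; if you're on a GPU or TPU, see the accelerator-specific extras under Hardware-specific Setup below):

    ```bash
    pip install -e .
    ```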
4. Set up Weights and Biases (WandB) so you can monitor your runs:
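
    Logging in once stores your API key locally:

    ```bash
    wandb login   # prompts for the API key from https://wandb.ai/authorize
    ```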
5. Set up the Hugging Face CLI so you can use gated models/tokenizers (such as Meta's Llama 3.1 8B model):
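
    Logging in stores an access token so gated downloads work:

    ```bash
    huggingface-cli login   # prompts for a token from https://huggingface.co/settings/tokens
    ```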
## Hardware-specific Setup
Marin runs on multiple types of hardware (CPU, GPU, TPU).
### Install Marin for different accelerators

Marin requires different JAX installations depending on your hardware accelerator. These installation options are defined in our `pyproject.toml` file and will install the appropriate JAX version for your hardware.
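As a sketch of what this looks like in practice (the extra names below are illustrative, not guaranteed; the actual ones are listed in `pyproject.toml`):

```bash
# Hypothetical extra names for illustration; check pyproject.toml for the real ones.
pip install -e ".[cuda12]"   # e.g., GPU (CUDA 12) install
pip install -e ".[tpu]"      # e.g., TPU install
```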
If you are working on GPUs, you'll need to set up your system first by installing the appropriate CUDA version. In Marin, we default to 12.9.0:
```bash
# Install the CUDA 12.9.0 toolkit
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_575.51.03_linux.run
sudo sh cuda_12.9.0_575.51.03_linux.run

# Install cuDNN 9.10.0 from NVIDIA's local repository for Ubuntu 24.04
wget https://developer.download.nvidia.com/compute/cudnn/9.10.0/local_installers/cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.10.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12   # CUDA 12-specific cuDNN package
```
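To sanity-check the GPU setup afterwards (assuming the NVIDIA tools are on your `PATH`; the runfile installer puts `nvcc` under `/usr/local/cuda/bin`):

```bash
nvidia-smi      # should list your GPU(s) and driver version
nvcc --version  # should report CUDA 12.9
```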
- CPU: Works out of the box, suitable for small experiments
- GPU: See Local GPU Setup for CUDA configuration and multi-GPU support
- TPU: See TPU Setup for Google Cloud TPU configuration
## Trying it Out
To check that your installation worked, you can go to the First Experiment tutorial, where you train a tiny language model on TinyStories on your CPU. For a sneak preview, simply run:
```bash
wandb offline  # Disable WandB logging
python experiments/tutorials/train_tiny_model_cpu.py --prefix local_store
```
This will:

- Download and tokenize the TinyStories dataset to `local_store/`
- Train a tiny language model
- Save the model checkpoint to `local_store/`
## Next Steps
Now that you have Marin set up and running, you can either continue with the next hands-on tutorial or read more about how Marin is designed for building language models.
- Follow our First Experiment tutorial to run a training experiment
- Read our Language Modeling Pipeline to understand Marin's approach to language models