Getting Started with Marin
In this tutorial, you will install Marin on your local machine.
Prerequisites
Before you begin, ensure you have the following installed:
- Python 3.12 or higher
- uv (Python package manager)
- Git
- Rust toolchain via rustup (only needed for source builds of Rust crates; see the Rust Crates section below)
  - Recommended (matches the Docker pin):
    rustup toolchain install 1.91.0 && rustup default 1.91.0
  - If you hit an edition2024 error from Cargo (e.g., when building Arrow), use nightly:
    rustup default nightly
- On macOS, install additional build tools for SentencePiece:
  brew install cmake pkg-config coreutils
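Before installing, it can help to confirm that the prerequisites above are in place. The sketch below is not part of Marin: the function name check_prereqs is hypothetical, it assumes python3 is the interpreter you plan to use, and it only reports what it finds (it installs nothing). rustup is treated as optional because it is only needed for source builds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Marin): report whether each
# prerequisite listed above is available. It never installs anything.
check_prereqs() {
  # Python 3.12+: let the interpreter compare its own version.
  if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 12) else 1)' 2>/dev/null; then
    echo "python3 >= 3.12: ok"
  else
    echo "python3 >= 3.12: missing"
  fi

  # Required command-line tools.
  local tool
  for tool in uv git; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: ok"
    else
      echo "$tool: missing"
    fi
  done

  # rustup is optional: only needed for source builds of the Rust crates.
  if command -v rustup >/dev/null 2>&1; then
    echo "rustup (optional): ok"
  else
    echo "rustup (optional): not found"
  fi
}

check_prereqs
```

Any line ending in "missing" points to a tool to install before continuing.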
In addition, you might find it useful to have the following accounts:
- GitHub for submitting pull requests
- Weights & Biases for experiment tracking
- Hugging Face for accessing gated models/tokenizers (such as Meta's Llama 3.1 8B model)
This document focuses on basic setup and usage of Marin.
If you're working on a GPU, see Local GPU Setup for a GPU-specific walkthrough.
Running on shared TPU/GPU capacity is handled by Iris; Marin's live TPU pool is reachable via uv run iris --cluster=marin job run ....
Installation
- Clone the repository (~10s).
- Create and activate a virtual environment (~0s).
- Install the package and dependencies (5-10 min, mostly spent building packages from source). Use uv sync to install the dependencies and the local Marin package (editable) in one step.
- Set up Weights & Biases (WandB) so you can monitor your runs. You can also set WANDB_ENTITY and WANDB_PROJECT.
- Set up the Hugging Face CLI so you can use gated models/tokenizers (such as Meta's Llama 3.1 8B model).
- Define MARIN_PREFIX, the path where all artifacts generated during execution will be stored (e.g., local_store). For example, training checkpoints will usually be written to ${MARIN_PREFIX}/checkpoints/. You can set this to an fsspec-recognizable path (e.g., a GCS bucket) or a directory on your machine. See Understanding MARIN_PREFIX and --prefix for details.
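The MARIN_PREFIX step above can be sketched as follows. This helper is hypothetical and not part of Marin: it keeps an existing MARIN_PREFIX, otherwise defaults to the local_store example, and creates the directory only when the prefix is a local path. Anything containing :// (such as a gs:// bucket, where gs://your-bucket/marin is a placeholder) is treated as a remote fsspec URL and left alone; that :// check is an illustrative assumption, not Marin's own logic.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Marin): choose a default MARIN_PREFIX and,
# for local directories only, make sure the directory exists.
ensure_marin_prefix() {
  # Keep an existing value (e.g., gs://your-bucket/marin, a placeholder);
  # otherwise default to the local_store example from the step above.
  export MARIN_PREFIX="${MARIN_PREFIX:-local_store}"

  case "$MARIN_PREFIX" in
    *://*)
      # Remote fsspec URL (gs://, s3://, ...): nothing to create locally.
      ;;
    *)
      mkdir -p "$MARIN_PREFIX"
      ;;
  esac

  echo "Artifacts will be written under: $MARIN_PREFIX"
  echo "Checkpoints typically land in: $MARIN_PREFIX/checkpoints/"
}
```

Calling ensure_marin_prefix in your shell before launching experiments leaves MARIN_PREFIX exported for any commands you run afterwards.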
You might find it convenient to store WANDB_API_KEY, HF_TOKEN, and
MARIN_PREFIX in a .env file, which you can load in one go with source
.env.
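One subtlety with the .env approach above: source .env sets shell variables, but child processes (such as uv run python ...) only see them if they are exported. Writing each line with export handles this. A minimal sketch with placeholder values you must replace; note that cat > .env overwrites any existing .env in the current directory.

```shell
#!/usr/bin/env bash
# Write a .env file with placeholder values (replace them with your own).
# Each line uses `export` so that `source .env` makes the variables visible
# to child processes, not just the current shell.
cat > .env <<'EOF'
export WANDB_API_KEY="your-wandb-api-key"
export HF_TOKEN="your-hf-token"
export MARIN_PREFIX="local_store"
# Optional WandB settings:
# export WANDB_ENTITY="your-entity"
# export WANDB_PROJECT="your-project"
EOF

# Load everything in one go.
source .env
```

Because the file holds secrets, keep it out of version control.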
Hardware-specific Setup
Marin runs on multiple types of hardware (CPU, GPU, TPU).
Install Marin for different accelerators
Marin requires different JAX installations depending on your hardware accelerator. These installation options are defined in our pyproject.toml file and will install the appropriate JAX version for your hardware.
If you are working on GPUs, you first need to set up your system by installing the appropriate CUDA version (Marin defaults to CUDA 12.9.0) along with cuDNN. The cuDNN package below targets Ubuntu 24.04 on x86_64:
wget https://developer.download.nvidia.com/compute/cuda/12.9.0/local_installers/cuda_12.9.0_575.51.03_linux.run
sudo sh cuda_12.9.0_575.51.03_linux.run
wget https://developer.download.nvidia.com/compute/cudnn/9.10.0/local_installers/cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2404-9.10.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2404-9.10.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
sudo apt-get -y install cudnn-cuda-12
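The CUDA runfile installs the toolkit under /usr/local/cuda-12.9 by default, and its installer summary asks you to add that location to PATH and LD_LIBRARY_PATH. A sketch you could add to your shell profile, assuming that default location (set CUDA_HOME yourself if you chose a different path during installation):

```shell
#!/usr/bin/env bash
# Assumes the runfile's default install location; override CUDA_HOME
# beforehand if you picked a different toolkit path.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.9}"
export CUDA_HOME

# Prepend the toolkit binaries (nvcc, ...) and libraries, skipping entries
# that are already present so re-sourcing does not duplicate them.
case ":${PATH}:" in
  *":${CUDA_HOME}/bin:"*) ;;
  *) export PATH="${CUDA_HOME}/bin:${PATH}" ;;
esac
case ":${LD_LIBRARY_PATH:-}:" in
  *":${CUDA_HOME}/lib64:"*) ;;
  *) export LD_LIBRARY_PATH="${CUDA_HOME}/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
esac
```

Afterwards, nvcc --version should report release 12.9. Once the GPU build of JAX is installed, uv run python -c "import jax; print(jax.devices())" is a quick way to see which devices JAX detects.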
Rust Crates (dupekit)
Marin includes Rust crates (e.g., dupekit) that are installed as pre-built
wheels by default — no Rust toolchain needed. uv sync fetches wheels from
GitHub Releases automatically.
To switch to source builds (requires Cargo), use the Makefile targets:
# Check current mode and Cargo availability
make rust-status
# Switch to dev mode: modifies pyproject.toml to build from source (requires Cargo)
make rust-dev
# Switch back to user mode: reverts pyproject.toml to pre-built wheels (no Cargo needed)
make rust-user
Warning
make rust-dev modifies pyproject.toml to add a local path source for dupekit.
Do not commit pyproject.toml while in dev mode — CI will reject it.
Run make rust-user before committing.
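To catch this before CI does, you could add a local check to a git pre-commit hook. The sketch below is not part of Marin's tooling; the function names are hypothetical, and it assumes (unverified) that make rust-dev writes a uv path source of the form dupekit = { path = ... } into pyproject.toml. Adjust the pattern if the actual entry differs; make rust-status remains the authoritative way to check the current mode.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit guard (not shipped with Marin).
# ASSUMPTION: dev mode shows up in pyproject.toml as a uv path source, e.g.
#   dupekit = { path = "...", editable = true }
# Reads TOML on stdin; succeeds (exit 0) if such an entry is present.
has_local_dupekit_source() {
  grep -Eq '^[[:space:]]*dupekit[[:space:]]*=[[:space:]]*\{[^}]*path[[:space:]]*='
}

# Hook body: check the *staged* copy of pyproject.toml (git show :<path>).
# Returns 1 (blocking the commit) when the staged file is in dev mode.
guard_pyproject() {
  if git show :pyproject.toml 2>/dev/null | has_local_dupekit_source; then
    echo "pyproject.toml is in Rust dev mode; run 'make rust-user' before committing." >&2
    return 1
  fi
  return 0
}
```

To use it, you might paste both functions into .git/hooks/pre-commit, make the file executable, and end it with guard_pyproject || exit 1.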
Trying it Out
To check that your installation worked, you can go to the First Experiment tutorial, where you train a tiny language model on TinyStories on your CPU. For a sneak preview, simply run:
This will:
- Download and tokenize the TinyStories dataset to ${MARIN_PREFIX}/
- Train a tiny language model
- Save the model checkpoint to ${MARIN_PREFIX}/
Next Steps
Now that you have Marin set up and running, you can either continue with the next hands-on tutorial or read more about how Marin is designed for building language models.
- Follow our First Experiment tutorial to run a training experiment.
- Read our Language Modeling Pipeline to understand Marin's approach to language models.
- Read the Executor 101 tutorial to learn how Marin's execution model works.