Marin

"I am not afraid of storms, for I am learning how to sail my ship."
– Louisa May Alcott

Marin is an open-source framework for the research and development of foundation models.

A key feature of Marin is reproducibility: every step, from raw data to the final model, is recorded, not just the end result. This includes failed experiments, so the entire research process is transparent.

Marin's primary use case is training language models such as Llama, DeepSeek, and Qwen. This covers the full pipeline: data curation, transformation, filtering, tokenization, training, and evaluation.

We used Marin to train the first open-source 8B parameter model to outperform Llama 3.1 8B. You can see the training script or read the retrospective.

Documentation Structure

Our documentation is organized into the following main sections:

  • Tutorials: Step-by-step guides to help you get started with Marin, including installation, basic usage, and local GPU setup
  • Explanation: Background information and context about the project
  • Experiment Reports: Reports from our experiments
  • Developer Guide: Information for developers who want to contribute to Marin
  • Technical Reference: Detailed technical information about Marin's architecture and components

These sections are available in the left sidebar (or the hamburger menu on smaller screens).

Get Involved

To get started with Marin:

Get Help

If you have questions or need help, reach out to us on Discord or open an issue on GitHub.