Kénotron is a library for pretraining transformer models. It provides a simple and flexible API for pretraining models on custom datasets, and is designed to be easy to use, fast, and scalable. It is built with the following principles in mind:
- Simplicity: Kénotron is designed to be easy to use, with a simple and flexible API for pretraining on custom datasets.
- Scalability: Kénotron uses the latest techniques to train models more efficiently at scale.
- Speed: This version of Nanotron focuses on HPC-oriented optimizations, typically made available via C++ extensions.
We recommend using Spack to install Kénotron.
```bash
git clone -c feature.manyFiles=true --depth=2 https://github.com/spack/spack.git
cd spack/bin
./spack repo add --name korovod https://github.com/korovod/korovod-spack-packages.git
./spack install py-kenotron
```

Spack allows you to install a specific version, e.g., `py-kenotron@0.4.0` or `py-kenotron@main`.
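For example, to pin the release mentioned above instead of the default version:

```bash
./spack install py-kenotron@0.4.0
```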
> [!TIP]
> It is advised to maintain a proper Spack environment to ensure reproducibility. You can find some examples in the toolchains directory.
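As a minimal sketch of what such an environment workflow can look like (the environment name is a placeholder, and `setup-env.sh` must be sourced so that `spack env activate` can modify your shell):

```bash
# Enable shell integration so `spack env activate` works in the current shell
. spack/share/spack/setup-env.sh

spack env create kenotron-env        # create a named environment (placeholder name)
spack env activate kenotron-env      # activate it
spack add py-kenotron +datastates    # record the spec in the environment's spack.yaml
spack install                        # concretize and install everything in the environment
```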
To install an extension, simply use the corresponding Spack variant:
```bash
./spack install py-kenotron +datastates +nanosets
```

Some examples are shipped with Kénotron, some of which require the installation of a Spack variant:
| Example | Description | Docs | Spack variant |
|---|---|---|---|
| datastates | Asynchronous checkpointing | Docs | py-kenotron +datastates |
| nanosets | Use the datatrove library to load data | Docs | py-kenotron +nanosets |
| custom-dataloader | Plug a custom dataloader to Kénotron | Docs | py-kenotron |
| doremi | Use DoReMi to speed up training | Docs | py-kenotron |
| mamba | Train an example Mamba model | Docs | py-kenotron |
| moe | Train an example Mixture-of-Experts (MoE) model | Docs | py-kenotron |
| mup | Use spectral µTransfer to scale up your model | Docs | py-kenotron |
| s3 | Automatically upload checkpoints to S3 | Docs | py-kenotron |
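Independently of the table above, the standard Spack query commands can help check which variants are available and which were actually built (the exact output format depends on your Spack version):

```bash
./spack info py-kenotron               # show the package's available versions and variants
./spack find --variants py-kenotron    # list installed specs with the variants they were built with
```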
Before building a container, you need to define a Spack environment in toolchains/<your_toolchain>/spack.yaml. Please read about Spack environments and take inspiration from existing toolchains. Spack will generate your Dockerfile for you.
```bash
cd kenotron/toolchains/<your_toolchain>
vim spack.yaml
./spack containerize > Dockerfile
docker build -t kenotron .
```

As we are doing HPC (and we are serious about it 😛), we prefer Apptainer (Singularity) over Docker. To use Apptainer, add the following `container: format:` key to your spack.yaml:
```yaml
spack:
  container:
    format: singularity
    images:
      os: "ubuntu:24.04"
      spack: latest
    strip: true
```

You can then build the SIF image using `./spack containerize > kenotron.def && apptainer build kenotron.sif kenotron.def`.
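As a rough sketch of running the training entry point inside the resulting image (the bind path and config file are placeholders; `--nv` exposes the host NVIDIA driver stack to the container):

```bash
apptainer exec --nv --bind /path/to/data:/data kenotron.sif \
  bash -c "CUDA_DEVICE_MAX_CONNECTIONS=1 python -m torch.distributed.run --nproc_per_node=8 \
           run_train.py --config-file examples/llama/config_tiny_llama.yaml"
```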
First, have a look at the Ultrascale Playbook, a comprehensive guide to efficiently scale LLM training with Nanotron. Everything in this guide applies to Kénotron.
A good starting point is to understand the memory usage of a given model configuration. The Nanotron team created a tool for this purpose: just paste your YAML configuration to generate memory diagrams.
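As a rough, library-agnostic sanity check to complement that tool: with mixed-precision Adam and no sharding, model and optimizer state alone is commonly estimated at roughly 18 bytes per parameter (bf16 weights, fp32 master weights, fp32 gradients, and two Adam moments), before counting activations:

```bash
# ~2 B bf16 weights + 4 B fp32 master copy + 4 B fp32 grads + 8 B Adam moments ≈ 18 B/param
python -c "params = 1.1e9; print(f'{params * 18 / 1e9:.1f} GB for model + optimizer state')"
```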
The following command will train a tiny Llama model on a single node with 8 GPUs. The model will be saved in the checkpoints directory as specified in the config file.
```bash
CUDA_DEVICE_MAX_CONNECTIONS=1 python -m torch.distributed.run --nproc_per_node=8 run_train.py --config-file examples/llama/config_tiny_llama.yaml
```

For detailed instructions on training your first model, check out our Your First Training guide.
For multi-node training with Slurm, see our Multi-Node Training guide.
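As a minimal Slurm sketch under assumed cluster settings (node and GPU counts, port, and config path are placeholders; the guide above is the reference):

```bash
#!/bin/bash
#SBATCH --job-name=kenotron
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=8

# One launcher per node; torchrun spawns the per-GPU workers on each node.
MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
MASTER_PORT=29500

srun bash -c "CUDA_DEVICE_MAX_CONNECTIONS=1 python -m torch.distributed.run \
  --nnodes=$SLURM_NNODES --nproc_per_node=8 \
  --rdzv_backend=c10d --rdzv_endpoint=$MASTER_ADDR:$MASTER_PORT \
  run_train.py --config-file examples/llama/config_tiny_llama.yaml"
```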
To generate text from a trained checkpoint:

```bash
python -m torch.distributed.run --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1
# We could set a larger TP for faster generation, and a larger PP in case of very large models.
```

We currently support the following features:
- 3D parallelism (DP+TP+PP)
- Expert parallelism for MoEs
- AFAB and 1F1B schedules for PP
- Explicit APIs for TP and PP which enable easy debugging
- ZeRO-1 optimizer
- FP32 gradient accumulation
- Parameter tying/sharding
- Custom module checkpointing for large models
- Spectral µTransfer parametrization for scaling up neural networks
- Mamba example
- Asynchronous checkpointing
And we have on our roadmap:
- Data-parallel checkpointing for reducing I/O pressure
- FSDP
- `torch.compile` support
- Interleaved 1F1B schedule
- Efficient expert parallelism
The following models are currently supported:
- Mistral 7B
- Qwen
- Llama 3.2
- Llama 3.1
- StarCoder2
We thank the Hugging Face team for their work on the original project.
Some related projects are: