diff --git a/README.md b/README.md
index f68db216e9..a933709256 100644
--- a/README.md
+++ b/README.md
@@ -10,7 +10,7 @@
 ## 📣 News
 
-* [12/15/2025] NeMo-RL is the framework that trained [NVIDIA-NeMotron-3-Nano-30B-A3B-FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8)! [Reproducible code here](https://github.com/NVIDIA-NeMo/RL/tree/nano-v3)
+* [12/15/2025] NeMo-RL is the framework that trained [NVIDIA-NeMotron-3-Nano-30B-A3B-FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8)! [This guide](docs/guides/nemotron-3-nano.md) provides reproducible instructions for the post-training process.
 * [12/1/2025] [Release v0.4.0!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.4.0)
   * First release with official NGC Container [nvcr.io/nvidia/nemo-rl:v0.4.0](https://registry.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl/tags).
   * 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/1u5lmjHOsYpJqXaeYstjw7Qbzvbo67U0v?usp=sharing) to get a head start on your experimentation.
diff --git a/docs/guides/nemotron-3-nano.md b/docs/guides/nemotron-3-nano.md
new file mode 100644
index 0000000000..23d493bb40
--- /dev/null
+++ b/docs/guides/nemotron-3-nano.md
@@ -0,0 +1,68 @@
+# Nemotron 3 Nano
+
+This guide explains how to post-train the [Nemotron 3 Nano model](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Nano-Technical-Report.pdf) using NeMo RL.
+
+## Download and prepare the data
+
+```bash
+# Download the RL data blend
+uvx --from huggingface-hub hf download nvidia/Nemotron-3-Nano-RL-Training-Blend --repo-type dataset --local-dir=data
+
+# Fill in placeholders in the dataset
+chmod +x data/create_nanov3_jsonl.py
+./data/create_nanov3_jsonl.py --input data/train.jsonl --output data/train-full.jsonl
+
+# Hold out the last 1000 rows for validation
+head -n -1000 data/train-full.jsonl > data/train-split.jsonl
+tail -n 1000 data/train-full.jsonl > data/val-split.jsonl
+```
+
+## Prepare the code
+Note that training Nemotron 3 Nano currently requires the `nano-v3` branch.
+```bash
+# Clone NeMo RL on the nano-v3 branch
+git clone -b nano-v3 https://github.com/NVIDIA-NeMo/RL.git
+cd RL
+
+# Initialize the submodules
+git submodule update --init --recursive
+```
+
+## Create a launch script
+
+Create a file named `launch.sh` with the following contents. Be sure to fill in `DATA_DIR`, `MODEL_CHECKPOINT`, `WANDB_API_KEY`, `SLURM_ACCOUNT`, `SLURM_PARTITION`, and `MOUNTS`. Note that the default recipe (`examples/nemo_gym/grpo_nanov3.yaml`) uses 32 nodes.
+
+```bash
+CODE_DIR=$PWD
+SLURM_JOB_NAME=nano-v3-rl-training
+
+# Fill these in
+DATA_DIR=...
+MODEL_CHECKPOINT=...
+WANDB_API_KEY=...
+SLURM_ACCOUNT=...
+SLURM_PARTITION=...
+MOUNTS=... # SRC:DST[,SRC:DST...] e.g., MOUNTS="/lustre:/lustre,/data:/data"
+
+CONTAINER="nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano"
+COMMAND="uv run examples/nemo_gym/run_grpo_nemo_gym.py --config examples/nemo_gym/grpo_nanov3.yaml data.train_jsonl_fpath=$DATA_DIR/train-split.jsonl data.validation_jsonl_fpath=$DATA_DIR/val-split.jsonl policy.model_name=$MODEL_CHECKPOINT logger.wandb_enabled=True"
+
+COMMAND="${COMMAND}" \
+CONTAINER="${CONTAINER}" \
+MOUNTS="${MOUNTS}" \
+WANDB_API_KEY="${WANDB_API_KEY}" \
+sbatch \
+  --nodes=32 \
+  --account="${SLURM_ACCOUNT}" \
+  --job-name="${SLURM_JOB_NAME}" \
+  --partition="${SLURM_PARTITION}" \
+  --time=4:0:0 \
+  --gres=gpu:8 \
+  ray.sub
+```
+
+
+## Launch training
+```bash
+bash launch.sh
+```
diff --git a/docs/index.md b/docs/index.md
index 051893d618..18fd643104 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -203,6 +203,7 @@ guides/sft-openmathinstruct2.md
 :caption: Guides
 :hidden:
 
+guides/nemotron-3-nano.md
 adding-new-models.md
 guides/sft.md
 guides/dpo.md
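Before launching at scale, the guide's `head`/`tail` split can be sanity-checked on synthetic data. This is a minimal sketch, not part of the guide itself; it assumes GNU coreutils (`head -n -K` with a negative count is a GNU extension and is not available in BSD/macOS `head`), and the file names mirror the guide's but live in a temporary directory.

```shell
# Sketch: reproduce the train/validation split from the guide on synthetic
# data and confirm the row counts add up. Assumes GNU coreutils.
set -euo pipefail

tmp=$(mktemp -d)
seq 1 5000 > "$tmp/train-full.jsonl"   # stand-in for data/train-full.jsonl

head -n -1000 "$tmp/train-full.jsonl" > "$tmp/train-split.jsonl"  # all but the last 1000 rows
tail -n 1000  "$tmp/train-full.jsonl" > "$tmp/val-split.jsonl"    # the last 1000 rows

wc -l "$tmp/train-split.jsonl" "$tmp/val-split.jsonl"  # expect 4000 and 1000

rm -rf "$tmp"
```

The same `wc -l` check on the real files confirms that `train-split.jsonl` and `val-split.jsonl` together account for every row of `train-full.jsonl` before the Slurm job is submitted.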