2 changes: 1 addition & 1 deletion README.md
@@ -10,7 +10,7 @@
</div>

## 📣 News
- * [12/15/2025] NeMo-RL is the framework that trained [NVIDIA-NeMotron-3-Nano-30B-A3B-FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8)! [Reproducible code here](https://github.com/NVIDIA-NeMo/RL/tree/nano-v3)
+ * [12/15/2025] NeMo-RL is the framework that trained [NVIDIA-NeMotron-3-Nano-30B-A3B-FP8](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8)! [This guide](docs/guides/nemotron-3-nano.md) provides reproducible instructions for the post-training process.
* [12/1/2025] [Release v0.4.0!](https://github.com/NVIDIA-NeMo/RL/releases/tag/v0.4.0)
* First release with official NGC Container [nvcr.io/nvidia/nemo-rl:v0.4.0](https://registry.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl/tags).
* 📊 View the release run metrics on [Google Colab](https://colab.research.google.com/drive/1u5lmjHOsYpJqXaeYstjw7Qbzvbo67U0v?usp=sharing) to get a head start on your experimentation.
68 changes: 68 additions & 0 deletions docs/guides/nemotron-3-nano.md
@@ -0,0 +1,68 @@
# Nemotron 3 Nano

This guide explains how to post-train the [Nemotron 3 Nano model](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Nano-Technical-Report.pdf) using NeMo RL.

## Download and prepare the data

```bash
# Download RL data blend
uvx --from huggingface-hub hf download nvidia/Nemotron-3-Nano-RL-Training-Blend --repo-type dataset --local-dir=data

# Fill in placeholders in dataset
chmod +x data/create_nanov3_jsonl.py
./data/create_nanov3_jsonl.py --input data/train.jsonl --output data/train-full.jsonl

# Use the last 1000 rows for validation
head -n -1000 data/train-full.jsonl > data/train-split.jsonl
tail -n 1000 data/train-full.jsonl > data/val-split.jsonl
```
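The `head`/`tail` pair above partitions `train-full.jsonl` without dropping or duplicating rows: `head -n -1000` emits everything except the last 1000 lines, and `tail -n 1000` emits exactly those last 1000. A minimal demonstration of the same pattern on synthetic data (10 rows, with 3 reserved for validation):

```shell
# Build a tiny 10-row stand-in for train-full.jsonl
seq 1 10 > /tmp/demo-full.jsonl

# Same split pattern as above, scaled down to a 3-row validation set
head -n -3 /tmp/demo-full.jsonl > /tmp/demo-train.jsonl  # rows 1-7
tail -n 3 /tmp/demo-full.jsonl > /tmp/demo-val.jsonl     # rows 8-10

wc -l /tmp/demo-train.jsonl /tmp/demo-val.jsonl
```

Note that the negative count in `head -n -N` requires GNU coreutils; on BSD/macOS `head`, an equivalent would need `awk` or a computed positive line count.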

## Prepare the code

Note that training Nemotron 3 Nano currently requires the `nano-v3` branch of NeMo RL.
```bash
# Checkout NeMo RL
git clone -b nano-v3 https://github.com/NVIDIA-NeMo/RL.git
cd RL

# Initialize the submodules
git submodule update --init --recursive
```

## Create a launch script

Create a file named `launch.sh` with the following contents. Be sure to fill in the `DATA_DIR`, `MODEL_CHECKPOINT`, `WANDB_API_KEY`, `SLURM_ACCOUNT`, `SLURM_PARTITION`, and `MOUNTS` variables. Note that the default recipe (`examples/nemo_gym/grpo_nanov3.yaml`) uses 32 nodes.

```bash
CODE_DIR=$PWD
SLURM_JOB_NAME=nano-v3-rl-training

# Fill these in
DATA_DIR=...
MODEL_CHECKPOINT=...
WANDB_API_KEY=...
SLURM_ACCOUNT=...
SLURM_PARTITION=...
MOUNTS=... # SRC:DST[,SRC:DST...] e.g., MOUNTS="/lustre:/lustre,/data:/data"

CONTAINER="nvcr.io/nvidia/nemo-rl:v0.4.0.nemotron_3_nano"
COMMAND="uv run examples/nemo_gym/run_grpo_nemo_gym.py --config examples/nemo_gym/grpo_nanov3.yaml data.train_jsonl_fpath=$DATA_DIR/train-split.jsonl data.validation_jsonl_fpath=$DATA_DIR/val-split.jsonl policy.model_name=$MODEL_CHECKPOINT logger.wandb_enabled=True"

COMMAND="${COMMAND}" \
CONTAINER="${CONTAINER}" \
MOUNTS="${MOUNTS}" \
WANDB_API_KEY=${WANDB_API_KEY} \
sbatch \
--nodes=32 \
--account="${SLURM_ACCOUNT}" \
--job-name="${SLURM_JOB_NAME}" \
--partition="${SLURM_PARTITION}" \
--time=4:0:0 \
--gres=gpu:8 \
ray.sub
```
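The `MOUNTS` string must be a comma-separated list of `SRC:DST` pairs, and a malformed entry typically only surfaces after the job has been queued and the container starts. As a quick pre-flight sketch (a hypothetical helper, not part of the recipe), each pair can be validated before submitting:

```shell
# Hypothetical pre-flight check: every comma-separated entry must contain a colon
MOUNTS="/lustre:/lustre,/data:/data"  # example value; substitute your real mounts

count=0
for pair in $(printf '%s' "$MOUNTS" | tr ',' ' '); do
  case "$pair" in
    *:*) count=$((count + 1)) ;;                     # looks like SRC:DST
    *)   echo "bad mount spec: $pair" >&2; exit 1 ;;
  esac
done
echo "validated $count mount pair(s)"
```

Paths containing spaces or commas would defeat this simple word-splitting check, but cluster filesystem paths such as `/lustre` are normally plain.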


## Launch training

```bash
bash launch.sh
```
1 change: 1 addition & 0 deletions docs/index.md
@@ -203,6 +203,7 @@ guides/sft-openmathinstruct2.md
:caption: Guides
:hidden:

+guides/nemotron-3-nano.md
adding-new-models.md
guides/sft.md
guides/dpo.md