
Conversation


@arendu arendu commented Dec 12, 2025

What does this PR do?

Adds Llama LoRA SFT support via Megatron Bridge. To verify correctness, we compared train and validation curves with the LoRA SFT DTensor path on the SQuAD and Tulu3 datasets. The wandb link for the verification runs:

[Screenshots: train and validation loss curve comparisons from the wandb verification runs]

Issues

List issues that this PR closes (syntax):

Usage

uv run examples/run_sft.py \
  --config examples/configs/sft.yaml \
  policy.dtensor_cfg.enabled=false \
  policy.megatron_cfg.enabled=true \
  policy.megatron_cfg.lora_cfg.enabled=true

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation

    • Enhanced LoRA configuration documentation clarifying DTensor v2 as the default backend and introducing Megatron-specific options with expanded parameter details and usage examples.
  • New Features

    • Added comprehensive LoRA configuration support for both DTensor and Megatron backends with configurable parameters for initialization, dropout, and module targeting.
  • Tests

    • Introduced functional test coverage for LoRA workflows with metrics validation.


arendu and others added 6 commits December 11, 2025 15:18
Signed-off-by: adithyare <adithyare@nvidia.com>
…s w.o crashing. WIP check correctness

Signed-off-by: adithyare <adithyare@nvidia.com>

@terrykong terrykong left a comment


@yaoyu-33 to review

@terrykong terrykong changed the title from "Adithyre/llama lora mcore" to "feat: Megatron SFT LoRA" on Dec 19, 2025

@terrykong terrykong left a comment


@aasthajh @vadam5 could you add a companion recipe for mcore to test this path?

a010564

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 19, 2025
@vadam5 vadam5 marked this pull request as ready for review December 19, 2025 21:29
@vadam5 vadam5 requested review from a team as code owners December 19, 2025 21:29

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

This pull request introduces LoRA/PEFT integration for Megatron policy workers with conditional pre-wrap hooking and checkpoint-aware loading. Changes include updated documentation clarifying DTensor v2 as default, Megatron-specific LoRA configuration alongside DTensor, example configurations, implementation changes to the policy worker, and functional testing for the LoRA SFT workflow.

Changes

  • Documentation and Configuration (docs/guides/sft.md, examples/configs/sft.yaml): Updated the LoRA documentation to clarify DTensor v2 as the default backend and added a Megatron-specific LoRA configuration alongside the DTensor one. Expanded parameter details for both backends with new fields: enabled, target_modules, exclude_modules, dim, alpha, dropout, dropout_position, lora_A_init_method, lora_B_init_method, a2a_experimental, and lora_dtype (a hedged sketch of these keys follows this list). Added example usage blocks for both backends.
  • LoRA/PEFT Integration (nemo_rl/models/policy/workers/megatron_policy_worker.py): Integrated LoRA/PEFT functionality, including conditional PEFT config computation from policy.lora_cfg, pre-wrap hook generation and registration, a composed peft_hook passed to model construction, and checkpoint-loading logic that respects LoRA state with explicit finetune flag control (see the illustrative wiring sketch after the sequence diagram below).
  • Functional Testing (tests/functional/test_mbridge_lora_sft.sh): New test script that executes an SFT run with the LoRA configuration, captures metrics, validates loss constraints (train/loss at step 3 < 5.9), and cleans up checkpoints.
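For orientation, the configuration keys just listed can be pictured as a TypedDict along the lines of the repository's config conventions. The sketch below is illustrative only: the field names come from this walkthrough, but the types, defaults, and example values are assumptions rather than the actual nemo_rl definitions.

# Hypothetical sketch of the Megatron LoRA config surface described above.
# Field names mirror the walkthrough; types, defaults, and example values are
# assumptions, not the actual nemo_rl TypedDict.
from typing import NotRequired, Optional, TypedDict


class MegatronLoRAConfigSketch(TypedDict):
    enabled: bool                              # turn LoRA on for the Megatron backend
    target_modules: list[str]                  # module names to adapt (e.g. attention/MLP projections)
    exclude_modules: NotRequired[list[str]]    # modules to skip even if matched by target_modules
    dim: int                                   # LoRA rank
    alpha: int                                 # LoRA scaling factor
    dropout: float                             # dropout applied on the LoRA path
    dropout_position: NotRequired[str]         # where dropout is applied, e.g. "pre" or "post"
    lora_A_init_method: NotRequired[str]       # init for the A matrix, e.g. "xavier"
    lora_B_init_method: NotRequired[str]       # init for the B matrix, e.g. "zero"
    a2a_experimental: NotRequired[bool]        # experimental all-to-all communication toggle
    lora_dtype: NotRequired[Optional[str]]     # optional dtype override for LoRA weights


# Example instance with made-up values; the real defaults live in examples/configs/sft.yaml.
example_cfg: MegatronLoRAConfigSketch = {
    "enabled": True,
    "target_modules": ["linear_qkv", "linear_proj"],
    "dim": 8,
    "alpha": 32,
    "dropout": 0.0,
}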

Sequence Diagram(s)

sequenceDiagram
    participant PolicyWorker as Policy Worker
    participant Config as Policy Config
    participant PEFTHook as PEFT Hook<br/>Manager
    participant Model as Megatron<br/>Model
    participant Checkpoint as Checkpoint<br/>Loader
    
    PolicyWorker->>Config: Load policy.lora_cfg
    alt LoRA Enabled
        Config-->>PolicyWorker: lora_cfg config
        PolicyWorker->>PEFTHook: Create LoRA config<br/>from lora_cfg
        PEFTHook->>PEFTHook: Generate<br/>_create_peft_pre_wrap_hook()
        PEFTHook->>PEFTHook: Compose peft_hook<br/>with pre-wrap behavior
        PolicyWorker->>Model: Construct with<br/>pre_wrap_hook=peft_hook
    else LoRA Disabled
        Config-->>PolicyWorker: lora_cfg = None
        PEFTHook-->>PolicyWorker: peft_hook = []
        PolicyWorker->>Model: Construct with<br/>pre_wrap_hook=[]
    end
    
    Model-->>PolicyWorker: Model created
    
    alt should_load_checkpoint
        PolicyWorker->>Checkpoint: Check checkpoint<br/>availability
        Checkpoint-->>PolicyWorker: Checkpoint found
        alt LoRA Enabled
            PolicyWorker->>Checkpoint: Set finetune=False<br/>for state loading
        end
        Checkpoint->>Checkpoint: Load checkpoint
    else
        Checkpoint-->>PolicyWorker: No checkpoint
    end
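The following Python sketch mirrors the control flow in the diagram. It is not the actual megatron_policy_worker.py code: the LoRA/PEFT helpers (the LoRA config class, _create_peft_pre_wrap_hook, and the model constructor) are passed in as opaque callables because their real signatures live in Megatron Bridge and nemo_rl and are only assumed here.

# Illustrative wiring only: the helper callables stand in for the real Megatron Bridge /
# nemo_rl APIs, whose exact signatures are not reproduced here.
from typing import Any, Callable, Optional


def build_model_with_optional_lora(
    cfg: Any,
    lora_cfg: Optional[dict],
    make_lora_peft: Callable[..., Any],                # stand-in for constructing the LoRA PEFT config
    create_pre_wrap_hook: Callable[[Any], Callable],   # stand-in for _create_peft_pre_wrap_hook
    get_model: Callable[..., Any],                     # stand-in for the model constructor
) -> Any:
    """Attach a PEFT pre-wrap hook only when LoRA is enabled, as in the diagram above."""
    if lora_cfg is not None and lora_cfg.get("enabled", False):
        # Translate the policy-level lora_cfg into a PEFT config and record it on cfg.
        cfg.peft = make_lora_peft(
            target_modules=lora_cfg["target_modules"],
            dim=lora_cfg["dim"],
            alpha=lora_cfg["alpha"],
            dropout=lora_cfg["dropout"],
        )
        # Compose the pre-wrap hook so adapters are injected before the model is wrapped.
        peft_hook = [create_pre_wrap_hook(cfg.peft)]
    else:
        # LoRA disabled: an empty hook list leaves model construction unchanged.
        peft_hook = []
    return get_model(cfg, pre_wrap_hook=peft_hook)

Keeping the hook list empty in the disabled path means the non-LoRA code path behaves exactly as before the change, which matches the "enabled: false by default" behavior noted in the review comments below.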

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Implementation logic in megatron_policy_worker.py: Conditional PEFT hook generation and checkpoint loading logic with LoRA state awareness requires careful verification of control flow
  • Configuration additions: New LoRA parameter fields across both DTensor and Megatron backends need validation against actual implementation expectations
  • Integration points: PEFT hook composition and pre-wrap behavior alteration affect the model initialization sequence and warrant validation

Suggested reviewers

  • joyang-nv
  • terrykong
  • ffrujeri

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title check ✅ Passed: The title 'feat: Megatron SFT LoRA' directly summarizes the main change: adding LoRA support for Megatron-based Supervised Fine-Tuning, which is the primary focus of the changeset.
  • Test Results For Major Changes ✅ Passed: The PR contains major feature additions (Megatron SFT LoRA support) with comprehensive test result documentation, including a W&B verification link, training/validation curve comparisons, and functional test assertions.
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a010564 and 0b81c63.

📒 Files selected for processing (4)
  • docs/guides/sft.md (3 hunks)
  • examples/configs/sft.yaml (1 hunks)
  • nemo_rl/models/policy/workers/megatron_policy_worker.py (6 hunks)
  • tests/functional/test_mbridge_lora_sft.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/guides/sft.md
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • docs/guides/sft.md
  • examples/configs/sft.yaml
  • tests/functional/test_mbridge_lora_sft.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/test_mbridge_lora_sft.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/functional/test_mbridge_lora_sft.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
🧠 Learnings (3)
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
🪛 markdownlint-cli2 (0.18.1)

docs/guides/sft.md

  • 233-233: Multiple headings with the same content (MD024, no-duplicate-heading)
  • 236-236: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 237-237: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 238-238: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 239-239: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (10)
tests/functional/test_mbridge_lora_sft.sh (1)

1-51: LGTM! Well-structured functional test for Megatron LoRA SFT.

The test script correctly:

  • Uses uv run per coding guidelines
  • Implements proper error handling with set -eou pipefail
  • Sets up cleanup trap for temporary checkpoints
  • Exercises the new Megatron LoRA configuration (policy.megatron_cfg.lora_cfg.enabled=true)
  • Validates training metrics with reasonable thresholds
examples/configs/sft.yaml (2)

117-129: LGTM! Comprehensive Megatron LoRA configuration.

The configuration block:

  • Includes all necessary LoRA parameters
  • Maintains enabled: false by default to preserve existing behavior (as per past review feedback)
  • Aligns with the DTensor LoRA configuration structure
  • Provides clear documentation for each parameter

132-135: Helpful clarification about AdamW usage.

The added comments correctly note that when weight_decay is set, the optimizer effectively uses AdamW. This helps users understand the actual optimizer behavior.
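For reference, the reasoning behind "weight_decay set ⇒ effectively AdamW" is the decoupled weight-decay update of Loshchilov & Hutter, in which the decay term acts directly on the weights instead of being folded into the gradient. Schematically (standard background, not a claim about the exact Megatron optimizer code):

$$\theta_{t+1} = \theta_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_t \right)$$

where $\lambda$ is the configured weight_decay and $\hat{m}_t$, $\hat{v}_t$ are the bias-corrected Adam moment estimates.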

nemo_rl/models/policy/workers/megatron_policy_worker.py (4)

26-26: LGTM! Necessary imports for LoRA integration.

The imports correctly bring in:

  • LoRA for PEFT configuration
  • _create_peft_pre_wrap_hook for model wrapping
  • MegatronModule for type annotations

Also applies to: 54-54, 90-90


280-295: LGTM! Correct LoRA configuration extraction.

The code properly:

  • Checks if LoRA is enabled via policy configuration
  • Extracts all required LoRA parameters
  • Instantiates the LoRA PEFT configuration
  • Assigns it to cfg.peft for use in model setup

297-305: LGTM! Correct PEFT pre-wrap hook composition.

The code properly:

  • Creates the PEFT pre-wrap hook using the configuration
  • Registers it with the model configuration
  • Composes it into a callable hook for model construction
  • Falls back to an empty list when PEFT is not configured

314-314: LGTM! Correct integration of PEFT hook into model construction.

The peft_hook is properly passed as the pre_wrap_hook parameter to get_model, ensuring the PEFT transformations are applied during model instantiation.

docs/guides/sft.md (3)

171-172: LGTM! Clear documentation of LoRA backend support.

The notes correctly clarify:

  • DTensor v2 and Megatron backends support LoRA
  • DTensor v1 does not support LoRA
  • Triton kernel usage details

174-210: LGTM! Comprehensive DTensor LoRA documentation.

The section provides:

  • Clear configuration example with all parameters
  • Detailed parameter descriptions
  • Usage example with command-line override
  • Important note about Triton kernel compatibility with TP > 1

252-259: LGTM! Clear example of enabling Megatron LoRA.

The example correctly demonstrates:

  • Disabling DTensor backend
  • Enabling Megatron backend
  • Enabling Megatron LoRA configuration

This helps users understand the backend switching requirements.

@vadam5 vadam5 requested a review from terrykong December 19, 2025 22:07
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Dec 22, 2025
terrykong previously approved these changes Dec 22, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 22, 2025
yuki-97 previously approved these changes Dec 22, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
@vadam5 vadam5 requested review from terrykong and yuki-97 January 6, 2026 00:54

@yuki-97 yuki-97 left a comment


thanks for the update, LGTM.

as discussed in the meeting, we still have a perf issue with the tulu3 dataset; it's tracked at #1719.

Comment on lines +36 to +42
# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
uv run tests/check_metrics.py $JSON_METRICS \
'data["train/loss"]["1"] < 1.0' \
'data["train/loss"]["50"] < 0.8' \
'max(data["ray/node.0.gpu.0.mem_gb"]) < 50' \
'mean(data["timing/train/total_step_time"], 2) < 10'

According to the latest test, max(data["ray/node.0.gpu.0.mem_gb"]) = 52.06640625 and mean(data["timing/train/total_step_time"], 2) = 22.511235587450923.

Suggested change

Original:

    # Only run metrics if the target step is reached
    if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
        uv run tests/check_metrics.py $JSON_METRICS \
            'data["train/loss"]["1"] < 1.0' \
            'data["train/loss"]["50"] < 0.8' \
            'max(data["ray/node.0.gpu.0.mem_gb"]) < 50' \
            'mean(data["timing/train/total_step_time"], 2) < 10'

Suggested:

    # Revert to `mean(data["timing/train/total_step_time"], 2) < 30` once https://github.com/NVIDIA-NeMo/RL/issues/1719 resolved
    # Only run metrics if the target step is reached
    if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
        uv run tests/check_metrics.py $JSON_METRICS \
            'data["train/loss"]["1"] < 1.0' \
            'data["train/loss"]["50"] < 0.8' \
            'max(data["ray/node.0.gpu.0.mem_gb"]) < 60' \
            'mean(data["timing/train/total_step_time"], 2) < 30'


Labels

  • CI:L1 Run doctests, unit tests, and functional tests
  • documentation Improvements or additions to documentation
  • r0.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants