
Conversation


@arendu arendu commented Dec 12, 2025

What does this PR do?

Adds Llama LoRA SFT support via Megatron Bridge. To verify correctness, we compared train and validation curves with the LoRA SFT DTensor path on the SQuAD and Tulu3 datasets. The wandb link for the verification runs:

[Screenshots: train and validation loss curve comparisons from the wandb verification runs]

Issues

List issues that this PR closes (syntax):

Usage

uv run examples/run_sft.py \
  --config examples/configs/sft.yaml \
  policy.dtensor_cfg.enabled=false \
  policy.megatron_cfg.enabled=true \
  policy.megatron_cfg.lora_cfg.enabled=true

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

  • Documentation

    • Enhanced LoRA configuration documentation clarifying DTensor v2 as the default backend and introducing Megatron-specific options with expanded parameter details and usage examples.
  • New Features

    • Added comprehensive LoRA configuration support for both DTensor and Megatron backends with configurable parameters for initialization, dropout, and module targeting.
  • Tests

    • Introduced functional test coverage for LoRA workflows with metrics validation.


arendu and others added 6 commits December 11, 2025 15:18
Signed-off-by: adithyare <adithyare@nvidia.com>
…s w.o crashing. WIP check correctness

Signed-off-by: adithyare <adithyare@nvidia.com>

@terrykong terrykong left a comment


@yaoyu-33 to review

@terrykong terrykong changed the title from "Adithyre/llama lora mcore" to "feat: Megatron SFT LoRA" on Dec 19, 2025

@terrykong terrykong left a comment


@aasthajh @vadam5 could you add a companion recipe for mcore to test this path?

a010564

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 19, 2025
@vadam5 vadam5 marked this pull request as ready for review December 19, 2025 21:29
@vadam5 vadam5 requested review from a team as code owners December 19, 2025 21:29

coderabbitai bot commented Dec 19, 2025

📝 Walkthrough

This pull request introduces LoRA/PEFT integration for Megatron policy workers with conditional pre-wrap hooking and checkpoint-aware loading. Changes include updated documentation clarifying DTensor v2 as default, Megatron-specific LoRA configuration alongside DTensor, example configurations, implementation changes to the policy worker, and functional testing for the LoRA SFT workflow.

Changes

  • Documentation and Configuration (docs/guides/sft.md, examples/configs/sft.yaml): Updated the LoRA documentation to clarify DTensor v2 as the default backend and added a Megatron-specific LoRA configuration alongside the DTensor one. Expanded parameter details for both backends with new fields: enabled, target_modules, exclude_modules, dim, alpha, dropout, dropout_position, lora_A_init_method, lora_B_init_method, a2a_experimental, and lora_dtype (a hedged sketch of these keys follows this list). Added example usage blocks for both backends.
  • LoRA/PEFT Integration (nemo_rl/models/policy/workers/megatron_policy_worker.py): Integrated LoRA/PEFT functionality, including conditional PEFT config computation from policy.lora_cfg, pre-wrap hook generation and registration, a composed peft_hook passed to model construction, and checkpoint-loading logic that respects LoRA state with explicit finetune flag control (see the illustrative wiring sketch after the sequence diagram below).
  • Functional Testing (tests/functional/test_mbridge_lora_sft.sh): New test script that executes an SFT run with the LoRA configuration, captures metrics, validates loss constraints (train/loss at step 3 < 5.9), and cleans up checkpoints.
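For orientation, the configuration keys just listed can be pictured as a TypedDict along the lines of the repository's config conventions. The sketch below is illustrative only: the field names come from this walkthrough, but the types, defaults, and example values are assumptions rather than the actual nemo_rl definitions.

# Hypothetical sketch of the Megatron LoRA config surface described above.
# Field names mirror the walkthrough; types, defaults, and example values are
# assumptions, not the actual nemo_rl TypedDict.
from typing import NotRequired, Optional, TypedDict


class MegatronLoRAConfigSketch(TypedDict):
    enabled: bool                              # turn LoRA on for the Megatron backend
    target_modules: list[str]                  # module names to adapt (e.g. attention/MLP projections)
    exclude_modules: NotRequired[list[str]]    # modules to skip even if matched by target_modules
    dim: int                                   # LoRA rank
    alpha: int                                 # LoRA scaling factor
    dropout: float                             # dropout applied on the LoRA path
    dropout_position: NotRequired[str]         # where dropout is applied, e.g. "pre" or "post"
    lora_A_init_method: NotRequired[str]       # init for the A matrix, e.g. "xavier"
    lora_B_init_method: NotRequired[str]       # init for the B matrix, e.g. "zero"
    a2a_experimental: NotRequired[bool]        # experimental all-to-all communication toggle
    lora_dtype: NotRequired[Optional[str]]     # optional dtype override for LoRA weights


# Example instance with made-up values; the real defaults live in examples/configs/sft.yaml.
example_cfg: MegatronLoRAConfigSketch = {
    "enabled": True,
    "target_modules": ["linear_qkv", "linear_proj"],
    "dim": 8,
    "alpha": 32,
    "dropout": 0.0,
}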

Sequence Diagram(s)

sequenceDiagram
    participant PolicyWorker as Policy Worker
    participant Config as Policy Config
    participant PEFTHook as PEFT Hook<br/>Manager
    participant Model as Megatron<br/>Model
    participant Checkpoint as Checkpoint<br/>Loader
    
    PolicyWorker->>Config: Load policy.lora_cfg
    alt LoRA Enabled
        Config-->>PolicyWorker: lora_cfg config
        PolicyWorker->>PEFTHook: Create LoRA config<br/>from lora_cfg
        PEFTHook->>PEFTHook: Generate<br/>_create_peft_pre_wrap_hook()
        PEFTHook->>PEFTHook: Compose peft_hook<br/>with pre-wrap behavior
        PolicyWorker->>Model: Construct with<br/>pre_wrap_hook=peft_hook
    else LoRA Disabled
        Config-->>PolicyWorker: lora_cfg = None
        PEFTHook-->>PolicyWorker: peft_hook = []
        PolicyWorker->>Model: Construct with<br/>pre_wrap_hook=[]
    end
    
    Model-->>PolicyWorker: Model created
    
    alt should_load_checkpoint
        PolicyWorker->>Checkpoint: Check checkpoint<br/>availability
        Checkpoint-->>PolicyWorker: Checkpoint found
        alt LoRA Enabled
            PolicyWorker->>Checkpoint: Set finetune=False<br/>for state loading
        end
        Checkpoint->>Checkpoint: Load checkpoint
    else
        Checkpoint-->>PolicyWorker: No checkpoint
    end
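The following Python sketch mirrors the control flow in the diagram. It is not the actual megatron_policy_worker.py code: the LoRA/PEFT helpers (the LoRA config class, _create_peft_pre_wrap_hook, and the model constructor) are passed in as opaque callables because their real signatures live in Megatron Bridge and nemo_rl and are only assumed here.

# Illustrative wiring only: the helper callables stand in for the real Megatron Bridge /
# nemo_rl APIs, whose exact signatures are not reproduced here.
from typing import Any, Callable, Optional


def build_model_with_optional_lora(
    cfg: Any,
    lora_cfg: Optional[dict],
    make_lora_peft: Callable[..., Any],                # stand-in for constructing the LoRA PEFT config
    create_pre_wrap_hook: Callable[[Any], Callable],   # stand-in for _create_peft_pre_wrap_hook
    get_model: Callable[..., Any],                     # stand-in for the model constructor
) -> Any:
    """Attach a PEFT pre-wrap hook only when LoRA is enabled, as in the diagram above."""
    if lora_cfg is not None and lora_cfg.get("enabled", False):
        # Translate the policy-level lora_cfg into a PEFT config and record it on cfg.
        cfg.peft = make_lora_peft(
            target_modules=lora_cfg["target_modules"],
            dim=lora_cfg["dim"],
            alpha=lora_cfg["alpha"],
            dropout=lora_cfg["dropout"],
        )
        # Compose the pre-wrap hook so adapters are injected before the model is wrapped.
        peft_hook = [create_pre_wrap_hook(cfg.peft)]
    else:
        # LoRA disabled: an empty hook list leaves model construction unchanged.
        peft_hook = []
    return get_model(cfg, pre_wrap_hook=peft_hook)

Keeping the hook list empty in the disabled path means the non-LoRA code path behaves exactly as before the change, which matches the "enabled: false by default" behavior noted in the review comments below.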

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Implementation logic in megatron_policy_worker.py: Conditional PEFT hook generation and checkpoint loading logic with LoRA state awareness requires careful verification of control flow
  • Configuration additions: New LoRA parameter fields across both DTensor and Megatron backends need validation against actual implementation expectations
  • Integration points: PEFT hook composition and pre-wrap behavior alteration affect the model initialization sequence and warrant validation

Suggested reviewers

  • joyang-nv
  • terrykong
  • ffrujeri

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title check ✅ Passed: The title 'feat: Megatron SFT LoRA' directly summarizes the main change: adding LoRA support for Megatron-based Supervised Fine-Tuning, which is the primary focus of the changeset.
  • Test Results For Major Changes ✅ Passed: The PR contains major feature additions (Megatron SFT LoRA support) with comprehensive test result documentation, including a W&B verification link, training/validation curve comparisons, and functional test assertions.
  • Description Check ✅ Passed: Check skipped - CodeRabbit’s high-level summary is enabled.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a010564 and 0b81c63.

📒 Files selected for processing (4)
  • docs/guides/sft.md (3 hunks)
  • examples/configs/sft.yaml (1 hunks)
  • nemo_rl/models/policy/workers/megatron_policy_worker.py (6 hunks)
  • tests/functional/test_mbridge_lora_sft.sh (1 hunks)
🧰 Additional context used
📓 Path-based instructions (6)
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Update docs/index.md when a new markdown doc is added under docs/**/*.md or a markdown file is renamed, ensuring the document appears in the most appropriate section

Files:

  • docs/guides/sft.md
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • docs/guides/sft.md
  • examples/configs/sft.yaml
  • tests/functional/test_mbridge_lora_sft.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/test_mbridge_lora_sft.sh
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • tests/functional/test_mbridge_lora_sft.sh
  • nemo_rl/models/policy/workers/megatron_policy_worker.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/workers/megatron_policy_worker.py
🧠 Learnings (3)
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/test_mbridge_lora_sft.sh
🪛 markdownlint-cli2 (0.18.1)

docs/guides/sft.md

  • 233-233: Multiple headings with the same content (MD024, no-duplicate-heading)
  • 236-236: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 237-237: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 238-238: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 239-239: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (10)
tests/functional/test_mbridge_lora_sft.sh (1)

1-51: LGTM! Well-structured functional test for Megatron LoRA SFT.

The test script correctly:

  • Uses uv run per coding guidelines
  • Implements proper error handling with set -eou pipefail
  • Sets up cleanup trap for temporary checkpoints
  • Exercises the new Megatron LoRA configuration (policy.megatron_cfg.lora_cfg.enabled=true)
  • Validates training metrics with reasonable thresholds
examples/configs/sft.yaml (2)

117-129: LGTM! Comprehensive Megatron LoRA configuration.

The configuration block:

  • Includes all necessary LoRA parameters
  • Maintains enabled: false by default to preserve existing behavior (as per past review feedback)
  • Aligns with the DTensor LoRA configuration structure
  • Provides clear documentation for each parameter

132-135: Helpful clarification about AdamW usage.

The added comments correctly note that when weight_decay is set, the optimizer effectively uses AdamW. This helps users understand the actual optimizer behavior.
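For reference, the reasoning behind "weight_decay set ⇒ effectively AdamW" is the decoupled weight-decay update of Loshchilov & Hutter, in which the decay term acts directly on the weights instead of being folded into the gradient. Schematically (standard background, not a claim about the exact Megatron optimizer code):

$$\theta_{t+1} = \theta_t - \eta \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda\, \theta_t \right)$$

where $\lambda$ is the configured weight_decay and $\hat{m}_t$, $\hat{v}_t$ are the bias-corrected Adam moment estimates.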

nemo_rl/models/policy/workers/megatron_policy_worker.py (4)

26-26: LGTM! Necessary imports for LoRA integration.

The imports correctly bring in:

  • LoRA for PEFT configuration
  • _create_peft_pre_wrap_hook for model wrapping
  • MegatronModule for type annotations

Also applies to: 54-54, 90-90


280-295: LGTM! Correct LoRA configuration extraction.

The code properly:

  • Checks if LoRA is enabled via policy configuration
  • Extracts all required LoRA parameters
  • Instantiates the LoRA PEFT configuration
  • Assigns it to cfg.peft for use in model setup

297-305: LGTM! Correct PEFT pre-wrap hook composition.

The code properly:

  • Creates the PEFT pre-wrap hook using the configuration
  • Registers it with the model configuration
  • Composes it into a callable hook for model construction
  • Falls back to an empty list when PEFT is not configured

314-314: LGTM! Correct integration of PEFT hook into model construction.

The peft_hook is properly passed as the pre_wrap_hook parameter to get_model, ensuring the PEFT transformations are applied during model instantiation.

docs/guides/sft.md (3)

171-172: LGTM! Clear documentation of LoRA backend support.

The notes correctly clarify:

  • DTensor v2 and Megatron backends support LoRA
  • DTensor v1 does not support LoRA
  • Triton kernel usage details

174-210: LGTM! Comprehensive DTensor LoRA documentation.

The section provides:

  • Clear configuration example with all parameters
  • Detailed parameter descriptions
  • Usage example with command-line override
  • Important note about Triton kernel compatibility with TP > 1

252-259: LGTM! Clear example of enabling Megatron LoRA.

The example correctly demonstrates:

  • Disabling DTensor backend
  • Enabling Megatron backend
  • Enabling Megatron LoRA configuration

This helps users understand the backend switching requirements.

@vadam5 vadam5 requested a review from terrykong December 19, 2025 22:07
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Dec 22, 2025
terrykong previously approved these changes Dec 22, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
@yuki-97 yuki-97 added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 22, 2025
yuki-97 previously approved these changes Dec 22, 2025
Signed-off-by: Yuki Huang <yukih@nvidia.com>
Co-authored-by: Yuki Huang <yukih@nvidia.com>
@vadam5 vadam5 requested review from terrykong and yuki-97 January 6, 2026 00:54

@yuki-97 yuki-97 left a comment


thanks for the update, LGTM.

as discussed in the meeting, we still have a perf issue with the tulu3 dataset; it's tracked at #1719.

Comment on lines +36 to +42
# Only run metrics if the target step is reached
if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
uv run tests/check_metrics.py $JSON_METRICS \
'data["train/loss"]["1"] < 1.0' \
'data["train/loss"]["50"] < 0.8' \
'max(data["ray/node.0.gpu.0.mem_gb"]) < 50' \
'mean(data["timing/train/total_step_time"], 2) < 10'

According to the latest test, max(data["ray/node.0.gpu.0.mem_gb"]) = 52.06640625 and mean(data["timing/train/total_step_time"], 2) = 22.511235587450923.

Suggested change

Original:

    # Only run metrics if the target step is reached
    if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
        uv run tests/check_metrics.py $JSON_METRICS \
            'data["train/loss"]["1"] < 1.0' \
            'data["train/loss"]["50"] < 0.8' \
            'max(data["ray/node.0.gpu.0.mem_gb"]) < 50' \
            'mean(data["timing/train/total_step_time"], 2) < 10'

Suggested:

    # Revert to `mean(data["timing/train/total_step_time"], 2) < 30` once https://github.com/NVIDIA-NeMo/RL/issues/1719 resolved
    # Only run metrics if the target step is reached
    if [[ $(jq 'to_entries | .[] | select(.key == "train/loss") | .value | keys | map(tonumber) | max' $JSON_METRICS) -ge $MAX_STEPS ]]; then
        uv run tests/check_metrics.py $JSON_METRICS \
            'data["train/loss"]["1"] < 1.0' \
            'data["train/loss"]["50"] < 0.8' \
            'max(data["ray/node.0.gpu.0.mem_gb"]) < 60' \
            'mean(data["timing/train/total_step_time"], 2) < 30'


Labels

  • CI:L1 Run doctests, unit tests, and functional tests
  • documentation Improvements or additions to documentation
  • r0.5.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants