Skip to content

Conversation

@RolaoDenthu
Copy link

@RolaoDenthu RolaoDenthu commented Dec 21, 2025

What does this PR do ?

Add comprehensive test coverage for SGLang generation backend, including functional tests, unit tests, and nightly tests.

  • Functional Test (tests/functional/grpo_sglang.sh): Quick validation of SGLang-based GRPO training
  • Unit Tests (tests/unit/models/generation/test_sglang_generation.py): unit tests covering:
    • Basic configuration validation
    • Policy generation and tensor parallelism
    • Worker seed behavior for RLHF diversity
    • HTTP server direct API access
    • Weight updates with DTensor policy (colocated mode)
    • Prefix cache reset after weight updates
  • Nightly Test (tests/test_suites/llm/grpo-qwen3-0.6b-1n8g-sglang.sh): End-to-end convergence test for SGLang backend

Usage

  • You can potentially add a usage example below
    uv run pytest tests/unit/models/generation/test_sglang_generation.py -v
# Run functional test
bash tests/functional/grpo_sglang.sh

# Run unit tests
uv run pytest tests/unit/models/generation/test_sglang_generation.py -v

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Summary by CodeRabbit

Release Notes

  • New Features

    • Distributed generation engine using SGLang backend with HTTP weight streaming and multi-GPU support.
  • Configuration

    • New YAML configuration templates for SGLang-based experiments with customizable generation parameters.
  • Tests

    • Comprehensive test coverage for SGLang generation, including tensor parallelism, batching, and dynamic weight updates.

✏️ Tip: You can customize this high-level summary in your review settings.

PrinsYin and others added 30 commits December 6, 2025 21:12
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
…a server

Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
…p servers

Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
sglang: add 1B example
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Ryan <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
- Convert SGLangConfig from regular class to TypedDict inheriting GenerationConfig
- Align structure with VllmConfig pattern for consistency
- Mark all fields as NotRequired for backward compatibility
- Add sglang_kwargs field for additional ServerArgs parameters
- Add type casting in grpo.py for type safety

This maintains backward compatibility while aligning with the existing
generation config structure pattern.

Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Signed-off-by: Zhuoran Yin <yzr1914001753@gmail.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: Night <32424487+PrinsYin@users.noreply.github.com>
@RolaoDenthu
Copy link
Author

@RolaoDenthu I think in the previous PR #1580 there is a test for qwen 1.5b that shows vllm and sglang have same reward curve. It would be good to show results like that in this PR.

Right now the functional test you added is qwen3 0.6b grpo, we don't have such as test for vllm so we can't tell if they are aligned by running CI; could you change it or add a new functional test that runs the same workload as https://github.com/NVIDIA-NeMo/RL/blob/main/tests/test_suites/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.sh but with sglang backend, so that we can check the agreement of these two backends via CI runs?

I'll approve once you address my review comments and the above comment, and provided that tests are passing. I've started CI test for this PR.

Hi @guyueh1, Thanks for the review! I’ve added the new test and addressed the review comments above.

@guyueh1
Copy link
Contributor

guyueh1 commented Dec 30, 2025

@RolaoDenthu please fix the lint

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Dec 30, 2025
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@RolaoDenthu
Copy link
Author

@RolaoDenthu please fix the lint

fixed the lint and the file name.

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 5, 2026
@guyueh1
Copy link
Contributor

guyueh1 commented Jan 5, 2026

quote lint check

File nemo_rl/models/generation/sglang/init.py has zero errors but is not in pyrefly.toml in the 'project-includes' list. Please add it to this whitelist.

@guyueh1
Copy link
Contributor

guyueh1 commented Jan 5, 2026

quote build-container job

The lockfile at uv.lock needs to be updated, but --locked was provided. To update the lockfile, run uv lock.
You need to try locally build docker first to make sure it works (check docs/docker.md for instructions). During the process you will find you need to update the uv.lock file, add that change to the PR

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@RolaoDenthu
Copy link
Author

quote build-container job

The lockfile at uv.lock needs to be updated, but --locked was provided. To update the lockfile, run uv lock.
You need to try locally build docker first to make sure it works (check docs/docker.md for instructions). During the process you will find you need to update the uv.lock file, add that change to the PR

I have updated the uv.lock and pyrefly.toml.

Signed-off-by: RolaoDenthu <xinyis10@illinois.edu>
@guyueh1
Copy link
Contributor

guyueh1 commented Jan 6, 2026

@RolaoDenthu lint seems to be complaining a new file not added to the pyrefly.toml
Please check the test log, it has the shell command, please run it locally to make sure it passes, or use the pre-commit hook (as here )

@github-actions
Copy link

github-actions bot commented Jan 6, 2026

⚠️ File Consistency Check

Check based on commit: 0513cbf (PR #1674 from add-tests)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/workers/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/workers/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/workers/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/workers/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@guyueh1 guyueh1 added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Jan 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L2 Run doctests, unit tests, functional tests, and convergence tests community-request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants