fix: split dtensorv1 vllm dependency #1638
base: main
Conversation
📝 Walkthrough

The pull request reorganizes vLLM-related utilities across the codebase and adds support for multiple execution backends with vLLM variants. It moves FP8 calibration helper functions, introduces new backend executables, updates the actor/worker registry to support vLLM-specific variants, and includes temporary workarounds for backend selection logic. The changes decouple train workers from vLLM by supporting alternative backends through configurable mappings.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring special attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✨ Finishing touches
Actionable comments posted: 0
🧹 Nitpick comments (4)
nemo_rl/distributed/worker_groups.py (1)
169-172: Hyphen-suffix stripping cleanly decouples worker class resolution from backend tagging

Using `class_name = class_name.split("-")[0]` lets you encode backend variants in the FQN (e.g., `...DTensorPolicyWorker-vllm`) for environment selection while still importing the underlying class (`DTensorPolicyWorker`) without any code changes to that class. This is backward compatible for existing FQNs and matches the temporary workaround described in the comment; just keep this in sync with the naming used in `Policy` and `ray_actor_environment_registry`.

nemo_rl/models/policy/lm_policy.py (1)
162-167: Backend-suffixed worker class name cleanly plugs into env selection, but assumes full generation config

Appending `-backend` to `worker_builder_cls` when `generation.colocated.enabled` is False is a reasonable temporary workaround that pairs well with the FQN mangling in `RayWorkerBuilder` and the updated actor environment registry. It does, however, assume that `config["generation"]` is non-None and contains both `"colocated"` and `"backend"`; if you ever construct a `Policy` without a fully-populated `generation` section, this will raise. If you expect such configs, consider guarding with a local `generation_cfg = config.get("generation") or {}` and raising a clearer error when required keys are missing.

nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
2474-2631: calibrate_qkv_fp8_scales now returns vLLM-style keys; update docs to match

Importing `convert_calibration_to_vllm_format` from `fp8_train_utils` and feeding `result_layers` into it means the `"layers"` field in `final_result` is now a flat dict keyed by vLLM parameter names (e.g., `model.layers.N.self_attn.k_scale`), not by `"layer_N"` entries as described in the docstring. The implementation is consistent with the rest of the FP8/vLLM tooling, but the docstring should be updated to document the new structure so downstream callers know they can pass `final_result["layers"]` directly into KV-scale-aware paths.

nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py (1)
51-96: Consider more robust parsing of `layer_<idx>` keys in calibration conversion

Right now `layer_idx = int(layer_key.split("_")[1])` assumes a strict `"layer_<int>"` format; malformed keys will throw somewhat opaque IndexError/ValueError. If you want slightly safer behavior, you could validate and re-raise with a clearer message:

```diff
-    vllm_scales = {}
-    for layer_key, scales in calibration_results.items():
-        # Extract layer index from "layer_N" format
-        layer_idx = int(layer_key.split("_")[1])
+    vllm_scales = {}
+    for layer_key, scales in calibration_results.items():
+        # Extract layer index from keys like "layer_0"
+        _, _, idx_str = layer_key.partition("_")
+        try:
+            layer_idx = int(idx_str)
+        except ValueError as exc:
+            raise ValueError(
+                f"Invalid layer key {layer_key!r}, expected format 'layer_<int>'."
+            ) from exc
```

This keeps failures explicit if the upstream calibration format changes.
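As a standalone sketch of the validation suggested above (the helper name is hypothetical; only the `layer_<int>` key format comes from the review):

```python
def parse_layer_index(layer_key: str) -> int:
    """Parse the integer index from a calibration key like 'layer_0'.

    Raises ValueError with an actionable message on malformed keys,
    instead of a bare IndexError/ValueError from split-and-int.
    """
    _, _, idx_str = layer_key.partition("_")
    try:
        return int(idx_str)
    except ValueError as exc:
        raise ValueError(
            f"Invalid layer key {layer_key!r}, expected format 'layer_<int>'."
        ) from exc
```

With this shape, a key such as `"layerX"` fails with a message naming the offending key, which surfaces upstream calibration-format changes immediately.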
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- nemo_rl/distributed/ray_actor_environment_registry.py (1 hunks)
- nemo_rl/distributed/virtual_cluster.py (1 hunks)
- nemo_rl/distributed/worker_groups.py (1 hunks)
- nemo_rl/models/generation/vllm/quantization/fp8.py (0 hunks)
- nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py (1 hunks)
- nemo_rl/models/generation/vllm/vllm_backend.py (2 hunks)
- nemo_rl/models/generation/vllm/vllm_worker.py (1 hunks)
- nemo_rl/models/policy/lm_policy.py (1 hunks)
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py (0 hunks)
- nemo_rl/models/policy/workers/megatron_policy_worker.py (2 hunks)
- pyproject.toml (2 hunks)
- pyrefly.toml (1 hunks)
💤 Files with no reviewable changes (2)
- nemo_rl/models/policy/workers/dtensor_policy_worker_v2.py
- nemo_rl/models/generation/vllm/quantization/fp8.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py
- nemo_rl/models/generation/vllm/vllm_backend.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/distributed/worker_groups.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- nemo_rl/distributed/ray_actor_environment_registry.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py
- nemo_rl/models/generation/vllm/vllm_backend.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/distributed/worker_groups.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- nemo_rl/distributed/ray_actor_environment_registry.py
!(**/tests/**|**/test_*.py|**/test_*.sh)
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py
- nemo_rl/models/generation/vllm/vllm_backend.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/distributed/worker_groups.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- nemo_rl/distributed/ray_actor_environment_registry.py
- pyproject.toml
- pyrefly.toml
**/*.{py,sh}
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)
Files:
- nemo_rl/distributed/virtual_cluster.py
- nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py
- nemo_rl/models/generation/vllm/vllm_backend.py
- nemo_rl/models/generation/vllm/vllm_worker.py
- nemo_rl/models/policy/lm_policy.py
- nemo_rl/distributed/worker_groups.py
- nemo_rl/models/policy/workers/megatron_policy_worker.py
- nemo_rl/distributed/ray_actor_environment_registry.py
🧬 Code graph analysis (3)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/vllm/quantization/fp8.py (1)
init_fp8(158-254)
nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py (2)
get_vllm_qkv_scale_names (16-48), convert_calibration_to_vllm_format (51-96)
nemo_rl/distributed/ray_actor_environment_registry.py (1)
nemo_rl/distributed/virtual_cluster.py (1)
PY_EXECUTABLES(43-68)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint check
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (8)
nemo_rl/models/generation/vllm/vllm_backend.py (1)
180-187: FP8 utilities correctly re-pointed to vLLM-specific quantization module

Updating the FP8 imports to `nemo_rl.models.generation.vllm.quantization.fp8` keeps the weight-loading logic identical while routing through the new vLLM-specific helper module, which matches the goal of isolating vLLM-related FP8 code. No issues from this change alone.

Also applies to: 233-239
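The import-isolation idea in this comment and the scoped-import comment below might be sketched as follows. The module path is taken from the review; the wrapper itself is hypothetical, and since `init_fp8`'s signature is not shown in the diff, the function object is returned rather than called:

```python
def load_fp8_initializer(precision: str):
    """Return the FP8 init hook only when running in FP8 precision.

    Hypothetical sketch of the deferred-import pattern: non-FP8 runs never
    import vLLM's FP8 machinery, so the vLLM dependency is only required
    when precision == "fp8". Not NeMo-RL's actual worker code.
    """
    if precision != "fp8":
        return None
    # Imported lazily; resolves only in an environment with the vLLM extra.
    from nemo_rl.models.generation.vllm.quantization.fp8 import init_fp8
    return init_fp8
```

For any non-FP8 precision the function returns `None` without touching the vLLM package at all, which is what makes the train-side dependency split workable.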
nemo_rl/models/generation/vllm/vllm_worker.py (1)
291-299: Scoped FP8 initialization import is consistent with new module layout

Importing `init_fp8` from `nemo_rl.models.generation.vllm.quantization.fp8` only when `precision == "fp8"` aligns this worker with the refactored FP8 utilities and avoids unnecessary imports on non-FP8 runs. Behavior remains the same assuming `init_fp8`'s API is unchanged.

pyproject.toml (1)
55-76: Splitting FSDP and training vLLM extras aligns with backend-specific executables

Defining an `fsdp` extra and a separate `vllm_for_train` extra cleanly supports new executables like `FSDP`, `FSDP_VLLM`, `AUTOMODEL_VLLM`, and `MCORE_VLLM` while keeping the generic `vllm` extra available for inference-only flows. The duplicate `vllm==0.11.0` pin across `vllm` and `vllm_for_train` is fine but should be maintained in lockstep when upgrading.

Please confirm in your environment that:
- `uv run --locked --extra fsdp --extra vllm_for_train ...` resolves correctly, and
- the intended worker/py-executable mappings actually use `fsdp` vs `vllm_for_train` as designed.

pyrefly.toml (1)
100-107: Including fp8_train_utils.py in pyrefly scope is appropriate

Adding `nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py` to `project-includes` keeps the new FP8 training helpers under static analysis and matches the refactor that moved these utilities out of the generic FP8 module.

nemo_rl/distributed/virtual_cluster.py (1)
52-66: New FSDP and `*_VLLM` executables match the split extras model

Introducing `PY_EXECUTABLES.FSDP` and the `FSDP_VLLM`/`AUTOMODEL_VLLM`/`MCORE_VLLM` variants provides a clear separation between pure training backends and those that also need vLLM installed on the training side via `vllm_for_train`. The string composition (`... --extra fsdp ... --extra vllm_for_train`) looks correct for uv's CLI.

Please confirm that:
- uv accepts multiple `--extra` flags in this form, and
- the updated `ray_actor_environment_registry` actually routes the appropriate workers to `FSDP`, `FSDP_VLLM`, `AUTOMODEL_VLLM`, and `MCORE_VLLM` as intended.

nemo_rl/models/policy/workers/megatron_policy_worker.py (1)
2056-2073: Local FP8 QKV scale-name import reduces coupling and matches the new helper module

Pulling `get_vllm_qkv_scale_names` from `nemo_rl.models.generation.vllm.quantization.fp8_train_utils` at function scope in `_iter_params_with_optional_kv_scales` keeps this worker dependent only on NeMo-RL's own FP8 utilities and avoids any implicit vLLM import at module import time. Combined with the existing logic for emitting KV-scale tensors, this cleanly supports the new calibration-to-vLLM naming flow.

nemo_rl/models/generation/vllm/quantization/fp8_train_utils.py (1)
16-48: Centralized vLLM QKV scale naming looks good

The helper cleanly encapsulates the vLLM naming convention (including the extra `.attn` segment for `q_scale`) and should help prevent drift from vLLM's expected parameter structure.

nemo_rl/distributed/ray_actor_environment_registry.py (1)
26-28: vLLM-specific policy worker executables are wired consistently

Defining `MCORE_VLLM_EXECUTABLE` and routing the `*-vllm` DTensor/Automodel/Megatron policy workers to the corresponding `*_VLLM` executables matches the existing pattern (`VLLM_EXECUTABLE`, `MCORE_EXECUTABLE`) and keeps the temporary vLLM-for-train environments clearly separated. Comments documenting the workaround are helpful.

Also applies to: 33-40
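Taken together with the worker-group and virtual-cluster comments above, the wiring might be sketched like this. Every constant name, registry key, and uv flag below is an illustrative placeholder, not NeMo-RL's actual values:

```python
# Hypothetical sketch of the backend-variant pattern described in this review.

def py_executable(*extras: str) -> str:
    """Compose a uv launch command from optional dependency extras."""
    return " ".join(["uv run --locked", *(f"--extra {e}" for e in extras)])

MCORE_EXECUTABLE = py_executable("mcore")
MCORE_VLLM_EXECUTABLE = py_executable("mcore", "vllm_for_train")

# Registry keys keep the '-vllm' backend tag so environment selection can
# route the variant to a venv that also contains vLLM.
ACTOR_ENVIRONMENT_REGISTRY = {
    "workers.MegatronPolicyWorker": MCORE_EXECUTABLE,
    "workers.MegatronPolicyWorker-vllm": MCORE_VLLM_EXECUTABLE,
}

def resolve_class_fqn(fqn: str) -> str:
    """Strip an optional '-<backend>' suffix so the real class can be imported."""
    module_path, _, cls = fqn.rpartition(".")
    cls = cls.split("-")[0]  # drop the backend tag, keep the class name
    return f"{module_path}.{cls}" if module_path else cls
```

Here the tagged FQN selects the vLLM-enabled environment, while `resolve_class_fqn` still yields the plain class path for import, mirroring the split the worker-group comment describes.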
guyueh1 left a comment
Good to me for the fp8 changes
33ea56d to e1f107c
e1f107c to 5cd8d8d
Signed-off-by: Yuki Huang <yukih@nvidia.com>
5cd8d8d to 9d49097
`vllm==0.11.2` itself instead of the full vllm venv, which includes many other things like deepgemm.

Summary by CodeRabbit
New Features
Refactor