Upgrade container versions for common examples #776
PyTorch to 25.12-py3, NeMo to 25.11.01
📝 Walkthrough
Updated container image tags and small test parameters across multiple test and test-scenario TOML files: PyTorch images bumped to 25.12-py3, NeMo images to 25.11.01, copyright year ranges extended, and NCCL test params added to one config.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ 2 passed
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml (1)
2-2: Fix copyright year to satisfy CI. The pipeline reports a copyright header year mismatch. Update the end year from 2025 to 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test_scenario/nccl_test.toml (1)
2-2: Fix copyright year to satisfy CI. The pipeline reports a copyright header year mismatch. Update the end year from 2025 to 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/nccl_test.toml (1)
2-2: Fix copyright year to satisfy CI. The pipeline reports a copyright header year mismatch. Update from 2025 to 2025-2026 (or just 2026 if this file was created this year).
Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/nemo_run_llama3_8b.toml (1)
2-2: Fix copyright year to satisfy CI. The pipeline reports a copyright header year mismatch. Update the end year from 2025 to 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/osu_test.toml (1)
2-2: Fix copyright year to satisfy CI. The pipeline reports a copyright header year mismatch. Update from 2025 to 2025-2026.
Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/dse_nccl_all_gather.toml (1)
2-2: Update copyright year to 2026. The CI pipeline reports a copyright header year mismatch. The end year should be 2026.
Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/ucc_test.toml (1)
2-2: Update copyright year to 2026. The CI pipeline reports a copyright header year mismatch. The end year should be 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test_scenario/ucc_generator_test.toml (1)
2-2: Update copyright year to 2026. The CI pipeline reports a copyright header year mismatch. The end year should be 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test_scenario/slurm_container.toml (1)
2-2: Update copyright year to 2026. The CI pipeline reports a copyright header year mismatch. The end year should be 2026.
Proposed fix
-# Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
conf/common/test/nccl_test_all_gather.toml (1)
2-2: Update copyright year to 2026. The CI pipeline reports a copyright header year mismatch. The end year should be 2026.
Proposed fix
-# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
conf/common/test/dse_nccl_all_gather.toml
conf/common/test/nccl_test.toml
conf/common/test/nccl_test_all_gather.toml
conf/common/test/nemo_run_llama3_8b.toml
conf/common/test/osu_test.toml
conf/common/test/ucc_test.toml
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml
conf/common/test_scenario/nccl_test.toml
conf/common/test_scenario/slurm_container.toml
conf/common/test_scenario/ucc_generator_test.toml
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2026-01-05T22:24:31.807Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 767
File: conf/experimental/megatron_bridge/test/gb300/megatron_bridge_qwen_30b.toml:28-37
Timestamp: 2026-01-05T22:24:31.807Z
Learning: In CloudAI Megatron-Bridge TOML configuration files, document and implement support for container_image to accept '#' as a separator in addition to '/'. For example, both 'nvcr.io/nvidia/nemo:25.11.01' and 'nvcr.io#nvidia/nemo:25.11.01' should be considered valid syntax. Update parsing/validation logic for container_image accordingly and add validation tests to cover both separator forms in all relevant TOML configs (e.g., under conf/**).
Applied to files:
conf/common/test_scenario/ucc_generator_test.toml
conf/common/test_scenario/slurm_container.toml
conf/common/test/dse_nccl_all_gather.toml
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml
conf/common/test/nemo_run_llama3_8b.toml
conf/common/test_scenario/nccl_test.toml
conf/common/test/nccl_test_all_gather.toml
conf/common/test/ucc_test.toml
conf/common/test/osu_test.toml
conf/common/test/nccl_test.toml
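To make the separator learning above concrete, here is a minimal sketch of a parser that treats 'nvcr.io/nvidia/nemo:25.11.01' and 'nvcr.io#nvidia/nemo:25.11.01' as equivalent. The names ImageRef and parse_container_image are hypothetical and are not taken from the CloudAI codebase; the sketch also assumes that every image string carries an explicit registry and tag.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    """Hypothetical container for the three parts of an image reference."""

    registry: str
    repository: str
    tag: str


def parse_container_image(ref: str) -> ImageRef:
    """Split an image reference, accepting '#' or '/' after the registry.

    Assumption: the registry is everything before the first '#', or before
    the first '/' when no '#' is present, and the tag follows the last ':'.
    """
    separator = "#" if "#" in ref else "/"
    registry, _, rest = ref.partition(separator)
    repository, colon, tag = rest.rpartition(":")
    if not (registry and repository and colon and tag):
        raise ValueError(f"unsupported container_image value: {ref!r}")
    return ImageRef(registry, repository, tag)


slash_form = parse_container_image("nvcr.io/nvidia/nemo:25.11.01")
hash_form = parse_container_image("nvcr.io#nvidia/nemo:25.11.01")
print(slash_form == hash_form)
```

A validation test along these lines could be parametrized over both forms so that neither separator regresses.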
📚 Learning: 2025-12-23T00:23:16.200Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 764
File: src/cloudai/workloads/megatron_bridge/megatron_bridge.py:98-101
Timestamp: 2025-12-23T00:23:16.200Z
Learning: In src/cloudai/workloads/megatron_bridge/megatron_bridge.py, the nemo_run_repo GitRepo uses commit="main" intentionally. Nemo Run is a Slurm executor (not a framework) used by Megatron Bridge to launch recipes, and tracking the main branch is acceptable for this dependency.
Applied to files:
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml
conf/common/test/nemo_run_llama3_8b.toml
📚 Learning: 2025-12-18T17:54:44.004Z
Learnt from: allkoow
Repo: NVIDIA/cloudai PR: 742
File: doc/workloads/osu.rst:14-23
Timestamp: 2025-12-18T17:54:44.004Z
Learning: In the OSU benchmark workload (src/cloudai/workloads/osu_bench/), list-valued fields in OSUBenchCmdArgs (e.g., benchmark: str | List[str], message_size: Optional[str | List[str]]) are unrolled by the DSE (Design Space Exploration) flow before command generation, so the command generation strategy code receives only scalar values.
Applied to files:
conf/common/test/osu_test.toml
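The unrolling described in this learning can be pictured as a Cartesian product over the list-valued fields. The sketch below is an illustration under that assumption only; the function name unroll_cmd_args and the sample values are hypothetical and do not reflect the actual DSE implementation.

```python
from itertools import product
from typing import Any, Dict, Iterator


def unroll_cmd_args(cmd_args: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one all-scalar dict per combination of list-valued fields."""
    keys = list(cmd_args)
    # Wrap scalars in a single-element list so every field has choices.
    choices = [v if isinstance(v, list) else [v] for v in cmd_args.values()]
    for combination in product(*choices):
        yield dict(zip(keys, combination))


# Illustrative values only; not copied from osu_test.toml.
sample = {"benchmark": ["osu_allreduce", "osu_bcast"], "message_size": "1024"}
for scalar_args in unroll_cmd_args(sample):
    print(scalar_args)
```

Under this model the command generation code sees only dicts such as {"benchmark": "osu_bcast", "message_size": "1024"} and never a list.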
🪛 GitHub Actions: CI
conf/common/test_scenario/ucc_generator_test.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test_scenario/slurm_container.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/dse_nccl_all_gather.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/nemo_run_llama3_8b.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test_scenario/nccl_test.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/nccl_test_all_gather.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/ucc_test.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/osu_test.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
conf/common/test/nccl_test.toml
[error] 2-2: Copyright header year range mismatch. Expected end year 2026, found 2025.
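For reference, the kind of check behind these errors can be sketched as follows. This is not the project's actual CI script: the regular expression, the function name check_copyright_header, and the way the expected year is passed in are all assumptions made for illustration.

```python
import re
from typing import Optional

# Matches "Copyright (c) 2025 NVIDIA ..." and "Copyright (c) 2024-2025 NVIDIA ...".
HEADER_RE = re.compile(r"Copyright \(c\) (?:(\d{4})-)?(\d{4}) NVIDIA CORPORATION")


def check_copyright_header(line: str, expected_end_year: int) -> Optional[str]:
    """Return an error message for a stale header line, or None if it passes."""
    match = HEADER_RE.search(line)
    if match is None:
        return "Copyright header not found."
    end_year = int(match.group(2))
    if end_year != expected_end_year:
        return (
            "Copyright header year range mismatch. "
            f"Expected end year {expected_end_year}, found {end_year}."
        )
    return None


old = "# Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved."
new = "# Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved."
print(check_copyright_header(old, 2026))
print(check_copyright_header(new, 2026))
```

A single-year header such as "Copyright (c) 2025" is treated as having end year 2025, which is why the proposed fixes extend it to 2025-2026.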
🔇 Additional comments (11)
conf/common/test_scenario/nccl_test.toml (1)
71-71: PyTorch container version update looks good. The update to pytorch:25.12-py3 aligns with the PR objectives.
conf/common/test/nccl_test.toml (1)
22-28: Container update and new test parameters look good. The PyTorch container bump to 25.12-py3 and the addition of iters, warmup_iters, and stepfactor parameters align with the PR objectives and provide more explicit test configuration.
conf/common/test/nemo_run_llama3_8b.toml (1)
22-22: NeMo container version update looks good. The update to nemo:25.11.01 aligns with the PR objectives.
conf/common/test/osu_test.toml (1)
22-22: PyTorch container version update looks good. The update to pytorch:25.12-py3 aligns with the PR objectives.
conf/common/test_scenario/dse_nemo_run_llama3_8b.toml (1)
29-29: NeMo container version update looks good. The update to nemo:25.11.01 aligns with project patterns. The # separator is valid per project conventions and used consistently across configuration files.
conf/common/test/dse_nccl_all_gather.toml (2)
22-22: Container version update looks good. The PyTorch image upgrade from 25.06-py3 to 25.12-py3 aligns with the PR objectives.
26-31: Verify the new test parameters are intentional. These new test parameters (maxbytes, iters, warmup_iters, extra_cmd_args with --stepfactor) extend beyond the container version upgrade scope stated in the PR description. Please confirm these additions are intentional.
conf/common/test/ucc_test.toml (1)
22-22: Container version update looks good. The PyTorch image upgrade from 25.06-py3 to 25.12-py3 aligns with the PR objectives.
conf/common/test_scenario/ucc_generator_test.toml (1)
33-33: Container version update looks good. The PyTorch image upgrade to 25.12-py3 aligns with the PR objectives. Note this file uses the / separator while others use #; both are valid per the project conventions.
conf/common/test_scenario/slurm_container.toml (1)
29-29: Container version update looks good. The PyTorch image upgrade from 25.06-py3 to 25.12-py3 aligns with the PR objectives.
conf/common/test/nccl_test_all_gather.toml (1)
22-22: Container version update looks good. The PyTorch image upgrade from 25.06-py3 to 25.12-py3 aligns with the PR objectives.
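For readers unfamiliar with the NCCL parameters mentioned in the comments above (maxbytes, iters, warmup_iters, stepfactor): the nccl-tests binaries accept long options with those names. The sketch below shows one way such key/value pairs could be rendered into a command line. It is an assumption about the general shape, not CloudAI's command generation code; render_flags and all values are hypothetical.

```python
import shlex
from typing import Dict, List, Union


def render_flags(cmd_args: Dict[str, Union[int, str]]) -> List[str]:
    """Turn {"iters": 20} into ["--iters", "20"], preserving insertion order."""
    flags: List[str] = []
    for name, value in cmd_args.items():
        flags.extend([f"--{name}", str(value)])
    return flags


# Hypothetical values chosen for illustration; not taken from the TOML files.
args = {"maxbytes": "16G", "iters": 20, "warmup_iters": 5, "stepfactor": 2}
command = ["all_gather_perf", *render_flags(args)]
print(shlex.join(command))
```

Making these values explicit in the config pins the benchmark sweep instead of relying on the binary's defaults, which is presumably why the reviewer asked whether the additions were intentional.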
Greptile Summary
This PR upgrades container versions in common example configurations to newer releases.
The updates are straightforward version bumps with no breaking API changes or functional modifications to the test configurations themselves.
Confidence Score: 5/5
4 files reviewed, 4 comments
Additional Comments (4)
This will ensure that users following the documentation are using the same container versions as the common examples.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
This will ensure consistency between documentation and the actual configuration examples.
1 file reviewed, 1 comment
Summary
PyTorch to 25.12-py3
NeMo to 25.11.01
Test Plan
Additional Notes