-
Notifications
You must be signed in to change notification settings - Fork 42
Provide CMS-friendly documentation build #769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
📝 WalkthroughWalkthroughAdds a CMS-specific documentation build and artifact flow, updates CI triggers to include tag pushes, simplifies the existing docs build, removes MyST/Markdown support in favor of reStructuredText, extends Makefile/Make targets for CMS mermaid builds, updates Sphinx config for CMS builds, migrates several Markdown docs to .rst, and applies minor formatting edits across workload docs and packaging metadata. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
doc/workloads/ddlb.rst (1)
61-62: Fix separator line length for "API Documentation" header.The separator line at line 62 is excessively long (33 dashes) compared to the "API Documentation" header (17 characters). This violates RST formatting conventions and is inconsistent with the same section in other workload files (e.g.,
chakra_replay.rst,ucc.rst).Fix RST separator line length
API Documentation --------------------------------- +-----------------doc/workloads/megatron_bridge.rst (1)
39-39: Standardizenum_nodestype representation in documentation examples.Line 39 shows
num_nodes = "2"(string), while line 49 showsnum_nodes = 2(integer). Use consistent format across both examples. The quoted string format is more prevalent in actual TOML configuration files and aligns with common TOML conventions.Also applies to: 49-49
🤖 Fix all issues with AI agents
In @.github/workflows/docs.yml:
- Around line 46-47: The workflow step "Set up Node.js" currently uses
actions/setup-node@v6 without specifying a Node version; update that step (uses:
actions/setup-node@v6) to pin a specific Node.js version by adding a with:
node-version: <major or exact version> field (e.g., 18 or 18.x) so the CI runs
with a reproducible Node runtime.
In @doc/USER_GUIDE.rst:
- Around line 365-369: There's a typo in the test scenario block where the key
"templat_test" is used; replace "templat_test" with the correct key "test_name"
in the [[Tests]] entry (e.g., the Tests.3 section containing "templat_test =
\"nccl_test_reduce_scatter\"") so the test definition uses the standard
"test_name" field.
In @pyproject.toml:
- Around line 61-66: The docs-cms dependency block pins Sphinx and
sphinxcontrib-mermaid to older releases; update the entries in the docs-cms
array (the package names "sphinx" and "sphinxcontrib-mermaid") to newer versions
(e.g., bump sphinx to the current 9.x series such as ~=9.1 and bump
sphinxcontrib-mermaid to at least ~=1.2.3), optionally refresh sphinx-rtd-theme
and sphinx-copybutton to their latest minors as well, then run your dependency
manager (poetry/pip) to update the lockfile and run the docs/test build to
verify no breakages.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (24)
.github/workflows/docs.ymldoc/DEV.mddoc/DEV.rstdoc/Makefiledoc/USER_GUIDE.mddoc/USER_GUIDE.rstdoc/conf.pydoc/index.mddoc/index.rstdoc/reporting.mddoc/reporting.rstdoc/workloads/bash_cmd.rstdoc/workloads/chakra_replay.rstdoc/workloads/ddlb.rstdoc/workloads/index.rstdoc/workloads/megatron_bridge.rstdoc/workloads/nemo_run.rstdoc/workloads/nixl_bench.rstdoc/workloads/nixl_kvbench.rstdoc/workloads/nixl_perftest.rstdoc/workloads/sleep.rstdoc/workloads/slurm_container.rstdoc/workloads/ucc.rstpyproject.toml
💤 Files with no reviewable changes (4)
- doc/reporting.md
- doc/index.md
- doc/USER_GUIDE.md
- doc/DEV.md
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-12-16T19:47:49.243Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:49.243Z
Learning: In the cloudai project, test cases are preferred over docstrings for documenting behavior, as tests serve as living documentation that remains relevant and verified.
Applied to files:
doc/USER_GUIDE.rst
📚 Learning: 2025-12-16T19:47:41.994Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:41.994Z
Learning: In this repository, prefer expressing behavioral documentation through tests rather than docstrings. Tests act as living, verified documentation. Reserve docstrings for interfaces or high-level descriptions, and avoid duplicating behavior that is already covered by tests.
Applied to files:
doc/conf.py
📚 Learning: 2026-01-05T22:23:56.998Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 767
File: conf/experimental/megatron_bridge/test/b200/megatron_bridge_qwen_30b.toml:38-38
Timestamp: 2026-01-05T22:23:56.998Z
Learning: For the Megatron Bridge workload, the hf_token field in test TOML configurations is intentionally left empty (hf_token = "") to fail validation if not explicitly set by the user. CloudAI does not distribute secret keys, so users must provide their own Hugging Face token, and the validation failure is by design to enforce this.
Applied to files:
doc/workloads/megatron_bridge.rst
📚 Learning: 2026-01-05T22:24:39.364Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 767
File: conf/experimental/megatron_bridge/test/gb300/megatron_bridge_qwen_30b.toml:28-37
Timestamp: 2026-01-05T22:24:39.364Z
Learning: In CloudAI Megatron-Bridge TOML configuration files, the container_image field accepts "#" as a valid separator in addition to the standard "/" separator (e.g., "nvcr.io#nvidia/nemo:25.11.01" is valid syntax).
Applied to files:
doc/workloads/megatron_bridge.rst
📚 Learning: 2025-12-23T00:23:16.200Z
Learnt from: srivatsankrishnan
Repo: NVIDIA/cloudai PR: 764
File: src/cloudai/workloads/megatron_bridge/megatron_bridge.py:98-101
Timestamp: 2025-12-23T00:23:16.200Z
Learning: In src/cloudai/workloads/megatron_bridge/megatron_bridge.py, the nemo_run_repo GitRepo uses commit="main" intentionally. Nemo Run is a Slurm executor (not a framework) used by Megatron Bridge to launch recipes, and tracking the main branch is acceptable for this dependency.
Applied to files:
doc/workloads/megatron_bridge.rst
🪛 checkmake (0.2.2)
doc/Makefile
[warning] 25-25: Missing required phony target "all"
(minphony)
[warning] 25-25: Missing required phony target "test"
(minphony)
🔇 Additional comments (21)
doc/workloads/chakra_replay.rst (1)
6-7: Formatting update aligns with RST conventions.The separator line added under "Usage Examples" is correctly sized (14 dashes for 14-character header) and follows proper reStructuredText formatting for section dividers. This change is consistent with the broader documentation formatting improvements across multiple workload files.
Please verify that the Sphinx documentation builds correctly with these formatting changes.
doc/workloads/ddlb.rst (1)
7-7: Formatting update aligns with RST conventions.The separator line added under "Usage Examples" is correctly sized (14 dashes for 14-character header) and follows proper reStructuredText formatting. This change is consistent across all workload documentation files.
doc/workloads/ucc.rst (1)
6-7: Formatting update aligns with RST conventions.The separator line added under "Usage Examples" is correctly sized (14 dashes for 14-character header) and follows proper reStructuredText formatting for section dividers. Formatting is consistent with all other section dividers in this file.
Please verify that the Sphinx documentation builds correctly with these formatting changes.
doc/workloads/sleep.rst (1)
6-7: Correct reStructuredText section formatting.The addition of the separator line "--------------" properly formats "Usage Examples" as a reStructuredText Level 2 section header, consistent with other major sections in the document (e.g., "API Documentation" on line 51). The line length correctly matches the header text width, adhering to RST conventions.
doc/workloads/nixl_kvbench.rst (1)
6-7: Correct reStructuredText section formatting.The addition of the separator line "--------------" properly formats "Usage Examples" as a reStructuredText Level 2 section header, consistent with other major sections in the document (e.g., "API Documentation" on line 62). The line length correctly matches the header text width, adhering to RST conventions.
doc/workloads/nemo_run.rst (1)
6-7: Correct reStructuredText section formatting.The addition of the separator line "--------------" properly formats "Usage Examples" as a reStructuredText Level 2 section header, consistent with other major sections in the document (e.g., "API Documentation" on line 53). The line length correctly matches the header text width, adhering to RST conventions.
pyproject.toml (1)
2-2: LGTM!The copyright year update to 2024-2026 is correct for the current year.
doc/workloads/index.rst (1)
45-45: LGTM!The update from
.mdto.rstcorrectly reflects the documentation migration from Markdown to reStructuredText, ensuring users create the proper file format when adding new workloads.doc/workloads/nixl_perftest.rst (1)
7-7: No RST formatting issue found.The file shows correct formatting with a single separator line (line 7:
----------) following the "Usage Examples" header. In reStructuredText, this is the proper and required format for section headers. The search for duplicate separator patterns returned no matches, confirming there is no duplicate line. No action needed.doc/workloads/bash_cmd.rst (1)
9-9: LGTM!The additional separator line improves visual consistency across workload documentation files.
doc/workloads/megatron_bridge.rst (1)
8-8: LGTM!The additional separator line aligns with the formatting improvements across workload documentation.
doc/conf.py (2)
23-23: LGTM!The Path import, copyright update, and version bump are routine maintenance changes that align with the project's evolution.
Also applies to: 52-52, 54-55, 60-60
96-96: LGTM!Restricting
source_suffixto RST-only correctly reflects the migration from Markdown to reStructuredText.doc/reporting.rst (1)
1-77: LGTM!The reporting documentation is well-structured with clear explanations, correct code examples, and proper cross-references. The two-level reporting system (per-test and per-scenario) is clearly explained with practical configuration examples.
doc/Makefile (4)
18-24: LGTM!The build variable definitions clearly separate standard and CMS-specific paths, aligning with the CMS build configuration in
doc/conf.py.
25-32: LGTM!The phony targets and help descriptions are clear. The static analysis warnings about missing "all" and "test" targets are makefile conventions but not required for this documentation-specific Makefile.
34-46: LGTM!The
htmlandcms-htmltargets are properly separated with distinct dependencies and environment configurations. TheCLOUDAI_DOC_TARGET=cmsenvironment variable (line 45) correctly coordinates with the conditional logic indoc/conf.py.
47-56: LGTM!The
install-mermaidprerequisite chain and thecleantarget are correctly implemented. The MMDC installation properly isolates the CMS build dependencies..github/workflows/docs.yml (1)
26-27: LGTM: Build simplification delegates to Makefile.The simplified build command properly delegates to the Makefile, which is a cleaner approach.
doc/DEV.rst (1)
1-91: LGTM: Well-structured developer documentation.The developer documentation provides clear architecture diagrams and explanations of core concepts. The Mermaid diagrams effectively visualize the module structure and class hierarchy.
doc/index.rst (1)
170-177: All toctree references in lines 170-177 are valid. Verification confirms thatreporting.rst,workloads/index.rst, andworkloads_requirements_installation.rstall exist in the doc directory.
Greptile SummaryThis PR completes a comprehensive migration of CloudAI documentation from Markdown to reStructuredText (RST) format and introduces a new CMS-friendly documentation build pipeline. The changes enable two parallel documentation builds: one for GitHub Pages using the NVIDIA Sphinx theme with inline Mermaid SVG diagrams, and another for BrightSpot CMS using the Read The Docs theme with Mermaid diagrams converted to PNG images. Key Changes:
Confidence Score: 5/5
Important Files Changed
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In @doc/Makefile:
- Around line 50-55: Pin the mermaid CLI install to a specific version to make
builds reproducible: update the MMDC target so the npm install command for
@mermaid-js/mermaid-cli uses an explicit version (e.g.,
@mermaid-js/mermaid-cli@X.Y.Z) instead of the floating latest, and optionally
ensure a lockfile is used in $(BUILDDIR_CMS) or commit one; reference the MMDC
target and variables $(BUILDDIR_CMS) and $(PUPPETEER_CFG) when making the
change.
- Line 26: Add a conventional default target named "all" that depends on the
existing "html" target (i.e., add a target line like "all: html" placed before
the "help" target) and update the .PHONY declaration to include "all" alongside
help/html/cms-html/install-mermaid/clean so the new target is treated as phony.
- Around line 42-46: The Makefile's help text should mention that npm is
required for CMS builds because the cms-html target depends on install-mermaid
which installs @mermaid-js/mermaid-cli invoked via the $(MMDC) target; update
the help stanza (where build targets are documented) to add a short note that
npm/node is a prerequisite for CMS builds and reference CLOUDAI_DOC_TARGET=cms
(used in conf.py) so users know the CMS-specific build needs npm.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
doc/Makefiledoc/conf.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T19:47:41.994Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:41.994Z
Learning: In this repository, prefer expressing behavioral documentation through tests rather than docstrings. Tests act as living, verified documentation. Reserve docstrings for interfaces or high-level descriptions, and avoid duplicating behavior that is already covered by tests.
Applied to files:
doc/conf.py
🪛 checkmake (0.2.2)
doc/Makefile
[warning] 26-26: Missing required phony target "all"
(minphony)
[warning] 26-26: Missing required phony target "test"
(minphony)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Greptile Review
- GitHub Check: Run pytest (3.12)
- GitHub Check: Run pytest (3.10)
🔇 Additional comments (7)
doc/conf.py (3)
23-23: LGTM! Path import supports cross-platform path construction.The pathlib import is used appropriately at lines 91-92 to construct CMS-specific paths in a platform-independent manner.
52-52: LGTM! Metadata updates.Copyright year and version bumps are routine maintenance changes.
Also applies to: 54-55
97-97: The removal of Markdown support is safe and complete.No
.mdsource files remain in the documentation tree, confirming that the change from[".rst", ".md"]to[".rst"]aligns with the completed migration to reStructuredText.doc/Makefile (4)
17-24: LGTM! Clear and well-organized variable definitions.Build directory separation (BUILDDIR vs BUILDDIR_CMS) ensures GitHub Pages and CMS builds don't interfere with each other. Mermaid tooling paths are properly scoped under the CMS build directory.
28-33: LGTM! Clear target documentation.The help text effectively explains the two build modes and their output directories.
35-40: LGTM! GitHub Pages build target correctly configured.The build flow properly:
- Syncs documentation dependencies via uv
- Invokes Sphinx from the project root with correct source/build paths
- Creates .nojekyll for GitHub Pages static hosting
57-58: LGTM! Clean target properly removes both build directories.Removes both GitHub Pages (
_build) and CMS (_build-cms) artifacts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/docs.yml:
- Around line 34-59: The build-cms job currently always runs; make it
conditional to save CI resources by adding an execution guard for the build-cms
job (job id: build-cms) so it only runs on desired branches or when docs change.
Update the workflow trigger or add an if: condition on the build-cms job that
checks branch/ref (e.g., main or release/*) or changes under the doc/ path (or
implement path filters in the workflow on: triggers) so the "Build CMS
documentation" step and subsequent "Upload CMS docs artifact" run only when
relevant.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
.github/workflows/docs.ymldoc/USER_GUIDE.rstpyproject.toml
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-16T19:47:49.243Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:49.243Z
Learning: In the cloudai project, test cases are preferred over docstrings for documenting behavior, as tests serve as living documentation that remains relevant and verified.
Applied to files:
doc/USER_GUIDE.rst
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Greptile Review
- GitHub Check: Run pytest (3.12)
🔇 Additional comments (5)
doc/USER_GUIDE.rst (1)
1-735: Comprehensive and well-structured user guide.The documentation is thorough, clearly written, and covers all major CloudAI workflows with practical examples. RST formatting is consistent, code blocks are properly marked, and the progression from setup through advanced topics (RunAI, hooks, Slurm modes, Nsys tracing) is logical and easy to follow.
pyproject.toml (2)
2-2: LGTM!Routine copyright year update to reflect the current year.
58-58: Good improvement!The bump from
~=1.0to~=1.2addresses the outdated dependency concern. The~=1.2specifier will allow patch updates (1.2.x) while maintaining compatibility..github/workflows/docs.yml (2)
27-27: LGTM!Delegating the build to the Makefile centralizes build logic and improves maintainability.
48-49: The Node.js version pinning addresses previous feedback. Node.js 24 is appropriate for this workflow—it was released in May 2025 and has been in Active LTS since October 2025, with support extending through October 2026.Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
24 files reviewed, 1 comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/workflows/docs.yml:
- Line 6: The YAML has inconsistent bracket spacing for the tags array: change
the value from "tags: [ 'v*' ]" to match the file style (no spaces inside
brackets) like "tags: ['v*']" so spacing is consistent with the other array on
the same file; update the tags declaration accordingly.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/docs.ymldoc/conf.py
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-07T14:47:30.295Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 769
File: pyproject.toml:61-66
Timestamp: 2026-01-07T14:47:30.295Z
Learning: The cloudai project requires Python >=3.10, which constrains Sphinx to version 8.1.x (Sphinx 8.2+ requires Python 3.11+).
Applied to files:
doc/conf.py
📚 Learning: 2025-12-16T19:47:41.994Z
Learnt from: amaslenn
Repo: NVIDIA/cloudai PR: 754
File: src/cloudai/_core/registry.py:226-234
Timestamp: 2025-12-16T19:47:41.994Z
Learning: In this repository, prefer expressing behavioral documentation through tests rather than docstrings. Tests act as living, verified documentation. Reserve docstrings for interfaces or high-level descriptions, and avoid duplicating behavior that is already covered by tests.
Applied to files:
doc/conf.py
🪛 YAMLlint (1.37.1)
.github/workflows/docs.yml
[error] 6-6: too many spaces inside brackets
(brackets)
[error] 6-6: too many spaces inside brackets
(brackets)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Greptile Review
- GitHub Check: Run pytest (3.12)
- GitHub Check: Run pytest (3.10)
🔇 Additional comments (6)
doc/conf.py (3)
86-92: CMS build configuration looks correct.The use of
Path(__file__).parentproperly resolves paths relative toconf.py's location, which addresses the previous concern aboutPath.cwd()pointing to the wrong directory when sphinx-build runs from the repository root. The conditional configuration for theme and mermaid output format follows Sphinx conventions.
52-55: Metadata updates are appropriate.Version bump to 1.5.0-beta and copyright year update to 2026 align with the project-wide updates.
97-97: Source suffix change aligns with RST migration.Removing
.mdfromsource_suffixis consistent with the PR objective to migrate fully to reStructuredText and the removal ofmyst_parserfrom extensions..github/workflows/docs.yml (3)
55-57: Artifact name sanitization is well-implemented.The use of
tr '/' '-'correctly handles branch names containing slashes (e.g.,feature/docs-update→feature-docs-update), ensuring valid artifact names.
35-64: CMS build job is well-structured.The job correctly:
- Sets up Node.js for Mermaid CLI (addressing the previous pin-version feedback)
- Delegates to the Makefile's
cms-htmltarget- Uploads artifacts with a dynamically generated, sanitized name
The trigger now includes tag pushes (
v*), which aligns with the suggestion to run CMS builds for releases.
47-50: Node.js 24 is confirmed to be in Active LTS (as of January 2026), making it suitable for CI. No action needed.Node.js 24 is the active long-term support version and receives full support, unlike Node.js 22 which is in maintenance mode. Using Node.js 24 in your CI workflow is appropriate and secure.
Likely an incorrect or invalid review comment.
|
Manually triggered build successfully generated cms-friendly artifact and was uploaded to web (internal, no link). A failure on deployment stage is legit as only main branch can be deployed. |
jeffnvidia
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Summary
Test Plan
Manual review and deployments.
Additional Notes
—