Skip to content

Conversation

@yuanhangsu1986
Copy link

@yuanhangsu1986 yuanhangsu1986 commented Dec 15, 2025

What does this PR do ?

Add video and audio dataloadling support for HF models

Usage

export NRL_FORCE_REBUILD_VENVS=true
uv venv
uv run python ./examples/run_vlm_sft.py cluster.gpus_per_node=8 --config ./examples/configs/sft_avlm.yaml

Additional Information

Major changes:

  1. nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py: base class GeneralConversationsJsonlDataset for loading conversation-based data structure which is described in the comments.
  2. nemo_rl/data/datasets/response_datasets/conversation_base.py: base class and functions for conversation-based data structure which is described in general_conversations_dataset.py
  3. nemo_rl/data/multimodal_utils.py: functions for loading omni data
  4. nemo_rl/data/llm_message_utils.py: loading omni data for HF-based llm model
  5. examples/configs/sft_avlm.yaml: example for loading video conversation-based dataset for Qwen3-VL with 16 frames per video.

Summary by CodeRabbit

  • New Features

    • Added support for general conversation datasets with multimodal content (images, audio, video).
    • Introduced runtime processor configuration for video and audio settings (fps, frames, sampling rate).
    • Added new configuration template for video-aware fine-tuning.
  • Enhancements

    • Separated train and validation data preprocessing for improved flexibility.
  • Dependencies

    • Added decord library for video processing support.

✏️ Tip: You can customize this high-level summary in your review settings.

@yuanhangsu1986 yuanhangsu1986 requested review from a team as code owners December 15, 2025 22:34
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 15, 2025

📝 Walkthrough

Walkthrough

Adds support for multimodal (audio, video, image) conversation datasets through a new GeneralConversationsJsonlDataset loader, multimodal utilities for media extraction and loading, split-specific data preprocessors, dynamic processor configuration overrides, and an example AVLM configuration file.

Changes

Cohort / File(s) Summary
Configuration
examples/configs/sft_avlm.yaml
New YAML configuration for audio-visual language model training with video tokenizer (16 frames) and GeneralConversationsJsonlDataset.
Training Script
examples/run_sft.py
Introduces separate train/val datum preprocessors; supports per-split preprocessor selection from dataset attributes or shared defaults for specific datasets (e.g., clevr_cogent).
Processor Configuration
nemo_rl/algorithms/utils.py
Adds runtime overrides in get_tokenizer for multimodal processor settings (audio sampling_rate, video fps/num_frames) when values differ from configuration; includes validation checks and warnings.
Dataset System – Base
nemo_rl/data/datasets/response_datasets/conversation_base.py
Introduces conversation metadata normalization and message fragment processing with multimodal tag support; utilities handle media file resolution, extension fallback, and validation.
Dataset System – Loader
nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
New GeneralConversationsJsonlDataset class loads JSONL conversation data with optional train/val splits and media directories; converts conversations to OpenAI-style messages with media reference resolution.
Dataset System – Registration
nemo_rl/data/datasets/response_datasets/__init__.py
Registers GeneralConversationsJsonlDataset in load_response_dataset; validates required paths and constructs dataset with optional media directories and split specifications.
Multimodal Utilities
nemo_rl/data/multimodal_utils.py
Adds multimodal constants (media tags, extensions, patterns), processor introspection utilities, message-level media extraction, and cross-format media loaders (images via PIL, audio via transformers/decord fallback, video via transformers/decord fallback).
Message Processing
nemo_rl/data/llm_message_utils.py
Refactors get_formatted_message_log to use generalized multimodal media loading via load_media_from_message; dynamically constructs tokenizer media kwargs instead of fixed image parameters; updates tensor packing dimension selection for multimodal data.
Dependencies
pyproject.toml
Adds decord dependency for video processing; updates causal-conv1d build metadata.

Sequence Diagram

sequenceDiagram
    participant App as Training Script
    participant DS as GeneralConversationsJsonlDataset
    participant Preproc as Preprocessor<br/>(conversation_base)
    participant MediaUtil as Multimodal Utils
    participant Tokenizer as Tokenizer
    
    App->>DS: Load dataset with media dirs
    DS->>DS: Load JSONL conversations
    DS->>Preproc: Process example
    Preproc->>Preproc: Normalize metadata
    Preproc->>Preproc: Split message by media tags
    Preproc->>MediaUtil: Resolve media file paths
    MediaUtil->>MediaUtil: Check/extend file extensions
    MediaUtil-->>Preproc: Resolved media paths
    Preproc-->>DS: Formatted messages with placeholders
    DS-->>App: Preprocessed example
    App->>Tokenizer: Tokenize + Load media
    Tokenizer->>MediaUtil: load_media_from_message
    MediaUtil->>MediaUtil: Load images (PIL)<br/>Load audio (transformers/decord)<br/>Load video (transformers/decord)
    MediaUtil-->>Tokenizer: Media tensors
    Tokenizer->>Tokenizer: Pack tensors<br/>by computed dimension
    Tokenizer-->>App: Token IDs + multimodal embeddings
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Multimodal utilities (multimodal_utils.py): High logic density with multiple media loaders, processor introspection, and fallback handling (audio/video decord fallbacks, file extension resolution).
  • Dataset processor logic (general_conversations_dataset.py, conversation_base.py): New multimodal-aware message processing with metadata normalization and media path validation.
  • Message processing integration (llm_message_utils.py): Refactored media loading pipeline and dynamic tensor packing dimension selection; requires careful verification of multimodal kwargs construction and tokenizer integration.
  • Split-specific preprocessor routing (run_sft.py): Conditional logic for selecting train/val preprocessors; validate correctness of data flow for datasets with/without datum_preprocessor attribute.

Possibly related PRs

  • refactor: refactor dataset module #977: Introduces the foundational response datasets system that this PR extends with multimodal support, conversation processing, and dataset loader registration.

Suggested labels

CI, Run CICD

Suggested reviewers

  • odelalleau
  • ashors1
  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Test Results For Major Changes ⚠️ Warning PR introduces major multimodal dataloader features without documenting any test results, performance metrics, convergence data, or CI test status. Include unit tests for new dataset/utility classes, integration tests for end-to-end data loading, performance benchmarks, and convergence/loss curves demonstrating no regression.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Omni dataloader for HF models' accurately summarizes the main change: adding multimodal (omni) dataloading support for Hugging Face models across multiple data files.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/run_sft.py (1)

144-147: Inconsistent config access pattern between train and validation.

Training dataset (lines 131-132) accesses data_config["add_bos"] and data_config["add_eos"] directly, while validation (lines 145-146) uses .get("add_bos", True) with defaults. This inconsistency could mask configuration errors.

Either both should use direct access (if these are required config keys) or both should use .get() with defaults. As per coding guidelines, YAML should be the single source of truth for defaults, and required attributes should be accessed directly.

🧹 Nitpick comments (14)
nemo_rl/data/llm_message_utils.py (2)

614-616: Remove stray blank line inside code block.

Line 615 contains an empty line with extra whitespace that breaks the code block flow.

             if "video" in media_cur_message:
                 media_kwargs["videos"] = media_cur_message["video"]
-    
             processed_chunk = tokenizer(

542-564: Remove debug print statements before merge.

This debug block uses a function attribute (_debug_printed) to gate one-time printing, which is an unusual pattern that adds mutable state to the function object. This should be removed before merging, or converted to proper logging with configurable verbosity.

examples/run_sft.py (1)

121-123: Consider defensive key access for datum_preprocessor.

The code assumes data.datum_preprocessor contains both "train" and "val" keys when present. If a dataset provides only one key, this would raise a KeyError.

Consider using .get() with a fallback:

     elif hasattr(data, "datum_preprocessor"):
-        datum_preprocessor_train = data.datum_preprocessor["train"]
-        datum_preprocessor_val = data.datum_preprocessor["val"]
+        datum_preprocessor_train = data.datum_preprocessor.get("train")
+        datum_preprocessor_val = data.datum_preprocessor.get("val")

This would gracefully handle cases where only one split has a preprocessor.

nemo_rl/algorithms/utils.py (2)

327-330: Add stacklevel=2 to warnings for proper caller attribution.

When warnings.warn() is called from a utility function, setting stacklevel=2 ensures the warning points to the calling code rather than this function, making debugging easier.

                 warnings.warn(
-                    f"Overriding audio sampling rate from {processor.feature_extractor.sampling_rate} to {new_sampling_rate}"
+                    f"Overriding audio sampling rate from {processor.feature_extractor.sampling_rate} to {new_sampling_rate}",
+                    stacklevel=2,
                 )

Apply the same fix to warnings at lines 336-338 and 344-346.


340-347: Consider explicit validation for mutually exclusive fps/num_frames.

The comment acknowledges that fps and num_frames cannot co-exist, but the code defers the error to downstream components. Providing an early, clear error message would improve the user experience.

+            if "fps" in tokenizer_config["video"] and "num_frames" in tokenizer_config["video"]:
+                raise ValueError(
+                    "Cannot specify both 'fps' and 'num_frames' in video configuration; they are mutually exclusive."
+                )
             # fps and num_frames cannot co-exist, but let it crash later
             if "num_frames" in tokenizer_config["video"] and \
                 tokenizer_config["video"]["num_frames"] != processor.video_processor.num_frames:
nemo_rl/data/datasets/response_datasets/__init__.py (1)

146-157: Consider adding GeneralConversationsJsonlDataset to __all__.

The new class is imported but not exported in __all__. If it should be part of the public API (importable directly from this module), add it to the list:

 __all__ = [
     "CLEVRCoGenTDataset",
     "DeepScalerDataset",
     "DAPOMath17KDataset",
+    "GeneralConversationsJsonlDataset",
     "Geometry3KDataset",
     "OpenAIFormatDataset",
     "OasstDataset",
     "OpenMathInstruct2Dataset",
     "RefCOCODataset",
     "ResponseDataset",
     "SquadDataset",
 ]

If the class is only intended to be used through load_response_dataset(), this can be safely ignored.

nemo_rl/data/datasets/response_datasets/conversation_base.py (3)

15-24: Remove unused imports.

io, copy, dataclasses, and Path are imported but never used in this file.

 import os
 import re
-import io
-import copy
 import warnings
-import dataclasses
 from PIL import Image
-from pathlib import Path
 from collections import defaultdict
 from typing import Any, Dict, Callable, Optional

37-67: Add stacklevel to warnings.warn and fix minor style issue.

The warning should include stacklevel=2 so the warning points to the caller rather than this function. Also, there's an extra space before del on line 48.

             if tag_mapped not in data:
                 data[tag_mapped] = data[tag]
-                del  data[tag]
+                del data[tag]
             else:
                 warnings.warn(
-                    f"Trying to map {tag} to {tag_mapped}, but {tag_mapped} already exists in the raw data. Mapping is not carried out."
+                    f"Trying to map {tag} to {tag_mapped}, but {tag_mapped} already exists in the raw data. Mapping is not carried out.",
+                    stacklevel=2,
                 )

92-92: Unused loop variable.

The loop variable i is not used. Rename to _ to indicate it's intentionally unused.

-    for i, part in enumerate(parts):
+    for _, part in enumerate(parts):

Or simply iterate without enumerate:

-    for i, part in enumerate(parts):
+    for part in parts:
nemo_rl/data/multimodal_utils.py (2)

285-285: Avoid using assert for runtime validation.

Using assert for input validation can be disabled with -O flag. Use an explicit if check with ValueError instead.

-                assert "audio" in multimodal_load_kwargs and "sampling_rate"  in multimodal_load_kwargs["audio"]
+                if "audio" not in multimodal_load_kwargs or "sampling_rate" not in multimodal_load_kwargs["audio"]:
+                    raise ValueError("Audio loading requires 'sampling_rate' in multimodal_load_kwargs['audio']")

297-297: Variable name shadows module-level constant.

load_video_kwargs shadows the module-level constant defined on line 28. Use a different variable name to avoid confusion.

-                load_video_kwargs = multimodal_load_kwargs["video"] if "video" in multimodal_load_kwargs else {}
+                video_kwargs = multimodal_load_kwargs.get("video", {})
                 # seems decord backend loads video faster with multithread ffmpeg and it is easier to install
-                loaded_media["video"].append(load_video(vid, backend="decord", **load_video_kwargs)[0])
+                loaded_media["video"].append(load_video(vid, backend="decord", **video_kwargs)[0])
nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (3)

26-26: Remove unused debug flag.

_DEBUG=True is defined but never used in this file. Remove it or implement the intended debugging behavior.

-_DEBUG=True
-
-
 class GeneralConversationsJsonlDataset:

17-17: Remove unused import Literal.

-from typing import Any, Optional, Literal
+from typing import Any, Optional

30-34: Fix typos in docstring.

-    """Loads general conversation datasets that have the json (manifest) files and media files in separate files (jsonl datasets).
-    Each sample can be single/multi-turn converstaions with multiple modalities.
+    """Loads general conversation datasets that have the JSON (manifest) files and media files in separate files (JSONL datasets).
+    Each sample can be single/multi-turn conversations with multiple modalities.
     Each modality can have one or more number of media objects.
-    There is no requiement of where the media tag (e.g. '<sound>') should appear in the conversations.
+    There is no requirement of where the media tag (e.g. '<sound>') should appear in the conversations.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a010564 and 1c049b4.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • examples/configs/sft_avlm.yaml (1 hunks)
  • examples/run_sft.py (3 hunks)
  • nemo_rl/algorithms/utils.py (1 hunks)
  • nemo_rl/data/datasets/response_datasets/__init__.py (2 hunks)
  • nemo_rl/data/datasets/response_datasets/conversation_base.py (1 hunks)
  • nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (1 hunks)
  • nemo_rl/data/llm_message_utils.py (4 hunks)
  • nemo_rl/data/multimodal_utils.py (3 hunks)
  • pyproject.toml (2 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/algorithms/utils.py
  • nemo_rl/data/llm_message_utils.py
  • nemo_rl/data/datasets/response_datasets/conversation_base.py
  • examples/run_sft.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
  • nemo_rl/data/multimodal_utils.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/algorithms/utils.py
  • nemo_rl/data/llm_message_utils.py
  • nemo_rl/data/datasets/response_datasets/conversation_base.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
  • nemo_rl/data/multimodal_utils.py
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • nemo_rl/algorithms/utils.py
  • examples/configs/sft_avlm.yaml
  • nemo_rl/data/llm_message_utils.py
  • nemo_rl/data/datasets/response_datasets/conversation_base.py
  • examples/run_sft.py
  • pyproject.toml
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
  • nemo_rl/data/multimodal_utils.py
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • nemo_rl/algorithms/utils.py
  • nemo_rl/data/llm_message_utils.py
  • nemo_rl/data/datasets/response_datasets/conversation_base.py
  • examples/run_sft.py
  • nemo_rl/data/datasets/response_datasets/__init__.py
  • nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
  • nemo_rl/data/multimodal_utils.py
🧬 Code graph analysis (4)
nemo_rl/data/datasets/response_datasets/conversation_base.py (1)
nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (1)
  • process_message_fragment (107-118)
examples/run_sft.py (1)
nemo_rl/data/datasets/response_datasets/clevr.py (1)
  • format_clevr_cogent_dataset (33-62)
nemo_rl/data/datasets/response_datasets/__init__.py (2)
nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (1)
  • GeneralConversationsJsonlDataset (29-154)
nemo_rl/data/datasets/utils.py (1)
  • get_extra_kwargs (86-102)
nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (3)
nemo_rl/data/datasets/utils.py (1)
  • load_dataset_from_path (64-83)
nemo_rl/data/interfaces.py (1)
  • TaskDataSpec (53-86)
nemo_rl/data/datasets/response_datasets/conversation_base.py (2)
  • convert_metadata (38-67)
  • conversation_process_message (70-131)
🪛 Ruff (0.14.8)
nemo_rl/algorithms/utils.py

327-327: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)


336-336: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)


344-344: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

nemo_rl/data/datasets/response_datasets/conversation_base.py

50-50: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)


74-74: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


77-77: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


78-78: Unused function argument: tags_mapping_sample_to_allowed

(ARG001)


92-92: Loop control variable i not used within loop body

Rename unused i to _i

(B007)


117-117: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/data/datasets/response_datasets/__init__.py

119-121: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/data/multimodal_utils.py

271-271: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


288-288: Do not use bare except

(E722)


291-291: Undefined name tokenizer

(F821)


291-291: Undefined name key

(F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (5)
nemo_rl/algorithms/utils.py (1)

323-347: Multimodal processor configuration overrides look good.

The runtime override approach for audio sampling rate and video fps/num_frames is a reasonable way to synchronize processor settings with configuration without modifying the underlying model configs. The warnings appropriately inform users of the overrides.

nemo_rl/data/datasets/response_datasets/__init__.py (1)

117-135: Implementation follows existing patterns correctly.

The new GeneralConversationsJsonlDataset branch properly validates required config, extracts optional kwargs, and instantiates the dataset class. This mirrors the existing ResponseDataset pattern.

nemo_rl/data/multimodal_utils.py (2)

27-28: LGTM!

Dynamic extraction of function parameter names using inspect.signature is a clean approach for forward compatibility.


33-65: LGTM!

Well-organized media tag definitions with appropriate reverse mappings and regex pattern compilation. The warning comment about avoiding cyclic mappings is helpful.

nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py (1)

106-118: LGTM!

The process_message_fragment method correctly handles media path resolution with fallback to media directory, and properly splits compound tags like "video-audio" into separate fragment entries.

Comment on lines +11 to +14
train_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_10000.jsonl
val_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_100.jsonl
train_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
val_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Replace hardcoded user-specific paths with placeholders.

The example config contains absolute paths to user-specific directories on an internal filesystem (/lustre/fsw/portfolios/llmservice/users/yuanhangs/...). These paths won't work for other users and expose internal infrastructure details.

Consider using placeholder paths that users can customize:

 data:
   dataset_name: GeneralConversationsJsonlDataset
-  train_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_10000.jsonl
-  val_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_100.jsonl
-  train_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
-  val_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
+  train_data_path: /path/to/train_data.jsonl  # Replace with your training data path
+  val_data_path: /path/to/val_data.jsonl  # Replace with your validation data path
+  train_media_data_dir: /path/to/train_media/  # Directory containing training media files
+  val_media_data_dir: /path/to/val_media/  # Directory containing validation media files
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
train_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_10000.jsonl
val_data_path: /lustre/fsw/portfolios/llmservice/users/yuanhangs/codes/megatron-lm-omcat/megatron-lm-vlm2/examples/multimodal/avlm/test/datasets/miradata_bat1_filtered_vision_5min_100.jsonl
train_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
val_media_data_dir: /lustre/fsw/portfolios/edgeai/projects/edgeai_riva_rivamlops/data/videomme/MiraData/video/batch1/5min
data:
dataset_name: GeneralConversationsJsonlDataset
train_data_path: /path/to/train_data.jsonl # Replace with your training data path
val_data_path: /path/to/val_data.jsonl # Replace with your validation data path
train_media_data_dir: /path/to/train_media/ # Directory containing training media files
val_media_data_dir: /path/to/val_media/ # Directory containing validation media files
🤖 Prompt for AI Agents
In examples/configs/sft_avlm.yaml around lines 11 to 14, the file contains
hardcoded, user-specific absolute paths that expose internal infrastructure and
will fail for other users; replace those literal paths with configurable
placeholders (e.g., ${DATA_DIR}/train.jsonl, ${DATA_DIR}/val.jsonl,
${MEDIA_DIR}/train_media, ${MEDIA_DIR}/val_media) or relative paths and update
any docs/README to instruct users to set the environment variables or edit the
placeholders to point to their local dataset and media directories before
running.

Comment on lines +70 to +80
def conversation_process_message(
metadata: Dict[str, Any],
message: Dict[str, str],
media_index: dict,
raw: Dict[str, Any] = {},
allow_empty_text: bool = False,
check_if_media_file_exist: bool = True,
tried_default_extensions: set = set(),
tags_mapping_sample_to_allowed: Dict[str, str] = multimodal_utils.media_tags_to_allowed,
process_message_fragment: Callable = lambda tag, fragment: [{tag: fragment}],
) -> list[Dict[str, Any]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Mutable default arguments will cause bugs.

raw={} and tried_default_extensions=set() are mutable defaults. The tried_default_extensions set is modified on line 106, meaning the same set instance will be shared across all calls that don't explicitly pass this argument, causing incorrect behavior.

 def conversation_process_message(
     metadata: Dict[str, Any],
     message: Dict[str, str],
     media_index: dict,
-    raw: Dict[str, Any] = {},
+    raw: Optional[Dict[str, Any]] = None,
     allow_empty_text: bool = False,
     check_if_media_file_exist: bool = True,
-    tried_default_extensions: set = set(),
+    tried_default_extensions: Optional[set] = None,
     tags_mapping_sample_to_allowed: Dict[str, str] = multimodal_utils.media_tags_to_allowed,
     process_message_fragment: Callable = lambda tag, fragment: [{tag: fragment}],
 ) -> list[Dict[str, Any]]:
     """
     Args:
         raw: dictionary with all webdataset compliant keys of a sample. 
             Emtpy for jsonl dataset, non-empty otherwise
         metadata: 
     """
+    if raw is None:
+        raw = {}
+    if tried_default_extensions is None:
+        tried_default_extensions = set()
     fragments = []    
🧰 Tools
🪛 Ruff (0.14.8)

74-74: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


77-77: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


78-78: Unused function argument: tags_mapping_sample_to_allowed

(ARG001)

🤖 Prompt for AI Agents
In nemo_rl/data/datasets/response_datasets/conversation_base.py around lines 70
to 80, the function uses mutable default arguments raw={} and
tried_default_extensions=set(), which leads to shared state across calls
(tried_default_extensions is mutated on line 106). Change both defaults to None
in the signature, then inside the function initialize raw = {} if raw is None
and tried_default_extensions = set() if tried_default_extensions is None so each
call gets a fresh object; ensure any downstream logic continues to use these
local variables.

allow_empty_text: bool = False,
check_if_media_file_exist: bool = True,
tried_default_extensions: set = set(),
tags_mapping_sample_to_allowed: Dict[str, str] = multimodal_utils.media_tags_to_allowed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Unused parameter tags_mapping_sample_to_allowed.

This parameter is declared but never used in the function body. Either remove it or implement the intended functionality.

🧰 Tools
🪛 Ruff (0.14.8)

78-78: Unused function argument: tags_mapping_sample_to_allowed

(ARG001)

🤖 Prompt for AI Agents
In nemo_rl/data/datasets/response_datasets/conversation_base.py around line 78,
the parameter tags_mapping_sample_to_allowed is declared but never used; either
remove it from the function signature and update any callers, or implement its
intended behavior by applying the mapping to the sample's tags (e.g., map each
sample tag via tags_mapping_sample_to_allowed to an allowed tag and use that
result wherever allowed tags are computed/validated), and then adjust type hints
and add/modify tests to cover the mapping behavior.

Comment on lines +96 to +97
if not isinstance(metadata[tag], list):
metadata[tag] = [metadata[tag]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n nemo_rl/data/datasets/response_datasets/conversation_base.py | sed -n '80,110p'

Repository: NVIDIA-NeMo/RL

Length of output: 1585


🏁 Script executed:

cat -n nemo_rl/data/datasets/response_datasets/conversation_base.py | sed -n '65,95p'

Repository: NVIDIA-NeMo/RL

Length of output: 1333


🏁 Script executed:

rg -n "conversation_process_message" nemo_rl/data/datasets/response_datasets/

Repository: NVIDIA-NeMo/RL

Length of output: 306


🏁 Script executed:

cat -n nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py | sed -n '130,160p'

Repository: NVIDIA-NeMo/RL

Length of output: 1302


🏁 Script executed:

cat -n nemo_rl/data/datasets/response_datasets/conversation_base.py | sed -n '1,70p'

Repository: NVIDIA-NeMo/RL

Length of output: 2819


🏁 Script executed:

rg -n "media_tags" nemo_rl/data/multimodal_utils.py | head -20

Repository: NVIDIA-NeMo/RL

Length of output: 277


🏁 Script executed:

sed -n '33,70p' nemo_rl/data/multimodal_utils.py

Repository: NVIDIA-NeMo/RL

Length of output: 1062


Add validation to prevent KeyError when media tag is in message but missing from metadata.

At line 96, metadata[tag] is accessed without checking if tag exists in the metadata dictionary. If a message contains a media tag like <image> but the metadata dict lacks the corresponding key, a KeyError will be raised. Add a check before accessing metadata[tag] to handle missing tags gracefully (either skip the tag, raise a clear error, or provide a default value).

🤖 Prompt for AI Agents
In nemo_rl/data/datasets/response_datasets/conversation_base.py around lines
96-97, the code accesses metadata[tag] without verifying tag exists which can
raise KeyError; add a presence check before accessing metadata[tag] (e.g., if
tag not in metadata: either continue to skip processing that tag, or set a
sensible default like metadata[tag]=[] or raise a clear ValueError with a
descriptive message), then proceed with the existing list-wrapping logic.

Comment on lines +94 to +102
self.formatted_ds = {
"train": train_ds,
"validation": val_ds,
}

self.datum_preprocessor = {
"train": partial(self._datum_preprocessor, media_directory=train_media_data_dir),
"val": partial(self._datum_preprocessor, media_directory=val_media_data_dir)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Inconsistent dictionary keys: "validation" vs "val".

formatted_ds uses "validation" as the key, but datum_preprocessor uses "val". This inconsistency could cause lookup failures when accessing preprocessors by split name.

         self.datum_preprocessor = {
             "train": partial(self._datum_preprocessor, media_directory=train_media_data_dir),
-            "val": partial(self._datum_preprocessor, media_directory=val_media_data_dir)
+            "validation": partial(self._datum_preprocessor, media_directory=val_media_data_dir)
         }
🤖 Prompt for AI Agents
In nemo_rl/data/datasets/response_datasets/general_conversations_dataset.py
around lines 94 to 102, the split keys are inconsistent: formatted_ds uses
"validation" while datum_preprocessor uses "val", which will cause lookups to
fail; fix by making the keys identical (e.g., change the "val" key in
datum_preprocessor to "validation" or rename "validation" to "val" so both dicts
use the same split name), or preferably derive both from a single split-name
constant; update only the dictionary keys to match and ensure any downstream
access uses the chosen split name.

Comment on lines +268 to +272
def load_media_from_message(
message: dict[str, Any],
processor = None,
multimodal_load_kwargs: dict[str, dict[str, Any]] = {},
) -> dict[list[Any]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Mutable default argument and malformed type hint.

The multimodal_load_kwargs parameter has a mutable default {}. Also, the return type hint dict[list[Any]] is malformed; it should be dict[str, list[Any]].

 def load_media_from_message(
     message: dict[str, Any], 
     processor = None, 
-    multimodal_load_kwargs: dict[str, dict[str, Any]] = {},
-) -> dict[list[Any]]:
+    multimodal_load_kwargs: Optional[dict[str, dict[str, Any]]] = None,
+) -> dict[str, list[Any]]:
     loaded_media = defaultdict(list)
     media_in_message = get_media_from_message(message)
 
-    if not multimodal_load_kwargs and processor is not None:
+    if multimodal_load_kwargs is None:
+        multimodal_load_kwargs = {}
+    if not multimodal_load_kwargs and processor is not None:
         multimodal_load_kwargs = get_multimodal_default_settings_from_processor(processor)
🧰 Tools
🪛 Ruff (0.14.8)

271-271: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🤖 Prompt for AI Agents
In nemo_rl/data/multimodal_utils.py around lines 268 to 272, change the mutable
default and malformed return hint: replace multimodal_load_kwargs: dict[str,
dict[str, Any]] = {} with multimodal_load_kwargs: Optional[dict[str, dict[str,
Any]]] = None (import Optional if needed) and at function start set
multimodal_load_kwargs = multimodal_load_kwargs or {} to avoid a shared mutable
default; also correct the return type hint from dict[list[Any]] to dict[str,
list[Any]].

if "audio" in media_in_message:
for aud in media_in_message["audio"]:
if isinstance(aud, str):
print(multimodal_load_kwargs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove debug print statement.

This print(multimodal_load_kwargs) appears to be a debug artifact and should be removed before merging.

-                print(multimodal_load_kwargs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print(multimodal_load_kwargs)
🤖 Prompt for AI Agents
In nemo_rl/data/multimodal_utils.py around line 284, remove the stray debug
print statement "print(multimodal_load_kwargs)"; simply delete that line so the
function no longer emits debug output, and run tests/lint to ensure no leftover
unused variables or formatting issues.

Comment on lines +288 to +291
except:
# use decord
loaded_audio = decord.AudioReader(aud, sample_rate=multimodal_load_kwargs["audio"]["sampling_rate"], mono=True)
loaded_media["audio"].append(loaded_audio[:].asnumpy()[get_dim_to_pack_along(tokenizer, key)])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Bare except clause and undefined variables.

  1. The bare except: catches all exceptions including KeyboardInterrupt and SystemExit, masking errors. Specify the expected exception type.
  2. Line 291 references tokenizer and key which are undefined in this scope—this will cause a NameError at runtime.
-                try:
-                    loaded_media["audio"].append(load_audio(aud, **multimodal_load_kwargs["audio"]))
-                except:
-                    # use decord
-                    loaded_audio = decord.AudioReader(aud, sample_rate=multimodal_load_kwargs["audio"]["sampling_rate"], mono=True)
-                    loaded_media["audio"].append(loaded_audio[:].asnumpy()[get_dim_to_pack_along(tokenizer, key)])
+                try:
+                    loaded_media["audio"].append(load_audio(aud, **multimodal_load_kwargs["audio"]))
+                except Exception as e:
+                    # Fallback to decord for unsupported formats
+                    loaded_audio = decord.AudioReader(aud, sample_rate=multimodal_load_kwargs["audio"]["sampling_rate"], mono=True)
+                    loaded_media["audio"].append(loaded_audio[:].asnumpy()[0])  # Fix: use appropriate index

The processor variable should likely be used instead of tokenizer, and key should be "audio" or similar. Please verify the intended logic.

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.14.8)

288-288: Do not use bare except

(E722)


291-291: Undefined name tokenizer

(F821)


291-291: Undefined name key

(F821)

🤖 Prompt for AI Agents
In nemo_rl/data/multimodal_utils.py around lines 288–291, replace the bare
except with a specific exception handler (e.g., except (RuntimeError,
ValueError, OSError):) so we don't swallow KeyboardInterrupt/SystemExit, and
inside the handler use the correct variables — replace the undefined tokenizer
with processor and use the explicit key "audio" (or the variable that holds the
modality string) when indexing, making sure processor is available in scope;
re-raise unexpected exceptions if needed or let them propagate.

transformer-engine-torch = [{ requirement = "torch", match-runtime = true }]
mamba-ssm = [{ requirement = "torch", match-runtime = true }]
causal-conv1d = [{ requirement = "torch", match-runtime = true }]
causal-conv1d = [{ requirement = "torch", match-runtime = true }, "setuptools.build_meta"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's locate and examine the pyproject.toml file
find . -name "pyproject.toml" -type f

Repository: NVIDIA-NeMo/RL

Length of output: 257


🏁 Script executed:

# Once found, let's look at the relevant lines around 232 and context
head -250 pyproject.toml | tail -50

Repository: NVIDIA-NeMo/RL

Length of output: 2448


🏁 Script executed:

# Also check around line 112 for the build dependency section
head -120 pyproject.toml | tail -30

Repository: NVIDIA-NeMo/RL

Length of output: 1285


🏁 Script executed:

# Let's see the entire dependency-groups section and extra-build-dependencies
grep -n "extra-build-dependencies\|dependency-groups\|causal-conv1d\|setuptools" pyproject.toml | head -20

Repository: NVIDIA-NeMo/RL

Length of output: 793


🌐 Web query:

uv extra-build-dependencies format package name validation

💡 Result:

Short answer — format and validation

  • Keys in tool.uv.extra-build-dependencies are package/project names (the package you want to augment). Values are either:

    • a list of requirement strings (PEP 508 requirement specifiers), e.g. ["setuptools", "torch==2.6.0"], or
    • a list of objects with fields like requirement (a PEP‑508 requirement string) and match-runtime (bool), e.g. [{ requirement = "torch", match-runtime = true }]. [1][2]
  • The requirement strings follow standard pip/PEP‑508 requirement syntax (project name + optional version specifiers, extras, markers, direct URLs, etc.). uv parses/validates these using a PEP‑508-compatible requirement parser, so invalid PEP‑508 requirement strings or invalid package identifiers will be rejected. [2][3]

  • Practical examples and behavior (matching runtime, rebuild triggers, experimental flag) are documented in the uv config docs and the project’s examples/issues. If you get a validation error, ensure your package names/requirement strings conform to PEP‑508 (or use the exploded table style shown in the docs). [1][2][4]

Sources:

  • uv docs — extra-build-dependencies and examples. [1]
  • uv docs — configuring projects / augmenting build dependencies (match-runtime examples). [2]
  • pip / PEP‑508 requirement specifier reference (requirement string format). [3]
  • uv GitHub issues / implementation notes showing PEP‑508 parsing/validation behavior. [4]

🌐 Web query:

causal-conv1d build requirements setuptools

💡 Result:

Short answer — to build causal-conv1d from source you need:

  • Python >= 3.9. [1]
  • setuptools (and wheel) available for building a Python wheel. [2]
  • A compatible PyTorch installation (headers) — the extension builds against PyTorch/C++/CUDA. [1]
  • CUDA toolkit (nvcc) and a matching CUDA driver for the prebuilt CUDA kernels (or ROCm toolchain for AMD/ROCm builds; the project documents a ROCm 6.0 patch). [1][3]
  • A C/C++ compiler toolchain (gcc/clang on Linux; MSVC on Windows) to compile the extension. [1]

References:

  • PyPI project page (requirements, Python version, ROCm note). [1]
  • Repository setup/build context and reports of setuptools being required during install. [2][3]

🏁 Script executed:

# Let's verify if setuptools.build_meta would be accepted by PEP 508 parsers
# by checking the causal-conv1d repository's own build configuration
curl -s https://raw.githubusercontent.com/Dao-AILab/causal-conv1d/v1.5.0.post8/pyproject.toml 2>/dev/null | head -50

Repository: NVIDIA-NeMo/RL

Length of output: 72


🏁 Script executed:

# Also check if uv has any special handling or if this causes issues
# by examining uv's requirement parsing behavior
python3 << 'EOF'
# Simulate PEP 508 parsing to verify if "setuptools.build_meta" is valid
try:
    # Try using packaging library for PEP 508 parsing
    import subprocess
    result = subprocess.run(['pip', 'list'], capture_output=True, text=True)
    if 'packaging' in result.stdout:
        from packaging.requirements import Requirement
        try:
            req = Requirement("setuptools.build_meta")
            print(f"✓ 'setuptools.build_meta' parsed as valid: {req}")
        except Exception as e:
            print(f"✗ 'setuptools.build_meta' failed PEP 508 parsing: {e}")
    else:
        print("packaging not installed, installing...")
        subprocess.run(['pip', 'install', 'packaging'], capture_output=True)
        from packaging.requirements import Requirement
        try:
            req = Requirement("setuptools.build_meta")
            print(f"✓ 'setuptools.build_meta' parsed as valid: {req}")
        except Exception as e:
            print(f"✗ 'setuptools.build_meta' failed PEP 508 parsing: {e}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: NVIDIA-NeMo/RL

Length of output: 122


🏁 Script executed:

# Check if "setuptools.build_meta" is actually a package on PyPI
python3 << 'EOF'
import urllib.request
import json

try:
    url = "https://pypi.org/pypi/setuptools.build_meta/json"
    with urllib.request.urlopen(url, timeout=5) as response:
        data = json.loads(response.read())
        print(f"✓ Package 'setuptools.build_meta' found on PyPI")
        print(f"  Latest version: {data['info']['version']}")
except urllib.error.HTTPError as e:
    if e.code == 404:
        print(f"✗ Package 'setuptools.build_meta' NOT found on PyPI (404)")
    else:
        print(f"✗ HTTP Error {e.code}: {e}")
except Exception as e:
    print(f"✗ Error checking PyPI: {e}")
EOF

Repository: NVIDIA-NeMo/RL

Length of output: 208


🏁 Script executed:

# Also verify what setuptools actually provides
python3 << 'EOF'
try:
    import setuptools
    # Check if setuptools.build_meta exists as a module
    try:
        import setuptools.build_meta
        print("✓ setuptools.build_meta module exists within setuptools package")
        print(f"  Location: {setuptools.build_meta.__file__}")
    except ImportError as e:
        print(f"✗ setuptools.build_meta module NOT found: {e}")
except ImportError:
    print("setuptools not installed in test environment")
EOF

Repository: NVIDIA-NeMo/RL

Length of output: 189


🏁 Script executed:

# Check the causal-conv1d's actual build configuration by looking at its setup
git ls-files | grep -E "causal.*conv.*1d" || echo "No causal-conv1d files in repo"

Repository: NVIDIA-NeMo/RL

Length of output: 88


🏁 Script executed:

# Test if uv would actually fail or succeed with setuptools.build_meta
# by checking if uv tries to resolve it as a package
python3 << 'EOF'
# PEP 508 parsing accepts it, but PyPI package resolution won't find it
# Let's check if setuptools.build_meta is a real PyPI package by normalizing the name

from packaging.utils import canonicalize_name

pkg_name = "setuptools.build_meta"
canonical = canonicalize_name(pkg_name)
print(f"Original name: {pkg_name}")
print(f"Canonical name: {canonical}")
print(f"Would uv try to find a package named '{canonical}' on PyPI: Yes")
print(f"\nAnalysis:")
print(f"- 'setuptools' is a real package on PyPI")
print(f"- 'setuptools.build_meta' is NOT a package on PyPI")
print(f"- It's a module within the setuptools package")
print(f"- Using it as a requirement string is misleading/incorrect")
EOF

Repository: NVIDIA-NeMo/RL

Length of output: 414


🏁 Script executed:

# Let's see if there are any comments or documentation in pyproject.toml explaining this
grep -B5 -A2 "setuptools.build_meta" pyproject.toml

Repository: NVIDIA-NeMo/RL

Length of output: 632


Remove or correct "setuptools.build_meta" in extra-build-dependencies.

The string "setuptools.build_meta" is not a valid PyPI package name. While PEP 508 parsing accepts it syntactically, uv will normalize it to "setuptools-build-meta" and fail to resolve it on PyPI (the package doesn't exist). This will cause causal-conv1d builds to fail.

setuptools.build_meta is a build backend module within the setuptools package, not a package itself. Since setuptools is already listed in the build dependency group (line 112) and is the correct build dependency for causal-conv1d, either remove this entry or replace it with "setuptools":

-causal-conv1d = [{ requirement = "torch", match-runtime = true }, "setuptools.build_meta"]
+causal-conv1d = [{ requirement = "torch", match-runtime = true }]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
causal-conv1d = [{ requirement = "torch", match-runtime = true }, "setuptools.build_meta"]
causal-conv1d = [{ requirement = "torch", match-runtime = true }]
🤖 Prompt for AI Agents
In pyproject.toml around line 232, the extras build-dependency entry includes
the invalid PyPI name "setuptools.build_meta"; remove that token or replace it
with the actual package name "setuptools" so the extras list resolves on PyPI
(e.g. change the causal-conv1d extras array to use "setuptools" or simply drop
the "setuptools.build_meta" item).

Copy link
Contributor

@terrykong terrykong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yfw to review

@terrykong terrykong requested a review from yfw December 15, 2025 22:43
@terrykong terrykong changed the title Omni dataloader for HF models feat: Omni dataloader for HF models Dec 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants