Skip to content

Conversation

@hhaAndroid
Copy link
Collaborator

@hhaAndroid hhaAndroid commented Dec 17, 2025

Previously, add_vision_id only took effect when the number of images or videos was greater than 1. Now it has been changed to always take effect.

Note: Breaking Change

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the add_vision_id logic to always take effect when enabled, removing the previous restriction that only applied it when multiple images or videos were present. This is a breaking change that aligns the behavior with HuggingFace's official implementation.

Key Changes:

  • Vision IDs (e.g., "Picture 1:", "Video 1:") are now always added when add_vision_id=True, even for single images/videos
  • Global counters (total_img_cnt, total_video_cnt) now track vision element numbering across entire conversations instead of per-message
  • Comprehensive test coverage added for both single and multi-image/video scenarios with the new behavior

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
xtuner/v1/datasets/mllm_tokenize_fn/qwen3_vl_tokenize_fn.py Added global total_video_cnt counter and updated video token replacement to always apply vision IDs when enabled; removed outdated comment explaining the old behavior
xtuner/v1/datasets/mllm_tokenize_fn/base_mllm_tokenize_fn.py Added global total_img_cnt counter and updated image token replacement to always apply vision IDs when enabled
tests/resource/mllm_sft_multi_image_example_data.jsonl Added new test case (id: 11) with images distributed across multiple messages to test global counter behavior
tests/datasets/test_qwen3_vl_tokenize_fn.py Parametrized tests to cover both add_vision_id=True and False cases; updated test logic to properly validate single-image scenarios and removed conditional logic that disabled vision IDs for single videos

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants