Hengrui Hu . Kaining Ying · Henghui Ding ✉️
Fudan University, China
This work addresses the task of multi-shot semi-supervised video object segmentation (MVOS), which requires segmenting the target object indicated by an initial mask in a video containing multiple shots and different shot transitions. We propose a transition mimicking data augmentation strategy (TMA) that enables cross-shot generalization using single-shot data, and a transition-aware method, SAAS, which detects and comprehends shot transitions during inference. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions.
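The exact TMA recipe is described in the paper; purely as an illustration of the idea, here is a minimal, hypothetical sketch that fakes an abrupt shot change inside a single-shot training clip by applying one strong spatial transform to every frame after a sampled cut point (the function name and transform choice are ours, not the paper's):

```python
import random
import torch
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def mimic_transition(frames: torch.Tensor, masks: torch.Tensor):
    """Hypothetical sketch (not the paper's exact recipe): simulate an
    abrupt shot transition inside a single-shot clip by flipping and
    re-cropping every frame after a random cut point, applying the same
    transform to the masks so supervision stays aligned."""
    t, _, h, w = frames.shape                    # frames: (T, C, H, W), masks: (T, 1, H, W)
    cut = random.randint(1, t - 1)               # keep at least one frame per "shot"
    top, left = random.randint(0, h // 4), random.randint(0, w // 4)
    ch, cw = h - h // 4, w - w // 4              # crop box mimicking a new camera framing
    out_f, out_m = frames.clone(), masks.clone()
    for i in range(cut, t):
        out_f[i] = TF.resized_crop(TF.hflip(frames[i]), top, left, ch, cw, [h, w])
        out_m[i] = TF.resized_crop(TF.hflip(masks[i]), top, left, ch, cw, [h, w],
                                   interpolation=InterpolationMode.NEAREST)
    return out_f, out_m
```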
The extended version of the paper, with the complete technical appendix, is available here.
This repository is built upon and extends the open-source Segment Anything Model 2 (SAM 2) project released by Meta Platforms, Inc. (https://github.com/facebookresearch/sam2). It incorporates portions of the original source code and inherits several project files from the upstream repository, including:

- CODE_OF_CONDUCT.md
- MANIFEST.in
- LICENSE
- LICENSE_cctorch
All reused components retain their original copyright and license notices as required by the upstream license.
This repository further introduces modifications, extensions, and additional functionalities developed for research purposes.
The extended repository as a whole is distributed under the terms of the Apache License, Version 2.0.
⏳ Release the pretrained weights.
⏳ Release the code and model on Huggingface.
⏳ Release the CONTRIBUTING.md.
⏳ Release the complete training configuration and code.
Cut-VOS, a challenging multi-shot video object segmentation benchmark, is available now! It contains 100 high-quality videos, 174 objects of different categories, and 648 valid shots.
Some visualized cases (demo videos):

- YouTube Link
- YouTube Link
Please download the complete benchmark from huggingface 🤗 or Google Drive ☁️.
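If you prefer to script the download, here is a minimal sketch using the huggingface_hub client; the repo id below is a placeholder — substitute the actual dataset id behind the 🤗 link above:

```python
from huggingface_hub import snapshot_download

# "<hf-dataset-id>" is a placeholder; use the dataset id behind the 🤗 link.
snapshot_download(repo_id="<hf-dataset-id>", repo_type="dataset",
                  local_dir="data/Cut-VOS")
```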
First, clone this repo and install the torch version matching your device. Creating a new environment is recommended.
```bash
git clone https://github.com/FudanCVL/SAAS.git
conda create -n saas python=3.10 -y
conda activate saas
# depending on your own CUDA version
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
```

Just like the SAM 2 model, SAAS needs to be installed before use. You can install it on a GPU machine using:
```bash
cd SAAS
pip install -e .
```

A package for fast MST operations needs to be installed as well. The code is adapted from the TreeFilter-Torch repo: https://github.com/Megvii-BaseDetection/TreeFilter-Torch/
```bash
cd sam2/modeling/TreeGeneration/lib_tree_filter
python setup.py build develop
cd ../../../..
```

Coming soon.
The pretrained model weights can be downloaded now ⬇️.
| Checkpoint | Hugging Face | Google Drive |
| --- | --- | --- |
| SAAS_b+_ytvos_tma.pt | huggingface 🤗 | Google Drive ☁️ |
| SAAS_l_ytvos_tma.pt | huggingface 🤗 | Google Drive ☁️ |
More weights are coming soon!
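Since SAAS builds on SAM 2 and keeps the sam2 package layout, loading a downloaded checkpoint should look roughly like the upstream video-predictor flow. This is a sketch assuming SAAS preserves SAM 2's build_sam2_video_predictor API; the config and checkpoint paths are the ones listed above, and `<video_name>` is a placeholder:

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Sketch assuming SAAS keeps SAM 2's build API.
predictor = build_sam2_video_predictor(
    "configs/saas/saas_hiera_b+.yaml",
    "checkpoints/SAAS_b+_ytvos_tma.pt",
    device="cuda" if torch.cuda.is_available() else "cpu",
)
# One folder of JPEG frames per video, as in the inference commands below.
state = predictor.init_state(video_path="data/Cut-VOS/frames/<video_name>")
```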
Inference under the semi-supervised VOS setting is implemented in tools/vos_inference.py.
```bash
python tools/vos_inference.py \
    --sam2_cfg PATH_TO_YAML_FILE \
    --sam2_checkpoint PATH_TO_YOUR_MODEL \
    --base_video_dir PATH_TO_JPEGFRAMES \
    --input_mask_dir PATH_TO_ANNOTATIONS \
    --output_mask_dir PATH_TO_OUTPUT
```

Assuming your dataset is placed under the ./data/ directory and you are using our default base-plus model to run the Cut-VOS evaluation, you can use the following command:
```bash
python tools/vos_inference.py \
    --sam2_cfg configs/saas/saas_hiera_b+.yaml \
    --sam2_checkpoint checkpoints/SAAS_b+_ytvos_tma.pt \
    --base_video_dir data/Cut-VOS/frames \
    --input_mask_dir data/Cut-VOS/masks \
    --output_mask_dir output/Cut-VOS/SAAS-B+
```

If you want to test our model on your own datasets, where each video has multiple mask folders for different objects, remember to use the --per_obj_png_file option. The --track_object_appearing_later_in_video option is for samples in which the object does not appear in the first frame.
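For reference, the upstream SAM 2 vos_inference.py consumes DAVIS-style directories (one folder of JPEG frames and one folder of PNG annotations per video). Assuming Cut-VOS follows the same convention, the expected layout is roughly as below; check the downloaded benchmark for the exact frame naming:

```
data/Cut-VOS/
├── frames/
│   └── <video_name>/
│       ├── 00000.jpg
│       └── ...
├── masks/
│   └── <video_name>/
│       ├── 00000.png      # with --per_obj_png_file: <video_name>/<obj_id>/00000.png
│       └── ...
└── transitions.json       # shot-transition metadata used by the evaluation script
```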
The evaluation protocol is provided in evaluation/evaluation_mvos.py.
```bash
python evaluation/evaluation_rich.py \
    --seg_dir output/Cut-VOS/SAAS-B+ \
    --ann_dir data/Cut-VOS/masks \
    --trans_dir data/Cut-VOS/transitions.json \
    --output_csv temp.csv \
    --num_threads 15
```
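To sanity-check the results, here is a quick way to inspect the output CSV, assuming it contains one row of metrics per sequence (the column names depend on the script):

```python
import pandas as pd

df = pd.read_csv("temp.csv")
print(df.head())                       # per-sequence rows
print(df.mean(numeric_only=True))      # rough aggregate of the numeric metric columns
```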
We would like to express our gratitude to some other projects that have contributed to our work:
If you find our paper and dataset useful for your research, please consider citing our paper.
```bibtex
@inproceedings{SAAS2025,
  title={{S}egment {A}nything {A}cross {S}hots: {A} Method and {B}enchmark},
  author={Hu, Hengrui and Ying, Kaining and Ding, Henghui},
  booktitle={AAAI},
  year={2026}
}
```


