Hengrui Hu . Kaining Ying · Henghui Ding ✉️
Fudan University, China
This work addresses the task of multi-shot semi-supervised video object segmentation (MVOS), which requires segmenting the target object indicated by an initial mask in a video containing multiple shots and different shot transitions. We propose a transition mimicking data augmentation strategy (TMA) that enables cross-shot generalization using single-shot data, and a transition-aware method, SAAS, which detects and comprehends shot transitions during inference. To support evaluation and future study in MVOS, we introduce Cut-VOS, a new MVOS benchmark with dense mask annotations, diverse object categories, and high-frequency transitions.
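The exact TMA recipe is described in the paper; purely as an illustration of the idea, here is a minimal, hypothetical sketch that fakes an abrupt shot change inside a single-shot training clip by applying one strong spatial transform to every frame after a sampled cut point (the function name and transform choice are ours, not the paper's):

```python
import random
import torch
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def mimic_transition(frames: torch.Tensor, masks: torch.Tensor):
    """Hypothetical sketch (not the paper's exact recipe): simulate an
    abrupt shot transition inside a single-shot clip by flipping and
    re-cropping every frame after a random cut point, applying the same
    transform to the masks so supervision stays aligned."""
    t, _, h, w = frames.shape                    # frames: (T, C, H, W), masks: (T, 1, H, W)
    cut = random.randint(1, t - 1)               # keep at least one frame per "shot"
    top, left = random.randint(0, h // 4), random.randint(0, w // 4)
    ch, cw = h - h // 4, w - w // 4              # crop box mimicking a new camera framing
    out_f, out_m = frames.clone(), masks.clone()
    for i in range(cut, t):
        out_f[i] = TF.resized_crop(TF.hflip(frames[i]), top, left, ch, cw, [h, w])
        out_m[i] = TF.resized_crop(TF.hflip(masks[i]), top, left, ch, cw, [h, w],
                                   interpolation=InterpolationMode.NEAREST)
    return out_f, out_m
```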
The extended version of the paper, with the complete technical appendix, is available here.
This repository is built upon and extends the open-source Segment Anything Model 2 (SAM 2) project released by Meta Platforms, Inc. (https://github.com/facebookresearch/sam2). It incorporates portions of the original source code and inherits several project files from the upstream repository, including:

- CODE_OF_CONDUCT.md
- MANIFEST.in
- LICENSE
- LICENSE_cctorch
All reused components retain their original copyright and license notices as required by the upstream license.
This repository further introduces modifications, extensions, and additional functionalities developed for research purposes.
The extended repository as a whole is distributed under the terms of the Apache License, Version 2.0.
⏳ Release the pretrained weights.
⏳ Release the code and model on Huggingface.
⏳ Release the CONTRIBUTING.md.
⏳ Release the complete training configuration and code.
Cut-VOS, a challenging multi-shot video object segmentation benchmark, is available now! It contains 100 high-quality videos, 174 objects of different categories, and 648 valid shots.
Some visualized cases (demo videos):

- YouTube Link
- YouTube Link
Please download the complete benchmark from huggingface 🤗 or Google Drive ☁️.
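If you prefer to script the download, here is a minimal sketch using the huggingface_hub client; the repo id below is a placeholder — substitute the actual dataset id behind the 🤗 link above:

```python
from huggingface_hub import snapshot_download

# "<hf-dataset-id>" is a placeholder; use the dataset id behind the 🤗 link.
snapshot_download(repo_id="<hf-dataset-id>", repo_type="dataset",
                  local_dir="data/Cut-VOS")
```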
First, clone this repo and install the torch version matching your device. Creating a new environment is recommended.
```bash
git clone https://github.com/FudanCVL/SAAS.git
conda create -n saas python=3.10 -y
conda activate saas
# depending on your own CUDA version
pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/cu126
```

Just like the SAM 2 model, SAAS needs to be installed before use. You can install it on a GPU machine using:
```bash
cd SAAS
pip install -e .
```

A package for fast MST operations needs to be installed as well. The code is adapted from the TreeFilter-Torch repo: https://github.com/Megvii-BaseDetection/TreeFilter-Torch/
```bash
cd sam2/modeling/TreeGeneration/lib_tree_filter
python setup.py build develop
cd ../../../..
```

Coming soon.
The pretrained model weights can be downloaded now ⬇️.
| Checkpoint | Hugging Face | Google Drive |
| --- | --- | --- |
| SAAS_b+_ytvos_tma.pt | huggingface 🤗 | Google Drive ☁️ |
| SAAS_l_ytvos_tma.pt | huggingface 🤗 | Google Drive ☁️ |
More weights are coming soon!
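Since SAAS builds on SAM 2 and keeps the sam2 package layout, loading a downloaded checkpoint should look roughly like the upstream video-predictor flow. This is a sketch assuming SAAS preserves SAM 2's build_sam2_video_predictor API; the config and checkpoint paths are the ones listed above, and `<video_name>` is a placeholder:

```python
import torch
from sam2.build_sam import build_sam2_video_predictor

# Sketch assuming SAAS keeps SAM 2's build API.
predictor = build_sam2_video_predictor(
    "configs/saas/saas_hiera_b+.yaml",
    "checkpoints/SAAS_b+_ytvos_tma.pt",
    device="cuda" if torch.cuda.is_available() else "cpu",
)
# One folder of JPEG frames per video, as in the inference commands below.
state = predictor.init_state(video_path="data/Cut-VOS/frames/<video_name>")
```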
Inference under the semi-supervised VOS setting is implemented in tools/vos_inference.py.
```bash
python tools/vos_inference.py \
    --sam2_cfg PATH_TO_YAML_FILE \
    --sam2_checkpoint PATH_TO_YOUR_MODEL \
    --base_video_dir PATH_TO_JPEGFRAMES \
    --input_mask_dir PATH_TO_ANNOTATIONS \
    --output_mask_dir PATH_TO_OUTPUT
```

Assuming your dataset is placed under the ./data/ directory and you are using our default base-plus model to run the Cut-VOS evaluation, you can use the following command:
```bash
python tools/vos_inference.py \
    --sam2_cfg configs/saas/saas_hiera_b+.yaml \
    --sam2_checkpoint checkpoints/SAAS_b+_ytvos_tma.pt \
    --base_video_dir data/Cut-VOS/frames \
    --input_mask_dir data/Cut-VOS/masks \
    --output_mask_dir output/Cut-VOS/SAAS-B+
```

If you want to test our model on your own datasets, where each video has multiple mask folders for different objects, remember to use the --per_obj_png_file option. The --track_object_appearing_later_in_video option is for samples in which the object does not appear in the first frame.
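For reference, the upstream SAM 2 vos_inference.py consumes DAVIS-style directories (one folder of JPEG frames and one folder of PNG annotations per video). Assuming Cut-VOS follows the same convention, the expected layout is roughly as below; check the downloaded benchmark for the exact frame naming:

```
data/Cut-VOS/
├── frames/
│   └── <video_name>/
│       ├── 00000.jpg
│       └── ...
├── masks/
│   └── <video_name>/
│       ├── 00000.png      # with --per_obj_png_file: <video_name>/<obj_id>/00000.png
│       └── ...
└── transitions.json       # shot-transition metadata used by the evaluation script
```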
The evaluation protocol is provided in evaluation/evaluation_mvos.py.
```bash
python evaluation/evaluation_rich.py \
    --seg_dir output/Cut-VOS/SAAS-B+ \
    --ann_dir data/Cut-VOS/masks \
    --trans_dir data/Cut-VOS/transitions.json \
    --output_csv temp.csv \
    --num_threads 15
```
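To sanity-check the results, here is a quick way to inspect the output CSV, assuming it contains one row of metrics per sequence (the column names depend on the script):

```python
import pandas as pd

df = pd.read_csv("temp.csv")
print(df.head())                       # per-sequence rows
print(df.mean(numeric_only=True))      # rough aggregate of the numeric metric columns
```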
We would like to express our gratitude to some other projects that have contributed to our work:
If you find our paper and dataset useful for your research, please consider citing our paper.
```bibtex
@inproceedings{SAAS2025,
  title={{S}egment {A}nything {A}cross {S}hots: {A} Method and {B}enchmark},
  author={Hu, Hengrui and Ying, Kaining and Ding, Henghui},
  booktitle={AAAI},
  year={2026}
}
```


