Yu Liu3, Wenyuan Yu3, Jiangsu Du2, Wei Lin3†, Yang You1†
SRDiffusion is a novel video diffusion inference framework that reduces computation costs through Sketching-Rendering Cooperation between large and small models:
- The large model handles high-noise steps, preserving semantic and motion fidelity (Sketching).
- The small model processes low-noise steps to refine visual details (Rendering).
SRDiffusion achieves up to 3× speedup on Wan and 2× speedup on CogVideoX with minimal quality degradation, and it is complementary to other optimization techniques.
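The hand-off between the two models can be illustrated with a toy denoising loop. This is only a minimal sketch, not the repository's implementation: it assumes the switch happens once the relative change between the large model's successive predictions drops below a threshold (one plausible reading of the `--self_diff_threshold` flag used in the commands in this README). The names `relative_l1`, `sketch_then_render`, `toy_large`, and `toy_small` are hypothetical, and the latent update is a placeholder rather than a real scheduler step.

```python
from typing import Callable, List, Tuple

Latent = List[float]
Denoiser = Callable[[Latent, int], Latent]


def relative_l1(prev: Latent, curr: Latent) -> float:
    """Summed absolute change between two predictions, relative to the previous one."""
    num = sum(abs(c - p) for p, c in zip(prev, curr))
    den = sum(abs(p) for p in prev) or 1e-12  # guard against division by zero
    return num / den


def sketch_then_render(
    latent: Latent,
    large: Denoiser,
    small: Denoiser,
    num_steps: int,
    threshold: float,
) -> Tuple[Latent, List[str]]:
    """Run `large` on early steps and `small` once predictions stabilize (assumed rule)."""
    use_small = False
    prev_pred = None
    schedule: List[str] = []  # records which model handled each step
    for step in range(num_steps):
        model = small if use_small else large
        pred = model(latent, step)
        schedule.append("small" if use_small else "large")
        # Assumed criterion: switch when the large model's output barely changes.
        if not use_small and prev_pred is not None:
            if relative_l1(prev_pred, pred) < threshold:
                use_small = True
        prev_pred = pred
        # Placeholder update; a real pipeline would call its scheduler here.
        latent = [x - p / num_steps for x, p in zip(latent, pred)]
    return latent, schedule


def toy_large(latent: Latent, step: int) -> Latent:
    """Stand-in for the large model: its prediction settles as steps progress."""
    return [1.0 + 0.5 ** step] * len(latent)


def toy_small(latent: Latent, step: int) -> Latent:
    """Stand-in for the small model: a constant prediction."""
    return [1.0] * len(latent)


if __name__ == "__main__":
    _, schedule = sketch_then_render([0.0] * 4, toy_large, toy_small, 12, 0.01)
    print(schedule)
```

Under this assumed rule, a larger threshold hands over to the small model earlier (more speedup, more risk to fidelity), while a smaller threshold keeps the large model in charge for longer.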
- Follow the Wan2.1 setup guide to set up the environment and download the 14B and 1.3B models.
- Run the following command to generate a video:

    python ./wan/srd_generate.py --task t2v-14B --size 832*480 \
        --ckpt_dir ./Wan2.1-T2V-14B --rendering_ckpt_dir ./Wan2.1-T2V-1.3B \
        --offload_model True --t5_cpu \
        --self_diff_threshold 0.01 \
        --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

- Follow the CogVideo setup guide to prepare the environment.
- Run the following command to generate a video:

    python cli_demo_srd.py --seed 42 --num_frames 49 --fps 8 \
        --self_diff_threshold 0.01 \
        --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."

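To compare several values of `--self_diff_threshold`, the CogVideoX command can be assembled programmatically. The sketch below only builds and prints the command lines; it does not run them. The helper name `build_srd_command` and the extra threshold values `0.005` and `0.02` are illustrative assumptions, while the flags themselves are the ones shown in the command in this README.

```python
import shlex
from typing import List


def build_srd_command(prompt: str, threshold: float, seed: int = 42) -> List[str]:
    """Return the argument list for cli_demo_srd.py (hypothetical helper)."""
    return [
        "python", "cli_demo_srd.py",
        "--seed", str(seed),
        "--num_frames", "49",
        "--fps", "8",
        "--self_diff_threshold", str(threshold),
        "--prompt", prompt,
    ]


if __name__ == "__main__":
    prompt = (
        "Two anthropomorphic cats in comfy boxing gear and bright gloves "
        "fight intensely on a spotlighted stage."
    )
    # Threshold values other than 0.01 are illustrative, not recommendations.
    for threshold in (0.005, 0.01, 0.02):
        # shlex.join quotes the prompt so the line can be pasted into a shell.
        print(shlex.join(build_srd_command(prompt, threshold)))
```

Passing the list to `subprocess.run(cmd, check=True)` from the CogVideoX working directory would execute a run; keeping the seed fixed makes outputs at different thresholds easier to compare.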
This repository is built upon Wan2.1 and CogVideoX. We sincerely thank the contributors of these projects for their excellent work.
For contribution guidelines, please refer to CONTRIBUTING.md.
SRDiffusion is developed by Alibaba Group and NUS HPC-AI Lab. This work is supported by Alibaba Innovative Research (AIR).
SRDiffusion is licensed under the Apache License (Version 2.0). See the LICENSE file for more details. This project also includes third-party test cases released under other open-source licenses. Please refer to the NOTICE file for more information.
If you find SRDiffusion useful, please consider starring the project ⭐ and citing it using the following BibTeX entry:
@misc{cheng2025srdiffusion,
title={SRDiffusion: Accelerate Video Diffusion Inference via Sketching-Rendering Cooperation},
author={Shenggan Cheng and Yuanxin Wei and Lansong Diao and Yong Liu and Bujiao Chen and Lianghua Huang and Yu Liu and Wenyuan Yu and Jiangsu Du and Wei Lin and Yang You},
year={2025},
eprint={2505.19151},
archivePrefix={arXiv},
primaryClass={cs.GR},
url={https://arxiv.org/abs/2505.19151},
}
