# StreamDiffusionWebcam

Real-time diffusion-based image generation from a webcam feed.
Author: Erenalp Çetintürk
This repository is a fork/extension of the original StreamDiffusion example, which uses the system screen as input. This project adapts the pipeline to use a webcam as the input source so you can perform real-time image transformations with diffusion models.
Original project: cumulo-autumn/StreamDiffusion (examples/screen/main.py). Many thanks to the original authors for the code and its permissive licensing.
## Table of contents

- Overview
- Features
- Requirements
- Installation
- Quick start / Usage
- Configuration and CLI options
- Performance & tuning tips
- Troubleshooting
- Privacy
- Contributing
- Acknowledgements
- License
- Contact
## Overview

StreamDiffusionWebcam captures frames from a webcam, converts them into tensors, and feeds them into a StreamDiffusion model in img2img mode to produce stylized/generated frames in near real time. Output frames are displayed in a separate viewer process.
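At its core the script runs a loop like the sketch below. This is a minimal illustration rather than the actual main.py: the `StreamDiffusionWrapper` calls follow the upstream StreamDiffusion examples, and the concrete arguments (model, `t_index_list`, prompts) are placeholder assumptions.

```python
# Minimal sketch of the capture -> img2img -> display loop.
# The wrapper API follows the upstream StreamDiffusion examples;
# argument values here are illustrative, not main.py's defaults.
import cv2
import numpy as np
from PIL import Image
from utils.wrapper import StreamDiffusionWrapper  # shipped with the StreamDiffusion examples

stream = StreamDiffusionWrapper(
    model_id_or_path="KBlueLeaf/kohaku-v2.1",
    t_index_list=[32, 45],   # which denoising steps to run (assumed value)
    frame_buffer_size=1,
    width=512,
    height=512,
    mode="img2img",
)
stream.prepare(
    prompt="a boy with black short hair, smiling",
    negative_prompt="low quality, blurry",
)

cap = cv2.VideoCapture(0)    # webcam index 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    pil = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    out = stream(image=stream.preprocess_image(pil))  # returns a PIL image
    cv2.imshow("StreamDiffusionWebcam", cv2.cvtColor(np.array(out), cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) == 27:  # Esc quits
        break
cap.release()
```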
## Features

- Webcam input (replaces the original screen-capture example)
- Uses StreamDiffusion for image-to-image diffusion generation
- Supports acceleration options (xformers / TensorRT)
- Frame buffering for batching
- Prompt / negative-prompt customization via CLI
- Simple viewer process to display generated frames and FPS
## Requirements

- Linux / Windows / macOS with a working webcam
- NVIDIA GPU recommended for real-time performance (RTX series suggested)
- Python 3.10 (tested with 3.10)
- CUDA and NVIDIA drivers appropriate for your chosen PyTorch build
- Conda recommended for environment management
## Installation

- Clone the repository:

```bash
git clone https://github.com/ErenalpCet/StreamDiffusionWebcam.git
cd StreamDiffusionWebcam
```

- Create and activate a conda environment (recommended):

```bash
conda create -n webcamdiffusion python=3.10 -y
conda activate webcamdiffusion
```

- Install the PyTorch build matching your CUDA version. Choose the matching index URL for your system:

CUDA 11.8:

```bash
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118
```

CUDA 12.1:

```bash
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121
```

CUDA 12.4:

```bash
pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu124
```

- Install StreamDiffusion (with the optional TensorRT extras). The quotes keep shells from expanding the `[tensorrt]` extras specifier:

```bash
pip install "git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]"
```

- (Optional) Install the TensorRT helpers required for TensorRT acceleration:

```bash
python -m streamdiffusion.tools.install-tensorrt
```

- Install the remaining dependencies:

```bash
pip install -r requirements.txt
```

## Quick start / Usage

The main script uses python-fire for CLI arguments. The simplest way to run with default parameters:

```bash
python3 main.py
```

Example: run with a custom prompt and a specific model:

```bash
python3 main.py --model_id_or_path="KBlueLeaf/kohaku-v2.1" \
--prompt="a boy with black short hair, smiling, brown eyes, wearing glasses" \
--negative_prompt="low quality, bad quality, blurry, low resolution" \
  --frame_buffer_size=1 --width=512 --height=512
```

## Configuration and CLI options

All parameters available in main.py are exposed through the CLI via fire (see the sketch after this list). Important options:
- `model_id_or_path` (str): Model repo ID or local model path (default: "KBlueLeaf/kohaku-v2.1")
- `lora_dict` (dict, optional): Dictionary of LoRA weights and scales, e.g. '{"lora_name": 0.5}'
- `prompt` (str): Positive prompt text
- `negative_prompt` (str): Negative prompt text
- `frame_buffer_size` (int): Number of frames batched per generation; 1 for real-time single-frame processing
- `width`, `height` (int): Output resolution (default 512x512)
- `acceleration` (str): "none", "xformers", or "tensorrt"
- `use_denoising_batch` (bool): Use denoising batches (True/False)
- `seed` (int): Random seed
- `cfg_type` (str): "none", "full", "self", or "initialize" (as used by StreamDiffusionWrapper)
- `guidance_scale` (float): Guidance scale for the model
- `delta` (float): Delta parameter used by StreamDiffusionWrapper
- `do_add_noise` (bool): Add noise to inputs
- `enable_similar_image_filter` (bool): Skip generation if the input is too similar to the previous frame
- `similar_image_filter_threshold` (float): Similarity threshold above which a frame is skipped
- `similar_image_filter_max_skip_frame` (int): Maximum number of consecutive frames that may be skipped
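These options map one-to-one onto the parameters of the entry function in main.py. The following is a hypothetical reduction of that entry point showing how python-fire exposes them; note that fire also parses values like '{"lora_name": 0.5}' into Python dicts, which is how `lora_dict` is passed:

```python
# Hypothetical reduction of main.py's entry point: python-fire turns every
# keyword argument into a --flag on the command line.
from typing import Dict, Optional

import fire


def main(
    model_id_or_path: str = "KBlueLeaf/kohaku-v2.1",
    lora_dict: Optional[Dict[str, float]] = None,
    prompt: str = "1girl, anime",
    frame_buffer_size: int = 1,
    width: int = 512,
    height: int = 512,
    acceleration: str = "xformers",
    seed: int = 2,
):
    # The real script builds the StreamDiffusion pipeline here.
    print(model_id_or_path, lora_dict, prompt, frame_buffer_size)


if __name__ == "__main__":
    fire.Fire(main)  # e.g. python3 main.py --prompt="..." --lora_dict='{"name": 0.5}'
```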
To view all parameters and defaults:

```bash
python3 main.py --help
```

How it works:

- main.py spawns a process to run the StreamDiffusion model, a process to determine the monitor size, and a viewer process to receive and display generated frames.
- The webcam capture runs in a thread inside the image-generation process and places tensor frames into a global `inputs` list consumed by the generator (see the sketch below).
- The script uses multiprocessing with the 'spawn' context to improve cross-platform compatibility.
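A stripped-down sketch of that layout (illustrative only; the real script also handles the viewer process, tensor conversion, and shutdown):

```python
# Illustrative process/thread layout: a 'spawn' process runs generation,
# and a daemon thread inside it feeds frames into a module-global `inputs` list.
import threading
import time
import multiprocessing as mp

inputs = []  # filled by the capture thread, drained by the generation loop


def capture_loop():
    import cv2
    cap = cv2.VideoCapture(0)
    while True:
        ok, frame = cap.read()
        if ok:
            inputs.append(frame)  # the real code converts frames to tensors first


def generation_process():
    threading.Thread(target=capture_loop, daemon=True).start()
    while True:
        if inputs:
            frame = inputs.pop(0)
            # ... run the StreamDiffusion pipeline on `frame`,
            # then hand the result to the viewer process ...
        else:
            time.sleep(0.001)  # avoid a hot spin while waiting for frames


if __name__ == "__main__":
    ctx = mp.get_context("spawn")  # 'spawn' for cross-platform compatibility
    proc = ctx.Process(target=generation_process)
    proc.start()
    proc.join()
```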
## Performance & tuning tips

- Recommended GPU: NVIDIA RTX 30/40 series. CPU-only inference will be very slow.
- For best throughput, use xformers if available (`acceleration="xformers"`). Ensure xformers matches your torch build.
- TensorRT can give much better performance but requires a correct setup and may need additional tuning.
- A lower resolution (e.g., 384x384) and a smaller `frame_buffer_size` can increase FPS.
- Increase `frame_buffer_size` to batch multiple frames if memory and the model support it.
- Warmup steps are used in the wrapper; the first few frames may be slower.
- If you want consistent results across runs, set `seed`.
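For example, a lower-latency run might combine several of these flags (the values are illustrative):

```bash
python3 main.py --width=384 --height=384 --frame_buffer_size=1 \
  --acceleration="xformers" --seed=42
```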
## Troubleshooting

- Webcam not detected:
  - Ensure webcam drivers are installed.
  - Check the device index in `cv2.VideoCapture(0)`; try other indices (1, 2, ...), or use the probe script after this list.
  - On Linux, ensure your user has permission to access `/dev/video*` (use `v4l2-ctl --list-devices`).
- Low FPS:
  - Verify the GPU is actually being used (`nvidia-smi`).
  - Try lowering the resolution or switching acceleration modes.
  - Ensure xformers and torch are compatible and installed correctly.
- Torch / CUDA errors:
  - Match your torch wheel to your CUDA version and OS. Reinstall the appropriate wheel if needed.
- TensorRT errors:
  - Ensure TensorRT and its Python bindings are installed and compatible with your cuDNN and CUDA versions.
- Python warnings from upstream libraries:
  - The script suppresses several FutureWarning/UserWarning messages; these are usually harmless.
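The following small diagnostic covers the two most common failure points, assuming OpenCV and PyTorch are installed:

```python
# Probe the first few webcam indices and report whether PyTorch sees a GPU.
import cv2
import torch

for idx in range(4):
    cap = cv2.VideoCapture(idx)
    ok = cap.isOpened() and cap.read()[0]
    print(f"webcam index {idx}: {'OK' if ok else 'not available'}")
    cap.release()

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
```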
## Privacy

- Webcam input is processed locally; no webcam frames are uploaded by the script by default.
- If you modify the code to log or transmit frames, be mindful of privacy and legal considerations.
## Contributing

- Contributions are welcome. Please open issues for problems or feature requests.
- If you propose code changes, follow standard GitHub fork/branch/PR workflow.
## Acknowledgements

- This project is based on and adapted from the StreamDiffusion repository and example code (https://github.com/cumulo-autumn/StreamDiffusion). Huge thanks to the authors for their work and permissive sharing of code.
- Many thanks to the open-source community for PyTorch, xformers, diffusers, and related tooling.
## License

- This repository inherits licenses from the upstream StreamDiffusion project and from any models you download. Be sure to check the license terms for any model weights you use.
- Include your chosen license file in the repo if you want to specify a license explicitly (MIT, Apache-2.0, etc.).
## Contact

- Author: Erenalp Çetintürk
- GitHub: https://github.com/ErenalpCet