StreamDiffusionWebcam

StreamDiffusionWebcam — A solution for real-time diffusion-based image generation from a webcam feed.

Author: Erenalp Çetintürk

This repository is a fork/extension of the original StreamDiffusion example that uses the system screen as input. This project adapts the pipeline to use a webcam (camera) as the input source so you can perform real-time image transformations using diffusion models.

Original project: cumulo-autumn/StreamDiffusion (examples/screen/main.py). Many thanks to the original authors for the code and its permissive licensing.

Table of contents

  • Overview
  • Features
  • Requirements
  • Installation
  • Usage — Quick start
  • Configuration and CLI options
  • Notes on the code
  • Performance & tuning tips
  • Troubleshooting
  • Security & privacy
  • Contributing
  • Acknowledgements
  • License

Overview

StreamDiffusionWebcam captures frames from a webcam, converts them into tensors, and feeds them into a StreamDiffusion model in img2img mode to produce stylized/generated frames in (near) real time. Output frames are displayed in a viewer process.
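
To make the data flow concrete, here is a minimal self-contained sketch of that loop using OpenCV and Pillow. It is illustrative only: run_img2img is a hypothetical stand-in for the diffusion call, and the real script uses StreamDiffusionWrapper plus a separate viewer process rather than cv2.imshow.

import cv2
import numpy as np
from PIL import Image

def run_img2img(frame: Image.Image) -> Image.Image:
    return frame  # placeholder: the real pipeline denoises the frame with a diffusion model

cap = cv2.VideoCapture(0)  # default webcam; try 1, 2, ... if not detected
while True:
    ok, bgr = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)  # OpenCV delivers BGR frames
    out = run_img2img(Image.fromarray(rgb).resize((512, 512)))
    cv2.imshow("output", cv2.cvtColor(np.array(out), cv2.COLOR_RGB2BGR))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()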

Features

  • Webcam input (replaces original screen-capture example)
  • Uses StreamDiffusion for image-to-image diffusion generation
  • Supports acceleration options (xformers / TensorRT)
  • Frame buffering for batching
  • Prompt / negative-prompt customization via CLI
  • Simple viewer process to display generated frames and FPS

Requirements

  • Linux / Windows / macOS with a working webcam
  • NVIDIA GPU recommended for real-time performance (RTX series suggested)
  • Python 3.10 (tested with 3.10)
  • CUDA and NVIDIA drivers appropriate for your chosen PyTorch build
  • Conda recommended for environment management

Installation

  1. Clone the repository:

git clone https://github.com/ErenalpCet/StreamDiffusionWebcam.git
cd StreamDiffusionWebcam

  2. Create and activate a conda environment (recommended):

conda create -n webcamdiffusion python=3.10 -y
conda activate webcamdiffusion

  3. Install the PyTorch build that matches your CUDA version, choosing the corresponding index URL:

CUDA 11.8:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu118

CUDA 12.1:

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu121

CUDA 12.4 (if these wheel versions are unavailable from the cu124 index, the cu121 build also runs on CUDA 12.4 drivers):

pip3 install torch==2.1.0 torchvision==0.16.0 xformers --index-url https://download.pytorch.org/whl/cu124

  4. Install StreamDiffusion with the optional TensorRT extras (the quotes keep the shell from expanding the brackets):

pip install "git+https://github.com/cumulo-autumn/StreamDiffusion.git@main#egg=streamdiffusion[tensorrt]"

  5. (Optional) Install the TensorRT helpers required for TensorRT acceleration:

python -m streamdiffusion.tools.install-tensorrt

  6. Install the remaining dependencies:

pip install -r requirements.txt
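
After installing, a quick sanity check can catch PyTorch/CUDA mismatches before you run the main script:

import torch

print("torch:", torch.__version__, "built for CUDA", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
try:
    import xformers  # optional; required for acceleration="xformers"
    print("xformers:", xformers.__version__)
except ImportError:
    print("xformers not installed")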

Usage — Quick start

The main script uses python-fire for CLI arguments. The simplest way to run with default parameters:

python3 main.py

Example: run with a custom prompt and a specific model

python3 main.py --model_id_or_path="KBlueLeaf/kohaku-v2.1" \
  --prompt="a boy with black short hair, smiling, brown eyes, wearing glasses" \
  --negative_prompt="low quality, bad quality, blurry, low resolution" \
  --frame_buffer_size=1 --width=512 --height=512

Configuration and CLI options

All parameters available in main.py are exposed through the CLI via fire. Important options:

  • model_id_or_path (str): Model repo or local model path (default: "KBlueLeaf/kohaku-v2.1")
  • lora_dict (dict, optional): Dictionary of LoRA weights and scales, e.g. '{"lora_name":0.5}'
  • prompt (str): Positive prompt text
  • negative_prompt (str): Negative prompt text
  • frame_buffer_size (int): Number of frames batched per generation step; use 1 for real-time single-frame processing.
  • width, height (int): Output resolution (default 512x512)
  • acceleration (str): "none", "xformers", or "tensorrt"
  • use_denoising_batch (bool): Use denoising batches (True/False)
  • seed (int): Random seed
  • cfg_type (str): "none", "full", "self", "initialize" (as used by StreamDiffusionWrapper)
  • guidance_scale (float): Guidance scale for the model
  • delta (float): Delta parameter used by StreamDiffusionWrapper
  • do_add_noise (bool): Add noise to inputs
  • enable_similar_image_filter (bool): Skip generation if image is too similar to previous frame
  • similar_image_filter_threshold (float): Similarity threshold above which a frame is skipped
  • similar_image_filter_max_skip_frame (int): Maximum number of consecutive frames the filter may skip

To view all parameters and defaults:

python3 main.py --help
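
Under the hood, python-fire simply turns main()'s keyword arguments into these flags. A minimal sketch of the pattern, with an illustrative subset of parameters and defaults rather than the script's full signature:

import fire

def main(
    model_id_or_path: str = "KBlueLeaf/kohaku-v2.1",
    prompt: str = "a watercolor portrait",  # illustrative default
    frame_buffer_size: int = 1,
    width: int = 512,
    height: int = 512,
    acceleration: str = "xformers",
):
    print(model_id_or_path, prompt, frame_buffer_size, width, height, acceleration)

if __name__ == "__main__":
    # e.g. python3 sketch.py --acceleration="tensorrt" --width=384 --height=384
    fire.Fire(main)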

Notes on the code

  • main.py spawns a process to run the StreamDiffusion model, a process to set the monitor size, and a viewer process to receive and display generated frames.
  • The webcam capture runs in a thread in the image generation process and places tensor frames into a global inputs list consumed by the generator.
  • The script uses multiprocessing with 'spawn' context to improve cross-platform compatibility.
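
As a rough illustration of that producer/consumer design, here is a self-contained sketch. Apart from the inputs list named in the description above, all names and details here are assumptions; the real generation loop hands frames to the diffusion model rather than discarding them.

import threading
from multiprocessing import get_context

import cv2
import torch
import torchvision.transforms.functional as TF

inputs: list[torch.Tensor] = []  # shared within the generation process

def capture_loop(camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    while cap.isOpened():
        ok, bgr = cap.read()
        if ok:
            rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
            inputs.append(TF.to_tensor(rgb))  # HWC uint8 -> CHW float in [0, 1]

def generation_process() -> None:
    # Capture runs in a daemon thread inside this process, as described above.
    threading.Thread(target=capture_loop, daemon=True).start()
    while True:
        if inputs:
            frame = inputs.pop(0)
            _ = frame  # the real loop feeds the frame to the diffusion model here

if __name__ == "__main__":
    ctx = get_context("spawn")  # 'spawn' avoids fork-related CUDA issues
    proc = ctx.Process(target=generation_process)
    proc.start()
    proc.join()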

Performance & tuning tips

  • Recommended GPU: NVIDIA RTX 30/40 series. CPU-only will be very slow.
  • For best throughput, use xformers if available (acceleration="xformers"). Ensure xformers matches your torch build.
  • TensorRT can give much better performance but requires correct setup and may need additional tuning.
  • Lower resolution (e.g., 384x384) and smaller frame_buffer_size can increase FPS.
  • Increase frame_buffer_size to batch multiple frames if memory and model support it.
  • Warmup steps are used in the wrapper; initial frames may be slower.
  • For consistent results across runs, set a fixed seed.

Troubleshooting

  • Webcam not detected:
    • Ensure webcam drivers are installed.
    • Check the device index in cv2.VideoCapture(0); try other indices (1, 2, ...). See the probe snippet after this list.
    • On Linux, ensure the user has permissions for /dev/video* (use v4l2-ctl --list-devices).
  • Low FPS:
    • Verify GPU is used (nvidia-smi).
    • Try lowering resolution or switching acceleration modes.
    • Ensure xformers and torch are compatible and installed correctly.
  • Torch / CUDA errors:
    • Match your torch wheel to your CUDA and OS. Reinstall the appropriate wheel if needed.
  • TensorRT errors:
    • Ensure TensorRT and its Python bindings are installed and compatible with your libcudnn and CUDA.
  • Python warnings from upstream libs:
    • The script suppresses several FutureWarning/UserWarning lines — these are usually harmless.
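
For the webcam checks above, this small helper (not part of the repo) probes the first few device indices and reports which ones deliver frames:

import cv2

for idx in range(4):
    cap = cv2.VideoCapture(idx)
    ok = cap.isOpened() and cap.read()[0]
    print(f"camera index {idx}: {'OK' if ok else 'not available'}")
    cap.release()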

Security & privacy

  • Webcam input is processed locally — no webcam frames are uploaded by the script by default.
  • If you modify the code to log or transmit frames, be mindful of privacy and legal considerations.

Contributing

  • Contributions are welcome. Please raise issues for problems or feature requests.
  • If you propose code changes, follow standard GitHub fork/branch/PR workflow.

Acknowledgements

  • This project is based on and adapted from the StreamDiffusion repository and example code (https://github.com/cumulo-autumn/StreamDiffusion). Huge thanks to the authors for their work and permissive sharing of code.
  • Many thanks to the open-source community for PyTorch, xformers, diffusers, and related tooling.

License

  • This repository inherits licenses from the upstream StreamDiffusion project and any models you download. Be sure to check license terms for any model weights you use.
  • Include your chosen license file in the repo if you want to specify a license explicitly (MIT, Apache-2.0, etc.).
