975 changes: 975 additions & 0 deletions cache_dit.hpp

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions docs/caching.md
@@ -0,0 +1,126 @@
## Caching

Caching methods accelerate diffusion inference by reusing intermediate computations when changes between steps are small.

### Cache Modes

| Mode | Target | Description |
|------|--------|-------------|
| `ucache` | UNET models | Condition-level caching with error tracking |
| `easycache` | DiT models | Condition-level cache |
| `dbcache` | DiT models | Block-level L1 residual threshold |
| `taylorseer` | DiT models | Taylor series approximation |
| `cache-dit` | DiT models | Combined DBCache + TaylorSeer |

### UCache (UNET Models)

UCache caches the residual (output - input) of a model call and reuses it while the change in the input stays below the threshold.

```bash
sd-cli -m model.safetensors -p "a cat" --cache-mode ucache --cache-option "threshold=1.5"
```
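The reuse rule described above can be sketched in C++ as follows. This is a minimal illustration and not the code in this repository: the `ResidualCache` type, its member names, and the choice of a mean absolute input change as the metric are all assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the project's implementation): stores the residual
// (output - input) of an expensive model call and replays it while the input
// stays close to the input seen at the last full computation.
struct ResidualCache {
    std::vector<float> last_input;  // input at the last full computation
    std::vector<float> residual;    // output - input at that computation
    float threshold = 1.0f;         // assumed: bound on the mean absolute input change

    // Mean absolute difference between the current input and the cached input.
    float input_change(const std::vector<float>& x) const {
        assert(!x.empty() && x.size() == last_input.size());
        float sum = 0.0f;
        for (std::size_t i = 0; i < x.size(); ++i) {
            sum += std::fabs(x[i] - last_input[i]);
        }
        return sum / static_cast<float>(x.size());
    }

    // True when a residual is stored and the input has changed less than the threshold.
    bool can_reuse(const std::vector<float>& x) const {
        return !residual.empty() && x.size() == last_input.size() &&
               input_change(x) < threshold;
    }

    // Record the result of a full computation.
    void store(const std::vector<float>& x, const std::vector<float>& y) {
        assert(x.size() == y.size());
        last_input = x;
        residual.resize(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            residual[i] = y[i] - x[i];
        }
    }

    // Approximate the output as input + cached residual.
    std::vector<float> apply(const std::vector<float>& x) const {
        assert(x.size() == residual.size());
        std::vector<float> y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            y[i] = x[i] + residual[i];
        }
        return y;
    }
};
```

A caller would check `can_reuse(x)` before each step, call `apply(x)` on a hit, and otherwise run the model and call `store(x, y)`.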

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `threshold` | Error threshold for reuse decision | 1.0 |
| `start` | Start caching at this fraction of total steps (0.15 = 15%) | 0.15 |
| `end` | Stop caching at this fraction of total steps (0.95 = 95%) | 0.95 |
| `decay` | Error decay rate (0-1) | 1.0 |
| `relative` | Scale threshold by output norm (0/1) | 1 |
| `reset` | Reset error after computing (0/1) | 1 |
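The `--cache-option` value is a comma-separated list of `key=value` pairs. A minimal sketch of parsing such a string is shown below. `parse_cache_option` is a hypothetical helper written for this example; the actual parser may validate keys and values differently.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: splits "key=value,key=value" into a name -> value map.
// Entries without '=' or with an empty key or value are skipped.
// std::stof throws std::invalid_argument if a value is not numeric.
std::map<std::string, float> parse_cache_option(const std::string& option) {
    std::map<std::string, float> params;
    std::istringstream stream(option);
    std::string item;
    while (std::getline(stream, item, ',')) {
        const std::size_t eq = item.find('=');
        if (eq == std::string::npos || eq == 0 || eq + 1 == item.size()) {
            continue;  // malformed entry
        }
        params[item.substr(0, eq)] = std::stof(item.substr(eq + 1));
    }
    return params;
}
```

For example, `parse_cache_option("threshold=1.5,reset=0")` yields a map with the two keys `threshold` and `reset`.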

#### Reset Parameter

The `reset` parameter controls error accumulation behavior:

- `reset=1` (default): Resets accumulated error after each computed step. More aggressive caching, works well with most samplers.
- `reset=0`: Keeps error accumulated. More conservative, recommended for `euler_a` sampler.
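One plausible reading of how `threshold`, `decay`, and `reset` interact is sketched below. The `ErrorTracker` type and its update rule are assumptions chosen to illustrate the behavior described above, not a copy of the implementation.

```cpp
#include <cassert>

// Hypothetical error tracker: each step adds its estimated error to a decayed
// running total. The cache is reused while the total stays below the threshold.
struct ErrorTracker {
    float threshold   = 1.0f;
    float decay       = 1.0f;  // 1.0 keeps the full history, smaller values forget faster
    bool  reset       = true;  // clear the total after a step that had to be computed
    float accumulated = 0.0f;

    // Returns true when the step may reuse the cache, false when it must be computed.
    bool step(float step_error) {
        assert(step_error >= 0.0f);
        accumulated = accumulated * decay + step_error;
        if (accumulated < threshold) {
            return true;
        }
        if (reset) {
            accumulated = 0.0f;
        }
        return false;
    }
};
```

Under this sketch, with a per-step error of 0.5 and a threshold of 1.0, `reset=1` alternates between reused and computed steps, while `reset=0` keeps computing once the total has crossed the threshold, which matches the more conservative behavior described above.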

### EasyCache (DiT Models)

Condition-level caching for DiT models. Caches and reuses outputs when input changes are below threshold.

```bash
--cache-mode easycache --cache-option "threshold=0.3"
```

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `threshold` | Input change threshold for reuse | 0.2 |
| `start` | Start caching at this fraction of total steps (0.15 = 15%) | 0.15 |
| `end` | Stop caching at this fraction of total steps (0.95 = 95%) | 0.95 |
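The `start` and `end` parameters restrict caching to a window of the sampling schedule. The sketch below shows one way to test whether a step falls inside that window. `in_cache_window` is a hypothetical function, and measuring progress as `step / total_steps` with a half-open interval is an assumption.

```cpp
#include <cassert>

// Hypothetical check: caching is allowed only while the fraction of completed
// steps lies in [start, end).
bool in_cache_window(int step, int total_steps, float start = 0.15f, float end = 0.95f) {
    assert(total_steps > 0 && step >= 0 && step < total_steps);
    const float progress = static_cast<float>(step) / static_cast<float>(total_steps);
    return progress >= start && progress < end;
}
```

With the defaults and 10 steps, step 1 (10% progress) is always computed, while steps 2 through 9 are eligible for caching.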

### Cache-DIT (DiT Models)

For DiT models like FLUX and QWEN, use block-level caching modes.

#### DBCache

Caches blocks based on L1 residual difference threshold:

```bash
--cache-mode dbcache --cache-option "threshold=0.25,warmup=4"
```
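The threshold test can be illustrated with a relative L1 difference between the residual from the current step and the one from the previous step. This is a sketch under assumptions: `relative_l1` and `can_skip_blocks` are hypothetical names, and normalizing by the L1 norm of the previous residual is one common choice rather than a confirmed detail of this implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical metric: sum|current - previous| / sum|previous|.
// Returns infinity when the previous residual is all zeros, so nothing is skipped.
float relative_l1(const std::vector<float>& current, const std::vector<float>& previous) {
    assert(current.size() == previous.size());
    float diff = 0.0f;
    float norm = 0.0f;
    for (std::size_t i = 0; i < current.size(); ++i) {
        diff += std::fabs(current[i] - previous[i]);
        norm += std::fabs(previous[i]);
    }
    return norm > 0.0f ? diff / norm : std::numeric_limits<float>::infinity();
}

// The remaining blocks may reuse cached results when the residual barely changed.
bool can_skip_blocks(const std::vector<float>& current,
                     const std::vector<float>& previous,
                     float threshold) {
    return relative_l1(current, previous) < threshold;
}
```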

#### TaylorSeer

Uses Taylor series approximation to predict block outputs:

```bash
--cache-mode taylorseer
```
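The idea of predicting an output from earlier ones can be sketched with a first-order Taylor step built from a finite difference. This is an illustration only: `taylor_predict` is a hypothetical function, and the actual mode may use higher-order terms or a different difference scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical first-order extrapolation. `older` and `newer` are block outputs
// computed `gap` steps apart; the result estimates the output `ahead` steps
// after `newer` as newer + ahead * (newer - older) / gap.
std::vector<float> taylor_predict(const std::vector<float>& older,
                                  const std::vector<float>& newer,
                                  int gap,
                                  int ahead) {
    assert(older.size() == newer.size() && gap > 0 && ahead >= 0);
    const float scale = static_cast<float>(ahead) / static_cast<float>(gap);
    std::vector<float> predicted(newer.size());
    for (std::size_t i = 0; i < newer.size(); ++i) {
        predicted[i] = newer[i] + scale * (newer[i] - older[i]);
    }
    return predicted;
}
```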

#### Cache-DIT (Combined)

Combines DBCache and TaylorSeer:

```bash
--cache-mode cache-dit --cache-preset fast
```

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `Fn` | Front blocks to always compute | 8 |
| `Bn` | Back blocks to always compute | 0 |
| `threshold` | L1 residual difference threshold | 0.08 |
| `warmup` | Steps before caching starts | 8 |
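The `Fn` and `Bn` parameters can be read as marking the first and last blocks of the model as always computed, leaving the blocks in between eligible for caching. The sketch below encodes that reading. `always_compute` is a hypothetical function, and the block count in the usage note is made up for illustration.

```cpp
#include <cassert>

// Hypothetical check: the first Fn blocks and the last Bn blocks are never cached.
bool always_compute(int block, int num_blocks, int Fn = 8, int Bn = 0) {
    assert(num_blocks > 0 && block >= 0 && block < num_blocks);
    assert(Fn >= 0 && Bn >= 0);
    return block < Fn || block >= num_blocks - Bn;
}
```

With the defaults and a hypothetical model of 19 blocks, blocks 0 through 7 are always computed and blocks 8 through 18 may be served from the cache.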

#### Presets

Available presets: `slow`, `medium`, `fast`, `ultra` (or `s`, `m`, `f`, `u`).

```bash
--cache-mode cache-dit --cache-preset fast
```

#### SCM Options

The Steps Computation Mask (SCM) controls which steps may be cached:

```bash
--scm-mask "1,1,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,1"
```

Mask values: `1` = compute, `0` = can cache.

| Policy | Description |
|--------|-------------|
| `dynamic` | Check threshold before caching (default) |
| `static` | Always cache on cacheable steps |

```bash
--scm-policy dynamic
```
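The interaction between the mask and the policy can be sketched as follows. The names `parse_scm_mask`, `ScmPolicy`, and `should_compute` are hypothetical, and computing any step that lies beyond the end of the mask is an assumption made for safety in this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class ScmPolicy { Dynamic, Static };

// Hypothetical parser: turns "1,0,0,1" into {true, false, false, true}.
// Any entry other than "0" is treated as "compute".
std::vector<bool> parse_scm_mask(const std::string& text) {
    std::vector<bool> mask;
    std::istringstream stream(text);
    std::string item;
    while (std::getline(stream, item, ',')) {
        mask.push_back(item != "0");
    }
    return mask;
}

// Decide whether a step must be computed.
// - Mask value 1 (or a step beyond the mask): always compute.
// - Mask value 0 with the static policy: always use the cache.
// - Mask value 0 with the dynamic policy: use the cache only if the
//   threshold check passed (below_threshold == true).
bool should_compute(const std::vector<bool>& mask,
                    std::size_t step,
                    ScmPolicy policy,
                    bool below_threshold) {
    if (step >= mask.size() || mask[step]) {
        return true;
    }
    if (policy == ScmPolicy::Static) {
        return false;
    }
    return !below_threshold;
}
```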

### Performance Tips

- Start with default thresholds and adjust based on output quality
- Lower threshold = better quality, less speedup
- Higher threshold = more speedup, potential quality loss
- More steps generally means more caching opportunities
9 changes: 8 additions & 1 deletion examples/cli/README.md
@@ -127,5 +127,12 @@ Generation Options:
--skip-layers layers to skip for SLG steps (default: [7,8,9])
--high-noise-skip-layers (high noise) layers to skip for SLG steps (default: [7,8,9])
-r, --ref-image reference image for Flux Kontext models (can be used multiple times)
- --easycache enable EasyCache for DiT models with optional "threshold,start_percent,end_percent" (default: 0.2,0.15,0.95)
+ --cache-mode caching method: 'easycache' (DiT), 'ucache' (UNET), 'dbcache'/'taylorseer'/'cache-dit' (DiT block-level)
+ --cache-option named cache params (key=value format, comma-separated):
+ - easycache/ucache: threshold=,start=,end=,decay=,relative=,reset=
+ - dbcache/taylorseer/cache-dit: Fn=,Bn=,threshold=,warmup=
+ Examples: "threshold=0.25" or "threshold=1.5,reset=0"
+ --cache-preset cache-dit preset: 'slow'/'s', 'medium'/'m', 'fast'/'f', 'ultra'/'u'
+ --scm-mask SCM steps mask: comma-separated 0/1 (1=compute, 0=can cache)
+ --scm-policy SCM policy: 'dynamic' (default) or 'static'
```
4 changes: 2 additions & 2 deletions examples/cli/main.cpp
@@ -617,7 +617,7 @@ int main(int argc, const char* argv[]) {
gen_params.pm_style_strength,
}, // pm_params
ctx_params.vae_tiling_params,
- gen_params.easycache_params,
+ gen_params.cache_params,
};

results = generate_image(sd_ctx, &img_gen_params);
@@ -642,7 +642,7 @@ int main(int argc, const char* argv[]) {
gen_params.seed,
gen_params.video_frames,
gen_params.vace_strength,
- gen_params.easycache_params,
+ gen_params.cache_params,
};

results = generate_video(sd_ctx, &vid_gen_params, &num_results);