Commit 298b110

feat: add more caching methods (#1066)
1 parent 30a9113 commit 298b110

9 files changed: +2023 -144 lines changed

cache_dit.hpp

Lines changed: 975 additions & 0 deletions
Large diffs are not rendered by default.

docs/caching.md

Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
## Caching

Caching methods accelerate diffusion inference by reusing intermediate computations when changes between steps are small.

### Cache Modes

| Mode | Target | Description |
|------|--------|-------------|
| `ucache` | UNET models | Condition-level caching with error tracking |
| `easycache` | DiT models | Condition-level cache |
| `dbcache` | DiT models | Block-level L1 residual threshold |
| `taylorseer` | DiT models | Taylor series approximation |
| `cache-dit` | DiT models | Combined DBCache + TaylorSeer |

### UCache (UNET Models)

UCache caches the residual (output - input) and reuses it when the change in the input is below the threshold.

```bash
sd-cli -m model.safetensors -p "a cat" --cache-mode ucache --cache-option "threshold=1.5"
```

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `threshold` | Error threshold for the reuse decision | 1.0 |
| `start` | Start caching at this fraction of total steps (0.15 = 15%) | 0.15 |
| `end` | Stop caching at this fraction of total steps (0.95 = 95%) | 0.95 |
| `decay` | Error decay rate (0-1) | 1.0 |
| `relative` | Scale threshold by output norm (0/1) | 1 |
| `reset` | Reset error after computing (0/1) | 1 |
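
The `key=value` option string can be illustrated with a small parser. This is a Python sketch for explanation only, not the parser that `sd-cli` uses: `UCACHE_DEFAULTS` and `parse_cache_option` are hypothetical names, the defaults are copied from the table above, and rejecting unknown keys is an assumption.

```python
# Hypothetical illustration; not the project's actual option parser.
# Defaults are taken from the UCache parameter table.
UCACHE_DEFAULTS = {
    "threshold": 1.0,
    "start": 0.15,
    "end": 0.95,
    "decay": 1.0,
    "relative": 1.0,
    "reset": 1.0,
}


def parse_cache_option(option, defaults):
    """Parse a comma-separated "key=value" string on top of `defaults`."""
    params = dict(defaults)
    for item in option.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = item.partition("=")
        key = key.strip()
        if not sep or key not in params:
            # Assumption: unknown or malformed entries are treated as errors.
            raise ValueError(f"unrecognized cache option: {item!r}")
        params[key] = float(value)
    return params
```

Under these assumptions, `parse_cache_option("threshold=1.5,reset=0", UCACHE_DEFAULTS)` overrides `threshold` and `reset` and leaves the remaining defaults unchanged.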

#### Reset Parameter

The `reset` parameter controls how the tracked error accumulates:

- `reset=1` (default): Resets the accumulated error after each computed step. More aggressive caching; works well with most samplers.
- `reset=0`: Keeps the accumulated error. More conservative; recommended for the `euler_a` sampler.
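
The difference between the two settings can be sketched with a toy accumulator. This is one plausible reading of the description, not the project's implementation: the per-step error estimates, the rule "skip while the accumulated error is below the threshold", and the function name `plan_steps` are all assumptions, and `decay` is left out for brevity.

```python
def plan_steps(step_errors, threshold, reset):
    """Return one bool per step: True = computed, False = served from cache.

    Assumed rule: a step is skipped while the accumulated error,
    including this step's estimate, stays below `threshold`.
    """
    accumulated = 0.0
    computed = []
    for err in step_errors:
        accumulated += err
        if accumulated < threshold:
            computed.append(False)  # reuse the cached residual
        else:
            computed.append(True)   # run the model for this step
            if reset:
                accumulated = 0.0   # reset=1: start a fresh error budget
    return computed
```

With six steps of estimated error 0.4 and `threshold=1.0`, this sketch skips four steps when `reset=1` but only two when `reset=0`, which matches the "more aggressive" versus "more conservative" description.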

### EasyCache (DiT Models)

Condition-level caching for DiT models. Outputs are cached and reused when the change in the input is below the threshold.

```bash
--cache-mode easycache --cache-option "threshold=0.3"
```

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `threshold` | Input change threshold for reuse | 0.2 |
| `start` | Start caching at this fraction of total steps (0.15 = 15%) | 0.15 |
| `end` | Stop caching at this fraction of total steps (0.95 = 95%) | 0.95 |
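
To make `start` and `end` concrete, the sketch below maps them to step indices. The function name `caching_window`, the rounding (truncation toward zero), and the exclusive upper bound are assumptions for illustration; the actual boundary handling may differ.

```python
def caching_window(num_steps, start=0.15, end=0.95):
    """0-based step indices on which caching may be applied.

    Assumption: both bounds are truncated to whole steps and the
    upper bound is exclusive.
    """
    first = int(num_steps * start)
    last = int(num_steps * end)
    return range(first, last)
```

For a 20-step run with the defaults, this gives steps 3 through 18, so the first three steps and the last step are always computed.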

### Cache-DIT (DiT Models)

For DiT models such as FLUX and QWEN, use the block-level caching modes.

#### DBCache

Caches blocks based on an L1 residual difference threshold:

```bash
--cache-mode dbcache --cache-option "threshold=0.25,warmup=4"
```
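
A threshold test on an L1 residual difference can be sketched as follows. The normalization (dividing by the L1 norm of the previous residual) and the names `relative_l1_diff` and `can_reuse` are assumptions; the diff for `cache_dit.hpp` is not shown here, so the real metric may be defined differently.

```python
def relative_l1_diff(current, previous):
    """L1 distance between two residuals, relative to the L1 norm of `previous`."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / den if den > 0 else float("inf")


def can_reuse(current, previous, threshold=0.08):
    """True when the residual changed little enough to reuse cached block outputs."""
    return relative_l1_diff(current, previous) < threshold
```

A lower `threshold` makes `can_reuse` return `True` less often, so more blocks are recomputed.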

#### TaylorSeer

Uses Taylor series approximation to predict block outputs:

```bash
--cache-mode taylorseer
```
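
The idea of predicting an output from earlier ones can be sketched with a second-order Taylor expansion whose derivatives are estimated by backward finite differences. The order, the unit step spacing, and the name `taylor_predict` are assumptions made for illustration and are not taken from the project's code.

```python
def taylor_predict(history, steps_ahead=1):
    """Extrapolate element-wise from the three most recent outputs (oldest first).

    Uses f(t + k) ~ f(t) + k * d1 + (k**2 / 2) * d2, where d1 and d2 are
    backward finite differences taken with unit step spacing.
    """
    older, prev, last = history[-3:]
    predicted = []
    for f2, f1, f0 in zip(older, prev, last):
        d1 = f0 - f1              # first difference, approximates f'
        d2 = f0 - 2 * f1 + f2     # second difference, approximates f''
        predicted.append(f0 + steps_ahead * d1 + 0.5 * steps_ahead ** 2 * d2)
    return predicted
```

The extrapolation is exact for outputs that change linearly across steps and only approximate otherwise, which is why a prediction-based cache trades some accuracy for speed.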

#### Cache-DIT (Combined)

Combines DBCache and TaylorSeer:

```bash
--cache-mode cache-dit --cache-preset fast
```

#### Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `Fn` | Front blocks to always compute | 8 |
| `Bn` | Back blocks to always compute | 0 |
| `threshold` | L1 residual difference threshold | 0.08 |
| `warmup` | Steps before caching starts | 8 |
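
The roles of `Fn` and `Bn` can be pictured as a split of the block indices. In this sketch the name `partition_blocks`, the 12-block example, and the assumption that the blocks between the front and back groups are the cache candidates are illustrative only.

```python
def partition_blocks(num_blocks, fn=8, bn=0):
    """Split block indices into always-computed front/back groups and a cacheable middle."""
    if fn < 0 or bn < 0 or fn + bn > num_blocks:
        raise ValueError("fn + bn must fit within num_blocks")
    front = list(range(fn))                           # always computed
    middle = list(range(fn, num_blocks - bn))         # assumed cache candidates
    back = list(range(num_blocks - bn, num_blocks))   # always computed
    return front, middle, back
```

For a hypothetical 12-block model with the defaults, blocks 0 through 7 are always computed and blocks 8 through 11 are candidates for caching.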

#### Presets

Available presets: `slow`, `medium`, `fast`, `ultra` (or `s`, `m`, `f`, `u`).

```bash
--cache-mode cache-dit --cache-preset fast
```

#### SCM Options

The Steps Computation Mask (SCM) controls which steps can be cached:

```bash
--scm-mask "1,1,1,1,0,0,1,0,0,0,1,0,0,0,1,0,0,0,1,1"
```

Mask values: `1` = compute, `0` = can cache.

| Policy | Description |
|--------|-------------|
| `dynamic` | Check threshold before caching |
| `static` | Always cache on cacheable steps |

```bash
--scm-policy dynamic
```
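
How the mask and the policy interact can be sketched as below. The names `parse_scm_mask` and `should_compute` and the `below_threshold` flag are hypothetical; what happens when the mask is shorter than the number of steps is not stated in the docs and is not modeled here.

```python
def parse_scm_mask(mask):
    """Parse "1,0,..." into booleans: True = must compute, False = may cache."""
    values = [v.strip() for v in mask.split(",")]
    if any(v not in ("0", "1") for v in values):
        raise ValueError("mask entries must be 0 or 1")
    return [v == "1" for v in values]


def should_compute(step, mask, policy="dynamic", below_threshold=False):
    """Decide whether `step` runs the model under the given policy."""
    if mask[step]:
        return True                  # 1 = compute
    if policy == "static":
        return False                 # always cache on cacheable steps
    if policy == "dynamic":
        return not below_threshold   # cache only when the threshold check passes
    raise ValueError(f"unknown policy: {policy!r}")
```

Applied to the 20-entry example mask above, nine steps are always computed and eleven are cacheable; under `static` all eleven are cached, while under `dynamic` each one still depends on the threshold check.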

### Performance Tips

- Start with the default thresholds and adjust based on output quality.
- Lower threshold = better quality, less speedup.
- Higher threshold = more speedup, potential quality loss.
- More steps generally mean more caching opportunities.

examples/cli/README.md

Lines changed: 8 additions & 1 deletion
@@ -127,5 +127,12 @@ Generation Options:
  --skip-layers               layers to skip for SLG steps (default: [7,8,9])
  --high-noise-skip-layers    (high noise) layers to skip for SLG steps (default: [7,8,9])
  -r, --ref-image             reference image for Flux Kontext models (can be used multiple times)
- --easycache                 enable EasyCache for DiT models with optional "threshold,start_percent,end_percent" (default: 0.2,0.15,0.95)
+ --cache-mode                caching method: 'easycache' (DiT), 'ucache' (UNET), 'dbcache'/'taylorseer'/'cache-dit' (DiT block-level)
+ --cache-option              named cache params (key=value format, comma-separated):
+                             - easycache/ucache: threshold=,start=,end=,decay=,relative=,reset=
+                             - dbcache/taylorseer/cache-dit: Fn=,Bn=,threshold=,warmup=
+                             Examples: "threshold=0.25" or "threshold=1.5,reset=0"
+ --cache-preset              cache-dit preset: 'slow'/'s', 'medium'/'m', 'fast'/'f', 'ultra'/'u'
+ --scm-mask                  SCM steps mask: comma-separated 0/1 (1=compute, 0=can cache)
+ --scm-policy                SCM policy: 'dynamic' (default) or 'static'
  ```

examples/cli/main.cpp

Lines changed: 2 additions & 2 deletions
@@ -617,7 +617,7 @@ int main(int argc, const char* argv[]) {
         gen_params.pm_style_strength,
     },  // pm_params
     ctx_params.vae_tiling_params,
-    gen_params.easycache_params,
+    gen_params.cache_params,
 };

 results = generate_image(sd_ctx, &img_gen_params);
@@ -642,7 +642,7 @@ int main(int argc, const char* argv[]) {
     gen_params.seed,
     gen_params.video_frames,
     gen_params.vace_strength,
-    gen_params.easycache_params,
+    gen_params.cache_params,
 };

 results = generate_video(sd_ctx, &vid_gen_params, &num_results);
