diff --git a/_posts/2025-08-18-diff-distill.md b/_posts/2025-08-18-diff-distill.md
index 38e0745..9881610 100644
--- a/_posts/2025-08-18-diff-distill.md
+++ b/_posts/2025-08-18-diff-distill.md
@@ -94,10 +94,6 @@ We provide some popular instances We ignore the diffusion models wit
 
 The simplest form of conditional probability path is $$\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$$ with the corresponding default conditional velocity field OT target $$v(\mathbf{x}_t, t \vert \mathbf{x}_0)=\mathbb{E}[\dot{\mathbf{x}}_t\vert \mathbf{x}_0]=\mathbf{x}_1- \mathbf{x}_0.$$
 
-Borrowed from this [slide](https://rectifiedflow.github.io/assets/slides/icml_07_distillation.pdf) at ICML2025, the objectives of ODE distillation have been categorized into three cases, i.e., (a) **forward loss**, (b) **backward loss** and (c) **self-consistency loss**.
-
-
-
 Training: Since minimizing the conditional Flow Matching (FM) loss is equivalent to minimizing the marginal FM loss, the optimization problem becomes
 
 $$
@@ -118,6 +114,8 @@ At its core, ODE distillation boils down to how to strategically construct the t
 
 In the context of distillation, the forward direction is $$s<t$$; during sampling, the conditional probability path is traversed twice. In our flow map formulation, this can be replaced with the flow maps $$f_{\tau_i\to 0}(\mathbf{x}_{\tau_i}, \tau_i, 0), f_{0\to \tau_{i-1}}(\mathbf{x}_0, 0, \tau_{i-1})$$ where $$0<\tau_{i-1}<\tau_i<1$$. Intuitively, the flow map $$f_{t\to s}(\mathbf{x}_t, t, s)$$ represents a direct mapping, i.e., a **displacement field**, while $$F_{t\to s}(\mathbf{x}_t, t, s)$$ measures the increment, which corresponds to a **velocity field**.
 
+Our unified framework closely resembles the flow map, which transports points along trajectories of solutions to a probability flow ODE. We provide new insights into how this framework connects with many popular present-day distillation methods. Following this [slide](https://rectifiedflow.github.io/assets/slides/icml_07_distillation.pdf) from ICML 2025, the objectives of ODE trajectory distillation have been categorized into three cases, i.e., (a) **forward loss**, (b) **backward loss**, and (c) **self-consistency loss**. In the context of self-distilling a flow map model $$f_{t\to s}(\mathbf{x}_t, t, s)$$ from scratch, these objectives correspond to equivalent formulations under different names: (a) **Lagrangian Map Distillation loss**, (b) **Eulerian Map Distillation loss**, and (c) **Progressive self-distillation loss**.
+
 ### MeanFlow
 
 MeanFlow can be trained from scratch or distilled from a pretrained FM model. The conditional probability path is defined as the linear interpolation between noise and data $$\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$$ with the corresponding default conditional velocity field OT target $$v(\mathbf{x}_t, t \vert \mathbf{x}_0)=\mathbf{x}_1- \mathbf{x}_0.$$ The main contribution consists of identifying and defining an **average velocity field**, which coincides with our flow map as
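
For concreteness, the conditional FM objective above admits a short sketch. This is a minimal PyTorch-style illustration, not code from the post; `velocity_net`, `x0`, and `x1` are hypothetical names, with `x0` and `x1` denoting the two endpoints of the linear path.

```python
import torch

def conditional_fm_loss(velocity_net, x0, x1):
    # Sample one time per batch element and broadcast over remaining dims.
    t = torch.rand(x0.shape[0], *([1] * (x0.dim() - 1)), device=x0.device)
    x_t = (1 - t) * x0 + t * x1   # linear conditional probability path
    target = x1 - x0              # conditional OT velocity target v = x1 - x0
    pred = velocity_net(x_t, t.reshape(-1))
    return ((pred - target) ** 2).mean()
```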
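The displacement-versus-velocity reading of the flow map can likewise be sketched. The parameterization $$f_{t\to s}(\mathbf{x}_t, t, s) = \mathbf{x}_t + (s-t)F_{t\to s}(\mathbf{x}_t, t, s)$$ used here is an assumption for illustration, not necessarily the exact form used by any specific method; `F_net` and the sampler are hypothetical.

```python
def flow_map(F_net, x_t, t, s):
    # f_{t->s}(x_t, t, s) = x_t + (s - t) * F_{t->s}(x_t, t, s):
    # F_net predicts the increment (average-velocity) field, and the map
    # applies it as a displacement on top of x_t.
    return x_t + (s - t) * F_net(x_t, t, s)

def few_step_sample(F_net, x_init, time_grid):
    # Apply the flow map along consecutive times of a grid; which endpoint
    # is noise and which is data depends on the convention in use.
    x = x_init
    for t, s in zip(time_grid[:-1], time_grid[1:]):
        x = flow_map(F_net, x, t, s)
    return x
```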
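Finally, the self-consistency / progressive self-distillation objective can be sketched as matching the one-hop map against a stop-gradient two-hop composition through an intermediate time. This reuses the hypothetical `flow_map` above and is a sketch of the general construction, not a faithful reproduction of any particular paper's loss.

```python
import torch

def progressive_self_distillation_loss(F_net, x_t, t, s):
    # Sample an intermediate time u with s < u < t; t and s are tensors
    # broadcastable against x_t.
    u = s + torch.rand_like(t) * (t - s)
    direct = flow_map(F_net, x_t, t, s)        # one-hop map f_{t->s}
    with torch.no_grad():                      # stop-gradient target branch
        mid = flow_map(F_net, x_t, t, u)       # f_{t->u}
        target = flow_map(F_net, mid, u, s)    # f_{u->s} o f_{t->u}
    return ((direct - target) ** 2).mean()
```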