
GuidedDiffusion

Recently, it has been shown in the literature that using CLIP as a classifier to guide generative models such as GANs and StyleGAN with text can lead to interesting results. Here, I have implemented part of the DiffusionCLIP paper, which proposes using CLIP as a classifier to guide a denoising diffusion model.

To generate samples guided by text and the CLIP classifier, I fine-tune the model for only 2 epochs. The samples are produced during the fine-tuning phase, which means that generating each new sample requires fine-tuning the model for that sample first.

To use fewer GPU resources, only the GPU-efficient fine-tuning approach from DiffusionCLIP is implemented (a minimal sketch of the fine-tuning step follows the diagram below). The diagram shows the difference between the two classifier-guided approaches proposed in DiffusionCLIP and is taken from that paper.

[Diagram from the DiffusionCLIP paper: comparison of the two classifier-guided fine-tuning approaches]
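
Below is a minimal sketch of what one GPU-efficient fine-tuning step can look like, assuming OpenAI's CLIP package and a pretrained noise-prediction network. Every name here (the stand-in model, the noise schedule, the image size, and the simplified global CLIP loss, where DiffusionCLIP itself uses a directional variant plus an identity-preservation term) is illustrative rather than this repo's actual API:

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"

# CLIP as the frozen guidance classifier (ViT-B/32 is an assumption).
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model.eval()
for p in clip_model.parameters():
    p.requires_grad_(False)
text_tokens = clip.tokenize(["fearful face"]).to(device)

def clip_loss(images):
    # CLIP expects 224x224 inputs; resize the predicted clean images.
    images = F.interpolate(images, size=(224, 224), mode="bilinear", align_corners=False)
    img_feat = clip_model.encode_image(images.type(clip_model.dtype)).float()
    txt_feat = clip_model.encode_text(text_tokens).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    # Lower loss = higher image-text similarity in CLIP space.
    return 1.0 - (img_feat * txt_feat).sum(dim=-1).mean()

# Stand-in for the DDPM noise-prediction U-Net (the real network is also
# conditioned on the timestep).
eps_model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1).to(device)

# Toy linear DDPM noise schedule, T = 1000.
betas = torch.linspace(1e-4, 0.02, 1000, device=device)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

optimizer = torch.optim.Adam(eps_model.parameters(), lr=1e-5)
x_t = torch.randn(4, 3, 64, 64, device=device)  # noisy latents of the source images
t = 500                                          # a mid-trajectory timestep

for epoch in range(2):  # the model is fine-tuned for only 2 epochs
    eps = eps_model(x_t)
    # One-step estimate of the clean image x0 from x_t (DDPM identity).
    x0_pred = (x_t - torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])
    loss = clip_loss(x0_pred)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In DiffusionCLIP itself, the latents come from inverting real images and the objective adds an identity-preservation term; the sketch above keeps only the CLIP-guidance part.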

Here, I also compare the generative results of using EmoNet versus CLIP as the classifier guiding the diffusion model. As the results show, not all classifiers are suitable for guiding a generative model.

Part of this code is adapted from the original DiffusionCLIP implementation. The DDPM implementation comes from https://github.com/explainingai-code/DDPM-Pytorch/tree/main.

Quick Start

Configuration

  • celeba.yml - lets you configure the different components of the guided diffusion model
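
As a quick illustration (the key name below is an assumption; open celeba.yml to see the actual structure), the configuration can be loaded and inspected with PyYAML:

```python
import yaml

# Load the experiment configuration used by the training/sampling scripts.
with open("celeba.yml") as f:
    cfg = yaml.safe_load(f)

print(cfg)                    # all tunable components
print(cfg.get("classifier"))  # e.g. "clip" or "emonet" (assumed key name)
```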

Results

CLIP results:

  • Original samples from the CelebA dataset

    [sample images 1-4]
  • Results after fine-tuning the diffusion model with the CLIP classifier and the guiding text "fearful face"

    [sample images 1-4]
  • Results after fine-tuning the diffusion model with the CLIP classifier and the guiding text "happy face"

    [sample images 1-4]

EmoNet results:

To run the model with EmoNet, change the classifier entry in celeba.yml from clip to emonet.

  • Results after fine-tuning the diffusion model with the EmoNet classifier to generate images that maximize the Fear output of the model (see the sketch after this list)

    [sample images 1-4]
  • Results after fine-tuning the diffusion model with the EmoNet classifier to generate images that maximize the Happy output of the model

    [sample images 1-4]
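
For contrast with the CLIP loss above, here is a minimal sketch of what EmoNet-style guidance can look like: instead of a text-image similarity, the loss maximizes one emotion logit of the classifier on the predicted x0. The stand-in network, class count, and class index below are assumptions, not this repo's actual EmoNet interface:

```python
import torch

# Stand-in for a pretrained EmoNet; the real model is a CNN that predicts
# discrete emotion classes (the class count and index here are assumed).
emonet = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 64 * 64, 8),
)
FEAR = 4  # hypothetical index of the "Fear" output

def emonet_loss(images):
    logits = emonet(images)
    # Minimizing this maximizes the Fear logit on the predicted x0,
    # mirroring the role of clip_loss in the sketch above.
    return -logits[:, FEAR].mean()

# Example: score a batch of predicted clean images.
x0_pred = torch.randn(4, 3, 64, 64)
print(emonet_loss(x0_pred).item())
```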

Discussion

Not all classifiers can guide a diffusion model effectively. CLIP, trained on a huge dataset, is more resistant to adversarial inputs and therefore guides the model better than EmoNet. Note that the x0 predicted at each diffusion step is pushed to maximize the classifier score, so the more robust a classifier is to noisy and adversarial images, the better the results.
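
One way to see why robustness matters: with the one-step estimate x0 = (x_t - sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_bar_t), any error in the predicted noise is amplified by sqrt(1 - alpha_bar_t) / sqrt(alpha_bar_t), which grows rapidly with t, so the guiding classifier effectively sees very noisy images early in the reverse process. A toy calculation under a standard DDPM linear schedule (an assumption; the repo's schedule may differ):

```python
import torch

# Standard DDPM linear beta schedule, T = 1000 (assumed; check the config).
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

for t in (100, 500, 900):
    # Noise-prediction error is scaled by this factor in the x0 estimate.
    amp = (torch.sqrt(1.0 - alpha_bar[t]) / torch.sqrt(alpha_bar[t])).item()
    print(f"t={t}: eps-error amplification ~ {amp:.1f}x")
```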
