explainingai-code / ddpm-pytorch Goto Github PK

24.0 2.0 6.0 26 KB

This repo implements Denoising Diffusion Probabilistic Models (DDPM) in Pytorch

Python 100.00%

ddpm denoising-diffusion-models diffusion-model diffusion-models denoising-diffusion-probabilistic-models

ddpm-pytorch's Introduction

Denoising Diffusion Probabilistic Models Implementation in Pytorch

This repository implements DDPM with training and sampling methods of DDPM and unet architecture mimicking the stable diffusion unet used in diffusers library from huggingface from scratch.

DDPM Explanation Videos

Sample Output by trained DDPM on Mnist

Data preparation

For setting up the mnist dataset:

Follow - https://github.com/explainingai-code/Pytorch-VAE#data-preparation

Training on your own images

For this one would need to make the following changes

Put the image files in a folder created within the repo root (example: data/images/*.png ). The data folder should only have one directory 'images'
Comment https://github.com/explainingai-code/DDPM-Pytorch/blob/main/dataset/mnist_dataset.py#L42 as this is only valid for mnist
Update the expected number of channels here and image dimensions(assumed square images) here - https://github.com/explainingai-code/DDPM-Pytorch/blob/main/config/default.yaml#L10
Change the config path here to point to 'data' directory('data' and not 'data/images') - https://github.com/explainingai-code/DDPM-Pytorch/blob/main/config/default.yaml#L2
Right now the code has been written for picking up png files in mnist data directory format, so I assume there are subdirectories inside the directory mentioned in config and these sub-directories have .png files. This would work if you have .png files. If the images are of other formats or combination of different formats then one would have to change the load_images function correspondingly here - https://github.com/explainingai-code/DDPM-Pytorch/blob/main/dataset/mnist_dataset.py#L29C9-L29C9
As of now code is written assuming square images, if thats not the case then just changing the dimensions to desired one during sampling should work - https://github.com/explainingai-code/DDPM-Pytorch/blob/main/tools/sample_ddpm.py#L20

Quickstart

Create a new conda environment with python 3.8 then run below commands
git clone https://github.com/explainingai-code/DDPM-Pytorch.git
cd DDPM-Pytorch
pip install -r requirements.txt
For training/sampling use the below commands passing the desired configuration file as the config argument in case you want to play with it.
python -m tools.train_ddpm for training ddpm
python -m tools.sample_ddpm for generating images

Configuration

config/default.yaml - Allows you to play with different components of ddpm

Output

Outputs will be saved according to the configuration present in yaml files.

For every run a folder of task_name key in config will be created

During training of DDPM the following output will be saved

Latest Model checkpoint in task_name directory

During sampling the following output will be saved

Sampled image grid for all timesteps in task_name/samples/*.png

Citations

@misc{ho2020denoising,
      title={Denoising Diffusion Probabilistic Models}, 
      author={Jonathan Ho and Ajay Jain and Pieter Abbeel},
      year={2020},
      eprint={2006.11239},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

ddpm-pytorch's People

Contributors

Stargazers

Watchers

Forkers

kiristern imraunav mognc heli-dawnlab703 kahchanlow

ddpm-pytorch's Issues

In Diffusion Models: Why does not use 'Positional Encoding' in self-attention layers?

Thx for nice practicing about DM.
Actually, I'm really curious about why does not use 'Positional Encoding' (which was used in ViT or VanillaTransformer.. etc..) in self-attention layers?
Is that any reason and can we ensure self-attention in DDPM U-Net can maintain its position(pixel-wise) information?

what changes would we need to do if we used our own dataset?

Thanks for the awesome explanation. Could you tell me which changes we need before training the model on our data?

what parameter changes would I need to make sure it runs on our dataset?

I am running this code on set of images but getting thisu error
" CUDA out of memory. Tried to allocate 150.06 GiB (GPU 0; 15.89 GiB total capacity; 720.18 MiB already allocated; 14.31 GiB free; 736.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. " I have updated the batch size, and also resize images to 224, 224 shape but it still giving me this CUDA error.

Can you please tell me what shold I do?

Thanks

Noise on all generated images

(Epoch 5)

No matter how many epochs I train the model for, I keep getting noise on my images. I think this is because there is still noise on timestep 0

Note: I still get noise on my image even with 1000 timesteps instead of 300

RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

I get this error when using this repository.
This seems to fix it, but it's probably not the most efficient way to do it:

def add_noise(self, original, noise, t):
        original_shape = original.shape
        batch_size = original_shape[0]

        sqrt_alpha_cum_prod = self.sqrt_alpha_cum_prod[t.cpu()].reshape(batch_size)
        sqrt_one_minus_alpha_cum_prod = self.sqrt_one_minus_alpha_cum_prod[t.cpu()].reshape(batch_size)

        for _ in range(len(original_shape)-1):
            sqrt_alpha_cum_prod = sqrt_alpha_cum_prod.unsqueeze(-1)
        
        for _ in range(len(original_shape)-1):
            sqrt_one_minus_alpha_cum_prod = sqrt_one_minus_alpha_cum_prod.unsqueeze(-1)

        return (sqrt_alpha_cum_prod.to(original.device) * original + sqrt_one_minus_alpha_cum_prod.to(original.device) * noise)
    
    def backward(self, xt, noise_pred, t):
        x0 = (xt - (self.sqrt_one_minus_alpha_cum_prod[t.cpu()].to(noise_pred.device) * noise_pred)) / torch.sqrt(self.alpha_cum_prod[t.cpu()].to(noise_pred.device))
        x0 = torch.clamp(x0, -1., 1.)

        mean = xt - ((self.betas[t.cpu()]).to(noise_pred.device)*noise_pred).to(noise_pred.device)/(self.sqrt_one_minus_alpha_cum_prod[t.cpu()]).to(noise_pred.device)
        mean = mean / torch.sqrt(self.alphas[t.cpu()].to(noise_pred.device))

        if t == 0:
            return mean, mean
        else:
            variance = (1-self.alpha_cum_prod[(t-1).cpu()]).to(noise_pred.device) / (1. - self.alpha_cum_prod[t.cpu()]).to(noise_pred.device)
            variance = variance * self.betas[t.cpu()].to(noise_pred.device)
            sigma = variance ** 0.5
            z = torch.randn(xt.shape).to(xt.device)

            return mean + sigma*z, x0