explainingai-code / stablediffusion-pytorch
This repo implements a Stable Diffusion model in PyTorch with all the essential components.
Hello @explainingai-code !
I have read your code carefully and can now train the unconditional and text-conditioned CelebHQ models, but I only have a single V100 GPU, and training is expected to take ~110 hours per result, which is very time-consuming. Renting a GPU cluster is also expensive. If you have already trained the models, could you provide download links for the model checkpoints? I think this would be very helpful for people who want to quickly use your codebase.
Thank you again for your great codebase; it would be greatly appreciated if model checkpoints could be provided.
Sincerely,
CatLoves
Hello,
First of all thanks again for sharing the code and good work on that!
I have one doubt about sampling data (after the training phase).
When t is not equal to zero you assign x0 to the im variable (instead of using mean + sigma * z), but when t is equal to zero you use the mean variable. Why?
I would have thought you should always use mean + sigma * z whenever t is not equal to zero, right?
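For context, the standard DDPM sampling convention is: add noise (mean + sigma * z) at every reverse step except the final one (t == 0), where only the mean is returned, since there is no later denoising step that could remove freshly injected noise. A minimal NumPy sketch of this convention (variable names are illustrative, not taken from the repo):

```python
import numpy as np

def sample_step(mean, sigma, t, rng):
    """One reverse-diffusion step.

    For t > 0 return mean + sigma * z with z ~ N(0, I); at the final
    step (t == 0) return the mean alone so the output image is not
    perturbed by noise that nothing would denoise afterwards.
    """
    if t == 0:
        return mean
    z = rng.standard_normal(mean.shape)
    return mean + sigma * z

rng = np.random.default_rng(0)
mean = np.zeros((4,))
out_final = sample_step(mean, 0.5, t=0, rng=rng)   # equals mean exactly
out_mid = sample_step(mean, 0.5, t=10, rng=rng)    # mean plus Gaussian noise
```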
I would like to condition the model on multiple features. In my case, I have several columns, say A, B, C, and D; some of the columns are categorical and some are numerical. I want to implement Stable Diffusion conditioned on all the columns together. Please advise what modifications I need to make.
Thanks.
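One common approach (a sketch under assumptions, not the repo's API) is to build a single conditioning vector per sample: one-hot or learned embeddings for the categorical columns and standardized values for the numerical ones, concatenated together and then fed to the model's cross-attention or added to the timestep embedding. The column names A..D below are the hypothetical ones from the question:

```python
import numpy as np

def build_condition(row, cat_vocab, num_stats):
    """Build one conditioning vector from mixed columns.

    cat_vocab: {column: list of categories} -> one-hot encoded
    num_stats: {column: (mean, std)}        -> standardized scalar
    """
    parts = []
    for col, vocab in cat_vocab.items():
        one_hot = np.zeros(len(vocab))
        one_hot[vocab.index(row[col])] = 1.0
        parts.append(one_hot)
    for col, (mu, std) in num_stats.items():
        parts.append(np.array([(row[col] - mu) / std]))
    return np.concatenate(parts)

cond = build_condition(
    {"A": "red", "B": "low", "C": 3.0, "D": 10.0},
    cat_vocab={"A": ["red", "green"], "B": ["low", "high"]},
    num_stats={"C": (0.0, 1.0), "D": (5.0, 5.0)},
)
# cond has length 2 (A) + 2 (B) + 1 (C) + 1 (D) = 6
```

In a PyTorch model you would typically replace the one-hot encoding with nn.Embedding layers for the categorical columns and project the concatenated vector to the conditioning dimension the UNet expects.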
Hello,
First of all thank you for your great work!
I have a question about the use of the VAE. Why not use only the UNet to predict the generated image directly?
Hello,
When you run infer_vqvae.py you save the latent (encoded) information, but you do not clamp it (torch.clamp(encoded_output, -1., 1.)).
I also checked that when you read it from the dataset with use_latents set to True, you don't clamp it either.
Maybe it's a bug?
Thank you!
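If clamping is indeed intended, the fix would be to apply it once at save time so that training and inference see latents in the same range. A small sketch of that idea (using NumPy's clip, equivalent to torch.clamp; the function name is hypothetical):

```python
import numpy as np

def save_latents(encoded, clamp=True):
    """Hypothetical save path: clamp latents to [-1, 1] before storing,
    so downstream code that assumes this range (e.g. the diffusion
    training loop) sees consistent values whether latents are computed
    on the fly or loaded from disk."""
    if clamp:
        encoded = np.clip(encoded, -1.0, 1.0)
    return encoded

latents = np.array([-2.5, -0.3, 0.7, 1.8])
stored = save_latents(latents)   # values outside [-1, 1] are clipped
```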
I trained the unconditional LDM for 200 epochs and the result was not satisfactory, although the autoencoder gave a better result.
Hi,
Thank you for your code.
I am training an LDM with the config file I have attached.
I have trained with multiple datasets and settings. The training loss always stops converging after a certain number of epochs, usually when the loss is somewhere around 0.1. The loss does go down consistently, but very slowly.
Since I am using MSE loss, 0.1 is large for image generation.
Once I continued training until 400 epochs; at that point the model was overfitted, but the loss reached a minimum of around 0.02.
Could you share your insights, or has anyone else faced this issue?
tuned_class_cond_bdd_1.zip
Hello!
I was wondering if you have any intuition on how many training samples are required to get good results/how much memory is required to train the unconditional VQVAE?
I have about 200k grayscale images at 256x256... which was obviously too much, so I scaled back to 70 images just to see if it would start training, but it didn't, throwing an out-of-memory error.
Is this something batch size can fix, or do I need to adjust a bunch of other parameters? I only changed the im_channels and save_latent parameters from their defaults.
Thank you!
Hey thanks for the videos and codes, I am experimenting with conditional ldms.
Do you happen to have loss plots or logs of the loss? I have a feeling that the loss is decreasing really slowly or not decreasing at all.
Could you let me know if you saw a similar loss decrease? Here is a screenshot for your reference.
Many thanks for your amazing work.
I'm wondering, for the inpainting case, do you pass the image and the masked image through the same pretrained VAE? And what is the training data of this VAE: is it trained on the images alone, or does the dataset also contain the masked images? Or do you train separate VAEs for each one?
I am working on a use case where I want to generate larger resolution images something like 1024x1024. How do I modify the configuration to do that?
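Beyond changing the image size in the config, keep in mind that the latent spatial size scales with the input, so memory and compute grow quadratically with resolution. The arithmetic (assuming the autoencoder halves the resolution at each downsampling stage):

```python
def latent_size(im_size, n_down):
    """Latent spatial size after n_down 2x downsampling stages
    in the autoencoder (assumes im_size is divisible by 2**n_down)."""
    return im_size // (2 ** n_down)

# e.g. a 1024x1024 image with 3 downsampling stages yields a
# 128x128 latent, 16x more latent pixels than a 256x256 input would give
size_1024 = latent_size(1024, 3)
size_256 = latent_size(256, 3)
```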
I am getting a bunch of noodle-like waves after sampling for conditional LDM instead of proper digits. The unconditional LDM works fine. I am using the MNIST dataset that the Torchvision library has (torchvision.datasets.MNIST).
Can you tell what could be wrong in this scenario?
I have attached x0_0.png outputs for both Unconditional_LDM and Conditional_LDM.
Dear explainingai-code:
Your codebase for DDPM is so detailed and helpful; thank you very much for your great work!
I downloaded this codebase and followed your instructions carefully, and luckily I got a good VQVAE result as follows:
So I continue to train unconditional and class conditional ddpm models.
I ran python tools/train_ddpm_vqvae.py to train the unconditional DDPM model, and it seems to have converged as follows:
The loss decreases from 0.2833 to 0.0886. Then I ran python tools/sample_ddpm_vqvae.py to check the model output; I get mnist/samples/x0_0.png through x0_999.png, and all of them look like PURE NOISE, as follows:
Similarly, I trained and tested the MNIST class-conditional DDPM, and all the results also look like PURE NOISE, as follows:
My code structure is as follows:
I am sure that I followed your instructions exactly and did NOT modify any model-related code, but the results seem weird. I would appreciate any help you could give.
Sincerely,
CatLoves