cloob-latent-diffusion's Issues

`autoencoder_scale` was something like 6.85

Hi, I am about to train my own CLOOB latent diffusion model and would like to confirm that this is right. `autoencoder_scale` in your example was about 100, but I got something like 6.85. The value depends on the training dataset, but I noticed that a previous commit computed the final scale with a different line, so I just want to double-check whether the code on master is doing the right thing. Thank you!

In the previous commit:

```python
autoencoder_scale = torch.tensor(var_accum ** 0.5)
```

In the master branch:

```python
autoencoder_scale = torch.tensor((var_accum / 32) ** 0.5)
```
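For context, here is a minimal sketch of how such a scale is typically estimated; the helper name `estimate_autoencoder_scale`, the loader interface, and the `encoder.encode(...).sample()` call are assumptions for illustration, not the repository's actual code. The idea: if `var_accum` is a running sum of per-batch latent variances, dividing by the number of accumulated batches (32 here) before the square root gives the root of the mean variance, so that `latents / autoencoder_scale` has roughly unit variance.

```python
import torch

@torch.no_grad()
def estimate_autoencoder_scale(encoder, loader, n_batches=32, device="cuda"):
    """Hypothetical helper: sqrt of the mean latent variance over n_batches."""
    var_accum = 0.0
    for i, images in enumerate(loader):
        if i >= n_batches:
            break
        latents = encoder.encode(images.to(device)).sample()  # assumed API
        var_accum += latents.var().item()  # accumulate per-batch variance
    # Divide the accumulated sum by the batch count to get a mean variance,
    # then take the square root; this matches the master-branch line above.
    return torch.tensor((var_accum / n_batches) ** 0.5)
```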

Trouble running inference

Hi there, this is great work and I can't wait to get it running so I can test it!

I have tried to create a Colab notebook and believe I have downloaded the necessary models and requirements, but unfortunately I'm getting an error. When I run

!./cfg_sample.py prompts "A photorealist detailed snarling goblin" --autoencoder kl_f8 --method "plms" --checkpoint yfcc-latent-diffusion-f8-e2-s250k.ckpt --seed 4485 --steps 50 && v-diffusion-pytorch/make_grid.py out_*.png

I get an error like:

/bin/bash: line 1: 969 Killed

My full code is simply:

```python
import sys, os  # needed below for sys.path.append and os.rename

!git clone --recursive https://github.com/JD-P/cloob-latent-diffusion
!pip install omegaconf
!pip install pytorch-lightning
!pip3 install pillow einops wandb ftfy regex pycocotools
!pip3 install -r /content/cloob-latent-diffusion/CLIP/requirements.txt
%cd cloob-latent-diffusion

# Get models
!wget https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl  # CLOOB checkpoint
!wget https://ommer-lab.com/files/latent-diffusion/kl-f8.zip  # Autoencoder
!wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml  # Autoencoder config
!wget https://the-eye.eu/public/AI/models/yfcc-latent-diffusion-f8-e2-s250k.ckpt
!unzip /content/cloob-latent-diffusion/kl-f8.zip
%cd /content/cloob-latent-diffusion
sys.path.append("/content/cloob-latent-diffusion")
os.rename("/content/cloob-latent-diffusion/autoencoder_kl_32x32x4.yaml", "/content/cloob-latent-diffusion/kl_f8.yaml")
os.rename("model.ckpt", "kl_f8.ckpt")
os.rename("cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl", "cloob_laion_400m_vit_b_16_16_epochs.pkl")
```

Hoping you can help, thanks!

Error making demo grid

If you try training with a number of demo prompts other than 16, you'll get a runtime error like `RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 50 but got size 32 for tensor number 1 in the list.` (caused by trying 25 prompts).

It might be worth making clear in the readme that the list of demo prompts MUST be 16 lines long, and maybe checking that in `def on_batch_end(self, trainer, module):` (train_latent_diffusion.py, around line 435) and throwing a more descriptive error if there's a shape mismatch, as sketched below.
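A minimal sketch of such a guard; the attribute name `self.prompts` and the callback's internals are assumptions, since the real callback may store the demo prompts differently:

```python
EXPECTED_DEMO_PROMPTS = 16  # 16 prompts, presumably laid out as a 4x4 grid

def on_batch_end(self, trainer, module):
    # Fail early with a readable message instead of a tensor-size mismatch.
    if len(self.prompts) != EXPECTED_DEMO_PROMPTS:
        raise ValueError(
            f"The demo prompts file must contain exactly "
            f"{EXPECTED_DEMO_PROMPTS} lines, got {len(self.prompts)}."
        )
    ...  # existing demo-grid sampling code
```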

I also had trouble loading the pretrained model linked in the readme by just passing the ckpt file as the `--resume-from` argument. Instead, I had to modify the training script to do `self.model.load_state_dict(torch.load(path_to_ckpt))` in the init function of `class LightningDiffusion` (see the sketch below). I'd guess this is because the shared checkpoint contains just the model, not the full bundle with `ema_model`, CLOOB, and the autoencoder that would be saved if someone trained from scratch themselves. It's not a big deal, but for people wanting to fine-tune from your shared checkpoint it currently requires a bit of figuring out, so I wanted to share in case it's an easy fix.
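For reference, a minimal sketch of that workaround; the constructor signature and surrounding structure are assumptions based on the issue text, not the repository's actual code:

```python
import torch
import pytorch_lightning as pl

class LightningDiffusion(pl.LightningModule):
    def __init__(self, model, path_to_ckpt=None):
        super().__init__()
        self.model = model  # the real __init__ also builds ema_model, CLOOB, etc.
        if path_to_ckpt is not None:
            # The shared checkpoint holds only the inner model's state dict,
            # so load it directly instead of resuming the full Lightning
            # training state via --resume-from.
            self.model.load_state_dict(torch.load(path_to_ckpt, map_location="cpu"))
```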

Thanks for all the work you've done on this!

Trouble running danbooru sample command line

I'm getting an error when I try to run the danbooru command line:

$ python cfg_sample.py "anime portrait of a man in a flight jacket leaning against a biplane" --autoencoder danbooru-kl-f8 --checkpoint danbooru-latent-diffusion-e88.ckpt --cloob-checkpoint cloob_laion_400m_vit_b_16_32_epochs --base-channels 128 --channel-multipliers 4,4,8,8 -n 16 --seed 4485 && v-diffusion-pytorch/make_grid.py out_*.png
Using device: cuda:0
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
Restored from danbooru-kl-f8.ckpt
{'url': 'https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_32_epochs-646f61628eb4bc03a01ce5c23b727a348105f0405b6037a329da062739a06441.pkl', 'd_embed': 512, 'inv_tau': 30.0, 'scale_hopfield': 15.0, 'image_encoder': {'type': 'ViT', 'image_size': 224, 'input_channels': 3, 'normalize': {'mean': [0.48145466, 0.4578275, 0.40821073], 'std': [0.26862954, 0.26130258, 0.27577711]}, 'patch_size': 16, 'n_layers': 12, 'd_model': 768, 'n_heads': 12}, 'text_encoder': {'type': 'transformer', 'tokenizer': 'clip', 'text_size': 77, 'vocab_size': 49408, 'n_layers': 12, 'd_model': 512, 'n_heads': 8}}
Traceback (most recent call last):
  File "cfg_sample.py", line 208, in <module>
    main()
  File "cfg_sample.py", line 144, in main
    cloob.text_encoder(cloob.tokenize(txt).to(device)).float())
  File "C:\Users\Bart\anaconda3\envs\cloob\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\ai\cloob-latent-diffusion\./cloob-training\cloob_training\model_pt.py", line 105, in forward
    padding_mask = torch.cumsum(eot_mask, dim=-1) == 0 | eot_mask
TypeError: unsupported operand type(s) for |: 'int' and 'Tensor'
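For what it's worth, this looks like an operator-precedence bug: in Python, `|` binds more tightly than `==`, so the failing line parses as `torch.cumsum(eot_mask, dim=-1) == (0 | eot_mask)`, and `0 | eot_mask` raises the TypeError on this PyTorch version. A small sketch of the likely fix, using a hypothetical `eot_mask` for illustration:

```python
import torch

eot_mask = torch.tensor([False, False, True, False])  # hypothetical example

# Buggy: parses as cumsum(...) == (0 | eot_mask), which can raise
# TypeError because the left operand of | is a plain int.
# padding_mask = torch.cumsum(eot_mask, dim=-1) == 0 | eot_mask

# Fixed: parenthesize the comparison so it runs before the bitwise OR.
padding_mask = (torch.cumsum(eot_mask, dim=-1) == 0) | eot_mask
print(padding_mask)  # tensor([ True,  True,  True, False])
```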
