cloob-latent-diffusion's Issues

`autoencoder_scale` was something like 6.85

Hi, I am about to train my own CLOOB latent diffusion model and would like to confirm that this is right. `autoencoder_scale` in your example was about 100, but I got something like 6.85. The value depends on the training dataset, but I noticed that a previous commit computed the final scale with a different line, so I just want to double-check whether the code on master is doing the right thing. Thank you!

In the previous commit:

```python
autoencoder_scale = torch.tensor(var_accum ** 0.5)
```

In the master branch:

```python
autoencoder_scale = torch.tensor((var_accum / 32) ** 0.5)
```
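For context, here is a minimal sketch of how such a scale is typically estimated; the helper name `estimate_autoencoder_scale`, the loader interface, and the `encoder.encode(...).sample()` call are assumptions for illustration, not the repository's actual code. The idea: if `var_accum` is a running sum of per-batch latent variances, dividing by the number of accumulated batches (32 here) before the square root gives the root of the mean variance, so that `latents / autoencoder_scale` has roughly unit variance.

```python
import torch

@torch.no_grad()
def estimate_autoencoder_scale(encoder, loader, n_batches=32, device="cuda"):
    """Hypothetical helper: sqrt of the mean latent variance over n_batches."""
    var_accum = 0.0
    for i, images in enumerate(loader):
        if i >= n_batches:
            break
        latents = encoder.encode(images.to(device)).sample()  # assumed API
        var_accum += latents.var().item()  # accumulate per-batch variance
    # Divide the accumulated sum by the batch count to get a mean variance,
    # then take the square root; this matches the master-branch line above.
    return torch.tensor((var_accum / n_batches) ** 0.5)
```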

Trouble running inference

Hi there, this is great work and I can't wait to get it running so I can test it!

I have tried to create a Colab notebook and believe I have downloaded the necessary models and requirements, but unfortunately I'm getting an error. When I run

!./cfg_sample.py prompts "A photorealist detailed snarling goblin" --autoencoder kl_f8 --method "plms" --checkpoint yfcc-latent-diffusion-f8-e2-s250k.ckpt --seed 4485 --steps 50 && v-diffusion-pytorch/make_grid.py out_*.png

I get an error like:

/bin/bash: line 1: 969 Killed

My full code is simply:

```python
import sys, os  # needed below for sys.path.append and os.rename

!git clone --recursive https://github.com/JD-P/cloob-latent-diffusion
!pip install omegaconf
!pip install pytorch-lightning
!pip3 install pillow einops wandb ftfy regex pycocotools
!pip3 install -r /content/cloob-latent-diffusion/CLIP/requirements.txt
%cd cloob-latent-diffusion

# Get models
!wget https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl  # CLOOB checkpoint
!wget https://ommer-lab.com/files/latent-diffusion/kl-f8.zip  # Autoencoder
!wget https://raw.githubusercontent.com/CompVis/latent-diffusion/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml  # Autoencoder config
!wget https://the-eye.eu/public/AI/models/yfcc-latent-diffusion-f8-e2-s250k.ckpt
!unzip /content/cloob-latent-diffusion/kl-f8.zip
%cd /content/cloob-latent-diffusion
sys.path.append("/content/cloob-latent-diffusion")
os.rename("/content/cloob-latent-diffusion/autoencoder_kl_32x32x4.yaml", "/content/cloob-latent-diffusion/kl_f8.yaml")
os.rename("model.ckpt", "kl_f8.ckpt")
os.rename("cloob_laion_400m_vit_b_16_16_epochs-405a3c31572e0a38f8632fa0db704d0e4521ad663555479f86babd3d178b1892.pkl", "cloob_laion_400m_vit_b_16_16_epochs.pkl")
```

Hoping you can help, thanks!

Error making demo grid

If you try training with a number of demo prompts other than 16, you'll get a runtime error like `RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 50 but got size 32 for tensor number 1 in the list.` (caused by trying 25 prompts).

It might be worth making clear in the readme that the list of demo prompts MUST be 16 lines long, and maybe checking that in `def on_batch_end(self, trainer, module):` (train_latent_diffusion.py, around line 435) and throwing a more descriptive error if there's a shape mismatch, as sketched below.
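A minimal sketch of such a guard; the attribute name `self.prompts` and the callback's internals are assumptions, since the real callback may store the demo prompts differently:

```python
EXPECTED_DEMO_PROMPTS = 16  # 16 prompts, presumably laid out as a 4x4 grid

def on_batch_end(self, trainer, module):
    # Fail early with a readable message instead of a tensor-size mismatch.
    if len(self.prompts) != EXPECTED_DEMO_PROMPTS:
        raise ValueError(
            f"The demo prompts file must contain exactly "
            f"{EXPECTED_DEMO_PROMPTS} lines, got {len(self.prompts)}."
        )
    ...  # existing demo-grid sampling code
```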

I also had trouble loading the pretrained model linked in the readme by just passing the ckpt file as the `--resume-from` argument. Instead, I had to modify the training script to do `self.model.load_state_dict(torch.load(path_to_ckpt))` in the init function of `class LightningDiffusion` (see the sketch below). I'd guess this is because the shared checkpoint contains just the model, not the full bundle with `ema_model`, CLOOB, and the autoencoder that would be saved if someone trained from scratch themselves. It's not a big deal, but for people wanting to fine-tune from your shared checkpoint it currently requires a bit of figuring out, so I wanted to share in case it's an easy fix.
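For reference, a minimal sketch of that workaround; the constructor signature and surrounding structure are assumptions based on the issue text, not the repository's actual code:

```python
import torch
import pytorch_lightning as pl

class LightningDiffusion(pl.LightningModule):
    def __init__(self, model, path_to_ckpt=None):
        super().__init__()
        self.model = model  # the real __init__ also builds ema_model, CLOOB, etc.
        if path_to_ckpt is not None:
            # The shared checkpoint holds only the inner model's state dict,
            # so load it directly instead of resuming the full Lightning
            # training state via --resume-from.
            self.model.load_state_dict(torch.load(path_to_ckpt, map_location="cpu"))
```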

Thanks for all the work you've done on this!

Trouble running danbooru sample command line

I'm getting an error when I try to run the danbooru command line:

$ python cfg_sample.py "anime portrait of a man in a flight jacket leaning against a biplane" --autoencoder danbooru-kl-f8 --checkpoint danbooru-latent-diffusion-e88.ckpt --cloob-checkpoint cloob_laion_400m_vit_b_16_32_epochs --base-channels 128 --channel-multipliers 4,4,8,8 -n 16 --seed 4485 && v-diffusion-pytorch/make_grid.py out_*.png
Using device: cuda:0
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
loaded pretrained LPIPS loss from taming/modules/autoencoder/lpips\vgg.pth
Restored from danbooru-kl-f8.ckpt
{'url': 'https://the-eye.eu/public/AI/models/cloob/cloob_laion_400m_vit_b_16_32_epochs-646f61628eb4bc03a01ce5c23b727a348105f0405b6037a329da062739a06441.pkl', 'd_embed': 512, 'inv_tau': 30.0, 'scale_hopfield': 15.0, 'image_encoder': {'type': 'ViT', 'image_size': 224, 'input_channels': 3, 'normalize': {'mean': [0.48145466, 0.4578275, 0.40821073], 'std': [0.26862954, 0.26130258, 0.27577711]}, 'patch_size': 16, 'n_layers': 12, 'd_model': 768, 'n_heads': 12}, 'text_encoder': {'type': 'transformer', 'tokenizer': 'clip', 'text_size': 77, 'vocab_size': 49408, 'n_layers': 12, 'd_model': 512, 'n_heads': 8}}
Traceback (most recent call last):
  File "cfg_sample.py", line 208, in <module>
    main()
  File "cfg_sample.py", line 144, in main
    cloob.text_encoder(cloob.tokenize(txt).to(device)).float())
  File "C:\Users\Bart\anaconda3\envs\cloob\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "D:\ai\cloob-latent-diffusion\./cloob-training\cloob_training\model_pt.py", line 105, in forward
    padding_mask = torch.cumsum(eot_mask, dim=-1) == 0 | eot_mask
TypeError: unsupported operand type(s) for |: 'int' and 'Tensor'
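For what it's worth, this looks like an operator-precedence bug: in Python, `|` binds more tightly than `==`, so the failing line parses as `torch.cumsum(eot_mask, dim=-1) == (0 | eot_mask)`, and `0 | eot_mask` raises the TypeError on this PyTorch version. A small sketch of the likely fix, using a hypothetical `eot_mask` for illustration:

```python
import torch

eot_mask = torch.tensor([False, False, True, False])  # hypothetical example

# Buggy: parses as cumsum(...) == (0 | eot_mask), which can raise
# TypeError because the left operand of | is a plain int.
# padding_mask = torch.cumsum(eot_mask, dim=-1) == 0 | eot_mask

# Fixed: parenthesize the comparison so it runs before the bitwise OR.
padding_mask = (torch.cumsum(eot_mask, dim=-1) == 0) | eot_mask
print(padding_mask)  # tensor([ True,  True,  True, False])
```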
