haoosz / vico

Official PyTorch codes for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"

License: MIT License

Python 10.58% Shell 0.06% Jupyter Notebook 89.36%
text-to-image-diffusion personalized-generation

vico's People

Contributors

eltociear, guspan-tanadi, haoosz


vico's Issues

init text on evaluation

When I run evaluation, I need to substitute self.init_text for the * character on this line:

example["text_init"] = text.replace("*", self.init_text)

but I get an error: TypeError: replace() argument 2 must be str, not None.
How should I set self.init_text?

Thanks in advance!
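A minimal workaround sketch (a standalone helper, not the actual ViCo dataset code): guard against init_text being None before calling str.replace, which requires a str second argument.

```python
from typing import Optional

def build_text_init(text: str, init_text: Optional[str]) -> str:
    # str.replace() requires a str second argument, so a None init_text
    # raises "TypeError: replace() argument 2 must be str, not None";
    # keep the placeholder unchanged as a fallback.
    if init_text is None:
        return text
    return text.replace("*", init_text)
```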

Multiple conditioning images

Is it possible to apply multiple conditioning images? For example: multiple subjects, or subject + style, each driven by a different token and a different conditioning image.

about evaluation code

Thank you for your excellent work.

May I please request the evaluation code?
I'm interested in the specific evaluation processes for Dreambooth, Textual Inversion, Custom Diffusion, and VICO.

Thank you.

Image cross attention across all tokens?

@haoosz Thank you for the amazing work and the open-source code. I have been working to implement it on huggingface/diffusers. I believe the architecture is in place, but even with regularization and masking my models don't converge in terms of loss, and the results are overfitted and distorted relative to the subject's appearance.

I have gone through the code and the paper several times, and there is one question I can't answer:

Is image cross-attention applied to all tokens in the prompt, or is it only computed for the S* token (using vanilla attention maps for the other tokens)?
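To make the question concrete, here is a hypothetical sketch of the second interpretation: keep the vanilla text cross-attention output everywhere except the placeholder slot, where the image cross-attention output is substituted. The function name, the tensor layout, and ph_idx are assumptions, not ViCo's actual implementation.

```python
import torch

def merge_at_placeholder(text_attn_out: torch.Tensor,
                         img_attn_out: torch.Tensor,
                         ph_idx: int) -> torch.Tensor:
    # text_attn_out, img_attn_out: (batch, n_tokens, dim)
    # ph_idx: position of the S* token in the prompt.
    # Only the S* slot receives the image-conditioned output;
    # all other tokens keep the vanilla attention result.
    out = text_attn_out.clone()
    out[:, ph_idx] = img_attn_out[:, ph_idx]
    return out
```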


Inference fails after training with the base model

Hi authors, I recently came across your work. I ran into some problems training and running inference with the sd-v1.4 model: during training, the test images in the log folder look fine, but running inference with vico_txt2img.py produces only colored noise.
a-photo-of-_-on-the-beach

Also, changing batch_size in the v1-finetune.yaml config file causes a training error. Is this parameter hard-coded somewhere in the code?

  Traceback (most recent call last):
    File "main-real.py", line 820, in <module>
      trainer.fit(model, data)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
      self._call_and_handle_interrupt(
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
      return trainer_fn(*args, **kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
      self._run(model, ckpt_path=ckpt_path)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
      self._dispatch()
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
      self.training_type_plugin.start_training(self)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
      self._results = trainer.run_stage()
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
      return self._run_train()
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
      self.fit_loop.run()
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
      self.advance(*args, **kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
      self.epoch_loop.run(data_fetcher)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
      self.advance(*args, **kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 216, in advance
      self.trainer.call_hook("on_train_batch_end", batch_end_outputs, batch, batch_idx, **extra_kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1495, in call_hook
      callback_fx(*args, **kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 179, in on_train_batch_end
      callback.on_train_batch_end(self, self.lightning_module, outputs, batch, batch_idx, 0)
    File "/home/azureuser/ViCo/main.py", line 442, in on_train_batch_end
      self.log_img(pl_module, batch, batch_idx, split="train")
    File "/home/azureuser/ViCo/main.py", line 410, in log_img
      images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
      return func(*args, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1409, in log_images
      sample_scaled, _ = self.sample_log(cond=c, 
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
      return func(*args, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1337, in sample_log
      samples, intermediates =ddim_sampler.sample(ddim_steps,batch_size,
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
      return func(*args, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 98, in sample
      samples, intermediates = self.ddim_sampling(conditioning, image_cond, ph_pos, size,
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
      return func(*args, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 151, in ddim_sampling
      outs = self.p_sample_ddim(img, cond, image_cond, ts, ph_pos, index=index, total_steps=total_steps, use_original_steps=ddim_use_original_steps,
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
      return func(*args, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 187, in p_sample_ddim
      e_t_uncond, e_t = self.model.apply_model(x_in, c_img_in, t_in, c_in, c_in, ph_pos_in, use_img_cond=True)[0].chunk(2)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1062, in apply_model
      x_recon, loss_reg = self.model(x_noisy, x_ref, t, cond_init, ph_pos, use_img_cond, **cond,)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1624, in forward
      out, loss_reg = self.diffusion_model(x, xr, t, cc_init, ph_pos, use_img_cond, context=cc)        
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/azureuser/ViCo/ldm/modules/diffusionmodules/openaimodel.py", line 766, in forward
      h, hr, loss_reg, attn = module(h, hr, emb, context, cc_init, ph_pos, use_img_cond)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/azureuser/ViCo/ldm/modules/diffusionmodules/openaimodel.py", line 87, in forward
      x, xr, loss_reg, attn = layer(x, xr, context, cc_init, ph_pos, use_img_cond, return_attn=True)
    File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
      return forward_call(*input, **kwargs)
    File "/home/azureuser/ViCo/ldm/modules/attention.py", line 333, in forward
      attn_ph = attn[ph_idx].squeeze(1) # bs, n_patch
  IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [2]

Thanks in advance for your reply.
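The final IndexError is consistent with an index tensor built for the configured batch size colliding with a batch dimension doubled for classifier-free guidance. A minimal reproduction with made-up shapes (not ViCo's actual tensors):

```python
import torch

# Advanced indexing requires the index tensors to broadcast together.
# If the placeholder positions (ph_pos) are built for batch_size=2
# while the CFG-doubled attention map has batch dimension 4, the
# lookup fails exactly as in the traceback above.
attn = torch.randn(4, 8, 16)     # (2 * batch, n_tokens, n_patch)
batch_idx = torch.arange(4)      # length 4: the doubled batch
token_idx = torch.tensor([3, 3]) # length 2: per-sample S* positions

try:
    attn[batch_idx, token_idx]
except IndexError as e:
    print(e)  # same "shape mismatch" message as in the traceback

# Repeating the per-sample placeholder indices to match the doubled
# batch makes the index tensors broadcastable again:
fixed = attn[batch_idx, token_idx.repeat(2)]
```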

Any plans for a diffusers version?

Hey guys, this paper looks great. Really excited to see the full training code. I was curious: do you have any plans to make a diffusers port?

reference image on evaluation?

Hello!

In the paper's quantitative comparison, which reference image did you use for each sample and prompt? Was it sampled from the training images for each prompt, or fixed?

Also, which checkpoints (300, 350, 400?) did you use for the quantitative comparison?

Thanks in advance!

SDXL support?

The results are truly exceptional. I tried many methods: DreamBooth with LoRAs, Textual Inversion, Perfusion, and IP-Adapters, and ViCo proved outstanding by comparison, even though it uses SD 1.4.
Right now only LoRAs on SDXL prove to be better, but I would love to see how ViCo on SDXL would compare.
Do you plan to support SDXL?
