haoosz / vico
Official PyTorch code for the paper: "ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation"
License: MIT License
When I run the evaluation, I need to replace the * character with self.init_text
on this line:
Line 263 in 28c4c9c
but it fails with: TypeError: replace() argument 2 must be str, not None
Thanks in advance!
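For anyone hitting the same error: it means the second argument to str.replace was None rather than a string. A minimal sketch of the failure and one possible guard follows; the fallback token "sks" is purely illustrative, not something ViCo defines.

```python
# Reproduce the failure: replace() rejects None as the replacement string.
init_text = None          # e.g. self.init_text was never set for this run
prompt = "a photo of *"

try:
    prompt.replace("*", init_text)
except TypeError:
    # TypeError: replace() argument 2 must be str, not None
    pass

# One possible guard: fall back to a placeholder when init_text is missing.
safe = prompt.replace("*", init_text or "sks")
```

Whether an empty string or a placeholder token is the right fallback depends on how the evaluation prompts are consumed downstream.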
Is it possible to apply multiple conditioning images? Examples could be multiple subjects, or subject + style, driven by different tokens and different conditioning images.
I've got a few questions.
Thank you for your excellent work.
May I please request the evaluation code?
I'm interested in the specific evaluation processes for Dreambooth, Textual Inversion, Custom Diffusion, and VICO.
Thank you.
@haoosz Thank you for the amazing work and the open-source code. I have been working to implement it on huggingface/diffusers. I believe the architecture is in place, but even with regularization and masking my models don't converge in terms of loss, and the results are overfitted and distorted relative to the subject's appearance.
I have gone through the code and the paper several times, and there is one question I can't answer.
Is image cross-attention applied to all tokens in the prompt, or is it computed only for the S* token (using vanilla attention maps for the other tokens)?
I was also wondering how ViCo handles classifier-free guidance without sacrificing compute time.
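To make the S*-token question concrete, here is a sketch of pulling out only the placeholder token's attention map from a full text cross-attention tensor. The shapes (77 CLIP tokens, 64 latent patches) and the index name ph_idx are assumptions for illustration, not the repo's exact layout.

```python
import torch

# Hypothetical shapes: (batch, n_tokens, n_patches) text-to-image
# cross-attention over 77 CLIP tokens and an 8x8 grid of latent patches.
batch, n_tokens, n_patch = 2, 77, 64
attn = torch.rand(batch, n_tokens, n_patch)

# Assumed position of the placeholder token S* in each prompt.
ph_idx = torch.tensor([5, 5])

# Advanced indexing selects only the S* row per sample -> (batch, n_patch).
attn_ph = attn[torch.arange(batch), ph_idx]
print(attn_ph.shape)  # torch.Size([2, 64])
```

If image cross-attention were restricted to S*, a per-token map like attn_ph would be the only one that needs the image branch; the other 76 token rows could use the vanilla text attention unchanged.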
Peace, I need a detailed tutorial on using ViCo with an AMD GPU on Linux with ROCm.
Hi authors, I recently came across your work. I ran into some problems when training and running inference with the sd-v1.4 model: the test images in the log folder look fine during training, but the results of inference with vico_txt2img.py are colored noise.
Also, changing batch_size in the v1-finetune.yaml config file causes a training error. Is this parameter hard-coded somewhere?
Traceback (most recent call last):
File "main-real.py", line 820, in <module>
trainer.fit(model, data)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 740, in fit
self._call_and_handle_interrupt(
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 685, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 777, in _fit_impl
self._run(model, ckpt_path=ckpt_path)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1199, in _run
self._dispatch()
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1279, in _dispatch
self.training_type_plugin.start_training(self)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 202, in start_training
self._results = trainer.run_stage()
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
return self._run_train()
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
self.fit_loop.run()
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
self.epoch_loop.run(data_fetcher)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/base.py", line 145, in run
self.advance(*args, **kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 216, in advance
self.trainer.call_hook("on_train_batch_end", batch_end_outputs, batch, batch_idx, **extra_kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1495, in call_hook
callback_fx(*args, **kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/pytorch_lightning/trainer/callback_hook.py", line 179, in on_train_batch_end
callback.on_train_batch_end(self, self.lightning_module, outputs, batch, batch_idx, 0)
File "/home/azureuser/ViCo/main.py", line 442, in on_train_batch_end
self.log_img(pl_module, batch, batch_idx, split="train")
File "/home/azureuser/ViCo/main.py", line 410, in log_img
images = pl_module.log_images(batch, split=split, **self.log_images_kwargs)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1409, in log_images
sample_scaled, _ = self.sample_log(cond=c,
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1337, in sample_log
samples, intermediates = ddim_sampler.sample(ddim_steps, batch_size,
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 98, in sample
samples, intermediates = self.ddim_sampling(conditioning, image_cond, ph_pos, size,
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 151, in ddim_sampling
outs = self.p_sample_ddim(img, cond, image_cond, ts, ph_pos, index=index, total_steps=total_steps, use_original_steps=ddim_use_original_steps,
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddim.py", line 187, in p_sample_ddim
e_t_uncond, e_t = self.model.apply_model(x_in, c_img_in, t_in, c_in, c_in, ph_pos_in, use_img_cond=True)[0].chunk(2)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1062, in apply_model
x_recon, loss_reg = self.model(x_noisy, x_ref, t, cond_init, ph_pos, use_img_cond, **cond,)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/ViCo/ldm/models/diffusion/ddpm.py", line 1624, in forward
out, loss_reg = self.diffusion_model(x, xr, t, cc_init, ph_pos, use_img_cond, context=cc)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/ViCo/ldm/modules/diffusionmodules/openaimodel.py", line 766, in forward
h, hr, loss_reg, attn = module(h, hr, emb, context, cc_init, ph_pos, use_img_cond)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/ViCo/ldm/modules/diffusionmodules/openaimodel.py", line 87, in forward
x, xr, loss_reg, attn = layer(x, xr, context, cc_init, ph_pos, use_img_cond, return_attn=True)
File "/opt/miniconda/envs/vico/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/azureuser/ViCo/ldm/modules/attention.py", line 333, in forward
attn_ph = attn[ph_idx].squeeze(1) # bs, n_patch
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [2]
Thanks for your reply.
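The IndexError at the bottom of the traceback looks like a batch-size mismatch between index tensors. A minimal sketch of how such a mismatch arises, under the assumption that classifier-free guidance doubles the batch (cond + uncond) while the placeholder positions are still sized for the original batch; all shapes here are illustrative, not the repo's exact tensors.

```python
import torch

attn = torch.rand(4, 77, 64)    # CFG-doubled batch: 2 cond + 2 uncond
batch_idx = torch.arange(4)
ph_pos = torch.tensor([5, 5])   # placeholder positions, still batch of 2

try:
    attn[batch_idx, ph_pos]
except IndexError as e:
    # shape mismatch: indexing tensors could not be broadcast together
    # with shapes [4], [2]
    print(e)

# One possible fix: repeat the positions to match the doubled batch.
attn_ph = attn[batch_idx, ph_pos.repeat(2)]  # shape (4, 64)
```

If this is indeed the cause, any place that builds ph_pos from the dataloader batch size (rather than from the runtime tensor shape) would break as soon as batch_size is changed in the config.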
Hey guys, this paper looks great. Really excited to see the full training code. I was curious: do you have any plans to make a diffusers port?
Hi @haoosz ,
Thanks for your fantastic work. I'm curious about the results with human images as input. Could you show more results on human images?
I am wondering if the fine-tuned model can do the inpainting task as well.
The results are truly exceptional. I tried many methods: DreamBooth with LoRAs, Textual Inversion, Perfusion, IP-Adapters; compared to them, ViCo proved to be outstanding, even though it uses SD 1.4.
Right now only LoRAs on SDXL prove to be better, but I would love to see how ViCo on SDXL would compare.
Do you plan to support SDXL?