Comments (14)
Thank you for being interested in our work!
I trained the model on images of Gal Gadot. Here are some results:
"A photo of S* sitting in a movie theater"
"A photo of S* sitting in the kitchen"
I have also uploaded the training images and pretrained weights so you can try it yourself.
from vico.
I have updated the training code. You can train the model on your desired human images now. Thanks!
from vico.
Hello! I am trying to replicate some of the results shown here, but I am not getting good results. Is S* the special token for all the checkpoints provided? Thanks!
from vico.
Not getting good results with a trained model either. Something seems off. The images generated during training look okay, but then those generated by vico_txt2img.py are not even close...
from vico.
That's weird. What are your results directly using the pretrained weights?
Put all the pretrained weights under logs/gal_gadot/checkpoints and the training images under images/gal_gadot.
Use the following command to test:
python scripts/vico_txt2img.py --ddim_eta 0.0 --n_samples 4 --n_iter 2 --scale 7.5 --ddim_steps 50 --ckpt_path models/ldm/stable-diffusion-v1/sd-v1-4.ckpt --image_path images/gal_gadot/1.jpg --ft_path logs/gal_gadot --load_step 399 --prompt "a cyberpunk photo of *" --outdir outputs/gal_gadot
It should produce results similar to my run above. Please try it and post your outputs here; I may be able to locate the problem from them. Thank you.
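Before running the command above, it can help to confirm that the files are where it expects them. The following is a minimal sketch, not part of the repository: the function name `missing_paths` is made up for illustration, and the paths are copied from the instructions and command in this comment.

```python
from pathlib import Path


def missing_paths(root, subject="gal_gadot",
                  base_ckpt="models/ldm/stable-diffusion-v1/sd-v1-4.ckpt"):
    """Return the expected paths (relative to root) that are absent.

    Hypothetical helper: the layout mirrors the paths quoted in this thread.
    """
    root = Path(root)
    ckpt_dir = root / "logs" / subject / "checkpoints"
    expected = [
        root / base_ckpt,                     # base Stable Diffusion checkpoint
        ckpt_dir,                             # pretrained weights go here
        root / "images" / subject / "1.jpg",  # reference image from the command
    ]
    missing = [p.relative_to(root).as_posix() for p in expected if not p.exists()]
    # A checkpoints directory that exists but holds no files is also a problem.
    if ckpt_dir.is_dir() and not any(ckpt_dir.iterdir()):
        missing.append(ckpt_dir.relative_to(root).as_posix() + "/ (empty)")
    return missing


if __name__ == "__main__":
    for path in missing_paths("."):
        print("missing:", path)
```

Run it from the repository root; no output means every path referenced by the command was found.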
from vico.
Hi @Landroval2,
The pretrained weight file embeddings_gs-STEP.pt is the trained S*. It differs between training steps (300, 350, 400), so make sure you load the image attention module and the S* from the same step.
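One way to check which steps are actually on disk is to list them from the file names. This is a rough sketch that assumes only the `embeddings_gs-STEP.pt` naming mentioned above; the helper `available_steps` is hypothetical, and the file-name prefix of the image attention weights is not given in this thread, so it is left as a parameter.

```python
import re
from pathlib import Path


def available_steps(ckpt_dir, prefix="embeddings_gs"):
    """Return the set of step numbers for files named '<prefix>-<STEP>.pt'."""
    ckpt_dir = Path(ckpt_dir)
    if not ckpt_dir.is_dir():
        return set()
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)\.pt$")
    steps = set()
    for path in ckpt_dir.iterdir():
        match = pattern.match(path.name)
        if match:
            steps.add(int(match.group(1)))
    return steps


if __name__ == "__main__":
    # Directory taken from the instructions earlier in the thread.
    print(sorted(available_steps("logs/gal_gadot/checkpoints")))
```

Calling `available_steps` a second time with the prefix used by the image attention weights in your checkout and intersecting the two sets gives the steps that are safe to pass as `--load_step`. The command earlier in the thread passes `--load_step 399`, so the numbers in the file names may not be exactly 300, 350 and 400.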
from vico.
Hi @haoosz, thanks for your answer!
I have been testing this again and am getting good images with the Gal Gadot model. However, the results with the batman model are not great: there is almost no variability. Could you share some prompts/steps that you used in that case?
Thanks again!
from vico.
I collected the images of the batman toy casually myself. The results with the batman do show low variability with some prompts.
You can try the following prompts (I use the default time step = 400):
- A photo of a S* in the jungle
- A photo of a S* on top of a dirt road
- A photo of a S* among the skyscrapers
- A photo of a S* on top of a wooden floor
- A photo of a S* with a city in the background
- A photo of a S* with a wheat field in the background
- A photo of a S* with the Eiffel Tower in the background
- A photo of a S* on top of green grass with sunflowers around it
- A photo of a S* with Japanese modern city street in the background
Thanks!
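To try all of these prompts in one go, the test command quoted earlier in the thread can be generated once per prompt. This is only a sketch: the flags are copied from that command, while the batman paths, the `build_command` helper, and the `token` substitution are assumptions made for illustration. The earlier command writes the placeholder as `*` rather than `S*`, so the sketch swaps it by default; use whatever token your config defines.

```python
import shlex

PROMPTS = [
    "A photo of a S* in the jungle",
    "A photo of a S* on top of a dirt road",
    "A photo of a S* among the skyscrapers",
    "A photo of a S* on top of a wooden floor",
    "A photo of a S* with a city in the background",
    "A photo of a S* with a wheat field in the background",
    "A photo of a S* with the Eiffel Tower in the background",
    "A photo of a S* on top of green grass with sunflowers around it",
    "A photo of a S* with Japanese modern city street in the background",
]


def build_command(prompt, image_path, ft_path, outdir, load_step, token="*"):
    """Build the argument list for one vico_txt2img.py run (hypothetical helper)."""
    return [
        "python", "scripts/vico_txt2img.py",
        "--ddim_eta", "0.0", "--n_samples", "4", "--n_iter", "2",
        "--scale", "7.5", "--ddim_steps", "50",
        "--ckpt_path", "models/ldm/stable-diffusion-v1/sd-v1-4.ckpt",
        "--image_path", image_path,
        "--ft_path", ft_path,
        "--load_step", str(load_step),
        "--prompt", prompt.replace("S*", token),
        "--outdir", outdir,
    ]


if __name__ == "__main__":
    for prompt in PROMPTS:
        # Paths and step below are placeholders; point them at your own files.
        cmd = build_command(prompt, "images/batman/1.jpg", "logs/batman",
                            "outputs/batman", load_step=399)
        print(shlex.join(cmd))
```

The script only prints the commands so they can be inspected first; passing each list to `subprocess.run` would execute them.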
from vico.
Thanks for your answer! I will be trying those prompts to see what happens.
from vico.
@Landroval2 were you able to get better results? Can you share some insights please?
from vico.
@haoosz I was finally able to get the same results with gal_gadot at inference. Could you please share the training parameters and command for that particular run?
from vico.
The issue I had was simply not using an identifier like *. I think I was trying a 3-letter token. When I changed to the same *-type token that is currently in the config, everything worked as expected. It almost felt like the prompt influence is even weaker than with TI (Textual Inversion), but otherwise the results were stellar.
from vico.
Hi @haoosz,
Thanks for your great work. I got similar results using the images of Gal Gadot, but bad results on my own datasets. Is there anything I need to pay attention to when building a dataset? Are there any requirements?
from vico.
You can try adjusting the training step, the random seed, and the initial word. The quality of the training data also matters. I have tried it on my own images and got reasonable results.
from vico.