Comments (37)
I managed to do a training sesh on vast.ai for ~$1. Since this looks like the hip thread where all the cool kids are coming, I figure it's the best spot to post a few quick steps for anyone who comes across this and is clueless about getting it going, like me. (Not a full tutorial, more a "look here to start, and here are some issues and how to fix them!"):
- Prep your training and regularization data in advance.
- Pick an instance with at least 1 A6000 (the cheapest that meets the VRAM reqs, I've found; 1 is good to start with, since you might spend more time figuring out the setup than actually training). Make sure the download (and upload) speeds are decent, like >100 Mbps.
- EDIT: Forgot to mention: when finding an instance, make sure to select a PyTorch instance config.
- Go in and open a terminal sesh.
- Clone this repo: `git clone https://github.com/XavierXiao/Dreambooth-Stable-Diffusion.git`
- `cd` into the directory (`cd Dreambooth-Stable-Diffusion`) and make a conda environment for it: `conda env create -f environment.yaml`. This will take a lil' while.
- While that's happening, create a new terminal instance and pull down the SD EMA model; make sure you're in the project directory (Dreambooth-Stable...). The easiest way is to grab it from Hugging Face:
```
wget --http-user=USERNAME --http-password=PASSWORD https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
```
You should use an API token for a more secure way of doing this (sketch below); the above is just for ease of use for anyone unfamiliar.
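For the token route, a hedged sketch (it assumes you've created a read token at huggingface.co/settings/tokens; `HF_TOKEN` is a placeholder name, and wget's `--header` flag passes it as a bearer token):

```bash
export HF_TOKEN="hf_..."  # placeholder, paste your own read token here
wget --header="Authorization: Bearer ${HF_TOKEN}" \
  https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
```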
- While these two things are going on, you can use the time to upload your training/regularization data into some new subfolders in the project, something like /Dreambooth-Stable-Diffusion/training.
- When the conda environment is set up, initialize conda with `conda init bash`, then reset the terminal with `reset`, or create a new terminal session.
- Navigate to the project directory again (Dreambooth-Stable...) if you aren't there already, and activate the environment with `conda activate ldm` (or whatever the environment in environment.yaml was called).
- You should be ready to train!
```
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume sd-v1-4-full-ema.ckpt -n <whatever you want this training to be called> --gpus 0, --data_root <the relative path to your training images> --reg_data_root <the relative path to your regularization images> --class_word <the word you used to generate your regularization data>
```
- ^ Note that this will map your training to the default sks prompt, so go change that in the personalized.py file if you want; a sketch of the edit follows.
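A hedged sketch of that edit; it assumes the prompt templates live in ldm/data/personalized.py (grep to confirm before running), and `mytoken` is a placeholder identifier of your choosing:

```bash
# Confirm where the default token appears first
grep -rn "sks" ldm/data/personalized.py
# Swap it for your own rare-token identifier ("mytoken" is a placeholder)
sed -i 's/sks/mytoken/g' ldm/data/personalized.py
```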
- Let it roll; it should be a bit over 1 it/s on 1 A6000. If it crashes after 500 or so steps, you're probably hitting the font error @nikopueringer mentioned above, so follow those instructions.
- After that's all done, all that's left is to download your model. I've run into issues with the vast.ai frontend and can't actually navigate into the /logs/xx/checkpoints folder, so if you hit this, try moving the checkpoint out. In a terminal, `cd` into the checkpoints folder and do a `mv final.ckpt ..` to move it up a directory; it should now be selectable and you can download it!
- Alternatively, if you have a bad net connection, you could upload it to Google Drive to save some time running the instance; look into the 'gdrive' GitHub repo to install it on the Ubuntu CLI (sketch below).
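A hedged sketch of that route, assuming the prasmussen/gdrive CLI (grab the Linux binary from that repo's releases page first; the first run walks you through Google OAuth, and the filename is whatever your checkpoint is called):

```bash
chmod +x gdrive
./gdrive about               # first run triggers the authentication flow
./gdrive upload final.ckpt   # prints a file ID you can fetch from Drive later
```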
I might have gotten a couple of things wrong; this was mostly from memory and notes from my trial-and-error approach, so if it doesn't work, I probably screwed something up in writing this, but hopefully it puts you on the right track if you get stuck. Or, you know, don't find or do this at all, so the pricing of the instances doesn't go up and I can keep training for cheap 👀
Wow. I am not familiar with how to write the best prompts for SD (or any text-to-image model), so even I cannot generate the cool things you did. I tried training with some images of Newfoundland dogs (my profile pic) and used exactly the same zombie prompt, and I also got some interesting results. The prompt is so important!
Just a follow-up to the guide provided by @Oscerlot ...
- If using Vast.ai, make sure to get an instance with PyTorch, an A6000 GPU, and 100 GB drive space (in case you want to generate multiple models).
- I used ~100 training images and ~300 regularization images. All were 512x512 (a resize one-liner follows).
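If your images aren't that size yet, a hedged ImageMagick one-liner that center-crops and resizes (assumes JPEGs in the current folder; `mogrify` overwrites in place, so run it on copies):

```bash
# Fill to at least 512 on the short side, then center-crop to exactly 512x512
mogrify -resize 512x512^ -gravity center -extent 512x512 *.jpg
```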
- I modified the /Dreambooth-Stable-Diffusion/configs/stable-diffusion/v1-finetune_unfrozen.yaml file to allow for 4000 steps (line 120). main.py saves a last.ckpt every 500 steps, so at each save I just moved last.ckpt to <#steps>.ckpt so I could try multiple models (sketch below).
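A hedged sketch of both tweaks; the stock `max_steps: 800` value is an assumption (check around line 120 of the yaml for the actual key), and the rename is just the move described above, with `<training folder>` and the step count as placeholders:

```bash
# Raise the training step cap (verify the stock value before running this)
sed -i 's/max_steps: 800/max_steps: 4000/' configs/stable-diffusion/v1-finetune_unfrozen.yaml
# After each 500-step save, stash the checkpoint under its step count
mv logs/<training folder>/checkpoints/last.ckpt logs/<training folder>/checkpoints/2000.ckpt
```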
- I added `--no-test` to the main.py training command to prevent this issue.
- Before creating the conda env, I did:
  a) `apt -y install gcc` (this was missing on my first attempt, and caused a failed pip install)
  b) `conda update -n base conda` (just to make sure conda is up to date)
- I ran the following within the Dreambooth directory to prevent the font error referenced above by Niko:
```
mkdir data/
wget https://github.com/prawnpdf/prawn/raw/master/data/fonts/DejaVuSans.ttf -P data/
```
- Training ran well. But I could not directly download the ckpt file (~12 GB) for some reason, so I moved it to the root directory and downloaded it via scp. To set up ssh for scp, just follow the directions vast.ai provides when you click the terminal button on your specific instance.
EDIT 9/20: Before downloading, you can prune your ckpt down to 2-3 GB by using this script (a rough sketch of what such a prune does follows below).
On your instance:
```
mv /workspace/Dreambooth-Stable-Diffusion/logs/<training folder>/checkpoints/last.ckpt /workspace/Dreambooth-Stable-Diffusion/last.ckpt
```
then on your local machine:
```
scp -P <port#> [email protected]:/workspace/Dreambooth-Stable-Diffusion/last.ckpt <local folder>
```
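For the curious, a hedged sketch of what a prune like that typically does (this is not the linked script; it drops the optimizer state and casts the weights to fp16, which is where most of the savings come from; filenames are placeholders):

```bash
python - <<'EOF'
import torch

# The full training checkpoint (~12 GB) carries weights plus optimizer state
ckpt = torch.load("last.ckpt", map_location="cpu")

# Keep only the model weights, cast floating-point tensors to half precision
sd = {k: v.half() if v.is_floating_point() else v
      for k, v in ckpt["state_dict"].items()}
torch.save({"state_dict": sd}, "last-pruned.ckpt")
EOF
```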
Below are some sample output images.
Thoughts:
- The results highly depend on using a prompt with "a photo of sks man" rather than "sks man" alone.
- Separating "a photo of sks man" by a comma from the rest of your prompt tends to produce better results.
- If your training images are only headshot selfies, which mine were, then wide shots will have poor likeness quality.
- I tested multiple models, and the difference beyond 1000 steps is negligible.
- I found that the lower the guidance value, the better the image style but the lower the likeness. I found a nice balance at ~7. I could never get a good zombie result using Niko's prompt regardless of the guidance value, though. Maybe Niko has a natural zombieness about him :)
Happy to share that it now runs in just 18 GB of VRAM and is even 2x faster. Hoping to get below 16 GB in a day or two.
More details here: #35
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
Ha, those dog pictures are amazing! That’s so awesome.
Truly, the dreambooth method implemented here is next level. It's the beginning of a huge technological shift, and a solution to one of the key problems preventing image generators from being a useful tool. I hope development continues!
@Desm0nt The saying that comes to mind is "Beware trivial inconveniences". By which I mean adoption of two competing things is heavily influenced by which one is more accessible.
Being able to set up the repo on a cloud compute service is a nontrivial amount of additional time and know-how needed compared to clicking "run all cells" in a colab notebook. Not to mention (correct me if I'm wrong) this technique involves modifying the weights of the model itself, while textual inversion produces embeddings that can be shared independently of the model, which makes it possible for the community to build repositories of object and style "plugins" like we see with huggingface's sd-concepts. I'd argue the collaborative nature of sd-concepts has the potential to be an even bigger gamechanger.
Also just to confirm in case someone is wondering, I haven't had any luck fine-tuning multiple concepts on the newly trained models. So if you train a concept, then train another concept on that newly trained model, it will combine them, even with a different identifier.
I haven't checked if the identifier is linked to the seed or not, but I will when I get the chance. Hopefully it's possible because my disk space only has so much 🙂.
Agree with ExponentialML. We should generate more reg images; 8 seems too small. I have updated the readme.
@hopibel Jupyter notebooks on vast.ai work the same way as in colab =)
Textual inversion is good and really comfortable compared to this solution, but it can't actually add knowledge of an object to the model. It just adds a non-textual description that can produce something similar to the required object using the model's existing knowledge. If the model doesn't have enough knowledge to produce it accurately, the result will be slightly (or very much) different from the required object. And with 3-5 samples it creates a description that tries to produce exactly the same object as in the samples.
But with Dreambooth you actually fine-tune the model. With 100+ samples the model actually learns information about the object in different conditions (not just a description of something similar), which allows it to recreate exactly the object we are looking for (if we want, even as photorealistically close to the images from the training sample as possible). This makes it possible to fully stylize the object and place it in different conditions without fear of losing features and similarity.
The difference between Dreambooth and textual inversion is the difference between real knowledge of the artist's style in the model (which lets you apply it to any query) and handpicked combinations of descriptions that give an apparently similar style under certain conditions but lose similarity under others.
It's a real, true finetune of the model, but it doesn't require huge amounts of data and takes a lot less time to learn new concepts, without the risk of screwing up the generalization of the model.
The only significant drawback: the resulting model weighs 12.4 GB instead of the original model's 4 GB, and my knowledge isn't enough to somehow compress it before downloading it from the cloud machine.
Some similar results here. Thank you for the help setting it up, Niko!
Best I got with traditional textual encoding:
And some prompts (older model that wasn't as faithful):
Some more info here (Stable discord).
Second this, I've been getting some great results as well. Customization and workflow integration will be the killer features of open source image generation models.
I wonder if there is an efficient way to save and use the newly learned concepts, similar to the embedding files produced by textual inversion (which I realize wouldn't be directly applicable here).
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
It would be absolutely amazing to see this running on a 3090.
Also just to confirm in case someone is wondering, I haven't had any luck fine-tuning multiple concepts on the newly trained models. So if you train a concept, then train another concept on that newly trained model, it will combine them, even with a different identifier.
That's curious - I naively trained a different concept (also different class id) on top of the first model I fine-tuned and that worked out pretty well. The model can generate images from both concepts individually, though it tends to blend them if used together in a prompt.
Edit: I did notice one thing that may be related: running the prompt that generates the regularization images with the first- and second-generation fine-tuned models visibly degrades/collapses the results. Will make a separate issue.
Well, I had a different approach to this. As you know, Sly Stallone looks like crap in SD, so I heavily overfit an embedding of his face, and I also trained a regular, so-so likeness embedding at default settings. I use them both: the worse-likeness one to change the style, and the high-likeness overfit embedding to bring back his likeness.
So this is regular textual inversion; you can make it work if you really want to.
My take on it is that you can change a style with heavy overfitting, but it's harder, and you can't do heavily caricatured styles; it's more suited to classic painters or comic book styles that don't distort faces too much.
It takes hella GPU to train the repo we're on, and I'm sure the code works much better as well, but regular textual inversion works pretty well if you're willing to go the extra steps.
Some outputs from SD with textual inversion (not dreambooth):
The advantage of this is file size!!! That's a huge deal, for me at least, vs the dreambooth way.
OK, you might say he's already in the SD ckpt file, right? Yeah, he is, so here's my mom when she was 20 years old, regular textual inversion as well.
Overall, what would impress me most would be code that lets us finetune RoboCop with his suit and resynthesize the suit perfectly; I don't think the SD architecture is capable of that, though. There's an Iron Man suit trained nicely in SD, but all of the outputs are mutations of the suit, though it's not so jarring because the movies had a lot of revisions.
Can't wait for an episode on SD!
Gamechanging or not, it's unlikely to catch on if hobbyists can't play around with it, so the fact that this technique requires a workstation class gpu is a huge downside :P
@nikopueringer what did you use for regularization on your own photo training?
Gamechanging or not, it's unlikely to catch on if hobbyists can't play around with it, so the fact that this technique requires a workstation class gpu is a huge downside :P
It requires ~1 hour of a Tesla A40/A6000 on Vast.ai (for training and downloading the tuned model), a $0.4-0.8 investment. It's really cheap. And then you can run the model in any colab or any local repo with 6+ GB VRAM, just as you usually do with the original model.
@Desm0nt Interesting, I was wondering how Dreambooth would fare with general style transfer. The Google page advertises it as "subject-driven generation" but, based on your comment, it should outperform Textual Inversion even at abstract tasks.
Right now I'm particularly interested in seeing if it's possible to create good pixel art in SD via img2img. It's already somewhat possible with complex prompt engineering, but the results are inconsistent.
I've tried finetuning a collection of 14 pixelized character portraits in textual inversion, but even after 50k iterations, the style transfer is a complete mess.
Any successful examples of something like this in Dreambooth? Doesn't have to be pixel art.
I naively trained a different concept (also different class id) on top of the first model I fine-tuned
What was the size of the model at the end of both training runs?
@nikopueringer what did you use for regularization on your own photo training?
Just 10 pictures or so generated from “man” as my prompt
For more examples about inversion you can read here, guys. I don't want to hijack this repo, since it's something else, but it's pretty much doing a similar thing:
rinongal/textual_inversion#35
I believe inversion is better at styles and this is better at subjects
I believe inversion is better at styles and this is better at subjects
Are there any side-by-sides to assess that yet? I do agree, textual-inversion does achieve some pretty incredible style transfer. But as @1blackbar demonstrated, it can do subjects pretty well too.
Regular inversion needs the embedding to be placed late in the prompt if you finetuned with overfitting, and I did; that Sly is 60 vectors, so the prompt was: "dawn of the dead comics undead decomposed bloody zombie , painting by greg rutkowski, by wlop by artstation zombie portrait of zombie slyf as a zombie"
The embedding is the word "slyf". I use the AUTOMATIC1111 repo for all this, and the nicolai25 repo for inversion.
I keep 2 embeddings like I wrote: one has average likeness but great stylisation; the other is the one you see, great likeness and harder-to-obtain stylisation, but definitely possible. As for inversion being better at something or not: no proof, doesn't matter. Anyway, I don't have a GPU to try out this fork, and an entire 4 GB file for one likeness is a stretch for me, so I'd rather wait for some faster solution like embeddings that are ~20 KB in size. I'm building a library of fixed/repaired subjects.
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
If it's crashing at 500 samples, you're probably hitting the missing font file. Just replace it, and edit the .py file that calls for it to use whatever font you replaced it with.
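If you go the edit route, a rough sketch; the exact file is an assumption (in similar latent-diffusion codebases the `ImageFont.truetype` call lives in ldm/util.py, so grep first), and `YourFont.ttf` is a placeholder:

```bash
# Find where the font is loaded, then point it at a .ttf you actually have
grep -rn "ImageFont.truetype" .
sed -i "s|data/DejaVuSans.ttf|data/YourFont.ttf|" ldm/util.py
```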
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
If it's crashing at 500 samples, you're probably hitting the missing font file. Just replace it, and edit the .py file that calls for it to use whatever font you replaced it with.
Yup, thank you, I figured that out eventually. Just one last question though: once I've trained the model, do I still have to run it on a GPU with 40 GB of VRAM, or can I run the model locally if I want? Or on free colab? I'd like to test it out on Deforum Diffusion.
It will run with the same VRAM requirements as the regular model!
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
Maybe using this method on top of yours would be good for distributing VRAM without errors. With this method I can generate 2304x2304px images on my 3090. Not sure if just pasting those files in would help dreambooth, but maybe it could? It will probably be another file you'll have to apply this method to, but it seems like it could work...
https://drive.google.com/drive/folders/1lqcWpHBHV_UAlaPtdfaSVwisdb2uGT8x?usp=sharing
If you could make a repo of your efforts, I could try giving it a shot :)
Do the training and regularization images need to be 512x512, like the original SD model is based on?
Can some of you share a trained Dreambooth checkpoint with me? I'm going to run some tests on the finetuned weights, but my current hardware isn't capable of running the training.
I managed to do a training sesh on vast.ai for ~$1. [...]
- Prep your training and regularization data in advance.
- Pick an instance with at least 1 A6000 (the cheapest that meets the VRAM reqs, I've found; 1 is good to start with, since you might spend more time figuring out the setup than actually training). Make sure the download (and upload) speeds are decent, like >100 Mbps.
...
I'm following your exact steps, but 1 epoch on an A6000 takes ~75 s. Is there a param to tweak, or what could be wrong?
EDIT: 1 epoch = 100 iterations at ~1 s per iteration, so training is done in less than 15 min!
We're building a service which easily allows fine-tuning and generation of images from the new fine-tuned model.
Please join our discord to further discuss fine-tune and DM me if you'd like an invite https://discord.gg/mp6QRuNN
Hi there! I would love to know how to run this on Google Colab. I only have the Pro tier, which allows P100s to appear in the wild for me to grab; they aren't that high end, about 16 GB of VRAM.
Got decent results with mixed precision training on an NVIDIA RTX 3090 (changes made can be seen here).
It would be interesting to see some experiments with partially unfrozen models, such that the finetuning can be done on GPUs with < 24 GB VRAM... It feels like unfreezing only the input block of the U-Net architecture (the left, downsampling half of the "U") could work for adding new subjects, since that part is primarily responsible for encoding the inputs.