Comments (37)
I managed to do a training sesh on vast.ai for ~$1. Since this looks like the hip thread where all the cool kids are coming, I figure it's the best spot to post a few quick steps for anyone who comes across this and is clueless about getting it going, like me. (Not a full tutorial, more a "look here to start, and here are some issues and how to fix them!"):
- Prep your training and regularization data in advance.
- Pick an instance with at least 1 A6000 (the cheapest that meets the VRAM reqs, I've found; 1 is good to start with, since you might spend more time figuring out the setup than actually training). Make sure the download (and upload) speeds are decent, like >100 Mbps.
- EDIT: Forgot to mention: when finding an instance, make sure to select a PyTorch instance config.
- Go in and open a terminal sesh.
- Clone this repo: `git clone https://github.com/XavierXiao/Dreambooth-Stable-Diffusion.git`
- `cd` into the directory (`cd Dreambooth-Stable-Diffusion`) and make a conda environment for it: `conda env create -f environment.yaml`. This will take a lil' while.
- While that's happening, create a new terminal instance and pull down the SD EMA model; make sure you're in the project directory (Dreambooth-Stable...). The easiest way is to grab it from Hugging Face:
```
wget --http-user=USERNAME --http-password=PASSWORD https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
```
You should use an API token for a more secure way of doing this (sketch below); the above is just for ease of use for anyone unfamiliar.
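For the token route, a hedged sketch (it assumes you've created a read token at huggingface.co/settings/tokens; `HF_TOKEN` is a placeholder name, and wget's `--header` flag passes it as a bearer token):

```bash
export HF_TOKEN="hf_..."  # placeholder, paste your own read token here
wget --header="Authorization: Bearer ${HF_TOKEN}" \
  https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4-full-ema.ckpt
```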
- While these two things are going on, you can use the time to upload your training/regularization data into some new subfolders in the project, something like /Dreambooth-Stable-Diffusion/training.
- When the conda environment is set up, initialize conda with `conda init bash`, then reset the terminal with `reset`, or create a new terminal session.
- Navigate to the project directory again (Dreambooth-Stable...) if you aren't there already, and activate the environment with `conda activate ldm` (or whatever the environment in environment.yaml was called).
- You should be ready to train!
```
python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume sd-v1-4-full-ema.ckpt -n <whatever you want this training to be called> --gpus 0, --data_root <the relative path to your training images> --reg_data_root <the relative path to your regularization images> --class_word <the word you used to generate your regularization data>
```
- ^ Note that this will map your training to the default sks prompt, so go change that in the personalized.py file if you want; a sketch of the edit follows.
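A hedged sketch of that edit; it assumes the prompt templates live in ldm/data/personalized.py (grep to confirm before running), and `mytoken` is a placeholder identifier of your choosing:

```bash
# Confirm where the default token appears first
grep -rn "sks" ldm/data/personalized.py
# Swap it for your own rare-token identifier ("mytoken" is a placeholder)
sed -i 's/sks/mytoken/g' ldm/data/personalized.py
```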
- Let it roll; it should be a bit over 1 it/s on 1 A6000. If it crashes after 500 or so steps, you're probably hitting the font error @nikopueringer mentioned above, so follow those instructions.
- After that's all done, all that's left is to download your model. I've run into issues with the vast.ai frontend and can't actually navigate into the /logs/xx/checkpoints folder, so if you hit this, try moving the checkpoint out. In a terminal, `cd` into the checkpoints folder and do a `mv final.ckpt ..` to move it up a directory; it should now be selectable and you can download it!
- Alternatively, if you have a bad net connection, you could upload it to Google Drive to save some time running the instance; look into the 'gdrive' GitHub repo to install it on the Ubuntu CLI (sketch below).
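A hedged sketch of that route, assuming the prasmussen/gdrive CLI (grab the Linux binary from that repo's releases page first; the first run walks you through Google OAuth, and the filename is whatever your checkpoint is called):

```bash
chmod +x gdrive
./gdrive about               # first run triggers the authentication flow
./gdrive upload final.ckpt   # prints a file ID you can fetch from Drive later
```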
I might have gotten a couple of things wrong; this was mostly from memory and notes from my trial-and-error approach, so if it doesn't work, I probably screwed something up in writing this, but hopefully it puts you on the right track if you get stuck. Or, you know, don't find or do this at all, so the pricing of the instances doesn't go up and I can keep training for cheap 👀
Wow. I am not familiar with how to write the best prompts for SD (or any text-to-image model), so even I cannot generate the cool things you did. I tried training with some images of Newfoundland dogs (my profile pic) and used exactly the same zombie prompt, and I also got some interesting results. The prompt is so important!
Just a follow-up to the guide provided by @Oscerlot ...
- If using Vast.ai, make sure to get an instance with PyTorch, an A6000 GPU, and 100 GB drive space (in case you want to generate multiple models).
- I used ~100 training images and ~300 regularization images. All were 512x512 (a resize one-liner follows).
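If your images aren't that size yet, a hedged ImageMagick one-liner that center-crops and resizes (assumes JPEGs in the current folder; `mogrify` overwrites in place, so run it on copies):

```bash
# Fill to at least 512 on the short side, then center-crop to exactly 512x512
mogrify -resize 512x512^ -gravity center -extent 512x512 *.jpg
```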
- I modified the /Dreambooth-Stable-Diffusion/configs/stable-diffusion/v1-finetune_unfrozen.yaml file to allow for 4000 steps (line 120). main.py saves a last.ckpt every 500 steps, so at each save I just moved last.ckpt to <#steps>.ckpt so I could try multiple models (sketch below).
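A hedged sketch of both tweaks; the stock `max_steps: 800` value is an assumption (check around line 120 of the yaml for the actual key), and the rename is just the move described above, with `<training folder>` and the step count as placeholders:

```bash
# Raise the training step cap (verify the stock value before running this)
sed -i 's/max_steps: 800/max_steps: 4000/' configs/stable-diffusion/v1-finetune_unfrozen.yaml
# After each 500-step save, stash the checkpoint under its step count
mv logs/<training folder>/checkpoints/last.ckpt logs/<training folder>/checkpoints/2000.ckpt
```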
- I added `--no-test` to the main.py training command to prevent this issue.
- Before creating the conda env, I did:
  a) `apt -y install gcc` (this was missing on my first attempt, and caused a failed pip install)
  b) `conda update -n base conda` (just to make sure conda is up to date)
- I ran the following within the Dreambooth directory to prevent the font error referenced above by Niko:
```
mkdir data/
wget https://github.com/prawnpdf/prawn/raw/master/data/fonts/DejaVuSans.ttf -P data/
```
- Training ran well. But I could not directly download the ckpt file (~12 GB) for some reason, so I moved it to the root directory and downloaded it via scp. To set up ssh for scp, just follow the directions vast.ai provides when you click the terminal button on your specific instance.
EDIT 9/20: Before downloading, you can prune your ckpt down to 2-3 GB by using this script (a rough sketch of what such a prune does follows below).
On your instance:
```
mv /workspace/Dreambooth-Stable-Diffusion/logs/<training folder>/checkpoints/last.ckpt /workspace/Dreambooth-Stable-Diffusion/last.ckpt
```
then on your local machine:
```
scp -P <port#> [email protected]:/workspace/Dreambooth-Stable-Diffusion/last.ckpt <local folder>
```
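For the curious, a hedged sketch of what a prune like that typically does (this is not the linked script; it drops the optimizer state and casts the weights to fp16, which is where most of the savings come from; filenames are placeholders):

```bash
python - <<'EOF'
import torch

# The full training checkpoint (~12 GB) carries weights plus optimizer state
ckpt = torch.load("last.ckpt", map_location="cpu")

# Keep only the model weights, cast floating-point tensors to half precision
sd = {k: v.half() if v.is_floating_point() else v
      for k, v in ckpt["state_dict"].items()}
torch.save({"state_dict": sd}, "last-pruned.ckpt")
EOF
```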
Below are some sample output images.
Thoughts:
- The results highly depend on using a prompt with "a photo of sks man" rather than "sks man" alone.
- Separating "a photo of sks man" by a comma from the rest of your prompt tends to produce better results.
- If your training images are only headshot selfies, which mine were, then wide shots will have poor likeness quality.
- I tested multiple models, and the difference beyond 1000 steps is negligible.
- I found that the lower the guidance value, the better the image style but the lower the likeness. I found a nice balance at ~7. I could never get a good zombie result using Niko's prompt regardless of the guidance value, though. Maybe Niko has a natural zombieness about him :)
Happy to share that it now runs in just 18 GB of VRAM and is even 2x faster. Hoping to get below 16 GB in a day or two.
More details here: #35
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
Ha, those dog pictures are amazing! That’s so awesome.
Truly, the dreambooth method implemented here is next level. It's the beginning of a huge technological shift, and a solution to one of the key problems preventing image generators from being a useful tool. I hope development continues!
@Desm0nt The saying that comes to mind is "Beware trivial inconveniences". By which I mean adoption of two competing things is heavily influenced by which one is more accessible.
Being able to set up the repo on a cloud compute service is a nontrivial amount of additional time and know-how needed compared to clicking "run all cells" in a colab notebook. Not to mention (correct me if I'm wrong) this technique involves modifying the weights of the model itself, while textual inversion produces embeddings that can be shared independently of the model, which makes it possible for the community to build repositories of object and style "plugins" like we see with huggingface's sd-concepts. I'd argue the collaborative nature of sd-concepts has the potential to be an even bigger gamechanger.
Also just to confirm in case someone is wondering, I haven't had any luck fine-tuning multiple concepts on the newly trained models. So if you train a concept, then train another concept on that newly trained model, it will combine them, even with a different identifier.
I haven't checked if the identifier is linked to the seed or not, but I will when I get the chance. Hopefully it's possible because my disk space only has so much 🙂.
Agree with ExponentialML. We should generate more reg images; 8 seems too small. I have updated the readme.
@hopibel Jupyter notebooks on vast.ai work the same way as in colab =)
Textual inversion is good and really comfortable compared to this solution, but it can't actually add knowledge of an object to the model. It just adds a non-textual description that can produce something similar to the required object using the model's existing knowledge. If the model doesn't have enough knowledge to produce it accurately, the result will be slightly (or very much) different from the required object. And with 3-5 samples it creates a description that tries to produce exactly the same object as in the samples.
But with Dreambooth you actually fine-tune the model. With 100+ samples the model actually learns information about the object in different conditions (not just a description of something similar), which allows it to recreate exactly the object we are looking for (if we want, even as photorealistically close to the images from the training sample as possible). This makes it possible to fully stylize the object and place it in different conditions without fear of losing features and similarity.
The difference between Dreambooth and textual inversion is the difference between real knowledge of the artist's style in the model (which lets you apply it to any query) and handpicked combinations of descriptions that give an apparently similar style under certain conditions but lose similarity under others.
It's a real, true finetune of the model, but it doesn't require huge amounts of data and takes a lot less time to learn new concepts, without the risk of screwing up the generalization of the model.
The only significant drawback: the resulting model weighs 12.4 GB instead of the original model's 4 GB, and my knowledge isn't enough to somehow compress it before downloading it from the cloud machine.
Some similar results here. Thank you for the help setting it up, Niko!
Best I got with traditional textual encoding:
And some prompts (older model that wasn't as faithful):
Some more info here (Stable discord).
Second this, I've been getting some great results as well. Customization and workflow integration will be the killer features of open source image generation models.
I wonder if there is an efficient way to save and use the newly learned concepts, similar to the embedding files produced by textual inversion (which I realize wouldn't be directly applicable here).
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
It would be absolutely amazing to see this running on a 3090.
Also just to confirm in case someone is wondering, I haven't had any luck fine-tuning multiple concepts on the newly trained models. So if you train a concept, then train another concept on that newly trained model, it will combine them, even with a different identifier.
That's curious - I naively trained a different concept (also different class id) on top of the first model I fine-tuned and that worked out pretty well. The model can generate images from both concepts individually, though it tends to blend them if used together in a prompt.
Edit: I did notice one thing that may be related: running the prompt that generates the regularization images with the first- and second-generation fine-tuned models visibly degrades/collapses the results. Will make a separate issue.
Well, I had a different approach to this. As you know, Sly Stallone looks like crap in SD, so I heavily overfit an embedding of his face, and I also trained a regular, so-so likeness embedding at default settings. I use them both: the worse-likeness one to change the style, and the high-likeness overfit embedding to bring back his likeness.
So this is regular textual inversion; you can make it work if you really want to.
My take on it is that you can change a style with heavy overfitting, but it's harder, and you can't do heavily caricatured styles; it's more suited to classic painters or comic book styles that don't distort faces too much.
It takes hella GPU to train the repo we're on, and I'm sure the code works much better as well, but regular textual inversion works pretty well if you're willing to go the extra steps.
Some outputs from SD with textual inversion (not dreambooth):
The advantage of this is file size!!! That's a huge deal, for me at least, vs the dreambooth way.
OK, you might say he's already in the SD ckpt file, right? Yeah, he is, so here's my mom when she was 20 years old, regular textual inversion as well.
Overall, what would impress me most would be code that lets us finetune RoboCop with his suit and resynthesize the suit perfectly; I don't think the SD architecture is capable of that, though. There's an Iron Man suit trained nicely in SD, but all of the outputs are mutations of the suit, though it's not so jarring because the movies had a lot of revisions.
Can't wait for an episode on SD!
Gamechanging or not, it's unlikely to catch on if hobbyists can't play around with it, so the fact that this technique requires a workstation class gpu is a huge downside :P
@nikopueringer what did you use for regularization on your own photo training?
Gamechanging or not, it's unlikely to catch on if hobbyists can't play around with it, so the fact that this technique requires a workstation class gpu is a huge downside :P
It requires ~1 hour of a Tesla A40/A6000 on Vast.ai (for training and downloading the tuned model), a $0.4-0.8 investment. It's really cheap. And then you can run the model in any colab or any local repo with 6+ GB VRAM, just as you usually do with the original model.
@Desm0nt Interesting, I was wondering how Dreambooth would fare with general style transfer. The Google page advertises it as "subject-driven generation" but, based on your comment, it should outperform Textual Inversion even at abstract tasks.
Right now I'm particularly interested in seeing if it's possible to create good pixel art in SD via img2img. It's already somewhat possible with complex prompt engineering, but the results are inconsistent.
I've tried finetuning a collection of 14 pixelized character portraits in textual inversion, but even after 50k iterations, the style transfer is a complete mess.
Any successful examples of something like this in Dreambooth? Doesn't have to be pixel art.
I naively trained a different concept (also different class id) on top of the first model I fine-tuned
What was the size of the model at the end of both training runs?
@nikopueringer what did you use for regularization on your own photo training?
Just 10 pictures or so generated from “man” as my prompt
For more examples about inversion you can read here, guys. I don't want to hijack this repo, since it's something else, but it's pretty much doing a similar thing:
rinongal/textual_inversion#35
I believe inversion is better at styles and this is better at subjects
I believe inversion is better at styles and this is better at subjects
Are there any side-by-sides to assess that yet? I do agree, textual-inversion does achieve some pretty incredible style transfer. But as @1blackbar demonstrated, it can do subjects pretty well too.
Regular inversion needs the embedding to be placed late in the prompt if you finetuned with overfitting, and I did; that Sly is 60 vectors, so the prompt was: "dawn of the dead comics undead decomposed bloody zombie , painting by greg rutkowski, by wlop by artstation zombie portrait of zombie slyf as a zombie"
The embedding is the word "slyf". I use the AUTOMATIC1111 repo for all this, and the nicolai25 repo for inversion.
I keep 2 embeddings like I wrote: one has average likeness but great stylisation; the other is the one you see, great likeness and harder-to-obtain stylisation, but definitely possible. As for inversion being better at something or not: no proof, doesn't matter. Anyway, I don't have a GPU to try out this fork, and an entire 4 GB file for one likeness is a stretch for me, so I'd rather wait for some faster solution like embeddings that are ~20 KB in size. I'm building a library of fixed/repaired subjects.
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
If it's crashing at 500 samples, you're probably hitting the missing font file. Just replace it, and edit the .py file that calls for it to use whatever font you replaced it with.
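If you go the edit route, a rough sketch; the exact file is an assumption (in similar latent-diffusion codebases the `ImageFont.truetype` call lives in ldm/util.py, so grep first), and `YourFont.ttf` is a placeholder:

```bash
# Find where the font is loaded, then point it at a .ttf you actually have
grep -rn "ImageFont.truetype" .
sed -i "s|data/DejaVuSans.ttf|data/YourFont.ttf|" ldm/util.py
```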
I don't know, mine fails at OSError: cannot open resource... It always fails at 59% after it already did a first 100%, so around 511 samples, and then it crashes. I hope it works though; currently downloading the model, maybe it worked, maybe not.
If it's crashing at 500 samples, you're probably hitting the missing font file. Just replace it, and edit the .py file that calls for it to use whatever font you replaced it with.
Yup, thank you, I figured that out eventually. Just one last question though: once I've trained the model, do I still have to run it on a GPU with 40 GB of VRAM, or can I run the model locally if I want? Or on free colab? I'd like to test it out on Deforum Diffusion.
It will run with the same VRAM requirements as the regular model!
Nice results! I've been running some experiments as well. I've upped the class images from the suggested 8 to about 100. It seems to generalize better.
I almost got it to work on a 3090 by trying to train the model using DeepSpeed ZeRO-3 Offload, but it seems like there would be a bit more grunt work (casting tensors, conversions, model splitting, etc.) than just changing the training method.
Maybe using this method on top of yours would be good for distributing VRAM without errors. With this method I can generate 2304x2304px images on my 3090. Not sure if just pasting those files in would help dreambooth, but maybe it could? It will probably be another file you'll have to apply this method to, but it seems like it could work...
https://drive.google.com/drive/folders/1lqcWpHBHV_UAlaPtdfaSVwisdb2uGT8x?usp=sharing
If you could make a repo of your efforts, I could try giving it a shot :)
Do the training and regularization images need to be 512x512, like the original SD model is based on?
Can some of you share a trained Dreambooth checkpoint with me? I'm going to run some tests on the finetuned weights, but my current hardware isn't capable of running the training.
I managed to do a training sesh on vast.ai for ~$1. [...]
- Prep your training and regularization data in advance.
- Pick an instance with at least 1 A6000 (the cheapest that meets the VRAM reqs, I've found; 1 is good to start with, since you might spend more time figuring out the setup than actually training). Make sure the download (and upload) speeds are decent, like >100 Mbps.
...
I'm following your exact steps, but 1 epoch on an A6000 takes ~75 s. Is there a param to tweak, or what could be wrong?
EDIT: 1 epoch = 100 iterations at ~1 s per iteration, so training is done in less than 15 min!
We're building a service which easily allows fine-tuning and generation of images from the new fine-tuned model.
Please join our discord to further discuss fine-tune and DM me if you'd like an invite https://discord.gg/mp6QRuNN
Hi there! I would love to know how to run this on Google Colab. I only have the Pro tier, which allows P100s to appear in the wild for me to grab; they aren't that high end, about 16 GB of VRAM.
Got decent results with mixed precision training on an NVIDIA RTX 3090 (changes made can be seen here).
It would be interesting to see some experiments with partially unfrozen models, such that the finetuning can be done on GPUs with < 24 GB VRAM... It feels like unfreezing only the input block of the U-Net architecture (the left, downsampling half of the "U") could work for adding new subjects, since that part is primarily responsible for encoding the inputs.