lllyasviel / ControlNet
Let us control diffusion models!
License: Apache License 2.0
The tutorial_train.py script and documentation are fantastic.
One thing that is not clear is whether this automatically updates the control_sd15_ini.ckpt
file, or whether the script needs to be altered to save a new checkpoint.
Is there a Google Colab version?
Hi,
When I try to run the batch, I get this error after the first image:
AttributeError: 'numpy.ndarray' object has no attribute 'save'
I think it tries to save the second image (the sketch) but can't find a save method on it.
Can you help me?
Thanks
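That error usually means the image is still a NumPy array; .save() only exists on PIL images. A minimal sketch of the fix (the array here is a stand-in for the detected sketch map):

```python
# Convert the NumPy array to a PIL image before saving to fix the AttributeError.
import numpy as np
from PIL import Image

sketch = np.zeros((64, 64, 3), dtype=np.uint8)  # stand-in for the detected map
Image.fromarray(sketch).save("sketch.png")
```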
Will a version of GitHub Actions be created?
https://github.com/lllyasviel/ControlNet/actions
On an RTX 3060, currently testing the bird example; nothing appears after 200 seconds, just a Gradio error :/
Is it possible to apply this method to a custom SD (1.5) model without using a dedicated training set?
I would like to use your ControlNet + the canny edge detection but with a custom fine-tuned SD 1.5 model.
Is that possible without using a training set?
If not, how would one train such a canny-edge model, and what dataset would you use? Would you generate the dataset with the annotator (as in gradio_annotator.py)?
Thanks a lot and keep up this amazing work!
I'm having trouble getting the Gradio app to load. Most likely user error but I can't figure out my problem. I followed the instructions on the Readme. I downloaded the models and detectors from Huggingface and put them in the correct folders. But no matter what python file I try to run, I get a similar error. I'm trying on Windows 11.
Any ideas on what I'm doing wrong? Here is the error:
(control) C:\Users\BeauCarnes\Documents\ControlNet-main>python gradio_scribble2image.py
logging improved.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Traceback (most recent call last):
File "gradio_scribble2image.py", line 18, in <module>
model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cpu'))
File "C:\Users\BeauCarnes\Documents\ControlNet-main\cldm\model.py", line 18, in load_state_dict
state_dict = get_state_dict(torch.load(ckpt_path, map_location=torch.device(location)))
File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Hi, thank you for the code, it works flawlessly.
I have one question though... I am trying to add more control to the network so it can be conditioned on two images instead of one. I am doing it by adding 3 new channels to the control image, i.e. dstacking the two images before training. Is this the right way, or do you have a better approach in mind?
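A minimal sketch of the stacking described above: two 3-channel control images combined into one 6-channel hint with np.dstack (the maps here are placeholders).

```python
# Stack two RGB control maps into a single 6-channel hint along the channel axis.
import numpy as np

control_a = np.zeros((512, 512, 3), dtype=np.float32)  # e.g. a depth map
control_b = np.ones((512, 512, 3), dtype=np.float32)   # e.g. an edge map
hint = np.dstack([control_a, control_b])               # shape (512, 512, 6)
```

Note that the first convolution of the hint encoder would then need to accept 6 input channels instead of 3.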
Hi,
Can this be changed to allow the use of other pose-estimation libraries that are commercially usable, such as tf-pose-estimation or PoseNet? This would require invoking tf-pose-estimation or PoseNet and converting its output to JSON in the same format as OpenPose.
Thanks.
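A hypothetical converter sketch: keypoints (x, y, confidence) from another pose estimator flattened into an OpenPose-style JSON body. The exact schema is an assumption here; check it against real OpenPose output before relying on it.

```python
# Flatten (x, y, confidence) triples into an OpenPose-style "pose_keypoints_2d"
# list. The detections and the JSON schema are illustrative assumptions.
import json

keypoints = [(120.0, 80.0, 0.9), (132.5, 150.0, 0.85)]  # made-up detections
pose_json = json.dumps({
    "people": [
        {"pose_keypoints_2d": [v for kp in keypoints for v in kp]}
    ]
})
```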
Just curious: if it helped convergence and helped with gradual adaptation to the new dataset, that would make sense to me.
Do you plan to make it available? I read in your paper that it's trained with SD 2.1.
How do we load the same image into ControlNet that is loaded into img2img, so we don't have to load it manually twice each time?
In the file ControlNet/annotator/uniformer/__init__.py, the following paths are not resolved relative to the module:
checkpoint_file = "annotator/ckpts/upernet_global_small.pth"
config_file = 'annotator/uniformer/exp/upernet_global_small/config.py'
For a lot of use cases, this means it will try to find those files relative to the directory the process was launched from, which causes errors.
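One common fix, as a sketch (not the repo's exact code): resolve the paths relative to the module file itself instead of the current working directory.

```python
# Build the annotator paths from this file's location so they work regardless
# of the directory the script is launched from.
import os

base = os.path.dirname(os.path.abspath(__file__))
checkpoint_file = os.path.join(base, "..", "ckpts", "upernet_global_small.pth")
config_file = os.path.join(base, "exp", "upernet_global_small", "config.py")
```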
When the image input is not square, I tried changing the canvas ratio and generating the image in portrait, but it doesn't seem to work right now. Or am I doing something wrong?
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 8.00 GiB total capacity; 7.14 GiB already allocated; 0 bytes free; 7.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
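The allocator hint that this error message mentions can be set via an environment variable before PyTorch initializes CUDA; a sketch (64 MB is just an illustrative value, tune it for your GPU):

```python
# Set the CUDA caching-allocator config before importing torch / building the
# model, as suggested by the OOM message, to reduce fragmentation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"
# import torch and load the model only after setting this
```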
What wonderful work! I'm thinking about how to merge this model with Dreambooth. I mean, how would one train it Dreambooth-style? Is it the same as with vanilla Stable Diffusion?
Hi! Thanks for the great work on ControlNet, it is no less than a game changer. Upon a closer look while trying to port it over to Draw Things, I found that because we duplicate parameters, we also have a time embedding passed into the ControlNet. This requires us to evaluate the network at each denoising step. If we could safely remove the time embedding, we wouldn't need to do that and evaluation could be faster.
Just wondering if you considered that and what's the thought process. Thanks!
I am training a custom control model using depth maps. In the image log, all of the "control images" are being saved with the lower half of the value range (below 128 in the .png, below 0.0 in torch space) clipped to black.
In this example the left is the source depth map, and the right is the output "control image" log.
All of the color images (reconstruction and sample) come out as proper full color images.
Is this just a bug in how log images are saved? Or is there something wrong with how the control images are being converted for training?
I am using the provided unmodified tutorial_dataset.py
and tutorial_train.py
scripts.
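One possible explanation (a guess, not a confirmed diagnosis): if the [0, 1] control map gets shifted into [-1, 1] somewhere before an image writer that clamps to [0, 1], everything below the midpoint collapses to black, which matches the symptom described above.

```python
# Illustration of the hypothesized bug: a [0, 1] map rescaled to [-1, 1] and
# then clamped to [0, 1] loses its entire lower half.
import numpy as np

control = np.linspace(0.0, 1.0, 5)    # original depth values in [0, 1]
shifted = control * 2.0 - 1.0         # rescaled to [-1, 1] (hypothesized step)
saved = np.clip(shifted, 0.0, 1.0)    # writer clamps to [0, 1]
# saved is [0.0, 0.0, 0.0, 0.5, 1.0]: the lower half turns black
```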
Are you guys planning on eventually training an openpose model, including handpose data?
https://github.com/Hzzone/pytorch-openpose#hand-pose-estimation
I feel like if we had an openpose model usable with hand data, Stable Diffusion would finally be able to reproduce hands/fingers consistently.
I'm assuming the biggest hurdle will be obtaining enough annotated data to produce such a model in the first place, though.
There are a lot of scripts in this repo, so a way to keep track of all of their outputs is needed.
Each script should save outputs to its own folder, like the AUTOMATIC1111 webui does; this is necessary when hundreds or thousands of outputs are generated. E.g., when using gradio_canny2image.py, images should be saved in an output folder named something like "canny2image".
Also, the way the AUTOMATIC1111 webui names the image files in the output folder would be much more helpful for determining how an output was generated. The settings used to generate the image are also embedded in the generated PNG file, including the prompt, negative prompt, steps, sampler, CFG scale, seed, size, and model hash.
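A sketch of the requested layout (the folder name and the settings encoded in the filename are illustrative): one subfolder per script, with generation parameters in the filename.

```python
# Per-script output folder plus a filename that records the generation settings.
import os
import time

script = "canny2image"                          # illustrative script name
outdir = os.path.join("outputs", script)
os.makedirs(outdir, exist_ok=True)
filename = f"{int(time.time())}-seed12345-steps20-cfg9.png"
path = os.path.join(outdir, filename)           # where the image would be saved
```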
Hello, I really appreciate this interesting work. I want to know whether to train the SD encoder layers in ControlNet, or only enable gradient calculation for the SD encoder layers while keeping their parameters unchanged?
Is ControlNet compatible with macOS?
I could install it and see it in the webUI, but when I use it, I get the following error:
Loading preprocessor: canny, model: control_sd15_canny [fef5e48e]
Loaded state_dict from [/Users/koen/Documents/SD/stable-diffusion-webui/extensions/sd-webui-controlnet/models/control_sd15_canny.pth]
ControlNet model control_sd15_canny [fef5e48e] loaded.
0%| | 0/16 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
koen@Koens-MacBook-Pro-2 stable-diffusion-webui % /opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Shortly after "Python quit unexpectedly"
Koen
As the subject says: is it possible to install this in a venv instead of using conda? I don't like all the virtual environments that get created on C: all the time; it slowly fills up C: instead of just the folder where the program is run. venv is so much easier to handle.
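A minimal sketch of the venv route (you would still need to install the packages listed in environment.yaml with pip afterwards):

```shell
# Create and use a venv inside the ControlNet folder instead of a conda env on C:
python3 -m venv .venv
. .venv/bin/activate        # on Windows: .venv\Scripts\activate
```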
Python version: 3.10
MMCV version: 1.7.1
System: windows 10
Error content:
File "A:\ControlNet-main\annotator\uniformer\mmcv\utils\ext_loader.py", line 15, in load_ext
    assert hasattr(ext, fun), f'{fun} miss in module {name}'
AssertionError: top_pool_forward miss in module _ext
When will ControlNet be published with Anime Line Drawing?
@lllyasviel
Hi, love this, it looks fantastic. I knew there must be a way of combining OpenPose with diffusion.
Had it working great on a previous commit.
Rebuilt the environment from a fresh pull and re-downloaded the models.
Now it fails after:
Loaded model config from [./models/cldm_v15.yaml]
Killed
Any help?
Thanks!
Title says it all; just wondering if there's a way to do it (I'm only familiar with the diffusers way).
Hi, I have a question regarding the additional retraining of your model. Imagine that I want to reuse segmentation conditioning and combine it together with depth estimation conditioning. Is this achievable?
BTW, you should definitely make this project an AUTOMATIC1111 SD webui plugin and integrate it into the diffusers library. It will be a hit!
-1 seed not working :/
Hey,
Thanks for your nice work. Scribble works really well for generation. But there are a lot of scenarios where we want the input image to contain colors as well as strokes, and we want SD to process it. For example, this image:
It has the scribble and the color, and the model should take both the color and the sketch as input, but the current implementation only considers the sketch, so the input color is ignored.
Example is above.
If you could train a model that takes a colored scribble and produces results according to it, that would be perfect.
Thanks
For now, pose2image requires an image of a real person. If the skeleton cannot be manipulated due to a problem with the Gradio UI, I wonder if it is possible to create pose data in another program and import it instead.
There are developers who are making something similar using Blender.
https://twitter.com/toni_nimono/status/1624642604666339329
https://twitter.com/toni_nimono/status/1624797906799833088
Originally posted by lllyasviel February 12, 2023
We plan to train some models with "double controls", using two concatenated control maps, and we are considering using images with holes as the second control map. This will lead to models like "depth-aware inpainting" or "canny-edge-aware inpainting". Please also let us know if you have good suggestions.
This is a re-post. Please go to Discussions for discussion.
In the paper, there is the masked diffusion example (Fig. 16). I see that in the sampling functions (e.g. the ddim_sampling function) the mask image and the initial image (x0) are also input parameters, but their default values are None. I couldn't make it work when I want to use a mask image and an initial image. Please provide an example script of masked diffusion usage. Specifically, how do we prepare x0, x_T, and mask?
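Not an official answer, but in LDM-style DDIM samplers the mask is typically applied by re-imposing a noised copy of x0 on the kept region at every denoising step. A NumPy sketch of that blending (shapes and the schedule value are made up for illustration):

```python
# Mask blending as done in LDM-style samplers: at each step, the known region
# is replaced by x0 noised to the current timestep, the rest keeps evolving.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))            # latent of the initial image
x_t = rng.normal(size=x0.shape)            # current sample (x_T at the start)
mask = np.zeros((1, 8, 8))
mask[..., :4] = 1.0                        # 1 = keep the original content here
abar_t = 0.5                               # cumulative schedule value at step t
x0_noised = np.sqrt(abar_t) * x0 + np.sqrt(1 - abar_t) * rng.normal(size=x0.shape)
x_t = mask * x0_noised + (1.0 - mask) * x_t   # blend before the next step
```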
Just want to say thank you for creating so many helpful examples, including data processing, training guide, etc.
I found myself smiling while reading train.md - it's so helpful to walk through an example like you did, and e.g. include diagrams to show the effects of options like only_mid_control. So much empathy for the reader :)
Thank you 🙏 🙏 🙏
Great project, and it is revolutionary for SD. I've modified the project and added it to my own application, wrapped in a Flask server. It works pretty well, but I would like to be able to use models without merging them, like the A1111 extension does, and the control weight is a much-needed option as well. Being able to set the denoising value would be good too.
We really need batch size or infinite generation, with outputs saved to a folder.
I wish they were ready now :/ I need to generate hundreds of images ASAP to make a good thumbnail :D
Can I use ControlNet and Stable Diffusion on a smartphone?
I didn't fully understand how the loss is calculated.
Regular diffusion models take an input image and add noise, and during the backward stage we learn how to undo that noise;
the loss is calculated based on how well we learn that noise at every step t in T.
So what I don't understand is: what is the role of the target image? (We add the noise to the input image.)
That is what the paper says.
I'm also not sure what the "task-specific conditions c_f" in the loss are - is that the target image?
Thanks
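As I read the paper, the noise is added to the latent of the target image, and the network, conditioned on the text prompt c_t and the task-specific control map c_f, predicts that noise; the loss is the MSE between the sampled and predicted noise. A NumPy sketch (shapes, the schedule value, and the zero predictor are made up for illustration):

```python
# Diffusion training objective, sketched: noise the target latent, predict the
# noise, take the MSE. eps_pred stands in for eps_theta(z_t, t, c_t, c_f).
import numpy as np

rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 8, 8))     # latent of the *target* image
eps = rng.normal(size=z0.shape)     # sampled noise
abar_t = 0.5                        # noise-schedule value at a random step t
z_t = np.sqrt(abar_t) * z0 + np.sqrt(1 - abar_t) * eps
eps_pred = np.zeros_like(eps)       # stand-in for the network's prediction
loss = np.mean((eps - eps_pred) ** 2)
```

So c_f is not the target image: it is the extra conditioning map (canny edges, pose, depth, ...), while the target image only enters through the noised latent z_t and the noise the network must recover.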
Can you guys release just the weights for the control part? In the webui it pretty much ignores the built-in SD 1.5 in your models and uses our own, so the issue is that it loads SD 1.5 twice and wastes VRAM...
I think weights with just the control part, without the additional SD weights, would still work, but I would need to test it first...
Or is there a way I can extract just the control weights on my own and reduce the file size?
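A sketch of the extraction idea, with the key prefix as an assumption (inspect your state dict's actual keys first): keep only the entries belonging to the ControlNet branch and drop the duplicated SD weights. The in-memory dict below stands in for torch.load of the full .pth.

```python
# Filter a checkpoint's state dict down to the ControlNet branch. The keys
# below are made up; "control_model." as the branch prefix is an assumption.
full_state = {
    "control_model.input_hint_block.0.weight": "tensor-a",
    "model.diffusion_model.out.2.weight": "tensor-b",
}
control_only = {k: v for k, v in full_state.items()
                if k.startswith("control_model.")}
# torch.save(control_only, "control_only.pth") would then write the smaller file
```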
Hi, curious how one would use an in-painting mask while also using the depth or posing control. I will update this if I figure it out.
It seems that I have already used many tricks, but training still OOMs on an 8 GB GPU. Inference is fine now.
This is strange because I know textual inversion and Dreambooth can be trained on 8 GB.
What is the secret of AUTOMATIC1111's optimization? Although xformers may help a bit, the current sliced attention should require even less memory than xformers.
Does it make sense to move the text encoder and VAE off the GPU when training?
The training documentation uses a single image for control. Even if the control is a black-and-white line image, it seems to be a 3-channel RGB image.
Is it possible to use 4 or more channels (RGBA, or multiple images concatenated) for control? Or is the 3-channel input a limitation of the model?
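The 3-channel count is not fundamental to the architecture: it is just the in_channels of the first convolution in the hint encoder, which could be widened to accept RGBA or stacked maps. A hedged sketch (layer sizes here are illustrative, not the repo's actual ones):

```python
# Widening the first hint-encoder conv lets it accept more than 3 channels.
import torch
import torch.nn as nn

hint_conv = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3, padding=1)
hint = torch.zeros(1, 6, 64, 64)    # e.g. two RGB control maps concatenated
out = hint_conv(hint)               # shape (1, 16, 64, 64)
```

Retraining (or at least re-initializing that layer) would of course be required, since the released weights expect 3 channels.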
(control) F:\ControlNet-main>python F:\ControlNet-main\gradio_canny2image.py
logging improved.
Enabled sliced_attention.
logging improved.
Enabled clip hacks.
cuda
cuda
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_any3_openpose.pth]
Running on local URL: http://0.0.0.0:7861
To create a public link, set share=True in launch().
Traceback (most recent call last):
File "F:\1\envs\control\lib\site-packages\gradio\routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 1015, in process_api
result = await self.call_function(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 833, in call_function
prediction = await anyio.to_thread.run_sync(
File "F:\1\envs\control\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "F:\1\envs\control\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "F:\1\envs\control\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "F:\ControlNet-main\gradio_canny2image.py", line 33, in process
detected_map = apply_canny(img, low_threshold, high_threshold)
NameError: name 'apply_canny' is not defined