lllyasviel / ControlNet
Let us control diffusion models!
License: Apache License 2.0
The tutorial_train.py script and documentation are fantastic.
One thing that is not clear is whether this automatically updates the control_sd15_ini.ckpt
file, or whether the script needs to be altered to save a new checkpoint.
Is there a Google Colab version?
Hi,
When I try to run the batch, I get this error after the first image:
AttributeError: 'numpy.ndarray' object has no attribute 'save'
I think it tries to save the second image (the sketch) but can't find a save method on it.
Can you help me?
Thanks
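That error usually means the image is still a NumPy array; .save() only exists on PIL images. A minimal sketch of the fix (the array here is a stand-in for the detected sketch map):

```python
# Convert the NumPy array to a PIL image before saving to fix the AttributeError.
import numpy as np
from PIL import Image

sketch = np.zeros((64, 64, 3), dtype=np.uint8)  # stand-in for the detected map
Image.fromarray(sketch).save("sketch.png")
```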
Will a version of GitHub Actions be created?
https://github.com/lllyasviel/ControlNet/actions
On an RTX 3060, currently testing the bird example; nothing appears after 200 seconds, just a Gradio error :/
Is it possible to apply this method to a custom SD (1.5) model without using a dedicated training set?
I would like to use your ControlNet + the canny edge detection but with a custom fine-tuned SD 1.5 model.
Is that possible without using a training set?
If not, how would one train such a canny-edge model, and what dataset would you use? Would you generate the dataset with the annotator (as in gradio_annotator.py)?
Thanks a lot and keep up this amazing work!
I'm having trouble getting the Gradio app to load. Most likely user error but I can't figure out my problem. I followed the instructions on the Readme. I downloaded the models and detectors from Huggingface and put them in the correct folders. But no matter what python file I try to run, I get a similar error. I'm trying on Windows 11.
Any ideas on what I'm doing wrong? Here is the error:
(control) C:\Users\BeauCarnes\Documents\ControlNet-main>python gradio_scribble2image.py
logging improved.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Traceback (most recent call last):
File "gradio_scribble2image.py", line 18, in <module>
model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cpu'))
File "C:\Users\BeauCarnes\Documents\ControlNet-main\cldm\model.py", line 18, in load_state_dict
state_dict = get_state_dict(torch.load(ckpt_path, map_location=torch.device(location)))
File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
Hi, thank you for the code, it works flawlessly.
I have one question though... I am trying to add more control to the network so it can be conditioned on two images instead of one. I am doing it by adding 3 new channels to the control image, i.e. dstacking the two images before training. Is this the right way, or do you have a better approach in mind?
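A minimal sketch of the stacking described above: two 3-channel control images combined into one 6-channel hint with np.dstack (the maps here are placeholders).

```python
# Stack two RGB control maps into a single 6-channel hint along the channel axis.
import numpy as np

control_a = np.zeros((512, 512, 3), dtype=np.float32)  # e.g. a depth map
control_b = np.ones((512, 512, 3), dtype=np.float32)   # e.g. an edge map
hint = np.dstack([control_a, control_b])               # shape (512, 512, 6)
```

Note that the first convolution of the hint encoder would then need to accept 6 input channels instead of 3.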
Hi,
Can this be changed to allow the use of other pose-estimation libraries that are commercially usable, such as tf-pose-estimation or PoseNet? This would require invoking tf-pose-estimation or PoseNet and converting its output to JSON in the same format as OpenPose.
Thanks.
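A hypothetical converter sketch: keypoints (x, y, confidence) from another pose estimator flattened into an OpenPose-style JSON body. The exact schema is an assumption here; check it against real OpenPose output before relying on it.

```python
# Flatten (x, y, confidence) triples into an OpenPose-style "pose_keypoints_2d"
# list. The detections and the JSON schema are illustrative assumptions.
import json

keypoints = [(120.0, 80.0, 0.9), (132.5, 150.0, 0.85)]  # made-up detections
pose_json = json.dumps({
    "people": [
        {"pose_keypoints_2d": [v for kp in keypoints for v in kp]}
    ]
})
```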
Just curious: if it helped convergence and helped with gradual adaptation to the new dataset, that would make sense to me.
Do you plan to make it available? I read in your paper that it's trained with SD 2.1.
How do we load the same image into ControlNet that is loaded into img2img, so we don't have to load it manually twice each time?
In the file ControlNet/annotator/uniformer/__init__.py, the following paths are not resolved relative to the module:
checkpoint_file = "annotator/ckpts/upernet_global_small.pth"
config_file = 'annotator/uniformer/exp/upernet_global_small/config.py'
For a lot of use cases, this means it will try to find those files relative to the directory the process was launched from, which causes errors.
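One common fix, as a sketch (not the repo's exact code): resolve the paths relative to the module file itself instead of the current working directory.

```python
# Build the annotator paths from this file's location so they work regardless
# of the directory the script is launched from.
import os

base = os.path.dirname(os.path.abspath(__file__))
checkpoint_file = os.path.join(base, "..", "ckpts", "upernet_global_small.pth")
config_file = os.path.join(base, "exp", "upernet_global_small", "config.py")
```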
When the image input is not square, I tried changing the canvas ratio and generating the image in portrait, but it doesn't seem to work right now. Or am I doing something wrong?
RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 8.00 GiB total capacity; 7.14 GiB already allocated; 0 bytes free; 7.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
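The allocator hint that this error message mentions can be set via an environment variable before PyTorch initializes CUDA; a sketch (64 MB is just an illustrative value, tune it for your GPU):

```python
# Set the CUDA caching-allocator config before importing torch / building the
# model, as suggested by the OOM message, to reduce fragmentation.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"
# import torch and load the model only after setting this
```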
What wonderful work! I'm thinking about how to merge this model with Dreambooth. I mean, how would one train it Dreambooth-style? Is it the same as with vanilla Stable Diffusion?
Hi! Thanks for the great work on ControlNet, it is no less than a game changer. Upon a closer look while trying to port it over to Draw Things, I found that because we duplicate parameters, we also have a time embedding passed into the ControlNet. This requires us to evaluate the network at each denoising step. If we could safely remove the time embedding, we wouldn't need to do that and evaluation could be faster.
Just wondering if you considered that and what's the thought process. Thanks!
I am training a custom control model using depth maps. In the image log, all of the "control images" are being saved with the lower half of the value range (below 128 in the .png, below 0.0 in torch space) clipped to black.
In this example the left is the source depth map, and the right is the output "control image" log.
All of the color images (reconstruction and sample) come out as proper full color images.
Is this just a bug in how log images are saved? Or is there something wrong with how the control images are being converted for training?
I am using the provided unmodified tutorial_dataset.py
and tutorial_train.py
scripts.
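One possible explanation (a guess, not a confirmed diagnosis): if the [0, 1] control map gets shifted into [-1, 1] somewhere before an image writer that clamps to [0, 1], everything below the midpoint collapses to black, which matches the symptom described above.

```python
# Illustration of the hypothesized bug: a [0, 1] map rescaled to [-1, 1] and
# then clamped to [0, 1] loses its entire lower half.
import numpy as np

control = np.linspace(0.0, 1.0, 5)    # original depth values in [0, 1]
shifted = control * 2.0 - 1.0         # rescaled to [-1, 1] (hypothesized step)
saved = np.clip(shifted, 0.0, 1.0)    # writer clamps to [0, 1]
# saved is [0.0, 0.0, 0.0, 0.5, 1.0]: the lower half turns black
```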
Are you guys planning on eventually training an openpose model, including handpose data?
https://github.com/Hzzone/pytorch-openpose#hand-pose-estimation
I feel like if we had an openpose model usable with hand data, Stable Diffusion would finally be able to reproduce hands/fingers consistently.
I'm assuming the biggest hurdle will be obtaining enough annotated data to produce such a model in the first place, though.
There are a lot of scripts in this repo, so a way to keep track of all of their outputs is needed.
Each script should save outputs to its own folder, like the AUTOMATIC1111 webui does; this is necessary when hundreds or thousands of outputs are generated. E.g., when using gradio_canny2image.py, images should be saved in an output folder named something like "canny2image".
Also, the way the AUTOMATIC1111 webui names the image files in the output folder would be much more helpful for determining how an output was generated. The settings used to generate the image are also embedded in the generated PNG file, including the prompt, negative prompt, steps, sampler, CFG scale, seed, size, and model hash.
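A sketch of the requested layout (the folder name and the settings encoded in the filename are illustrative): one subfolder per script, with generation parameters in the filename.

```python
# Per-script output folder plus a filename that records the generation settings.
import os
import time

script = "canny2image"                          # illustrative script name
outdir = os.path.join("outputs", script)
os.makedirs(outdir, exist_ok=True)
filename = f"{int(time.time())}-seed12345-steps20-cfg9.png"
path = os.path.join(outdir, filename)           # where the image would be saved
```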
Hello, I really appreciate this interesting work. I want to know whether to train the SD encoder layers in ControlNet, or only enable gradient calculation for the SD encoder layers while keeping their parameters unchanged?
Is ControlNet compatible with macOS?
I could install it and see it in the webUI, but when I use it, I get the following error:
Loading preprocessor: canny, model: control_sd15_canny [fef5e48e]
Loaded state_dict from [/Users/koen/Documents/SD/stable-diffusion-webui/extensions/sd-webui-controlnet/models/control_sd15_canny.pth]
ControlNet model control_sd15_canny [fef5e48e] loaded.
0%| | 0/16 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
koen@Koens-MacBook-Pro-2 stable-diffusion-webui % /opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Shortly after "Python quit unexpectedly"
Koen
As the subject says: is it possible to install this in a venv instead of using conda? I don't like all the virtual environments that get created on C: all the time; it slowly fills up C: instead of just the folder where the program is run. venv is so much easier to handle.
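A minimal sketch of the venv route (you would still need to install the packages listed in environment.yaml with pip afterwards):

```shell
# Create and use a venv inside the ControlNet folder instead of a conda env on C:
python3 -m venv .venv
. .venv/bin/activate        # on Windows: .venv\Scripts\activate
```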
Python version: 3.10
MMCV version: 1.7.1
System: windows 10
Error content:
File "A:\ControlNet-main\annotator\uniformer\mmcv\utils\ext_loader.py", line 15, in load_ext
    assert hasattr(ext, fun), f'{fun} miss in module {name}'
AssertionError: top_pool_forward miss in module _ext
When will ControlNet be published with Anime Line Drawing?
@lllyasviel
Hi, love this, it looks fantastic. I knew there must be a way of combining OpenPose with diffusion.
Had it working great on a previous commit.
Rebuilt the environment from a fresh pull and re-downloaded the models.
Now it fails after:
Loaded model config from [./models/cldm_v15.yaml]
Killed
Any help?
Thanks!
Title says it all; just wondering if there's a way to do it (I'm only familiar with the diffusers way).
Hi, I have a question regarding the additional retraining of your model. Imagine that I want to reuse segmentation conditioning and combine it together with depth estimation conditioning. Is this achievable?
BTW, you should definitely make this project an AUTOMATIC1111 SD webui plugin and integrate it into the diffusers library. It will be a hit!
-1 seed not working :/
Hey,
Thanks for your nice work. Scribble works really well for generation. But there are a lot of scenarios where we want the input image to contain colors as well as strokes, and we want SD to process it. For example, this image:
It has the scribble and the color, and the model should take both the color and the sketch as input, but the current implementation only considers the sketch, so the input color is ignored.
Example is above.
If you could train a model that takes a colored scribble and produces results according to it, that would be perfect.
Thanks
For now, pose2image requires an image of a real person. If the skeleton cannot be manipulated due to a problem with the Gradio UI, I wonder if it is possible to create pose data in another program and import it instead.
There are developers who are making something similar using Blender.
https://twitter.com/toni_nimono/status/1624642604666339329
https://twitter.com/toni_nimono/status/1624797906799833088
Originally posted by lllyasviel February 12, 2023
We plan to train some models with "double controls", using two concatenated control maps, and we are considering using images with holes as the second control map. This will lead to models like "depth-aware inpainting" or "canny-edge-aware inpainting". Please also let us know if you have good suggestions.
This is a re-post. Please go to Discussions for discussion.
In the paper, there is the masked diffusion example (Fig. 16). I see that in the sampling functions (e.g. the ddim_sampling function) the mask image and the initial image (x0) are also input parameters, but their default values are None. I couldn't make it work when I want to use a mask image and an initial image. Please provide an example script of masked diffusion usage. Specifically, how do we prepare x0, x_T, and mask?
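Not an official answer, but in LDM-style DDIM samplers the mask is typically applied by re-imposing a noised copy of x0 on the kept region at every denoising step. A NumPy sketch of that blending (shapes and the schedule value are made up for illustration):

```python
# Mask blending as done in LDM-style samplers: at each step, the known region
# is replaced by x0 noised to the current timestep, the rest keeps evolving.
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))            # latent of the initial image
x_t = rng.normal(size=x0.shape)            # current sample (x_T at the start)
mask = np.zeros((1, 8, 8))
mask[..., :4] = 1.0                        # 1 = keep the original content here
abar_t = 0.5                               # cumulative schedule value at step t
x0_noised = np.sqrt(abar_t) * x0 + np.sqrt(1 - abar_t) * rng.normal(size=x0.shape)
x_t = mask * x0_noised + (1.0 - mask) * x_t   # blend before the next step
```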
Just want to say thank you for creating so many helpful examples, including data processing, training guide, etc.
I found myself smiling while reading train.md - it's so helpful to walk through an example like you did, and e.g. include diagrams to show the effects of options like only_mid_control. So much empathy for the reader :)
Thank you 🙏 🙏 🙏
Great project, and it is revolutionary for SD. I've modified the project and added it to my own application, wrapped in a Flask server. It works pretty well, but I would like to be able to use models without merging them, like the A1111 extension does, and the control weight is a much-needed option as well. Being able to set the denoising value would be good too.
We really need batch size or infinite generation, with outputs saved to a folder.
I wish they were ready now :/ I need to generate hundreds of images ASAP to make a good thumbnail :D
Can I use ControlNet and Stable Diffusion on a smartphone?
I didn't fully understand how the loss is calculated.
Regular diffusion models take an input image and add noise, and during the backward stage we learn how to undo that noise;
the loss is calculated based on how well we learn that noise at every step t in T.
So what I don't understand is: what is the role of the target image? (We add the noise to the input image.)
That is what the paper says.
I'm also not sure what the "task-specific conditions c_f" in the loss are - is that the target image?
Thanks
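As I read the paper, the noise is added to the latent of the target image, and the network, conditioned on the text prompt c_t and the task-specific control map c_f, predicts that noise; the loss is the MSE between the sampled and predicted noise. A NumPy sketch (shapes, the schedule value, and the zero predictor are made up for illustration):

```python
# Diffusion training objective, sketched: noise the target latent, predict the
# noise, take the MSE. eps_pred stands in for eps_theta(z_t, t, c_t, c_f).
import numpy as np

rng = np.random.default_rng(0)
z0 = rng.normal(size=(4, 8, 8))     # latent of the *target* image
eps = rng.normal(size=z0.shape)     # sampled noise
abar_t = 0.5                        # noise-schedule value at a random step t
z_t = np.sqrt(abar_t) * z0 + np.sqrt(1 - abar_t) * eps
eps_pred = np.zeros_like(eps)       # stand-in for the network's prediction
loss = np.mean((eps - eps_pred) ** 2)
```

So c_f is not the target image: it is the extra conditioning map (canny edges, pose, depth, ...), while the target image only enters through the noised latent z_t and the noise the network must recover.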
Can you guys release just the weights for the control part? In the webui it pretty much ignores the built-in SD 1.5 in your models and uses our own, so the issue is that it loads SD 1.5 twice and wastes VRAM...
I think weights with just the control part, without the additional SD weights, would still work, but I would need to test it first...
Or is there a way I can extract just the control weights on my own and reduce the file size?
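A sketch of the extraction idea, with the key prefix as an assumption (inspect your state dict's actual keys first): keep only the entries belonging to the ControlNet branch and drop the duplicated SD weights. The in-memory dict below stands in for torch.load of the full .pth.

```python
# Filter a checkpoint's state dict down to the ControlNet branch. The keys
# below are made up; "control_model." as the branch prefix is an assumption.
full_state = {
    "control_model.input_hint_block.0.weight": "tensor-a",
    "model.diffusion_model.out.2.weight": "tensor-b",
}
control_only = {k: v for k, v in full_state.items()
                if k.startswith("control_model.")}
# torch.save(control_only, "control_only.pth") would then write the smaller file
```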
Hi, curious how one would use an in-painting mask while also using the depth or posing control. I will update this if I figure it out.
It seems that I have already used many tricks, but training still OOMs on an 8 GB GPU. Inference is fine now.
This is strange because I know textual inversion and Dreambooth can be trained on 8 GB.
What is the secret of AUTOMATIC1111's optimization? Although xformers may help a bit, the current sliced attention should require even less memory than xformers.
Does it make sense to move the text encoder and VAE off the GPU when training?
The training documentation uses a single image for control. Even if the control is a black-and-white line image, it seems to be a 3-channel RGB image.
Is it possible to use 4 or more channels (RGBA, or multiple images concatenated) for control? Or is the 3-channel input a limitation of the model?
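The 3-channel count is not fundamental to the architecture: it is just the in_channels of the first convolution in the hint encoder, which could be widened to accept RGBA or stacked maps. A hedged sketch (layer sizes here are illustrative, not the repo's actual ones):

```python
# Widening the first hint-encoder conv lets it accept more than 3 channels.
import torch
import torch.nn as nn

hint_conv = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=3, padding=1)
hint = torch.zeros(1, 6, 64, 64)    # e.g. two RGB control maps concatenated
out = hint_conv(hint)               # shape (1, 16, 64, 64)
```

Retraining (or at least re-initializing that layer) would of course be required, since the released weights expect 3 channels.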
(control) F:\ControlNet-main>python F:\ControlNet-main\gradio_canny2image.py
logging improved.
Enabled sliced_attention.
logging improved.
Enabled clip hacks.
cuda
cuda
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_any3_openpose.pth]
Running on local URL: http://0.0.0.0:7861
To create a public link, set share=True in launch().
Traceback (most recent call last):
File "F:\1\envs\control\lib\site-packages\gradio\routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 1015, in process_api
result = await self.call_function(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 833, in call_function
prediction = await anyio.to_thread.run_sync(
File "F:\1\envs\control\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "F:\1\envs\control\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "F:\1\envs\control\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
result = context.run(func, *args)
File "F:\ControlNet-main\gradio_canny2image.py", line 33, in process
detected_map = apply_canny(img, low_threshold, high_threshold)
NameError: name 'apply_canny' is not defined