
controlnet's People

Contributors

camenduru, eltociear, lllyasviel, scarbain, sethupavan12, williamyang1991


controlnet's Issues

tutorial_train.py save checkpoint

The tutorial_train.py script and documentation are fantastic.

One thing that is not clear is whether this automatically updates the control_sd15_ini.ckpt file, or whether the script needs to be altered to save a new checkpoint.
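
As far as I can tell, the script does not overwrite control_sd15_ini.ckpt; PyTorch Lightning only writes its default checkpoints under lightning_logs/. A minimal sketch of adding explicit checkpointing via Lightning's ModelCheckpoint callback (the exact Trainer arguments in the repo may differ; 'logger', 'model', and 'dataloader' are the objects already defined in tutorial_train.py):

import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

checkpoint_callback = ModelCheckpoint(
    dirpath='./checkpoints',      # where new .ckpt files are written
    filename='control-{step}',    # e.g. control-step=5000.ckpt
    every_n_train_steps=5000,     # save every 5000 training steps
    save_top_k=-1,                # keep all checkpoints
)

trainer = pl.Trainer(
    gpus=1,
    precision=32,
    callbacks=[logger, checkpoint_callback],
)
trainer.fit(model, dataloader)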

img2img Batch error

Hi,
When I try to run a batch, I get this error after the first image:
AttributeError: 'numpy.ndarray' object has no attribute 'save'

I think it tries to save the second image (the sketch) but can't find a save method.

Can you help me?
Thanks
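
For what it's worth, that error usually means a NumPy array is being passed where a PIL Image is expected. A minimal sketch of the usual fix (save_array is just an illustrative helper name, not existing code in the repo):

import numpy as np
from PIL import Image

def save_array(arr, path):
    # PIL expects uint8 data; clip/convert if the array is float
    arr = np.asarray(arr)
    if arr.dtype != np.uint8:
        arr = np.clip(arr, 0, 255).astype(np.uint8)
    Image.fromarray(arr).save(path)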

Custom SD model without extra trainingset

Is it possible to apply this method to a custom SD (1.5) model without using a dedicated training set?
I would like to use your ControlNet + the canny edge detection but with a custom fine-tuned SD 1.5 model.
Is that possible without using a training set?
If not, how would one train such a model for canny edge detection, and what dataset would you use? Would you generate the dataset with the annotator (as in gradio_annotator.py)?

Thanks a lot and keep up this amazing work!
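
Not the author, but I believe the repo's tool_transfer_control.py is meant for exactly this: moving a released control model onto a different SD 1.5 base without retraining. Below is only a rough sketch of the idea; the paths and key handling are simplified assumptions, not the exact script:

import torch

sd15 = torch.load('./models/v1-5-pruned.ckpt', map_location='cpu')['state_dict']
control = torch.load('./models/control_sd15_canny.pth', map_location='cpu')
custom = torch.load('./models/my_custom_sd15.ckpt', map_location='cpu')['state_dict']   # hypothetical path

out = {}
for k, v in control.items():
    if k in sd15 and k in custom:
        # move the base-model part onto the custom weights, keeping the
        # learned offset: custom + (control_sd15 - original_sd15)
        out[k] = custom[k] + (v - sd15[k])
    else:
        # pure ControlNet weights (control_model.*) are copied unchanged
        out[k] = v

# (the released .pth files appear to store a flat state dict; adjust if yours
# nests everything under a 'state_dict' key)
torch.save(out, './models/control_custom_canny.pth')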

Help w/ debugging Issues with Scribbles

I've not been able to get high quality results using scribble2image. Any ideas on where I should look?

Model: control_sd15_scribble.pth
Sampler: DDIM, Steps: 30
Resolution: 512
Prompt: photo of an owl

[attached image: example scribble input and generated result]

Can't get gradio app to load: pickle.UnpicklingError: invalid load key, 'v'

I'm having trouble getting the Gradio app to load. It's most likely user error, but I can't figure out my problem. I followed the instructions in the README, downloaded the models and detectors from Hugging Face, and put them in the correct folders. But no matter which Python file I try to run, I get a similar error. I'm trying on Windows 11.

Any ideas on what I'm doing wrong? Here is the error:

(control) C:\Users\BeauCarnes\Documents\ControlNet-main>python gradio_scribble2image.py
logging improved.
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Traceback (most recent call last):
  File "gradio_scribble2image.py", line 18, in <module>
    model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cpu'))
  File "C:\Users\BeauCarnes\Documents\ControlNet-main\cldm\model.py", line 18, in load_state_dict
    state_dict = get_state_dict(torch.load(ckpt_path, map_location=torch.device(location)))
  File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\BeauCarnes\miniconda3\envs\control\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
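
For what it's worth, this error often means the .pth on disk is a Git LFS pointer (a small text file) rather than the real multi-gigabyte checkpoint. A quick sanity check, assuming that cause:

import os

path = './models/control_sd15_scribble.pth'
print('size (bytes):', os.path.getsize(path))
with open(path, 'rb') as f:
    print('first bytes:', f.read(64))
# A real checkpoint is several GB and starts with pickle/zip magic bytes;
# if you instead see "version https://git-lfs.github.com/spec/v1",
# re-download the actual file from the Hugging Face page.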

Correct procedure for adding more conditioning/control

Hi, thank you for the code, it works flawlessly.
I have one question though... I am trying to add more control to the network so it can be conditioned on two images instead of one. I am doing it by adding 3 new channels to the control image, i.e. dstacking the two images before they go into training. Is that the right way, or do you have a better approach in mind?
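
A minimal sketch of the concat approach, assuming two aligned HxWx3 condition maps and the [0, 1] normalization used in tutorial_dataset.py (dummy arrays here, just for illustration):

import numpy as np

# two aligned single-condition maps, HxWx3 uint8
hint_a = np.zeros((512, 512, 3), dtype=np.uint8)   # e.g. a depth map
hint_b = np.zeros((512, 512, 3), dtype=np.uint8)   # e.g. a segmentation map

# stack into one HxWx6 control image, normalized to [0, 1]
hint = np.dstack([hint_a, hint_b]).astype(np.float32) / 255.0
print(hint.shape)   # (512, 512, 6)

# Note: the hint encoder's first conv (hint_channels in cldm_v15.yaml) must
# then be changed from 3 to 6 input channels and that layer retrained.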

[Request] Enable the use of other pose estimation libraries

Hi,

Could this be changed to allow the use of other pose estimation libraries that are commercially available, such as tf-pose-estimation or PoseNet? This would require invoking tf-pose-estimation or PoseNet and converting its output to JSON in the same format as OpenPose.

Thanks.
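
Not an answer from the authors, but one way to do this without matching the JSON format at all is to rasterize the other library's keypoints directly into an OpenPose-style skeleton image and feed that as the detected map. A rough sketch; the keypoint values, limb list, and colors below are simplified placeholders, not the repo's draw_bodypose:

import cv2
import numpy as np

# (x, y) pixel coordinates in OpenPose/COCO keypoint order; None = not detected.
# Dummy values for illustration.
keypoints = [(256, 100), (256, 160), (200, 160), (180, 230), (170, 300),
             (312, 160), (332, 230), (342, 300), (220, 320), (210, 420),
             (205, 500), (292, 320), (302, 420), (307, 500), (245, 90),
             (267, 90), (230, 95), (282, 95)]

# limb connections in OpenPose order (subset shown)
limbs = [(1, 2), (1, 5), (2, 3), (3, 4), (5, 6), (6, 7),
         (1, 8), (8, 9), (9, 10), (1, 11), (11, 12), (12, 13), (0, 1)]

canvas = np.zeros((512, 512, 3), dtype=np.uint8)
for a, b in limbs:
    if keypoints[a] is not None and keypoints[b] is not None:
        cv2.line(canvas, keypoints[a], keypoints[b], (0, 255, 0), 4)
for p in keypoints:
    if p is not None:
        cv2.circle(canvas, p, 4, (0, 0, 255), -1)

cv2.imwrite('pose_hint.png', canvas)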

Why init at zero for the convs?

Just curious: did initializing the convs at zero help convergence, or help with gradual adaptation to the new dataset? If so, I would understand it.
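
For reference, the zero convs look roughly like the zero_module helper used in the repo (sketch below; the channel count is just an example). Because they output exactly zero at the start of training, the ControlNet branch initially adds nothing to the frozen SD features, so training starts from the unmodified SD behaviour and the control signal fades in gradually as the weights move away from zero:

import torch.nn as nn

def zero_module(module):
    """Zero out the parameters of a module and return it."""
    for p in module.parameters():
        p.detach().zero_()
    return module

zero_conv = zero_module(nn.Conv2d(320, 320, kernel_size=1, padding=0))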

Config file path for uniformer annotator is hardcoded

In the file ControlNet/annotator/uniformer/__init__.py, the following paths are hardcoded relative to the working directory rather than the package:
checkpoint_file = "annotator/ckpts/upernet_global_small.pth"
config_file = 'annotator/uniformer/exp/upernet_global_small/config.py'

For many use cases this means the files are only found when running from the repository root, which causes errors.
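
A possible fix (sketch only): resolve both files relative to the annotator package instead of the current working directory, e.g. in annotator/uniformer/__init__.py:

import os

here = os.path.dirname(os.path.abspath(__file__))
checkpoint_file = os.path.join(here, '..', 'ckpts', 'upernet_global_small.pth')
config_file = os.path.join(here, 'exp', 'upernet_global_small', 'config.py')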

Low VRAM tests (now 8GB ok, begin to solve 6gb)

RuntimeError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 8.00 GiB total capacity; 7.14 GiB already allocated; 0 bytes free; 7.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
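
Two things that seem to help on small cards, as far as I can tell from the repo: enabling the save_memory switch in config.py (the gradio scripts then call model.low_vram_shift(...) to shuttle sub-models between CPU and GPU), and reducing allocator fragmentation. A sketch, to be verified against your checkout:

import os

# set before torch is imported to reduce CUDA allocator fragmentation
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:64'

# then, in config.py (read by the gradio_*.py scripts):
# save_memory = True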

merge with Dreambooth

What wonderful work! I'm thinking about how this model could be merged with Dreambooth. I mean, how would one train it Dreambooth-style? Is it just the same as with vanilla Stable Diffusion?

Time embedding: why is it needed in the separate network?

Hi! Thanks for the great work on ControlNet, it is no less than a game changer. Taking a closer look while trying to port it over to Draw Things, I find that because the parameters are duplicated, a time embedding is also passed into the ControlNet branch. This requires evaluating that network at each denoising step. If the time embedding could safely be removed, we wouldn't need to do that and evaluation could be faster.

Just wondering if you considered that and what's the thought process. Thanks!

control depth clipping in image log

I am training a custom control model using depth maps. In the image log all of the "control images" are being saved with the lower half of color values (less than 128 from the .png, less than 0.0 in torch space) clipped to black.

In this example the left is the source depth map, and the right is the output "control image" log.
[attached image: source depth map (left) vs. logged control image (right)]

All of the color images (reconstruction and sample) come out as proper full color images.

Is this just a bug in how log images are saved? Or is there something wrong with how the control images are being converted for training?

I am using the provided unmodified tutorial_dataset.py and tutorial_train.py scripts.

An outputs folder is needed

There are a lot of scripts in this repo, so a way to keep track of all of them is needed.

Each script should save its outputs to its own folder, similar to how the AUTOMATIC1111 webui saves images; this is necessary when generating hundreds or thousands of outputs. E.g. output from gradio_canny2image.py should be saved in a folder named something like "canny2image".

Also, the way the AUTOMATIC1111 webui names output filenames would make it much easier to determine how each output was generated. That webui also embeds the generation settings in the PNG file, including the prompt, negative prompt, steps, sampler, CFG scale, seed, size, and model hash.
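
A sketch of what I mean (save_outputs and the folder layout are just illustrative, not existing code): give each script its own folder under outputs/ and timestamp the filenames:

import os
import time

import numpy as np
from PIL import Image

def save_outputs(results, script_name='canny2image'):
    """Save the list of HxWx3 uint8 arrays returned by a script's process()."""
    out_dir = os.path.join('outputs', script_name)
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime('%Y%m%d-%H%M%S')
    for i, arr in enumerate(results):
        Image.fromarray(np.asarray(arr, dtype=np.uint8)).save(
            os.path.join(out_dir, f'{stamp}-{i:02d}.png'))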

Is the SD encoder in ControlNet trainable?

Hello, I really appreciate this interesting work. I want to know whether the SD encoder layers inside ControlNet are trained, or whether gradient calculation is merely enabled for them while their parameters are kept unchanged.
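
For anyone else wondering: as I read the training docs, the copied encoder inside the ControlNet branch is trained, the original SD weights stay frozen, and sd_locked=False additionally unlocks the original SD decoder. The related switches in tutorial_train.py look roughly like this (approximate excerpt):

sd_locked = True          # keep the original SD decoder frozen
only_mid_control = False  # if True, only the middle-block control is applied

model.sd_locked = sd_locked
model.only_mid_control = only_mid_control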

MacOS?

Is ControlNet compatible with macOS?
I could install it and see it in the webUI, but when I use it, I get the following error:

Loading preprocessor: canny, model: control_sd15_canny [fef5e48e]
Loaded state_dict from [/Users/koen/Documents/SD/stable-diffusion-webui/extensions/sd-webui-controlnet/models/control_sd15_canny.pth]
ControlNet model control_sd15_canny [fef5e48e] loaded.
0%| | 0/16 [00:00<?, ?it/s]loc("mps_add"("(mpsFileLoc): /AppleInternal/Library/BuildRoots/a0876c02-1788-11ed-b9c4-96898e02b808/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShadersGraph/mpsgraph/MetalPerformanceShadersGraph/Core/Files/MPSGraphUtilities.mm":219:0)): error: input types 'tensor<2x1280xf32>' and 'tensor<*xf16>' are not broadcast compatible
LLVM ERROR: Failed to infer result type(s).
zsh: abort ./webui.sh
koen@Koens-MacBook-Pro-2 stable-diffusion-webui % /opt/homebrew/Cellar/[email protected]/3.10.10/Frameworks/Python.framework/Versions/3.10/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

Shortly after that, macOS reports "Python quit unexpectedly".

Koen

Venv instead of Conda

As the subject says: is it possible to install this in venv instead of using Conda? I don't like all the virtual environments that get created on C: all the time. It slowly fills up C: instead of just filling up the folder where the program is run. Venv is so much easier to handle.

Report an error when using Uniformer segmentation

Python version: 3.10
MMCV version: 1.7.1
System: windows 10
Error content:
File "A:\ControlNet-main\annotator\uniformer\mmcv\utils\ext_loader.py", line 15, in load_ext assert hasattr(ext, fun), f'{fun} miss in module {name}' AssertionError: top_pool_forward miss in module _ext

Not working now, was going great before

Hi, love this, it looks fantastic. I knew there must be a way of combining OpenPose with diffusion.
I had it working great on a previous commit.
I rebuilt the environment, did a fresh pull, and re-downloaded the models.
Now it fails after:
Loaded model config from [./models/cldm_v15.yaml]
Killed

Any help?

Is it possible to reuse your conditioning?

Hi, I have a question regarding the additional retraining of your model. Imagine that I want to reuse segmentation conditioning and combine it together with depth estimation conditioning. Is this achievable?

By the way, you should definitely make this project an AUTOMATIC1111 SD webui plugin and integrate it into the diffusers library. It will be a hit!

Colored Scribble input

Hey,
Thanks for your nice work. Scribble works really well for generation. But there are a lot of scenarios where we want the input image to contain colors as well as strokes and we want SD to process both. For example this image:
[attached image: colored scribble input]
It has the scribble and the color as well, and the model should take both the color and the sketch as input, but the current implementation only considers the sketch, so the input color is ignored.
[attached image: generated result without the input colors]
Example is above.

If you could train a model that takes the colored scribble and produces results according to it, that would be perfect.

Thanks

[Double Control] What double-control model is most needed?

Discussed in #30

Originally posted by lllyasviel February 12, 2023
We plan to train some models with "double controls" that use two concatenated control maps, and we are considering using images with holes as the second control map. This will lead to models like "depth-aware inpainting" or "canny-edge-aware inpainting". Please also let us know if you have good suggestions.

This is a re-post. Please go to the discussion thread for discussion.

Masked diffusion example

In the paper, there is a masked diffusion example (Fig. 16). I see that in the sampling functions (e.g. the ddim_sampling function) the mask image and the initial image (x0) are also input parameters, but their default values are None. I couldn't make it work when I wanted to use a mask image and an initial image. Please provide an example script of masked diffusion usage. Specifically, how do we prepare x0, x_T, and the mask?

Thank you
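
A sketch of how I would prepare x0 and the mask, based on how ddim_sampling blends them (img = q_sample(x0, t) * mask + (1 - mask) * img, so mask = 1 keeps the init image). The names model, ddim_sampler, cond, un_cond, ddim_steps, num_samples, and scale are the objects already built in the gradio_*2image.py scripts; the paths, shapes, and scaling below are my assumptions, not a verified recipe:

import cv2
import numpy as np
import torch

# init image and mask loaded as numpy arrays (placeholder file names)
init_image_np = cv2.imread('init.png')[:, :, ::-1].copy()          # 512x512x3 RGB uint8
mask_np = (cv2.imread('mask.png', cv2.IMREAD_GRAYSCALE) > 127)      # HxW bool
mask_np = cv2.resize(mask_np.astype(np.float32), (64, 64))          # latent resolution

# (1, 3, 512, 512) float in [-1, 1]
init = torch.from_numpy(init_image_np.astype(np.float32) / 127.5 - 1.0)
init = init.permute(2, 0, 1)[None].cuda()

# encode to latent space -> x0, shape (1, 4, 64, 64)
x0 = model.get_first_stage_encoding(model.encode_first_stage(init))

# mask at latent resolution: 1 = keep the init image content, 0 = regenerate
mask = torch.from_numpy(mask_np)[None, None].cuda()                 # (1, 1, 64, 64)

# x_T can stay None (random starting noise); just pass mask and x0 to sample()
samples, _ = ddim_sampler.sample(
    ddim_steps, num_samples, (4, 64, 64), cond,
    verbose=False, eta=0.0,
    unconditional_guidance_scale=scale,
    unconditional_conditioning=un_cond,
    mask=mask, x0=x0)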

Just want to say thank you for creating so many helpful examples, including data processing, training guide, etc.

I found myself smiling while reading train.md - it's so helpful to walk through an example like you did, and e.g. include diagrams to show the effects of options like only_mid_control. So much empathy for the reader :)

Thank you 🙏 🙏 🙏

Request: Implement as much as possible from the Mikubill repo

Great project, and it is revolutionary for SD. I've modified the project and added it to my own application, wrapped in a Flask server. It works pretty well, but I would like to be able to use models without merging them, like the A1111 extension does; the control weight is a much-needed option as well. Being able to set the denoising value would be good too.

Can you add batch size or infinite generation?

We really need a batch size option or infinite generation, with outputs saved to a folder.

I wish they were ready now :/ I need to generate hundreds of images ASAP to make a good thumbnail :D
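
As a stopgap until it's built in, the script's process() function can just be called in a loop and every result written to disk. A sketch meant to be pasted at the bottom of gradio_canny2image.py in place of launching the Gradio UI; the argument order should be checked against the process() in the script you are actually using, and the input values here are only illustrative:

import os

import cv2
from PIL import Image

input_image = cv2.imread('my_sketch.png')[:, :, ::-1].copy()   # BGR -> RGB
prompt, a_prompt, n_prompt = 'a cat', 'best quality', 'lowres, bad anatomy'
num_samples, image_resolution, ddim_steps = 1, 512, 20
scale, eta = 9.0, 0.0
low_threshold, high_threshold = 100, 200

os.makedirs('outputs', exist_ok=True)
for i in range(200):   # or: while True, for "infinite" generation
    # seed=-1 so each call draws a fresh random seed
    results = process(input_image, prompt, a_prompt, n_prompt, num_samples,
                      image_resolution, ddim_steps, scale, -1, eta,
                      low_threshold, high_threshold)
    for j, arr in enumerate(results):
        Image.fromarray(arr).save(f'outputs/batch_{i:04d}_{j}.png')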

ControlNet

Can I use ControlNet and Stable Diffusion on a smartphone?

How to calculate the loss

I didn't fully understand how the loss is calculated.

Regular diffusion models take an input image, add noise to it, and during the backward stage we learn how to undo that noise; the loss is calculated based on how well we predict that noise at every step t.

So what I don't understand is: what is the role of the target image? (We add the noise to the input image.)

That is what the paper says:

[screenshot of the loss equation from the paper]

I am also not sure what the "task-specific conditions c_f" in the loss are. Is that the target image?

Thanks
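
For reference, as I read the paper the objective is the usual noise-prediction loss with the extra condition appended:

\mathcal{L} = \mathbb{E}_{z_0,\, t,\, c_t,\, c_f,\, \epsilon \sim \mathcal{N}(0,1)} \left[ \left\lVert \epsilon - \epsilon_\theta(z_t, t, c_t, c_f) \right\rVert_2^2 \right]

Here z_0 is the latent of the target image, z_t is that latent with noise added at step t, c_t is the text prompt, and c_f is the task-specific condition, i.e. the control map (for example the canny edges), not the target image. The target image only enters the loss through z_0 / z_t, exactly as in vanilla Stable Diffusion.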

[REQUEST] Release of control weights without SD1.5 model built in

Can you guys release just the control weights? In the webui it pretty much ignores the built-in SD 1.5 from your models and uses our own, so the issue is that it loads SD 1.5 twice and wastes VRAM...
I think weights with just the control part, without the additional SD weights, would still work, but I would need to test it first...
Or is there a way I can extract just the control weights on my own and reduce the file size?
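
A sketch of extracting only the ControlNet branch yourself; my assumption, which should be verified against the checkpoint, is that those weights are the keys prefixed with "control_model." in the bundled .pth files:

import torch

full = torch.load('./models/control_sd15_canny.pth', map_location='cpu')
control_only = {k: v for k, v in full.items() if k.startswith('control_model.')}
torch.save(control_only, './models/control_sd15_canny_controlonly.pth')
print(f'kept {len(control_only)} of {len(full)} tensors')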

Training still OOM on 8GB gpu.

It seems that I have already used many tricks, but training still OOMs on an 8GB GPU. Inference is fine now, though.

This is strange because I know textual inversion or Dreambooth can be trained on 8GB.

What is the secret of Automatic1111's optimization? Although xformers may help a bit, the current sliced attention should require even less memory than xformers.

Does it make sense to move the text encoder and VAE off the GPU when training?

NameError: name 'apply_canny' is not defined

(control) F:\ControlNet-main>python F:\ControlNet-main\gradio_canny2image.py
logging improved.
Enabled sliced_attention.
logging improved.
Enabled clip hacks.
cuda
cuda
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Loaded model config from [./models/cldm_v15.yaml]
Loaded state_dict from [./models/control_any3_openpose.pth]
Running on local URL: http://0.0.0.0:7861

To create a public link, set share=True in launch().
Traceback (most recent call last):
File "F:\1\envs\control\lib\site-packages\gradio\routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 1015, in process_api
result = await self.call_function(
File "F:\1\envs\control\lib\site-packages\gradio\blocks.py", line 833, in call_function
prediction = await anyio.to_thread.run_sync(
File "F:\1\envs\control\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "F:\1\envs\control\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "F:\1\envs\control\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "F:\ControlNet-main\gradio_canny2image.py", line 33, in process
detected_map = apply_canny(img, low_threshold, high_threshold)
NameError: name 'apply_canny' is not defined
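
Judging from the log (control_any3_openpose.pth, "Enabled clip hacks"), this looks like a modified copy of gradio_canny2image.py that lost the annotator setup near the top of the original script, which is roughly:

from annotator.canny import CannyDetector

apply_canny = CannyDetector()

Also note that the loaded checkpoint is an openpose model, so the canny script may not be the one you actually meant to run.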
