This paper is about adding an extra condition to an existing text-conditioned generative model. The paper mentions many tasks (e.g. controlling generation with canny edges, with Hough lines, with user scribbles, etc.). For the canny-edge input, c_f is the canny-edge image obtained from the ground-truth image.
from controlnet.
So c_f is the ground truth (the canny-edge map for that task), and your model learns to make predictions from that input. How can you then use that model on another/test image?
During training, there is a set of such inputs and targets ((canny_edge_1, target_image_1), (canny_edge_2, target_image_2), ..., (canny_edge_N, target_image_N)), and we minimize the loss over this dataset. The ultimate goal of all neural-net training is generalization, meaning the model also works reasonably well on test images it did not see during training. Hope this helps.
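For reference, the ControlNet paper's training objective has the usual latent-diffusion form, with c_f simply added as one more input to the noise predictor ε_θ. Here z_0 is the (latent of the) target image, c_t is the text prompt, c_f is the task condition (the canny-edge map for this task), and z_t is z_0 noised to step t. The second line is the standard DDPM forward process with noise-schedule coefficient ᾱ_t, written out for clarity rather than quoted from the paper:

$$\mathcal{L} = \mathbb{E}_{z_0, t, c_t, c_f, \epsilon \sim \mathcal{N}(0, 1)}\left[\left\| \epsilon - \epsilon_\theta(z_t, t, c_t, c_f) \right\|_2^2\right]$$

$$z_t = \sqrt{\bar\alpha_t}\, z_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon$$

In practice the expectation is approximated by averaging over the N training triples. Noise enters only through z_t, i.e. through the target image; c_t and c_f are passed in clean.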
Thanks for the answer, but it's still not fully clear to me.
For the canny edge task, during training the input to the model is [(canny_edge_1, target_image_1, text_prompt), ...], where:
canny_edge_1 - an image with only black & white edges
target_image - the image that the canny_edge image was extracted from
text - a prompt that describes canny_edge_1 (which is also the description of target_image)
Now target_image + text are used as a "hint" for the backward process.
Does that mean the diffusion process adds noise to canny_edge_1?
That would also mean that for a new image I can generate random noise, add these "hints" (target_image + text), and get a new canny edge map for my new image and prompt.
If that is true, it also means that the function that learns to reverse the noise should NOT receive canny_edge_1 but target_image, which is different from the formula in the paper.
I'm really confused and need some help figuring it out.
The diffusion process does NOT add noise to the canny edge input; it adds noise to the target image, just as it does NOT add noise to the text input. Compared to standard SD, the forward and reverse diffusion processes work exactly the same way.
What changes is that a new input (the canny edge map) is introduced to influence the denoising process via the SD UNet.
The time variable, the text conditioning input, and the canny edge input are fed to the ControlNet, which is used to control/modify the SD UNet's behavior.
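To make the "noise goes to the target, not to the condition" point concrete, here is a minimal sketch of how one training input could be assembled. It is an illustration under stated assumptions, not ControlNet's code: images are toy flat lists of floats instead of latent tensors, the helper names (`make_alpha_bars`, `make_training_example`) are hypothetical, and the linear beta schedule is a DDPM-style choice for illustration (SD itself uses a different schedule).

```python
import math
import random


def make_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an illustrative linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def make_training_example(target, edge_map, prompt, t, alpha_bars, rng):
    """Hypothetical helper: build the model inputs for timestep t.

    Only the target image is noised (x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps).
    The edge map and the prompt are passed through untouched.
    """
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in target]
    x_t = [math.sqrt(a) * x0 + math.sqrt(1.0 - a) * e for x0, e in zip(target, eps)]
    model_inputs = {"x_t": x_t, "t": t, "text": prompt, "control": edge_map}
    return model_inputs, eps  # eps is what the noise predictor is trained to output


rng = random.Random(0)
alpha_bars = make_alpha_bars(1000)
target = [0.2, -0.5, 0.9, 0.1]    # toy "target image" (flattened)
edge_map = [0.0, 1.0, 1.0, 0.0]   # toy "canny edge map" derived from it
inputs, eps = make_training_example(target, edge_map, "a cat", 500, alpha_bars, rng)
```

The returned `eps` plays the role of ε in the loss: the network sees `x_t`, `t`, `text`, and the clean `control`, and is penalized for how far its noise prediction is from `eps`.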
To make this easier, just imagine we replace the UNet in SD with a UNet that can take an additional input (the canny edge map).
Consider this image: to go from x_t to x_{t-1} in normal SD, a UNet is used, and the time variable t and the text conditioning data from the CLIP text encoder are fed into this UNet.
Now, with ControlNet, this UNet is modified so that, in addition to the time variable t and the text conditioning data from the CLIP text encoder, it also takes in the canny edge map.
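The "swap in a UNet with one extra input" picture can be sketched as a sampling loop in which only the noise predictor's signature changes; the DDPM reverse update itself is untouched. This is a toy under assumptions: the denoisers are hypothetical placeholders that return zeros, and the real ControlNet's trainable encoder copy and zero convolutions (whose outputs are added into the locked UNet's skip connections) are collapsed into a single zero-initialized scale `ZERO_INIT_SCALE`. Because those zero convolutions start at zero, the controlled model initially behaves exactly like the original one, which is what the usage at the end checks.

```python
import math
import random


def ddpm_reverse_step(x_t, t, betas, alpha_bars, eps_pred, rng):
    """One DDPM ancestral step x_t -> x_{t-1} given the predicted noise."""
    beta = betas[t]
    alpha = 1.0 - beta
    coef = beta / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(x_t, eps_pred)]
    if t == 0:
        return mean  # no noise is added on the final step
    sigma = math.sqrt(beta)  # simple choice sigma_t^2 = beta_t
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(denoiser, n_values, betas, cond, rng):
    """Same loop for plain SD and for ControlNet: only `cond` differs."""
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_values)]  # start from pure noise x_T
    for t in reversed(range(len(betas))):
        eps_pred = denoiser(x, t, **cond)
        x = ddpm_reverse_step(x, t, betas, alpha_bars, eps_pred, rng)
    return x


def toy_sd_unet(x_t, t, text):
    """Hypothetical stand-in for the SD UNet's noise prediction."""
    return [0.0 for _ in x_t]


ZERO_INIT_SCALE = 0.0  # hypothetical stand-in for zero-initialized "zero convolutions"


def toy_controlled_unet(x_t, t, text, control):
    """Hypothetical UNet with an extra input: base prediction plus a control residual."""
    base = toy_sd_unet(x_t, t, text)
    return [b + ZERO_INIT_SCALE * c for b, c in zip(base, control)]


betas = [1e-4 + (0.02 - 1e-4) * i / 49 for i in range(50)]
plain = sample(toy_sd_unet, 4, betas, {"text": "a cat"}, random.Random(0))
controlled = sample(
    toy_controlled_unet, 4, betas,
    {"text": "a cat", "control": [0.0, 1.0, 1.0, 0.0]}, random.Random(0),
)
```

With the same random seed, `plain` and `controlled` come out identical; training then moves the control branch away from zero so the edge map starts to shape the result.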
Thank you for your detailed response.
For the canny edge task, when you say "it adds noise to the target image",
what kind of image is the target image? This kind?
Or that one?
The target or label image is what you want the model to predict.
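A minimal sketch of how such (condition, prompt, label) triples can be built shows why the edge map is an input and the original photo is the label: the condition is derived from the target. The edge detector here is a deliberately crude, hypothetical stand-in for Canny (real Canny adds Gaussian smoothing, non-maximum suppression, and hysteresis thresholding; in practice one would typically use a library such as OpenCV), and `toy_edge_map`/`make_triple` are illustrative names only.

```python
def toy_edge_map(image, threshold=0.5):
    """Crude stand-in for Canny: mark pixels whose right or lower neighbour differs a lot."""
    h, w = len(image), len(image[0])
    edges = [[0 for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx = abs(image[y][x + 1] - image[y][x]) if x + 1 < w else 0.0
            dy = abs(image[y + 1][x] - image[y][x]) if y + 1 < h else 0.0
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def make_triple(target_image, prompt):
    """Hypothetical helper: the control input is computed FROM the label image."""
    return {"control": toy_edge_map(target_image), "text": prompt, "target": target_image}


# Toy 3x3 grayscale "photo": dark left columns, bright right column.
target_image = [[0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0],
                [0.0, 0.0, 1.0]]
triple = make_triple(target_image, "a white stripe on a black background")
```

At test time no target exists: you supply only an edge map (extracted from any image, or drawn by hand) and a prompt, start from random noise, and the trained model generates a new image consistent with both.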