tencent-ailab / ip-adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
License: Apache License 2.0
Hi, the paper is really interesting and your results kick ass. Not sure how to use this in Automatic1111 though; I tried putting the models in the ControlNet model folder but they weren't showing up. Any chance you can update the readme with the process? Thanks.
I notice that you provide an image encoder in your own space; is it different from the models released by OpenAI?
Hi, Dear Authors.
After reading the code, I found that you feed the image features after projection (concept features) into the adapter layers.
However, some related works, e.g. InstantBooth (https://arxiv.org/abs/2304.03411) and Subject Diffusion (https://arxiv.org/abs/2307.11410), inject the image token features (patch features) into the adapter layers of the SD UNet (they use self-attention).
It seems patch features may contain more detailed information, so the model can better preserve the characteristics of the input images.
Concept features seem to carry more high-level semantic info. Maybe this choice makes IP-Adapter more flexible but, to some extent, loses some subject/identity-driven generation ability.
This is just my personal opinion; discussion is welcome.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler
from ip_adapter import IPAdapter

base_model_path = "yiffymix16_32"
image_encoder_path = "models/image_encoder/"
ip_ckpt = "models/ip-adapter_sd15.bin"
device = "cuda"

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(base_model_path, safety_checker=None, controlnet=controlnet, torch_dtype=torch.float16)
# the below line is causing issues
ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)
pipe = pipe.to("cuda")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()
I have the above code. Now when I run pipe() independently (without IPAdapter, for example image = pipe(...)), it produces hazy, unnatural images like this:
But if I comment out the line ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device), pipe() gives proper results. Any idea why?
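A possible explanation, offered as an assumption rather than a confirmed answer: constructing IPAdapter replaces every cross-attention processor on pipe.unet with an IPAttnProcessor that expects image tokens appended to the text embeddings, so a plain pipe(...) call afterwards has the tail of its text embeddings misread as image tokens. Restoring the stock processors before standalone use should undo this:

from diffusers.models.attention_processor import AttnProcessor
# swap every attention processor back to the default text-only implementation
pipe.unet.set_attn_processor(AttnProcessor())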
https://colab.research.google.com/drive/1_Vtos4PRqZWAg69sC9XuSBBt6Mw_aDCz?usp=sharing
ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)
UnpicklingError Traceback (most recent call last)
in <cell line: 1>()
----> 1 ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)
3 frames
/usr/local/lib/python3.10/dist-packages/torch/serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
1031 "functionality.")
1032
-> 1033 magic_number = pickle_module.load(f, **pickle_load_args)
1034 if magic_number != MAGIC_NUMBER:
1035 raise RuntimeError("Invalid magic number; corrupt file?")
UnpicklingError: invalid load key, '<'.
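An invalid load key of '<' usually means the file on disk begins with an HTML tag, i.e. a web page or error page was saved in place of the checkpoint (a common outcome of downloading the Hub's HTML link instead of the raw file). A sketch of re-fetching it through huggingface_hub, assuming the h94/IP-Adapter mirror hosts the file under this name:

from huggingface_hub import hf_hub_download
ip_ckpt = hf_hub_download(repo_id="h94/IP-Adapter", filename="models/ip-adapter-plus_sd15.bin")
ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)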
Hi! Thank you for the amazing work! It works like a charm!
I wonder which dataset you used during training? Can you share more info about it? You specified in the paper that it is a subset of the LAION & COYO datasets. Maybe you have the parameters you used for filtering the data: aesthetic score threshold, p_unsafe / p_watermark, image size? And the proportion of LAION to COYO in your data?
Do you think results would differ when trained on a smaller dataset, say 1M samples? Do you think results would improve using full resolution with variable aspect-ratio bucketing instead of center crop?
Greetings,
First of all thank you for this achievement, the potential of this tool is astounding.
I have a problem with the sample code:
Following the code in the Colab demo precisely, I notice some differences between the output I get and the one shown in your results.
I can't understand why. Does anyone know?
Note: I left SD 1.5 set, as per the code.
Hello, the plugin you made is very useful. I used it the day before and it was fine, but the next day it started erroring and many others couldn't use it either. SD 1.5 is okay, but SDXL errors. My error prompt is:
Error occurred when executing IPAdapter:
Input type (torch.FloatTensor) and weight type (torch.HalfTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
File "D:\AI\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\StableSwarmUI\dlbackend\comfy\ComfyUI\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\AI\StableSwarmUI\dlbackend\comfy\ComfyUI\custom_nodes\IPAda
Can we try to do ipadapter face and ipadapter together?
Maybe like this:
hidden_states = hidden_states + self.scale1 * ip_hidden_states + self.scale2 * ip_face_hidden_states
I ran tutorial_train.py and saved the related params of the unet and 'ip-adapter_sd15.bin'. But when I load the unet params with StableDiffusionPipeline, I get the warning:
Some weights of the model checkpoint were not used when initializing UNet2DConditionModel:
['down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight']
I think the reason may be these lines in the training code:
else:
    layer_name = name.split(".processor")[0]
    weights = {
        "to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
        "to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
    }
    attn_procs[name] = IPAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)
    attn_procs[name].load_state_dict(weights)
So, I want to know why it is set up like this, and how I should modify my inference code so that it does not produce such warnings?
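If the goal is a UNet checkpoint that loads cleanly into a vanilla StableDiffusionPipeline, one option is to drop the decoupled-attention projections before saving, since they live in the ip-adapter checkpoint anyway. A sketch under the assumption that the unused keys are exactly the to_k_ip/to_v_ip weights listed in the warning:

import torch
unet_sd = unet.state_dict()
# keep everything except the image-branch projections added by IPAttnProcessor
base_sd = {k: v for k, v in unet_sd.items() if "to_k_ip" not in k and "to_v_ip" not in k}
torch.save(base_sd, "unet_base.bin")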
File "/home/code/third_party/IP-Adapter/ip_adapter/attention_processor.py", line 192, in init
raise ImportError("AttnProcessor2_0 requires PyTorch 2.0, to use it, please upgrade PyTorch to 2.0.")
ImportError: AttnProcessor2_0 requires PyTorch 2.0, to use it, please upgrade PyTorch to 2.0.
Maybe I should modify the code below:

if is_torch2_available:
    from .attention_processor import IPAttnProcessor2_0 as IPAttnProcessor, AttnProcessor2_0 as AttnProcessor, CNAttnProcessor2_0 as CNAttnProcessor
else:
    from .attention_processor import IPAttnProcessor, AttnProcessor, CNAttnProcessor

to

if is_torch2_available():

since a bare function reference is always truthy, the PyTorch 2.0 branch is taken even on older torch.
First, thanks for your great job. There is a question about the differences between the prompt image and a real-value image. Could you offer some examples?
Hi, thanks for your great work! I'm very interested in the IP-Adapter with fine-grained features; do you have a plan to release this version?
I changed the controlnet demo from IPAdapter to IPAdapterPlus, using "models/ip-adapter-plus_sd15.bin" as the adapter model checkpoint, but failed to load the ip-adapter.
ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device)
error log:
Cell In[8], line 2
1 # load ip-adapter
----> 2 ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device)
File /mnt/data/aigc/IP-Adapter/ip_adapter/ip_adapter.py:52, in IPAdapter.__init__(self, sd_pipe, image_encoder_path, ip_ckpt, device, num_tokens)
49 # image proj model
50 self.image_proj_model = self.init_proj()
---> 52 self.load_ip_adapter()
File /mnt/data/aigc/IP-Adapter/ip_adapter/ip_adapter.py:84, in IPAdapter.load_ip_adapter(self)
82 def load_ip_adapter(self):
83 state_dict = torch.load(self.ip_ckpt, map_location="cpu")
---> 84 self.image_proj_model.load_state_dict(state_dict["image_proj"])
85 ip_layers = torch.nn.ModuleList(self.pipe.unet.attn_processors.values())
86 ip_layers.load_state_dict(state_dict["ip_adapter"])
File ~/anaconda3/envs/IP-Adapter/lib/python3.10/site-packages/torch/nn/modules/module.py:2041, in Module.load_state_dict(self, state_dict, strict)
2036 error_msgs.insert(
2037 0, 'Missing key(s) in state_dict: {}. '.format(
2038 ', '.join('"{}"'.format(k) for k in missing_keys)))
2040 if len(error_msgs) > 0:
-> 2041 raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
2042 self.__class__.__name__, "\n\t".join(error_msgs)))
2043 return _IncompatibleKeys(missing_keys, unexpected_keys)
RuntimeError: Error(s) in loading state_dict for Resampler:
size mismatch for latents: copying a param with shape torch.Size([1, 16, 768]) from checkpoint, the shape in current model is torch.Size([1, 4, 768]).
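The shape gap (16 vs. 4 latents) matches the Resampler's token count: the Plus checkpoints use 16 image tokens while IPAdapter defaults to 4. Passing num_tokens explicitly, exactly as the repo's Plus demo does, should fix the load:

ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)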
Hi,
This project is just awesome, thanks for it !
Will you release the training/finetuning code ?
Thanks
Really awesome work, thank you guys for that.
do you have a plan for supporting SDXL?
First off, congratulations on this project and thank you so much for your work!
I'm testing using your SDXL demo and am generally getting good results. However, my use case is really human "personalization", like DreamBooth. I've tried using your multimodal prompt with some of my images, and the likeness of the face is not quite what I'd like, meaning that the face shows variation where I wish it would be more true to the original input.
Do you have any suggestions on settings or values that I could tweak to try and improve this? Or other ideas?
Again, thanks for everything!
Thanks for your brilliant work.
Do you consider releasing a model for the SD 2.1 version?
Hi, I am trying to run the ip_adapter_controlnet_demo_new.ipynb notebook; however, I keep getting the error below from the following line. I also tried to download the model manually but couldn't find it on huggingface. I definitely don't have a local directory with the same name.
Thank you so much for your help.
Line:
# load ip-adapter
ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)
Error:
---------------------------------------------------------------------------
HFValidationError Traceback (most recent call last)
File [c:\Users\avika\anaconda3\envs\interactive_dance_thesis\lib\site-packages\transformers\configuration_utils.py:675](file:///C:/Users/avika/anaconda3/envs/interactive_dance_thesis/lib/site-packages/transformers/configuration_utils.py:675), in PretrainedConfig._get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
673 try:
674 # Load from local folder or from cache or download from model Hub and cache
--> 675 resolved_config_file = cached_file(
676 pretrained_model_name_or_path,
677 configuration_file,
678 cache_dir=cache_dir,
679 force_download=force_download,
680 proxies=proxies,
681 resume_download=resume_download,
682 local_files_only=local_files_only,
683 token=token,
684 user_agent=user_agent,
685 revision=revision,
686 subfolder=subfolder,
687 _commit_hash=commit_hash,
688 )
689 commit_hash = extract_commit_hash(resolved_config_file, commit_hash)
File [c:\Users\avika\anaconda3\envs\interactive_dance_thesis\lib\site-packages\transformers\utils\hub.py:428](file:///C:/Users/avika/anaconda3/envs/interactive_dance_thesis/lib/site-packages/transformers/utils/hub.py:428), in cached_file(path_or_repo_id, filename, cache_dir, force_download, resume_download, proxies, token, revision, local_files_only, subfolder, repo_type, user_agent, _raise_exceptions_for_missing_entries, _raise_exceptions_for_connection_errors, _commit_hash, **deprecated_kwargs)
426 try:
427 # Load from URL or cache if already cached
--> 428 resolved_file = hf_hub_download(
...
703 try:
704 # Load config dict
705 config_dict = cls._dict_from_json_file(resolved_config_file)
OSError: Can't load the configuration of 'models/image_encoder/'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'models/image_encoder/' is the correct path to a directory containing a config.json file
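The OSError suggests models/image_encoder/ simply doesn't exist relative to the working directory, so transformers falls back to treating the string as a Hub repo id and fails validation. One hedged way to materialize the folder, assuming the layout of the h94/IP-Adapter mirror:

from huggingface_hub import snapshot_download
# pulls only the image encoder files into ./models/image_encoder/
snapshot_download(repo_id="h94/IP-Adapter", allow_patterns=["models/image_encoder/*"], local_dir=".")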
I'm just trying to understand how IP-Adapter does these powerful manipulations.
So we are encoding the image into embeddings, combining them with the prompt's embeddings, and then creating an image based on that. I see we are changing the unet, but I don't understand how.
I also ran into issues using embeddings directly instead of passing the prompt in as a string. Here are the two doubts in more detail:
What does this do:
input_ids = pipe.tokenizer(prompt, return_tensors="pt").input_ids
I saw this near the set_ip_adapter function. Is this what's changing the vae decoder? What does it do, and am I incorrect in assessing this?
I changed some function inputs in order to use prompt embeds directly, for example doing this:

input_ids = pipe.tokenizer(prompt, return_tensors="pt").input_ids
for i in range(0, input_ids.shape[-1], max_length):
    concat_embeds.append(pipe.text_encoder(input_ids[:, i: i + max_length])[0])
prompt_embeds = torch.cat(concat_embeds, dim=1)

I tried inputting this and commenting out the following in IP-Adapter's generate(...) function:
prompt_embeds = self.pipe._encode_prompt(...)
This however gave incorrect results in the output image. What am I doing wrong?
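A likely cause, inferred from how generate() assembles its inputs rather than confirmed: the IP attention processors expect the image tokens to be appended to the text embeddings, for the negative branch as well, so externally computed prompt embeds need the same concatenation. A minimal sketch with assumed variable names:

# image_prompt_embeds / uncond_image_prompt_embeds come from ip_model.get_image_embeds(pil_image)
prompt_embeds = torch.cat([prompt_embeds, image_prompt_embeds], dim=1)
negative_prompt_embeds = torch.cat([negative_prompt_embeds, uncond_image_prompt_embeds], dim=1)
images = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds,
              num_inference_steps=50).images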
I know this is out of scope, but is there any way of also loading a LoRA model's weights in the model's pipeline? If I load a LoRA model using pipe.load_lora_weights(...)
before calling ip_adapter, will it retain those weights?
I know these are many questions. Thanks a lot in advance!
Forgot to mention earlier, but this is some amazing work you guys did, I love the idea.
Hello,
thank you for this wonderful model!
I am trying to run the img2img pipeline using IP-Adapter Plus, following the example in the original notebook:
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    feature_extractor=None,
    safety_checker=None,
)
ip_model = IPAdapterPlus(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)
images = ip_model.generate(pil_image=image, num_samples=4, num_inference_steps=50, seed=seed, image=g_image, strength=scale)
but I am getting the following error
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[7], line 6
4 for scale in [0.7, 0.75, 0.8, 0.95]:
5 print(scale)
----> 6 images = ip_model.generate(pil_image=image, num_samples=4, num_inference_steps=50, seed=seed, image=g_image, strength=scale)
7 grid = image_grid(images, 1, 4)
8 display(grid)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/IP_Adapter/ip_adapter/ip_adapter.py:132, in IPAdapter.generate(self, pil_image, prompt, negative_prompt, scale, num_samples, seed, guidance_scale, num_inference_steps, **kwargs)
129 if not isinstance(negative_prompt, List):
130 negative_prompt = [negative_prompt] * num_prompts
--> 132 image_prompt_embeds, uncond_image_prompt_embeds = self.get_image_embeds(pil_image)
133 bs_embed, seq_len, _ = image_prompt_embeds.shape
134 image_prompt_embeds = image_prompt_embeds.repeat(1, num_samples, 1)
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File /mnt/2287294e-32c7-437b-84bd-452a29548b1a/IP_Adapter/ip_adapter/ip_adapter.py:239, in IPAdapterPlus.get_image_embeds(self, pil_image)
237 clip_image = self.clip_image_processor(images=pil_image, return_tensors="pt").pixel_values
238 clip_image = clip_image.to(self.device, dtype=torch.float16)
--> 239 clip_image_embeds = self.image_encoder(clip_image, output_hidden_states=True).hidden_states[-2]
240 image_prompt_embeds = self.image_proj_model(clip_image_embeds)
241 uncond_clip_image_embeds = self.image_encoder(torch.zeros_like(clip_image), output_hidden_states=True).hidden_states[-2]
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:1311, in CLIPVisionModelWithProjection.forward(self, pixel_values, output_attentions, output_hidden_states, return_dict)
1288 r"""
1289 Returns:
1290
(...)
1307 >>> image_embeds = outputs.image_embeds
1308 ```"""
1309 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
-> 1311 vision_outputs = self.vision_model(
1312 pixel_values=pixel_values,
1313 output_attentions=output_attentions,
1314 output_hidden_states=output_hidden_states,
1315 return_dict=return_dict,
1316 )
1318 pooled_output = vision_outputs[1] # pooled_output
1320 image_embeds = self.visual_projection(pooled_output)
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:866, in CLIPVisionTransformer.forward(self, pixel_values, output_attentions, output_hidden_states, return_dict)
863 if pixel_values is None:
864 raise ValueError("You have to specify pixel_values")
--> 866 hidden_states = self.embeddings(pixel_values)
867 hidden_states = self.pre_layrnorm(hidden_states)
869 encoder_outputs = self.encoder(
870 inputs_embeds=hidden_states,
871 output_attentions=output_attentions,
872 output_hidden_states=output_hidden_states,
873 return_dict=return_dict,
874 )
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py:195, in CLIPVisionEmbeddings.forward(self, pixel_values)
193 def forward(self, pixel_values: torch.FloatTensor) -> torch.Tensor:
194 batch_size = pixel_values.shape[0]
--> 195 patch_embeds = self.patch_embedding(pixel_values) # shape = [*, width, grid, grid]
196 patch_embeds = patch_embeds.flatten(2).transpose(1, 2)
198 class_embeds = self.class_embedding.expand(batch_size, 1, -1)
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
1496 # If we don't have any hooks, we want to skip the rest of the logic in
1497 # this function, and just call forward.
1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1499 or _global_backward_pre_hooks or _global_backward_hooks
1500 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501 return forward_call(*args, **kwargs)
1502 # Do not call functions when jit is used
1503 full_backward_hooks, non_full_backward_hooks = [], []
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/conv.py:463, in Conv2d.forward(self, input)
462 def forward(self, input: Tensor) -> Tensor:
--> 463 return self._conv_forward(input, self.weight, self.bias)
File /mnt/a0b764eb-cdc5-4f46-9a2e-e2f11deba631/PYTHON_CACHE/IP_Adapter/lib/python3.10/site-packages/torch/nn/modules/conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
455 if self.padding_mode != 'zeros':
456 return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
457 weight, bias, self.stride,
458 _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
460 self.padding, self.dilation, self.groups)
RuntimeError: GET was unable to find an engine to execute this computation
Am I doing something wrong, or is this feature not implemented in Adapter Plus?
Hello author, I have a stupid question. I read the code you provided and found that ip_tokens and encoder_hidden_states are concatenated. Is this different from the addition described in the paper?
This project is awesome!!!
I have two small questions.
Hello, is there an SDXL version of the "ip-adapter-plus-face" model, or is there a way to use it with SDXL?
Thank you :)
Hi, it takes almost 15 minutes to create an image with an RTX 4090. Is this normal?
7/30 [02:38<09:06, 23.76s/it]
Awesome work! Thanks for your contribution
I am using ip_adapter_multimodal_prompts (generation with multimodal prompts) in Python, but I want to save the generated images to a folder whenever I run this code. How do I do that after the line below?
images = ip_model.generate(pil_image=image, num_samples=1, num_inference_steps=50, seed=42,
                           prompt="wearing a hat on the beach", scale=0.6)
Have you experimented with adding a little noise to the zeroed tensors?
I made a few tests with the Plus model and the results are... interesting. Basically, instead of a zeroed embed I'm passing random +/- 0.5 noise (or even higher).
This is a quick example; sometimes the result is quite a bit better, you just need to keep it low, otherwise it starts "burning" the image.
Wondering if this makes any sense.
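For reference, a sketch of the experiment described above, with variable names assumed from get_image_embeds: replace the zeroed unconditional embedding with small uniform noise.

# uniform noise in [-0.5, 0.5) instead of torch.zeros_like(clip_image_embeds)
uncond_clip_image_embeds = torch.rand_like(clip_image_embeds) - 0.5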
Hello, I have a question. During inference, the foreground objects need to be picked out using instance segmentation. Is this necessary during training? Also, what is the specific segmentation method used in the paper? Looking forward to your reply, thank you.
Hi! Thank you for your amazing work!
When I compile the unet with torch.compile(unet), the decoupled text/image attention seems to stop working. Do you have a fix for that?
Hi authors,
I wonder how you crop faces when training the face ip-adapter?
I have tried running it but always run into memory issues, and it terminates at this part: IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device).
Even though loading the model with StableDiffusionXLPipeline.from_pretrained works fine. I also tried using accelerate but still face issues. Does this mean it's not possible to run it on Colab's free tier?
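Two standard diffusers memory savers may be worth a try, though how they interact with the IPAdapterXL wrapper (which moves the pipe to the device itself) is untested here:

pipe.enable_model_cpu_offload()  # stream submodules to the GPU on demand
pipe.enable_vae_slicing()        # decode latents one slice at a time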
How to install IP-Adapter? Please help.
Is it possible to train an adapter that grabs ONLY the perspective/scene/location condition from an image?
SD1 often struggles with good perspectives and angles. It would fix a big problem.
The paper says that when the scale param is set to 0, the model is equivalent to the base text2img model.
But when I set the scale to 0 and compare with the original text2img base model, I find there are still some differences in the generated images (all other parameters kept the same).
Is it because of def set_ip_adapter() and def load_ip_adapter() in the inference code?
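For what it's worth, the repo exposes a runtime toggle for this, and with scale = 0 the decoupled branch is multiplied by zero, so any remaining difference plausibly comes from the swapped attention-processor implementation rather than from the adapter weights (an assumption, not verified here):

ip_model.set_scale(0.0)  # zeroes out the image-conditioned attention branch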
First of all, you need to change your runtime to T4, because it is CPU by default.
Code.txt
And that's it!!!
Hi, I placed the models ip-adapter_sd15.bin, ip-adapter-plus_sd15.bin and ip-adapter-plus-face_sd15.bin into ../stable-diffusion-webui > extensions > sd-webui-controlnet > models, but when I restart A1111 they are not showing up in the model field of ControlNet (1.1.4).
Thanks
Hi, I recently saw a work very similar to yours, also from Tencent, called StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation. Do you know this paper?
Thanks for your work
I observe that ipadapter can't generate images similar to the image prompt when the prompt is an anime-character style. But when I use an anime-style DreamBooth model or the corresponding character LoRA, the ipadapter performs better. I'd like to ask whether the ipadapter only works when the foundation model can produce results similar to the image prompt.
Getting the issue: AttributeError: module 'torch.nn.functional' has no attribute 'scaled_dot_product_attention'.
I have the latest torch, ControlNet 1.1.409, renamed to .pth;
everything looks good, it just errors out.
Seeing if anyone else has this issue.
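scaled_dot_product_attention only exists in torch >= 2.0, so a quick check of the environment the webui actually runs in (its venv may differ from the system install) can rule out a stale torch:

import torch
import torch.nn.functional as F
print(torch.__version__, hasattr(F, "scaled_dot_product_attention"))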
Using the SD1.5 IP-Adapter models in a v1.6.0 Automatic1111 environment (python: 3.10.13, torch: 2.0.1) with the latest ControlNet on Apple ARM architecture generates a random image and produces the console runtime error below. COMMANDLINE_ARGS="--skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate".
2023-09-10 21:04:57,087 - ControlNet - STATUS - preprocessor resolution = 512
*** Error running process: /Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlnet.py
Traceback (most recent call last):
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/modules/scripts.py", line 619, in process
script.process(p, *script_args)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlnet.py", line 977, in process
self.controlnet_hack(p)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlnet.py", line 966, in controlnet_hack
self.controlnet_main_entry(p)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/controlnet.py", line 808, in controlnet_main_entry
detected_map, is_image = preprocessor(
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/utils.py", line 75, in decorated_func
return cached_func(*args, **kwargs)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/utils.py", line 63, in cached_func
return func(*args, **kwargs)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/global_state.py", line 35, in unified_preprocessor
return preprocessor_modules[preprocessor_name](*args, **kwargs)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/scripts/processor.py", line 350, in clip
from annotator.clipvision import ClipVisionDetector
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/extensions/sd-webui-controlnet/annotator/clipvision/__init__.py", line 81, in <module>
clip_vision_h_uc = torch.load(clip_vision_h_uc)['uc']
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/modules/safe.py", line 108, in load
return load_with_extra(filename, *args, extra_handler=global_extra_handler, **kwargs)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/modules/safe.py", line 156, in load_with_extra
return unsafe_torch_load(filename, *args, **kwargs)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 809, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1172, in _load
result = unpickler.load()
File "/opt/homebrew/Cellar/[email protected]/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
dispatch[key[0]](self)
File "/opt/homebrew/Cellar/[email protected]/3.10.13/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1254, in load_binpersid
self.append(self.persistent_load(pid))
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1142, in persistent_load
typed_storage = load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 1116, in load_tensor
wrap_storage=restore_location(storage, location),
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 217, in default_restore_location
result = fn(storage, location)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 182, in _cuda_deserialize
device = validate_cuda_device(location)
File "/Users/guestuser/Documents/Projects/StableDiffusion/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/serialization.py", line 166, in validate_cuda_device
raise RuntimeError('Attempting to deserialize object on a CUDA '
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.```
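For what it's worth, the workaround the error message itself suggests, applied at the line the traceback points to (annotator/clipvision/__init__.py in the ControlNet extension):

# map CUDA-saved tensors onto the CPU on machines without CUDA
clip_vision_h_uc = torch.load(clip_vision_h_uc, map_location=torch.device("cpu"))['uc']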
I use self.ip_ckpt = "./IP-Adapter/sdxl_models/ip-adapter_sdxl.bin",
but when I load it for my SDXL inpainting model:
RuntimeError: Error(s) in loading state_dict for ImageProjModel:
size mismatch for proj.weight: copying a param with shape torch.Size([8192, 1280]) from checkpoint, the shape in current model is torch.Size([32768, 1280]).
size mismatch for proj.bias: copying a param with shape torch.Size([8192]) from checkpoint, the shape in current model is torch.Size([32768]).
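The two shapes differ by exactly a factor of 4, which matches the number of extra context tokens: ip-adapter_sdxl.bin projects to 4 image tokens, so the ImageProjModel must be built with the same setting. A sketch under that assumption (inferred from the shapes, not documented):

# num_tokens must match the checkpoint; 4 is the plain (non-Plus) SDXL setting
ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device, num_tokens=4)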
First of all, this work is truly amazing!
Could there also be support for multiple ControlNets with IP-Adapters? I was trying canny and inpaint ControlNets together and faced errors.
Hi! First of all, thanks for really great work.
I see that you've added a face-conditioned model for SD1.5 recently; do you have any plans on releasing a similar model for SDXL? Also, could you give any estimate of how long it takes to train the IP-Adapter model? You mentioned 1M steps in your paper; could you elaborate how many hours/days that is?
Also I have a few improvement ideas based on my experience, e.g. using hidden_states[-2] for text conditioning.
Maybe you guys have seen this error before:
Traceback (most recent call last):
File "/pkg/modal/_container_entrypoint.py", line 351, in handle_input_exception
yield
File "/pkg/modal/_container_entrypoint.py", line 437, in run_inputs
res = imp_fun.fun(*args, **kwargs)
File "/root/modal_testing/adapter.py", line 188, in run
ip_model = IPAdapterPlus(
File "/content/IP-Adapter/ip_adapter/ip_adapter.py", line 52, in __init__
self.load_ip_adapter()
File "/content/IP-Adapter/ip_adapter/ip_adapter.py", line 84, in load_ip_adapter
self.image_proj_model.load_state_dict(state_dict["image_proj"])
File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Resampler:
Missing key(s) in state_dict: "latents", "proj_in.weight", "proj_in.bias", "proj_out.weight", "proj_out.bias", "norm_out.weight", "norm_out.bias", "layers.0.0.norm1.weight", "layers.0.0.norm1.bias", "layers.0.0.norm2.weight", "layers.0.0.norm2.bias", "layers.0.0.to_q.weight", "layers.0.0.to_kv.weight", "layers.0.0.to_out.weight", "layers.0.1.0.weight", "layers.0.1.0.bias", "layers.0.1.1.weight", "layers.0.1.3.weight", "layers.1.0.norm1.weight", "layers.1.0.norm1.bias", "layers.1.0.norm2.weight", "layers.1.0.norm2.bias", "layers.1.0.to_q.weight", "layers.1.0.to_kv.weight", "layers.1.0.to_out.weight", "layers.1.1.0.weight", "layers.1.1.0.bias", "layers.1.1.1.weight", "layers.1.1.3.weight", "layers.2.0.norm1.weight", "layers.2.0.norm1.bias", "layers.2.0.norm2.weight", "layers.2.0.norm2.bias", "layers.2.0.to_q.weight", "layers.2.0.to_kv.weight", "layers.2.0.to_out.weight", "layers.2.1.0.weight", "layers.2.1.0.bias", "layers.2.1.1.weight", "layers.2.1.3.weight", "layers.3.0.norm1.weight", "layers.3.0.norm1.bias", "layers.3.0.norm2.weight", "layers.3.0.norm2.bias", "layers.3.0.to_q.weight", "layers.3.0.to_kv.weight", "layers.3.0.to_out.weight", "layers.3.1.0.weight", "layers.3.1.0.bias", "layers.3.1.1.weight", "layers.3.1.3.weight".
Unexpected key(s) in state_dict: "proj.weight", "proj.bias", "norm.weight", "norm.bias".
@xiaohu2015 any ideas??
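The unexpected keys (proj.*, norm.*) are exactly the parameters of the plain ImageProjModel, which suggests a non-Plus checkpoint was handed to IPAdapterPlus (which builds a Resampler instead). Pairing the class with a matching file should resolve it; the file names below are assumed from the release:

# either use the plain wrapper with the plain checkpoint...
ip_model = IPAdapter(pipe, image_encoder_path, "models/ip-adapter_sd15.bin", device)
# ...or keep IPAdapterPlus and point ip_ckpt at a -plus checkpoint such as
# models/ip-adapter-plus_sd15.bin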
Congrats again on the great work.
Could you please help clarify how the data subset was selected from the LAION?
Thanks.
I trained with DeepSpeed ZeRO Stage 2. After I run zero_to_fp32.py, the generated "pytorch_model.bin" is ~1.7 GB, but the pretrained model "ip-adapter_sd15.bin" is only ~43 MB.
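The ~1.7 GB file presumably holds the whole training module (the UNet included), while the released .bin contains only the adapter. A sketch of extracting just those weights, assuming the key layout of tutorial_train.py's IPAdapter module (image_proj_model.* and adapter_modules.*):

import torch
sd = torch.load("pytorch_model.bin", map_location="cpu")
ip_sd = {"image_proj": {}, "ip_adapter": {}}
for k, v in sd.items():
    if k.startswith("image_proj_model."):
        ip_sd["image_proj"][k[len("image_proj_model."):]] = v
    elif k.startswith("adapter_modules."):
        ip_sd["ip_adapter"][k[len("adapter_modules."):]] = v
torch.save(ip_sd, "ip-adapter_custom.bin")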
Can this be installed locally into Stable Diffusion Automatic1111?