
sled-group / InfEdit

[CVPR 2024] Official implementation of CVPR 2024 paper: "Inversion-Free Image Editing with Natural Language"

Home Page: https://sled-group.github.io/InfEdit/

License: Apache License 2.0

Language: Python 100.00%
Topics: diffusion-models, consistency-models, image-edit, inversion, attention-is-all-you-need, prompt-to-prompt

infedit's People

Contributors

h6kplus, sihanxu


infedit's Issues

The implementation of mapper

Hello, I'm confused about the following code.

InfEdit/seq_aligner.py

Lines 152 to 154 in d9f6c1b

for t in e_seq:
    if x_seq[max_s] == t:
        alpha_e[max_t] = 1

The code fragment before this snippet does the following things:

  1. Search for the most similar token in the target prompt within a mismatched "gap" interval.
  2. Fill the 0 entries in mapper with the index of that most similar token in the target prompt.

However, I'm confused about the code I pasted. Under what circumstances is the if condition satisfied?
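As far as I can tell, the condition fires exactly when the most similar source token x_seq[max_s] reappears verbatim among the gap's target tokens. Here is a toy sketch of that reading (my own; source_token_reappears and the token lists are made up for illustration, not part of seq_aligner):

```python
def source_token_reappears(x_seq, e_seq, max_s):
    """Mimic the loop above: does the chosen source token x_seq[max_s]
    appear verbatim among the gap's target tokens e_seq?"""
    for t in e_seq:
        if x_seq[max_s] == t:
            return True  # the case where alpha_e[max_t] = 1 would run
    return False

# "cat" survives the edit "a black cat" -> "a small cute cat":
print(source_token_reappears(["a", "black", "cat"],
                             ["small", "cute", "cat"], max_s=2))  # True

# No shared token between the gaps -> the branch never runs:
print(source_token_reappears(["a", "black", "cat"],
                             ["small", "cute", "dog"], max_s=2))  # False
```

So the branch seems to handle the case where a source word is kept (possibly re-ordered) inside an otherwise rewritten span.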

Code release?

Congratulations on your paper being accepted to CVPR24!

I have been following this work for a long time and am eagerly looking forward to seeing the code being open-sourced.

PIE-Bench

Dear InfEdit team,

Thank you for sharing this great repo, I really like it.

I ran run_pie_bench.py and regenerated the metrics, but the numbers I get are very different from those reported in the paper.

InfEdit|structure_distance                         0.029941
InfEdit|psnr_unedit_part                          20.846335
InfEdit|lpips_unedit_part                          0.113928
InfEdit|mse_unedit_part                            0.016414
InfEdit|ssim_unedit_part                           0.766929
InfEdit|clip_similarity_source_image              26.071685
InfEdit|clip_similarity_target_image              23.430599
InfEdit|clip_similarity_target_image_edit_part    20.670541

Could you tell me why this happens?

Thank you for your help.

Best Wishes,

Zongze

Code release?

Good work!!!
Your project looks very interesting. Do you have a date for its release?

Inconsistent implementation with description in the paper

regarding mutual self attention control

Hello, I noticed that the implementation is inconsistent with the description in the paper.

InfEdit/app_infedit.py

Lines 244 to 253 in d9f6c1b

    vc = torch.cat([vc[:num_heads * 2], vc[:num_heads]])
else:
    qu = torch.cat([qu[:num_heads], qu[:num_heads], qu[:num_heads]])
    qc = torch.cat([qc[:num_heads], qc[:num_heads], qc[:num_heads]])
    ku = torch.cat([ku[:num_heads], ku[:num_heads], ku[:num_heads]])
    kc = torch.cat([kc[:num_heads], kc[:num_heads], kc[:num_heads]])
    vu = torch.cat([vu[:num_heads * 2], vu[:num_heads]])
    vc = torch.cat([vc[:num_heads * 2], vc[:num_heads]])
return torch.cat([qu, qc], dim=0), torch.cat([ku, kc], dim=0), torch.cat([vu, vc], dim=0)

This piece of code corresponds to Section 4.2, paragraph 1 of the paper. However, it replaces q_tgt, k_tgt with q_src, k_src instead of replacing k_tgt, v_tgt as described in the paper.

Which one should I follow, the code or the paper?
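To make the difference concrete, here is a toy single-head numpy illustration of the two injection variants being contrasted (my own sketch; attention() is a simplified stand-in, not the repo's implementation, and the tensors are random):

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention on (tokens, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
q_src, k_src, v_src = rng.normal(size=(3, 4, 8))
q_tgt, k_tgt, v_tgt = rng.normal(size=(3, 4, 8))

# Variant the issue sees in the code: the target branch reuses the source
# queries and keys but keeps its own values, i.e. the source attention map
# is applied to the target values.
out_code = attention(q_src, k_src, v_tgt)

# Variant as described in the paper's text: the target keeps its own
# queries and attends to the source keys and values.
out_paper = attention(q_tgt, k_src, v_src)

print(out_code.shape, out_paper.shape)  # (4, 8) (4, 8)
```

The two variants generally produce different outputs, so the choice does matter in practice.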

run app_infedit.py

A good work!!!
When I run python app_infedit.py, I get the following error:
gradio=4.24.0
diffusers=0.28.0.dev0
torch=2.0.1
python=3.8
cuda=11.7
[screenshot of the error]

How to run without gradio?

Awesome work!!
I want to run the code without using Gradio. Is there a script available for this?
Additionally, it seems that the method depends on prompts from GPT-4. I do not have a GPT-4 subscription. It would be helpful if you could provide some default responses from GPT-4 so that I can run the code.

Support for Ollama

The instruction-following feature is based on GPT-4. Are there any plans to support Ollama in the future, for local use?

Help with virtual inversion questions please

[screenshot of Algorithm 1]
Hello, and thank you for your excellent work. Here is a puzzling question I would like you to answer. In lines 3-4 of Algorithm 1, I can't figure out whether there is any difference between z0 in the third line and z in the fourth line. The transformation from line 3 to line 4 doesn't seem to introduce any additional loss, and each term is deterministic. Also, what is the output z of the algorithm? I instinctively feel that inversion should go from z0 to zT (noise), but it seems that this output z is still z0?
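One way to see why the output can remain a z0-like latent: the e_c line in ddcm_sampler (quoted in the next issue) recovers the sampled noise exactly, so the "virtual inversion" never accumulates error and never needs to store a separate zT. A small numeric check of that identity (my own sketch, mirroring that line with toy arrays and an arbitrary alpha):

```python
import numpy as np

rng = np.random.default_rng(0)
x_0 = rng.normal(size=(4, 4))   # stand-in for the clean latent z0
n = rng.normal(size=(4, 4))     # the sampled noise
alpha_prod_t = 0.37             # arbitrary cumulative alpha for the check

# Forward-noise x_0 the way the source branch is constructed:
x_s = alpha_prod_t ** 0.5 * x_0 + (1 - alpha_prod_t) ** 0.5 * n

# The e_c formula from ddcm_sampler recovers the noise with no loss:
e_c = (x_s - alpha_prod_t ** 0.5 * x_0) / (1 - alpha_prod_t) ** 0.5
print(np.allclose(e_c, n))  # True
```

Since the recovery is exact, lines 3-4 of the algorithm plausibly differ only in notation, not in value, which would explain why the returned latent still looks like z0.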

Question about your code: the claimed consistent noise seems unused in the demo code?

def ddcm_sampler(scheduler, x_s, x_t, timestep, e_s, e_t, x_0, noise, eta, to_next=True):
    if scheduler.num_inference_steps is None:
        raise ValueError(
            "Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
        )
    
    if scheduler.step_index is None:
        scheduler._init_step_index(timestep)

    prev_step_index = scheduler.step_index + 1
    if prev_step_index < len(scheduler.timesteps):
        prev_timestep = scheduler.timesteps[prev_step_index]
    else:
        prev_timestep = timestep

    alpha_prod_t = scheduler.alphas_cumprod[timestep]
    alpha_prod_t_prev = (
        scheduler.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else scheduler.final_alpha_cumprod
    )
    beta_prod_t = 1 - alpha_prod_t
    beta_prod_t_prev = 1 - alpha_prod_t_prev
    variance = beta_prod_t_prev
    std_dev_t = eta * variance
    noise = std_dev_t ** (0.5) * noise

    e_c = (x_s - alpha_prod_t ** (0.5) * x_0) / (1 - alpha_prod_t) ** (0.5)

    pred_x0 = x_0 + ((x_t - x_s) - beta_prod_t ** (0.5) * (e_t - e_s)) / alpha_prod_t ** (0.5)  # + mv_offset
    eps = (e_t - e_s) + e_c
    dir_xt = (beta_prod_t_prev - std_dev_t) ** (0.5) * eps

    # Noise is not used for one-step sampling.
    if len(scheduler.timesteps) > 1:
        prev_xt = alpha_prod_t_prev ** (0.5) * pred_x0 + dir_xt + noise
        prev_xs = alpha_prod_t_prev ** (0.5) * x_0 + dir_xt + noise
    else:
        prev_xt = pred_x0
        prev_xs = x_0

    if to_next:
        scheduler._step_index += 1
    return prev_xs, prev_xt, pred_x0

Here eta is set to 1 in your code, but this makes dir_xt always 0.
Besides, I'm a bit confused by the computation of pred_x0: it seems to add the target-branch latent to the original image's latent and then subtract the source-branch latent.
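The eta = 1 observation can be checked directly from the formulas in the snippet (my own numeric sketch, using arbitrary values):

```python
import numpy as np

# From ddcm_sampler: std_dev_t = eta * variance, where variance is
# beta_prod_t_prev. With eta = 1 the coefficient of the direction term
#   dir_xt = (beta_prod_t_prev - std_dev_t) ** 0.5 * eps
# is exactly zero, so the update reduces to the scaled pred_x0 plus
# fresh noise of scale beta_prod_t_prev ** 0.5.
eta = 1.0
alpha_prod_t_prev = 0.8           # arbitrary cumulative alpha for the check
beta_prod_t_prev = 1 - alpha_prod_t_prev
std_dev_t = eta * beta_prod_t_prev

eps = np.ones((2, 2))             # any eps: only the coefficient matters
dir_xt = (beta_prod_t_prev - std_dev_t) ** 0.5 * eps
print(np.allclose(dir_xt, 0.0))   # True
```

So with eta = 1 the dir_xt term in both prev_xt and prev_xs indeed contributes nothing.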

I would appreciate your reply!
