
sled-group / InfEdit

[CVPR 2024] Official implementation of CVPR 2024 paper: "Inversion-Free Image Editing with Natural Language"

Home Page: https://sled-group.github.io/InfEdit/

License: Apache License 2.0

Language: Python 100.00%
Topics: diffusion-models, consistency-models, image-edit, inversion, attention-is-all-you-need, prompt-to-prompt

infedit's People

Contributors

h6kplus, sihanxu


infedit's Issues

The implementation of mapper

Hello, I'm confused about the following code.

InfEdit/seq_aligner.py

Lines 152 to 154 in d9f6c1b

for t in e_seq:
    if x_seq[max_s] == t:
        alpha_e[max_t] = 1

The code fragment before this snippet does the following things:

  1. Search for the most similar token in the target prompt within a mismatched "gap" interval.
  2. Fill the 0 entries in mapper with the index of that most similar token in the target prompt.

However, I'm confused about the code I pasted. Under what circumstances is the if condition satisfied?
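As far as I can tell, the condition fires exactly when the most similar source token x_seq[max_s] reappears verbatim among the gap's target tokens. Here is a toy sketch of that reading (my own; source_token_reappears and the token lists are made up for illustration, not part of seq_aligner):

```python
def source_token_reappears(x_seq, e_seq, max_s):
    """Mimic the loop above: does the chosen source token x_seq[max_s]
    appear verbatim among the gap's target tokens e_seq?"""
    for t in e_seq:
        if x_seq[max_s] == t:
            return True  # the case where alpha_e[max_t] = 1 would run
    return False

# "cat" survives the edit "a black cat" -> "a small cute cat":
print(source_token_reappears(["a", "black", "cat"],
                             ["small", "cute", "cat"], max_s=2))  # True

# No shared token between the gaps -> the branch never runs:
print(source_token_reappears(["a", "black", "cat"],
                             ["small", "cute", "dog"], max_s=2))  # False
```

So the branch seems to handle the case where a source word is kept (possibly re-ordered) inside an otherwise rewritten span.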

Code release?

Congratulations on your paper being accepted to CVPR24!

I have been following this work for a long time and am eagerly looking forward to seeing the code being open-sourced.

PIE-Bench

Dear InfEdit team,

Thank you for sharing this great repo, I really like it.

I ran run_pie_bench.py and regenerated the metrics, but the numbers I get are very different from those reported in the paper.

InfEdit|structure_distance                         0.029941
InfEdit|psnr_unedit_part                          20.846335
InfEdit|lpips_unedit_part                          0.113928
InfEdit|mse_unedit_part                            0.016414
InfEdit|ssim_unedit_part                           0.766929
InfEdit|clip_similarity_source_image              26.071685
InfEdit|clip_similarity_target_image              23.430599
InfEdit|clip_similarity_target_image_edit_part    20.670541

Could you tell me why this happens?

Thank you for your help.

Best Wishes,

Zongze

Code release?

Good work!!!
Your project looks very interesting. Do you have a date for its release?

Inconsistent implementation with description in the paper

regarding mutual self attention control

Hello, I noticed that the implementation is inconsistent with the description in the paper.

InfEdit/app_infedit.py

Lines 244 to 253 in d9f6c1b

    vc = torch.cat([vc[:num_heads * 2], vc[:num_heads]])
else:
    qu = torch.cat([qu[:num_heads], qu[:num_heads], qu[:num_heads]])
    qc = torch.cat([qc[:num_heads], qc[:num_heads], qc[:num_heads]])
    ku = torch.cat([ku[:num_heads], ku[:num_heads], ku[:num_heads]])
    kc = torch.cat([kc[:num_heads], kc[:num_heads], kc[:num_heads]])
    vu = torch.cat([vu[:num_heads * 2], vu[:num_heads]])
    vc = torch.cat([vc[:num_heads * 2], vc[:num_heads]])
return torch.cat([qu, qc], dim=0), torch.cat([ku, kc], dim=0), torch.cat([vu, vc], dim=0)

This piece of code corresponds to Section 4.2, paragraph 1 of the paper. However, it replaces q_tgt, k_tgt with q_src, k_src instead of replacing k_tgt, v_tgt as described in the paper.

Which one should I follow, the code or the paper?
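To make the difference concrete, here is a toy single-head numpy illustration of the two injection variants being contrasted (my own sketch; attention() is a simplified stand-in, not the repo's implementation, and the tensors are random):

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention on (tokens, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)  # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
q_src, k_src, v_src = rng.normal(size=(3, 4, 8))
q_tgt, k_tgt, v_tgt = rng.normal(size=(3, 4, 8))

# Variant the issue sees in the code: the target branch reuses the source
# queries and keys but keeps its own values, i.e. the source attention map
# is applied to the target values.
out_code = attention(q_src, k_src, v_tgt)

# Variant as described in the paper's text: the target keeps its own
# queries and attends to the source keys and values.
out_paper = attention(q_tgt, k_src, v_src)

print(out_code.shape, out_paper.shape)  # (4, 8) (4, 8)
```

The two variants generally produce different outputs, so the choice does matter in practice.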

run app_infedit.py

A good work!!!
When I run python app_infedit.py, I get the following error:
gradio=4.24.0
diffusers=0.28.0.dev0
torch=2.0.1
python=3.8
cuda=11.7
[screenshot of the error]

How to run without gradio?

Awesome work!!
I want to run the code without using Gradio. Is there a script available for this?
Additionally, it seems that the method depends on prompts from GPT-4. I do not have a GPT-4 subscription. It would be helpful if you could provide some default responses from GPT-4 so that I can run the code.

Support for Ollama

The instruction-following feature is based on GPT-4. Are there any plans to support Ollama in the future, for local use?

Help with virtual inversion questions please

[screenshot of Algorithm 1]
Hello, and thank you for your excellent work. Here is a puzzling question I would like you to answer. In lines 3-4 of Algorithm 1, I can't figure out whether there is any difference between z0 in the third line and z in the fourth line. The transformation from line 3 to line 4 doesn't seem to introduce any additional loss, and each term is deterministic. Also, what is the output z of the algorithm? I instinctively feel that inversion should go from z0 to zT (noise), but it seems that this output z is still z0?
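One way to see why the output can remain a z0-like latent: the e_c line in ddcm_sampler (quoted in the next issue) recovers the sampled noise exactly, so the "virtual inversion" never accumulates error and never needs to store a separate zT. A small numeric check of that identity (my own sketch, mirroring that line with toy arrays and an arbitrary alpha):

```python
import numpy as np

rng = np.random.default_rng(0)
x_0 = rng.normal(size=(4, 4))   # stand-in for the clean latent z0
n = rng.normal(size=(4, 4))     # the sampled noise
alpha_prod_t = 0.37             # arbitrary cumulative alpha for the check

# Forward-noise x_0 the way the source branch is constructed:
x_s = alpha_prod_t ** 0.5 * x_0 + (1 - alpha_prod_t) ** 0.5 * n

# The e_c formula from ddcm_sampler recovers the noise with no loss:
e_c = (x_s - alpha_prod_t ** 0.5 * x_0) / (1 - alpha_prod_t) ** 0.5
print(np.allclose(e_c, n))  # True
```

Since the recovery is exact, lines 3-4 of the algorithm plausibly differ only in notation, not in value, which would explain why the returned latent still looks like z0.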

Question about your code: the claimed consistent noise seems unused in the demo code?

def ddcm_sampler(scheduler, x_s, x_t, timestep, e_s, e_t, x_0, noise, eta, to_next=True):
    if scheduler.num_inference_steps is None:
        raise ValueError(
            "Number of inference steps is 'None', you need to run 'set_timesteps' after creating the scheduler"
        )
    
    if scheduler.step_index is None:
        scheduler._init_step_index(timestep)

    prev_step_index = scheduler.step_index + 1
    if prev_step_index < len(scheduler.timesteps):
        prev_timestep = scheduler.timesteps[prev_step_index]
    else:
        prev_timestep = timestep

    alpha_prod_t = scheduler.alphas_cumprod[timestep]
    alpha_prod_t_prev = (
        scheduler.alphas_cumprod[prev_timestep] if prev_timestep >= 0 else scheduler.final_alpha_cumprod
    )
    beta_prod_t = 1 - alpha_prod_t
    beta_prod_t_prev = 1 - alpha_prod_t_prev
    variance = beta_prod_t_prev
    std_dev_t = eta * variance
    noise = std_dev_t ** (0.5) * noise

    e_c = (x_s - alpha_prod_t ** (0.5) * x_0) / (1 - alpha_prod_t) ** (0.5)

    pred_x0 = x_0 + ((x_t - x_s) - beta_prod_t ** (0.5) * (e_t - e_s)) / alpha_prod_t ** (0.5)  # + mv_offset
    eps = (e_t - e_s) + e_c
    dir_xt = (beta_prod_t_prev - std_dev_t) ** (0.5) * eps

    # Noise is not used for one-step sampling.
    if len(scheduler.timesteps) > 1:
        prev_xt = alpha_prod_t_prev ** (0.5) * pred_x0 + dir_xt + noise
        prev_xs = alpha_prod_t_prev ** (0.5) * x_0 + dir_xt + noise
    else:
        prev_xt = pred_x0
        prev_xs = x_0

    if to_next:
        scheduler._step_index += 1
    return prev_xs, prev_xt, pred_x0

Here eta is set to 1 in your code, but this makes dir_xt always 0.
Besides, I'm a bit confused by the computation of pred_x0: it seems to add the target-branch latent to the original image's latent and then subtract the source-branch latent.
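The eta = 1 observation can be checked directly from the formulas in the snippet (my own numeric sketch, using arbitrary values):

```python
import numpy as np

# From ddcm_sampler: std_dev_t = eta * variance, where variance is
# beta_prod_t_prev. With eta = 1 the coefficient of the direction term
#   dir_xt = (beta_prod_t_prev - std_dev_t) ** 0.5 * eps
# is exactly zero, so the update reduces to the scaled pred_x0 plus
# fresh noise of scale beta_prod_t_prev ** 0.5.
eta = 1.0
alpha_prod_t_prev = 0.8           # arbitrary cumulative alpha for the check
beta_prod_t_prev = 1 - alpha_prod_t_prev
std_dev_t = eta * beta_prod_t_prev

eps = np.ones((2, 2))             # any eps: only the coefficient matters
dir_xt = (beta_prod_t_prev - std_dev_t) ** 0.5 * eps
print(np.allclose(dir_xt, 0.0))   # True
```

So with eta = 1 the dir_xt term in both prev_xt and prev_xs indeed contributes nothing.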

I would appreciate your reply!
