
haozheliu-st / t-gate

310 stars · 12 watchers · 19 forks · 33.26 MB

T-GATE: Temporally Gating Attention to Accelerate Diffusion Model for Free!

License: MIT License

Python 100.00%
cross-attention cross-attention-diffusers diffusers diffusion efficiency inference pytorch text2image training-free transformer

t-gate's People

Contributors

eltociear, haozheliu-st, jetthu, sierkinhane, wentianzhang-ml


t-gate's Issues

confusion about speedup

Hello, thank you for your excellent work. T-GATE exploits the redundancy of cross-attention by caching its outputs, and thereby speeds up generation. As far as I know, the computational cost of self-attention and the FFN is larger than that of cross-attention, yet the paper states that T-GATE achieves nearly a 40% speedup by caching cross-attention alone. How is such a high speedup achieved? Perhaps I have misunderstood the technique. I would appreciate your help in resolving this confusion. Thanks! 🌹
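For readers following the discussion, here is a minimal, framework-free sketch of the caching idea being asked about (the class and function names are illustrative, not T-GATE's actual API): before the gate step the cross-attention output is recomputed at every denoising step; from the gate step onward, the cached result is reused.

```python
# Minimal sketch of temporally gated caching (illustrative names, not T-GATE's API).

class GatedCrossAttention:
    """Recompute cross-attention before `gate_step`; reuse the cache afterward."""

    def __init__(self, gate_step, attention_fn):
        self.gate_step = gate_step        # step at which caching kicks in
        self.attention_fn = attention_fn  # the expensive cross-attention call
        self.cache = None

    def __call__(self, step, hidden, text_embed):
        if step < self.gate_step or self.cache is None:
            # Semantics-planning phase: compute and remember the result.
            self.cache = self.attention_fn(hidden, text_embed)
        # Fidelity-improving phase: return the cached output unchanged.
        return self.cache


calls = []

def fake_attention(hidden, text_embed):
    calls.append(1)  # count how often the real attention runs
    return hidden + text_embed

gated = GatedCrossAttention(gate_step=10, attention_fn=fake_attention)
outputs = [gated(step, 1, 2) for step in range(25)]
print(len(calls))  # → 10: attention executed only for steps 0..9 of 25
```

The toy counter shows where the savings come from: with 25 inference steps and a gate step of 10, the expensive call runs only 10 times instead of 25.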

About Playground-v2.5-1024 model.

Hi!
Thanks for your amazing work.

Playground-v2.5-1024 is a stronger T2I model based on the SD-XL architecture.
(https://huggingface.co/playgroundai/playground-v2.5-1024px-aesthetic)
I tried to use the following code to speed up the model, but the results look terrible.

import torch
from diffusers import StableDiffusionXLPipeline
from tgate import TgateSDXLLoader

pipe = StableDiffusionXLPipeline.from_pretrained(
    "playgroundai/playground-v2.5-1024px-aesthetic",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)

gate_step = 10
inference_step = 25
pipe = TgateSDXLLoader(
    pipe,
    gate_step=gate_step,
    num_inference_steps=inference_step,
)
pipe = pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k."

image = pipe.tgate(
    prompt,
    gate_step=gate_step,
    num_inference_steps=inference_step,
).images[0]
image.save(f"{prompt}.png")

[generated image for the prompt "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"]

Is there a way to solve this problem?
I look forward to your reply.

TGATE v0.1.1 encounters a ValueError when performing multiple forward inferences

Hi! Thank you for the amazing work.

I encountered a ValueError when performing multiple forward inferences:
[screenshot of the ValueError traceback]

Here's the testing code I used:

import time
import logging

pipe = TgateSDLoader(
    pipe,
    gate_step=gate_step,
    num_inference_steps=inference_step,
).to("cuda")

start_time = time.time()
for _ in range(infer_times):
    tgate_image = pipe.tgate(
        prompt,
        gate_step=gate_step,
        num_inference_steps=inference_step,
    ).images
# Average latency over all runs, computed after the loop completes.
latency = (time.time() - start_time) / infer_times
logging.info("T-GATE: {:.2f} seconds".format(latency))

Hope you can resolve this issue.
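Without knowing TGATE v0.1.1's internals, one plausible cause of such an error is cached state surviving from one inference into the next. A generic, framework-free sketch of that failure mode and a reset-based fix (all names here are illustrative, not TGATE's actual API):

```python
# Illustrative sketch: a per-call step cache that must be cleared between runs.

class StepCache:
    """Holds one entry per denoising step of a single inference."""

    def __init__(self, num_steps):
        self.num_steps = num_steps
        self.store = []

    def push(self, value):
        if len(self.store) >= self.num_steps:
            # Stale entries from a previous inference are still present.
            raise ValueError("cache already full; call reset() between runs")
        self.store.append(value)

    def reset(self):
        self.store.clear()


cache = StepCache(num_steps=3)
for run in range(2):
    cache.reset()  # omitting this line reproduces the ValueError on run 2
    for step in range(3):
        cache.push(step)
print(len(cache.store))  # → 3
```

If the real cause matches this pattern, clearing the loader's cached state between `pipe.tgate(...)` calls (or re-wrapping the pipeline per run) would be the workaround.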

Cross-attention difference code

Hi,
thank you for your in-depth analysis.
Could you open-source the code used to compute the cross-attention difference shown in Figure 2?
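The exact metric behind such plots is not specified in this thread, but a common choice is the mean absolute (L1) difference between attention maps at consecutive denoising steps. A minimal sketch under that assumption (the metric and normalization are guesses, not necessarily what Figure 2 uses):

```python
# Sketch: L1 difference between attention maps at consecutive steps
# (the averaging here is an assumption, not necessarily Figure 2's metric).

def l1_difference(map_a, map_b):
    """Mean absolute difference between two flattened attention maps."""
    assert len(map_a) == len(map_b)
    return sum(abs(a - b) for a, b in zip(map_a, map_b)) / len(map_a)

def stepwise_differences(maps):
    """Difference curve over denoising steps, plotted against step index."""
    return [l1_difference(maps[t - 1], maps[t]) for t in range(1, len(maps))]

# Toy example: maps converge over steps, so the difference shrinks to zero.
maps = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5]]
print(stepwise_differences(maps))  # → [0.5, 0.0]
```

A curve that flattens to near zero after some step is exactly the observation that motivates caching from that step onward.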

How to reproduce FID from paper?

Hi! I'm trying to reproduce the T-GATE results (FID metric) described in your technical report, using the SDXL model with the DPM scheduler, 25 inference steps, and gate step 10. I'm using the MS-COCO 256x256 benchmark from https://github.com/Nota-NetsPresso/BK-SDM.git and got a very large FID instead of the 22.738 reported in your arXiv paper. The other metrics I measure, such as Inception Score and CLIP score, look normal. Could you please provide more information about the hyperparameters (e.g., guidance scale) and image resolution? Which captions were used for generation (the full validation set of MS-COCO 2014 or MS-COCO 2017, or some subset), and which real images were used to measure FID between real and generated samples?
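As background for anyone comparing numbers: FID is the Fréchet distance between Gaussian fits of Inception features, FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), and it is sensitive to the reference set, resolution, and preprocessing, which is why a mismatched caption subset can inflate it badly. A pure-Python sketch for the special case of diagonal covariances (illustrative only; real evaluations use the full covariance via tools such as pytorch-fid or clean-fid):

```python
import math

def fid_diagonal(mu_r, var_r, mu_g, var_g):
    """Fréchet distance between two Gaussians with diagonal covariances.

    FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 * (S_r S_g)^(1/2));
    for diagonal S the matrix square root is elementwise.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg)
                   for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term

# Identical distributions give FID = 0; a shifted mean increases it.
print(fid_diagonal([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # → 0.0
print(fid_diagonal([1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # → 1.0
```

Because the statistics are fitted to whichever real/generated sets you feed in, two correct FID implementations can still disagree wildly if the reference images or caption subsets differ, which is likely the question's root issue.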
