
Help! "No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile." (stable-diffusion-webui-tensorrt, closed)

nvidia commented on August 23, 2024
Help! "No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile."

from stable-diffusion-webui-tensorrt.

Comments (6)

contentis commented on August 23, 2024

Hi, HIRES fix is something I did not consider when developing this. Because the engine has to be swapped in and out, the overhead overshadows all performance gains.

This is something I need to address, but this will take some time.


PaoloNeoz commented on August 23, 2024

OK, I found a post saying that we have to generate a specific profile for the target resolution if we use HIRES fix.
Otherwise the generation will fail.

So if I disable HIRES fix, which I always use, the generation works and I see an improvement from 14 it/s to 23 it/s on my 4070 Ti.

But if I enable HIRES fix to upscale a 512 x 768 image to 1024 x 1536, even after generating a dedicated profile for 1024 x 1536, the generation speed does not improve by much (18.7 sec vs 17.9 sec, using 150 sampling steps to better measure the speed).
The first pass again goes from 14 it/s to 23 it/s, but something apparently goes wrong:
the HIRES fix pass does not get any faster at all.
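For reference, the two resolutions mentioned here map to different UNet input shapes, which is why a single narrow profile cannot serve both passes. This is a minimal sketch, assuming the SD 1.x convention of 4 latent channels and an 8x spatial downscale, and assuming the conditional and unconditional passes are batched together (hence a batch of 2 for one image). The names `latent_shape` and `fits` are hypothetical helpers, not part of the extension; the profile range used in the example is the one printed in the console output in this thread.

```python
from typing import Tuple

LATENT_SCALE = 8      # assumption: SD 1.x VAE downsamples each side by 8
LATENT_CHANNELS = 4   # assumption: SD 1.x latent channel count

Shape = Tuple[int, int, int, int]


def latent_shape(width: int, height: int, batch: int = 1, cfg: bool = True) -> Shape:
    """Return the (N, C, H, W) UNet input shape for an image of the given pixel size."""
    if width % LATENT_SCALE or height % LATENT_SCALE:
        raise ValueError("width and height must be multiples of 8")
    # Assumption: cond + uncond are evaluated in one batch, doubling N.
    n = batch * 2 if cfg else batch
    return (n, LATENT_CHANNELS, height // LATENT_SCALE, width // LATENT_SCALE)


def fits(shape: Shape, min_shape: Shape, max_shape: Shape) -> bool:
    """True if every dimension of `shape` lies inside the profile's [min, max] range."""
    return all(lo <= s <= hi for s, lo, hi in zip(shape, min_shape, max_shape))


lowres = latent_shape(512, 768)     # (2, 4, 96, 64)
hires = latent_shape(1024, 1536)    # (2, 4, 192, 128)

# Dynamic profile range reported in the console output of this thread.
dyn_min, dyn_max = (1, 4, 64, 64), (8, 4, 96, 96)
print(fits(lowres, dyn_min, dyn_max))  # True
print(fits(hires, dyn_min, dyn_max))   # False
```

Under these assumptions the first pass fits the dynamic profile while the upscaled pass does not, so a second engine has to be loaded for the HIRES fix pass.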

My profiles: [screenshot: Screenshot 2023-10-18 120404]

See the console messages below (it seems to be using the wrong profile?):

_0%| | 0/150 [00:00<?, ?it/s]Loading TensorRT engine: F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt | 0/10 [00:00<?, ?it/s]
[I] Loading bytes from F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt
Profile 0:
sample = [(1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96)]
timesteps = [(1,), (2,), (8,)]
encoder_hidden_states = [(1, 77, 768), (2, 77, 768), (8, 154, 768)]
latent = [(-1945884672), (-1945884160), (-1945878668)]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:09<00:00, 15.65it/s]
0%| | 0/10 [00:00<?, ?it/s]Loading TensorRT engine: F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=2x4x192x128-timesteps=2-encoder_hidden_states=2x77x768.trt
[I] Loading bytes from F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=2x4x192x128-timesteps=2-encoder_hidden_states=2x77x768.trt
Profile 0:
sample = [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]
timesteps = [(2,), (2,), (2,)]
encoder_hidden_states = [(2, 77, 768), (2, 77, 768), (2, 77, 768)]
latent = [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.23it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 160/160 [00:18<00:00, 8.83it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 160/160 [00:18<00:00, 3.25it/s]_
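To check which profile an engine file was built for, the file names in the log above can be decoded. This is a rough sketch that assumes the naming pattern visible in this log: `x` separates dimensions, `+` separates the min/opt/max shapes, and a single shape means a static engine (min = opt = max). That pattern is inferred from the log, not taken from the extension's source, and `parse_engine_name` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Input names as they appear in the engine file names in this thread.
FIELD_RE = re.compile(r"(sample|timesteps|encoder_hidden_states)=([0-9x+]+)")


def parse_engine_name(filename: str) -> Dict[str, List[Tuple[int, ...]]]:
    """Decode [min, opt, max] shapes per input from an engine file name."""
    profile: Dict[str, List[Tuple[int, ...]]] = {}
    for name, value in FIELD_RE.findall(filename):
        shapes = [tuple(int(d) for d in part.split("x")) for part in value.split("+")]
        if len(shapes) == 1:
            # Static engine: a single shape serves as min, opt and max.
            shapes = shapes * 3
        profile[name] = shapes
    return profile


dynamic_name = (
    "SD_cyberrealistic_v33_82b0d085_cc89_sample=1x4x64x64+2x4x64x64+8x4x96x96"
    "-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt"
)
static_name = (
    "SD_cyberrealistic_v33_82b0d085_cc89_sample=2x4x192x128"
    "-timesteps=2-encoder_hidden_states=2x77x768.trt"
)

print(parse_engine_name(dynamic_name)["sample"])
# [(1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96)]
print(parse_engine_name(static_name)["sample"])
# [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]
```

Read this way, the log shows the dynamic engine being loaded for the 150-step first pass and the static 2x4x192x128 engine being loaded for the 10-step HIRES fix pass.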


tyen0 commented on August 23, 2024

Maybe it could automatically fall back to not using the TensorRT engine if no matching profile is found? Faster TRT first pass, normal hires fix, faster TRT ADetailer inpainting. In the spirit of quality over quantity. :)


contentis commented on August 23, 2024

@tyen0 The limiting factor is reloading weights. Torch and TensorRT use different memory pools. Therefore you wouldn't see any performance benefit from that...


contentis commented on August 23, 2024

The dev branch contains a fix for the engine selection heuristic. It will now try to select an engine that covers both the high-res and low-res passes. If it can't, it will inform the user.
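The idea of picking one engine that covers both passes could look roughly like the sketch below. It is not the code from the dev branch: `Engine`, `select_engine`, and the tie-break that prefers the narrowest shape range are all assumptions made for illustration, and the `wide` engine is a made-up example of a profile large enough for both resolutions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Shape = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Engine:
    name: str
    min_shape: Shape
    max_shape: Shape

    def covers(self, shape: Shape) -> bool:
        """True if `shape` lies inside this engine's [min, max] range."""
        return all(lo <= s <= hi for s, lo, hi in zip(shape, self.min_shape, self.max_shape))

    def range_volume(self) -> int:
        """Number of distinct shapes the profile admits (smaller means tighter)."""
        volume = 1
        for lo, hi in zip(self.min_shape, self.max_shape):
            volume *= hi - lo + 1
        return volume


def select_engine(engines: Sequence[Engine], shapes: Sequence[Shape]) -> Optional[Engine]:
    """Return an engine covering every requested shape, or None if there is none."""
    candidates = [e for e in engines if all(e.covers(s) for s in shapes)]
    if not candidates:
        return None
    # Assumed tie-break: prefer the tightest profile among the valid ones.
    return min(candidates, key=Engine.range_volume)


# Profiles from the console output in this thread.
dynamic = Engine("dynamic", (1, 4, 64, 64), (8, 4, 96, 96))
static = Engine("static", (2, 4, 192, 128), (2, 4, 192, 128))
# Hypothetical profile wide enough for both passes.
wide = Engine("wide", (1, 4, 64, 64), (2, 4, 192, 192))

passes = [(2, 4, 96, 64), (2, 4, 192, 128)]  # low-res pass, HIRES fix pass

chosen = select_engine([dynamic, static], passes)
if chosen is None:
    print("No single engine covers both passes; build a wider profile.")

print(select_engine([dynamic, static, wide], passes).name)  # wide
```

With only the two engines from the log, no candidate covers both shapes, which corresponds to the case where the user is informed instead of swapping engines mid-generation.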

Please note that this is still WIP and might have some bugs.


thanayut1750 commented on August 23, 2024


This problem still exists.

Quoting contentis: "The dev branch contains a fix for the engine selection heuristic. It will now try to select an engine that covers both the hi and lowres. If it can't it will inform the user. Please note that this is still WIP and might have some bugs."

It persists even after switching to the dev branch and reinstalling the extension.

