Hi ! I start by saying that I'm not a power user and I don't edit code and stuff l

Help! "No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile." (about nvidia/stable-diffusion-webui-tensorrt)

contentis commented on August 23, 2024

Hi, hires fix is something I did not consider when developing this. Because the engine has to be swapped in and out, the overhead overshadows all performance gains.

This is something I need to address, but this will take some time.

from stable-diffusion-webui-tensorrt.

PaoloNeoz commented on August 23, 2024

OK, I found a post saying that we have to generate a specific profile for the target resolution if we use hires fix.
Otherwise the generation will fail.

So if I disable hires fix, which I always use, the generation works and I see an improvement from 14 it/s to 23 it/s with my 4070 Ti.

But if I enable hires fix to upscale a 512 x 768 image to 1024 x 1536, even after generating a dedicated profile for 1024 x 1536, the generation speed does not increase by much (18.7 sec vs 17.9 sec, using 150 sampling steps to better check the speed).
The iteration rate increases from 14 to 23 again, but something apparently goes wrong:
the hires fix speed does not increase at all.

My profiles :

See the console messages below (it seems to be using the wrong profile?):

_0%| | 0/150 [00:00<?, ?it/s]Loading TensorRT engine: F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt | 0/10 [00:00<?, ?it/s]
[I] Loading bytes from F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt
Profile 0:
sample = [(1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96)]
timesteps = [(1,), (2,), (8,)]
encoder_hidden_states = [(1, 77, 768), (2, 77, 768), (8, 154, 768)]
latent = [(-1945884672), (-1945884160), (-1945878668)]

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:09<00:00, 15.65it/s]
0%| | 0/10 [00:00<?, ?it/s]Loading TensorRT engine: F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=2x4x192x128-timesteps=2-encoder_hidden_states=2x77x768.trt
[I] Loading bytes from F:\A1111\stable-diffusion-webui\models\Unet-trt\SD_cyberrealistic_v33_82b0d085_cc89_sample=2x4x192x128-timesteps=2-encoder_hidden_states=2x77x768.trt
Profile 0:
sample = [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]
timesteps = [(2,), (2,), (2,)]
encoder_hidden_states = [(2, 77, 768), (2, 77, 768), (2, 77, 768)]
latent = [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.23it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 160/160 [00:18<00:00, 8.83it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 160/160 [00:18<00:00, 3.25it/s]_
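
For anyone reading the log above, here is a minimal Python sketch of the shape arithmetic, which may help explain why two engines get loaded. The helper names (`latent_shape`, `covers`) are made up for illustration and are not taken from the extension. It assumes the usual Stable Diffusion 1.x factor of 8 between pixels and latents, and a batch of 2 to match the batch size in the second profile of the log.

```python
# Hypothetical helpers for illustration; not the extension's actual code.
LATENT_SCALE = 8  # assumption: SD 1.x latents are 1/8 of the pixel size


def latent_shape(batch, width_px, height_px, channels=4):
    """Return the UNet 'sample' shape as (batch, channels, height/8, width/8)."""
    return (batch, channels, height_px // LATENT_SCALE, width_px // LATENT_SCALE)


def covers(profile, shape):
    """True if every dimension of shape lies within the profile's min..max range."""
    lo, _opt, hi = profile
    return all(a <= s <= b for a, s, b in zip(lo, shape, hi))


# 'sample' ranges (min, opt, max) copied from the two "Profile 0" blocks in the log.
dynamic_profile = [(1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96)]
static_profile = [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)]

low_res = latent_shape(2, 512, 768)     # first pass
high_res = latent_shape(2, 1024, 1536)  # hires fix pass

print("low res :", low_res, covers(dynamic_profile, low_res), covers(static_profile, low_res))
print("high res:", high_res, covers(dynamic_profile, high_res), covers(static_profile, high_res))
```

Under these assumptions the first profile fits only the 512 x 768 pass and the second fits only the 1024 x 1536 pass, so neither engine can serve both passes and one has to be swapped for the other in between.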

tyen0 commented on August 23, 2024

Maybe it could automatically fall back to not using the TensorRT engine if no matching profile is found? Faster TRT first pass, normal hires fix, faster TRT ADetailer inpainting. In the spirit of quality over quantity. :)

contentis commented on August 23, 2024

@tyen0 The limiting factor is reloading weights. Torch and TensorRT use different memory pools. Therefore you wouldn't see any performance benefit from that...
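
As a rough sanity check of the overhead, the numbers in PaoloNeoz's console output can be added up. This is only a back-of-the-envelope sketch: the tqdm rates are averages, the 18 s total is rounded to whole seconds, and the leftover time may include more than engine loading (for example upscaling or VAE decoding), so treat the result as an estimate.

```python
def seconds(steps, its_per_sec):
    """Time spent on a pass, given its step count and the reported it/s."""
    return steps / its_per_sec


# Figures read from the console output earlier in this thread.
first_pass = seconds(150, 15.65)  # low-res sampling
hires_pass = seconds(10, 2.23)    # hires fix sampling
total_reported = 18.0             # "Total progress ... [00:18"

sampling = first_pass + hires_pass
unaccounted = total_reported - sampling

print(f"sampling time : {sampling:.1f} s")
print(f"unaccounted   : {unaccounted:.1f} s")
```

With these figures, roughly 4 of the 18 seconds are not spent in either sampling loop, which is in line with the swap overhead described above eating into the gains.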

contentis commented on August 23, 2024

The dev branch contains a fix for the engine selection heuristic. It will now try to select an engine that covers both the low-res and the hires pass. If it can't, it will inform the user.

Please note that this is still WIP and might have some bugs.
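
To illustrate the idea only (this is not the code on the dev branch), a selection rule of this kind could look like the Python sketch below. The function names and engine labels are hypothetical, and the "wide" profile is an invented example of a dynamic profile large enough for both passes.

```python
# Illustrative sketch; names and the "wide" profile are assumptions.
def covers(profile, shape):
    """True if every dimension of shape lies within the profile's min..max range."""
    lo, _opt, hi = profile
    return all(a <= s <= b for a, s, b in zip(lo, shape, hi))


def select_engine(engines, shapes):
    """Return the first engine label whose profile covers all shapes, else None."""
    for label, profile in engines.items():
        if all(covers(profile, shape) for shape in shapes):
            return label
    return None


# Latent shapes for 512 x 768 and 1024 x 1536 at batch 2, as in the log above.
passes = [(2, 4, 96, 64), (2, 4, 192, 128)]

# The two profiles from the log: neither covers both passes.
logged = {
    "dynamic": [(1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96)],
    "static_hires": [(2, 4, 192, 128), (2, 4, 192, 128), (2, 4, 192, 128)],
}

# Hypothetical wider dynamic profile spanning both resolutions.
with_wide = dict(logged, wide=[(1, 4, 64, 64), (2, 4, 96, 64), (2, 4, 192, 192)])

for engines in (logged, with_wide):
    choice = select_engine(engines, passes)
    print(choice if choice else "No single engine covers both passes; inform the user.")
```

With only the two logged profiles nothing matches, so the user would be told; a single profile whose range spans both resolutions would be picked instead, avoiding the swap.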

thanayut1750 commented on August 23, 2024

This problem still exists.

contentis wrote: "The dev branch contains a fix for the engine selection heuristic. It will now try to select an engine that covers both the hi and lowres. If it can't it will inform the user. Please note that this is still WIP and might have some bugs."

It persists even after switching to the dev branch and reinstalling the extension.

Help ! : "No valid profile found. Please go to the TensorRT tab and generate an engine with the necessary profile." about stable-diffusion-webui-tensorrt HOT 6 CLOSED
