
Is there an upscaler guide? (shark issue, open, 3 comments)

XeroCreator commented on June 13, 2024
Is there an upscaler guide?


Comments (3)

NeedsMoar commented on June 13, 2024

The stable diffusion upscaler is always generated with a 128x128 config since it operates in blocks. I can't say why you're getting that error though, unless you've got a custom VAE plugged in or something is broken with the huggingface repo right now. I'll point some stuff out while I'm here:

The "size" parameters are a red herring, sorta. AFAICT the input image is always upscaled 4x to a max of 2048, but for some pants-crappingly insane reason you have to set those fields to the ratio of the two dimensions... and not the actual ratio, either, but the inverted ratio. So to upscale a 768h x 512w image, since 512 is the highest value the UI accepts, I have to enter 384h x 512w or the output comes out squashed. Then, since it's done in 128x128 blocks, this is how long it takes:


50it [00:04, 11.06it/s]
100%|████████| 1/1 [00:00<00:00,  3.85it/s]
50it [00:04, 11.07it/s]
100%|████████| 1/1 [00:00<00:00,  3.86it/s]
  ... (9 more similar 50-iteration blocks at ~11 it/s each) ...
50it [00:04, 11.02it/s]
100%|████████| 1/1 [00:00<00:00,  3.88it/s]

for a 1536x2048 output.
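The numbers in that log can be sanity-checked with some quick arithmetic (plain Python, not Shark code; the assumption that the 128x128 blocks tile the input image is mine, but it matches the twelve 50-iteration runs in the log):

```python
import math

def sd_x4_plan(in_h, in_w, tile=128, steps_per_tile=50, scale=4):
    """Rough model of the tiled sd-x4 upscale described above."""
    out_h, out_w = in_h * scale, in_w * scale                 # 4x upscale
    tiles = math.ceil(in_h / tile) * math.ceil(in_w / tile)   # 128x128 blocks
    return out_h, out_w, tiles, tiles * steps_per_tile

out_h, out_w, tiles, steps = sd_x4_plan(384, 512)
print(out_h, out_w)  # 1536 2048 -- the output size from the log
print(tiles, steps)  # 12 600   -- twelve 50-iteration blocks
```

A 384x512 input splits into 3x4 = 12 blocks, which is exactly why the log shows twelve separate 50-iteration runs.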

My honest advice is to avoid it entirely. It has a bad tendency to produce blocky artifacts at tile boundaries, some of which reliably show up in the same spots. I suspect it may be a precision issue; either way, it's not very impressive.

Until Shark gets UI support for other upscalers, I'd just grab https://github.com/xinntao/Real-ESRGAN-ncnn-vulkan
I've been using it on video for a while now. With the smaller models it was faster than realtime on 480p video sources on my 6900XT; on the 7900XTX it takes under 3 minutes to convert 22 minutes of frames from 30fps SD video.
For single frames, loading the model and initializing Vulkan takes longer than the upscale itself.

Compared to the 12-block sd-x4 process above, realesrgan with one of the much slower fp32 model variants (UltraSharp-x4) takes under 4 seconds:

The current time is:  5:56:44.84

realesrgan-ncnn-vulkan -n UltraSharp-opt-fp32-x4 -i 013243__photorealistic_2460524766.png -o up.png -s 4 -j 1:16:1 -v
[0 AMD Radeon RX 7900 XTX]  queueC=1[2]  queueG=0[1]  queueT=2[2]
[0 AMD Radeon RX 7900 XTX]  bugsbn1=0  bugbilz=0  bugcopc=0  bugihfa=0
[0 AMD Radeon RX 7900 XTX]  fp16-p/s/a=1/1/1  int8-p/s/a=1/1/1
[0 AMD Radeon RX 7900 XTX]  subgroup=64  basic=1  vote=1  ballot=1  shuffle=1
0.00% ... 91.67%  (per-tile progress ticks elided)
013243__photorealistic_2460524766.png -> up.png done

The current time is:  5:56:48.56

This results in a 2048x3072 image, a higher resolution than you'll be able to get Shark to produce with sd-x4.

Running that upscaled image through the same pipeline again takes 21 seconds this time, which is still ~2.5x faster than sd-x4, and yields an 8192x12288 image, more than you'll conceivably need for anything. It still looks impressively good at that scale.
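As a back-of-envelope check on the speed claims (all numbers are read off the logs and text above; this is arithmetic, not a new benchmark):

```python
# sd-x4: 12 tiles, 50 UNet steps each at ~11 it/s, plus one ~3.8 it/s
# decode step per tile (rates taken from the iteration log above).
sd_x4_seconds = 12 * (50 / 11.0 + 1 / 3.8)

# realesrgan first pass: wall time between the two timestamps
# (5:56:44.84 -> 5:56:48.56; the hour cancels out).
esrgan_first_pass = (56 * 60 + 48.56) - (56 * 60 + 44.84)

esrgan_second_pass = 21.0  # quoted time for the 8192x12288 pass

print(round(sd_x4_seconds, 1))      # 57.7
print(round(esrgan_first_pass, 2))  # 3.72
print(round(sd_x4_seconds / esrgan_second_pass, 1))  # 2.7
```

Roughly in line with the ~2.5x figure quoted above, even against realesrgan's much larger 8192x12288 second pass.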

stable-diffusion-x4-upscaler is the only thing that works with the UI right now, regardless of which upscaler models you drop in. This may change.

The only conceivable advantage of sd-x4 would be using prompts, LoRAs, and CFG weighting to steer the output, but in my limited tests that didn't really change anything.

There's a script in the full Shark git repo to call the PyTorch versions of the ESRGAN models, but I haven't tried it to compare speed. The ncnn-vulkan build hasn't been updated since last year, so it's presumably non-optimal for RDNA3, but it's still faster than anything I need. Whether the Shark version will be faster is anybody's guess; I've just been too lazy to play with the script.


XeroCreator commented on June 13, 2024

I feel quite dumb, as I've only been doing this a few days, but one of my problems turned out to be that I was using an upscaler download (which I found somewhere on this git) with a .ckpt extension rather than .safetensors.
After I got the .safetensors one, it worked in SHARK just fine.

Does the x4 upscaler work only with SD version 2, or with earlier versions too? I noticed the page says Stable Diffusion 2, which is why I never tried that one.


NeedsMoar commented on June 13, 2024

It probably should have worked... I think the Shark upscaler UI hardcodes the image block size at 128x128, for one thing. You might be able to change that in "apps\stable_diffusion\web\ui\upscaler_ui.py", where args.width and args.height are set to 128 for the model config. I'm still not convinced the x4 upscaler works correctly in Shark, because of the bizarre block artifacts: you should be able to spot 128x128 blocks of the upscaled image where some have very high detail/texture and some don't, and the colors are off between them. The algorithm is supposed to recognize neighboring block features and blend them for each block, but I don't think that happens in Shark's case. Comparing the code against base diffusers is difficult because none of Shark's classes derive from the diffusers class hierarchy, which uses a bunch of Mixins to handle things like embeddings across the various pipelines, whereas Shark has tons of custom model-loading code due to the compilation step.
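If you want to experiment with that block size, the spot he describes would look something like this. This is a hypothetical sketch: the actual code in apps\stable_diffusion\web\ui\upscaler_ui.py differs, and SimpleNamespace stands in for Shark's real args object here.

```python
from types import SimpleNamespace

# Hypothetical stand-in for Shark's global args object; the real
# upscaler_ui.py sets these on its own config before compiling.
args = SimpleNamespace()
args.height = 128  # block size the x4 upscaler model is compiled for
args.width = 128   # changing these forces a recompile and may break tiling
print(args.height, args.width)  # 128 128
```

Note that since the model is compiled for these dimensions, changing them means the model gets recompiled on the next run, with no guarantee the tiling logic still lines up.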

Edit: I think .safetensors and .ckpt files are loaded differently by Shark, too. There was a recent check-in about requiring accelerators to load .safetensors. I can't test it because some unknown change broke model compilation entirely in the most recent snapshot release; it looks like it's potentially a fix for the newest AMD driver, but I have to look through the changes to figure it out.

I'm guessing what broke 128x128 was the weird hack that redirects the 768 stability-2 / stability-2.1 models to their base versions when pulling configs and everything. I've hit issues with this more or less constantly. There's an exciting one where, if you install the stability-2 model (which is 768x768), Shark tunes it with the stability-2.1-base configs and downloads the .bins for that (either way, the wrong thing is being compiled), which causes a catastrophic slowdown from ~10 it/s to 10+ s/it. I don't know exactly what it's doing, except that some tight loop using very little of the GPU's surface area is executing: boost clocks jump 100MHz above my configured maximum (over 3GHz) while drawing very little power (130W), and the whole Windows UI slows down, though not enough for the driver to go unresponsive and be force-reset.

Supposedly the 768x768 models are disabled because of an issue with the latents not decoding correctly in fp16, although fixed VAEs for that problem have existed for a long time (there's already a fixed fp16 VAE for SDXL 1.0, and that came out like yesterday), so I'm not sure where the issue is at this point. I've only been able to get 768x768 to work by leaving the model as the default stability-2.1 variant pulled directly from huggingface. Using a local 768x768 model like stability-v2 triggers the same catastrophic compilation failure, the one I keep meaning to file a bug on: speed drops ~100x per iteration (tuned or not), the GPU runs past its specified maximum boost clock while pulling only about 1/3 of its available wattage (probably a tight loop that isn't distributed across CUs properly, though I can't say why it violates boost clocks), and the whole system gets choppy while the video driver stays just responsive enough not to trigger the Windows driver reset. If you're bored enough to wait out the 10-12s per iteration, it still produces noise, because it's using the non-fp16-compatible VAE. I can't say why 768x768 works on the default 2.1 model, unless someone fixed the VAE there or Shark is force-loading something.


