Comments (3)
Yes, I found them yesterday; a little 150GB to clear. I'm trying to understand the differences between schedulers, VAEs, etc., but testing them takes a lot of time.
If you want to learn the interactions without writing what amounts to a pile of obnoxious boilerplate that's been repeated hundreds of times, I'd strongly suggest taking a look at https://github.com/comfyanonymous/ComfyUI. It's node based, so you can see how things link up, and it animates which step of the graph is currently running. It's not fast on AMD compared to SHARK: 512x512 images generate at 4 it/s (vs. ~23 on SHARK with SD 1.5 models and 26-27 for SD 2.1-base, currently). Not fast, but not unusable, and since you can either use the built-in preview or download TAESD and live-preview the UNet as it iterates (--preview-method=taesd / =auto), you can watch how the images start off and form based on your sampler selection.
ComfyUI is far more likely to exceed your video card's memory, but instead of throwing an out-of-memory error it just falls back onto shared system memory and gets incredibly slow, so keep an eye on your memory use.
I tried OliveML / ONNX, but it's maybe 15% slower than Shark and has even less support at the moment.
Just a warning: two of the samplers (the DPM ones, I think) throw non-critical errors because they attempt to pass a non-DML tensor to the DirectML backend (shouldn't be too bad a fix), and the DDIM sampler just freezes the command prompt it was run in (who knows why); it has to be force-quit, so avoid it. A few other samplers report that they're falling back on the CPU (really, scalar lerp isn't implemented?), but most of the code still runs on the GPU and there's no noticeable difference. The lerp can probably be replaced with a supported op, or the scalar value pulled from the result. The _gpu samplers don't work either; they seem to be NVIDIA-specific.
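As an aside, the scalar lerp in question is trivial to decompose into ops any backend supports. A minimal sketch of the identity (plain Python floats here, but the same rewrite applies to tensor code):

```python
def lerp(a, b, w):
    """Linear interpolation of a toward b by weight w.

    Decomposed as a + w * (b - a): a subtract, a multiply, and an add,
    all of which any backend supports, instead of a fused lerp op.
    """
    return a + w * (b - a)

# w = 0 gives a, w = 1 gives b, values in between interpolate linearly.
print(lerp(0.0, 10.0, 0.25))  # 2.5
print(lerp(2.0, 4.0, 1.0))    # 4.0
```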
Once you've worked out a combination of models and LoRAs you like, you can load them up in SHARK, skip the recompile delays, and run them at 5x the speed again.
from shark.
The LoRA needs to be compiled into the model whenever a new one is selected. Likewise, the models themselves need recompiling when a new model is selected, the batch count (simultaneous images) is changed, the image size is changed, or a different VAE is selected. SHARK does keep the old compiled part+batch size+image size+model+LoRA+VAE .vmfb files, so any combination you've used at some point is probably still lying around. If you check the directory SHARK is in, you can find the combinations you've already used pretty easily and reuse them to avoid a recompile. A few things like the scheduler files only get compiled once for a given size, not for every model.
The VMFBs only get deleted when you use the --clear_all command-line option or remove them yourself. I have 20GB of them right now, and I've only run SHARK a little since the last clear, so they're worth keeping an eye on. If you try out a lot of random model/LoRA combinations, you can easily eat up a couple hundred gigs of space before you realize it.
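If you want to see what that cache is costing you before reaching for --clear_all, here's a quick sketch that walks a directory and totals the .vmfb files. The path is whatever directory you installed SHARK into, and the filename layout is a SHARK implementation detail that may change between versions:

```python
from pathlib import Path

def vmfb_usage(shark_dir="."):
    """Return (total size in GB, file list) for cached .vmfb files under
    shark_dir, so you can see which compiled combinations are cached."""
    files = sorted(Path(shark_dir).rglob("*.vmfb"))
    total_gb = sum(f.stat().st_size for f in files) / (1024 ** 3)
    return total_gb, files

total_gb, files = vmfb_usage(".")
print(f"{len(files)} cached .vmfb files, {total_gb:.1f} GB total")
```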
from shark.