Comments (4)
TLDR: I think HiDiffusion could be supported in a way that is compatible with all of our other features. But, it would definitely be more effort than the one-liner that they advertise. We should do more testing to make sure that this feature is worth the implementation / maintenance effort (the examples in the paper look great).
I spent some time reading the HiDiffusion paper today. Here are my notes on what it would take to implement this:
HiDiffusion modifies the UNet in two ways: RAU-Net (Resolution-Aware U-Net) and MSW-MSA (Modified Shifted Window Multi-head Self-Attention). Both are tuning-free modifications to the UNet, i.e. no new weights are needed.
The RAU-Net is intended to avoid subject duplication at high resolutions. It achieves this by changing the downsampling/upsampling pattern of the UNet layers so that the deep layers operate at resolutions closer to what they were trained on.
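To make the resolution argument concrete, here is a rough sketch of the feature-map arithmetic. It assumes an SD 1.5-style layout (VAE scale factor of 8, three stride-2 downsamplers in the UNet); the `[4, 2, 2]` stride pattern is purely illustrative and is not taken from the paper or the HiDiffusion repo, and `feature_map_sizes` is a hypothetical helper.

```python
def feature_map_sizes(latent_size, strides):
    """Return the spatial size of the feature map before and after each downsampler."""
    sizes = [latent_size]
    for stride in strides:
        sizes.append(sizes[-1] // stride)
    return sizes


VAE_SCALE = 8  # assumption: SD-family VAE, latents are 1/8 of the image size

# Resolution the model was trained at (assumed 512x512 for an SD 1.5-style model).
trained = feature_map_sizes(512 // VAE_SCALE, [2, 2, 2])

# Naively running at 1024x1024: every block sees 2x the trained resolution.
naive_hires = feature_map_sizes(1024 // VAE_SCALE, [2, 2, 2])

# Illustrative resolution-aware variant: one downsampler uses a larger stride,
# so the deeper blocks are back at the sizes they saw during training.
adjusted_hires = feature_map_sizes(1024 // VAE_SCALE, [4, 2, 2])

print("trained:       ", trained)
print("naive hi-res:  ", naive_hires)
print("adjusted hi-res:", adjusted_hires)
```

In this toy setup the deep blocks of the adjusted variant operate at the same sizes as in training, which is the effect described above. The matching upsampling change on the decoder side is omitted here.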
The MSW-MSA modification improves generation time at high resolution by applying windowing to the self-attention layers of the top UNet blocks.
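A back-of-the-envelope sketch of why windowing helps: it counts query-key pairs for global vs. windowed self-attention over a token grid. The function name, grid size and window size are my own illustrative choices, not values from the paper, and the "shifted" part of MSW-MSA (moving the window offsets) is not modelled since it doesn't change the pair count.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs for self-attention over a height x width token grid.

    window=None means global attention. Otherwise the grid is split into
    non-overlapping window x window tiles and attention stays inside each tile.
    """
    if window is None:
        tokens = height * width
        return tokens * tokens
    if height % window or width % window:
        raise ValueError("grid must divide evenly into windows")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


# Illustrative: a 128x128 token grid, split into four 64x64 windows.
global_cost = attention_pairs(128, 128)
windowed_cost = attention_pairs(128, 128, window=64)
print(f"global: {global_cost}, windowed: {windowed_cost}, "
      f"ratio: {global_cost / windowed_cost:.1f}x")
```

The pair count drops by a factor equal to the number of windows. That only bounds the saving inside the affected attention layers; the end-to-end speedup depends on how much of each step those layers account for.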
I think we should be able to make these changes in a way that is compatible with most other features. The main question is how much effort it will take.
Compatibility:
- Regional prompting: I think there are some places where we make assumptions about the UNet downsampling scheme, but those shouldn't be too hard to modify.
- TI: No changes required.
- LoRA: No changes required, but HiDiffusion might interfere with the effectiveness of some LoRAs.
- ControlNet: The HiDiffusion repo includes support for ControlNet in diffusers. We don't use the diffusers ControlNet implementation as-is, so there would probably be a bit of effort to get this working.
- Custom attention processors (regional prompting and IP-Adapter): Should just work, but there is some risk of a conflict with MSW-MSA that I haven't anticipated.
- Sequential vs. batched conditioning: No changes required.
from invokeai.
We have a lot of custom logic around diffusers, and the "just add a single line!" claim doesn't necessarily apply to our implementation.
@RyanJDick @lstein Can you advise on effort to implement this? It would replace the HRO feature (automatic 2nd pass img2img).
Is this limited to image sizes greater than the model's trained dimensions, or is the improvement greater at those dimensions (but still present at trained dimensions)?
Is this limited to image sizes greater than the model's trained dimensions, or is the improvement greater at those dimensions (but still present at trained dimensions)?
MSW-MSA can be applied at native model resolutions to get some speedup, but the speedup would be much greater at higher resolutions. Based on some of the numbers reported in the paper, I'd guess that we could get a ~20% speedup for SDXL at 1024x1024. I'm not sure if there would be perceptible quality degradation. We'd have to test that.
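For reasoning about what end-to-end speedup to expect before we benchmark, a simple Amdahl's-law estimate may help. The inputs below are placeholders, not measurements: we'd need to profile what fraction of a denoising step is spent in the windowed self-attention layers and how much faster those layers actually get.

```python
def overall_speedup(attn_fraction, attn_speedup):
    """Amdahl's law: overall speedup when only part of the work gets faster.

    attn_fraction: share of step time spent in the affected attention layers (0..1).
    attn_speedup:  how many times faster those layers run after the change.
    """
    if not 0.0 <= attn_fraction <= 1.0:
        raise ValueError("attn_fraction must be between 0 and 1")
    if attn_speedup <= 0:
        raise ValueError("attn_speedup must be positive")
    return 1.0 / ((1.0 - attn_fraction) + attn_fraction / attn_speedup)


# Placeholder inputs only; replace with profiled numbers.
for fraction in (0.1, 0.2, 0.4):
    print(f"attention share {fraction:.0%}: "
          f"{overall_speedup(fraction, attn_speedup=4.0):.2f}x overall")
```

The takeaway is just that the overall gain is capped by the share of time spent in the affected layers, which is consistent with the speedup being larger at high resolutions where attention dominates.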