Comments (4)
> How did you generate the codegen in the diffs above? With the unchecked change, I think those copies should have been vectorized.

It's honestly sort of jank. I compile with cargo and use Rust flags to make rustc output the *.asm files, then use rustfilt to pretty it up a bit, then use git to track the diff in a nice interface. I'd use something like Godbolt, but setting up my own server with the Rust crates I want tends to be a lot more effort.

> Regarding `const fn`s not being inlined, that's odd to me, I thought all `const fn`s would be evaluated at compile time and their result inlined. If you could open a PR that adds the `#[inline]` attribute on all `const fn`s, that would be great!

If the entire expression is constant (inputs, outputs, static dependencies), the result will be computed at compile time. If a `const fn` is used in a non-const context, though, it is treated as a normal function, which includes not being inlined if the function is considered big enough.
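To make the distinction concrete, here is a minimal sketch. The `Metadata` type and its `stride` accessor are hypothetical stand-ins, not encase's actual code; the point is only that the same `const fn` is evaluated by the compiler in a const context and becomes an ordinary call (where `#[inline]` can matter) in a runtime context.

```rust
// Hypothetical stand-in for a metadata type; not encase's actual definition.
pub struct Metadata {
    pub stride: u64,
}

impl Metadata {
    // `const fn` only *permits* compile-time evaluation. `#[inline]` is a
    // separate hint that applies when the call happens at runtime.
    #[inline]
    pub const fn stride(&self) -> u64 {
        self.stride
    }
}

const META: Metadata = Metadata { stride: 16 };

// Const context: the call is evaluated during compilation.
const STRIDE: u64 = META.stride();

// Non-const context: this is an ordinary function call, subject to the
// optimizer's usual inlining heuristics.
pub fn offset_of(meta: &Metadata, index: u64) -> u64 {
    meta.stride() * index
}

fn main() {
    assert_eq!(STRIDE, 16);
    let meta = Metadata { stride: 16 };
    assert_eq!(offset_of(&meta, 3), 48);
}
```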
from encase.
This was discussed on the Bevy Discord (link). A summary of the codegen analysis:
- encase already does a capacity check only once per write (line 26 in 308bb72).
- `ArrayMetadata`'s accessors are not being inlined, even on the most aggressive compilation settings.
  - Experimental change: commit
  - Change in codegen: diff
  - Observed result: Metadata not being accessed causes unsupported use cases (i.e. certain structs in uniform buffers) to collapse into the panic, resulting in a lot more codegen than necessary. This is probably not actively impacting hot-path performance, though.
  - Additional note: this probably also applies to the other metadata types, even if they're all `const`.
- `SliceExt::array` and `SliceExt::array_mut` both use checked conversions into `&mut [T]` on the target subslice. This results in a lot of extra branching. Attempted to replace this with an unchecked conversion instead.
  - Experimental change: commit
  - Change in codegen: diff
  - Observed result: All of the branches disappeared. The copy is not vectorized, though it should be, but there aren't any unnecessary branches anymore.
  - Additional note: This implementation is unsound, as there is no capacity check when converting to the array. Either the unsafe needs to be lifted out and added as an invariant (basically treating the slice as a raw pointer), or we need to find another way to avoid the branch.
TODO: Actually benchmark the changes here to see if the gains are significant enough to warrant these kinds of changes.
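For reference, a rough sketch of the two conversion styles being compared. The free functions `array_checked` and `array_unchecked` are hypothetical and only approximate what `SliceExt::array_mut` might look like in each variant; they are not copied from encase.

```rust
use std::convert::TryInto;

// Checked variant: the range index has its own bounds check, and `try_into`
// compares the subslice length against N before producing the array reference.
pub fn array_checked<const N: usize>(buf: &mut [u8], offset: usize) -> &mut [u8; N] {
    (&mut buf[offset..offset + N]).try_into().unwrap()
}

/// Unchecked variant: no branches, so the bounds become the caller's problem.
///
/// # Safety
/// `offset + N <= buf.len()` must hold.
pub unsafe fn array_unchecked<const N: usize>(buf: &mut [u8], offset: usize) -> &mut [u8; N] {
    debug_assert!(offset + N <= buf.len());
    // SAFETY: the caller guarantees that the N bytes at `offset` are in bounds.
    unsafe { &mut *(buf.as_mut_ptr().add(offset) as *mut [u8; N]) }
}

fn main() {
    let mut buf = [0u8; 8];
    array_checked::<4>(&mut buf, 0).copy_from_slice(&[1, 2, 3, 4]);
    // SAFETY: 4 + 4 <= 8.
    let tail: &mut [u8; 4] = unsafe { array_unchecked(&mut buf, 4) };
    tail.copy_from_slice(&[5, 6, 7, 8]);
    assert_eq!(buf, [1, 2, 3, 4, 5, 6, 7, 8]);
}
```

Marking the unchecked function `unsafe fn` is one way of "lifting the unsafe out": the missing capacity check turns into a documented invariant that every caller has to uphold.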
from encase.
One potential other middle ground is to change the vector and matrix implementations to directly copy their bytes instead of relying on the underlying components' implementations. This would eliminate the vast majority of the branches being produced, potentially allows for vectorized copies, and avoids the need for infectious use of unsafe.
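As a sketch of what that could look like, assuming a hypothetical `Vec4` wrapper around `[f32; 4]` and little-endian output (neither taken from encase): the per-component version does one bounds-checked write per element, while the whole-vector version serializes into a fixed-size stack array and does a single bounds-checked copy, with no unsafe.

```rust
// Hypothetical vector type, used only for illustration.
pub struct Vec4(pub [f32; 4]);

// Per-component: four separate writes, each with its own bounds check on `out`.
pub fn write_per_component(v: &Vec4, out: &mut [u8], offset: usize) {
    for (i, c) in v.0.iter().enumerate() {
        let start = offset + i * 4;
        out[start..start + 4].copy_from_slice(&c.to_le_bytes());
    }
}

// Whole-vector: fill a fixed-size stack array first, then do one
// bounds-checked copy into the destination.
pub fn write_whole(v: &Vec4, out: &mut [u8], offset: usize) {
    let mut bytes = [0u8; 16];
    for (chunk, c) in bytes.chunks_exact_mut(4).zip(v.0.iter()) {
        chunk.copy_from_slice(&c.to_le_bytes());
    }
    out[offset..offset + 16].copy_from_slice(&bytes);
}

fn main() {
    let v = Vec4([1.0, 2.0, 3.0, 4.0]);
    let mut a = [0u8; 20];
    let mut b = [0u8; 20];
    write_per_component(&v, &mut a, 4);
    write_whole(&v, &mut b, 4);
    // Both strategies produce the same bytes.
    assert_eq!(a, b);
}
```

Whether the second form actually vectorizes would still need to be confirmed by looking at the generated assembly.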
from encase.
This is great stuff, thanks for looking into it!
I think the most promising optimization here would be using unchecked versions of `SliceExt::array` and `SliceExt::array_mut`, making all `read` and `write` methods `unsafe`, and making sure we always check the bounds in the buffer wrappers, which is the API most users will interact with anyway. A PR doing this would be welcome, but I'm curious how much perf we will get from this; we should benchmark it.
How did you generate the codegen in the diffs above? With the unchecked change, I think those copies should have been vectorized.
Regarding `const fn`s not being inlined, that's odd to me; I thought all `const fn`s would be evaluated at compile time and their result inlined. If you could open a PR that adds the `#[inline]` attribute on all `const fn`s, that would be great!
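A minimal sketch of that split, with hypothetical names throughout (`Writer`, `write_u32_unchecked`, and `write_pair` are illustrations, not encase's API): the low-level write is `unsafe` and does no bounds checking, and the safe wrapper checks the capacity once up front.

```rust
// Hypothetical low-level writer; not encase's actual type.
pub struct Writer<'a> {
    buf: &'a mut [u8],
    cursor: usize,
}

impl<'a> Writer<'a> {
    // Single capacity check for everything that will be written.
    pub fn new(buf: &'a mut [u8], required: usize) -> Option<Self> {
        if buf.len() < required {
            None
        } else {
            Some(Writer { buf, cursor: 0 })
        }
    }

    /// # Safety
    /// At least 4 bytes must remain in the buffer past the cursor.
    pub unsafe fn write_u32_unchecked(&mut self, value: u32) {
        let bytes = value.to_le_bytes();
        // SAFETY: the caller guarantees `cursor + 4 <= buf.len()`.
        unsafe {
            std::ptr::copy_nonoverlapping(
                bytes.as_ptr(),
                self.buf.as_mut_ptr().add(self.cursor),
                4,
            );
        }
        self.cursor += 4;
    }
}

// Safe wrapper: the kind of entry point most users would call.
pub fn write_pair(buf: &mut [u8], a: u32, b: u32) -> bool {
    match Writer::new(buf, 8) {
        Some(mut w) => {
            // SAFETY: `Writer::new` verified 8 bytes; exactly 8 are written.
            unsafe {
                w.write_u32_unchecked(a);
                w.write_u32_unchecked(b);
            }
            true
        }
        None => false,
    }
}

fn main() {
    let mut buf = [0u8; 8];
    assert!(write_pair(&mut buf, 1, 2));
    assert_eq!(buf, [1, 0, 0, 0, 2, 0, 0, 0]);
}
```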
from encase.