Comments (10)
#1306 has an interesting line here: https://github.com/google/clspv/pull/1306/files#diff-d82932682c3cce860c8c728b57e2a68e6b04593c1edb206934c340665fb71f29L16
kernel void bar(global int *A, global int *B, int n) { apple(A + 1, B, n); }
That PR changes the argument from A+1 to A+n, where n is non-constant.
This is an interesting distinction from the perspective of a general solution to the type inference problem. (Because the constant multiple of the element size is folded into the constant offset, and eventually has to be disentangled from a member offset at a deeper level of the type hierarchy. The folding loses information that has to be recovered somehow, in general.)
Any general solution will need to have test cases that do add a constant to an decayed array in this way.
I expect in the future that the solution applied to that test will also fall over and LLVM will instead produce something akin to:
%mul = mul i32 %n, %size_of_A
%gep = getelementptr i8, ptr addrspace(1) %A, %mul
from clspv.
I have a local hack that undoes this in certain circumstances. That makes the CTS pass, but this is likely uncovering a bug where we are generating the wrong addressing logic already that is just being exposed by the new code generation.
from clspv.
This change in LLVM is negatively impacting performance on ChromeOS. Between 10% to 30% regression depending on the various benchmark we have.
from clspv.
Clspv will generate invalid SPIR-V some cases due to this issue. Take test/Coherent/frexp.cl
for example. When compiled with -cl-native-math
the resulting SPIR-V generates the following validation error:
error: line 78: GLSL.std.450 Frexp: expected operand Exp data type to be a 32-bit int scalar or vector type
%50 = OpExtInst %float %49 Frexp %45 %48
The data + 1
results in gep canonicalization happening and clspv pick i8
as the underlying type for data
. This is a pre-existing issue that is triggered more frequently now.
I plan to update the coherent tests so here is the source for reference:
kernel void foo(global int *data, global float* x) {
float y = data[0];
barrier(CLK_GLOBAL_MEM_FENCE);
*x = frexp(y, data + 1);
}
from clspv.
Thinking more about a general solution I think clspv needs to distinguish between the two needs for type inference:
- Data type of a particular instruction
- Data type of the SPIR-V interface
Type inference can't just look past geps in the first case since we need to know how the type is interpreted for address calculations, but for the second case it's clear we need to have a more holistic view of how the interface is used.
from clspv.
#1306 has an interesting line here: https://github.com/google/clspv/pull/1306/files#diff-d82932682c3cce860c8c728b57e2a68e6b04593c1edb206934c340665fb71f29L16
kernel void bar(global int *A, global int *B, int n) { apple(A + 1, B, n); }
That PR changes the argument from A+1 to A+n, where n is non-constant.
This is an interesting distinction from the perspective of a general solution to the type inference problem. (Because the constant multiple of the element size is folded into the constant offset, and eventually has to be disentangled from a member offset at a deeper level of the type hierarchy. The folding loses information that has to be recovered somehow, in general.)
Any general solution will need to have test cases that do add a constant to an decayed array in this way.
from clspv.
At the beginning of this year, we implemented the experimental i8 Canonicalization of GEPs in our project. Initially, this feature simply converted offsets to i8, maintaining the original sizes of loads and stores without imposing a speed penalty. Additionally, we incorporated the fix from Issue #1269 and a custom optimization sequence, particularly running Clspv-opt with lower-addrspacecast and replace-llvm-intrinsics passes prior to Clspv.
The impact has been remarkable! Since these changes, Clspv has achieved unprecedented stability. We've successfully compiled a wide array of algorithms and higher-level language constructs from c++26 and b, including custom vectors, templated tensors, lambda chains, recursive filters, and ray tracing, all with almost no crashes from Clspv.
However, the main branches of Clspv and LLVM have seen significant changes recently, which initially prevented us from merging the latest commits without issues. Fortunately, this situation has improved, and we are now able to compile most of our projects using the current and unmodified version of Clspv.
Despite these advances, there is a new challenge: a great number of load operations have become fragmented into bytes, leading to an average 15% performance decrease, even on modern Nvidia hardware. This suggests a possible larger impact were it not for the mitigating effects of the drivers' compiler optimizations or hardwares' memory latency reduction mechanisms.
As engineers, we should strive to restore the original sizes of these loads and stores to minimize their instruction sequences. This will yield more succinct Spirv, and faster loading times as well.
Given the talent and dedication in this project, I am confident we will address this issue soon.
from clspv.
@BukeBeyond At the moment it feels like all identified regressions have been addressed from ChromeOS point-of-view. Meaning that we do not have regressions on the benchmarks we run, nor on the OpenCL-CTS.
If you have some small examples using bytes access while it's clear it could coalesce, I'll be happy to have a look at them to try to solve any potential regression.
from clspv.
Fortunately, Romaric fixed this #1318 case of load fragmentation right away. He was able to do it within minutes, whereas the closer investigation took me a couple of hours. :-)
It was a specific case of i32 loads, where we use them extensively for Unified Addressing (and Pointer) simulation under Vulkan. Now, our ray tracer is back to full speed, and with a slightly smaller Spirv size to boot. :-)
There remains only a handful of cases of load fragmentation on our end, but they will take a bit more time to investigate and simplify.
It seems the problems of i8 GEPs and the upcoming ptradd have been mostly solved. Yet the great benefits of this evolution are already here.
We program very interactively, hundreds of compiles per hour.
When first experimenting with Clspv, we would get a few crashes per Hour.
After Romaric's bug fixes last August, Clspv's crash rate dropped to a few a Day.
After the Canonicalization of GEPs to i8, the crashes reduced to just a Few a Month!
So things are looking quite good for the future of Clspv!
Happy Saint Patrick's Day! May luck's gentle hand guide ye, and, of far greater worth, may the warmth of good kin and comrades be ever by your side! :-) 🍀🌈 🥇☘️
from clspv.
And here is the latest fragmentation find #1322.
from clspv.
Related Issues (20)
- Loads of i32s are fragmented into 4 bytes HOT 3
- how can I cross compile clspv in x86 for risc-v HOT 1
- Loads and Stores of i32s from offset addresses are fragmented HOT 1
- Loads and Stores from offset addresses are fragmented HOT 5
- Implement compatibility for external LLVM-IL HOT 15
- Clspv Fragments access to global memory by the Smallest access size
- Improve `-cl-mad-enable` support
- MADs can be obfuscated by clspv HOT 4
- Segfault with default initialization
- Issue with C++ references
- Error in LinkBuiltinLibrary HOT 5
- -mfmt=c is not a valid option HOT 3
- Addrspace cast lowering pass creates invalid IR for intrinsic calls
- Bitcast lost due to llvm's opaque pointer HOT 4
- Crash with structures containing generic pointers
- LowerAddrSpaceCast cannot handle multiple llvm.memcpy
- Always generate MemoryAccessAlignedMask for loads/stores
- physical_ptrtoint tests failing
- Invalid OpPtrAccessChain into a Function Storage pointer
- Handle PHINode in LowerAddrSpaceCast better
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from clspv.