Comments (2)
So a divergent warp error happens when a fragment shader executes an FSWZ instruction while not all threads in a 2x2 pixel quad are active.
FSWZ allows the 4 threads, each corresponding to one of the pixels in a 2x2 pixel quad, to share data. FSWZ is used to implement DDX, DDY, and TXD in the NV_gpu_program5 specification. Essentially these instructions provide a partial derivative approximation in the screen-space X and Y directions.
When a divergent error happens, the result is either 0 or positive infinity, depending on the default partial value setting (State.ShaderControl.DefaultPartial).
There is an NDV flag on the FSWZ instruction that forces the pixel quad to be treated as non-divergent. But in this case, you’ll be differencing with an adjacent pixel’s register that may or may not hold a meaningful value. So this could result in a bogus partial derivative approximation.
Let me give an example. Say you had some shader execution like (assume all pixel threads are non-diverged entering this code):
R0 = 0;
if (R1 > 3) {
R0 = sin(R1);
R2 = DDX(R0); // implemented by FSWZ
}
If some, but not all, of the threads in the pixel quad execute the DDX and its FSWZ, you’ll get the DIVERGENT error.
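To make the failure mode concrete, here is a rough Python model of one quad running the shader above. It is only a sketch: the lane order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right), the right-minus-left sign convention, the function name `ddx`, and the `DEFAULT_PARTIAL` constant are all assumptions made for illustration, not a description of what the hardware does internally.

```python
import math

# Stand-in for State.ShaderControl.DefaultPartial (0 or +infinity).
DEFAULT_PARTIAL = 0.0

def ddx(regs, active, ndv=False):
    """Model an X derivative across one 2x2 quad (hypothetical helper).

    regs   -- register value held by each of the 4 lanes
    active -- which lanes actually execute the instruction
    ndv    -- if True, treat the quad as non-divergent and difference anyway
    """
    divergent = not all(active)
    out = []
    for lane in range(4):
        if not active[lane]:
            out.append(None)             # lane never ran the instruction
        elif divergent and not ndv:
            out.append(DEFAULT_PARTIAL)  # divergent error: default partial
        else:
            left, right = lane & ~1, lane | 1  # horizontal pair in this row
            out.append(regs[right] - regs[left])
    return out

# R1 per lane; only lanes 1 and 3 satisfy R1 > 3 and enter the if.
r1 = [1.0, 4.0, 2.0, 5.0]
active = [v > 3 for v in r1]
r0 = [math.sin(v) if a else 0.0 for v, a in zip(r1, active)]

divergent_result = ddx(r0, active)        # active lanes get the default partial
ndv_result = ddx(r0, active, ndv=True)    # active lanes difference against R0 = 0
```

With NDV set, the active lanes difference against whatever the inactive neighbour holds in R0. In this example that is the 0 assigned before the if, which is only meaningful because R0 is the same variable on both paths.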
Normally, a DDX would be something like:
FSWZ.1032 R2,R0,R0,PNNPPNNP;
The DDY and TXD are more complicated expansions. DDY is complicated because you have to handle the Y origin.
If R0 is the same logical value (in other words, the same variable) inside and outside the if, it would make sense to use the NDV flag. But if the compiler had done some register renaming, so that R0 inside the if was really a totally different variable from the one outside, it wouldn’t make sense to use the NDV flag.
The other way to fix this would be:
R0 = 0;
if (R1 > 3) {
R0 = sin(R1);
}
R2 = DDX(R0); // implemented by FSWZ
Now the DDX is outside the conditional, so the threads can’t be diverged when it executes. This is probably the better solution.
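A small Python sketch of this hoisted version, for one quad. As before, the lane order (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) and the right-minus-left convention are assumptions, and `run_quad` is a made-up name used only for illustration.

```python
import math

def run_quad(r1):
    """Run the hoisted shader on one quad; r1 holds R1 for each of 4 lanes."""
    # R0 = 0; if (R1 > 3) R0 = sin(R1);  lanes may diverge here, per lane.
    r0 = [math.sin(v) if v > 3 else 0.0 for v in r1]
    # R2 = DDX(R0) runs after the branch has reconverged, so all 4 lanes
    # take part and every lane reads a well-defined R0 from its neighbour.
    r2 = []
    for lane in range(4):
        left, right = lane & ~1, lane | 1  # horizontal pair in this row
        r2.append(r0[right] - r0[left])
    return r0, r2

r0, r2 = run_quad([1.0, 4.0, 2.0, 5.0])
```

Every lane now gets a derivative, including the ones that skipped the if, and no lane depends on the default partial value or on NDV.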
You can execute FSWZ in non-fragment shaders, but the NDV flag is required then.
In conclusion, the naïve fix is to just slap the NDV flag on every FSWZ, as this will suppress the error. This will mean the partial derivatives won’t be trustworthy, as you’ll be differencing with whatever value happens to be in the adjacent thread’s register.
The better fix is making sure the compiler moves FSWZ instructions outside possibly divergent code (case 2 above) or arranging the code so the FSWZ can be legitimately used with the NDV flag (case 1 above).
Does this help?
from nouveau.
Ultimately there's only so much that the compiler can do -- there can be loops/etc which make it impossible to move it out of the divergent section. I'm definitely tending towards just slapping NDV everywhere and not worrying about it (as this situation is disallowed by GLSL).
Thanks for the detailed analysis!
from nouveau.