Comments (9)
reduced.trace.gcn.txt
reduced.spvasm.txt
from llpc.
I need to go back to the drawing board. The first instructions in BB0_4 is an OR, but I treated it like an AND.
from llpc.
What will happen if change the sample function from texture to textureLod?
use textureLod can avoid set whole quad mode (s_wqm_b64), and simplify the generated gcn code.
from llpc.
After looking at it more, I believe the implicit lod is required for to see the problem. The s_wqm_b64
instruction is done at the start of the function. Presumebly because the implicit lod needs all threads in the quad to be active.
However, after the code related to the first OpKill
, some threads will get turned off. When the image_sameple_b is reached, not every thread in quads with active threads will be active.
Does this sound right?
from llpc.
After looking at it further, I believe that the "SI Whole Quad Mode" is not complete.
Non uniform control flow will cause a work group that is in whole quad mode to no longer be in whole quad mode because part of a quad could go one way and the rest another. However, there is no instruction to put it back into whole quad mode.
I can take a look at it next week if nobody is looking at it yet.
Here is another example:
OpCapability Shader
OpCapability SampledBuffer
OpCapability StorageImageExtendedFormats
%1 = OpExtInstImport "GLSL.std.450"
OpMemoryModel Logical GLSL450
OpEntryPoint Fragment %ShadePixel "main" %gl_FragCoord
OpExecutionMode %ShadePixel OriginUpperLeft
OpSource HLSL 600
OpName %ShadePixel "ShadePixel"
OpDecorate %gl_FragCoord BuiltIn FragCoord
%float = OpTypeFloat 32
%v4float = OpTypeVector %float 4
%bool = OpTypeBool
%float_0 = OpConstant %float 0
%_ptr_Input_v4float = OpTypePointer Input %v4float
%void = OpTypeVoid
%26 = OpTypeFunction %void
%gl_FragCoord = OpVariable %_ptr_Input_v4float Input
%ShadePixel = OpFunction %void None %26
%27 = OpLabel
%28 = OpLoad %v4float %gl_FragCoord
%29 = OpCompositeExtract %float %28 0
%30 = OpFOrdLessThan %bool %29 %float_0
OpSelectionMerge %42 None
OpBranchConditional %30 %42 %31
%31 = OpLabel
%37 = OpDPdx %v4float %28
%38 = OpCompositeExtract %float %37 0
%39 = OpFOrdLessThan %bool %38 %float_0
OpSelectionMerge %40 None
OpBranchConditional %39 %41 %40
%41 = OpLabel
OpKill
%40 = OpLabel
OpBranch %42
%42 = OpLabel
OpReturn
OpFunctionEnd
from llpc.
Here is the code that llpc generates for the new example
Disassembly of section .text:
_amdgpu_ps_main:
s_mov_b64 s[0:1], exec // 000000000000: BE80017E
s_wqm_b64 exec, exec // 000000000004: BEFE077E
v_cmp_ngt_f32_e32 vcc, 0, v2 // 000000000008: 7C960480
s_and_saveexec_b64 s[2:3], vcc // 00000000000C: BE82206A
s_xor_b64 s[2:3], exec, s[2:3] // 000000000010: 8882027E
s_cbranch_execz BB0_4 // 000000000014: BF88000D
BB0_1:
v_mov_b32_e32 v0, v2 // 000000000018: 7E000302
s_nop 1 // 00000000001C: BF800001
v_mov_b32_dpp v0, v0 quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0// 000000000020: 7E0002FA FF085500
s_nop 1 // 000000000028: BF800001
v_subrev_f32_dpp v0, v2, v0 quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0// 00000000002C: 060000FA FF080002
v_mov_b32_e32 v0, v0 // 000000000034: 7E000300
v_cmp_gt_f32_e32 vcc, 0, v0 // 000000000038: 7C880080
s_and_saveexec_b64 s[4:5], vcc // 00000000003C: BE84206A
s_xor_b64 s[4:5], exec, s[4:5] // 000000000040: 8884047E
BB0_2:
s_mov_b64 exec, 0 // 000000000044: BEFE0180
BB0_3:
s_or_b64 exec, exec, s[4:5] // 000000000048: 87FE047E
BB0_4:
s_or_b64 exec, exec, s[2:3] // 00000000004C: 87FE027E
s_and_b64 exec, exec, s[0:1] // 000000000050: 86FE007E
v_mov_b32_e32 v0, 0 // 000000000054: 7E000280
exp mrt0 v0, off, off, off done vm // 000000000058: C4001801 00000000
s_endpgm // 000000000060: BF810000
from llpc.
The "SI Whole Quad Mode" implementation currently only supports Vulkan's model of implicit derivatives. The behaviour of these is only defined for uniform control flow.
From SPIRV 1.5 spec.:
2.20 Code Motion
Texturing instructions in the Fragment Execution Model that rely on an implicit derivative cannot be moved into control flow that is not known to be uniform control flow within each derivative group.
I haven't stared hard at the shader here, but I suspect this limitation might be what you are running into.
from llpc.
If this is the case, then you probably need to be using the DemoteToHelperInvocationEXT
extension, substituting OpDemoteToHelperInvocationEXT
for OpKill
instructions.
Note that OpDemoteToHelperInvocationEXT
is not a terminator, so it's not always a simple substitution as branches must converge.
I've been looking at adding a pass to do this automatically for shaders which rely on this kind of behaviour.
from llpc.
Thanks for the explanation. That sound exactly like what is going on. This was an hlsl shader being compiled for Vulkan instead of directx. I now think this should be fixed in DXC. It should generate the new instruction instead of an OpKill.
Also the pass you are looking at writing might be better in spirv-tools. It could be used by anybody instead of being specific to AMD hardware. I'll see if it will be needed on our side.
from llpc.
Related Issues (20)
- IR verification failures HOT 3
- Build options required for check-amdllpc appear to be undocumented HOT 5
- Code traceability strategies HOT 5
- Cleanup the pass pipeline HOT 2
- Make XFB query counter increment even if XFB output are not written to HOT 12
- Discussion about mesh shaders and GFX11 HOT 23
- Substitution of %gfxip in update_llpc_test_checks.py missing
- Redesign the processing of task payload HOT 1
- Warnings in debug support
- Refactor export built-ins via generic outputs
- Add LLVM assumptions for descriptor table size and alignment HOT 14
- Deprecate pre-GFX10 support in LLPC HOT 3
- PatchReadFirstLane breaks ballot code HOT 2
- [Tests] Re-enable test disabled in ffc49b2
- [Tests] Re-enable test disabled in f0436bb.
- [Tests] Re-enable tests disabled in 854b7b8ab1
- [Tests] Re-enable tests disabled 5db5dabbef
- [Tests] Re-enable tests disabled in 4d8954d9ec HOT 1
- [Tests] Re-enable test disabled in 82ef9665ff
- [Tests] Re-enable tests disabled in 0b156027cc
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from llpc.