Giter Site home page Giter Site logo

Comments (9)

s-perron avatar s-perron commented on September 1, 2024

reduced.trace.gcn.txt
reduced.spvasm.txt

from llpc.

s-perron avatar s-perron commented on September 1, 2024

I need to go back to the drawing board. The first instructions in BB0_4 is an OR, but I treated it like an AND.

from llpc.

linqun avatar linqun commented on September 1, 2024

What will happen if change the sample function from texture to textureLod?
use textureLod can avoid set whole quad mode (s_wqm_b64), and simplify the generated gcn code.

from llpc.

s-perron avatar s-perron commented on September 1, 2024

After looking at it more, I believe the implicit lod is required for to see the problem. The s_wqm_b64 instruction is done at the start of the function. Presumebly because the implicit lod needs all threads in the quad to be active.

However, after the code related to the first OpKill, some threads will get turned off. When the image_sameple_b is reached, not every thread in quads with active threads will be active.

Does this sound right?

from llpc.

s-perron avatar s-perron commented on September 1, 2024

After looking at it further, I believe that the "SI Whole Quad Mode" is not complete.

Non uniform control flow will cause a work group that is in whole quad mode to no longer be in whole quad mode because part of a quad could go one way and the rest another. However, there is no instruction to put it back into whole quad mode.

I can take a look at it next week if nobody is looking at it yet.

Here is another example:

               OpCapability Shader
               OpCapability SampledBuffer
               OpCapability StorageImageExtendedFormats
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %ShadePixel "main" %gl_FragCoord
               OpExecutionMode %ShadePixel OriginUpperLeft
               OpSource HLSL 600
               OpName %ShadePixel "ShadePixel"
               OpDecorate %gl_FragCoord BuiltIn FragCoord
      %float = OpTypeFloat 32
    %v4float = OpTypeVector %float 4
       %bool = OpTypeBool
    %float_0 = OpConstant %float 0
%_ptr_Input_v4float = OpTypePointer Input %v4float
       %void = OpTypeVoid
         %26 = OpTypeFunction %void
%gl_FragCoord = OpVariable %_ptr_Input_v4float Input
 %ShadePixel = OpFunction %void None %26
         %27 = OpLabel
         %28 = OpLoad %v4float %gl_FragCoord
         %29 = OpCompositeExtract %float %28 0
         %30 = OpFOrdLessThan %bool %29 %float_0
               OpSelectionMerge %42 None
               OpBranchConditional %30 %42 %31
         %31 = OpLabel
         %37 = OpDPdx %v4float %28
         %38 = OpCompositeExtract %float %37 0
         %39 = OpFOrdLessThan %bool %38 %float_0
               OpSelectionMerge %40 None
               OpBranchConditional %39 %41 %40
         %41 = OpLabel
               OpKill
         %40 = OpLabel
               OpBranch %42
         %42 = OpLabel
               OpReturn
               OpFunctionEnd

from llpc.

s-perron avatar s-perron commented on September 1, 2024

Here is the code that llpc generates for the new example

Disassembly of section .text:
_amdgpu_ps_main:
        s_mov_b64 s[0:1], exec                                     // 000000000000: BE80017E 
        s_wqm_b64 exec, exec                                       // 000000000004: BEFE077E 
        v_cmp_ngt_f32_e32 vcc, 0, v2                               // 000000000008: 7C960480 
        s_and_saveexec_b64 s[2:3], vcc                             // 00000000000C: BE82206A 
        s_xor_b64 s[2:3], exec, s[2:3]                             // 000000000010: 8882027E 
        s_cbranch_execz BB0_4                                      // 000000000014: BF88000D 

BB0_1:
        v_mov_b32_e32 v0, v2                                       // 000000000018: 7E000302 
        s_nop 1                                                    // 00000000001C: BF800001 
        v_mov_b32_dpp v0, v0  quad_perm:[1,1,1,1] row_mask:0xf bank_mask:0xf bound_ctrl:0// 000000000020: 7E0002FA FF085500 
        s_nop 1                                                    // 000000000028: BF800001 
        v_subrev_f32_dpp v0, v2, v0  quad_perm:[0,0,0,0] row_mask:0xf bank_mask:0xf bound_ctrl:0// 00000000002C: 060000FA FF080002 
        v_mov_b32_e32 v0, v0                                       // 000000000034: 7E000300 
        v_cmp_gt_f32_e32 vcc, 0, v0                                // 000000000038: 7C880080 
        s_and_saveexec_b64 s[4:5], vcc                             // 00000000003C: BE84206A 
        s_xor_b64 s[4:5], exec, s[4:5]                             // 000000000040: 8884047E 

BB0_2:
        s_mov_b64 exec, 0                                          // 000000000044: BEFE0180 

BB0_3:
        s_or_b64 exec, exec, s[4:5]                                // 000000000048: 87FE047E 

BB0_4:
        s_or_b64 exec, exec, s[2:3]                                // 00000000004C: 87FE027E 
        s_and_b64 exec, exec, s[0:1]                               // 000000000050: 86FE007E 
        v_mov_b32_e32 v0, 0                                        // 000000000054: 7E000280 
        exp mrt0 v0, off, off, off done vm                         // 000000000058: C4001801 00000000 
        s_endpgm                                                   // 000000000060: BF810000 

from llpc.

perlfu avatar perlfu commented on September 1, 2024

The "SI Whole Quad Mode" implementation currently only supports Vulkan's model of implicit derivatives. The behaviour of these is only defined for uniform control flow.

From SPIRV 1.5 spec.:

2.20 Code Motion
Texturing instructions in the Fragment Execution Model that rely on an implicit derivative cannot be moved into control flow that is not known to be uniform control flow within each derivative group.

I haven't stared hard at the shader here, but I suspect this limitation might be what you are running into.

from llpc.

perlfu avatar perlfu commented on September 1, 2024

If this is the case, then you probably need to be using the DemoteToHelperInvocationEXT extension, substituting OpDemoteToHelperInvocationEXT for OpKill instructions.
Note that OpDemoteToHelperInvocationEXT is not a terminator, so it's not always a simple substitution as branches must converge.
I've been looking at adding a pass to do this automatically for shaders which rely on this kind of behaviour.

from llpc.

s-perron avatar s-perron commented on September 1, 2024

Thanks for the explanation. That sound exactly like what is going on. This was an hlsl shader being compiled for Vulkan instead of directx. I now think this should be fixed in DXC. It should generate the new instruction instead of an OpKill.

Also the pass you are looking at writing might be better in spirv-tools. It could be used by anybody instead of being specific to AMD hardware. I'll see if it will be needed on our side.

from llpc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.