
glsl's Introduction

GLSL

GLSL and ESSL are Khronos high-level shading languages.

Khronos Registries are available for

Extension specifications in this repository are listed below.

This Project Contains

This GLSL shading language project contains the following for the GLSL and ESSL Khronos shading languages:

  • issue tracking for the core specifications
  • issue tracking for shading language extensions (however, vendor-specific extension issues should be discussed with the vendor)
  • new shading language extension proposals and discussions
  • shading language extensions that do not live in the Khronos registries for OpenGL or OpenGL ES (e.g., those created to enable access to Vulkan features)

Note this family of languages is used by (at least) the following APIs:

  • OpenGL: consumes GLSL and ESSL
  • OpenGL ES: consumes ESSL
  • Vulkan: makes use of GLSL and ESSL, via SPIR-V

While OpenGL and OpenGL ES normatively accept GLSL and ESSL as input into their APIs, this is not true of core Vulkan, which normatively accepts SPIR-V but does not normatively consume a high-level shading language.

Extension Specifications in this Repository

glsl's People

Contributors

aaronhaganamd, abhilash1910, alan-baker, alegal-arm, alelenv, cmarcelo, dadschoorse, dgkoch, ewerness-nv, gfxstrand, gnl21, jackohound, jeffbolznv, johnkslang, kpet, m-kim, mbechard, mchock-nv, nvpbrown, oddhack, pdaniell-nv, pknowlesnv, pow2clk, sparmarnv, stu-s, tobski, tsuoranta, tylernowicki, wooyoungqcom, wyvernwang

glsl's Issues

Mismatch in the subgroup spec and subgroup tutorial about subgroupBroadcast

After comparing the Vulkan subgroup tutorial with this subgroup spec, I found that the explanation of subgroupBroadcast in the Vulkan subgroup tutorial contradicts the explanation in the subgroup spec.

In the Vulkan subgroup tutorial (https://www.khronos.org/blog/vulkan-subgroup-tutorial), it says:

T subgroupBroadcast(T value, uint id) broadcasts the value whose gl_SubgroupInvocationID == id
to all other invocations (id must be a compile time constant).

So in this tutorial, subgroupBroadcast is a WRITE operation.

However, the subgroup extension spec (https://github.com/KhronosGroup/GLSL/blob/master/extensions/khr/GL_KHR_shader_subgroup.txt) says:

The function subgroupBroadcast() returns the <value> from the invocation
whose <gl_SubgroupInvocationID> is equal to <id>.  <id> must be an integral
constant expression.  If the <id> is an inactive invocation or is
greater than or equal to <gl_SubgroupSize>, an undefined value is returned. 

This means that in the spec, it is a READ operation.

Did I miss anything?

[Edited to show the full quote, much of which was disappearing due to Markdown.]
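One possible reading that reconciles the two texts (an interpretation, not an official answer): the tutorial describes the effect, where every invocation ends up with lane id's value, while the spec describes how each invocation gets there, by reading that lane. The C sketch below is a hypothetical CPU-side model of a single subgroup; `subgroup_broadcast`, `values`, `active`, and `defined` are illustrative names, not part of any API. An inactive or out-of-range `id` is flagged rather than given a value, because the spec leaves the result undefined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one subgroup: lane i holds values[i] and is active
 * if active[i] is true. Every active lane READS the value held by lane `id`;
 * no lane writes into another lane's storage. */
void subgroup_broadcast(const int *values, const bool *active, size_t size,
                        size_t id, int *result, bool *defined)
{
    assert(values != NULL && active != NULL);
    assert(result != NULL && defined != NULL);
    /* Inactive or out-of-range id: the spec says the value is undefined,
     * so we mark it instead of inventing a number. */
    bool ok = id < size && active[id];
    for (size_t lane = 0; lane < size; ++lane) {
        if (!active[lane])
            continue;                 /* inactive lanes receive nothing */
        result[lane] = ok ? values[id] : 0;
        defined[lane] = ok;
    }
}
```

Note that `values` is never written: the "broadcast" is implemented entirely as reads of `values[id]`.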

Is it valid to access UBOs from rmiss/intersection shaders

I am observing the pipeline blocking (hanging) when I try to access a UBO from the miss shader, e.g. to return a "skybox" color configured from a uniform.

I skimmed the Vulkan spec and the extension addendum, and there is no clear wording saying that raygen shaders may access uniforms while miss shaders may not.

So is this an NVIDIA bug or expected behavior?

Missing interaction with ARB_bindless_texture and ARB_gpu_shader_int64

The ARB_bindless_texture extension predates ARB_gpu_shader_int64 by a couple of years. At the time, 64-bit integer types were only provided by NVIDIA vendor extensions. As a result, ARB_bindless_texture says:

(19) How does ARB_bindless_texture differ from NV_bindless_texture?

  RESOLVED:

  - The constructors to convert between sampler and integer types use uvec2
    rather than uint64_t to avoid a dependency on the 64-bit integer types
    in the shader from NV_gpu_shader5.

However, these types were later added by ARB_gpu_shader_int64, but neither extension mentions any interaction with the other. Specifically, I believe that ARB_gpu_shader_int64 should say that uint64_t(sampler) and sampler(uint64_t) constructors exist if ARB_bindless_texture is also supported.

As far as I'm aware, only NVIDIA currently supports both these extensions, and I believe they support these constructors. Mesa currently supports ARB_gpu_shader_int64, and ARB_bindless_texture is on the way.
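If such constructors were added, a reasonable expectation (an assumption, since neither extension specifies it) is that they agree with the existing uvec2 constructors when the two components are combined in the order that `packUint2x32()` from ARB_gpu_shader_int64 uses, with the first component holding the 32 least significant bits. The hypothetical C helpers below model that packing on the CPU; they are not a driver or GLSL API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models GLSL packUint2x32(): component 0 supplies the 32 least
 * significant bits, component 1 the 32 most significant bits. */
uint64_t pack_uint2x32(const uint32_t v[2])
{
    assert(v != NULL);
    return (uint64_t)v[0] | ((uint64_t)v[1] << 32);
}

/* Models unpackUint2x32(): the inverse split into two components. */
void unpack_uint2x32(uint64_t x, uint32_t v[2])
{
    assert(v != NULL);
    v[0] = (uint32_t)(x & 0xFFFFFFFFu);
    v[1] = (uint32_t)(x >> 32);
}
```

Under that assumption, `uint64_t(sampler)` would equal `packUint2x32(uvec2(sampler))`, so the two constructor families could not disagree.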

fmin / fmax don't match GLSL.std.460 behavior

The GLSL 4.60 (and also earlier) spec says for min:

Returns y if y < x; otherwise it returns x.

The GLSL.std.460 spec says for FMin:

Result is y if x < y; otherwise result is x. Which operand is the result is undefined if one of the operands is a NaN.

It seems that when GLSL.std.460 adopted this sensible behavior, the GLSL 4.60 (and possibly earlier) spec should have been updated to match. I suspect that most drivers and hardware have been implementing the GLSL.std.460 behavior for years.
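The two definitions only differ when an operand is NaN. The C sketch below transcribes the GLSL 4.60 wording for min() and max() literally (the function names are hypothetical). Because every ordered comparison involving NaN is false, the literal GLSL definition returns x whenever either operand is NaN, so whether NaN propagates depends on operand order. By contrast, C's `fminf()`/`fmaxf()` follow IEEE 754 minNum/maxNum and return the non-NaN operand, which is one of the outcomes that "undefined if one of the operands is a NaN" permits.

```c
#include <assert.h>
#include <math.h>

/* GLSL 4.60 min(): "Returns y if y < x; otherwise it returns x." */
float glsl_min(float x, float y)
{
    return (y < x) ? y : x;
}

/* GLSL 4.60 max(): "Returns y if x < y; otherwise it returns x." */
float glsl_max(float x, float y)
{
    return (x < y) ? y : x;
}

/* Operand order decides whether NaN propagates. */
void nan_order_demo(void)
{
    assert(isnan(glsl_min(NAN, 1.0f)));   /* x is NaN: NaN returned */
    assert(glsl_min(1.0f, NAN) == 1.0f);  /* y is NaN: x returned   */
}
```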

GL_KHR_shader_subgroup_ballot typo

(Retracted and closing upon rereading...)

  • Bits are packed such that the first invocation is represented in bit
    0 of the first vector component, and the last (up to
    <gl_SubgroupSize>) is the highest bit number in the last vector
    component needed to represent all bits for the total number of
    subgroup invocations.
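The packing rule quoted above amounts to: invocation i lives in component i / 32, at bit i % 32. A minimal C model, assuming the ballot uvec4 is represented as four 32-bit words (the `Ballot` type and helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A subgroupBallot() result (uvec4) modeled as four 32-bit words.
 * Invocation 0 is bit 0 of word 0; invocation 32 is bit 0 of word 1. */
typedef struct { uint32_t w[4]; } Ballot;

void ballot_set(Ballot *b, uint32_t invocation)
{
    assert(invocation < 128);  /* a uvec4 holds at most 128 bits */
    b->w[invocation / 32] |= 1u << (invocation % 32);
}

/* The same lookup subgroupBallotBitExtract() is described as doing. */
bool ballot_test(const Ballot *b, uint32_t invocation)
{
    assert(invocation < 128);
    return (b->w[invocation / 32] >> (invocation % 32)) & 1u;
}
```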

Clarification for layout rules in section 7.6.2.2 of the OpenGL 4.5 spec.

Reading the layout rules, specifically rule 6:

  • If the member is an array of S column-major matrices with C columns and R rows, the matrix is stored identically to a row of S x C column vectors with R components each, according to rule (4).

What does "row" mean in this rule (and similarly in rule 8)? Should it instead say "array", to signal that the array rules apply (so that, effectively, a mat2x2[] becomes an array of arrays of vec2s)?
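Under the "array" reading, an array of S column-major matrices with C columns and R rows is laid out exactly like an array of S x C column vectors with R components, so rule (4) decides the stride. A minimal C sketch of that reading, for std140 and float components only (the function name is hypothetical, and std430's different rounding is deliberately not modeled):

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset, from the start of the member, of column `col` of element
 * `elem` in an array of column-major float matrices with `columns` columns
 * and `rows` rows, treating the whole member as (elements * columns)
 * vectors of `rows` floats. Rule (4) rounds each vector's array stride up
 * to the base alignment of a vec4, i.e. 16 bytes for vec2, vec3 and vec4. */
size_t std140_matrix_column_offset(size_t elem, size_t col,
                                   size_t columns, size_t rows)
{
    assert(col < columns && rows >= 2 && rows <= 4);
    const size_t stride = 16;
    return (elem * columns + col) * stride;
}
```

For example, in a std140 `mat2 m[3]`, column 1 of `m[1]` would start (1 * 2 + 1) * 16 = 48 bytes into the member.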

Typo: "uint64_t, uint64_t" ?

The same conversion type is repeated.

| uint32_t | uint64_t, uint64_t |

 The following table shows allowed integral conversions:
      -------------------------------------------------------------------------
      | Type of    |     Can be implicitly converted to                       |
      | expression |                                                          |
      -------------------------------------------------------------------------
      | ...        |                                                          |
      | uint32_t   | uint64_t, uint64_t                                       |
      | ...        |                                                          |
      -------------------------------------------------------------------------

should samplerND(samplerND, sampler) be added?

SPIR-V allows extracting the OpTypeImage from an OpTypeSampledImage via OpImage, which in turn allows recombining it into a new OpTypeSampledImage with a different OpTypeSampler.

Therefore, shouldn't the language also allow samplerND constructors from existing combined samplers, i.e.
samplerND(samplerND, sampler)?

Compute: When can gl_GlobalInvocationID + gl_LocalInvocationIndex be used?

gl_GlobalInvocationID and gl_LocalInvocationIndex are both derived from, among other things, gl_WorkGroupSize. The spec says:

It is a compile-time error to use gl_WorkGroupSize in a shader that does not declare a fixed local group size, or before that shader has declared a fixed local group size, using local_size_x, local_size_y, and local_size_z.

So is it an error to use the derived values before layout(local_size_*) has been declared in that shader? As a corollary to all this, is there some variant of

[shader 1]
#extension GL_ARB_compute_variable_group_size : enable
layout(local_size_variable) in;

[shader 2]
void main() { use(gl_GlobalInvocationID); }

that is allowed when these are linked together into a single program?
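The dependency is visible in how the spec defines the two values. The C sketch below restates those definitions on the CPU (the `UVec3` type and function names are hypothetical); both need the work group size, which with `local_size_variable` is only known at dispatch time.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t x, y, z; } UVec3;

/* gl_GlobalInvocationID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID */
UVec3 global_invocation_id(UVec3 group_id, UVec3 group_size, UVec3 local_id)
{
    assert(local_id.x < group_size.x && local_id.y < group_size.y &&
           local_id.z < group_size.z);
    UVec3 g = { group_id.x * group_size.x + local_id.x,
                group_id.y * group_size.y + local_id.y,
                group_id.z * group_size.z + local_id.z };
    return g;
}

/* gl_LocalInvocationIndex = z * size.x * size.y + y * size.x + x */
uint32_t local_invocation_index(UVec3 group_size, UVec3 local_id)
{
    return local_id.z * group_size.x * group_size.y +
           local_id.y * group_size.x + local_id.x;
}
```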

Fragment shader invocation generation for uncovered neighbors in subgroup quads

Hello again Khronos!

I was intrigued by this section of extensions/khr/GL_KHR_shader_subgroup.txt (lines 606-611):

    If a primitive covers a fragment at (x, y), its fragment shader
    invocation will be in a quad with fragment shader invocations
    corresponding to the three neighboring pixels at (x + 1, y), (x, y + 1),
    and (x + 1, y + 1).  These four invocations are arranged in a 2x2 grid,
    that make up the quad.  If the neighbors of a fragment are not covered
    by the primitive, fragment shader invocations will still be generated.

What isn't clear to me from the existing text is the nature of the fragment shader invocations generated for uncovered neighbors. More specifically, I'd like to know how to distinguish quads or, barring that, fragment shader invocations that include such uncovered neighbors from those that don't.

A "knowledgeable source" (okay, fine, it was @nsubtil) suggested it might be done with subgroupBallot(true) or gl_HelperInvocation, but since the text says neither that these invocations are inactive nor that they are helpers, this is not at all clear.
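To make the question concrete, here is a hypothetical C model of one quad. It assumes quads are aligned to even (x, y) coordinates and that lanes are numbered in the neighbor order of the quoted text; `quad_lane` and `uncovered_lanes` are illustrative names. The lanes returned by `uncovered_lanes` are exactly the invocations the issue asks about; whether a shader can observe them through gl_HelperInvocation or subgroupBallot(true) is the open question, and the model does not answer it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed numbering: 0 = (x, y), 1 = (x+1, y), 2 = (x, y+1), 3 = (x+1, y+1),
 * with (x, y) the even-aligned top-left pixel of the quad. */
uint32_t quad_lane(uint32_t x, uint32_t y)
{
    return (x & 1u) | ((y & 1u) << 1);
}

/* Given a 4-bit mask of lanes whose pixels the primitive covers, return the
 * lanes that are launched only because the quad must be complete. */
uint32_t uncovered_lanes(uint32_t covered_mask)
{
    /* A quad is only launched if at least one of its pixels is covered. */
    assert(covered_mask != 0 && covered_mask <= 0xFu);
    return ~covered_mask & 0xFu;
}
```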

GL_EXT_shader_8bit_storage doesn't have an extension file

You can use GL_EXT_shader_8bit_storage in the latest glslang, but this repository doesn't have a spec for it. I'm guessing it's basically the same as GL_EXT_shader_16bit_storage, but it would be good to have a reference text regardless.

Can gl_in be redeclared with an implicit array size in tessellation shaders?

From KhronosGroup/glslang#1300:

  • Generally, an array declared as explicitly-sized cannot be redeclared without a size
  • Tessellation evaluation and control shaders pre-declare gl_in as gl_in[gl_MaxPatchVertices]
  • A later example of redeclaring gl_in uses gl_in[], which seems okay for geometry shading, but not for tessellation

Now, the purpose of redeclaring a built-in block is to subset its members, not establish array size. So, it could be tolerated to redeclare gl_in[] even if it was pre-declared with a specific size.

Or, we could be firm that it needs a size, and must be redeclared as gl_in[gl_MaxPatchVertices].

One question is how drivers have interpreted the specification, since I think it could be clarified in either direction. We would then make sure glslang follows suit.

Another question is whether another part of the specification already makes this more clear.

(Currently, glslang is narrow, requiring the explicit redeclaration, which seems quite intentional, as the check is specific to built-in block redeclarations.)

Compute: Control barrier and shared memory within the local work group

The GLSL spec isn't very clear about whether a control barrier is all that is needed to synchronize access to shared memory in compute shaders. There are two options:

  • barrier() synchronizes execution order and makes writes to shared memory visible to other invocations within the local work group, or
  • barrier() synchronizes execution only, and memoryBarrierShared() is additionally required to make memory writes visible to other invocations within the local work group.

There is language in the GLSL spec (section 8.16 Shader Invocation Control Functions) that says:

The function barrier() provides a partially defined order of execution between shader invocations. This ensures that values written by one invocation prior to a given static instance of barrier() can be safely
read by other invocations after their call to the same static instance barrier().

The above quotation suggests that barrier() is sufficient to synchronize access to shared memory in compute shaders. It's worth mentioning that this language was originally introduced in ARB_tessellation_shader.

On the other hand, the SPIR-V spec explicitly states that control barriers make writes visible only for tessellation control shaders:

When used with the TessellationControl execution model, it also implicitly synchronizes the Output Storage Class:
Writes to Output variables performed by any invocation executed prior to a OpControlBarrier will be visible to any other invocation after return from that OpControlBarrier.

So, are memoryBarrierShared() barriers required together with control barriers in compute shaders in order to make memory writes visible to other invocations within the local work group? Is SPIR-V different than GLSL?

No matching requirement for rayPayloadInNV

I suppose an explicit requirement is missing, although it is implied, for example by what @ewerness-nv mentions in KhronosGroup/glslang#1638 (comment):

A closest hit shader with no functional rayPayloadInNV is allowed, but the trace call has to have one identified, so because of the matching rules, the closest hit shader has to have one, even if it's just an empty structure, so that's probably the best thing to warn on.

I can't find any trace of matching rules for rayPayloadNV / rayPayloadInNV:

4.3.X rayPayloadInNV Variables
These are allowed only in any-hit, closest-hit, and miss shaders only.
It is a compile-time error to use them in any other stage.
They can be read to and written in any-hit, closest-hit, and miss shaders
There can be only a single variable at global scope with this qualifier in
stages where this qualifier is permitted. If multiple variables are present
results of accessing these variables are undefined.
It is a compile-time error to declare unsized arrays of this type.

I suppose it should contain something similar to the rule for callableDataInNV:

unsized arrays of this type. Type of this variable must match the type of callableDataNV
as passed by parent shader invoking this callable otherwise results are undefined.

Maybe:

Type of this variable must match the type of rayPayloadNV as passed
by an upstream shader otherwise results are undefined.

GLSL 4.60: Typo - static use *of* gl_FragCoord

In section 4.4.1.3, we can read:

If gl_FragCoord is redeclared in any fragment shader in a program, it must be redeclared in all the fragment shaders in that program that have a static use [of] gl_FragCoord.

"of" is missing from the spec.

Possible error in GL_KHR_shader_subgroup.txt : Useless subgroupMemoryBarrier functions?

The function subgroupBarrier performs both an execution barrier and a full memory barrier:

The function subgroupBarrier() enforces that all active invocations within a
subgroup must execute this function before any are allowed to continue their
execution and the results of any memory stores performed using coherent
variables performed prior to the call will be visible to any future
coherent access to the same memory performed by any other shader invocation
within the same subgroup.

I wonder whether the subgroupMemoryBarrier functions are useful at all, since they do not perform an execution barrier:

The function subgroupMemoryBarrier() enforces the ordering of all memory
transactions issued within a single shader invocation, as viewed by other
invocations in the same subgroup.

However, the spec says that the invocations within a subgroup run in parallel:

A subgroup is a set of invocations exposed as running concurrently with
the current shader invocation. The number of invocations within a
subgroup (the size of the subgroup) is a fixed property of the device.

Since they are running in parallel, is an execution barrier useful at all? If it is not needed, the subgroupMemoryBarrier functions are useful, but the subgroupBarrier function becomes useless.

On the other hand, if we do need an execution barrier within a subgroup, I think other operations such as shuffles would need to be synchronized as well, although that is probably implicit, as with __shfl_down_sync in CUDA.

So,

  • Are the subgroupMemoryBarrier functions useful and subgroupBarrier useless?
  • Is subgroupBarrier useful and are the subgroupMemoryBarrier functions useless?
  • Are the specs going to change (as happened with the barrier and memoryBarrier functions for a local work group)?
  • Or is my understanding simply wrong?

texelFetch cannot be used with texture2D

SPIR-V performs texel fetches (OpImageFetch) without a sampler, and so does HLSL, but GLSL does not: there you need a combined image sampler (for legacy reasons, I assume). This causes interoperability issues between HLSL and GLSL.

A bug was filed against SPIRV-Cross that cannot be solved properly without a fix to Vulkan GLSL: KhronosGroup/SPIRV-Cross#424

Allow passing readonly SSBO members as "in" parameters

In KhronosGroup/glslang#1870, glslang was changed to reject passing a readonly buffer member to a function if the formal parameter is not qualified as readonly. The spec language is:

Variables qualified with coherent, volatile, readonly, or writeonly may not be passed to functions whose formal parameters lack such qualifiers. (See section 6.1 “Function Definitions” for more detail on function calling.)

This made sense when image_load_store was added to GLSL (before SSBOs), since any image variable is a reference to memory. It was also a convenient way to say "you can't pass a readonly image to imageStore()." Then when SSBOs came along, they were added to this section by just saying "it's the same as with images":

The memory qualifiers coherent, volatile, restrict, readonly, and writeonly may be used in the declaration of buffer variables (i.e., members of shader storage blocks). When a buffer variable is declared with a memory qualifier, the behavior specified for memory accesses involving image variables described above applies identically to memory accesses involving that buffer variable.

But consider the following example using a readonly buffer:

layout(set = 0, binding = 0, std430) readonly restrict buffer A {
    float b;
} a;

void main()
{
   round(a.b);
}

According to this spec language, it is illegal to pass a.b to round() because a.b is readonly (inherited from its parent) and round's formal parameter is not qualified as readonly. However, round's formal parameter is implicitly in, and IMO it should be valid to pass a readonly buffer member as an in parameter. in is defined as "The keyword in is used as a qualifier to denote a parameter is to be copied in, but not copied out."

My mental model for this is something like: "Reading a readonly buffer member returns an rvalue. Values passed to an in parameter must be rvalues. Therefore it is valid to pass a readonly buffer member to an in parameter."

I propose we relax the GLSL spec to allow passing readonly buffer members to formal parameters qualified with in. To be consistent, it probably also makes sense to allow passing writeonly buffer members to formal parameters qualified with out.
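C offers a close analogy for this rvalue model (only an analogy, since C has no readonly buffer memory): reading a const-qualified object performs lvalue conversion, which drops the qualifier (C11 6.3.2.1p2), so the value can be passed to a by-value parameter that the callee then modifies locally. `BlockA`, `nudge`, and `read_member` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a readonly buffer block: members are reached through a
 * pointer-to-const, so writing through `buf` would not compile. */
typedef struct { float b; } BlockA;

/* By-value parameter: the callee gets its own modifiable copy, the same
 * "copied in, but not copied out" semantics as a GLSL `in` parameter. */
float nudge(float v)
{
    v += 0.5f;  /* modifies only the local copy */
    return v;
}

float read_member(const BlockA *buf)
{
    assert(buf != NULL);
    /* Reading buf->b yields a plain, unqualified float value, so passing it
     * to a non-const by-value parameter is valid C. */
    return nudge(buf->b);
}
```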

No available machine processable glsl specification

This comes from a report in Khronos Bugzilla bug 1283:
https://www.khronos.org/bugzilla/show_bug.cgi?id=1283

The specification for the GLSL language is in a good format for human reading but can be problematic for machine processing. If the GLSL specification were also published in an XML-like format (https://cvs.khronos.org/svn/repos/ogl/trunk/doc/registry/public/api/gl.xml) similar to the OpenGL specification, it could be used more effectively by applications that benefit from knowing the GLSL feature set of each version, such as interactive editors.

GL_EXT_subgroupuniform_qualifier and flow control rejoining

cc @sheredom

Consider this example:

  subgroupuniformEXT int x;
  if ((gl_InvocationID & 1) != 0) {
    x = 0;
  } else {
    x = 1;
  }
  use(x);  // placeholder for any use of x

Both assignments can be correctly decorated as Uniform in the SPIR-V, but when the if and else blocks rejoin there would be an OpPhi in the SPIR-V that is not Uniform. It seems like the GLSL extension spec doesn't clarify what happens in this case. Is it an error? Undefined behavior? Does the frontend compiler stop propagating Uniform when flow control rejoins?
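Why the merge is the problematic point can be shown with a per-lane C model (hypothetical, not glslang's analysis; the function name is illustrative): inside each branch x is assigned a constant, but after the branches rejoin, the value an invocation sees depends on which branch it took.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Each lane runs the if/else with its own gl_InvocationID; we then check
 * whether x holds the same value in every lane after the branches rejoin,
 * i.e. whether the value produced by the OpPhi is actually uniform. */
bool x_is_uniform_after_merge(const int *invocation_id, size_t lanes)
{
    assert(invocation_id != NULL && lanes > 0);
    int first = ((invocation_id[0] & 1) != 0) ? 0 : 1;
    for (size_t i = 1; i < lanes; ++i) {
        int x = ((invocation_id[i] & 1) != 0) ? 0 : 1;
        if (x != first)
            return false;
    }
    return true;
}
```

With mixed odd and even invocation IDs the merged x differs across lanes, even though each assignment on its own stored a uniform constant.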

ESSL specification appendix was wrong. (version before gpu_shader5)

This concerns the Khronos OpenGL ES Shading Language 3.10 specification.

I checked this revision of the specification and earlier ones:

--
Language Version: 3.10
Document Revision: 4
29 January 2016
Editor: Robert J. Simpson, Qualcomm

I found the following confusing text.
The last RESOLUTION paragraph clearly says that dynamic indexing of sampler arrays is now prohibited,
but another paragraph marks dynamic indexing as "allowed". These statements seem contradictory.
What does "allowed" mean here?
As we know, before the gpu_shader5 extension, ESSL did not support dynamic indexing of samplers.
This paragraph can only mislead readers.
Its main purpose is to discuss samplers as l-values, not indexing, so the part about indexing should be removed.

13.29 Samplers
Should samplers be allowed as l-values? The specification already allows an equivalent behavior:
Current specification:

 uniform sampler2D sampler[8];
 int index = f(...);
 vec4 tex = texture(sampler[index], xy); // allowed

Using assignment of sampler types:

 uniform sampler2D s;
 s = g(...);
 vec4 tex = texture(s, xy); // not allowed

RESOLUTION: Dynamic indexing of sampler arrays is now prohibited by the specification. Restrict
indexing of sampler arrays to constant integral expressions.

Improve docs for gl_ClipDistance: Interface blocks can't be redeclared in GLSL < 4.10

Redeclaring blocks needs GL_ARB_separate_shader_objects or #version 410. When redeclaring gl_ClipDistance[] with a specific array size you have to use:
out float gl_ClipDistance[6];
The wiki page does not state a specific version requirement, and neither do the reference page or the built-ins wiki page. The GLSL 3.30 language spec says that redeclaring built-in variables is forbidden unless specifically allowed for certain variables, so that probably covers it, but it is easy to miss. Some implementations seem to complain; some don't.
See also this issue.

How does NV_compute_shader_derivatives relate with variable workgroup sizes?

The text makes clear that the work group size must be validated at compile time, which would exclude variable-size work groups.

If that's not the case, what should happen when the size is set to something that doesn't follow the rules for the given arrangement?

Adding a clarification about this would be nice. (The same comment applies to the corresponding SPIR-V extension spec.)
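For reference, a sketch of the check an implementation would have to perform at dispatch time if variable group sizes were allowed. The rules encoded here are a paraphrase of NV_compute_shader_derivatives (quads: x and y local sizes each a multiple of 2; linear: total invocation count a multiple of 4) and should be checked against the extension text; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { DERIVATIVE_GROUP_QUADS, DERIVATIVE_GROUP_LINEAR } DerivativeGroup;

/* With a fixed local size a compiler can run this once; with a variable
 * size, the same check could only happen when the size becomes known. */
bool derivative_group_size_ok(DerivativeGroup g,
                              uint32_t sx, uint32_t sy, uint32_t sz)
{
    assert(sx > 0 && sy > 0 && sz > 0);
    if (g == DERIVATIVE_GROUP_QUADS)
        return sx % 2 == 0 && sy % 2 == 0;
    return ((uint64_t)sx * sy * sz) % 4 == 0;
}
```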

[GL_NV_ray_tracing] Is shaderRecordNV buffer readonly or writable?

Per Vulkan spec:
The shader binding tables to use in a ray tracing query are passed to vkCmdTraceRaysNV. Shader binding tables are read-only in shaders that are executing on the ray tracing pipeline.

But in the GL_NV_ray_tracing spec, the shaderRecordNV layout qualifier can only be applied to buffer blocks, and such a block is writable in the glslang test spv.RayGenShader.rgen.

Could you help clarify which behavior is expected? Thanks!

Precise calculation and branch conditions

Given

#version 450

layout (location = 0) in float x;
layout (location = 1) in float y;
layout (location = 2) in float z;

precise gl_Position;

void main(void)
{
    gl_Position.xyzw = vec4(0.0);
    if ((x * y + z) > 0.0) {
        gl_Position.x = 1.0;
    } else {
        gl_Position.x = 2.0;
    }
}

Does the precise on gl_Position apply to the calculation x * y + z?
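Whether x * y + z is evaluated with one rounding (a fused multiply-add) or two can flip this branch. The C sketch below uses inputs chosen for illustration: the exact product 1 + 2^-11 + 2^-24 is a tie when rounded to float, so the unfused result is exactly 0.0 while the fused result is 2^-24. `mul_then_add`, `fused_model`, and `precise_branch_example` are hypothetical names; the fused path is modeled in double precision, which is exact for these inputs, to avoid depending on a hardware FMA.

```c
#include <assert.h>

/* Two roundings: x*y is rounded to float before z is added. `volatile`
 * stops a C compiler from contracting the expression into an FMA. */
float mul_then_add(float x, float y, float z)
{
    volatile float product = x * y;
    return product + z;
}

/* One rounding: a float*float product fits exactly in a double, and for the
 * inputs below the add is exact too, so this matches a true fmaf(). */
float fused_model(float x, float y, float z)
{
    return (float)((double)x * (double)y + (double)z);
}

void precise_branch_example(void)
{
    float x = 1.0f + 0x1p-12f, y = 1.0f + 0x1p-12f;
    float z = -(1.0f + 0x1p-11f);
    /* Exact x*y = 1 + 2^-11 + 2^-24, which rounds (ties-to-even) to 1 + 2^-11. */
    assert(mul_then_add(x, y, z) == 0.0f);     /* "> 0.0" is false */
    assert(fused_model(x, y, z) == 0x1p-24f);  /* "> 0.0" is true  */
}
```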

Want pass-by-reference function-call syntax to avoid implication of doing large copies.

Here's my minimal test code:

#version 450
#extension GL_ARB_separate_shader_objects : enable

layout(location = 0) in vec2 fragScreenCoord;
layout(location = 0) out vec4 resColor;

#define MaxObjectsCount 20
struct Object
{
  uint bla1;
  uint bla2;
};

struct Objects
{
  Object data[MaxObjectsCount];
};

Objects objects;

void CompareExchange(/*inout Objects objects, */uint i, uint j)
{
  if(objects.data[i].bla1 > objects.data[j].bla1)
  {
    Object tmp = objects.data[i];
    objects.data[i] = objects.data[j];
    objects.data[j] = tmp;
  }
}
void main()
{
  resColor = vec4(1.0f, 0.5f, 0.0f, 1.0f);
  for(int i = 0; i < MaxObjectsCount; i++)
  {
    objects.data[i].bla1 = uint(gl_FragCoord.x) - i;
    objects.data[i].bla2 = uint(gl_FragCoord.y) + i;
  }
  for(int i = 0; i < MaxObjectsCount; i++)
  {
    for(int j = 0; j < MaxObjectsCount - i - 1; j++)
    {
      CompareExchange(/*objects, */j, j + 1);
    }
  }
  resColor.r += objects.data[0].bla1 * 1e-3f;
}

This code runs in a fragment shader and takes approximately 30 ms on my GTX 1050. Uncommenting the function argument makes it run approximately 20(!) times slower, at about 650 ms. The results are the same in an OpenGL application that uses glCompileShader() and in Vulkan, where glslang compiles this GLSL into SPIR-V.

So my questions are: why is there no syntax to force function inlining, so that this cannot happen? And why is there no syntax for true pass-by-reference, instead of the copy-in/copy-out semantics of "inout"?
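A back-of-the-envelope count of what literal copy-in/copy-out semantics would ask for in this shader, sketched in C with the shader's own struct layout (helper names are hypothetical). It is an upper bound on copies that a compiler is allowed to elide, not a measurement, and it does not by itself explain the timings reported above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJECTS_COUNT 20

typedef struct { uint32_t bla1, bla2; } Object;
typedef struct { Object data[MAX_OBJECTS_COUNT]; } Objects;

/* Number of CompareExchange calls made by the nested loops in main(). */
size_t compare_exchange_calls(size_t n)
{
    assert(n <= MAX_OBJECTS_COUNT);
    size_t calls = 0;
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j + 1 < n - i; ++j)
            ++calls;
    return calls;
}

/* Bytes moved per fragment if every call copied `Objects` in and back out. */
size_t inout_bytes_copied(void)
{
    return compare_exchange_calls(MAX_OBJECTS_COUNT) * 2 * sizeof(Objects);
}
```

With 20 elements the loops make 190 calls, and each would copy the 160-byte `Objects` struct in and out: 60,800 bytes per fragment.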

Can shared variables be declared with an initializer?

GLSL ES 3.10 says "Variables declared as shared may not have initializers and their contents are undefined at the beginning of shader execution." Does the wording 'may not' mean 'must not'? In my observation, initializers are actually accepted by many GL drivers, at least on Linux (Intel/AMD/NVIDIA) and Windows (NVIDIA). It would be good to have this clarified in the spec. Thanks a lot!

GLSL 4.60: Missing mention of application of storage block names.

In section 4.3.9, Interface Blocks, the specification says:

For uniform blocks, the application uses the block name to identify the block.

This should be for both uniform and storage blocks.

The same goes for:

Matched uniform block names (but not input or output block names) must also either all be lacking an instance name or all having an instance name, putting their members at the same scoping level.

Why is `gl_SubgroupSize` not a constant?

Why is gl_SubgroupSize not a constant?
Per the Vulkan spec, it is a physical-device property, so it seems it could be a built-in constant.
Currently it seems to be listed under built-in variables.

I am also not certain why gl_NumSubgroups is only "uniform across the invocation group". Why would it ever change within the workgroup?

Explicit uniformity / convergence

This is a proposal to add new keyword modifiers for variables, enabling better compile-time warnings, errors, and diagnostics, as well as new optimization opportunities.

Variables in GLSL can have one of 4 convergence levels:

  1. uniform
  2. threadgroup_uniform
  3. simd_uniform (aka dynamically uniform)
  4. dynamic

Each level is a downgrade from the previous one.

Variables like gl_WorkGroupID are threadgroup_uniform by nature.

Any value returned by anyInvocationARB and related functions is explicitly upgraded to simd_uniform.

Uniform variables mixed with simd_uniform variables are downgraded to simd_uniform.
simd_uniform variables mixed with dynamic variables are downgraded to dynamic.
"Mixing" can be addition, subtraction, multiplication, etc.

Any operation involving two or more variables with different attributes results in a value with the lowest attribute.

When a variable of lower convergence is used on a function that strictly requires a variable of higher convergence (e.g. using a dynamic variable on a function that requires a simd_uniform, threadgroup_uniform or uniform argument), it should throw an error.
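The mixing and assignment rules above form a small ordered lattice. A hypothetical C encoding (the enum and function names are illustrative, not part of the proposal text): mixing takes the less convergent operand, and an implicit assignment is accepted only when the value is at least as convergent as the declared level.

```c
#include <assert.h>
#include <stdbool.h>

/* Ordered from most to least convergent, so a larger value means
 * lower convergence. */
typedef enum {
    CONV_UNIFORM = 0,
    CONV_THREADGROUP_UNIFORM = 1,
    CONV_SIMD_UNIFORM = 2,
    CONV_DYNAMIC = 3
} Convergence;

/* Mixing two operands yields the lower convergence of the two. */
Convergence conv_mix(Convergence a, Convergence b)
{
    return a > b ? a : b;
}

/* Implicit assignment is allowed only if the value is at least as
 * convergent as the declared level; upgrades need an explicit *_promise(). */
bool conv_can_assign(Convergence declared, Convergence value)
{
    assert(declared <= CONV_DYNAMIC && value <= CONV_DYNAMIC);
    return value <= declared;
}
```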

There are several cases that would benefit from this.

Example 1
Execution barriers result in UB if called from within a branch evaluating variables of convergence lower than threadgroup_uniform.
This error could be easily detected at compile time.

dynamic float value = ...;

if( value >= threshold )
    barrier(); //Error: barrier() called from a non-uniform branch at line 123

Solution:

dynamic float value = ...;

if( threadgroup_uniform_promise( value >= threshold ) )
    barrier(); //Allowed, but it's the user's responsibility to ensure 'value >= threshold' evaluates uniformly

Please note that threadgroup_uniform_promise( value >= threshold ) is not the same as threadgroup_uniform float value = threadgroup_uniform_promise( ... ).

The latter authorizes the compiler to put 'value' in an SGPR, which can have other undesired side effects if 'value' is used for anything other than evaluating this branch.

Precedent: FXC compiler performs this type of diagnostic. See microsoft/DirectXShaderCompiler#1306

Example 2

The following statements can produce compile-time errors unless VK_EXT_descriptor_indexing (or any similar extension that interacts with this feature) is present:

int i = input_pixelshader.uv.x;
albedo = texture( myTex[i], uv ); //Compiler error, i is dynamic.

simd_uniform int i = input_pixelshader.uv.x; //Compiler error, cannot implicitly cast
albedo = texture( myTex[i], uv ); //OK

simd_uniform int i = readFirstInvocationARB( input_pixelshader.uv.x ); //OK
albedo = texture( myTex[i], uv ); //OK, i is explicitly simd_uniform

uint i = gl_WorkGroupID.x;
albedo = texture( myTex[i], uv ); //OK, i is implicitly threadgroup_uniform.

uint i = gl_WorkGroupID.x + some_dynamic_variable;
albedo = texture( myTex[i], uv ); //Compiler error, i is dynamic.

simd_uniform uint i = gl_WorkGroupID.x + some_dynamic_variable; //Compiler error, cannot implicitly cast
albedo = texture( myTex[i], uv ); //OK

Example 3

The following statement can be optimized thanks to extra information:

simd_uniform int i = readFirstInvocationARB( idx ) % 2;
if( i == 0 )
    albedo = texture( myTex[0], uv );
else if( i == 1 )
    albedo = texture( myTex[1], uv );

// Can be optimized to:

simd_uniform int i = readFirstInvocationARB( idx ) % 2;
albedo = texture( myTex[i], uv );

Likewise, the following readFirstInvocationARB call can be converted to a no-op:

simd_uniform int foo = ...;
simd_uniform int bar = readFirstInvocationARB( foo ); //No-op.

Downgrading:
Downgrading is simple. Shader inputs have a natural convergence type. gl_WorkGroupID is threadgroup_uniform, any value coming out of a texture fetch is dynamic.

Simply mixing shader inputs of different convergence results in the lowest common denominator.

Variables can also be explicitly downgraded. Compilers could generate warnings when this is unnecessary. For example:

dynamic uint idx = gl_WorkGroupID.x; //Warning: Unnecessary Convergence degradation. Consider declaring this variable of type threadgroup_uniform

Upgrading:

Upgrading must always be explicit:

uniform int var = uniform_promise( foo );
threadgroup_uniform int var = threadgroup_uniform_promise( foo );
simd_uniform int var = simd_uniform_promise( foo );

There are some functions that can also be used for explicit upgrading, such as readFirstInvocationARB.

The difference between readFirstInvocationARB and simd_uniform_promise is that the former may for example perform instructions to move a value out of a VGPR register to an SGPR register; while the latter is a simple assumption.
For example GPU cards which support non-dynamically-uniform indexing of textures may choose to generate instructions that index the texture by using the index from a VGPR directly, instead of moving the index to an SGPR register first.
On GCN cards readFirstInvocationARB and simd_uniform_promise would basically do the same.

If the user breaks the promise, code using readFirstInvocationARB and code using simd_uniform_promise may behave differently. Additionally, readFirstInvocationARB always results in defined behavior (although the results may be non-deterministic due to race conditions or the data being fetched), whereas breaking the contract of simd_uniform_promise is always undefined behavior (UB).

Variables with explicit convergence keyword cannot be implicitly downgraded or upgraded

The following results in a compiler error:

simd_uniform float4 variable = 0;
variable += texture( myTex, uv ).xyzw; //Compiler error

The following is correct, but won't implicitly upgrade the variable:

simd_uniform int variable = 0; //Variable is simd_uniform but could be uniform
variable += 5; //OK, but variable is still simd_uniform instead of uniform. Compiler could raise a warning

Note that compilers may still optimize based on the knowledge that 'variable' is actually uniform, but they must behave as if 'variable' were of type simd_uniform, e.g. when it comes to raising compiler errors.

Variables without explicit convergence keyword

Variables declared without any convergence keyword have their convergence deduced automatically from their inputs, and it can change to another type even after the initial declaration. For example:

float4 value = 0; //value is now uniform
value += gl_WorkGroupID.x; //value is now threadgroup_uniform
value *= texture( myTex, uv ).xyzw; //value is now dynamic
value = 0; //value is now uniform

Compilers and other tools can help the programmer identify the lines at which variables changed convergence.

With this simple scheme it becomes possible to diagnose common human mistakes; it opens up potential new optimizations thanks to the extra information available; it helps prevent accidental performance regressions caused by a variable becoming dynamic (e.g. when the code was originally written with the variable being simd_uniform); and it opens the possibility of new tools to help programmers find bugs or improve performance.

Something as simple as writing, at any arbitrary location:
simd_uniform int foo = a; //Compiles OK
gives the programmer a lot of information about 'a': because it did not fail to compile, 'a' can, up to this point, still live in SGPRs.

Additionally, it helps dispel the mysticism surrounding GPUs: a compiler that tells inexperienced programmers what they are doing wrong lowers the barrier to entry, and GPUs are already hard enough. Learning by trial and error is how self-taught programmers train.

A scheme like this may also belong in SPIR-V; however, I lack the knowledge to argue that case.

Dynamic Control flow handling

There are cases where a uniform variable needs to be downgraded due to a dynamic break.

In that case, how it's handled depends on whether the variable was implicit or explicit. Consider the following example:

dynamic int threshold = ...; // Can be implicitly dynamic too. Just making it explicit to point out the problem

int i = 2; // Uniform so far
for( i = 0; i < 256; ++i ) // i = 0 is now dynamic (continue reading)
{
   if( i <= 0 )
        memoryBarrier(); // Compiler error: i is dynamic (see next line)
   if( i < threshold )
        break; // The presence of this break turns i into dynamic
}

When it comes to optimizations, the compiler may treat i as uniform up to the first if( i < threshold ) in the first iteration, since i is guaranteed to be uniform (i = 0) until the first break can occur.

When it comes to error generation, however, it should reject this kind of code.
Fixing this error would require a promise:

dynamic int threshold = ...;

int i;
for( i = 0; i < 256; ++i )
{
   if( i <= 0 )
        memoryBarrier(); // OK
   if( threadgroup_uniform_promise( i < threshold ) )
        break; // Not our problem if user breaks the promise
}

With an explicit declaration, the compiler should report the error more explicitly:

dynamic int threshold = ...;

threadgroup_uniform int i;
for( i = 0; i < 256; ++i )
{
   if( i < threshold )
        break; // Compiler Error: Break turns i into dynamic, but it is of explicit type threadgroup_uniform
}

This error informs the user that the code they wrote no longer keeps i threadgroup_uniform, and that i cannot be downgraded because it has an explicit convergence type.

Precedents and related discussion:
KhronosGroup/glslang#1809
microsoft/DirectXShaderCompiler#1306
https://reviews.llvm.org/D26348

GLSL 4.60: Typo - Details for specific to

In section 4.4.6, we can read:

Details for specific to image formats and atomic counter bindings are given in the subsections below.

The word "for" should probably be deleted.

Format layout qualifiers on image function parameters?

Implementations are inconsistent about handling format layout qualifiers on image function parameters, and whether they are included in parameter matching. The GLSL spec seems to have a clear statement about this, but it may not be the "right" answer.

Section 4.10 (Memory Qualifiers) includes this example code, claiming that it is "OK" to pass a qualified image to an unqualified parameter:

vec4 funcA(restrict image2D a)   { ... }
vec4 funcB(image2D a)            { ... }
layout(rgba32f) uniform image2D img1;
layout(rgba32f) coherent uniform image2D img2;
funcA(img1);              // OK, adding "restrict" is allowed
funcB(img2);              // illegal, stripping "coherent" is not

The next sentence says "Layout qualifiers cannot be used on formal function parameters, and layout qualification is not included in parameter matching."

That seems pretty clear, but if a function parameter can't be qualified then it can't be loaded from, because loads require a format qualifier (unless GL_EXT_shader_image_load_formatted is enabled). That also seems to require inlining or cloning functions in order to know the right type, which is something we've tried to avoid in SPIR-V. So this is an unexpected/concerning answer.

Consider the following example:

#version 450
layout(local_size_x = 1) in;

layout(binding=0, rgba16f) uniform image2D im;

void f(layout(rgba16f) image2D x)
{
    imageStore(x, ivec2(0,0), vec4(0,0,0,0));
    imageLoad(x, ivec2(0,0));
}

void g(image2D x)
{
    imageStore(x, ivec2(0,0), vec4(0,0,0,0));
}

void main (void)
{
    f(im);
    g(im);
}

glslang generates an error on the definition of f(): "cannot use layout qualifiers on a function parameter". If you remove f(), it successfully compiles the call to g() (passing qualified to nonqualified), but the resulting SPIR-V fails spirv-val "OpFunctionCall Argument '23[%im]'s type does not match Function '8[%_ptr_UniformConstant_7]'s parameter type". If the frontend compiler is supposed to insert a cast, is it supposed to be an OpBitcast? That sounds legal, but is possibly something that has never been tested.

NVIDIA's GLSL compiler generates an error on the call to g() "incompatible type for parameter 1". But it accepts the definition of f and call to f, and generates correct code (AFAICT).

I'm not sure what the right way to resolve all this is. It seems like we should allow format qualifiers on function parameters, so that loads can be supported. It is also fine to allow unqualified function parameters, but I would only expect them to support stores (the spec says "it is a compile-time error to pass an image uniform variable or function parameter declared without a format layout qualifier to an image load or atomic function"). If we want to allow passing qualified variables to unqualified parameters, then we need either to relax the type matching rules in SPIR-V or to clarify (in GL_KHR_vulkan_glsl?) how to perform the conversion.

Subgroup variables not supported in Ray Tracing Shaders

This only affects the GL_NV_ray_tracing extension.
I noticed that gl_SubgroupSize and gl_SubgroupInvocationID are not available in ray tracing shaders, in particular in ray generation shaders, while the same shaders compile fine when they use subgroup functions such as subgroupBarrier(). Is this a missing feature, or is it just not supported at all?

Extending bit width control inside of GLSL

Originally filed by @Cazadorro in KhronosGroup/Vulkan-Ecosystem:

Suggestions to add new integer types in vulkan GLSL, such as int24/uint24 and int10/uint10 which map into 32bit float and 16bit float mantissas respectively, or potentially more control. Submitting based on the recommendation here.

Currently there are some architectures where integer performance is worse than floating-point performance. For example, NVIDIA hardware is like this, and in CUDA they've provided the ability to do 24-bit integer arithmetic through intrinsic functions that take advantage of the mantissa of their floating-point units. Integer-intensive operations could benefit from using either or both the integer and floating-point hardware if the programmer carefully considers the range of a 24-bit integer. Some GPUs do not have a 1:1 ratio of FPUs to integer ALUs, which causes a difference in performance in favor of floating-point operations. Such optimizations may be possible with precision qualifiers in GLSL, but since those apply globally you would potentially be ignoring the integer ALUs (among other normal small-integer optimizations). It is also unclear to me whether global precision qualifiers are still valid either in Vulkan or in recent versions of OpenGL with GLSL.

SPIR-V appears to already support different bit widths for float and integer types. This would seem to indicate no changes to SPIR-V would need to be done in order for such an extension to be adopted.

If one could specify the bit width of integers in this way, it could map directly into SPIR-V without issue, and it would be up to the vendor to implement the JIT optimization that uses floating-point units instead of integer units, or whatever internal process they desire.

It may be the case that, instead of adding new syntax for only int24 and int10 (since the same applies to int54), one should be able to specify arbitrary integer bit widths in GLSL, i.e. int4, int18, int8, etc., which may have further optimization effects on systems with multiple smaller arithmetic units for integers. This could take place in two different GLSL extensions, one for up to 32 bits and one for up to 64 bits.

See original issue for some relevant discussion not captured here (KhronosGroup/Vulkan-Ecosystem#34)

Should GL_EXT_shader_16bit_storage define int -> float constructors?

GL_EXT_shader_16bit_storage added constructors like

float16_t(float)

But, did not add

float16_t(int)

Was excluding the latter intentional? Based on the existence of implicit conversions to match function prototypes, one could imagine that it is implied, but it's not really the right reason. Some quotes from the core spec:

In general, constructors are not built-in functions with predetermined prototypes.

and

Constructors can be used to request a data type conversion to change from one scalar type to another scalar type

and

float(int) // converts a signed integer value to a float

The cross-basic-type constructors should either be added to the extension specification and then implemented in glslang (the operators already exist, but don't get created for the above), or glslang should be changed to enforce a semantic check given the absence of cross-type constructors.

cc @jeffbolznv

Are built-in constants constant expressions?

The OpenGL wiki says:

All of these variables are declared as const, so they are considered constant expressions.

But that is not so obvious to me from the spec:

A variable declared with the const qualifier and an initializer, where the initializer is a constant
expression.

Built-ins are not really explicitly declared, and they do not really have an initializer.
Would it be worth adding them explicitly to the list of constant expressions?

Why is gl_SubgroupSize restricted to be a power of 2?

We have a target that executes a non-power-of-2 number of threads in lock-step. It would be natural to make a subgroup have one member per thread, but we are not able to because GL_KHR_shader_subgroup states:

If the extension GL_KHR_shader_subgroup_basic is enabled, the variable
<gl_SubgroupSize> is the number of invocations within a subgroup, and its
value is always a power of 2.

Why is there a restriction that it must be a power of 2? Is there any real need for that?

(As far as I can see, neither SPIR-V nor Vulkan has this restriction; it is only in GLSL, so in theory shaders written in other languages could run on Vulkan with a non-power-of-2 subgroup size.)

Implicit derivatives, uniform control flow, and SPIR-V.

The SPIR-V specification has a number of definitions that deal with implicit derivatives. Let's start with 2.19: Derivatives:

In all cases, derivatives are well defined only if the derivative group has uniform control flow.

The terms "derivative group" and "uniform control flow" are defined as:

Derivative Group: Defined only for the Fragment Execution Model: The set of invocations collectively processing a single point, line, or triangle, including any helper invocations.

Uniform Control Flow: Uniform control flow (or converged control flow) occurs when all invocations in the invocation group or derivative group execute the same control-flow path (and hence the same sequence of dynamic instances of instructions).

Put simply, implicit derivatives are well-defined under SPIR-V's rules if you have uniform control flow among all fragment shader invocations that process a primitive.

GLSL 4.60 defines things a bit differently. Its definition of Uniform Control Flow is:

Uniform control flow (or converged control flow) occurs when all invocations in the invocation group execute the same control-flow path (and hence the same sequence of dynamic instances of instructions).

Notice the lack of a statement about "derivative groups". That's important. In OpenGL, like in Vulkan, an "invocation group" is defined as:

For graphics shaders, an invocation group is an implementation-dependent subset[1] of the set of shader invocations of a given shader stage which are produced by a single drawing command.

Incidentally, footnote 1 effectively says that you have to assume the worst case: that all invocations of a stage from a single draw command are in the same invocation group. This means that an invocation group must be assumed to be larger than a single primitive.

That's unfortunate, because GLSL says in section 8.9:

Implicit derivatives are undefined within non-uniform control flow and for non-fragment shader texture fetches.

Because of the lack of a concept of "derivative groups", this can only refer to uniform control flow within an invocation group, which is considerably larger than a derivative group.

However, GLSL's section 6.4 says:

Control flow exits the shader, and subsequent implicit or explicit derivatives are undefined when this control flow is non-uniform (meaning different fragments within the primitive take different control paths).

Emphasis added. So it seems clear that the original intent is that uniformity only needs to be within a primitive, not within a full invocation group.

This all needs to be clarified. And as with prior cases, I suspect the best thing to do is just copy SPIR-V's wording. After all, since we can load SPIR-V shaders now, those shaders still have to be able to act under SPIR-V's rules. So hardware has to be able to have the concept of derivative groups and uniformity within them as distinct from invocation groups.

Does branching on gl_HelperInvocation require the invocation to go inactive?

When I spec'ed the GL_KHR_shader_subgroup specification, it was negotiated that helper invocations participate in fragment shader subgroup operations: "In fragment shaders, helper invocations participate in subgroup operations."

I'm digging into some of the corner cases with this, and came across a fun one:

#version 450
#extension GL_KHR_shader_subgroup_ballot: enable

layout(set = 0, binding = 0) buffer foo_layout {
    float in_thing[];
};

layout(set = 0, binding = 1) buffer bar_layout {
    float out_thing[];
};

void main() {
    // Let's assume we have at least one helper invocation...
    uvec4 outer_ballot = subgroupBallot(true);

    if (!gl_HelperInvocation) {
        uvec4 inner_ballot = subgroupBallot(true);
        
        // Are there any requirements that the helper invocation must be inactive here?
        // Or (test-wise) can outer_ballot == inner_ballot?
    }
}

So the question is - are helper invocations required to become inactive when they are involved in control flow?

Note - previously (pre subgroup ops) we had no way to detect whether this could/would happen!

significand range for frexp() incorrect due to sign-handling

The specification says the range of the significand (aka mantissa) returned by frexp() is [0.5,1.0]. Handling of the sign of the floating point number is not mentioned except for the special case of minus zero.

In C/C++, frexp() preserves the sign in the significand, so the range is documented as (-1.0,-0.5] and [0.5,1.0). NVIDIA frexp() appears to preserve the sign in the significand. Given the minus-zero preservation, I believe sign preservation was the intent and the specification is in error.

Related, the C/C++ ranges are documented as inclusive/exclusive in magnitude, which guarantees you won't get 1.0 as a significand. The inclusive-inclusive range in the GLSL specification deviates from this, presumably intentionally to allow flexibility in the implementation. (For example, the NVIDIA GPU on this desktop appears to return 0.5 for frexp(1.0,exp), which is also what C/C++ returns.) In my opinion it would be nice to be explicit about this deviation from C/C++ instead of expecting people to infer an unusual behavior from a single unemphasized character being ']' instead of ')'.

Also related, ARB_gpu_shader5 documents the range as [0.5,1.0), which matches the C/C++ range for positive values. Presumably that is intentional, but I imagine it should be made consistent with the GLSL specification if that gets the sign fix.

To summarize, the following are the significand ranges found for frexp() in various places:

[0.5,1.0] - GLSL 4.60 specification (and earlier)
[0.5,1.0) - ARB_gpu_shader5 version 16 (2012)
[0.5,1.0) & (-1.0,-0.5] - C/C++
[0.5,1.0] & [-1.0,-0.5] - proposed corrected GLSL range

(Note that I have reversed the order of the pairs of ranges above from their natural ordering to make visual comparison simpler.)

The handling of sign MUST be corrected, and the inconsistency with C/C++ SHOULD be documented explicitly.

GLSL 4.60: No mention of TCS/TES in array block matching rules

Consider this from 4.3.9:

Furthermore, if a matching block is declared as an array, then the array sizes must also match (or follow array matching rules for the shader interface between a vertex and a geometry shader).

Array matching rules are also different for TCS/TES stages. I'd suggest changing it to:

Furthermore, if a matching block is declared as an array, then the array sizes must also match, in accord with the array matching rules for the shader interface between the two stages.

Or something of the sort.
