gpuopen-effects / fidelityfx-spd

Single Pass Downsampler (SPD)

Home Page: https://gpuopen.com/fidelityfx-spd/

License: MIT License

Languages: Objective-C 72.43%, C 27.57%
Topics: cauldron, dx12, fidelityfx, gpuopen, vulkan

fidelityfx-spd's People

Contributors

aurolou

fidelityfx-spd's Issues

`SpdResetAtomicCounter` issues many 0-valued uncached writes to the same address

This is more of a question than an issue, since functionally all the samples operate correctly. However, I noticed that the call to SpdResetAtomicCounter issues an uncached write from every thread of the last remaining thread group. Is this intentional? Perhaps the driver recognizes that the value being written is uniform and coalesces the writes? Or would it be better to restrict the call to SpdResetAtomicCounter to the thread with localInvocationIndex == 0?
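The question can be made concrete with a serial CPU model. This is a sketch under assumptions: C11 atomics stand in for the GPU counter buffer, every name (`simulate_dispatch`, `reset_counter`, and so on) is hypothetical, and it only counts how many reset stores are issued; it says nothing about whether real hardware coalesces them.

```c
#include <assert.h>
#include <stdatomic.h>

enum { THREADS_PER_GROUP = 256 }; /* matches local_size_x = 256 */

static atomic_uint global_counter;  /* stands in for the global atomic counter */
static unsigned reset_store_count;  /* number of reset stores issued */

/* One increment per work group; returns nonzero for the last group to arrive. */
static int arrive_and_check_last(unsigned num_work_groups)
{
    unsigned prev = atomic_fetch_add(&global_counter, 1u);
    return prev == num_work_groups - 1u;
}

/* The reset as executed by one thread of the last work group. */
static void reset_counter(unsigned local_index, int only_thread_zero)
{
    if (only_thread_zero && local_index != 0u)
        return; /* every other thread skips the redundant zero store */
    atomic_store(&global_counter, 0u);
    ++reset_store_count;
}

/* Serial model of one dispatch; returns how many reset stores were issued. */
unsigned simulate_dispatch(unsigned num_work_groups, int only_thread_zero)
{
    assert(num_work_groups > 0u);
    reset_store_count = 0u;
    for (unsigned g = 0u; g < num_work_groups; ++g) {
        if (!arrive_and_check_last(num_work_groups))
            continue;
        for (unsigned t = 0u; t < THREADS_PER_GROUP; ++t)
            reset_counter(t, only_thread_zero);
    }
    return reset_store_count;
}
```

With the guard off, the last group issues 256 identical stores of zero; with the guard on, it issues one. In both cases the counter ends at zero, which is why the samples behave correctly either way.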

Bug when downsampling 128x32 image

I'm integrating SPD into ANGLE, and while writing tests I'm seeing a bug in this code. I'm testing on Linux with an NVIDIA GeForce GTX 970 on driver 440.100. (I see bigger issues on Intel/Mesa too, but I can't be sure those aren't driver bugs.)

If the source image is 128x32 and entirely red, sampled with a clamp-to-edge sampler, mips 1 through 6 come out all red (which is correct), but mip 7 (the last, 1x1 mip) comes out half red, i.e. vec4(0.5, 0, 0, 0.5).

The last mip is indeed written to, so the atomic counter buffer is not the problem. To make sure the bug is in this library, I even made SpdLoadSourceImage return constant red directly:

AF4 SpdLoadSourceImage(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}

The last mip is still half red. I then also made SpdLoad return red:

AF4 SpdLoad(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}

That fixes the issue. Is it possible that the library loads outside the image bounds? Out-of-bounds loads returning transparent black would explain how red turns into half red.
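The hypothesis can be checked with plain arithmetic. The C sketch below mirrors the 2x2 average of the SpdReduce4 in the reporter's shader and assumes, as the reporter suspects, that a load outside the image returns transparent black; `float4`, `load_red` and `reduce_footprint` are hypothetical names. Reducing a 2x2 footprint of an all-red image that is 2 texels wide but only 1 texel tall averages two red texels with two transparent-black ones.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } float4;

/* 2x2 box filter: the same arithmetic as SpdReduce4 in the reporter's shader. */
static float4 reduce4(float4 v0, float4 v1, float4 v2, float4 v3)
{
    float4 o = {
        (v0.r + v1.r + v2.r + v3.r) * 0.25f,
        (v0.g + v1.g + v2.g + v3.g) * 0.25f,
        (v0.b + v1.b + v2.b + v3.b) * 0.25f,
        (v0.a + v1.a + v2.a + v3.a) * 0.25f,
    };
    return o;
}

static int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Hypothetical load from an all-red w x h image. Assumption under test:
 * out-of-bounds texels read as transparent black unless clamp_to_edge is set. */
static float4 load_red(int x, int y, int w, int h, int clamp_to_edge)
{
    const float4 red = { 1.0f, 0.0f, 0.0f, 1.0f };
    const float4 transparent_black = { 0.0f, 0.0f, 0.0f, 0.0f };
    assert(w > 0 && h > 0);
    if (clamp_to_edge) {
        x = clamp_int(x, 0, w - 1);
        y = clamp_int(y, 0, h - 1);
    }
    return (x >= 0 && x < w && y >= 0 && y < h) ? red : transparent_black;
}

/* Average the 2x2 footprint whose top-left texel is (x, y). */
float4 reduce_footprint(int x, int y, int w, int h, int clamp_to_edge)
{
    return reduce4(load_red(x, y, w, h, clamp_to_edge),
                   load_red(x + 1, y, w, h, clamp_to_edge),
                   load_red(x, y + 1, w, h, clamp_to_edge),
                   load_red(x + 1, y + 1, w, h, clamp_to_edge));
}
```

`reduce_footprint(0, 0, 2, 1, 0)` yields (0.5, 0, 0, 0.5), exactly the reported half red, while the same call with `clamp_to_edge` set yields full red. Clamping coordinates in SpdLoad is therefore one conceivable workaround; this illustrates the arithmetic and is not necessarily how the library resolves it.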

The rest of the glue code follows the integration instructions exactly. For completeness, here is the relevant part of the shader:

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0, rgba32f) uniform coherent image2D dst[12];
layout(set = 0, binding = 1) uniform sampler2D src;
layout(set = 0, binding = 2) coherent buffer GlobalAtomic
{
    uint counter;
} globalAtomic;

layout(push_constant) uniform PushConstants {
    uint levelCount;
    uint numWorkGroups;
    vec2 invSrcExtent;
} params;

#define A_GPU
#define A_GLSL

#include "third_party/ffx_spd/ffx_a.h"

shared AF4 spd_intermediate[16][16];
shared AU1 spd_counter;

#define SPD_NO_WAVE_OPERATIONS
#define SPD_LINEAR_SAMPLER

AF4 SpdLoadSourceImage(ASU2 p)
{
    // Hack: return red to verify the bug
    return AF4(1.0, 0.0, 0.0, 1.0);
    //AF2 textureCoord = p * params.invSrcExtent + params.invSrcExtent;
    //return texture(src, textureCoord);
}

AF4 SpdLoad(ASU2 p)
{
    return imageLoad(dst[5], p);
}

void SpdStore(ASU2 p, AF4 value, AU1 mip)
{
    imageStore(dst[mip], p, value);
}

AF4 SpdLoadIntermediate(AU1 x, AU1 y)
{
    return spd_intermediate[x][y];
}
void SpdStoreIntermediate(AU1 x, AU1 y, AF4 value)
{
    spd_intermediate[x][y] = value;
}

AF4 SpdReduce4(AF4 v0, AF4 v1, AF4 v2, AF4 v3)
{
    return (v0 + v1 + v2 + v3) * 0.25;
}

void SpdIncreaseAtomicCounter()
{
    spd_counter = atomicAdd(globalAtomic.counter, 1);
}
AU1 SpdGetAtomicCounter()
{
    return spd_counter;
}

#include "third_party/ffx_spd/ffx_spd.h"

void main()
{
    SpdDownsample(gl_WorkGroupID.xy, gl_LocalInvocationIndex,
            params.levelCount, params.numWorkGroups);
}

The issue also occurs with 128x64 and 128x127 textures, but not with 128x128.
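A quick size calculation is consistent with that pattern, assuming that each SPD work group covers a 64x64 tile of mip 0 and that the 1x1 mip 7 is a 2x2 reduction of mip 6. Mip n of a base dimension s is max(s >> n, 1), so mip 6 is 2x1 for 128x32, 128x64 and 128x127, but 2x2 for 128x128; only in the last case does that 2x2 footprint stay inside mip 6. The helper names below are hypothetical.

```c
#include <assert.h>

/* Dimension of mip `level` for a base dimension `size` (never below 1). */
static unsigned mip_dim(unsigned size, unsigned level)
{
    unsigned d = size >> level;
    return d > 0u ? d : 1u;
}

/* Number of mips below mip 0: floor(log2(max(w, h))). */
static unsigned mip_count(unsigned w, unsigned h)
{
    unsigned m = w > h ? w : h, n = 0u;
    assert(m > 0u);
    while (m > 1u) {
        m >>= 1;
        ++n;
    }
    return n;
}

/* Work groups needed if each one covers a 64x64 tile of mip 0. */
static unsigned work_group_count(unsigned w, unsigned h)
{
    return ((w + 63u) / 64u) * ((h + 63u) / 64u);
}

/* Nonzero if mip 6 is too small to hold a full 2x2 footprint at (0, 0). */
static int mip6_footprint_out_of_bounds(unsigned w, unsigned h)
{
    return mip_dim(w, 6u) < 2u || mip_dim(h, 6u) < 2u;
}
```

For 128x32 this gives 7 mips below mip 0 (mips 1 through 7, matching the report) and 2 work groups, and it flags 128x32, 128x64 and 128x127 while leaving 128x128 unflagged. Note that 128x127 needs 4 work groups even though its mip 6 has only one row, so the second row of tiles produces mip-6 texels that have nowhere to be stored.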

SpdConstants and SpdLinearSamplerConstants

I think it would be better to move

        struct SpdConstants
        {
            int mips;
            int numWorkGroupsPerSlice;
            int workGroupOffset[2];
        };

        struct SpdLinearSamplerConstants
        {
            int mips;
            int numWorkGroupsPerSlice;
            int workGroupOffset[2];
            float invInputSize[2];
            float padding[2];
        };

into ffx_spd.h under an A_CPU define. That would make it more convenient to just use

#define A_CPU
#include <fidelityfx/ffx_a.h>
#include <fidelityfx/ffx_spd.h>

in the implementation file. Currently they are hidden in the example project's files.
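A sketch of the proposal, assuming C11: the implementation file defines A_CPU, and the two structs (copied verbatim from above) sit in a guarded section of the header. The guarded layout is hypothetical, not the current contents of ffx_spd.h. The static assertions record that both structs are whole multiples of 16 bytes, which is presumably what the `padding` field is for, since constant-buffer data is commonly laid out in 16-byte rows; they assume 4-byte `int` and `float`.

```c
#include <assert.h> /* C11: provides the static_assert macro */

/* The implementation file defines A_CPU before including the headers. */
#define A_CPU

/* Hypothetical guarded section of ffx_spd.h, as proposed in this issue. */
#ifdef A_CPU
struct SpdConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
};

struct SpdLinearSamplerConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
    float invInputSize[2];
    float padding[2]; /* pads the struct to two 16-byte rows */
};
#endif /* A_CPU */

/* Layout checks, assuming 4-byte int and float. */
static_assert(sizeof(struct SpdConstants) == 16,
              "SpdConstants should occupy one 16-byte row");
static_assert(sizeof(struct SpdLinearSamplerConstants) == 32,
              "SpdLinearSamplerConstants should occupy two 16-byte rows");
```

Keeping the CPU-side declarations next to the shader code would also make it harder for the two layouts to drift apart.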

HLSL and GLSL barrier is not equivalent

Hi!

I have a minor concern/question regarding the memory barriers in the HLSL and GLSL versions. In HLSL you call GroupMemoryBarrierWithGroupSync(), but in GLSL you only call barrier(). I believe the GLSL equivalent of GroupMemoryBarrierWithGroupSync() is actually groupMemoryBarrier() followed by barrier(): the first guards memory writes, and the second synchronizes execution across the threads of the work group.

It's not entirely clear from the spec whether groupMemoryBarrier() also blocks execution, or whether barrier() also guards memory writes, but I used https://anteru.net/blog/2016/mapping-between-HLSL-and-GLSL/ as my reference point, and following it has solved many similar issues on different vendors before.

Be wary, however: on NVIDIA GPUs in the 20xx-30xx range, running this shader with both barriers, as the reference suggests, seems to result in a GPU hang.
