gpuopen-effects / fidelityfx-spd

Single Pass Downsampler (SPD)

Home Page: https://gpuopen.com/fidelityfx-spd/

License: MIT License

Objective-C 72.43% C 27.57%

cauldron dx12 fidelityfx gpuopen vulkan

fidelityfx-spd's Issues

`SpdResetAtomicCounter` issues many 0-valued uncached writes to the same address

This is more of a question than an issue, since all the samples operate correctly. I noticed that SpdResetAtomicCounter is called by every thread of the last remaining thread group, so each thread issues an uncached write of 0 to the same address. Is this intentional? Perhaps the driver recognizes that the value being written is uniform and coalesces the writes? Or would it be better to restrict the call to SpdResetAtomicCounter to the thread with localInvocationIndex == 0?

Global counter buffer assumes up to 6 slices

FidelityFX-SPD/sample/src/VK/SPDIntegration.glsl

Line 50 in 7c796c6

uint counter[6];

This works for cubemaps, but for regular texture arrays it should use an unbounded (runtime-sized) array:

uint counter[];

The CPU side must then either create a worst-case buffer (2K slices for most vendors) or use some form of dynamic allocation whenever it generates mips for a texture with a larger slice count than any used so far.
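
As a rough host-side sketch of the two options above, assuming one uint counter per slice as in the shader declaration: the helper names and the `SPD_MAX_SLICES_ASSUMED` constant are hypothetical and not part of FidelityFX-SPD, and the 2K figure is taken from the issue text rather than queried from the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed worst case from the issue text: 2K array slices. */
#define SPD_MAX_SLICES_ASSUMED 2048u

/* Bytes needed for a buffer holding one uint counter per slice. */
static size_t spd_counter_buffer_size(uint32_t slice_count)
{
    return (size_t)slice_count * sizeof(uint32_t);
}

/* Dynamic-allocation option: returns the capacity (in slices) the buffer
 * should have after a request; it only ever grows. */
static uint32_t spd_grow_counter_capacity(uint32_t current_capacity,
                                          uint32_t required_slices)
{
    return required_slices > current_capacity ? required_slices
                                              : current_capacity;
}
```

With these assumptions the worst-case buffer is only 8 KB, so allocating it up front is likely the simpler choice.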

Bug when downsampling 128x32 image

I'm integrating SPD into ANGLE and, while writing tests, I'm seeing a bug in this code. I'm testing on Linux with an Nvidia GeForce GTX 970 on driver 440.100. (I see bigger issues with Intel/Mesa too, but I can't be sure those aren't driver bugs.)

If the source image is all red with a base size of 128x32, and the sampler clamps to edges, I can see mips 1 through 6 becoming all red (which is correct) and mip 7 (the last, 1x1 mip) becoming half-red (as in vec4(0.5, 0, 0, 0.5)).

The last mip is indeed written to, so there aren't any errors with the atomic counter buffer. To make sure the bug is in this library, I even made SpdLoadSourceImage return a constant red directly:

AF4 SpdLoadSourceImage(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}

And I still see the last mip being half-red. Additionally, I made SpdLoad return red:

AF4 SpdLoad(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}

And that fixes the issue. Is it possible that the library is trying to load outside the image coordinates? That could explain why transparent-black is returned, turning red into half-red.

The rest of the glue code exactly follows the instructions. For completeness, here's the relevant part of the shader:

layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;

layout(set = 0, binding = 0, rgba32f) uniform coherent image2D dst[12];
layout(set = 0, binding = 1) uniform sampler2D src;
layout(set = 0, binding = 2) coherent buffer GlobalAtomic
{
    uint counter;
} globalAtomic;

layout(push_constant) uniform PushConstants {
    uint levelCount;
    uint numWorkGroups;
    vec2 invSrcExtent;
} params;

#define A_GPU
#define A_GLSL

#include "third_party/ffx_spd/ffx_a.h"

shared AF4 spd_intermediate[16][16];
shared AU1 spd_counter;

#define SPD_NO_WAVE_OPERATIONS
#define SPD_LINEAR_SAMPLER

AF4 SpdLoadSourceImage(ASU2 p)
{
    // Hack: return red to verify the bug
    return AF4(1.0, 0.0, 0.0, 1.0);
    //AF2 textureCoord = p * params.invSrcExtent + params.invSrcExtent;
    //return texture(src, textureCoord);
}

AF4 SpdLoad(ASU2 p)
{
    return imageLoad(dst[5], p);
}

void SpdStore(ASU2 p, AF4 value, AU1 mip)
{
    imageStore(dst[mip], p, value);
}

AF4 SpdLoadIntermediate(AU1 x, AU1 y)
{
    return spd_intermediate[x][y];
}
void SpdStoreIntermediate(AU1 x, AU1 y, AF4 value)
{
    spd_intermediate[x][y] = value;
}

AF4 SpdReduce4(AF4 v0, AF4 v1, AF4 v2, AF4 v3)
{
    return (v0 + v1 + v2 + v3) * 0.25;
}

void SpdIncreaseAtomicCounter()
{
    spd_counter = atomicAdd(globalAtomic.counter, 1);
}
AU1 SpdGetAtomicCounter()
{
    return spd_counter;
}

#include "third_party/ffx_spd/ffx_spd.h"

void main()
{
    SpdDownsample(gl_WorkGroupID.xy, gl_LocalInvocationIndex,
            params.levelCount, params.numWorkGroups);
}

The issue is present with 128x64 and 128x127 textures too, but not 128x128.
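
One possible explanation, offered as a sketch rather than a confirmed diagnosis: for all three failing sizes, mip 6 is only one texel tall, so a 2x2 read of mip 6 would touch a row that does not exist. The code below assumes that dst[5] holds mip 6, that the final reduction averages a 2x2 footprint of it, and that out-of-bounds image loads return zero. The function names are hypothetical.

```c
#include <stdint.h>

/* Extent of one dimension at a given mip level, clamped to at least 1. */
static uint32_t mip_extent(uint32_t base, uint32_t level)
{
    uint32_t e = base >> level;
    return e > 0u ? e : 1u;
}

/* Number of texels of the 2x2 footprint at (0,0) that lie inside mip 6. */
static uint32_t in_bounds_texels_mip6(uint32_t width, uint32_t height)
{
    uint32_t w = mip_extent(width, 6u);
    uint32_t h = mip_extent(height, 6u);
    uint32_t nx = w < 2u ? w : 2u;
    uint32_t ny = h < 2u ? h : 2u;
    return nx * ny;
}

/* Red channel of the 2x2 average when in-bounds texels are 1.0 and
 * out-of-bounds loads return 0.0 (an assumption, see above). */
static float averaged_red(uint32_t width, uint32_t height)
{
    return (float)in_bounds_texels_mip6(width, height) * 0.25f;
}
```

Under these assumptions, 128x32, 128x64 and 128x127 each average to 0.5 while 128x128 averages to 1.0, which matches the sizes reported above.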

SpdConstants and SpdLinearSamplerConstants

I think it would be better to move

        struct SpdConstants
        {
            int mips;
            int numWorkGroupsPerSlice;
            int workGroupOffset[2];
        };

        struct SpdLinearSamplerConstants
        {
            int mips;
            int numWorkGroupsPerSlice;
            int workGroupOffset[2];
            float invInputSize[2];
            float padding[2];
        };

to ffx_spd.h under an A_CPU define. That would make it more convenient to use just

#define A_CPU
#include <fidelityfx/ffx_a.h>
#include <fidelityfx/ffx_spd.h>

in the implementation file. Currently they are hidden in the example project file.
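
If the structs were available to CPU code, an integration could check their layout before uploading them as constants. The sketch below copies the two definitions from this issue and adds hypothetical helper functions. It assumes 4-byte int and float, which holds on common desktop compilers but is not guaranteed by the C standard.

```c
#include <stddef.h>

/* Definitions copied from the issue text. */
struct SpdConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
};

struct SpdLinearSamplerConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
    float invInputSize[2];
    float padding[2];
};

/* Hypothetical helpers: sizes in bytes, as seen by the host compiler. */
static size_t spd_constants_size(void)
{
    return sizeof(struct SpdConstants);
}

static size_t spd_linear_sampler_constants_size(void)
{
    return sizeof(struct SpdLinearSamplerConstants);
}

/* Byte offset of the sampler-specific data inside the larger struct. */
static size_t spd_inv_input_size_offset(void)
{
    return offsetof(struct SpdLinearSamplerConstants, invInputSize);
}
```

With 4-byte scalars the sizes come out as 16 and 32 bytes, both multiples of 16, which is presumably why the second struct carries explicit padding.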

HLSL and GLSL barrier is not equivalent

Hi!

I just have a minor concern/question regarding the memory barriers in the HLSL and GLSL versions. In HLSL, you call GroupMemoryBarrierWithGroupSync(), but in GLSL you only call barrier(). I believe the GLSL equivalent of GroupMemoryBarrierWithGroupSync() is actually groupMemoryBarrier() followed by barrier(): the first orders memory writes and the second synchronizes thread execution.

It's not entirely clear from the spec whether groupMemoryBarrier() also blocks execution, or whether barrier() also orders memory writes, but I used https://anteru.net/blog/2016/mapping-between-HLSL-and-GLSL/ as my reference point, and it has solved many similar issues on different vendors before.

However, be wary: there seems to be an issue on Nvidia GPUs in the 20xx-30xx range when this shader runs with both barriers as suggested by the reference, resulting in a GPU hang.

Project isn't single entrypoint utility function as claimed, its an app

A few changes to morph it towards more generic external use, which I hope are helpful.

https://github.com/adv-sw/FidelityFX-SPD

I realise you have a GPU API; this adds a CPU API to improve the usefulness of the project.

It can still be compiled and run in app mode just as before, and now also as a library.

All yours. Please consider merging into master, as there's no need for this to fragment.

Please consider supporting a Linux build

See $TITLE.

【Bug】 Multipass CS RGB to SRGB conversion error

In the file "CSDownsampler.hlsl":

The constant 0.0031308 should be 0.04045.
This problem is not visually detectable.
