gpuopen-effects / fidelityfx-spd
Single Pass Downsampler (SPD)
Home Page: https://gpuopen.com/fidelityfx-spd/
License: MIT License
This is more of a question than an issue since functionally, all the samples will operate correctly, but I noticed that the invocation to SpdResetAtomicCounter will issue an uncached write on all threads for the last remaining thread group. Is this intentional by any chance? Perhaps because the driver somehow realizes that the value being written is uniform and thus can be coalesced? Or would it have been better to restrict the invocation of SpdResetAtomicCounter to the thread with localInvocationIndex == 0?
It works for cubemaps, but when used with regular texture arrays it should use an unsized buffer array:
uint counter[];
The CPU must then either create a worst-case buffer (2K slices for most vendors) or use some sort of dynamic allocation whenever it generates mips for a texture with a larger slice count than any used so far.
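The two sizing options above can be sketched on the CPU side as follows. This is a minimal sketch, assuming one 32-bit counter per array slice (matching `uint counter[]`); the helper names `counterBufferBytes` and `grownCapacity` are hypothetical and not part of the library.

```cpp
#include <cstddef>
#include <cstdint>

// The "2K slices" worst case quoted above; actual limits are vendor-specific.
constexpr std::size_t kWorstCaseSlices = 2048;

// Assumption: one 32-bit atomic counter per array slice.
constexpr std::size_t counterBufferBytes(std::size_t sliceCount)
{
    return sliceCount * sizeof(std::uint32_t);
}

// Hypothetical grow-on-demand policy: keep the current capacity unless the
// new texture needs more slices than any texture seen so far.
constexpr std::size_t grownCapacity(std::size_t currentSlices, std::size_t neededSlices)
{
    return neededSlices > currentSlices ? neededSlices : currentSlices;
}
```

Under these assumptions the worst-case buffer is only `counterBufferBytes(kWorstCaseSlices)` bytes, i.e. 8 KiB, so allocating it up front may be simpler than reallocating dynamically.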
I'm integrating SPD into ANGLE and, while writing tests, I'm seeing a bug in this code. I'm testing on Linux with an Nvidia GeForce GTX 970 on driver 440.100. (I see bigger issues with Intel/Mesa too, but I can't be sure they aren't driver bugs.)
If the source image has a base size of 128x32 and is all red, and the sampler clamps to edges, I can see mips 1 through 6 becoming all red (which is correct) and mip 7 (the last 1x1 mip) becoming half-red (as in vec4(0.5, 0, 0, 0.5)).
The last mip is indeed written to, so there aren't any errors with the atomic counter buffer. To make sure the bug is in this library, I even made SpdLoadSourceImage directly return a constant red:
AF4 SpdLoadSourceImage(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}
And I still see the last mip being half-red. Additionally, I made SpdLoad return red:
AF4 SpdLoad(ASU2 p)
{
    return AF4(1.0, 0.0, 0.0, 1.0);
}
And that fixes the issue. Is it possible that the library is trying to load outside the image coordinates? That could explain why transparent-black is returned, turning red into half-red.
The rest of the glue code exactly follows the instructions. For completeness, here's the relevant part of the shader:
layout(local_size_x = 256, local_size_y = 1, local_size_z = 1) in;
layout(set = 0, binding = 0, rgba32f) uniform coherent image2D dst[12];
layout(set = 0, binding = 1) uniform sampler2D src;
layout(set = 0, binding = 2) coherent buffer GlobalAtomic
{
    uint counter;
} globalAtomic;
layout(push_constant) uniform PushConstants {
    uint levelCount;
    uint numWorkGroups;
    vec2 invSrcExtent;
} params;
#define A_GPU
#define A_GLSL
#include "third_party/ffx_spd/ffx_a.h"
shared AF4 spd_intermediate[16][16];
shared AU1 spd_counter;
#define SPD_NO_WAVE_OPERATIONS
#define SPD_LINEAR_SAMPLER
AF4 SpdLoadSourceImage(ASU2 p)
{
    // Hack: return red to verify the bug
    return AF4(1.0, 0.0, 0.0, 1.0);
    //AF2 textureCoord = p * params.invSrcExtent + params.invSrcExtent;
    //return texture(src, textureCoord);
}
AF4 SpdLoad(ASU2 p)
{
    return imageLoad(dst[5], p);
}
void SpdStore(ASU2 p, AF4 value, AU1 mip)
{
    imageStore(dst[mip], p, value);
}
AF4 SpdLoadIntermediate(AU1 x, AU1 y)
{
    return spd_intermediate[x][y];
}
void SpdStoreIntermediate(AU1 x, AU1 y, AF4 value)
{
    spd_intermediate[x][y] = value;
}
AF4 SpdReduce4(AF4 v0, AF4 v1, AF4 v2, AF4 v3)
{
    return (v0 + v1 + v2 + v3) * 0.25;
}
void SpdIncreaseAtomicCounter()
{
    spd_counter = atomicAdd(globalAtomic.counter, 1);
}
AU1 SpdGetAtomicCounter()
{
    return spd_counter;
}
#include "third_party/ffx_spd/ffx_spd.h"
void main()
{
    SpdDownsample(gl_WorkGroupID.xy, gl_LocalInvocationIndex,
                  params.levelCount, params.numWorkGroups);
}
The issue is present with 128x64 and 128x127 textures too, but not 128x128.
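To illustrate the out-of-bounds hypothesis, here is a small CPU-side sketch. It is not the library's code and does not confirm the cause. It assumes that image loads outside the image return transparent black (in practice this depends on the API and driver), and that the final 1x1 level is produced by a 2x2 average of mip 6, the level read through SpdLoad (dst[5]). The names `mipExtent`, `loadOrZero` and `reduce2x2` are hypothetical.

```cpp
#include <algorithm>
#include <array>

using Texel = std::array<float, 4>;

// Extent of one dimension at a given mip level (halve and floor, minimum 1).
int mipExtent(int baseExtent, int level)
{
    return std::max(1, baseExtent >> level);
}

// Assumption: a load outside the image yields transparent black.
Texel loadOrZero(int x, int y, int width, int height, const Texel& fill)
{
    const bool inside = x >= 0 && y >= 0 && x < width && y < height;
    return inside ? fill : Texel{0.0f, 0.0f, 0.0f, 0.0f};
}

// Average the 2x2 footprint starting at (2x, 2y), using the same
// (v0 + v1 + v2 + v3) * 0.25 formula as SpdReduce4 in the shader above.
Texel reduce2x2(int x, int y, int width, int height, const Texel& fill)
{
    Texel sum{0.0f, 0.0f, 0.0f, 0.0f};
    for (int dy = 0; dy < 2; ++dy) {
        for (int dx = 0; dx < 2; ++dx) {
            const Texel t = loadOrZero(2 * x + dx, 2 * y + dy, width, height, fill);
            for (int c = 0; c < 4; ++c) {
                sum[c] += t[c];
            }
        }
    }
    for (int c = 0; c < 4; ++c) {
        sum[c] *= 0.25f;
    }
    return sum;
}
```

Under these assumptions, mip 6 of a 128x32 image is 2x1, so two of the four texels in the footprint fall outside the image and the average of an all-red level comes out as (0.5, 0, 0, 0.5). The same holds for 128x64 and 128x127, whose mip 6 is also 2x1, while 128x128 has a 2x2 mip 6 and averages to full red. That matches the pattern reported here, but it is only consistent with the hypothesis, not proof of it.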
I think it would be better to move
struct SpdConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
};
struct SpdLinearSamplerConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
    float invInputSize[2];
    float padding[2];
};
to ffx_spd.h under an A_CPU define. That would make it more convenient to use just
#define A_CPU
#include <fidelityfx/ffx_a.h>
#include <fidelityfx/ffx_spd.h>
in the implementation file. Currently they are hidden in the example project file.
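As a rough illustration of what an implementation file might do with such a struct, the sketch below fills SpdConstants for a full-image downsample. It redeclares the struct so that it is self-contained. The helper `makeSpdConstants` is hypothetical, and the 64x64 tile per workgroup, the meaning of `mips` as the number of levels generated below the base, and the cap of 12 levels (taken from the `dst[12]` array in the shader quoted earlier) are all assumptions rather than something stated in this thread.

```cpp
#include <algorithm>

struct SpdConstants
{
    int mips;
    int numWorkGroupsPerSlice;
    int workGroupOffset[2];
};

// Integer floor(log2(v)) for v >= 1.
inline int floorLog2(unsigned v)
{
    int result = 0;
    while (v >>= 1) {
        ++result;
    }
    return result;
}

// Hypothetical helper: constants for downsampling a whole width x height image.
inline SpdConstants makeSpdConstants(int width, int height)
{
    const int tile = 64;     // assumption: one workgroup covers a 64x64 source tile
    const int maxLevels = 12; // assumption: matches the 12 destination images above

    const int groupsX = (width + tile - 1) / tile;
    const int groupsY = (height + tile - 1) / tile;

    SpdConstants constants{};
    constants.mips = std::min(
        floorLog2(static_cast<unsigned>(std::max(width, height))), maxLevels);
    constants.numWorkGroupsPerSlice = groupsX * groupsY;
    constants.workGroupOffset[0] = 0; // full image, so no tile offset
    constants.workGroupOffset[1] = 0;
    return constants;
}
```

For a 128x32 image this sketch gives 7 levels and 2 workgroups per slice; treat those numbers as a property of the assumptions above rather than of the library.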
Hi!
I just have a minor concern/question regarding the memory barriers in the HLSL and GLSL versions. In HLSL, you call GroupMemoryBarrierWithGroupSync(), but in GLSL you only call barrier(). I believe the corresponding GLSL pattern for GroupMemoryBarrierWithGroupSync() is actually groupMemoryBarrier() + barrier(), as the first guards memory writes and the second synchronizes threads.
It's not entirely clear from the spec whether groupMemoryBarrier() blocks execution as well, or whether barrier() also guards memory writes, but I used https://anteru.net/blog/2016/mapping-between-HLSL-and-GLSL/ as my reference point, and it has solved many similar issues on different vendors before.
However, be wary: there seems to be an issue with NVIDIA GPUs in the 20xx-30xx range when this shader runs with both barriers as suggested by the reference, resulting in a GPU hang.
A few changes to morph it towards more generic external use, which I hope are helpful.
https://github.com/adv-sw/FidelityFX-SPD
I realise you have a GPU API. This adds a CPU API to improve the usefulness of the project.
You can still compile it in app mode and run it just the way you had everything before, and it now also works as a library.
All yours. Please consider merging into master, as there's no need for this to fragment.
See $TITLE.