caffeineviking / vkhr

Real-Time Hybrid Hair Rendering using Vulkan™

License: MIT License

Languages: Makefile 0.61%, C++ 69.50%, Lua 0.53%, Shell 0.80%, Python 0.29%, GLSL 3.97%, R 0.67%, C 22.10%, Objective-C 1.54%
Topics: real-time hybrid hair renderer vulkan cpp glsl rasterizer volume-renderer novel

vkhr's People

Contributors

caffeineviking


vkhr's Issues

Infrastructure for Gathering Data

For the paper/thesis we want to gather some performance data to create plots. We are interested in how the various rendering passes scale when more pixels are shaded, when the number of strands is increased, and when the size of the volume is changed (the last is not really that relevant, since vertex-based voxelization is still bounded by the strand count). We also might want to measure memory usage scaling (e.g. of the volume, or against the theoretical minimal PPLL node memory usage).

Anyway, we need to build infrastructure to find out the following information and append it to a CSV:

  • Shaded Pixels
  • Strand Count
  • Volume Sizes

This should be done automatically (e.g. by passing the --benchmark on flag), with all gathered data written to benchmark/. The idea is then to run an R script that builds all of the plots from benchmark/ (i.e. with some automation).

Performance data for every pass is already recorded into a CSV, so appending the above shouldn't be too hard. The only part I'm a bit worried about is how to find the number of shaded pixels, but maybe I can "hack" it by taking a screenshot at the same time and going through the screenshot when creating the plots, i.e. counting the number of pixels that aren't colored the same as the background (it's a hack :-P). Here is more or less what the CSV currently looks like. Notice that I take samples without averaging, so we can find the variance of the data later to create "error bars", and maybe get away with not having confidence intervals. I have 60 frames per CSV.

Frame, Bake Shadow Maps, Clear PPLL Nodes, Draw Mesh Models, Draw Hair Styles, Resolve PPLL
0, 0.775556, 0.033037, 0.104741, 6.77674, 3.22652
1, 0.918074, 0.0343704, 0.105037, 6.78978, 3.23022
2, 0.704741, 0.030963, 0.096, 6.33482, 3.03304
3, 0.799111, 0.030963, 0.0955556, 6.32785, 3.0323
4, 0.889926, 0.0321481, 0.103111, 6.76622, 3.23393
5, 0.905185, 0.0318519, 0.0973333, 6.33363, 3.03511
6, 0.844, 0.0333333, 0.103259, 6.7963, 3.236
7, 0.843556, 0.0305185, 0.0942222, 6.33319, 3.03941
8, 0.835704, 0.0336296, 0.102815, 6.76904, 3.23511
9, 0.777481, 0.030963, 0.0957037, 6.32148, 3.02015
...

Approximate Kajiya-Kay's Specular Component

Now that volume rendering sort of works and the normals seem to be of good-enough quality, we want to try doing Kajiya-Kay on the shell of the volume. The problem is that Kajiya-Kay uses the strand tangent for this, and we only have the normal. We can either try to derive a tangent somehow (e.g. cross(N, V) maybe?) and do regular Kajiya-Kay with it, or try to approximate the specular component of Kajiya-Kay like in the Pixar paper.
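A minimal sketch of the first option, assuming we derive a pseudo-tangent from cross(N, V) as suggested above; the half-vector form of the specular term follows Scheuermann's strand shader, and the shininess parameter is a made-up knob:

// Sketch only: approximate the Kajiya-Kay specular term on the volume shell
// using the isosurface normal N; the tangent is a guess, not a real strand tangent.
vec3 approximate_tangent(vec3 N, vec3 V) {
    return normalize(cross(N, V)); // lies in the surface plane, perpendicular to the view
}

float kajiya_kay_specular(vec3 T, vec3 V, vec3 L, float shininess) {
    float dot_th = dot(T, normalize(L + V)); // half-vector variant (Scheuermann)
    float sin_th = sqrt(max(1.0 - dot_th * dot_th, 0.0));
    return pow(sin_th, shininess);
}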

Implement the Raytraced Reference AO Solution

We would like to check whether our volume-based AO implementation approaches the reference raytraced AO, which is the ground truth. The raytracer is lagging behind quite a lot, so basically we use it only to cross-check that the results we get from the rasterizer are valid. In a nutshell, here is what the raytracer should do:

  1. Shoot rays from the camera towards the scene; for each intersection with a surface point we must:
  2. Shoot random rays in the hemisphere around the normal (pointing out of the hair style) and count the rays that intersect and don't intersect any other hair strands. The ratio of intersected to non-intersected rays gives the amount of ambient occlusion at that point. This is what is generally done, as in the thesis: here.

Visualize Hair Strand Density as an AO-term

After implementing #18, we want to try interpreting the densities as some sort of AO-term for shadowing. This will require us to write code to upload and sample the volume in the strand's fragment shader. This is also a good opportunity to write and test that the volume sampling is correct (i.e. in the right coordinate-space).
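A minimal sketch of the sampling, assuming the volume_origin / volume_bounds uniforms from volume.glsl and a combined image sampler binding; the binding number and the occlusion_scale knob are assumptions:

layout(binding = 4) uniform sampler3D density_volume;

float density_ao(vec3 world_position) {
    // volume_origin / volume_bounds as in volume.glsl; maps into [0, 1]^3
    vec3 uvw = (world_position - volume_origin) / volume_bounds;
    float strand_density = texture(density_volume, uvw).r;
    return clamp(1.0 - strand_density * occlusion_scale, 0.0, 1.0); // 1 = unoccluded
}

Rendering this AO term on its own is also a cheap way to eyeball whether the coordinate-space mapping is right.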

Integrate a Basic Embree-based Hair Raytracer

Integrate a basic strand raytracer so we can use it as the ground truth for comparisons. This first task won't feature a lot of fancy light transport just yet; the core problem this issue tries to address is matching the camera parameters between the rasterizer and the raytracer. Match these using some simple Kajiya-Kay shading.

Optimize Volume Raycasting

Right now we have a constant step-size when raycasting (still somewhat reasonable, since we jump one texel at a time), but a better solution would be to use the min-max volume (or something like it) to make "big strides" until something interesting happens (i.e. there is density there). This optimization is called Hierarchical Spatial Enumeration and can be seen over here. We already make use of Adaptive Termination, but not to the same extent as they do. So yeah, right now we are constant-size-stepping through the volume at max resolution, independent of the LoD. Doing so also leads to noisy results at large distances, so we should be using a low-resolution volume there anyway (which means we need to modify the voxelization resolution, maybe by finding the inverse projected size of a pixel -> voxel, and deriving the resolution from that somehow).

Here is the short list of optimizations:

  • Find the Voxelization Resolution
  • Hierarchical Spatial Enumeration
  • Expand on Adaptive Termination

Convert FlatBuffer HAIR to Cem Yuksel's HAIR Format

Some of the given hair models, e.g. ponytail.hair and bear.hair, are in a FlatBuffers-style binary format. A conversion from FlatBuffer HAIR to Cem Yuksel HAIR will be needed: load the FlatBuffer into memory, and then write all the information out in Cem Yuksel's HAIR format. The FlatBuffer HAIR format seems to look like this:

  • FlatBuffer Header: [ int32_t numStrands ] followed by [ int32_t verticesPerStrand ],
  • FlatBuffer Vectors: then numStrands*verticesPerStrand many [ float vertexPosition ]

Best approach: import the implementation from #1 into the existing hair renderer and do it over there (the loader is already written).

Interpolate Strand Tangents in the Raytracer

Right now the strands are flat-shaded in the raytracer, making it hard to compare with our rasterized version. To fix this we need to call rtcInterpolate in Embree (vastly simplified, it's more intricate than that ^^).

Pre-Filter the Density Volume

I've already fixed some stupid bugs in the volume sampling code, so the results are already better, but the fast version of the strand voxelization algorithm still produces lower-quality output than the segment-based one. If we pre-filter it by using e.g. the 3⨯3⨯3 max or average, we might get more accurate and less noisy results, for a fraction of the price we pay for the segment-based one. I'll post my results of this very shortly.
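Something like the following compute pass is what I have in mind, as a sketch only; the bindings, the r32ui format, and the choice of max() over an average are all assumptions:

#version 460 core

layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

layout(binding = 0, r32ui) readonly  uniform uimage3D density;          // raw voxelization
layout(binding = 1, r32ui) writeonly uniform uimage3D filtered_density; // 3x3x3 pre-filtered

void main() {
    ivec3 voxel  = ivec3(gl_GlobalInvocationID);
    uint  result = 0u;
    for (int z = -1; z <= 1; ++z)
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x) {
        ivec3 neighbor = clamp(voxel + ivec3(x, y, z), ivec3(0), imageSize(density) - 1);
        result = max(result, imageLoad(density, neighbor).r); // or accumulate for an average
    }
    imageStore(filtered_density, voxel, uvec4(result));
}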

Implement a Simulation Pass

There were concerns in the latest call (2018-12-03) that we should maybe have a very simple animation of the strands to validate that the techniques we have implemented also work OK when animated. A very good question is whether the approximated deep shadow solution will completely break apart when animated, as exactly how much visual "error" will happen is uncertain (it could be well within limits; if so, then we're fine).

Crystal Dynamics solves this in their technique by splitting the hair style into several parts that have roughly the same "density", and by assuming that the density remains mostly unchanged. In our case, since we don't split the hair style, we could get very wrong results, since the hair density could be completely off. As suggested, we could store a pre-calculated "thickness map", like in Colin Barré-Brisebois's fast SSS method, and use that to infer the density (this would involve a 3-D texture for the density). Another way would be to store a per-strand value that says how densely clustered strands are in the local neighborhood. The latter technique I believe to be the better solution, and it hasn't been attempted before in related work AFAIK. Of course, that still won't solve the problem of a high deviation from the original density values when animated.

We need to think about this. If the error margins are too high we might have to either switch the self-shadowing technique, or compensate for it somehow. We should discuss this in the next call as well, I think. I won't try to get started on this just yet, as the other issues currently take major implementation priority.

It would also be interesting to see the animated results, especially of the voxelization and how e.g. the AO looks, since a major feature of our method is that the results can be animated (i.e. the technique is "general").

The simplest solution would be to take the TressFX simulation code and integrate it. But for now, we have opted not to do this, since other things were more important. However, if I or anyone else ever gets around to it, the TressFX/TressFXSimulation.hlsl shader seems ripe for the taking (and is under a nice license). Since we are using SPIR-V anyway, and our glslc.py script can compile HLSL --> SPIR-V, we can maybe even use the shader as-is, and just provide the right descriptor bindings to it. Anyway, this is future work. Good luck!

Adjust Depth Buffer for Volume Rendering

A possible problem we'll face is that the values in the depth buffer after volume rendering the hair style will be incorrect, which will cause problems when rendering additional geometry. For instance, consider Lara's mesh that we are currently rendering: it will be inside the hair geometry, inside the volume. If we don't write the values to the depth buffer in the correct way, the rendering order between the hair and the mesh will be wrong. For voxels with no hair it's easy: we just discard on that particular raymarch, but areas with hair are tricky: while we have the depth that should be there (via the raymarch), we can't write it to the depth buffer. One way would be to take control of the depth buffer manually as a texture2D and do it ourselves, but that has its own set of problems. A very easy solution is to simply ignore this problem, and render volumetric hair in the "last" pass.

Create Proxy-Geometry for the Volume Rendering

When the hair style is far away, rasterizing line segments becomes expensive and sort of useless (we only see a little of the detail anyway). Instead, we can use our pretty fast strand voxelization (which only runs once per frame, not twice like the line rasterization) to find the strand density. We can use this density to find a surface on the volume by raymarching through it, and also to find normals by the gradient ∇D of the density field D.

Before doing any of this we need to set up proxy-geometry so that the fragment shader can be invoked; that will be the raymarch starting point. A simple choice is to use the AABB of the hair style (which is also used in the voxelization) as the proxy-geometry. For this we'll need to add a new Drawable, the Volume object, to our abstraction, with the correct volume rendering pipeline bound and the correct world transforms too.

[Image: volume_rendering]

Re-organize GUI and Polish Demo

After we're done with everything else we should clean up the GUI and polish the demo. Right now a lot of the functionality is immutable, and we should add extra knobs so people can experiment with the method we have presented. There are a bunch of things that are still missing: changing hair parameters at runtime, modifying the light direction, adding *.vkhr files for the Cem Yuksel styles, simulation #17 (probably just plugging in TressFX), model rendering in the raytracer so the rasterizer / raytracer scenes match, texture support for the model rendering (so we don't get the white spots, because our strands are pixel-sized, as done in the "new" Deus Ex), and an infinite white plane to get shadows in the scenes. More importantly: move buttons into the correct "separators" and remove the now unnecessary buttons, as they are confusing. None of these things should take long to fix (maybe 1-2 days combined), but I just haven't gotten around to doing any of this yet.

Also general documentation of the project, including documenting the shaders and the more important parts of the renderer (the wrapper will probably be refactored into a separate project later after the thesis).

Not a priority for now, but I want to keep this here so I remember to fix these things at a later point in time.

In fact, let's make a list:

  • Knobs for changing hair style parameters (diffuse / specular color and shininess) at run-time.
  • Changing light direction via the GUI (not possible right now, only the "color").
  • Add Cem Yuksel hair style scene files.
  • Crawl scenes for any new files, and also add the vkhr scene given via the CLI.
  • Add mesh rendering to the raytracer as well, so we can have more-or-less the same scenes.
  • Add white plane below the hair style for all built-in vkhr scenes to demonstrate shadows.
  • Add simulation #17 running in compute e.g. via TressFX (left as future work).
  • Texture support to get the same results as TressFX 3.1, and to remove the "white spots" too.
  • Re-organize the entire GUI to make more sense and allow people to change the parameters.
  • Add way to visualize the light sources (e.g. via Billboard). (isn't needed :-P)

Emacs Lisp as a scripting language through the Microsoft Language Server Protocol

Hi!

Dear Eric,

I really like this framework, however I feel that it lacks somewhat in extensibility. I therefore suggest that you implement an API for the Microsoft Language Server Protocol to communicate with an Emacs process. In this way, users could render hair in Emacs using the Emacs XWidget package: https://www.emacswiki.org/emacs/EmacsXWidgets


(defun render-hair (haircut)
  render-stallmans-haircut )

Cheers,

John

Voxelize the Hair into a Density Volume

We'll need this if we want to make ADSM more flexible, since it (right now) assumes the hair style density is constant throughout the volume, which is not completely true. While the quality is already acceptable for the hair styles we have, the technique will completely break down for styles with non-uniform density. If we want to make the technique as flexible as DOM, we'll need to solve this properly. Tomb Raider solved this by splitting the style into several parts, with a different constant density for each part. While this works, it requires artist time and manual labor; instead we would like to find the hair density automatically, and store it in a volume. I believe we can make ADSM as general and good-looking as DOM, but without having to pay the DOM price (rendering multiple shadow maps is bad, and you need a lot of them in DOM).

For comparison, the paper by Sintorn and Assarsson (linked below) gets around 13 FPS for a comparable hair style at a vastly lower resolution (800x600!) on comparable hardware (to my MX150-2 laptop GPU; they used a GTX 280 back in 2009), using these Deep Opacity Maps. If we can generate the volumes on the GPU at e.g. 1ms or 0.5ms, and still have ADSM be about as generic as DOM, it will still be vastly faster than any DOM-based technique with similar quality. Our technique currently runs at 60 FPS at 1080p on my shitty laptop, and still has 2-3ms to spare in the worst case (i.e. every fragment is processing some hair and reading from the shadow map; for a medium-range viewing perspective we have around 10ms to spare in total) before we dip below 60 FPS. So yes, quite a bit faster than DOM...

This hair density volume can probably also be used for other useful things, like computing an approximated SSS à la Frostbite 2 (the thickness-map one from GDC 2011), which AFAIK hasn't been attempted before in the area of hair rendering. The dual-scattering paper is the only "real-time" approximation of scattering we have around, so it would be cool if we could extend the Frostbite method to hair rendering, and have something that may be novel in the area of approximated subsurface scattering of hair (that is good enough for games). We could also maybe do some smart transparency handling using this. I've found a paper that seems to be doing something similar: "Hair Self-Shadowing and Transparency Depth Ordering using Occupancy Maps" by Sintorn and Assarsson, using an "Occupancy Map".

Implement Correlated Multi-Jittered Sampling

In order for us to more easily compare with the ground truth once we've implemented e.g. "Phone-Wire AA", we should multi-sample each raytraced pixel using an unbiased sampling pattern, such as correlated multi-jittered sampling (CMJ), as in the paper:

// s: sample index, m x n: sub-grid dimensions, p: pattern seed.
// permute() and randfloat() are the hash-based helpers from the CMJ paper.
vec2 cmj(int s, int m, int n, int p) {
    int sx = permute(s % m, m, p * 0xa511e9b3);
    int sy = permute(s / m, n, p * 0x63d83595);
    float jx = randfloat(s, p * 0xa399d265);
    float jy = randfloat(s, p * 0x711ad6a5);
    vec2 r = vec2((s % m + (sy + jx) / n) / m,
                  (s / m + (sx + jy) / m) / n);
    return r;
}

Add Vulkan Timestamp Queries and Plots

We'll probably want to profile the performance gains (most likely losses) from the SPIR-V conversion, and also have some way to easily profile the app. Of course, we could use RGP to do this, but it would be better to have a way to do this "live". I propose we gather timestamp query results for each stage of our render, i.e. the shadow map rendering pass, the depth map rendering pass, each compute dispatch of the OIT solution, the mesh drawing, and (optionally, if our LoD technique uses raster for strands too) the raster hair rendering time. We could also be extra fancy and add ImGui plots for the timings, a bit like in SEED's Vulkan PICA PICA.

Modifiers for Environmental Situations

I must say that I am really impressed ! This is more a feature request / general question for future development.

In most use cases, games will need to render hair in a not-so-clean state.

I am talking about liquids (water, blood, viscous goo, oil, ...) and solids (dirt, grass, leaves, moss, trash, ...), and a mix of both. Sadly, the bear should also be able to render hair loss due to damage...

There may also be underwater situations (submerged or in a shower), as well as low gravity, electric shock / electromagnetic effects, burnt hair, and special forms of hair (glowing, lightning, fire, spectral, ...).

Are those possible in the future for vkhr ? I believe most new console games will feature great hair and that this is going to be used a lot :)

Convert HLSL OIT Shaders to SPIR-V

Borrow Dominik's BezierDirect HLSL shaders, which handle strand transparency in compute shaders, and first try to convert them to SPIR-V and see if they produce reasonable output. Let's try with dxc first, since that seems to be what's recommended at the Khronos Meetups. Otherwise, we can also try to use glslc.

Looking into the SPIR-V disassembly, there seem to be some binding collisions in some descriptor sets:

OpDecorate %g_DepthBuffer DescriptorSet 0
OpDecorate %g_DepthBuffer Binding 1
OpDecorate %g_BackBuffer DescriptorSet 0
OpDecorate %g_BackBuffer Binding 1

which means we'll have to sprinkle some [[vk::binding(x, y)]] magic in some places to get it to work OK.

Otherwise it seems "straightforward", as long as the HLSL -> SPIR-V compiler has done everything else right?

Artifacts in Fullscreen

I've noticed that in some cases (if you're unlucky), after swapchain recreation upon going fullscreen, there are artifacts in the hair rendering. This does not happen on non-fullscreen resizes, and seems to be happening on AMD GPUs. Validation layers seem to be screaming a bit at that, so I'll get back to this after the publication. I am just opening this here in case I forget about it.

Implement the Screen-Space Strand Density Metric

In order to guide the transparency and AA methods we'll probably want to find the screen-space hair density, and classify fragments (or tiles) as having low or high density with some sort of threshold. We'll use this data to adapt the PPLL and AA algorithms (exactly how is still TBD, but we have some good ideas rolling already). I can see two ways to do this: use the volume we already have, or accumulate rasterized lines onto a buffer, i.e. do more-or-less what we already do with the volume, but project and increment onto a plane instead.

Some ideas would be to conditionally insert fragments into the PPLL, vary the number of fragments we actually sort properly (rather than just squashing them together), or decide how many of the fragments should use a "high-quality" version of the shading. Most of our time is spent shading the strands, which is a drastic change from the previous non-OIT version of our shader, which only needed to shade the top part of the hair style (the rest of the fragments were culled). Maybe the SSHD could be used here somehow?!?
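A sketch of the second option (accumulating rasterized lines onto a buffer), assuming an r32ui image bound during the line rasterization pass; the binding number and the threshold in the comment are made up:

#version 460 core

layout(binding = 3, r32ui) uniform uimage2D screen_strand_density;

void main() {
    ivec2 pixel = ivec2(gl_FragCoord.xy);
    imageAtomicAdd(screen_strand_density, pixel, 1u); // one more strand covers this pixel
    // A later pass (or the PPLL resolve) can then classify `pixel` as low- or
    // high-density by thresholding this counter, e.g. count < 8 means low density.
}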

Try Volume AO with GPU Raycasting

I tried some techniques in #27 that didn't quite work. Even if the combined ADSM + 3^3 Filtered Density AO looks good, it would be nice if we could find AO with our voxelized densities such that it matches the ground-truth raytraced results (even if the raycasting turns out to be too expensive, we'll at least have results that show our volume is not complete bogus). And if it's fast enough, we might even apply it in the minified case!

Here is how I'm going to do it this time: launch rays around a sphere, just like the raytraced version, and see if we can approach the raytraced results (which should happen). After that, we can get a little fancier and do something like Crassin et al.'s "Interactive Indirect Illumination Using Voxel Cone Tracing", especially their AO section. Maybe I should have started by trying these methods, before the ones in #27 :-P.

Write the GPU Voxelization Algorithm

We should eventually port the CPU voxelization algorithm to the GPU if we want to update the densities in real-time. Right now, CPU voxelization of 1.8M hair vertices takes around 15ms, which I guess is "real-time", but it would be nice to do this on the GPU as well. As part of porting Dominik's HLSL shaders I've used image2D, so allocating a read/write image3D shouldn't be too different. The only part I see as somewhat problematic is that we'll need to use imageAtomicAdd as part of the voxelization algorithm, which requires the underlying format to be r32i or r32ui as shown here, and can't be used with r8i and r8ui as we'd have liked (there is talk about format conversion, so it might still be possible). I also don't know the cost of imageAtomicAdd, so I need to find out if it's usable. Otherwise it should be quite straightforward, like this:

#version 460 core

#include "volume.glsl"

layout(std430, binding = 0) buffer Vertices {
    vec3 positions[];
};

layout(local_size_x = 512) in;

layout(binding = 2, r32ui) uniform uimage3D density;

void main() {
    vec3 voxel_size = volume_bounds / volume_resolution; // world-space size of one voxel
    // map the hair vertex from world-space into voxel coordinates relative to the volume origin
    vec3 voxel = (positions[gl_GlobalInvocationID.x] - volume_origin) / voxel_size;
    imageAtomicAdd(density, ivec3(floor(voxel)), 1); // one more vertex falls in this voxel
}

Add Simple Blinn-Phong Mesh Renderer

We should eventually also add a simple mesh renderer for OBJ files, to see the effects of shadow maps on curved surfaces instead of only the self-shadows we have now, and also to take advantage of the OIT optimization in Dominik's thesis, which ignores strands that are occluded by an object (e.g. a head). This is not priority work just yet, but I want to keep this issue here so I don't forget about it. My plan is to use TinyOBJLoader and the good ol' Blinn-Phong reflection model, to keep things simple. Also, by doing this we can more easily compare our results with Cem Yuksel's paper (and anybody that uses his assets), since they usually use woman.obj (all papers which use his hair assets use it, so we should too).
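As a sketch of the shading side (not the final mesh pipeline), the fragment stage would be plain Blinn-Phong; the varying names, color constants and the hard-coded shininess are assumptions:

#version 460 core

layout(location = 0) in vec3 normal;
layout(location = 1) in vec3 view_direction;
layout(location = 2) in vec3 light_direction;

layout(location = 0) out vec4 color;

void main() {
    vec3 N = normalize(normal);
    vec3 L = normalize(light_direction);
    vec3 V = normalize(view_direction);
    vec3 H = normalize(L + V); // Blinn's half-vector

    float diffuse  = max(dot(N, L), 0.0);
    float specular = pow(max(dot(N, H), 0.0), 64.0); // hard-coded shininess for now

    color = vec4(vec3(0.1) + diffuse * vec3(0.8) + specular * vec3(0.4), 1.0);
}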

Implement Raycasting for Gathering Densities

Now that we have the density volume in place, we may want a way to gather densities in a certain direction. This is useful to us for at least two reasons: we may want to visualize the volume directly in VKHR, and there is a case where we need to gather the total amount of density up to a certain point in the volume. In the latter case, we'll use it to feed ADSM with the number of strands that are occluding the current strand.

I'm still uncertain about the cost of this, but it sounds expensive. I'm planning on doing another solution to feed ADSM, where we do some sort of pre-gather step to estimate the global number of hairs in the way.

Implement Level of Detail Scheme

Now that each individual component "works", we want to dynamically change between the maxified (strand-based) and the minified (volume-based) solutions "smoothly". Since we'll most likely only have 1-2 hair styles on screen using the maxified case (relatively expensive because of the rasterization), we want to use our minified case as much as possible, so our solution can scale nicely in the quality / performance space. For the minified case we'll probably reduce the voxelization size (which reduces the number of raymarch steps needed, since there are fewer texels to step through) and apply #41. When we are in the maxified case we want to use our metric for hair density to blend fewer fragments in areas of high density (since they'll have stuff there anyway), which can also be seen as a sort of LoD, I guess. We also need to blend between the minified and maxified cases when the transition happens. If our minified case is really good then that might already be OK, but otherwise we can use dithering / blending. Finally, we need a good metric for when these changes should happen.

Calculate Isosurface Normal by using the Gradient

If we have some sort of SDF, we can find the normal N by taking the gradient ∇S of the scalar field S. The idea is to do something similar with our volume, so we can find the normals of our contour. According to the Pixar paper, we should be able to get close to Kajiya-Kay without using the tangent, so it should be possible, with a bit of trickery, to find a shading model that follows the same highlights as Kajiya-Kay (the Kajiya-Kay model was actually meant to be used for volume rendering). "Volumetric Methods for Simulation and Rendering of Hair" is the Pixar paper. Maybe we can find the tangent by e.g. cross(N, V), like in Scheuermann's shader?
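A sketch of the gradient-based normal, assuming the density volume is sampled through a sampler3D and texel is one texel in volume UVW space (both are placeholder names):

vec3 density_gradient_normal(sampler3D density_volume, vec3 uvw, vec3 texel) {
    float dx = texture(density_volume, uvw + vec3(texel.x, 0.0, 0.0)).r
             - texture(density_volume, uvw - vec3(texel.x, 0.0, 0.0)).r;
    float dy = texture(density_volume, uvw + vec3(0.0, texel.y, 0.0)).r
             - texture(density_volume, uvw - vec3(0.0, texel.y, 0.0)).r;
    float dz = texture(density_volume, uvw + vec3(0.0, 0.0, texel.z)).r
             - texture(density_volume, uvw - vec3(0.0, 0.0, texel.z)).r;
    return -normalize(vec3(dx, dy, dz)); // density increases inward, so flip the gradient
}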

1024³ Voxelization Crashes

Even though 256³ will probably work for our uses, I'm still curious why the segment-based voxelization goes ka-boom at 1024³. Update: this also happens with the vertex-based voxelization. Also, it doesn't crash during the voxelization itself; it's when we upload the data to the GPU. I have plenty of GPU memory, so that shouldn't be the problem. Maybe something with alignment? Hmm... going to have a closer look once I have more time.

Integrate Dominik's Hair Compute Pass

After solving #13 we should actually start trying to match the bindings of the constant buffers and create the right storage buffers to match those in Dominik's implementation. We should take this in steps, and maybe compare the results of each pass side-by-side (i.e. VKHR and RealTimeHair) to see that they produce similar results. I don't expect things to work out-of-the-box without some fixing, so let's do this iteratively.

I've currently inspected and verified local correctness of:

  • ReduceDepthBuffer
  • ClipCurves
  • PrefixSum_1
  • PrefixSum_2
  • Reorder
  • DrawStrands

Implement GUI for Controlling Render Parameters

Using ImGui for simplicity, as I've already done some tests and it seems to work fine. We need something that lets us change the scene in the demo, switch between raytracing and rasterization, and control the parameters of our technique. Of special interest is the ability to change the LoD of the scalable hair rendering scheme.

Implement Phone-Wire AA

After implementing #28 we want to "fix" the aliasing in the rasterizer as well, which mostly happens in areas of low density. We can solve this by using e.g. Phone-Wire AA, which essentially assumes a line is already a pixel wide and then effectively reduces the thickness by scaling the fragment's alpha value with the thickness-reduction ratio. Emil provides a sample implementation, which works like this:

// Compute view-space w
float w = dot(ViewProj[3], float4(In.Position.xyz, 1.0f));

// Compute what radius a pixel wide wire would have
float pixel_radius = w * PixelScale;

// Clamp radius to pixel size. Fade with reduction in radius vs original.
float radius = max(actual_radius, pixel_radius);
float fade = actual_radius / radius;

// Compute final position
float3 position = In.Position + radius * normalize(In.Normal);

Maybe we can also get away with just using one-pixel clamped lines and then changing the alpha component based on the thickness (which we change in the data itself), i.e. we modulate the alpha of the strand in a pre-processing step based on the thickness, where the last 10-15% is interpolated towards 0, thinning out the tip.

Otherwise, we can use the strand coverage calculation as done in TressFX/TressFXRendering.hls#172 as well:

float ComputeCoverage(float2 p0, float2 p1, float2 pixelLoc, float2 winSize)
{
    // p0, p1, pixelLoc are in d3d clip space (-1 to 1)x(-1 to 1)

    // Scale positions so 1.f = half pixel width
    p0 *= winSize;
    p1 *= winSize;
    pixelLoc *= winSize;

    float p0dist = length(p0 - pixelLoc);
    float p1dist = length(p1 - pixelLoc);
    float hairWidth = length(p0 - p1);

    // will be 1.f if pixel outside hair, 0.f if pixel inside hair
    float outside = any(float2(step(hairWidth, p0dist), step(hairWidth, p1dist)));

    // if outside, set sign to -1, else set sign to 1
    float sign = outside > 0.f ? -1.f : 1.f;

    // signed distance (positive if inside hair, negative if outside hair)
    float relDist = sign * saturate(min(p0dist, p1dist));

    // returns coverage based on the relative distance
    // 0, if completely outside hair edge
    // 1, if completely inside hair edge
    return (relDist + 1.f) * 0.5f;
}

Match Settings Between Renderer and Raytracer

After implementing #3 and #4, make sure they produce similar results when given the same scene graph. Use a simple shading model to sanity-check it; the results should be somewhat similar in local illumination. For now, let's try to match them with Kajiya-Kay, like we already do in the shader shaders/kajiya-kay.frag.

Implement Basic Shadow Mapping Pipeline

In order to support strand self-shadowing in the rasterizer, we need a way to generate shadow maps. We'll start by creating vanilla shadow maps like those in "Casting Curved Shadows on Curved Surfaces", and after that extend this to support the techniques optimized for strand-based self-shadowing. It's also a lot more manageable to implement this iteratively, since going straight for e.g. Deep Opacity Maps could be hard to get working right away, and since that builds on something like shadow maps, they're a good starting point.

This involves a few more changes to the current Rasterizer, but after the refactor I made it should be a lot easier to implement now. AFAIK we need a render pass that outputs the depth buffer from the light's point of view.

Find an Isosurface in the Density Volume

After creating #36 we should try to find a good isosurface by raymarching the volume via the proxy-geometry. I don't know what the best way to do this is, but one way is to simply "shade" when the density is higher than a certain value, e.g. 5 (i.e. at least 5 strands in this voxel), and discard the fragment when we are below it. I'll start with this since it's simple, but I'm sure there are better ways. Just to cross-check that our volume is indeed suited for this, I've tried running it through ParaView with the "Contour" filter set to 5:

[Image: ponytail-isosurface]
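Below is a sketch of the thresholded raymarch described above, meant for the volume fragment shader; the uniform names (density_volume, plus volume_origin / volume_bounds as in volume.glsl) and the fixed step count are assumptions:

bool find_isosurface(vec3 ray_origin, vec3 ray_direction, float step_size,
                     float iso_threshold, out vec3 surface_position) {
    vec3 position = ray_origin;
    for (int i = 0; i < 512; ++i) { // fixed step count, see the raycasting optimization issue
        vec3 uvw = (position - volume_origin) / volume_bounds;
        if (texture(density_volume, uvw).r >= iso_threshold) {
            surface_position = position; // at least `iso_threshold` strands in this voxel
            return true;
        }
        position += ray_direction * step_size;
    }
    return false; // no isosurface along this ray: the caller discards the fragment
}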

Blend Fragments with a Per-Pixel Linked List

After implementing #29 (I've already sort of tested this) we'll see that the ordering of the strands when they have opacity will be wrong, and that they need to be sorted back --> front to produce the correct result. Therefore, we need a way to sort fragments and blend them. The classic way to do this is with a PPLL, like in TressFX. It can be made more "efficient" in our case by using the Screen-Space Density Metric we implemented in #31, and storing more fragments in areas of low density (which have high aliasing, at least within our observations). I've decided to split the PPLL task into several parts so we can more easily track our progress (see the sketch after this list):

  • Allocate the head pointer buffer (pointers to nodes) and the node heap (stores fragment information).
  • Implement the ppll_next, ppll_link, ppll_head, ppll_data helpers in GLSL for the PPLL insertion
  • Insert strand fragments into the PPLL instead of pushing them to out (the current framebuffer target)
  • Check that the head and node structures produce reasonable values when inspecting with RenderDoc
  • Resolve PPLL by sorting and blending the nearest k-fragments, just like the original AMD PPLL method.
    • Transition framebuffer images to VK_IMAGE_LAYOUT_GENERAL and back for the PPLL resolve passes.
    • Visualize fragments in PPLL to make sure we're not just putting junk (even though we validated it).
    • Sort and blend the fragments in the PPLL and mix with the existing opaque framebuffer contents...
    • Make new renderpass to process GUI and other things like that as well (will be needed later here).
  • Sanity check PPLL blending results and look into implementing the better coverage algorithm as well...
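Here's a sketch of what the insertion step could look like from the strand fragment shader; the bindings, buffer names and node contents are placeholders rather than the final vkhr interface:

layout(binding = 5, r32ui) uniform uimage2D ppll_heads; // per-pixel head pointers

struct PPLLNode {
    vec4  color;    // shaded strand fragment color and alpha
    float depth;    // view-space depth, used by the resolve-time sort
    uint  previous; // index of the previous node, or 0xffffffff for "none"
};

layout(std430, binding = 6) buffer PPLLNodes   { PPLLNode nodes[];   };
layout(std430, binding = 7) buffer PPLLCounter { uint     node_count; };

void ppll_insert(ivec2 pixel, vec4 color, float depth) {
    uint node = atomicAdd(node_count, 1u);
    if (node >= uint(nodes.length())) return; // node heap exhausted: drop this fragment
    uint previous = imageAtomicExchange(ppll_heads, pixel, node);
    nodes[node] = PPLLNode(color, depth, previous);
}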

Detach Raytracer from the Rendering Thread

So it doesn't lock the GUI, and so that we can progressively accumulate the raytraced results until convergence. If we disable (or pause) the rotating light source and any animation (if we eventually get around to having that), then it should be possible. This is desirable for #28, but will be useful in general for the raytraced AO as well.

Create a Scene Graph for Both Renderers

Since both the rasterizer and the raytracer will render the same scene, we'll need a common scene structure that both use as a base to e.g. upload geometry and build acceleration structures with the correct transform and camera positions. The idea is to build a very simple format for storing this, something like this perhaps:

{
    "camera": {
        "fieldOfView": 75.0,
        "origin": [0, 0, 4],
        "lookAt": [0, 0, 0],
        "upward": [0, 1, 0]
    },

    "transform": {
        "scale":     [1, 1, 1],
        "translate": [0, 0, 0],
        "rotate": [0, 0, 0, 1],

        "models": [
            "woman/woman.obj"
        ],

        "styles": [
            "wCurly.hair"
        ]
    }
}

Implement Multi-Layer Alpha Blending

After implementing #42 we can fairly easily transform our vanilla PPLL into the technique by Salvi and Vaidyanathan called Multi-Layer Alpha Blending. The general idea is to insert the first k fragments into the PPLL and resolve them as usual, and then merge the remaining fragments (without sorting them, of course).

Implement a Simple Line Renderer in Vulkan

Now that the hair geometry is in main memory, we want to upload it to the GPU and start doing useful stuff. A good first step is to just render the hair geometry as simple line primitives to start things off; this will also be a good sanity check that everything on the Vulkan side has been set up correctly. Though, as we will mostly be doing compute-stuff, we need to create a compute queue as well as a graphics queue after that :)

Vulkan Setup in Progress

  • Validation layer and surface extension checks
  • Instance and surface creation via GLFW 3 API
  • Enabled Vulkan validation layers and callback
  • Enumerate and find suitable physical devices:
    - Discrete GPU + graphics + compute queue
    - Prefer GPUs with WSI presentation support
    - Prefer GPUs with a lot of graphics memory!
  • Setup swapchain + create image views into it
  • Create graphics and (later) compute pipelines
  • Setup shader module compiler via spirv-tools
  • Command pool and command buffer creation
  • Render pass and simple command dispatches
  • Presents framebuffers (plus double buffering)
  • Setup vertex binding descriptor for geometry
  • Finally, setup any uniform descriptor sets too.
  • Ah, also setup the depth buffer. Forgot that :)

That should be most of it. I'm writing a thin wrapper to make lifetime/boilerplate handling a bit less painful.

Implement "Approximated Deep Shadows"

This feature depends on issue #10 being implemented first. As discussed, we'll start by implementing the Approximated Deep Shadow Map technique like in Crystal Dynamics' 2013 Tomb Raider. It works like this:

  1. Render hair geometry from the perspective of a light source (only directional for now, no omni-light)
  2. Get the depth buffer from this rendering pass, and provide it as input into the next rendering passes.
  3. When shading, use the depth buffer to estimate shadows by filtering with e.g. a 3x3 kernel that does depth comparisons against neighboring pixels to estimate a "depth" for a strand inside the hair style; this provides a weighted average of the depth for a pixel (x, y). This is somewhat similar to e.g. SSAO.

Jason's pseudo-code goes something like this:

// I think we ended up fudging with these numbers a bit, but this is what was in my notes

kernelSize = 3

// fiberSpacing controls the strength of the shadow factor
// Larger fiberSpacing means lighter shadow
// Smaller fiberSpacing means darker shadow

fiberSpacing = 250

invFiberWidth = 1 / (fiberSpacing * fiberRadius) // fiberRadius was exposed through TressFX data on the hair (I forget the exact value)

shadow = 0

scale = ShadowMapRes

lightLinearDepth =   linear depth of hair being shaded from light pov

weightTotal = 0

for x = (1-kernelSize)/2 ; x <= kernelSize/2 ; x++
for y = (1-kernelSize)/2 ; y <= kernelSize/2 ; y++
{

    sigma = (kernelSize/2)/2.4 // standard deviation, when kernel/2 > 3*sigma, it's close to zero, so can use 1.5 instead

    exponent = -1 * (x^2 + y^2)/ (2*sigma^2)

    localWeight = 1 / (2*PI*sigma^2) * pow( e, exponent )


    localShadowSample = linearized depth from shadow map offset by current <x, y> pair

    shadow += ApproximateDeepShadow(localShadowSample, lightLinearDepth, invFiberWidth) * localWeight

    weightTotal += localWeight
}

shadow /= weightTotal

ApproximateDeepShadow( linearShadowDepth, linearLightDepth, invFiberWidth)
{
    depthRange = max(0.f, linearLightDepth - linearShadowDepth)
    numFibers = depthRange * invFiberWidth

    if(depthRange > 1e-5) numFibers += 1

    // stronger shadow term the deeper the fragment is within the hair
    return pow(abs(1 - fiberAlpha), numFibers) // fiberAlpha was set through hair params and tuneable by artist
}

Generate the Min/Max MIP Volume

We want to generate two mip-hierarchies from the volume, one for the min and one for the max. We'll use these to find e.g. silhouette areas in screen-space (by finding the mip-level that translates to one pixel). We can build these mip-volumes by iteratively downsampling the volume at level N, beginning at level 0, which is our original volume (e.g. a 256^3 voxelization). The downsampling takes the 2x2x2 max/min of the previous level, e.g. min(v[i,j,k], v[i+1,j,k], v[i,j+1,k], ..., v[i+1,j+1,k+1]) (a total of 8 voxels).
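As a sketch, one downsampling step of the min hierarchy could be a small compute pass like the one below (the max hierarchy is the same with max() instead of min()); the bindings and the r32ui format are assumptions:

#version 460 core

layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(binding = 0, r32ui) readonly  uniform uimage3D level_previous; // level N-1
layout(binding = 1, r32ui) writeonly uniform uimage3D level_current;  // level N

void main() {
    ivec3 voxel  = ivec3(gl_GlobalInvocationID);
    ivec3 corner = voxel * 2;
    uint  result = 0xffffffffu;
    for (int z = 0; z != 2; ++z)
    for (int y = 0; y != 2; ++y)
    for (int x = 0; x != 2; ++x)
        result = min(result, imageLoad(level_previous, corner + ivec3(x, y, z)).r);
    imageStore(level_current, voxel, uvec4(result)); // 2x2x2 min of the previous level
}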

Integrate Raytracer into the Swapchain

Right now the raytracer produces the correct rendered output to a vkhr::Image, but we need a way to get it into the Vulkan framebuffer somehow. My current strategy is to render to the vkhr::Image, create a vkpp::DeviceImage (i.e. a staged vkpp::Image) with that data, and create a vkpp::ImageView from it. I am still a bit uncertain about the best way to update it, since the results might change every frame or so.

In my initial implementation I plan to have two vkhr::Image for double buffering. The backbuffer is updated (i.e. raytraced, in this case) and then uploaded to GPU memory along with the descriptor sets. The frontbuffer is shown by setting the correct descriptors for the combined image sampler and then using my billboard shader. The backbuffer becomes the frontbuffer when the render to the backbuffer is done.

Still not completely 100% sure that this approach is the best one out there, but we can come back and fix it later.

Make PCF Loop Bounds Constant

I've found that we can save 3-4ms on the strand self-shadows of the ADSM approach if the PCF kernel radius is known at shader compilation time; it looks like glslc then unrolls the loop and makes more assumptions. I know this for sure makes a difference on my laptop, but I still need to profile the benefits on the AMD machine (I've heard that performance can sometimes even decrease if the compiler is too aggressive in doing loop unrolling on GCN, so I'll profile to make sure this is worth it). For reference, on my laptop, in the worst case (i.e. hair occupies the entire screen) I get ~9.8ms for rendering the ponytail; without any shadows I get 5.8ms, and without any shading at all, 4.8ms (an unavoidable cost of rasterizing 1.8M lines). So a performance breakdown gives us:

  • Rasterization: 4.8ms
  • Kajiya-Kay: 1ms
  • 3x3 PCF ADSM: 4ms
  • Total cost: 9.8ms

By having constant loop bounds the entire pass instead takes ~6.8ms, and therefore we'll get these results:

  • Rasterization: 4.8ms
  • Kajiya-Kay: 1ms
  • 3x3 PCF ADSM: 1ms
  • Total cost: 6.8ms

For further reference, on the beefy AMD machine, the unoptimized version (which took 9.8ms on my laptop) takes around 1.5ms for the entire pass (i.e. the total cost is 1.5ms). I haven't tested the optimized version yet, but if we assume the speedup translates to AMD hardware as well, we'll go from 1.5ms to ~1ms. This will leave us more room to do other cool things (and we still need to think about leaving time for the OIT), so I think this optimization is worth trying out.

I'll see if I can make the offending calculation go away, or just assume that we use 3x3 kernels all the time. A way would be to use specialization constants, and re-compile the shaders when the PCF radius is modified.

Simulate ADSM in Volume Rendering

One of the components that will be missing even with Kajiya-Kay is the directional shadow component. The idea of ADSM should be fairly easy to emulate in the raymarcher, even with coarsely granular rays (I think). It boils down to this: find the surface, shade the surface, raymarch towards the light, and call approximate_deep_shadows. We should even be able to pass in the same arguments as in the strand-based solution. This can't be hard...
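Roughly what I mean, as a sketch: march from the shaded surface point toward the light, accumulate the strand count from the density volume, and feed it into the same deep-shadow falloff. Everything here besides the general idea (the names, the step count, and how approximate_deep_shadows is actually parameterized) is an assumption:

float volume_deep_shadows(vec3 surface_position, vec3 light_direction) {
    float occluding_strands = 0.0;
    vec3  position = surface_position;
    for (int i = 0; i < 64; ++i) { // coarsely granular steps toward the light
        position += light_direction * shadow_step_size;
        vec3 uvw = (position - volume_origin) / volume_bounds;
        occluding_strands += texture(density_volume, uvw).r;
    }
    // stand-in call: the strand-based path derives this count from the shadow map instead
    return approximate_deep_shadows(occluding_strands);
}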

Estimate AO with the GPU Raymarcher

Now that we have a reference AO solution (#25) we should try to approximate it using our raymarcher. In a nutshell, we can do this by shooting rays from the camera toward the strand that is to be shaded, and then "counting" the number of strands in the way (which is the local density of hair calculated by the voxelization). This gives us the expected AO as seen from the camera's point-of-view. This is of course not the same thing as what the raytracer calculates, so an alternative solution would be to sample the neighborhood around the strand we want the AO of (or even shoot rays from the strand in random directions and accumulate the number of strands encountered). I think it's worth a shot to try both approaches and see which seems most reasonable against the ground truth (the second one is more likely to match the raytracer).

Manually Fix SPIR-V Binding Collisions

The compute shaders under share/shaders/compute-curve have been converted from HLSL but are still not really valid, since they each have one or two binding collisions. I'll sprinkle some [[vk::binding]] directives.

Support Cem Yuksel's HAIR Format

Most research papers on hair rendering use Cem Yuksel's HAIR file format for the geometry, since there is a large repository of existing hair styles provided at http://www.cemyuksel.com/research/hairmodels. Write a basic HAIR file loader, more-or-less like cyHair, but optimize the file IO (cyHair uses C-style freads) and the interface.

Usage:

vkhr::HairStyle curly_hair { "share/style/wCurly.hair" };

if (!curly_hair) {
    std::cerr << "Failed to load hair style!" << std::endl;
    return -1;
}

for (auto vertex : curly_hair.vertices) {
    // Do some processing on the vertices.
}

curly_hair.save("share/style/wCurly.hair.pre-processed");

if (!curly_hair) {
    std::cerr << "Failed to save hair style!" << std::endl;
    return -2;
}

Integrate a "Fake" Strand Thickness

In order to properly implement #29 and to match the raytraced output better, we want to integrate a strand thickness into the rasterized line output. Since there's no way to change the line thickness in the shader, we will either need to expand the line into a billboard, or simply fade out the rasterized line at the tips to fake the strand becoming thinner. The latter solution is of course limited, since the radius can't change throughout the hair geometry, but on the other hand it is really simple. If we can get similar levels of performance using the billboard approach (I doubt it though), by expanding in the geometry shader, we can always use that.
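A sketch of the fade-at-the-tip variant, assuming each vertex carries a parameter running from 0 at the root to 1 at the tip; the varying name and the ~15% fade range are tunables I made up:

layout(location = 0) in float strand_parameter; // 0 at the root, 1 at the tip

float strand_alpha(float base_alpha) {
    float tip_fade = 1.0 - smoothstep(0.85, 1.0, strand_parameter);
    return base_alpha * tip_fade; // thins out (fades) the last ~15% of the strand
}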
