
Real-Time Hybrid Hair Rendering using Vulkan™

License: MIT License


vkhr's Introduction

Real-Time Hybrid Hair Renderer in Vulkan™

Lara Croft's Ponytail

This is a proof of concept hair renderer that's based on a novel hybrid approach. It's capable of rendering strand-based hair geometry in real-time, even for fully simulated hair styles with over 100,000 strands. Our hybrid rendering pipeline scales in the performance/quality domain by using a rasterizer for closeup shots and a raymarcher for level of detail minification. We can render multiple hair styles at once with this hybrid technique, which has a smooth transition between the two solutions.

In the figure above we see the ponytail from TressFX 3.1 (with 136,320 hair strands and 1,635,840 line segments) rendered in real-time (7ms) with our rasterized solution. If we want to render more than a couple of characters on the screen at once, we're going to need a more scalable solution, as the rasterizer doesn't scale well at far away distances. This is where our raymarcher comes in, as its performance scales linearly with the fragments rendered on the screen, and it is also a lot cheaper for far away hair. However, our raymarcher's performance breaks down during close-up shots, and it also looks "worse" than our rasterizer in those cases, as it's an approximation of the real geometry. The trick is to use both solutions together! The rasterizer may perform worse, but it still produces high-quality close-up shots, and performance scaling isn't too bad in those cases. Our raymarcher performs and scales better at a distance and is indistinguishable from the rasterized results in those cases. The figure below shows both the rasterized and raymarched solutions together. Each screenshot is split in the middle, with the left part being the rasterized solution, and the right side the raymarched solution. We have alpha blended these two in the middle to simulate a level of detail transition. As you can see, the results are quite close and the transitions are not that noticeable from far away. There is, however, a huge (up to 5-6x) performance difference that grows with distance.
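To give a concrete idea of how such a transition can be driven, a distance-based blend factor along the lines of the hedged sketch below can be used to fade between the rasterized and raymarched results (the threshold names are made up for illustration, not taken from the renderer):

float lod_blend_factor(float distance_to_hair, float magnified_distance, float minified_distance) {
    // 0.0 => use the rasterized strands, 1.0 => use the raymarched volume,
    // anything in-between => alpha blend the two solutions during the transition.
    return smoothstep(magnified_distance, minified_distance, distance_to_hair);
}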

It is written 100% from scratch in Vulkan™ and simulates light scattering, self-shadowing and transparency in real-time. We do anti-aliasing by using a fast line coverage method. Our volumetric approximation is derived from the original geometry by a fast voxelization scheme that's used for raymarching and direct shading on an isosurface. We also use this volume for ambient occlusion and for more precise self-shadowing even in the rasterized case. Our work was published at EGSR 2019.

Hybrid Hair Render

Outline

For the rest of this document you'll find out about the features of our renderer and the benefits of using a hybrid approach for hair rendering by looking at its performance. Once you're intrigued, we'll show you how to build and run the project yourself so you can try it out. I have written lots of documentation if you want to find out how it works or what its limitations are. You'll find more screenshots towards the end of this document and in the Captain's Log found in the project wiki.


Features

A real-time hybrid hair rendering pipeline suitable for video games that scales in the performance and quality domain. It:

  • Is written from scratch in modern C++17 with minimal dependencies,
  • Uses the Vulkan™ API with a lightweight wrapper, vkpp, written for modern C++17 with proper lifetime management,
  • Has a built-in raytracer based on Intel's Embree® with a CMJ sampler to compare against ground-truth global effects, like AO,
  • Loads Cem Yuksel's free & open .hair file format, and has an easy, human-readable scene graph format based on JSON,
  • Consists of a strand-based hair rasterizer and a volume raymarcher.

It uses the rasterized solution for close-up shots, and the raymarched solution for level of detail minification. This hybrid hair renderer:

  • Models single light scattering in a strand with Kajiya-Kay's shading,
  • Estimates hair self-shadowing with a fast Approximated Deep Shadow Map (ADSM) method à la Tomb Raider (2013),
  • Produces anti-aliased strands by using a simple, but effective, line coverage calculation similar to Emil Persson's GPAA,
  • Resolves strand transparency with a fragment k-Buffer PPLL similar to TressFX's OIT that's built and sorted on the GPU,
  • Has a scalable level of detail scheme based on volume ray casting.

This novel volumetric approximation of strand-based hair can be computed once per frame, even for fully simulated hair. It features:

  • A very fast compute-based strand voxelization technique for hairs,
  • An approximation of Kajiya-Kay's model by finding the tangents inside of a volume by quantized strand voxelization,
  • An ADSM equivalent that also takes into account the varying hair spacing by using the actual strand density as input,
  • A way to approximate the local ambient occlusion by using the same strand density (a useful representation for hair),
  • An approximation of transparency (no DVR!) for low density areas.

Our hybrid rendering solution combines the best of strand- and volume-based hair representations. Some benefits are that:

  • It is faster than purely raster-based techniques in the far away case,
  • The performance is more predictable and configurable as raymarching scales linearly with the hair's screen coverage,
  • The level of detail transition is quite smooth because both the rasterizer and raymarcher approximate similar effects,
  • The ambient occlusion and other global effects are trivial to estimate in a volume but impossible in a pure rasterizer,
  • It is automatic as our voxelization works even with simulated hairs.

Benchmarks

Along with this project we bundle a set of benchmarks that can be run by passing the --benchmark yes flag. They compare the performance of the rasterizer and the raymarcher, and how their performance scales (e.g. with respect to increasing distances or strand counts). In order for you to get an idea of whether our solution is good enough for your purposes, we have included the results from our paper, which were run on a Radeon™ Pro WX 9100. The results were taken with V-Sync off and without any other GPU-intensive programs running in the background. The timing information was gathered via Vulkan timestamp queries, and averaged over a period of 60 frames (not much variance). We have plotted the results below for your viewing pleasure.

Performance and Memory Breakdown

In the plot above to the left we see how the rasterizer and raymarcher fare at different distances, and how much time each rendering pass takes. For the near cases (e.g. a character close-up) the raymarched solution is around twice as fast, but the fidelity isn't as good, as strands appear to be clumped together because of the volume approximation. On the other hand, the rasterized solution produces high-quality output, as each strand is individually distinguishable. However, for medium to far away distances these small details are not noticeable, and the rasterized and raymarched solutions are indistinguishable. The raymarcher, on the other hand, is now 5x faster at these distances! It also scales better with distance for far away shots.

Setup: Rendering at 1280x720, Ponytail Scene, V-Sync Off, 1024x1024 Shadow Maps, 256³ Volume, 512 Raymarching Steps.

The raymarcher also doesn't have to produce shadow maps, which would scale linearly with the number of light sources in the scene. Finally, notice that the strand voxelization is quite cheap and does not account for much of the total render time. In the memory department, the figure to the right shows the GPU data breakdown. When compared to the original strand-based geometry, the volume does not consume an inordinate amount of memory, and this value can also be tweaked via the volume resolution. The main culprits are the PPLL nodes that are used for our transparency solution. These scale with the resolution and also depend on how many strands are being shaded (and may lead to artifacts if the memory is under-allocated).

Performance Scaling

In the two plots above we see how performance scales for each renderer with respect to screen coverage and the number of hair strands. The raymarcher has a lower intercept, making it cheap to render at low screen coverage (far away distances). The rasterizer's performance scales linearly with the number of hair strands (as expected), and so does the raymarcher's, but with a much shallower slope (caused by the voxelization). Our technique works especially well for realistic amounts of hair, where anything less than ~20,000 strands will look bald. While the scaling on the right doesn't look very promising for the raymarcher, its performance can be tuned by changing the number of raymarch steps, which moves the intercept up or down.

Dependencies

  • premake5 (pre-build)
  • Any Vulkan™ 1.1 SDK
  • glfw3 (tested v3.2.1)
  • embree3 (uses v3.2.4)
  • Any C++17 compiler!

All other dependencies are fetched using git submodules. They include the following awesome libraries: g-truc/glm, ocornut/imgui, syoyo/tinyobjloader, nothings/stb and nlohmann/json. The C++17 Vulkan wrapper vkpp is being developed alongside this project. It will be split into a separate repository, vkpp, when I have time to do it.

Compiling

  1. First, make sure you've changed your current directory to vkhr
  2. Then do git submodule update --init --recursive --depth 1
    • Description: fetches the submodule dependencies into foreign/
  3. Since we use premake, you'll most likely need to fetch it as well:
    • Tip: there's pre-generated Visual Studio solutions in build
      • if you're happy with that, you can skip the steps below
    • Unix-like: just install premake5 with your package manager
  4. Now make sure you have the glfw3 external dependency solved
    • Unix-like: just install glfw with your package manager too
    • Visual Studio: pre-built version is already provided for you!
  5. Finally, you'll also need Embree for the hair raytracing back-end:
    • Unix-like: just install embree using your package managers
    • Visual Studio: pre-built version is already provided for you!
  6. Generate the vkhr project files by targeting your current setup
    • Visual Studio: premake5 vs2019 or my alias make solution
      • then open the Visual Studio project in build/vkhr.sln
      • you might have to retarget the VS solution to your SDK
    • GNU Makefiles: premake5 gmake or just call make all/run.
  7. Build as usual on your platform, and run with bin/vkhr <scene>.

Distribution

Install: if you're on Arch Linux it's as simple as running makepkg -i.

For Windows just call make distribute for a "portable" ZIP archive.

The client then only needs a working Vulkan runtime to start vkhr.

System Requirements

Platforms must support Vulkan™.

It has been tested on these GPUs:

  • NVIDIA® GeForce® MX150,
  • Radeon™ Pro WX 9100 Graphics,
  • Intel® HD Graphics 620.

on Windows 10 and GNU / Linux.

Usage

  • bin/vkhr: loads the default vkhr scene share/scenes/ponytail.vkhr with the default render settings.
  • bin/vkhr <settings> <path-to-scene>: loads the specified vkhr scene, with the given render settings.
  • bin/vkhr --benchmark yes: runs the default benchmark and saves it to a CSV file inside benchmarks/.
    • Plots can be generated from this data by using the utils/plotte.r script (requires R and ggplot).
  • Default configuration: --width 1280 --height 720 --fullscreen no --vsync on --benchmark no --ui yes
  • Shortcuts: U toggles the UI, S takes a screenshot, T switches between renderers, L toggles light rotation on/off, R recompiles the shaders by using glslc (which needs to be in $PATH to work), and Q / ESC quits the app.
  • Controls: simply click and drag to rotate the camera, scroll to zoom, use the middle mouse button to pan.
  • UI: all configuration happens in the ImGUI window that is documented under the Help button in the UI.
  • man docs/vkhr.1 will open the manual page containing even more detailed usage information for vkhr.

Documentation

You're reading part of it! Besides this readme.md, you'll find that most of the important shaders are nicely documented. Two good examples are GPAA.glsl for the line coverage calculations, and approximate_deep_shadows.glsl for the self-shadowing technique. You'll notice that the quality of the documentation varies quite a bit; feel free to open an issue if something isn't clear. I haven't documented the host-side of the implementation yet, as that would take too long, and it isn't that interesting anyway.

If you want a high-level summary of our technique, read Real-Time Hybrid Hair Rendering, which is a short conference paper on our method (only the pre-print). You'll also find a copy of it here, which you can build using LaTeX. If you want a more extensive and detailed version of our paper, my thesis, Scalable Strand-Based Hair Rendering, will soon be available. Both of these also show the difference between our technique and other existing frameworks like TressFX, which only use a rasterizer.

And if you still haven't had enough, I have written a bunch of entries in the Captain's Log, which documents the progress from day 1 to the current version. Besides having a lot of pretty pictures, it shows the problems we encountered, and how we solved them. This gives a bit more insight into why we have chosen this approach, and not something completely different. The slides for my thesis defense and my presentation at EGSR 2019 could also be useful to get an overview of our technique.

Directories

  • benchmarks: output from the bundled benchmarks goes in here.
  • bin: contains the built software and any other accompanying tools.
  • build: stores intermediate object files and generated GNU Make files.
    • obj: has all of the object files generated during compilation.
    • Makefile: automatically generated by executing premake5 gmake.
    • *.make: program specific make config for augmenting Makefile.
    • you'll also find the pre-generated Visual Studio '17 solution here.
  • docs: any generated documentation for this project is over here.
  • foreign: external headers and source for libraries and modules.
  • include: only internal headers from this project should go here.
    • vkhr: internal headers for the Vulkan hair renderer project.
    • vkpp: headers for a minimal modern C++ Vulkan wrapper.
  • license.md: please look through this very carefully.
  • premake5.lua: configuration file for the build system.
  • readme.md: this file contains information on the project.
  • share: any extra data that needs to be bundled should go here.
    • images: any images on disk that should be used as textures.
    • models: the meshes/models/materials to be used in the project.
    • shaders: all of the uncompiled shaders should go over here.
    • scenes: any sort of scene files (e.g. in json) should go here.
    • styles: the hair styles compatible with the Cem Yuksel format.
  • src: all source code for the project should be located below here.
    • vkhr: source code for the Vulkan hair renderer project itself.
    • vkpp: full implementation of a Vulkan C++ wrapper (separate).
    • main.cc: the primary entry point when generating the binaries.
  • utils: any sort of helper scripts or similar should be over here.

Reporting Bugs

vkhr is 100% bug-free, anything that seems like a bug is in fact a feature!

This is a proof-of-concept research prototype, and as such, I wouldn't recommend using it for anything serious, at least not as it is. Also, do not expect this repository to be well maintained; I will not spend much time on it after the thesis is done.

Still, if you find anything, feel free to open an issue, I'll see what I can do :)

Acknowledgements

First I would like to thank Matthäus Chajdas, Dominik Baumeister, and Jason Lacroix at AMD for supervising this thesis, and for always guiding me in the right direction. I'd also like to thank the fine folk at LiU for providing feedback and support, in particular my examiner Ingemar Ragnemalm and Harald Nautsch at ISY, and Stefan Gustavson from ITN. I would also like to thank AMD and RTG Game Engineering for their hospitality and friendliness, and for letting me sit in their Munich office.

Legal Notice

The Vulkan Logo

Vulkan and the Vulkan logo are registered trademarks of Khronos Group Inc.

All hair styles are courtesy of Cem Yuksel's great HAIR model files repository.

The ponytail and bear hair geometry are from the TressFX 3.1 repository, and proper rights have been granted by AMD Inc. for their use in this repository. However, you are not allowed to use them outside of this repository, i.e. the MIT license does not apply to them!

The woman model was created by Murat Afshar (also for Cem Yuksel's repo).

Everything in this repository is under the MIT license except the assets I've used. Those fall under the license terms of their respective creators. All of the code in this repository is my own, and you can use it however you like (under the license).

Both GLFW and Embree are pre-compiled to facilitate building on Windows.

See: foreign/glfw/COPYING and foreign/embree/LICENSE for these licenses.

Screenshots

TressFX Ponytail
The screenshot above is another render of the ponytail from TressFX (with 136,320 hair strands), but from a different angle.

Big Bear from TressFX Enhanced Fur for Bear
Above are screenshots of the bear from TressFX 3.1 (961,280 fur strands and 3,845,120 line segments) rendered in real-time.

Component-by-Component Difference in the Two Solutions
In the figure above we show the component-by-component difference between our rasterized and raymarched solutions.

Original Tangents / Voxelized Tangents
The comparisons above show the difference between the actual tangents (on the left) and their voxelized approximations.

Raytraced AO / Rasterized AO / Raymarched AO
Above is a comparison of the ground-truth AO (on the left) from our raytracer and our approximation (middle and right).

Aliased Ponytail Anti-Aliased Ponytail
Here we show the difference between not handling anti-aliasing and transparency at all (on the left) and when doing so ;-).

vkhr's People

Contributors

caffeineviking


vkhr's Issues

Approximate Kajiya-Kay's Specular Component

Now that volume rendering sort of works and the normals seem to be of good-enough quality, we want to try to do Kajiya-Kay on the shell of the volume. The problem is that Kajiya-Kay uses the strand tangent for this, and we only have the normal. We can either try to derive a tangent somehow (e.g. cross(N, V) maybe?) and do regular Kajiya-Kay, or try to approximate the specular component of Kajiya-Kay like in the Pixar paper.
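A minimal GLSL sketch of that cross(N, V) idea, keeping the classic Kajiya-Kay diffuse and specular terms (this is only an illustration with made-up parameter names, not the repository's kajiya-kay shader):

vec3 kajiya_kay_approximated(vec3 diffuse, vec3 specular, float shininess,
                             vec3 normal, vec3 view, vec3 light) {
    vec3 tangent = normalize(cross(normal, view)); // approximated strand tangent
    float cos_tl = dot(tangent, light);
    float sin_tl = sqrt(max(1.0 - cos_tl * cos_tl, 0.0));
    float cos_tv = dot(tangent, view);
    float sin_tv = sqrt(max(1.0 - cos_tv * cos_tv, 0.0));
    vec3 kd = diffuse  * sin_tl; // Kajiya-Kay diffuse term
    vec3 ks = specular * pow(max(cos_tl * cos_tv + sin_tl * sin_tv, 0.0), shininess); // specular term
    return kd + ks;
}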

Implement Basic Shadow Mapping Pipeline

In order to support strand self-shadowing in the rasterizer, a way to generate shadow maps will have to be implemented. We'll start by creating vanilla shadow maps like those in "Casting Curved Shadows on Curved Surfaces" and after that extend this to support the techniques optimized for strand-based self-shadowing. Also, it's a lot more manageable to implement this iteratively, since going full e.g. Deep Opacity Maps could be hard to get working right away, and since that depends on something like shadow maps anyway, it's a good starting point.

This involves a bit more changes to the current Rasterizer, but after the refactor I made, it should be a lot easier to implement this now. AFAIK we need a render pass that outputs the depth buffer from the lights.

Simulate ADSM in Volume Rendering

One of the components that will be missing even with Kajiya-Kay is the directional shadow component. The idea of ADSM should be fairly easy to emulate in the raymarcher, and with coarsely granular rays (I think). It boils down to this: find surface, shade surface, raymarch towards the light and approximate_deep_shadows. We should even be able to pass in the same arguments as in the strand-based solution. This can't be hard...
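A minimal sketch of that plan, assuming the strand densities are available as a sampler3D over the volume's AABB (the names and parameters below are illustrative, not the actual shader):

float volume_adsm_shadow(sampler3D density, vec3 position, vec3 light_direction,
                         vec3 volume_origin, vec3 volume_size,
                         float step_length, float strand_alpha, int steps) {
    float occluding_strands = 0.0;
    for (int i = 1; i <= steps; ++i) {
        vec3 point = position + light_direction * step_length * float(i);
        vec3 uvw   = (point - volume_origin) / volume_size; // normalized volume coordinate
        occluding_strands += texture(density, uvw).r;       // strands voxelized into this cell
    }
    // ADSM-style attenuation: the more strands between us and the light, the darker it gets.
    return pow(1.0 - strand_alpha, occluding_strands);
}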

Manually Fix SPIR-V Binding Collisions

The compute shaders under share/shaders/compute-curve have been converted from HLSL but are still not really valid since they have one / two binding collisions each. I'll sprinkle some [[vk::binding]] directives.

Detach Raytracer from the Rendering Thread

So it doesn't lock the GUI, and so that we can progressively accumulate the raytraced results until convergence. If we disable (or pause) the rotating light source and any animation (if we eventually get around to having that), then it should be possible. This is desirable for #28, but will be useful in general for the raytraced AO as well.

Add Vulkan Timestamp Queries and Plots

We'll probably want to profile the performance gains (most likely losses) of the SPIR-V conversion, and also have some way to easily profile the app. Of course, we could use RGP to do this, but it would be better to have a way to do this "live". I propose we gather timestamp query results for each stage of our render, i.e. the shadow map rendering pass, the depth map rendering pass, each compute dispatch of the OIT solution, the mesh drawing, and (optionally, if our LoD technique uses raster too for strands) the raster hair rendering time. We could also be extra fancy and add ImGui plots for the timings, a bit like in SEED's Vulkan PICA PICA.

Adjust Depth Buffer for Volume Rendering

A possible problem we'll face is that the values in the depth buffer after volume rendering the hair style will be incorrect, which will cause problems when rendering additional geometry. For instance, consider Lara's mesh that we are currently rendering: it will be inside the hair geometry, inside the volume. If we don't write the values to the depth buffer in the correct way, the rendering order will be wrong between the hair and the mesh. For voxels with no hair it's easy: we just discard on that particular raymarch, but areas with hair are tricky. While we have the depth that should be there (via the raymarch), we can't write to the depth buffer. One way would be to take control of the depth buffer manually as a texture2D and do it ourselves, but that has its own set of problems. A very easy solution is to simply ignore this problem, and render volumetric hair in the "last" pass.

Pre-Filter the Density Volume

I've already fixed some stupid bugs in the volume sampling code, so the results are already better, but the fast version of the strand voxelization algorithm still produces lower-quality output than the segment one. But if we pre-filter it by using e.g. the 3⨯3⨯3 max or average, we might get more accurate, and less noisy results, for a fraction of the price we pay for the segment-based one. I'll post my result of this very shortly.
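A compute-shader sketch of the 3⨯3⨯3 average variant is shown below (bindings, names and the r32f format are assumptions; a max filter just swaps the accumulation):

#version 460 core

layout(local_size_x = 8, local_size_y = 8, local_size_z = 8) in;

layout(binding = 0, r32f) uniform readonly  image3D density;
layout(binding = 1, r32f) uniform writeonly image3D filtered;

void main() {
    ivec3 voxel  = ivec3(gl_GlobalInvocationID);
    ivec3 bounds = imageSize(density) - ivec3(1);
    float sum = 0.0;
    for (int z = -1; z <= 1; ++z)
    for (int y = -1; y <= 1; ++y)
    for (int x = -1; x <= 1; ++x)
        sum += imageLoad(density, clamp(voxel + ivec3(x, y, z), ivec3(0), bounds)).r;
    imageStore(filtered, voxel, vec4(sum / 27.0)); // box-filtered density
}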

Add Simple Blinn-Phong Mesh Renderer

We should eventually also add a simple mesh renderer for OBJ files to see the effects of shadow maps on curved surfaces, instead of only the self-shadows we have now, and also to take advantage of the OIT optimization in Dominik's thesis, which ignores strands that are occluded by an object (e.g. a head). This is not priority work just yet, but I want to keep this issue here so I don't forget about it. My plan is just to use TinyOBJLoader and the good ol' Blinn-Phong reflection model, to keep things simple. Also, by doing this we can more easily compare our results with Cem Yuksel's paper (and anybody else that uses his assets), since all papers which use his hair styles usually also use the woman.obj model, so we should do the same.

Try Volume AO with GPU Raycasting

I tried some techniques in #27 that didn't quite work. Even if the combined ADSM + 3^3 Filtered Density AO looks good, it would be nice if we could find AO with our voxelized densities such that it matches the ground-truth raytraced results (even if the raycasting turns out to be too expensive, we'll at least have results that show our volume isn't complete bogus). And if it's fast enough, we might even apply it in the minified case!

Here is how I'm going to do it this time: launch rays around a sphere, just like the raytraced version, and see if we can approach the raytraced results (which should happen). After that, we can get a little more fancy and do something like Crassin et al.'s "Interactive Indirect Illumination Using Voxel Cone Tracing", especially in their AO section. Maybe I should have started with trying these methods first, before the ones in #27 :-P.
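Roughly, the idea in GLSL terms (the direction count, step count and absorption constant below are placeholders, not tuned values from the implementation):

float raycast_volume_ao(sampler3D density, vec3 uvw, vec3 directions[8],
                        float step_length, int steps, float absorption) {
    float occlusion = 0.0;
    for (int d = 0; d < 8; ++d) {            // a handful of rays spread over the sphere
        float accumulated = 0.0;
        for (int i = 1; i <= steps; ++i)     // gather density along each ray
            accumulated += texture(density, uvw + directions[d] * step_length * float(i)).r;
        occlusion += 1.0 - exp(-absorption * accumulated);
    }
    return 1.0 - occlusion / 8.0;            // 1 => fully lit, 0 => fully occluded
}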

Integrate a "Fake" Strand Thickness

In order to properly implement #29 and to match the raytraced output better, we want to integrate a strand thickness into the rasterized line output. Since there's no way to change the line thickness in the shader, we will either need to expand the line to a billboard, or simply just fade out the rasterized line at the tips to fake the strand becoming thinner. This last solution is of course limited, since the radius can't change throughout the hair geometry, but on the other hand, it is really simple. If we can get similar levels of performance using the billboard approach (I doubt it though), by expanding using the geometry shader, we can always use that.

Make PCF Loop Bounds Constant

I've found that we can save 3-4ms on the strand self-shadows of the ADSM approach if the PCF kernel radius is known at shader compilation time; it looks like glslc is unrolling the loop and making more assumptions. I know this for sure makes a difference on my laptop; I still need to profile the benefits on the AMD machine (I've heard that performance can sometimes even decrease if the compiler is too aggressive with loop unrolling on GCN, so I'll profile to make sure this is worth it). For reference, on my laptop, in the worst case (i.e. hair occupies the entire screen) I get ~9.8ms for rendering the ponytail; without any shadows I get 5.8ms, and without any shading at all, 4.8ms (an unavoidable cost of rasterizing 1.8M lines). So a performance breakdown gives us:

  • Rasterization: 4.8ms
  • Kajiya-Kay: 1ms
  • 3x3 PCF ADSM: 4ms
  • Total cost: 9.8ms

By having constant loop bounds the entire pass instead takes ~6.8ms, and therefore we'll get these results:

  • Rasterization: 4.8ms
  • Kajiya-Kay: 1ms
  • 3x3 PCF ADSM: 1ms
  • Total cost: 6.8ms

For further reference, on the beefy AMD-machine, the unoptimized version (that took 9.8ms on my laptop), takes around 1.5ms for the entire pass (i.e. total cost is 1.5ms). I haven't tested the optimized version yet, but if we assume the speedup translates to AMD-hardware as well, we'll go from 1.5ms to ~1ms. This will leave us more room to do other cool things (and we still need to think about leaving time for the OIT), so I think this optimization is worth trying out.

I'll see if I can make the offending calculation go away, or just assume that we use 3x3 kernels all the time. A way would be to use specialization constants, and re-compile the shaders when the PCF radius is modified.
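A sketch of the specialization constant route (the constant id and the helper below are illustrative, not the current shader):

layout(constant_id = 0) const int pcf_kernel_radius = 1; // 3x3 kernel unless specialized

float pcf_shadow(sampler2D shadow_map, vec3 shadow_coord, float bias) {
    float lit   = 0.0;
    vec2  texel = 1.0 / vec2(textureSize(shadow_map, 0));
    // Constant loop bounds: the compiler can fully unroll this at pipeline creation time.
    for (int y = -pcf_kernel_radius; y <= pcf_kernel_radius; ++y)
    for (int x = -pcf_kernel_radius; x <= pcf_kernel_radius; ++x) {
        float depth = texture(shadow_map, shadow_coord.xy + vec2(x, y) * texel).r;
        lit += (shadow_coord.z - bias) <= depth ? 1.0 : 0.0;
    }
    int width = 2 * pcf_kernel_radius + 1;
    return lit / float(width * width);
}

On the host side, the pipeline would then be re-created with new VkSpecializationInfo data whenever the PCF radius is changed in the UI.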

Implement a Simple Line Renderer in Vulkan

Now that the hair geometry is in main memory, we want to upload this onto the GPU and start doing useful stuff. A good first step is to just render the hair geometry as simple hair primitives to start things off, this will also be good to sanity check that everything on the Vulkan side has been setup correctly. Though, as we will be mostly doing compute-stuff, we need to create a compute queue as well as a graphics queue after that :)

Vulkan Setup in Progress

  • Validation layer and surface extension checks
  • Instance and surface creation via GLFW 3 API
  • Enabled Vulkan validation layers and callback
  • Enumerate and find suitable physical devices:
    - Discrete GPU + graphics + compute queue
    - Prefer GPUs with WSI presentation support
    - Prefer GPUs with a lot of graphics memory!
  • Setup swapchain + create image views into it
  • Create graphics and (later) compute pipelines
  • Setup shader module compiler via spirv-tools
  • Command pool and command buffer creation
  • Render pass and simple command dispatches
  • Presents framebuffers (plus double buffering)
  • Setup vertex binding descriptor for geometry
  • Finally, setup any uniform descriptor sets too.
  • Ah, also setup the depth buffer. Forgot that :)

That should be most of it. I'm writing a thin wrapper to make lifetime/boilerplate handling a bit less painful.

Implement GUI for Controlling Render Parameters

Using ImGui for simplicity as I've already done some tests, and it seems to work fine. Something so we are able to change the scene in the demo, switch between raytracing and rasterization, and control the parameters of our technique. Of special interest is the ability to change the LoD of the scalable hair rendering scheme.

Interpolate Strand Tangents in the Raytracer

Right now the strands are flat-shaded in the raytracer, making it hard to compare with our rasterized version. To fix this we need to call rtcInterpolate using Embree (vastly simplified, it's more intricate than that ^^).

Implement a Simulation Pass

There were concerns in the latest call (2018-12-03) that we should maybe have a very simple animation of the strands to validate that the techniques we have implemented also work OK when animated. A very good question is whether the approximated deep shadow solution will completely break apart when animated, as exactly how much visual "error" will happen is uncertain (it could be well within limits; if so, then we're fine).

Crystal Dynamics solves the issues in the technique by splitting the hair style into several parts that have roughly the same "density", and assumes that the density remains mostly unchanged. In our case, since we don't split the hair style, we could get very wrong results since the hair density could be completely off. As suggested, we could store a pre-calculated "thickness map", like in Colin Barré-Brisebois' fast SSS method, and use that to infer the density (this would involve a 3-D texture for the density). Another way would be to store a per-strand value that says how densely clustered strands are in the local neighborhood. The latter technique I believe to be a better solution, and one not attempted before in related work AFAIK. Of course, that still won't solve the problem of having a high deviation from the original density values when animated.

We need to think about this. If the error margins are too high we might have to either switch the self-shadow technique, or compensate for it somehow. We should discuss this in the next call as well I think, I won't try to get started on this just yet, as the other issues currently take major implementation priority.

It would also be interesting to see the animated results, especially of the voxelization and how e.g. AO looks like, since a major feature of our method is that the results can be animated (i.e. the technique is "general").

The simplest solution would be to take the TressFX simulation code and integrate it. But for now, we have opted not to do this, since other things were more important. However, if I or anyone else ever gets around to it, the TressFX/TressFXSimulation.hlsl shader seems ripe for the taking (and is under a nice license). Since we are using SPIR-V anyway, and our glslc.py script can compile HLSL --> SPIR-V we can maybe even use the shader as-is, and just provide the right descriptor bindings to it. Anyway, this is future work. Good luck!

Calculate Isosurface Normal by using the Gradient

If we have some sort of SDF we can find the normal N by taking the gradient ∇S of the scalar field S. The idea is to do something similar with our volume, so we can find the normals of our contour. According to the Pixar paper, we should be able to get close to Kajiya-Kay without using the tangent, so it should be possible, with a bit of trickery, to find a shading model that follows the same highlights as Kajiya-Kay (the Kajiya-Kay model was actually meant to be used for volume rendering). "Volumetric Methods for Simulation and Rendering of Hair" is the Pixar paper. Maybe we can find the tangent by e.g. cross(N, V) like in Scheuermann's shader?
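A central-differences sketch of that gradient, sampling the density in normalized volume coordinates (the texel size is passed in for brevity; names are illustrative):

vec3 gradient_normal(sampler3D density, vec3 uvw, vec3 texel) {
    float dx = texture(density, uvw + vec3(texel.x, 0.0, 0.0)).r
             - texture(density, uvw - vec3(texel.x, 0.0, 0.0)).r;
    float dy = texture(density, uvw + vec3(0.0, texel.y, 0.0)).r
             - texture(density, uvw - vec3(0.0, texel.y, 0.0)).r;
    float dz = texture(density, uvw + vec3(0.0, 0.0, texel.z)).r
             - texture(density, uvw - vec3(0.0, 0.0, texel.z)).r;
    return -normalize(vec3(dx, dy, dz)); // negated so the normal points out of the denser hair
}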

Integrate Raytracer into the Swapchain

Right now the raytracer produces the correct rendered output to a vkhr::Image, but we need a way to get it into the Vulkan framebuffer somehow. Right now my strategy will be to render to the vkhr::Image, create a vkpp::DeviceImage (i.e. a staged vkpp::Image) with that data, and create a vkpp::ImageView from it. I am still a bit uncertain about the best way to update it, since the results might change every frame or so.

In my initial implementation I plan to have two vkhr::Image for double buffering. The backbuffer is updated, i.e. raytraced in this case, and then uploaded to GPU memory along with the descriptor sets. The frontbuffer is shown by setting the correct descriptors for the combined image sampler, and then using my billboards shader. The backbuffer is made into a frontbuffer when the render to the backbuffer is done.

Still not completely 100% sure that this approach is the best one out there, but we can come back and fix this later.

Integrate Dominik's Hair Compute Pass

After solving #13 we should actually start trying to match the bindings of the constant buffer and create the right storage buffers to match those in Dominik's implementation. We should take this in steps, and maybe compare the results of each pass side-by-side (i.e. VKHR and RealTimeHair) and see that they have similar results. I don't expect things to work out-of-the-box without some fixing, so let's do this iteratively.

I've currently inspected and verified local correctness of:

  • ReduceDepthBuffer
  • ClipCurves
  • PrefixSum_1
  • PrefixSum_2
  • Reorder
  • DrawStrands

Implement Multi-Layer Alpha Blending

After implementing #42 we can pretty easily transform our vanilla PPLL into the technique due to Salvi and Vaidyanathan called Multi-Layer Alpha Blending. The general idea is to insert the first k-fragments into the PPLL and resolve this like usual, and then merge the remaining fragments (without sorting them of course).

Implement the Screen-Space Strand Density Metric

In order to guide the transparency and AA methods we'll probably want to find the screen-space hair density, and classify fragments (or tiles) as having low or high density with some sort of threshold. We'll use this data to adapt the PPLL and AA algorithms (exactly how is still TBD, but we already have some good ideas rolling). I can see two ways to do this: use the volume we already have, or accumulate rasterized lines onto a buffer, i.e. we do more-or-less what we already do with the volume, but project and increment it on a plane instead.

Some ideas would be to conditionally insert fragments into the PPLL, or vary the number of fragments we are actually sorting properly (and not just squashing together), or decide how many of the fragments should use a "high-quality" version of the shading. Most of the time is spent shading the strands, which is a drastic change from the previous non-OIT version of our shader, which only needed to shade the top part of the hair style (the rest of the fragments were culled). Maybe the SSHD could be used here somehow?!?

Modifiers for Environmental Situations

I must say that I am really impressed ! This is more a feature request / general question for future development.

In most use cases, games will need to render hair in a not-so-clean state.

I am talking about liquids
(water, blood, viscous goo, oil ...)
and solids:
(dirt, grass, leaves, moss, trash ...)
and a mix of both. Sadly, the bear should also be able to show hair loss due to damage...

There may also be underwater situations (submerged or in the shower) as well as low gravity, electric shock / electromagnetic effects, burnt hair and special forms of hair (glowing, lightning, fire, spectral ...).

Are those possible in the future for vkhr? I believe most new console games will feature great hair and that this is going to be used a lot :)

Find a Isosurface in the Density Volume

After creating #36 we should try to find a good isosurface by raymarching the volume via proxy-geometry. I don't know what's the best way to do this, but one way is to simply "shade" when the density is higher than a certain value e.g. 5 (i.e. at least 5 strands in this voxel), and discard the fragment when we are below this. I'll start by doing this since it's simple, but I'm sure there are better ways to do this. Just to cross check that our volume is indeed suited to do this, I've tried running it through ParaView and use the "Contour" filter with 5:

ponytail-isosurface
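For reference, a sketch of that simple thresholding scheme, marching in normalized volume coordinates (the parameter names are made up for illustration, not the actual raymarcher):

vec3 find_isosurface(sampler3D density, vec3 ray_origin, vec3 ray_direction,
                     float step_length, int steps, float isovalue, out bool hit) {
    vec3 position = ray_origin;
    for (int i = 0; i < steps; ++i) {
        if (texture(density, position).r >= isovalue) {
            hit = true;       // enough strands in this voxel: treat it as the surface
            return position;
        }
        position += ray_direction * step_length;
    }
    hit = false;              // the ray left the volume without finding the isosurface
    return position;
}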

Artifacts in Fullscreen

I've noticed that in some cases (if you're unlucky), after swapchain recreation upon going fullscreen, there are artifacts in the hair rendering. This does not happen on non-fullscreen resizes, and seems to be happening on AMD GPUs. Validation layers seem to be screaming a bit at that, so I'll get back to this after the publication. I am just opening this here in case I forget about it.

Generate the Min/Max MIP Volume

We want to generate two mip-hierarchies from the volume, one for the min and one for the max. We'll use this to find e.g. silhouette areas in screen-space (by finding the correct mip-level that translates to one pixel). We can find these mip-volumes by iteratively downsampling the volume at level-N, beginning at level-0, that is our original volume (e.g. 256^3 voxelization). This downsampling happens by taking the 2x2x2 max/min of the previous level. e.g. min(v[i,j,k], v[i+1,j,k], v[i,j+1,k],...,v[i+1,j+1,k+1]) (a total of 8 voxels).
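One downsampling step for the max hierarchy could look like the compute sketch below (swap max for min to get the other hierarchy; the bindings and the r32f format are assumptions):

#version 460 core

layout(local_size_x = 4, local_size_y = 4, local_size_z = 4) in;

layout(binding = 0, r32f) uniform readonly  image3D level_n;        // e.g. 256^3 at level 0
layout(binding = 1, r32f) uniform writeonly image3D level_n_plus_1; // half the resolution

void main() {
    ivec3 destination = ivec3(gl_GlobalInvocationID);
    ivec3 source      = destination * 2;
    float value = 0.0;
    for (int z = 0; z < 2; ++z) // take the max over the 2x2x2 level-N voxels below
    for (int y = 0; y < 2; ++y)
    for (int x = 0; x < 2; ++x)
        value = max(value, imageLoad(level_n, source + ivec3(x, y, z)).r);
    imageStore(level_n_plus_1, destination, vec4(value));
}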

Convert FlatBuffer HAIR to Cem Yuksel's HAIR Format

Some of the given hair models e.g. ponytail.hair and bear.hair are in a FlatBuffers-style binary format. A conversion from FlatBuffer HAIR to Cem Yuksel HAIR will be needed. Load FlatBuffer into memory, and then write all information in a Cem Yuksel style HAIR format. The FlatBuffer HAIR format (seems) to look like this:

  • FlatBuffer Header: [ int32_t numStrands ] followed by [ int32_t verticesPerStrand ],
  • FlatBuffer Vectors: then numStrands*verticesPerStrand many [ float vertexPosition ]

Best approach: import implementation of #1 into existing hair renderer and do it over there (loader written).

Implement the Raytraced Reference AO Solution

We would like to check if our volume-based AO implementation approaches the reference raytraced AO that is the ground truth. The raytracer is lagging behind quite a lot, so basically we use it only to cross-check that the results we get from the rasterizer are valid. In a nutshell, here is what the raytracer should be doing to it:

  1. Shoot rays from the camera towards the scene, for each intersection with a surface point we must shoot:
  2. Random rays in the hemisphere (around a normal pointing out of the hair style), and gather the rays that do and don't intersect any other hair strands. The ratio of intersected vs. non-intersected rays is the amount of ambient occlusion in the scene, which is what is generally done, as in the thesis: here.

Implement Correlated Multi-Jittered Sampling

In order for us to more easily compare with the ground truth once we've implemented e.g. "Phone-Wire AA", we should multi-sample each raytraced pixel by using an unbiased sampling pattern, such as CMJ, as in the paper:

// s: sample index, m*n: samples in the pattern, p: pattern seed; permute()
// and randfloat() are the hash functions from the CMJ paper.
vec2 cmj(int s, int m, int n, int p) {
    int sx = permute(s % m, m, p * 0xa511e9b3);
    int sy = permute(s / m, n, p * 0x63d83595);
    float jx = randfloat(s, p * 0xa399d265);
    float jy = randfloat(s, p * 0x711ad6a5);
    vec2 r = vec2((s % m + (sy + jx) / n) / m,
                  (s / m + (sx + jy) / m) / n);
    return r;
}

Visualize Hair Strand Density as a AO-term

After implementing #18, we want to try interpreting the densities as some sort of AO-term for shadowing. This will require us to write code to upload and sample the volume in the strand's fragment shader. This is also a good opportunity to write and test that the volume is sampled correctly (in the correct coordinate-space).
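In its simplest form this boils down to a single volume lookup in the strand fragment shader, along the lines of the sketch below (the names and the occlusion constant are illustrative):

float density_ao(sampler3D density, vec3 position, vec3 volume_origin, vec3 volume_size,
                 float occlusion_scale) {
    vec3 uvw      = (position - volume_origin) / volume_size; // also sanity checks our mapping
    float strands = texture(density, uvw).r;                  // strands voxelized into this cell
    return clamp(1.0 - strands * occlusion_scale, 0.0, 1.0);  // denser hair => darker ambient term
}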

Implement Raycasting for Gathering Densities

Now that we have the density volume in place we may want a way to gather densities in a certain direction. This is useful to us for at least two reasons: we may want to visualize the volume directly in VKHR, and there is a case where we need to gather the total amount of density up until a certain point in the volume. In the latter case, we'll use it to feed ADSM with the number of strands that are occluding the current strand.

I'm still uncertain about the cost of this, but it sounds expensive. I'm planning on doing another solution to feed ADSM, where we do some sort of pre-gather steps, to estimate the global number of hairs in the way.

Emacs Lisp as a scripting language through the Microsoft Language Server Protocol

Hi!

Dear Eric,

I really like this framework; however, I feel that it lacks somewhat in extensibility. I therefore suggest that you implement an API to the Microsoft Language Server Protocol to communicate with an Emacs process. In this way, users could render hair in Emacs using the Emacs XWidget package https://www.emacswiki.org/emacs/EmacsXWidgets


(defun render-hair (haircut)
  render-stallmans-haircut )

Cheers,

John

Implement Level of Detail Scheme

Now that each individual component "works" we want to dynamically change between the maxified (strand-based) and the minified (volume-based) solutions "smoothly". Since we'll most likely only have 1-2 hair styles on the screen using the maxified case (relatively expensive because of the rasterization), we want to use our minified case as much as possible, so our solution can scale nicely in the quality / performance space. For the minified case we'll probably reduce the voxelization size (which reduces the number of raycasts needed since the step size will be shorter) and apply #41. When we are in the maxified case we want to use our metric for hair density to blend fewer fragments in areas of high density (since they'll have stuff there anyway), which can also be seen as a sort of LoD I guess. We also need to blend between the minified and maxified case when the transition happens. If our minified case is really good then that might already be OK, but otherwise we can use dithering / blending. Finally, we need a good metric for when these changes should happen.

Integrate a Basic Embree-based Hair Raytracer

Integrate a basic strand raytracer so we can use it as the ground truth for comparisons. This first task won't feature a lot of fancy light transport just yet; the core problem this issue tries to address is the matching of the camera parameters between the rasterizer and the raytracer. Match these using some simple Kajiya-Kay shading.

Voxelize the Hair into a Density Volume

We'll need it if we want to make the ADSM more flexible, since it (right now) assumes the hair style density is constant throughout the volume, which is not completely true. While the quality is already acceptable for the hair styles we have, the technique will completely break down for non-uniform density styles. If we want to make the technique as flexible as DOM, we'll need to solve this properly. Tomb Raider solved this by splitting the style into several parts, with different constant densities for each part. While this works, it requires artist time and manual labor, instead we would like to find the hair density automatically, and store it in a volume. I believe we can make ADSM as general and good looking as DOM, but without having to pay the DOM price (rendering multiple shadow maps is bad, and you need a lot of them in the DOM).

For comparison, the paper by Sintorn and Assarsson (linked below) gets around 13 FPS for a comparable hair style at vastly lower resolutions (800x600!) on comparable hardware (to my MX150-2 laptop GPU, they used a GTX 280 back in 2009), using these Deep Opacity Maps. If we can generate the volumes on the GPU at e.g. 1ms or 0.5ms, and still have ADSM be quite generic as DOM, it will still be vastly faster than any DOM-based technique and still have similar quality. Our technique currently runs at 60 FPS at 1080p on my shitty laptop, and still has 2-3ms to spare (that's in the worst case, i.e. every fragment is processing some hair and reading from the shadow map, for a medium-range viewing perspective, we have around 10ms to spare in total) before we dip below 60 FPS. So yes, quite a bit faster than DOM...

This hair density volume can probably also be used for other useful things, like computing an approximated SSS à la Frostbite 2 (the thickness-map one from GDC 2011), which AFAIK hasn't been attempted before in the area of hair rendering. The dual-scattering paper is the only "real-time" approximation of scattering we have around, so it would be cool if we could extend the Frostbite method to hair rendering, and have something that may be novel in the area of approximated subsurface scattering of hair (that is good enough for games). We could also maybe do some smart transparency handling by using this too. I've found a paper that seems to be doing something similar: "Hair Self-Shadowing and Transparency Depth Ordering using Occupancy Maps" by Sintorn and Assarsson, with an "Occupancy Map".

Blend Fragments with a Per-Pixel Linked List

After implementing #29 (I've already sort of tested this) we'll see that the ordering of the strands when they have opacity will be wrong, and they need to be sorted back-to-front to produce the correct result. Therefore, we need a way to sort fragments and blend them. The classic way to do this is by using a PPLL, like in TressFX. It can be made more "efficient" in our case by using our Screen-Space Density Metric, which we implemented in #31, and storing more fragments in areas of low density (which have high aliasing, at least in our observations). I've decided to split the PPLL task into several parts so we can more easily track the progress we're making (a rough sketch of the insertion step follows the list):

  • Allocate the head pointer buffer (pointers to nodes) and the node heap (stores fragment information).
  • Implement the ppll_next, ppll_link, ppll_head, ppll_data helper in GLSL for the PPLL insertion
  • Insert strand fragments into the PPLL instead of pushing them to out (the current framebuffer target)
  • Check that the head and node structures produce reasonable values when inspecting with RenderDoc
  • Resolve PPLL by sorting and blending the nearest k-fragments, just like the original AMD PPLL method.
    • Transition framebuffer images to VK_IMAGE_LAYOUT_GENERAL and back for the PPLL resolve passes.
    • Visualize fragments in PPLL to make sure we're not just putting junk (even though we validated it).
    • Sort and blend the fragments in the PPLL and mix with the existing opaque framebuffer contents...
    • Make new renderpass to process GUI and other things like that as well (will be needed later here).
  • Sanity check PPLL blending results and look into implementing the better coverage algorithm as well...
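A rough sketch of the insertion step mentioned above (the buffer layout and names are assumptions, not the exact TressFX or vkhr buffers):

layout(binding = 0, r32ui) uniform coherent uimage2D ppll_heads; // head pointer per pixel

struct Node { vec4 color; float depth; uint previous; };

layout(std430, binding = 1) buffer NodeHeap    { Node nodes[];    };
layout(std430, binding = 2) buffer NodeCounter { uint node_count; };

void ppll_insert(ivec2 pixel, vec4 color, float depth) {
    uint node = atomicAdd(node_count, 1u);
    if (node >= uint(nodes.length())) return; // node heap exhausted: drop this fragment
    uint previous = imageAtomicExchange(ppll_heads, pixel, node); // link in front of the list
    nodes[node]   = Node(color, depth, previous);
}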

Match Settings b.w. Renderer and Raytracer

After implementing #3 and #4, make sure they produce similar results when given the same scene graph. Use a simple shading model to sanity check it. The results should be somewhat similar in local illumination. Just for now, let's try to match it with Kajiya-Kay, like we already do in the shader shaders/kajiya-kay.frag.

Infrastructure for Gathering Data

For the paper/thesis we want to gather some performance data to create plots. We are interested in gathering the performance scaling of the various rendering passes when: shading more pixels, when the number of strands is increased, and when the size of the volume is changed (not really that relevant, since vertex-based voxelization is still bounded by the strand count). We also might want to measure memory usage scaling (e.g. of the volume, or relative to the theoretical minimal PPLL node memory usage).

Anyway, we need to build infrastructure to find out the following information and append it to a CSV:

  • Shaded Pixels
  • Strand Count
  • Volume Sizes

This should be done automatically (e.g. by passing the --benchmark on flag), putting all of the gathered data into benchmark/. The idea is then to run an R script that will build all of the plots from benchmark/ (i.e. with some automation).

Performance data for every pass is already recorded into a CSV, so appending the above shouldn't be too hard. The only one I'm a bit worried about is how to find the number of shaded pixels, but maybe I can "hack" it and take a screenshot at the same time, then go through the screenshot when creating the plots, i.e. count the number of pixels that aren't colored the same as the background (it's a hack :-P). Here is more or less what the CSV currently looks like. Notice that I take samples without averaging, so we can find the variance of the data later to create "error bars" and maybe get away with not having confidence intervals. I have 60 frames per CSV.

Frame, Bake Shadow Maps, Clear PPLL Nodes, Draw Mesh Models, Draw Hair Styles, Resolve PPLL
0, 0.775556, 0.033037, 0.104741, 6.77674, 3.22652
1, 0.918074, 0.0343704, 0.105037, 6.78978, 3.23022
2, 0.704741, 0.030963, 0.096, 6.33482, 3.03304
3, 0.799111, 0.030963, 0.0955556, 6.32785, 3.0323
4, 0.889926, 0.0321481, 0.103111, 6.76622, 3.23393
5, 0.905185, 0.0318519, 0.0973333, 6.33363, 3.03511
6, 0.844, 0.0333333, 0.103259, 6.7963, 3.236
7, 0.843556, 0.0305185, 0.0942222, 6.33319, 3.03941
8, 0.835704, 0.0336296, 0.102815, 6.76904, 3.23511
9, 0.777481, 0.030963, 0.0957037, 6.32148, 3.02015
...

Convert HLSL OIT Shaders to SPIR-V

Borrow Dominik's BezierDirect HLSL shaders, which handle strand transparency in the compute shader, and first try to convert them to SPIR-V and see if they produce reasonable output. Let's try with dxc first since that seems to be recommended in the Khronos Meetups. Otherwise, we can also try to use glslc.

Looking into the SPIR-V disassembly, there seem to be some binding collisions in some descriptor sets:

OpDecorate %g_DepthBuffer DescriptorSet 0
OpDecorate %g_DepthBuffer Binding 1
OpDecorate %g_BackBuffer DescriptorSet 0
OpDecorate %g_BackBuffer Binding 1

which means we'll have to sprinkle some [[vk::binding(x, y)]] magic in some places to get it to work OK.

Otherwise it seems "straightforward", as long as the HLSL -> SPIR-V compiler has done everything else right?

Support Cem Yuksel's HAIR Format

Most research papers on hair rendering use Cem Yuksel's HAIR file format for the geometry, since there is a large repository of existing hair styles provided at: http://www.cemyuksel.com/research/hairmodels. Write a basic HAIR file loader, more-or-less like cyHair, but optimize the file I/O (it uses C-style freads) and the interface.

Usage:

vkhr::HairStyle curly_hair { "share/style/wCurly.hair" };

if (!curly_hair) {
    std::cerr << "Failed to load hair style!" << std::endl;
    return -1;
}

for (auto vertex : curly_hair.vertices) {
    // Do some processing on the vertices.
}

curly_hair.save("share/style/wCurly.hair.pre-processed");

if (!curly_hair) {
    std::cerr << "Failed to save hair style!" << std::endl;
    return -2;
}

Optimize Volume Raycasting

Right now we have a constant step size when raycasting (still somewhat reasonable since we jump one texel at a time), but a better solution would be to use the min-max volume (or something like that) to make "big strides" until something interesting happens (i.e. there is density there). This optimization is called Hierarchical Spatial Enumeration and can be seen over here. We already make use of Adaptive Termination, but not to the same extent as they do. So yeah, right now we are constant-size-stepping through the volume at max resolution independent of the LoD. Also, doing so leads to noisy results at large distances, so we should be using a low resolution volume anyway (which means we need to modify the voxelization resolution, maybe by finding the best inverse projected size of a pixel -> voxel, and finding the resolution from there somehow).

Here is the short list of optimizations:

  • Find the Voxelization Resolution
  • Hierarchical Spatial Enumeration (sketched below)
  • Expand on Adaptive Termination
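A hedged sketch of the space-skipping part, using a max mip-volume like the one described in the Min/Max MIP Volume issue (the coarse level-of-detail constant and step lengths are placeholders):

vec3 skip_empty_space(sampler3D max_volume, vec3 position, vec3 ray_direction,
                      float coarse_step, float coarse_lod, int steps) {
    for (int i = 0; i < steps; ++i) {
        if (textureLod(max_volume, position, coarse_lod).r > 0.0)
            return position;                     // density nearby: hand over to the fine march
        position += ray_direction * coarse_step; // empty region at the coarse level: big stride
    }
    return position;                             // left the volume without hitting anything
}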

Create Proxy-Geometry for the Volume Rendering

When the hair style is far away, rasterizing line segments becomes expensive and sort of useless (we only see a little of the detail anyway). Instead, we can use our pretty fast strand voxelization (which only runs once per frame, not twice like the line rasterization) to find the strand density. We can use this density to find a surface on the volume by raymarching through it, and also to find normals by the gradient ∇D of the density field D.

Before doing any of this we need to setup a proxy-geometry so that the fragment shader can be called. That will be the raymarch starting point. A simple one is to simply use the AABB of the hair style (that's also used in the voxelization) as the proxy-geometry. For this we'll need to add a new Drawable, the Volume object in our abstraction which has the correct volume rendering pipeline bound and the correct world transforms too.

volume_rendering

Implement Phone-Wire AA

After implementing #28 we want to "fix" the aliasing in the rasterizer as well, which mostly happens in areas of low density. We can solve this by using e.g. Phone-Wire AA, which essentially assumes a line is already a pixel wide, and then seemingly reduces the thickness by assigning the fragment's alpha value relative to the thickness reduction ratio. Emil provides us with a sample implementation of it, which works like this:

// Compute view-space w
float w = dot(ViewProj[3], float4(In.Position.xyz, 1.0f));

// Compute what radius a pixel wide wire would have
float pixel_radius = w * PixelScale;

// Clamp radius to pixel size. Fade with reduction in radius vs original.
float radius = max(actual_radius, pixel_radius);
float fade = actual_radius / radius;

// Compute final position
float3 position = In.Position + radius * normalize(In.Normal);

Maybe we can also get away with just using one-pixel clamped lines and then changing the alpha component based on the thickness (which we change in the data itself), i.e. we modulate the alpha of the strand as a pre-processing step based on the thickness, where the last 10-15% are interpolated towards 0, for thinning out.

Otherwise, we can use the strand coverage calculation as done in TressFX/TressFXRendering.hls#172 as well:

float ComputeCoverage(float2 p0, float2 p1, float2 pixelLoc, float2 winSize)
{
    // p0, p1, pixelLoc are in d3d clip space (-1 to 1)x(-1 to 1)

    // Scale positions so 1.f = half pixel width
    p0 *= winSize;
    p1 *= winSize;
    pixelLoc *= winSize;

    float p0dist = length(p0 - pixelLoc);
    float p1dist = length(p1 - pixelLoc);
    float hairWidth = length(p0 - p1);

    // will be 1.f if pixel outside hair, 0.f if pixel inside hair
    float outside = any(float2(step(hairWidth, p0dist), step(hairWidth, p1dist)));

    // if outside, set sign to -1, else set sign to 1
    float sign = outside > 0.f ? -1.f : 1.f;

    // signed distance (positive if inside hair, negative if outside hair)
    float relDist = sign * saturate(min(p0dist, p1dist));

    // returns coverage based on the relative distance
    // 0, if completely outside hair edge
    // 1, if completely inside hair edge
    return (relDist + 1.f) * 0.5f;
}

Re-organize GUI and Polish Demo

After we're done with everything else we should clean up the GUI and polish the demo. Right now a lot of the functionality is immutable, and we should add extra knobs so people can experiment with the method we have presented. There are a bunch of things that are still missing: changing hair parameters at runtime, modifying the light direction, adding *.vkhr files for the Cem Yuksel styles, simulation #17 (probably just plug in TressFX), model rendering for the raytracer so both rasterizer / raytracer scenes match, texture support for the model rendering (so we don't get the white spots, because our strands are pixel-sized, as done in the "new" Deus Ex), and an infinite white plane to get shadows in the scenes. More importantly: move buttons into the correct "separators" and remove the now "unnecessary" buttons, as they are confusing. None of these things should take long to fix (maybe combined they can take 1-2 days), but I just haven't gotten around to doing any of this yet.

We also need general documentation of the project, including the shaders and the more important parts of the renderer (the wrapper will probably be refactored into a separate project after the thesis).

Not a priority for now, but I want to keep this here so I remember to fix these things at a later point in time.

In fact, let's make a list:

  • Knobs for changing hair style parameters (diffuse / specular color and shininess) at run-time.
  • Changing light direction via the GUI (not possible right now, only the "color").
  • Add Cem Yuksel hair style scene files.
  • Crawl scenes for any new files, and also add the vkhr scene given via the CLI.
  • Add mesh rendering to the raytracer as well, so we can have more-or-less the same scenes.
  • Add a white plane below the hair style for all built-in vkhr scenes to demonstrate shadows.
  • Add simulation #17 running in compute e.g. via TressFX (left as future work).
  • Texture support so we get the same results as TressFX 3.1, and to remove the "white spots" too.
  • Re-organize the entire GUI to make more sense and allow people to change the parameters.
  • Add a way to visualize the light sources (e.g. via a billboard). (isn't needed :-P)

Write the GPU Voxelization Algorithm

We should eventually port the CPU voxelization algorithm to the GPU if we want to update the densities in real-time. Right now CPU voxelization of 1.8M hair vertices takes around 15ms, which I guess is "real-time", but it would be nice to do this on the GPU as well. As part of porting Dominik's HLSL shaders I've already used image2D, so allocating a read/write image3D shouldn't be too different. The only part I see as somewhat problematic is that we'll need imageAtomicAdd as part of the voxelization algorithm, which requires the underlying format to be r32i or r32ui as shown here, and can't be used with r8i or r8ui as we'd have liked (there is talk about format conversion, so it might still be possible). I also don't know the cost of imageAtomicAdd, so I need to find out if it's usable. Otherwise it should be quite straightforward, like this:

#version 460 core

#include "volume.glsl"

layout(std430, binding = 0) buffer Vertices {
    vec3 positions[]; // the hair style's vertices, in world space
};

layout(local_size_x = 512) in;

layout(binding = 2, r32ui) uniform uimage3D density;

void main() {
    // volume_bounds, volume_resolution and volume_origin come from volume.glsl.
    // Each invocation handles one vertex, so the dispatch must cover them all.
    vec3 voxel_size = volume_bounds / volume_resolution;
    vec3 voxel = (positions[gl_GlobalInvocationID.x] - volume_origin) / voxel_size;
    imageAtomicAdd(density, ivec3(floor(voxel)), 1);
}

Implement "Approximated Deep Shadows"

This feature depends on issue #10 being implemented first. As discussed, we'll start by implementing the Approximated Deep Shadow Map technique, like in Crystal Dynamics' 2013 Tomb Raider. It works like this:

  1. Render hair geometry from the perspective of a light source (only directional for now, no omni-light)
  2. Get the depth buffer from this rendering pass, and provide it as input into the next rendering passes.
  3. When shading, use the depth buffer to estimate shadows by filtering with e.g. a 3x3 kernel: compare the fragment's depth against the neighboring shadow-map depths to estimate how deep the strand lies inside the hair style, which gives a weighted average of the occluding depth for a pixel (x, y). This is somewhat similar to e.g. SSAO.

Jason's pseudo-code goes something like this:

// I think we ended up fudging with these numbers a bit, but this is what was in my notes

kernelSize = 3

// fiberSpacing controls the strength of the shadow factor
// Larger fiberSpacing means lighter shadow
// Smaller fiberSpacing means darker shadow

fiberSpacing = 250

invFiberWidth = 1 / (fiberSpacing * fiberRadius)  // fiberRadius was exposed through the TressFX hair data, though I forget what its value was

shadow = 0

scale = ShadowMapRes

lightLinearDepth =   linear depth of hair being shaded from light pov

weightTotal = 0

for x = (1-kernelSize)/2 ; x <= kernelSize/2 ; x++
for y = (1-kernelSize)/2 ; y <= kernelSize/2 ; y++
{

    sigma = (kernelSize/2)/2.4 // standard deviation, when kernel/2 > 3*sigma, it's close to zero, so can use 1.5 instead

    exponent = -1 * (x^2 + y^2)/ (2*sigma^2)

    localWeight = 1 / (2*PI*sigma^2) * pow( e, exponent )


    localShadowSample = linearized depth from shadow map offset by current <x, y> pair

    shadow += ApproximateDeepShadow(localShadowSample, lightLinearDepth, invFiberWidth) * localWeight

    weightTotal += localWeight
}

shadow /= weightTotal

ApproximateDeepShadow( linearShadowDepth, linearLightDepth, invFiberWidth)
{
    depthRange = max(0.f, linearLightDepth - linearShadowDepth)
    numFibers = depthRange * invFiberWidth

    if(depthRange > 1e-5) numFibers += 1

    // stronger shadow term the deeper the fragment is within the hair
    return pow(abs(1 - fiberAlpha), numFibers) // fiberAlpha was set through hair params and tuneable by artist
}
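Translated into GLSL this would look roughly like the sketch below. It assumes the shadow map already stores linearized depth, and the function and parameter names (filtered_deep_shadow, inv_fiber_width, fiber_alpha, ...) are illustrative rather than the renderer's actual interface:

// Attenuates light by the number of fibers between the light and the fragment,
// following the ApproximateDeepShadow() pseudo-code above.
float approximate_deep_shadow(float shadow_depth, float light_depth,
                              float inv_fiber_width, float fiber_alpha) {
    float depth_range = max(0.0f, light_depth - shadow_depth);
    float fibers = depth_range * inv_fiber_width;
    if (depth_range > 1e-5f)
        fibers += 1.0f;
    // Stronger shadow term the deeper the fragment is within the hair.
    return pow(abs(1.0f - fiber_alpha), fibers);
}

// 3x3 Gaussian-weighted filter over the light's linearized depth map.
// 'position' is the fragment in the light's space: xy = shadow map UV,
// z = linear depth as seen from the light's point of view.
float filtered_deep_shadow(sampler2D shadow_map, vec3 position,
                           float inv_fiber_width, float fiber_alpha) {
    const int kernel_size = 3;
    float sigma = float(kernel_size / 2) / 2.4f; // standard deviation, as in the notes
    vec2 texel  = 1.0f / vec2(textureSize(shadow_map, 0));

    float shadow = 0.0f;
    float weight_total = 0.0f;

    for (int x = -(kernel_size / 2); x <= kernel_size / 2; ++x)
    for (int y = -(kernel_size / 2); y <= kernel_size / 2; ++y) {
        float exponent = -float(x*x + y*y) / (2.0f * sigma * sigma);
        float weight   = 1.0f / (2.0f * 3.1415926f * sigma * sigma) * exp(exponent);
        float depth    = texture(shadow_map, position.xy + vec2(x, y) * texel).r;
        shadow += approximate_deep_shadow(depth, position.z,
                                          inv_fiber_width, fiber_alpha) * weight;
        weight_total += weight;
    }

    return shadow / weight_total;
}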

Create a Scene Graph for Both Renderers

Since both the rasterizer and the raytracer will render the same scene, we'll need a common scene structure that both use as a base, e.g. to upload geometry and build acceleration structures with the correct transforms and camera position. The idea is to build a very simple format for storing this, perhaps something like:

{
    "camera": {
        "fieldOfView": 75.0,
        "origin": [0, 0, 4],
        "lookAt": [0, 0, 0],
        "upward": [0, 1, 0]
    },

    "transform": {
        "scale":     [1, 1, 1],
        "translate": [0, 0, 0],
        "rotate": [0, 0, 0, 1],

        "models": [
            "woman/woman.obj"
        ],

        "styles": [
            "wCurly.hair"
        ]
    }
}

Estimate AO with the GPU Raymarcher

Now that we have a reference AO solution (#25) we should try to approximate it by using our raymarcher. In a nutshell, we can do this by shooting rays from the camera toward the strand that is to be shaded, and then "counting" the number of strands in the way (which is the local density of hair calculated by the voxelization). This gives us the expected AO as seen from the camera's point of view. This is of course not the same thing that is being calculated by the raytracer, so an alternative solution would be to sample the neighborhood around the strand we want to find the AO of (or even to shoot rays from the strand in random directions and accumulate the number of strands encountered). I think it's worth a shot to try both approaches and see which one seems most reasonable against the ground truth (the second one is likely to match the raytracer better).
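A minimal sketch of the first (camera-ray) variant could look like the function below, assuming the voxelized densities have been uploaded as a normalized 3D texture with a zero border; the function and parameter names (raymarched_ambient_occlusion, occlusion_strength, ...) are hypothetical:

// Approximate AO for a point on a strand by accumulating how much hair density
// the camera ray passes through before it reaches the point being shaded.
float raymarched_ambient_occlusion(sampler3D densities,
                                   vec3 camera_position, vec3 position,
                                   vec3 volume_origin, vec3 volume_bounds,
                                   uint steps, float occlusion_strength) {
    vec3 direction = position - camera_position;
    float dt = 1.0f / float(steps);

    float occluders = 0.0f;
    for (uint i = 0u; i < steps; ++i) {
        vec3 p   = camera_position + direction * (dt * (float(i) + 0.5f));
        vec3 uvw = (p - volume_origin) / volume_bounds; // normalized volume coordinates
        occluders += texture(densities, uvw).r * dt;
    }

    // More strands between the camera and the point => darker ambient term.
    return exp(-occluders * occlusion_strength);
}

The second variant would instead pick directions around the shading point itself and accumulate densities the same way, which should land closer to what the raytracer computes.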

1024³ Voxelization Crashes

Even though 256³ will probably work for our uses, I'm still curious why the segment-based voxelization goes ka-boom at 1024³. Update: this also happens with the vertex-based voxelization. It also doesn't crash in the voxelization itself; it's when we upload the data to the GPU. I have plenty of GPU memory, so that shouldn't be the problem. Maybe something with alignment? Hmm... going to have a closer look once I have more time.
