Any numbers we can see for performance. Been interested to see if a Jax LBM implementa

Performance numbers about xlb HOT 4 CLOSED

loliverhennigh commented on August 23, 2024

Performance numbers

from xlb.

Comments (4)

mehdiataei commented on August 23, 2024 1

Hello Oliver,

We will soon release the performance metrics in an accompanying paper. Roughly speaking, compared with a fully fused LBM kernel in a state-of-the-art C++ benchmark code, our version is approximately 6-7 times slower for lid-driven cavity flow. However, if the BC kernel in the C++ code isn't fused (which is often the case if you want to leverage the complex BCs that are available in XLB), the performance gap narrows to roughly 3-5 times (this is also the case if you compare performance for periodic BCs, such as the performance test case in Lettuce). While I haven't run tests on a V100, preliminary tests suggest XLB is significantly faster than Lettuce.
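For anyone who wants to reproduce this kind of comparison before the paper is out, LBM throughput is usually reported in MLUPS (million lattice updates per second). Below is a minimal sketch of how one might measure it; `measure_mlups`, `slowdown`, and the `step_fn` argument are hypothetical names for illustration and are not part of XLB's API.

```python
import time


def measure_mlups(step_fn, state, grid_shape, n_steps):
    """Time n_steps calls of step_fn and return (MLUPS, final_state).

    step_fn is a hypothetical callable that advances the simulation by
    one time step: new_state = step_fn(state).
    """
    n_cells = 1
    for n in grid_shape:
        n_cells *= n

    start = time.perf_counter()
    for _ in range(n_steps):
        state = step_fn(state)
    # With an asynchronous backend such as JAX, wait for the device to
    # finish its work at this point, before reading the clock.
    elapsed = time.perf_counter() - start

    mlups = n_cells * n_steps / (elapsed * 1e6)
    return mlups, state


def slowdown(reference_mlups, candidate_mlups):
    """How many times slower the candidate is than the reference."""
    return reference_mlups / candidate_mlups
```

A "6-7 times slower" figure then corresponds to `slowdown(reference, candidate)` returning a value between 6 and 7 on the same grid and the same hardware.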

A major advantage is that our code has ~96% scaling efficiency on a single DGX node and maintains respectable scaling even on up to 512 GPUs. As far as I remember, Lettuce wasn't multi-GPU (or multi-node) capable.
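As a rough illustration of what a scaling-efficiency figure means, the sketch below uses the usual definition: measured multi-GPU throughput divided by the ideal throughput (device count times single-GPU throughput). The throughput numbers and the 8-GPU node size are made up for the example, not measurements, and whether the ~96% above refers to weak or strong scaling is not stated here.

```python
def scaling_efficiency(throughput_1, throughput_n, n_devices):
    """Parallel efficiency: measured throughput over ideal linear scaling.

    throughput_1: throughput on a single device (e.g. in MLUPS)
    throughput_n: aggregate throughput on n_devices devices
    """
    if n_devices < 1 or throughput_1 <= 0:
        raise ValueError("need n_devices >= 1 and throughput_1 > 0")
    return throughput_n / (n_devices * throughput_1)


# Hypothetical numbers: 1000 MLUPS on one GPU, 7680 MLUPS on 8 GPUs.
efficiency = scaling_efficiency(1000.0, 7680.0, 8)
print(f"{efficiency:.0%}")
```

For weak scaling the problem size grows with the device count, and for strong scaling it stays fixed, so the two throughput values have to be measured accordingly.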

It's worth noting that there is ongoing work to close this performance gap further by integrating Triton kernels into portions of the code.


mehdiataei commented on August 23, 2024 1

Hey Oliver.

Thanks for your comments. In fact, we have discussed using Warp specifically for visualization tasks with NVIDIA extensively! (We are collaborating with the NVIDIA JAX team on this project, FYI.) Warp and USD would be quite useful for this purpose, especially when dealing with simulations with multiple billions of voxels.

The issue with Warp is the license agreement, which is incompatible with Apache 2.0. I have raised this issue with directors at NVIDIA, but haven't heard back about any progress.

I am happy to chat more about this if you're interested. I have added you on LinkedIn, or please shoot me an email at [email protected].


loliverhennigh commented on August 23, 2024

Fantastic! This is very exciting to hear. Have you considered using either Taichi Lang or Warp for writing the kernels (https://github.com/NVIDIA/warp)? I have experience with both and found them to be particularly good for things like this. I have an LBM solver implemented in Warp and am getting the same performance as FluidX3D (https://github.com/ProjectPhysX/FluidX3D). Warp also has pretty good JAX integration, I think. I haven't tried implementing LBM in Taichi, but I have an explicit finite volume solver and it appears to be getting SOA performance, although I am less confident of that. I have also tried Triton a bit but found it a little difficult to get working for this kind of work. If you do implement it in Triton, I will be very interested to see how it goes though :).


loliverhennigh commented on August 23, 2024

Sorry, one more comment: if you are interested in getting rendering like FluidX3D's running, I would also suggest looking at either Warp or Taichi. Implementing ray marching/tracing is kind of complicated in a tensor-based framework like JAX, and I can't imagine implementing it in Triton. Here is a very simple ray marching of the density contours of an FV solver in Taichi: https://www.youtube.com/watch?v=xcZcHbvMe-g.
