autodesk / xlb

XLB: Accelerated Lattice Boltzmann (XLB) based on JAX for Physics-based ML

License: Other

Python 100.00%

xlb's Introduction

XLB: A Differentiable Massively Parallel Lattice Boltzmann Library in Python for Physics-Based Machine Learning

XLB is a fully differentiable 2D/3D Lattice Boltzmann Method (LBM) library that leverages hardware acceleration. It is built on top of the JAX library and is designed to solve fluid dynamics problems in a computationally efficient and differentiable manner. This combination of features makes it particularly well suited to applications in physics-based machine learning.

Accompanying Paper

Please refer to the accompanying paper for benchmarks, validation, and more details about the library.

Citing XLB

If you use XLB in your research, please cite the following paper:

@article{ataei2024xlb,
  title={{XLB}: A differentiable massively parallel lattice {Boltzmann} library in {Python}},
  author={Ataei, Mohammadmehdi and Salehipour, Hesam},
  journal={Computer Physics Communications},
  volume={300},
  pages={109187},
  year={2024},
  publisher={Elsevier}
}

Key Features

Integration with JAX Ecosystem: The library can be easily integrated with JAX's robust ecosystem of machine learning libraries such as Flax, Haiku, Optax, and many more.
Differentiable LBM Kernels: XLB provides differentiable LBM kernels that can be used in differentiable physics and deep learning applications.
Scalability: XLB is capable of scaling on distributed multi-GPU systems, enabling the execution of large-scale simulations on hundreds of GPUs with billions of cells.
Support for Various LBM Boundary Conditions and Kernels: XLB supports several LBM boundary conditions and collision kernels.
User-Friendly Interface: Written entirely in Python, XLB emphasizes a highly accessible interface that allows users to extend the library with ease and quickly set up and run new simulations.
Leverages JAX Array and Shardmap: The library uses JAX's unified jax.Array type and shard_map, giving users a NumPy-like interface. Users can focus on the semantics of a simulation and leave performance optimizations to the compiler.
Platform Versatility: The same XLB code runs on multi-core CPUs, single- and multi-GPU systems, and TPUs, and supports distributed runs on multi-GPU systems and TPU Pod slices.
Visualization: XLB provides several visualization options, including in-situ, on-GPU rendering using PhantomGaze.

Showcase

On-GPU in-situ rendering using the PhantomGaze library (no I/O): flow over a NACA airfoil using a KBC Lattice Boltzmann simulation with ~10 million cells.

DrivAer model in a wind tunnel using a KBC Lattice Boltzmann simulation with ~317 million cells

Airflow into, out of, and within a building (~400 million cells)

The stages of a fluid density field from an initial state to the emergence of the "XLB" pattern through deep learning optimization at timestep 200 (see paper for details)

Lid-driven Cavity flow at Re=100,000 (~25 million cells)

Capabilities

LBM

BGK collision model (Standard LBM collision model)
KBC collision model (unconditionally stable for flows with high Reynolds number)
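For reference, the BGK model relaxes every population toward a local equilibrium at a single rate ω. The textbook second-order form in lattice units (speed of sound squared c_s² = 1/3, Δt = 1) is written out below; it is the standard formulation, not a transcription of XLB's source, although it matches the `rho * w * (1.0 + 1.0 * cu + 0.5 * cu**2 - usqr)` line quoted in one of the tracebacks if cu stands for 3 c_i·u and usqr for (3/2)|u|².

```latex
f_i(\mathbf{x} + \mathbf{c}_i,\, t + 1) = f_i(\mathbf{x}, t) - \omega \left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\rho, \mathbf{u}) \right]

f_i^{\mathrm{eq}}(\rho, \mathbf{u}) = w_i \, \rho \left[ 1 + 3\, \mathbf{c}_i \cdot \mathbf{u} + \tfrac{9}{2} (\mathbf{c}_i \cdot \mathbf{u})^2 - \tfrac{3}{2} |\mathbf{u}|^2 \right]

\rho = \sum_i f_i, \qquad \rho\, \mathbf{u} = \sum_i f_i\, \mathbf{c}_i, \qquad \nu = \frac{1}{3} \left( \frac{1}{\omega} - \frac{1}{2} \right)
```

The last relation links the relaxation rate to the kinematic viscosity, which is why high-Reynolds-number runs push ω toward 2 and motivate more robust models such as KBC.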

Machine Learning

Easy integration with JAX's ecosystem of machine learning libraries
Differentiable LBM kernels
Differentiable boundary conditions

Lattice Models

D2Q9
D3Q19
D3Q27 (Must be used for KBC simulation runs)
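Each lattice model pairs a discrete velocity set with weights that must reproduce the correct low-order moments. As an illustration, the sketch below writes out the standard D2Q9 set and checks those conditions with NumPy. The ordering of directions is an arbitrary choice made here and is not necessarily XLB's internal ordering.

```python
import numpy as np

# D2Q9 velocity set: rest particle, 4 axis-aligned and 4 diagonal directions
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
# Weights matching the ordering above
w = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)

cs2 = 1.0 / 3.0  # lattice speed of sound squared

zeroth = w.sum()                             # must equal 1
first = (w[:, None] * c).sum(axis=0)         # must equal (0, 0)
second = np.einsum("i,ia,ib->ab", w, c, c)   # must equal cs2 * identity

print(zeroth, first, second, sep="\n")
```

D3Q19 and D3Q27 satisfy the same conditions in three dimensions, with more directions and different weights.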

Compute Capabilities

Distributed Multi-GPU support
Mixed-Precision support (store vs compute)
Out-of-core support (coming soon)
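"Store vs compute" precision means populations can be kept in memory at one precision while arithmetic runs at another (the run logs quoted in the issues show this as a "Precision Policy" such as f32/f32 or f64/f64). A minimal NumPy sketch of the idea, assuming a hypothetical float16 storage / float32 compute pairing; the dtype constants and the function name are illustrative and do not reproduce XLB's policy handling.

```python
import numpy as np

# Hypothetical policy: store populations in float16, compute in float32
STORE_DTYPE = np.float16
COMPUTE_DTYPE = np.float32


def relax(f_stored, feq_stored, omega):
    """BGK relaxation computed at COMPUTE_DTYPE; the result is cast back to STORE_DTYPE."""
    f = f_stored.astype(COMPUTE_DTYPE)        # upcast before arithmetic
    feq = feq_stored.astype(COMPUTE_DTYPE)
    f_post = f - omega * (f - feq)            # all arithmetic in float32
    return f_post.astype(STORE_DTYPE)         # downcast for storage


f = np.full((4, 9), 0.1, dtype=STORE_DTYPE)
feq = np.full((4, 9), 0.2, dtype=STORE_DTYPE)
out = relax(f, feq, 1.0)
```

Halving the storage width halves memory traffic, which matters because LBM is usually bandwidth bound; computing at higher precision limits the round-off that would otherwise accumulate over many time steps.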

Output

Binary and ASCII VTK output (based on PyVista library)
In-situ rendering using PhantomGaze library
Orbax-based distributed asynchronous checkpointing
Image Output
3D mesh voxelizer using trimesh

Boundary conditions

Equilibrium BC: In this boundary condition, the fluid populations are assumed to be at equilibrium. It can be used to prescribe velocity or pressure.
Full-Way Bounceback BC: In this boundary condition, the velocity of the fluid populations is reflected back to the fluid side of the boundary, resulting in zero fluid velocity at the boundary.
Half-Way Bounceback BC: Similar to the Full-Way Bounceback BC, but the populations are reflected within the same time step, which places the effective wall halfway between the boundary node and its solid neighbor. As a result, the fluid velocity at the boundary nodes themselves is generally non-zero.
Do Nothing BC: In this boundary condition, the fluid populations are allowed to pass through the boundary without any reflection or modification.
ZouHe BC: This boundary condition is used to impose a prescribed velocity or pressure profile at the boundary.
Regularized BC: This boundary condition is also used to impose a prescribed velocity or pressure profile at the boundary. It is more stable than the ZouHe BC, but computationally more expensive.
Extrapolation Outflow BC: A type of outflow boundary condition that uses extrapolation to avoid strong wave reflections.
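Interpolated Bounceback BC: The interpolated bounce-back scheme of Bouzidi et al., which accounts for the actual position of the wall between lattice nodes and therefore represents curved boundaries more accurately than simple bounce-back.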
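Full-way bounce-back is the simplest of these to express: on every solid node each population is sent back along the opposite lattice direction, so the node's momentum flips sign while its mass is unchanged. A minimal NumPy sketch on a D2Q9 grid; the (nx, ny, 9) array layout, the direction ordering and the function name are assumptions made for illustration and are not XLB's implementation.

```python
import numpy as np

# D2Q9 velocity set used by this sketch
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
# opposite[i] is the index j with c[j] == -c[i]
opposite = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])


def full_way_bounceback(f, solid_mask):
    """Reverse every population on solid nodes.

    f: (nx, ny, 9) populations; solid_mask: (nx, ny) boolean array.
    """
    f = f.copy()
    f[solid_mask] = f[solid_mask][:, opposite]
    return f


rng = np.random.default_rng(0)
f = rng.random((3, 3, 9))
mask = np.zeros((3, 3), dtype=bool)
mask[1, 1] = True
out = full_way_bounceback(f, mask)
```

In a full simulation this reversal runs after streaming, so populations that entered a solid node travel back into the fluid on the next step; the half-way variant instead reflects them within the same step.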

Installation Guide

To use XLB, you must first install JAX and other dependencies using the following commands:

Please refer to https://github.com/google/jax for the latest installation documentation. The following table is taken from JAX's GitHub page.

Hardware	Instructions
CPU	`pip install -U "jax[cpu]"`
NVIDIA GPU on x86_64	`pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html`
Google TPU	`pip install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html`
AMD GPU	Use Docker or build from source.
Apple GPU	Follow Apple's instructions.

Note: We encountered problems running XLB on Apple GPUs because the Metal backend does not support certain operations. We advise using the CPU backend on macOS. We will test XLB on Apple GPUs in the future and update this section accordingly.

Install dependencies:

pip install pyvista numpy matplotlib Rtree trimesh jmp orbax-checkpoint termcolor
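Before running an example, it can help to confirm that JAX sees the intended hardware. The check below is a suggestion rather than part of XLB, and uses only standard JAX calls (jax.default_backend, jax.devices, jax.jit).

```python
import jax
import jax.numpy as jnp

# Report the installed JAX version and the backend it selected
print("JAX version:", jax.__version__)
print("Default backend:", jax.default_backend())  # e.g. "cpu", "gpu" or "tpu"
print("Devices:", jax.devices())


@jax.jit
def sum_of_squares(x):
    # A tiny compiled computation to confirm the backend actually executes code
    return jnp.sum(x ** 2)


result = sum_of_squares(jnp.arange(4.0))
print("Sum of squares of [0, 1, 2, 3]:", result)
```

If the backend reports "cpu" on a GPU machine, the GPU-enabled JAX build from the table above is most likely not installed.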

Run an example:

git clone https://github.com/Autodesk/XLB
cd XLB
export PYTHONPATH=.
python3 examples/CFD/cavity2d.py

Roadmap

Work in Progress (WIP)

Note: Some of the work-in-progress features can be found in the branches of the XLB repository. For contributions to these features, please reach out.

🚀 Warp Backend: Achieving state-of-the-art performance by leveraging the Warp framework in combination with JAX.
🌐 Grid Refinement: Implementing adaptive mesh refinement techniques for enhanced simulation accuracy.
⚡ Multi-GPU Acceleration using Neon + Warp: Using Neon's data structure for improved scaling.
💾 Out-of-Core Computations: Enabling simulations that exceed available GPU memory, suitable for CPU+GPU coherent memory models such as NVIDIA's Grace Superchips.
🗜️ GPU Accelerated Lossless Compression and Decompression: Implementing high-performance lossless compression and decompression techniques for larger-scale simulations and improved performance.
🌡️ Fluid-Thermal Simulation Capabilities: Incorporating heat transfer and thermal effects into fluid simulations.
🎯 Adjoint-based Shape and Topology Optimization: Implementing gradient-based optimization techniques for design optimization.
🧠 Machine Learning Accelerated Simulations: Leveraging machine learning to speed up simulations and improve accuracy.
📉 Reduced Order Modeling using Machine Learning: Developing data-driven reduced-order models for efficient and accurate simulations.

Wishlist

Contributions to these features are welcome. Please submit PRs for the Wishlist items.

🌊 Free Surface Flows: Simulating flows with free surfaces, such as water waves and droplets.
📡 Electromagnetic Wave Propagation: Simulating the propagation of electromagnetic waves.
🛩️ Supersonic Flows: Simulating supersonic flows.
🌊🧱 Fluid-Solid Interaction: Modeling the interaction between fluids and solid objects.
🧩 Multiphase Flow Simulation: Simulating flows with multiple immiscible fluids.
🔥 Combustion: Simulating combustion processes and reactive flows.
🪨 Particle Flows and Discrete Element Method: Incorporating particle-based methods for granular and particulate flows.
🔧 Better Geometry Processing Pipelines: Improving the handling and preprocessing of complex geometries for simulations.

xlb's People

Contributors

mehdiataei hsalehipour nouiz zhang9song changyu-hu ghas-results loliverhennigh lcrypto dmytrosytnyk harrybraviner massimim cruxdevstuff biackthorn slitvinov sirenri2001

xlb's Issues

assign_fields_sharded with pre-defined initialization

The assign_fields_sharded doesn't work correctly if we define initial rho0 and u0 on multi-GPU systems. It is a simple fix and I'll submit a PR for it. Adding it here so that I don't forget about it.

The demo from the paper

The demo in the accompanying paper (Section 6.2, Differentiable flow control with deep learning) interests me very much, but I can't find it in the examples/ folder. Could you please point me to the relevant file?

Problem applying pressure-based boundary condition using ZouHe

If I try to impose a specific pressure at an inlet or outlet boundary using ZouHe, I get TypeError: mul got incompatible shapes for broadcasting: (998,), (9,). I used nx = 1400, ny = 1000, and tried to apply a specific pressure on the left vertical inlet wall (x = 0, y = 0:1000). The same problem occurs if I apply a specific pressure on the right outlet boundary.

Outflow BC

outlet = self.boundingBoxIndices['right']
rho_outlet = np.ones(outlet.shape[0], dtype=self.precisionPolicy.compute_dtype)
self.BCs.append(ZouHe(tuple(outlet.T), self.gridInfo, self.precisionPolicy, 'pressure', rho_outlet))

Simulation Parameters for Cylinder
           Parameter | Value
               Omega | 0.5518611517342236
    Grid Points in X | 1400
    Grid Points in Y | 1000
    Grid Points in Z | 0
      Dimensionality | 2
    Precision Policy | f64/f64
        Lattice Type | D2Q9
     Checkpoint Rate | 0
Checkpoint Directory | ./checkpoints
 Downsampling Factor | 1
     Print Info Rate | 200
            I/O Rate | 100
       Compute MLUPS | False
  Restore Checkpoint | False
             Backend | gpu
   Number of Devices | 1
Volumetric Ratio of Solid : 0.502656
Time to create the grid mask: 0.3005838394165039
Time to create the local masks and normal arrays: 11.985322952270508
WARNING: Default initial conditions assumed: density = 1, velocity = 0
To set explicit initial density and velocity, use self.initialize_macroscopic_fields.
jax.errors.SimplifiedTraceback: For simplicity, JAX has removed its internal frames from the traceback of the following exception. Set JAX_TRACEBACK_FILTERING=off to include these.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/RunCodeHere/Projects/XLB/Permeability02_benchmark.py", line 297, in <module>
    sim.run(t_max)
  File "/mnt/c/RunCodeHere/Projects/XLB/src/base.py", line 905, in run
    f, fstar = self.step(f, timestep, return_fpost=self.returnFpost)
  File "/mnt/c/RunCodeHere/Projects/XLB/src/base.py", line 841, in step
    f_poststreaming = self.apply_bc(f_poststreaming, f_postcollision, timestep, "PostStreaming")
  File "/mnt/c/RunCodeHere/Projects/XLB/src/base.py", line 804, in apply_bc
    fout = fout.at[bc.indices].set(bc.apply(fout, fin))
  File "/mnt/c/RunCodeHere/Projects/XLB/src/boundary_conditions.py", line 781, in apply
    feq = self.calculate_equilibrium(fout)
  File "/mnt/c/RunCodeHere/Projects/XLB/src/boundary_conditions.py", line 738, in calculate_equilibrium
    feq = self.equilibrium(rho, vel)
  File "/mnt/c/RunCodeHere/Projects/XLB/src/boundary_conditions.py", line 285, in equilibrium
    feq = rho * self.lattice.w * (1.0 + 1.0 * cu + 0.5 * cu**2 - usqr)
  File "/home/shouvik/anaconda3/lib/python3.9/site-packages/jax/_src/numpy/array_methods.py", line 743, in op
    return getattr(self.aval, f"{name}")(self, *args)
  File "/home/shouvik/anaconda3/lib/python3.9/site-packages/jax/_src/numpy/array_methods.py", line 271, in deferring_binary_op
    return binary_op(*args)
  File "/home/shouvik/anaconda3/lib/python3.9/site-packages/jax/_src/numpy/ufuncs.py", line 99, in fn
    return lax_fn(x1, x2) if x1.dtype != np.bool else bool_lax_fn(x1, x2)
TypeError: mul got incompatible shapes for broadcasting: (998,), (9,).
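The two shapes in the error match a density vector with one value per boundary node, (998,), being multiplied by the nine D2Q9 lattice weights, (9,), in the equilibrium line of the traceback above. NumPy-style broadcasting aligns trailing axes, so lengths 998 and 9 cannot be combined, whereas a column of shape (998, 1) broadcasts to (998, 9). The sketch reproduces this with NumPy, which raises ValueError where JAX raises TypeError. It assumes, without confirming against XLB's source, that the ZouHe pressure value is expected as such a column, i.e. something like np.ones((outlet.shape[0], 1), dtype=...) instead of np.ones(outlet.shape[0], ...).

```python
import numpy as np

# D2Q9 lattice weights
w = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)
n_nodes = 998  # number of boundary nodes, as in the error message

rho_vector = np.ones(n_nodes)        # shape (998,), as in the reported code
rho_column = np.ones((n_nodes, 1))   # shape (998, 1): one row per boundary node

# Only the rho * w factor of the equilibrium expression is reproduced here
try:
    rho_vector * w
    vector_broadcasts = True
except ValueError:                   # NumPy's counterpart of JAX's TypeError
    vector_broadcasts = False

prefactor = rho_column * w           # broadcasts to shape (998, 9)
```

Each row of the (998, 9) result holds that node's density times the nine weights, which is the per-node equilibrium prefactor.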

"AttributeError: Unrecognized config option: jax_array" in windtunnel3d.py example

Line 33 of windtunnel3d.py, jax.config.update('jax_array', True), throws an exception. It looks like it should be removed.

Executing the windtunnel3d.py example in a Google Colab instance throws this exception:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/content/XLB/examples/CFD/windtunnel3d.py in <module>
     31 # disable JIt compilation
     32 
---> 33 jax.config.update('jax_array', True)
     34 
     35 class Car(KBCSim):

/usr/local/lib/python3.10/dist-packages/jax/_src/config.py in update(self, name, val)
     85   def update(self, name, val):
     86     if name not in self._value_holders:
---> 87       raise AttributeError(f"Unrecognized config option: {name}")
     88     self._value_holders[name]._set(val)
     89 

AttributeError: Unrecognized config option: jax_array

This happens on an L4 instance. I install dependencies by executing, in Colab notebook cells:

!pip install pyvista numpy matplotlib Rtree trimesh jmp orbax-checkpoint termcolor

And then cloning the repository and executing the example:

!git clone https://github.com/Autodesk/XLB

%run XLB/examples/CFD/windtunnel3d.py

It looks like this option was removed after the jax.Array migration in JAX 0.4.1 (see https://jax.readthedocs.io/en/latest/jax_array_migration.html), so line 33 should simply be removed. Thanks!
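Deleting the line is the simplest fix. If a script has to keep working on both older JAX releases (where the option still exists) and newer ones (where it raises the AttributeError shown above), one option is a small guard. The helper below is hypothetical and written against any object with an update(name, value) method, so it can be called as update_config_if_supported(jax.config, "jax_array", True).

```python
def update_config_if_supported(config, name, value):
    """Apply config.update(name, value); return False if the option is unknown.

    Newer JAX releases raise AttributeError("Unrecognized config option: ...")
    for options that have been removed, such as jax_array.
    """
    try:
        config.update(name, value)
        return True
    except AttributeError:
        return False
```

Returning a flag rather than silently ignoring the failure lets the caller log which options were skipped.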

Significant slowdown jaxlib 0.4.16 vs 0.4.14

In the XLB library, MLUPS (Millions of Lattice Updates per Second) dropped by about 12% (from ~827 to ~726 MLUPS) after updating jaxlib from 0.4.14 to 0.4.16.

Results on a single RTX 6000 Ada:

Version 0.4.14:

omega =  0.4918839153959666
XLA backend: gpu
Number of XLA devices available: 1
WARNING: Checkpointing is disabled for this simulation.
Time to create the grid connectivity bitmask: 0.17581534385681152
Time to create the local bitmasks and normal arrays: 6.792882680892944
WARNING: Default initial conditions assumed: density = 1, velocity = 0
         To set explicit initial density and velocity, use self.initialize_macroscopic_fields.
Domain: 512 x 512 x 512
Number of voxels: 134217728
MLUPS: 827.4825740398888

Version 0.4.16:

WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1695236429.981993  393900 tfrt_cpu_pjrt_client.cc:349] TfrtCpuClient created.
omega =  0.4918839153959666
XLA backend: gpu
Number of XLA devices available: 1
WARNING: Checkpointing is disabled for this simulation.
Time to create the grid connectivity bitmask: 0.17301297187805176
Time to create the local bitmasks and normal arrays: 6.891621112823486
WARNING: Default initial conditions assumed: density = 1, velocity = 0
         To set explicit initial density and velocity, use self.initialize_macroscopic_fields.
W0000 00:00:1695236443.334162  393900 hlo_rematerialization.cc:2946] Can't reduce memory use below 27.08GiB (29077237923 bytes) by rematerialization; only reduced to 29.54GiB (31714181580 bytes), down from 29.54GiB (31714181580 bytes) originally
Domain: 512 x 512 x 512
Number of voxels: 134217728
MLUPS: 726.4662730520944
I0000 00:00:1695236480.893240  393900 tfrt_cpu_pjrt_client.cc:352] TfrtCpuClient destroyed.

Notice the new warnings tfrt_cpu_pjrt_client and hlo_rematerialization generated in version 0.4.16.

Steps to reproduce:
Follow the installation instructions in the README and run:

python examples/performance/MLUPS3d.py 256 200

(the first argument is the number of voxels in each dimension and the second is the number of iterations)
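MLUPS is the number of lattice nodes times the number of time steps, divided by the wall-clock time in seconds and by 10^6. The sketch below encodes that definition and computes the relative change between the two MLUPS values reported in this issue; the function names are illustrative.

```python
def mlups(num_voxels, num_iterations, elapsed_seconds):
    """Millions of lattice-node updates per second."""
    return num_voxels * num_iterations / elapsed_seconds / 1e6


def relative_drop(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# MLUPS values reported above for the 512 x 512 x 512 domain
mlups_0414 = 827.4825740398888
mlups_0416 = 726.4662730520944

drop = relative_drop(mlups_0414, mlups_0416)
print(f"Relative drop: {drop:.1%}")
```

With the reported numbers the drop is about 12.2%.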

folder stl-files missing

When executing windtunnel3d.py, get the following issue:

ValueError: string is not a file: stl-files/DrivAer-Notchback.stl

There is no stl-files folder, and no STL file anywhere else in the repository.

Delete the pull request that I accidentally created

Hi,
I accidentally created a pull request while working on a branch of my fork. Could you help me delete it? I'm working on behalf of my company, and the code I wrote might involve copyright issues.
Thanks!

Performance numbers

Are there any performance numbers available? I'm interested to see whether a JAX LBM implementation can get close to optimal performance. Lettuce LBM, for example, is around 20x slower: https://github.com/lettucecfd/lettuce
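A common yardstick for "close to optimal" is the memory-bandwidth bound. LBM is usually bandwidth limited, and any implementation must at least read and write every population once per node update, i.e. 2 · Q · (bytes per value) bytes per node. Dividing the device's memory bandwidth by that figure gives an upper bound on MLUPS. The sketch assumes exactly that minimal traffic; codes that keep two population arrays or read extra masks move more data, so their practical ceiling is lower. The bandwidth value used below is a placeholder, not a figure for any particular GPU.

```python
def bandwidth_bound_mlups(bandwidth_bytes_per_s, q, bytes_per_value):
    """Upper bound on MLUPS if each update reads and writes all Q populations once."""
    bytes_per_update = 2 * q * bytes_per_value
    return bandwidth_bytes_per_s / bytes_per_update / 1e6


# D3Q19 in single precision moves 2 * 19 * 4 = 152 bytes per node update.
# 1.0e12 bytes/s is a placeholder; substitute the bandwidth of the actual device.
bound = bandwidth_bound_mlups(1.0e12, q=19, bytes_per_value=4)
print(f"Bandwidth-bound ceiling: {bound:.0f} MLUPS")
```

Dividing a measured MLUPS value by this bound gives the fraction of the bandwidth ceiling an implementation achieves, which makes comparisons across libraries and GPUs more meaningful than raw MLUPS.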

cavity2d.py NotImplementedError: CopyRange not supported

Please tell me how to fix it. Thanks in advance.

Traceback (most recent call last):
  File "e:\code\jupyter\XLB\setup.py", line 99, in <module>
    sim.run(5000)
  File "e:\code\jupyter\XLB\src\base.py", line 928, in run
    self.mngr.save(timestep, state)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\site-packages\orbax\checkpoint\checkpoint_manager.py", line 515, in save
    self._checkpointers[k].save(item_dir, item, **kwargs)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\site-packages\orbax\checkpoint\checkpointer.py", line 150, in save
    self._handler.finalize(tmpdir)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\site-packages\orbax\checkpoint\pytree_checkpoint_handler.py", line 1411, in finalize
    type_handlers.merge_ocdbt_per_process_files(directory)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\site-packages\orbax\checkpoint\type_handlers.py", line 744, in merge_ocdbt_per_process_files
    asyncio.run(open_and_copy())
  File "D:\SomeApps\Miniconda3\envs\python39\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\asyncio\base_events.py", line 647, in run_until_complete
    return future.result()
  File "D:\SomeApps\Miniconda3\envs\python39\lib\site-packages\orbax\checkpoint\type_handlers.py", line 742, in open_and_copy
    await asyncio.gather(*copy_ops)
  File "D:\SomeApps\Miniconda3\envs\python39\lib\asyncio\tasks.py", line 688, in _wrap_awaitable
    return (yield from awaitable.__await__())
NotImplementedError: CopyRange not supported

About the Physics-based machine learning demos

Hello there!

Firstly, I would like to express my gratitude for developing such an invaluable tool.

I'm currently trying to combine LBM with machine learning using XLB, but I don't know how. I read in the paper about the physics-based machine learning demos, Enhancing coarse-grained simulations using deep learning correctors and Differentiable flow control with deep learning, but I can't find any code for them in the repo.

Could you possibly provide some guidance on how to implement these concepts? Alternatively, another simple demo illustrating the integration of machine learning with LBM would be immensely helpful.
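The core idea behind both demos is that every LBM operation written in jax.numpy can be differentiated with jax.grad. The self-contained sketch below illustrates only that idea: it deliberately does not use XLB's classes (so no XLB API is assumed), runs a periodic D2Q9 BGK simulation of a decaying shear wave, and differentiates the final kinetic energy with respect to the initial wave amplitude. All function names are hypothetical, and a learned corrector or controller would replace the scalar amplitude with network parameters.

```python
import jax
import jax.numpy as jnp

# D2Q9 velocity set and weights; shifts stay Python ints for jnp.roll
C_LIST = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
          (1, 1), (-1, 1), (-1, -1), (1, -1)]
C = jnp.array(C_LIST, dtype=jnp.float32)
W = jnp.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4, dtype=jnp.float32)


def equilibrium(rho, u):
    """Second-order BGK equilibrium; rho: (nx, ny), u: (nx, ny, 2) -> (nx, ny, 9)."""
    cu = 3.0 * jnp.einsum("xyd,qd->xyq", u, C)
    usqr = 1.5 * jnp.sum(u ** 2, axis=-1, keepdims=True)
    return rho[..., None] * W * (1.0 + cu + 0.5 * cu ** 2 - usqr)


def macroscopic(f):
    """Density and velocity moments of the populations."""
    rho = jnp.sum(f, axis=-1)
    u = jnp.einsum("xyq,qd->xyd", f, C) / rho[..., None]
    return rho, u


def step(f, omega):
    """One BGK collision followed by streaming on a periodic domain."""
    rho, u = macroscopic(f)
    f_post = f - omega * (f - equilibrium(rho, u))
    return jnp.stack(
        [jnp.roll(f_post[..., q], shift=C_LIST[q], axis=(0, 1)) for q in range(9)],
        axis=-1,
    )


def final_kinetic_energy(amplitude, nx=32, ny=32, steps=50, omega=1.0):
    """Simulate u_x = A sin(2 pi y / ny) and return the final mean of u_x**2."""
    y = jnp.arange(ny)
    ux = amplitude * jnp.sin(2 * jnp.pi * y / ny)
    u0 = jnp.zeros((nx, ny, 2)).at[..., 0].set(jnp.broadcast_to(ux, (nx, ny)))
    f = equilibrium(jnp.ones((nx, ny)), u0)
    f = jax.lax.fori_loop(0, steps, lambda _, f: step(f, omega), f)
    _, u = macroscopic(f)
    return jnp.mean(u[..., 0] ** 2)


# Gradient of a simulation output with respect to a simulation input
energy = final_kinetic_energy(0.01)
d_energy = jax.grad(final_kinetic_energy)(0.01)
print(energy, d_energy)
```

Because the energy scales roughly with the amplitude squared, the gradient should be close to 2 · energy / amplitude, which is a quick sanity check that differentiation through the time loop works.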

example program running error

python examples/CFD/cavity2d.py
'rm' is not recognized as an internal or external command,
operable program or batch file.
WARNING:absl:Configured `CheckpointManager` using deprecated legacy API. Please follow the instructions at https://orbax.readthedocs.io/en/latest/api_refactor.html to migrate by August 1st, 2024.
Simulation Parameters for Cavity
           Parameter | Value
               Omega | 1.2523481527864746
    Grid Points in X | 200
    Grid Points in Y | 200
    Grid Points in Z | 0
      Dimensionality | 2
    Precision Policy | f32/f32
        Lattice Type | D2Q9
     Checkpoint Rate | 1000
Checkpoint Directory | C:\Users\ZZL\Desktop\temp\XLB\checkpoints
 Downsampling Factor | 1
     Print Info Rate | 100
            I/O Rate | 100
       Compute MLUPS | False
  Restore Checkpoint | False
             Backend | cpu
   Number of Devices | 1
Time to create the grid mask: 0.11803746223449707
Time to create the local masks and normal arrays: 0.3295261859893799
WARNING: Default initial conditions assumed: density = 1, velocity = 0
To set explicit initial density and velocity, use self.initialize_macroscopic_fields.
Timestep 0 of 5000 completed
Saving data at timestep 0/5000
Saved .\fields_0000000.vtk in 0.003017 seconds.
Saved .\BCs_0000000.vtk in 0.000519 seconds.
Saving checkpoint at timestep 0/5000
WARNING:absl:Attempted to create temporary directory C:\Users\ZZL\Desktop\temp\XLB\checkpoints\0.orbax-checkpoint-tmp-0 which already exists. Removing existing directory since it is not finalized.
Traceback (most recent call last):
  File "C:\Users\ZZL\Desktop\temp\XLB\examples\CFD\cavity2d.py", line 96, in <module>
    sim.run(5000)
  File "C:\Users\ZZL\Desktop\temp\XLB\src\base.py", line 928, in run
    self.mngr.save(timestep, state)
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\checkpoint_manager.py", line 1110, in save
    self._checkpointer.save(save_directory, args=args)
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\checkpointer.py", line 186, in save
    self._handler.finalize(tmpdir)
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\composite_checkpoint_handler.py", line 514, in finalize
    handler.finalize(self._get_item_directory(directory, item_name))
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\pytree_checkpoint_handler.py", line 759, in finalize
    self._handler_impl.finalize(directory)
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\base_pytree_checkpoint_handler.py", line 1064, in finalize
    type_handlers.merge_ocdbt_per_process_files(directory)
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\type_handlers.py", line 670, in merge_ocdbt_per_process_files
    asyncio.run(open_and_copy())
  File "D:\aruanjian\Python311\Lib\site-packages\nest_asyncio.py", line 30, in run
    return loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\site-packages\nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\asyncio\futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "D:\aruanjian\Python311\Lib\asyncio\tasks.py", line 279, in __step
    result = coro.throw(exc)
             ^^^^^^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\site-packages\orbax\checkpoint\type_handlers.py", line 667, in open_and_copy
    await asyncio.gather(*copy_ops)
  File "D:\aruanjian\Python311\Lib\asyncio\tasks.py", line 349, in __wakeup
    future.result()
  File "D:\aruanjian\Python311\Lib\asyncio\tasks.py", line 279, in __step
    result = coro.throw(exc)
             ^^^^^^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\asyncio\tasks.py", line 694, in _wrap_awaitable
    return (yield from awaitable.__await__())
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\asyncio\futures.py", line 287, in __await__
    yield self # This tells Task to wait for completion.
    ^^^^^^^^^^
  File "D:\aruanjian\Python311\Lib\asyncio\tasks.py", line 349, in __wakeup
    future.result()
  File "D:\aruanjian\Python311\Lib\asyncio\futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
NotImplementedError: CopyRange not supported
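The "'rm' is not recognized" message at the top of this log is Windows' response to a Unix rm command, which suggests the example shells out to rm to clear old output before running; it is separate from the CopyRange error that ends the traceback. Assuming that call is meant to delete previous .vtk and .png outputs in the working directory (an assumption, since the script itself is not shown here), a portable replacement needs only the standard library. The function name and default patterns are hypothetical.

```python
import glob
import os


def remove_old_outputs(directory=".", patterns=("*.vtk", "*.png")):
    """Delete files matching `patterns` in `directory`; works on Windows and Unix alike."""
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(directory, pattern)):
            os.remove(path)
            removed.append(path)
    return removed
```

Unlike a shell command, this never depends on which command interpreter is available, and it returns the list of deleted files so the caller can report them.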