
PDEBench: An Extensive Benchmark for Scientific Machine Learning

Topics: ai, benchmark, jax, machine-learning, pytorch, scientific, scientific-computing, sciml, simulation, deep-learning


PDEBench

The code repository for the NeurIPS 2022 paper PDEBench: An Extensive Benchmark for Scientific Machine Learning

πŸŽ‰ SimTech Best Paper Award 2023 🎊

PDEBench provides a diverse and comprehensive set of benchmarks for scientific machine learning, including challenging and realistic physical problems. This repository contains the code used to generate the datasets, to upload and download them from the data repository, and to train and evaluate different machine learning models as baselines. PDEBench features a much wider range of PDEs than existing benchmarks, including realistic and difficult problems (both forward and inverse) and larger ready-to-use datasets comprising various initial conditions, boundary conditions, and PDE parameters. Moreover, PDEBench was created with extensible source code, and we invite active participation from the SciML community to improve and extend the benchmark.

Visualizations of some PDE problems covered by the benchmark.

Created and maintained by Makoto Takamoto, Timothy Praditia, Raphael Leiteritz, Dan MacKinlay, Francesco Alesiani, Dirk PflΓΌger, and Mathias Niepert.


Datasets and Pretrained Models

We also provide datasets and pretrained machine learning models.

PDEBench Datasets: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2986

PDEBench Pre-Trained Models: https://darus.uni-stuttgart.de/dataset.xhtml?persistentId=doi:10.18419/darus-2987

DOIs

DOI:10.18419/darus-2986 DOI:10.18419/darus-2987

Installation

Using pip

Locally:

pip install --upgrade pip wheel
pip install .

From PyPI:

pip install pdebench

To include dependencies for data generation:

pip install "pdebench[datagen310]"
pip install ".[datagen310]" # locally

or

pip install "pdebench[datagen39]"
pip install ".[datagen39]" # locally

GPU Support

For GPU support there are additional platform-specific instructions:

For PyTorch, the latest version we support is v1.13.1; see previous-versions/#linux (CUDA 11.7).

For JAX, which in our tests is approximately 6 times faster than PyTorch for simulations, see jax#pip-installation-gpu-cuda-installed-via-pip.

Using conda

If you prefer, you can also install the dependencies using Anaconda; we suggest using mambaforge as the distribution. Otherwise, you may have to enable the conda-forge channel for the following commands.

Starting from a fresh environment:

conda create -n myenv python=3.9
conda activate myenv

Install dependencies for model training:

conda install deepxde hydra-core h5py -c conda-forge

Depending on your hardware availability, either install PyTorch with CUDA support:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia

or the CPU-only build:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 cpuonly -c pytorch

Optional dependencies for data generation:

conda install clawpack jax jaxlib python-dotenv

Configuring DeepXDE

In our tests we used PyTorch as the backend for DeepXDE. Please follow the documentation to enable it.
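
For example, DeepXDE selects its backend via the DDE_BACKEND environment variable, so a PINN run can be launched along these lines (the config name here is only illustrative):

DDE_BACKEND=pytorch python3 train_models_forward.py +args=config_pinn_pde1d.yaml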

Data Generation

The data generation codes are contained in data_gen:

  • gen_diff_react.py to generate the 2D diffusion-reaction data.
  • gen_diff_sorp.py to generate the 1D diffusion-sorption data.
  • gen_radial_dam_break.py to generate the 2D shallow-water data.
  • gen_ns_incomp.py to generate the 2D incompressible inhomogeneous Navier-Stokes data.
  • plot.py to plot the generated data.
  • uploader.py to upload the generated data to the data repository.
  • .env is the environment file storing the Dataverse URL and API token used to upload the generated data. Note that the filename must be exactly .env (i.e. rename example.env by removing the example prefix).
  • configs directory contains the yaml files storing the configuration for the simulation. Arguments for the simulation are problem-specific, and a detailed explanation can be found in the simulation scripts.
  • src directory contains the simulation scripts for different problems: sim_diff_react.py for 2D diffusion-reaction, sim_diff_sorp.py for 1D diffusion-sorption, and swe for the shallow-water equation.
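
The generator scripts are run directly from data_gen and read their settings from the configs directory, e.g. (illustrative; see each script for its problem-specific arguments):

python3 gen_diff_sorp.py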

Data Generation for 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes Equations

The data generation codes are contained in data_gen_NLE:

  • utils.py, a utility file for data generation, mainly boundary conditions and initial conditions.

  • AdvectionEq directory with the source codes to generate 1D Advection equation training samples

  • BurgersEq directory with the source codes to generate 1D Burgers equation training samples

  • CompressibleFluid directory with the source codes to generate compressible Navier-Stokes equations training samples

  • save directory where the generated training samples are stored

A typical example of generating training samples (here the 1D Advection equation, run from data_gen/data_gen_NLE/AdvectionEq/):

python3 advection_multi_solution_Hydra.py +multi=beta1e0.yaml

The command is assumed to be executed from within each equation's directory.

Examples for generating other PDEs are provided in run_trainset.sh in each PDE's directory. The config files for Hydra are stored in the config directory of each PDE's directory.

Data Transformation and Merging into HDF5 Format

The 1D Advection/Burgers/Reaction-Diffusion/2D DarcyFlow/Compressible Navier-Stokes generators save their data as numpy arrays, so the data must be transformed and merged before it can be read by our dataloaders. This can be done using data_gen_NLE/Data_Merge.py, whose config file is located at data_gen/data_gen_NLE/config/config.yaml. After properly setting the parameters in the config file (type: name of the PDE, dim: number of spatial dimensions, bd: boundary condition), the corresponding HDF5 file can be obtained with:

python3 Data_Merge.py
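
A hypothetical sketch of the relevant config.yaml keys for merging 1D Advection data (key names follow the description above; the values are illustrative):

type: 'advection'   # name of the PDE
dim: 1              # number of spatial dimensions
bd: 'periodic'      # boundary condition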

Configuration

You can set the default values for data locations for this project by putting config vars like this in the .env file:

WORKING_DIR=~/Data/Working
ARCHIVE_DATA_DIR=~/Data/Archive

There is an example in example.env.
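
These variables are read via python-dotenv (listed among the data-generation dependencies); a minimal sketch of how they become available to the scripts:

from dotenv import load_dotenv
import os

load_dotenv()  # reads .env from the current working directory
working_dir = os.getenv("WORKING_DIR")
archive_data_dir = os.getenv("ARCHIVE_DATA_DIR")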

Data Download

The download scripts are provided in data_download. There are two options to download data.

  1. Using download_direct.py (recommended)
    • Retrieves data shards directly via URLs. A sample command for each PDE is given in the README file in the data_download directory.
  2. Using download_easydataverse.py (might be slow and you could encounter errors/issues; hence, not recommended!)
    • Uses the yaml config files from the config directory. Any files in the dataset matching args.filename will be downloaded into args.data_folder.
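
For example, a direct download of the 1D diffusion-sorption shards (the target folder is illustrative):

python3 download_direct.py --root_folder ./data --pde_name diff_sorp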

Baseline Models

In this work, we provide three different ML models to be trained and evaluated against the benchmark datasets, namely FNO, U-Net, and PINN. The codes for the baseline model implementations are contained in models:

  • train_models_forward.py is the main script to train and evaluate a model. It calls the model-specific script based on the input argument.
  • train_models_inverse.py is the main script to train and evaluate a model on inverse problems. It calls the model-specific script based on the input argument.
  • metrics.py is the script to evaluate the trained models based on the various evaluation metrics described in our paper. It also plots the prediction and target data.
  • analyse_result_forward.py is the script to convert the saved pickle file from the metrics calculation into a pandas dataframe and save it as a CSV file. It also plots a bar chart to compare the results between different models.
  • analyse_result_inverse.py is the same script for the inverse problems.
  • fno contains the scripts of the FNO implementation, partly adapted from the FNO repository.
  • unet contains the scripts of the U-Net implementation, partly adapted from the U-Net repository.
  • pinn contains the scripts of the PINN implementation, which utilize the DeepXDE library.
  • inverse contains the gradient-based inverse model.
  • config contains the yaml files for the model training input. Default templates for different equations are provided in the args directory; users just need to copy and paste them under the args keyword in the config.yaml file.

An example to run the forward model training can be found in run_forward_1D.sh, and an example to run the inverse model training can be found in run_inverse.sh.
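
For instance, a single forward training run for FNO on 1D Advection data can be launched directly (the filename must match a dataset file present in the data folder):

CUDA_VISIBLE_DEVICES='0' python3 train_models_forward.py +args=config_Adv.yaml ++args.filename='1D_Advection_Sols_beta1.0.hdf5' ++args.model_name='FNO'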

Short explanations of the config args

  • model_name: string, containing the baseline model name, either 'FNO', 'Unet', or 'PINN'.
  • if_training: bool, set True for training, or False for evaluation.
  • continue_training: bool, set True to continue training from a checkpoint.
  • num_workers: int, number of workers for the PyTorch dataloader.
  • batch_size: int, training batch size.
  • initial_step: int, number of time steps used as input for FNO and U-Net.
  • t_train: int, number of the last time step used for training (for extrapolation testing, set this to be < Nt).
  • model_update: int, number of epochs to save model.
  • filename: str, has to match the dataset filename.
  • single_file: bool, set False for 2D diffusion-reaction, 1D diffusion-sorption, and the radial dam break scenarios, and set True otherwise.
  • reduced_resolution: int, factor to downsample spatial resolution.
  • reduced_resolution_t: int, factor to downsample temporal resolution.
  • reduced_batch: int, factor to downsample sample size used for training.
  • epochs: int, total epochs used for training.
  • learning_rate: float, learning rate of the optimizer.
  • scheduler_step: int, number of epochs to update the learning rate scheduler.
  • scheduler_gamma: float, decay rate of the learning rate.
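
Put together, a hypothetical args block for an FNO run could look as follows (all values are illustrative rather than recommended settings; the last three keys are the FNO-specific args listed further below):

args:
    model_name: 'FNO'
    if_training: True
    continue_training: False
    num_workers: 2
    batch_size: 50
    initial_step: 10
    t_train: 200
    model_update: 10
    filename: '1D_Advection_Sols_beta1.0.hdf5'
    single_file: True
    reduced_resolution: 1
    reduced_resolution_t: 1
    reduced_batch: 1
    epochs: 500
    learning_rate: 1.e-3
    scheduler_step: 100
    scheduler_gamma: 0.5
    num_channels: 1
    modes: 12
    width: 20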

U-Net specific args:

  • in_channels: int, number of input channels.
  • out_channels: int, number of output channels.
  • ar_mode: bool, set True for fully autoregressive or pushforward training.
  • pushforward: bool, set True for pushforward training, False otherwise (ar_mode also has to be set True).
  • unroll_step: int, number of time steps to backpropagate through in the pushforward training.

FNO specific args:

  • num_channels: int, number of channels (variables).
  • modes: int, number of Fourier modes to multiply.
  • width: int, number of channels for the Fourier layer.

INVERSE specific args:

  • base_path: string, location of the data directory.
  • training_type: string, type of training, either 'autoregressive' or 'single'.
  • mcmc_num_samples: int, number of generated samples.
  • mcmc_warmup_steps: int, number of MCMC warmup steps (default: 10).
  • mcmc_num_chains: int, number of MCMC chains (default: 1).
  • num_samples_max: int, maximum number of samples (default: 1000).
  • in_channels_hid: int, number of hidden channels (default: 64).
  • inverse_model_type: string, type of inverse inference model, either ProbRasterLatent or InitialConditionInterp.
  • inverse_epochs: int, number of epochs for the gradient-based method.
  • inverse_learning_rate: float, learning rate for the gradient-based method.
  • inverse_verbose_flag: bool, enables verbose printing.

Plotting specific args:

  • plot: bool, set True to activate plotting.
  • channel_plot: int, determines which channel/variable to plot.
  • x_min: float, left bound of the spatial domain.
  • x_max: float, right bound of the spatial domain.
  • y_min: float, lower bound of the spatial domain.
  • y_max: float, upper bound of the spatial domain.
  • t_min: float, start of the temporal domain.
  • t_max: float, end of the temporal domain.

Datasets and pretrained models

We provide the benchmark datasets used in the paper through our DaRUS data repository. The data generation configuration can be found in the paper. Additionally, the pretrained models are available for download from the PDEBench Pretrained Models DaRUS repository. To use the pretrained models, users can specify the argument continue_training: True in the config file.
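
For example, to evaluate a downloaded pretrained model instead of training from scratch, the relevant args could be set along these lines (a sketch; the filename is illustrative):

args:
    model_name: 'FNO'
    if_training: False       # evaluation only
    continue_training: True  # load the downloaded checkpoint
    filename: '1D_Advection_Sols_beta4.0.hdf5'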


Directory Tour

Below is an illustration of the directory structure of PDEBench.

πŸ“‚ pdebench
|_πŸ“ models
  |_πŸ“ pinn    # Model: Physics-Informed Neural Network 
    |_πŸ“„ train.py  
    |_πŸ“„ utils.py
    |_πŸ“„ pde_definitions.py
  |_πŸ“ fno     # Model: Fourier Neural Operator
    |_πŸ“„ train.py
    |_πŸ“„ utils.py
    |_πŸ“„ fno.py
  |_πŸ“ unet    # Model: U-Net
    |_πŸ“„ train.py
    |_πŸ“„ utils.py
    |_πŸ“„ unet.py
  |_πŸ“ inverse # Model: Gradient-Based Inverse Method
    |_πŸ“„ train.py
    |_πŸ“„ utils.py
    |_πŸ“„ inverse.py
  |_πŸ“ config  # Config: All config files reside here
  |_πŸ“„ train_models_inverse.py
  |_πŸ“„ run_forward_1D.sh
  |_πŸ“„ analyse_result_inverse.py
  |_πŸ“„ train_models_forward.py
  |_πŸ“„ run_inverse.sh
  |_πŸ“„ metrics.py
  |_πŸ“„ analyse_result_forward.py
|_πŸ“ data_download  # Data: Scripts to download data from DaRUS
  |_πŸ“ config
  |_πŸ“„ download_direct.py
  |_πŸ“„ download_easydataverse.py
  |_πŸ“„ visualize_pdes.py
  |_πŸ“„ README.md
  |_πŸ“„ download_metadata.csv
|_πŸ“ data_gen   # Data: Scripts to generate data
  |_πŸ“ configs
  |_πŸ“ data_gen_NLE
  |_πŸ“ src
  |_πŸ“ notebooks
  |_πŸ“„ gen_diff_sorp.py
  |_πŸ“„ plot.py
  |_πŸ“„ example.env
  |_πŸ“„ gen_ns_incomp.py
  |_πŸ“„ gen_diff_react.py
  |_πŸ“„ uploader.py
  |_πŸ“„ gen_radial_dam_break.py
|_πŸ“„ __init__.py

Publications & Citations

Please cite the following papers if you use PDEBench datasets and/or source code in your research.

PDEBench: An Extensive Benchmark for Scientific Machine Learning - NeurIPS'2022
@inproceedings{PDEBench2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and PflΓΌger, Dirk and Niepert, Mathias},
title = {{PDEBench: An Extensive Benchmark for Scientific Machine Learning}},
year = {2022},
booktitle = {36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks},
url = {https://arxiv.org/abs/2210.07182}
}
PDEBench Datasets - NeurIPS'2022
@data{darus-2986_2022,
author = {Takamoto, Makoto and Praditia, Timothy and Leiteritz, Raphael and MacKinlay, Dan and Alesiani, Francesco and PflΓΌger, Dirk and Niepert, Mathias},
publisher = {DaRUS},
title = {{PDEBench Datasets}},
year = {2022},
doi = {10.18419/darus-2986},
url = {https://doi.org/10.18419/darus-2986}
}
Learning Neural PDE Solvers with Parameter-Guided Channel Attention - ICML'2023
@article{cape-takamoto:2023,
author = {Makoto Takamoto and Francesco Alesiani and Mathias Niepert},
title = {{Learning Neural PDE Solvers with Parameter-Guided Channel Attention}},
journal = {CoRR},
volume = {abs/2304.14118},
year = {2023},
url = {https://doi.org/10.48550/arXiv.2304.14118},
doi = {10.48550/arXiv.2304.14118},
eprinttype = {arXiv},
eprint = {2304.14118}
}
Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations - ICLR-W'2024 & ICML'2024
@inproceedings{vcnef-vectorized-conditional-neural-fields-hagnberger:2024,
author = {Hagnberger, Jan and Kalimuthu, Marimuthu and Musekamp, Daniel and Niepert, Mathias},
title = {{Vectorized Conditional Neural Fields: A Framework for Solving Time-dependent Parametric Partial Differential Equations}},
year = {2024},
booktitle = {Proceedings of the 41st International Conference on Machine Learning (ICML 2024)}
}


License

MIT licensed, except where otherwise stated. See LICENSE.txt file.

Contributors

arthurfeeney, danmackinlay, falesiani, kmario23, leiterrl, mniepert, timothypraditia

pdebench's Issues

Ambiguity between h5 and hdf5 file extensions

Hello,

I undertook the following steps:

  1. Download the diff_sorp data, using the command
    python3 download_direct.py --root_folder ~/Documents/sandbox --pde_name diff_sorp
    which made a file called 1D_diff-sorp_NA_NA.h5.
  2. The root_folder above is clearly wrong. I manually moved the downloaded data into pdebench/data.
  3. I then ran CUDA_VISIBLE_DEVICES='0' python3 train_models_forward.py +args=config_Adv.yaml ++args.filename='1D_diff-sorp_NA_NA.h5' ++args.model_name='FNO'. This code returned an AssertionError, claiming that HDF5 data was assumed. I found what caused this assertion in a utils.py file I had linked below.
  4. I then tried changing the h5 extension into hdf5. I also changed h5 into hdf5 in the above terminal command. This gave a very elaborate error message attached at the bottom.

I wanted to first clarify an ambiguity. The file I downloaded has the extension "h5", yet the FNO model and the example code in run_forward_1D.sh appear to assume the extension is "hdf5". What was the intended approach here?
Secondly, I was wondering if you knew what could be causing the above error.

The code I've been using/reading came from:
https://github.com/pdebench/PDEBench/tree/main/pdebench/data_download
https://github.com/pdebench/PDEBench/blob/main/pdebench/models/run_forward_1D.sh
https://github.com/pdebench/PDEBench/blob/main/pdebench/models/fno/utils.py

(OS: Ubuntu 22.04)

Using backend: tensorflow.compat.v1

WARNING:tensorflow:From /home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/deepxde/nn/initializers.py:118: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.

WARNING:tensorflow:From /home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/deepxde/nn/initializers.py:118: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.

/home/ton070/Documents/PDEBench/pdebench/models/train_models_forward.py:164: UserWarning: 
The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="config", config_name="config")
/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
FNO
Epochs = 500, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
FNODatasetSingle
Error executing job with overrides: ['+args=config_Adv.yaml', '++args.filename=1D_diff-sorp_NA_NA.hdf5', '++args.model_name=FNO']
An error occurred during Hydra's exception formatting:
AssertionError()
Traceback (most recent call last):
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 254, in run_and_report
    assert mdl is not None
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ton070/Documents/PDEBench/pdebench/models/train_models_forward.py", line 249, in <module>
    main()
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 296, in run_and_report
    raise ex
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/ton070/Documents/PDEBench/pdebench/models/train_models_forward.py", line 168, in main
    run_training_FNO(
  File "/home/ton070/Documents/PDEBench/pdebench/models/fno/train.py", line 67, in run_training
    train_data = FNODatasetSingle(flnm,
  File "/home/ton070/Documents/PDEBench/pdebench/models/fno/utils.py", line 190, in __init__
    _data = np.array(f['density'], dtype=np.float32)  # batch, time, x,...
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/ton070/miniconda3/envs/pdebench/lib/python3.9/site-packages/h5py/_hl/group.py", line 328, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'density' doesn't exist)"

Release?

Hello,

Thank you for this package. Are you planning to release a tarball, or even a PyPI release, anytime soon? I would like to write a Spack recipe, and this is required for it.

Thank you!

Downloading for compressible NS equation

Hi! Thanks for the good work here! I have a question.

As I checked the compressible NS equation data in your dataset, I found that some trajectories look like this:
[screenshot: anomalous trajectory]
I don't know if this is a correct trajectory. I downloaded this dataset via wget and manually changed the file name to 2D_CFD_Rand_M0.1_Eta0.1_Zeta0.1_periodic_128_Train.hdf5. Some trajectories are 'normal', like:
[screenshot: normal trajectory]
I don't know if using wget is the reason causing this.
What's more, I tried to check the continuity equation $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v}) = 0$, but it always fails: the residual $\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho \mathbf{v})$ is non-zero, at approximately the same scale as $\rho$ or $\mathbf{v}$. I computed the derivatives using central differences in the temporal domain and a spectral method in the spatial domain, so I think something may be wrong with the dataset I downloaded.
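
For reference, a minimal sketch of the check described above (assuming periodic boundaries, uniform grids, and fields shaped (nt, nx, ny); all names are illustrative):

import numpy as np

def continuity_residual(rho, vx, vy, dt, dx, dy):
    # Central difference in time, interior steps only.
    drho_dt = (rho[2:] - rho[:-2]) / (2.0 * dt)
    # Spectral (FFT) derivatives in space, assuming periodic boundaries.
    kx = 2.0 * np.pi * np.fft.fftfreq(rho.shape[1], d=dx)[None, :, None]
    ky = 2.0 * np.pi * np.fft.fftfreq(rho.shape[2], d=dy)[None, None, :]
    div = np.real(np.fft.ifft(1j * kx * np.fft.fft(rho * vx, axis=1), axis=1)) \
        + np.real(np.fft.ifft(1j * ky * np.fft.fft(rho * vy, axis=2), axis=2))
    # Residual of d(rho)/dt + div(rho v) on the interior time steps.
    return drho_dt + div[1:-1]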

I don't know if you got some ideas here. Looking forward to your reply!

Configuration file for FNO on SWE equation

Dear authors,
Thanks for the nice benchmark as well as the data!
I have a question about using FNO for the 2D shallow-water equation (SWE). It seems there is only a configuration file for PINNs on this PDE.
In which file can I find the configuration or hyperparameters of FNO on SWE 2D?

best regards

Question on the NS Equation

Thank you for your work on building this benchmark.

When I checked the definitions in both your paper and the code, I saw that for the compressible Navier-Stokes equation, the viscous stress tensor $\sigma'$, shear viscosity $\eta$ and bulk viscosity $\zeta$ shown in the paper are not included in the PDEs in models/pinn/pde_definitions.py.
Is that a mistake or an intended setting?

Thank you!

Issue on the computation of numerical errors for 2D-Darcy

Hi @leiterrl and @kmario23,

Consider the Darcy flow $$\nabla \cdot (a \nabla u) = 1.$$
When computing the RMSE in /model/metric.py, the program subtracts [a, u_ref] from [a, u_pred] rather than computing u_pred - u_ref. This makes no difference for the MSE, but I guess it may underestimate the (relative) L2-norm error (i.e. RMSE and nRMSE in the paper), because it doubles the N in the denominator.

I added some temporary code to illustrate the point at line 177; you can see that pred is doubled along the last channel dimension and that the first channel of pred - target is all zeros. See
https://github.com/liu-ziyuan-math/PDEBench-testl2/blob/main/pdebench/models/metrics.py

I'm not sure if this occurs only in the 2D-Darcy problem, but I presume so, since 2D-Darcy is the only time-independent problem in the datasets.

Config files for incompressible Navier-Stokes equations

Howdy!

Thanks for the great work you've done.

I have trouble in model training now, and I'm seeking for your help.

I want to use the FNO model to train and evaluate on the dataset ns_sim_2d-i.h5.

However, I thought the command to run the forward model training could be found in models/run_forward_1D.sh, but it is not there.

Neither did I find any config_ns_incomp.yaml file for the model training input in models/config/args/. I did see config_diff-react.yaml, config_diff-sorp.yaml and config_rdb.yaml, though.

Your reply will be appreciated.

Description of the datasets

Hi everyone, I came across your paper in the NeurIPS datasets track and I was curious to check out your datasets, which could help me a lot in formalizing what angle I should take with my research.

But I was not sure which one to look at first, as there are many of them. Do you happen to have a description of the different datasets within a single family? For instance, with incompressible Navier-Stokes, what is the difference between the files ns_incom_inhom_2d_512-0.h5 and ns_incom_inhom_2d_512-100.h5?

Thanks for your help!

OSError When Generating Datasets

I met this problem when trying to run files like

pdebench/data_gen/gen_xxx_xxx.py. It seems to be a problem with multithreading, or just a problem with the file path?

Traceback (most recent call last):
  File "xxxx/pdebench/data_gen/gen_diff_sorp.py", line 101, in main
    pool.starmap(simulator, zip(repeat(config), seed))
  File xxxx/multiprocessing/pool.py", line 372, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "xxxx/multiprocessing/pool.py", line 771, in get
    raise self._value
OSError: Unable to open file (message not aligned)

Thanks, and I need your help o(β•₯﹏β•₯)o

version matching | old issue-2023-09-24

Sorry for this late post, I just got used to GitHub.

I found that in the readme.md of PDEBench there's a section called "Configuring DeepXDE". However, according to DeepXDE, the required PyTorch version is 1.9.0 or later (Fig. 1), but the latest version that PDEBench supports is v1.13.0 (Fig. 2). I wonder if there is something worth discussing here.

[Fig. 1, Fig. 2: screenshots of the DeepXDE requirement and the PDEBench-supported PyTorch version]

Downloading the data

Dear authors, I am getting an error when trying to download the data:

The version_base parameter is not specified.
Please specify a compatability version level, or None.
Will assume defaults for version 1.1
  @hydra.main(config_path="config/", config_name="config")
/.../PDEBench/venv/lib/python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
Error executing job with overrides: []
Traceback (most recent call last):
  File "/.../PDEBench/pdebench/data_download/download.py", line 42, in main
    api = NativeApi(os.getenv("DATAVERSE_URL"), os.getenv("DATAVERSE_API_TOKEN"))
  File "/.../PDEBench/venv/lib/python3.9/site-packages/pyDataverse/api.py", line 639, in __init__
    super().__init__(base_url, api_token, api_version)
  File "/.../PDEBench/venv/lib/python3.9/site-packages/pyDataverse/api.py", line 61, in __init__
    raise ApiUrlError("base_url {0} is not a string.".format(base_url))
pyDataverse.exceptions.ApiUrlError: base_url None is not a string.

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I left the configuration as it is provided in the repo.

Question about training and evaluation time steps

Hi, thanks for sharing this great benchmark to facilitate the research! I have some questions when trying out your code.

Training steps

The last used time steps seem inconsistent between the "autoregressive" mode and the "single" mode. When training in the "autoregressive" mode, samples from initial_step to t_train-1 are used (lines 206 and 212):

if training_type in ['autoregressive']:
    # Autoregressive loop
    for t in range(initial_step, t_train):
        # Reshape input tensor into [b, x1, ..., xd, t_init*v]
        inp = xx.reshape(inp_shape)
        # Extract target at current time step
        y = yy[..., t:t+1, :]

Note that the time step t_train is excluded in range(initial_step, t_train), which I think should be correct. However, when training in the "single" mode, the prediction target is the ground truth sample at step t_train:

if training_type in ['single']:
    x = xx[..., 0, :]
    y = yy[..., t_train:t_train+1, :]

I suppose the final time steps should be the same in these two modes; should it be yy[..., t_train-1:t_train, :] in the "single" mode?

Shallow water dataset πŸ›

Another question is about the shallow water dataset. In the paper, the total number of time steps is 100; however, in the downloaded dataset there are 101 time steps. This will cause an error in the FNO validation code, where the prediction has 101 steps (line 269) while the label has 100 steps (line 282):

for t in range(initial_step, yy.shape[-2]):
    inp = xx.reshape(inp_shape)
    y = yy[..., t:t+1, :]

_yy = yy[..., :t_train, :]
val_l2_full += loss_fn(pred.reshape(_batch, -1), _yy.reshape(_batch, -1)).item()

Should we use range(initial_step, t_train) in line 269?

Initial time steps in validation

Finally, when using "autoregressive" mode in validation, the initial time steps are also used to compute the error (line 283). To my understanding, the initial steps are identical to the ground truth and thus contribute no error. Do we need to exclude these initial steps when computing the error?

Thank you!

Code error

Hello!
This work of yours has been a strong support for driving innovation in machine learning simulation, and I thank you for your contribution.
I was recently studying your project code and found that in the Inverse/train.py file, FNO3d is incorrectly written as FNO2d when initializing the model.
When I use analyse_result_forward.py, I cannot find the *.pickle file; where can I find it?
By the way, what files and commands are needed to run the code for the Inverse problem?

CUDA error: initialization error

Tried running python train_models_forward.py and I ran into the following error:

Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/asebasti/PDEBench/pdebench/models/train_models_forward.py", line 250, in <module>
    main()
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/main.py", line 90, in decorated_main
    _run_hydra(
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/utils.py", line 389, in _run_hydra
    _run_app(
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/utils.py", line 452, in _run_app
    run_and_report(
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/utils.py", line 216, in run_and_report
    raise ex
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/utils.py", line 213, in run_and_report
    return func()
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/utils.py", line 453, in <lambda>
    lambda: hydra.run(
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/asebasti/PDEBench/pdebench/models/train_models_forward.py", line 167, in main
    run_training_FNO(
  File "/home/asebasti/PDEBench/pdebench/models/fno/train.py", line 111, in run_training
    _, _data, _ = next(iter(val_loader))
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/asebasti/environments/conda_envs/pde_env_2/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/asebasti/PDEBench/pdebench/models/fno/utils.py", line 361, in __getitem__
    return self.data[idx,...,:self.initial_step,:], self.data[idx], self.grid
RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Ran this on a machine with a Tesla K80 (CUDA version: 11.4) and another machine with an A100 (CUDA version: 11.6), and ran into the same error on both of them.

#8 seems to have run into the same issue as well. I tried adding the generator keyword argument to the dataloaders, as suggested in that thread, but still run into the same error message.

Dependency conflict when installing the required packages

Trying to install the requirements (even in a new venv, as recommended in the README) results in a dependency conflict:

INFO: pip is looking at multiple versions of pandas to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install -r requirements.txt (line 10), -r requirements.txt (line 12), -r requirements.txt (line 13), -r requirements.txt (line 15), -r requirements.txt (line 16), -r requirements.txt (line 18), -r requirements.txt (line 2), -r requirements.txt (line 3), -r requirements.txt (line 4), -r requirements.txt (line 5) and numpy~=1.22.3 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy~=1.22.3
    scipy 1.8.0 depends on numpy<1.25.0 and >=1.17.3
    matplotlib 3.5.2 depends on numpy>=1.17
    h5py 3.6.0 depends on numpy>=1.14.5
    pandas 1.4.2 depends on numpy>=1.18.5; platform_machine != "aarch64" and platform_machine != "arm64" and python_version < "3.10"
    pytorch-lightning 1.6.3 depends on numpy>=1.17.2
    clawpack 5.6.1 depends on numpy>=1.14
    deepxde 1.1.3 depends on numpy
    jax 0.3.13 depends on numpy>=1.19
    jaxlib 0.3.10 depends on numpy>=1.19
    phiflow 2.0.3 depends on numpy==1.19.5

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

1D AdvectionEq Dataset dimensions

Hello! This work of yours on PDEBench has been a strong support for driving innovation in machine learning simulation, and I thank you for your contribution.
I've recently been studying your project code.

I went to the /data_gen_NLE/AdvectionEq folder to generate the 1D Advection dataset, and used Data_Merge.py to convert the data to HDF5 format.

CUDA_VISIBLE_DEVICES='2,3' python3 advection_multi_solution_Hydra.py +multi=beta1e0.yaml

The config file used to generate the 1D_Advection_Sols_beta1.0.hdf5 file is:

save: '/save/advection/'
dt_save: 0.01
ini_time: 0.
fin_time: 2.
nx: 1024
xL: 0.
xR: 1.
beta : 1.e0
if_show: 1
numbers: 10
CFL: 4.e-1
if_second_order: 1.
show_steps: 100
init_key: 2022
if_rand_param: False

Using the generated data for testing,

CUDA_VISIBLE_DEVICES='3' python3 train_models_forward.py +args=config_Adv.yaml ++args.filename='1D_Advection_Sols_beta1.0.hdf5' ++args.model_name='FNO'

a problem occurred.

I checked the dimensions of the data: it is [1,10,201,1024], so models/fno/utils.py determines its shape to have length 4, which means it is treated as a 2D dataset.

Then comes the problem.

FNO
Epochs = 5, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
FNODatasetSingle
Error executing job with overrides: ['+args=config_Adv.yaml', '++args.filename=1D_Advection_Sols_beta2.0.hdf5', '++args.model_name=FNO']
Traceback (most recent call last):
  File "/home/wm/PDEBench/pdebench/models/train_models_forward.py", line 253, in <module>
    main()
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "/home/wm/PDEBench/pdebench/models/train_models_forward.py", line 166, in main
    run_training_FNO(
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/fno/train.py", line 65, in run_training
    train_data = FNODatasetSingle(flnm,
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/pdebench/models/fno/utils.py", line 326, in __init__
    _data = np.array(f['nu'], dtype=np.float32)  # batch, time, x,...
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/wm/miniconda3/envs/pdebench/lib/python3.9/site-packages/h5py/_hl/group.py", line 357, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 189, in h5py.h5o.open
KeyError: "Unable to synchronously open object (object 'nu' doesn't exist)"

Yet, when I used the dataset data/1D/Advection/Train/1D_Advection_Sols_beta7.0.hdf5 downloaded via the /pdebench/data_download/ directory for testing, whose shape is [10000,201,1024], the problem disappeared.

So, which parameter in the configuration file determines the dimensions of the dataset?

Getting error while downloading dataset

Hi, I tried downloading the Darcy flow dataset using the following command.
python download_direct.py --root_folder $proj_home/data --pde_name darcy

Getting the following error:
RuntimeError: File not found or corrupted.

Unable to synchronously open object (object 'nu' doesn't exist)

Hello!
This work of yours has been a strong support for driving innovation in machine learning simulation, and I thank you for your contribution. I was recently studying your project code.

I went to the /data_gen_NLE/ReactionDiffusionEq/ folder to generate the reaction-diffusion dataset, and then went to the /pdebench/models/ folder to run run_forward_1D.sh to train the network. The command is:

CUDA_VISIBLE_DEVICES='0' python3 train_models_forward.py +args=config_ReacDiff.yaml ++args.filename='ReacDiff_Nu1.0_Rho1.0.hdf5' ++args.model_name='FNO'

Then, I encountered this bug.
[screenshot: KeyError, object 'nu' doesn't exist]

Similarly, I went to the /data_gen_NLE/BurgersEq/ folder to generate the burgers dataset and then trained the network with the command,

 CUDA_VISIBLE_DEVICES='2,3' python3 train_models_forward.py +args=config_Bgs.yaml ++args.filename='1D_Burgers_Sols_Nu1.0.hdf5' ++args.model_name='FNO'

and encountered a similar bug.
[screenshot: a similar KeyError]

But, I used the dataset downloaded from the /pdebench/data_download/ directory for testing and the program was able to run successfully.

I wonder if it is a problem with the HDF5 file. I used HDFView to check the data format.

[screenshot of the HDF5 file layout]
I found that the t-axis coordinate has 202 points (from 0 to 2.01) and the x-axis has 1024 points (from 0 to 1), but the tensor is a 2x5000 data format.

The config file used to generate the 1D_Burgers_Sols_Nu1.0.hdf5 files is:
[screenshot of the config file]

2D Incompressible Navier-Stokes Data Dimension Description

Hi, I downloaded the "ns_incom_inhom_2d_512-0.h5" from the link and tried using the data in my experiments. The data consists of four items, whose names and dimensions are:

Key=force, shape=(4, 512, 512, 2)
Key=particles, shape=(4, 1000, 512, 512, 1)
Key=t, shape=(4, 1000)
Key=velocity, shape=(4, 1000, 512, 512, 2)
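
(These shapes can be listed directly with h5py; a minimal sketch:)

import h5py

# Print every dataset in the file together with its shape.
with h5py.File("ns_incom_inhom_2d_512-0.h5", "r") as f:
    for key in f:
        print(key, f[key].shape)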

I have several questions about them:

  1. What is the meaning of the first dimension of all the data? E.g. t.shape=(4, 1000): I guess the '1000' means 1000 time steps, but the '4' is unclear.
  2. What do 'velocity' and 'particles' mean? I guess 'velocity' is the average velocity of the fluid in each grid cell, and 'particles' is the average number of particles in each grid cell. Is that correct?

Thank you!

Update sim_ns_incomp_2d.py and corresponding config file

I've encountered 3 issues trying to run gen_ns_incomp.py.

  1. There are incorrect keyword variables in ns_incomp.yaml: force_smoothness, force_scale, domain_size and enable_gravity. When I run python gen_ns_incomp.py it says those keyword arguments are not recognized. Also, with scale=0.4, the particles field seems to be all NaN, which leads to a PhiFlow error saying that the solver cannot converge. I tried setting it to 1 and that seemed to at least generate some output.
  2. PhiFlow 2.2 has changed how a Box is defined. I got an error on line 93 of sim_ns_incomp_2d.py with the newest version of PhiFlow. I got around this issue by changing it to return Box['x,y', :, 0:y].
  3. The image generation seems broken. When I set save_images=true and save_gif=true in ns_incomp.yaml, I got an error from line 186 of sim_ns_incomp_2d.py, which calls particles_images.append() and appends nothing to particles_images. A similar thing happens on line 187. I'm not sure how to get around this issue yet.

Config files / running the example code

Hi!
Great work; it's really useful to have dataloaders set up for all these different PDE examples.

I've been struggling with running the example code, e.g. on Advection data, which is the default one which the data_download file gets. (Btw, the downloader doesn't respect the config file data paths, which is a bit confusing for a new user.)

I started by trying

CUDA_VISIBLE_DEVICES='2' python train_models_forward.py +args=config.yaml

and got the following error: In 'config': Could not find 'args/config.yaml'
My understanding is that I have to replace config.yaml with config_pinn_pde1d.yaml, but I'm not entirely sure.

Once I did so, and changed the filename and root_path in my config, I got the error "local variable 'timedomain' referenced before assignment", which seems due to how the filename is parsed.

geomtime = dde.geometry.GeometryXTime(geom, timedomain)

It would be nice to add a raise ValueError around this line, and perhaps to be more explicit about which filenames are allowed.

After fixing this, I got a shape error:

File "/Users/gnegiar/PDEBench/pdebench/models/pinn/utils.py", line 355, in __init__
    self.bd_data_L = self.data_output[0, :, None]
IndexError: too many indices for tensor of dimension 1

which I'm not sure how to fix. It seems that due to the indexing using val_batch_idx, the tensor self.data_output becomes 1D. I'm surprised that the tensor isn't 3D to start with (I checked, and h5_file['tensor'] has shape (201, 1024)).

Perhaps I downloaded the wrong files? I'm using the data/1D/Advection/Test/Advection_beta0.1.h5 file for now, thinking it would be the simplest.

Here is the config that I used for downloading (I didn't modify it):

args:
    filename: 'Advection_beta'
    dataverse_url: 'https://darus.uni-stuttgart.de'
    dataset_id: 'doi:10.18419/darus-2986'
    data_folder: 'data'

Any help would be greatly welcome!!

Cannot call pretrained PINN models

Hi,

First, thank you for the great work you have done.

I am trying to use your pretrained models to get the exact same results as from your paper but cannot figure out how to do so. I am able to train my own PINNs, but not to use the pretrained PINNs.

For example, I am trying to run 1D_Advection_Sols_beta4.0_PINN.pt-15000.pt (available here once downloaded and unzipped). This pretrained model should reproduce the numbers in the last column for beta=4.0 on page 13 of the Appendix, but I cannot figure out how to run it.

In contrast, for example, when using a pretrained Unet, I can do so by setting continue_training = True in the config.yaml file. But I don't see any such functionality for PINN.

Your help would be very much appreciated.

All the best,
Ognjen Stefanovic

Problem from reproducing the results of UNet for 2D Darcy

Dear @kmario23 and @leiterrl,
Good work!

I've been working with the dataset recently, reproducing the results of U-Net for 2D Darcy by directly loading the pre-trained models, but I found that the results differ from the paper. train.py returns the following results:

beta100.0

normalized RMSE: 0.39892
RMSE of conserved variables: 4.05248
Maximum value of rms error: 21.00043
RMSE at boundaries: 0.66137
RMSE in Fourier space: [2.2808344  0.18967566 0.01259436]

beta10.0

normalized RMSE: 0.38663
RMSE of conserved variables: 0.40114
Maximum value of rms error: 1.99606
RMSE at boundaries: 0.07022
RMSE in Fourier space: [0.21533394 0.01957521 0.00129131]

beta1.0

normalized RMSE: 0.91649
RMSE of conserved variables: 0.14371
Maximum value of rms error: 0.44757
RMSE at boundaries: 0.01598
RMSE in Fourier space: [0.06181364 0.00431857 0.00024032]

beta0.1

normalized RMSE: 1.86851
RMSE of conserved variables: 0.04009
Maximum value of rms error: 0.22422
RMSE at boundaries: 0.00231
RMSE in Fourier space: [1.9320356e-02 1.0900073e-03 4.7218618e-05]

beta0.01

normalized RMSE: 12.34850
RMSE of conserved variables: 0.04722
Maximum value of rms error: 0.22447
RMSE at boundaries: 0.00533
RMSE in Fourier space: [2.1808032e-02 1.0689541e-03 7.7311743e-05]

I downloaded the model and data into the project paths /model and /data and stored the code in a sub-sub-folder /others/Unet/. Possibly helpful information: because my environment is currently fragile, I directly changed the default parameters of Unet/train.run_training as follows (FYI), though I suspect this change hardly makes a difference:

# -*- coding: utf-8 -*-

import sys
import torch
import numpy as np
import pickle
import torch.nn as nn
import torch.nn.functional as F

import operator
from functools import reduce
from functools import partial

from timeit import default_timer

# torch.manual_seed(0)
# np.random.seed(0)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

from Unet import UNet1d, UNet2d, UNet3d
from utils import UNetDatasetSingle, UNetDatasetMult
from metrics import metrics

data_PATH = '../data/2D/DarcyFlow/2D_DarcyFlow_beta0.01_Train.hdf5'
data_PATH = '../data/2D/DarcyFlow/2D_DarcyFlow_beta0.1_Train.hdf5'
data_PATH = '../data/2D/DarcyFlow/2D_DarcyFlow_beta1.0_Train.hdf5'
data_PATH = '../data/2D/DarcyFlow/2D_DarcyFlow_beta10.0_Train.hdf5'
data_PATH = '../data/2D/DarcyFlow/2D_DarcyFlow_beta100.0_Train.hdf5'

def run_training(if_training=False,
                 continue_training=False,
                 num_workers=2,
                 initial_step=1,
                 t_train=1,
                 in_channels=1,
                 out_channels=1,
                 batch_size=50,
                 unroll_step=1,
                 ar_mode=False,
                 pushforward=False,
                 epochs=500,
                 learning_rate=1.e-3,
                 scheduler_step=100,
                 scheduler_gamma=0.5,
                 model_update=2,
                 flnm=data_PATH,
                 single_file=True,
                 reduced_resolution=1,
                 reduced_resolution_t=1,
                 reduced_batch=1,
                 plot=False,
                 channel_plot=0,
                 x_min=-1,
                 x_max=1,
                 y_min=-1,
                 y_max=1,
                 t_min=0,
                 t_max=5,
                 base_path='../../data/',
                 training_type='single'
                 ):
    print(
        f'Epochs = {epochs}, learning rate = {learning_rate}, scheduler step = {scheduler_step}, scheduler gamma = {scheduler_gamma}')

    ################################################################
    # load data
    ################################################################

    if single_file:
        # filename
        model_name = flnm[:-5] + '_Unet'

        # Initialize the dataset and dataloader
        train_data = UNetDatasetSingle(flnm,
                                       saved_folder=base_path,
                                       reduced_resolution=reduced_resolution,
                                       reduced_resolution_t=reduced_resolution_t,
                                       reduced_batch=reduced_batch,
                                       initial_step=initial_step)
        val_data = UNetDatasetSingle(flnm,
                                     saved_folder=base_path,
                                     reduced_resolution=reduced_resolution,
                                     reduced_resolution_t=reduced_resolution_t,
                                     reduced_batch=reduced_batch,
                                     initial_step=initial_step,
                                     if_test=True)

    else:
        # filename
        model_name = flnm + '_Unet'

        train_data = UNetDatasetMult(flnm,
                                     reduced_resolution=reduced_resolution,
                                     reduced_resolution_t=reduced_resolution_t,
                                     reduced_batch=reduced_batch,
                                     saved_folder=base_path)
        val_data = UNetDatasetMult(flnm,
                                   reduced_resolution=reduced_resolution,
                                   reduced_resolution_t=reduced_resolution_t,
                                   reduced_batch=reduced_batch,
                                   if_test=True,
                                   saved_folder=base_path)

    train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                               num_workers=num_workers, shuffle=True)
    val_loader = torch.utils.data.DataLoader(val_data, batch_size=batch_size,
                                             num_workers=num_workers, shuffle=False)

    ################################################################
    # training and evaluation
    ################################################################

    # model = UNet2d(in_channels, out_channels).to(device)
    _, _data = next(iter(val_loader))
    dimensions = len(_data.shape)
    print('Spatial Dimension', dimensions - 3)
    if training_type in ['autoregressive']:
        if dimensions == 4:
            model = UNet1d(in_channels * initial_step, out_channels).to(device)
        elif dimensions == 5:
            model = UNet2d(in_channels * initial_step, out_channels).to(device)
        elif dimensions == 6:
            model = UNet3d(in_channels * initial_step, out_channels).to(device)
    if training_type in ['single']:
        if dimensions == 4:
            model = UNet1d(in_channels, out_channels).to(device)
        elif dimensions == 5:
            model = UNet2d(in_channels, out_channels).to(device)
        elif dimensions == 6:
            model = UNet3d(in_channels, out_channels).to(device)

    # Set maximum time step of the data to train
    if t_train > _data.shape[-2]:
        t_train = _data.shape[-2]
    # Set maximum of unrolled time step for the pushforward trick
    if t_train - unroll_step < 1:
        unroll_step = t_train - 1

    if training_type in ['autoregressive']:
        if ar_mode:
            if pushforward:
                model_name = model_name + '-PF-' + str(unroll_step)
            if not pushforward:
                unroll_step = _data.shape[-2]
                model_name = model_name + '-AR'
        else:
            model_name = model_name + '-1-step'

    print(model_name)
    model_path = '../../model/' + model_name[21:] + "_PF_1.pt"
    print(model_name)

    total_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f'Total parameters = {total_params}')

    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=scheduler_step, gamma=scheduler_gamma)

    loss_fn = nn.MSELoss(reduction="mean")
    loss_val_min = np.infty

    start_epoch = 0

    if not if_training:
        checkpoint = torch.load(model_path, map_location=device)
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.eval()
        Lx, Ly, Lz = 1., 1., 1.
        errs = metrics(val_loader, model, Lx, Ly, Lz, plot, channel_plot,
                       model_name, x_min, x_max, y_min, y_max,
                       t_min, t_max, mode='Unet', initial_step=initial_step)
        # pickle.dump(errs, open(model_name + '.pickle', "wb"))

        return

    # If desired, restore the network by loading the weights saved in the .pt
    # file
    if continue_training:
        print('Restoring model (that is the network\'s weights) from file...')
        checkpoint = torch.load(model_path, map_location=device)
        model.load_state_dict(checkpoint['model_state_dict'])
        model.to(device)
        model.train()

        # Load optimizer state dict
        optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        for state in optimizer.state.values():
            for k, v in state.items():
                if isinstance(v, torch.Tensor):
                    state[k] = v.to(device)

        start_epoch = checkpoint['epoch']
        loss_val_min = checkpoint['loss']

    print('start training...')

    if ar_mode:

        for ep in range(start_epoch, epochs):
            model.train()
            t1 = default_timer()
            train_l2_step = 0
            train_l2_full = 0

            for xx, yy in train_loader:
                loss = 0

                # xx: input tensor (first few time steps) [b, x1, ..., xd, t_init, v]
                # yy: target tensor [b, x1, ..., xd, t, v]
                # grid: meshgrid [b, x1, ..., xd, dims]
                xx = xx.to(device)
                yy = yy.to(device)

                if training_type in ['autoregressive']:

                    # Initialize the prediction tensor
                    pred = yy[..., :initial_step, :]

                    # Extract shape of the input tensor for reshaping (i.e. stacking the
                    # time and channels dimension together)
                    inp_shape = list(xx.shape)
                    inp_shape = inp_shape[:-2]
                    inp_shape.append(-1)

                    # Autoregressive loop
                    for t in range(initial_step, t_train):
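                        # Pushforward trick: steps before the final unroll window are rolled
                        # out without gradients; only the last `unroll_step` steps below
                        # contribute to the loss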

                        if t < t_train - unroll_step:
                            with torch.no_grad():
                                # Reshape input tensor into [b, x1, ..., xd, t_init*v]
                                inp = xx.reshape(inp_shape)
                                temp_shape = [0, -1]
                                temp_shape.extend([i for i in range(1, len(inp.shape) - 1)])
                                inp = inp.permute(temp_shape)

                                # Extract target at current time step
                                y = yy[..., t:t + 1, :]

                                # Model run
                                temp_shape = [0]
                                temp_shape.extend([i for i in range(2, len(inp.shape))])
                                temp_shape.append(1)
                                im = model(inp).permute(temp_shape).unsqueeze(-2)

                                # Concatenate the prediction at current time step into the
                                # prediction tensor
                                pred = torch.cat((pred, im), -2)

                                # Concatenate the prediction at the current time step to be used
                                # as input for the next time step
                                xx = torch.cat((xx[..., 1:, :], im), dim=-2)

                        else:
                            # Reshape input tensor into [b, x1, ..., xd, t_init*v]
                            inp = xx.reshape(inp_shape)
                            temp_shape = [0, -1]
                            temp_shape.extend([i for i in range(1, len(inp.shape) - 1)])
                            inp = inp.permute(temp_shape)

                            # Extract target at current time step
                            y = yy[..., t:t + 1, :]

                            # Model run
                            temp_shape = [0]
                            temp_shape.extend([i for i in range(2, len(inp.shape))])
                            temp_shape.append(1)
                            im = model(inp).permute(temp_shape).unsqueeze(-2)

                            # Loss calculation
                            loss += loss_fn(im.reshape(batch_size, -1), y.reshape(batch_size, -1))

                            # Concatenate the prediction at current time step into the
                            # prediction tensor
                            pred = torch.cat((pred, im), -2)

                            # Concatenate the prediction at the current time step to be used
                            # as input for the next time step
                            xx = torch.cat((xx[..., 1:, :], im), dim=-2)

                    train_l2_step += loss.item()
                    _batch = yy.size(0)
                    _yy = yy[..., :t_train, :]
                    l2_full = loss_fn(pred.reshape(_batch, -1), _yy.reshape(_batch, -1))
                    train_l2_full += l2_full.item()

                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                if training_type in ['single']:
                    x = xx[..., 0, :]
                    y = yy[..., t_train - 1:t_train, :]
                    pred = model(x.permute([0, 2, 1])).permute([0, 2, 1])
                    _batch = yy.size(0)
                    loss += loss_fn(pred.reshape(_batch, -1), y.reshape(_batch, -1))

                    train_l2_step += loss.item()
                    train_l2_full += loss.item()

                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

            if ep % model_update == 0:
                val_l2_step = 0
                val_l2_full = 0
                with torch.no_grad():
                    for xx, yy in val_loader:
                        loss = 0
                        xx = xx.to(device)
                        yy = yy.to(device)

                        if training_type in ['autoregressive']:
                            pred = yy[..., :initial_step, :]
                            inp_shape = list(xx.shape)
                            inp_shape = inp_shape[:-2]
                            inp_shape.append(-1)

                            for t in range(initial_step, t_train):
                                inp = xx.reshape(inp_shape)
                                temp_shape = [0, -1]
                                temp_shape.extend([i for i in range(1, len(inp.shape) - 1)])
                                inp = inp.permute(temp_shape)
                                y = yy[..., t:t + 1, :]
                                temp_shape = [0]
                                temp_shape.extend([i for i in range(2, len(inp.shape))])
                                temp_shape.append(1)
                                im = model(inp).permute(temp_shape).unsqueeze(-2)
                                loss += loss_fn(im.reshape(batch_size, -1), y.reshape(batch_size, -1))

                                pred = torch.cat((pred, im), -2)

                                xx = torch.cat((xx[..., 1:, :], im), dim=-2)

                            val_l2_step += loss.item()
                            _batch = yy.size(0)
                            _pred = pred[..., initial_step:t_train, :]
                            _yy = yy[..., initial_step:t_train, :]
                            val_l2_full += loss_fn(_pred.reshape(_batch, -1), _yy.reshape(_batch, -1)).item()

                        if training_type in ['single']:
                            x = xx[..., 0, :]
                            y = yy[..., t_train - 1:t_train, :]
                            pred = model(x.permute([0, 2, 1])).permute([0, 2, 1])
                            _batch = yy.size(0)
                            loss += loss_fn(pred.reshape(_batch, -1), y.reshape(_batch, -1))

                            val_l2_step += loss.item()
                            val_l2_full += loss.item()

                    # if val_l2_full < loss_val_min:
                    #     loss_val_min = val_l2_full
                    #     torch.save({
                    #         'epoch': ep,
                    #         'model_state_dict': model.state_dict(),
                    #         'optimizer_state_dict': optimizer.state_dict(),
                    #         'loss': loss_val_min
                    #     }, model_path)

            t2 = default_timer()
            scheduler.step()
            print('epoch: {0}, loss: {1:.5f}, t2-t1: {2:.5f}, trainL2: {3:.5f}, testL2: {4:.5f}' \
                  .format(ep, loss.item(), t2 - t1, train_l2_step, val_l2_step))

    else:
        for ep in range(start_epoch, epochs):
            model.train()
            t1 = default_timer()
            train_l2_step = 0
            train_l2_full = 0

            for xx, yy in train_loader:
                loss = 0

                # xx: input tensor (first few time steps) [b, x1, ..., xd, t_init, v]
                # yy: target tensor [b, x1, ..., xd, t, v]
                xx = xx.to(device)
                yy = yy.to(device)

                # Initialize the prediction tensor
                pred = yy[..., :initial_step, :]

                # Extract shape of the input tensor for reshaping (i.e. stacking the
                # time and channels dimension together)
                inp_shape = list(xx.shape)
                inp_shape = inp_shape[:-2]
                inp_shape.append(-1)

                # Autoregressive loop
                for t in range(initial_step, t_train):
                    # Reshape input tensor into [b, x1, ..., xd, t_init*v]
                    inp = yy[..., t - initial_step:t, :].reshape(inp_shape)
                    temp_shape = [0, -1]
                    temp_shape.extend([i for i in range(1, len(inp.shape) - 1)])
                    inp = inp.permute(temp_shape)
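                    # Perturb the teacher-forced input with small Gaussian noise (std 0.001)
                    # so that one-step training sees rollout-like errors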
                    inp = torch.normal(inp, 0.001)

                    # Extract target at current time step
                    y = yy[..., t:t + 1, :]

                    # Model run
                    temp_shape = [0]
                    temp_shape.extend([i for i in range(2, len(inp.shape))])
                    temp_shape.append(1)
                    im = model(inp).permute(temp_shape).unsqueeze(-2)

                    # Loss calculation
                    loss += loss_fn(im.reshape(batch_size, -1), y.reshape(batch_size, -1))

                    # Concatenate the prediction at current time step into the
                    # prediction tensor
                    pred = torch.cat((pred, im), -2)

                    # Concatenate the prediction at the current time step to be used
                    # as input for the next time step
                    # xx = torch.cat((xx[..., 1:, :], im), dim=-2)

                train_l2_step += loss.item()
                _batch = yy.size(0)
                _yy = yy[..., :t_train, :]  # if t_train is not -1
                l2_full = loss_fn(pred.reshape(_batch, -1), _yy.reshape(_batch, -1))
                train_l2_full += l2_full.item()

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if ep % model_update == 0 or ep == epochs - 1:
                val_l2_step = 0
                val_l2_full = 0
                with torch.no_grad():
                    for xx, yy in val_loader:
                        loss = 0
                        xx = xx.to(device)
                        yy = yy.to(device)

                        pred = yy[..., :initial_step, :]
                        inp_shape = list(xx.shape)
                        inp_shape = inp_shape[:-2]
                        inp_shape.append(-1)

                        for t in range(initial_step, t_train):
                            inp = yy[..., t - initial_step:t, :].reshape(inp_shape)
                            temp_shape = [0, -1]
                            temp_shape.extend([i for i in range(1, len(inp.shape) - 1)])
                            inp = inp.permute(temp_shape)
                            y = yy[..., t:t + 1, :]
                            temp_shape = [0]
                            temp_shape.extend([i for i in range(2, len(inp.shape))])
                            temp_shape.append(1)
                            im = model(inp).permute(temp_shape).unsqueeze(-2)
                            loss += loss_fn(im.reshape(batch_size, -1), y.reshape(batch_size, -1))

                            pred = torch.cat((pred, im), -2)

                        val_l2_step += loss.item()
                        _batch = yy.size(0)
                        _pred = pred[..., initial_step:t_train, :]
                        _yy = yy[..., initial_step:t_train, :]  # if t_train is not -1
                        val_l2_full += loss_fn(_pred.reshape(_batch, -1), _yy.reshape(_batch, -1)).item()

                    # if val_l2_full < loss_val_min:
                    #     loss_val_min = val_l2_full
                    #     torch.save({
                    #         'epoch': ep,
                    #         'model_state_dict': model.state_dict(),
                    #         'optimizer_state_dict': optimizer.state_dict(),
                    #         'loss': loss_val_min
                    #     }, model_path)

            t2 = default_timer()
            scheduler.step()
            print('epoch: {0}, loss: {1:.5f}, t2-t1: {2:.5f}, trainL2: {3:.5f}, testL2: {4:.5f}' \
                  .format(ep, loss.item(), t2 - t1, train_l2_step, val_l2_step))


if __name__ == "__main__":
    run_training()
    print("Done.")

I wonder whether this is because I am misusing the code. Looking forward to your reply.

Form of shallow water equations

Here are the shallow water equations on Wikipedia (https://en.wikipedia.org/wiki/Shallow_water_equations):

[image: the shallow water equations from Wikipedia]

Setting the density rho=1, these agree with the equations given by Pyclaw: https://github.com/clawpack/pyclaw/blob/858498a9e2e2876519d4544bae4020675c0283e0/examples/shallow_2d/radial_dam_break.py#L10
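
For reference, the conservative form with constant bathymetry and rho = 1 (the form the PyClaw example solves) reads:

\partial_t h + \partial_x (h u) + \partial_y (h v) = 0
\partial_t (h u) + \partial_x (h u^2 + \tfrac{1}{2} g h^2) + \partial_y (h u v) = 0
\partial_t (h v) + \partial_x (h u v) + \partial_y (h v^2 + \tfrac{1}{2} g h^2) = 0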

However the equations in the supplementary material are quite different on the right-hand side:

[image: the shallow water equations from the supplementary material]

For example, the derivatives are with respect to x in the second equation and y in the third, and there is no product of u and v. A key difference is that the PyClaw and Wikipedia equations assume constant bathymetry, but I do not see how that would account for the other differences I mentioned.

Another related question: is the bathymetry actually spatially varying, or is it constant everywhere? If it varies spatially, how is it defined in terms of x and y?

2D Darcy Flow Training

What is the right command for running the Darcy Flow config? When initial_step is set to 10, t_train is only 2 as read from the data, which means the code never enters the training loop.

PINN supervision data

Hi,

From the code, it seems that the PINN samples 30% of the ground-truth data from the solution domain and uses it to supervise training. If I understand correctly, the original PINN does not require supervision data. Is this the desired behavior? I could not locate a related description in the paper.

ratio = int(len(dataset) * 0.3)
data_split, _ = torch.utils.data.random_split(
    dataset,
    [ratio, len(dataset) - ratio],
    generator=torch.Generator(device="cuda").manual_seed(42),
)
data_gt = data_split[:]
bc_data = dde.icbc.PointSetBC(data_gt[0].cpu(), data_gt[1], component=0)
data = dde.data.TimePDE(
    geomtime,
    pde_swe2d,
    [bc, ic_h, ic_u, ic_v, bc_data],
    num_domain=1000,
    num_boundary=1000,
    num_initial=5000,
)

Thanks!

How to normalize the data for 1D CFD?

Hi, as described in the paper, the ranges of values for the density and pressure are very different from that of the velocity. How would you recommend normalizing these channels?
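
Not an official recommendation, but one common choice is per-channel standardization over the training split; a minimal sketch, with a dummy array standing in for 1D CFD fields of shape [batch, time, x, channel]:

import numpy as np

data = np.random.rand(8, 21, 256, 3)  # stand-in for (density, pressure, velocity)
mean = data.mean(axis=(0, 1, 2), keepdims=True)  # one statistic per channel
std = data.std(axis=(0, 1, 2), keepdims=True)
data_norm = (data - mean) / (std + 1e-8)  # zero mean, unit variance per channel

The same per-channel statistics would then be reused to normalize the validation and test splits.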

Scaling issue for 1D Advection data

From Fig. 6 of the paper (in the Appendix):

[image: Fig. 6 of the paper (Appendix), 1D Advection data]

Based on the closed form of the solution (Appendix D1), the right-hand chart seems to have inconsistent scaling in the time dimension compared to the PDE: the time values should be twice what the y-axis says, or beta should be double its stated value.

I also find a similar problem in the file downloaded from the URL listed in data_download: the Advection dataset does not seem to be consistent with the advection formula, apparently due to the same scaling issue (based on inspecting the array in the returned object).
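
For concreteness, the advection equation \partial_t u + \beta \partial_x u = 0 has the closed-form solution

u(t, x) = u_0(x - \beta t),

so a trajectory computed with speed \beta but plotted against t/2 is indistinguishable from one computed with speed 2\beta plotted against t, which is exactly the ambiguity described above.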

Thanks in advance for your response.

What is the VLlimiter?

Hi,

I'm curious about what this VLlimiter is. It does not look like the Van Leer flux limiter. Do you have any reference paper or source of this so I can look into it? Thanks a lot.

def VLlimiter(a, b, c, alpha=2.):
    return jnp.sign(c) \
        * (0.5 + 0.5*jnp.sign(a*b)) \
        * jnp.minimum(alpha*jnp.minimum(jnp.abs(a), jnp.abs(b)), jnp.abs(c))
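
For what it's worth, when c is taken as the central slope (a + b)/2, the expression appears to coincide with the generalized minmod limiter minmod(alpha*a, alpha*b, c); a quick NumPy check under that assumption (not an authoritative statement about the authors' intent):

import numpy as np

def VLlimiter(a, b, c, alpha=2.0):
    return (np.sign(c)
            * (0.5 + 0.5 * np.sign(a * b))
            * np.minimum(alpha * np.minimum(np.abs(a), np.abs(b)), np.abs(c)))

def minmod3(x, y, z):
    # Zero unless all three arguments share a sign; else the smallest magnitude
    s = (np.sign(x) + np.sign(y) + np.sign(z)) / 3.0
    mags = np.minimum(np.abs(x), np.minimum(np.abs(y), np.abs(z)))
    return np.where(np.abs(s) == 1.0, s * mags, 0.0)

rng = np.random.default_rng(0)
a, b = rng.normal(size=100_000), rng.normal(size=100_000)
c = 0.5 * (a + b)  # central slope
print(np.allclose(VLlimiter(a, b, c), minmod3(2.0 * a, 2.0 * b, c)))  # True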

An error when using the CUDA=11.7

Thanks for your good work. When I train the FNO with the default CUDA/torch setting (conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia), it leads to an error like this:

Traceback (most recent call last):
  File "pdebench/models/train_models_forward.py", line 326, in <module>
    main()
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/utils.py", line 457, in _run_app
    run_and_report(
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/utils.py", line 458, in <lambda>
    lambda: hydra.run(
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 132, in run
    _ = ret.return_value
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
  File "pdebench/models/train_models_forward.py", line 223, in main
    run_training_FNO(
  File "/scratch/yzhu/TBv2-PDEBench/pdebench/models/fno/train.py", line 339, in run_training
    im = model(inp, grid)
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/yzhu/TBv2-PDEBench/pdebench/models/fno/fno.py", line 221, in forward
    x1 = self.conv0(x)
  File "/scratch/yzhu/anaconda3/envs/myenv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/scratch/yzhu/TBv2-PDEBench/pdebench/models/fno/fno.py", line 163, in forward
    x_ft = torch.fft.rfft2(x)
RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

2D Incompressible Navier-Stokes Dataset

Hi,

Thanks so much for your great job!

When I checked the data page, I found many repeated files named ns_incom_inhom_2d_512-275.h5. Are they identical, or are the file names wrong?

Thanks.

Broken `download_metadata.csv`

As the title suggests, in the last commit the file download_metadata.csv seems to have turned into a binary, leading to an error in pd.read_csv(...). Reverting to the old commit solves the issue for now.

How to get more pictures of the results

Hi,
After I trained the PINN model and ran analyse_result_forward.py, I got Result.pdf, which contains only images of the MSE.
How can I get more information from the results, such as the visualized data and visualized predictions, like the pictures shown in your paper?

Darcy Flow Config Issues

I have run into two issues with darcy flow's config files:

  1. There are two config files: config_darcy.yaml and args/config_Darcy.yaml. The documentation points to config_Darcy.yaml (capital D), but config_darcy.yaml (lowercase d) seems newer and more correct. Should it be updated to fully replace the old one?

  2. config_darcy.yaml works with FNO, but errors with Unet. By default, config_darcy sets initial_step=1 and t_train=1. I believe this is an error, because the AR loops (here and here) run from initial_step to t_train, so the range ends up empty and the loops do nothing. This produces a confusing error: the loss is initialized as a Python int, and since the loop body never executes, nothing is added to it, so it stays an int:

Unet
Epochs = 500, learning rate = 0.001, scheduler step = 100, scheduler gamma = 0.5
Spatial Dimension 2
Total parameters = 7762465
start training...
Error executing job with overrides: []
Traceback (most recent call last):
  File "/dfs6/pub/afeeney/opensource/PDEBench/pdebench/models/train_models_forward.py", line 199, in main
    run_training_Unet(
  File "/data/homezvol2/afeeney/.conda/envs/pdebench/lib/python3.10/site-packages/pdebench/models/unet/train.py", line 414, in run_training
    train_l2_step += loss.item()
AttributeError: 'int' object has no attribute 'item'

I was able to get it running by setting t_train=2. I don't totally follow how the Darcy case is set up, though, so I'm not sure whether that is the correct fix...
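
For anyone hitting the same error, here is a minimal, self-contained reproduction of the failure mode (the values mirror config_darcy.yaml's defaults; the loss update is a stand-in, not the repo's code):

initial_step, t_train = 1, 1
loss = 0
for t in range(initial_step, t_train):  # range(1, 1) is empty
    loss = loss + 1.0  # stand-in for loss_fn(...)
print(type(loss))  # <class 'int'>, so loss.item() raises AttributeError

Guarding with an assertion that t_train > initial_step (or initializing loss as a zero tensor) would at least make this fail loudly.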

Config files for provided pretrained models

Hi!

I was wondering if there are specific config files or further details available somewhere for all the pretrained models that are provided? I see that there are some under models/config/args, but it seems many are missing from this folder.

It would be nice to have specific information per pretrained model (e.g., which resolutions were used, the architecture, whether ar_mode was used, pushforward, what initial_step, etc.), as I am currently struggling a bit with properly loading each pretrained model and with reproducing the test results in the paper.

Kind regards,

Winfried

Issue on the grid of data

Greetings, authors, and sorry to bother you again with questions.

In the dataset, the mesh of the domain (e.g. a 1D interval [a, b]) does not include the points on the boundary, i.e. a and b. For a Dirichlet BC, the values on the boundaries are given by the BC, so users can fill them in themselves. For a Neumann BC, however, only the first derivatives are known, yet the boundary values can be essential for checking the boundary-condition errors of learned models, and I find it not at all trivial to recover them. In addition, it is common for datasets/models to provide and use the boundary values of solutions. I wonder how I can get the boundary values in the datasets, especially for the 2D diff-react data.
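
For what it's worth, assuming the samples are cell-centered with spacing \Delta x (my assumption, not something stated in the docs), a first-order reconstruction of the boundary values from the Neumann data would be

u(a) \approx u_1 - \tfrac{\Delta x}{2} \, \partial_x u(a), \qquad u(b) \approx u_N + \tfrac{\Delta x}{2} \, \partial_x u(b),

where u_1 and u_N are the first and last interior samples; a confirmation of the grid convention would still be needed.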

I look forward to hearing from you soon.

PINN Baseline PDE

Hi,
I downloaded the dataset for the 1D compressible Navier-Stokes equations, and I have several questions about the PINN PDE implemented in pinn.

  • Why does the loss not correspond to the equation provided in Appendix D5? Several terms are missing in the code; is there an explanation for that? (In particular, for the case where eta and zeta are 1e-1, these terms may have some influence?)
  • What is the value of the viscous stress tensor? How can I find the missing config to implement the complete PINN?
  • I noticed that there is a - before the pressure term in the second equation in the code; shouldn't it be a +?

Thanks a lot for your help!

PINN matrices cannot be multiplied

Hi, I met this problem when I used the PINN and ran train_models_forward.py; it showed that two matrices cannot be multiplied.

'compile' took 0.000202 s

Training model...

Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/xxx/PDEBench/pdebench/models/train_models_forward.py", line 232, in main
    run_training_PINN(
  File "/home/xxx/PDEBench/pdebench/models/pinn/train.py", line 472, in run_training
    test_pred, test_gt, model_name = _run_training(scenario, epochs, learning_rate, model_update, flnm,
  File "/home/xxx/PDEBench/pdebench/models/pinn/train.py", line 409, in _run_training
    losshistory, train_state = model.train(
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/utils/internal.py", line 22, in wrapper
    result = f(*args, **kwargs)
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/model.py", line 427, in train
    self._test()
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/model.py", line 559, in _test
    ) = self._outputs_losses(
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/model.py", line 356, in _outputs_losses
    outs = self.outputs_losses(training, inputs, targets)
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/model.py", line 229, in outputs_losses
    outputs_ = self.net(self.net.inputs)
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/deepxde/nn/pytorch/fnn.py", line 33, in forward
    x = self.activation(linear(x))
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/xxx/anaconda3/envs/PDEBench/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 103, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (10024x2 and 1x40)

And here is the config that I used for training:

defaults:
  - _self_
  - override hydra/hydra_logging: disabled
  - override hydra/job_logging: disabled

hydra:
  output_subdir: null
  run:
    dir: .

args:
  model_name: 'PINN'
  scenario: 'pde1D'
  model_update: 500
  filename: '1D_Advection_Sols_beta0.1.hdf5'
  epochs: 15000
  input_ch: 2
  output_ch: 1
  learning_rate: 1.e-3
  root_path: './data'
  val_num: 10
  if_periodic_bc: True
  period: 5000
  val_time: 2.0
  val_batch_idx: 10
  aux_params: [ 0.1 ]
  seed: '0000'

Thanks, and I need your help.

Range of the x-coordinate for 1D Burgers data

Thanks a lot for this benchmark dataset. In Appendix D.3 of the paper, the range of the x-axis is (0, 1) according to equations (11, 12).
[image: equations (11, 12) from Appendix D.3]
However, in Fig.7, the range is (-1, 1) according to the curves, and (0, 1) according to the two-dimensional plot.
[image: Fig. 7 of the paper]
In the configuration files for data generation, the range appears to be (-1, 1) as well.


For the dataset I downloaded:

with h5py.File('1D_Burgers_Sols_Nu0.01.hdf5', "r") as h5_file:
    print(h5_file['x-coordinate'][:])
# [-0.99902344 -0.9970703  -0.9951172  ...  0.9951172   0.9970703   0.99902344]

This seems a bit confusing to me.

Furthermore, if the range of the x-axis is (-1, 1), then I find a scaling issue similar to the one for the Advection data mentioned here. If I have not misunderstood, a shock in the Burgers equation propagates at speed (uL + uR) / 2. According to the first panel of Fig. 7, the average of uL and uR is approximately -0.8, but the propagation speed appears to be doubled. I wonder whether this is related to the following lines in the data generation script:

f_upwd = (fR[1:cfg.multi.nx+2] + fL[2:cfg.multi.nx+3]
          - 0.5*jnp.abs(uL[2:cfg.multi.nx+3] + uR[1:cfg.multi.nx+2])
          * (uL[2:cfg.multi.nx+3] - uR[1:cfg.multi.nx+2]))

In the updated script for the Advection equation, the value of the flux is multiplied by 0.5, but this operation is not applied here for the Burgers equation.
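
For reference, the Rankine-Hugoniot condition for the Burgers flux f(u) = u^2/2 gives exactly the shock speed quoted above:

s = \frac{f(u_L) - f(u_R)}{u_L - u_R} = \frac{u_L^2 - u_R^2}{2 (u_L - u_R)} = \frac{u_L + u_R}{2}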

Thanks in advance for your response.

Parallel Data Writing in HDF5 Output

Hi,

First let me thank you for the great initiative to create dataset and benchmark for machine learning applied to PDEs.

I tried to generate data using your script :
python3 data_gen/gen_diff_react.py
I got the following error:

BlockingIOError: [Errno 11] Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

I assume this error arises because at least two processes of the pool are trying to write to the HDF5 output at the same time. HDF5 actually supports parallel writing (c.f. the official documentation); this support, however, relies on MPI. Hence, I am not sure that multiprocessing.Pool allows parallel writing of the simulation data to the output file. Let me know if this explanation is wrong.

Which configuration do you use to successfully generate and write the data in parallel? Which h5py installation do you use (MPI-enabled or the standard one)?

A quick fix to the problem occurring is to retry the opening of the file until the lock is available:

import time

busy = True
while busy:
    try:
        data_f = h5py.File(utils.expand_path(config.output_path), "a")
    except IOError:
        time.sleep(1)
        print(f"Busy {busy}.")
        continue
    else:
        ...  # Writing of the dataset
        data_f.close()
        busy = False  # exit the retry loop once the write has succeeded

Nonetheless, proper support of parallel HDF5 writing would be appreciated.
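
For reference, a minimal sketch of what collective writing looks like with an MPI-enabled h5py build (file name, dataset name, and shapes are placeholders, not the repo's actual configuration):

from mpi4py import MPI
import h5py

comm = MPI.COMM_WORLD
# All ranks open the same file collectively, so no file-lock contention occurs
with h5py.File("output.h5", "a", driver="mpio", comm=comm) as f:
    dset = f.require_dataset("sim_data", shape=(comm.size, 128), dtype="f4")
    dset[comm.rank, :] = float(comm.rank)  # each rank writes its own row

Run with, e.g., mpiexec -n 4 python write_parallel.py.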

Appendices

Hi,

Thanks for providing such a powerful toolkit. I am wondering where I can find the appendices mentioned in the paper.

G.

3D CFD with M=0.1

I trained FNO3D on the dataset 3D_CFD_Rand_M1.0_Eta1e-08_Zeta1e-08_periodic_Train.hdf5. I also generated an identical dataset with M=0.1. I used spatial resolution 64 x 64 x 64. Here are my results, with M=0.1 in blue and M=1 in red.

[image: FNO3D results, M=0.1 in blue and M=1.0 in red]

For reference, here is the training log at the last epoch for M=1:

epoch: 499, loss: 10.87719, t2-t1: 35.52373, trainL2: 18.71870, testL2: 20.85403

It is not surprising that the M=1 problem is harder, but a 40-fold increase is very large and makes me wonder how realistic the predicted rollouts are.

I observed a similar result in the 2D case with the files 2D_CFD_Rand_M1.0_Eta0.01_Zeta0.01_periodic_128_Train.hdf5 and 2D_CFD_Rand_M0.1_Eta1e-08_Zeta1e-08_periodic_512_Train.hdf5, trained on resolution 64 x 64.

[image: corresponding 2D results]

Would it be possible to release an official M=0.1 3D dataset on DaRUS for those who may want to benchmark CFD with M=0.1 not only in 2D (as is currently available) but in 3D as well?

Data download issue from python script

Hi, first of all, thanks for your brilliant work!

I am trying to download data from the Stuttgart platform through download.py. However, I always get the error KeyError: 'project', along with the warning UserWarning: No 'API_TOKEN' found in the environment. Please be aware, that you might not have the rights to download this dataset.

If I go to the benchmark dataset's homepage and download via the link, everything works fine; the only problem is downloading through the download.py script.
I did not change any configuration in config.yaml, so I think the problem might be elsewhere.

ModuleNotFoundError: No module named 'pdebench.models'

Hi,

Great work! Thanks a lot for sharing your codes and dataset.

I have a problem with running the training code. The package worked fine on Windows using PyCharm, but when I installed it on Linux and tried to run train_models_forward.py, it kept giving me the error "ModuleNotFoundError: No module named 'pdebench.models'". I have tried to fix it using many solutions found online, such as adding an empty __init__.py file in the models subfolder, but none of them worked. Do you by any chance know how to resolve this issue?

BTW, I printed the system path using print(sys.path), and did see the pdebench.models path there.

Many thanks,
Ning

Using Relative Error Metrics?

(this is more of a question than an issue, but I suppose it sort of works as a feature request :-3)

Hi, thanks for the awesome dataset!

I noticed that in neuraloperator they use a relative error metric/loss function for training (see implementation and use). AFAIK, PDEBench does not use relative metrics for training or evaluation(?)

Do you think it would make sense to include similar relative errors? Or is there a particular reason y'all chose to exclude them from PDEBench?
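
For concreteness, a minimal sketch of the relative L2 metric in question (modeled loosely on neuraloperator's LpLoss; not PDEBench's own code):

import torch

def rel_l2(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Per-sample relative L2 error, averaged over the batch dimension
    dims = tuple(range(1, pred.ndim))
    num = torch.linalg.vector_norm(pred - target, dim=dims)
    den = torch.linalg.vector_norm(target, dim=dims)
    return (num / (den + eps)).mean()

print(rel_l2(torch.ones(4, 10), torch.full((4, 10), 2.0)))  # tensor(0.5000)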

High PDE residual error for 2D reaction diffusion dataset

When running the provided data generation code for the 128x128 2D reaction-diffusion problem, I observe a very high PDE residual at the beginning of the trajectory. The lowest the residual gets is around 1e-2, which, as far as I understand, is not great for a PDE solver. Below is a plot of the mean absolute residual (i.e. the value of the PDE) over the 128x128 spatial grid as a function of time, for a few generated sample trajectories of length 100. The code was run out of the box, with no changes to the PDE parameters or other settings. Ideally, the residual should be nonzero only for the initial condition; otherwise, numerical errors in the dataset will propagate to the trained neural networks. Any suggestions or pointers as to why I might be seeing this behavior?
[image: mean absolute PDE residual over time for several sample trajectories]

How to use the downloaded 3D dataset

I downloaded the 3D_CFD_Turb_M1.0_Eta1e-08_Zeta1e-08_periodic_Train.hdf5 file, but I could not find a specific way to use the dataset, or a specific description of it.

Meanwhile, I downloaded the 3DCFD_FNO.tar pre-trained weights file. How can I use this file?
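
Not an official answer, but the HDF5 file can at least be inspected directly to see what it contains; a minimal sketch:

import h5py

# Print every group/dataset in the file along with its shape
with h5py.File("3D_CFD_Turb_M1.0_Eta1e-08_Zeta1e-08_periodic_Train.hdf5", "r") as f:
    f.visititems(lambda name, obj: print(name, getattr(obj, "shape", "")))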
