
fourcastnet's Introduction

FourCastNet


This repository contains the code used for "FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators" [paper]

The code was developed by the authors of the preprint: Jaideep Pathak, Shashank Subramanian, Peter Harrington, Sanjeev Raja, Ashesh Chattopadhyay, Morteza Mardani, Thorsten Kurth, David Hall, Zongyi Li, Kamyar Azizzadenesheli, Pedram Hassanzadeh, Karthik Kashinath, Animashree Anandkumar

FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor. It has important implications for planning wind energy resources, predicting extreme weather events such as tropical cyclones, extra-tropical cyclones, and atmospheric rivers. FourCastNet matches the forecasting accuracy of the ECMWF Integrated Forecasting System (IFS), a state-of-the-art Numerical Weather Prediction (NWP) model, at short lead times for large-scale variables, while outperforming IFS for variables with complex fine-scale structure, including precipitation. FourCastNet generates a week-long forecast in less than 2 seconds, orders of magnitude faster than IFS. The speed of FourCastNet enables the creation of rapid and inexpensive large-ensemble forecasts with thousands of ensemble-members for improving probabilistic forecasting. We discuss how data-driven deep learning models such as FourCastNet are a valuable addition to the meteorology toolkit to aid and augment NWP models.

FourCastNet is based on the vision transformer architecture with Adaptive Fourier Neural Operator (AFNO) attention proposed in Guibas-Mardani et al. [paper], [code].

Total Column of Water Vapor forecast using FourCastNet

Quick Links:

Pre-processed Training Data - Globus Download Link

Trained Model Weights - Globus Download Link

Trained Model Weights - Web Download Link

Version Notes:

Release          Version  Model Weights   Training Data     Normalization
Initial release  v0.0.0   FCN_weights_v0  FCN_ERA5_data_v0  stats_v0

Training:

The model is trained on a subset of ERA5 reanalysis data on single levels [ Hersbach 2018 ] and pressure levels [ Hersbach 2018 ] that is pre-processed and stored into hdf5 files.

The subset of the ERA5 training data that FCN was trained on is hosted at the National Energy Research Scientific Computing Center (NERSC). For convenience, it is available to all via Globus at the following link.

Pre-processed Training Data

You will need a Globus account and will need to be logged in to your account in order to access the data. The full dataset that this version of FourCastNet was trained on is approximately 5TB in size.

The data directory is organized as follows:

FCN_ERA5_data_v0
│   README.md
└───train
│   │   1979.h5
│   │   1980.h5
│   │   ...
│   │   ...
│   │   2015.h5
│   
└───test
│   │   2016.h5
│   │   2017.h5
│
└───out_of_sample
│   │   2018.h5
│
└───static
│   │   orography.h5
│
└───precip
│   │   train/
│   │   test/
│   │   out_of_sample/
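
As a rough consistency check on the "approximately 5TB" figure, the size of the yearly files can be estimated with simple arithmetic. This is only a sketch: it assumes float32 storage and that every yearly file has the (1460, 21, 721, 1440) shape reported in the issue threads further down, and it ignores the static and precip folders.

```python
# Rough size estimate for the pre-processed yearly files.
# Assumptions (not stated in this README): float32 storage, and every yearly
# file has the (1460, 21, 721, 1440) shape quoted in the issue threads.
SAMPLES_PER_YEAR = 1460      # 365 days x 4 snapshots per day
CHANNELS = 21
LAT, LON = 721, 1440         # 0.25 degree global grid
BYTES_PER_VALUE = 4          # float32 (assumed)


def bytes_per_year() -> int:
    """Size in bytes of one yearly 'fields' array under the assumptions above."""
    return SAMPLES_PER_YEAR * CHANNELS * LAT * LON * BYTES_PER_VALUE


def n_years() -> int:
    """Number of yearly files listed in the directory tree."""
    train = len(range(1979, 2016))   # 1979..2015
    test = 2                         # 2016, 2017
    out_of_sample = 1                # 2018
    return train + test + out_of_sample


total = bytes_per_year() * n_years()
print(f"{bytes_per_year() / 1e9:.1f} GB per year, {total / 1e12:.2f} TB total")
```

Under these assumptions the estimate comes to about 127 GB per year and about 5.09 TB over the 40 yearly files, which is in line with the stated dataset size.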

Precomputed stats are provided at the "additional" link and have the following directory structure:

stats_v0
│   global_means.npy  
│   global_stds.npy  
│   land_sea_mask.npy  
│   latitude.npy  
│   longitude.npy  
│   time_means.npy
│   time_means_daily.h5
└───precip
│   │   time_means.npy
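
To illustrate how per-channel statistics such as global_means.npy and global_stds.npy are typically applied, here is a minimal sketch of channel-wise standardization. It assumes the statistics have shape (1, channels, 1, 1), which matches the keepdims reduction quoted in the get_stats.py issue further down; the helper names and the synthetic data are hypothetical and not taken from the repository.

```python
import numpy as np


def normalize(fields, means, stds):
    """Standardize each channel: (x - mean) / std, broadcast over time and space."""
    return (fields - means) / stds


def denormalize(fields, means, stds):
    """Invert normalize() to recover physical units."""
    return fields * stds + means


# Tiny synthetic stand-in: 5 time steps, 3 channels, 4 x 8 grid.
rng = np.random.default_rng(0)
fields = rng.normal(loc=10.0, scale=3.0, size=(5, 3, 4, 8)).astype(np.float32)

# In practice these would be loaded from the provided files, e.g.
# np.load("global_means.npy"); here they are computed from the synthetic data.
means = fields.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, 3, 1, 1)
stds = fields.std(axis=(0, 2, 3), keepdims=True)     # shape (1, 3, 1, 1)

z = normalize(fields, means, stds)
print(z.shape, means.shape)
```

Because the model weights were trained on data normalized with one specific set of statistics, swapping in different means or standard deviations changes the inputs the network sees.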

Training configurations can be set up in config/AFNO.yaml. The following paths need to be set by the user. These paths should point to the data and stats you downloaded in the steps above:

afno_backbone: &backbone
  <<: *FULL_FIELD
  ...
  ...
  orography: !!bool False 
  orography_path: None # provide path to orography.h5 file if set to true, 
  exp_dir:             # directory path to store training checkpoints and other output
  train_data_path:     # full path to /train/
  valid_data_path:     # full path to /test/
  inf_data_path:       # full path to /out_of_sample. Will not be used while training.
  time_means_path:     # full path to time_means.npy
  global_means_path:   # full path to global_means.npy
  global_stds_path:    # full path to global_stds.npy
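
Before launching a job, it can help to confirm that the paths above actually exist. The following is a minimal sketch of such a check; the helper missing_paths is hypothetical (it is not part of the repository) and it assumes the config has already been loaded into a plain dictionary. exp_dir is left out because an output directory may not exist yet.

```python
from pathlib import Path

# Keys taken from the config excerpt above; the helper itself is hypothetical.
REQUIRED_PATH_KEYS = [
    "train_data_path",
    "valid_data_path",
    "inf_data_path",
    "time_means_path",
    "global_means_path",
    "global_stds_path",
]


def missing_paths(params: dict) -> list:
    """Return the config keys whose value is unset or does not exist on disk."""
    missing = []
    for key in REQUIRED_PATH_KEYS:
        value = params.get(key)
        if not value or not Path(value).exists():
            missing.append(key)
    # orography_path only matters when orography is enabled.
    if params.get("orography"):
        if not Path(str(params.get("orography_path"))).exists():
            missing.append("orography_path")
    return missing


if __name__ == "__main__":
    example = {"train_data_path": "/path/to/FCN_ERA5_data_v0/train"}
    print("unset or missing:", missing_paths(example))
```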

An example launch script for distributed data parallel training on the Slurm-based HPC cluster Perlmutter is provided in submit_batch.sh. Please follow the pre-training and fine-tuning procedures as described in the pre-print.

To run the precipitation diagnostic model, see the following example config:

precip: &precip
  <<: *backbone
  ...
  ...
  precip:              # full path to precipitation data files 
  time_means_path_tp:  # full path to time means for precipitation
  model_wind_path:     # full path to backbone model weights ckpt

Inference:

In order to run FourCastNet in inference mode you will need to have the following files on hand.

  1. The path to the out-of-training-sample hdf5 file. This could either be a new set of initial conditions that you downloaded from Copernicus and processed yourself (see the separate instructions in the next section), or it could be the out_of_sample dataset hosted here. The inference script provided assumes that you are using the out_of_sample/2018.h5 file. You can modify the script to use a different h5 file that you processed yourself after downloading the raw data from Copernicus.
  2. The model weights hosted at Trained Model Weights
FCN_weights_v0/
│   backbone.ckpt  
│   precip.ckpt  
  3. The pre-computed normalization statistics hosted at the "additional" link. If you are using the pre-trained model weights that we have provided, it is crucial that you also use the provided statistics, since these stats were used when training the model. The normalization statistics go hand-in-hand with the trained model weights. The stats folder contains:
stats_v0
│   global_means.npy  
│   global_stds.npy  
│   land_sea_mask.npy  
│   latitude.npy  
│   longitude.npy  
│   time_means.npy
│   time_means_daily.h5

Once you have all the files listed above you should be ready to go.

In config/AFNO.yaml, set the user-defined paths:

afno_backbone: &backbone
  <<: *FULL_FIELD
  ...
  ...
  orography: !!bool False 
  orography_path: None # provide path to orography.h5 file if set to true, 
  inf_data_path:       # full path to /out_of_sample. Will not be used while training.
  time_means_path:     # full path to time_means.npy
  global_means_path:   # full path to global_means.npy
  global_stds_path:    # full path to global_stds.npy

Run inference using

python inference/inference.py \
       --config=afno_backbone \
       --run_num=0 \
       --weights '/path/to/weights/backbone.ckpt' \
       --override_dir '/path/to/output/scratch/directory/'

Run inference for precipitation using

python inference/inference_precip.py \
       --config=precip \
       --run_num=0 \
       --weights '/path/to/weights/precip.ckpt' \
       --override_dir '/path/to/output/scratch/directory/'

Additional information on batched ensemble inference and precipitation model inference can be found at inference/README_inference.md

The outputs of the inference scripts will be written to an hdf5 file at the path specified in the --override_dir input argument. Depending on the params set in the config file, the output file will contain the computed ACC and RMSE of the forecasts and the raw forecasts of selected fields for visualization.
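
For readers unfamiliar with the two metrics, here is a minimal sketch of RMSE and ACC (anomaly correlation coefficient) for a single variable at a single time step. It is not the repository's implementation: the cos(latitude) weighting, the function names and the synthetic data are assumptions made for illustration, and the climatology argument stands in for a time-mean field such as the one stored in time_means.npy.

```python
import numpy as np


def lat_weights(lat_deg):
    """cos(latitude) weights normalized to mean 1 (an assumed, common convention)."""
    w = np.cos(np.deg2rad(lat_deg))
    return w / w.mean()


def weighted_rmse(pred, target, lat_deg):
    """Root-mean-square error; pred and target are (lat, lon) arrays."""
    w = lat_weights(lat_deg)[:, None]
    return float(np.sqrt(np.mean(w * (pred - target) ** 2)))


def weighted_acc(pred, target, climatology, lat_deg):
    """Correlation between forecast and truth anomalies about a climatology."""
    w = lat_weights(lat_deg)[:, None]
    p = pred - climatology
    t = target - climatology
    num = np.sum(w * p * t)
    den = np.sqrt(np.sum(w * p ** 2) * np.sum(w * t ** 2))
    return float(num / den)


# Synthetic demo on a coarse 5 x 8 grid.
lat = np.linspace(90, -90, 5)
rng = np.random.default_rng(0)
clim = rng.normal(size=(5, 8))
truth = clim + rng.normal(size=(5, 8))
forecast = truth + 0.1 * rng.normal(size=(5, 8))
print(weighted_rmse(forecast, truth, lat), weighted_acc(forecast, truth, clim, lat))
```

A forecast identical to the ground truth gives an RMSE of 0 and an ACC of 1, which is what the first timestep of an inference log reports when the initial condition is compared with itself.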

Inference for a custom interval

The steps will walk you through:

  1. Downloading an initial condition from the (continuously expanding) ERA5 dataset to initialize a FourCastNet model.
  2. Pre-processing the downloaded ERA5 files
  3. Running inference

Downloading an initial condition to initialize FourCastNet.

If you are interested in generating a forecast using FourCastNet for a specific time-interval, you should begin by downloading the ERA5 netCDF files for the relevant variables from the Copernicus Climate Change Service Data Store. For convenience, the scripts are provided in /copernicus. Specifically, you need the two scripts /copernicus/get_data_pl_short_length.py and /copernicus/get_data_sl_short_length.py. These two scripts will respectively download (a superset of) the atmospheric variables on single levels and pressure levels that are modelled by FourCastNet. Be sure to specify the correct time interval in both scripts. While a single temporal snapshot from ERA5 is sufficient to generate a forecast using FourCastNet, you will want to download the ground truth for the full interval you are interested in. This is so that you can analyze the skill of FourCastNet by comparing with the ERA5 ground truth via the RMSE and ACC metrics.

The example scripts show you how to download pl and sl variables in an interval from 19 October 2021 to 31 October 2021. Be sure to download consecutive days only and keep all snapshots at the 0, 6, 12, 18 hour timestamps.
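
The expected snapshot times for such an interval can be enumerated with a few lines of standard-library Python, which is handy for checking that a download is complete. This is a sketch only; the helper name six_hourly_snapshots is hypothetical and is not part of the /copernicus scripts.

```python
from datetime import datetime, timedelta


def six_hourly_snapshots(start_day, end_day):
    """All 0, 6, 12 and 18 hour timestamps from start_day through end_day inclusive."""
    stamps = []
    t = datetime(start_day.year, start_day.month, start_day.day)
    last = datetime(end_day.year, end_day.month, end_day.day, 18)
    while t <= last:
        stamps.append(t)
        t += timedelta(hours=6)
    return stamps


stamps = six_hourly_snapshots(datetime(2021, 10, 19), datetime(2021, 10, 31))
print(len(stamps), stamps[0], stamps[-1])
```

For the example interval this yields 52 snapshots (13 consecutive days with 4 snapshots each).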

Pre-processing

Once you have downloaded the relevant netCDF4 files, you will also need to pre-process them. The pre-processing step simply copies the variables into hdf5 files in the correct order that the trained FourCastNet model expects as input. The pre-processing can be performed using the script data_process/parallel_copy_small_set.py. While the script is MPI capable in order to deal with long time intervals, if your desired interval is short (say a few weeks), you can run it on a single process.

The example script shows you how to process pl and sl variables in the time interval from 19 October 2021 to 31 October 2021 that we downloaded in the previous step.

Running inference

Follow the general steps listed in the Inference section above. You will need to make appropriate modifications to the inference/inference.py script.

References:

ERA5 data [ Hersbach, H. et al., (2018) ] was downloaded from the Copernicus Climate Change Service (C3S) Climate Data Store.

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2018): ERA5 hourly data on pressure levels from 1959 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.bd0915c6

Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2018): ERA5 hourly data on single levels from 1959 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). DOI: 10.24381/cds.adbb2d47

If you find this work useful, cite it using:

@article{pathak2022fourcastnet,
  title={Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators},
  author={Pathak, Jaideep and Subramanian, Shashank and Harrington, Peter and Raja, Sanjeev and Chattopadhyay, Ashesh and Mardani, Morteza and Kurth, Thorsten and Hall, David and Li, Zongyi and Azizzadenesheli, Kamyar and Hassanzadeh, Pedram and Kashinath, Karthik and Anandkumar, Animashree},
  journal={arXiv preprint arXiv:2202.11214},
  year={2022}
}


fourcastnet's Issues

Pre-processing parallel_copy.py

Thank you for your great code; this is a SOTA model.
I had an issue running the pre-processing script parallel_copy.py, or MPI.py (similar to parallel_copy.py, but with a different number of years), on the full-year datasets (2016-2021), and got the error: "ValueError: h5py was built without MPI support, can't use mpio driver"

I installed OpenMPI, mpi4py

(cast) mg@amru-System-Product-Name:~$ mpiexec -n 5 python -m mpi4py.bench helloworld
Hello, World! I am process 0 of 5 on amru-System-Product-Name.
Hello, World! I am process 1 of 5 on amru-System-Product-Name.
Hello, World! I am process 2 of 5 on amru-System-Product-Name.
Hello, World! I am process 3 of 5 on amru-System-Product-Name.
Hello, World! I am process 4 of 5 on amru-System-Product-Name.

I don't know what causes this problem because in my point of view everything must be ok with the code and datasets.

(cast) mg@amru-System-Product-Name:~/Documents/Data$ mpirun -n 4 python MPI.py 
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
==============================
rank 1
Nproc 4
==============================
Nimgtot 1460
Nproc 4
Nimg 365
Traceback (most recent call last):
  File "MPI.py", line 130, in <module>
    with h5py.File(f'{str(year)}.h5', 'w') as f:
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 116, in h5py.h5f.create
BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
{2016: 'j', 2017: 'j', 2018: 'k', 2019: 'k', 2020: 'a', 2021: 'a'}
2016
Traceback (most recent call last):
  File "MPI.py", line 133, in <module>
    writetofile(src, dest, 0, ['u10'])
  File "MPI.py", line 75, in writetofile
    fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 441, in __init__
    fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 144, in make_fapl
    set_fapl(plist, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 48, in _set_fapl_mpio
    raise ValueError("h5py was built without MPI support, can't use mpio driver")
ValueError: h5py was built without MPI support, can't use mpio driver
Traceback (most recent call last):
  File "MPI.py", line 130, in <module>
    with h5py.File(f'{str(year)}.h5', 'w') as f:
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 442, in __init__
    fid = make_fid(name, mode, userblock_size,
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 201, in make_fid
    fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 116, in h5py.h5f.create
BlockingIOError: [Errno 11] Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')
==============================
rank 2
Nproc 4
==============================
Nimgtot 1460
Nproc 4
Nimg 365
Traceback (most recent call last):
  File "MPI.py", line 133, in <module>
    writetofile(src, dest, 0, ['u10'])
  File "MPI.py", line 75, in writetofile
    fdest = h5py.File(dest, 'a', driver='mpio', comm=MPI.COMM_WORLD)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 441, in __init__
    fapl = make_fapl(driver, libver, rdcc_nslots, rdcc_nbytes, rdcc_w0, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 144, in make_fapl
    set_fapl(plist, **kwds)
  File "/home/mg/.local/lib/python3.8/site-packages/h5py/_hl/files.py", line 48, in _set_fapl_mpio
    raise ValueError("h5py was built without MPI support, can't use mpio driver")
ValueError: h5py was built without MPI support, can't use mpio driver
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[3210,1],0]
  Exit code:    1

The datasets were downloaded from cds.climate.copernicus.eu for each year with 20 parameters.

I just started using mpi4py and h5py; could you please help me run the pre-processing script parallel_copy.py?
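
The ValueError in the traceback is raised by h5py itself when the installed build lacks parallel HDF5 support, so a working mpi4py installation alone is not enough for driver='mpio'. A quick way to check the installed build is sketched below; the helper name h5py_mpi_status is hypothetical, while h5py.get_config().mpi is the flag h5py exposes for this.

```python
def h5py_mpi_status():
    """Return True/False for MPI support in the installed h5py, or None if h5py is absent."""
    try:
        import h5py
    except ImportError:
        return None
    return bool(h5py.get_config().mpi)


status = h5py_mpi_status()
if status is None:
    print("h5py is not installed")
elif status:
    print("h5py was built with MPI support; driver='mpio' should be usable")
else:
    print("h5py was built without MPI support; driver='mpio' will raise ValueError")
```

If the check reports no MPI support, the usual options are to install an h5py build linked against a parallel HDF5 library, or to run the copy on a single process without the mpio driver.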

Smaller version of dataset

Hello,
It would be good to have a relatively small dataset for experimentation and learning purposes. "Smaller" could mean lower spatial resolution, a shorter time period, fewer variables, etc.

Unable to download the weights

Hi,

I was looking to perform inference on the trained model and noticed that the download option was not available in the Globus App. Could someone please look into this.

Thanks,
Vignesh

Minor typo in README

The training section refers to National Energy Resarch Scientific Computing Center (NERSC).

I believe it should be "Research".

Regarding the pre-trained weight file backbone.ckpt

Regarding the pre-trained weight file backbone.ckpt, is this the weight file of the afno_backbone stage or the weight file of the afno_backbone_finetune stage? I will be extremely grateful for your prompt reply.

TP accuracy

Hello everyone. I have an issue with the TP ACC, which is extremely low. Does anyone know what the problem could be? Below you can find the output of an inference run and my input data info:

input h5 shape: tp(120, 721, 1440).

inference:
2023-06-25 11:24:06,620 - root - INFO - Timestep 0 of 20. TP RMS Error: 0.0, ACC: 1.0
2023-06-25 11:24:09,065 - root - INFO - Timestep 1 of 20. TP RMS Error: 0.0016368781216442585, ACC: 0.35278499126434326
2023-06-25 11:24:09,388 - root - INFO - Timestep 2 of 20. TP RMS Error: 0.00156010827049613, ACC: 0.3554094731807709
2023-06-25 11:24:09,688 - root - INFO - Timestep 3 of 20. TP RMS Error: 0.0015420995187014341, ACC: 0.3688367009162903
2023-06-25 11:24:09,988 - root - INFO - Timestep 4 of 20. TP RMS Error: 0.0014832873130217195, ACC: 0.3889610171318054
2023-06-25 11:24:10,289 - root - INFO - Timestep 5 of 20. TP RMS Error: 0.0014608813216909766, ACC: 0.36930468678474426
2023-06-25 11:24:10,589 - root - INFO - Timestep 6 of 20. TP RMS Error: 0.0015008833725005388, ACC: 0.32038554549217224
2023-06-25 11:24:10,889 - root - INFO - Timestep 7 of 20. TP RMS Error: 0.0013966681435704231, ACC: 0.36792299151420593
2023-06-25 11:24:11,189 - root - INFO - Timestep 8 of 20. TP RMS Error: 0.001374703599140048, ACC: 0.3730277121067047
2023-06-25 11:24:11,490 - root - INFO - Timestep 9 of 20. TP RMS Error: 0.0013704199809581041, ACC: 0.33114054799079895
2023-06-25 11:24:11,790 - root - INFO - Timestep 10 of 20. TP RMS Error: 0.0014247711515054107, ACC: 0.2770615518093109
2023-06-25 11:24:12,090 - root - INFO - Timestep 11 of 20. TP RMS Error: 0.001329558901488781, ACC: 0.3315066695213318
2023-06-25 11:24:12,390 - root - INFO - Timestep 12 of 20. TP RMS Error: 0.0012841359712183475, ACC: 0.35082289576530457
2023-06-25 11:24:12,690 - root - INFO - Timestep 13 of 20. TP RMS Error: 0.0012776957591995597, ACC: 0.32422271370887756
2023-06-25 11:24:12,990 - root - INFO - Timestep 14 of 20. TP RMS Error: 0.001365577569231391, ACC: 0.2353413850069046
2023-06-25 11:24:13,290 - root - INFO - Timestep 15 of 20. TP RMS Error: 0.001326375175267458, ACC: 0.28706997632980347
2023-06-25 11:24:13,590 - root - INFO - Timestep 16 of 20. TP RMS Error: 0.0013120684307068586, ACC: 0.3122824430465698
2023-06-25 11:24:13,890 - root - INFO - Timestep 17 of 20. TP RMS Error: 0.001352619961835444, ACC: 0.270219087600708
2023-06-25 11:24:14,190 - root - INFO - Timestep 18 of 20. TP RMS Error: 0.0014553270302712917, ACC: 0.1826096624135971
2023-06-25 11:24:14,489 - root - INFO - Timestep 19 of 20. TP RMS Error: 0.0014066090807318687, ACC: 0.23088878393173218

Trouble with downloading data

Hello, thanks for open-sourcing such a well structured code!

I am currently having trouble with trying to download the pre-processed data shared on globus.

Even after logging in, it seems like the download button is disabled. Has anything changed with the permission? What can I do to fix this? Below is the screenshot of what I am facing.

image

minor fixes

  • Check license header for afnonet.py
  • create a versioning and model weights table
  • Logos at the top of wiki, center them

Possibility of using different resolution input data over smaller areas

I have a very general question, which is not clear to me, about this method. and sorry if my question is very basic because I am a noobie in this field.

I ran the code and got results for the same dataset used in the paper (ERA5, 30 km resolution) for different dates.
Can I also use higher-resolution data (2 km) over a smaller area? Would I need to train the whole model from scratch, or can I use the checkpoints provided with the paper to run the code on higher-resolution datasets?
Thank you very much for your time! :)

Pre-processing stage key error

First of all, thank you for your GREAT code, it is a real game changer.
I had an issue running the pre-processing stage on the exact datasets used in the example code for this stage (13 days), and still got the error: KeyError "Unable to open object (object 'fields' doesn't exist)"

I don't know what causes this problem because in my point of view everything must be ok with the code and datasets.

How to understand the core code in FFT?

Hello! I can't understand the calculation method in FFT.
o1_real = x.real * w1[0] - x.imag * w1[1] + b1[0]
o1_imag = x.imag * w1[0] + x.real * w1[1] + b1[1]
o2_real = o1_real * w2[0] - o1_imag * w2[1] + b2[0]
o2_imag = o1_imag * w2[0] + o1_real * w2[1] + b2[1]
Why should the real be '-', '+', '+', the imag be '+', '+', '+'? What role does this calculation combination play in it? Can I change the first '-' to '+' or vice versa?
And what role does 'kept_modes' play in it? Dropout?
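
The sign pattern in the first two lines of the quoted snippet is that of ordinary complex multiplication written out in real and imaginary parts: (a + bi)(c + di) = (ac - bd) + (ad + bc)i, plus a complex bias. A minimal NumPy sketch can confirm this. It is a simplification that uses element-wise products where the model code may use matrix products, and all arrays here are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)   # stand-in for Fourier coefficients
w = rng.normal(size=(2, 4))                        # w[0] = real part, w[1] = imaginary part
b = rng.normal(size=(2, 4))                        # b[0] = real part, b[1] = imaginary part

# Component-wise form, following the first two lines of the quoted snippet.
o_real = x.real * w[0] - x.imag * w[1] + b[0]
o_imag = x.imag * w[0] + x.real * w[1] + b[1]

# The same computation using native complex arithmetic.
o_complex = x * (w[0] + 1j * w[1]) + (b[0] + 1j * b[1])

print(np.allclose(o_real, o_complex.real), np.allclose(o_imag, o_complex.imag))
```

Both comparisons print True. The minus sign in the real part comes from i * i = -1, so changing only that sign would no longer correspond to the product of two ordinary complex numbers.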

Checklist before open sourcing

  • Create universally shared globus directories for model weights, stats and data
  • Add ECMWF data license
  • Create staging directory for checkpoints
  • Rewrite training readme
  • Rewrite inference readme
  • Test inference workflow following readme
  • remove mins and maxs from stats, training and inference workflow

cfgrib.dataset.DatasetBuildError

When I try to run the .grib file, the output image is displayed along with an error: cfgrib.dataset.DatasetBuildError: key present and new value is different: key='surface' value=Variable(dimensions=(), data=0.0) new_value=Variable(dimensions=(), data=2.0). Also, how can I solve the problem that the predicted result differs greatly from the actual value?

problem with downloading pre-trained model

I do not have an organization, so I log into Globus with a personal identity.
I ran into a network issue with the personal endpoint. Is there another place, Google Drive for example, from which the pre-trained model file can be downloaded?

the way to calculate `time_means` in script get_stats.py is wrong

Please see: https://github.com/NVlabs/FourCastNet/blob/master/data_process/get_stats.py

**time_means = np.zeros((1,21,721, 1440))**

for ii, year in enumerate(years):
    
    with h5py.File('/pscratch/sd/s/shas1693/data/era5/train/'+ str(year) + '.h5', 'r') as f:

        rnd_idx = np.random.randint(0, 1460-500)
        global_means += np.mean(f['fields'][rnd_idx:rnd_idx+500], keepdims=True, axis = (0,2,3))
        global_stds += np.var(f['fields'][rnd_idx:rnd_idx+500], keepdims=True, axis = (0,2,3))

global_means = global_means/len(years)
global_stds = np.sqrt(global_stds/len(years))
**time_means = time_means/len(years)**

Following this script, time_means stays constant at zero.
What is the correct definition of this value?

BTW, may I know how you calculate the time_means_daily.h5 file?
From its size (127G) I can only guess it is a $(1460,21,720,1440)$ tensor.
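
One plausible definition, offered here only as an assumption and not confirmed by the repository, is the mean over the time axis of each yearly file, averaged over years, which gives an array shaped (1, channel, lat, lon) like the zeros array in the script. A minimal sketch on synthetic data:

```python
import numpy as np


def time_mean_over_years(yearly_fields):
    """Average over the time axis of each year, then over years.

    yearly_fields: iterable of arrays shaped (time, channel, lat, lon).
    Returns an array shaped (1, channel, lat, lon).
    """
    total = None
    count = 0
    for fields in yearly_fields:
        year_mean = np.mean(fields, axis=0, keepdims=True)
        total = year_mean if total is None else total + year_mean
        count += 1
    if count == 0:
        raise ValueError("no data supplied")
    return total / count


# Synthetic stand-in for two yearly files with 6 time steps each.
rng = np.random.default_rng(0)
years = [rng.normal(size=(6, 2, 3, 4)) for _ in range(2)]
time_means = time_mean_over_years(years)
print(time_means.shape)
```

With real yearly files, each element of yearly_fields would be read from the 'fields' dataset in chunks rather than loaded whole, since a full year does not fit comfortably in memory.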

Help editing code to run train.py on smaller h5 files

thank you for releasing this amazing repo!

I found the reason for the error - it has to do with the number of in_channels specified in the AFNO.yaml file

I changed in_channels to [0, 1, 2] and I don't get the error now

I'm closing this issue, but it'd be great if someone could give me some insight or add a short note on the changes needed to train the model on smaller h5 files

thank you again for releasing the repo - I look forward to understanding the code better :)

==================

Original issue

I'm able to run train.py with the large h5 files available on Globus.

When I try to run train.py with the smaller h5 files (regional or era5_subsample) made available on the NERSC portal, the following line throws an error:

https://github.com/NVlabs/FourCastNet/blob/master/utils/data_loader_multifiles.py#L207

specifically:

self.files[year_idx][(local_idx-self.dt*self.n_history):(local_idx+1):self.dt, self.in_channels] throws the following error

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/user/anaconda3/envs/climax/lib/python3.8/site-packages/h5py/_hl/dataset.py", line 710, in __getitem__
    return self._fast_reader.read(args)
  File "h5py/_selector.pyx", line 351, in h5py._selector.Reader.read
  File "h5py/_selector.pyx", line 198, in h5py._selector.Selector.apply_args
IndexError: Fancy indexing out of range for (0-2)

the shape of self.files[year_idx] for the larger h5 files in Globus is

HDF5 dataset: shape (1460, 21, 721, 1440)

the shape of the self.files[year_idx] for the smaller h5 files on NERSC - regional or era5_subsample is

HDF5 dataset: shape (1460, 3, 360, 360)
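
The two shapes above explain the IndexError: the channel indices requested by the config must all exist along the second axis of the file. A minimal sketch of that check follows, using a small NumPy array as a stand-in for the HDF5 dataset; the helper valid_in_channels and the 20-index list are hypothetical illustrations, not values read from AFNO.yaml.

```python
import numpy as np


def valid_in_channels(in_channels, n_channels):
    """True if every requested channel index exists along the channel axis."""
    return all(0 <= c < n_channels for c in in_channels)


# Small stand-in for a (1460, 3, 360, 360) regional file.
small = np.zeros((8, 3, 6, 6), dtype=np.float32)

ok = valid_in_channels([0, 1, 2], small.shape[1])
too_many = valid_in_channels(list(range(20)), small.shape[1])
print(ok, too_many)

# Same style of indexing as the data loader line: a time slice plus a channel list.
sample = small[0:2, [0, 1, 2]]
print(sample.shape)
```

Any other setting that depends on the number of channels or on the grid size would presumably need to match the smaller files as well.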

I'm not very familiar with h5py yet - could someone please help me edit the code on https://github.com/NVlabs/FourCastNet/blob/master/utils/data_loader_multifiles.py#L207 so I can run train.py on the smaller h5 files .

is there some edit to the AFNO.yaml besides the file paths that needs to be made to train the model on smaller h5 files?

thank you! @jdppthk @MortezaMardani
