ecmwf-lab / ai-models Goto Github PK

View Code? Open in Web Editor NEW

307.0 307.0 40.0 159 KB

License: Apache License 2.0

Python 100.00%

ai-models's People

Contributors

Stargazers

Watchers

Forkers

ruitao-terry modons macweather dloves1314 mpartio rvalenzuelar mrudko owlmonig krs-dev swonderine josteinblyverket afjal1991 zharoshan94 kamjawang sadeghtabas-noaa louispoulain rddali sardingfish openweatherai jinwx oubahe iamasam cd10kfsu wp8733684 junjie2008v shalaka-deshan0110 xenos-code hommedesbois w-winnie adjebbar anna-mq deluair t1-wilson jayyoung84 mdiallofzj leoldn tmatc yeechianlow ohouck

ai-models's Issues

Graphcast not available

Hello,
First of all thanks for this plugin ! I tried to infer using graphcast but launching ai-models --some-options graphcast outputs the following error.
ai-models: error: argument MODEL: invalid choice: 'graphcast' (choose from 'fourcastnet', 'panguweather', 'fourcastnetv2-small')
Do you have have any guess why it happens ?

Thank you !

[Graphcast] issue with saving output file

Hello,

I get an error when I try to run the graphcast model with a pre-downloaded file. Everything goes well until the very end, when the code tries to save the resulting xarray as a grib file.

Here is the command line I am trying to run:

ai-models --assets input/graphcast --file input/graphcast/analysis_data/20210101_full.grib --lead-time 48 --path output/graphcast.grib graphcast

And here is the error I get:

2023-10-27 16:00:46,219 INFO Converting output xarray to GRIB and saving
ECCODES ERROR : Minimum value out of range: nan
ECCODES ERROR : GRIB2 simple packing: unable to set values (Encoding invalid)
ECCODES ERROR : unable to set double array codedValues (Encoding invalid)
2023-10-27 16:00:50,012 INFO Saving output data: 3 seconds.
2023-10-27 16:00:50,012 INFO Total time: 5 minutes 40 seconds.
Traceback (most recent call last):
File "/XXX/meteo/python3.10_ai/bin/ai-models", line 33, in
sys.exit(load_entry_point('ai-models', 'console_scripts', 'ai-models')())
File "/YYY/ecmwf/ai-models/ai_models/main.py", line 291, in main
_main()
File "/YYY/ecmwf/ai-models/ai_models/main.py", line 264, in _main
model.run()
File "/YYY/ecmwf/ai-models-graphcast/ai_models_graphcast/model.py", line 248, in run
save_output_xarray(
File "/YYY/ecmwf/ai-models-graphcast/ai_models_graphcast/output.py", line 60, in save_output_xarray
write(
File "/YYY/ecmwf/ai-models/ai_models/model.py", line 104, in write
self.output.write(*args, **kwargs),
File "/YYY/ecmwf/ai-models/ai_models/outputs/init.py", line 36, in write
return self.output.write(*args, **kwargs)
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/climetlab/readers/grib/output.py", line 143, in write
handle.set_values(values)
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/climetlab/readers/grib/codes.py", line 204, in set_values
eccodes.codes_set_values(self.handle, values.flatten())
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/gribapi/gribapi.py", line 2022, in grib_set_values
grib_set_double_array(gribid, "values", values)
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/gribapi/gribapi.py", line 1156, in grib_set_double_array
GRIB_CHECK(lib.grib_set_double_array(h, key.encode(ENC), a, length))
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/gribapi/gribapi.py", line 228, in GRIB_CHECK
errors.raise_grib_error(errid)
File "/XXX/meteo/python3.10_ai/lib/python3.10/site-packages/gribapi/errors.py", line 382, in raise_grib_error
raise ERROR_MAP[errid](errid)
gribapi.errors.EncodingError: Encoding invalid

As I understand it, it is because there are nan's in the xarray and ecCodes does not know how to convert them to a grib file... But that's just a guess, and I have no clue why I would get nan's in the resulting xarray.

Has anyone had this issue before? Any idea what I could do to solve it?

Thanks a lot!

[GraphCast Operational Model] Issue with Negative Precipitation Data in GraphCast Operational Model Output

Background:
I've been using the GraphCast Operational model to generate 6-hour cumulative precipitation forecast data and have attempted to visualize it with NASA's Panoply software. Despite the model's ability to predict precipitation without using rainfall as an input, as described by the authors in a Science article, I couldn't find a parameter explicitly labeled as precipitation amount in Panoply. Instead, I came across a parameter named "Mixed intervals Accumulation," which raises my suspicion that Panoply might not be compatible with displaying this type of data.

Questions:
By programming my way through the GRIB files, I managed to access the data marked as "Total precipitation." However, I've noticed that some of the data includes negative values, which seems counterintuitive for precipitation metrics. As I am not a professional meteorologist, I would like to understand the following:

Is it normal to encounter negative values in precipitation data, or could this indicate an error or some other issue?
If negative values are normal, what do they signify?
Should I apply any special treatment to these negative values when analyzing precipitation data?

Additional Information:
The code snippet I used is as follows:

import pygrib  # Library for reading GRIB format data
import datetime  # Library for handling dates and times
import numpy as np  # Library for mathematical operations
import matplotlib.pyplot as plt  # Library for plotting
import cartopy.crs as ccrs  # Library for map projections
import cartopy.feature as cfeature  # Library for map features
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER  # Formatting for map gridline labels

# Open the GRIB2 file
file_path = 'graphcast.grib'  # Replace with your GRIB2 file path
grbs = pygrib.open(file_path)

# Set the target date and time
target_date = datetime.datetime(2023, 12, 30, 18, 0)  # Example date and time
# Find data matching the specific variable and date
for grb in grbs:
    if grb.name == "Total precipitation" and grb.validDate == target_date:
        data = grb.values  # Read the data
        lats, lons = grb.latlons()  # Get the latitudes and longitudes corresponding to the data
        break
grbs.close()

# Print out statistical information about the data
print(f'Minimum value: {data.min()}')  # Minimum value
print(f'Maximum value: {data.max()}')  # Maximum value
print(f'Mean value: {data.mean()}')  # Mean value
print(f'Median value: {np.median(data)}')  # Median value
print(f'Standard deviation: {data.std()}')  # Standard deviation


# Create the map
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(1, 1, 1, projection=ccrs.PlateCarree())
# Create color levels for the contour plot
levels = np.linspace(-0.0002, 0.12, 10)  # Create levels from a bit below the minimum to above the maximum value

# Plot the contour map using the color levels
precipitation = ax.contourf(lons, lats, data, levels=levels, transform=ccrs.PlateCarree(), cmap='viridis')

# Add map features like coastlines and borders
ax.coastlines()
ax.add_feature(cfeature.BORDERS, linestyle=':')

# Add gridlines and labels for longitude and latitude
gl = ax.gridlines(crs=ccrs.PlateCarree(), draw_labels=True)
gl.top_labels = False  # Disable top labels
gl.right_labels = False  # Disable right labels
gl.xformatter = LONGITUDE_FORMATTER  # Format for longitude
gl.yformatter = LATITUDE_FORMATTER  # Format for latitude

# Add a colorbar to explain the color encoding of precipitation levels
plt.colorbar(precipitation, ax=ax, orientation='horizontal', pad=0.05, aspect=50, label='Precipitation (m/6hr)', ticks=levels)
plt.title(f'6-Hour Accumulated Precipitation Forecast (Ending on {target_date.strftime("%Y-%m-%d %H:%M")})')

plt.show()  # Display the map

# python3.10 requirements.txt
Cartopy==0.22.0
certifi==2023.11.17
contourpy==1.2.0
cycler==0.12.1
fonttools==4.47.0
kiwisolver==1.4.5
matplotlib==3.8.2
mplcursors==0.5.2
numpy==1.26.2
packaging==23.2
pandas==2.1.4
Pillow==10.1.0
pygrib==2.1.5
pyparsing==3.1.1
pyproj==3.6.1
pyshp==2.3.1
python-dateutil==2.8.2
pytz==2023.3.post1
shapely==2.0.2
six==1.16.0
tzdata==2023.3
xarray==2023.12.0

A Python Code visualization screenshot of the precipitation data:

A Panoply visualization screenshot of the precipitation data:

My data file can be downloaded from the following Google Drive link:

I am grateful for any explanation and advice.
Attachments:
Download link for Panoply Software: https://www.giss.nasa.gov/tools/panoply/download/
Download link for the grib(6.5GB) data file :
https://drive.google.com/file/d/1JrsCXZcRBXgEQg-Xu0rd7EsvR_XLARPU/view?usp=drive_link

problem running ai-models with ecmwf open data

Hello,

I am getting following error when running

ai-models --input opendata --lead-time 24 graphcast

Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/xarray/core/dataset.py", line 1408, in _copy_listed
    variables[name] = self._variables[name]
                      ~~~~~~~~~~~~~~~^^^^^^
KeyError: 'geopotential'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/ai-models", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 322, in main
    _main(sys.argv[1:])
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 270, in _main
    run(vars(args), unknownargs)
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 295, in run
    model.run()
  File "/usr/lib/python3.11/site-packages/ai_models_graphcast/model.py", line 226, in run
    ) = data_utils.extract_inputs_targets_forcings(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/graphcast/data_utils.py", line 354, in extract_inputs_targets_forcings
    inputs = inputs[list(input_variables)]
             ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/xarray/core/dataset.py", line 1552, in __getitem__
    return self._copy_listed(key)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/xarray/core/dataset.py", line 1410, in _copy_listed
    ref_name, var_name, var = _get_virtual_variable(
                              ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/xarray/core/dataset.py", line 207, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'geopotential'

also with fourcastnet;
ai-models --input opendata --lead-time 24 fourcastnet

2024-03-28 14:26:20,758 INFO Writing results to fourcastnet.grib.
2024-03-28 14:26:20,758 INFO Loading ./global_means.npy
2024-03-28 14:26:20,759 INFO Loading ./global_stds.npy
2024-03-28 14:26:20,759 INFO Loading surface fields from OPENDATA
2024-03-28 14:26:20,779 WARNING MARS post-processing keywords {'area'} not supported
2024-03-28 14:26:22,066 INFO Downloading <multiple>
2024-03-28 14:26:28,038 INFO Loading pressure fields from OPENDATA                                                                                                                          
2024-03-28 14:26:28,430 WARNING No index entries for param=z
2024-03-28 14:26:29,010 INFO Downloading <multiple>
2024-03-28 14:26:41,956 INFO Total time: 22 seconds.                                                                                                                                        
Traceback (most recent call last):
  File "/usr/bin/ai-models", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 322, in main
    _main(sys.argv[1:])
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 270, in _main
    run(vars(args), unknownargs)
  File "/usr/lib/python3.11/site-packages/ai_models/__main__.py", line 295, in run
    model.run()
  File "/usr/lib/python3.11/site-packages/ai_models_fourcastnet/model.py", line 159, in run
    all_fields_numpy = self.normalise(all_fields_numpy)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/ai_models_fourcastnet/model.py", line 109, in normalise
    new_data = (data - self.means) / self.stds
                ~~~~~^~~~~~~~~~~~
ValueError: operands could not be broadcast together with shapes (1,14,720,1440) (1,26,1,1)

How to use the --file option with graphcast

Dear Developers,

Thanks so much for making your ai-models wrapper available. I have graphcast running with the wrapper. Next I would like to use a local analysis created using 4DEnVar to initialise graphcast. The analysis is Global and has a resolution of about 12km. Data is available at all the required pressure levels.

Should the source file be grib or netCDF?

Do I need to regrid the analyses to 0.25 degrees resolution? I can use CDO to do this but just wondered if it is necessary.

Also, graphcast needs analyses at two time periods, 6 hours apart. What time-stamps should I put on the input file?

Apologies if this is all in the documentation. I might figure all this out by trial and error but any extra guidance is much appreciated.

Thanks

pip dependency conflicts on MacOS

Hello! In attempting to install this package on MacOS I have been running into a pip dependency conflict. Specifically, all versions of this package depend on ecmwflibs>=0.5.3 . After examining the available packages, I find that on MacOS the only version of ecmwflibs that seems to be available via pip is 0.5.0. By contrast, on a Linux machine that I have access to, there are several more versions of ecmwflibs available and so this conflict does not occur.

CDS input data not available

When downloading the assets using the CDS input, the following error message is received:

raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://cds.climate.copernicus.eu/resources/reanalysis-era5-single-levels

Is this service still available or am I missing some rights to access to the resources?

No access to services/mars

I made my own ECMWF account and when I try to retrieve data I have the following error :
ecmwfapi.api.APIException: "ecmwf.API error 1: User '{myemail}' has no access to services/mars"

{myemail} is the email I used to register (I setup the key prior to doing this)

From what I understand from :
https://confluence.ecmwf.int/display/UDOC/ecmwf.API+error+1%3A+User+has+no+access+to+services+mars+-+Web+API+FAQ
There is a differences between the full MARS client and Access MARS service.
Since I made the account myself and I am not internal to ECMWF, can I use 'ai models' or do you need an internal account ?

Unable to utilize GPU

Environment

cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.6 LTS"

pip show ai-models
Name: ai-models
Version: 0.2.5
Summary: A package to run AI weather models
Home-page: https://github.com/ecmwf-lab/ai-models
Author: European Centre for Medium-Range Weather Forecasts (ECMWF)
Author-email: [email protected]
License: Apache License Version 2.0
Location: /home/umang/mambaforge/lib/python3.10/site-packages
Requires: climetlab, ecmwflibs, entrypoints, gputil, multiurl
Required-by: ai-models-panguweather

pip show ai-models-panguweather
Name: ai-models-panguweather
Version: 0.0.2
Summary: An ai-models plugin to run PanguWeather
Home-page: https://github.com/ecmwf-lab/ai-models-panguweather
Author: European Centre for Medium-Range Weather Forecasts (ECMWF)
Author-email: [email protected]
License: Apache License Version 2.0
Location: /home/umang/mambaforge/lib/python3.10/site-packages
Requires: ai-models, GPUtil, onnx, onnxruntime
Required-by:

Model logs when run on this environment

 ai-models --input cds --date 20230907 --time 0600 --download-assets panguweather
2023-09-13 23:19:22,746 INFO Writing results to panguweather.grib.
2023-09-13 23:19:22,746 INFO Loading pressure fields from CDS
2023-09-13 23:19:24,224 INFO Loading surface fields from CDS
2023-09-13 23:19:24,328 INFO ONNXRuntime providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
2023-09-13 23:19:24,329 INFO Using device 'GPU'. The speed of inference depends greatly on the device.
2023-09-13 23:19:26.297443220 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2023-09-13 23:20:08,008 INFO Loading ./pangu_weather_24.onnx: 43 seconds.
2023-09-13 23:20:14.753181033 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:640 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
2023-09-13 23:20:57,365 INFO Loading ./pangu_weather_6.onnx: 49 seconds.
2023-09-13 23:20:57,365 INFO Model initialisation: 1 minute 34 seconds
2023-09-13 23:20:57,365 INFO Starting inference for 40 steps (240h).
2023-09-13 23:26:20,898 INFO Done 1 out of 40 in 5 minutes 23 seconds (6h), ETA: 3 hours 35 minutes 41 seconds.
2023-09-13 23:31:40,521 INFO Done 2 out of 40 in 5 minutes 19 seconds (12h), ETA: 3 hours 29 minutes 1 second.
2023-09-13 23:37:03,102 INFO Done 3 out of 40 in 5 minutes 22 seconds (18h), ETA: 3 hours 23 minutes 52 seconds.

On the readme , it is mentioned that when running any of the model it should takes minutes on gpu , here based on the logs , the system is configured properly to utilise gpu and has nvidia libraries installed

pip freeze | grep onnx
onnx==1.14.1
onnxruntime-gpu==1.15.1

pip freeze | grep cuda
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99

nvidia-smi output while the model was running

Wed Sep 13 23:31:48 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.199.02   Driver Version: 470.199.02   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8     9W /  70W |    105MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       966      G   /usr/lib/xorg/Xorg                 95MiB |
|    0   N/A  N/A      1040      G   /usr/bin/gnome-shell                8MiB |
+-----------------------------------------------------------------------------+

based on the nvidia-smi output it is clear that the gpu detected on this vm is not utilised
runtime check on onnx and torch in python repl
please provide debug direction to find why the logs state that GPU is utilizes even though it isn't (nvidia-smi output has 0 load)
or please clarify if this is the expected behavior and the time taken by each step is well within expected run time for a single gpu setup (tesla t4)

ZeroDivisionError Encountered When Running ai-models with Lead-time 0 in Fourcastnetv2-Small

Summary:
Encountered a ZeroDivisionError when attempting to execute the command: ai-models --input cds --date 20230110 --time 0000 --lead-time 0 fourcastnetv2-small.

Detailed Description:
Upon executing the aforementioned command, the following traceback was observed:

Traceback (most recent call last):
File "/h/username/miniconda3/envs/AI_weather/bin/ai-models", line 8, in
sys.exit(main())
File "/h/username/miniconda3/envs/AI_weather/lib/python3.10/site-packages/ai_models/main.py", line 305, in main
_main()
File "/h/username/miniconda3/envs/AI_weather/lib/python3.10/site-packages/ai_models/main.py", line 278, in _main
model.run()
File "/h/username/miniconda3/envs/AI_weather/lib/python3.10/site-packages/ai_models_fourcastnetv2/model.py", line 209, in run
with self.stepper(self.hour_steps) as stepper:
File "/h/username/miniconda3/envs/AI_weather/lib/python3.10/site-packages/ai_models/stepper.py", line 46, in exit
LOG.info("Average: %s per step.", seconds(elapsed / self.num_steps))
ZeroDivisionError: float division by zero

Expected Behavior:
The command should execute without encountering a ZeroDivisionError, even with a lead-time of 0.

Steps to Reproduce:

Execute the command ai-models --input cds --date 20230110 --time 0000 --lead-time 0 fourcastnetv2-small.
Observe the traceback provided above.

Model run fails when data fetched from cds

When running ai-models with cds input it fails.

Command:

ai-models --input cds --assets-sub-directory --lead-time 6 --date 20230515 --time 1200 --path my_out_path/earthkit/out_cds_fourcastnet_20230515_1200.grib fourcastnet

Error:

Traceback (most recent call last):
  ...
  File "git/ai-models/ai_models/__main__.py", line 233, in main
    model.run()
  File "git/ai-models-fourcastnet/ai_models_fourcastnet/model.py", line 162, in run
    all_fields_numpy = self.normalise(all_fields_numpy)
  File "ai-models-fourcastnet/ai_models_fourcastnet/model.py", line 110, in normalise
    new_data = (data - self.means) / self.stds
ValueError: operands could not be broadcast together with shapes (1,624,720,1440) (1,26,1,1)

This happens because ai-models now seems to retrieve ensemble data (eg. from stream=edmm) from the cds, so in the example above data contains 624 GRIB fields instead of the expected 26. Adding

product_type='reanalysis'

to the cds retrieval seems to fix the problem.

I want to run the graphcast by my own gribfile downloaded from ERA5,i select the needed var correctly,but i meet a bug,how to slove it?

2024-04-14 17:12:09,858 INFO Building model: 0.6 second.
2024-04-14 17:12:50,841 INFO Creating forcing variables: 40 seconds.
2024-04-14 17:13:01,958 INFO Converting GRIB to xarray: 11 seconds.
2024-04-14 17:13:13,666 INFO Reindexing: 11 seconds.
2024-04-14 17:13:14,930 INFO Creating training data: 1 minute 5 seconds.
2024-04-14 17:13:38,654 INFO Extracting input targets: 23 seconds.
2024-04-14 17:13:38,655 INFO Creating input data (total): 1 minute 28 seconds.
2024-04-14 17:13:38,655 INFO Total time: 1 minute 39 seconds.
Traceback (most recent call last):
File "/data1/home/songrj/anaconda3/envs/am/lib/python3.10/site-packages/xarray/core/dataset.py", line 1408, in _copy_listed
variables[name] = self._variables[name]
KeyError: 'geopotential_at_surface'

Precipitation forecasting from FourCastNet-v2

Since fourcastnet paper claims to do precipitation forecasting, is it possible to do using this library. I ran it once, but I didn't find any precipitation in the output file?

Invalid times between 1am - 9am

Hi,

I get the following error when trying to evaluate single digit times:

command:

ai-models --input cds --date 20180101 --time 0100 --lead-time 12 --path output/output_panguweather_12h_forecast_20180101_0100.grib panguweather

error:

2023-06-16 09:48:41,797 INFO Request is failed
2023-06-16 09:48:41,797 INFO Request is failed
2023-06-16 09:48:41,797 ERROR Message: the request you have submitted is not valid
2023-06-16 09:48:41,797 ERROR Message: the request you have submitted is not valid
2023-06-16 09:48:41,797 ERROR Reason: Mars server task finished in error; BadValue: Invalid time 100 [mars]; Error code is -2; Request failed; Some errors reported (last error -2)
2023-06-16 09:48:41,797 ERROR Reason: Mars server task finished in error; BadValue: Invalid time 100 [mars]; Error code is -2; Request failed; Some errors reported (last error -2)

2023-06-16 09:48:41,799 ERROR home.cds.cdsservices.services.mars.init.py.exceptions.MarsException: Mars server task finished in error; BadValue: Invalid time 100 [mars]; Error code is -2; Request failed; Some errors reported (last error -2)
2023-06-16 09:48:41,799 ERROR home.cds.cdsservices.services.mars.init.py.exceptions.MarsException: Mars server task finished in error; BadValue: Invalid time 100 [mars]; Error code is -2; Request failed; Some errors reported (last error -2)
Traceback (most recent call last):
File "/home/iluise/ai-models/pyenv/bin/ai-models", line 8, in
sys.exit(main())
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/ai_models/main.py", line 160, in main
model.run()
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/ai_models_panguweather/model.py", line 43, in run
fields_pl = self.fields_pl
File "/usr/lib/python3.9/functools.py", line 969, in get
val = self.func(instance)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/ai_models/model.py", line 229, in fields_pl
return self.input.fields_pl
File "/usr/lib/python3.9/functools.py", line 969, in get
val = self.func(instance)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/ai_models/model.py", line 91, in fields_pl
return cml.load_source("cds", "reanalysis-era5-pressure-levels", request)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/init.py", line 178, in load_source
src = get_source(name, *args, **kwargs)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/init.py", line 159, in call
source = klass(*args, **kwargs)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/core/init.py", line 25, in call
obj.init(*args, **kwargs)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/cds.py", line 89, in init
self.path = [self._retrieve(dataset, r) for r in requests]
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/cds.py", line 89, in
self.path = [self._retrieve(dataset, r) for r in requests]
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/cds.py", line 101, in _retrieve
return self.cache_file(
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/init.py", line 69, in cache_file
return cache_file(owner, create, args, **kwargs)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/core/caching.py", line 704, in cache_file
owner_data = create(path + ".tmp", args)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/climetlab/sources/cds.py", line 99, in retrieve
client().retrieve(args[0], args[1], target)
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/cdsapi/api.py", line 364, in retrieve
result = self._api("%s/resources/%s" % (self.url, name), request, "POST")
File "/home/iluise/ai-models/pyenv/lib/python3.9/site-packages/cdsapi/api.py", line 519, in _api
raise Exception(
Exception: the request you have submitted is not valid. Mars server task finished in error; BadValue: Invalid time 100 [mars]; Error code is -2; Request failed; Some errors reported (last error -2).

Should I use another syntax? It actually works very well for 0000 or 2 digit time steps (eg 1000, 1200 etc..)

p.s. Off-topic: Not sure if it is the right thing to do, but I managed to solve the CUDA issue by adding a torch import before importing onnx in ai_models_panguweather/model.py:

import torch
import onnxruntime as ort

Thanks in advance for the help!
Best,

Ilaria

WrongStepError while trying to save the output in GRIB File

Hi,

I am trying to run the model on the GPU instance by using the following command:

ai-models --assets assets-graphcast --input cds --date 20231123 --time 600 --only-gpu graphcast

It is able to do the inference, but it fails while trying to save the output in the GBIB file. Mentioning the error log below.
How can I fix this?

2023-12-15 07:56:41,285 INFO Doing full rollout prediction in JAX: 3 minutes 3 seconds.
<class 'xarray.core.dataset.Dataset'>
2023-12-15 07:56:41,285 INFO Converting output xarray to GRIB and saving
ECCODES ERROR   :  endStep < startStep (6 < 11)
2023-12-15 07:56:44,026 ERROR Error setting step=6
2023-12-15 07:56:44,026 INFO Saving output data: 2 seconds.
2023-12-15 07:56:44,026 INFO Total time: 3 minutes 32 seconds.

Traceback (most recent call last):
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/climetlab/readers/grib/codes.py", line 243, in set
    return eccodes.codes_set(self.handle, name, value)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/gribapi/gribapi.py", line 2121, in grib_set
    grib_set_long(msgid, key, value)
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/gribapi/gribapi.py", line 993, in grib_set_long
    GRIB_CHECK(lib.grib_set_long(h, key.encode(ENC), value))
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/gribapi/gribapi.py", line 226, in GRIB_CHECK
    errors.raise_grib_error(errid)
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/gribapi/errors.py", line 381, in raise_grib_error
    raise ERROR_MAP[errid](errid)
gribapi.errors.WrongStepError: Unable to set step

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/azureuser/anaconda3/bin/ai-models", line 33, in <module>
    sys.exit(load_entry_point('ai-models', 'console_scripts', 'ai-models')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/ai-models/ai_models/__main__.py", line 291, in main
    _main()
  File "/home/azureuser/ai-models/ai_models/__main__.py", line 264, in _main
    model.run()
  File "/home/azureuser/ai-models-graphcast/ai_models_graphcast/model.py", line 262, in run
    save_output_xarray(
  File "/home/azureuser/ai-models-graphcast/ai_models_graphcast/output.py", line 60, in save_output_xarray
    write(
  File "/home/azureuser/ai-models/ai_models/model.py", line 104, in write
    self.output.write(*args, **kwargs),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/ai-models/ai_models/outputs/__init__.py", line 36, in write
    return self.output.write(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/climetlab/readers/grib/output.py", line 141, in write
    handle.set(k, v)
  File "/home/azureuser/anaconda3/lib/python3.11/site-packages/climetlab/readers/grib/codes.py", line 246, in set
    raise ValueError("Error setting %s=%s (%s)" % (name, value, e))
ValueError: Error setting step=6 (Unable to set step)

issue with download asset with Mars access

I am trying to download asset data using Mars as shown in the readme, but it is failing.

I think it's an access permission issue, correct?
I used the following command:

https://github.com/ecmwf-lab/ai-models

Do I need a commercial license to obtain this data?
Is there another way to download asset data?

Refined grid?

This may be more a philosophical question than an issue but I was wondering whether, at the current state of research, there is any way of running the models with a refined grid, say for example 0.1 degrees instead than 0.25.
I guess the prediction can be only made on the same grid was used for the training (ERA5, 0.25) and there's no way around it right?
It would be amazing if the model could somehow downscale the information learned at those resolution to something better for limited area modeling like 1 km resolution

issue with input data from cds for graphcast

when i use " ai-models --input cds --date 20230110 --time 0000 graphcast " testing the graphcast from cds era5, I got these files in "/tmp/climetlab-user" and errors as follows. Is the model use the following .cache files to give the inference? And key='time' is related to what physical parameters (maybe the precipitation for accumulation value), should I revise download scripts the to run the inference?

files at /tmp/climetlab-user:

_- cache-2.db                                                                            grib-index-650de97dadfda5522ef6065aba6d53070b7f5474f2c88d3ecfae8603f179df6a.json
- cds-retriever-1a1d7637aaaf5616ccdf2c650bb81dd5b6613e9f032826fa57c364fbd61136c5.cache  grib-index-74269e34badef4add821af3244ea8d64cc621103c1d653d33b7896d707c62fcf.json  
- cds-retriever-30f1a29a4568ca8f99f88d3e20200b657d39a0b2a54534e9875953cc04808e8d.cache  grib-index-8a9615581ca2c8859f1fe76024ba7ace185fd86c85b22b916c2befd7cec92fae.json  
- cds-retriever-68098f31c2301f0a4c62ac25822c3a33eb28b44a8c8e22c17ba136f1067dff9e.cache  grib-index-a6fc434c8b189aa395892a5544257e762196fd7599dde47bf169ed1c9b301b12.json  
- cds-retriever-7576dbeeafe941786b728196693f1945d77a76cd427776d24eccb6dbbfa42c02.cache_

error messages:

2023-09-18 19:06:25,607 INFO Creating training dataset
2023-09-18 19:06:28,613 INFO Creating input data: 3 minutes.
2023-09-18 19:06:28,613 INFO Total time: 3 minutes 1 second.
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/ai-models/bin/ai-models", line 8, in <module>
    sys.exit(main())
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/ai_models/__main__.py", line 274, in main
    _main()
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/ai_models/__main__.py", line 247, in _main
    model.run()
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/ai_models_graphcast/model.py", line 199, in run
    training_xarray, time_deltas = create_training_xarray(
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/ai_models_graphcast/input.py", line 60, in create_training_xarray
    fields_sfc.to_xarray().rename(GRIB_TO_XARRAY_SFC).isel(number=0, surface=0)
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/climetlab/readers/grib/xarray.py", line 106, in to_xarray
    result = xr.open_dataset(
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/xarray/backends/api.py", line 570, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 108, in open_dataset
    store = CfGribDataStore(
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 40, in __init__
    self.ds = opener(filename, **backend_kwargs)
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/dataset.py", line 750, in open_fieldset
    return open_from_index(filtered_index, read_keys, time_dims, extra_coords, **kwargs)
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/dataset.py", line 726, in open_from_index
    dimensions, variables, attributes, encoding = build_dataset_components(
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/dataset.py", line 680, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/user/anaconda3/envs/ai-models/lib/python3.10/site-packages/cfgrib/dataset.py", line 611, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='time' value=Variable(dimensions=('time',), data=array([1673287200, 1673308800])) new_value=Variable(dimensions=('time',), data=array([1673244000, 1673287200]))

How to run the "GraphCast" pretrained model？

when I try to run the graphcast, the program uses “GraphCast_operational” pretrained model file by default. So how could I use the “GraphCast” pretrained model to run ？

Thank you.

I want to run ai-models on more than one GPU. Is it possible?

Hi,

We are currently running ai-models on a server with 4 gpu's and we found out that the calculations are exclusively executed by one gpu. Is there a possibility to execute a forecast with more than one gpu.

Thanks a lot

Paper for FourCastNet v2?

Hi,

I noticed that you have a FourCastNet v2 running. A web search did not uncover any related publications.
Are there further resources, such as papers or original GitHub code or pretrained pytorch weigths?
I am curious to see what the differences are between the original V1 and this V2 version.

Thanks!

Cant type or paste the URL and API key in kaggle notebook

Hi everyone!
I am very new to GraphCast and programming as well. I used your code in Google Colaboratory which worked fine. Now I am trying to run the code in Kaggle notebook. I dont know but whenever I have to input the URL and API key (as per requirement), I am unable to do it. I cant find the input box or cursor which lets me paste it. I tried multiple times but have no idea why I cant write or paste because in Google Colaboratory it worked fine.
Could someone please help me.
Thank you!

problem

I want to run the graphcast by my own gribfile downloaded from ERA5(CDS),I select the needed var correctly(maybe not but i don’t find the geopotential_at_surface sio I select geopotential on single level data),but i meet this bug,how to slove it?

Could anyone else provides a grib format example to use the --file option with graphcast？

Run out of memory when forecast much longer steps

For ai-models-graphcast, it works fine when I predict only a few time steps. However, it fails with an "out of memory" error when I try to predict over a longer lead time, such as 10 days. I have 188 GB of memory for CPU or 24 GB for GPU. Is there any solution to avoid this issue? It appears that the memory used by the model is not released after completing each step.

Thanks for your reply in advance!

pip install fails with latest version (0.5.1)

pip install ai-models is failing with the latest release (0.5.1), on fresh install.

Environment:

models@psf3sp336f92:~$ python -V
Python 3.10.14

models@psf3sp336f92:~$ pip -V
pip 24.0 from [redacted]/python3.10/site-packages/pip (python 3.10)

models@psf3sp336f92:~$ pip list
Package    Version
---------- -------
pip        24.0
setuptools 69.5.1
wheel      0.43.0

Error:

Building wheel for ai-models (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [76 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib
      creating build/lib/ai_models
      copying ai_models/checkpoint.py -> build/lib/ai_models
      copying ai_models/__init__.py -> build/lib/ai_models
      copying ai_models/model.py -> build/lib/ai_models
      copying ai_models/__main__.py -> build/lib/ai_models
      copying ai_models/stepper.py -> build/lib/ai_models
      creating build/lib/ai_models/outputs
      copying ai_models/outputs/__init__.py -> build/lib/ai_models/outputs
      creating build/lib/ai_models/inputs
      copying ai_models/inputs/__init__.py -> build/lib/ai_models/inputs
      creating build/lib/ai_models/remote
      copying ai_models/remote/config.py -> build/lib/ai_models/remote
      copying ai_models/remote/api.py -> build/lib/ai_models/remote
      copying ai_models/remote/__init__.py -> build/lib/ai_models/remote
      copying ai_models/remote/model.py -> build/lib/ai_models/remote
      running egg_info
      writing ai_models.egg-info/PKG-INFO
      writing dependency_links to ai_models.egg-info/dependency_links.txt
      writing entry points to ai_models.egg-info/entry_points.txt
      writing requirements to ai_models.egg-info/requires.txt
      writing top-level names to ai_models.egg-info/top_level.txt
      reading manifest file 'ai_models.egg-info/SOURCES.txt'
      adding license file 'LICENSE'
      writing manifest file 'ai_models.egg-info/SOURCES.txt'
      /home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
      !!

              ********************************************************************************
              Please avoid running ``setup.py`` directly.
              Instead, use pypa/build, pypa/installer or other
              standards-based tools.

              See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
              ********************************************************************************

      !!
        self.initialize_options()
      installing to build/bdist.linux-x86_64/wheel
      running install
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-ukow8kqu/ai-models_3edcd1c77e1d4c3593e46213141cac9d/setup.py", line 43, in <module>
          setuptools.setup(
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 403, in run
          self.run_command("install")
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/home/models/micromamba/envs/[redacted]/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-ukow8kqu/ai-models_3edcd1c77e1d4c3593e46213141cac9d/setup.py", line 35, in run
          from ai_models.remote.config import config_exists, create_config
        File "/tmp/pip-install-ukow8kqu/ai-models_3edcd1c77e1d4c3593e46213141cac9d/ai_models/remote/__init__.py", line 1, in <module>
          from .api import RemoteAPI
        File "/tmp/pip-install-ukow8kqu/ai-models_3edcd1c77e1d4c3593e46213141cac9d/ai_models/remote/api.py", line 7, in <module>
          import requests
      ModuleNotFoundError: No module named 'requests'

The issue appears to be the new ai_models remote api. setuptools had

cmdclass={
        "install": PostInstall,
    },

added, but when that is run, requests, multiurl, and climetlab have not been fully installed causing the install to fail with the above error.

Two current workarounds:

pip install requests multiurl climetlab then pip install ai-models
pip install ai-models==0.4.3 then pip install ai-models --upgrade

issue with downloading assets

running ai-models --download-assets panguweather breaks during the download with

urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(326220731 bytes read, 855490456 more expected)', IncompleteRead(326220731 bytes read, 855490456 more expected))

how to change the ecmwf account name after initial setup

I have put in a public account which doesn't have full access to Mars archive data during the initial setup of the ai-models through ai-models --downloa-dassets panguweather. I am now have a commerical license which I want to switch to but don't know how to switch to an new account.

Cannot install on macOS

I created a conda env with Python 3.10.0 on my macbook (macOS 11.6.2 (20G314), Darwin 20.6.0) but cannot install ai-models into it. This is the error I get:

pip install ai-models
Collecting ai-models
  Using cached ai-models-0.1.0.tar.gz (12 kB)
  Preparing metadata (setup.py) ... done
Collecting entrypoints
  Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB)
Collecting climetlab>=0.15.0
  Using cached climetlab-0.15.0.tar.gz (179 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting multiurl
  Using cached multiurl-0.2.1-py3-none-any.whl
Collecting ai-models
  Using cached ai-models-0.0.3.tar.gz (12 kB)
  Preparing metadata (setup.py) ... done
ERROR: Cannot install ai-models==0.0.3 and ai-models==0.1.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    ai-models 0.1.0 depends on ecmwflibs>=0.5.3
    ai-models 0.0.3 depends on ecmwflibs>=0.5.3

Can't get MARS access.

I am trying to retrieve data through MARS. Here's my experience so far:

I visited this page: https://confluence.ecmwf.int/display/WEBAPI/Access+MARS
They suggest downloading the "mars" script. However, the link to this script is broken (https://confluence.ecmwf.int/download/attachments/56664858/mars).
I tried to report the error on this page: https://www.ecmwf.int/en/support and went to the "Software and computing" link, but received a "not permitted" error.
I'm writing here because I don't know where else to turn. :) By the way, CDS works like a charm...

[GraphCast Operational Model] Issue with Negative Precipitation Data in GraphCast Operational Model Output

Environment

Operating System: Ubuntu Server 20.04.6 LTS
Memory: 256GB
GPU: NVIDIA A100-PCle-80GB
Python Version: 3.10 (Conda Environment)
Storage: 500GB SSD
Command: ai-models --input cds --download-assets --date 20231229 --time 0000 graphcast
Download link for the grib(6.5GB) data file :
https://drive.google.com/file/d/1JrsCXZcRBXgEQg-Xu0rd7EsvR_XLARPU/view?usp=drive_link
Data Visualization: #27

Issue Description

I am encountering an unexpected issue while running the GraphCast Operational model. The GRIB file output contains negative values for precipitation, which is not a typical behavior expected from meteorological data. I have validated the input data, and everything seems to be in order, which makes these negative values particularly puzzling.

Expected Behavior

Normally, precipitation values in GRIB files should range from zero to positive values, indicating the amount of precipitation.

Actual Behavior

The output GRIB file contains negative numbers for precipitation, which is inconsistent with typical meteorological data standards.

Additional Context

This issue has been consistent across multiple runs of the model.
I've ensured that the model's latest version is being used.
No modifications have been made to the model or its code; I have simply run the command line as per the official examples provided.

Runtime Log

(graph) root@83c79cf9b834:~/sspaas-tmp# ai-models --input cds --download-assets  --date 20231229 --time 0000 graphcast
2024-01-23 06:40:14,809 INFO Writing results to graphcast.grib.
/opt/conda/envs/graph/lib/python3.10/site-packages/ecmwflibs/__init__.py:83: UserWarning: /lib/x86_64-linux-gnu/libgobject-2.0.so.0: undefined symbol: ffi_type_uint32, version LIBFFI_BASE_7.0
  warnings.warn(str(e))
2024-01-23 06:40:15,170 INFO Model description: 
GraphCast model at 0.25deg resolution, with 13 pressure levels. This model is
trained on ERA5 data from 1979 to 2017, and fine-tuned on HRES-fc0 data from
2016 to 2021 and can be causally evaluated on 2022 and later years. This model
does not take `total_precipitation_6hr` as inputs and can make predictions in an
operational setting (i.e., initialised from HRES-fc0).

2024-01-23 06:40:15,170 INFO Model license: 
The model weights are licensed under the Creative Commons
Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0). You
may obtain a copy of the License at:
https://creativecommons.org/licenses/by-nc-sa/4.0/.
The weights were trained on ERA5 data, see README for attribution statement.

2024-01-23 06:40:15,170 INFO Loading params/GraphCast_operational - ERA5-HRES 1979-2021 - resolution 0.25 - pressure levels 13 - mesh 2to6 - precipitation output only.npz: 0.4 second.
2024-01-23 06:40:15,170 INFO Building model: 0.4 second.
2024-01-23 06:40:15,170 INFO Loading surface fields from CDS
2024-01-23 06:40:15,303 INFO Loading pressure fields from CDS
                                                                                                                                              
2024-01-23 06:40:28,533 INFO Creating forcing variables: 13 seconds.
2024-01-23 06:40:33,124 INFO Converting GRIB to xarray: 4 seconds.
2024-01-23 06:40:36,937 INFO Reindexing: 3 seconds.
2024-01-23 06:40:36,963 INFO Creating training data: 21 seconds.
2024-01-23 06:40:44,200 INFO Extracting input targets: 7 seconds.
2024-01-23 06:40:44,200 INFO Creating input data (total): 29 seconds.
2024-01-23 06:40:44,405 INFO Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA
2024-01-23 06:40:44,410 INFO Unable to initialize backend 'tpu': INTERNAL: Failed to open libtpu.so: libtpu.so: cannot open shared object file: No such file or directory
/opt/conda/envs/graph/lib/python3.10/site-packages/graphcast/autoregressive.py:202: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  scan_length = targets_template.dims['time']
/opt/conda/envs/graph/lib/python3.10/site-packages/graphcast/autoregressive.py:115: FutureWarning: The return type of `Dataset.dims` will be changed to return a set of dimension names in future, in order to be more consistent with `DataArray.dims`. To access a mapping from dimension names to lengths, please use `Dataset.sizes`.
  num_inputs = inputs.dims['time']
2024-01-23 06:41:49.862443: E external/xla/xla/service/slow_operation_alarm.cc:65] Constant folding an instruction is taking > 1s:

  %pad.149 = bf16[3114720,8]{1,0} pad(bf16[3114720,4]{1,0} %constant.320, bf16[] %constant.725), padding=0_0x0_4, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/while/body/remat/mesh2grid_gnn/_embed/mesh2grid_gnn/sequential/encoder_edges_mesh2grid_mlp/linear_0/dot_general[dimension_numbers=(((2,), (0,)), ((), ())) precision=None preferred_element_type=bfloat16]" source_file="/opt/conda/envs/graph/bin/ai-models" source_line=8}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-01-23 06:41:56.031038: E external/xla/xla/service/slow_operation_alarm.cc:133] The operation took 7.168667185s
Constant folding an instruction is taking > 1s:

  %pad.149 = bf16[3114720,8]{1,0} pad(bf16[3114720,4]{1,0} %constant.320, bf16[] %constant.725), padding=0_0x0_4, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/while/body/remat/mesh2grid_gnn/_embed/mesh2grid_gnn/sequential/encoder_edges_mesh2grid_mlp/linear_0/dot_general[dimension_numbers=(((2,), (0,)), ((), ())) precision=None preferred_element_type=bfloat16]" source_file="/opt/conda/envs/graph/bin/ai-models" source_line=8}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-01-23 06:41:58.352044: E external/xla/xla/service/slow_operation_alarm.cc:65] Constant folding an instruction is taking > 2s:

  %pad.1 = bf16[1618752,8]{1,0} pad(bf16[1618745,4]{1,0} %constant.327, bf16[] %constant.643), padding=0_7x0_4, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/while/body/remat/grid2mesh_gnn/_embed/grid2mesh_gnn/sequential/encoder_edges_grid2mesh_mlp/linear_0/dot_general[dimension_numbers=(((2,), (0,)), ((), ())) precision=None preferred_element_type=bfloat16]" source_file="/opt/conda/envs/graph/bin/ai-models" source_line=8}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-01-23 06:41:59.895336: E external/xla/xla/service/slow_operation_alarm.cc:133] The operation took 3.543396776s
Constant folding an instruction is taking > 2s:

  %pad.1 = bf16[1618752,8]{1,0} pad(bf16[1618745,4]{1,0} %constant.327, bf16[] %constant.643), padding=0_7x0_4, metadata={op_name="jit(<unnamed wrapped function>)/jit(main)/while/body/remat/grid2mesh_gnn/_embed/grid2mesh_gnn/sequential/encoder_edges_grid2mesh_mlp/linear_0/dot_general[dimension_numbers=(((2,), (0,)), ((), ())) precision=None preferred_element_type=bfloat16]" source_file="/opt/conda/envs/graph/bin/ai-models" source_line=8}

This isn't necessarily a bug; constant-folding is inherently a trade-off between compilation time and speed at runtime. XLA has some guards that attempt to keep constant folding from taking too long, but fundamentally you'll always be able to come up with an input program that takes a long time.

If you'd like to file a bug, run with envvar XLA_FLAGS=--xla_dump_to=/tmp/foo and attach the results.
2024-01-23 06:42:13,257 INFO Doing full rollout prediction in JAX: 1 minute 29 seconds.
2024-01-23 06:42:13,258 INFO Converting output xarray to GRIB and saving
2024-01-23 06:44:28,391 INFO Saving output data: 2 minutes 15 seconds.
2024-01-23 06:44:28,494 INFO Total time: 4 minutes 14 seconds.

(graph) root@83c79cf9b834:~/sspaas-tmp# du -lh graphcast.grib 
6.5G    graphcast.grib

Given the nature of this anomaly, I would appreciate any insights or suggestions on why this might be happening and how to rectify it. Please let me know if additional information is needed to diagnose the issue further.

Thank you for your time and assistance.

Best regards

how to set the time resolution to 3h or 1h?

The time resolution of AI models are 6h default. How to set the time resolution to 3h or 1h?
Thank you.

Are those ai-models only run on Linux/Macs?

I am trying to install those on a windows machine but it gave me conflicting packages message. Specifically, those ai-models depends on ecmwflibs>=0.5.3 but it is not available for python on windows. So my question is are those models only run on Linux/MacOS machines?

Datasets have too many fields

I have made it nearly to the finish line, but cannot quite get all the needed files. This is the point at which I cannot progress:

The step is:
ai-models --date 20230905 --time 0000 --input cds --download-assets fourcastnet

The error message I get is:
File "/home/ec2-user/.local/lib/python3.9/site-packages/ai_models_fourcastnet/model.py", line 109, in normalise
new_data = (data - self.means) / self.stds
ValueError: operands could not be broadcast together with shapes (1,338,720,1440) (1,26,1,1)

I believe that when retrieving the data from CDS, the code is getting every field (338 of them), not just the 26 that are needed for FourCastNet

Commercial use

Hi there,
Could you please clarify the situation regarding use of the output from the various systems for commercial use?
All system but FourCastNet V1 seem to be under an Apache 2.0 licence which allows commercial use - but there is a conflicting note for Pangu Weather which says "The commercial use of these models is forbidden." (https://github.com/ecmwf-lab/ai-models-panguweather/tree/main).
Thanks
Dan Harding

for a given date, retrieving 00z analysis from MARS using the ECMWF WebAPI ('--date & --time option')

From the README seems that it is possible to use a specific date (option '--date YYYYMMDD') and analysis time (option '--time HHMM') to retrieve data from MARS (ECMWF WebAPI).

For instance, for a given date YYYYMMDD, I use the following line to generate forecasts starting form the analysis time HHMM:

   ai-models --date YYYYMMDD --time HHMM --lead-time NN fourcastnet

When 'HHMM=1200' the MARS retrieval (surface and pressure variables) is consistent with the DATE/TIME that I request, but if I change to 'HHMM=0000', the dataset retrieved from MARS is always for the 12z analysis.

Below I copied the printed lines I got when I request the 00z analysis ('HHMM=0000'). TIME is equal to 1200 and not to 0000 as it should be.

RETRIEVE,
CLASS = OD,
TYPE = AN,
STREAM = OPER,
EXPVER = 0001,
REPRES = GG,
LEVTYPE = SFC,
PARAM = 165/166/167/134/151/137/246.228/247.228,
TIME = 1200,
STEP = 00,
DOMAIN = G,
RESOL = AUTO,
AREA = 90.0/0.0/-90.0/359.75,
GRID = 0.25/0.25,
PADDING = 0,
DATE = 20230914

do a google colab version

ERA5-HRES Model Humidity Data Displaying as Zero — Seeking Insights and Clarification on Model Naming

Issue Summary:
While analyzing meteorological data with the GraphCast_operational - ERA5-HRES 1979-2021 model, I encountered an issue with the humidity data displaying as zero. The model has a resolution of 0.25, covers 13 pressure levels, and utilizes a 2to6 mesh grid system, focusing its output on major surface variables such as temperature, humidity, wind speed, wind direction, and mean sea level pressure.

Detailed Description:
Global historical meteorological data was obtained through the cds api, and after inference using the model, it generated a 6.5GB grib file. When examining these data with NASA's Panoply software, all variables except for humidity displayed normally. Puzzlingly, the humidity values were uniformly shown as zero, which is significantly different from what was expected.

As a non-professional individual deeply interested in the analysis of meteorological data, I am concerned that there might be a misunderstanding in my interpretation of the data or an unknown issue encountered during the data processing.

Therefore, I urgently require assistance from professionals to explain why the humidity data is abnormally displaying as zero, and whether this suggests an error in the model's output or the data conversion process.

Additionally, I am curious about the model naming "mesh2 to 6" and "precipitation output only.npz". Does this imply that the model is primarily used for predicting precipitation? I wish to learn more about the background of the model's naming and design focus.

Attachments:
Download link for Panoply Software: https://www.giss.nasa.gov/tools/panoply/download/
Download link for the grib(6.5GB) data file :
https://drive.google.com/file/d/1yVgTWT1DJNewRepib4qkOLbBz0HGIARY/view?usp=drive_link
Screenshot of humidity data observed in Panoply software
Download link for the grib data file
I would greatly appreciate it if professionals or those knowledgeable in this area could provide answers and assistance. Thank you!

requests.exceptions.HTTPError: 404 Client Error: ai-models/fuxi/meidum for url: https://get.ecmwf.int/repository/test-data/ai-models/fuxi/meidum

Hello, When I use fuxi to run, there is always an error when downloading the medium model from ecmwf. Finally, I found that in your ai-models-fuxi, medium in the model.py file is written as meidum. I hope you can change it as soon as possible.

[Graphcast] 10u, 10v, total_precipitation_6hr variables are not included in output GRIB file

Looks like three surface variables (10u,10v and total_precipitation_6hr) are silently dropped when the output xarray is written to grib file.

Is this expected?

Issue with Mars service access

Hello,
I have download all ai-models and their assets, but when I wan't to do some inference, I got this API exception:
" ecmwfapi.api.APIException: "ecmwf.API error 1: User '[myadressmail]' has no access to services/mars"

However, on the ECMWF web site I see that public members have access to Mars services? Any advice to avoid that?

Thank you

ecmwf-lab / ai-models Goto Github PK

ai-models's People

Contributors

Stargazers

Watchers

Forkers

ai-models's Issues

Environment

Model logs when run on this environment

Environment

Issue Description

Expected Behavior

Actual Behavior

Additional Context

Runtime Log

Recommend Projects

Recommend Topics

Recommend Org