
Comments (11)

tom-andersson commented on July 30, 2024

@bryandunn614, ./download_era5_data_in_parallel.sh runs the download processes in the background (note the & at the end of each line in the script). They should be running, which you can sanity-check with ps and top on the command line (assuming you are now running in the Windows Subsystem for Linux). Note that the script writes logs - check, for example, cat logs/era5_download_logs/tas.txt. What do you see? Has the ERA5 data downloaded to the data/obs/ folder?
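If it helps, here is a small sketch for skimming the tail of every ERA5 download log at once (assuming you run it from the repository root):

    import pathlib

    # Print the last few hundred characters of each ERA5 download log.
    for log in sorted(pathlib.Path('logs/era5_download_logs').glob('*.txt')):
        print(f"--- {log} ---")
        print(log.read_text()[-300:])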

Are you sure you need the CMIP6 data for your project? It is very large and doesn't lead to a substantial performance boost (see our paper).

Good luck with the SEAS5 access request. You can also use the SEAS5 performance metrics that we computed by downloading the paper's generated data. This would save you a lot of time.


tom-andersson commented on July 30, 2024

FYI, I have updated the README to make it clearer that the parallel bash scripts run Python processes in the background, and to state the log file folder paths: 4f5fb27


bryandunn614 commented on July 30, 2024

Hello @tom-andersson and @JimCircadian:
Thank you for your reply. I was busy with finals the last couple of weeks and have now started working on this project. However, there are two issues with the data downloading that I can't solve.
When running ./rotate_wind_data_in_parallel.sh, all the log files report an error about a missing file uas_EASE_cmpr.nc, like below:

Rotating wind data in data/cmip6/EC-Earth3/r2i1p1f1
Traceback (most recent call last):
  File "icenet/rotate_wind_data.py", line 91, in <module>
    wind_cubes[var] = iris.load_cube(EASE_path)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 387, in load_cube
    cubes = _load_collection(uris, constraints, callback).cubes()
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 325, in _load_collection
    result = iris.cube._CubeFilterCollection.from_cubes(cubes, constraints)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/cube.py", line 157, in from_cubes
    for cube in cubes:
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 312, in _generate_cubes
    for cube in iris.io.load_files(part_names, callback, constraints):
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 193, in load_files
    all_file_paths = expand_filespecs(filenames)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 176, in expand_filespecs
    raise IOError(msg)
OSError: One or more of the files specified did not exist:
"/mnt/e/icenet-paper/data/cmip6/EC-Earth3/r2i1p1f1/uas_EASE_cmpr.nc" didn't match any files

I have checked all the data/cmip6 subfolders: every folder contains just a single siconca_latlon.nc file; no uas_EASE_cmpr.nc was generated.

I think this is because download_cmip6_data.py doesn't generate or download uas_EASE_cmpr.nc correctly. The only change I made to the project code was to comment out line 310 of download_cmip6_data.py (query['data_node'] = data_node), following another issue raised by @xinaesthete.

Thank you for your help


bryandunn614 commented on July 30, 2024

nco needs to be installed for download_seas5_forecasts.py to run successfully.
When I ran download_seas5_forecasts.py, it kept reporting:

Regridding to EASE... /bin/sh: 1: ncatted: not found
ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument

I tried to find what was wrong with the code, but everything seemed fine, so I searched with ChatGPT. It turns out that

conda install -c conda-forge nco

needs to be run in the icenet environment for the command to succeed; nco was not installed before.


tom-andersson commented on July 30, 2024

Hi @bryandunn614.

> When running ./rotate_wind_data_in_parallel.sh, all the log files report an error about a missing file uas_EASE_cmpr.nc, like below:

The issue with downloading MRI-ESM2.0 data not returning files has now been fixed (#4). Can you try git pulling and rerunning the MRI-ESM2.0 download (assuming your laptop has space), and check whether the log files show the uas/vas wind data downloading correctly? You can re-open this issue if you find a bug in the code.


tom-andersson commented on July 30, 2024

Regarding the download_seas5_forecasts.py issue you mentioned, these are just warnings about missing metadata in the NetCDF file and can be ignored.

I ran python3 icenet/download_seas5_forecasts.py --leadtime 1 locally on my laptop and I was able to download the data to latlon/ successfully but then the process was killed during regridding because my laptop doesn't have enough memory. I then manually reduced the size of seas5_leadtime1_latlon.nc by selecting just a year of data and confirmed that the regridding worked.

I did look into using dask to regrid the SEAS5 forecasts in chunks and avoid memory issues, but it looks like iris doesn't support regridding with dask: https://scitools-iris.readthedocs.io/en/latest/userguide/real_and_lazy_data.html#:~:text=lazy%20evaluation.-,Certain%20operations,-%2C%20including%20regridding%20and

There should be a workaround to avoid a large memory footprint during regridding (e.g. looping over the data in years), but unfortunately I don't have time to implement it. Your best bet will be to move to an HPC where memory is not a constraint.
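For anyone attempting that workaround, below is a minimal sketch of the loop-over-years idea using iris. The target-grid file, output paths, and year range are illustrative assumptions, not the repo's actual values:

    import iris
    from iris.analysis import Linear

    # Regrid the SEAS5 lat/lon forecast one year at a time to bound peak memory.
    seas5 = iris.load_cube('data/forecasts/seas5/latlon/seas5_leadtime1_latlon.nc')
    target = iris.load_cube('ease_grid.nc')  # hypothetical file defining the EASE target grid

    for year in range(2002, 2022):  # illustrative year range
        subset = seas5.extract(iris.Constraint(time=lambda cell: cell.point.year == year))
        if subset is not None:  # skip years not present in the forecast
            iris.save(subset.regrid(target, Linear()), f'seas5_leadtime1_EASE_{year}.nc')

The per-year files can then be concatenated afterwards (e.g. with xarray's open_mfdataset), so only one year of data is realised in memory at a time.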


bryandunn614 commented on July 30, 2024

Hello @tom-andersson:
I don't think this works on my end. I ran download_seas5_forecasts.sh on my cloud server, and it seems the procedure got stuck in the regridding step; it eventually filled up the 190 GB of memory and crashed. Here is the message I get:

Regridding to EASE... ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument

nqstat_anu 90554322
                                %CPU  WallTime  Time Lim     RSS    mem memlim cpus
 90554322 R cd8380 zv32  job2.sh  74  01:04:49  10:00:00  176GB  189GB  190GB     1

I have checked the disk space: the download finished, but nothing further was written to disk, and the job is still running.

I don't think it is normal for a single ./download_seas5_forecasts_in_parallel.sh run to take up 190 GB of memory.


tom-andersson commented on July 30, 2024

Hi @bryandunn614

190 GB sounds correct given what is happening computationally. For each of the 6 lead times, the SEAS5 lat/lon forecast is 1440x360x240x25 elements, which is roughly 13 GB when loaded uncompressed into memory. I ran python3 icenet/download_seas5_forecasts.py --leadtime 1 on my HPC and monitored the memory usage: it went up to around 80 GB. So if you run all 1-6 month lead times in parallel with the bash script, and multiple downloads complete at around the same time so that multiple regriddings run in parallel, you could easily end up using hundreds of GB of memory.
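As a quick back-of-envelope check on the ~13 GB figure (a sketch assuming 4-byte float32 values):

    # 1440 x 360 x 240 x 25 elements per lead time, float32 (4 bytes) assumed
    n_elements = 1440 * 360 * 240 * 25
    print(f"{n_elements * 4 / 1e9:.1f} GB")  # -> 12.4 GB, consistent with 'roughly 13 GB'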

You can modify the SEAS5 parallel download script to run the commands in sequence rather than in parallel by changing the trailing & on each line to &&.


bryandunn614 commented on July 30, 2024

Thank you @tom-andersson for your reply.
Another issue, also related to the data download: I submitted download_cmip6_data_in_parallel.sh to the cloud server to run, but none of the jobs downloaded the complete dataset, and EC_r14i1p1f1 and EC_r12i1p1f1 even reported errors. The error messages are identical:

siconca: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:17s... compressing & saving... done in 2.0m:56s... Done.

tas: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:18s... compressing & saving... done in 1.0m:8s... Done.

ta: searching ESGF... found 0 files. 500.0 hPa, loading metadata... Traceback (most recent call last):
  File "icenet/download_cmip6_data.py", line 334, in <module>
    cmip6_da = xr.open_mfdataset(results, combine='by_coords', chunks={'time': '499MB'})[variable_id]
  File "/(my cloud server path)/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/backends/api.py", line 921, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open

I am wondering whether this was caused by running 5 programs in parallel (I only ran the 5 EC-Earth3 download commands) and running out of memory (190 GB) - although I checked, and that does not seem to be the case - or whether r14i1p1f1 and r12i1p1f1 need to be changed for the download to succeed. The program used up my single-node time allowance (10 hours) without reporting a failure. Download speed should not be the bottleneck, because I connected to a supercomputer node based in Australia to run the command.

Thank you for your help!


tom-andersson commented on July 30, 2024

Hi @bryandunn614, I can't reproduce your error:

> python3 icenet/download_cmip6_data.py  --source_id EC-Earth3 --member_id r14i1p1f1

Downloading data for EC-Earth3, r14i1p1f1
...
ta: searching ESGF... found historical, found ssp245, found 251 files. 500.0 hPa, loading metadata... downloading with xarray... 

It's possible there was a connection issue during the download, either at the ESGF data node or on your HPC. Did you try rerunning the download for the missing variables? icenet/download_cmip6_data.py doesn't let you select specific variables, but you can easily edit it to do this.
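To help rule out an ESGF-side gap, one option is to query ESGF directly for the member and variable that failed; a sketch using pyesgf is below (the search node URL and facet values are assumptions, and the script's own query may differ):

    from pyesgf.search import SearchConnection

    # Count ESGF search hits for the 'ta' files of the failing member.
    conn = SearchConnection('https://esgf-node.llnl.gov/esg-search', distrib=True)
    ctx = conn.new_context(project='CMIP6', source_id='EC-Earth3',
                           member_id='r14i1p1f1', experiment_id='historical',
                           variable_id='ta')
    print(ctx.hit_count)  # 0 here would point to an ESGF/connection issue, not the code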


bryandunn614 commented on July 30, 2024

Hello @tom-andersson, I don't think this issue can be ignored.

> Regarding the download_seas5_forecasts.py issue you mentioned, these are just warnings about missing metadata in the NetCDF file and can be ignored.

If I just ignore the issue, I get an error when running the biascorrect_seas5_forecasts.py command:

Traceback (most recent call last):
  File "icenet/biascorrect_seas5_forecasts.py", line 63, in <module>
    [da.mean('number') for da in seas5_forecast_da_list], 'leadtime')
  File "/scratch/zv32/cd8380/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/core/concat.py", line 174, in concat
    raise ValueError("must supply at least one object to concatenate")
ValueError: must supply at least one object to concatenate

I had successfully run all the preceding commands, but then ran into this problem.
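A note on narrowing this down: the ValueError comes from xarray's concat receiving an empty list, which suggests the regridded SEAS5 forecast files were never written. A quick hedged check (the path pattern below is an assumption, not necessarily the one the script uses):

    import glob

    # List the regridded SEAS5 files that the bias-correction step presumably reads.
    files = sorted(glob.glob('data/forecasts/seas5/**/*.nc', recursive=True))
    print(files if files else 'no SEAS5 files found - rerun the download/regrid step')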

