Comments (11)
@bryandunn614, `./download_era5_data_in_parallel.sh` runs the download processes in the background (note the `&` at the end of each line in the script). They should be running, which you can sanity-check with `ps` and `top` on the command line (assuming you are now running in the Windows Subsystem for Linux). Note that the script writes log files - check, for example, `cat logs/era5_download_logs/tas.txt`. What do you see? Has the ERA5 data downloaded to the `data/obs/` folder?
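A quick way to do that sanity check, assuming `pgrep` is available (the process-name pattern is just illustrative):

```shell
# Count download workers still running in the background (prints 0 if none)
n_running=$(pgrep -cf "download_era5_data" || true)
echo "ERA5 downloaders running: ${n_running:-0}"

# Then watch a per-variable log for progress, e.g.:
# tail -f logs/era5_download_logs/tas.txt
```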
Are you sure you need the CMIP6 data for your project? It is very large and doesn't lead to a substantial performance boost (see our paper).
Good luck with the SEAS5 access request. You can also use the SEAS5 performance metrics that we computed by downloading the paper-generated data. This would save you a lot of time.
from icenet-paper.
FYI, I have updated the README to make it clearer that the parallel bash scripts run Python processes in the background, and to state the log file folder paths: 4f5fb27
Hello @tom-andersson and @JimCircadian:
Thank you for your reply. I was busy with finals for the last couple of weeks and have now started working on this project. However, there are two issues with data downloading that I can't solve.
When running `./rotate_wind_data_in_parallel.sh`, all log files report an error that the file `uas_EASE_cmpr.nc` is missing, like below:
```
Rotating wind data in data/cmip6/EC-Earth3/r2i1p1f1
Traceback (most recent call last):
  File "icenet/rotate_wind_data.py", line 91, in <module>
    wind_cubes[var] = iris.load_cube(EASE_path)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 387, in load_cube
    cubes = _load_collection(uris, constraints, callback).cubes()
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 325, in _load_collection
    result = iris.cube._CubeFilterCollection.from_cubes(cubes, constraints)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/cube.py", line 157, in from_cubes
    for cube in cubes:
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/__init__.py", line 312, in _generate_cubes
    for cube in iris.io.load_files(part_names, callback, constraints):
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 193, in load_files
    all_file_paths = expand_filespecs(filenames)
  File "/home/bryan/anaconda3/envs/icenet/lib/python3.7/site-packages/iris/io/__init__.py", line 176, in expand_filespecs
    raise IOError(msg)
OSError: One or more of the files specified did not exist:
  "/mnt/e/icenet-paper/data/cmip6/EC-Earth3/r2i1p1f1/uas_EASE_cmpr.nc" didn't match any files
```
I have checked all the `data/cmip6` subfolders: every folder contains only a single `siconca_latlon.nc` file, and no `uas_EASE_cmpr.nc` is generated.
I think this is because `download_cmip6_data.py` doesn't generate or download `uas_EASE_cmpr.nc` correctly. The only modification I made to the project code was to comment out line 310 of `download_cmip6_data.py` (`query['data_node'] = data_node`), following another issue raised by @xinaesthete.
Thank you for your help
nco should be installed to run `download_seas5_forecasts.py` successfully
When I run `download_seas5_forecasts.py`, it keeps reporting:

```
Regridding to EASE... /bin/sh: 1: ncatted: not found
ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument
```

I tried to find what's wrong with the code, but everything seemed fine, so I searched with ChatGPT. It turns out that

```
conda install -c conda-forge nco
```

must be run in the icenet environment for the command to succeed; nco was not installed before.
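A quick way to confirm the nco tools are on the PATH before rerunning (`command -v` prints the program's location if it is installed):

```shell
# Check whether ncatted (part of nco) is available in the current environment
if command -v ncatted >/dev/null 2>&1; then
    echo "nco is installed"
else
    echo "nco missing - run: conda install -c conda-forge nco"
fi
```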
Hi @bryandunn614.

> When running the command ./rotate_wind_data_in_parallel.sh, all log files report an error that there is a missing file uas_EASE_cmpr.nc

The issue with downloading MRI-ESM2-0 data not returning files has now been fixed (#4). Can you try `git pull`ing and rerunning the MRI-ESM2-0 download (assuming your laptop has space) and check that the log files show the `uas`/`vas` wind data downloading correctly? You can re-open if you find a bug in the code.
Regarding the `download_seas5_forecasts.py` issue you mentioned: these are just warnings about missing metadata in the NetCDF file and can be ignored.
I ran `python3 icenet/download_seas5_forecasts.py --leadtime 1` locally on my laptop and was able to download the data to `latlon/` successfully, but the process was then killed during regridding because my laptop doesn't have enough memory. I then manually reduced the size of `seas5_leadtime1_latlon.nc` by selecting just a year of data and confirmed that the regridding worked.
I did look into using `dask` to regrid the SEAS5 forecasts in chunks and avoid memory issues, but it looks like `iris` doesn't support regridding with `dask`: https://scitools-iris.readthedocs.io/en/latest/userguide/real_and_lazy_data.html#:~:text=lazy%20evaluation.-,Certain%20operations,-%2C%20including%20regridding%20and
There will be a workaround to avoid a large memory footprint during regridding (e.g. by looping over the data in years), but I don't have time to implement this unfortunately. Your best bet will be to move to an HPC where memory is not a constraint.
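For anyone attempting the year-looping workaround, the chunking pattern could look like this sketch (numpy stands in for the data, `regrid_one_year` stands in for iris's actual regrid call, e.g. `cube.regrid(...)`, and the array shape is illustrative, not the real SEAS5 dimensions):

```python
import numpy as np

def regrid_one_year(block):
    # Stand-in for the real regridding step (e.g. iris cube.regrid);
    # here we just average over the last axis to keep the sketch runnable.
    return block.mean(axis=-1)

# Illustrative data: (time, y, x) with 12 monthly steps per "year"
data = np.arange(24 * 4 * 4, dtype=np.float64).reshape(24, 4, 4)

# Regrid year-by-year so only one year is in memory at a time,
# then stitch the results back together along the time axis
chunks = [regrid_one_year(data[t:t + 12]) for t in range(0, data.shape[0], 12)]
result = np.concatenate(chunks, axis=0)

# Same answer as regridding everything at once, but with bounded memory
assert np.allclose(result, regrid_one_year(data))
```

Because regridding acts independently on each time step, slicing along time before regridding gives identical results to regridding the full cube.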
Hello @tom-andersson:
I don't think this works on my end. I ran `download_seas5_forecasts.sh` on my cloud server, and the procedure got stuck during the regridding, eventually filling the 190 GB of memory and crashing. Here is the message I get:

```
Regridding to EASE... ut_scale(): NULL factor argument
ut_are_convertible(): NULL unit argument

nqstat_anu 90554322
                                 %CPU  WallTime  Time Lim   RSS    mem  memlim  cpus
90554322 R cd8380  zv32  job2.sh   74  01:04:49  10:00:00  176GB  189GB  190GB     1
```

I have checked the disk space: the download finished but nothing was written to disk, yet the process was still running. I don't think it is normal for a single `./download_seas5_forecasts_in_parallel.sh` run to take up 190 GB of memory.
Hi @bryandunn614
190 GB sounds correct given what is happening computationally. For each of the 6 lead times, the SEAS5 lat/lon forecast is 1440x360x240x25, which is roughly 13 GB when loaded uncompressed into memory. I ran `python3 icenet/download_seas5_forecasts.py --leadtime 1` on my HPC and monitored the memory usage: it went up to around 80 GB. So if you run all 1-6 month lead times in parallel with the bash script, and multiple downloads complete at the same time, leading to multiple regriddings happening in parallel, you could easily end up using hundreds of GB of memory.
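That ~13 GB figure checks out with back-of-the-envelope arithmetic, assuming the data are 32-bit floats:

```python
# 1440 x 360 x 240 x 25 array elements, 4 bytes each for float32
n_elements = 1440 * 360 * 240 * 25
size_gb = n_elements * 4 / 1e9
print(f"{size_gb:.1f} GB")  # ~12.4 GB uncompressed per lead time
```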
You can modify the SEAS5 parallel download script to run the commands in sequence rather than in parallel by changing the `&` to `&&`.
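As a toy illustration of that change (`run_leadtime` here is a stand-in for the real `python3 icenet/download_seas5_forecasts.py --leadtime N` commands):

```shell
run_leadtime() { echo "leadtime $1 done"; }

# Parallel (as in the original script): each command is backgrounded with a
# trailing &, so all of them run at once:
#   run_leadtime 1 &
#   run_leadtime 2 &

# Sequential: && only starts the next command once the previous one has
# finished successfully, so at most one regridding is in memory at a time
run_leadtime 1 &&
run_leadtime 2 &&
run_leadtime 3
```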
Thank you @tom-andersson for your reply.
Another issue is also related to data downloading. I submitted `download_cmip6_data_in_parallel.sh` to the cloud server to run, but none of the processes downloaded the complete dataset, and EC_r14i1p1f1 and EC_r12i1p1f1 even reported errors. The error messages are identical:

```
siconca: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:17s... compressing & saving... done in 2.0m:56s... Done.
tas: searching ESGF... found historical, found ssp245, found 251 files. loading metadata... downloading with xarray... saving to regrid in iris... regridding... done in 0.0m:18s... compressing & saving... done in 1.0m:8s... Done.
ta: searching ESGF... found 0 files. 500.0 hPa, loading metadata... Traceback (most recent call last):
  File "icenet/download_cmip6_data.py", line 334, in <module>
    cmip6_da = xr.open_mfdataset(results, combine='by_coords', chunks={'time': '499MB'})[variable_id]
  File "(my cloud server path)/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/backends/api.py", line 921, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open
```

I am wondering whether this was caused by running 5 programs in parallel (I only ran the 5 EC-Earth3 download commands) and running out of memory (190 GB) - which I checked is not the case - or whether r14i1p1f1 and r12i1p1f1 need to be changed for the data to download successfully. The run used up my single-node usage allowance (10 hours) without reporting a failure. The download speed should be fast, since I was running the command on a supercomputer node based in Australia.
Thank you for your help!
Hi @bryandunn614, I can't reproduce your error:

```
> python3 icenet/download_cmip6_data.py --source_id EC-Earth3 --member_id r14i1p1f1
Downloading data for EC-Earth3, r14i1p1f1
...
ta: searching ESGF... found historical, found ssp245, found 251 files. 500.0 hPa, loading metadata... downloading with xarray...
```

It's possible there was a connection issue during the download, either on the ESGF data node side or on your HPC. Did you try running the download again for the missing variables? `icenet/download_cmip6_data.py` doesn't let you select specific variables, but you can easily edit it to do this.
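One way such an edit might look - a hypothetical `--variables` flag wrapped around the script's per-variable loop (the variable names below come from the logs in this thread; the real script's structure may differ):

```python
import argparse

# Variable names seen in the logs above; the real list lives in the script
ALL_VARIABLES = ["siconca", "tas", "ta", "uas", "vas"]

parser = argparse.ArgumentParser()
parser.add_argument("--variables", nargs="+", default=ALL_VARIABLES,
                    help="subset of variables to (re)download")
args = parser.parse_args(["--variables", "ta"])  # e.g. retry just 'ta'

for variable_id in args.variables:
    # ...the script's existing per-variable download logic would go here...
    print(f"retrying download for {variable_id}")
```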
Hello @tom-andersson, I don't think this issue can be ignored:

> Regarding the download_seas5_forecasts.py issue you mentioned, these are just warnings about missing metadata in the NetCDF file and can be ignored.

If I just ignore it, I get an error when running the `biascorrect_seas5_forecasts.py` command:

```
Traceback (most recent call last):
  File "icenet/biascorrect_seas5_forecasts.py", line 63, in <module>
    [da.mean('number') for da in seas5_forecast_da_list], 'leadtime')
  File "/scratch/zv32/cd8380/mambaforge/envs/icenet/lib/python3.7/site-packages/xarray/core/concat.py", line 174, in concat
    raise ValueError("must supply at least one object to concatenate")
ValueError: must supply at least one object to concatenate
```

I had successfully run all the preceding commands, but then ran into this problem.