SNAP Coordinated Stratospheric Nudging Experiments
SNAP project repo: https://github.com/aph42/snap
License: BSD 2-Clause "Simplified" License
Data received from the UKMO was produced with the GloSea6 model; however, only GloSea5 is currently listed in snapsi_cv.json.
It is suggested that we add a source_id for GloSea6 to the CV table:
"GloSea6": {
"activity_participation": "SNAPSI",
"cohort": "Registered",
"institution_id": "UKMO",
"source_id": "GloSea6",
"source": "GloSea6 (description)"
},
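A minimal sketch of how such an entry could be generated rather than typed by hand. It follows only the field pattern shown in the GloSea6 example; the helper name `make_source_entry` is hypothetical, and how the entry is merged into snapsi_cv.json is not covered because that file's layout is not shown here.

```python
import json


def make_source_entry(source_id, institution_id, description="description"):
    """Build one CV entry using the same fields as the GloSea6 example."""
    return {
        source_id: {
            "activity_participation": "SNAPSI",
            "cohort": "Registered",
            "institution_id": institution_id,
            "source_id": source_id,
            "source": f"{source_id} ({description})",
        }
    }


entry = make_source_entry("GloSea6", "UKMO")
print(json.dumps(entry, indent=4))
```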
I suggest we use the same pattern as the CMIP6 and CCMI-2022 archives
e.g.
/snap/data/<mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable_id>/<grid_label>/
For ccmi-2022 we're using the date that data is uploaded to CEDA as the version number, and we're asking users to create a version directory for this, so data uploaded on 26th March 2021 would have v20210326
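As a rough illustration of the proposed layout, the sketch below joins the facets in the order given in the pattern and appends a version directory derived from the upload date. The function name `drs_directory` and the facet values in the example are illustrative assumptions, not agreed SNAPSI values.

```python
from datetime import date
from pathlib import PurePosixPath

# Facet order taken from the directory pattern proposed above.
DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id", "grid_label",
)


def drs_directory(facets, upload_date, root="/snap/data"):
    """Return the directory for one dataset, including the version folder."""
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        raise KeyError(f"missing DRS facets: {missing}")
    # Version is "v" followed by the upload date as YYYYMMDD.
    version = f"v{upload_date:%Y%m%d}"
    return PurePosixPath(root, *(facets[name] for name in DRS_FACETS), version)


# Illustrative facet values only.
example = {
    "mip_era": "SNAPSI",
    "activity_id": "SNAPSI",
    "institution_id": "UKMO",
    "source_id": "GloSea6",
    "experiment_id": "free",
    "member_id": "s20180125-r1i1p1f1",
    "table_id": "6hrPt",
    "variable_id": "ua",
    "grid_label": "gn",
}
path = drs_directory(example, date(2021, 3, 26))
print(path)
```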
Peter requested a group work space for snapsi but I have not got further than my initial enquiries with respect to setting it up.
I think it best that I hand this task back to the snapsi team.
Instructions for requesting project resources (e.g. a GWS) on jasmin can be found here:
https://help.jasmin.ac.uk/article/5022-requesting-resources
I noticed that CNRM-CM61 files I downloaded through CEDA all had bad time coordinates. Specifically, trying to open them with a vanilla xarray open_dataset call results in an OverflowError in both pandas and cftime (and a resultant ValueError message from xarray about being unable to decode time units).
Opening the files with xr.open_dataset(..., decode_times=False) shows that the time coordinate array is filled with equal values of 9.96921e+36 (doing a diff across the array returns an array of zeros).
In [1]: import xarray as xr
In [2]: ds = xr.open_dataset('ua_6hrPt_CNRM-CM61_control_s20180125-r10i1p1f1_gr_20180125-20180310.nc', decode_times=False)
In [3]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
* time (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36
In [4]: ds = xr.open_dataset('zg_6hrPt_CNRM-CM61_nudged_s20191001-r10i1p1f1_gr_20191001-20191114.nc', decode_times=False)
In [5]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
* time (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36
I have not checked whether all the CNRM-CM61 files exhibit this issue, just a handful of random cases (differing variables, experiments, and initializations).
I have not encountered this problem with the other model datasets currently available (GRIMs and GEM-NEMO); the files that I've tested for these have resulted in valid time arrays.
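To screen more files for the same symptom, a check along these lines might help. It is a sketch using plain Python lists so it runs without xarray. The value 9.96921e+36 matches the netCDF default fill value for doubles, which is what the constant below assumes. The function name `time_axis_problems` is hypothetical. With xarray, the input could be obtained from `ds.time.values.tolist()` after opening with `decode_times=False`.

```python
import math

# netCDF default fill value for doubles (assumed to be what the files contain).
NC_FILL_DOUBLE = 9.969209968386869e36


def time_axis_problems(values, fill=NC_FILL_DOUBLE):
    """Return a list of problems found in a raw (undecoded) time axis."""
    problems = []
    if any(math.isclose(v, fill, rel_tol=1e-6) for v in values):
        problems.append("contains fill values")
    # A usable time axis should increase from one step to the next.
    if any(later <= earlier for earlier, later in zip(values, values[1:])):
        problems.append("not strictly increasing")
    return problems


bad_axis = [NC_FILL_DOUBLE] * 180
good_axis = [0.0, 0.25, 0.5, 0.75]
print(time_axis_problems(bad_axis))
print(time_axis_problems(good_axis))
```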
I have been processing some of the basic fields (ua, va, ta, wap, zg) for the group-SNAPSI effort. In doing so, I've run into a couple of corrupted netcdf files that give HDF errors when trying to read them:
Meteo-France/CNRM-CM61/nudged/s20191001/r7i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r7i1p1f1_gr_20191001-20191114.nc
Meteo-France/CNRM-CM61/nudged/s20191001/r8i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r8i1p1f1_gr_20191001-20191114.nc
UKMO/GloSea6/control-full/s20180125/r10i1p1f1/6hrPt/wap/gn/v20230403/wap_6hrPt_GloSea6_control-full_s20180125-r10i1p1f1_gn_201801250600-201803260000.nc
When/if I encounter more, I will post them here.
#12 (comment)
The SNAPSI_CV.json info for CNRM-CM 6.1 contains a source_id with a space and a "." both of which are not compatible with CEDA filename conventions.
"CNRM-CM 6.1": { "activity_participation": "SNAPSI", "cohort": "Registered", "institution_id": "Meteo-France", "source_id": "CNRM-CM 6.1", "source": "CNRM-CM 6.1 (description)" },
I recommend that the source_id is updated to "CNRM-CM61"
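A small sketch of how source_id values could be screened. The allowed character set (letters, digits and hyphens) is an assumption made for illustration; the only facts taken from this issue are that spaces and "." are not acceptable. Both function names are hypothetical.

```python
import re

# Assumed allowed characters: letters, digits and hyphens only.
DISALLOWED = re.compile(r"[^A-Za-z0-9-]")


def is_valid_source_id(source_id):
    """True when the source_id is non-empty and has no disallowed characters."""
    return bool(source_id) and DISALLOWED.search(source_id) is None


def suggest_source_id(source_id):
    """Drop disallowed characters to propose a compatible source_id."""
    return DISALLOWED.sub("", source_id)


print(is_valid_source_id("CNRM-CM 6.1"))
print(suggest_source_id("CNRM-CM 6.1"))
```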
Will need to update:
source_id - list of participating institutions/models
frequency - list of output frequencies
realm - do we need ocean data to include SSTs/sea-ice quantities?
table_id - list of tables (6hr, E6hr, 6hrZ, E6hrZ)
The following is the list of global attributes used in CMIP6.
Marked with an "x" when required for SNAP.
Elements which are part of the Data Reference Syntax (DRS) are marked as needed.
CF-1.8 SNAP or CF-1.8. Needed by CF, useful for SNAP.
The data passed all of the checks except for the global attribute for the CF convention
[global_ncattribute_cv]: FAILED:: Global attributes do not match constraints: [('Conventions', 'CF-1.7', "['CF-1.8 SNAP']")]
The convention provided was "CF-1.7" but the correct convention is "CF-1.8 SNAP"
The errors found by the ceda-cc code are listed in the comments below.
Each comment describes a separate error and includes a list of the variables in whose data files the error is found.
is missing, as all are ensemble members for ta for this particular experiment and initialization.
The following errors were picked up by the ceda-cc and cf checks for the free s20180125 experiment v20230526 data:
Below are the amendments made by CEDA before the archival of CNRM-CM61 v20221123 data:
mip_era changed from 'CMIP6' to 'SNAPSI'
filename replacing underscore with dash from 'subexperiment_variantlabel' to 'subexperiment-variantlabel'
e.g. 'clt_6hr_CNRM-CM61_free_s20180125_r42i1p1f1_gr_20180125-20180310.nc'
to 'clt_6hr_CNRM-CM61_free_s20180125-r42i1p1f1_gr_20180125-20180310.nc'
command run in linux terminal:
find -type f -name '*_r*' -execdir bash -c 'mv -- "$1" "${1//_r/-r}"' bash {} \;
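For comparison, the same rename can be sketched in Python with a narrower match. This is not the command CEDA ran. It is an alternative that only rewrites the underscore between an `sYYYYMMDD` sub-experiment and a variant label, on the assumption that both always have that shape. The function name `fix_filename` is hypothetical.

```python
import re

# Matches "_s<8 digits>_r<n>i<n>p<n>f<n>_" and captures the two labels.
SUBEXP_VARIANT = re.compile(r"_(s\d{8})_(r\d+i\d+p\d+f\d+)_")


def fix_filename(name):
    """Join sub-experiment and variant label with a dash instead of "_"."""
    return SUBEXP_VARIANT.sub(r"_\1-\2_", name)


old = "clt_6hr_CNRM-CM61_free_s20180125_r42i1p1f1_gr_20180125-20180310.nc"
new = fix_filename(old)
print(new)
```

Names that already use the dash do not match the pattern, so applying the function twice leaves them unchanged.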
Some files in 'free' experiment had an experiment_id 'nudged' and incorrect 'experiment' description.
files: with variantlabel starting r15-r25 in s20181213 and r42-r50 in s20180125 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntrl, tntrs, tos, ua, uas, utendmp, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id from 'nudged' to 'free' and experiment description
Some files in 'nudged' experiment had an experiment_id 'control' instead of 'nudged', incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = "s20180125".
files: with variantlabel starting r14-r50 of s20190108 and r25 of s20190829 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntnd, tntrl, tntrs, tos, ua, uas, utendmp, utendnd, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id 'control' to 'nudged', change incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = 's20180125' to either 's20190108' or 's20190829'.
appears to be missing. That is, I only see ensemble members 2 through 59 for this particular experiment from KMA. Ensemble member 1 is not there.
I have tried downloading the following two files in multiple ways from the CEDA servers, but somehow the files appear to be corrupted when I try to read them on my servers.
ua/ECMWF/ua_6hrPt_IFS_free_s20190829-r26i1p1f1_gr_201908290000-201910140000.nc
ua/NCAR/ua_6hrPt_CESM2-CAM6_nudged_s20191001-r12i1p1f1_gn_20191001-20191115.nc
Both of them appear to have the following issue:
"NetCDF: HDF error
Location: file ; line 478"
Defining experiment descriptions for snapsi catalogue records
They should have enough information to be meaningful yet generic enough to be applicable to all the snapsi data providers.
For comparison here are the github issues for confirming the general descriptions of the CCMI-2022 experiments
refD1 cedadev/ccmi-2022#59
refD2 cedadev/ccmi-2022#60
senD2-sai cedadev/ccmi-2022#61
senD2-ssp126 cedadev/ccmi-2022#62
senD2-ssp370 cedadev/ccmi-2022#63
long_name corrections needed for
tauv: long_name="Surface Downward Northward Stress" [correct: "Surface Downward Northward Wind Stress"]
tauu: long_name="Surface Downward Eastward Stress" [correct: "Surface Downward Eastward Wind Stress"]
snw: long_name="Surface Snow Amout" [correct: "Surface Snow Amount"]
is empty, as are around 2/3 of the ensemble members for this particular experiment and initialization.
The following errors were raised by the ceda-cc quality checks for the control/s20180125-r3i1p2f1/ test sample:
the file SNU/ua_6hrPt_GRIMs_control_s20180125-r1i1p1f1_gr_20180125-20180311.nc
is way too big as compared to other ensemble members, and when I try to open it up (either using matlab or ncview) my computer complains the format is incorrect.
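Files like this could be flagged before anyone tries to open them by comparing sizes across ensemble members. The sketch below is one possible heuristic, not an established check: it reports any file whose size differs from the median by more than a chosen fraction. The function name, the 50% tolerance and the sizes in the example are all made up for illustration; real sizes could come from `os.path.getsize`.

```python
from statistics import median


def size_outliers(sizes, tolerance=0.5):
    """Return names whose size is further than tolerance * median from the median.

    sizes maps a file name to its size in bytes.
    """
    typical = median(sizes.values())
    return sorted(
        name for name, size in sizes.items()
        if abs(size - typical) > tolerance * typical
    )


# Made-up sizes for four ensemble members of one variable.
example_sizes = {"r1": 900, "r2": 100, "r3": 101, "r4": 99}
print(size_outliers(example_sizes))
```

The same comparison also catches empty files, since a size of zero is far below the median.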
The following errors were picked up by the ceda-cc checks for the control s20180125 experiment v20230215 data:
The data passed checks except comments below:
The tracking_id in CMIP6 files is a resolvable handle.net identifier, e.g. hdl:21.14100/5a26143c-222d-4cea-aed2-923b45d930c9 for the file http://esgf.bsc.es/thredds/fileServer/esg_dataroot/a247-CMIP-r2/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r2i1p1f1/Omon/tos/gr/v20201215/tos_Omon_EC-Earth3_historical_r2i1p1f1_gr_192401-192412.nc
The infrastructure behind this is run by DKRZ and is not funded to cover projects outside CMIP6 (and possibly not for CMIP6 data added after the AR6 report), so we cannot use it in the same way here. The CMIP5 version of the tracking_id just used a unique identifier, such as 5a26143c-222d-4cea-aed2-923b45d930c9. This is not resolvable in the same way, but does provide a useful way of tracking files when publication updates lead to multiple files with the same name.
I recommend using the CMIP5 version of tracking_id here.
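A CMIP5-style identifier of this kind can be produced with Python's standard uuid module. This is a minimal sketch with a hypothetical helper name. It assumes a random (version 4) UUID is acceptable, which matches the form of the example identifier above.

```python
import uuid


def new_tracking_id():
    """Return a fresh random UUID string, one per output file."""
    return str(uuid.uuid4())


tracking_id = new_tracking_id()
print(tracking_id)
```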
Dear supporter for the snapsi,
I am not sure if it is proper to open an issue here.
I have uploaded the snapsi data for BCC-CSM2-HR on Jan. 6 2024 and placed the data into the snapsi/v20240106.
But the data hasn't been checked and moved into the SNAPSI archive.
Could you help me with that?
My CEDA account is zhaoyang.
Thank you.
Best wishes,
Zhaoyang
20240305
Hi Peter,
Apologies for the slow response.
@aph42 , @charliepascoe : I ran ceda-cc on the SNAPSI data that has been uploaded, and it reports an error because vaclim is not in the 6hrRef table.
Could you extend the table or reformat the files with a correct variable name?
If possible, download ceda-cc (see pypi.org) and check the files yourself after making the correction.
https://data.ceda.ac.uk/badc/snap/data/post-cmip6/SNAPSI/KMA/GloSea6-GC32/nudged/s20181213
is missing the fourth ensemble member
In other words
/badc/snap/data/post-cmip6/SNAPSI/KMA/GloSea6-GC32/nudged/s20181213/r1i1p4f1
does not exist
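Gaps like this can be found by parsing the variant-label directory names and comparing one index against the range that is expected. The sketch below assumes labels of the form r<N>i<N>p<N>f<N>. The function name and the example label lists are illustrative only.

```python
import re

VARIANT = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def missing_members(labels, index, expected):
    """Return expected values of one index ("r", "i", "p" or "f") that are absent."""
    position = "ripf".index(index)
    present = set()
    for label in labels:
        match = VARIANT.match(label)
        if match:
            present.add(int(match.group(position + 1)))
    return sorted(set(expected) - present)


# Members distinguished by the "p" index, with the fourth one absent.
by_physics = ["r1i1p1f1", "r1i1p2f1", "r1i1p3f1", "r1i1p5f1"]
print(missing_members(by_physics, "p", range(1, 6)))

# Members distinguished by the "r" index, with the first one absent.
by_realization = [f"r{n}i1p1f1" for n in range(2, 60)]
print(missing_members(by_realization, "r", range(1, 60)))
```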
In preparing for Met Office data production the following anomalies in the variable definitions used here were noted.
Pt, dimensions and frequency suggest time: mean, but cell_methods contain time: point:
6hr/clt
Pt, frequency = 6hrPt, cell_methods time: point, but dimensions contain time:
6hrPt/hus
6hrPt/ta
6hrPt/ua
6hrPt/va
6hrPt/wap
6hrPt/zg
time -> time1 on these variables
Pt, frequency = 6hrPt suggest cell_methods of time: point, but cell_methods contain time: mean and dimensions contain time rather than time1:
6hrPt/tos
6hrPt/siconca
6hrPt/sithick
6hrPt/mrso
6hrPtZ/o3
6hrPtZ/epfy
6hrPtZ/epfz
6hrPtZ/vtem
6hrPtZ/wtem
Note that when the dimensions contain time rather than time1 the file naming used by CMOR is slightly different and the bounds on the time coordinate are required.
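The consistency rule implied by these notes can be sketched as a small check. The assumptions are that a table name containing "Pt" should pair with cell_methods of time: point and a time1 dimension, and that any other table should pair with time: mean and a time dimension. The function only reports mismatches and does not decide whether the table name or the variable definition should change. Its name and arguments are hypothetical.

```python
def check_time_metadata(table_id, cell_methods, dimensions):
    """Return a list of mismatches between table name, cell_methods and dimensions.

    dimensions is a space-separated string, as in the MIP tables.
    """
    instantaneous = "Pt" in table_id
    expected_method = "time: point" if instantaneous else "time: mean"
    expected_dim = "time1" if instantaneous else "time"
    problems = []
    if expected_method not in cell_methods:
        problems.append(f"cell_methods should contain '{expected_method}'")
    if expected_dim not in dimensions.split():
        problems.append(f"dimensions should contain '{expected_dim}'")
    return problems


print(check_time_metadata("6hr", "area: mean time: point", "longitude latitude time"))
print(check_time_metadata("6hrPt", "time: point", "longitude latitude plev time"))
print(check_time_metadata("6hrPt", "time: point", "longitude latitude plev time1"))
```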
The following errors were raised by the cf checks. Some may be ignored if the metadata is consistent with the SNAPSI mip tables.
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i47p1f1_gr_20180125-20180310.nc
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i49p1f1_gr_20180125-20180310.nc
--
and
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i48p1f1_gr_20180125-20180310.nc
are significantly larger than earlier ensemble members, and my attempts to open these files failed.
The data passed all checks except the comments below:
For variable clt the cell_methods attribute needs to be changed from "area: mean time: point" to "area: mean time: mean"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [clt] has incorrect attributes: cell_methods="area: mean time: point" [correct: "area: mean time: mean"]
For variable tasmax the long_name attribute needs to be changed from "a 6hourly Maximum Near-Surface Air Temperature" to "6 hourly Maximum Near-Surface Air Temperature"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [tasmax] has incorrect attributes: long_name="a 6hourly Maximum Near-Surface Air Temperature" [correct: "6 hourly Maximum Near-Surface Air Temperature"]
add 'plev33' requested pressure levels
Suggest we use a similar convention to ccmi-2022
<variable_id>_<table_id>_<source_id>_<experiment_id>_<variant_label>_<grid_label>[_<time_range>].nc
e.g.
zmo3_monthly_HadGEM3-ES_refC1_r1i1p1_"gridLabel"_196001-196810.nc
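The convention above can be sketched as a helper that joins the facets with underscores and adds the time range only when one is given. The function name is hypothetical, and the grid label "gn" in the example is a placeholder for the value left unspecified above.

```python
def build_filename(variable_id, table_id, source_id, experiment_id,
                   variant_label, grid_label, time_range=None):
    """Assemble a file name following the ccmi-2022 style convention."""
    parts = [variable_id, table_id, source_id, experiment_id,
             variant_label, grid_label]
    if time_range:
        parts.append(time_range)
    return "_".join(parts) + ".nc"


name = build_filename("zmo3", "monthly", "HadGEM3-ES", "refC1",
                      "r1i1p1", "gn", "196001-196810")
print(name)
```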
The following errors were raised by the cf checks. Some were ignored if the metadata was consistent with the SNAPSI mip tables.
The errors found by the ceda-cc code are listed in the comments below.
Each comment describes a separate error and includes a list of the variables in whose data files the error is found.
In preparing for MetOffice production of data for SNAPSI I've had to make a few tweaks to get CMOR to accept the CV and MIP tables held here.
To be updated when I've prepared a pull request
The data uploaded as of 16/02/2023 passed checks except comments below:
The following errors were picked up by the ceda-cc checks for the control s20180125 experiment v20230225 data: