The snap from cedadev

add source_id GloSea6 to snapsi_cv.json

Data received from the UKMO was produced with the GloSea6 model; however, only GloSea5 is currently listed in snapsi_cv.json.

It is suggested that we add a source_id for GloSea6 to the CV table:

"GloSea6": {
"activity_participation": "SNAPSI",
"cohort": "Registered",
"institution_id": "UKMO",
"source_id": "GloSea6",
"source": "GloSea6 (description)"
},
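A minimal sketch of how such an entry could be added programmatically. It assumes (this is not confirmed above) that snapsi_cv.json keeps its model entries in a dictionary under a top-level "source_id" key; the function name add_source_id and the GloSea5 stand-in entry are hypothetical.

```python
import json


def add_source_id(cv, source_id, institution_id):
    """Return a copy of the CV dict with a new source_id entry added.

    Assumes entries live under a top-level "source_id" key, which is a
    guess about the layout of snapsi_cv.json.
    """
    updated = json.loads(json.dumps(cv))  # deep copy via a JSON round trip
    sources = updated.setdefault("source_id", {})
    if source_id in sources:
        raise ValueError(f"{source_id} is already listed in the CV")
    sources[source_id] = {
        "activity_participation": "SNAPSI",
        "cohort": "Registered",
        "institution_id": institution_id,
        "source_id": source_id,
        "source": f"{source_id} (description)",
    }
    return updated


# Stand-in for the loaded CV file (illustrative content only).
cv = {"source_id": {"GloSea5": {"institution_id": "UKMO", "source_id": "GloSea5"}}}
new_cv = add_source_id(cv, "GloSea6", "UKMO")
print(json.dumps(new_cv["source_id"]["GloSea6"], indent=4))
```

Working on a copy leaves the loaded CV untouched until the result has been reviewed and written back.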

Directory Structure for SNAP

I suggest we use the same pattern as the CMIP6 and CCMI-2022 archives
e.g.
/snap/data/<mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable_id>/<grid_label>/

For CCMI-2022 we are using the date on which data is uploaded to CEDA as the version number, and we ask users to create a version directory for it, so data uploaded on 26th March 2021 would go in v20210326.
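As a rough sketch of the layout described above, the following builds a directory path from the DRS elements and derives the version directory from the upload date. Placing the version directory after grid_label is an assumption based on the archive paths quoted elsewhere in these issues; the function names and the example values are illustrative only.

```python
from datetime import date
from pathlib import PurePosixPath


def version_label(upload_date):
    """Version directory name from the date the data was uploaded."""
    return "v" + upload_date.strftime("%Y%m%d")


def archive_path(root, mip_era, activity_id, institution_id, source_id,
                 experiment_id, member_id, table_id, variable_id,
                 grid_label, upload_date):
    """Join the DRS elements in the suggested order, version last."""
    return str(PurePosixPath(
        root, mip_era, activity_id, institution_id, source_id,
        experiment_id, member_id, table_id, variable_id, grid_label,
        version_label(upload_date),
    ))


# Illustrative values, not a statement about what is in the archive.
example = archive_path(
    "/snap/data", "post-cmip6", "SNAPSI", "UKMO", "GloSea6", "free",
    "s20180125-r1i1p1f1", "6hrPt", "ua", "gn", date(2021, 3, 26),
)
print(example)
```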

SPARC/snapsi group work space

Peter requested a group work space for snapsi but I have not got further than my initial enquiries with respect to setting it up.
I think it best that I hand this task back to the snapsi team.

Instructions for requesting project resources (e.g. a GWS) on jasmin can be found here:
https://help.jasmin.ac.uk/article/5022-requesting-resources

Bad time coordinates in CNRM-CM61 v20221123 data

I noticed that CNRM-CM61 files I downloaded through CEDA all had bad time coordinates. Specifically, trying to open them with a vanilla xarray open_dataset call results in an OverflowError in both pandas and cftime (and a resultant ValueError message from xarray about being unable to decode time units).

Opening the files with xr.open_dataset(..., decode_times=False) shows that the time coordinate array is filled with equal values of 9.96921e+36 (doing a diff across the array returns an array of zeros).

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('ua_6hrPt_CNRM-CM61_control_s20180125-r10i1p1f1_gr_20180125-20180310.nc', decode_times=False)

In [3]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * time     (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36

In [4]: ds = xr.open_dataset('zg_6hrPt_CNRM-CM61_nudged_s20191001-r10i1p1f1_gr_20191001-20191114.nc', decode_times=False)

In [5]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * time     (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36

I have not checked whether all the CNRM-CM61 files exhibit this issue, just a handful of random cases (differing variables, experiments, and initializations).

I have not encountered this problem with the other model datasets currently available (GRIMs and GEM-NEMO); the files that I've tested for these have resulted in valid time arrays.
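For anyone wanting to screen more files, here is a minimal sketch of the check described above. The value 9.96921e+36 is the netCDF default fill value for doubles, so the sketch flags time axes that contain it or that do not increase. The function names are hypothetical, and read_raw_time needs xarray, which is imported only when that function is called.

```python
import math

NC_FILL_DOUBLE = 9.969209968386869e36  # netCDF default fill value for doubles


def time_axis_problems(values):
    """Return a list of problems found in raw (undecoded) time values."""
    problems = []
    if any(math.isclose(v, NC_FILL_DOUBLE, rel_tol=1e-6) for v in values):
        problems.append("contains the netCDF default fill value")
    steps = [b - a for a, b in zip(values, values[1:])]
    if any(step <= 0 for step in steps):
        problems.append("not strictly increasing")
    return problems


def read_raw_time(path):
    """Read undecoded time values from a file (requires xarray)."""
    import xarray as xr

    with xr.open_dataset(path, decode_times=False) as ds:
        return ds["time"].values.tolist()


# Synthetic examples: one all-fill axis and one healthy 6-hourly axis in days.
print(time_axis_problems([NC_FILL_DOUBLE] * 4))
print(time_axis_problems([0.0, 0.25, 0.5, 0.75]))
```

On real files one would call time_axis_problems(read_raw_time(path)) for each path of interest.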

corrupted data files

I have been processing some of the basic fields (ua, va, ta, wap, zg) for the group-SNAPSI effort. In doing so, I've run into a couple of corrupted netcdf files that give HDF errors when trying to read them:

Meteo-France/CNRM-CM61/nudged/s20191001/r7i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r7i1p1f1_gr_20191001-20191114.nc
Meteo-France/CNRM-CM61/nudged/s20191001/r8i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r8i1p1f1_gr_20191001-20191114.nc
UKMO/GloSea6/control-full/s20180125/r10i1p1f1/6hrPt/wap/gn/v20230403/wap_6hrPt_GloSea6_control-full_s20180125-r10i1p1f1_gn_201801250600-201803260000.nc

When/if I encounter more, I will post them here.
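A quick first triage for files like these is to look at the leading bytes: netCDF-4 files are HDF5 files and normally start with the HDF5 signature, while classic netCDF files start with "CDF" followed by a version byte. The sketch below only catches files whose header is damaged or that are not netCDF at all; corruption further into the file would still need a full read. The function name and the stand-in files are hypothetical.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files
CLASSIC_SIGNATURES = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def netcdf_kind(path):
    """Classify a file by its first bytes; None means no known signature."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    if head == HDF5_SIGNATURE:
        return "netCDF-4 (HDF5)"
    if head[:4] in CLASSIC_SIGNATURES:
        return "classic netCDF"
    return None


# Demonstration on two small stand-in files (not real model output).
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.nc")
    bad = os.path.join(tmp, "bad.nc")
    with open(good, "wb") as handle:
        handle.write(HDF5_SIGNATURE + b"\x00" * 16)
    with open(bad, "wb") as handle:
        handle.write(b"<html>error page</html>")
    good_kind = netcdf_kind(good)
    bad_kind = netcdf_kind(bad)

print(good_kind, bad_kind)
```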

update the snapsi_cv.json source_id for CNRM-CM61

#12 (comment)
The SNAPSI_CV.json entry for CNRM-CM 6.1 has a source_id containing a space and a ".", neither of which is compatible with CEDA filename conventions.
"CNRM-CM 6.1": {
"activity_participation": "SNAPSI",
"cohort": "Registered",
"institution_id": "Meteo-France",
"source_id": "CNRM-CM 6.1",
"source": "CNRM-CM 6.1 (description)"
},
I recommend that the source_id is updated to "CNRM-CM61"
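A small sketch of a check that would catch this kind of problem before upload. It assumes the rule is that a source_id may contain only letters, digits and hyphens (the CMIP6-style restriction); the exact CEDA rule is not spelled out here, so treat the character set as an assumption. The function name is hypothetical.

```python
import re


def invalid_characters(source_id):
    """Return the sorted set of characters outside letters, digits, hyphen."""
    # Assumed rule: only A-Z, a-z, 0-9 and "-" are acceptable.
    return sorted(set(re.findall(r"[^A-Za-z0-9-]", source_id)))


for candidate in ("CNRM-CM 6.1", "CNRM-CM61"):
    bad = invalid_characters(candidate)
    print(candidate, "->", "ok" if not bad else f"invalid characters: {bad}")
```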

CV table tasks

Will need to update:
source_id - list of participating institutions/models
frequency - list of output frequencies
realm - do we need ocean data to include SSTs/sea-ice quantities?
table_id - list of tables (6hr, E6hr, 6hrZ, E6hrZ)

Required global attributes: are all CMIP6 attributes required?

The following is the list of global attributes used in CMIP6.

Marked with an "x" when required for SNAP.

Elements which are part of the Data Reference Syntax (DRS) are marked as needed.

dchong ceda-cc report for SNU v20220807

The data passed all of the checks except for the global attribute for the CF convention

[global_ncattribute_cv]: FAILED:: Global attributes do not match constraints: [('Conventions', 'CF-1.7', "['CF-1.8 SNAP']")]

The convention provided was "CF-1.7" but the correct convention is "CF-1.8 SNAP"

dspecq ceda-cc report for CNRM-CM61 v20220818

The errors found by the ceda-cc code are listed in the comments below.
Each comment describes a separate error and includes a list of the variables in whose data files the error is found.

missing all ensemble members

https://data.ceda.ac.uk/badc/snap/data/post-cmip6/SNAPSI/Meteo-France/CNRM-CM61/free/s20180125/r12i1p1f1/6hrPt/ta/gr

is missing, as are all ensemble members for ta for this particular experiment and initialization.

cobarton ceda-cc and cfcheck report for NAVGEM v20230526

The following errors were picked up by the ceda-cc and cf checks for the free s20180125 experiment v20230526 data:

dspecq CNRM-CM61 v20221123 changes made by CEDA before archival

Below are the amendments made by CEDA before the archival of CNRM-CM61 v20221123 data:

mip_era changed from 'CMIP6' to 'SNAPSI'

filename: underscore replaced with dash between sub-experiment and variant label, from 'subexperiment_variantlabel' to 'subexperiment-variantlabel'
e.g. 'clt_6hr_CNRM-CM61_free_s20180125_r42i1p1f1_gr_20180125-20180310.nc'
to 'clt_6hr_CNRM-CM61_free_s20180125-r42i1p1f1_gr_20180125-20180310.nc'
command run in linux terminal:
find . -type f -name '*_r*' \
-execdir bash -c 'mv -- "$1" "${1//_r/-r}"' bash {} \;
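The same rename can be sketched in Python with a stricter pattern that only touches an underscore sitting between a sub-experiment label (s followed by eight digits) and a variant label. This is not the command CEDA ran; the function names are hypothetical and rename_tree defaults to a dry run.

```python
import re
from pathlib import Path

# Underscore between "sYYYYMMDD" and a variant label such as "r42i1p1f1".
PATTERN = re.compile(r"_(s\d{8})_(r\d+i\d+p\d+f\d+)_")


def corrected_name(filename):
    """Replace the underscore joining sub-experiment and variant label."""
    return PATTERN.sub(r"_\1-\2_", filename)


def rename_tree(root, dry_run=True):
    """Collect (old, new) paths under root; rename only if dry_run is False."""
    changes = []
    for path in sorted(Path(root).rglob("*.nc")):
        new_name = corrected_name(path.name)
        if new_name != path.name:
            changes.append((path, path.with_name(new_name)))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes


old = "clt_6hr_CNRM-CM61_free_s20180125_r42i1p1f1_gr_20180125-20180310.nc"
print(corrected_name(old))
```

Running rename_tree with dry_run=True first gives a list of planned changes to review before anything on disk is touched.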

Some files in 'free' experiment had an experiment_id 'nudged' and incorrect 'experiment' description.
files: with variantlabel starting r15-r25 in s20181213 and r42-r50 in s20180125 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntrl, tntrs, tos, ua, uas, utendmp, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id from 'nudged' to 'free' and experiment description

Some files in 'nudged' experiment had an experiment_id 'control' instead of 'nudged', incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = "s20180125".
files: with variantlabel starting r14-r50 of s20190108 and r25 of s20190829 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntnd, tntrl, tntrs, tos, ua, uas, utendmp, utendnd, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id 'control' to 'nudged', change incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = 's20180125' to either 's20190108' or 's20190829'.

missing file

https://data.ceda.ac.uk/badc/snap/data/post-cmip6/SNAPSI/KMA/GloSea6-GC32/nudged/s20181213/[r1i1p1f1]

appears to be missing. That is, I only see ensemble members 2 through 59 for this particular experiment from KMA. Ensemble member 1 is not there.
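Gaps like this can be found mechanically. A minimal sketch, assuming the member directories are named with variant labels of the form r<N>i<N>p<N>f<N> and that the expected ensemble size is known; the function name is hypothetical and the directory list below is synthetic.

```python
import re

VARIANT = re.compile(r"^r(\d+)i(\d+)p(\d+)f(\d+)$")


def missing_realizations(member_dirs, expected_count):
    """Return realization indices in 1..expected_count with no directory."""
    found = set()
    for name in member_dirs:
        match = VARIANT.match(name)
        if match:
            found.add(int(match.group(1)))
    return [r for r in range(1, expected_count + 1) if r not in found]


# Synthetic listing mirroring the report above: members 2 through 59 present.
listing = [f"r{n}i1p1f1" for n in range(2, 60)]
print(missing_realizations(listing, 59))
```

On a real archive the listing could come from os.listdir on the initialization directory.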

buggy files?

I have tried downloading the following two files in multiple ways from the CEDA servers, but somehow the files appear to be corrupted when I try to read them on my servers.

ua/ECMWF/ua_6hrPt_IFS_free_s20190829-r26i1p1f1_gr_201908290000-201910140000.nc

ua/NCAR/ua_6hrPt_CESM2-CAM6_nudged_s20191001-r12i1p1f1_gn_20191001-20191115.nc

Both of them appear to have the following issue:
"NetCDF: HDF error
Location: file ; line 478"
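One way to tell a damaged download from a file that is already damaged on the server is to compare checksums of copies fetched by different routes: if independent downloads give the same checksum and still fail to open, the archived file itself is the more likely culprit. A minimal sketch using only the standard library; the function names are hypothetical and the demonstration uses small stand-in files.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def downloads_agree(paths):
    """True if every file in paths has the same SHA-256 digest."""
    return len({sha256_of(p) for p in paths}) == 1


# Demonstration with stand-in files: two identical copies and one truncated.
with tempfile.TemporaryDirectory() as tmp:
    names = {"a.nc": b"payload" * 100, "b.nc": b"payload" * 100,
             "c.nc": b"payload" * 50}
    for name, content in names.items():
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(content)
    same = downloads_agree([os.path.join(tmp, n) for n in ("a.nc", "b.nc")])
    differ = downloads_agree([os.path.join(tmp, n) for n in ("a.nc", "c.nc")])

print(same, differ)
```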

experiment descriptions for snapsi catalogue records

Defining experiment descriptions for snapsi catalogue records
They should have enough information to be meaningful yet generic enough to be applicable to all the snapsi data providers.

For comparison here are the github issues for confirming the general descriptions of the CCMI-2022 experiments
refD1 cedadev/ccmi-2022#59
refD2 cedadev/ccmi-2022#60
senD2-sai cedadev/ccmi-2022#61
senD2-ssp126 cedadev/ccmi-2022#62
senD2-ssp370 cedadev/ccmi-2022#63

dspecq ceda-cc report for CNRM-CM61 v20221004

long_name corrections needed for
tauv: long_name="Surface Downward Northward Stress" [correct: "Surface Downward Northward Wind Stress"]
tauu: long_name="Surface Downward Eastward Stress" [correct: "Surface Downward Eastward Wind Stress"]
snw: long_name="Surface Snow Amout" [correct: "Surface Snow Amount"]

missing data

https://data.ceda.ac.uk/badc/snap/data/post-cmip6/SNAPSI/Meteo-France/CNRM-CM61/free/s20180208/r33i1p1f1/6hrPt/ta/gr

is empty, as are around 2/3 of the ensemble members for this particular experiment and initialization.

janstey ceda-cc and cfcheck report for CanESM5 v20190429

The following errors were raised by the ceda-cc quality checks for the control/s20180125-r3i1p2f1/ test sample:

(potentially) corrupted file

The file SNU/ua_6hrPt_GRIMs_control_s20180125-r1i1p1f1_gr_20180125-20180311.nc
is far larger than the files for other ensemble members, and when I try to open it (using either MATLAB or ncview) my computer complains that the format is incorrect.

isimpson002 ceda-cc report for CESM2-CAM6 v20230215

The following errors were picked up by the ceda-cc checks for the control s20180125 experiment v20230215 data:

hlin011 ceda-cc report for GEM-NEMO v20221004

The data passed all checks except those noted in the comments below:

Tracking Identifier

The tracking_id in CMIP6 files is a resolvable handle.net identifier, e.g.

hdl:21.14100/5a26143c-222d-4cea-aed2-923b45d930c9 for the file http://esgf.bsc.es/thredds/fileServer/esg_dataroot/a247-CMIP-r2/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r2i1p1f1/Omon/tos/gr/v20201215/tos_Omon_EC-Earth3_historical_r2i1p1f1_gr_192401-192412.nc

The infrastructure behind this is run by DKRZ and is not funded to cover projects outside CMIP6 (and possibly not for CMIP6 data added after the AR6 report), so we cannot use it in the same way here. The CMIP5 version of the tracking_id just used a unique identifier, such as 5a26143c-222d-4cea-aed2-923b45d930c9. This is not resolvable in the same way, but does provide a useful way of tracking files when publication updates lead to multiple files with the same name.

I recommend using the CMIP5 version of tracking_id here.
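A minimal sketch of generating such an identifier with the Python standard library. It produces a random (version 4) UUID with no "hdl:" prefix; whether SNAPSI requires a particular UUID version is not stated above, so that choice is an assumption, and the function name is hypothetical.

```python
import uuid


def new_tracking_id():
    """Return a CMIP5-style tracking_id: a bare random UUID string."""
    return str(uuid.uuid4())


tracking_id = new_tracking_id()
print(tracking_id)
```

A fresh value would be written to the tracking_id global attribute each time a file is created or republished, so that two files sharing a name can still be told apart.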

BCC-CSM2-HR hasn't been checked

Dear supporter for the snapsi,

I am not sure if it is proper to open an issue here.

I have uploaded the snapsi data for BCC-CSM2-HR on Jan. 6 2024 and placed the data into the snapsi/v20240106.
But the data hasn't been checked and moved into the SNAPSI archive.

Could you help me with that?
My CEDA account is zhaoyang.

Thank you.

Best wishes,
Zhaoyang
20240305

vaclim not in 6hrRef table

Hi Peter,

Apologies for the slow response.

@aph42 , @charliepascoe : I ran ceda-cc on the SNAPSI data that has been uploaded, and it reports an error because vaclim is not in the 6hrRef table.

Could you extend the table or reformat the files with a correct variable name?

If possible, download ceda-cc (see pypi.org) and check the files yourself after making the correction.

folder missing

https://data.ceda.ac.uk/badc/snap/data/post-cmip6/SNAPSI/KMA/GloSea6-GC32/nudged/s20181213

is missing the fourth ensemble member

In other words
/badc/snap/data/post-cmip6/SNAPSI/KMA/GloSea6-GC32/nudged/s20181213/r1i1p4f1
does not exist

Inconsistencies in cell_methods/dimensions/frequency within variable definitions

In preparing for Met Office data production the following anomalies in the variable definitions used here were noted.

MIP Table name doesn't contain Pt, dimensions and frequency suggest time: mean, but cell_methods contain time: point:

6hr/clt
Suggestion: change cell_methods on this variable

MIP Table name contains Pt, frequency = 6hrPt, cell_methods contain time: point, but dimensions contain time rather than time1:

6hrPt/hus
6hrPt/ta
6hrPt/ua
6hrPt/va
6hrPt/wap
6hrPt/zg
Suggestion: change dimensions time -> time1 on these variables

MIP Table name contains Pt, frequency = 6hrPt suggest cell_methods of time: point, but cell_methods contain time: mean and dimensions contain time rather than time1:

6hrPt/tos
6hrPt/siconca
6hrPt/sithick
6hrPt/mrso
6hrPtZ/o3
6hrPtZ/epfy
6hrPtZ/epfz
6hrPtZ/vtem
6hrPtZ/wtem

Note that when the dimensions contain time rather than time1 the file naming used by CMOR is slightly different and the bounds on the time coordinate are required.
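The consistency rule implied by the three cases above can be sketched as a small check: a table whose name contains Pt should pair "time: point" with the time1 dimension, and any other table should pair "time: mean" with time. This is an illustration of that reading, not the logic used by CMOR or ceda-cc; the function name and the dimension strings in the examples are hypothetical.

```python
def check_time_sampling(table_id, cell_methods, dimensions):
    """Return a list of inconsistencies between table name and metadata."""
    instantaneous = "Pt" in table_id
    expected_method = "time: point" if instantaneous else "time: mean"
    expected_dim = "time1" if instantaneous else "time"
    problems = []
    if expected_method not in cell_methods:
        problems.append(f"cell_methods should contain '{expected_method}'")
    if expected_dim not in dimensions.split():
        problems.append(f"dimensions should contain '{expected_dim}'")
    return problems


# Illustrative inputs modelled on the three cases listed above.
print(check_time_sampling("6hr", "area: mean time: point",
                          "longitude latitude time"))
print(check_time_sampling("6hrPt", "time: point",
                          "longitude latitude plev time"))
print(check_time_sampling("6hrPt", "area: mean time: mean",
                          "longitude latitude time"))
```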

isimpson002 cf checks report for CESM2-CAM6 v20230225

The following errors were raised by the cf checks. Some may be ignored if the metadata is consistent with the SNAPSI mip tables.

potentially corrupted file

CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i47p1f1_gr_20180125-20180310.nc
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i49p1f1_gr_20180125-20180310.nc
and
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i48p1f1_gr_20180125-20180310.nc

are significantly larger than earlier ensemble members, and my attempts to open these files failed.
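Since unusual file size is what gave these files away, a simple screen is to compare each member's size with the median across members. A minimal sketch; the 20% tolerance, the function name and the sizes in the example are all made up for illustration.

```python
from statistics import median


def size_outliers(sizes, tolerance=0.2):
    """Return names whose size differs from the median by more than
    tolerance, expressed as a fraction of the median."""
    mid = median(sizes.values())
    return sorted(
        name for name, size in sizes.items()
        if abs(size - mid) > tolerance * mid
    )


# Made-up sizes in bytes; on disk they could come from os.path.getsize.
sizes = {
    "r1i45p1f1": 1_000_000,
    "r1i46p1f1": 1_010_000,
    "r1i47p1f1": 3_500_000,
    "r1i48p1f1": 3_400_000,
    "r1i50p1f1": 990_000,
}
flagged = size_outliers(sizes)
print(flagged)
```

The median is used rather than the mean so that a few oversized files do not shift the baseline they are compared against.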

jknight003 ceda-cc report for GloSea6 v20221216

The data passed all checks except those noted in the comments below:

For variable clt the cell_methods attribute needs to be changed from "area: mean time: point" to "area: mean time: mean"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [clt] has incorrect attributes: cell_methods="area: mean time: point" [correct: "area: mean time: mean"]

For variable tasmax the long_name attribute needs to be changed from "a 6hourly Maximum Near-Surface Air Temperature" to "6 hourly Maximum Near-Surface Air Temperature"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [tasmax] has incorrect attributes: long_name="a 6hourly Maximum Near-Surface Air Temperature" [correct: "6 hourly Maximum Near-Surface Air Temperature"]
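Data providers can pre-check attributes like these before upload with a simple comparison against the expected values. The sketch below is not how ceda-cc implements its checks; the function name is hypothetical, and the expected values are copied from the error messages above.

```python
def attribute_mismatches(actual, expected):
    """Return {name: (actual_value, expected_value)} for differing attributes."""
    return {
        name: (actual.get(name), wanted)
        for name, wanted in expected.items()
        if actual.get(name) != wanted
    }


# Values taken from the two errors reported above.
clt_report = attribute_mismatches(
    {"cell_methods": "area: mean time: point"},
    {"cell_methods": "area: mean time: mean"},
)
tasmax_report = attribute_mismatches(
    {"long_name": "a 6hourly Maximum Near-Surface Air Temperature"},
    {"long_name": "6 hourly Maximum Near-Surface Air Temperature"},
)
print(clt_report)
print(tasmax_report)
```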
