
snap's People

Contributors

aph42, leknifrg, charliepascoe, matthew-mizielinski, agstephens, martinjuckes

Watchers

James Cloos, Philip Kershaw, Zac Lawrence, Matt Pritchard, Andrew Harwood, Sam Pepler, Steve Donegan, Graham Parton, Zhaoyang

snap's Issues

add source_id GloSea6 to snapsi_cv.json

The data received from the UKMO was produced with the GloSea6 model; however, only GloSea5 is currently listed in snapsi_cv.json.

It is suggested that we add a source_id for GloSea6 to the CV table:

"GloSea6": {
"activity_participation": "SNAPSI",
"cohort": "Registered",
"institution_id": "UKMO",
"source_id": "GloSea6",
"source": "GloSea6 (description)"
},

Directory Structure for SNAP

I suggest we use the same pattern as the CMIP6 and CCMI-2022 archives
e.g.
/snap/data/<mip_era>/<activity_id>/<institution_id>/<source_id>/<experiment_id>/<member_id>/<table_id>/<variable_id>/<grid_label>/

For CCMI-2022 we use the date on which the data is uploaded to CEDA as the version number, and we ask users to create a version directory for it, so data uploaded on 26 March 2021 would go in v20210326.
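To make the proposal concrete, here is a minimal Python sketch that builds such a directory from a set of facets and an upload date. It assumes the version label is the letter v followed by the upload date as YYYYMMDD, and that the version directory sits at the end of the path as in CMIP6; the function names and the facet values in the usage note are illustrative only.

```python
from datetime import date
from pathlib import PurePosixPath

# Facet order copied from the directory pattern suggested above.
DRS_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label",
)


def version_label(upload_date: date) -> str:
    """Return the version directory name for an upload date, e.g. v20210326."""
    return "v" + upload_date.strftime("%Y%m%d")


def drs_directory(facets: dict, upload_date: date,
                  root: str = "/snap/data") -> PurePosixPath:
    """Build the archive directory for one dataset.

    Assumption: the version directory is the last path element, as in CMIP6.
    """
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        raise KeyError(f"missing DRS facets: {missing}")
    parts = [facets[name] for name in DRS_FACETS]
    return PurePosixPath(root, *parts, version_label(upload_date))
```

For example, with made-up facet values such as institution_id "UKMO", source_id "GloSea6" and variable_id "ua", drs_directory(facets, date(2021, 3, 26)) would end in .../ua/gn/v20210326.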

Bad time coordinates in CNRM-CM61 v20221123 data

I noticed that CNRM-CM61 files I downloaded through CEDA all had bad time coordinates. Specifically, trying to open them with a vanilla xarray open_dataset call results in an OverflowError in both pandas and cftime (and a resultant ValueError message from xarray about being unable to decode time units).

Opening the files with xr.open_dataset(..., decode_times=False) shows that the time coordinate array is filled with equal values of 9.96921e+36 (doing a diff across the array returns an array of zeros).

In [1]: import xarray as xr

In [2]: ds = xr.open_dataset('ua_6hrPt_CNRM-CM61_control_s20180125-r10i1p1f1_gr_20180125-20180310.nc', decode_times=False)

In [3]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * time     (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36

In [4]: ds = xr.open_dataset('zg_6hrPt_CNRM-CM61_nudged_s20191001-r10i1p1f1_gr_20191001-20191114.nc', decode_times=False)

In [5]: print(ds.time.diff('time'))
<xarray.DataArray 'time' (time: 179)>
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0.])
Coordinates:
  * time     (time) float64 9.969e+36 9.969e+36 ... 9.969e+36 9.969e+36

I have not checked whether all the CNRM-CM61 files exhibit this issue, just a handful of random cases (differing variables, experiments, and initializations).

I have not encountered this problem with the other model datasets currently available (GRIMs and GEM-NEMO); the files that I've tested for these have resulted in valid time arrays.
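In case it helps anyone screen more files, below is a rough Python sketch of the check I did by hand. It takes the raw (undecoded) time values, for instance ds.time.values after opening with decode_times=False as above, and reports whether they contain the default netCDF fill value for doubles or fail to increase. The function name and the tolerance are my own choices, not part of any existing tool.

```python
import math

# Default netCDF fill value for 64-bit floats (NC_FILL_DOUBLE).
NC_FILL_DOUBLE = 9.969209968386869e36


def time_axis_problems(values):
    """Return a list of problems found in a raw, undecoded time axis."""
    values = [float(v) for v in values]
    problems = []
    # A time axis should never hold the fill value.
    if any(math.isclose(v, NC_FILL_DOUBLE, rel_tol=1e-6) for v in values):
        problems.append("contains the default fill value")
    # Consecutive time steps should strictly increase.
    if any(b <= a for a, b in zip(values, values[1:])):
        problems.append("not strictly increasing")
    return problems
```

An empty list means neither problem was found; it does not prove that the time axis is otherwise correct.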

corrupted data files

I have been processing some of the basic fields (ua, va, ta, wap, zg) for the group-SNAPSI effort. In doing so, I've run into a few corrupted NetCDF files that give HDF errors when I try to read them:

Meteo-France/CNRM-CM61/nudged/s20191001/r7i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r7i1p1f1_gr_20191001-20191114.nc
Meteo-France/CNRM-CM61/nudged/s20191001/r8i1p1f1/6hrPt/ta/gr/v20230215/ta_6hrPt_CNRM-CM61_nudged_s20191001-r8i1p1f1_gr_20191001-20191114.nc
UKMO/GloSea6/control-full/s20180125/r10i1p1f1/6hrPt/wap/gn/v20230403/wap_6hrPt_GloSea6_control-full_s20180125-r10i1p1f1_gn_201801250600-201803260000.nc

When/if I encounter more, I will post them here.

CV table tasks

Will need to update:
source_id - list of participating institutions/models
frequency - list of output frequencies
realm - do we need ocean data to include SSTs/sea-ice quantities?
table_id - list of tables (6hr, E6hr, 6hrZ, E6hrZ)

Required global attributes: are all CMIP6 attributes required?

The following is the list of global attributes used in CMIP6.

Marked with an "x" when required for SNAP.

Elements which are part of the Data Reference Syntax (DRS) are marked as needed.

  • Conventions: Should be fixed, e.g. CF-1.8 SNAP or CF-1.8. Needed by CF, useful for SNAP.
  • activity_id: Fixed, set to SNAP
  • contact: We need a contact for each data provider .. this is important for users.
  • creation_date: Standard NetCDF metadata
  • data_specs_version: it is really hard to deal with changes in CMOR tables if the version information is not recorded in files.
  • experiment:
  • experiment_id: Part of the DRS
  • forcing_index: Part of the DRS
  • frequency: Part of the DRS
  • grid:
  • grid_label: Part of the DRS
  • initialization_index: Part of the DRS
  • institution:
  • institution_id: Part of the DRS
  • license: Part of the DRS
  • mip_era: Part of the DRS
  • nominal_resolution:
  • physics_index: Part of the DRS
  • product:
  • realization_index: Part of the DRS
  • realm: Part of the DRS
  • source:
  • source_id: Part of the DRS
  • source_type:
  • sub_experiment:
  • sub_experiment_id: Part of the DRS
  • table_id: Part of the DRS
  • tracking_id: Really useful for tracking different versions of files .. see #6
  • variable_id: Part of the DRS
  • variant_label: Part of the DRS
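As an illustration of how the DRS-related attributes could be verified, here is a small Python sketch that reports which of them are absent or empty in a dictionary of global attributes (for example the attrs of a dataset opened with xarray). The tuple simply copies the entries marked "Part of the DRS" in the list above; the function name is hypothetical, and the final set of required attributes is still the open question of this issue.

```python
# Attributes marked "Part of the DRS" in the list above, in the same order.
DRS_ATTRIBUTES = (
    "experiment_id", "forcing_index", "frequency", "grid_label",
    "initialization_index", "institution_id", "license", "mip_era",
    "physics_index", "realization_index", "realm", "source_id",
    "sub_experiment_id", "table_id", "variable_id", "variant_label",
)


def missing_attributes(global_attrs, required=DRS_ATTRIBUTES):
    """Return the required attribute names that are absent or blank."""
    return [
        name for name in required
        if str(global_attrs.get(name, "")).strip() == ""
    ]
```

Passing a different tuple as required would let the same check cover whatever list is eventually agreed for SNAP.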

dchong ceda-cc report for SNU v20220807

The data passed all of the checks except for the global attribute for the CF convention

[global_ncattribute_cv]: FAILED:: Global attributes do not match constraints: [('Conventions', 'CF-1.7', "['CF-1.8 SNAP']")]

The convention provided was "CF-1.7" but the correct convention is "CF-1.8 SNAP"

dspecq CNRM-CM61 v20221123 changes made by CEDA before archival

Below are the amendments made by CEDA before the archival of CNRM-CM61 v20221123 data:

mip_era changed from 'CMIP6' to 'SNAPSI'

The separator between the sub-experiment and the variant label in the filename was changed from an underscore to a dash, i.e. from 'subexperiment_variantlabel' to 'subexperiment-variantlabel'
e.g. 'clt_6hr_CNRM-CM61_free_s20180125_r42i1p1f1_gr_20180125-20180310.nc'
to 'clt_6hr_CNRM-CM61_free_s20180125-r42i1p1f1_gr_20180125-20180310.nc'
Command run in a Linux terminal:
find -type f -name '*_r*' -execdir bash -c 'mv -- "$1" "${1//_r/-r}"' bash {} \;
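For anyone who prefers to do the same rename in Python, here is a hedged sketch. It is not what CEDA ran: unlike the shell substitution, which replaces every "_r" in a name, it only touches the underscore between an s<YYYYMMDD> sub-experiment and an r<N>i<N>p<N>f<N> variant label. The function names are made up, and rename_tree defaults to a dry run that only lists the planned changes.

```python
import re
from pathlib import Path

# Underscore between a sub-experiment (s + 8 digits) and a variant label.
_SEPARATOR = re.compile(r"(_s\d{8})_(r\d+i\d+p\d+f\d+_)")


def fixed_name(filename: str) -> str:
    """Return the filename with 'sYYYYMMDD_r...' changed to 'sYYYYMMDD-r...'."""
    return _SEPARATOR.sub(r"\1-\2", filename, count=1)


def rename_tree(root, dry_run=True):
    """Collect (old, new) paths under root; rename them unless dry_run."""
    changes = []
    # sorted() materialises the listing before any file is renamed.
    for path in sorted(Path(root).rglob("*.nc")):
        new_name = fixed_name(path.name)
        if new_name != path.name:
            target = path.with_name(new_name)
            changes.append((path, target))
            if not dry_run:
                path.rename(target)
    return changes
```

Names that already use the dash are left unchanged, so the function can be run repeatedly on the same tree.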

Some files in 'free' experiment had an experiment_id 'nudged' and incorrect 'experiment' description.
files: with variantlabel starting r15-r25 in s20181213 and r42-r50 in s20180125 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntrl, tntrs, tos, ua, uas, utendmp, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id from 'nudged' to 'free' and experiment description

Some files in 'nudged' experiment had an experiment_id 'control' instead of 'nudged', incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = "s20180125".
files: with variantlabel starting r14-r50 of s20190108 and r25 of s20190829 (for variables: clt, hfds, hus, mrso, mrsos, o3, pr, prc, ps, psl, rlut, siconca, sithick, snd, snw, ta, tas, tasmax, tasmin, tauu, tauv, tntmp, tntnd, tntrl, tntrs, tos, ua, uas, utendmp, utendnd, utendnogw, utendogw, va, vas, vtendnogw, vtendogw, wap, zg)
correction: change experiment_id 'control' to 'nudged', change incorrectly assigned 'experiment' and incorrectly assigned sub_experiment_id = 's20180125' to either 's20190108' or 's20190829'.

buggy files?

I have tried downloading the following two files in multiple ways from the CEDA servers, but somehow the files appear to be corrupted when I try to read them on my servers.

ua/ECMWF/ua_6hrPt_IFS_free_s20190829-r26i1p1f1_gr_201908290000-201910140000.nc

ua/NCAR/ua_6hrPt_CESM2-CAM6_nudged_s20191001-r12i1p1f1_gn_20191001-20191115.nc

Both of them appear to have the following issue:
"NetCDF: HDF error
Location: file ; line 478"

experiment descriptions for snapsi catalogue records

Defining experiment descriptions for snapsi catalogue records
They should have enough information to be meaningful yet generic enough to be applicable to all the snapsi data providers.

For comparison here are the github issues for confirming the general descriptions of the CCMI-2022 experiments
refD1 cedadev/ccmi-2022#59
refD2 cedadev/ccmi-2022#60
senD2-sai cedadev/ccmi-2022#61
senD2-ssp126 cedadev/ccmi-2022#62
senD2-ssp370 cedadev/ccmi-2022#63

dspecq ceda-cc report for CNRM-CM61 v20221004

long_name corrections needed for
tauv: long_name="Surface Downward Northward Stress" [correct: "Surface Downward Northward Wind Stress"]
tauu: long_name="Surface Downward Eastward Stress" [correct: "Surface Downward Eastward Wind Stress"]
snw: long_name="Surface Snow Amout" [correct: "Surface Snow Amount"]

(potentially) corrupted file

The file SNU/ua_6hrPt_GRIMs_control_s20180125-r1i1p1f1_gr_20180125-20180311.nc
is much larger than the other ensemble members, and when I try to open it (using either MATLAB or ncview) my computer complains that the format is incorrect.
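One quick thing that can be checked before opening a suspect file is whether it starts with a recognised signature. The Python sketch below looks at the first bytes for the HDF5 signature (used by netCDF-4) or one of the classic netCDF magic numbers. The function names are hypothetical, and a valid signature does not rule out corruption further into the file; it only separates "not a netCDF file at all" from "netCDF file that may be damaged or truncated".

```python
# First 8 bytes of an HDF5 file (netCDF-4 is stored as HDF5).
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Fourth byte of the classic netCDF magic: classic, 64-bit offset, CDF-5.
CLASSIC_VERSIONS = (b"\x01", b"\x02", b"\x05")


def kind_from_header(head: bytes) -> str:
    """Classify a file from its first bytes."""
    if head[:8] == HDF5_SIGNATURE:
        return "netCDF-4/HDF5"
    if head[:3] == b"CDF" and head[3:4] in CLASSIC_VERSIONS:
        return "netCDF classic"
    return "unknown"


def file_kind(path) -> str:
    """Read the first 8 bytes of the file at path and classify them."""
    with open(path, "rb") as handle:
        return kind_from_header(handle.read(8))
```

A result of "unknown" for a .nc file would suggest that the download or upload went wrong rather than a problem in the data themselves.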

Tracking Identifier

The tracking_id in CMIP6 files is a resolvable handle.net identifier, e.g.

hdl:21.14100/5a26143c-222d-4cea-aed2-923b45d930c9 for the file http://esgf.bsc.es/thredds/fileServer/esg_dataroot/a247-CMIP-r2/CMIP6/CMIP/EC-Earth-Consortium/EC-Earth3/historical/r2i1p1f1/Omon/tos/gr/v20201215/tos_Omon_EC-Earth3_historical_r2i1p1f1_gr_192401-192412.nc

The infrastructure behind this is run by DKRZ and is not funded to cover projects outside CMIP6 (and possibly not CMIP6 data added after the AR6 report), so we cannot use it in the same way here. The CMIP5 version of the tracking_id just used a unique identifier, such as 5a26143c-222d-4cea-aed2-923b45d930c9. This is not resolvable in the same way, but does provide a useful way of tracking files when publication updates lead to multiple files with the same name.

I recommend using the CMIP5 version of tracking_id here.
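A bare identifier of this form can be generated with Python's standard uuid module. The sketch below assumes a random (version 4) UUID is acceptable, which the recommendation above does not specify; the function names are illustrative.

```python
import uuid


def new_tracking_id() -> str:
    """Return a fresh CMIP5-style tracking_id (a bare random UUID)."""
    return str(uuid.uuid4())


def is_bare_uuid(value: str) -> bool:
    """True if value is a plain lower-case UUID without any 'hdl:' prefix."""
    try:
        return str(uuid.UUID(value)) == value
    except ValueError:
        return False
```

With this check, the example identifier 5a26143c-222d-4cea-aed2-923b45d930c9 passes, while the CMIP6-style hdl:21.14100/... form does not.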

BCC-CSM2-HR hasn't been checked

Dear supporter for the snapsi,

I am not sure if it is proper to open an issue here.

I uploaded the SNAPSI data for BCC-CSM2-HR on 6 Jan. 2024 and placed it in snapsi/v20240106.
However, the data has not yet been checked and moved into the SNAPSI archive.

screenshot for uploading data

Could you help me with that?
My CEDA account is zhaoyang.

Thank you.

Best wishes,
Zhaoyang
20240305

vaclim not in 6hrRef table

Hi Peter,

Apologies for the slow response.

@aph42, @charliepascoe: I ran ceda-cc on the SNAPSI data that has been uploaded, and it reports an error because vaclim is not in the 6hrRef table.

Could you extend the table or reformat the files with a correct variable name?

If possible, download ceda-cc (see pypi.org) and check the files yourself after making the correction.

Inconsistencies in cell_methods/dimensions/frequency within variable definitions

In preparing for Met Office data production the following anomalies in the variable definitions used here were noted.

  1. MIP table name doesn't contain Pt; dimensions and frequency suggest time: mean, but cell_methods contain time: point:
  • 6hr/clt
    Suggestion: change cell_methods on this variable
  2. MIP table name contains Pt, frequency = 6hrPt and cell_methods contain time: point, but dimensions contain time rather than time1:
  • 6hrPt/hus
  • 6hrPt/ta
  • 6hrPt/ua
  • 6hrPt/va
  • 6hrPt/wap
  • 6hrPt/zg
    Suggestion: change dimensions time -> time1 on these variables
  3. MIP table name contains Pt and frequency = 6hrPt, which suggest cell_methods of time: point, but cell_methods contain time: mean and dimensions contain time rather than time1:
  • 6hrPt/tos
  • 6hrPt/siconca
  • 6hrPt/sithick
  • 6hrPt/mrso
  • 6hrPtZ/o3
  • 6hrPtZ/epfy
  • 6hrPtZ/epfz
  • 6hrPtZ/vtem
  • 6hrPtZ/wtem

Note that when the dimensions contain time rather than time1 the file naming used by CMOR is slightly different and the bounds on the time coordinate are required.
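The consistency rule implied by the three cases above can be written down as a small Python sketch: a table whose name contains Pt is expected to pair cell_methods "time: point" with a time1 dimension, and any other table "time: mean" with time. This is only my reading of the anomalies listed here, not an official CMOR or ceda-cc check, and the function name is hypothetical.

```python
def time_sampling_problems(table_id, cell_methods, dimensions):
    """Return inconsistencies between table name, cell_methods and dimensions.

    dimensions is the space-separated string from the variable definition.
    """
    is_point_table = "Pt" in table_id
    expected_method = "time: point" if is_point_table else "time: mean"
    expected_dim = "time1" if is_point_table else "time"

    problems = []
    if expected_method not in cell_methods:
        problems.append(f"cell_methods should contain '{expected_method}'")
    if expected_dim not in dimensions.split():
        problems.append(f"dimensions should contain '{expected_dim}'")
    return problems
```

Applied to case 1, for instance, table 6hr with cell_methods "area: mean time: point" yields a single cell_methods problem, which matches the suggestion given for clt.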

potentially corrupted file

CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i47p1f1_gr_20180125-20180310.nc
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i49p1f1_gr_20180125-20180310.nc

and
CNR-ISAC/ua_6hrPt_GLOBO_free_s20180125-r1i48p1f1_gr_20180125-20180310.nc

are significantly larger than earlier ensemble members, and my attempts to open these files failed.
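A rough way to spot such files without opening them is to compare each file size with the median size of its ensemble. The Python sketch below does that for a mapping of file name to size in bytes; the function name and the default factor of 1.5 are arbitrary assumptions and would need tuning per variable and model.

```python
from statistics import median


def oversized(sizes, factor=1.5):
    """Return names whose size exceeds factor times the median size.

    sizes maps file name -> size in bytes (e.g. from os.path.getsize).
    """
    typical = median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > factor * typical)
```

Files flagged this way are only candidates for a closer look; a larger file is not necessarily corrupted.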

jknight003 ceda-cc report for GloSea6 v20221216

The data passed all checks except the comments below:

For variable clt the cell_methods attribute needs to be changed from "area: mean time: point" to "area: mean time: mean"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [clt] has incorrect attributes: cell_methods="area: mean time: point" [correct: "area: mean time: mean"]

For variable tasmax the long_name attribute needs to be changed from "a 6hourly Maximum Near-Surface Air Temperature" to "6 hourly Maximum Near-Surface Air Temperature"
Error:
C4.002.005: [variable_ncattribute_mipvalues]: FAILED:: Variable [tasmax] has incorrect attributes: long_name="a 6hourly Maximum Near-Surface Air Temperature" [correct: "6 hourly Maximum Near-Surface Air Temperature"]

File naming convention for SNAP data

Suggest we use a similar convention to ccmi-2022
<variable_id>_<table_id>_<source_id>_<experiment_id>_<variant_label>_<grid_label>[_<time_range>].nc
e.g.
zmo3_monthly_HadGEM3-ES_refC1_r1i1p1_"gridLabel"_196001-196810.nc
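To show how the suggested convention could be handled in practice, here is a minimal Python sketch that splits a filename into its facets and puts it back together. It assumes that no facet value contains an underscore, which the convention relies on anyway; the function names are hypothetical.

```python
# Facets in the order given by the suggested convention.
FIELDS = (
    "variable_id", "table_id", "source_id",
    "experiment_id", "variant_label", "grid_label",
)


def parse_filename(filename: str) -> dict:
    """Split a filename into facets; time_range is None when absent."""
    if not filename.endswith(".nc"):
        raise ValueError(f"not a .nc file: {filename}")
    parts = filename[:-3].split("_")
    if len(parts) not in (len(FIELDS), len(FIELDS) + 1):
        raise ValueError(f"unexpected number of elements: {filename}")
    facets = dict(zip(FIELDS, parts))
    facets["time_range"] = parts[len(FIELDS)] if len(parts) > len(FIELDS) else None
    return facets


def build_filename(facets: dict) -> str:
    """Assemble a filename from facets, appending time_range if present."""
    parts = [facets[name] for name in FIELDS]
    if facets.get("time_range"):
        parts.append(facets["time_range"])
    return "_".join(parts) + ".nc"
```

Parsing a name and rebuilding it should return the original string, which makes a simple sanity check for uploaded files.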
