how_to_eurec4a's Introduction

EUREC4A metadata repository

Shared metadata, code and standards designed to structure the EUREC4A data repository, improve the treatment of metadata and thereby ease EUREC4A data analysis.

State of the project

Currently, the goals of the metadata repository are being refined while experiments on first metadata structures, mostly concerning measurement platforms, are being conducted. If you just came across this repository, please have a look at the current goals document as well as the issues and pull requests. Please don't hesitate to add more issues or pull requests if you have use cases, expectations, opinions or questions about EUREC4A data or metadata that are not yet written down.

Metadata concept

EUREC4A metadata will be sourced from the owners of the objects the metadata describes. That is, each instrument would provide instrument metadata, with a minimal set of controlled language, and a subset of this information would be inherited by the platform metadata. Campaign metadata can then inherit information from the platforms. This ensures that all the metadata is provided by the owners. At each stage of the process, additional information could be included and inherited. An example would be the flight-track dictionaries being developed for HALO, which would then find their way into the HALO metadata.

Metadata also provide a controlled vocabulary, i.e. a list of valid ways to refer to a platform, instrument, etc.
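The inheritance chain described above (instrument, then platform, then campaign) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the instrument record and the functions `inherit`, `build_platform` and `build_campaign` are hypothetical and are not defined anywhere in this repository.

```python
# Hypothetical controlled vocabulary: the only valid platform identifiers.
VALID_PLATFORMS = {"HALO"}

# Hypothetical subset of instrument fields that a platform inherits.
INHERITED_FIELDS = ("name", "contact")


def inherit(children, fields):
    """Copy the given fields from each child metadata record."""
    return [{key: child[key] for key in fields if key in child} for child in children]


def build_platform(platform_id, instruments, extra=None):
    """Build platform metadata from records supplied by the instrument owners."""
    if platform_id not in VALID_PLATFORMS:
        raise ValueError(f"unknown platform: {platform_id!r}")
    platform = {
        "platform_id": platform_id,
        "instruments": inherit(instruments, INHERITED_FIELDS),
    }
    # Additional information can be added at this stage, e.g. flight tracks.
    platform.update(extra or {})
    return platform


def build_campaign(platforms):
    """Build campaign metadata that inherits a subset of each platform record."""
    return {
        "campaign": "EUREC4A",
        "platforms": inherit(platforms, ("platform_id", "instruments")),
    }
```

In this sketch the controlled vocabulary is just a set of accepted identifiers, so a misspelled platform name fails early instead of silently creating a second way to refer to the same platform.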

Tentative naming conventions

Although file naming conventions are not a particular goal of this metadata concept, some ideas have already evolved and can be found in naming_conventions.md for reference.

how_to_eurec4a's People

Contributors

annalea-albright, clauclouds, d70-t, fjansson, geet-george, jroettenbacher, juleradtke, martinjanssens, merax, observingclouds, robertpincus, tmieslinger, vpoertge

how_to_eurec4a's Issues

access to (ICON model) output and input

I am interested in data from the ICON model runs (in- and output), but this question could be generalised to other data as well.

  1. Is the output from the ICON model runs publicly available? The eurec4a intake repository folder for ICON contains yaml files with links to dkrz, which I suspect to be the location of the output data, but I do not understand how to access them.
  2. Are the corresponding input settings files publicly available as well? (They might be bundled directly with the output of course.)

Cache updates

While the caching already helps a lot in reducing the build time, there is still an issue with flaky notebooks:

If a cache entry (with a given key) exists, it is restored. Afterwards all the previously failed and newly added or modified notebooks are executed. When finished, there will ideally be more successfully executed notebooks in the cache, thus the cache probably should get updated. Currently this does not happen (as the save cache action refuses to write a cache entry which already exists). There are probably two ways out:

  • purge cache before saving
  • find a clever way to use restore keys or something similar to create new cache entries which are preferred during restore over older entries, and let GitHub's cache eviction logic take care of older entries
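The second option can be explored with a small Python model of the cache. This is a simplified sketch, not the cache action itself: it assumes that an exact key match wins, that restore keys are matched as prefixes with the newest entry preferred, and that an existing key is never overwritten. Branch scoping and eviction are ignored, and the key names are hypothetical.

```python
def restore(cache, key, restore_keys=()):
    """Return the key of the entry that would be restored, or None.

    `cache` maps cache keys to creation times (a larger value means newer).
    """
    if key in cache:
        return key
    for prefix in restore_keys:
        matches = [k for k in cache if k.startswith(prefix)]
        if matches:
            # Among several prefix matches, prefer the newest entry.
            return max(matches, key=cache.get)
    return None


def save(cache, key, created_at):
    """Save a new entry; an existing key is left untouched."""
    if key in cache:
        return False
    cache[key] = created_at
    return True
```

Under these assumptions, giving every run a unique key such as `notebooks-<run id>` and restoring with the prefix `notebooks-` means each run can save its improved cache, and the next run picks up the newest one.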

Homage to EUREC⁴A

Hej!
Shall we try to use the superscript ⁴ and write EUREC⁴A wherever we can in this repository? I think that would be quite nice and a good first issue :)

References with Author Year style

References should be in Author Year style instead of the initial-letter style. However, we are not yet sure how to set this up properly. See #54 for the progress so far.

Publish the book on Github pages?

We are currently using a GitLab workflow to publish the compiled book on GWDG, and a separate GitHub workflow as continuous integration to ensure the book compiles.

It seems that we could publish the pages on GitHub relatively easily. Should we do this as well as, or instead of, publishing on GWDG via GitLab?

requirements cleanup

From searching through the code, I found that only the following packages are first-order requirements:

  • xarray
  • simplification
  • datetime
  • eurec4a
  • ipyleaflet
  • matplotlib

Maybe one could add intake, but that is commented out. At the moment, I don't even see code importing NumPy, so there is no need for that.
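A search like this can be automated with the standard-library `ast` module. The sketch below is one possible approach, not the method used for the list above; the function names are hypothetical, and it only sees static `import` statements in Python source passed in as a string.

```python
import ast
import sys


def top_level_imports(source):
    """Return the top-level package names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def third_party(names):
    """Drop standard-library modules (sys.stdlib_module_names needs Python 3.10+)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {name for name in names if name not in stdlib}
```

Filtering out the standard library matters here because `datetime` ships with Python and therefore does not need an entry in requirements.txt.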

Issue with intake 3.x.x

Requests for datasets that are accessed via intake currently fail with:

("missing 'module'", {'module': 'intake_xarray'})

This is an issue we already fixed for the eurec4a intake catalog CI by pinning the version of intake to <3.0.0.
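Besides the pin `intake<3.0.0` in the requirements, a notebook could check the installed version up front and fail with a readable message. The following is a minimal sketch; `is_supported` and `check_intake` are hypothetical helpers, and the version parsing assumes a plain `major.minor.patch` string.

```python
from importlib.metadata import PackageNotFoundError, version


def is_supported(version_string, first_broken_major=3):
    """Return True if the major version is below the first release that breaks."""
    major = int(version_string.split(".")[0])
    return major < first_broken_major


def check_intake():
    """Raise early instead of failing inside a catalog request."""
    try:
        installed = version("intake")
    except PackageNotFoundError:
        raise RuntimeError("intake is not installed") from None
    if not is_supported(installed):
        raise RuntimeError(f"intake {installed} is installed, but intake<3.0.0 is required")
    return installed
```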

How to add cartopy?

For making maps, we will in many cases want to add the cartopy Python module, but simply adding the module to requirements.txt leads to unresolvable dependencies (RobertPincus@acac4cc). I can't get my local module stack to work with conda - can someone (@d70-t ?) show me how to modify requirements.txt?

(I love that autocorrect wants to change cartopy to "carroty")

correlate rttov channels and GOES-16 ABI channels

The rttov data contains 7 variables called "synsat_rttov_forward_model_1__abi_ir__goes_16__channel_1" through "..._channel_7". As far as I understand, the process to calculate brightness temperatures from ABI channels as outlined in the GOES-16 ATBD uses 7 channels (with an additional one for backup). I assume that the 7 rttov channels are the model data of these 7 channels in ABI.

But how do I know which channel in the rttov data corresponds to a certain channel from ABI?

Compiled book is not visible to outsiders

Hi @tmieslinger, I'm delighted that this book is now part of the EUREC4A organization. Can you check that the compiled book on GitLab pages is visible to all? I just tried to follow the link to the book and was asked to authorize at GWDG, and when I did, the page was not found. Many people won't have a GWDG address.

tips and tricks and more information on data handling with python

I thought that it could be nice to collect some useful general information on data handling, such as python packages (xarray, ...) or tricks that we use within the scripts. My first idea would be to make a collection of links to documentation pages or short descriptions (or even examples?). A possible place could be in the introduction chapter. Any ideas/comments/feelings on that?

Split code example chapter

Maybe split chapter code examples into two chapters, one on basic code examples and another for advanced or combined products?

ICON_LES is broken

The ICON_LES page is currently broken and has been for a while. This was useful for testing #101 but should be fixed.

My initial investigation showed that the EUREC4A_ICON-LES_control_DOM03_reff_native.zarr dataset seems to be corrupted. While .zmetadata shows there should be a time variable, there is none. For some reason, this doesn't get picked up correctly by the client library as an error, and instead some bytes of the error response seem to be interpreted as time values. These (garbage) values don't fit into a human time range, and thus are decoded as CFtime objects instead of np.datetime64, which in turn crashes matplotlib.

Possible fixes for the notebook:

  • fix the dataset by (re-) uploading the time variable
  • throw the (likely unusable) dataset out of the eurec4a_intake catalog
  • some code in the notebook which catches the situation and keeps it running for the other datasets
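The third option could look roughly like the sketch below, which skips any dataset whose time variable is missing or is not stored as datetime64. It is an assumption-laden illustration: the helper names are hypothetical, and the check relies only on the `dtype.kind` attribute that NumPy and xarray arrays expose, where kind "M" is NumPy's code for datetime64.

```python
def has_datetime_time(dataset, name="time"):
    """Return True if `dataset[name]` exists and holds datetime64 values."""
    try:
        variable = dataset[name]
    except KeyError:
        return False
    dtype = getattr(variable, "dtype", None)
    # Object arrays (e.g. cftime objects) have kind "O" and are rejected here.
    return getattr(dtype, "kind", None) == "M"


def usable(datasets):
    """Split a {name: dataset} mapping into usable datasets and skipped names."""
    good, skipped = {}, []
    for key, dataset in datasets.items():
        if has_datetime_time(dataset):
            good[key] = dataset
        else:
            skipped.append(key)
    return good, skipped
```

Reporting the skipped names in the notebook output would keep the problem visible while the remaining datasets are still plotted.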

@observingClouds any thoughts?

Request multiple datasets concurrently

For some reason, requesting multiple datasets via opendap is done sequentially, even if the requests are explicitly sent off to multiple threads. In theory, the netcdf library should release the global interpreter lock (GIL) since version 1.1.7 (March 2015), but this is not observed for opendap access in the cloudmasks notebook.

As the cloudmasks notebook includes an analysis of almost 100 individual datasets and its runtime should mostly be limited by the round trip time of dataset requests, finding a solution for this issue would make a huge difference.
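For reference, the threaded pattern in question can be sketched as follows. The `open_remote` function is a hypothetical stand-in that merely sleeps; it is not an opendap call, and the URLs are placeholders. The sketch shows the behaviour one would expect when the blocking call releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def open_remote(url):
    """Stand-in for a blocking dataset request (hypothetical, no network access)."""
    time.sleep(0.05)  # time.sleep releases the GIL, as a well-behaved I/O call would
    return f"dataset from {url}"


def open_all(urls, opener=open_remote, max_workers=8):
    """Open all URLs in a thread pool and return the results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(opener, urls))
```

If the real opener holds the GIL for the whole request, as the observation above suggests, the threads run one after another. Swapping in `ProcessPoolExecutor` would sidestep the GIL, at the cost that the opener and its results must be picklable; whether that works for these datasets is untested here.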

Make the book importable

Through the evolution of the book, I'd expect that there will be some pages coming up which describe useful handcrafted functions which may be reusable in other parts of the book as well as in user code outside of the book. It would be great to come up with a possibility to mark parts of the code or code-cells as exported and then collect all the exported code segments into a python module which would be published to pypi.org. If that works well, one could write code like:

from how_to_eurec4a.chapter_name import nice_utility_function
nice_utility_function(my_dataset)
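One way to mark cells as exported would be a cell tag. The sketch below collects the source of all tagged code cells from a notebook in the Jupyter JSON format. The tag name `export` and the function `exported_source` are assumptions made for illustration, and packaging the result for pypi.org is left out.

```python
import json

EXPORT_TAG = "export"  # hypothetical tag name, not an existing convention of the book


def exported_source(notebook_json):
    """Concatenate the source of all code cells tagged with EXPORT_TAG."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and EXPORT_TAG in tags:
            source = cell["source"]
            # The notebook format stores source as one string or a list of lines.
            chunks.append(source if isinstance(source, str) else "".join(source))
    if not chunks:
        return ""
    return "\n\n".join(chunks) + "\n"
```

Writing the returned text to a file such as `how_to_eurec4a/chapter_name.py` would give a module that supports the import shown above.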

CI build node12 deprecation

GitHub CI is warning about outdated actions using Node12 instead of 16:

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16:
actions/checkout@v2, actions/setup-python@v2, actions/cache@v2.
For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.

Version incompatibilities of dask and pandas

The CI currently fails due to the following version incompatibility: dask/dask#10164

Error message from failing CI:

File /usr/share/miniconda/envs/how_to_eurec4a/lib/python3.9/site-packages/dask/dataframe/accessor.py:276, in StringAccessor()
    272         meta = (self._series.name, object)
    273     return self._function_map(method, pat=pat, n=n, expand=expand, meta=meta)
    275 @derived_from(
--> 276     pd.core.strings.StringMethods,
    277     inconsistencies="``expand=True`` with unknown ``n`` will raise a ``NotImplementedError``",
    278 )
    279 def split(self, pat=None, n=-1, expand=False):
    280     """Known inconsistencies: ``expand=True`` with unknown ``n`` will raise a ``NotImplementedError``."""
    281     return self._split("split", pat=pat, n=n, expand=expand)

AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'