eurec4a / how_to_eurec4a

Code examples to get you started with EUREC⁴A data.

Home Page: https://howto.eurec4a.eu

License: MIT License

TeX 88.72% CSS 0.92% Shell 3.04% Python 7.32%

how_to_eurec4a's Introduction

EUREC4A metadata repository

Shared metadata, code and standards designed to structure the EUREC4A data repository, improve the treatment of metadata and thereby ease EUREC4A data analysis.

State of the project

Currently, the goals of the metadata repository are being refined while experiments on first metadata structures, mostly considering measurement platforms, are conducted. If you just came across this repository, please have a look at the current goals document as well as issues and pull requests. Please don't hesitate to add more issues or pull requests if you have use cases, expectations, opinions or questions about EUREC4A data or metadata which are not yet written down.

Metadata concept

EUREC4A metadata will be sourced from the owners of the objects the metadata describes. That is, each instrument would provide instrument metadata, with a minimal set of controlled language, and a subset of this information would be inherited by the platform metadata. Campaign metadata can then inherit information from the platforms. This ensures that all the metadata is provided by the owners. At each stage of the process, additional information could be included and inherited. An example would be the flight-track dictionaries being developed for HALO, which would then find their way into the HALO metadata.

Metadata also provide a controlled vocabulary, i.e. a list of valid ways to refer to a platform, instrument, etc.
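The inheritance chain described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the field names, the `VALID_PLATFORMS` vocabulary entries and the `collect` helper are hypothetical and not part of this repository.

```python
# Hypothetical sketch: field names, vocabulary entries and helper are assumptions.

# Controlled vocabulary: the accepted ways to refer to a platform.
VALID_PLATFORMS = {"HALO", "P3"}


def collect(own, sources, keys):
    """Combine own metadata with a subset of keys inherited from each source."""
    inherited = [{k: s[k] for k in keys if k in s} for s in sources]
    return {**own, "inherited": inherited}


# Instrument metadata is provided by the instrument owner.
instrument = {
    "name": "HAMP",
    "platform": "HALO",
    "contact": "instrument owner",
    "internal_notes": "stays with the instrument",
}

# The controlled vocabulary rejects unknown platform names early.
if instrument["platform"] not in VALID_PLATFORMS:
    raise ValueError(f"unknown platform: {instrument['platform']}")

# The platform inherits a subset of the instrument information ...
platform = collect({"name": "HALO"}, [instrument], keys=("name", "contact"))

# ... and the campaign inherits from the platforms.
campaign = collect({"name": "EUREC4A"}, [platform], keys=("name", "inherited"))
```

In this layout each level only copies the keys it asks for, so information an owner does not want to propagate (here `internal_notes`) stays where it was written.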

Tentative naming conventions

Although file naming conventions are not a particular goal of this metadata concept, some ideas have already evolved and can be found in naming_conventions.md for reference.

how_to_eurec4a's People

Contributors

d70-t juleradtke xychen-ocn vpoertge jroettenbacher merax tmieslinger robertpincus clauclouds cgentemann annalea-albright observingclouds geet-george leifdenby fjansson wesfloyd martinjanssens torresalavez lvol08

how_to_eurec4a's Issues

access to (ICON model) output and input

I am interested in data from the ICON model runs (in- and output), but this question could be generalised to other data as well.

Is the output from the ICON model runs publicly available? The eurec4a intake repository folder for ICON contains yaml files with links to dkrz, which I suspect to be the location of the output data, but I do not understand how to access them.
Are the corresponding input settings files publicly available as well? (They might be bundled directly with the output of course.)

matplotlib style: relative paths as `str` no longer supported

import matplotlib.pyplot as plt
plt.style.use("./mplstyle/book")

raises an error.
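A possible workaround, sketched below, is to hand matplotlib a `pathlib.Path` instead of a `str`, since `plt.style.use` also accepts path objects. Whether this avoids the error in every affected matplotlib version is an assumption; the `style_path` helper is hypothetical and the matplotlib call is left as a comment so the snippet runs without matplotlib installed.

```python
from pathlib import Path


def style_path(relative, base=None):
    """Turn a relative style location into an absolute pathlib.Path."""
    base = Path.cwd() if base is None else Path(base)
    return (base / relative).resolve()


book_style = style_path("./mplstyle/book")

# With matplotlib installed, the Path object can be passed instead of a str:
# import matplotlib.pyplot as plt
# plt.style.use(book_style)
```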

[REQUEST] Add HALO photo collection

I think it would be neat if we could add the HALO photo collection app (https://observations.ipsl.fr/aeris/eurec4a-data/AIRCRAFT/HALO/PHOTOCOLLECTION-APP/) to the HowTo book. It took me quite some time to find it. I eventually got there via: HALO paper -> dataset DOI -> Download section -> Photo gallery

Cache updates

While the caching already helps a lot in reducing the build time, there is still an issue with flaky notebooks:

If a cache entry (with a given key) exists, it is restored. Afterwards all the previously failed and newly added or modified notebooks are executed. When finished, there will ideally be more successfully executed notebooks in the cache, thus the cache probably should get updated. Currently this does not happen (as the save cache action refuses to write a cache entry which already exists). There are probably two ways out:

purge cache before saving
find a clever way to use restore keys or something similar, to create new cache entries which are preferred during restore over older entries and let github's cache eviction logic take care of older entries
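The second option can be illustrated with a small Python simulation. It is only a model of the behaviour described above (an exact key hit wins, otherwise the newest entry matching a restore prefix is used, and existing keys are never overwritten); the key names and the per-run suffix are assumptions, not the actual workflow configuration.

```python
# Toy model of the cache, mapping key -> creation order. Names are hypothetical.

def save(cache, key, created):
    """Store an entry unless the key exists (existing entries are not overwritten)."""
    if key in cache:
        return False
    cache[key] = created
    return True


def restore(cache, key, restore_keys=()):
    """Return the key that would be restored, or None on a cache miss."""
    if key in cache:
        return key
    for prefix in restore_keys:
        matches = [k for k in cache if k.startswith(prefix)]
        if matches:
            # Prefer the most recently created entry with this prefix.
            return max(matches, key=cache.get)
    return None


cache = {}
prefix = "notebooks-"
history = []
for run_id in (1, 2, 3):
    key = f"{prefix}{run_id}"  # unique key per run, so saving always succeeds
    history.append(restore(cache, key, restore_keys=[prefix]))
    save(cache, key, created=run_id)
```

With a unique key per run, every run starts from the newest previous entry and writes a fresh one, and older entries are left for the cache eviction logic to remove.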

Homage to EUREC⁴A

Hej!
Shall we try to use the superscript ⁴ and write EUREC⁴A wherever we can in this repository? I think that would be quite nice and a good first issue :)

Details about EUREC4A-MIP boundary conditions

This issue addresses a remaining discussion of #84

The section on boundary conditions and pseudo-global warming data is currently missing some details that need to be added to make the available output useful for new users.

References with Author Year style

References should be in Author Year style instead of initial letters. However, we are not yet sure how to set this up properly. See #54 for the progress so far.

make a chapter on HAMP cloud mask?

A separate chapter on calculating the HAMP cloud mask using Marek's functions and plotting the data according to his example.

Maybe combine it with issue #19?

Publish the book on Github pages?

We are currently using a Gitlab workflow to publish the compiled book on GWDG, and using a separate Github workflow as continuous integration to ensure the book compiles.

It seems that we could publish the pages on Github relatively easily. Should we do this in addition to, or instead of, publishing on GWDG via Gitlab?

requirements cleanup

From searching through the code, I found that only the following packages are first-order requirements:

xarray
simplification
datetime
eurec4a
ipyleaflet
matplotlib

Maybe one could add intake but that is commented out. At the moment, I don't even see code importing Numpy, so no need for that.
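Such a search could be automated with the standard library `ast` module, as in the sketch below. The helper names and the example source string are made up for illustration, and the import name of a package does not always match the name used in a requirements file, so the result is only a starting point. Note that `datetime` is part of the Python standard library, which the second helper filters out on Python 3.10 or newer.

```python
import ast
import sys


def top_level_imports(source):
    """Collect the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def third_party(names):
    """Drop standard library modules (needs Python >= 3.10 to filter anything)."""
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(name for name in names if name not in stdlib)


# Illustrative source text; commented-out imports are ignored by the parser.
example = """
import datetime
import xarray as xr
import matplotlib.pyplot as plt
from eurec4a import get_intake_catalog
# import intake
"""

found = top_level_imports(example)
candidates = third_party(found)
```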

add Marek's HAMP LWP data

https://observations.ipsl.fr/aeris/eurec4a-data/AIRCRAFT/HALO/HAMP/LWP_IWV_v0.8/

Failing to compile notebooks with latest environment

Running the CI for #81 failed due to the following attribute error:

AttributeError: 'GeoAxesSubplot' object has no attribute '_autoscaleXon'

when calling Cartopy.
Affected by this error are the notebooks p3_AXRT.md, p3_wsra.md and icon_les.md.

Related Cartopy Issue

Issue with intake 3.x.x

Requests for datasets that are accessed via intake currently fail with:

("missing 'module'", {'module': 'intake_xarray'})

This is an issue we already fixed for the eurec4a intake catalog CI by pinning the version of intake to <3.0.0

How to add cartopy?

For making maps in many cases we will want to add the cartopy Python module but simply adding the module to requirements.txt leads to unresolvable dependencies (RobertPincus@acac4cc). I can't get my local module stack to work with conda - can someone (@d70-t ?) show me how to modify requirements.txt?

(I love that autocorrect wants to change cartopy to "carroty")

correlate rttov channels and GOES-16 ABI channels

The rttov data contains 7 variables called "synsat_rttov_forward_model_1__abi_ir__goes_16__channel_1" through "..._channel7". As far as I understand, the process to calculate brightness temperatures from ABI channels as outlined in the GOES-16 ATBD uses 7 channels (with an additional one for backup). I assume that the 7 rttov channels are the model data of these 7 channels in ABI.

But how do I know which channel in the rttov data corresponds to a certain channel from ABI?

Compiled book is not visible to outsiders

Hi @tmieslinger, I'm delighted that this book is now part of the EUREC4A organization. Can you check that the compiled book on Gitlab pages is visible to all? I just tried to follow the link to the book, I was asked to authorize at GWDG, and when I did, the page was not found. Many people won't have a GWDG address.

tips and tricks and more information on data handling with python

I thought that it could be nice to collect some useful general information on data handling, such as python packages (xarray, ...) or tricks that we use within the scripts. My first idea would be to make a collection of links to documentation pages or short descriptions (or even examples?). A possible place could be in the introduction chapter. Any ideas/comments/feelings on that?

Split code example chapter

Maybe split chapter code examples into two chapters, one on basic code examples and another for advanced or combined products?

ICON_LES is broken

The ICON_LES page is currently broken (and has been for a while). This was useful for testing #101 but should be fixed.

My initial investigation showed that the EUREC4A_ICON-LES_control_DOM03_reff_native.zarr dataset seems to be corrupted. While .zmetadata shows there should be a time variable, there is none. For some reason, this doesn't get picked up correctly by the client library as an error, and instead some bytes of the error response seem to be interpreted as time values. These (garbage) values don't fit into a human time range, and thus are decoded as CFtime objects instead of np.datetime64, which in turn crashes matplotlib.

Possible fixes for the notebook:

fix the dataset by (re-) uploading the time variable
throw the (likely unusable) dataset out of the eurec4a_intake catalog
some code in the notebook which catches the situation and keeps it running for the other datasets
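The third option could be sketched as follows. The helper and the stand-in openers are hypothetical: plain dictionaries take the place of real datasets so the snippet runs without any dependencies. In the notebook, the validation step would more likely check that the decoded time values are of the expected type, which is an assumption about how the garbage values could be detected.

```python
def open_all(openers, validate):
    """Open every dataset, collecting failures instead of stopping at the first."""
    datasets, failures = {}, {}
    for name, opener in openers.items():
        try:
            ds = opener()
            validate(ds)
        except Exception as exc:  # keep going for the remaining datasets
            failures[name] = repr(exc)
            continue
        datasets[name] = ds
    return datasets, failures


def require_time(ds):
    """Minimal sanity check: the dataset must provide a time variable."""
    if "time" not in ds:
        raise KeyError("dataset has no 'time' variable")


def unreachable():
    raise OSError("stand-in for a failed request")


# Stand-ins for catalog entries; dicts play the role of datasets here.
openers = {
    "good": lambda: {"time": [0, 1], "reff": [1.0, 2.0]},
    "no_time": lambda: {"reff": [1.0]},
    "unreachable": unreachable,
}

datasets, failures = open_all(openers, require_time)
```

The `failures` dictionary can then be printed in the notebook so a broken dataset stays visible without stopping the page build.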

@observingClouds any thoughts?

local myst references

Add EUREC4A-Intake catalog citation recommendations

I think we should provide information somewhere about how to cite the catalog and the datasets it uses.

Request multiple datasets concurrently

For some reason, requesting multiple datasets via opendap is done sequentially, even if it is explicitly sent off to multiple threads. In theory, the netcdf library has released the global interpreter lock (GIL) since version 1.1.7 (March 2015), but this is not observed for opendap access in the cloudmasks notebook.

As the cloudmasks notebook includes an analysis of almost 100 individual datasets and its runtime should mostly be limited by the round trip time of dataset requests, finding a solution for this issue would make a huge difference.
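For reference, the threaded pattern in question looks roughly like the sketch below. The `fetch` function is a stand-in that only sleeps (which does release the GIL), so it shows the intended speed-up rather than reproducing the problem; the dataset names and worker count are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(name, delay=0.05):
    """Stand-in for one dataset request; sleeping mimics the round trip time."""
    time.sleep(delay)
    return name, len(name)


def fetch_all(names, max_workers=8):
    """Issue all requests from a thread pool and gather the results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, names))


names = [f"dataset_{i}" for i in range(16)]
start = time.perf_counter()
results = fetch_all(names)
elapsed = time.perf_counter() - start
print(f"fetched {len(results)} datasets in {elapsed:.2f} s")
```

If the real request function holds the GIL for the whole round trip, the same code degrades to roughly sequential runtime, which matches what is observed in the notebook.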

Make the book importable

Through the evolution of the book, I'd expect that there will be some pages coming up which describe useful handcrafted functions which may be reusable in other parts of the book as well as in user code outside of the book. It would be great to come up with a possibility to mark parts of the code or code-cells as exported and then collect all the exported code segments into a python module which would be published to pypi.org. If that works well, one could write code like:

from how_to_eurec4a.chapter_name import nice_utility_function
nice_utility_function(my_dataset)
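One way to approach this is a marker comment at the top of exported cells, as in the sketch below. The `# export` marker, the `collect_exports` helper and the example cells are all hypothetical; how cells would actually be read from the book sources and packaged for pypi.org is left open.

```python
EXPORT_MARKER = "# export"  # hypothetical marker on the first line of a cell


def collect_exports(cells):
    """Join the bodies of all cells that start with the export marker."""
    exported = []
    for cell in cells:
        lines = cell.strip().splitlines()
        if lines and lines[0].strip() == EXPORT_MARKER:
            exported.append("\n".join(lines[1:]).strip())
    return "\n\n\n".join(exported) + "\n"


# Example code cells as plain strings; only the first one is marked for export.
cells = [
    "# export\ndef nice_utility_function(dataset):\n    return len(dataset)",
    "print('plotting code that stays in the book')",
]

module_source = collect_exports(cells)

# Executing the collected source shows that it forms a usable module body.
namespace = {}
exec(module_source, namespace)
```

Writing `module_source` to a file per chapter would give a module layout matching the import shown above.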

update the HAMP data version to v0.9

CI build node12 deprecation

GitHub CI is warning about outdated actions using Node12 instead of 16:

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16:
actions/checkout@v2, actions/setup-python@v2, actions/cache@v2.
For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.

Version incompatibilities of dask and pandas

The CI currently fails due to the following version incompatibility: dask/dask#10164

Error message from failing CI:

File /usr/share/miniconda/envs/how_to_eurec4a/lib/python3.9/site-packages/dask/dataframe/accessor.py:276, in StringAccessor()
    272         meta = (self._series.name, object)
    273     return self._function_map(method, pat=pat, n=n, expand=expand, meta=meta)
    275 @derived_from(
--> 276     pd.core.strings.StringMethods,
    277     inconsistencies="``expand=True`` with unknown ``n`` will raise a ``NotImplementedError``",
    278 )
    279 def split(self, pat=None, n=-1, expand=False):
    280     """Known inconsistencies: ``expand=True`` with unknown ``n`` will raise a ``NotImplementedError``."""
    281     return self._split("split", pat=pat, n=n, expand=expand)

AttributeError: module 'pandas.core.strings' has no attribute 'StringMethods'