Comments (12)

observingClouds commented on August 21, 2024

Hi @felix-mue,
Thank you very much for reaching out to us and raising your question here.

To 1)
You can access most (if not all) datasets listed on the howto.eurec4a.eu page via our intake catalog without even needing to know where they are stored 😎.

import eurec4a
cat = eurec4a.get_intake_catalog()
datasets = list(cat.simulations.ICON.LES_CampaignDomain_control)  # show all available entries of a catalog level
ds = cat.simulations.ICON.LES_CampaignDomain_control.surface_DOM01.to_dask()  # lazy loading of data

In addition, this will only download the data that you are actually using in your analysis (keyword: lazy loading). No need to download all the TB of output 🥳

Please try it out! Does this answer your first question?

from how_to_eurec4a.

observingClouds commented on August 21, 2024

To 2)
The run-scripts are available at the experiment repository. Please let me know if you have access to those.

felix-mue commented on August 21, 2024

Sadly, there is a simple barrier: our code runs in MATLAB, not Python. So I have to access the data from MATLAB, and I assumed that isn't possible with the Python package.

observingClouds commented on August 21, 2024

Sorry to hear that! Maybe it's time for a change 🥳 MATLAB supports YAML files, so you could read the catalog files and grep the links. But honestly, it seems like you would need to reinvent the wheel. MATLAB's Python support might also be something to look into, but I'd be surprised if it works well.
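
Reading the catalog files and grepping the links could look roughly like the sketch below. It is written in Python only for illustration; the regular-expression idea should carry over to other languages. The sample catalog text, the entry name and the URL in it are made up, and the `urlpath` key is an assumption about how the catalog files store their links.

```python
import re

# Hypothetical excerpt of a catalog YAML file; entry name and URL are placeholders.
SAMPLE_CATALOG = """
sources:
  some_dataset:
    driver: zarr
    args:
      urlpath: "https://example.org/path/to/some_dataset.zarr"
      consolidated: true
"""

# Assumption: links are stored under a `urlpath:` key, optionally quoted.
URLPATH_PATTERN = re.compile(
    r"""^\s*urlpath:\s*["']?([^"'\s]+)["']?\s*$""", re.MULTILINE
)


def extract_urlpaths(catalog_text):
    """Return the values of all `urlpath:` keys found in the catalog text."""
    return URLPATH_PATTERN.findall(catalog_text)


print(extract_urlpaths(SAMPLE_CATALOG))
```

A plain text search like this avoids a YAML parser, but it would miss links stored under other keys or built from templates.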

Another issue you might face with MATLAB is that the simulations are saved in the zarr format. It seems that MATLAB has no dedicated driver for this format yet. However, besides HDF5, zarr is now also a supported backend of netCDF in the newer versions of the library. You should therefore be able to load the zarr files (after downloading them) through the netCDF library. The syntax is a bit unusual, however.

So, here is an example of how you can download a zarr file from the catalog and read it with the netCDF library:

  1. Download the data with wget
wget -r -H -N --cut-dirs=3 --include-directories="/v1/" "https://swiftbrowser.dkrz.de/public/dkrz_948e7d4bbfbb445fbff5315fc433e36a/EUREC4A_LES/experiment_2/meteograms/EUREC4A_ICON-LES_control_meteogram_DOM03_BCO.zarr/?show_all"

Note the change of the prefix and ending of the url compared to the one given in the catalog.

  2. Note that wget creates two directories (swift.dkrz.de, swiftbrowser.dkrz.de). The actual dataset is in swift.dkrz.de.
  3. Append the absolute path of the zarr file following the scheme: file:///path/to/zarr/file.zarr#mode=xarray
  4. You should be able to use this path with your favourite netCDF tool, e.g.
ncdump -h "file:///path/to/swift.dkrz.de/experiment_2/meteograms/EUREC4A_ICON-LES_control_meteogram_DOM03_BCO.zarr#mode=xarray"
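
The two string manipulations in the steps above (turning a catalog URL into the swiftbrowser URL passed to wget, and building the file://...#mode=xarray path) can be sketched in Python as follows. The `https://swift.dkrz.de/v1/` prefix of the catalog URL is an assumption inferred from the wget options and directory names above, and the helper names are made up for illustration.

```python
from urllib.parse import quote

# Assumption: catalog URLs start with this prefix (inferred from the
# wget options above, not confirmed in this thread).
CATALOG_PREFIX = "https://swift.dkrz.de/v1/"
BROWSER_PREFIX = "https://swiftbrowser.dkrz.de/public/"


def to_download_url(catalog_url):
    """Swap the URL prefix and append '/?show_all', as in the wget call above."""
    if not catalog_url.startswith(CATALOG_PREFIX):
        raise ValueError(f"unexpected URL prefix: {catalog_url}")
    remainder = catalog_url[len(CATALOG_PREFIX):].rstrip("/")
    return BROWSER_PREFIX + remainder + "/?show_all"


def to_nczarr_path(local_zarr_dir):
    """Build 'file:///abs/path/file.zarr#mode=xarray' from an absolute POSIX path."""
    if not local_zarr_dir.startswith("/"):
        raise ValueError("an absolute path is required")
    return "file://" + quote(local_zarr_dir) + "#mode=xarray"


catalog_url = (
    "https://swift.dkrz.de/v1/dkrz_948e7d4bbfbb445fbff5315fc433e36a/EUREC4A_LES/"
    "experiment_2/meteograms/EUREC4A_ICON-LES_control_meteogram_DOM03_BCO.zarr"
)
local_dir = (
    "/path/to/swift.dkrz.de/experiment_2/meteograms/"
    "EUREC4A_ICON-LES_control_meteogram_DOM03_BCO.zarr"
)
print(to_download_url(catalog_url))
print(to_nczarr_path(local_dir))
```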

Unfortunately, reading a variable from this dataset does not work on my end in this particular case. It might be that the compressor used is not supported (although it seems to be), or that the blosc library (we use lz4 as a compressor here) is not linked to the netCDF library.

ncdump -v time "file:///path/to/swift.dkrz.de/experiment_2/meteograms/EUREC4A_ICON-LES_control_meteogram_DOM03_BCO.zarr#mode=xarray"

returns the metadata and then

data:

NetCDF: Filter error: undefined filter encountered
Location: file ?; fcn ? line 478
 time = % 

d70-t commented on August 21, 2024

Regarding "Download the data with wget":

If you really, really want to download a subset, I'd recommend just opening the data with intake / xarray and then doing something like ds[[vars...]].sel(...).to_netcdf(). But just as @observingClouds said, I haven't yet come across cases in which downloading would be so much better that it would justify the additional hassle involved.
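
A minimal sketch of that subset-and-save pattern is shown below. To keep it runnable without network access, a small synthetic dataset stands in for one opened from the catalog with `.to_dask()`; the variable names `t_2m` and `u_10m`, the `subset` helper and the output file name are hypothetical.

```python
import numpy as np
import xarray as xr


def subset(ds, variables, **selection):
    """Keep only the given variables and apply a label-based selection."""
    return ds[list(variables)].sel(**selection)


# Synthetic stand-in for a catalog dataset: two days of hourly values.
time = np.arange("2020-02-01", "2020-02-03", dtype="datetime64[h]").astype("datetime64[ns]")
ds = xr.Dataset(
    {
        "t_2m": ("time", np.linspace(298.0, 300.0, time.size)),
        "u_10m": ("time", np.zeros(time.size)),
    },
    coords={"time": time},
)

# Select one variable and the first day only.
small = subset(ds, ["t_2m"], time=slice("2020-02-01", "2020-02-01"))
print(small)

# Writing requires a netCDF backend (for example netCDF4 or h5netcdf):
# small.to_netcdf("subset.nc")
```

With a lazily loaded catalog dataset, only the selected variables and time range should need to be transferred when the file is written.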

felix-mue commented on August 21, 2024

Thanks for the quick reply!
Yes, I have access to the other repository. I will work through the data handling and the files and get back to you when something comes up.

felix-mue commented on August 21, 2024

About accessing the data: While lazy loading is great in many cases, for me it would actually be helpful to have one big download of the data (maybe subset by variables). Is that available as well?

observingClouds commented on August 21, 2024

May I ask what your application is? The latency to access the files here should be fairly low and loading the data lazily ensures that you will always access the latest version.

At https://howto.eurec4a.eu/eurec4a_mip.html we show how you can download data with wget. You can find the paths in the eurec4a catalog files.

observingClouds commented on August 21, 2024

@d70-t do you have an idea what is going on here? The filter in .zmetadata/.zarray is actually null and deleting it entirely does not help to solve the problem.

d70-t commented on August 21, 2024

.zmetadata is a zarr-python extension, which as far as I know isn't adopted yet by netCDF. But .zarray is used.

You probably need netCDF >= 4.9 and there are some steps required for setting up netCDF to run with filters.
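
A quick way to check both points might look like the sketch below. It assumes that `nc-config` (shipped with netCDF-C) is on the PATH and that filter plugins are located through the `HDF5_PLUGIN_PATH` environment variable, which is how the netCDF-C filter setup works as far as I know. The helper names are made up, and the 4.9 threshold is only the version suggested above.

```python
import os
import re
import shutil
import subprocess


def parse_version(text):
    """Extract (major, minor, patch) from text such as 'netCDF 4.9.2'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part or 0) for part in match.groups())


def is_recent_enough(version, minimum=(4, 9, 0)):
    """True if the version is at least the minimum suggested above."""
    return version >= minimum


if shutil.which("nc-config"):
    # nc-config prints a line such as 'netCDF 4.9.2'.
    result = subprocess.run(
        ["nc-config", "--version"], capture_output=True, text=True
    )
    try:
        print("netCDF >= 4.9:", is_recent_enough(parse_version(result.stdout)))
    except ValueError:
        print("could not parse nc-config output:", result.stdout.strip())
else:
    print("nc-config not found on PATH")

# Assumption: filter plugins are searched for in the directories listed here.
print("HDF5_PLUGIN_PATH =", os.environ.get("HDF5_PLUGIN_PATH", "<not set>"))
```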

felix-mue commented on August 21, 2024

Thanks a lot to both of you! I agree, of course I'd rather not download. I just didn't see another way to access it (within MATLAB).

I will try the cross-accessibility features @observingClouds mentioned, but I don't have high hopes either.
I am also downloading some data in parallel to see whether that gets me further.

felix-mue commented on August 21, 2024

I ended up downloading the data with a Python script and saving them as netCDF files. This is of course unfortunate, because the pythonic way of accessing this data is way more comfortable! Thanks a lot again for your help and for providing the data in the first place!
