Awesome-Zarr

Zarr is a cloud-native, chunked, compressed, and hierarchical array data format.
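To make "chunked" concrete, here is a minimal sketch in plain Python (it does not use the zarr library) of how an array is divided into a grid of chunks and how an element index maps to the key of the chunk that holds it. The helper names `chunk_grid_shape` and `chunk_key` are hypothetical. The key layouts shown assume the default conventions as I understand them: `1.2` style keys in Zarr V2 and `c/1/2` style keys in Zarr V3. Individual stores can be configured differently.

```python
import math


def chunk_grid_shape(shape, chunks):
    """Number of chunks along each dimension (edge chunks may be partial)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_key(index, chunks, v3=False):
    """Key of the chunk that holds the element at `index`.

    Assumes the default key layouts: "i.j" for V2, "c/i/j" for V3.
    """
    coords = [str(i // c) for i, c in zip(index, chunks)]
    if v3:
        return "c/" + "/".join(coords)
    return ".".join(coords)


shape = (1000, 1000)   # full array shape
chunks = (256, 256)    # shape of one chunk

grid = chunk_grid_shape(shape, chunks)
key_v2 = chunk_key((300, 700), chunks)
key_v3 = chunk_key((300, 700), chunks, v3=True)
print(grid, key_v2, key_v3)
```

Because each chunk lives under its own key, a reader only has to fetch and decompress the chunks that overlap the requested slice, which is what makes the format practical on object storage.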

Contents

Resources

Topics

Resources

Existing resources

The Zarr website is already an excellent resource for learning about Zarr and its ecosystem. This list is intended to complement the website with a curated and opinionated list of resources.

This list focuses on Geo/Earth Sciences, but is not limited to that domain.

Existing lists

Lists

Introductory videos

Introductory talks YouTube playlist

Two excellent and up-to-date introductory talks:

Zarr V3

Zarr V3 is the upcoming version of Zarr. It is a major update that will bring many new features and improvements.

If you're getting into Zarr now, it might be a good idea to start with Zarr V3.

For an excellent in-depth overview, see the ESIP series of talks.

Libraries

This list contains libraries that directly relate to Zarr in some way.

For implementations of Zarr, see Zarr Implementations.

Storage & I/O

ETL

Developer-oriented

  • numcodecs: Compression and transformation codecs used by Zarr
  • pydantic-zarr: Pydantic models for Zarr objects
  • traverzarr: Traversing Zarr JSON as if it's a filesystem
  • zarr_checksum: Calculating checksum information from Zarr
  • zarrdump: Describe zarr stores from the command line

Visualization: For visualization tools and libraries, see the Visualization section.

Kerchunk

Kerchunk allows you to efficiently read chunked data formats such as GRIB, NetCDF, and COGs by exposing them as a Zarr store.
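The core idea can be sketched without Kerchunk itself: a reference set maps Zarr keys either to inline metadata or to a `[url, offset, length]` byte range inside an existing file, so the original data never has to be copied. The example below is a simplified illustration in plain Python, modeled on my understanding of Kerchunk's version 1 reference layout. The temporary file, the `temp/0` and `temp/1` keys, and the `read_ref` helper are all hypothetical stand-ins. Real usage goes through the kerchunk and fsspec libraries.

```python
import json
import os
import tempfile

# Stand-in for an existing archival file: an 8-byte header followed by
# two 10-byte "chunks". A real file would be NetCDF, GRIB, etc.
payload = b"HEADER--" + b"chunk-zero" + b"chunk-one!"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as f:
    f.write(payload)
    path = f.name

# Simplified reference set: keys map to inline strings or to
# [url, offset, length] byte ranges in the original file.
refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "temp/0": [path, 8, 10],
        "temp/1": [path, 18, 10],
    },
}


def read_ref(refs, key):
    """Resolve one key to bytes (hypothetical helper, local files only)."""
    entry = refs["refs"][key]
    if isinstance(entry, str):      # inline content, e.g. JSON metadata
        return entry.encode()
    url, offset, length = entry     # byte range inside the original file
    with open(url, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)


first = read_ref(refs, "temp/0")
second = read_ref(refs, "temp/1")
meta = json.loads(read_ref(refs, ".zgroup"))
os.remove(path)                     # clean up the temporary file
print(first, second, meta)
```

With remote URLs in place of the local path, the same lookup becomes an HTTP range request, which is how a store built from references can serve chunks straight out of the original files.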

Talks and tutorials

Future of Kerchunk

In the future, Kerchunk will be split into upstream functionality in Zarr itself and a new VirtualiZarr package.

Platforms

  • Arraylake: a data lake platform based on Zarr. The company, Earthmover, was started by core Zarr developers.

Articles

Talks & Videos

Existing lists

Talks

Life sciences

Zarr has seen great adoption in the life sciences domain.

  • bdz: Zarr-based format for storing quantitative biosystems dynamics data
  • ome-zarr-py: Implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.
  • ez_zarr: Easy, high-level access to OME-Zarr filesets
  • hdmf-zarr: Zarr I/O backend for HDMF

Talks and resources

Visualization

Most visualization work for Zarr has come from the bioimaging community:

Topics

Zarr & other array data formats

For a general overview, see Cloud-Optimized Geospatial Formats Guide.

Essentially all other common array data formats can be exposed as Zarr. See Kerchunk.

NetCDF & HDF5

Zarr, NetCDF, and HDF5 are three separate data formats that nonetheless relate to each other in multiple ways.

Resources

COG: Cloud-Optimized GeoTIFF

N5

Zarr and N5 are two similar array data formats that share common goals and development.

The Zarr V3 spec aims to provide a common implementation target (sources: 1, 2).

Links

GeoZarr

GeoZarr is a proposal for a Zarr-based geospatial data format that is being submitted as an OGC standard.

GeoZarr will define a metadata convention for Zarr stores that contain geospatial data.

It will also define the relationship of Zarr with CF and NetCDF.

Links

Zarr & STAC

STAC provides a common structure for describing and cataloging spatiotemporal assets.

With its hierarchical structure and key-value metadata support, Zarr's capabilities overlap significantly with STAC.

The communities have not yet converged on a canonical representation of Zarr datasets through STAC.

Today, a good example of exposing Zarr in STAC is Planetary Computer.

More discussion & Related links

In the future, the Zarr V3 Spec and GeoZarr convention will likely enable greater interoperability between STAC and Zarr.
