Giter Site home page Giter Site logo

eo2cube / odc-tools Goto Github PK

View Code? Open in Web Editor NEW

This project forked from opendatacube/odc-tools

0.0 0.0 0.0 4.18 MB

ODC features that DEA is experimenting with or prototyping with the intention of being integrated into odc-core in the future

License: Apache License 2.0

Shell 2.07% Python 69.87% Makefile 0.38% Jupyter Notebook 27.23% Dockerfile 0.47%

odc-tools's Introduction

Test Status codecov

DEA Prototype Code

This repository provides developmental libraries and CLI tools for Open Datacube.

  • AWS S3 tools
  • CLIs for using ODC data from AWS S3 and SQS
  • Utilities for data visualizations in notebooks
  • Experiments on optimising Rasterio usage on AWS S3

Full list of libraries, and install instructions:

  • odc.ui tools for data visualization in notebook/lab
  • odc.io common IO utilities, used by apps mainly
  • odc-cloud[ASYNC,AZURE,THREDDS] cloud crawling support package
    • odc.aws AWS/S3 utilities, used by apps mainly
    • odc.aio faster concurrent fetching from S3 with async, used by apps odc-cloud[ASYNC]
    • odc.{thredds,azure} internal libs for cloud IO odc-cloud[THREDDS,AZURE]

Promoted to their own repositories

  • odc.stats large scale processing framework (Moved to odc-stats)
  • odc.stac STAC to ODC conversion tools (Moved to odc-stac)
  • odc.dscache experimental key-value store where key=UUID, value=Dataset (moved to odc-dscache)

Installation

Libraries and applications in this repository are published to PyPI, and can be installed
with pip like so:

pip install \
  odc-ui \
  odc-stac \
  odc-stats \
  odc-io \
  odc-cloud[ASYNC] \
  odc-dscache

For Conda Users

Some odc-tools are available via conda from the conda-forge channel.

conda install -c conda-forge odc-apps-dc-tools odc-io odc-cloud 

Cloud Tools

Installation

Cloud tools depend on the aiobotocore package, which depends on specific versions of botocore. Another package we use, boto3, also depends on specific versions of botocore. As a result, having both aiobotocore and boto3 in one environment can be a bit tricky. The way to solve this is to install aiobotocore[awscli,boto3] before anything else, which will install compatible versions of boto3 and awscli into the environment.

pip install -U "aiobotocore[awscli,boto3]==1.3.3"
# OR for conda setups
conda install "aiobotocore==1.3.3" boto3 awscli
  1. For cloud (AWS only)
    pip install odc-apps-cloud
    
  2. For cloud (GCP, THREDDS and AWS)
    pip install odc-apps-cloud[GCP,THREDDS]
    
  3. For dc-index-from-tar (indexing to datacube from tar archive)
    pip install odc-apps-dc-tools
    

Apps

  1. s3-find list S3 bucket with wildcard
  2. s3-to-tar fetch documents from S3 and dump them to a tar archive
  3. gs-to-tar search GS for documents and dump them to a tar archive
  4. dc-index-from-tar read yaml documents from a tar archive and add them to datacube

Example:

#!/bin/bash

s3_src='s3://dea-public-data/L2/sentinel-2-nrt/**/*.yaml'

s3-find "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage

Fastest way to list regularly placed files is to use fixed depth listing:

#!/bin/bash

# only works when your metadata is same depth and has fixed file name
s3_src='s3://dea-public-data/L2/sentinel-2-nrt/S2MSIARD/*/*/ARD-METADATA.yaml'

s3-find --skip-check "${s3_src}" | \
  s3-to-tar | \
    dc-index-from-tar --env s2 --ignore-lineage

When using Google Storage:

#!/bin/bash

# Google Storage support
gs-to-tar --bucket data.deadev.com --prefix mangrove_cover
dc-index-from-tar --protocol gs --env mangroves --ignore-lineage metadata.tar.gz

Local Development

The following steps are used in the GitHub Actions workflow main.yml

# build environment from file
mamba env create -f tests/test-env.yml

# this environment name is defined in tests/test-env.yml file
conda activate odc-tools-tests

# install additional packages
./scripts/dev-install.sh --no-deps

# setup database for testing
./scripts/setup-test-db.sh

# run test
echo "Running Tests"
pytest --cov=. \
--cov-report=html \
--cov-report=xml:coverage.xml \
--timeout=30 \
libs apps

# Optional, to delete the environment
conda env remove -n odc-tools-tests

Use conda env update -f <file> to install all needed dependencies for odc-tools libraries and apps.

Conda `environment.yaml` (click to expand)
channels:
  - conda-forge
dependencies:
  # Datacube
  - datacube>=1.8.5

  # odc.dscache
  - python-lmdb
  - zstandard

  # odc.ui
  - ipywidgets
  - ipyleaflet
  - tqdm

  # odc-apps-dc-tools
  - pystac>=1
  - pystac-client>=0.2.0
  - azure-storage-blob
  - fsspec
  - lxml  # needed for thredds-crawler

  # odc.{aio,aws}: aiobotocore/boto3
  #  pin aiobotocore for easier resolution of dependencies
  - aiobotocore==1.3.3
  - boto3

  # eodatasets3 (used by odc-stats)
  - boltons
  - ciso8601
  - python-rapidjson
  - requests-cache
  - ruamel.yaml
  - structlog
  - url-normalize

  # for dev
  - pylint
  - autopep8
  - flake8
  - isort
  - black
  - mypy

  # For tests
  - pytest
  - pytest-httpserver
  - pytest-cov
  - pytest-timeout
  - moto
  - deepdiff

  - pip>=20
  - pip:
      # odc.apps.dc-tools
      - thredds-crawler

      # odc.stats
      - eodatasets3

      # tests
      - pytest-depends

      # odc.ui
      - jupyter-ui-poll

      # odc-tools libs
      - odc-stac
      - odc-ui
      - odc-dscache
      - odc-stats

      # odc-tools CLI apps
      - odc-apps-cloud
      - odc-apps-dc-tools

Release Process

  1. Manually edit {lib,app}/{pkg}/odc/{pkg}/_version.py file to increase version number
  2. Merge changes to the develop branch via a Pull Request
  3. Fast-forward the pypi/publish branch to match develop
  4. Push to GitHub

Steps 3 and 4 can be done by an authorized user with ./scripts/sync-publish-branch.sh script.

Publishing to PyPi happens automatically when changes are pushed to the protected pypi/publish branch. Only members of Open Datacube Admins group have the permission to push to this branch.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.