opendatacube / datacube-core

488 stars · 54 watchers · 175 forks · 46.73 MB

Open Data Cube analyses continental scale Earth Observation data through time

Home Page: http://www.opendatacube.org

License: Apache License 2.0

Python 99.61% Shell 0.21% Dockerfile 0.12% Makefile 0.06%
python gis scientific-computing remote-sensing netcdf numpy raster gdal hacktoberfest

datacube-core's People

Contributors

alex-ip, alexgleith, andrewdhicks, ariana-b, awalshie, bellemae, benjimin, dependabot[bot], gypsybojangles, harshurampur, jeremyh, kirill888, mpaget, omad, petewa, pindge, pre-commit-ci[bot], richardscottoz, robbibt, rowanwins, rtaib, simonaoliver, snowman2, spacemanpaul, spaxe, uchchwhash, v0lat1le, whatnick, woodcockr, zhang01ga

datacube-core's Issues

Specify Resampling Algorithm when Ingesting

The resampling method for the ingester should support the following:

  • near: nearest neighbour resampling (default, fastest algorithm, worst interpolation quality).
  • bilinear: bilinear resampling.
  • cubic: cubic resampling.
  • cubicspline: cubic spline resampling.

At the moment the storage_config.yaml file used to specify the output of an ingestion process has a placeholder for interpolation, but it is currently being ignored by the ingester.

This will be replaced with an option to specify a resampling algorithm, as implemented by GDAL. The allowed options are based on http://www.gdal.org/gdalwarper_8h.html#a4775b029869df1f9270ad554c0633843.
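
A minimal sketch of how a configured method name could map onto GDAL's warp resampling constants (the mapping and the resolve_resampling helper are illustrative, not the ingester's actual code):

    from osgeo import gdal

    # Illustrative mapping from the configured method name to GDAL's warp
    # resampling constants; the real option name and plumbing may differ.
    RESAMPLING_METHODS = {
        'near': gdal.GRA_NearestNeighbour,
        'bilinear': gdal.GRA_Bilinear,
        'cubic': gdal.GRA_Cubic,
        'cubicspline': gdal.GRA_CubicSpline,
    }

    def resolve_resampling(name='near'):
        """Return the GDAL resampling constant for a configured method name."""
        try:
            return RESAMPLING_METHODS[name]
        except KeyError:
            raise ValueError('Unsupported resampling method: %r' % name)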

Enable per pixel metadata/provenance tracking

To enable provenance tracking at the per-pixel level, investigate methods that may enable tracing of pixel provenance where the contents of a storage unit have mixed provenance, e.g. Landsat scene overlap.
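
One possible layout, purely as a sketch: a per-pixel provenance band holding an index into the storage unit's list of source dataset IDs (all names, IDs and shapes below are made up):

    import numpy as np

    # Purely illustrative: a per-pixel provenance band holding a small integer
    # index into the storage unit's list of source dataset IDs, so overlapping
    # Landsat scenes can be traced back from individual pixels.
    source_dataset_ids = ['scene-A-id', 'scene-B-id']      # hypothetical IDs
    provenance = np.zeros((4000, 4000), dtype='uint8')     # 0 -> scene A everywhere
    provenance[:, 2000:] = 1                               # 1 -> scene B in the overlap

    y, x = 100, 2500
    print(source_dataset_ids[provenance[y, x]])            # 'scene-B-id'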

_gdfnetcdf.py - NetCDF - Set dimension issue

Dear Team,

I found an issue in the 'set_dimension' function in _gdfnetcdf.py:
I replaced

            dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), self.decimal_places)

by

            dimension_index_vector = np.around(np.linspace(dimension_min, dimension_max, dimension_config['dimension_elements']), self.decimal_places)

because when using a non-integer step, such as 0.25, the result is often not consistent: the number of elements doesn't match the expected one, e.g. 1001 instead of 1000.

It is better to use linspace for these cases.
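
A quick illustration of the arange pitfall with a floating-point step, and why linspace avoids it:

    import numpy as np

    # With a floating-point step, arange can return one element more than
    # expected because the stop value is compared after rounding error.
    print(len(np.arange(0.0, 0.9, 0.3)))   # 4, not the expected 3
    print(np.arange(0.0, 0.9, 0.3))        # last element is ~0.8999999999999999

    # linspace takes the element count explicitly, so the length is exact.
    print(len(np.linspace(0.0, 0.9, 3)))   # 3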

Kind regards,

Didier

Storage unit temporal aggregator tool

Aggregate all timeslices/storage units to single storage unit

  1. Query database
  2. Return single aggregated unit
  3. Update DB

Operationally this will be triggered by an operator via the command line:
Assumes the component timeslices exist.
Fixed time range: 1 year for Landsat.
Input time slices that are aggregated will be appropriately tagged so they can be excluded from analysis / processing.

ingest data of >4 dimensions

The storage write code currently hard-codes its dimensions.
Confirm that a storage unit with higher-order dimensionality can be ingested.
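
A minimal check, assuming plain netCDF4 and made-up dimension names, that a file with five dimensions can be written:

    import numpy as np
    from netCDF4 import Dataset

    # Illustrative 5-D storage unit: time, level, polarisation, y, x.
    with Dataset('five_dim_test.nc', 'w') as nc:
        for name, size in [('time', 2), ('level', 3), ('polarisation', 2),
                           ('y', 100), ('x', 100)]:
            nc.createDimension(name, size)
        var = nc.createVariable('measurement', 'f4',
                                ('time', 'level', 'polarisation', 'y', 'x'),
                                zlib=True)
        var[:] = np.zeros((2, 3, 2, 100, 100), dtype='float32')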

Ability to load storage and ingest configuration via command-line

As discussed with Alex, most of our configuration handling (such as dataset types, tile/storage types) could be simpler if stored as JSONB documents directly, rather than split across many tables. We've toyed with this in the doc db prototype.

We could allow the user to configure the AGDC (such as adding a new product/dataset type) by specifying json/yaml config documents directly from the command-line, rather than require direct database editing (as with AGDCv1).
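
A rough sketch of what such a command could look like; the argument name and the echo-only behaviour are placeholders, not the real tool:

    import argparse
    import json
    import yaml   # PyYAML

    # Hypothetical helper: read a yaml/json product definition from the command
    # line and hand the parsed document to the index as a JSONB document.
    def main():
        parser = argparse.ArgumentParser(description='Add a product/dataset type')
        parser.add_argument('config', help='Path to a yaml or json config document')
        args = parser.parse_args()

        with open(args.config) as f:
            doc = yaml.safe_load(f)          # safe_load also parses JSON

        # In the real tool this would be inserted into the index database;
        # here we just echo the document that would be stored.
        print(json.dumps(doc, indent=2))

    if __name__ == '__main__':
        main()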

Search API for minimal implementation

Search for Storage Units using criteria specified in #11

Search results must include the following Storage Unit information:

  • URI
  • projection
  • dimensions (assume cf conventions + labels)
  • variables/measurements
  • coordinate extents
  • number of measurements along each dimension (coordinate length?)
  • DatasetID (provenance tracking)
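
One possible shape for a single search result, with keys mirroring the list above and entirely made-up values:

    # Illustrative shape of a single search result; the keys mirror the list
    # above and are not the agreed API.
    storage_unit_result = {
        'uri': 'file:///g/data/example/LS5_TM_NBAR_example.nc',
        'projection': 'EPSG:3577',
        'dimensions': ('time', 'y', 'x'),               # CF conventions + labels
        'variables': ('band_1', 'band_2', 'band_3'),    # measurements
        'coordinate_extents': {'time': ('2010-01-01', '2010-12-31'),
                               'y': (-4000000.0, -3900000.0),
                               'x': (1500000.0, 1600000.0)},
        'coordinate_lengths': {'time': 23, 'y': 4000, 'x': 4000},
        'dataset_ids': ('example-dataset-id',),         # provenance tracking
    }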

_gdfnetcdf.py - Invalid GDF.DECIMAL_PLACES

Dear Team,

In _gdfnetcdf.py, the instruction at line 155 fails.

I think GDF.DECIMAL_PLACES must be replaced with self.decimal_places.

Replace:
dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), GDF.DECIMAL_PLACES)

by:
dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), self.decimal_places)

Kind regards,

Determine reasonable storage compression for storage units

Investigate available compression algorithms and specify a compression rate to optimise data access performance.

Produce compression ratio versus access speed figures for different algorithm parameters and NaN percentages; ignore storage file size as a constraint for now. Use image data, not randomly generated data.
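
A rough benchmarking sketch using netCDF4's built-in zlib deflate levels; the array, chunk sizes and timing approach are illustrative only:

    import os
    import time

    import numpy as np
    from netCDF4 import Dataset

    # Stand-in array only; the real investigation should use image data with
    # representative NaN percentages, not randomly generated values.
    data = np.random.rand(10, 1000, 1000).astype('float32')

    for complevel in (1, 4, 9):
        path = 'compress_test_%d.nc' % complevel
        with Dataset(path, 'w') as nc:
            nc.createDimension('time', data.shape[0])
            nc.createDimension('y', data.shape[1])
            nc.createDimension('x', data.shape[2])
            var = nc.createVariable('band', 'f4', ('time', 'y', 'x'),
                                    zlib=True, complevel=complevel,
                                    chunksizes=(1, 500, 500))
            var[:] = data

        start = time.time()
        with Dataset(path) as nc:
            _ = nc.variables['band'][0, :, :]          # read one timeslice back
        print(complevel, os.path.getsize(path), time.time() - start)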

Data Access API interface as per GDF interface.

Interface between the Data Access API and the AE/EE.
As per GDF with the following modifications:

get_descriptor:
storage_units has storage_max, storage_min and storage_shape. storage_path is to be added.

get_data:
the returned numpy data arrays are now xray.DataArrays.
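
An illustrative fragment of what the modified get_descriptor entry for one storage unit might contain; the keys follow the description above and the values are invented:

    # Illustrative get_descriptor fragment for a single storage unit; only the
    # keys named above are shown and the values are made up.
    descriptor_entry = {
        'storage_min': (1293840000, -4000000.0, 1500000.0),   # (time, y, x) minima
        'storage_max': (1325376000, -3900000.0, 1600000.0),   # (time, y, x) maxima
        'storage_shape': (23, 4000, 4000),                    # elements per dimension
        'storage_path': '/g/data/example/LS5_TM_NBAR_example.nc',  # to be added
    }

    # get_data now returns labelled arrays rather than bare numpy arrays, e.g.
    #   result['arrays']['band_1']  ->  xray.DataArray with time/y/x coordinates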

enable spatial aggregation of storage units

The Storage Unit access method returns an out-of-tile nD array, or an object that wraps one.
The get_descriptor & get_data requirements are fulfilled.
Further documentation is required to explicitly define the requirement.

Travis-CI integration is broken - agdc-v2 repo is not showing up in travis and cannot be enabled

The transfer of the agdc v2 repository from the agdc-research-trial organisation to the data-cube organisation has broken travis-ci.org integration. The newly relocated repository doesn't appear on the profile page where Travis configures the GitHub organisation, and pressing "Sync" to update the profile with GitHub fails to change this.

I've emailed the travis-ci contact address in an effort to find out what has gone wrong. Judging by a Google search, this happens from time to time.

subset from input dataset at ingest

For cases where the storage unit only needs to be populated for a subset extent of the input data.

Example: Himawari 8 datasets have data for the entire globe, but we might only want to ingest and store data covering Australia.
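
A minimal sketch of the idea using xarray (xray's successor); the input file, coordinate names and lat/lon bounds are illustrative:

    import xarray as xr

    # Hypothetical full-globe input with ascending longitude and descending
    # latitude coordinates.
    src = xr.open_dataset('himawari8_full_disk_example.nc')

    # Illustrative bounding box covering Australia; only this subset would be
    # ingested and stored.
    subset = src.sel(longitude=slice(110.0, 155.0),
                     latitude=slice(-10.0, -45.0))   # descending latitude axis
    subset.to_netcdf('himawari8_australia_subset.nc')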

Group-based permissions & finer-grained authentication

  • Use of individual user accounts rather than a single shared user/password.
  • But users should still be able to "module load" the api without any further configuration (editing conf files).
    • AGDCv1 did this via a shared user and password hard-coded into the module, but this isn't ideal. It would be preferable to use existing environmental user accounts instead.
      • Look at PAM, LDAP, ident usage within NCI? All are built into Postgres and easy to configure.
  • Postgres grants should all be to group roles, not user roles.

This will allow for many useful features:

  • Logging of per-user actions: such as who ingested a dataset
  • More fine-grained access control (who can ingest, who can administer, who can query).
  • Minimise password management by using existing systems.

Add ability to configure 'locations'

Locations are named URI 'base paths'. For example, a gdata location can 'point to' file:///g/data. For now only file-system locations are required; in the future, web and S3 locations could be supported.

see 59c1fbe for a potential config file solution
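
A sketch of how named locations might be resolved in code; the mapping and helper below are illustrative and not the config format in 59c1fbe:

    # Illustrative only: named location 'base paths' and a resolver that turns
    # a location-relative path into a full URI.
    LOCATIONS = {
        'gdata': 'file:///g/data',
        # 'web': 'https://example.org/datacube',   # possible future location types
        # 's3':  's3://example-bucket/datacube',
    }

    def resolve(location, relative_path):
        """Join a named location's base URI with a relative path."""
        base = LOCATIONS[location].rstrip('/')
        return '%s/%s' % (base, relative_path.lstrip('/'))

    print(resolve('gdata', 'rs0/scenes/example_scene.nc'))
    # file:///g/data/rs0/scenes/example_scene.nc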

Ingest Landsat Datasets

Tie together the following functionality

  • Index the dataset (see #10 )
  • Generate Storage Units (see #9 )
  • Index Storage Units (see #11 )

to ingest Geoscience Australia (in the first instance) LS 5, 7 and 8 L1T, PQ, NBAR and FC packaged (eo-datasets) products into the specified storage format on demand.

Enable query across multiple AGDC database instances of equal version

For common versions of datacube databases on common infrastructure - enable query and data access using more than one database/datastore.

User story: user has a local datacube implementation (datacube 1) but wants to use data from a public instance (datacube 2) in the query.

Aggregate tiles to multi-time storage units

Combine multiple storage units into one by stacking the data along a specified dimension (see the sketch after the list below).

  • Input storage units must not be modified
  • Coordinates along the stacked dimension must be sorted in the combined SU
  • Fail if the input storage units do not align perfectly, i.e. do not pad data with NDVs
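
A minimal sketch of the stacking step using xarray (xray's successor); the file names, alignment check and concat arguments are illustrative:

    import xarray as xr

    # Hypothetical single-time storage units to be combined along 'time'.
    paths = ['su_2010_01.nc', 'su_2010_02.nc', 'su_2010_03.nc']
    units = [xr.open_dataset(p) for p in paths]            # inputs are not modified

    # Fail loudly if the spatial grids do not align perfectly (no NDV padding).
    first = units[0]
    for other in units[1:]:
        if not (first['x'].equals(other['x']) and first['y'].equals(other['y'])):
            raise ValueError('Input storage units do not align; refusing to pad')

    combined = xr.concat(units, dim='time').sortby('time')  # sorted along the stacked dim
    combined.to_netcdf('su_2010_q1_combined.nc')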

_gdfnetcdf.py - NetCDF - SetAttributes failed

Dear Team,

In _gdfnetcdf.py, the function georeference_from_file fails because of an invalid (?) attribute name 'name' in crs_metadata:

...
crs_metadata = {'crs:name': spatial_reference.GetAttrValue('geogcs'),
                'crs:longitude_of_prime_meridian': 0.0,  # TODO: This needs to be fixed!!! An OSR object should have this, but maybe only for specific OSR references??
                'crs:inverse_flattening': spatial_reference.GetInvFlattening(),
                'crs:semi_major_axis': spatial_reference.GetSemiMajor(),
                'crs:semi_minor_axis': spatial_reference.GetSemiMinor(),
                }
self.set_variable('crs', dims=(), dtype='i4')
self.set_attributes(crs_metadata)
...

Exception raised while processing storage unit (2015, -28, 111): 'name' is one of the reserved attributes ('_grpid', '_grp', '_varid', 'groups', 'dimensions', 'variables', 'dtype', 'data_model', 'disk_format', '_nunlimdim', 'path', 'parent', 'ndim', 'mask', 'scale', 'cmptypes', 'vltypes', '_isprimitive', 'file_format', '_isvlen', '_iscompound', '_cmptype', '_vltype', 'name', 'orthogoral_indexing', 'keepweakref'), cannot rebind. Use setncattr instead.

I replaced
crs_metadata = {'crs:name': spatial_reference.GetAttrValue('geogcs'),
by
crs_metadata = {'crs:standard_name': spatial_reference.GetAttrValue('geogcs'),
or
crs_metadata = {'crs:long_name': spatial_reference.GetAttrValue('geogcs'),

as a workaround
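
The underlying netCDF4 behaviour, and the fix the error message itself suggests, can be shown in a small standalone example (not the _gdfnetcdf.py code):

    from netCDF4 import Dataset

    with Dataset('crs_attr_test.nc', 'w') as nc:
        crs = nc.createVariable('crs', 'i4', ())

        # crs.name = 'GDA94'                  # fails: 'name' is a reserved attribute
        crs.setncattr('name', 'GDA94')        # works: bypasses the reserved list
        crs.setncattr('semi_major_axis', 6378137.0)

        print(crs.getncattr('name'))          # 'GDA94'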

Kind regards,

Add collection metadata to storage units

When setting up a datacube collection, we need to record collection level metadata like:

  • title: Experimental Data files From the Australian Geoscience Data Cube v2 Development - DO NOT USE
  • summary: These files are experimental, short lived, and the format will change.
  • source: This data is a reprojection and retile of Landsat surface reflectance scene data available from /g/data/rs0/scenes/
  • product_version: 0.0.0
  • license: Creative Commons Attribution 4.0 International CC BY 4.0

These data need to be loaded from a configuration file into the database and made available to the storage unit writer. The NetCDF files we are writing now won't pass NCI validation without these pieces of metadata.

At the moment this sort of data is in the mapping documents, but probably doesn't belong there. Where should it go and what do we need to get it passed around?
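
For reference, a minimal sketch of a writer attaching these fields as NetCDF global attributes; the file name and write mode are illustrative, and in practice the attributes would be set on the storage unit being written:

    from netCDF4 import Dataset

    # Illustrative collection-level metadata, e.g. loaded from a config file.
    collection_metadata = {
        'title': 'Experimental Data files From the Australian Geoscience Data Cube v2 Development - DO NOT USE',
        'summary': 'These files are experimental, short lived, and the format will change.',
        'source': 'Reprojection and retile of Landsat surface reflectance scene data from /g/data/rs0/scenes/',
        'product_version': '0.0.0',
        'license': 'Creative Commons Attribution 4.0 International CC BY 4.0',
    }

    with Dataset('storage_unit_example.nc', 'w') as nc:
        # Written as global attributes so the file carries the collection metadata.
        for key, value in collection_metadata.items():
            nc.setncattr(key, value)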

Storage Access API - retrieve data from minimal build

Use results returned by the Search API (#13) and provide:

  • construct analysis array elements from storage units
  • labeled data (with xray)
  • lazy loading to facilitate out-of-core processing (with dask, for example)
  • group data by inter-operability
    • same projection
    • same coordinate extents
    • same 'resolution'
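
A sketch of the lazy-loading part using xarray (xray's successor) with dask installed; the file pattern, chunk sizes and the analysis itself are illustrative:

    import xarray as xr

    # Open several storage units lazily; dask chunking means data are only read
    # when an analysis actually computes on them (out-of-core processing).
    ds = xr.open_mfdataset('LS5_TM_NBAR_example_*.nc',
                           chunks={'time': 1, 'y': 1000, 'x': 1000})

    print(ds['band_1'])          # a labelled, lazily evaluated DataArray
    subset = ds['band_1'].sel(x=slice(1500000, 1550000), y=slice(-3950000, -3900000))
    mean = subset.mean(dim='time').compute()   # data are read and reduced here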
