opendatacube / datacube-core
Open Data Cube analyses continental scale Earth Observation data through time
Home Page: http://www.opendatacube.org
License: Apache License 2.0
The resampling method for the ingester should support the following:
At the moment there is a placeholder field for interpolation in the storage_config.yaml file used to specify the output of an ingestion process; it is currently being ignored by the ingester.
This will be replaced with an option to specify a resampling algorithm, as implemented by GDAL. The allowed options are based on http://www.gdal.org/gdalwarper_8h.html#a4775b029869df1f9270ad554c0633843.
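For illustration, a minimal sketch (not the actual ingester code) of how the configured name might map onto GDAL's resampling constants. The dictionary keys and the `resampling_from_config` helper are assumptions; the `gdal.GRA_*` constants are the GDALResampleAlg options from gdalwarper.h linked above:

```python
# A sketch mapping a config string to a GDAL warp resampling constant.
from osgeo import gdal

RESAMPLING_METHODS = {
    'nearest': gdal.GRA_NearestNeighbour,
    'bilinear': gdal.GRA_Bilinear,
    'cubic': gdal.GRA_Cubic,
    'cubicspline': gdal.GRA_CubicSpline,
    'lanczos': gdal.GRA_Lanczos,
    'average': gdal.GRA_Average,
    'mode': gdal.GRA_Mode,
}

def resampling_from_config(name):
    """Translate the (hypothetical) 'resampling' entry in storage_config.yaml."""
    try:
        return RESAMPLING_METHODS[name]
    except KeyError:
        raise ValueError('Unknown resampling method: %s' % name)
```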
To enable provenance tracking at the per-pixel level, investigate methods that may enable tracing of pixel provenance where the contents of a storage unit have mixed provenance, e.g. Landsat scene overlap.
Dear Team,
I found an issue in the 'set_dimension' function in _gdfnetcdf.py:
I replaced
dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), self.decimal_places)
by
dimension_index_vector = np.around(np.linspace(dimension_min, dimension_max, dimension_config['dimension_elements']), self.decimal_places)
because when using a non-integer step, such as 0.25, the result is often not consistent (the number of elements doesn't match the expected one, e.g. 1001 instead of 1000).
It is better to use linspace for these cases.
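A quick demonstration of the difference (the values below are made up for illustration):

```python
import numpy as np

dimension_min, dimension_max, element_size = 110.0, 150.0, 0.25

# arange computes its length from floating-point arithmetic on the step,
# so the element count can be off by one for non-integer steps.
by_arange = np.arange(dimension_min, dimension_max, element_size)

# linspace takes the element count explicitly and always honours it.
by_linspace = np.linspace(dimension_min, dimension_max, 160, endpoint=False)

print(len(by_arange), len(by_linspace))  # arange's count may not be 160
```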
Kind regards,
Didier
We need PostgreSQL 9.4 running somewhere accessible from raijin.
Storage unit search returns the same storage unit more than once when storage units are time-aggregated and the search query uses dataset fields.
Analytics Expression Language - an intuitive language for performing analytics.
Aggregate all timeslices/storage units into a single storage unit.
Operationally this will be triggered by an operator via the user command line:
Assumes component timeslices exist
Fixed time range: 1 year for Landsat
Input time slices to aggregate will be appropriately tagged to exclude them from analysis / processing.
Define terms relating to the storage unit and datasets - what is a storage unit? what is a config? etc.
The storage write code currently hard-codes dimensions.
Confirm that a storage unit with higher-order dimensionality can be ingested.
As discussed with Alex, most of our configuration handling (such as dataset types, tile/storage types) could be simpler if stored as JSONB documents directly, rather than split across many tables. We've toyed with this in the doc db prototype.
We could allow the user to configure the AGDC (such as adding a new product/dataset type) by specifying json/yaml config documents directly from the command-line, rather than requiring direct database editing (as with AGDCv1).
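A hypothetical sketch of the JSONB approach using SQLAlchemy; the table and column names here are illustrative, not a proposed schema:

```python
# Store the whole user-supplied config document in one JSONB column and
# query inside it, instead of joining across many normalised tables.
from sqlalchemy import Column, Integer, String, Table, MetaData, create_engine
from sqlalchemy.dialects.postgresql import JSONB

metadata = MetaData()

dataset_type = Table(
    'dataset_type', metadata,
    Column('id', Integer, primary_key=True),
    Column('name', String, unique=True),
    Column('definition', JSONB),  # the raw json/yaml config document
)

engine = create_engine('postgresql:///datacube')  # illustrative DSN
metadata.create_all(engine)

with engine.connect() as conn:
    result = conn.execute(
        dataset_type.select().where(
            dataset_type.c.definition['platform'].astext == 'LANDSAT_8'
        )
    )
```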
Search for Storage Units using criteria specified in #11
Search results must include the following Storage Unit information:
Dear Team,
In _gdfnetcdf.py, the instruction at line 155 fails.
I think GDF.DECIMAL_PLACES must be changed to self.decimal_places
Replace:
dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), GDF.DECIMAL_PLACES)
by:
dimension_index_vector = np.around(np.arange(dimension_min, dimension_max, element_size), self.decimal_places)
Kind regards,
Investigate available compression algorithms and specify a compression rate to optimise data access performance.
Produce compression ratio versus access speed results for different algorithm parameters (NaN percentages); ignore storage file size as a constraint for now. Use image data, not randomly generated data.
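A minimal benchmarking sketch, assuming NetCDF4/HDF5 zlib ("deflate") compression. `load_image_band()` is a hypothetical loader - per the note above, real image data should be used rather than random arrays:

```python
import os
import time
import numpy as np
import netCDF4

def benchmark(data, complevel):
    """Write `data` at the given deflate level; return (file size, read time)."""
    fname = 'bench_deflate%d.nc' % complevel
    with netCDF4.Dataset(fname, 'w') as nc:
        nc.createDimension('y', data.shape[0])
        nc.createDimension('x', data.shape[1])
        var = nc.createVariable('band', data.dtype, ('y', 'x'),
                                zlib=True, complevel=complevel,
                                chunksizes=(256, 256))  # assumes data >= 256x256
        var[:] = data
    size = os.path.getsize(fname)
    start = time.time()
    with netCDF4.Dataset(fname) as nc:
        _ = nc.variables['band'][:]
    return size, time.time() - start

data = load_image_band()  # hypothetical: a 2D numpy array of real image data
for level in (1, 4, 9):
    size, read_time = benchmark(data, level)
    print(level, size, read_time)
```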
Create a set of:
Include GA NBAR and PQ for filter query examples
Interface between the Data Access API and the AE/EE.
As per GDF with the following modifications:
get_descriptor:
storage_units has storage_max, storage_min and storage_shape. storage_path is to be added.
get_data:
returned data numpy arrays are now xray.DataArrays.
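A hypothetical sketch of the modified get_data return shape; the keys, dimension names and placeholder values are illustrative only. The point is that pixel data comes back wrapped in xray.DataArray objects (xray is the package later renamed xarray) rather than plain numpy arrays:

```python
import numpy as np
import xray

def get_data(query):
    # Placeholder for data actually read from storage units.
    values = np.zeros((2, 400, 400))
    arrays = {
        'band_30': xray.DataArray(values,
                                  dims=['time', 'latitude', 'longitude']),
    }
    return {'arrays': arrays}
```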
Enable read for package input (YAML and packaged data from data preparation process). Enable tile and reproject for granule/scene dataset inputs.
The Storage Unit Access method returns an out-of-tile nD array, or an object that wraps one.
get_descriptor & get_data requirements are fulfilled.
Further documentation is required to explicitly define the requirement.
The transfer of the agdc v2 repository from the agdc-research-trial organisation to the data-cube organisation has broken travis-ci.org integration. The newly located repository doesn't display on the profile page where travis configures the github organisation. Pressing sync to update the profile with github fails to change this.
I've emailed the travis-ci contact address in an effort to find out what has gone wrong. Judging by a google search, this happens from time to time.
For cases where the storage unit only needs to be populated for a subset extent of the input data.
Example: Himawari 8 datasets have data for the entire globe, but we might only want to ingest and store data covering Australia.
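A minimal sketch of the subset test, assuming shapely; the bounding box around Australia below is approximate and purely illustrative:

```python
from shapely.geometry import box

# Hypothetical ingestion bounds: lon/lat box roughly covering Australia.
INGEST_BOUNDS = box(110.0, -45.0, 155.0, -10.0)

def should_ingest(dataset_extent):
    """Only ingest datasets whose footprint intersects the configured bounds."""
    return INGEST_BOUNDS.intersects(dataset_extent)

# e.g. a Himawari 8 full-disk footprint intersects the bounds, but only the
# overlapping subset extent would be written into storage units.
```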
User-side tool for writing configurations
This will allow for many useful features:
Investigate inclusion of GDAL tags to support GDAL reads of storage units
Reprojection from ingest projection to new projection
As reported by Josh @sixy6e:
The spatial metadata in the sample files is a little off. Here's an ipython notebook documenting the problem.
https://github.com/sixy6e/my_code/blob/master/Python/notebooks/nc_metadata_tests.ipynb
The current spatial extents exclude the east-most and south-most lines of pixels, instead of including them. Correcting this should fix both the extents and the pixel sizes.
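A small sketch of the likely off-by-one: the outer edge of the extent must come from origin + n_pixels * pixel_size, not (n_pixels - 1). The values below are illustrative, assuming a north-up image with the origin at the upper-left corner:

```python
origin_x, origin_y = 110.0, -28.0  # upper-left corner of the storage unit
pixel_size = 0.00025
n_cols, n_rows = 4000, 4000

# Wrong: stops at the *start* of the last pixel column/row, excluding the
# east-most and south-most lines of pixels.
bad_east = origin_x + (n_cols - 1) * pixel_size
bad_south = origin_y - (n_rows - 1) * pixel_size

# Right: extents cover the full last pixel, which also fixes pixel sizes
# derived as (east - west) / n_cols.
east = origin_x + n_cols * pixel_size
south = origin_y - n_rows * pixel_size
```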
Index storage units to allow searching based on:
Index input datasets to allow searching based on:
Locations are named URI 'base paths'. For example, a 'gdata' location can point to file:///g/data. For now, only file system locations are required; in the future, web and S3 locations could be supported.
see 59c1fbe for a potential config file solution
Implement an application (or applications) demonstrating the Analytical Engine (#22)
Something involving simple band maths and/or statistics over time
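A hypothetical sketch of the sort of application intended: simple band maths (NDVI) followed by a statistic over time. The random arrays below are placeholders for data returned by the Data Access API:

```python
import numpy as np
import xray

dims = ['time', 'latitude', 'longitude']
red = xray.DataArray(np.random.rand(4, 400, 400), dims=dims)
nir = xray.DataArray(np.random.rand(4, 400, 400), dims=dims)

ndvi = (nir - red) / (nir + red)   # per-pixel band maths
ndvi_mean = ndvi.mean(dim='time')  # statistic over the time dimension
```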
Investigate how DataCube is going to support ingesting the following datasets:
For common versions of datacube databases on common infrastructure, enable query and data access using more than one database/datastore.
User story: user has a local datacube implementation (datacube 1) but wants to use data from a public instance (datacube 2) in the query.
Combine multiple storage units into one by stacking the data along a specified dimension.
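A minimal sketch with xray: open each storage unit (one NetCDF file per timeslice) and stack them along the time dimension. The file paths are illustrative:

```python
import xray

paths = ['LS5_TM_2010_150_-34.nc', 'LS5_TM_2011_150_-34.nc']
units = [xray.open_dataset(p) for p in paths]

# Concatenate the per-timeslice datasets along the chosen dimension.
stacked = xray.concat(units, dim='time')
stacked.to_netcdf('LS5_TM_2010-2011_150_-34.nc')
```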
Dear Team,
In _gdfnetcdf.py, the function georeference_from_file fails because of the invalid (?) attribute name 'name' in crs_metadata:
```python
...
crs_metadata = {'crs:name': spatial_reference.GetAttrValue('geogcs'),
                'crs:longitude_of_prime_meridian': 0.0,  # TODO: This needs to be fixed!!! An OSR object should have this, but maybe only for specific OSR references??
                'crs:inverse_flattening': spatial_reference.GetInvFlattening(),
                'crs:semi_major_axis': spatial_reference.GetSemiMajor(),
                'crs:semi_minor_axis': spatial_reference.GetSemiMinor(),
                }
self.set_variable('crs', dims=(), dtype='i4')
self.set_attributes(crs_metadata)
...
```
Exception raised while processing storage unit (2015, -28, 111): 'name' is one of the reserved attributes ('_grpid', '_grp', '_varid', 'groups', 'dimensions', 'variables', 'dtype', 'data_model', 'disk_format', '_nunlimdim', 'path', 'parent', 'ndim', 'mask', 'scale', 'cmptypes', 'vltypes', '_isprimitive', 'file_format', '_isvlen', '_iscompound', '_cmptype', '_vltype', 'name', 'orthogonal_indexing', 'keepweakref'), cannot rebind. Use setncattr instead.
I replaced
crs_metadata = {'crs:name': spatial_reference.GetAttrValue('geogcs'),
by
crs_metadata = {'crs:standard_name': spatial_reference.GetAttrValue('geogcs'),
or
crs_metadata = {'crs:long_name': spatial_reference.GetAttrValue('geogcs'),
as a workaround
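Alternatively, as the exception message itself suggests, the clash with the reserved attribute name can be avoided by setting it via setncattr. A sketch, assuming crs_variable is the 'crs' variable created by the set_variable call above:

```python
# Sets the netCDF attribute through the C layer directly, sidestepping
# python-netCDF4's reserved Python attribute names such as 'name'.
crs_variable.setncattr('name', spatial_reference.GetAttrValue('geogcs'))
```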
Kind regards,
README.md currently has the agdc-research-trial documentation and this needs to be replaced with the AGDC v2 description and logo
When setting up a datacube collection, we need to record collection-level metadata like:
These data need to be loaded from a configuration file into the database and made available to the storage unit writer. The NetCDF files we are writing now won't pass NCI validation without these pieces of metadata.
At the moment this sort of data is in the mapping documents, but probably doesn't belong there. Where should it go and what do we need to get it passed around?
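As a sketch of the writer end, the collection-level metadata could be written as NetCDF global attributes. The attribute names below follow common CF/ACDD conventions and the values are illustrative, not the actual NCI requirements:

```python
import netCDF4

with netCDF4.Dataset('storage_unit.nc', 'a') as nc:
    # Global attributes carrying the collection-level metadata.
    nc.setncatts({
        'title': 'Example AGDC storage unit',
        'institution': 'Geoscience Australia',
        'source': 'Landsat surface reflectance',
        'license': 'CC BY 4.0',
    })
```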
Use results returned by the Search API (#13) and provide: