DimensionalData

Add named dimensions to Julia arrays and other types. This is a work in progress under active development; it may be a while before the interface stabilises and things are fully documented.

DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions with positional values. It is a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, plus additional functionality found in NamedDims.jl. It has similar goals to Python's xarray, and is primarily written for use with spatial data in GeoData.jl.

Dimensions

The core component is the AbstractDimension and the types that inherit from it, such as Time, X, Y and Z, the generic Dim{:x}, or others you define with the @dim macro.

Dims can be used for indexing and views without knowing the dimension order, as in a[X(20)] or view(a, X(1:20), Y(30:40)). In Julia Base and Statistics functions that take a dims argument, they can also indicate which dimensions to reduce, as in mean(a, dims=Time), or how to permute, as in permutedims(a, [X, Y, Z, Time]).
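As a rough illustration of why dimension types remove the need to know axis order, here is a minimal sketch in plain Julia with no packages. The names AbstractDim, ToyDimArray and toindices are hypothetical and are not DimensionalData.jl's API; the real package handles many more cases.

```julia
# Hypothetical toy types, not DimensionalData.jl's implementation:
# each dimension is its own type, wrapping the index it selects.
abstract type AbstractDim end
struct X{T} <: AbstractDim; val::T; end
struct Y{T} <: AbstractDim; val::T; end

struct ToyDimArray{A<:AbstractArray,O<:Tuple}
    data::A
    order::O  # dimension type of each axis, e.g. (X, Y)
end

# For each axis, take the index from the dim of matching type, or Colon()
# if that dim was not passed, so callers never need the axis order.
function toindices(order::Tuple, dims::Tuple)
    map(order) do D
        i = findfirst(d -> d isa D, dims)
        i === nothing ? Colon() : dims[i].val
    end
end

Base.getindex(a::ToyDimArray, dims::AbstractDim...) = a.data[toindices(a.order, dims)...]
Base.view(a::ToyDimArray, dims::AbstractDim...) = view(a.data, toindices(a.order, dims)...)

a = ToyDimArray(reshape(collect(1:6), 2, 3), (X, Y))
a[Y(3), X(1)]    # 5, same as a[X(1), Y(3)]
a[X(2)]          # [2, 4, 6]
view(a, Y(2:3))  # view of columns 2 and 3
```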

Selectors

Selectors can be used in getindex, setindex! and view to select the indices matching the passed-in value(s):

  • At(x) : get indices exactly matching the passed in value(s)
  • Near(x) : get the closest indices to the passed in value(s)
  • Between(a, b) : get all indices between two values (inclusive)

It's easy to add your own custom Selector if you need different behaviour.

Example usage:

a[Between(lo, hi), At(2)]
a[Time<|Between(lo, hi)]
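A selector only has to turn a value into integer indices along one dimension. The sketch below assumes that simple model; the types and the sel2indices function are hypothetical stand-ins, not DimensionalData.jl's own At, Near and Between, and Where is an invented custom selector showing that new behaviour is just one more type plus one more method.

```julia
# Hypothetical toy selectors; DimensionalData.jl's own selectors differ.
abstract type Selector end
struct At{T} <: Selector; val::T; end
struct Near{T} <: Selector; val::T; end
struct Between{T} <: Selector; lo::T; hi::T; end

# Convert a selector into integer indices along one dimension's index values.
function sel2indices(index::AbstractVector, s::At)
    i = findfirst(==(s.val), index)
    i === nothing && throw(ArgumentError("no index value equal to $(s.val)"))
    return i
end
sel2indices(index::AbstractVector, s::Near) = argmin(abs.(index .- s.val))
sel2indices(index::AbstractVector, s::Between) = findall(x -> s.lo <= x <= s.hi, index)

# A custom selector: a new type plus one new method (hypothetical example).
struct Where{F} <: Selector; f::F; end
sel2indices(index::AbstractVector, s::Where) = findall(s.f, index)

times = 0.0:0.5:10.0
sel2indices(times, At(2.0))            # 5
sel2indices(times, Near(2.3))          # 6, the index of 2.5
sel2indices(times, Between(1.0, 2.0))  # [3, 4, 5]
sel2indices(times, Where(>(9.0)))      # [20, 21]
```

Because each selector is its own type, adding one never requires editing the existing methods.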

Methods where dims containing indices or Selectors can be used:

  • getindex
  • setindex!
  • view

Methods where dims can be used instead of integer dims, either as an instance X() or just the type X:

  • size
  • axes
  • permutedims
  • mapslices
  • eachslice
  • reverse
  • dropdims
  • reduce
  • mapreduce
  • sum
  • prod
  • maximum
  • minimum
  • mean
  • std
  • var
  • cor
  • cov
  • median

Example usage:

size(a, Time)

mean(a, dims=X)
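Functions such as size and mean only need the dimension's integer position, so one way to support them is to look that position up from the dimension type and forward to the Base or Statistics method. This is a sketch under that assumption; ToyDimArray, dimnum and DimArg are hypothetical names, not the package's internals.

```julia
using Statistics  # stdlib, provides mean(A; dims)

# Hypothetical toy dimension types and wrapper, for illustration only.
abstract type AbstractDim end
struct X <: AbstractDim end
struct Y <: AbstractDim end
struct Time <: AbstractDim end

struct ToyDimArray{A<:AbstractArray,O<:Tuple}
    data::A
    order::O  # dimension type of each axis, e.g. (X, Y, Time)
end

# Integer position of a dimension, given as a type (Time) or an instance (Time()).
function dimnum(order::Tuple, D::Type{<:AbstractDim})
    i = findfirst(==(D), order)
    i === nothing && throw(ArgumentError("dimension $D not found"))
    return i
end
dimnum(order::Tuple, d::AbstractDim) = dimnum(order, typeof(d))

const DimArg = Union{AbstractDim,Type{<:AbstractDim}}

# Forward to the integer-dims methods in Base and Statistics.
Base.size(a::ToyDimArray, d::DimArg) = size(a.data, dimnum(a.order, d))
Statistics.mean(a::ToyDimArray; dims::DimArg) = mean(a.data; dims=dimnum(a.order, dims))

a = ToyDimArray(reshape(collect(1.0:24.0), 2, 3, 4), (X, Y, Time))
size(a, Time)                # 4
size(a, Y())                 # 3
mean(a; dims=Time)[1, 1, 1]  # 10.0, the mean of 1.0, 7.0, 13.0, 19.0
```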

For package developers

Goals:

  • Maximum extensibility: always use method dispatch. Regular types over special syntax. Recursion over @generated.
  • Flexibility: dims and selectors are parametric types with multiple uses
  • Abstraction: never dispatch on concrete types; maximise re-usability of methods.
  • Clean, readable syntax: minimise required parentheses, minimise the number of exported methods, and instead extend Base methods whenever possible.
  • Minimal interface: implementing a dimension-aware type should be easy.
  • Functional style: structs are always rebuilt, and other than the array data, fields are not mutated in place.
  • Least surprise: everything works the same as in Base, but with named dims. If a method accepts numeric indices or dims=X in Base, you should be able to use DimensionalData.jl dims.
  • Type stability: dimensional methods should be type stable more often than Base methods
  • Zero-cost dimensional indexing of a single value, as in a[Y(4), X(5)].
  • Low-cost indexing for range getindex and views: these can't be zero cost, as dim ranges have to be updated.
  • Plotting is easy: data should plot sensibly and correctly with useful labels, even after transformations using dims or indices.
  • Prioritise spatial data: other use cases are a free bonus of the modular approach.

Why this package

Why not AxisArrays.jl or NamedDims.jl?

Structure

Both AxisArrays and NamedDims use concrete types for dispatch on arrays, and AxisArrays also uses the concrete Axis type for dimensions. This makes them hard to extend.

It's a little easier with DimensionalData.jl. You can inherit from AbstractDimensionalArray, or just implement dims and rebuild methods. Dims and selectors in DimensionalData.jl are also extensible. Recursive primitive methods let you insert methods for whatever extra types you want to add. @generated is only used to match and permute arbitrary tuples of types, and contains no type-specific details. By contrast, the @generated functions in AxisArrays internalise axis/index conversion behaviour, preventing extension in external packages and scripts.
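To show what "recursion over @generated" can look like, the sketch below reorders a tuple of dimension values to match a target order by peeling one element off a tuple per recursive call. It is a hypothetical toy (sortdims and _pick are invented names), not the package's internal code; a dimension missing from the input comes back as nothing.

```julia
# Hypothetical toy dimension types, for illustration only.
abstract type AbstractDim end
struct X{T} <: AbstractDim; val::T; end
struct Y{T} <: AbstractDim; val::T; end
struct Z{T} <: AbstractDim; val::T; end

# First element of `dims` that is an instance of D, or `nothing`:
# each call inspects the head of the tuple and recurses on the tail.
_pick(::Type{D}, ::Tuple{}) where {D} = nothing
_pick(::Type{D}, dims::Tuple) where {D} =
    first(dims) isa D ? first(dims) : _pick(D, Base.tail(dims))

# Walk the target order the same way, building the result tuple.
sortdims(::Tuple{}, dims::Tuple) = ()
sortdims(order::Tuple, dims::Tuple) =
    (_pick(first(order), dims), sortdims(Base.tail(order), dims)...)

sortdims((X, Y, Z), (Z(3), X(1)))  # (X(1), nothing, Z(3))
```

Each step is an ordinary method call on a tuple of known length, so the compiler can often resolve the whole recursion ahead of time without any generated code.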

Syntax

AxisArrays.jl is verbose by default: compare a[Axis{:y}(1)] with a[Y(1)] here. NamedDims.jl has concise syntax, but its dimensions are not types.

Data types and the interface

DimensionalData.jl provides the concrete DimensionalArray type, but its core purpose is to be easily used with other array types.

Some of the functionality in DimensionalData.jl will work without inheriting from AbstractDimensionalArray. The main requirement is to define a dims method that returns a Tuple of AbstractDimension matching the dimension order and axis values of your data. If you want the metadata to persist through transformations, also define rebuild, plus the Base methods similar and parent (see the DimensionalArray and AbstractDimensionalArray types). A refdims method returns the dimensions lost in a previous transformation, which are passed to the rebuild method. Refdims can be discarded; the main loss is plot labels.

Inheriting from AbstractDimensionalArray gives a few extra benefits, such as methods that are currently blocked by problems with dims dispatch in Julia Base, and indexing with regular integer dimensions that still updates your wrapper type with the new dims.
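The sketch below mimics the shape of that interface on a toy wrapper. It defines its own dims, refdims and rebuild functions for illustration only; in real code you would add methods to DimensionalData.jl's functions instead, and their exact signatures should be checked in the package. MyRaster and selectcol are hypothetical.

```julia
# Hypothetical toy versions of the interface functions, defined here only so
# the sketch runs on its own; real code would extend DimensionalData.jl's.
abstract type AbstractDim end
struct X{T} <: AbstractDim; val::T; end
struct Y{T} <: AbstractDim; val::T; end

struct MyRaster{A<:AbstractArray,D<:Tuple,R<:Tuple}
    data::A
    dims::D     # one dimension per axis, holding that axis' index values
    refdims::R  # dimensions dropped by earlier indexing, kept for labels
end

dims(r::MyRaster) = r.dims
refdims(r::MyRaster) = r.refdims
Base.parent(r::MyRaster) = r.data

# Functional style: return a new wrapper instead of mutating fields.
rebuild(r::MyRaster, data, newdims, newrefdims) = MyRaster(data, newdims, newrefdims)

# Taking column j drops Y from dims and records the chosen Y value in refdims.
function selectcol(r::MyRaster, j::Int)
    x, y = dims(r)
    rebuild(r, parent(r)[:, j], (x,), (refdims(r)..., Y(y.val[j])))
end

r = MyRaster([1 2 3; 4 5 6], (X(10.0:10.0:20.0), Y([:a, :b, :c])), ())
c = selectcol(r, 2)
parent(c)   # [2, 5]
refdims(c)  # (Y(:b),)
```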

New dimension types can be generated with the @dim macro at top-level scope:

@dim Band "Raster band"
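The expansion of @dim is not described here, so the macro below is only a hypothetical approximation of the idea: it defines a new dimension type and attaches a description to it. The names @toydim, AbstractDim and dimlabel are invented for this sketch.

```julia
abstract type AbstractDim end

# Hypothetical approximation of a dimension-defining macro: it declares a
# parametric dimension type and a method returning its description.
macro toydim(name::Symbol, description::String)
    esc(quote
        struct $(name){T} <: AbstractDim
            val::T
        end
        dimlabel(::Type{<:$(name)}) = $description
    end)
end

@toydim Band "Raster band"

Band(3).val     # 3
dimlabel(Band)  # "Raster band"
```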

Dimensions use the same types that are used for indexing. The dims(a) method should return a tuple like this:

(Y(-40.5:40.5, (units="degrees_north",)), X(1.0:40.0, (units="degrees_east",)))

either stored or generated from other data. The metadata can be anything, preferably in a NamedTuple. Some standards may be introduced as they are worked out over time.
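One use of this metadata is plot labelling. The hypothetical sketch below stores an optional NamedTuple on each toy dimension and builds an axis label from a units field when one is present; the types and axislabel are not part of DimensionalData.jl.

```julia
# Hypothetical toy dimensions carrying index values plus optional metadata.
abstract type AbstractDim end
struct X{T,M} <: AbstractDim
    val::T
    metadata::M
end
struct Y{T,M} <: AbstractDim
    val::T
    metadata::M
end
X(val) = X(val, NamedTuple())  # metadata is optional
Y(val) = Y(val, NamedTuple())

# Label from the type name, plus units when the metadata provides them.
function axislabel(d::AbstractDim)
    name = string(nameof(typeof(d)))
    units = get(d.metadata, :units, nothing)
    return units === nothing ? name : "$name ($units)"
end

ds = (Y(-40.5:40.5, (units="degrees_north",)), X(1.0:40.0, (units="degrees_east",)))
map(axislabel, ds)  # ("Y (degrees_north)", "X (degrees_east)")
axislabel(X(1:3))   # "X"
```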

Contributors

balinus, ivirshup, rafaqz
