Add named dimensions to Julia arrays and other types. This is a work in progress under active development; it may be a while before the interface stabilises and things are fully documented.
DimensionalData.jl provides tools and abstractions for working with datasets that have named dimensions with positional values. It's a pluggable, generalised version of AxisArrays.jl with a cleaner syntax, and additional functionality found in NamedDims.jl. It has similar goals to Python's xarray, and is primarily written for use with spatial data in GeoData.jl.
The core component is the AbstractDimension, and types that inherit from it, such as Time, X, Y, Z, the generic Dim{:x}, or others you define manually using the @dim macro.
Dims can be used for indexing and views without knowing dimension order: a[X(20)], view(a, X(1:20), Y(30:40)). They can also indicate which dimensions to reduce, as in mean(a, dims=Time), or permute, as in permutedims(a, [X, Y, Z, Time]), in Julia Base and Statistics functions that have dims arguments.
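As a rough illustration of how order-independent indexing can work, the sketch below maps dimension wrappers onto positional indices. It is self-contained and does not use DimensionalData.jl: the names SketchDim, dimnum and toindices are hypothetical stand-ins, and X, Y and Time are redefined locally, so this is not the package's own implementation.

```julia
using Statistics

# Hypothetical stand-ins for illustration; not the package's definitions.
abstract type SketchDim{T} end
struct X{T} <: SketchDim{T}
    val::T
end
struct Y{T} <: SketchDim{T}
    val::T
end
struct Time{T} <: SketchDim{T}
    val::T
end

# Position of a dimension type within the array's dimension order.
dimnum(order::Tuple, ::Type{D}) where {D} = findfirst(T -> T <: D, order)

# Turn dimension-wrapped indices into a positional index tuple,
# filling any dimension that was not mentioned with `:`.
function toindices(order::Tuple, lookups::SketchDim...)
    map(order) do D
        i = findfirst(l -> l isa D, lookups)
        i === nothing ? Colon() : lookups[i].val
    end
end

a = reshape(1:24, 2, 3, 4)   # axes are ordered (X, Y, Time)
order = (X, Y, Time)

idx = toindices(order, Time(2), X(1))   # argument order does not matter
slice = a[idx...]                       # same as a[1, :, 2]

m = mean(a, dims=dimnum(order, Time))   # same as mean(a, dims=3)
```

The caller never needs to know that Time is the third axis; the dimension order is consulted once, inside the helpers.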
Selectors can be used in getindex, setindex! and view to select indices matching the passed in value(s):
- At(x): get indices exactly matching the passed in value(s)
- Near(x): get the closest indices to the passed in value(s)
- Between(a, b): get all indices between two values (inclusive)
It's easy to add your own custom Selector if you need a different behaviour.
Example usage:
a[Between(a, b), At(2)]
a[Time<|Between(a, b)]
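To make the selector semantics concrete, here is a minimal, self-contained sketch of how value-based selectors could be resolved to indices by dispatch. The At, Near and Between structs are redefined locally, and sel2index and AtLeast are hypothetical names chosen for this example, so none of it should be read as the package's actual code.

```julia
# Hypothetical selector types for illustration only.
struct At{T}
    val::T
end
struct Near{T}
    val::T
end
struct Between{T}
    lo::T
    hi::T
end

# Each selector resolves to an index (or indices) into the values
# stored along one dimension.
function sel2index(s::At, lookup)
    i = findfirst(v -> v == s.val, lookup)
    i === nothing && throw(ArgumentError("$(s.val) not found"))
    return i
end
sel2index(s::Near, lookup) = argmin(abs.(lookup .- s.val))
sel2index(s::Between, lookup) = findall(v -> s.lo <= v <= s.hi, lookup)

# A custom selector only needs a new type and one new method.
struct AtLeast{T}
    val::T
end
sel2index(s::AtLeast, lookup) = findall(v -> v >= s.val, lookup)

lookup = 10.0:10.0:50.0      # values along one dimension
data = [1, 2, 3, 4, 5]

exact   = data[sel2index(At(30.0), lookup)]
nearest = data[sel2index(Near(27.0), lookup)]
ranged  = data[sel2index(Between(20.0, 40.0), lookup)]
upper   = data[sel2index(AtLeast(40.0), lookup)]
```

Because resolution is an ordinary method, adding a selector does not require touching the indexing code itself.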
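Methods where dims containing indices or selectors can be used:
- getindex
- setindex!
- view
Methods where dims can be used:
- size
- axes
- permutedims
- mapslices
- eachslice
- reverse
- dropdims
- reduce
- mapreduce
- sum
- prod
- maximum
- minimum
- mean
- std
- var
- cor
- cov
- median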
Example usage:
size(a, Time)
mean(a, dims=X)
- Maximum extensibility: always use method dispatch. Regular types over special syntax. Recursion over @generated.
- Flexibility: dims and selectors are parametric types with multiple uses
- Abstraction: never dispatch on concrete types, maximum re-usability of methods
- Clean, readable syntax. Minimise required parentheses, minimise exported methods, and instead extend Base methods whenever possible.
- Minimal interface: implementing a dimension-aware type should be easy.
- Functional style: structs are always rebuilt, and other than the array data, fields are not mutated in place.
- Least surprise: everything works the same as in Base, but with named dims. If a method accepts numeric indices or dims=X in Base, you should be able to use DimensionalData.jl dims.
- Type stability: dimensional methods should be type stable more often than Base methods
- Zero cost dimensional indexing a[Y(4), X(5)] of a single value.
- Low cost indexing for range getindex and views: these can't be zero cost as dim ranges have to be updated.
- Plotting is easy: data should plot sensibly and correctly with useful labels - after all transformations using dims or indices
- Prioritise spatial data: other use cases are a free bonus of the modular approach.
Why not AxisArrays.jl or NamedDims.jl?
Both AxisArrays and NamedDims use concrete types for dispatch on arrays, and for dimension type Axis in AxisArrays. This makes them hard to extend.
It's a little easier with DimensionalData.jl. You can inherit from AbstractDimensionalArray, or just implement dims and rebuild methods. Dims and selectors in DimensionalData.jl are also extensible. Recursive primitive methods allow inserting whatever methods you want to support extra types. @generated is only used to match and permute arbitrary tuples of types, and contains no type-specific details. The @generated functions in AxisArrays internalise axis/index conversion behaviour, preventing extension in external packages and scripts.
AxisArrays.jl is verbose by default: a[Axis{:y}(1)] vs a[Y(1)] used here.
NamedDims.jl has concise syntax, but the dimensions are no longer types.
DimensionalData.jl provides the concrete DimensionalArray type. But its core purpose is to be easily used with other array types.
Some of the functionality in DimensionalData.jl will work without inheriting from AbstractDimensionalArray. The main requirement is to define a dims method that returns a Tuple of AbstractDimension that matches the dimension order and axis values of your data. Define rebuild, and Base methods for similar and parent, if you want the metadata to persist through transformations (see the DimensionalArray and AbstractDimensionalArray types). A refdims method returns the lost dimensions of a previous transformation, passed in to the rebuild method. Refdims can be discarded, the main loss being plot labels.
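The shape of such an interface can be sketched in a few lines. The example below is self-contained and deliberately does not import DimensionalData.jl: NamedDim, MyGrid, withval and flipdim are hypothetical names, and the three-argument rebuild signature is an assumption made for this illustration rather than the package's documented one.

```julia
# Hypothetical dimension type: a name, axis values and free-form metadata.
struct NamedDim{Name,T,M}
    val::T
    metadata::M
end
NamedDim{Name}(val, metadata=nothing) where {Name} =
    NamedDim{Name,typeof(val),typeof(metadata)}(val, metadata)

# Copy a dimension with new axis values, keeping its name and metadata.
withval(d::NamedDim{Name}, val) where {Name} = NamedDim{Name}(val, d.metadata)

# A wrapper type that opts in by defining `dims`, `rebuild` and `parent`.
struct MyGrid{A<:AbstractMatrix,D<:Tuple}
    data::A
    dims::D
end
dims(g::MyGrid) = g.dims
rebuild(g::MyGrid, data, newdims) = MyGrid(data, newdims)  # assumed signature
Base.parent(g::MyGrid) = g.data

# A generic operation written only against that small interface:
# reverse the data and the axis values along dimension number `n`.
function flipdim(g, n::Int)
    newdims = ntuple(length(dims(g))) do i
        d = dims(g)[i]
        i == n ? withval(d, reverse(d.val)) : d
    end
    rebuild(g, reverse(parent(g), dims=n), newdims)
end

g = MyGrid([1 2 3; 4 5 6],
           (NamedDim{:y}([10.0, 20.0], (units="degrees_north",)),
            NamedDim{:x}([1.0, 2.0, 3.0], (units="degrees_east",))))
f = flipdim(g, 2)
```

Here flipdim never names MyGrid, so any other type providing the same three methods would work with it, and the metadata survives the transformation because the struct is rebuilt rather than mutated.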
Inheriting from AbstractDimensionalArray will give a few benefits, such as methods currently blocked by problems with dims dispatch in Julia Base, and indexing using regular integer dimensions but updating your wrapper type with new dims.
New dimensions can be generated with the @dim macro at top level scope:
@dim Band "Raster band"
Dimensions use the same types that are used for indexing. The dims(a) method should return a tuple something like this:
(Y(-40.5:40.5, (units="degrees_north",)), X(1.0:40.0, (units="degrees_east",)))
either stored or generated from other data. The metadata can be anything, preferably in a NamedTuple. Some standards may be introduced as they are worked out over time.