astropenguin / xarray-dataclasses

:zap: xarray data creation made easy by dataclass

Home Page: https://pypi.org/project/xarray-dataclasses

License: MIT License

Python 100.00%
python xarray python-package xarray-extension dataarray dataset typing dataclass

xarray-dataclasses's Introduction

xarray-dataclasses

xarray data creation made easy by dataclass

Overview

xarray-dataclasses is a Python package that makes it easy to create xarray's DataArray and Dataset objects that are "typed" (i.e. fixed dimensions, data type, coordinates, attributes, and name) using Python's dataclass:

from dataclasses import dataclass
from typing import Literal
from xarray_dataclasses import AsDataArray, Coord, Data


X = Literal["x"]
Y = Literal["y"]


@dataclass
class Image(AsDataArray):
    """2D image as DataArray."""

    data: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0

Features

  • Typed DataArray or Dataset objects can easily be created:
    image = Image.new([[0, 1], [2, 3]], [0, 1], [0, 1])
  • NumPy-like filled-data creation is also available:
    image = Image.zeros([2, 2], x=[0, 1], y=[0, 1])
  • Features of Python's dataclass (field, __post_init__, ...) are supported.
  • Static type checking with Pyright is supported.
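Because the specifications are ordinary dataclasses, standard-library features such as field() and __post_init__ behave as usual. The following is a minimal sketch that uses only the standard library. The class Grid and its fields are hypothetical, and the sketch does not use xarray-dataclasses itself.

```python
from dataclasses import dataclass, field


@dataclass
class Grid:
    """Hypothetical plain dataclass (no xarray-dataclasses involved)."""

    # default_factory avoids sharing one mutable list between instances
    values: list[float] = field(default_factory=list)
    # init=False fields are filled in by __post_init__, not by the caller
    size: int = field(init=False)

    def __post_init__(self) -> None:
        # Runs right after the generated __init__
        self.size = len(self.values)


grid = Grid([0.0, 1.0, 2.0])
print(grid.size)  # 3
```

According to the feature list above, the same dataclass features should remain usable in a class that also inherits AsDataArray or AsDataset.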

Installation

pip install xarray-dataclasses

Basic usage

xarray-dataclasses builds on Python's dataclass. The data (or data variables), coordinates, attributes, and name of a DataArray or Dataset object are defined as dataclass fields using the special type hints Data, Coord, Attr, and Name, respectively. The examples below assume the following imports and definitions:

from dataclasses import dataclass
from typing import Literal
from xarray_dataclasses import AsDataArray, AsDataset
from xarray_dataclasses import Attr, Coord, Data, Name


X = Literal["x"]
Y = Literal["y"]

Data field

A data field is a field whose value becomes the data of a DataArray object or a data variable of a Dataset object. The type hint Data[TDims, TDtype] fixes the dimensions and the data type of the object. Here are some examples of how to specify them.

Type hint                                      Inferred dimensions
Data[tuple[()], ...]                           ()
Data[Literal["x"], ...]                        ("x",)
Data[tuple[Literal["x"]], ...]                 ("x",)
Data[tuple[Literal["x"], Literal["y"]], ...]   ("x", "y")

Type hint                              Inferred data type
Data[..., Any]                         None
Data[..., None]                        None
Data[..., float]                       numpy.dtype("float64")
Data[..., numpy.float128]              numpy.dtype("float128")
Data[..., Literal["datetime64[ns]"]]   numpy.dtype("<M8[ns]")
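To make the dimension table concrete, here is a minimal sketch of how dimension names can be read back from such type hints at runtime with the standard typing helpers. The function infer_dims is hypothetical and is not the library's implementation. It only reproduces the mapping in the table for the TDims part of a hint.

```python
from typing import Any, Literal, get_args, get_origin


def infer_dims(hint: Any) -> tuple[str, ...]:
    """Return the dimension names encoded in a Literal or tuple-of-Literal hint."""
    if get_origin(hint) is Literal:
        # Literal["x"] -> ("x",)
        return tuple(str(arg) for arg in get_args(hint))

    if get_origin(hint) is tuple:
        args = get_args(hint)
        # tuple[()] is reported as () or ((),) depending on the Python version
        if args in ((), ((),)):
            return ()
        # tuple[Literal["x"], Literal["y"]] -> ("x", "y")
        return tuple(name for arg in args for name in infer_dims(arg))

    raise TypeError(f"Unsupported dimension hint: {hint!r}")


print(infer_dims(tuple[Literal["x"], Literal["y"]]))  # ('x', 'y')
```

Encoding dimensions as Literal types keeps them visible both to a static type checker and to runtime introspection, which is why one annotation can serve both purposes.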

Coordinate field

A coordinate field is a field whose value becomes a coordinate of a DataArray or a Dataset object. The type hint Coord[TDims, TDtype] fixes the dimensions and the data type of the coordinate.

Attribute field

An attribute field is a field whose value becomes an attribute of a DataArray or a Dataset object. The type hint Attr[TAttr] specifies the type of the value, which is used only for static type checking.

Name field

A name field is a field whose value becomes the name of a DataArray object. The type hint Name[TName] specifies the type of the value, which is used only for static type checking.

DataArray class

A DataArray class is a dataclass that defines typed DataArray specifications. It should have exactly one data field: if more are defined, the second and subsequent data fields are ignored in DataArray creation.

@dataclass
class Image(AsDataArray):
    """2D image as DataArray."""

    data: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0
    units: Attr[str] = "cd / m^2"
    name: Name[str] = "luminance"

A DataArray object is created by the class method new():

Image.new([[0, 1], [2, 3]], x=[0, 1], y=[0, 1])

<xarray.DataArray "luminance" (x: 2, y: 2)>
array([[0., 1.],
       [2., 3.]])
Coordinates:
  * x        (x) int64 0 1
  * y        (y) int64 0 1
Attributes:
    units:    cd / m^2

NumPy-like class methods (zeros(), ones(), ...) are also available:

Image.ones((3, 3))

<xarray.DataArray "luminance" (x: 3, y: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * x        (x) int64 0 0 0
  * y        (y) int64 0 0 0
Attributes:
    units:    cd / m^2

Dataset class

A Dataset class is a dataclass that defines typed Dataset specifications. Multiple data fields are allowed; each one defines a data variable of the object.

@dataclass
class ColorImage(AsDataset):
    """2D color image as Dataset."""

    red: Data[tuple[X, Y], float]
    green: Data[tuple[X, Y], float]
    blue: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0
    units: Attr[str] = "cd / m^2"

A Dataset object is created by the class method new():

ColorImage.new(
    [[0, 0], [0, 0]],  # red
    [[1, 1], [1, 1]],  # green
    [[2, 2], [2, 2]],  # blue
)

<xarray.Dataset>
Dimensions:  (x: 2, y: 2)
Coordinates:
  * x        (x) int64 0 0
  * y        (y) int64 0 0
Data variables:
    red      (x, y) float64 0.0 0.0 0.0 0.0
    green    (x, y) float64 1.0 1.0 1.0 1.0
    blue     (x, y) float64 2.0 2.0 2.0 2.0
Attributes:
    units:    cd / m^2

Advanced usage

Coordof and Dataof type hints

xarray-dataclasses provides advanced type hints, Coordof and Dataof. Unlike Data and Coord, they specify a dataclass that defines a DataArray class. This is useful when users want to add metadata to dimensions for plotting. For example:

from xarray_dataclasses import Coordof


@dataclass
class XAxis:
    data: Data[X, int]
    long_name: Attr[str] = "x axis"
    units: Attr[str] = "pixel"


@dataclass
class YAxis:
    data: Data[Y, int]
    long_name: Attr[str] = "y axis"
    units: Attr[str] = "pixel"


@dataclass
class Image(AsDataArray):
    """2D image as DataArray."""

    data: Data[tuple[X, Y], float]
    x: Coordof[XAxis] = 0
    y: Coordof[YAxis] = 0

General data variable names in Dataset creation

Because dataclass field names must be valid Python parameter names, a data variable name that contains, for example, white space cannot be defined directly as a field. In such cases, define a DataArray class with a name field for each data variable and refer to it with Dataof in the Dataset class. The values of the name fields are then used as the data variable names. For example:

from xarray_dataclasses import Dataof


@dataclass
class Red:
    data: Data[tuple[X, Y], float]
    name: Name[str] = "Red image"


@dataclass
class Green:
    data: Data[tuple[X, Y], float]
    name: Name[str] = "Green image"


@dataclass
class Blue:
    data: Data[tuple[X, Y], float]
    name: Name[str] = "Blue image"


@dataclass
class ColorImage(AsDataset):
    """2D color image as Dataset."""

    red: Dataof[Red]
    green: Dataof[Green]
    blue: Dataof[Blue]


ColorImage.new(
    [[0, 0], [0, 0]],
    [[1, 1], [1, 1]],
    [[2, 2], [2, 2]],
)

<xarray.Dataset>
Dimensions:      (x: 2, y: 2)
Dimensions without coordinates: x, y
Data variables:
    Red image    (x, y) float64 0.0 0.0 0.0 0.0
    Green image  (x, y) float64 1.0 1.0 1.0 1.0
    Blue image   (x, y) float64 2.0 2.0 2.0 2.0
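The naming limitation can be checked with the standard library alone. The sketch below shows that a name containing a space is rejected as a dataclass field name but is perfectly acceptable as a field value. The classes Broken and Red here are hypothetical plain dataclasses and do not use xarray-dataclasses.

```python
from dataclasses import dataclass, make_dataclass

name = "Red image"
is_valid_field_name = name.isidentifier()  # False, because of the space

try:
    # make_dataclass() rejects field names that are not valid identifiers
    make_dataclass("Broken", [(name, float)])
except TypeError:
    rejected = True
else:
    rejected = False


@dataclass
class Red:
    """Hypothetical stand-in: the display name lives in a field value."""

    data: float
    name: str = "Red image"


print(is_valid_field_name, rejected, Red(0.0).name)
```

Moving the name from the field identifier into a field value is the same idea that the Dataof approach above relies on.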

Customization of DataArray or Dataset creation

For customization, users can add a special class attribute, __dataoptions__, to a DataArray or Dataset class. In the current implementation, the only supported option is a custom factory for DataArray or Dataset creation.

import xarray as xr
from xarray_dataclasses import DataOptions


class Custom(xr.DataArray):
    """Custom DataArray."""

    __slots__ = ()

    def custom_method(self) -> bool:
        """Custom method."""
        return True


@dataclass
class Image(AsDataArray):
    """2D image as DataArray."""

    data: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0

    __dataoptions__ = DataOptions(Custom)


image = Image.ones([3, 3])
isinstance(image, Custom)  # True
image.custom_method()  # True
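The pattern behind this option, a class attribute that carries a factory looked up at creation time, can be sketched with the standard library. Everything in this sketch (Options, Plain, Custom, Spec, create) is a hypothetical stand-in chosen for illustration. It is not how xarray-dataclasses is implemented, and it uses dict subclasses in place of xarray objects.

```python
from dataclasses import dataclass, fields
from typing import Any, Callable


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for DataOptions: it only stores a factory."""

    factory: Callable[..., Any]


class Plain(dict):
    """Hypothetical default container (stand-in for xarray.DataArray)."""


class Custom(Plain):
    """Hypothetical custom container with an extra method."""

    def custom_method(self) -> bool:
        return True


def create(spec: Any) -> Any:
    """Build a container from a dataclass instance via its factory option."""
    options = getattr(type(spec), "__dataoptions__", Options(Plain))
    values = {f.name: getattr(spec, f.name) for f in fields(spec)}
    return options.factory(values)


@dataclass
class Spec:
    data: float = 0.0

    # No annotation, so this is a plain class attribute, not a dataclass field
    __dataoptions__ = Options(Custom)


result = create(Spec(1.0))
print(type(result).__name__, result.custom_method())  # Custom True
```

Because the option is an unannotated class attribute, it never appears among the dataclass fields and therefore does not change the signature of the generated __init__.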

DataArray and Dataset creation without shorthands

xarray-dataclasses also provides the functions asdataarray and asdataset. They are useful when users do not want a DataArray or Dataset dataclass to inherit the mix-in class (AsDataArray or AsDataset). For example:

from xarray_dataclasses import asdataarray


@dataclass
class Image:
    """2D image as DataArray."""

    data: Data[tuple[X, Y], float]
    x: Coord[X, int] = 0
    y: Coord[Y, int] = 0


image = asdataarray(Image([[0, 1], [2, 3]], [0, 1], [0, 1]))

xarray-dataclasses's People

Contributors

astropenguin, dependabot[bot], shaunc, sohumb, thewtex

xarray-dataclasses's Issues

Release v0.1.2

  • Update version numbers written in:
    • pyproject.toml
    • xarray_dataclasses/__init__.py
    • tests/test_metadata.py

Fix wrong comment

Fix wrong comment at xarray_dataclasses/typing.py:L38.

  • Before fix: # for Python 3.7 and 3.9
  • After fix: # for Python 3.7 and 3.8

Update typing module

Update xarray_dataclasses.typing.DataArray so that it accepts the same parameters as xarray.DataArray.

Release v0.1.1

  • Update version numbers written in:
    • pyproject.toml
    • xarray_dataclasses/__init__.py
    • tests/test_metadata.py

Update instance check of DataArray type

Update __instancecheck__ of typing.DataArray so that the following tests pass.

import numpy as np
from dataclasses import field
from xarray_dataclasses import DataArray


assert isinstance(0, DataArray['x', int])
assert isinstance(field(default=0), DataArray['x', int])
assert isinstance([0, 1, 2], DataArray['x', int])
assert isinstance(np.array([0, 1, 2]), DataArray['x', int])

Add factory fields for custom DataArray/Dataset creation

Add support for special fields (__dataarray_factory__, __dataset_factory__) for custom DataArray or Dataset creation.

class CustomDataset(xr.Dataset):
    __slots__ = ()


@dataclass
class Custom(DataArrayMixin):
    data: Data[tuple["x", "y"], float]
    __dataset_factory__ = CustomDataset


ds = asdataset(Custom(...)) # statically typed as CustomDataset
type(ds) # -> CustomDataset

Add Coord and Data types

Add Coord and Data types (subclasses of typing.DataArray) to explicitly distinguish between data (data variables) and coordinates. This must be done before adding @datasetclass (#24).

from xarray_dataclasses import Coord, Data, dataarrayclass


@dataarrayclass
class Image:
    data: Data[('x', 'y'), float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0

Attributes on data array in dataset. (Feature)

Currently there seems to be no way to specify attributes for a DataArray in a Dataset. Indeed, even if they are passed to new() in a DataArray, they are discarded.

It would be nice to reuse DataArray specs to be able to specify these:

from xarray_dataclasses import dataarrayclass, datasetclass, Data, Attr, DataArray

@dataarrayclass
class FooSpec:
    data: Data['x', float]
    meta: Attr[int]

@datasetclass
class BarSpec:
    array: DataArray[FooSpec]

Does that look useful? Here DataArray is from xarray_dataclasses ... if this name isn't good because of the conflict with xarray, perhaps ... ArraySpec?

Update dev environment

  • Update dev Python packages
  • Update dev JavaScript packages
  • Fix code that causes type check errors

Update dataclass type

Use typing.ParamSpec in the dataclass type hint so that .new() can be statically typed.

Add typing module

Add typing module which provides a type for xarray.DataArray with fixed dims and dtype.

DataArray[("x", "y"), "f8"]

# DataArray[('x', 'y'), float64]

The type can be instantiated.

DataArray[("x", "y"), "f8"]([[0, 1], [2, 3]])

# <xarray.DataArray (x: 2, y: 2)>
# array([[0., 1.],
#        [2., 3.]])
# Dimensions without coordinates: x, y

Swap the order of dims and dtype in Coord[...] and Data[...]

Swap the order of dims and dtype so that the type hints read Coord[dims, dtype] and Data[dims, dtype].
This is because their order in the printed format is (dims, dtype, value).

<xarray.DataArray (x: 3, y: 3)>
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
Coordinates:
  * x        (x) int64 0 0 0
  * y        (y) int64 0 0 0

Release v0.2.0

  • Update docstrings of functions to be more detailed
  • Update package version (0.1.2 → 0.2.0)
  • Update README

Release v0.3.1

Release package that closes the following issues/PRs.

  • #40 loosen Xarray dependency requirements

Fix parsing dataclass under __future__.annotations enabled

The current (v0.4.0) dataclass parser (xarray_dataclasses.parse) does not work when postponed evaluation of annotations (PEP 563) is enabled with from __future__ import annotations. This is because the parser reads the type hints of dataclass fields from cls.__dataclass_fields__: with postponed evaluation, field.type is a string and loses its runtime information.

This issue tracks resolving the problem by using typing_extensions.get_type_hints(cls, include_extras=True), which evaluates string annotations into actual type hints.
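The difference between the two lookups can be reproduced with the standard library. This sketch makes two assumptions for illustration: it uses typing.get_type_hints (Python 3.9+, which accepts the same include_extras parameter) instead of the typing_extensions function named above, and it writes the annotation as a string to mimic what PEP 563 does to every annotation. The Image class here is a hypothetical plain dataclass.

```python
from dataclasses import dataclass, fields
from typing import get_type_hints


@dataclass
class Image:
    # A string annotation mimics the effect of postponed evaluation (PEP 563)
    data: "list[float]"


# The dataclass field keeps the annotation exactly as written: a plain string
raw_type = fields(Image)[0].type

# get_type_hints() evaluates the string back into a real type object
hints = get_type_hints(Image, include_extras=True)

print(repr(raw_type))  # 'list[float]'
print(hints["data"])  # list[float]
```

Any parser that inspects field.type directly sees only the string, so it cannot tell which dimensions or data type the annotation encodes until the hints are evaluated.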

Update mix-in classes

Update mix-in classes (AsDataArray, AsDataset) so that the annotations of .new() are dynamically updated.

Do not use variables with ClassVar or InitVar types

Update field.infer_field_kind() so as not to assign values with ClassVar or InitVar types to xarray's attrs.

from dataclasses import InitVar
from typing import ClassVar
from xarray_dataclasses import Coord, Data, dataarrayclass


@dataarrayclass
class Image:
    data: Data[('x', 'y'), float]
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0

    spam: str = "spam"  # -> a member of attrs
    ham: ClassVar[str] = "ham"  # -> not a member of attrs
    egg: InitVar[str] = "egg"  # -> not a member of attrs
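For reference, the standard library already treats these two annotations specially: dataclasses.fields() reports neither ClassVar nor InitVar members. The sketch below demonstrates this with a hypothetical plain dataclass named Spam. It does not use xarray-dataclasses and says nothing about how infer_field_kind() is implemented.

```python
from dataclasses import InitVar, dataclass, fields
from typing import ClassVar


@dataclass
class Spam:
    spam: str = "spam"  # regular field
    ham: ClassVar[str] = "ham"  # class variable, not a field
    egg: InitVar[str] = "egg"  # init-only parameter, not a field


field_names = [f.name for f in fields(Spam)]
print(field_names)  # ['spam']
```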

Add bases module

Add bases module which provides DataArrayClass, a base class for dataclasses.

from xarray_dataclasses import DataArray, DataArrayClass


class Image(DataArrayClass):
    data: DataArray[("x", "y"), float]
    x: DataArray["x", int] = 0
    y: DataArray["y", int] = 0

Add dataset module

Similar to dataarrayclass, introduce a class decorator datasetclass.
Here is example code that expresses the dataset from xarray's docs.

from xarray_dataclasses.dataset import datasetclass
from xarray_dataclasses.typing import Coord, Data


@datasetclass
class Weather:
    # data variables
    temperature: Data[('x', 'y', 'time'), float]
    precipitation: Data[('x', 'y', 'time'), float]

    # dimensions
    x: Coord['x', int] = 0
    y: Coord['y', int] = 0
    time: Coord['time', 'datetime64[ns]'] = '2021-01-01'

    # coordinates
    lon: Coord[('x', 'y'), float] = 0.0
    lat: Coord[('x', 'y'), float] = 0.0
    reference_time: Coord[(), 'datetime64[ns]'] = '2021-01-01'

Fix type hints

  • Update Python code to be compatible with the strict mode of Pyright
  • Fix the version of Pyright
