timothyas / xesn

Large scale Echo State Networks powered by xarray and dask.

Home Page: https://xesn.readthedocs.io

License: Other

xesn's Introduction

xesn

Echo State Networks powered by xarray and dask.

Description

xesn is a Python package for implementing Echo State Networks (ESNs), a particular form of Recurrent Neural Network originally introduced by Jaeger (2001). The main purpose of the package is to enable ESNs for relatively large scale weather and climate applications, for example as in Smith et al. (2023) and Arcomano et al. (2020). The package is designed to strike a balance between simplicity and flexibility, with a focus on implementing the features that were shown to matter most by Platt et al. (2022).

xesn uses xarray to handle multi-dimensional data, relying on dask for parallelization and to handle datasets/networks that are too large for a single compute node. At its core, xesn uses numpy and cupy for efficient CPU and GPU deployment.
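
For readers new to ESNs, the idea can be sketched in plain numpy. This is a generic, minimal ESN under assumed settings (reservoir size, spectral radius, ridge parameter, and toy data are all illustrative) and it does not use or represent the xesn API: a fixed random reservoir is driven by the input, and only a linear readout is trained with ridge regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n_input, n_reservoir, n_steps = 3, 50, 200  # illustrative sizes

# Fixed random input matrix, adjacency matrix, and bias vector
W_in = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_input))
W = rng.uniform(-1.0, 1.0, size=(n_reservoir, n_reservoir))
b = rng.uniform(-1.0, 1.0, size=n_reservoir)

# Rescale the adjacency matrix to a chosen spectral radius
spectral_radius = 0.9
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))

# Toy input signal with shape (n_input, n_steps + 1)
t = np.linspace(0, 20, n_steps + 1)
u = np.stack([np.sin(t), np.cos(t), np.sin(2 * t)])

# Drive the reservoir: r(n+1) = tanh(W r(n) + W_in u(n) + b)
r = np.zeros((n_reservoir, n_steps + 1))
for n in range(n_steps):
    r[:, n + 1] = np.tanh(W @ r[:, n] + W_in @ u[:, n] + b)

# Discard a spinup period, then fit the readout so that W_out r(n) ~ u(n)
n_spinup = 20
R = r[:, n_spinup + 1:]
Y = u[:, n_spinup + 1:]
beta = 1e-6  # ridge (Tikhonov) regularization strength
W_out = np.linalg.solve(R @ R.T + beta * np.eye(n_reservoir), R @ Y.T).T

prediction = W_out @ R
rmse = np.sqrt(np.mean((prediction - Y) ** 2))
```

Only `W_out` is learned here; `W`, `W_in`, and `b` stay fixed after initialization, which is what makes ESN training cheap compared to backpropagation through time.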

Installation

Installation from conda-forge

conda install -c conda-forge xesn

Installation from pip

pip install xesn

Installation from source

git clone https://github.com/timothyas/xesn.git
cd xesn
pip install -e .

Note that additional dependencies can be installed to run the unit test suite:

pip install -e ".[test]"
pytest xesn/test/*.py

Getting Started

To learn how to use xesn, check out the documentation at https://xesn.readthedocs.io.

Get in touch

Report bugs, suggest features, or view the source code on GitHub.

License and Copyright

xesn is licensed under version 3 of the GNU Lesser General Public License.

Development occurs on GitHub at https://github.com/timothyas/xesn.

xesn's Issues

Installation guide, contributing, etc

Finalize the installation guide, contributor guidelines, API docs, statement of need as outlined here

LazyESN boundary

Need to map from named dict to numbered axes.
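
One possible way to do this mapping is sketched below. The helper name `named_to_axis` and the boundary values are hypothetical, used only for illustration; for a single dimension, xarray's `DataArray.get_axis_num` performs a similar lookup.

```python
def named_to_axis(named, dims):
    """Convert {dim_name: value} into {axis_number: value}.

    `dims` is the ordered tuple of dimension names, e.g. `DataArray.dims`.
    """
    unknown = set(named) - set(dims)
    if unknown:
        raise KeyError(f"Unknown dimensions: {sorted(unknown)}")
    return {dims.index(name): value for name, value in named.items()}


dims = ("x", "y", "time")
boundary = {"x": "periodic", "y": "reflect"}  # hypothetical boundary options
by_axis = named_to_axis(boundary, dims)
```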

Basic Sanity Check Scaling

Test with SQG dataset:

- micro_training
- testing
  - evaluate appending to zarr store, different looping here ... is this ultimately sped up by using GPUs?
- macro_calibration
  - test when .persist, .compute are called in cost
  - test creation of empty dask containers vs appending to lists (in _cost and call)
  - should we call compute in call? or leave it to user?

Check overlap arg for time, add it by default or something

And also, convert string labelled dimensions to integers

Bias Sparse check

Should check to make sure is_sparse is not in the kwargs for bias.
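
A minimal sketch of such a check, assuming the bias options arrive as a plain dict. The function name `check_bias_kwargs` and the example option `distribution` are hypothetical, not names taken from the package.

```python
def check_bias_kwargs(bias_kwargs):
    """Reject is_sparse in the bias options, since the bias is a dense vector."""
    if "is_sparse" in bias_kwargs:
        raise ValueError("is_sparse is not a valid option for the bias vector")
    return dict(bias_kwargs)


ok = check_bias_kwargs({"distribution": "uniform"})  # hypothetical option
```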

Basic micro training

Want basic capability, using driver to train an ESN

Spiff up the readme

high level overview of these required aspects of documentation specified here

Write the paper.md

And send it to co-authors

Docstrings part 2

Un-Normalize Data

Data are normalized before going into ESN. When making predictions via driver, we need to pop them back.

Convert data_chunks -> esn_chunks... or something

Dask Handling

Consolidate and possibly generalize the creation of a dask cluster to do the work.

Data normalization

Add in basic capability to normalize by computed "std" and "minmax"
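
A sketch of what this could look like, including the inverse transform needed to bring predictions back to physical units. It assumes "std" means subtracting the mean and dividing by the standard deviation, and "minmax" means rescaling to the range [0, 1]; the function and key names are illustrative.

```python
import numpy as np


def get_norm_params(data, method):
    """Compute a bias and scale from the data for the chosen method."""
    if method == "std":
        return {"bias": float(data.mean()), "scale": float(data.std())}
    if method == "minmax":
        lo, hi = float(data.min()), float(data.max())
        return {"bias": lo, "scale": hi - lo}
    raise ValueError(f"Unrecognized normalization method: {method}")


def normalize(data, params):
    return (data - params["bias"]) / params["scale"]


def unnormalize(data, params):
    """Inverse of normalize, for mapping predictions back."""
    return data * params["scale"] + params["bias"]


data = np.array([1.0, 2.0, 4.0, 9.0])
params = get_norm_params(data, "minmax")
scaled = normalize(data, params)
restored = unnormalize(scaled, params)
```

Keeping the bias and scale in a small dict makes it easy to store them next to the trained network, so that the same values are reused at prediction time.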

esn_weights section in config

this seems somewhat annoying, is there a better way to do this?

Finalize Documentation and Package Scope

For instance, get rid of ddc references in the docs. For example, we will have to either import a package that has L96, code it up, or store it in a dataset... or something.

add some attributes to test results

when driver.run_test writes out test results, it would be nice to include things like

what type of esn created it, and the parameters used to create that esn
sample index, maybe, connected to original dataset store
...

Add a license

and make sure it's osi compliant

Documentation Structure

Create the overall structure

1. Overview / methods
   a. ESN methods
   b. LazyESN / distributed methods
2. Example usage of each component
   a. Data preparation
   b. ESN usage with reference to matrix generation
   c. LazyESN usage
   d. Macro training usage
   e. Driver macro_training / training / testing

All done except Macro training example and driver example in #58

Inference/prediction

Just do basic inference/prediction for testing

Sample indices for macro_training and testing

This should be re-worked to how it was in the original repo, so that the index marks the start of the prediction, not the start of the spinup. That way it's more useful for comparing to other non-RNN methods.
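
The intended convention could be sketched as follows, assuming time is indexed by integers and the spinup window ends exactly where the prediction begins. The function and argument names are hypothetical.

```python
def spinup_and_prediction_slices(sample_index, n_spinup, n_steps):
    """Return time slices where sample_index marks the start of the prediction."""
    start = sample_index - n_spinup
    if start < 0:
        raise ValueError("sample_index must be at least n_spinup")
    spinup = slice(start, sample_index)
    prediction = slice(sample_index, sample_index + n_steps)
    return spinup, prediction


spinup, prediction = spinup_and_prediction_slices(100, n_spinup=20, n_steps=50)
```

With this convention, a method that needs no spinup can be evaluated on the same `prediction` slice without any index arithmetic.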

Make validation -vs- macro_calibration and training -vs- micro_calibration consistent

Figure out print_log with xdata, cost, etc

As opposed to before, xdata and cost will be handled by entirely different modules (most likely). Will need to figure out how to get the log info into one file.

Convert existing example usage notebooks to docs

Initialize documentation

Port over our sphinx setup

Decide how to specify using LazyESN or ESN in yaml/config

Docstrings

Make sure the docstrings are minimal but useful

xarray in ESN/LazyESN

and grab .data in lazy or .values in esn accordingly

Spectral Macro Optimization

Add in the PSD/KE_RMSE cost function.

Handle data_chunks as tuple/dict etc

and do we need to specify the time chunk, since it's always 1?

RMSE Macro Optimization

Use SMT to do an RMSE based optimization.

Chunking

LazyESN (re)chunks the data at the last minute before training and prediction. The input chunks could be different though (e.g., smaller zarr chunks that are mapped to larger working/dask chunks for operation). A test should be added to make sure that this is happening.

Create a driver micro/macro cost example usage

Actually create a macro_training example, as well as a driver example that does all three (training, macro_training, and testing).

input/adjacency/bias factors

it's awkward to separately specify input_factor/adjacency_factor/bias_factor in config... maybe just always specify a dict?

Also, bias -> bias_factor