nansencenter / dapper

Data Assimilation with Python: a Package for Experimental Research

Home Page: https://nansencenter.github.io/DAPPER

License: MIT License

Python 91.44% Makefile 0.14% Fortran 8.22% Shell 0.20%
data-assimilation enkf kalman-filtering state-estimation particle-filter kalman bayesian-methods bayesian-filter chaos

dapper's Introduction

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The numerical experiments provide support and guidance for new developments in DA. The typical set-up is a synthetic (twin) experiment, where you specify a dynamic model and an observational model, and use these to generate a synthetic truth (multivariate time series), and then estimate that truth given the models and noisy observations.
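The twin-experiment recipe can be sketched in a few lines of plain NumPy (a toy illustration only, not DAPPER's API; the model, noise levels, and "estimator" here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dynamic model (invented for illustration): slow decay towards zero.
def step(x):
    return 0.97 * x

# Generate a synthetic truth (multivariate time series) ...
K, Nx = 100, 3
xx = np.zeros((K, Nx))
xx[0] = rng.normal(size=Nx)
for k in range(1, K):
    xx[k] = step(xx[k - 1]) + 0.1 * rng.normal(size=Nx)

# ... and noisy observations of it (identity observational model).
yy = xx + 0.5 * rng.normal(size=xx.shape)

# A deliberately naive "estimate": the raw observations themselves.
rmse = np.sqrt(np.mean((yy - xx) ** 2))
print(f"RMSE of raw obs: {rmse:.2f}")
```

A DA method would use the dynamic model to beat this baseline RMSE; that comparison against a known truth is the point of the twin-experiment set-up.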


Getting started

Install, then read, run and try to understand examples/basic_{1,2,3}.py. Some of the examples also exist as Jupyter notebooks, and can be run in the cloud without installation (though requiring a Google login): Open In Colab. This screencast provides an overview of DAPPER. The documentation includes general guidelines and the API reference, but most users should expect to read the code as well. If used towards a publication, please cite as The experiments used (inspiration from) DAPPER [ref], version 1.6.0, or similar, where [ref] points to the DOI. Also see the interactive tutorials on DA theory with Python.

Highlights

DAPPER enables the numerical investigation of DA methods through a variety of typical test cases and statistics. It (a) reproduces numerical benchmark results reported in the literature, and (b) facilitates comparative studies, thus promoting the (a) reliability and (b) relevance of the results. For example, the figure below is generated by examples/basic_3.py and reproduces Figure 5.7 of these lecture notes. DAPPER is (c) open source, written in Python, and (d) focused on readability; this promotes the (c) reproduction and (d) dissemination of the underlying science, and makes it easy to adapt and extend.

Comparative benchmarks with Lorenz-96 plotted as a function of the ensemble size (N)

DAPPER demonstrates how to parallelise ensemble forecasts (e.g., the QG model), local analyses (e.g., the LETKF), and independent experiments (e.g., examples/basic_3.py). It includes a battery of diagnostics and statistics, which all get averaged over subdomains (e.g., "ocean" and "land") and then in time. Confidence intervals are computed, including correction for auto-correlations, and used for uncertainty quantification, and significant digits printing. Several diagnostics are included in the on-line "liveplotting" illustrated below, which may be paused for further interactive inspection. In summary, DAPPER is well suited for teaching and fundamental DA research. Also see its drawbacks.
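The auto-correlation correction for confidence intervals can be sketched as follows (a simplified, textbook AR(1)-based version; DAPPER's actual statistics code differs in its details):

```python
import numpy as np

def mean_with_conf_sketch(xx):
    """Mean of a time series and a 1-sigma confidence half-width,
    inflated by an AR(1) estimate of the integrated autocorrelation time."""
    xx = np.asarray(xx, dtype=float)
    N = len(xx)
    mu = xx.mean()
    a = np.corrcoef(xx[:-1], xx[1:])[0, 1]  # lag-1 autocorrelation
    tau = (1 + a) / (1 - a)                 # effective-sample-size factor
    conf = np.sqrt(xx.var(ddof=1) * max(tau, 0) / N)
    return mu, conf

# Strong negative autocorrelation shrinks the interval towards zero:
mu, conf = mean_with_conf_sketch([0.0, 1.0, 0.0, 1.0])
print(mu)  # 0.5
```

For white noise, tau is approximately 1, recovering the familiar sigma/sqrt(N); positively correlated series get correspondingly wider intervals.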

EnKF - Lorenz-96

Installation

Successfully tested on Linux/Mac/Windows.

Prerequisite: Python>=3.9

If you're an expert, set up a Python environment however you like. Otherwise: install Anaconda, then open the Anaconda terminal and run the following commands:

conda create --yes --name dapper-env python=3.9
conda activate dapper-env
python --version

Ensure the printed version is 3.9 or higher.
Keep using the same terminal for the commands below.

Install

Either: Install for development (recommended)

Do you want the DAPPER code available to play around with? Then

  • Download and unzip (or git clone) DAPPER.
  • Move the resulting folder wherever you like,
    and cd into it (ensure you're in the folder with a setup.py file).
  • pip install -e '.'

Or: Install as library

Do you just want to run a script that requires DAPPER? Then

  • If the script comes with a requirements.txt file that lists DAPPER, then do
    pip install -r path/to/requirements.txt.
  • If not, hopefully you know the version of DAPPER needed. Run
    pip install dapper==1.6.0 to get version 1.6.0 (as an example).

Finally: Test the installation

You should now be able to run your script with python path/to/script.py.
For example, if you are in the DAPPER dir,

python examples/basic_1.py

PS: If you closed the terminal (or shut down your computer), you'll first need to run conda activate dapper-env

DA methods

Method Literature reproduced
EnKF 1 Sakov08, Hoteit15, Grudzien2020
EnKF-N Bocquet12, Bocquet15
EnKS, EnRTS Raanes2016
iEnKS / iEnKF / EnRML / ES-MDA 2 Sakov12, Bocquet12, Bocquet14
LETKF, local & serial EAKF Bocquet11
Sqrt. model noise methods Raanes2014
Particle filter (bootstrap) 3 Bocquet10
Optimal/implicit Particle filter 3 Bocquet10
NETF Tödter15, Wiljes16
Rank histogram filter (RHF) Anderson10
4D-Var
3D-Var
Extended KF
Optimal interpolation
Climatology

1: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.). Serial forms are also available.
Tuned with inflation and "random, orthogonal rotations".
2: Also supports the bundle version, and "EnKF-N"-type inflation.
3: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
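As an illustration of footnote 3, systematic (universal) resampling can be sketched as follows (a generic textbook version, not DAPPER's exact code):

```python
import numpy as np

def systematic_resample(w, u=None, rng=None):
    """Systematic/universal resampling: one uniform draw u, then N evenly
    spaced positions mapped through the CDF of the normalized weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    N = len(w)
    if u is None:
        u = (rng or np.random.default_rng()).uniform()
    positions = (u + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

# A degenerate weight vector concentrates all offspring on particle 0:
print(systematic_resample([1, 0, 0, 0], u=0.5))  # [0 0 0 0]
```

With uniform weights every particle survives exactly once, which is why systematic resampling has lower variance than plain multinomial resampling.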

For a list of ready-made experiments with suitable, tuned settings for a given method (e.g., the iEnKS), use:

grep -r "xp.*iEnKS" dapper/mods

Test cases (models)

Simple models facilitate the reliability, reproducibility, and interpretability of experiment results.

Model Lin TLM** PDE? Phys.dim. State len Lyap≥0 Implementer
Id Yes Yes No N/A * 0 Raanes
Linear Advect. (LA) Yes Yes Yes 1d 1000 * 51 Evensen/Raanes
DoublePendulum No Yes No 0d 4 2 Matplotlib/Raanes
Ikeda No Yes No 0d 2 1 Raanes
LotkaVolterra No Yes No 0d 5 * 1 Wikipedia/Raanes
Lorenz63 No Yes "Yes" 0d 3 2 Sakov
Lorenz84 No Yes No 0d 3 2 Raanes
Lorenz96 No Yes No 1d 40 * 13 Raanes
Lorenz96s No Yes No 1d 10 * 4 Grudzien
LorenzUV No Yes No 2x 1d 256 + 8 * ≈60 Raanes
LorenzIII No No No 1d 960 * ≈164 Raanes
Vissio-Lucarini 20 No Yes No 1d 36 * 10 Yumeng
Kuramoto-Sivashinsky No Yes Yes 1d 128 * 11 Kassam/Raanes
Quasi-Geost (QG) No No Yes 2d 129²≈17k ≈140 Sakov
  • *: Flexible; set as necessary
  • **: Tangent Linear Model included?

The models are found as subdirectories within dapper/mods. A model should be defined in a file named __init__.py, and illustrated by a file named demo.py. Most other files within a model subdirectory are usually named authorYEAR.py and define an HMM object, which holds the settings of a specific twin experiment using that model, as detailed in the corresponding author/year's paper. A list of these files can be obtained using

find dapper/mods -iname '[a-z]*[0-9]*.py'

Some files contain settings used by several papers. Moreover, at the bottom of each such file should be (in comments) a list of suitable, tuned settings for various DA methods, along with their expected, average rmse.a score for that experiment. As mentioned above, DAPPER reproduces literature results. You will also find results that were not reproduced by DAPPER.

Similar projects

DAPPER is aimed at research and teaching (see discussion up top). Example of limitations:

  • It is not suited for very big models (>60k unknowns).
  • It does not support non-uniform time sequences.

The scope of DAPPER is restricted (see the framework_to_language figure on the project page, contrasting frameworks with languages).

Moreover, even straying beyond basic configurability appears unrewarding when already building on a high-level language such as Python. Indeed, you may freely fork and modify the code of DAPPER, which should be seen as a set of templates, and not a framework.

Also, DAPPER comes with no guarantees/support. Therefore, if you have an operational or real-world application, such as WRF, you should look into one of the alternatives, sorted by approximate project size.

Name Developers Purpose (approximately)
DART NCAR General
PDAF AWI General
JEDI JCSDA (NOAA, NASA, ++) General
OpenDA TU Delft General
EMPIRE Reading (Met) General
ERT Statoil History matching (Petroleum DA)
PIPT CIPR History matching (Petroleum DA)
MIKE DHI Oceanographic
OAK Liège Oceanographic
Siroco OMP Oceanographic
Verdandi INRIA Biophysical DA
PyOSSE Edinburgh, Reading Earth-observation DA

Below is a list of projects with a purpose more similar to DAPPER's (research in DA, and not so much using DA):

Name Developers Notes
DAPPER Raanes, Chen, Grudzien Python
SANGOMA Conglomerate* Fortran, Matlab
hIPPYlib Villa, Petra, Ghattas Python, adjoint-based PDE methods
FilterPy R. Labbe Python. Engineering oriented.
DASoftware Yue Li, Stanford Matlab. Large inverse probs.
Pomp U of Michigan R
EnKF-Matlab Sakov Matlab
EnKF-C Sakov C. Light-weight, off-line DA
pyda Hickman Python
PyDA Shady-Ahmed Python
DasPy Xujun Han Python
DataAssim.jl Alexander-Barth Julia
DataAssimilationBenchmarks.jl Grudzien Julia, Python
EnsembleKalmanProcesses.jl Clim. Modl. Alliance Julia, EKI (optim)
Datum Raanes Matlab
IEnKS code Bocquet Python

The EnKF-Matlab and IEnKS codes have been inspirational in the development of DAPPER.

*: AWI/Liege/CNRS/NERSC/Reading/Delft

Contributing

Issues and Pull requests

Do not hesitate to open an issue, whether to report a problem or ask a question. It may take some time for us to get back to you, since DAPPER is primarily a volunteer effort. Please start by perusing the documentation and searching the issue tracker for similar items.

Pull requests are very welcome. Examples: adding a new DA method, dynamical models, experimental configuration reproducing literature results, or improving the features and capabilities of DAPPER. Please keep in mind the intentional limitations and read the developers guidelines.

Contributors

Patrick N. Raanes, Yumeng Chen, Colin Grudzien, Maxime Tondeur, Remy Dubois

DAPPER is developed and maintained at NORCE (Norwegian Research Institute) and the Nansen Environmental and Remote Sensing Center (NERSC), in collaboration with the University of Reading, the UK National Centre for Earth Observation (NCEO), and the Center for Western Weather and Water Extremes (CW3E).

NORCE NERSC

Publications

dapper's People

Contributors

alin256 avatar aperrin66 avatar cgrudz avatar dafeda avatar patnr avatar yumengch avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dapper's Issues

Liveplot splitting then Joint Estimation

Take example_1.py:

  1. How did you get the sliding_marginals to return x,y,z separately? Have you hard-coded params labels='xyz' to plot x,y,z separately? Otherwise would it handle slices as well?
  2. Could spatial1d handle slices? e.g. plot just x,y or [x,y] with a constant array like [0,1]
    Suppose you now want to jointly estimate the state and parameters, i.e. [x,y,z,sig,rho,beta] is the new model output.
  3. Can you separate sig, rho, beta easily/omit them from live plotting?

Much Appreciated!

Linear dynamic from `linear_model_setup()`, matrix power instead of multiplication.

Currently the linear_model_setup defines the model and jacob method as

  def model(x,t,dt): return dt*(ModelMatrix@x)
  def jacob(x,t,dt): return dt*ModelMatrix

If I understand it correctly, shouldn't it be something like this?

  def model(x,t,dt): return np.linalg.matrix_power(ModelMatrix, dt) @ x
  def jacob(x,t,dt): return np.linalg.matrix_power(ModelMatrix, dt)
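The difference the reporter points out is easy to demonstrate with a standalone example (the matrix M and integer dt below are invented for illustration; which form is correct depends on the intended semantics of dt in linear_model_setup):

```python
import numpy as np

M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # example model matrix (a -90 degree rotation)
x = np.array([1.0, 0.0])
dt = 2                       # an integer number of elementary steps

scaled = dt * (M @ x)                        # the form the issue quotes
powered = np.linalg.matrix_power(M, dt) @ x  # exact two-step propagation

print(scaled)   # [ 0. -2.]
print(powered)  # [-1.  0.]
```

Scaling by dt merely stretches a single application of M, whereas the matrix power composes M with itself dt times, which is the exact discrete-time propagator.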

Add Corona model using real data.

Cross-validate with "future".

Example should say:

- DAPPER is not made to assimilate real data cases.
- It requires a truth time series for its statistics and liveplotting.
- Corona1 model is ridiculously oversimplified, using no compartments.
- It is not chaotic, but highly sensitive to the R parameter,
  which is modulated by societal intervention.
- The biggest question seems to be the modelling of the observation operator.
    - How to deal with numbers counted on different days.
    - How to deal with test volume vs number of positives.

UncertainQtty string formatting, feature or bug?

Sorry for my recent silence.

The function mean_with_conf seems to rank quite high on the TODO list.
One part of the code is

if (not np.isfinite(mu)) or N <= 5:
    uq = UncertainQtty(mu, np.nan)

This means for short time series, the confidence interval is NaN, which is understandable.

However, a quick test:

>>> np.random.random(4)
array([0.83974936, 0.80939852, 0.9260896 , 0.93455182])
>>> xx = np.random.random(4)
>>> mean_with_conf(xx)
UncertainQtty(val=0.475499999999999978239628717346931807696819305419921875000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, conf=nan)

The reason is that in __str__ of class UncertainQtty, n = log10int(c) is -300, since c is NaN.
This leads to a format of %0.300f for the mean value.
I wonder whether this is by design or a bug?

Thank you.
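The formatting failure described in this issue can be imitated in miniature (log10int and the -300 fallback are paraphrased from the report; this is not DAPPER's actual code):

```python
import numpy as np

def log10int(x):
    # Fall back to a huge negative exponent when x is not finite/positive,
    # as described in the issue:
    if np.isfinite(x) and x > 0:
        return int(np.floor(np.log10(x)))
    return -300

def fmt(val, conf):
    n = log10int(conf)
    decimals = max(-n, 0)  # NaN conf -> 300 decimals, as observed
    return f"{val:.{decimals}f}"

print(len(fmt(0.4755, float("nan"))))  # 302: "0." plus 300 decimals
```

A guard such as clamping decimals (or printing "nan" for the confidence) when conf is not finite would avoid the 300-digit output.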

requirements.txt

I hate dependency management. I never quite learnt the ins and outs (who has?). Recently I started using poetry for my libraries, which is great, but I think DAPPER will stick with anaconda/setup.py for the foreseeable future (why?). I don't want to use Anaconda's environment.yaml file because sometimes I like to try installing without Anaconda.

However, setup.py:install_requires only lists the direct dependencies, not the sub-dependencies. This means other installs may get different (untested) environments. Since DAPPER is an app, not a library/package, we need to freeze all dependencies. So,

  • when we feel like it (i.e. for a new release),
  • or when we add a new package to install_requires,

we need to

  • Create a new, clean environment
  • Update versions in install_requires if desired
  • Install DAPPER
  • pip freeze > requirements.txt.
    Unfortunately this creates yet another file.

Also, we need to update the install instructions in README to use requirements.txt. Unfortunately, we cannot include the python version in requirements.txt. This is frustrating.

And the version specifications in install_requires needs revision. Should we pin them? The requirements.txt file will already use pinned versions, so it might not be necessary. It would also require more manual editing when updating the direct dependencies.

@yumengch Any thoughts?

Main reference: https://stackoverflow.com/a/33685899

Edit:
Instead of requirements.txt, requirements-dev.txt, etc., maybe the pinned lists could be included in setup.py as well somehow?

Edit:
This is probably a more useful reference: https://news.ycombinator.com/item?id=11210370
The discussion predates pipenv/poetry, but that's ok here.

Create example script for variable-length obs/states

Be conservative about the changes necessary to the core of DAPPER.

ATM, this requires pretending variable-length state or obs are fixed-length. This enables using np.ndarrays without hassle, but includes overhead.

  • Is "pretending" compatible with all DA methods?
  • How should RMSEs be calculated?

Simplify, improve and generalize (total rewrite?) time sequence management

  • At the moment, t (absolute time) is "too important" in the code,
    compared to k (the time index). For example,

    • dxdt has signature dxdt(x,t,dt) instead of dxdt(x,k,dt).
      But a common situation is that dxdt needs to look up some
      property (e.g. a parameter value) from a pre-defined table.
      For that purpose, k is better suited than t because it is
      an integer, not a float.
    • If models simply have the signature HMM.dyn(x,k)
      (i.e. ticker would only yield k and kObs)
      then variable dt is effectively supported.
  • Change KObs to KObs-1 ?

  • Highlight the fact that t=0 is special

    • There should not be any obs then
    • Some of the plotting functionality may be
      assuming that tt[0] == 0.
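The point about k versus t can be illustrated generically (a toy example, not DAPPER code): a float time is fragile as a lookup key, while the integer step index is exact:

```python
dt = 0.1
# Pre-defined parameter table keyed by the integer time index k
# (values invented for illustration):
forcing = {k: 8.0 + (k % 3) for k in range(100)}

k = 3
t = k * dt         # 0.30000000000000004 due to float rounding
print(t == 0.3)    # False: lookup-by-t needs tolerance handling
print(forcing[k])  # 8.0: lookup-by-k is exact
```

This is exactly why an integer index is better suited than absolute time for table lookups inside dxdt.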

Running On windows

Hello,

I am trying to run the Dapper with Windows and receiving following error.

Truth & Obs: 100%|██████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 14237.16it/s]
Initializing liveplotting...
Press <Enter> to toggle live plot ON/OFF.
Press <Space> and then <Enter> to pause.
EnKF: 0%| | 0/3000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_1.py", line 17, in <module>
    stats = config.assimilate(setup,xx,yy)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\admin.py", line 99, in assim_caller
    assimilator(stats,setup,xx,yy)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\da_methods.py", line 32, in assimilator
    stats.assess(k,kObs,E=E)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\stats.py", line 142, in assess
    self.lplot.update(k,kObs,**state_prms)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\viz.py", line 448, in update
    if self.skip_plotting(): return
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\viz.py", line 433, in skip_plotting
    key = poll_input() # =None if <space> was pressed above
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\utils.py", line 62, in poll_input
    i,o,e = select.select([sys.stdin],[],[],0.0001)
OSError: [WinError 10038] An operation was attempted on something that is not a socket

Let me know if any other settings are required to run DAPPER on Windows 10.

Thank you

assignment by reference of HMM in todo

Hi Patrick

I finally got some time to look at DAPPER again, and I was thinking about the following issue in the TODO.

Consider this code:

from dapper.mods.Lorenz63.sakov2012 import HMM1
from dapper.mods.Lorenz63.wiljes2017 import HMM as HMM2

Document (somewhere) that HMM1.t is changed by the second import, because it itself imports and changes HMM1. This became a bug in test_example_2 when pytest was run with --doctest-modules because --doctest-modules goes through all of the modules.

I don't see the mentioned problem when I run pytest --doctest-modules. It is possible that I use it wrong.

However, I can see why it becomes a problem due to the assignment by reference.

The goal is to make HMM in dapper.mods.Lorenz63.sakov2012 and dapper.mods.Lorenz63.wiljes2017 two independent variables with independent references.

A simple yet possibly inelegant method is to use copy in dapper.mods.Lorenz63.wiljes2017:

1,40c1,43
< from dapper.mods.Lorenz63.sakov2012 import HMM, Nx
---
> from copy import copy
>
> from dapper.mods.Lorenz63.sakov2012 import HMM as _HMM
> from dapper.mods.Lorenz63.sakov2012 import Nx as _Nx
>
> Nx = _Nx
> HMM = copy(_HMM)

An alternative is adding a copy method to HMM (hiding the copy import in the dapper.mods.__init__):

import copy as cp
class HiddenMarkovModel(struct_tools.NicePrint):
    ...
    def copy(self):
        return cp.copy(self)

This will give:

1,40c1,43
< from dapper.mods.Lorenz63.sakov2012 import HMM, Nx
---
> from dapper.mods.Lorenz63.sakov2012 import HMM as _HMM
> from dapper.mods.Lorenz63.sakov2012 import Nx as _Nx
>
> Nx = _Nx
> HMM = _HMM.copy()

I don't know any other way myself. Any better suggestions? I can make the changes in other mods as well if you want.
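The underlying aliasing problem, and the copy fix, in miniature (generic Python, not DAPPER's actual classes; the class below is a stand-in for HiddenMarkovModel):

```python
from copy import copy

class HMM:
    # stand-in for HiddenMarkovModel; only holds a time-settings attribute
    def __init__(self, t):
        self.t = t

# Module A defines an HMM; module B imports it and tweaks it:
HMM1 = HMM(t=10)
HMM2 = HMM1       # plain assignment: both names reference one object
HMM2.t = 20
print(HMM1.t)     # 20 -- the "second import changed HMM1" effect

HMM1 = HMM(t=10)
HMM2 = copy(HMM1) # shallow copy: independent top-level attributes
HMM2.t = 20
print(HMM1.t)     # 10 -- unaffected
```

Note that copy() is shallow: attributes that are themselves mutable objects (e.g. operator instances) would still be shared, which may or may not matter here.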

Issue when length=1 1D array is returned from observation function

Hi,
first of all, thanks for the amazing software you build, all the work you put into it as well as making it open source for everyone.

When familiarizing myself with the software using a very simple model like this:

# measurement function: sample polynomial by time
@ens_compatible
def hxt(x, t):
    # case x is 1-dimensional (as in the simulation part)
    if len(x.shape) == 1:
        r = x[0]*t**2 + x[1] * t + x[2]
    else: # case of N different parameters (ensemble)
        r = x[0,:]*t**2 + x[1,:] * t + x[2,:]
    return r

@ens_compatible
def model_step(x0, t0, dt):
    return x0

# write the function for model state update in an Operator class compatible dict
Dyn = {
    'M': Nx,
    'model': model_step,
    'noise': 0
    }

# write the measurement function which maps states to observations to an Operator class compatible dict
Obs = {
    'M': 1,
    'model': NamedFunc(hxt, 'MapStatesP'),
    'noise': 1.5
    }

I get an error in the EnKF function. However, if I artificially return a length 2 1-D array by returning r twice, the model works fine.

Am I doing something wrong, or is that a bug?

I will give detailed error info later, since I do not have full access to my system and codes.
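While awaiting the details, a likely culprit with M=1 is array shape: an observation function easily returns a 0-d or 1-d result where ensemble machinery expects 2-d. A generic NumPy illustration (not DAPPER internals; the toy operator is invented):

```python
import numpy as np

E = np.random.rand(5, 3)  # ensemble: N=5 members, Nx=3 state variables

def hx(x):
    return x[0] * 2.0     # scalar observation (M=1), invented for illustration

# Applying hx member-wise gives shape (5,), not the (N, M) = (5, 1)
# that matrix-based update code typically expects:
hE = np.array([hx(m) for m in E])
print(hE.shape)            # (5,)
hE2 = np.atleast_2d(hE).T  # normalize to (5, 1)
print(hE2.shape)           # (5, 1)
```

Returning r twice (as the reporter did) makes the result 2-element and sidesteps the shape collapse, which is consistent with this diagnosis.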

Create example script for real-world data

Be conservative about the changes necessary to the core of DAPPER.

This is not part of the main ambitions of DAPPER, but should be possible to support in a limited capacity.

round2 function in math module

I'm thinking about adding some unit tests and tried to do some simple tests for the round2 function and understand its behaviour.

It works with integer input for the number of significant figures.

With decimal input which is viewed as the precision, it can generate results I don't understand:

For example:

>>> round2(1234.4, 2.7)
>>> array(1234.)
>>> round2(1234.4, 2.5)
>>> array(1235.)
>>> dpr.round2(1234.45, 0.3)
>>> array(1234.5)
>>> dpr.round2(1234.45, 0.4)
>>> array(1234.4)

Such behaviour seems to appear whenever the last digit of the precision is not 1.
I believe the reason for this arises from the use of the function
_round2prec(num, prec).

I wonder if this is by design, or if the precision should be limited to the form 0.1, 0.01, etc.
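For what it's worth, the reported outputs for prec=2.5 and prec=0.3 are consistent with "round to the nearest multiple of prec". A guess at the semantics, sketched here (the actual _round2prec may differ):

```python
import numpy as np

def round2prec_sketch(num, prec):
    # Round num to the nearest multiple of prec.
    return prec * np.round(num / prec)

print(round2prec_sketch(1234.4, 2.5))   # 1235.0 (matches the issue)
print(round2prec_sketch(1234.45, 0.3))  # 1234.5 (matches the issue)
```

Under this reading the outputs are not bugs but multiples of the precision; restricting prec to powers of ten would give the more familiar decimal behaviour.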

Figure numbers

For liveplotting, the figure numbers are assigned when defining the liveplots for the model. See lorenz63/__init__.py.

This is fine. Except there might be conflicting numbers. E.g. with the defaults defined at the bottom of liveplotting.py.

But requesting specific liveplotters via assimilate() then requires remembering these numbers. This is inelegant. It should be done by (fuzzy?) names instead.

Diagonal CovMat sorting

from dapper import *
d = [.3, 1, 2, 1]
print(np.diag(CovMat(d).full)) # yields [1.  0.3 1.  2. ]
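Whatever the internal cause, the symptom looks like a sorted spectrum being reassembled without inverting the sort permutation. A generic illustration of that bug class (plain NumPy, not CovMat's actual code):

```python
import numpy as np

d = np.array([0.3, 1.0, 2.0, 1.0])
idx = np.argsort(d)        # sort permutation, e.g. for an eigendecomposition
sorted_d = d[idx]

wrong = np.diag(sorted_d)  # forgets the permutation: diagonal is scrambled
print(np.diag(wrong))      # 0.3, 1, 1, 2 -- not the original order

inv = np.argsort(idx)      # inverse permutation restores the original order
right = np.diag(sorted_d[inv])
print(np.diag(right))      # 0.3, 1, 2, 1 -- as input
```

If CovMat sorts its diagonal internally, applying the inverse permutation when reconstructing .full would be the corresponding fix.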

Move documentation from L63 and L96

  • Move documentation from L63 and L96, which is in the form of code comments,
    into its own doc page (maybe mods/__init__.py).
    • Expand upon these comments. For example:
    • Note that OOP (exemplified with QG, LUV, etc.) is more flexible & robust,
      and therefore better suited when different parameter settings are to be
      investigated. Caution: for parameter-estimation, the parameters are input
      variables to the "forward model", which does not necessarily require OOP.
    • Note that the mod settings files do not keep a very high programming
      standard.
    • Note that HMMs can be copied to create new settings.
    • Note pitfalls (eg changing Force)
    • To simplify L63 even more for beginners, move d2x_dtdx, dstep_dx, and
      LPs from mods/L63/__init__.py into mods/L63/extras.py ?

ModuleNotFoundError: No module named 'dill'

Thank you for your project. How can I resolve this error?
from resources.workspace import *

Initializing DAPPER...

ModuleNotFoundError Traceback (most recent call last)
in
1 # Run this cell now.
----> 2 from resources.workspace import *

E:\projectsSpace\pythonspace\DA-tutorials-master\resources\workspace.py in
6
7 # Load DAPPER
----> 8 from dapper import *
9
10 # Load answers

e:\projectsspace\pythonspace\dapper-master\dapper\__init__.py in
155 from .tools.data_management import *
156 from .stats import *
--> 157 from .admin import *
158 from .da_methods.ensemble import *
159 from .da_methods.particle import *

e:\projectsspace\pythonspace\dapper-master\dapper\admin.py in
549
550
--> 551 import dill
552 def save_data(name,*args,**kwargs):
553 """"Utility for saving experimental data.

ModuleNotFoundError: No module named 'dill'

Plan releases

Make roadmaps for v1.2 and v2.

Ideally, this should consist of "enhancement" issues that get tagged with the "project" labels v1.2 and v2.

Discuss

Keyboard input for Liveplot

I notice that several issues in the Github are related to liveplotting.

I avoided the liveplotting module since beginning and I think I should take this chance to check it.

I made some tests under Windows Subsystem for Linux (WSL), which is a Ubuntu system.

  • I tested with the new param_estim.py with liveplotting. The function read1 blocks the update (the program pauses) until I provide keyboard input for the figure update at each time step.
  • test_plotting does not require keyboard input in the runtime because it sets user_interaction to False and read1 always returns None.

This running result leads me to some design questions:
Is the blocking of program run by read1 done by intention?
If so:

  • Is the pause functionality still necessary? The liveplotting does not continue unless some keyboard input (other than Enter, Space, and i) is given.
  • It might not be necessary to print information about keyboard input if user_interaction is False.

If it is a bug:

  • I guess there is a problem with the non-blocking read in certain environments, and I will check it in a Windows environment as well.

DA multilayer shallow water

Hi,

I am working on setting up a parallelized (MPI) python multilayer shallow water code and would like eventually to do DA with it.

The typical problems I will look at will have state vectors of the following size approx.: 10 vars x O(1000)^2 grid points.

Will I be able to use DAPPER to do that?
If yes, what should I look for when setting up the code?
If not, what would be a good alternative?

Disclaimer I am a physical oceanographer and not a DA expert.

cheers

Liveplotting in Jupyter notebook ?

Is there a way to have the live plots work in a Jupyter notebook, or at least to have the replays?
Maybe something with the matplotlib backend?

Sorry if it is a naive question, I find this package amazing and am only learning to use it.

Update Bocquet book/figure link

Link in README. Google books preview no longer shows the figure. Why?

Anyways, we can just link to the lecture notes instead.

Modular iEnKS

Make iEnKS code more modular.
In particular, the analysis should be separated
from the time stepping mechanisms.

iEnKS implementation

I was looking at the code for iEnKS in DAPPER and I realise it is a bit different from the Bocquet and Sakov 2013 (BS13) paper.

Within the Gauss-Newton iteration, BS13 does not update the ensemble anomaly. In DAPPER, you update the ensemble anomaly in each iteration. For SDA, the observation anomaly is 'de-conditioned', while the 'de-conditioning' is not applied to MDA. This treatment is intuitively reasonable, as it propagates the newly assimilated observation throughout the DA window, but I wonder if you know/have any literature/reference that can back this treatment?

Thanks!

MPL backend installation

Right now QtAgg is among the "extras" of setup.py.
That could mean an interactive matplotlib (mpl) backend might not get installed unless using pip install -e .[Qt].

However, I almost always seem to get an interactive backend installed anyways. My recent fresh install came with both MacOSX and TkAgg backends. I have a feeling it is because the environment was created with Anaconda, which provides some things to begin with, but I'm not sure.

So:

  • How to guarantee installation of an interactive backend?
  • Should we require using one in particular, or (somehow) make it platform dependent? For example,
    • I believe Qt5Agg is cross-platform, and can be installed via pip only, but slower & uglier than MacOSX.
    • I believe TkAgg is cross-platform, but might require complicated installation instructions beyond pip. However, Anaconda seems (?) to provide it automatically.

@yumengch Thoughts?

Examples running in Colab

Make Colab work for examples.

To install DAPPER, this should suffice:

# !pip install git+https://github.com/nansencenter/DAPPER.git

But DAPPER requires that Colab upgrades to python 3.7 (see setup.py for why), and I don't know when that will happen.
Possible workarounds:

The .ipynb examples should be synced with the .py versions via jupytext, using a minimally-intrusive syntax.

Edit: maybe just use binder?

Discussion for a dapper.da_methods.__init__.py

Hi Patrick

I really like the way that all new DA methods are using OOP style.
And I understand that dataclasses provides some nice features.
However, I feel the treatment is a bit complex to me.
I understand this is an issue of personal preference, so I just want to share a bit a possible alternative for the implementation.
I think the upside is that it introduces an abstract class to ensure the existence of the assimilate function, and avoids the use of decorators.
The downside is that we might need to add some features ourselves.
What do you think?

import time
import abc
import numpy as np
from struct_tools import NicePrint

import dapper.stats


class da_method(abc.ABC, NicePrint):
    printopts = dict(
        excluded=NicePrint.printopts["excluded"]+["assimilator", "assimilate"]
    )

    def __init__(self, **kwargs):
        # Use -5 to avoid object, abc.ABC,
        # NicePrint, and the da_method class.
        # A similar loop also appears in the dataclasses code.
        for b in self.__class__.__mro__[-5::-1]:
            for key in b.__annotations__.keys():
                value = getattr(b, key, None)
                if key in kwargs.keys():
                    setattr(self, key, kwargs[key])
                elif value is not None:
                    setattr(self, key, value)
                else:
                    raise TypeError(f"{key} must have a value")

        self.assimilator = self.assimilate
        self.assimilate = self._assimilate

    def _assimilate(self, HMM, xx, yy, desc=None, **stat_kwargs):
        # Progressbar name
        pb_name_hook = self.__class__.__name__ if desc is None else desc # noqa
        # Init stats
        # self.stats = dapper.stats.Stats(self, HMM, xx, yy, **stat_kwargs)
        # Assimilate
        time_start = time.time()
        self.assimilator(HMM, xx, yy)
        # dapper.stats.register_stat(self.stats, "duration", time.time()-time_start)

    @abc.abstractmethod
    def assimilate(HMM, xx, yy):
        pass

    def __repr__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__repr__()

    def __str__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__str__()


class ens_method(da_method):
    """Declare default ensemble arguments."""
    infl: float        = 1.0
    rot: bool          = False
    fnoise_treatm: str = 'Stoch'


class EnKF(ens_method):
    """Declare default ensemble arguments."""
    upd_a: str
    N: int

    def assimilate(self, HMM, xx, yy):
        print("Running assimilation")

UncertainQtty unexpected result

Bug Description:

Motivated by the output in test_example_2.py, using 7.7676 ±1.2464 as input:

>>> from dapper.tools.rounding import UncertainQtty
>>> uq = UncertainQtty(7.7676, 1.2464)
>>> print(uq)
UncertainQtty(val=8, prec=1.0)

This is in contrast to the expected UncertainQtty(val=8.0, prec=1.0) or UncertainQtty(val=8, prec=1); I think the latter is preferable.

Cause of the bug:

The UncertainQtty class rounds prec to its first significant figure, but np.round returns a float, so the int/float formatting of val and prec can disagree:

>>> np.round(1.2464 ,0)
1.0
>>> np.round(0.2464 ,0)
0.0

Possible Solutions:

Turn the float into an int when the first significant figure lies before the decimal separator.
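A minimal sketch of that solution (`round_uq` is a hypothetical helper, not DAPPER's API):

```python
import numpy as np

def round_uq(val, prec):
    """Round val to the first significant figure of prec; return
    ints when that figure lies before the decimal separator."""
    ndecimals = -int(np.floor(np.log10(abs(prec))))
    prec = round(prec, ndecimals)
    val = round(val, ndecimals)
    if ndecimals <= 0:
        # First significant figure is ahead of the decimal separator:
        # present both as ints, avoiding "val=8, prec=1.0".
        return int(val), int(prec)
    return val, prec

print(round_uq(7.7676, 1.2464))  # → (8, 1)
print(round_uq(7.7676, 0.2464))  # → (7.8, 0.2)
```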

Note

This issue is related to an old TODO, which speculated about using uq in xps.tabulate_avrgs and xpSpace.print.

The difficulty is that uq cannot be used directly for that output, because it does not allow specifying the number of decimals. test_example_2.py has examples with decimals = 4.

Any comments?

Support list, dict, ndarray and other non-trivial params in da_method

For example, Var3D(B=B) should work, where B is an ndarray background matrix.

  • The problem is largely that == comparison with an ndarray fails in split_attrs, since ndarray equality is elementwise and does not yield a single bool. I'm not sure about other non-trivial types.
  • When printing xpLists, multiline attributes will be an issue. I suppose one should insert the id in that case.
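The `==` problem can be demonstrated in isolation; `params_equal` below is an illustrative helper for how split_attrs might special-case arrays, not DAPPER's API:

```python
import numpy as np

# ndarray equality is elementwise, so it yields an array rather than
# a single bool; using it in a boolean context raises ValueError.
B = np.eye(2)
try:
    bool(B == np.eye(2))
except ValueError as e:
    print("ambiguous:", e)

def params_equal(a, b):
    """Compare parameters, special-casing ndarrays."""
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        return np.array_equal(a, b)
    return a == b
```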

run time error in example_3.py

DAPPER was installed and tested on Ubuntu 16.04 LTS (conda/Anaconda, Python 3.7). The first two examples, example_1.py and example_2.py, ran without any issues, but example_3.py did not complete for either the F or the N run. Both return an error during the LETKF executions, apparently where loc_rad='$' (F run) or N='?' and loc_rad='?' (N run).

(py37) mon137@smudge-hf:~/projects/v14/python_work/DAPPER$ python -i example_3.py N
Initializing DAPPER...Done
Will save to data/example_3/smudge-hf/N_run3...

xticks[0/831] N: 2
Truth & Obs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:01<00:00, 4944.07it/s]
Climatology: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:04<00:00, 1166.58it/s]
OptInterp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:08<00:00, 639.15it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:11<00:00, 427.32it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 409.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:13<00:00, 382.22it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:13<00:00, 389.10it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 409.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 397.79it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:14<00:00, 347.07it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:14<00:00, 342.55it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:17<00:00, 292.61it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:17<00:00, 289.62it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:38<00:00, 134.29it/s]
LETKF: 0%| | 0/5120 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_3.py", line 127, in
    stat = C.assimilate(HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 139, in assim_caller
    raise ERR
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 120, in assim_caller
    assimilator(stats,HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 612, in assimilator
    for ii, (Eii, za) in zip(state_batches, result):
TypeError: 'NoneType' object is not iterable

exit()


(py37) mon137@smudge-hf:~/projects/v14/python_work/DAPPER$ python -i example_3.py F
Initializing DAPPER...Done
Will save to data/example_3/smudge-hf/F_run4...

xticks[0/543] F: 3
Truth & Obs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:01<00:00, 5057.07it/s]
Climatology: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:04<00:00, 1058.39it/s]
OptInterp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:08<00:00, 586.09it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 336.81it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 325.71it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 332.34it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 319.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 332.05it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 328.55it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 325.31it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 321.57it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 313.47it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 317.52it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:45<00:00, 113.07it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:40<00:00, 126.30it/s]
LETKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [01:06<00:00, 76.47it/s]
LETKF: 0%| | 0/5120 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_3.py", line 127, in
    stat = C.assimilate(HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 139, in assim_caller
    raise ERR
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 120, in assim_caller
    assimilator(stats,HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 612, in assimilator
    for ii, (Eii, za) in zip(state_batches, result):
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 580, in local_analysis
    jj, tapering = obs_localizer(ii)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 188, in obs_localizer
    return inds_and_coeffs(dists, radius, tag=tag)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 94, in inds_and_coeffs
    coeffs = dist2coeff(dists, radius, tag)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 63, in dist2coeff
    R = radius * 1.82 # Sakov: 1.7386
TypeError: can't multiply sequence by non-int of type 'float'
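The final TypeError is plain Python behaviour: multiplying a str (a sequence) by a float raises exactly this error, which suggests loc_rad reached dist2coeff as a string or sequence rather than a number. The value "4" below is illustrative only:

```python
radius = "4"  # oops: a str, not a float
try:
    R = radius * 1.82  # Sakov: 1.7386
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'

# Coercing to a number first avoids the error:
R = float(radius) * 1.82
```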

Cannot run single test anymore

I want the plain command
pytest
to run all tests. That's why the addopts entry in pyproject.toml is so long.

But now it seems impossible to run just a single test using
pytest tests/test_desired.py

So I think the different purposes of testing (regular tests, doctests, coverage, etc.) should perhaps be split into separate tox test environments.

This might complicate the Travis setup... we'll see.
