nansencenter / dapper

Data Assimilation with Python: a Package for Experimental Research

Home Page: https://nansencenter.github.io/DAPPER

License: MIT License

Python 91.44% Makefile 0.14% Fortran 8.22% Shell 0.20%
data-assimilation enkf kalman-filtering state-estimation particle-filter kalman bayesian-methods bayesian-filter chaos

dapper's Introduction

DAPPER is a set of templates for benchmarking the performance of data assimilation (DA) methods. The numerical experiments provide support and guidance for new developments in DA. The typical set-up is a synthetic (twin) experiment, where you specify a dynamic model and an observational model, and use these to generate a synthetic truth (multivariate time series), and then estimate that truth given the models and noisy observations.
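The twin-experiment recipe can be sketched in a few lines of plain NumPy (a toy illustration only, not DAPPER's API; the model, noise levels, and "estimator" here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dynamic model (invented for illustration): slow decay towards zero.
def step(x):
    return 0.97 * x

# Generate a synthetic truth (multivariate time series) ...
K, Nx = 100, 3
xx = np.zeros((K, Nx))
xx[0] = rng.normal(size=Nx)
for k in range(1, K):
    xx[k] = step(xx[k - 1]) + 0.1 * rng.normal(size=Nx)

# ... and noisy observations of it (identity observational model).
yy = xx + 0.5 * rng.normal(size=xx.shape)

# A deliberately naive "estimate": the raw observations themselves.
rmse = np.sqrt(np.mean((yy - xx) ** 2))
print(f"RMSE of raw obs: {rmse:.2f}")
```

A DA method would use the dynamic model to beat this baseline RMSE; that comparison against a known truth is the point of the twin-experiment set-up.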


Getting started

Install, then read, run and try to understand examples/basic_{1,2,3}.py. Some of the examples also exist as Jupyter notebooks, and can be run in the cloud without installation (though requiring a Google login): Open In Colab. This screencast provides an overview of DAPPER. The documentation includes general guidelines and the API reference, but most users should expect to read the code as well. If used towards a publication, please cite as The experiments used (inspiration from) DAPPER [ref], version 1.6.0, or similar, where [ref] points to the DOI. Also see the interactive tutorials on DA theory with Python.

Highlights

DAPPER enables the numerical investigation of DA methods through a variety of typical test cases and statistics. It (a) reproduces numerical benchmark results reported in the literature, and (b) facilitates comparative studies, thus promoting the (a) reliability and (b) relevance of the results. For example, the figure below is generated by examples/basic_3.py and reproduces Figure 5.7 of these lecture notes. DAPPER is (c) open source, written in Python, and (d) focused on readability; this promotes the (c) reproduction and (d) dissemination of the underlying science, and makes it easy to adapt and extend.

Comparative benchmarks with Lorenz-96 plotted as a function of the ensemble size (N)

DAPPER demonstrates how to parallelise ensemble forecasts (e.g., the QG model), local analyses (e.g., the LETKF), and independent experiments (e.g., examples/basic_3.py). It includes a battery of diagnostics and statistics, which all get averaged over subdomains (e.g., "ocean" and "land") and then in time. Confidence intervals are computed, including correction for auto-correlations, and used for uncertainty quantification, and significant digits printing. Several diagnostics are included in the on-line "liveplotting" illustrated below, which may be paused for further interactive inspection. In summary, DAPPER is well suited for teaching and fundamental DA research. Also see its drawbacks.
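The auto-correlation correction for confidence intervals can be sketched as follows (a simplified, textbook AR(1)-based version; DAPPER's actual statistics code differs in its details):

```python
import numpy as np

def mean_with_conf_sketch(xx):
    """Mean of a time series and a 1-sigma confidence half-width,
    inflated by an AR(1) estimate of the integrated autocorrelation time."""
    xx = np.asarray(xx, dtype=float)
    N = len(xx)
    mu = xx.mean()
    a = np.corrcoef(xx[:-1], xx[1:])[0, 1]  # lag-1 autocorrelation
    tau = (1 + a) / (1 - a)                 # effective-sample-size factor
    conf = np.sqrt(xx.var(ddof=1) * max(tau, 0) / N)
    return mu, conf

# Strong negative autocorrelation shrinks the interval towards zero:
mu, conf = mean_with_conf_sketch([0.0, 1.0, 0.0, 1.0])
print(mu)  # 0.5
```

For white noise, tau is approximately 1, recovering the familiar sigma/sqrt(N); positively correlated series get correspondingly wider intervals.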

EnKF - Lorenz-96

Installation

Successfully tested on Linux/Mac/Windows.

Prerequisite: Python>=3.9

If you're an expert, set up a Python environment however you like. Otherwise: install Anaconda, then open the Anaconda terminal and run the following commands:

conda create --yes --name dapper-env python=3.9
conda activate dapper-env
python --version

Ensure the printed version is 3.9 or higher.
Keep using the same terminal for the commands below.

Install

Either: Install for development (recommended)

Do you want the DAPPER code available to play around with? Then

  • Download and unzip (or git clone) DAPPER.
  • Move the resulting folder wherever you like,
    and cd into it (ensure you're in the folder with a setup.py file).
  • pip install -e '.'

Or: Install as library

Do you just want to run a script that requires DAPPER? Then

  • If the script comes with a requirements.txt file that lists DAPPER, then do
    pip install -r path/to/requirements.txt.
  • If not, hopefully you know the version of DAPPER needed. Run
    pip install dapper==1.6.0 to get version 1.6.0 (as an example).

Finally: Test the installation

You should now be able to run your script with python path/to/script.py.
For example, if you are in the DAPPER dir,

python examples/basic_1.py

PS: If you closed the terminal (or shut down your computer), you'll first need to run conda activate dapper-env

DA methods

Method Literature reproduced
EnKF 1 Sakov08, Hoteit15, Grudzien2020
EnKF-N Bocquet12, Bocquet15
EnKS, EnRTS Raanes2016
iEnKS / iEnKF / EnRML / ES-MDA 2 Sakov12, Bocquet12, Bocquet14
LETKF, local & serial EAKF Bocquet11
Sqrt. model noise methods Raanes2014
Particle filter (bootstrap) 3 Bocquet10
Optimal/implicit Particle filter 3 Bocquet10
NETF Tödter15, Wiljes16
Rank histogram filter (RHF) Anderson10
4D-Var
3D-Var
Extended KF
Optimal interpolation
Climatology

1: Stochastic, DEnKF (i.e. half-update), ETKF (i.e. sym. sqrt.). Serial forms are also available.
Tuned with inflation and "random, orthogonal rotations".
2: Also supports the bundle version, and "EnKF-N"-type inflation.
3: Resampling: multinomial (including systematic/universal and residual).
The particle filter is tuned with "effective-N monitoring", "regularization/jittering" strength, and more.
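As an illustration of footnote 3, systematic (universal) resampling can be sketched as follows (a generic textbook version, not DAPPER's exact code):

```python
import numpy as np

def systematic_resample(w, u=None, rng=None):
    """Systematic/universal resampling: one uniform draw u, then N evenly
    spaced positions mapped through the CDF of the normalized weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    N = len(w)
    if u is None:
        u = (rng or np.random.default_rng()).uniform()
    positions = (u + np.arange(N)) / N
    return np.searchsorted(np.cumsum(w), positions)

# A degenerate weight vector concentrates all offspring on particle 0:
print(systematic_resample([1, 0, 0, 0], u=0.5))  # [0 0 0 0]
```

With uniform weights every particle survives exactly once, which is why systematic resampling has lower variance than plain multinomial resampling.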

For a list of ready-made experiments with suitable, tuned settings for a given method (e.g., the iEnKS), use:

grep -r "xp.*iEnKS" dapper/mods

Test cases (models)

Simple models facilitate the reliability, reproducibility, and interpretability of experiment results.

Model Lin TLM** PDE? Phys.dim. State len Lyap≥0 Implementer
Id Yes Yes No N/A * 0 Raanes
Linear Advect. (LA) Yes Yes Yes 1d 1000 * 51 Evensen/Raanes
DoublePendulum No Yes No 0d 4 2 Matplotlib/Raanes
Ikeda No Yes No 0d 2 1 Raanes
LotkaVolterra No Yes No 0d 5 * 1 Wikipedia/Raanes
Lorenz63 No Yes "Yes" 0d 3 2 Sakov
Lorenz84 No Yes No 0d 3 2 Raanes
Lorenz96 No Yes No 1d 40 * 13 Raanes
Lorenz96s No Yes No 1d 10 * 4 Grudzien
LorenzUV No Yes No 2x 1d 256 + 8 * ≈60 Raanes
LorenzIII No No No 1d 960 * ≈164 Raanes
Vissio-Lucarini 20 No Yes No 1d 36 * 10 Yumeng
Kuramoto-Sivashinsky No Yes Yes 1d 128 * 11 Kassam/Raanes
Quasi-Geost (QG) No No Yes 2d 129²≈17k ≈140 Sakov
  • *: Flexible; set as necessary
  • **: Tangent Linear Model included?

The models are found as subdirectories within dapper/mods. A model should be defined in a file named __init__.py, and illustrated by a file named demo.py. Most other files within a model subdirectory are usually named authorYEAR.py and define an HMM object, which holds the settings of a specific twin experiment using that model, as detailed in the corresponding author/year's paper. A list of these files can be obtained using

find dapper/mods -iname '[a-z]*[0-9]*.py'

Some files contain settings used by several papers. Moreover, at the bottom of each such file should be (in comments) a list of suitable, tuned settings for various DA methods, along with their expected, average rmse.a score for that experiment. As mentioned above, DAPPER reproduces literature results. You will also find results that were not reproduced by DAPPER.

Similar projects

DAPPER is aimed at research and teaching (see discussion up top). Example of limitations:

  • It is not suited for very big models (>60k unknowns).
  • It does not support non-uniform time sequences.

The scope of DAPPER is restricted (see the framework_to_language figure on the project page, contrasting frameworks with languages).

Moreover, even straying beyond basic configurability appears unrewarding when already building on a high-level language such as Python. Indeed, you may freely fork and modify the code of DAPPER, which should be seen as a set of templates, and not a framework.

Also, DAPPER comes with no guarantees/support. Therefore, if you have an operational or real-world application, such as WRF, you should look into one of the alternatives, sorted by approximate project size.

Name Developers Purpose (approximately)
DART NCAR General
PDAF AWI General
JEDI JCSDA (NOAA, NASA, ++) General
OpenDA TU Delft General
EMPIRE Reading (Met) General
ERT Statoil History matching (Petroleum DA)
PIPT CIPR History matching (Petroleum DA)
MIKE DHI Oceanographic
OAK Liège Oceanographic
Siroco OMP Oceanographic
Verdandi INRIA Biophysical DA
PyOSSE Edinburgh, Reading Earth-observation DA

Below is a list of projects with a purpose more similar to DAPPER's (research in DA, and not so much using DA):

Name Developers Notes
DAPPER Raanes, Chen, Grudzien Python
SANGOMA Conglomerate* Fortran, Matlab
hIPPYlib Villa, Petra, Ghattas Python, adjoint-based PDE methods
FilterPy R. Labbe Python. Engineering oriented.
DASoftware Yue Li, Stanford Matlab. Large inverse probs.
Pomp U of Michigan R
EnKF-Matlab Sakov Matlab
EnKF-C Sakov C. Light-weight, off-line DA
pyda Hickman Python
PyDA Shady-Ahmed Python
DasPy Xujun Han Python
DataAssim.jl Alexander-Barth Julia
DataAssimilationBenchmarks.jl Grudzien Julia, Python
EnsembleKalmanProcesses.jl Clim. Modl. Alliance Julia, EKI (optim)
Datum Raanes Matlab
IEnKS code Bocquet Python

The EnKF-Matlab and IEnKS codes have been inspirational in the development of DAPPER.

*: AWI/Liege/CNRS/NERSC/Reading/Delft

Contributing

Issues and Pull requests

Do not hesitate to open an issue, whether to report a problem or ask a question. It may take some time for us to get back to you, since DAPPER is primarily a volunteer effort. Please start by perusing the documentation and searching the issue tracker for similar items.

Pull requests are very welcome. Examples: adding a new DA method, dynamical models, experimental configuration reproducing literature results, or improving the features and capabilities of DAPPER. Please keep in mind the intentional limitations and read the developers guidelines.

Contributors

Patrick N. Raanes, Yumeng Chen, Colin Grudzien, Maxime Tondeur, Remy Dubois

DAPPER is developed and maintained at NORCE (Norwegian Research Institute) and the Nansen Environmental and Remote Sensing Center (NERSC), in collaboration with the University of Reading, the UK National Centre for Earth Observation (NCEO), and the Center for Western Weather and Water Extremes (CW3E).

NORCE NERSC

Publications

dapper's People

Contributors

alin256 avatar aperrin66 avatar cgrudz avatar dafeda avatar patnr avatar yumengch avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dapper's Issues

Liveplot splitting then Joint Estimation

Take example_1.py:

  1. How did you get the sliding_marginals to return x,y,z separately? Have you hard-coded params labels='xyz' to plot x,y,z separately? Otherwise would it handle slices as well?
  2. Could spatial1d handle slices? e.g. plot just x,y or [x,y] with a constant array like [0,1]
    Suppose you now want to jointly estimate the state and parameters, i.e. [x,y,z,sig,rho,beta] is the new model output.
  3. Can you separate sig, rho, beta easily/omit them from live plotting?

Much Appreciated!

Linear dynamic from `linear_model_setup()`, matrix power instead of multiplication.

Currently the linear_model_setup defines the model and jacob method as

  def model(x,t,dt): return dt*(ModelMatrix@x)
  def jacob(x,t,dt): return dt*ModelMatrix

If I understand it correctly, shouldn't it be something like this?

  def model(x,t,dt): return np.linalg.matrix_power(ModelMatrix, dt) @ x
  def jacob(x,t,dt): return np.linalg.matrix_power(ModelMatrix, dt)
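The difference the reporter points out is easy to demonstrate with a standalone example (the matrix M and integer dt below are invented for illustration; which form is correct depends on the intended semantics of dt in linear_model_setup):

```python
import numpy as np

M = np.array([[0.0, 1.0],
              [-1.0, 0.0]])  # example model matrix (a -90 degree rotation)
x = np.array([1.0, 0.0])
dt = 2                       # an integer number of elementary steps

scaled = dt * (M @ x)                        # the form the issue quotes
powered = np.linalg.matrix_power(M, dt) @ x  # exact two-step propagation

print(scaled)   # [ 0. -2.]
print(powered)  # [-1.  0.]
```

Scaling by dt merely stretches a single application of M, whereas the matrix power composes M with itself dt times, which is the exact discrete-time propagator.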

Add Corona model using real data.

Cross-validate with "future".

Example should say:

- DAPPER is not made to assimilate real data cases.
- It requires a truth time series for its statistics and liveplotting.
- Corona1 model is ridiculously oversimplified, using no compartments.
- It is not chaotic, but highly sensitive to the R parameter,
  which is modulated by societal intervention.
- The biggest question seems to be the modelling of the observation operator.
    - How to deal with numbers counted on different days.
    - How to deal with test volume vs number of positives.

UncertainQtty string formatting, feature or bug?

Sorry for my recent silence.

The function mean_with_conf seems to rank quite high on the TODO list.
One part of the code is

if (not np.isfinite(mu)) or N <= 5:
    uq = UncertainQtty(mu, np.nan)

This means for short time series, the confidence interval is NaN, which is understandable.

However, a quick test:

>>> np.random.random(4)
array([0.83974936, 0.80939852, 0.9260896 , 0.93455182])
>>> xx = np.random.random(4)
>>> mean_with_conf(xx)
UncertainQtty(val=0.475499999999999978239628717346931807696819305419921875000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000, conf=nan)

The reason is that in __str__ of class UncertainQtty, n = log10int(c) is -300, since c is NaN.
This leads to a format of %0.300f for the mean value.
I wonder whether this is by design or a bug?

Thank you.
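The formatting failure described in this issue can be imitated in miniature (log10int and the -300 fallback are paraphrased from the report; this is not DAPPER's actual code):

```python
import numpy as np

def log10int(x):
    # Fall back to a huge negative exponent when x is not finite/positive,
    # as described in the issue:
    if np.isfinite(x) and x > 0:
        return int(np.floor(np.log10(x)))
    return -300

def fmt(val, conf):
    n = log10int(conf)
    decimals = max(-n, 0)  # NaN conf -> 300 decimals, as observed
    return f"{val:.{decimals}f}"

print(len(fmt(0.4755, float("nan"))))  # 302: "0." plus 300 decimals
```

A guard such as clamping decimals (or printing "nan" for the confidence) when conf is not finite would avoid the 300-digit output.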

requirements.txt

I hate dependency management. I never quite learnt the ins and outs (who has?). Recently I started using poetry for my libraries, which is great, but I think DAPPER will stick with anaconda/setup.py for the foreseeable future (why?). I don't want to use Anaconda's environment.yaml file because sometimes I like to try installing without Anaconda.

However, setup.py:install_requires only lists the direct dependencies, not the sub-dependencies. This means other installs may get different (untested) environments. Since DAPPER is an app, not a library/package, we need to freeze all dependencies. So,

  • when we feel like it (i.e. for a new release),
  • or when we add a new package to install_requires,

we need to

  • Create a new, clean environment
  • Update versions in install_requires if desired
  • Install DAPPER
  • pip freeze > requirements.txt.
    Unfortunately this creates yet another file.

Also, we need to update the install instructions in README to use requirements.txt. Unfortunately, we cannot include the python version in requirements.txt. This is frustrating.

And the version specifications in install_requires needs revision. Should we pin them? The requirements.txt file will already use pinned versions, so it might not be necessary. It would also require more manual editing when updating the direct dependencies.

@yumengch Any thoughts?

Main reference: https://stackoverflow.com/a/33685899

Edit:
Instead of requirements.txt, requirements-dev.txt, etc., maybe the pinned lists could be included in setup.py as well somehow?

Edit:
This is probably a more useful reference: https://news.ycombinator.com/item?id=11210370
The discussion predates pipenv/poetry, but that's ok here.

Create example script for variable-length obs/states

Be conservative about the changes necessary to the core of DAPPER.

ATM, this requires pretending variable-length state or obs are fixed-length. This enables using np.ndarrays without hassle, but includes overhead.

  • Is "pretending" compatible with all DA methods?
  • How should RMSEs be calculated?

Simplify, improve and generalize (total rewrite?) time sequence management

  • At the moment, t (absolute time) is "too important" in the code,
    compared to k (the time index). For example,

    • dxdt has signature dxdt(x,t,dt) instead of dxdt(x,k,dt).
      But a common situation is that dxdt needs to look up some
      property (e.g. a parameter value) from a pre-defined table.
      For that purpose, k is better suited than t because it is
      an integer, not a float.
    • If models simply have the signature HMM.dyn(x,k)
      (i.e. ticker would only yield k and kObs)
      then variable dt is effectively supported.
  • Change KObs to KObs-1 ?

  • Highlight the fact that t=0 is special

    • There should not be any obs then
    • Some of the plotting functionality may be
      assuming that tt[0] == 0.
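The point about k versus t can be illustrated generically (a toy example, not DAPPER code): a float time is fragile as a lookup key, while the integer step index is exact:

```python
dt = 0.1
# Pre-defined parameter table keyed by the integer time index k
# (values invented for illustration):
forcing = {k: 8.0 + (k % 3) for k in range(100)}

k = 3
t = k * dt         # 0.30000000000000004 due to float rounding
print(t == 0.3)    # False: lookup-by-t needs tolerance handling
print(forcing[k])  # 8.0: lookup-by-k is exact
```

This is exactly why an integer index is better suited than absolute time for table lookups inside dxdt.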

Running On windows

Hello,

I am trying to run the Dapper with Windows and receiving following error.

Truth & Obs: 100%|██████████████████████████████████████████████████████████████| 3000/3000 [00:00<00:00, 14237.16it/s]
Initializing liveplotting...
Press <Enter> to toggle live plot ON/OFF.
Press <Space> and then <Enter> to pause.
EnKF: 0%| | 0/3000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_1.py", line 17, in <module>
    stats = config.assimilate(setup,xx,yy)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\admin.py", line 99, in assim_caller
    assimilator(stats,setup,xx,yy)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\da_methods.py", line 32, in assimilator
    stats.assess(k,kObs,E=E)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\stats.py", line 142, in assess
    self.lplot.update(k,kObs,**state_prms)
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\viz.py", line 448, in update
    if self.skip_plotting(): return
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\viz.py", line 433, in skip_plotting
    key = poll_input() # =None if <space> was pressed above
  File "U:\Project_Dapper\DAPPER-master\DAPPER-master\tools\utils.py", line 62, in poll_input
    i,o,e = select.select([sys.stdin],[],[],0.0001)
OSError: [WinError 10038] An operation was attempted on something that is not a socket

Let me know if any other settings are required to run DAPPER on Windows 10.

Thank you

assignment by reference of HMM in todo

Hi Patrick

I finally got some time to look at DAPPER again, and I was thinking about the following issue in the TODO.

Consider this code:

from dapper.mods.Lorenz63.sakov2012 import HMM1
from dapper.mods.Lorenz63.wiljes2017 import HMM as HMM2

Document (somewhere) that HMM1.t is changed by the second import, because it itself imports and changes HMM1. This became a bug in test_example_2 when pytest was run with --doctest-modules because --doctest-modules goes through all of the modules.

I don't see the mentioned problem when I run pytest --doctest-modules. It is possible that I use it wrong.

However, I can see why it becomes a problem due to the assignment by reference.

The goal is to make HMM in dapper.mods.Lorenz63.sakov2012 and dapper.mods.Lorenz63.wiljes2017 two independent variables with independent references.

A simple yet possibly inelegant method is to use copy in dapper.mods.Lorenz63.wiljes2017:

1,40c1,43
< from dapper.mods.Lorenz63.sakov2012 import HMM, Nx
---
> from copy import copy
>
> from dapper.mods.Lorenz63.sakov2012 import HMM as _HMM
> from dapper.mods.Lorenz63.sakov2012 import Nx as _Nx
>
> Nx = _Nx
> HMM = copy(_HMM)

An alternative is adding a copy method to HMM (hiding the copy import in the dapper.mods.__init__):

import copy as cp
class HiddenMarkovModel(struct_tools.NicePrint):
    ...
    def copy(self):
        return cp.copy(self)

This will give:

1,40c1,43
< from dapper.mods.Lorenz63.sakov2012 import HMM, Nx
---
> from dapper.mods.Lorenz63.sakov2012 import HMM as _HMM
> from dapper.mods.Lorenz63.sakov2012 import Nx as _Nx
>
> Nx = _Nx
> HMM = _HMM.copy()

I don't know any other way myself. Any better suggestions? I can make the changes in other mods as well if you want.
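The underlying aliasing problem, and the copy fix, in miniature (generic Python, not DAPPER's actual classes; the class below is a stand-in for HiddenMarkovModel):

```python
from copy import copy

class HMM:
    # stand-in for HiddenMarkovModel; only holds a time-settings attribute
    def __init__(self, t):
        self.t = t

# Module A defines an HMM; module B imports it and tweaks it:
HMM1 = HMM(t=10)
HMM2 = HMM1       # plain assignment: both names reference one object
HMM2.t = 20
print(HMM1.t)     # 20 -- the "second import changed HMM1" effect

HMM1 = HMM(t=10)
HMM2 = copy(HMM1) # shallow copy: independent top-level attributes
HMM2.t = 20
print(HMM1.t)     # 10 -- unaffected
```

Note that copy() is shallow: attributes that are themselves mutable objects (e.g. operator instances) would still be shared, which may or may not matter here.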

Issue when length=1 1D array is returned from observation function

Hi,
first of all, thanks for the amazing software you build, all the work you put into it as well as making it open source for everyone.

When familiarizing myself with the software using a very simple model like this:

# measurement function: sample polynomial by time
@ens_compatible
def hxt(x, t):
    # case x is 1-dimensional (as in the simulation part)
    if len(x.shape) == 1:
        r = x[0]*t**2 + x[1] * t + x[2]
    else: # case of N different parameters (ensemble)
        r = x[0,:]*t**2 + x[1,:] * t + x[2,:]
    return r

@ens_compatible
def model_step(x0, t0, dt):
    return x0

# write the function for model state update in an Operator class compatible dict
Dyn = {
    'M': Nx,
    'model': model_step,
    'noise': 0
    }

# write the measurement function which maps states to observations to an Operator class compatible dict
Obs = {
    'M': 1,
    'model': NamedFunc(hxt, 'MapStatesP'),
    'noise': 1.5
    }

I get an error in the EnKF function. However, if I artificially return a length 2 1-D array by returning r twice, the model works fine.

Am I doing something wrong, or is that a bug?

I will give detailed error info later, since I do not have full access to my system and codes.
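While awaiting the details, a likely culprit with M=1 is array shape: an observation function easily returns a 0-d or 1-d result where ensemble machinery expects 2-d. A generic NumPy illustration (not DAPPER internals; the toy operator is invented):

```python
import numpy as np

E = np.random.rand(5, 3)  # ensemble: N=5 members, Nx=3 state variables

def hx(x):
    return x[0] * 2.0     # scalar observation (M=1), invented for illustration

# Applying hx member-wise gives shape (5,), not the (N, M) = (5, 1)
# that matrix-based update code typically expects:
hE = np.array([hx(m) for m in E])
print(hE.shape)            # (5,)
hE2 = np.atleast_2d(hE).T  # normalize to (5, 1)
print(hE2.shape)           # (5, 1)
```

Returning r twice (as the reporter did) makes the result 2-element and sidesteps the shape collapse, which is consistent with this diagnosis.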

Create example script for real-world data

Be conservative about the changes necessary to the core of DAPPER.

This is not part of the main ambitions of DAPPER, but should be possible to support in a limited capacity.

round2 function in math module

I'm thinking about adding some unit tests and tried to do some simple tests for the round2 function and understand its behaviour.

It works with integer input for the number of significant figures.

With decimal input which is viewed as the precision, it can generate results I don't understand:

For example:

>>> round2(1234.4, 2.7)
>>> array(1234.)
>>> round2(1234.4, 2.5)
>>> array(1235.)
>>> dpr.round2(1234.45, 0.3)
>>> array(1234.5)
>>> dpr.round2(1234.45, 0.4)
>>> array(1234.4)

Such behaviour seems to appear whenever the last digit of the precision is not 1.
I believe the reason for this arises from the use of the function
_round2prec(num, prec).

I wonder if this is by design, or if the precision should be limited to the form 0.1, 0.01, etc.
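For what it's worth, the reported outputs for prec=2.5 and prec=0.3 are consistent with "round to the nearest multiple of prec". A guess at the semantics, sketched here (the actual _round2prec may differ):

```python
import numpy as np

def round2prec_sketch(num, prec):
    # Round num to the nearest multiple of prec.
    return prec * np.round(num / prec)

print(round2prec_sketch(1234.4, 2.5))   # 1235.0 (matches the issue)
print(round2prec_sketch(1234.45, 0.3))  # 1234.5 (matches the issue)
```

Under this reading the outputs are not bugs but multiples of the precision; restricting prec to powers of ten would give the more familiar decimal behaviour.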

Figure numbers

For liveplotting, the figure numbers are assigned when defining the liveplots for the model. See lorenz63/__init__.py.

This is fine. Except there might be conflicting numbers. E.g. with the defaults defined at the bottom of liveplotting.py.

But requesting specific liveplotters via assimilate() then requires remembering these numbers. This is inelegant. It should be done by (fuzzy?) names instead.

Diagonal CovMat sorting

from dapper import *
d = [.3, 1, 2, 1]
print(np.diag(CovMat(d).full)) # yields [1.  0.3 1.  2. ]
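Whatever the internal cause, the symptom looks like a sorted spectrum being reassembled without inverting the sort permutation. A generic illustration of that bug class (plain NumPy, not CovMat's actual code):

```python
import numpy as np

d = np.array([0.3, 1.0, 2.0, 1.0])
idx = np.argsort(d)        # sort permutation, e.g. for an eigendecomposition
sorted_d = d[idx]

wrong = np.diag(sorted_d)  # forgets the permutation: diagonal is scrambled
print(np.diag(wrong))      # 0.3, 1, 1, 2 -- not the original order

inv = np.argsort(idx)      # inverse permutation restores the original order
right = np.diag(sorted_d[inv])
print(np.diag(right))      # 0.3, 1, 2, 1 -- as input
```

If CovMat sorts its diagonal internally, applying the inverse permutation when reconstructing .full would be the corresponding fix.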

Move documentation from L63 and L96

  • Move documentation from L63 and L96, which is in the form of code comments,
    into its own doc page (maybe mods/__init__.py).
    • Expand upon these comments. For example:
    • Note that OOP (exemplified with QG, LUV, etc.) is more flexible & robust,
      and therefore better suited when different parameter settings are to be
      investigated. Caution: for parameter-estimation, the parameters are input
      variables to the "forward model", which does not necessarily require OOP.
    • Note that the mod settings files do not keep a very high programming
      standard.
    • Note that HMMs can be copied to create new settings.
    • Note pitfalls (eg changing Force)
    • To simplify L63 even more for beginners, move d2x_dtdx, dstep_dx, and
      LPs from mods/L63/__init__.py into mods/L63/extras.py ?

ModuleNotFoundError: No module named 'dill'

Thank you for your project. How can I resolve this error?
from resources.workspace import *

Initializing DAPPER...

ModuleNotFoundError Traceback (most recent call last)
in
1 # Run this cell now.
----> 2 from resources.workspace import *

E:\projectsSpace\pythonspace\DA-tutorials-master\resources\workspace.py in
6
7 # Load DAPPER
----> 8 from dapper import *
9
10 # Load answers

e:\projectsspace\pythonspace\dapper-master\dapper\__init__.py in
155 from .tools.data_management import *
156 from .stats import *
--> 157 from .admin import *
158 from .da_methods.ensemble import *
159 from .da_methods.particle import *

e:\projectsspace\pythonspace\dapper-master\dapper\admin.py in
549
550
--> 551 import dill
552 def save_data(name,*args,**kwargs):
553 """"Utility for saving experimental data.

ModuleNotFoundError: No module named 'dill'

Plan releases

Make roadmaps for v1.2 and v2.

Ideally, this should consist of "enhancement" issues that get tagged with the "project" labels v1.2 and v2.

Discuss

Keyboard input for Liveplot

I notice that several issues in the Github are related to liveplotting.

I avoided the liveplotting module since beginning and I think I should take this chance to check it.

I made some tests under Windows Subsystem for Linux (WSL), which is a Ubuntu system.

  • I tested with the new param_estim.py with liveplotting. The function read1 blocks the update (the program pauses) until I provide keyboard input for the figure update at each time step.
  • test_plotting does not require keyboard input in the runtime because it sets user_interaction to False and read1 always returns None.

This running result leads me to some design questions:
Is the blocking of program run by read1 done by intention?
If so:

  • Is the pause functionality still necessary? The liveplotting does not continue unless some keyboard input (other than Enter, Space, and i) is given.
  • It might not be necessary to print information about keyboard input if user_interaction is False.

If it is a bug:

  • I guess there is a problem with the non-blocking read in certain environments, and I will check it in a Windows environment as well.

DA multilayer shallow water

Hi,

I am working on setting up a parallelized (MPI) python multilayer shallow water code and would like eventually to do DA with it.

The typical problems I will look at will have state vectors of the following size approx.: 10 vars x O(1000)^2 grid points.

Will I be able to use DAPPER to do that?
If yes, what should I look for when setting up the code?
If not, what would be a good alternative?

Disclaimer I am a physical oceanographer and not a DA expert.

cheers

Liveplotting in Jupyter notebook ?

Is there a way to have the live plots work in a Jupyter notebook, or at least to have the replays?
Maybe something with the matplotlib backend?

Sorry if it is a naive question, I find this package amazing and am only learning to use it.

Update Bocquet book/figure link

Link in README. Google books preview no longer shows the figure. Why?

Anyways, we can just link to the lecture notes instead.

Modular iEnKS

Make iEnKS code more modular.
In particular, the analysis should be separated
from the time stepping mechanisms.

iEnKS implementation

I was looking at the code for iEnKS in DAPPER and I realise it is a bit different from the Bocquet and Sakov 2013 (BS13) paper.

Within the Gauss-Newton iteration, BS13 does not update the ensemble anomaly. In DAPPER, you update the ensemble anomaly in each iteration. For SDA, the observation anomaly is 'de-conditioned', while the 'de-conditioning' is not applied to MDA. This treatment is intuitively reasonable, as it propagates the newly assimilated observation throughout the DA window, but I wonder if you know/have any literature/reference that can back this treatment?

Thanks!

MPL backend installation

Right now QtAgg is among the "extras" of setup.py.
That could mean an interactive matplotlib (mpl) backend might not get installed unless using pip install -e .[Qt].

However, I almost always seem to get an interactive backend installed anyways. My recent fresh install came with both MacOSX and TkAgg backends. I have a feeling it is because the environment was created with Anaconda, which provides some things to begin with, but I'm not sure.

So:

  • How to guarantee installation of an interactive backend?
  • Should we require using one in particular, or (somehow) make it platform dependent? For example,
    • I believe Qt5Agg is cross-platform, and can be installed via pip only, but slower & uglier than MacOSX.
    • I believe TkAgg is cross-platform, but might require complicated installation instructions beyond pip. However, Anaconda seems (?) to provide it automatically.

@yumengch Thoughts?

Examples running in Colab

Make Colab work for examples.

To install DAPPER, this should suffice:

# !pip install git+https://github.com/nansencenter/DAPPER.git

But DAPPER requires that Colab upgrades to python 3.7 (see setup.py for why), and I don't know when that will happen.
Possible workarounds:

The .ipynb examples should be synced with the .py versions via jupytext, using a minimally-intrusive syntax.

Edit: maybe just use binder?

Discussion for a dapper.da_methods.__init__.py

Hi Patrick

I really like the way that all new DA methods are using OOP style.
And I understand that dataclasses provides some nice features.
However, I feel the treatment is a bit complex to me.
I understand this is an issue of personal preference, so I just want to share a bit a possible alternative for the implementation.
I think the upside is that it introduces an abstract class to ensure the existence of the assimilate function, and avoids the use of decorators.
The downside is that we might need to add some features ourselves.
What do you think?

import time
import abc
import numpy as np
from struct_tools import NicePrint

import dapper.stats


class da_method(abc.ABC, NicePrint):
    printopts = dict(
        excluded=NicePrint.printopts["excluded"]+["assimilator", "assimilate"]
    )

    def __init__(self, **kwargs):
        # Use -5 to avoid object, abc.ABC,
        # NicePrint, and the da_method class.
        # A similar loop also appears in the dataclasses code.
        for b in self.__class__.__mro__[-5::-1]:
            for key in b.__annotations__.keys():
                value = getattr(b, key, None)
                if key in kwargs.keys():
                    setattr(self, key, kwargs[key])
                elif value is not None:
                    setattr(self, key, value)
                else:
                    raise TypeError(f"{key} must have a value")

        self.assimilator = self.assimilate
        self.assimilate = self._assimilate

    def _assimilate(self, HMM, xx, yy, desc=None, **stat_kwargs):
        # Progressbar name
        pb_name_hook = self.__class__.__name__ if desc is None else desc # noqa
        # Init stats
        # self.stats = dapper.stats.Stats(self, HMM, xx, yy, **stat_kwargs)
        # Assimilate
        time_start = time.time()
        self.assimilator(HMM, xx, yy)
        # dapper.stats.register_stat(self.stats, "duration", time.time()-time_start)

    @abc.abstractmethod
    def assimilate(HMM, xx, yy):
        pass

    def __repr__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__repr__()

    def __str__(self):
        with np.printoptions(threshold=10, precision=3):
            return self.__class__.__name__+":"+super().__str__()


class ens_method(da_method):
    """Declare default ensemble arguments."""
    infl: float        = 1.0
    rot: bool          = False
    fnoise_treatm: str = 'Stoch'


class EnKF(ens_method):
    """Declare default ensemble arguments."""
    upd_a: str
    N: int

    def assimilate(self, HMM, xx, yy):
        print("Running assimilation")

UncertainQtty unexpected result

Bug Description:

Motivated by the output in test_example_2.py, using 7.7676 ±1.2464 as input:

>>> from dapper.tools.rounding import UncertainQtty
>>> uq = UncertainQtty(7.7676, 1.2464)
>>> print(uq)
UncertainQtty(val=8, prec=1.0)

This is in contrast to the expected UncertainQtty(val=8.0, prec=1.0) or UncertainQtty(val=8, prec=1); I think the latter is preferable.

Cause of the bug:

The UncertainQtty class rounds prec to its first significant figure, but np.round returns a float, so the int/float formatting of val and prec can disagree:

>>> np.round(1.2464 ,0)
1.0
>>> np.round(0.2464 ,0)
0.0

Possible Solutions:

Turn the float into an int when the first significant figure lies before the decimal separator.
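A minimal sketch of that solution (`round_uq` is a hypothetical helper, not DAPPER's API):

```python
import numpy as np

def round_uq(val, prec):
    """Round val to the first significant figure of prec; return
    ints when that figure lies before the decimal separator."""
    ndecimals = -int(np.floor(np.log10(abs(prec))))
    prec = round(prec, ndecimals)
    val = round(val, ndecimals)
    if ndecimals <= 0:
        # First significant figure is ahead of the decimal separator:
        # present both as ints, avoiding "val=8, prec=1.0".
        return int(val), int(prec)
    return val, prec

print(round_uq(7.7676, 1.2464))  # → (8, 1)
print(round_uq(7.7676, 0.2464))  # → (7.8, 0.2)
```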

Note

This issue is related to an old TODO, which speculated about using uq in xps.tabulate_avrgs and xpSpace.print.

The difficulty is that uq cannot be used directly for that output, because it does not allow specifying the number of decimals. test_example_2.py has examples with decimals = 4.

Any comments?

Support list, dict, ndarray and other non-trivial params in da_method

For example, Var3D(B=B) should work, where B is an ndarray background matrix.

  • The problem is largely that == comparison with an ndarray fails in split_attrs, since ndarray equality is elementwise and does not yield a single bool. I'm not sure about other non-trivial types.
  • When printing xpLists, multiline attributes will be an issue. I suppose one should insert the id in that case.
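The `==` problem can be demonstrated in isolation; `params_equal` below is an illustrative helper for how split_attrs might special-case arrays, not DAPPER's API:

```python
import numpy as np

# ndarray equality is elementwise, so it yields an array rather than
# a single bool; using it in a boolean context raises ValueError.
B = np.eye(2)
try:
    bool(B == np.eye(2))
except ValueError as e:
    print("ambiguous:", e)

def params_equal(a, b):
    """Compare parameters, special-casing ndarrays."""
    if isinstance(a, np.ndarray) or isinstance(b, np.ndarray):
        return np.array_equal(a, b)
    return a == b
```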

run time error in example_3.py

DAPPER was installed and tested on Ubuntu 16.04 LTS (conda/Anaconda, Python 3.7). The first two examples, example_1.py and example_2.py, ran without any issues, but example_3.py did not complete for either the F or the N run. Both return an error during the LETKF executions, apparently where loc_rad='$' (F run) or N='?' and loc_rad='?' (N run).

(py37) mon137@smudge-hf:~/projects/v14/python_work/DAPPER$ python -i example_3.py N
Initializing DAPPER...Done
Will save to data/example_3/smudge-hf/N_run3...

xticks[0/831] N: 2
Truth & Obs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:01<00:00, 4944.07it/s]
Climatology: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:04<00:00, 1166.58it/s]
OptInterp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:08<00:00, 639.15it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:11<00:00, 427.32it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 409.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:13<00:00, 382.22it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:13<00:00, 389.10it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 409.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:12<00:00, 397.79it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:14<00:00, 347.07it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:14<00:00, 342.55it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:17<00:00, 292.61it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:17<00:00, 289.62it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:38<00:00, 134.29it/s]
LETKF: 0%| | 0/5120 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_3.py", line 127, in
    stat = C.assimilate(HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 139, in assim_caller
    raise ERR
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 120, in assim_caller
    assimilator(stats,HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 612, in assimilator
    for ii, (Eii, za) in zip(state_batches, result):
TypeError: 'NoneType' object is not iterable

exit()


(py37) mon137@smudge-hf:~/projects/v14/python_work/DAPPER$ python -i example_3.py F
Initializing DAPPER...Done
Will save to data/example_3/smudge-hf/F_run4...

xticks[0/543] F: 3
Truth & Obs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:01<00:00, 5057.07it/s]
Climatology: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:04<00:00, 1058.39it/s]
OptInterp: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:08<00:00, 586.09it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 336.81it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 325.71it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 332.34it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 319.30it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 332.05it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 328.55it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 325.31it/s]
EnKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:15<00:00, 321.57it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 313.47it/s]
EnKF_N: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:16<00:00, 317.52it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:45<00:00, 113.07it/s]
LETKF: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [00:40<00:00, 126.30it/s]
LETKF: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5120/5120 [01:06<00:00, 76.47it/s]
LETKF: 0%| | 0/5120 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "example_3.py", line 127, in
    stat = C.assimilate(HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 139, in assim_caller
    raise ERR
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/admin.py", line 120, in assim_caller
    assimilator(stats,HMM,xx,yy)
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 612, in assimilator
    for ii, (Eii, za) in zip(state_batches, result):
  File "/home/mon137/projects/v14/python_work/DAPPER/da_methods/da_methods.py", line 580, in local_analysis
    jj, tapering = obs_localizer(ii)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 188, in obs_localizer
    return inds_and_coeffs(dists, radius, tag=tag)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 94, in inds_and_coeffs
    coeffs = dist2coeff(dists, radius, tag)
  File "/home/mon137/projects/v14/python_work/DAPPER/tools/localization.py", line 63, in dist2coeff
    R = radius * 1.82 # Sakov: 1.7386
TypeError: can't multiply sequence by non-int of type 'float'
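The final TypeError is plain Python behaviour: multiplying a str (a sequence) by a float raises exactly this error, which suggests loc_rad reached dist2coeff as a string or sequence rather than a number. The value "4" below is illustrative only:

```python
radius = "4"  # oops: a str, not a float
try:
    R = radius * 1.82  # Sakov: 1.7386
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'float'

# Coercing to a number first avoids the error:
R = float(radius) * 1.82
```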

Cannot run single test anymore

I want the plain command
pytest
to run all tests. That's why the addopts entry in pyproject.toml is so long.

But now it seems impossible to run just a single test using
pytest tests/test_desired.py

So I think the different purposes of testing (regular tests, doctests, coverage, etc.) should perhaps be split into separate tox test environments.

This might complicate the Travis setup... we'll see.
