clima / ensemblekalmanprocesses.jl

Implements Optimization and approximate uncertainty quantification algorithms, Ensemble Kalman Inversion, and Ensemble Kalman Processes.

Home Page: https://clima.github.io/EnsembleKalmanProcesses.jl/dev

License: Apache License 2.0

Julia 96.23% TeX 3.77%

ensemblekalmanprocesses.jl's Introduction

EnsembleKalmanProcesses.jl

Implements optimization and approximate uncertainty quantification algorithms, Ensemble Kalman Inversion, and Ensemble Kalman Processes.

Citing us

If you use the examples or code, please cite our article at JOSS in your published materials.


Requirements

Julia version 1.6+

ensemblekalmanprocesses.jl's People

Contributors

agarbuno, bielim, bors[bot], costachris, eviatarbach, haakon-e, ilopezgp, jakebolewski, jbytecode, jinlong83, mhowlan3, navidcy, odunbar, pitmonticone, simonbyrne, tomchor, trontrytel, tsj5, zhengyu-huang


ensemblekalmanprocesses.jl's Issues

Using EKP with deterministic and non-deterministic models

Deterministic setup

For a deterministic forward map G, the current EKP implementation works on the mean \bar{y} and covariance \Gamma of the data. Internally, a regularization technique is applied in the form of drawing new samples of the data for each iteration and each ensemble member.

Effectively this solves the problem y = G(\theta) + N(0,\Gamma) by generating samples of \gamma_i ~ N(0,\Gamma) and using samples y_i = \bar{y} + \gamma_i in the algorithm. As we provide \bar{y} we achieve an unbiased result.

Also note: there should be no issue with this formulation when \bar{y} is replaced by a single sample y (the algorithm still perturbs it internally). In this case one will achieve a (naturally) sample-biased result.

Nondeterministic setup

For nondeterministic forward map G (with mean \bar{G}) we would like to use a sample y and covariance \Gamma here. We do not regularize (redraw the data at every iteration) because the internal variability is sufficient.

Effectively this solves the problem y = G(\theta) = \bar{G}(\theta) + N(0,\Gamma), where we don't have access to \bar{G}; a sample of G thus already contains the correct variability, and no regularization should be performed. As we provide only a sample y, this will achieve a (naturally) biased result.

To obtain consistency with the current CES framework, the sample of data for EKI and MCMC should be the same.

Solution?

Perhaps the simplest way to do this would be to have a flag deterministic_forward_map=true.

  • If true, the sample is perturbed at every iteration and ensemble member (as is traditional)
  • If false, the sample is unperturbed.
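Under the proposed flag, the data handling could look like the following sketch. The function name `observation_samples` and its signature are hypothetical illustrations, not the package's API:

```julia
using Distributions, LinearAlgebra, Random

# Hypothetical sketch: perturb the data sample only for deterministic forward maps.
function observation_samples(y::AbstractVector, Γ::AbstractMatrix, N_ens::Int;
                             deterministic_forward_map::Bool = true,
                             rng::AbstractRNG = Random.GLOBAL_RNG)
    if deterministic_forward_map
        # regularize: draw a fresh perturbation per ensemble member, per iteration
        return y .+ rand(rng, MvNormal(zeros(length(y)), Γ), N_ens)
    else
        # a nondeterministic G already carries the variability; use the sample as-is
        return repeat(y, 1, N_ens)
    end
end
```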

Deterministic behavior of stochastic functions should not be the default

A consequence of #95 is that some functions that should be stochastic by default are now deterministic.

An example of this is the following,

using Distributions
using Random

using EnsembleKalmanProcesses
using EnsembleKalmanProcesses.ParameterDistributions

d = Parameterized(Normal(0, 1))
c = no_constraint()
name = "unconstrained_normal"
u = ParameterDistribution(d, c, name)

for i in 1:5
   println(construct_initial_ensemble(u, 1))
end

This outputs,

[0.7883556016042917;;]
[0.7883556016042917;;]
[0.7883556016042917;;]
[0.7883556016042917;;]
[0.7883556016042917;;]

which means that we are no longer sampling i.i.d. from the distribution. This is desirable for the reproducibility of tests, but not as the default behavior. The expected user behavior can currently be achieved using

for i in 1:5
   println(construct_initial_ensemble(u, 1, rng = Random.GLOBAL_RNG))
end

which for example can output

[0.9548452855545422;;]
[0.3785684038407532;;]
[-0.257360852691082;;]
[-1.1705533137462902;;]
[0.9307323496630826;;]

Proposed solution: Make Random.GLOBAL_RNG the default AbstractRNG kwarg for all functions in the package, pass rng when the user wants reproducibility (e.g., test suite)
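A minimal sketch of the proposed default behavior; the function name here carries a `_sketch` suffix because it is a stand-in for the package's real `construct_initial_ensemble`:

```julia
using Distributions, Random

# Hypothetical sketch of the proposed signature: stochastic by default,
# reproducible when the caller passes a seeded rng.
function construct_initial_ensemble_sketch(d::Distribution, N_ens::Int;
                                           rng::Random.AbstractRNG = Random.GLOBAL_RNG)
    return rand(rng, d, N_ens)
end

# default call is stochastic; seeded calls reproduce the same draw
a = construct_initial_ensemble_sketch(Normal(0, 1), 5)
b = construct_initial_ensemble_sketch(Normal(0, 1), 5; rng = MersenneTwister(42))
c = construct_initial_ensemble_sketch(Normal(0, 1), 5; rng = MersenneTwister(42))
# b == c, while a differs between runs
```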

Cleaning up ParameterDistribution constructor

Due to my coding naivety, we have things that look like this

prior_distributions = [Parameterized(Normal(0, 1)), Parameterized(Normal(0, 1))]
constraints = [[no_constraint()], [no_constraint()]]
parameter_names = ["u1", "u2"]
prior = ParameterDistribution(prior_distributions, constraints, parameter_names)

where we see that constraints must always be wrapped in arrays, among other oddities.

We should instead create these in a non-flattened format initially, e.g. an array of dictionaries:

prior_u = [
    Dict(
        "name" => "u1",
        "distribution" => Parameterized(Normal(0, 1)),
        "constraints" => no_constraint(),
    ),
    Dict(
        "name" => "u2",
        "distribution" => Parameterized(MvNormal(4, 0.1)),
        "constraints" => [no_constraint(), no_constraint(), no_constraint(), no_constraint()],
    ),
]
u = ParameterDistribution(prior_u)

I'm willing to iterate on the precise form here, but something along these lines will make it far easier to do all the dimensionality checks while the input is in this structured form, before we flatten it internally.

Consistency in variable names

The naming of variables is currently not consistent throughout the module -- e.g., the variables denoting the dimension of the data include N_y, data_dim, and output_dim. Having consistent variable names would make the code more readable and user friendly.

Returning constrained parameters

Our current getter methods for EKP return exactly the stored computational parameters. However, in typical use the model G requires constrained parameters, so it would be nice to have a getter for these (otherwise one must apply the transformation separately).

i.e. for some ekp::EnsembleKalmanProcess object and a created prior::ParameterDistribution, one currently applies

computational_params = get_u_final(ekp)
physical_params = transform_unconstrained_to_constrained(prior,computational_params)
model_outputs = run_model_ensemble(physical_params)

Solution options

We wish to apply the transformation internally.

We could add new getters get_u_constrained and get_u_constrained_final.

Or we could add a return_constrained flag to the current getters.
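A self-contained toy sketch of the first option. Here `ToyPrior` and the `exp` transform stand in for the package's prior and Constraint machinery; all names are illustrative, not the package API:

```julia
# Toy stand-in for the prior: in the package, the mapping comes from the
# prior's Constraint objects via transform_unconstrained_to_constrained.
struct ToyPrior end
to_constrained(::ToyPrior, u) = exp.(u)  # e.g. enforce positivity

# history stands in for the ekp object's stored iterates
get_u_final(history::Vector{<:AbstractMatrix}) = history[end]
get_u_constrained_final(prior, history) = to_constrained(prior, get_u_final(history))

history = [zeros(2, 3), ones(2, 3)]
get_u_constrained_final(ToyPrior(), history)  # exp.(ones(2, 3))
```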

DocumenterFlavor not defined

Trying to compile docs offline (and using the new installation instructions) gives me this error

>julia --project=docs docs/make.jl 
ERROR: LoadError: UndefVarError: DocumenterFlavor not defined
...
in expression starting at /home/odunbar/Dropbox/Caltech/CESjl/EnsembleKalmanProcesses.jl/docs/make.jl:22

Make use of @test_logs less brittle

This issue is raised to describe problems encountered here with getting the test suite to pass for an unrelated PR (#161). I've been able to reproduce it locally on main as well (since it originates in a dependency).

Description

The linked test suite failure arises from a unit test of the form @test_logs (:warn,): when run under GH actions, the test gets two warnings, instead of the single warning it expected, and fails.

The second warning isn't related to EKP: it's a deprecation warning raised by MvNormal:

┌ Warning: `dim(a::AbstractMatrix)` is deprecated, use `LinearAlgebra.checksquare(a)` instead.
│   caller = MvNormal at mvnormal.jl:186 [inlined]
└ @ Core ~/.julia/packages/Distributions/9Albf/src/multivariate/mvnormal.jl:186

I can reproduce this locally: it arises because PDMats.jl deprecated dim in its latest release, two days ago as of this writing, and Distributions.jl hasn't been updated.

Proposed fix

I propose to fix this by modifying EKP's usage of @test_logs to 1) test for the text of the specific expected warning message, and 2) ignore any other warnings via match_mode=:any.

  • It's not feasible to pin the version of PDMats in EKP's Project.toml; it's a dependency of a dependency (of Distributions.jl), and we'd have to remember to unpin and update it when Distributions.jl fixes its deprecation warnings.
  • At the same time, we should keep --depwarn=yes in EKP's test suite, to warn about deprecations in code we control.
  • @test_logs provides some functionality for filtering log messages, but the filters are applied in one-to-one order with the log messages raised; there doesn't appear to be a way to filter out logs that may or may not be present, except through the use of match_mode=:any.
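A minimal sketch of the proposed pattern, using only the Test stdlib; `noisy` is a hypothetical stand-in for the EKP code under test:

```julia
using Test

# stand-in for code that emits the expected warning plus an unrelated one
noisy() = (@warn "expected warning"; @warn "unrelated deprecation"; nothing)

# Before: brittle — @test_logs (:warn,) noisy() fails if any extra warning appears.
# After: match the specific message and ignore unrelated logs:
@test_logs (:warn, "expected warning") match_mode=:any noisy()
```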

Converting non-flattened parameter arrays

We should add some ParameterDistribution functions that are based on the provided method batch to help clean up/convert outputs.

At the end of EKP we are often returned an N_ens x total_param_dimensions array (or similar).
However, we sometimes need one of the following forms:

  • a dictionary Dict(param_1_name => Array(N_ens x param_1_dimensions), ... )
  • an N_ens-array of dictionaries [Dict(param_1_name => Array(param_1_dimensions), ... ), ...]

The core of these methods is the slicing of the array back into parameter-sized chunks, which is exactly what batch provides.
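A sketch of the first conversion, assuming `batch(pd)` yields one index range per parameter; the helper name `to_dict` is hypothetical:

```julia
# Sketch assuming `batch` returns one index range per parameter
# (e.g. [1:1, 2:5] for a scalar followed by a 4-dimensional parameter).
function to_dict(names::Vector{String}, batches, U::AbstractMatrix)
    # U is N_ens × total_param_dimensions, as returned at the end of EKP
    return Dict(name => U[:, idx] for (name, idx) in zip(names, batches))
end

U = [1.0 2.0 3.0 4.0 5.0;
     6.0 7.0 8.0 9.0 10.0]          # N_ens = 2, total dims = 5
d = to_dict(["u1", "u2"], [1:1, 2:5], U)
# d["u1"] is 2×1, d["u2"] is 2×4
```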

Doc strings for functions

All functions should have proper doc strings (unless it's totally obvious what a function does -- simple getters and setters may not need a doc string). Establishing this standard will lead to cleaner and more user friendly code down the road, as people typically use the existing code base as a starting point/template for developing their contributions.

Issues with new versions of Convex and SCS

I tried updating to Convex 0.15 and SCS 1.0, and I ran into the warnings and errors below. In order to update, I followed the release notes of Convex v0.15, which make the change

instead of `() -> SCS.Optimizer(verbose = 0)`, use `MOI.OptimizerWithAttributes(SCS.Optimizer, "verbose" => 0)`.

However, this leads to errors in the tests:

┌ Warning: Problem status ALMOST_OPTIMAL; solution may be inaccurate.
└ @ Convex ~/.julia/packages/Convex/o98db/src/solution.jl:342
   (the warning above is repeated 17 times)
┌ Warning: New ensemble covariance determinant is less than 0.01 times its previous value.
│ Consider reducing the EK time step.
└ @ EnsembleKalmanProcesses ~/EnsembleKalmanProcesses.jl/src/SparseEnsembleKalmanInversion.jl:169
┌ Warning: Problem status DUAL_INFEASIBLE; solution may be inaccurate.
└ @ Convex ~/.julia/packages/Convex/o98db/src/solution.jl:342
   (the warning above is repeated 40 times, 7 of them with status ALMOST_DUAL_INFEASIBLE)
┌ Warning: New ensemble covariance determinant is less than 0.01 times its previous value.
│ Consider reducing the EK time step.
└ @ EnsembleKalmanProcesses ~/EnsembleKalmanProcesses.jl/src/SparseEnsembleKalmanInversion.jl:169
┌ Warning: Problem status ALMOST_OPTIMAL; solution may be inaccurate.
└ @ Convex ~/.julia/packages/Convex/o98db/src/solution.jl:342
   (the warning above is repeated twice)
SparseInversion: Error During Test at /Users/ilopezgo/EnsembleKalmanProcesses.jl/test/EnsembleKalmanProcess/runtests.jl:367
  Got exception outside of a @test
  SingularException(3)
  Stacktrace:
    [1] checknonsingular
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/factorization.jl:19 [inlined]
    [2] checknonsingular
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/factorization.jl:21 [inlined]
    [3] #lu!#146
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:82 [inlined]
    [4] #lu#153
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:279 [inlined]
    [5] lu (repeats 2 times)
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:278 [inlined]
    [6] \(A::Matrix{Float64}, B::Diagonal{Float64, Vector{Float64}})
      @ LinearAlgebra /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:1142
    [7] update_ensemble!(ekp::EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, g::Matrix{Float64}; cov_threshold::Float64, Δt_new::Float64, deterministic_forward_map::Bool, failed_ens::Nothing)
      @ EnsembleKalmanProcesses ~/EnsembleKalmanProcesses.jl/src/SparseEnsembleKalmanInversion.jl:142
    [8] macro expansion
      @ ~/EnsembleKalmanProcesses.jl/test/EnsembleKalmanProcess/runtests.jl:408 [inlined]
    [9] macro expansion
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [10] macro expansion
      @ ~/EnsembleKalmanProcesses.jl/test/EnsembleKalmanProcess/runtests.jl:369 [inlined]
   [11] macro expansion
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [12] top-level scope
      @ ~/EnsembleKalmanProcesses.jl/test/EnsembleKalmanProcess/runtests.jl:14
   [13] include
      @ ./client.jl:451 [inlined]
   [14] macro expansion
      @ ./timing.jl:299 [inlined]
   [15] include_test(_module::String)
      @ Main ~/EnsembleKalmanProcesses.jl/test/runtests.jl:11
   [16] macro expansion
      @ ~/EnsembleKalmanProcesses.jl/test/runtests.jl:29 [inlined]
   [17] macro expansion
      @ /Applications/Julia-1.7.app/Contents/Resources/julia/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
   [18] top-level scope
      @ ~/EnsembleKalmanProcesses.jl/test/runtests.jl:17
   [19] include(mod::Module, _path::String)
      @ Base ./Base.jl:418
   [20] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:292
   [21] _start()
      @ Base ./client.jl:495

I am not sure whether the tests were ill-posed or whether there is an error upstream. Either way, we will need to resolve these issues before updating the dependencies.

I'm worried that we don't do enough copies.

new{FT}(stored_data)

I think that when we put data into storage, or retrieve it, none of the reference links are being broken. Are there safety measures we can add to enforce copies?

The last thing I caught: I thought there was an issue making a gif, but it actually resulted from the lack of a copy here:

in EKI, leading to every stored value being overwritten with the final one.

List of documentation required

How-to's - Ollie/Jake

  • How to get started using the tool
  • "How-to-contribute" - written into the home page
  • How-to add a new tool
  • How-to use the examples - general template

Theory of the tools

  • Theory Ensemble based methods - Daniel
  • What does EKI do? and how does one create the Inversion object - Ignacio
  • What does EKS do? and how does one create the Sampler object - Melanie
  • What does UKI do? and how does one create the Unscented object - Daniel
  • Observations/DataStorage/ParameterDistributions - Ollie

Tutorials:

  • Loss Minimization
  • Cloudy - Melanie
  • Lorenz 96 - Mike
  • Climate Machine example - Ignacio

Automated

  • Docstrings for all functions (Everyone can add to a single PR for this)

Distribution of work:

  1. Create a (documenter.jl type) markdown for each task
  2. Add the markdown link into make.jl
  3. Render locally with julia --project make.jl (then view build/index.html)
  4. Create a PR
  5. One can also view the new documentation online (click on latest green check in PR push, click on details of deploy/docs task)

DataContainers only stores a deep copy of the data when !data_are_columns

We currently have

struct DataContainer{FT <: Real}
     #stored data, each piece of data is a column [data dimension × number samples]
     stored_data::AbstractMatrix{FT}
     #constructor with 2D arrays
     function DataContainer(stored_data::AbstractMatrix{FT}; data_are_columns = true) where {FT <: Real}

         if data_are_columns
             new{FT}(stored_data)
         else
             new{FT}(permutedims(stored_data, (2, 1)))
         end
     end
 end

The behavior for this struct is as follows,

julia> u = rand(5,3)

output > 5×3 Matrix{Float64}:
 0.273102   0.24338   0.728426
 0.0066842  0.320747  0.562981
 0.420169   0.521042  0.620609
 0.431041   0.177357  0.631251
 0.284572   0.120433  0.876362

julia> data1 = DataContainer(u)
julia> data2 = DataContainer(u, data_are_columns = false)

julia> get_data(data1)

output > 5×3 Matrix{Float64}:
 0.273102   0.24338   0.728426
 0.0066842  0.320747  0.562981
 0.420169   0.521042  0.620609
 0.431041   0.177357  0.631251
 0.284572   0.120433  0.876362

julia> u[1,1] = -5

julia> get_data(data1)

output > 5×3 Matrix{Float64}:
 -5.0        0.24338   0.728426
  0.0066842  0.320747  0.562981
  0.420169   0.521042  0.620609
  0.431041   0.177357  0.631251
  0.284572   0.120433  0.876362

# With data_are_columns, the memory is shared.

julia> get_data(data2)

3×5 Matrix{Float64}:
  0.273102  0.0066842  0.420169  0.431041  0.284572
  0.24338   0.320747   0.521042  0.177357  0.120433
  0.728426  0.562981   0.620609  0.631251  0.876362

# With !data_are_columns, the memory is not shared.
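A minimal sketch of a possible fix: copy in the data_are_columns branch as well, so both paths own their storage (permutedims already allocates in the other branch). The name `SafeDataContainer` is a stand-in used to keep the sketch self-contained:

```julia
# Sketch: same constructor as DataContainer, but with an explicit copy
# so that mutating the caller's array no longer leaks into the container.
struct SafeDataContainer{FT <: Real}
    stored_data::AbstractMatrix{FT}
    function SafeDataContainer(stored_data::AbstractMatrix{FT}; data_are_columns = true) where {FT <: Real}
        if data_are_columns
            new{FT}(copy(stored_data))               # copy breaks the alias
        else
            new{FT}(permutedims(stored_data, (2, 1)))  # permutedims already allocates
        end
    end
end

u = rand(5, 3)
d = SafeDataContainer(u)
u[1, 1] = -5
# d.stored_data[1, 1] is unchanged: the mutation no longer leaks into the container
```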

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Small question: parallelization

Hi!

I have a small question, if I may.
Have you included any parallelization (over the particles) in this package?

Thanks a lot,
Ignace

Random seeds

We add noise during various updates; for example, when the forward map is deterministic, noise is used to stabilize the method. Some users may require fixing random seeds. We have done this for the initial ensemble construction, but not yet for the ensemble update, e.g. in EKI:

noise = rand(MvNormal(zeros(N_obs), scaled_obs_noise_cov), ekp.N_ens)

It would be nice to have a random seeding option here.
Co-authored with @trontrytel
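A sketch of the proposed seeding: draw the update noise from a process-owned rng rather than the global one, so that a user-supplied seed also fixes the update step. The variables here mirror the snippet above but are local stand-ins, not the package's internals:

```julia
using Distributions, LinearAlgebra, Random

rng = MersenneTwister(42)                 # stand-in for a seeded ekp.rng
N_obs, N_ens = 3, 10
scaled_obs_noise_cov = Matrix(1.0I, N_obs, N_obs)

# pass the rng explicitly to the noise draw in the update
noise = rand(rng, MvNormal(zeros(N_obs), scaled_obs_noise_cov), N_ens)

# reseeding reproduces the draw exactly
noise2 = rand(MersenneTwister(42), MvNormal(zeros(N_obs), scaled_obs_noise_cov), N_ens)
# noise == noise2
```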

Require new parameter type for "vector of parameterized" distributions

Issue

Currently Parameterized is limited to the distribution types that Distributions.jl provides. A need that appears in practice: to specify independent priors for a function at 100 points, we may wish to define

distribution = repeat([Parameterized(Normal(0,1))],100)
constraint = repeat([no_constraint()],100)
name = "function_at_100_points"

Our current setup can do either

  1. replace repeat([Parameterized(Normal(0,1))],100) with a single Parameterized(MvNormal(100, 1.0)). This restricts us to the multivariate distributions provided by Distributions.jl.
  2. replace name with an array of names, name = [string("function_at_point_", i) for i = 1:100]. This gives each of the 100 distributions a different name and may require additional parsing.

Proposal: VectorOfParameterized distribution type

distribution = VectorOfParameterized( repeat([Normal(0,1)],100) )
constraint = repeat([no_constraint()],100)
name = "function_at_100_points"

When provided with corresponding functions (sample, var, cov, etc.), this fits within the current framework. The stacked distributions can have different priors and be of different dimensions.
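A self-contained sketch of what the new type might provide. The `_Sketch` suffix marks it as hypothetical; the real type would also need cov, logpdf, and mixed-dimension support:

```julia
using Distributions, Random, Statistics

# Hypothetical sketch of the proposed type: wrap a vector of distributions
# under a single name and provide the stacked operations.
struct VectorOfParameterizedSketch{DT <: Distribution}
    distribution::Vector{DT}
end

Base.rand(rng::AbstractRNG, d::VectorOfParameterizedSketch) =
    [rand(rng, di) for di in d.distribution]
Statistics.mean(d::VectorOfParameterizedSketch) = mean.(d.distribution)
Statistics.var(d::VectorOfParameterizedSketch) = var.(d.distribution)

d = VectorOfParameterizedSketch(repeat([Normal(0, 1)], 100))
rand(Random.GLOBAL_RNG, d)   # 100 stacked samples under one parameter name
```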

Unify the handling of different types of Ensemble Kalman methods

EnsembleKalmanProcesses.jl currently contains three Ensemble Kalman methods: Ensemble Kalman Inversion (EKI), Ensemble Kalman Sampling (EKS), and Unscented Kalman Inversion (UKI). Ideally, these three methods can be used through a common interface, which highlights their conceptual similarities while abstracting away their differences to the extent possible.

Goal:
The general idea is that an EnsembleKalmanProcess should be constructed from the observational mean, the covariance of the observational noise, and a Process struct (Inversion, Sampler, or Unscented), which contains the additional information that is needed for the particular process type. The construction of an EnsembleKalmanProcess should also include the generation of the initial ensemble of particles.

The signature of all EnsembleKalmanProcess outer constructors should thus look as follows:
EnsembleKalmanProcess(obs_mean::Array{Float64, 1}, obs_cov::Array{Float64, 2}, process::Process)

Current status:
At the moment, there are two EnsembleKalmanProcess outer constructors: a constructor for Inversion and Sampler processes, and a constructor for Unscented processes. In EKI and EKS, the ensemble of initial particles is generated prior to the construction of an EnsembleKalmanProcess, and the initial ensemble is then passed as an argument to the constructor, whereas in UKI the initial ensemble is generated as part of the construction. Both constructors take a time step argument \Delta t, which actually remains unused in UKI (see issue #16).

To do:
The unified interface outlined above requires the following changes:

  • Move the construction of the initial ensemble (done by construct_initial_ensemble) into the EnsembleKalmanProcess constructor for EKI and EKS

  • construct_initial_ensemble takes the prior (a ParameterDistribution object) and the desired ensemble size N_ens as inputs, so these two pieces of information have to be added to the Inversion and Sampler structs.

  • The prior mean and covariance currently stored in the Sampler struct can be removed, as this information can be derived from the prior distribution

  • Remove the time step \Delta t from the EnsembleKalmanProcess constructors and add it to the Inversion and Sampler structs instead (the Unscented struct doesn't need a \Delta t)

  • A nice additional feature to have would be a function get_solution(ekp::EnsembleKalmanProcess), which returns the solution of the particular Ensemble Kalman process (e.g., the mean of the final iteration for EKI, or the mean and covariance for EKS).
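A sketch of what get_solution could compute for EKI; the name follows the suggestion above and is not an existing API, and for EKS it would also return the final covariance:

```julia
using Statistics

# Hypothetical: the EKI solution as the ensemble mean of the final iteration.
# u_final is parameter dimension × N_ens, as stored by the process.
get_solution_sketch(u_final::AbstractMatrix) = vec(mean(u_final, dims = 2))

u_final = [1.0 3.0; 2.0 4.0]
get_solution_sketch(u_final)
```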

Sparse operations in `SparseInversion` occur after updating the ekp object.

The SparseInversion update performs the following operations

# Sparse EKI
    cov_vv = vcat(hcat(cov_uu, cov_ug), hcat(cov_ug', cov_gg))
    H_u = hcat(1.0 * I(size(u)[1]), fill(FT(0), size(u)[1], size(g)[1]))
    H_g = hcat(fill(FT(0), size(g)[1], size(u)[1]), 1.0 * I(size(g)[1]))

    H_uc = H_u[ekp.process.uc_idx, :]

    cov_vv_inv = cov_vv \ (1.0 * I(size(cov_vv)[1]))

    # Loop over ensemble members to impose sparsity
    Threads.@threads for j in 1:(ekp.N_ens)
        # Solve a quadratic programming problem
        u[:, j] = sparse_qp(ekp, v[j, :], cov_vv_inv, H_u, H_g, y[:, j], H_uc = H_uc)

        # Threshold the results if needed
        if ekp.process.threshold_eki
            u[ekp.process.uc_idx, j] =
                u[ekp.process.uc_idx, j] .* (abs.(u[ekp.process.uc_idx, j]) .> ekp.process.threshold_value)
        end

        # Add small noise to constrained elements of u
        u[ekp.process.uc_idx, j] += rand(
            ekp.rng,
            MvNormal(zeros(size(ekp.process.uc_idx)[1]), ekp.process.reg * I(size(ekp.process.uc_idx)[1])),
        )
    end

after the ekp object has been updated,

push!(ekp.u, DataContainer(u, data_are_columns = true))
push!(ekp.g, DataContainer(g, data_are_columns = true))

This means that these manipulations never affect the stored (advanced) parameters. The initial block of operations copied in this issue should be performed before the new DataContainers are pushed to the ekp object.
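A toy sketch of the proposed ordering, with `copy` standing in for the DataContainer push and a simple thresholding standing in for the full sparse update:

```julia
# Sketch: apply the sparsity operations to u first, and only then store
# the iterate, so the stored value reflects the thresholding.
function sparse_update_sketch!(history, u; threshold = 0.01)
    u .= u .* (abs.(u) .> threshold)   # impose sparsity before storing
    push!(history, copy(u))            # ~ push!(ekp.u, DataContainer(u, data_are_columns = true))
    return u
end

history = Matrix{Float64}[]
sparse_update_sketch!(history, [0.001 0.5; 0.8 0.002])
history[end]   # the thresholded values are what gets stored
```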

Named tuples for parameters

For applications with large numbers of parameters input into complex models, it's useful (even necessary) to use objects like NamedTuple that associate parameter names with values to hold / transport lists of parameters.

However, as far as I can tell, parameters are constrained to be vectors. Throughout the code, in fact, many data structures seem constrained to Array or Vector. E.g., this code does not work:

using Distributions
using LinearAlgebra
using Random

using EnsembleKalmanProcesses.EnsembleKalmanProcessModule
using EnsembleKalmanProcesses.ParameterDistributionStorage
using EnsembleKalmanProcesses.EnsembleKalmanProcessModule: EnsembleKalmanProcess, construct_initial_ensemble

rng_seed = 41
Random.seed!(rng_seed)

# The system
θ★ = (a=1, b=-1)
G₁(θ) = [sqrt((θ.a - θ★.a)^2 + (θ.b - θ★.b)^2)]

# Hyperparameters
N_ensemble = 50
noise_level = 1e-3

unconstrained_prior_distributions = (a=Parameterized(Normal(0, 1)), b=Parameterized(Normal(0, 1)))
constraints = (a=no_constraint(), b=no_constraint())
parameter_names = ("a", "b")

prior = ParameterDistribution(unconstrained_prior_distributions, constraints, parameter_names)

and I get

julia> prior = ParameterDistribution(unconstrained_prior_distributions, constraints, ["a", "b"])
ERROR: MethodError: no method matching ParameterDistribution(::NamedTuple{(:a, :b), Tuple{Parameterized, Parameterized}}, ::NamedTuple{(:a, :b), Tuple{Constraint, Constraint}}, ::Vector{String})

Is it necessary to constrain parameters to type Vector? It seems like it should be possible to generically support any iterable object (or any object that supports broadcasting). In cases where we need to perform LinearAlgebra operations, we can construct the appropriate matrices or vectors from iterable containers as needed:

julia> parameters = (a=1, b=2)
(a = 1, b = 2)

julia> parameter_vector = [θ for θ in parameters]
2-element Vector{Int64}:
 1
 2

Allocations could even be minimized, though I think this is usually unnecessary because typical applications of EnsembleKalmanProcesses.jl have computations dominated by the cost of evaluating a loss function.
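As a minimal sketch of this idea (not the package's API), a NamedTuple can be converted to a Vector only at the linear-algebra boundary and rebuilt afterwards:

```julia
# Hypothetical sketch: keep parameters in a NamedTuple and convert to a
# Vector only where linear algebra requires it, then rebuild the NamedTuple.
parameters = (a = 1.0, b = 2.0)
parameter_vector = collect(values(parameters))        # Vector{Float64}
updated_vector = parameter_vector .+ 0.5              # stand-in for an update step
updated_parameters = NamedTuple{keys(parameters)}(Tuple(updated_vector))
```

The round trip preserves the parameter names, so the generic container can remain the user-facing representation.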

I'm happy to get to work on refactoring the internals to support more general parameter containers if someone can identify the minimal requirements for the parameters object. It's tough to figure out what methods would have to be defined without detailed knowledge of the code because the struct definitions are over-constrained, preventing experimentation.

Consistency with TC.jl

To build a Julia environment that runs with TC.jl (whose dependencies were recently modified), I had to unpin Distributions in the top-level Project.toml.

Remove groupname from sbatch

I just noticed that we have left in some -A esm flags in the sbatch scripts. These are specific to the developer's specific HPC group, and should be removed.

if [ "$it" = "1" ]; then
id_ens_array=$(sbatch --parsable --kill-on-invalid-dep=yes -A esm --dependency=afterok:$id_init_ens --array=1-$n ekp_single_cm_run.sbatch $it)
else
id_ens_array=$(sbatch --parsable --kill-on-invalid-dep=yes -A esm --dependency=afterok:$id_ek_upd --array=1-$n ekp_single_cm_run.sbatch $it)
fi
id_ek_upd=$(sbatch --parsable --kill-on-invalid-dep=yes -A esm --dependency=afterok:$id_ens_array --export=n=$n ekp_cont_calibration.sbatch $it)

Ensemble Kalman Inversion documentation points to a docs/preview url

The link here:

In the previous update, note that the parameters stored in `ekiobj` are given in the unconstrained Gaussian space where the EKI algorithm is performed. The map $\mathcal{T}^{-1}$ between this unconstrained space and the (possibly constrained) physical space of parameters is encoded in the `prior` object. The dynamical model `Ψ` accepts as inputs the parameters in (possibly constrained) physical space, so it is necessary to apply `transform_unconstrained_to_constrained` before evaluations. See the [Prior distributions](https://clima.github.io/EnsembleKalmanProcesses.jl/previews/PR21/parameter_distributions/) section for more details on parameter transformations.

should be changed to a relative url.

Functionality to plot prior pdf (in constrained and unconstrained space)

Proposal
I think that users would appreciate the ability to plot the pdf of a ParameterDistribution, both in unconstrained and in constrained space. E.g., one could provide a function plot_pdf(xarray::Array{<:Real,1}, pd::ParameterDistribution, constrained::Bool=false, kw...) that takes a parameter distribution and plots either the pdf of the unconstrained distribution or that of the constrained distribution.

Why?
In practice, the choice of priors is probably one of the most difficult aspects of using EnsembleKalmanProcesses.jl: it requires understanding the transformations between constrained and unconstrained spaces, and most scientists working with a model tend to think about parameter distributions in the constrained model space (where they have physical insight or intuition) rather than in the unconstrained space where the algorithm operates. So it would be nice for them to see what the pdf in unconstrained space gets mapped into when sent to the constrained space.

Implementing plotting capability in a reasonably general way (including high dimensions, distributions from correlated samples etc.) requires a lot of thought. But even an implementation that is limited to simple cases could be useful.
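A dependency-free sketch of what such a function would compute for a single scalar parameter (the plotting backend is omitted; the logistic map onto (0, 1) is used as an illustrative constrained transform, not the package's actual bounded transform):

```julia
# Compute pdf values of a N(0, 1) unconstrained prior and of its pushforward
# under an assumed logistic transform, via the change-of-variables formula.
normal_pdf(x) = exp(-x^2 / 2) / sqrt(2π)
transform(x) = 1 / (1 + exp(-x))                 # unconstrained -> constrained
inv_transform(y) = log(y / (1 - y))              # constrained -> unconstrained
jacobian(y) = 1 / (y * (1 - y))                  # |d inv_transform / dy|
constrained_pdf(y) = normal_pdf(inv_transform(y)) * jacobian(y)

ys = range(0.01, 0.99; length = 99)
densities = constrained_pdf.(ys)                 # values one would plot
```

The same pattern (evaluate the unconstrained pdf at the inverse transform, weight by the Jacobian) generalizes to the other 1D constraint types.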

Standardize UKI constructor to be consistent with EKS, document kwargs

The UKI constructor requires α_reg and update_freq as arguments without default values. It would be good to make them keyword arguments with defaults, so users can use an off-the-shelf version with a constructor similar to EKS (i.e., just an initial mean and variance).

I think α_reg=1 by default, but what about update_freq?

In addition, κ and β are not defined, and there is a TODO flag within the constructor. It would be good to clean this up.

Fix warnings in building docs

When building docs locally, a number of warning about missing files are raised.

The build command is the same as what's used in the GH Action:

julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
julia --color=yes --project=docs/ docs/make.jl

which results in

┌ Warning: couldn't find file "/Users/tsj/Documents/climate/clima/EnsembleKalmanProcesses.jl/docs/src/literated/<unknown>/examples/LossMinimization/loss_minimization.jl" when generating URL
└ @ Documenter.Utilities ~/.julia/packages/Documenter/MLty7/src/Utilities/Utilities.jl:450
┌ Warning: invalid local link: unresolved path in ensemble_kalman_inversion.md
│   link.text =
│    1-element Vector{Any}:
│     "Prior distributions"
│   link.url = "../parameter_distributions/"
└ @ Documenter.Writers.HTMLWriter ~/.julia/packages/Documenter/MLty7/src/Writers/HTMLWriter.jl:2081
┌ Warning: invalid local link: unresolved path in ensemble_kalman_inversion.md
│   link.text =
│    1-element Vector{Any}:
│     "Prior distributions"
│   link.url = "../parameter_distributions/"
└ @ Documenter.Writers.HTMLWriter ~/.julia/packages/Documenter/MLty7/src/Writers/HTMLWriter.jl:2081
┌ Warning: invalid local link: unresolved path in unscented_kalman_inversion.md
│   link.text =
│    1-element Vector{Any}:
│     "Lorenz96"
│   link.url = "../examples/Lorenz_example"
└ @ Documenter.Writers.HTMLWriter ~/.julia/packages/Documenter/MLty7/src/Writers/HTMLWriter.jl:2081
┌ Warning: invalid local link: unresolved path in unscented_kalman_inversion.md
│   link.text =
│    1-element Vector{Any}:
│     "Cloudy"
│   link.url = "../examples/Cloudy_example"
└ @ Documenter.Writers.HTMLWriter ~/.julia/packages/Documenter/MLty7/src/Writers/HTMLWriter.jl:2081
┌ Warning: couldn't find file "/Users/tsj/Documents/climate/clima/EnsembleKalmanProcesses.jl/docs/src/literated/<unknown>/examples/SparseLossMinimization/loss_minimization_sparse_eki.jl" when generating URL
└ @ Documenter.Utilities ~/.julia/packages/Documenter/MLty7/src/Utilities/Utilities.jl:450
┌ Warning: couldn't find file "/Users/tsj/Documents/climate/clima/EnsembleKalmanProcesses.jl/docs/src/literated/<unknown>/examples/AerosolActivation/aerosol_activation.jl" when generating URL
└ @ Documenter.Utilities ~/.julia/packages/Documenter/MLty7/src/Utilities/Utilities.jl:450

Visualization of priors in Docs

I would really like it if we could get some visualization of the flexibility of Normal distributions in the Prior distributions docs.
Down the road, an interactive tool would be great. Until then, I would like animations to represent the main cases:
For each of unbounded(), bounded_below(0), bounded_above(10), and bounded(0, 10), I would like 2 animations of the 1D graphs of

  1. (shifting variance) f = unconstrained_to_constrained(Normal(0, sd)) running over a range sd = [0.1, ..., 3]
  2. (shifting mean) f = unconstrained_to_constrained(Normal(m, 1)) running over a range m = [-3, 3]

Ensemble Kalman Sampler in docs doesn't match the implementation

We talk about the vanilla EKS implementation; however, the implementation uses a linearly-corrected version proposed through the paper chain of here, and subsequently here. This is shown to have better numerical and analytical properties than the originally proposed EKS.

I believe the authors of the final paper also give the corrected method a new name. We should take care with how it is presented. Perhaps I can work with @agarbuno on the documentation for this.

Simplify interface

One of the nice parts of Julia is that we can extend existing methods, for example,

get_mean(d::Parameterized) = mean(d.distribution)

could simply extend the StatsBase mean method

import StatsBase
...
StatsBase.mean(d::Parameterized) = StatsBase.mean(d.distribution)

This can be applied to several methods (e.g., get_mean, len, size etc.)
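A self-contained sketch of the pattern with a hypothetical wrapper type (using the Statistics stdlib here for a dependency-free example; StatsBase re-exports these functions):

```julia
import Statistics

# Extend the standard-library generics instead of defining get_mean / len.
struct ParameterizedSketch{D}
    distribution::D
end

Statistics.mean(d::ParameterizedSketch) = Statistics.mean(d.distribution)
Base.length(d::ParameterizedSketch) = length(d.distribution)

d = ParameterizedSketch([1.0, 2.0, 3.0])   # a Vector stands in for a Distribution
```

Users then call the familiar `mean(d)` and `length(d)` rather than learning package-specific accessor names.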

Create package based on CES.jl structure

  • We need to create this package with bors, etc (the same as used in CalibrateEmulateSample.jl)
  • We need a readme.md for the front page
  • We need to populate the Project toml with the packages from CalibrateEmulateSample.jl required (only) for EnsembleKalmanProcesses (This will not include the python dependencies)
  • We need to copy across the Loss Minimization and to copy-and-cut the optimization part of Lorenz 96/Cloudy examples.

Add a simple pedagogical example with finding the parameters of a sinusoid

As part of @anagha548's SURF project, she, @odunbar, and I have been playing with the simple inverse problem

f(A, c) = A*sin(t) + c

We take the observations to be a noisy realization from time 0 to 2*pi, and with a random phase. We take the observables to be the mean of this curve (to find the vertical shift) and the maximum minus the minimum (to find the amplitude).

I think this is a simple pedagogical example which could be helpful in the documentation as a problem to introduce EKI.
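A noise-free sketch of the forward map described above (the function name, sample count, and rng seed are illustrative choices, not a committed design):

```julia
using Random, Statistics

# Evaluate the sinusoid at a random phase and return the two observables:
# the mean of the curve (recovers c) and max minus min (recovers ~2A).
function observe_sinusoid(A, c; n = 200, rng = Random.MersenneTwister(42))
    ϕ = 2π * rand(rng)                     # random phase
    t = range(0, 2π; length = n)
    y = A .* sin.(t .+ ϕ) .+ c
    return [mean(y), maximum(y) - minimum(y)]
end

obs = observe_sinusoid(1.0, 5.0)
```

With these observables the problem separates cleanly: the first entry is insensitive to A and the second to c, which makes EKI convergence easy to visualize.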

Random error in SparseEKI docs

The following error occurs sometimes when building the docs,


┌ Warning: failed to run `@example` block in src/literated/loss_minimization_sparse_eki.md:108-116
│ ```@example loss_minimization_sparse_eki
│ for i in 1:N_iterations
│     params_i = get_u_final(ensemble_kalman_process)
│
│     g_ens = hcat([G₁(params_i[:, i]) for i in 1:N_ensemble]...)
│
│     EKP.update_ensemble!(ensemble_kalman_process, g_ens)
│ end
│ ```
│   c.value =
│    TaskFailedException
│
│        nested task error: Quadratic form only defined for Hermitian matrices
│        Stacktrace:
│         [1] error(s::String)
│           @ Base ./error.jl:33
│         [2] quadform(x::Convex.Variable, A::Matrix{Float64}; assume_psd::Bool)
│           @ Convex ~/.julia/packages/Convex/uI27T/src/atoms/second_order_cone/quadform.jl:42
│         [3] sparse_qp(ekp::EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, v_j::Vector{Float64}, cov_vv_inv::Matrix{Float64}, H_u::SparseArrays.SparseMatrixCSC{Float64, Int64}, H_g::SparseArrays.SparseMatrixCSC{Float64, Int64}, y_j::Vector{Float64}; H_uc::SparseArrays.SparseMatrixCSC{Float64, Int64})
│           @ EnsembleKalmanProcesses ~/work/EnsembleKalmanProcesses.jl/EnsembleKalmanProcesses.jl/src/SparseEnsembleKalmanInversion.jl:61
│         [4] macro expansion
│           @ ~/work/EnsembleKalmanProcesses.jl/EnsembleKalmanProcesses.jl/src/SparseEnsembleKalmanInversion.jl:141 [inlined]
│         [5] (::EnsembleKalmanProcesses.var"#49#threadsfor_fun#21"{EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, Matrix{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, Matrix{Float64}, Matrix{Float64}, UnitRange{Int64}})(onethread::Bool)
│           @ EnsembleKalmanProcesses ./threadingconstructs.jl:85
│         [6] (::EnsembleKalmanProcesses.var"#49#threadsfor_fun#21"{EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, Matrix{Float64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, SparseArrays.SparseMatrixCSC{Float64, Int64}, Matrix{Float64}, Matrix{Float64}, UnitRange{Int64}})()
│           @ EnsembleKalmanProcesses ./threadingconstructs.jl:52
└ @ Documenter.Expanders ~/.julia/packages/Documenter/Nu7Lp/src/Expanders.jl:591

New Docs pages request

New Pages

  • DataContainers
  • failure handling in the different particle methods - Ignacio
  • Use cases HPC vs "local julia" implementation (see e.g. the response in #148) (pmap/distributed vs SLURM & new file interface) - Haakon/Melanie/Ollie
  • Sparse EKI documentation - (Jinlong)
  • Localization and sampling error correction - Ignacio
  • Extra info on SEC and localization example with Lorenz 96 (a commented version of the current test would work) - Eviatar
  • Recommended settings of "bells and whistles" (localization/regularization/inflation...)

Please add suggestions below!

Unit test failure for inflation

On a Mac, and Julia 1.6.6

line 522 of test/EnsembleKalmanProcesses/runtests.jl.

Causes a DomainError in eigmax (as well as the desired warning).

co-authored with @anagha548

SparseInversion not as robust as Inversion

I tried using the SparseInversion algorithm instead of Inversion for an integration test in CEDMF.jl, and I run into a SingularException in L111.

The configuration used for the integration test is

γ = 1.0
threshold_eki = true
threshold_value = 5e-2
reg = 1e-2

In general, we are missing documentation for how to choose reg and how to make the convex optimization part of the update robust.

Stack trace:

Stacktrace:
  [1] checknonsingular
    @ /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/factorization.jl:19 [inlined]
  [2] checknonsingular
    @ /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/factorization.jl:21 [inlined]
  [3] #lu!#146
    @ /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:82 [inlined]
  [4] #lu#153
    @ /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:279 [inlined]
  [5] lu (repeats 2 times)
    @ /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/lu.jl:278 [inlined]
  [6] \(A::Matrix{Float64}, B::Diagonal{Float64, Vector{Float64}})
    @ LinearAlgebra /central/software/julia/1.7.0/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:1142
  [7] sparse_eki_update(ekp::EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, u::Matrix{Float64}, g::Matrix{Float64}, y::Matrix{Float64}, obs_noise_cov::Matrix{Float64})
    @ EnsembleKalmanProcesses /central/scratch/esm/slurm-buildkite/calibrateedmf-ci/depot/cpu/packages/EnsembleKalmanProcesses/OjLuD/src/SparseEnsembleKalmanInversion.jl:111
  [8] (::EnsembleKalmanProcesses.var"#failsafe_update#26")(ekp::EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, u::Matrix{Float64}, g::Matrix{Float64}, y::Matrix{Float64}, obs_noise_cov::Matrix{Float64}, failed_ens::Vector{Int64})
    @ EnsembleKalmanProcesses /central/scratch/esm/slurm-buildkite/calibrateedmf-ci/depot/cpu/packages/EnsembleKalmanProcesses/OjLuD/src/SparseEnsembleKalmanInversion.jl:24
  [9] update_ensemble!(ekp::EnsembleKalmanProcess{Float64, Int64, SparseInversion{Float64, Int64}}, g::Matrix{Float64}; cov_threshold::Float64, Δt_new::Float64, deterministic_forward_map::Bool, failed_ens::Nothing)
    @ EnsembleKalmanProcesses /central/scratch/esm/slurm-buildkite/calibrateedmf-ci/depot/cpu/packages/EnsembleKalmanProcesses/OjLuD/src/SparseEnsembleKalmanInversion.jl:191

UKI does not allow for positive semidefinite covariances

Covariances are in general positive semidefinite, and in UKI the estimate covariance may become positive semidefinite. However, the Cholesky factorization in construct_sigma_ensemble fails when the matrix is not positive definite.

Covariance matrix `obs_noise_cov` is restricted to `Array{FT, 2}`

The type of EnsembleKalmanProcess.obs_noise_cov is restricted to Array{FT, 2}, which is inconvenient and prevents the use of common LinearAlgebra abstractions that represent arrays.

For example:

julia> using EnsembleKalmanProcesses, EnsembleKalmanProcesses.ParameterDistributions, LinearAlgebra, Distributions

julia> distribution = ParameterDistribution(Parameterized(Normal(0, 1)), no_constraint(), "");

julia> sample = sample_distribution(distribution, 1);

julia> obs_noise_cov = 0.1*I
UniformScaling{Float64}
0.1*I

julia> ekp = EnsembleKalmanProcess(sample, zeros(2), obs_noise_cov, Inversion())
ERROR: MethodError: no method matching EnsembleKalmanProcess(::Matrix{Float64}, ::Vector{Float64}, ::UniformScaling{Float64}, ::Inversion)
...

0.1 * I is a valid and efficient way to specify a covariance matrix. The code itself likely works as is, since key lines like

julia> noise = rand(MvNormal(zeros(2), 0.1 * I), 2)
2×2 Matrix{Float64}:
 0.0551659  0.720873
 0.057236   0.289555

are valid for a wide variety of matrix types. Thus probably the only change needed is to release obs_noise_cov from Array{FT, 2}.

For large observation vectors the need to convert objects like UniformScaling to Array{FT, 2} is a waste of memory:

julia> Matrix(0.1*I, 5, 5)
5×5 Matrix{Float64}:
 0.1  0.0  0.0  0.0  0.0
 0.0  0.1  0.0  0.0  0.0
 0.0  0.0  0.1  0.0  0.0
 0.0  0.0  0.0  0.1  0.0
 0.0  0.0  0.0  0.0  0.1

Another important matrix type is Diagonal representing a diagonal matrix.

cc @simonbyrne
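A small sketch supporting the point that the code likely works as is: the operations typically performed with obs_noise_cov (e.g. linear solves) already behave identically across matrix-like types, so relaxing the field's type annotation appears to be the main change required.

```julia
using LinearAlgebra

# The same solve works for dense, Diagonal, and UniformScaling covariances.
v = [1.0, 2.0]
dense    = Matrix(0.1 * I, 2, 2)
diagonal = Diagonal(fill(0.1, 2))
scaling  = 0.1 * I                   # no dense allocation at all
solves = [C \ v for C in (dense, diagonal, scaling)]
```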

Slightly clearer error message for incorrect ensemble sizes

Perhaps instead of

if !(size(g)[2] == ekp.N_ens)
    throw(DimensionMismatch("ensemble size in EnsembleKalmanProcess and g do not match, try transposing g or check ensemble size"))
end

something like

"N_ens in EnsembleKalmanProcess and g do not match..."

or "Ensemble size N_ens in EnsembleKalmanProcess..."

The first time I saw this error, it took me a little while to figure out that N_ens is the ensemble size.
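A sketch of the suggested wording, spelling out that N_ens is the ensemble size (`check_ensemble_size` is an illustrative helper name, not the package's API):

```julia
# Validate that the forward-map output g has one column per ensemble member.
function check_ensemble_size(g, N_ens)
    if size(g, 2) != N_ens
        throw(DimensionMismatch(
            "ensemble size N_ens = $N_ens in EnsembleKalmanProcess does not match " *
            "the ensemble size of g, size(g, 2) = $(size(g, 2)); " *
            "try transposing g or check N_ens"))
    end
    return nothing
end
```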

Add LearningRateSchedulers

Following typical optimizer structures in deep learning (TensorFlow, PyTorch), we can add the possibility of defining a LearningRateScheduler at construction time for EKP.

Then, we would set the default dt based on the scheduler, and perhaps still expose this parameter for the user in case they want to try out some other schedule.

Error when running the Lorenz 96 Example

I made a fresh installation of Julia 1.7.2 on Ubuntu 20.04. Cloned the repository, then followed the installation instructions.
Running the test cases works without issues, passing 340 of 340 tests.
I cd into the directory of /EnsembleKalmanProcesses.jl/examples/Lorenz/.
The documentation under 'Examples - Lorenz' advises:

The L96 parameter estimation can be run using julia --project Lorenz_example.jl

After resolving all missing packages and executing the mentioned command, I receive the following error:

$ julia --project Lorenz_example.jl 

/home/ben/some_path/EnsembleKalmanProcesses.jl/examples/Lorenz
2
[8.0; 2.5;;]
ERROR: LoadError: MethodError: no method matching ParameterDistribution(::Vector{Dict{String, Any}})
Closest candidates are:
  ParameterDistribution(::Union{Array{PDType}, PDType}, ::Union{Array{CType}, Array, CType}, ::Union{Array{ST}, ST}) 
  where {PDType<:ParameterDistributionType, CType<:EnsembleKalmanProcesses.ParameterDistributions.ConstraintType, ST<:AbstractString} 
  at ~/.julia/packages/EnsembleKalmanProcesses/iWN0p/src/ParameterDistributions.jl:167
Stacktrace:
 [1] top-level scope
   @ ~/some_path/EnsembleKalmanProcesses.jl/examples/Lorenz/Lorenz_example.jl:98
in expression starting at /home/ben/some_path/Kalman_test/EnsembleKalmanProcesses.jl/examples/Lorenz/Lorenz_example.jl:85

I also tried the installation via the package manager, leading to the same error.
If you need any further information, please let me know.

On a side note, it also seems that the current version of Cloudy is incompatible with EnsembleKalmanProcesses, as I wasn't able to install both into the same (empty) environment.

Kind regards,
Ben

GalacticOptim.jl

Can we hook EKP.jl into that project? We could then calibrate using DiffEqFlux.jl.
