cgrudz / DataAssimilationBenchmarks.jl

Package Information and Documentation
Home Page: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/dev/
License: Apache License 2.0
The public API of the DataAssimilationBenchmarks.jl module is currently documented in the README.md. This clutters the document and does not adhere to standardized documentation formatting. Julia enables object documentation via docstrings and advertises a standard for writing such documentation. This has the advantage that the documentation is part of the actual source code, and that formatted documentation can always be generated from the source code with Documenter.jl.

Please consider removing the API documentation from the README.md and adding appropriate docstrings to the functions of the module that are part of the public API. I think this would work well together with my suggestion in #7, making the README.md a more "high-level" overview of the package.
One typical use case I imagine for this package is the comparison of the filter/smoother methods implemented here with one implemented in another package. However, it is currently very complicated for users to add a new analysis scheme, or an entire filter/smoother method, to this package.

The "heavy lifting" of filter and smoother operations is currently done by the EnsembleKalmanSchemes.transform function, which implements the different analysis schemes and discerns between them via the analysis::String argument. To add a new analysis scheme, users have to alter this file and necessarily need to familiarize themselves with the source code of this package. Although this would require quite a bit of work, I think it would be great if the functions of this package supported the injection of analysis schemes via function arguments. This requires defining and documenting a proper interface for the analysis-scheme functions.
In theory, this could be extended to all parts of the filter/smoother algorithms, most prominently the state inflation.
Here's a crude example of how this could look:
```julia
# ---
# DataAssimilationBenchmarks.jl code

# Define a module containing the analysis functions
module AnalysisSchemes

function enkf(...)
    # EnKF analysis function...
    # This is the code that is currently located in the
    # `analysis == "enkf"` branch of transform()
end

end # module AnalysisSchemes

module EnsembleKalmanSchemes

# Redefine the function to take the analysis scheme as an argument
function ensemble_filter(analysis::Function, ...)
    # ...
    analysis(...) # Call the passed analysis function here
    # ...
end

end # module EnsembleKalmanSchemes

# ---
# User code
using DataAssimilationBenchmarks
using EnsembleKalmanSchemes

ensemble_filter(AnalysisSchemes.enkf, ...) # Call with the EnKF analysis scheme from this package
# ensemble_filter("enkf", ...)             # <-- This is what the same call currently looks like

function my_fancy_enkf(...)
    # My own EnKF analysis function!
end

ensemble_filter(my_fancy_enkf, ...) # Call with a custom EnKF analysis function,
                                    # without modifying package code!
```
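To make the pattern concrete, here is a minimal, runnable sketch of the same idea. All names here are illustrative, not the package's actual API, and the "analysis" is a toy update rather than a real EnKF step:

```julia
# Minimal sketch of injecting an analysis scheme as a function argument.
# All names are illustrative, not the actual package API.

# A toy "analysis" that nudges every ensemble member halfway toward
# closing the gap between the ensemble mean and the observation
function toy_analysis(ensemble::Vector{Float64}, obs::Float64)
    correction = obs - sum(ensemble) / length(ensemble)
    return ensemble .+ 0.5 * correction
end

# The filter only requires that `analysis` is callable with this signature
function ensemble_filter(analysis::Function, ensemble::Vector{Float64}, obs::Float64)
    return analysis(ensemble, obs)
end

updated = ensemble_filter(toy_analysis, [1.0, 2.0, 3.0], 4.0)
```

Because Julia dispatches on the passed function, a user-defined scheme with the same call signature works without any change to the package.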
All operations in this module are output-file based, meaning that no function returns its data; instead, all data is written into output files to be retrieved later on. For this to be viable in conjunction with other modules, it would be helpful to have the structure of the output documented. Which datasets are written by, e.g., the l96_time_series and filter_state functions, and what are their data types and dimensions? This would particularly help when writing scripts to analyze the results. The output documentation could be part of the function docstring (see #8), or added to the README.md where appropriate.
Is there a particular reason why most functions in this package use one tuple as argument instead of multiple arguments? As the tuple effectively causes the arguments to be unnamed, it is very difficult to understand the usage of the functions from just looking at their signature. Since the tuple arguments are usually unpacked right away (see, e.g., /src/experiments/GenerateTimeSeries.jl#L18), I see no reason why the function argument list should not be used to define these variables in the first place.
Example:

```julia
# Now:
function L96_time_series(args::Tuple{Int64,Int64,Float64,Int64,Int64,Float64,Float64})
    seed, state_dim, tanl, nanl, spin, diffusion, F = args
    # Do stuff...
end

# Better, imo:
function L96_time_series(
    seed::Int64,
    state_dim::Int64,
    tanl::Float64,
    nanl::Int64,
    spin::Int64,
    diffusion::Float64,
    F::Float64,
)
    # Do stuff...
end
```
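Going one step further, keyword arguments would make each call site self-documenting and allow defaults. A sketch of that variant; the default values shown are illustrative placeholders, not the package's actual defaults:

```julia
# Sketch: keyword arguments make every call site self-documenting.
# The parameter names mirror the tuple fields above; all default
# values are illustrative placeholders, not the package's defaults.
function l96_time_series(; seed::Int64, state_dim::Int64 = 40,
                         tanl::Float64, nanl::Int64, spin::Int64 = 0,
                         diffusion::Float64 = 0.0, F::Float64 = 8.0)
    # Do stuff... (here we just return the configuration for inspection)
    return (seed = seed, state_dim = state_dim, tanl = tanl, nanl = nanl,
            spin = spin, diffusion = diffusion, F = F)
end

config = l96_time_series(seed = 0, tanl = 0.05, nanl = 100)
```

Every argument is then named at the call site, so a reader can understand a call without consulting the signature.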
The JOSS paper gives a very good introduction to the topic of data assimilation, provides a nice overview of the project, and compares it well to similar packages. I only have a few suggestions to make it even more concise. Most of them concern sections that are part of the README.md and hence not required in the paper. The first paragraph of the Installation section currently gives an overview of the source code structure; this can be moved to the Summary section. I also think it suffices to mention in the Summary that the package is registered in the Julia General registry and hence can be downloaded and installed via the REPL.

grudzien2020numerical appears twice in the list of references. I am not sure what causes this; it might be an issue with the JOSS paper generator.

Regarding the review openjournals/joss-reviews#4129: upper case should be respected in the bibliography (https://github.com/cgrudz/DataAssimilationBenchmarks.jl/blob/master/paper.bib). For instance, "State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems" is missing the upper case in "Gaussian".
The README.md documents the API of the module. As the argument lists are quite extensive, I think it would be helpful to add "typical" usage examples for some of the methods and solvers. In particular, this would help to demonstrate some kind of workflow: starting out with generating a time series, an example could show how to run different data assimilation methods included in this package, and then how to evaluate the output and compare the methods against each other. Currently, one has to look up a typical argument list in the tests, which is cumbersome.
Community guidelines are a review criterion of JOSS publications. Please add information to this repository on how to seek support, report issues, and contribute. This can be supplied in sections of the README.md and in a separate CONTRIBUTING.md file. More information on this topic can be found, e.g., in the GitHub docs: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors
As far as I see, functions in this package currently assume that every state dimension is always observed (with some observation error). From my experience with data assimilation, this is rarely the case, and the performance of DA algorithms can strongly differ depending on which, and how many, state dimensions are part of the observation. For a DA comparison framework, I suggest more control over observations (i.e., observation operators) at the user side of things. A first step would be to determine which state dimensions are observed, e.g., via a vector of indices that is supplied as function argument. A further improvement would be the option to add the complete observation operator matrix which calculates the observations via multiplication with the state vector.
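As a sketch of both suggestions (the variable names are illustrative): observing a subset of state dimensions can be expressed with an index vector, and more generally with an observation operator matrix H:

```julia
# Sketch: partial observations of a state vector (names are illustrative).
state = [1.0, 2.0, 3.0, 4.0]

# Option 1: observe only selected state dimensions via an index vector
obs_indices = [1, 3]
partial_obs = state[obs_indices]

# Option 2: a full observation operator matrix H mapping state space to
# observation space; here H selects the same two dimensions, but it could
# equally encode averages or other linear combinations of the state
H = [1.0 0.0 0.0 0.0;
     0.0 0.0 1.0 0.0]
operator_obs = H * state
```

The index-vector form is the simpler first step; the operator matrix subsumes it and also covers observations that mix several state dimensions.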
This issue is used to trigger TagBot; feel free to unsubscribe. If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details. If you'd like for me to do this for you, comment "TagBot fix" on this issue. I'll open a PR within a few hours; please be patient!
Related to JOSS submission openjournals/joss-reviews#4129
I think it would be great to have a badge instead of the link to the documentation in the readme, and also to "promote" the stable documentation additionally to the master-version.
```markdown
[docs-stable-img]: https://img.shields.io/badge/docs-stable-blue.svg
[docs-stable-url]: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/stable
[docs-dev-img]: https://img.shields.io/badge/docs-dev-purple.svg
[docs-dev-url]: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/dev
```
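With those link definitions in place, the badges themselves could be referenced near the top of the README like this (assuming a stable documentation build is actually published at the /stable URL):

```markdown
[![docs-stable][docs-stable-img]][docs-stable-url]
[![docs-dev][docs-dev-img]][docs-dev-url]
```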
Let me know if you think it's a good idea.
The analysis scripts currently seem defunct, as they try to run visualization scripts on specific, partially hard-coded paths. I think they would be very helpful to immediately visualize the results. They could be reworked to take a path to a .jld2 data file, similar to the other functions of this package, and then analyze/visualize the data contained in it.
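A reworked script could take the file path as its only required argument, roughly along these lines. This is a sketch: the dataset key "obs" is a hypothetical placeholder, and the JLD2 package is assumed for reading the files:

```julia
using JLD2  # provides load()/jldsave() for .jld2 files

# Sketch: analyze a single output file given its path. The dataset
# key "obs" is a hypothetical placeholder, not the actual file layout.
function analyze_output(path::AbstractString)
    data = JLD2.load(path)   # Dict mapping dataset name => value
    series = data["obs"]
    # ... compute statistics / produce plots from `series` here ...
    return size(series)
end
```

Taking the path as an argument decouples the analysis from any hard-coded directory layout, so the same script works on any run's output.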
The README.md lacks instructions on how to extend the framework with new models and new DA schemes. Independently of my suggestion to change how DA schemes can be specified when calling functions of this module (#13), the current way of implementing additions should be documented. For example, the DAPPER Python library documents how to add models and at least refers to examples for adding DA schemes. I understand that defining interfaces for DA schemes and models is more difficult in Julia because the language features no classes like in C++ or Python. However, I think that at least a rough implementation guideline should be supplied. This can be done in the README.md, or in another document if the instructions become too lengthy.