cgrudz / DataAssimilationBenchmarks.jl

Package Information and Documentation
Home Page: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/dev/
License: Apache License 2.0
The public API of the DataAssimilationBenchmarks.jl module is currently documented in the README.md. This clutters the document and does not adhere to standardized documentation formatting. Julia enables object documentation via docstrings and advertises a standard for writing such documentation. This has the advantage that the documentation is part of the actual source code, and that formatted documentation can always be generated from the source code with Documenter.jl.

Please consider removing the API documentation from the README.md and adding appropriate docstrings to the functions of the module that are part of the public API. I think this would work well together with my suggestion in #7, making the README.md a more "high-level" overview of the package.
One typical use case I imagine for this package is the comparison of the filter/smoother methods implemented here with one implemented in another package. However, it is currently very complicated for users to add a new analysis scheme, or an entire filter/smoother method, to this package.

The "heavy lifting" of filter and smoother operations is currently done by the EnsembleKalmanSchemes.transform function, which implements the different analysis schemes and discerns between them via the analysis::String argument. To add a new analysis scheme, users have to alter this file and necessarily need to familiarize themselves with the source code of this package. Although this would require quite a bit of work, I think it would be great if the functions of this package supported the injection of analysis schemes via function arguments. This requires defining and documenting a proper interface for the analysis-scheme functions.
In theory, this could be extended to all parts of the filter/smoother algorithms, most prominently the state inflation.
Here's a crude example of how this could look:
```julia
# ---
# DataAssimilationBenchmarks.jl code

# Define a module containing the analysis functions
module AnalysisSchemes

function enkf(...)
    # EnKF analysis function...
    # This is the code that is currently located in the
    # `analysis == "enkf"` branch of transform()
end

end # module AnalysisSchemes

module EnsembleKalmanSchemes

# Redefine the function to take the analysis scheme as an argument
function ensemble_filter(analysis::Function, ...)
    # ...
    analysis(...) # Call the passed analysis function here
    # ...
end

end # module EnsembleKalmanSchemes

# ---
# User code
using DataAssimilationBenchmarks
using EnsembleKalmanSchemes

ensemble_filter(AnalysisSchemes.enkf, ...) # Call with the EnKF analysis scheme from this package
# ensemble_filter("enkf", ...)             # <-- This is what the same call currently looks like

function my_fancy_enkf(...)
    # My own EnKF analysis function!
end

ensemble_filter(my_fancy_enkf, ...) # Call with a custom EnKF analysis function,
                                    # without modifying package code!
```
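To make the pattern concrete, here is a minimal, runnable sketch of the same idea. All names here are illustrative, not the package's actual API, and the "analysis" is a toy update rather than a real EnKF step:

```julia
# Minimal sketch of injecting an analysis scheme as a function argument.
# All names are illustrative, not the actual package API.

# A toy "analysis" that nudges every ensemble member halfway toward
# closing the gap between the ensemble mean and the observation
function toy_analysis(ensemble::Vector{Float64}, obs::Float64)
    correction = obs - sum(ensemble) / length(ensemble)
    return ensemble .+ 0.5 * correction
end

# The filter only requires that `analysis` is callable with this signature
function ensemble_filter(analysis::Function, ensemble::Vector{Float64}, obs::Float64)
    return analysis(ensemble, obs)
end

updated = ensemble_filter(toy_analysis, [1.0, 2.0, 3.0], 4.0)
```

Because Julia dispatches on the passed function, a user-defined scheme with the same call signature works without any change to the package.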
All operations in this module are output-file based, meaning that no function returns its data; instead, all data is written into output files to be retrieved later on. For this to be viable in conjunction with other modules, it would be helpful to have the structure of the output documented. Which datasets are written by, e.g., the l96_time_series and filter_state functions, and what are their data types and dimensions? This would particularly help when writing scripts to analyze the results. The output documentation could be part of the function docstring (see #8), or added to the README.md where appropriate.
Is there a particular reason why most functions in this package use one tuple as argument instead of multiple arguments? As the tuple effectively causes the arguments to be unnamed, it is very difficult to understand the usage of the functions from just looking at their signature. Since the tuple arguments are usually unpacked right away (see, e.g., /src/experiments/GenerateTimeSeries.jl#L18), I see no reason why the function argument list should not be used to define these variables in the first place.
Example:

```julia
# Now:
function L96_time_series(args::Tuple{Int64,Int64,Float64,Int64,Int64,Float64,Float64})
    seed, state_dim, tanl, nanl, spin, diffusion, F = args
    # Do stuff...
end

# Better, imo:
function L96_time_series(
    seed::Int64,
    state_dim::Int64,
    tanl::Float64,
    nanl::Int64,
    spin::Int64,
    diffusion::Float64,
    F::Float64,
)
    # Do stuff...
end
```
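Going one step further, keyword arguments would make each call site self-documenting and allow defaults. A sketch of that variant; the default values shown are illustrative placeholders, not the package's actual defaults:

```julia
# Sketch: keyword arguments make every call site self-documenting.
# The parameter names mirror the tuple fields above; all default
# values are illustrative placeholders, not the package's defaults.
function l96_time_series(; seed::Int64, state_dim::Int64 = 40,
                         tanl::Float64, nanl::Int64, spin::Int64 = 0,
                         diffusion::Float64 = 0.0, F::Float64 = 8.0)
    # Do stuff... (here we just return the configuration for inspection)
    return (seed = seed, state_dim = state_dim, tanl = tanl, nanl = nanl,
            spin = spin, diffusion = diffusion, F = F)
end

config = l96_time_series(seed = 0, tanl = 0.05, nanl = 100)
```

Every argument is then named at the call site, so a reader can understand a call without consulting the signature.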
The JOSS paper gives a very good introduction to the topic of data assimilation, provides a nice overview of the project, and compares it well to similar packages. I only have a few suggestions to make it even more concise. Most of them concern sections that are part of the README.md and hence not required in the paper. The first paragraph of the Installation section currently gives an overview of the source code structure; this can be moved to the Summary section. I also think it suffices to mention in the Summary that the package is registered in the Julia General registry and hence can be downloaded and installed via the REPL.

grudzien2020numerical appears twice in the list of references. I am not sure what causes this; it might be an issue with the JOSS paper generator.

Regarding the review openjournals/joss-reviews#4129: upper case should be respected in the bibliography (https://github.com/cgrudz/DataAssimilationBenchmarks.jl/blob/master/paper.bib). For instance, "State-of-the-art stochastic data assimilation methods for high-dimensional non-Gaussian problems" is missing the upper case in "Gaussian".
The README.md documents the API of the module. As the argument lists are quite extensive, I think it would be helpful to add "typical" usage examples for some of the methods and solvers. In particular, this would help to demonstrate some kind of workflow: starting out with generating a time series, an example could show how to run different data assimilation methods included in this package, and then how to evaluate the output and compare the methods against each other. Currently, one has to look up a typical argument list in the tests, which is cumbersome.
Community guidelines are a review criterion of JOSS publications. Please add information to this repository on how to seek support, report issues, and contribute. This can be supplied in sections of the README.md and in a separate CONTRIBUTING.md file. More information on this topic can be found, e.g., in the GitHub docs: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/setting-guidelines-for-repository-contributors
As far as I see, functions in this package currently assume that every state dimension is always observed (with some observation error). From my experience with data assimilation, this is rarely the case, and the performance of DA algorithms can strongly differ depending on which, and how many, state dimensions are part of the observation. For a DA comparison framework, I suggest more control over observations (i.e., observation operators) at the user side of things. A first step would be to determine which state dimensions are observed, e.g., via a vector of indices that is supplied as function argument. A further improvement would be the option to add the complete observation operator matrix which calculates the observations via multiplication with the state vector.
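As a sketch of both suggestions (the variable names are illustrative): observing a subset of state dimensions can be expressed with an index vector, and more generally with an observation operator matrix H:

```julia
# Sketch: partial observations of a state vector (names are illustrative).
state = [1.0, 2.0, 3.0, 4.0]

# Option 1: observe only selected state dimensions via an index vector
obs_indices = [1, 3]
partial_obs = state[obs_indices]

# Option 2: a full observation operator matrix H mapping state space to
# observation space; here H selects the same two dimensions, but it could
# equally encode averages or other linear combinations of the state
H = [1.0 0.0 0.0 0.0;
     0.0 0.0 1.0 0.0]
operator_obs = H * state
```

The index-vector form is the simpler first step; the operator matrix subsumes it and also covers observations that mix several state dimensions.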
This issue is used to trigger TagBot; feel free to unsubscribe. If you haven't already, you should update your TagBot.yml to include issue comment triggers. Please see this post on Discourse for instructions and more details. If you'd like for me to do this for you, comment "TagBot fix" on this issue. I'll open a PR within a few hours; please be patient!
Related to JOSS submission openjournals/joss-reviews#4129
I think it would be great to have a badge instead of the link to the documentation in the readme, and also to "promote" the stable documentation additionally to the master-version.
```markdown
[docs-stable-img]: https://img.shields.io/badge/docs-stable-blue.svg
[docs-stable-url]: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/stable
[docs-dev-img]: https://img.shields.io/badge/docs-dev-purple.svg
[docs-dev-url]: https://cgrudz.github.io/DataAssimilationBenchmarks.jl/dev
```
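With those link definitions in place, the badges themselves could be referenced near the top of the README like this (assuming a stable documentation build is actually published at the /stable URL):

```markdown
[![docs-stable][docs-stable-img]][docs-stable-url]
[![docs-dev][docs-dev-img]][docs-dev-url]
```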
Let me know if you think it's a good idea.
The analysis scripts currently seem defunct, as they try to run visualization scripts on specific, partially hard-coded paths. I think they would be very helpful to immediately visualize the results. They could be reworked to take a path to a .jld2 data file, similar to the other functions of this package, and then analyze/visualize the data contained in it.
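A reworked script could take the file path as its only required argument, roughly along these lines. This is a sketch: the dataset key "obs" is a hypothetical placeholder, and the JLD2 package is assumed for reading the files:

```julia
using JLD2  # provides load()/jldsave() for .jld2 files

# Sketch: analyze a single output file given its path. The dataset
# key "obs" is a hypothetical placeholder, not the actual file layout.
function analyze_output(path::AbstractString)
    data = JLD2.load(path)   # Dict mapping dataset name => value
    series = data["obs"]
    # ... compute statistics / produce plots from `series` here ...
    return size(series)
end
```

Taking the path as an argument decouples the analysis from any hard-coded directory layout, so the same script works on any run's output.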
The README.md lacks instructions on how to extend the framework with new models and new DA schemes. Independently of my suggestion to change how DA schemes can be specified when calling functions of this module (#13), the current way of implementing additions should be documented. For example, the DAPPER Python library documents how to add models and at least refers to examples for adding DA schemes. I understand that defining interfaces for DA schemes and models is more difficult in Julia because the language features no classes like in C++ or Python. However, I think that at least a rough implementation guideline should be supplied. This can be done in the README.md, or in another document if the instructions become too lengthy.