
Delay coordinates embeddings and optimizing them

License: MIT License

delay-embedding nonlinear-dynamics chaos nonlinear hacktoberfest timeseries nonlinear-timeseries-analysis reconstruction

delayembeddings.jl's Introduction

DelayEmbeddings.jl


A Julia package that provides a generic interface for performing delay coordinate embeddings, as well as cutting-edge algorithms for creating optimal embeddings given some data. It can be used as a standalone package or as part of DynamicalSystems.jl.

To install it, run import Pkg; Pkg.add("DelayEmbeddings").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

delayembeddings.jl's People

Contributors

asinghvi17, datseris, denizhanpak, dependabot[bot], dsantra92, github-actions[bot], heliosdrm, hkraemer, ikottlarz, juliatagbot, kahaaga


delayembeddings.jl's Issues

Re-do all embeddings based on `genembed`

With the latest addition of genembed, there is a framework to do any kind of embedding imaginable, so MTDelayEmbedding and DelayEmbedding are no longer needed.

This will dramatically reduce source code complexity, but one has to check performance. It also remains to be decided how to do the weighted embedding in this scenario.
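
For reference, a minimal sketch of how a standard embedding reduces to the generic interface (using the public embed and genembed functions; the equivalence of the delay tuples is the point here):

using DelayEmbeddings

s = rand(1000)
# A standard delay embedding with dimension 3 and delay 2 is just a
# special case of the generic embedding with delays (0, 2, 4):
R1 = embed(s, 3, 2)
R2 = genembed(s, (0, 2, 4))
@assert R1 == R2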



Normalization factor in Uzal-costfunction might be wrong.

Hey all,

Take a look here. This seems to be valid. The correction also follows from Eqs. (17), (18) & (20) in the original publication. Thanks to @JayeshMD, who dug deeper into the math. However, this will change the general offset of the L-function: Fig. 7 in the original paper would look qualitatively the same (even better), but the graphs would have a different offset. The change in the code is subtle, but it does make sense from a physical perspective, I'd say. I also experimented with L-functions of different time series lengths of the same system and got similar results to those @JayeshMD has shown in his slides: with large time series lengths the value of L saturates/converges. This does not happen the way it is indicated in the paper and as implemented here so far.

Another remark regarding PECUZAL: PECUZAL is not affected by these changes, since there we look at ratios of L for adjacent embedding dimensions and, thus, the time series length cancels out.

I am already preparing a PR and will adjust the tests, which I compare against the outcomes of the PECUZAL implementations in MATLAB and Python. But I cannot use the figures in the original publication as a benchmark anymore, so let me know if you are fine with that. While doing that, I will also adjust the tests for PECUZAL regarding #95.

Cheers

Pointless index-returns from `all_neighbors`

The function all_neighbors returns pointless indices and corresponding nearest-neighbor values of -Inf when the Theiler window and the length of the dataset do not allow an appropriate selection of K nearest neighbors. Specifically, this happens when the Theiler window is large compared to the time series length. It would be better to assert that this cannot happen, or at least throw a meaningful error in this case.
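
A hypothetical sketch of such a guard (the name and the exact bound are ours, not the package's):

# With a Theiler window w, a query point in the middle of the timeseries
# excludes at most 2w + 1 candidates (itself and w on each side), so at
# least K admissible neighbors must remain:
function assert_theiler_ok(N::Int, K::Int, w::Int)
    N - (2w + 1) ≥ K || throw(ArgumentError(
        "Theiler window w = $w is too large to find K = $K valid " *
        "neighbors in a dataset of length N = $N."))
end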



Improve the tests, like, a lot

One of the things the Good Scientific Code Workshop teaches is writing good unit tests. I have to admit, this repository suffers tremendously from really bad tests when it comes to the delay embedding tests. Practically all tests check whether the output of the functions matches the output of the same functions on some pre-existing data. I am pasting here the slide with "good advice on writing tests":

  • Actually unit: test atomic, self-contained functions. Each test must test only one thing, the unit. When a test fails, it should pinpoint the location of the problem. Testing entire processing pipelines (a.k.a. integration tests) should be done only after units are covered, and only if resources/time allow for it!
  • Known output / Deterministic: tests defined through minimal examples that their result is known analytically are the best tests you can have! If random number generation is necessary, either test valid output range, or use seed for RNG
  • Robust: Test that the expected outcome is met, not the implementation details. Test that the target functionality is met without utilizing knowledge about the internals. Also, never use internal functions in the test suite.
  • High coverage: the more functionality of the code is tested, the better
  • Clean slate: each test file should be runnable by itself, and not rely on previous test files
  • Fast: use the minimal amount of computations to test what is necessary
  • Regression: Whenever a bug is fixed, a test is added for this case
  • Input variety: attempt to cover a wide gamut of input types

One doesn't have to worry about re-writing all tests at once. In fact, a PR "correcting" a single test file is already very welcome!

The best place to start is re-writing the delay embedding tests so that they are more flexible (not testing whether the found delay time is, e.g., exactly 42) and analytically resolvable, i.e. testing things whose outcome we know for sure, like a dataset with cosine and sine as the timeseries (the embedding dimension here is clearly 2). Also, separate the files so they are individually runnable and do not rely on global state. Another analytic test: take Lorenz96 and generate timeseries with 4 and with 6 oscillators. We do not know analytically the fractal dimension of the 4- and 6-oscillator cases, but we do know that the 6-oscillator case has a larger fractal dimension. Hence, our embedding must be higher dimensional in the 6 case than in the 4 case.
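
For concreteness, a minimal sketch of such an analytic test (our own, using only the public embed): a sine embedded with a quarter-period delay gives (sin, cos) pairs, i.e. points on the unit circle, which can be checked exactly.

using Test, DelayEmbeddings

@testset "quarter-period embedding of a sine is a circle" begin
    Δt = 0.01
    t = 0:Δt:50
    s = sin.(2π .* t)                  # period 1
    τ = round(Int, 0.25 / Δt)          # quarter period, in samples
    R = embed(s, 2, τ)                 # points (sin(2πt), cos(2πt))
    @test all(p -> isapprox(p[1]^2 + p[2]^2, 1; atol = 1e-10), R)
end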

New method for estimating Theiler window: space time separation plot: Provenzale et al (1992)

I'm re-reading the Nonlinear Time Series Analysis textbook by Kantz & Schreiber. In section 6.5 they describe a rather straightforward method for estimating a good value of the Theiler window, due to Provenzale et al:

Provenzale, A., Smith, L. A., Vio, R. & Murante, G. (1992). Distinguishing between low-dimensional dynamics and randomness in measured time series. Physica D, 58, 31. https://www.sciencedirect.com/science/article/abs/pii/0167278992901002

From Kantz's textbook:

The idea is that in the presence of temporal correlations, the probability that a given pair of points has a distance smaller than ε does not only depend on ε but also on the time that has elapsed between the two measurements. This dependence can be detected by plotting the number of pairs as a function of two variables: the time separation Δt and the spatial distance ε. The increase versus Δt is very rapid at the start, but quickly saturates.

I think implementing this method is easy, and a threshold for the saturation of the increase versus Δt (a keyword argument) can be used to choose w.
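
A minimal sketch of the idea (our own, not a proposed API): for each temporal separation Δt, record a quantile of the distances between pairs of points that far apart in time; w is chosen where the resulting curves saturate.

using LinearAlgebra: norm
using Statistics: quantile

# R: a vector of state vectors (or a Dataset); Δts: temporal separations.
function spacetime_separation(R, Δts; q = 0.5)
    map(Δts) do Δt
        dists = [norm(R[i] - R[i+Δt]) for i in 1:length(R)-Δt]
        quantile(dists, q)   # one point of the ε-versus-Δt curve
    end
end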

Average displacement from the diagonal method for delay time in traditional embedding

In the traditional embedding scheme, where one calculates the delay time and embedding dimension independently, there is another method to compute the optimal delay time besides the automutual information minima.

This is the "average displacement of the time-delayed vectors from the phase space diagonal" from:


Paper: https://www.delucafoundation.org/download/bibliography/de-luca/061.pdf

Implementing this is easy and shouldn't take more than 10 lines of code!
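
A rough sketch of the quantity in question (our own reading of the method, not the package's API): measure how far the delay vectors travel from the phase-space diagonal as τ grows, and pick τ where the increase levels off.

using DelayEmbeddings
using LinearAlgebra: norm
using Statistics: mean

# Average distance of the m-dimensional delay vectors from the diagonal
# point (s[i], s[i], …, s[i]) they originate from:
function average_displacement(s, m, τ)
    R = embed(s, m, τ)
    return mean(norm(R[i] .- s[i]) for i in 1:length(R))
end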

Error with Dataset's hcat function

I tried to run the example at https://juliadynamics.github.io/DynamicalSystems.jl/latest/embedding/unified/ for the pecuzal embedding using a unidimensional timeseries I have.

This is the error message I get:

julia> Y, τ_vals, ts_vals, Ls, εs = pecuzal_embedding(test_series; τs = 0:Tmax , w = theiler, econ = true)
Initializing PECUZAL algorithm for univariate input...
Starting 1-th embedding cycle...
Starting 2-th embedding cycle...
Algorithm stopped due to increasing L-values. VALID embedding achieved ✓.
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure, a comprehension or a generator.
Stacktrace:
 [1] Dataset(::SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}, ::Vararg{SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}})
   @ DelayEmbeddings ~/.julia/packages/DelayEmbeddings/DdOZb/src/dataset.jl:202
 [2] hcat_lagged_values(Y::Vector{Float64}, s::Vector{Float64}, τ::Int64)
   @ DelayEmbeddings ~/.julia/packages/DelayEmbeddings/DdOZb/src/utils.jl:147
 [3] pecuzal_embedding(s::Vector{Float64}; τs::UnitRange{Int64}, w::Int64, samplesize::Int64, K::Int64, KNN::Int64, L_threshold::Int64, α::Float64, p::Float64, max_cycles::Int64, econ::Bool, verbose::Bool)
   @ DelayEmbeddings ~/.julia/packages/DelayEmbeddings/DdOZb/src/unified_de/pecuzal.jl:127
 [4] top-level scope
   @ REPL[21]:1
 [5] top-level scope
   @ ~/.julia/packages/CUDA/Axzxe/src/initialization.jl:52

My package versions:

(@v1.7) pkg> st
      Status `~/.julia/environments/v1.7/Project.toml`
  [f20549b4] AlphaStableDistributions v1.1.3
  [a134a8b2] BlackBoxOptim v0.6.1
  [336ed68f] CSV v0.10.3
  [052768ef] CUDA v3.8.3
  [a93c6f00] DataFrames v1.3.2
  [2b5f629d] DiffEqBase v6.82.0
  [aae7a2af] DiffEqFlux v1.45.1
  [0c46a032] DifferentialEquations v7.1.0
  [31c24e10] Distributions v0.25.49
  [61744808] DynamicalSystems v1.8.3
  [587475ba] Flux v0.12.9
  [7073ff75] IJulia v1.23.2
  [2b0e0bc5] LanguageServer v4.1.1-DEV `/home/tomas/julia-langserver/LanguageServer.jl#master`
  [2fda8390] LsqFit v0.12.1
  [961ee093] ModelingToolkit v8.5.1
  [429524aa] Optim v1.6.2
  [91a5bcdd] Plots v1.26.0
  [e409e4f3] PoissonRandom v0.4.0
  [e6cf234a] RandomNumbers v1.5.3
  [276daf66] SpecialFunctions v1.8.4
  [99342f36] StateSpaceModels v0.6.1
  [90137ffa] StaticArrays v1.4.1
  [cf896787] SymbolServer v7.2.0
  [0c5d862f] Symbolics v4.3.0
  [ac1d9e8a] ThreadsX v0.1.9
  [9e3dc215] TimeSeries v0.23.0
  [f218859d] TimeseriesPrediction v0.7.0
  [ade2ca70] Dates
  [37e2e46d] LinearAlgebra
  [10745b16] Statistics

Deprecations from Peaks.jl

Hi Hauke,

Peaks.jl, which you use in tests (I think), has deprecations. If you run the tests by simply doing ] test DelayEmbeddings, you'll see warnings like:

 ┌ Warning: `maxima(x, w = 1, strictbounds = true)` is deprecated, use `argmaxima(x, w; strictbounds = strictbounds)` instead.
│   caller = maxima(::Array{Float64,1}) at deprecated.jl:71
└ @ Peaks ./deprecated.jl:71
┌ Warning: `minima(x, w = 1, strictbounds = true)` is deprecated, use `argminima(x, w; strictbounds = strictbounds)` instead.
│   caller = minima(::Array{Float64,1}) at deprecated.jl:71
└ @ Peaks ./deprecated.jl:71

These warnings will soon turn into broken code if not taken care of. This isn't a priority, so feel free to take your time addressing it.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

Function `estimate_dimension` is missing from `separated_de/estimate_dimension.jl`

Describe the bug
When trying to call the function estimate_dimension, e.g. as estimate_dimension(signal, ...), the following error is returned:

ERROR: UndefVarError: `estimate_dimension` not defined
Stacktrace:
 [1] getproperty
   @ .\Base.jl:31 [inlined]
 [2] avg_dimension(sigs::Matrix{Float64})
   @ Main d:\Corentin\Julia\Glial_mitochondria\src\mitochondrial_dynamics.jl:34
 [3] top-level scope
   @ d:\Corentin\Julia\Glial_mitochondria\src\mitochondrial_dynamics.jl:57

Looking at the source code in separated_de/estimate_dimension.jl, there seems to be no such function there, as opposed to the function estimate_delay, which is found in estimate_delay.jl.

Minimal Working Example

using DynamicalSystems
using Random
signal = rand(1000)
estimate_dimension(signal)

Package versions

DynamicalSystems 3.2.3

Metric used for the estimation of dimensions

There is an inconsistency in the metrics used for calculating the optimal dimensions of reconstruct / embed.

In the code of the internal function _average_a (used by estimate_dimension), the distance δ of nearest neighbors is calculated using an infinity norm, as in Cao's paper, e.g. δ = norm(R2[i]-R2[j], Inf). However, the KDTree that is used to find the nearest neighbors is built with the default options (tree2 = KDTree(R2)), i.e. with Euclidean distances (as stated in https://github.com/KristofferC/NearestNeighbors.jl).

The KDTree should be calculated with the Chebyshev metric to be consistent. Another option (which I advocate) is to let the user choose which Metric is used; see the sketch at the end of this issue. PR #9 includes code to facilitate this (only for the calculation of δ, not for the KDTree).

The authors of the other methods (FNN, FFNN) used Euclidean distances in their papers. This is the only option consistent with Kennel's equations [4-5] (FNN), but in the case of the FFNN, other metrics could be used without problems.
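
A sketch of the consistent variant (assuming the data has been brought into the matrix layout NearestNeighbors.jl expects):

using NearestNeighbors, Distances

# Build the tree with the same metric used for δ (the infinity norm,
# i.e. Chebyshev) instead of the default Euclidean; or better, expose
# the metric as an argument so the user can choose:
build_tree(data, metric::Metric = Chebyshev()) = KDTree(data, metric)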

Rename all `D` to `γ` in docstrings and docpages

Through experience we have realized that one can no longer call the D parameter the "embedding dimension", but only the "number of temporal neighbors".

That is because D doesn't always directly correspond to the resulting dimension. For example, if one does a multiple-timeseries embedding (which we support), then D is not the dimension. The same concept extends to TimeseriesPrediction.

After consideration we have decided that γ is a good name for this parameter, which will stand for the number of temporal neighbors to be used in the embedding.

Mutual information between two timeseries?

Hi @heliosdrm, I am wondering... We have this great mutual information function. At the moment I have two timeseries x, y and I want to calculate the mutual information between x and y delayed by τ.

At the moment I am using "InformationMeasures", but I am not really happy with that package. It has bad syntax and doesn't even have a Project.toml... I want to get rid of it and add such a method here. I was thinking of copy-pasting their code here and making it drastically smaller and with simpler syntax, but first I should ask you: is it possible to add mutual information between two timeseries here, given the method you have written?

(To clarify: I don't have a good idea how to get mutual information from two variables besides doing all the histograms from scratch; that's why I use a package.)
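
For the record, a rough sketch of the histogram approach under discussion (entirely our own, with B equidistant bins per axis; not the package's method):

using StatsBase

# Mutual information between x and y delayed by τ, from a 2D histogram.
function mutualinfo_xy(x, y, τ::Int; B::Int = 16)
    xs, ys = @view(x[1:end-τ]), @view(y[1+τ:end])
    h = fit(Histogram, (xs, ys); nbins = (B, B))
    Pxy = h.weights ./ sum(h.weights)              # joint probabilities
    Px = sum(Pxy, dims = 2); Py = sum(Pxy, dims = 1)
    mi = 0.0
    for i in axes(Pxy, 1), j in axes(Pxy, 2)
        p = Pxy[i, j]
        p > 0 && (mi += p * log(p / (Px[i] * Py[j])))
    end
    return mi
end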

Merging multiple datasets in one operation

Is it possible to merge multiple Datasets into one larger Dataset using a single operation, say splatting a tuple of Datasets like Dataset(ds...)?

For example, if I have three Dataset{2} with the same length, I can do the following to obtain a Dataset{6}.

julia> x, y, z = Dataset([1:10 2:11]), Dataset([3:12 4:13]), Dataset([5:14 6:15])
julia> Dataset(Dataset(x, y), z)
6-dimensional Dataset{Int64} with 10 points
  1   2   3   4   5   6
  2   3   4   5   6   7
  3   4   5   6   7   8
  4   5   6   7   8   9
  5   6   7   8   9  10
  6   7   8   9  10  11
  7   8   9  10  11  12
  8   9  10  11  12  13
  9  10  11  12  13  14
 10  11  12  13  14  15

But I'd like to be able to just call Dataset(x, y, z, ...), potentially with more than three variables. It is of course possible to do nested calls to the Dataset constructor, but that leads to ugly code. The following doesn't work, but it would be nice if it did:

julia> Dataset(x, y, z)
ERROR: MethodError: no method matching Dataset(::Dataset{2, Int64}, ::Dataset{2, Int64}, ::Dataset{2, Int64})
Closest candidates are:
  Dataset(::AbstractDataset{D1, T}, ::AbstractDataset{D2, T}) where {D1, D2, T} at ~/.julia/packages/DelayEmbeddings/Lcs5l/src/datasets/dataset.jl:206
  Dataset(::Vector{<:Real}, ::AbstractDataset{D, T}) where {D, T} at ~/.julia/packages/DelayEmbeddings/Lcs5l/src/datasets/dataset.jl:208
  Dataset(::Dataset) at ~/.julia/packages/DelayEmbeddings/Lcs5l/src/datasets/dataset.jl:163
  ...
Stacktrace:
 [1] top-level scope
   @ REPL[52]:1

And the following creates a dataset of datasets:

julia> Dataset([x, y, z])
1-dimensional Dataset{Dataset{2, Int64}} with 3 points
 2-dimensional Dataset{Int64} with 10 points
 2-dimensional Dataset{Int64} with 10 points
 2-dimensional Dataset{Int64} with 10 points
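
A sketch of the requested method (hypothetical code to live inside the package; it simply folds the existing two-argument constructor, and dispatch prefers the more specific two-argument method, so the recursion terminates):

# Merge any number of datasets by repeatedly applying the existing
# two-argument Dataset constructor:
function Dataset(d1::AbstractDataset, d2::AbstractDataset,
                 ds::AbstractDataset...)
    return Dataset(Dataset(d1, d2), ds...)
end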

Persistent homology based estimation of optimal delay time

A recent paper titled Selecting embedding delays: An overview of embedding techniques and a new method using persistent homology, with abstract:

Delay embedding methods are a staple tool in the field of time series analysis and prediction. However, the selection of embedding parameters can have a big impact on the resulting analysis. This has led to the creation of a large number of methods to optimize the selection of parameters such as embedding lag. This paper aims to provide a comprehensive overview of the fundamentals of embedding theory for readers who are new to the subject. We outline a collection of existing methods for selecting embedding lag in both uniform and non-uniform delay embedding cases. Highlighting the poor dynamical explainability of existing methods of selecting non-uniform lags, we provide an alternative method of selecting embedding lags that includes a mixture of both dynamical and topological arguments. The proposed method, Significant Times on Persistent Strands (SToPS), uses persistent homology to construct a characteristic time spectrum that quantifies the relative dynamical significance of each time lag. We test our method on periodic, chaotic, and fast-slow time series and find that our method performs similar to existing automated non-uniform embedding methods. Additionally, n-step predictors trained on embeddings constructed with SToPS were found to outperform other embedding methods when predicting fast-slow time series.

Some code is available here: https://github.com/eugenetkj98/SToPS_Public

Yet another method that would be nice to have in our comprehensive library.

Return type of one-dimensional Datasets slices

In other places (e.g. in JuliaDynamics/RecurrenceAnalysis.jl#106) we have been discussing why multi-dimensional time series should use the Dataset type instead of AbstractMatrix.

I think that one of the selling points of Dataset is that it can be used to iterate over the multi-dimensional points just as is done with vectors; e.g. if you have a series of 3-dimensional points, you can have them in a dataset data and iterate over those points just with for x in data. On the other hand, with AbstractMatrices you have to take into account whether the points are organized in rows or columns, and then do for x in eachrow(data) etc.

However, that argument would be stronger if one-dimensional slices like data[1:10] also returned a Dataset. As it is now, that returns a vector of SArray, and the straightforward way of slicing Datasets is with matrix-like indexing, i.e. data[1:10, :]. This is not good for generic code that would like to manage Datasets more or less like Vectors. (v[1:10, :] for vectors returns a matrix!)

Would it be sensible to rewrite the method Base.getindex(d::AbstractDataset, r::AbstractRange) to achieve that (sketched below), or would it be too breaking?
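
The change itself would be a one-liner, sketched here (hypothetical code inside the package, using the data field mentioned elsewhere in this thread; whether it is breaking is the open question):

# Return a Dataset (not a Vector of SVectors) for 1D range slices:
Base.getindex(d::AbstractDataset, r::AbstractRange{Int}) = Dataset(d.data[r])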



The Takens embedding theorem

This is just a minor note on language.

At the moment you spell it "Taken's Theorem", at least in the README, while its eponym's family name was "Takens". Therefore we usually refer to it as "Takens' theorem" or just "the Takens theorem".

`Dataset`s fail on Julia v1.7

I just did a fresh install of Julia 1.7, which was released today. It is no longer possible to generate Datasets from multiple time series.

julia> using DelayEmbeddings

julia> x, y = rand(10), rand(10)
([0.17779520192659526, 0.563263248796157, 0.39139686493332115, 0.15479089624077924, 0.3934735362718357, 0.3407797126800054, 0.16178868389420253, 0.1382236336820647, 0.916694491692921, 0.9921268976184539], [0.7051381564531282, 0.8434716929486992, 0.6192484967072842, 0.49990774114108527, 0.6775452485768428, 0.7925747656345918, 0.9019177093085311, 0.61166286182219, 0.20256010573103433, 0.6142453376316591])

julia> Dataset(x, y)
ERROR: The function body AST defined by this @generated function is not pure. This likely means it contains a closure, a comprehension or a generator.
Stacktrace:
 [1] Dataset(::Vector{Float64}, ::Vararg{Vector{Float64}})
   @ DelayEmbeddings ~/.julia/packages/DelayEmbeddings/rap7T/src/dataset.jl:189
 [2] top-level scope
   @ REPL[7]:1

Creating datasets from matrices still works, though:

julia> x = rand(10, 3);

julia> Dataset(x)
3-dimensional Dataset{Float64} with 10 points
 0.159304   0.596368   0.70848
 0.421832   0.412474   0.405058
 0.424306   0.702204   0.1256
 0.298175   0.648843   0.945673
 0.301568   0.656165   0.982148
 0.891723   0.0634695  0.217265
 0.585401   0.58202    0.663207
 0.200097   0.427457   0.386551
 0.931868   0.976055   0.0482512
 0.0363905  0.788387   0.035237

Vectors of vectors work too:

julia> x = [rand(3) for i = 1:10];

julia> Dataset(x)
3-dimensional Dataset{Float64} with 10 points
 0.423361   0.772301  0.543372
 0.552137   0.843886  0.037349
 0.790341   0.849523  0.563382
 0.239326   0.663965  0.598404
 0.888437   0.295614  0.0433812
 0.501872   0.649074  0.485239
 0.0605288  0.811402  0.682475
 0.653849   0.203677  0.71336
 0.952275   0.69277   0.387057
 0.698991   0.622259  0.00362218

Error in estimate_delay when using exp_decay

The function lag = estimate_delay(tr, "exp_decay", 0:100) can throw an error of the type

ERROR: The correlation function has elements that are ≤ 0. We can't fit an exponential to it. Please choose another method.

It is clear what it means (and the error can be avoided by using a shorter delay interval), but in my opinion it would be better to catch the situation where autocorrelation values could be negative and truncate the autocorrelation function to the interval of positive values.
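
A sketch of the suggested handling (our own helper name):

# Keep only the leading positive segment of the autocorrelation, so that
# an exponential can always be fitted to it:
function positive_segment(c::AbstractVector)
    k = findfirst(≤(0), c)
    return isnothing(k) ? c : c[1:k-1]
end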



optimal embedding delay/dimension that maximize entropy

@Datseris Thank you for attending the "coming soon" of DelayEmbeddings.

Now that you have all the building blocks in place for estimating the time delay and dimension and for producing recurrence plots, this section should conclude with a method/function to determine the optimal embedding parameters so as to maximize some measure of entropy. Candidate methods are Delay Vector Variance, as in https://www.commsp.ee.ic.ac.uk/~mandic/dvv.htm, and Permutation Entropy Based on Non-Uniform Embedding, https://epubl.ktu.edu/object/elaba:30704063/index.html. The latter is a shortcut to the current state of the art on optimization of embedding parameters, although it fails to mention the former.



Could we define "views" of `Dataset`s?

The only way of taking a fragment of a Dataset is creating another one. This is not optimal, and it might be good to have another subtype of AbstractDataset whose contents are a view into the data of another Dataset.

The easiest would be to define:

Base.view(d::AbstractDataset, i) = view(d.data, i)

and then some type like DatasetView <: AbstractDataset whose data is a SubArray{<:SArray}, such that one could do:

d = Dataset(x)
dv = Dataset(@view d[xx])

(Notice that this code actually works right now, but it produces a copy of the contents before creating dv.)

Such dv::DatasetView should behave as if I had written dv = d[xx], but that way we would save the superfluous allocation.

I acknowledge that it would be nicer if the interface is directly dv = @view d, but I'm not sure if that would be right. Base.view (and thus the macro @view) is meant to produce a SubArray, which is defined as a container encoding a "view" of a parent AbstractArray. So, although technically it might be feasible to write a method of Base.view that takes and returns an AbstractDataset, for me that would only feel right if we define AbstractDataset <: AbstractArray -- which maybe makes sense too, but might need some deeper thought to ensure that the API of AbstractDataset is fully consistent with what is expected from AbstractArrays (or at least AbstractVectors).
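
For concreteness, a sketch of what the suggested type could look like (names and type parameters hypothetical, assuming AbstractDataset{D, T} as the abstract supertype):

using StaticArrays

struct DatasetView{D, T, A <: SubArray{SVector{D, T}}} <: AbstractDataset{D, T}
    data::A
end
Base.view(d::AbstractDataset, i) = DatasetView(view(d.data, i))

# Any AbstractDataset method that only touches `data` would then work
# on the view without copying.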

Make `Matrix` available as return type of `orthonormal`

Issue description

The orthonormal function, which is also used in lyapunovspectrum over in ChaosTools.jl, always returns an SMatrix. This can lead to issues with high-dimensional systems, as

A very rough rule of thumb is that you should consider using a normal Array for arrays larger than 100 elements.
(README of StaticArrays.jl)

The return type should be adjustable depending on the system size (see this issue in ChaosTools.jl).
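
A sketch of one way to do it (hypothetical helper, following the StaticArrays rule of thumb quoted above):

using LinearAlgebra, StaticArrays

# Return a static matrix for small systems, a plain Matrix otherwise:
function orthonormal_flexible(D::Int, k::Int)
    Q = Matrix(qr(randn(D, D)).Q)[:, 1:k]   # D×k, orthonormal columns
    return D * k ≤ 100 ? SMatrix{D, k}(Q) : Q
end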

use Neighborhood API

All existing "neighborhood"-related functionality in DelayEmbeddings.jl needs to be deleted and replaced by the new Neighborhood.jl.

The two neighborhood types will of course be exported again for clarity, and also mentioned in the documentation of DynamicalSystems.jl.

We also have a bunch of code still using NearestNeighbors.knn, which we have to update.
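
For reference, the kind of replacement this entails, sketched from our understanding of the Neighborhood.jl API (double-check the signatures against its documentation):

using Neighborhood

data = [rand(3) for _ in 1:1000]   # any vector of points / dataset
k, w = 5, 10                       # neighbors to find, Theiler window

tree = searchstructure(KDTree, data, Euclidean())
idxs, dists = bulksearch(tree, data, NeighborNumber(k), Theiler(w))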

Increase efficiency of `_average_a`

At the moment:

function _average_a(s::AbstractVector{T}, γ, τ) where T
    # Sum over all a(i,d) of the D-dim reconstructed space, equation (2)
    R1 = reconstruct(s, γ+1, τ)
    R2 = reconstruct(s[1:end-τ], γ, τ)

However, it is clear from here:

    for (i, γ) ∈ enumerate(γs)
        aafter = _average_a(s, γ+1, τ)
        E1s[i] = aafter/aprev
        aprev = aafter
    end

each reconstruction is actually done twice! This is inefficient; instead, the reconstruction computed in one iteration should be reused in the next. This will lead to type instability, but it may be faster anyway.
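
A sketch of the proposed reuse (a hypothetical refactor following the variable names above, with _average_a taking the two reconstructions directly):

Rprev = reconstruct(s, γs[1], τ)
for (i, γ) ∈ enumerate(γs)
    Rnext = reconstruct(s, γ+1, τ)
    aafter = _average_a(Rnext, Rprev)  # hypothetical two-reconstruction method
    E1s[i] = aafter/aprev
    aprev, Rprev = aafter, Rnext
end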

Resolve circular dependencies

DynamicalSystemsBase.jl depends on DelayEmbeddings.jl. However, DelayEmbeddings.jl uses DynamicalSystemsBase.jl in its test suite. That creates problems when updating major versions (in fact, it makes them impossible on paper).

The only way to resolve this is to not use DSB.jl in DE.jl's tests. One solution is to write all the trajectories needed into CSV files, save them in the main JuliaDynamics repo that holds the docs, and download them each time the tests are run (we already do this for some trajectories).

Unified embedding tests are brittle

We have established the validity of our unified embedding methods; we even wrote a paper utilizing them in detail and comparing them with each other. There isn't any doubt that they work. However, the way we have written our tests is brittle, because (due to chaos) running them on different machines may make them fail. For example, while working on PR #93 I've seen things like:

Multivariate example: Test Failed at c:\Users\m300808\.julia\dev\DelayEmbeddings\test\unified\test_pecuzal_embedding.jl:72
  Expression: -0.356 < Ls[2] < -0.355
   Evaluated: -0.356 < -0.32831887625247136 < -0.355

Testing pecuzal_method.jl...
Initializing PECUZAL algorithm for univariate input...
Starting 1-th embedding cycle...
Starting 2-th embedding cycle...
Starting 3-th embedding cycle...
Algorithm stopped due to increasing L-values. VALID embedding achieved ✓.
  5.931637 seconds (15.70 M allocations: 2.972 GiB, 14.40% gc time)
Univariate example: Test Failed at c:\Users\m300808\.julia\dev\DelayEmbeddings\test\unified\test_pecuzal_embedding.jl:19
  Expression: -0.728 < Ls[1] < -0.727
   Evaluated: -0.728 < -0.7839819804977819 < -0.727

and even

Initializing PECUZAL algorithm for multivariate input...
Starting 1-th embedding cycle...
Starting 2-th embedding cycle...
Starting 3-th embedding cycle...
Algorithm stopped due to increasing L-values. VALID embedding achieved ✓.
 21.312898 seconds (71.51 M allocations: 11.922 GiB, 8.30% gc time)
Multivariate example: Test Failed at c:\Users\m300808\.julia\dev\DelayEmbeddings\test\unified\test_pecuzal_embedding.jl:55
  Expression: length(ts_vals) == 5
   Evaluated: 3 == 5

(i.e. a lower-dimensional embedding is produced sometimes)

We need to re-write the tests in a different manner. First, increase the error margins, but also come up with tests that are not so brittle and do not depend sensitively on the chaotic nature of trajectories. Furthermore, this issue needs to be solved in conjunction with #94, which would make trajectories no longer simulated but rather available as .csv files from the get-go.
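
As a first step, the exact brackets could become much wider range checks, e.g. (margins hypothetical):

using Test
Ls = [-0.78, -0.33]    # example values from runs like the ones above
# Wide brackets instead of ±0.001 windows:
@test -0.9 < Ls[1] < -0.5
@test -0.5 < Ls[2] < -0.2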

@hkraemer I'm also cc'ing you for reference.

setindex! neither errors nor does its job

Either we should properly implement this or have it error.
The current behaviour is clearly suboptimal.

julia> using DelayEmbeddings

julia> d = Dataset(rand(5,3))
3-dimensional Dataset{Float64} with 5 points
 0.43156   0.979782  0.649049
 0.451744  0.649936  0.383886
 0.328286  0.199782  0.36327
 0.450927  0.799212  0.628991
 0.677224  0.199356  0.233859

julia> d[:, 1] .= 0
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

julia> d
3-dimensional Dataset{Float64} with 5 points
 0.43156   0.979782  0.649049
 0.451744  0.649936  0.383886
 0.328286  0.199782  0.36327
 0.450927  0.799212  0.628991
 0.677224  0.199356  0.233859

The current bug appears to stem from the fact that the method is not implemented and the fallback relies on Base.dotview. However:

julia> Base.dotview(d, :, 1)
5-element Vector{Float64}:
 0.43155961790952047
 0.45174350165887
 0.3282862209053605
 0.4509268888889679
 0.6772237874813112

this is clearly not a view....
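
For reference, a sketch of a proper scalar setindex! (hypothetical; the stored points are immutable SVectors, so the point has to be rebuilt):

using StaticArrays

function Base.setindex!(d::AbstractDataset{D,T}, v, i::Int, j::Int) where {D,T}
    # StaticArrays' non-mutating `setindex` returns a new SVector with
    # entry j replaced by v:
    d.data[i] = setindex(d.data[i], v, j)
    return v
end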

Multivariate Singular Spectrum Analysis in DelayEmbeddings

@Datseris I am following your progress with DelayEmbeddings and all. Can I suggest that you also bring Broomhead & King and Multivariate Singular Spectrum Analysis into the embedding methods? It deals nicely with noise and rank compared to other embedding methods (http://www.spectraworks.com/Help/mssatheory.html). There is almost nothing on MSSA in Julia. Some tutorials in MATLAB are available here: https://www.mathworks.com/matlabcentral/fileexchange/58968-multichannel-singular-spectrum-analysis-beginners-guide and you have the setup already in place. ToeplitzMatrices.jl and BlockSparseMatrices.jl could be of help.

An obvious extension to reconstruct is to also include MSSA.



Improve the estimation of first minimum

In the function estimate_delay we currently estimate the first minimum (of either autocor or mutualinformation) using the direct result and a distance of a single index (i.e. when we find a value such that x[i-1] > x[i] < x[i+1], then i is the index of the first minimum):

function mincrossing(c, τs)
    i = 1
    while c[i+1] < c[i]
        i += 1
        if i == length(c) - 1
            @warn "Did not encounter a minimum, returning last `τ`."
            return τs[end]
        end
    end
    return τs[i]
end

This is not optimal for data that contain even the slightest bit of noise, and the computation of the mutual information is not perfectly smooth either. It is better to first smoothen the autocorrelation or mutual information, e.g. by convolving with a Gaussian kernel. From an API perspective this is simple: add a keyword smoothen = .... If it is nothing, no smoothing is done (current behavior); otherwise Gaussian convolution is performed.
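
A minimal sketch of the smoothing step (our own; integer kernel width σ, kernel truncated at ±3σ):

# Convolve c with a normalized Gaussian kernel, clamping at the borders.
function smoothen(c::AbstractVector, σ::Int)
    ks = -3σ:3σ
    g = [exp(-k^2 / (2σ^2)) for k in ks]
    g ./= sum(g)
    return [sum(c[clamp(i + k, 1, length(c))] * g[k + 3σ + 1] for k in ks)
            for i in eachindex(c)]
end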



Binning problem in mutualinformation?

Very rarely this can happen:

using DynamicalSystems
ds = Systems.gissinger(ones(3)) # 3D continuous chaotic system, also shown in orbit diagrams tutorial
dt = 0.05
data = trajectory(ds, 1000.0, dt = dt)
s = data[:, 1]

τ = estimate_delay(s, "mi_min", 0:1:400)
ERROR: BoundsError: attempt to access 98-element Array{Float64,1} at index [99]
Stacktrace:
 [1] getindex at .\array.jl:729 [inlined]
 [2] _frequencies!(::Array{Float64,1}, ::Array{Float64,1}, ::Array{Float64,1}) at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:269
 [3] _mutualinfo!(::Array{Float64,1}, ::Array{Float64,1}, ::Array{Int64,1}, ::Array{Float64,1}) at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:240
 [4] #mutualinformation#56(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Array{Float64,1}, ::StepRange{Int64,Int64}) at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:207
 [5] mutualinformation at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:197 [inlined]
 [6] #estimate_delay#39(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Array{Float64,1}, ::String, ::StepRange{Int64,Int64}) at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:50
 [7] estimate_delay(::Array{Float64,1}, ::String, ::StepRange{Int64,Int64}) at C:\Users\datseris\.julia\dev\DelayEmbeddings\src\estimate_delay.jl:33
 [8] top-level scope at none:0

Increasing the total integration time seems to fix it, but I do not know where the bug is coming from.

How to add the exp. decay method?

So, we used to have an option to compute optimal delay time by fitting an exponential decay to the correlation function and returning the decay time.

The problem is that this uses LsqFit, which is a heavy dependency that pulls in OptimBase and NLSolversBase.

One approach is to move this method to ChaosTools and change the method identification from strings to singleton types, e.g.:

abstract type DelayEstimation end
struct FirstMin <: DelayEstimation end

etc., and then this allows ChaosTools to extend the methods in a Julian way. The only downside is that new users may not find this intuitive.
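
Sketched concretely (all names hypothetical; this is our illustration, not existing code), ChaosTools could own the heavy method by adding one more singleton and extending the same function:

using LinearAlgebra

struct ExpDecay <: DelayEstimation end

# A linear least-squares fit of log(c) versus t gives the decay time
# without any heavy dependency, assuming the correlation values c are
# all positive:
function decay_time(c::AbstractVector, ts::AbstractVector)
    A = [ones(length(ts)) ts]
    coeff = A \ log.(c)      # fit log(c) ≈ a + b*t
    return -1 / coeff[2]     # decay time of exp(a) * exp(-t/τ_decay)
end

# estimate_delay(s, ::ExpDecay, τs) would then compute the autocorrelation
# c of s over τs and return round(Int, decay_time(c, τs)).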

Add Regularize function for dataset

It would be nice to have a function regularize(d::Dataset) that returns a new dataset where each column has its mean subtracted and is divided by its standard deviation.
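
A minimal sketch, assuming the Matrix(::Dataset) conversion with columns as the individual timeseries:

using Statistics

# Subtract each column's mean and divide by its standard deviation:
function regularize(d::Dataset)
    M = Matrix(d)
    return Dataset((M .- mean(M, dims = 1)) ./ std(M, dims = 1))
end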
