
Comments (9)

DominiqueMakowski avatar DominiqueMakowski commented on May 20, 2024 1

If I remember, I think that's because sample entropy compares a time-delayed embedding to its m+1 version, for which the last row is unavailable (because of the bigger dimension), hence we remove it from the first so that it matches in terms of array shape.

Hope that helps :)
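As a rough illustration (a minimal numpy sketch, not the actual antropy/NeuroKit code), the shape mismatch between the two embeddings looks like this:

```python
import numpy as np

# Minimal sketch: embed a length-N signal with window sizes m and m + 1.
x = np.arange(10.0)  # N = 10
m = 2

emb_m = np.lib.stride_tricks.sliding_window_view(x, m)       # (N - m + 1, m) -> (9, 2)
emb_m1 = np.lib.stride_tricks.sliding_window_view(x, m + 1)  # (N - m, m + 1) -> (8, 3)

# The m-length embedding has one extra row; dropping its last row
# makes the two arrays comparable row-for-row.
assert emb_m[:-1].shape[0] == emb_m1.shape[0]
```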

Regarding the homogenization, yeah I'd definitely agree that it's a good idea (the discrepancies across the implementations were kinda the starting point for NeuroKit's version 😅)

from antropy.

FirefoxMetzger avatar FirefoxMetzger commented on May 20, 2024 1

@raphaelvallat Thanks for the update :) I won't get a chance to work on it this week, but should be able to get it going in the coming one.


raphaelvallat avatar raphaelvallat commented on May 20, 2024

Hi @FirefoxMetzger,

Thanks for opening the issue. This sounds great. Two questions:

  • Would adding your C implementation to antropy require any new dependencies?
  • Have you validated that the output is the same as antropy and/or MNE-features?

Assuming that the answer to these questions is No and Yes, then please do feel free to work on a PR!

Thanks,
Raphael


FirefoxMetzger avatar FirefoxMetzger commented on May 20, 2024

Would adding your C implementation to antropy require any new dependencies?

No additional dependencies beyond numpy. That said, I would implement this in numba for this repo, since we currently don't have a pipeline to build and distribute C code. The main novelty in my version of the kernel is that I use a different set of tricks to count matching windows, one that reuses intermediate results more efficiently and thus reduces the constant overhead. This improvement should also be visible when implementing in numba instead of pure C.

Have you validated that the output is the same as antropy and/or MNE-features?

I just compared it to antropy and, unfortunately, it produces different results. That said, it does produce the same result as my naive (but easy to read) reference implementation, which I used to test my algorithm against:

from local._lib.transforms import sample_entropy as c_sampleEn
from antropy.entropy import _numba_sampen, _app_samp_entropy
import numpy as np
from math import log


def reference_implementation(sequence, window_size, tolerance):
    """Easy to read but slow reference implementation."""
    sequence = np.asarray(sequence)
    size = sequence.size

    numerator = 0
    for idxA in range(size - (window_size + 1) + 1):
        for idxB in range(size - (window_size + 1) + 1):
            if idxA == idxB:
                continue

            winA = sequence[idxA : (idxA + (window_size + 1))]
            winB = sequence[idxB : (idxB + (window_size + 1))]

            if np.max(np.abs(winA - winB)) <= tolerance:
                numerator += 1

    denominator = 0
    for idxA in range(size - window_size + 1):
        for idxB in range(size - window_size + 1):
            if idxA == idxB:
                continue

            winA = sequence[idxA : (idxA + window_size)]
            winB = sequence[idxB : (idxB + window_size)]

            if np.max(np.abs(winA - winB)) <= tolerance:
                denominator += 1

    return -log(numerator / denominator)


def mne_sampleEn(data, window):
    phi = _app_samp_entropy(data, order=window, approximate=False)
    return -np.log(np.divide(phi[1], phi[0]))

def antropy_sampleEn(data, window):
    return _numba_sampen(data, order=window, r=(0.2 * np.std(data)))


# compile numba
data = np.random.randint(0, 50, 10).astype(float)
antropy_sampleEn(data, 2)

test_sequence = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1,], dtype=float)
test_std = np.std(test_sequence)

reference_value = reference_implementation(test_sequence, 2, .2*test_std)
c_result = c_sampleEn(test_sequence, 2, .2*test_std)
antropy_result = antropy_sampleEn(test_sequence, 2)
mne_result = mne_sampleEn(test_sequence, 2)

np.testing.assert_almost_equal(c_result, reference_value)
np.testing.assert_allclose(c_result, antropy_result)

@raphaelvallat Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?


raphaelvallat avatar raphaelvallat commented on May 20, 2024

This improvement should also be visible when implementing in numba instead of pure C.

Got it.

Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?

I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.

Thanks again,
Raphael


FirefoxMetzger avatar FirefoxMetzger commented on May 20, 2024

I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.

Nice, thanks!

Just to avoid confusion, I'm talking about the code from this comment: #22 (comment) . It tries to be a 1:1 implementation of the definition so that a human can quickly find any deviations. Once we have that right, I can use fuzz testing to compare it with my optimized (and less readable) implementation to see if it does the right thing, too.

For reference: one difference that I found is that my code uses distance(templateA, templateB) <= r whereas yours uses distance(templateA, templateB) < r. Wikipedia uses both: <= in the example code and < in the mathematical definition. Unfortunately, this is not the only difference, as results still differ after adjusting that.
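A tiny example of that boundary case (hypothetical values, just to illustrate the <= vs < difference):

```python
import numpy as np

# A pair whose Chebyshev distance is exactly r is a match under <= but not under <.
templateA = np.array([0.0, 1.0])
templateB = np.array([0.0, 1.5])
r = 0.5

dist = np.max(np.abs(templateA - templateB))  # exactly 0.5
print(dist <= r)  # True -> counted as a match
print(dist < r)   # False -> not counted
```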


FirefoxMetzger avatar FirefoxMetzger commented on May 20, 2024

@raphaelvallat I think antropy's current implementation is off by one.

(For reference, the algorithm works by counting all pairs of windows/templates (except the self-match) for which the Chebyshev distance between the windows is smaller than r. We do this for two window sizes (order and order+1), which antropy's implementation tracks in variables a and b. a is the number of order+1-length pairs and b is the number of order-length pairs.)

Consider a window/template size of order=2 and the example sequence: [1., 7., 0., 8., 8., 0., 9., 2., 8., 8.], for which r = .2*std(sequence) ≈ .72. Since r < 1, we only count a pair if it is an exact match.

If we now compute a for the sequence above, we look at all pairs of size 3 windows into the array and count the number of pairs that are identical:

# all size 3 windows
array([[1., 7., 0.],
       [7., 0., 8.],
       [0., 8., 8.],
       [8., 8., 0.],
       [8., 0., 9.],
       [0., 9., 2.],
       [9., 2., 8.],
       [2., 8., 8.]])

No two rows are identical, and hence a = 0. This works correctly across all implementations (antropy, reference, and my C kernel). If we then compute b we do the same, but for all size 2 windows into the array:

# all size 2 windows
array([[1., 7.],
       [7., 0.],
       [0., 8.],
       [8., 8.],
       [8., 0.],
       [0., 9.],
       [9., 2.],
       [2., 8.],
       [8., 8.]])

We find a match for the pair [3, :] and [8, :] (with value (8, 8)) and one for its inverse. antropy's optimized implementation doesn't count inverse pairs (they always exist, so we count each match once and multiply by 2 at the end), which means we would expect it to produce b=1. However, it currently produces b=0 (whereas my reference and my C kernel produce b=1).
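For what it's worth, a brute-force count over the size-2 windows of the example sequence (a sketch, not any library's code) reproduces the single unordered match:

```python
import numpy as np

seq = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])
r = 0.2 * np.std(seq)  # below 1, so only exact matches count
order = 2

# All size-2 windows into the sequence (9 of them).
wins = np.lib.stride_tricks.sliding_window_view(seq, order)

ordered = 0
for i in range(len(wins)):
    for j in range(len(wins)):
        if i != j and np.max(np.abs(wins[i] - wins[j])) <= r:
            ordered += 1

print(ordered // 2)  # 1 -> the (8., 8.) pair at windows 3 and 8
```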

I've tried this for different sequences, and whenever there is a mismatch between my implementation and antropy's, it is because of matches involving the last order-length window, which causes a difference in the count of b.


Looking into antropy's wrapper around MNE-features, I noticed that it explicitly excludes the last window before computing order-length pairs:

(note that approximate=False is hardcoded for sample entropy)

antropy/antropy/entropy.py

Lines 384 to 389 in dc84ed1

# compute phi(order, r)
_emb_data1 = _embed(x, order, 1)
if approximate:
    emb_data1 = _emb_data1
else:
    emb_data1 = _emb_data1[:-1]

We can use MNE-features' implementation to compute a and b: count1 is the number of order-length pairs, including the self-match and inverse matches, and count2 is the analogous count for (order+1)-length pairs. We can use this to work out MNE's counts for a and b via:

b = (count1 - 1) / 2
a = (count2 - 1) / 2

If we do that, then this confirms the difference explained above: if we exclude the last window, we get antropy's count of order-length windows (b); if we don't, we get my reference implementation's count of order-length windows.
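The conversion itself is just bookkeeping; a toy illustration (hypothetical numbers, not MNE's actual variables):

```python
# An ordered count that includes the self-match and both orientations of
# each matching pair relates to the unordered pair count as follows.
ordered_count = 3  # e.g. 1 self-match + one pair counted in both directions
unordered = (ordered_count - 1) / 2  # drop the self-match, halve the rest
print(unordered)  # 1.0
```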


I'm actually not sure which approach is correct. Do we have to exclude the last window in our computation? If so, why?

I think we shouldn't, but I do see the argument for "if we do then we have the same number of windows for a and b". Is this correct/wise though?


Edit: I have excluded counts for the last order-length window and now my implementation(s) match the results produced by antropy.


raphaelvallat avatar raphaelvallat commented on May 20, 2024

Hey @FirefoxMetzger — Thanks for the deep dive into this, and apologies about my delayed response.

To be honest, I am not sure whether the last window should be excluded or not. Looking at the neurokit2 implementation, I see that the last window is also explicitly excluded:

https://github.com/neuropsychology/NeuroKit/blob/8c564938c5fc2b4a377fd89bb14ac07a430983b9/neurokit2/complexity/utils.py#L106-L107

Perhaps @DominiqueMakowski might be able to chime in on this issue?

FYI, I have also run some tests, and for larger arrays (N > 100) the exclusion/inclusion of the last window has very little impact on the final entropy value.

PS: neurokit2 uses r = 0.2 * np.std(x, ddof=1) while antropy uses ddof=0. This makes little difference on the output, but I think it would be good to homogenize the implementations. I'm happy to switch to ddof=1.
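For reference, the two normalizations differ only slightly (a quick sketch, not tied to either library):

```python
import numpy as np

x = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])

r_pop = 0.2 * np.std(x, ddof=0)   # population std: divide by N (numpy default)
r_samp = 0.2 * np.std(x, ddof=1)  # sample std: divide by N - 1, slightly larger

print(r_samp > r_pop)  # True; the gap shrinks as N grows
```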


raphaelvallat avatar raphaelvallat commented on May 20, 2024

Thanks @DominiqueMakowski!

@FirefoxMetzger I think that for now we can keep the removal of the last row, feel free to submit a PR with your improved numba implementation!

