Comments (9)
If I remember correctly, that's because sample entropy compares a time-delayed embedding to its m+1 version, for which the last row is unavailable (because of the larger dimension), hence we remove it from the first so that the array shapes match.
Hope that helps :)
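A minimal sketch of that shape mismatch, using NumPy's `sliding_window_view` as a stand-in for the embedding helper (illustrative only, not antropy's internals):

```python
import numpy as np

x = np.arange(10, dtype=float)
m = 2

# time-delayed embeddings: overlapping windows of length m and m + 1
emb_m = np.lib.stride_tricks.sliding_window_view(x, m)       # shape (9, 2)
emb_m1 = np.lib.stride_tricks.sliding_window_view(x, m + 1)  # shape (8, 3)

# the m-embedding has one extra row; drop it so the row counts match
emb_m = emb_m[:-1]
assert emb_m.shape[0] == emb_m1.shape[0]
```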
Regarding the homogenization, yeah, I'd definitely agree that it's a good idea (the discrepancies across the implementations kind of were the starting point for NeuroKit's version 😅)
from antropy.
@raphaelvallat Thanks for the update :) I won't get a chance to work on it this week, but should be able to get it going in the coming one.
Hi @FirefoxMetzger,
Thanks for opening the issue. This sounds great. Two questions:
- Would adding your C implementation to antropy require any new dependencies?
- Have you validated that the output is the same as antropy and/or MNE-features?
Assuming that the answer to these questions is No and Yes, then please do feel free to work on a PR!
Thanks,
Raphael
> Would adding your C implementation to antropy require any new dependencies?
No additional dependencies beyond numpy. That said, I would implement this in numba for this repo, since we currently don't have a pipeline to build and distribute C code. The main novelty in my version of the kernel is that I use a different set of tricks to count matching windows, which is more efficient at reusing intermediate results and thus reduces the constant overhead. This improvement should also be visible when implementing in numba instead of pure C.
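The optimized kernel itself isn't part of this thread. As an illustration only (a hedged sketch, not the author's actual trick set, with numba's `@njit` decorator omitted): one common way to reuse intermediate results is to note that a pair of `(m+1)`-length windows can only match if their `m`-length prefixes already match, so both counts can be gathered in one pass:

```python
import numpy as np

def sampen_sketch(x, m, r):
    """Sketch of sample entropy with shared match checks (illustrative only).

    No edge handling: assumes at least one matching pair at both lengths.
    """
    n = x.size
    a = 0  # matching (m + 1)-length pairs
    b = 0  # matching m-length pairs
    # iterate over unordered pairs; the last m-window is skipped so that
    # x[i + m] is always in bounds
    for i in range(n - m):
        for j in range(i + 1, n - m):
            if np.max(np.abs(x[i : i + m] - x[j : j + m])) <= r:
                b += 1
                # reuse the prefix check: only matching m-pairs can
                # extend to matching (m + 1)-pairs
                if abs(x[i + m] - x[j + m]) <= r:
                    a += 1
    return -np.log(a / b)

x = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1], dtype=float)
result = sampen_sketch(x, 2, 0.2 * np.std(x))  # -log(10 / 18)
```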
> Have you validated that the output is the same as antropy and/or MNE-features?
I just compared it to antropy and, unfortunately, it produces different results. That said, it does produce the same result as my naive (but easy to read) reference implementation, which I used to test my algorithm against:
```python
from math import log

import numpy as np

from antropy.entropy import _numba_sampen, _app_samp_entropy
from local._lib.transforms import sample_entropy as c_sampleEn


def reference_implementation(sequence, window_size, tolerance):
    """Easy to read but slow reference implementation."""
    sequence = np.asarray(sequence)
    size = sequence.size

    # count matching pairs of (window_size + 1)-length windows
    numerator = 0
    for idxA in range(size - (window_size + 1) + 1):
        for idxB in range(size - (window_size + 1) + 1):
            if idxA == idxB:
                continue  # skip self-matches
            winA = sequence[idxA : idxA + window_size + 1]
            winB = sequence[idxB : idxB + window_size + 1]
            if np.max(np.abs(winA - winB)) <= tolerance:
                numerator += 1

    # count matching pairs of window_size-length windows
    denominator = 0
    for idxA in range(size - window_size + 1):
        for idxB in range(size - window_size + 1):
            if idxA == idxB:
                continue  # skip self-matches
            winA = sequence[idxA : idxA + window_size]
            winB = sequence[idxB : idxB + window_size]
            if np.max(np.abs(winA - winB)) <= tolerance:
                denominator += 1

    return -log(numerator / denominator)


def mne_sampleEn(data, window):
    phi = _app_samp_entropy(data, order=window, approximate=False)
    return -np.log(np.divide(phi[1], phi[0]))


def antropy_sampleEn(data, window):
    return _numba_sampen(data, order=window, r=0.2 * np.std(data))


# compile numba
data = np.random.randint(0, 50, 10).astype(float)
antropy_sampleEn(data, 2)

test_sequence = np.array([0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1], dtype=float)
test_std = np.std(test_sequence)

reference_value = reference_implementation(test_sequence, 2, 0.2 * test_std)
c_result = c_sampleEn(test_sequence, 2, 0.2 * test_std)
antropy_result = antropy_sampleEn(test_sequence, 2)  # window=2 (was erroneously test_std)
mne_result = mne_sampleEn(test_sequence, 2)

np.testing.assert_almost_equal(c_result, reference_value)  # passes
np.testing.assert_allclose(c_result, antropy_result)  # fails: results differ
```
@raphaelvallat Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?
> This improvement should also be visible when implementing in numba instead of pure C.
Got it.
> Could I trouble you to have a look at the reference implementation (code above) and see if you spot a mistake in it?
I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.
Thanks again,
Raphael
> I am out of office and unavailable until mid-February so this is unfortunately going to have to wait, but yes I can dive into it after this date.
Nice, thanks!
Just to avoid confusion, I'm talking about the code from this comment: #22 (comment). It tries to be a 1:1 implementation of the definition so that a human can quickly spot any deviations. Once we have that right, I can use fuzz testing to compare it with my optimized (and less readable) implementation to see if it does the right thing, too.
For reference: one difference that I found is that my code uses `distance(templateA, templateB) <= r` whereas yours uses `distance(templateA, templateB) < r`. Wikipedia uses either: `<=` in the example code and `<` in the mathematical definition. Unfortunately, though, this is not the only difference, as results still differ after adjusting that.
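The boundary only matters when the Chebyshev distance is exactly `r`; a toy illustration with made-up values (not taken from the thread):

```python
import numpy as np

winA = np.array([0.0, 1.0])
winB = np.array([0.0, 1.5])
r = 0.5

dist = np.max(np.abs(winA - winB))  # Chebyshev distance = 0.5

assert dist <= r        # counted under "<="
assert not (dist < r)   # not counted under "<"
```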
@raphaelvallat I think antropy's current implementation is off by one.
(For reference, the algorithm works by counting all pairs of windows/templates (except the self-match) for which the Chebyshev distance between the windows is smaller than `r`. We do this for two window sizes (`order` and `order+1`), which antropy's implementation tracks in the variables `a` and `b`: `a` is the number of `order+1`-length pairs and `b` is the number of `order`-length pairs.)
Consider a window/template size of `order=2` and the example sequence `[1., 7., 0., 8., 8., 0., 9., 2., 8., 8.]`, which has `r = 0.2 * std(sequence) = 0.496...`. A value of `r < 1` means that we only count a pair if it is an exact match.
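This holds because the entries are integer-valued, so non-identical windows are at least `1` apart in Chebyshev distance. A quick sanity check (a sketch; the exact value of `r` depends on the `std` convention used):

```python
import numpy as np

seq = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])
r = 0.2 * np.std(seq)

# entries are integers, so two windows either coincide (distance 0)
# or differ by at least 1 in some position; with r < 1 only exact
# matches are counted
assert r < 1
```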
If we now compute `a` for the sequence above, we look at all pairs of size-3 windows into the array and count the number of pairs that are identical:
```python
# all size 3 windows
array([[1., 7., 0.],
       [7., 0., 8.],
       [0., 8., 8.],
       [8., 8., 0.],
       [8., 0., 9.],
       [0., 9., 2.],
       [9., 2., 8.],
       [2., 8., 8.]])
```
No two rows are identical, and hence `a = 0`. This works correctly across all implementations (antropy, reference, and my C kernel). If we then compute `b`, we do the same, but for all size-2 windows into the array:
```python
# all size 2 windows
array([[1., 7.],
       [7., 0.],
       [0., 8.],
       [8., 8.],
       [8., 0.],
       [0., 9.],
       [9., 2.],
       [2., 8.],
       [8., 8.]])
```
We find a match for the pair `[3, :]` and `[8, :]` (with value `(8, 8)`) and one for its inverse. The optimized implementation of antropy doesn't count inverse pairs (they always exist, so we count each match once and multiply by 2 in the end), which means we would expect it to produce `b = 1`. However, it currently produces `b = 0` (whereas my reference and my C kernel produce `b = 1`).
I've tried this for different sequences, and whenever there is a mismatch between my implementation and antropy's, it is because of matches with the last `order`-length window, which causes a difference in the count of `b`.
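This mismatch can be reproduced with a short sketch; `count_pairs` is an illustrative helper, not part of any of the implementations discussed:

```python
import numpy as np

def count_pairs(windows, r):
    """Count unordered template pairs within Chebyshev distance r."""
    n = len(windows)
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            if np.max(np.abs(windows[i] - windows[j])) <= r:
                count += 1
    return count

seq = np.array([1., 7., 0., 8., 8., 0., 9., 2., 8., 8.])
r = 0.2 * np.std(seq)
w2 = np.lib.stride_tricks.sliding_window_view(seq, 2)  # all 9 size-2 windows

b_incl = count_pairs(w2, r)       # last window included: the (8, 8) pair counts
b_excl = count_pairs(w2[:-1], r)  # last window excluded: the match disappears
```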
Looking into antropy's wrapper around MNE-features, I noticed that it explicitly excludes the last window before computing `order`-length pairs (note that `approximate=True` is hardcoded for sample entropy):
Lines 384 to 389 in dc84ed1
We can use MNE-features' implementation to compute `a` and `b`: `count1` is the number of `order`-length pairs, including the self-match and inverse matches, and `count2` is the analogous count for `order+1`-length pairs. We can use this to work out MNE's counts for `a` and `b` via:

```python
b = (count1 - 1) / 2
a = (count2 - 1) / 2
```
If we do that, then this confirms the difference explained above: if we exclude the last window, we get antropy's count of `order`-length windows (`b`); if we don't, we get my reference implementation's count of `order`-length windows.
I'm actually not sure which approach is correct. Do we have to exclude the last window in our computation? If so, why?
I think we shouldn't, but I do see the argument for "if we do, then we have the same number of windows for `a` and `b`". Is this correct/wise, though?
Edit: I have excluded counts for the last `order`-length window, and now my implementation(s) match the results produced by antropy.
Hey @FirefoxMetzger, thanks for the deep dive into this, and apologies for my delayed response.
To be honest, I am not sure whether the last window should be excluded or not. Looking at the neurokit2 implementation, I see that the last window is also explicitly excluded:
Perhaps @DominiqueMakowski might be able to chime in on this issue?
FYI, I have also run some tests, and for larger arrays (N > 100) the exclusion/inclusion of the last window has very little impact on the final entropy value.
PS: neurokit2 uses `r = 0.2 * np.std(x, ddof=1)` while antropy uses `ddof=0`. It makes little difference on the output, but I think it would be good to homogenize the implementations. I'm happy to switch to `ddof=1`.
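For reference, the difference between the two `ddof` conventions (standard NumPy behaviour, nothing antropy-specific):

```python
import numpy as np

x = np.arange(10, dtype=float)

s0 = np.std(x)           # population std (ddof=0, NumPy's default)
s1 = np.std(x, ddof=1)   # sample std (ddof=1, Bessel's correction)

# the two differ by a factor sqrt(n / (n - 1)), which shrinks as n grows
assert np.isclose(s1, s0 * np.sqrt(10 / 9))
```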
Thanks @DominiqueMakowski!
@FirefoxMetzger I think that for now we can keep the removal of the last row, feel free to submit a PR with your improved numba implementation!