Time series distances: Dynamic Time Warping (fast DTW implementation in C)

dtaidistance's Introduction

Time Series Distances

Library for time series distances (e.g. Dynamic Time Warping) used in the DTAI Research Group. The library offers a pure Python implementation and a fast implementation in C. The C implementation has only Cython as a dependency. It is compatible with Numpy and Pandas and implemented such that unnecessary data copy operations are avoided.

Documentation: http://dtaidistance.readthedocs.io

Example:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0.0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2)

Citing this work:

Wannes Meert, Kilian Hendrickx, Toon Van Craenendonck, Pieter Robberechts, Hendrik Blockeel & Jesse Davis.
DTAIDistance (Version v2). Zenodo.
http://doi.org/10.5281/zenodo.5901139

New in v2:

  • Numpy is now an optional dependency, also to compile the C library (only Cython is required).
  • Small optimizations throughout the C code to improve speed.
  • The consistent use of ssize_t instead of int allows for larger data structures on 64-bit machines and is more compatible with Numpy.
  • The parallelization is now implemented directly in C (included if OpenMP is installed).
  • The max_dist argument turned out to be similar to Silva and Batista's work on PrunedDTW [7]. The toolbox now implements a version that is equal to PrunedDTW since it prunes more partial distances. Additionally, a use_pruning argument is added to automatically set max_dist to the Euclidean distance, as suggested by Silva and Batista, to speed up the computation (a new method ub_euclidean is available).
  • Support in the C library for multi-dimensional sequences in the dtaidistance.dtw_ndim package.
  • DTW Barycenter Averaging for clustering (v2.2).
  • Subsequence search and local concurrences (v2.3).
  • Support for N-dimensional time series (v2.3.7).

Installation

$ pip install dtaidistance

or

$ conda install -c conda-forge dtaidistance

The pip installation requires Numpy as a dependency to compile Numpy-compatible C code (using Cython). However, this dependency is optional and can be removed.

The source code is available at github.com/wannesm/dtaidistance.

If you encounter any problems during compilation (e.g. the C-based implementation or OpenMP is not available), see the documentation for more options.

Usage

Dynamic Time Warping (DTW) Distance Measure

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import numpy as np
s1 = np.array([0., 0, 1, 2, 1, 0, 1, 0, 0, 2, 1, 0, 0])
s2 = np.array([0., 1, 2, 3, 1, 0, 0, 0, 2, 1, 0, 0, 0])
path = dtw.warping_path(s1, s2)
dtwvis.plot_warping(s1, s2, path, filename="warp.png")

DTW Distance Measure Between Two Series

To compute only the distance measure between two sequences of numbers:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance = dtw.distance(s1, s2)
print(distance)
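
To make clear what this distance measures, here is a minimal pure-Python sketch of the classic DTW recurrence. It is not the library's implementation: the function name `dtw_distance_sketch` is hypothetical, and the sketch assumes squared point-wise differences accumulated along the cheapest alignment, with a square root taken at the end.

```python
import math


def dtw_distance_sketch(s1, s2):
    """Illustrative DTW: cheapest accumulated squared cost, then a square root."""
    inf = float("inf")
    n, m = len(s1), len(s2)
    # cost[i][j] = cheapest accumulated squared cost aligning s1[:i] with s2[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return math.sqrt(cost[n][m])


s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
sketch_distance = dtw_distance_sketch(s1, s2)
print(sketch_distance)  # square root of 2, about 1.4142
```

The sketch runs in O(n·m) time and memory, so it is only meant for reading, not for large inputs.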

The fastest version (30-300 times faster) uses C directly but requires an array of doubles as input. Optionally, it also prunes computations by setting max_dist to the Euclidean upper bound:

from dtaidistance import dtw
import array
s1 = array.array('d',[0, 0, 1, 2, 1, 0, 1, 0, 0])
s2 = array.array('d',[0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Or you can use a numpy array (with dtype double or float):

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0])
d = dtw.distance_fast(s1, s2, use_pruning=True)

Check the __doc__ for information about the available arguments:

print(dtw.distance.__doc__)

Several options are available to stop exploring some of the dynamic programming paths early or to tune the distance measure computation:

  • window: Only allow for shifts up to this amount away from the two diagonals.
  • max_dist: Stop if the returned distance measure will be larger than this value.
  • max_step: Do not allow steps larger than this value.
  • max_length_diff: Return infinity if difference in length of two series is larger.
  • penalty: Penalty to add if compression or expansion is applied (on top of the distance).
  • psi: Psi relaxation to ignore begin and/or end of sequences (for cyclical sequences) [2].
  • use_pruning: Prune computations based on the Euclidean upper bound.
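
The idea behind the Euclidean upper bound can be illustrated with a small sketch. For two series of equal length, the straight diagonal alignment is one valid warping path, so its cost can never be lower than the DTW distance. The helper name `euclidean_upper_bound` is hypothetical, and the sketch handles only equal-length series; how the library treats other cases is not covered here.

```python
import math


def euclidean_upper_bound(s1, s2):
    """Euclidean distance between two equal-length series (cost of the diagonal path)."""
    if len(s1) != len(s2):
        raise ValueError("this sketch only handles equal-length series")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))


s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
bound = euclidean_upper_bound(s1, s2)
print(bound)  # square root of 8, about 2.83
```

Any partial alignment whose accumulated cost already exceeds this bound cannot be part of the best path, which is what makes it usable as a max_dist value.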

DTW Distance Measure and All Warping Paths

If, next to the distance, you also want the full matrix to see all possible warping paths:

from dtaidistance import dtw
s1 = [0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0, 1, 2, 0, 0, 0, 0, 0, 0]
distance, paths = dtw.warping_paths(s1, s2)
print(distance)
print(paths)

The matrix with all warping paths can be visualised as follows:

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
import random
import numpy as np
x = np.arange(0, 20, .5)
s1 = np.sin(x)
s2 = np.sin(x - 1)
random.seed(1)
for idx in range(len(s2)):
    if random.random() < 0.05:
        s2[idx] += (random.random() - 0.5) / 2
d, paths = dtw.warping_paths(s1, s2, window=25, psi=2)
best_path = dtw.best_path(paths)
dtwvis.plot_warpingpaths(s1, s2, paths, best_path)

Notice the psi parameter that relaxes the matching at the beginning and end. In this example this results in a perfect match even though the sine waves are slightly shifted.
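
One way to picture the psi relaxation is as a DTW recurrence in which up to psi leading or trailing elements of either series can be skipped at no cost. The sketch below is pure Python and is not the library's code; the function name `dtw_psi_sketch` and the example series are illustrative assumptions.

```python
import math


def dtw_psi_sketch(s1, s2, psi=0):
    """Illustrative DTW with a relaxed start and end of up to psi elements."""
    inf = float("inf")
    n, m = len(s1), len(s2)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    # Relaxed start: skipping up to psi leading elements of either series is free.
    for i in range(min(psi, n) + 1):
        cost[i][0] = 0.0
    for j in range(min(psi, m) + 1):
        cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Relaxed end: the path may stop up to psi elements before either series ends.
    best = min(
        min(cost[i][m] for i in range(max(n - psi, 0), n + 1)),
        min(cost[n][j] for j in range(max(m - psi, 0), m + 1)),
    )
    return math.sqrt(best)


a = [0.0, 0, 1, 2, 1, 0, 1, 0, 0]
b = [9.0, 0, 1, 2, 1, 0, 1, 0, 9]
strict = dtw_psi_sketch(a, b, psi=0)   # both outlying end points must be matched
relaxed = dtw_psi_sketch(a, b, psi=1)  # the outlying end points can be skipped
print(strict, relaxed)
```

With psi=0 the two end points of `b` each contribute a squared difference of 81, while with psi=1 they are ignored and the remaining values align exactly.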

DTW Distance Measures Between Set of Series

To compute the DTW distance measures between all sequences in a list of sequences, use the method dtw.distance_matrix. You can set variables to use more or less C code (use_c and use_nogil) and parallel or serial execution (parallel).

The distance_matrix method expects a list of lists/arrays:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

or a matrix (in case all series have the same length):

from dtaidistance import dtw
import numpy as np
series = np.matrix([
    [0.0, 0, 1, 2, 1, 0, 1, 0, 0],
    [0.0, 1, 2, 0, 0, 0, 0, 0, 0],
    [0.0, 0, 1, 2, 1, 0, 0, 0, 0]])
ds = dtw.distance_matrix_fast(series)

DTW Distance Measures Between Set of Series, Limited to Block

You can instruct the computation to fill only part of the distance measures matrix, for example to distribute the computations over multiple nodes or to compare only source series to target series.

from dtaidistance import dtw
import numpy as np
series = np.matrix([
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1],
     [0., 0, 1, 2, 1, 0, 1, 0, 0],
     [0., 1, 2, 0, 0, 0, 0, 0, 0],
     [1., 2, 0, 0, 0, 0, 0, 1, 1]])
ds = dtw.distance_matrix_fast(series, block=((1, 4), (3, 5)))

The output in this case will be:

#  0     1    2    3       4       5
[[ inf   inf  inf     inf     inf  inf]    # 0
 [ inf   inf  inf  1.4142  0.0000  inf]    # 1
 [ inf   inf  inf  2.2360  1.7320  inf]    # 2
 [ inf   inf  inf     inf  1.4142  inf]    # 3
 [ inf   inf  inf     inf     inf  inf]    # 4
 [ inf   inf  inf     inf     inf  inf]]   # 5
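
Judging from the output above, the block argument selects half-open row and column ranges, and only cells above the diagonal are filled. The following sketch lists the index pairs this reading implies; the helper name `block_pairs` is hypothetical and the rule is inferred from the example, not taken from the library's source.

```python
def block_pairs(block):
    """List the (row, column) pairs a block selects, upper triangle only."""
    (row_start, row_end), (col_start, col_end) = block
    return [
        (r, c)
        for r in range(row_start, row_end)
        for c in range(col_start, col_end)
        if r < c  # cells on or below the diagonal stay at inf
    ]


pairs = block_pairs(((1, 4), (3, 5)))
print(pairs)  # [(1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

These five pairs are exactly the cells that hold finite values in the matrix shown above.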

Clustering

A distance matrix can be used for time series clustering. You can use existing methods such as scipy.cluster.hierarchy.linkage or one of two included clustering methods (the latter is a wrapper for the SciPy linkage method).

from dtaidistance import dtw
from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)
# Augment Hierarchical object to keep track of the full tree
model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

For models that keep track of the full clustering tree (HierarchicalTree or LinkageTree), the tree can be visualised:

model2.plot("myplot.png")

References

  1. T. K. Vintsyuk, Speech discrimination by dynamic programming. Kibernetika, 4:81–88, 1968.
  2. H. Sakoe and S. Chiba, Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1):43–49, 1978.
  3. C. S. Myers and L. R. Rabiner, A comparative study of several dynamic time-warping algorithms for connected-word recognition. The Bell System Technical Journal, 60(7):1389–1409, Sept 1981.
  4. A. Mueen and E. Keogh, Extracting optimal performance from dynamic time warping. Tutorial, KDD 2016.
  5. D. F. Silva, G. E. A. P. A. Batista, and E. Keogh, On the effect of endpoints on dynamic time warping. In SIGKDD Workshop on Mining and Learning from Time Series, II. Association for Computing Machinery-ACM, 2016.
  6. Y. Chen, E. Keogh, B. Hu, N. Begum, A. Bagnall, A. Mueen and G. Batista, The UCR Time Series Classification Archive, 2015.
  7. D. F. Silva and G. E. Batista. Speeding up all-pairwise dynamic time warping matrix calculation, In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 837–845. SIAM, 2016.

License

DTAI distance code.

Copyright 2016-2022 KU Leuven, DTAI Research Group

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

dtaidistance's People

Contributors

aras-y, baael, ericmjl, m-rossi, macrocosme, pidgeyusedgust, probberechts, rietesh, shamazharikh, ssporrer-dlr, toddrme2178, toon-vc, wannesm, wusai2333, yasirroni

dtaidistance's Issues

PrepReadme and MySDistCommand issues in setup.py

I just attempted to install dtaidistance once again on a Windows 10 machine and ran into the issue that the variables "PrepReadme" and "MySDistCommand" could not be found. Thus, I simply removed the following lines (line 282 and line 283) from setup.py:

        'readme': PrepReadme,
        'sdist': MySDistCommand,

After doing so, I was able to install dtaidistance from source. Could you please have a look at this issue and adjust the file accordingly? Thanks a lot.

problem on import dtaidistance

When I tried to import dtaidistance I got this error:
Traceback (most recent call last):

File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "", line 1, in
import dtaidistance as dtw

File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw

File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/dtw.py", line 17, in
from .util import SeriesContainer, dtaidistance_dir

File "/opt/workbench/users/jzicker/.local/lib/python3.5/site-packages/dtaidistance/util.py", line 36
logger.debug(f"Using directory: {directory}")
^
SyntaxError: invalid syntax
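
The line the interpreter rejects is an f-string, a syntax introduced in Python 3.6 (PEP 498), while the paths in the traceback point to Python 3.5. A small check along the following lines shows whether an interpreter can parse such code; the helper name `supports_fstrings` is hypothetical.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if the given Python version can parse f-strings (3.6 or newer)."""
    return tuple(version_info[:2]) >= (3, 6)


print(supports_fstrings((3, 5, 2)))  # False: f-strings raise SyntaxError here
print(supports_fstrings((3, 6, 1)))  # True
```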

dtw.distance_fast() return best_path

Is it possible for the C implementation of dtw distance to return not only the final distance but also the best_path or all paths? Because besides having the final distance, I would be interested in analysing the individual distances over time. Would that be possible?

Link error with pip install from PyPI

Generating code
Finished generating code
LINK : fatal error LNK1158: cannot run 'rc.exe'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio 14.0\\VC\\BIN\\x86_amd64\\link.exe' failed with exit status 1158

Cannot install this package due to this error when building the wheel. Do you know how to fix it?

OverflowError: Python int too large to convert to C long

First of all: Thank you very much for creating this nice piece of software! Unfortunately, I have an issue running your sample code.

Whenever I try to run the following code:

    series = [np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
    ds = dtw.distance_matrix_fast(series)
    print(ds)

I'm greeted with this error message:
OverflowError: Python int too large to convert to C long

Any help would be highly appreciated.

compact only keyword for dtw_distance, not dtw_distance_fast

In the docs on page "Dynamic Time Warping (DTW)", in the section entitled "DTW between set of series", after giving an example using dtw.distance_matrix_fast, the doc says:

"This behaviour can be deactivated by setting the argument compact to true," implying that compact can be set to true in the previous example, which is of distance_matrix_fast.

However, compact is only a keyword in dtw.distance_matrix, and not in the distance_matrix_fast. If you try setting compact to True in distance_matrix_fast, you will get an error. So as not to be misleading, I suggest clarifying this in the doc.

Memory Error

I am getting a memory error when trying to calculate DTW distances for a numpy matrix of 2.2 million series with 157 time points. I have 400 GB of RAM available, with < 10% of utilization, and immediately get the error when trying to run dtw.distance_matrix_fast. Here is the traceback:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-64-51fa732ba330> in <module>
      2 dists = dtw.distance_matrix_fast(df_numpy, window=15,
      3                                  parallel=True,
----> 4                                  show_progress=True)
      5 dists[dists == np.inf] = 0
      6 dists = dists + dists.T - np.diag(np.diag(dists))

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix_fast(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, show_progress)
    449                            window=window, max_step=max_step, penalty=penalty, psi=psi,
    450                            block=block, parallel=parallel,
--> 451                            use_c=True, use_nogil=True, show_progress=show_progress)
    452 
    453 

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw.py in distance_matrix(s, max_dist, max_length_diff, window, max_step, penalty, psi, block, parallel, use_c, use_nogil, show_progress)
    367         if parallel:
    368             logger.info("Use parallel computation")
--> 369             dists = dtw_c.distance_matrix_nogil_p(s, **dist_opts)
    370         else:
    371             logger.info("Use serial computation")

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil_p()

~/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/dtaidistance/dtw_c.pyx in dtaidistance.dtw_c.distance_matrix_nogil()

MemoryError: 

Any suggestions? Is it not feasible to run DTW on a dataset this large?
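
A rough back-of-the-envelope estimate helps put the numbers in this report in perspective. The sketch below assumes each distance is stored as an 8-byte double and compares a full square matrix with a condensed upper triangle; the function names are hypothetical and the library's actual memory layout is not examined here.

```python
def dense_matrix_bytes(n_series, bytes_per_value=8):
    """Bytes needed for a full n x n matrix of distances."""
    return n_series * n_series * bytes_per_value


def condensed_matrix_bytes(n_series, bytes_per_value=8):
    """Bytes needed when only the n*(n-1)/2 pairs above the diagonal are stored."""
    return n_series * (n_series - 1) // 2 * bytes_per_value


n = 2_200_000
dense_tb = dense_matrix_bytes(n) / 1e12
condensed_tb = condensed_matrix_bytes(n) / 1e12
print(round(dense_tb, 2), round(condensed_tb, 2))  # 38.72 19.36 (terabytes)
```

Under these assumptions, either layout needs far more than 400 GB, so the pairwise distances for a dataset of this size would have to be computed in blocks or on a sample.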

How exactly does dtaidistance keep track of labels?

Hi there! I am using your library for my master's thesis. Well, at least I am trying to. I almost have it done, I browsed the documentation, the source code and the closed issues - but I can't seem to find the solution to these three problems.

  1. How does your library keep track of the labels?

I want to make dead sure the labels are matching the time series ID that I have. That's because the time series have pre-defined groups and I am checking whether the groups match the clusters. I did some manual checks, and it looks like I get it, but since I am posting anyway I figured it's better to make double sure.

  2. Is there a possibility to color the time series line charts by label?

Let's say I have 1000 time series and they all have a label: A, B or C. I would like A series to be red, B to be blue, C to be green.

  3. Can I extract a list of labels at each cluster?

I would like to know which time series are grouped together, and at what level of clustering certain attributes of my time-series make them cluster together.

Here is my code:

#Thats the database, the values I am clustering on is a floating point in NDVI column
database = pd.read_excel(r"data.xlsx")
data=database[['NDVI', 'Data', 'ID', 'TextAttribute']].dropna()
data.sort_values(by=['ID','Data'], ascending=True)

#Labels
duplicates = data.drop_duplicates(subset=['ID','TextAttribute'])
labels = duplicates['PartID'].tolist()

values = data.groupby('ID')['NDVI'].apply(lambda x: x.to_numpy())
series = [x.astype(np.double) for x in values]

#The clustering
model = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model.fit(series)

#Plotting
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(20, 250), dpi=300)
model.plot(filename='rawdata_all_dtwvis_id.pdf', axes=ax, show_ts_label=labels,
           show_tr_label=True, ts_label_margin=-35,
           ts_left_margin=30)

ValueError: negative dimensions are not allowed

We are getting a weird error using this library that I cannot debug. I have tried setting the logging level to debug to see what I can find, but no luck.

INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C (parallel=True)
Traceback (most recent call last):
  File "rt/clustering/rt_cluster_trainer.py", line 250, in <module>
    clustering.run()
  File "rt/clustering/rt_cluster_trainer.py", line 183, in run
    model = cluster_library.train()
  File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 241, in train
    models = self.perform_clustering(self.buckets, self.kvalues, self.linkage_method)
  File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 208, in perform_clustering
    y = self.calculate_distance_matrix()
  File "/opt/grok/grok3/domain/clustering/hierarchical_clustering.py", line 199, in calculate_distance_matrix
    y = dtw.distance_matrix_fast(self.frequency_counts, window=self.dtw_radius, compact=True)
  File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 548, in distance_matrix_fast
    use_c=True, use_nogil=True, show_progress=False)
  File "/opt/grok/ve3/lib/python3.7/site-packages/dtaidistance/dtw.py", line 416, in distance_matrix
    dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
  File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
  File "dtaidistance/dtw_c.pyx", line 657, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
ValueError: negative dimensions are not allowed

We are using the library exactly as in the documentation:

from dtaidistance import dtw
import numpy as np
series = [
    np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double),
    np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0]),
    np.array([0.0, 0, 1, 2, 1, 0, 0, 0])]
ds = dtw.distance_matrix_fast(series)

Our distance matrix is huge, so I cannot post it here, but I have written some code that tests each timeseries fed into the algorithm and they are all the same shape (307,), there are no negative values, there are no NaN.

Can someone shed some light on what this error actually means? There is nothing wrong with our data as far as we know. If we take random samples of the data, say only 50% of it, excluding many timeseries, it seems to work fine. Is there a single timeseries numpy array breaking it?

The 'parallel' and 'show_progress' parameter didn't work in dtaidistance.dtw.distance_matrix_fast

According to the MODULES document, parallel and show_progress should work in the C-based distance matrix function:

dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)

However, I found that neither the parallel nor the show_progress parameter works in dtaidistance.dtw.distance_matrix_fast.

s = [
    np.array([10., 10, 10, 8, 10, 8, 8, 10, 8]),
    np.array([8., 10, 8, 8, 10, 8]),
    np.array([8., 2, 0, 0, 0, 0, 0, 1, 1]),
    np.array([8., 2, 0, 0, 0, 0, 0, 0, 0]),
    ...
    np.array([9., 0, 1, 2, 1, 0, 1, 0, 9]),
    np.array([0., 0, 0, 8, 10, 8, 8, 10, 8]),
    np.array([1., 2, 0, 0, 0, 0, 0, 1, 1]),
    np.array([1., 2, 0, 0, 0, 0, 0, 1, 3])
    ]

tic = time.clock()
dtw.distance_matrix(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic

tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=True)
toc = time.clock()
toc - tic

tic = time.clock()
dtw.distance_matrix_fast(s,compact=True,psi=1,show_progress=True,parallel=False)
toc = time.clock()
toc - tic

0%| | 0/100 [00:00<?, ?it/s]
100%|██████████| 100/100 [00:00<00:00, 512.83it/s]
900.014613522851514509

1.02014512547413254414

1.00014066557152784057

The progress bar didn't show up in the fast function, and the parallel parameter doesn't seem to work properly either. I believe there is a bug in the C code.

I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.

~Thank you.

Need to install with pip3, otherwise error

Hi,

When I tried installing with pip I got the following error:

line 263, in
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function

I tried reinstalling with pip3 and it worked.

Please edit the install doc to "pip3."

Thx.
~Jon

different length of numpy array plot support

Hi experts, my code failed with error message
"ValueError: operands could not be broadcast together with shapes (3,) (9,)"

Is there any workaround ?

from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np
series = np.array([
     np.array([1, 2, 1]),
     np.array([0., 1, 2, 0, 0, 0, 0, 0, 0]),
     np.array([1., 2, 0, 0, 0, 0, 0, 1, 1, 3, 4, 5]),
     np.array([0., 0, 1, 2, 1, 0, 1]),
     np.array([0., 1, 2, 0, 0, 0, 0, 0]),
     np.array([1., 2, 0, 0, 0, 0, 0, 1, 1])])
ds = dtw.distance_matrix(series)
print(ds)

model = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model.fit(series)
print(cluster_idx)
model.plot(img_path+"model-test.png", show_ts_label=True)

Best regards,
Keita

ValueError: Buffer dtype mismatch, expected 'double' but got 'short'

I'm trying to run your DTW implementation on data with values in the range of -2000 to 2000. If I normalize the data to a certain (usually much smaller) range first, I have no issues. However, whenever I attempt to run your code on the raw (i.e. unnormalized) data, I end up with the following message:

ValueError: Buffer dtype mismatch, expected 'double' but got 'short'

The trace back is as follows:

Traceback (most recent call last):
File "D:\PathToMyProject\testing.py", line 313, in myDTWFunction
myDistance = dtw.distance(a,b,use_c=True,window=w,max_dist=minimumDistance)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 84, in distance
psi=psi)
File "D:\PathToMyAnaconda\lib\site-packages\dtaidistance-1.1.3-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 196, in distance_fast
psi=psi)
File "dtaidistance\dtw_c.pyx", line 129, in dtaidistance.dtw_c.distance_nogil

I thought this might be an issue with some old code and I tried to recompile the most recent code from your repository, but I didn't succeed yet due to this issue.

P.S: I run the code on a Windows 10 machine.
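
The message suggests that the input holds 16-bit integers ('short') while the C code expects doubles; normalizing probably produces floating-point values as a side effect, which may be why the normalized data works. A hedged sketch of an explicit conversion, using the standard library `array` module and made-up sample values:

```python
import array

raw = array.array("h", [-2000, -5, 0, 1500, 2000])     # 'h' = signed short
as_double = array.array("d", (float(v) for v in raw))  # 'd' = C double

print(as_double.typecode, list(as_double))
```

For NumPy arrays the analogous step is `astype(np.double)` before the array is passed to the C-based functions.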

customized labelling of tree nodes

Hi, your package is very interesting to me and I would like to use it to visualise my own data.
I would like to plot a dendrogram with each of the leaf nodes labelled according to an external label.

However, I cannot seem to find a way to do this.

import numpy as np
import pandas as pd
dataframe = pd.read_csv('sample.csv',header=None)
dataframe.head(6)

                                                                          label for nodes
0  chr13_110718378-110719378.txt    2.441430  207.521542  163.575804  ...  2
1     chr2_96278196-96279196.txt   43.223219  242.530287  168.090298  ...  1
2   chr4_140084844-140085844.txt  237.444590  155.823012  249.811496  ...  3
3    chr10_71267774-71268774.txt  232.878508  139.246943  225.676080  ...  3
4    chr14_86309018-86310018.txt  131.655232  248.406099   67.069647  ...  2
5     chr3_97076527-97077527.txt  129.814476    0.000000  204.337600  ...  1

df=dataframe.values
labels=df[:,-1] #labels variable
df=df[:,:-1]
df= df[:,1:]

from dtaidistance import dtw
from dtaidistance import dtw_visualisation as dtwvis
from dtaidistance import clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
model2 = clustering.HierarchicalTree(model1)
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(np.matrix(df,dtype=np.double))

Hence my question is if there is a way to label the tree nodes and still have the nice plot formats that your package produces?

test

C library is not available (Not handled yet)

Hi,
I have already installed Python 3.7.1 with numpy 1.16.1, Cython 0.29.6 and dtaidistance 1.1.4. I have tried to install dtaidistance with pip, from GitHub and also from source. But when I want to use "distance_matrix_fast", I get this error:
"The compiled dtaidistance C library is not available.
See the documentation for alternative installation options."

I am using macOS 10.14.
Here is my log when I try to install dtaidistance using pip:
[Screenshot of the pip installation log]

No tag for 1.2.3

1.2.3 was released but there is no tag, making it harder to download from github.

'module' object is not callable

Dear Friend.

I have an error.

dist, cost, path = dtw(mfcc1.T, mfcc2.T)
print("The normalized distance between the two : ",dist)   # 0 for similar audios 

TypeError: 'module' object is not callable

Can you help .. to resolve?
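
One likely cause, assuming `dtw` was imported with `from dtaidistance import dtw`, is that the name refers to a module rather than a function, so calling it directly fails. The same behaviour can be reproduced with any standard library module, here `math`:

```python
import math

try:
    math(4.0)  # calling the module itself, like dtw(...) in the report
except TypeError as exc:
    message = str(exc)

root = math.sqrt(4.0)  # calling a function inside the module works
print(message, root)
```

By analogy, the README examples call functions inside the module, such as `dtw.distance(s1, s2)`, which returns a single distance rather than a three-element tuple.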

How exactly is the distance added by a penalty calculated?

For example, if I started with

from dtaidistance import dtw
s1 = [1,2,3,4,1,2,3]
s2 = [2,3,4,1,2,3,4]
distance, paths = dtw.warping_paths(s1, s2,penalty=0)

The resulting distance was 1.4142135623730951.

When I set penalty = 10, the distance would be 3.872983346207417.

How did the distance go from 1.4142135623730951 to 3.872983346207417?
A paper or any other references would be very helpful.

Thanks!
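
The two values are consistent with a simple reading of the penalty: a fixed amount added to the accumulated squared cost on every non-diagonal step. The sketch below implements that reading in pure Python. It is an assumption and not the library's code; the function name `dtw_penalty_sketch` is hypothetical, and whether the library rescales the penalty internally is not determined here.

```python
import math


def dtw_penalty_sketch(s1, s2, penalty=0.0):
    """Illustrative DTW where every non-diagonal step pays an extra penalty."""
    inf = float("inf")
    n, m = len(s1), len(s2)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s1[i - 1] - s2[j - 1]) ** 2
            cost[i][j] = d + min(
                cost[i - 1][j - 1],        # diagonal step, no penalty
                cost[i - 1][j] + penalty,  # step along s1 only, penalised
                cost[i][j - 1] + penalty,  # step along s2 only, penalised
            )
    return math.sqrt(cost[n][m])


s1 = [1, 2, 3, 4, 1, 2, 3]
s2 = [2, 3, 4, 1, 2, 3, 4]
no_penalty = dtw_penalty_sketch(s1, s2, penalty=0)
with_penalty = dtw_penalty_sketch(s1, s2, penalty=10)
print(no_penalty, with_penalty)  # sqrt(2) and sqrt(15)
```

Without a penalty, the best path shifts one series by one position, paying only for the two end points (1 + 1 = 2). That path needs two non-diagonal steps, so with a penalty of 10 it costs at least 22, and the straight diagonal path wins instead: its squared differences sum to 1 + 1 + 1 + 9 + 1 + 1 + 1 = 15, and the square root of 15 is 3.872983346207417.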

Plotting fails with uneven-length time series

The linkage tree plot will fail, due to a bug in the module clustering.py, line 227. Numpy doesn't know what to do with series of uneven lengths when calculating the max.

The easiest solution is to concatenate the arrays first, and then continue:

all_y = np.concatenate(self.series)
max_y = max(all_y.max(), np.abs(all_y.min()))

Cannot import dtaidistance (indent error in code)

Getting this error. Seems like just a syntax thing

Traceback (most recent call last):

File "/opt/conda/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3291, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)

File "", line 70, in
from dtaidistance import dtw

File "/opt/conda/lib/python3.6/site-packages/dtaidistance/__init__.py", line 19, in
from . import dtw

File "/opt/conda/lib/python3.6/site-packages/dtaidistance/dtw.py", line 23
s from . import dtw_c
^
IndentationError: expected an indented block

The psi parameter does not work in dtaidistance.dtw.distance_matrix_fast

According to the MODULES document, psi should work in the C-based distance matrix function:

dtaidistance.dtw.distance_matrix_fast(s, max_dist=None, max_length_diff=None, window=None, max_step=None, penalty=None, psi=None, block=None, compact=False, parallel=True, show_progress=False)

However, I found that the psi parameter does not work in dtaidistance.dtw.distance_matrix_fast.

s = [np.array([0., 0, 1, 2, 1, 0, 1, 0, 0]),
     np.array([9., 0, 1, 2, 1, 0, 1, 0, 9])]

dtw.distance_matrix(s,compact=True,psi=1)
dtw.distance_matrix_fast(s,compact=True,psi=1)

[0.]
[12.72792206]

When I looked into the dtw_c.pyx file, I didn't find any psi code in the distance_matrix_nogil function. I am not familiar with C, so I may have read the code wrongly.

I am using Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] and dtaidistance v1.2.

I really need this feature. Is there any workaround?
~Thank you.

Out of bounds on buffer access

Hi, my code works fine for a smaller number of short (Nt=12) time series, N up to ~6000, but when I tried running it for N=350000 I got the error you can see below. I run it on an EC2 instance, so it's not a memory issue. Is there any hardcoded limit that could cause it?

model.fit(X_train)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/clustering.py", line 463, in fit
dists = self.dists_fun(self.series, **self.dists_options)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 547, in distance_matrix_fast
use_c=True, use_nogil=True, show_progress=show_progress)
File "/opt/conda/lib/python3.7/site-packages/dtaidistance/dtw.py", line 415, in distance_matrix
dists = dtw_c.distance_matrix_nogil(s, is_parallel=parallel, **dist_opts)
File "dtaidistance/dtw_c.pyx", line 586, in dtaidistance.dtw_c.distance_matrix_nogil
File "dtaidistance/dtw_c.pyx", line 668, in dtaidistance.dtw_c.distance_matrix_nogil_c_p
IndexError: Out of bounds on buffer access (axis 0)

How to solve C library not available

I installed this package and tried the fast implementation of the DTW but I get:

The compiled dtaidistance C library is not available.
See the documentation for alternative installation options

How can I rectify this? I have Cython installed and all other dependencies. The docs don't point to any other specific installations necessary. Thanks,

numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

I get the following error whenever I attempt to run DTW:

numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

The traceback looks as follows:

Traceback (most recent call last):
  File "D:\Projects\MyProject\Main.py", line 3, in <module>
    import Algorithms
  File "D:\Projects\MyProject\Algorithms.py", line 3, in <module>
    from dtaidistance import dtw, dtw_visualisation as dtwvis, clustering
  File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\__init__.py", line 19, in <module>
    from . import dtw
  File "D:\Programs\Anaconda\Anaconda\lib\site-packages\dtaidistance-1.1.4-py3.6-win-amd64.egg\dtaidistance\dtw.py", line 23, in <module>
    from . import dtw_c
  File "__init__.pxd", line 918, in init dtaidistance.dtw_c
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 216 from C header, got 192 from PyObject

The system I'm using is Windows 10.

P.S: I wrote before that all current errors are fixed for me but apparently I was still somehow using an older version. Sorry for the confusion.

Error in installation of dtaidistance

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/

pip install dtaidistance
Collecting dtaidistance
Using cached https://files.pythonhosted.org/packages/75/ad/458d751a5d4842e3f7aa0ad6f79ee0219683d0b28ebc0d882f4106436a10/dtaidistance-1.1.4.tar.gz
Complete output from command python setup.py egg_info:
/home/lokesh/.local/lib/python2.7/site-packages/Cython/Compiler/Main.py:367: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-build-5FU_xZ/dtaidistance/dtaidistance/dtw_c.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
Compiling dtaidistance/dtw_c.pyx because it depends on /home/lokesh/.local/lib/python2.7/site-packages/Cython/Includes/numpy/__init__.pxd.
[1/1] Cythonizing dtaidistance/dtw_c.pyx
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-5FU_xZ/dtaidistance/setup.py", line 139, in <module>
with open('dtaidistance/__init__.py', 'r', encoding='utf-8') as fd:
TypeError: 'encoding' is an invalid keyword argument for this function

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-5FU_xZ/dtaidistance/
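
The paths in the log show pip running under Python 2.7, where the built-in open() does not accept an encoding keyword. As a minimal sketch of a version-independent read (not the project's actual setup.py), io.open takes the same arguments on Python 2.7 and Python 3. The temporary file and its contents are hypothetical stand-ins for the package's `__init__.py`.

```python
import io
import os
import tempfile

# Hypothetical stand-in for the package's __init__.py, written to a temp dir.
tmp_dir = tempfile.mkdtemp()
init_path = os.path.join(tmp_dir, "__init__.py")
with io.open(init_path, "w", encoding="utf-8") as fd:
    fd.write(u'__version__ = "1.1.4"\n')

# io.open accepts `encoding` on Python 2.7 and Python 3 alike,
# unlike the Python 2 built-in open().
with io.open(init_path, "r", encoding="utf-8") as fd:
    first_line = fd.readline().strip()

print(first_line)
```

Installing with a Python 3 interpreter (for example via pip3) avoids the problem without touching the source.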

There were problems running the Clustering sample code in jupyterlab

I hope you can help with the problems encountered in running the sample code for Clustering. Thanks!

model2.plot("hierarchy.png")

ValueError Traceback (most recent call last)
in
----> 1 model2.plot("hierarchy.png")

F:\Mysoftware\Anaconda3\lib\site-packages\dtaidistance\clustering.py in plot(self, filename, axes, ts_height, bottom_margin, top_margin, ts_left_margin, ts_sample_length, tr_label_margin, tr_left_margin, ts_label_margin, show_ts_label, show_tr_label, cmap, ts_color)
366 if isinstance(filename, Path):
367 filename = str(filename)
--> 368 plt.savefig(filename, bbox_inches='tight', pad_inches=0)
369 plt.close()
370 fig, ax = None, None

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\pyplot.py in savefig(*args, **kwargs)
714 def savefig(*args, **kwargs):
715 fig = gcf()
--> 716 res = fig.savefig(*args, **kwargs)
717 fig.canvas.draw_idle() # need this if 'transparent=True' to reset colors
718 return res

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in savefig(self, fname, transparent, **kwargs)
2178 self.patch.set_visible(frameon)
2179
-> 2180 self.canvas.print_figure(fname, **kwargs)
2181
2182 if frameon:

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334

ValueError: list.remove(x): x not in list

ValueError Traceback (most recent call last)
F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\formatters.py in __call__(self, obj)
339 pass
340 else:
--> 341 return printer(obj)
342 # Finally look for special method names
343 method = get_real_method(obj, self.print_method)

F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in <lambda>(fig)
242
243 if 'png' in formats:
--> 244 png_formatter.for_type(Figure, lambda fig: print_figure(fig, 'png', **kwargs))
245 if 'retina' in formats or 'png2x' in formats:
246 png_formatter.for_type(Figure, lambda fig: retina_figure(fig, **kwargs))

F:\Mysoftware\Anaconda3\lib\site-packages\IPython\core\pylabtools.py in print_figure(fig, fmt, bbox_inches, **kwargs)
126
127 bytes_io = BytesIO()
--> 128 fig.canvas.print_figure(bytes_io, **kw)
129 data = bytes_io.getvalue()
130 if fmt == 'svg':

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\backend_bases.py in print_figure(self, filename, dpi, facecolor, edgecolor, orientation, format, bbox_inches, **kwargs)
2058 bbox_artists = kwargs.pop("bbox_extra_artists", None)
2059 bbox_inches = self.figure.get_tightbbox(renderer,
-> 2060 bbox_extra_artists=bbox_artists)
2061 pad = kwargs.pop("pad_inches", None)
2062 if pad is None:

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_tightbbox(self, renderer, bbox_extra_artists)
2359 bb = []
2360 if bbox_extra_artists is None:
-> 2361 artists = self.get_default_bbox_extra_artists()
2362 else:
2363 artists = bbox_extra_artists

F:\Mysoftware\Anaconda3\lib\site-packages\matplotlib\figure.py in get_default_bbox_extra_artists(self)
2330 bbox_artists.extend(ax.get_default_bbox_extra_artists())
2331 # we don't want the figure's patch to influence the bbox calculation
-> 2332 bbox_artists.remove(self.patch)
2333 return bbox_artists
2334

ValueError: list.remove(x): x not in list

Clustering example not running

Trying to run the example from the documentation:

# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2 = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to instantiate a Hierarchical object
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

and getting an error for any of the proposed models:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-4bb92754361e> in <module>()
      2 model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
      3 # Keep track of full tree by using the HierarchicalTree wrapper class
----> 4 model2 = clustering.HierarchicalTree(model1)
      5 # You can also pass keyword arguments identical to instantiate a Hierarchical object
      6 model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})

/Users/fred/anaconda/lib/python2.7/site-packages/dtaidistance/clustering.pyc in __init__(self, model, **kwargs)
    388         else:
    389             self._model = model
--> 390         super().__init__(**kwargs)
    391         self._model.max_dist = np.inf
    392 

TypeError: super() takes at least 1 argument (0 given)
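
The traceback path shows Python 2.7, where the zero-argument form super() is not available; it was introduced in Python 3. A minimal sketch of the portable two-argument form follows. The class names Base and Wrapper are hypothetical and only mimic the shape of the failing constructor.

```python
class Base(object):
    """Hypothetical parent class that stores its keyword options."""

    def __init__(self, **kwargs):
        self.options = kwargs


class Wrapper(Base):
    """Hypothetical wrapper around an inner model object."""

    def __init__(self, model, **kwargs):
        self._model = model
        # Explicit two-argument form: valid on Python 2.7 and Python 3.
        super(Wrapper, self).__init__(**kwargs)


w = Wrapper(model="inner", show_progress=True)
print(w.options)
```

Running the documentation example under Python 3 should avoid the error without any code change.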

Memory freeing issue

I'm using Python 3.6 (with Anaconda) on a Windows 8.1 and a Windows 10 machine at work. I'm dealing with rather large data sets (roughly 10 data sets with 10,000 sequences consisting of 3,000 samples each). dtaidistance is working as expected but whenever I set the "use_c" parameter of the "distance" function, I can observe in the task-manager that the corresponding Python process slowly eats up all available RAM.

If I run dtaidistance on several data sets overnight, it regularly crashes after all my available RAM (roughly 32 GB) has been used.

Could it be possible that somehow the memory isn't freed again properly after it has been used?

ValueError: 'dtaidistance/dtw_c.pyx' doesn't match any files

I'm currently facing two issues. The first is that I'm unable to recompile the source code after updating the repository. Whenever I attempt to do this via

C:\pythoToMyPython\python.exe D:\pathToDtaidistance\dtaidistance\setup.py build_ext --inplace
on my Windows (10) machine, I get the following message:

Traceback (most recent call last):
File "D:\pathToDtaidistance\dtaidistance\setup.py", line 123, in <module>
extra_link_args=extra_link_args)])
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 897, in cythonize
aliases=aliases)
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 777, in create_extension_list
for file in nonempty(sorted(extended_iglob(filepattern)), "'%s' doesn't match any files" % filepattern):
File "D:\PathToMyAnaconda\Anaconda\lib\site-packages\Cython\Build\Dependencies.py", line 102, in nonempty
raise ValueError(error_msg)
ValueError: 'dtaidistance/dtw_c.pyx' doesn't match any files

That's strange because my dtaidistance folder contains a file named "dtw_c.pyx".

For the second issue, I will open a separate ticket.

Parallel computation does not work

I'm running Ubuntu 18.04, and I compiled dtaidistance from GitHub. Logging at info level confirms that the C libraries are installed and parallel computation is enabled, but htop shows a single core in use when running model = clustering.LinkageTree(dtw.distance_matrix_fast, {'parallel':True}). Any thoughts on why parallelization isn't working, or how to debug it?

INFO:be.kuleuven.dtai.distance:Computing distances
INFO:be.kuleuven.dtai.distance:Compute distances in pure C
INFO:be.kuleuven.dtai.distance:Use parallel computation

Is there any C-implementation for ndim DTW?

Hi,
First, thanks for the great repo!

Maybe I missed it, but is there a C implementation of DTW for the n-dimensional case?

I found 'dtw_ndim.py', which handles the ndim case, but 'dtw_c' (imported at line 22) is not used at all in this file.

(For the 1-dim case, I can see the C-implementation)

Thanks!

basic clustering example fails with error NoneType has no len()

Resolved. See suggestion for doc clarification below.

Original question:

Hi,

When I run this clustering example code provided in the docs:

from dtaidistance import dtw
import numpy as np
s1 = np.array([0, 0, 1, 2, 1, 0, 1, 0, 0], dtype=np.double)
s2 = np.array([0.0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0])
s3 = np.array([0.0, 0, 1, 2, 1, 0, 0, 0])
series = [s1, s2, s3]

from dtaidistance import clustering
# Custom Hierarchical clustering
model1 = clustering.Hierarchical(dtw.distance_matrix, {})
# Keep track of full tree by using the HierarchicalTree wrapper class
model2a = clustering.HierarchicalTree(model1)
# You can also pass keyword arguments identical to instantiate a Hierarchical object
model2b = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix, dists_options={})
# SciPy linkage clustering
model3 = clustering.LinkageTree(dtw.distance_matrix, {})
cluster_idx = model3.fit(series)

model2a.plot("hierarchy.png")

I get the following error traced back to clustering.py:

 line 220, in plot
    self._series_y = [0] * len(self.series)
TypeError: object of type 'NoneType' has no len()

I get the same error when I use distance_matrix_fast.

Can you help me?

SOLUTION:

The above example fits model3, but not model2a, which it then tries to plot. You need to call

var = model2a.fit(series)

then you can plot it.

Updating the doc accordingly would help others avoid this error.

Examples Error

Some of the provided examples contain an error: the variable s needs to be replaced with series.
For instance the following
ds = dtw.distance_matrix_fast(s)
should be
ds = dtw.distance_matrix_fast(series)

Why are DTW distances so different with and without the use_c flag?

According to the documentation, setting this flag to True only means that the distance function uses the precompiled C functions.
But when I compare both values, they are very different, as the image below shows.

image

I assume this is incorrect behavior. Am I wrong?
If I am, could you tell me what causes this difference?

Thanks in advance!

clustering does not work

I'm using your example, but the result is (None, None). Why?
series = np.matrix([
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1],
[0., 0, 1, 2, 1, 0, 1, 0, 0],
[0., 1, 2, 0, 0, 0, 0, 0, 0],
[1., 2, 0, 0, 0, 0, 0, 1, 1]])

from dtaidistance import clustering
from dtaidistance import dtw

# Custom Hierarchical clustering

model1 = clustering.Hierarchical(dtw.distance_matrix_fast, {})
cluster_idx = model1.fit(series)

# Augment Hierarchical object to keep track of the full tree

model2 = clustering.HierarchicalTree(model1)
cluster_idx = model2.fit(series)
model2 = clustering.HierarchicalTree(dists_fun=dtw.distance_matrix_fast, dists_options={})
cluster_idx = model2.fit(series)

# SciPy linkage clustering

model3 = clustering.LinkageTree(dtw.distance_matrix_fast, {})
cluster_idx = model3.fit(series)

model2.plot("hierarchy.png")

(None, None)

basic clustering example fails with "AttributeError: 'NoneType' object has no attribute 'shape'"

dynamicTimeWarping.py:

from dtaidistance import dtw
from dtaidistance import clustering
import numpy as np

s = np.array([
         [0, 0, 1, 2, 1, 0, 1, 0, 0],
         [0, 1, 2, 0, 0, 0, 0, 0, 0],
         [1, 2, 0, 0, 0, 0, 0, 1, 1],
         [0, 0, 1, 2, 1, 0, 1, 0, 0],
         [0, 1, 2, 0, 0, 0, 0, 0, 0],
         [1, 2, 0, 0, 0, 0, 0, 1, 1],
         [1, 2, 0, 0, 0, 0, 0, 1, 1]])
         

model = clustering.Hierarchical(dtw.distance_matrix_fast, {})
modelw = clustering.HierarchicalTree(model)
cluster_idx = modelw.fit(s)
modelw.plot("hierarchy.png")

error logs:

(timeSeriesClassification) bash-3.2$ python3 dynamicTimeWarping.py
The compiled dtaidistance C library is not available.
See the documentation for alternative installation options.
Traceback (most recent call last):
  File "dynamicTimeWarping.py", line 17, in <module>
    cluster_idx = modelw.fit(s)
  File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 418, in fit
    result = self._model.fit(series, *args, **kwargs)
  File "/Users/user/.local/share/virtualenvs/timeSeriesClassification-ight38Tz/lib/python3.7/site-packages/dtaidistance/clustering.py", line 73, in fit
    pbar = tqdm(total=dists.shape[0])
AttributeError: 'NoneType' object has no attribute 'shape'

I should note that this fails the same way if I run the clustering tests included in this repo as well

how to reformat dense dendrogram plot

Is there a recommended way to reformat the dendrogram plot if there are many time series? Mine is quite dense and hard to read. My code:

im_name = 'pic.png'
cluster_model.plot(im_name, show_ts_label=names, show_tr_label=True)
ax.imshow(plt.imread(im_name))

Thanks!


Multivariate float/int time series

Hi,
Does it work for multivariate time series, for instance with a list of lists as input or an array with dimensions "nb_samples * nb_features"?
Thanks!
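
For reference, one common way to extend DTW to multivariate series is the "dependent" variant, where each time step is a feature vector and the local cost is the squared Euclidean distance between vectors. The pure-Python sketch below illustrates that idea only. The function name dtw_ndim_sketch is hypothetical and is not the library's API, and the input layout (time steps as rows, features as columns) is an assumption.

```python
import math


def dtw_ndim_sketch(a, b):
    """Dependent multivariate DTW: each time step is a feature vector."""
    n, m = len(a), len(b)
    inf = float("inf")
    # acc[i][j] holds the best accumulated squared cost for a[:i], b[:j].
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Squared Euclidean distance between the two feature vectors.
            step = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1]))
            acc[i][j] = step + min(acc[i - 1][j - 1],
                                   acc[i - 1][j],
                                   acc[i][j - 1])
    return math.sqrt(acc[n][m])


# Two short series with two features per time step (made-up values).
a = [[0.0, 0.0], [0.0, 1.0], [1.0, 2.0], [2.0, 1.0], [1.0, 0.0]]
b = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0], [1.0, 0.0], [0.0, 0.0]]

d_ab = dtw_ndim_sketch(a, b)
d_aa = dtw_ndim_sketch(a, a)
print(d_ab, d_aa)
```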

What is happening with the dtw distance?

I was just comparing results with other dynamic time warping libraries and noticed significant differences in the dtw distances, despite similar paths. Am I missing something or is there an error somewhere in your distance calculation?

dist1,paths= dtw.warping_paths(s1,s2)
path = dtw.best_path(paths) 
print(dist1)

dist2 = 0
for [a, b] in path:
    dist2 += abs(s1[a]-s2[b])
print(dist2)

dist1 and dist2 are vastly different in my examples.
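
One likely explanation, assuming the library's default cost: dtaidistance's DTW accumulates squared differences and returns the square root of the total, whereas the loop above sums absolute differences along the path. The sketch below reproduces both conventions with a plain pure-Python DTW. The helper dtw_with_path is hypothetical and stands in for dtw.warping_paths and dtw.best_path; the two series are the ones from the README example.

```python
import math


def dtw_with_path(s1, s2, cost):
    """Plain O(n*m) DTW; returns (accumulated cost, warping path)."""
    n, m = len(s1), len(s2)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost(s1[i - 1], s2[j - 1]) + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
    # Backtrack from the last cell to recover one optimal path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((acc[i - 1][j - 1], i - 1, j - 1),
                      (acc[i - 1][j], i - 1, j),
                      (acc[i][j - 1], i, j - 1))
    path.reverse()
    return acc[n][m], path


s1 = [0.0, 0, 1, 2, 1, 0, 1, 0, 0]
s2 = [0.0, 1, 2, 0, 0, 0, 0, 0, 0]

# Squared local cost, square root at the end (assumed library convention).
sq_total, path = dtw_with_path(s1, s2, lambda a, b: (a - b) ** 2)
dist_sqrt = math.sqrt(sq_total)

# The two ways of re-scoring the same path.
abs_on_path = sum(abs(s1[a] - s2[b]) for a, b in path)
sqrt_on_path = math.sqrt(sum((s1[a] - s2[b]) ** 2 for a, b in path))

print(dist_sqrt, sqrt_on_path, abs_on_path)
```

Re-scoring the path with squared differences and a final square root matches the DTW value, while the absolute-difference sum does not, so the two numbers are not expected to agree.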

Missing test file

The tests require a file synthetic_control.data. However, this file is not included in the GitHub sources, nor can I find any explanation of where to obtain it. It would be helpful to have it in the GitHub source tree, which would allow CI tests to be run.
