angus924 / rocket

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

License: GNU General Public License v3.0

Languages: Jupyter Notebook 18.13%, Python 81.87%

Topics: scalable, time-series-classification, random, convolution, convolutional-kernel, convolutional-neural-network

rocket's Introduction

ROCKET · MINIROCKET · HYDRA

ROCKET

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels.

Data Mining and Knowledge Discovery / arXiv:1910.13051 (preprint)

Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in < 2 h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.
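For intuition: each kernel is sampled with random length, weights, bias, dilation, and padding, and each kernel yields two features per time series (the proportion of positive values, PPV, and the maximum of the convolution output). Below is a simplified sketch of the sampling scheme described in the paper, not the repository's optimised implementation:

import numpy as np

def sample_kernel(input_length):
    length = np.random.choice([7, 9, 11])      # kernel length
    weights = np.random.normal(0, 1, length)
    weights = weights - weights.mean()         # mean-centre the weights
    bias = np.random.uniform(-1, 1)
    # dilation is sampled on an exponential scale, capped so the
    # dilated kernel still fits within the input
    max_exponent = np.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** np.random.uniform(0, max_exponent))
    # padding is applied for roughly half of all kernels
    padding = ((length - 1) * dilation) // 2 if np.random.randint(2) else 0
    return weights, bias, dilation, padding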

Please cite as:

@article{dempster_etal_2020,
  author  = {Dempster, Angus and Petitjean, Fran\c{c}ois and Webb, Geoffrey I},
  title   = {{ROCKET}: Exceptionally Fast and Accurate Time Series Classification Using Random Convolutional Kernels},
  journal = {Data Mining and Knowledge Discovery},
  year    = {2020},
  volume  = {34},
  number  = {5},
  pages   = {1454--1495}
}

sktime

An implementation of ROCKET (with basic multivariate capability) is available through sktime. See the examples.
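A minimal usage sketch (the import path below assumes a recent sktime version and may differ on yours):

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import Rocket

[...] # load data, etc.

rocket = Rocket(num_kernels = 10_000)  # default matches the paper
X_training_transform = rocket.fit_transform(X_training)  # panel / 3D input

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10))
classifier.fit(X_training_transform, Y_training)

predictions = classifier.predict(rocket.transform(X_test))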

MINIROCKET is up to 75× faster than ROCKET on larger datasets.

Results

UCR Archive

Scalability

Code

Requirements

  • Python;
  • Numba;
  • NumPy;
  • scikit-learn (or equivalent).
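All of these can be installed with pip, for example:

> pip install numba numpy scikit-learn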

Example

import numpy as np

from rocket_functions import generate_kernels, apply_kernels
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# generate random kernels
kernels = generate_kernels(X_training.shape[-1], 10_000)

# transform training set and train classifier
X_training_transform = apply_kernels(X_training, kernels)
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True) # nb: `normalize` removed in scikit-learn >= 1.2; see note below
classifier.fit(X_training_transform, Y_training)

# transform test set and predict
X_test_transform = apply_kernels(X_test, kernels)
predictions = classifier.predict(X_test_transform)
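Note that the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2. On newer versions, a roughly equivalent setup (a sketch, not part of the original code) is to standardise the features in a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler is a stand-in for the removed normalize=True option;
# it is not numerically identical, but serves the same purpose of
# putting the transformed features on a comparable scale
classifier = make_pipeline(
    StandardScaler(with_mean = False),
    RidgeClassifierCV(alphas = np.logspace(-3, 3, 10)),
)
classifier.fit(X_training_transform, Y_training)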

Reproducing the Experiments

reproduce_experiments_ucr.py

Arguments:
-d --dataset_names : txt file of dataset names
-i --input_path    : parent directory for datasets
-o --output_path   : path for results
-n --num_runs      : number of runs (optional, default 10)
-k --num_kernels   : number of kernels (optional, default 10,000)

Examples:
> python reproduce_experiments_ucr.py -d bakeoff.txt -i ./Univariate_arff -o ./
> python reproduce_experiments_ucr.py -d additional.txt -i ./Univariate_arff -o ./ -n 1 -k 1000

reproduce_experiments_scalability.py

Arguments:
-tr --training_path : training dataset (csv)
-te --test_path     : test dataset (csv)
-o  --output_path   : path for results
-k  --num_kernels   : number of kernels

Examples:
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 100
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 1000

Acknowledgements

We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing the ranking of different classifiers and variants of ROCKET were produced using code from Ismail Fawaz et al. (2019).

🚀


rocket's Issues

How to handle NaN values in the UCR data sets?

Hello,

I recently came across the amazing ROCKET and MiniRocket. When I used them to classify some UCRArchive2018 datasets (such as 'DodgerLoopGame'), NaN values representing missing values caused errors. How do you handle these values? Is it appropriate to simply delete samples with NaN values?

Thanks for your help!
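One common workaround (not necessarily how the authors handled it) is to interpolate missing values before applying the kernels, rather than deleting whole samples. A sketch:

import numpy as np

def fill_nans_linear(X):
    # X: 2D array, shape (num_examples, input_length); NaNs are filled
    # by linear interpolation along each time series
    X = X.copy()
    for row in X:
        nans = np.isnan(row)
        if nans.any() and (~nans).any():
            row[nans] = np.interp(np.flatnonzero(nans),
                                  np.flatnonzero(~nans),
                                  row[~nans])
    return X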

Application of kernels to multivariate data

Hi,

I notice that the ROCKET transform implemented in sktime now supports multivariate datasets.
I was wondering whether you could explain how it is applied in the multivariate case?

Many thanks.

Per-feature normalization

Hi Angus,

First of all congrats on your new TSC method. I've used it and the results are very good, and super fast compared to other methods!

I've noticed a discrepancy between the bake-off and scalability code you've shared.
In the scalability code you perform a per-feature normalization, while you don't in the bake-off code.
Is there a reason for that? I've run a few tests on some bake-off datasets and found that this per-feature normalization hurts performance.

Memory Error

First of all: thank you very much for providing this new method! When it works, it works very well and produces good results (i.e. results comparable to other methods within minutes instead of weeks).

Unfortunately, it doesn't always work for me, especially when I use 10,000 kernels as recommended in your paper. This is what I end up with on a Windows 10 machine with an NVIDIA 1080 Ti GPU and 32 GB RAM:

Algorithms.trainAndEvaluateROCKET(...)
  File "PathToMyProject\MyClass.py", line 180, in trainAndEvaluateROCKET
    classifier.fit(X_training_transform,trainY)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1815, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1528, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1436, in fit
    X_mean, *decomposition = decompose(X, y, sqrt_sw)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1348, in _svd_decompose_design_matrix
    U, s, _ = linalg.svd(X, full_matrices=0)
  File "PathToMyAnacondaFolder\lib\site-packages\scipy\linalg\decomp_svd.py", line 129, in svd
    full_matrices=full_matrices, overwrite_a=overwrite_a)
MemoryError

My test dataset consists of 3,278 time series (each with a length of 1,041 amplitude values) and my training dataset consists of 43,264 time series (again, each of length 1,041). Any help is highly appreciated.

// edit: I forgot to mention that it works perfectly fine with 100 or 1,000 kernels.
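For training sets of this size, the paper itself switches from the ridge classifier to logistic regression trained with gradient descent, avoiding the memory-hungry SVD-based cross-validation visible in the traceback. A sketch using scikit-learn's SGDClassifier, training in mini-batches so the full feature matrix never has to be decomposed at once (batches and all_classes are hypothetical placeholders):

import numpy as np
from sklearn.linear_model import SGDClassifier

# loss="log_loss" gives logistic regression (use loss="log" on
# scikit-learn versions older than 1.1)
classifier = SGDClassifier(loss = "log_loss")
for X_batch, Y_batch in batches:  # hypothetical mini-batch iterator
    classifier.partial_fit(X_batch, Y_batch, classes = all_classes)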

Fixing the random state

Hi,

I just wanted to let you know that it is possible to fix the random seed with Numba. The only change would be to add a seed parameter to the generate_kernels function:

from numba import njit
import numpy as np

@njit
def generate_kernels(input_length, num_kernels, seed=42):
    np.random.seed(seed)

This way, the results would be perfectly reproducible (as it is, there is randomness when the kernels are generated that cannot be fixed).

Best,
Johann

Satellite Image Time Series (SITS) dataset

Hi, I'd like to reproduce the scalability experiment using the SITS dataset mentioned in the paper. Is this dataset publicly available? If so, where can I find it?

Thanks =)

normalized features

Hi,

I found that in MultiRocket the features are not normalized before being fed into the ridge classifier, while they are normalized in ROCKET and MiniRocket. Is there a reason the features are normalized, and, where they are, why are the MAX and PPV features normalized together instead of individually?

Thank you in advance.
Yunrui

Results_ucr_resamples

Hi,
I have been studying your paper these days and found it very enlightening. I have a question about Results_ucr_resamples: I do not know how to reproduce these results, or what "resample" means. Could you please give me some help?
