angus924 / rocket

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels

License: GNU General Public License v3.0

Languages: Jupyter Notebook 18.13%, Python 81.87%

Topics: scalable, time-series-classification, random, convolution, convolutional-kernel, convolutional-neural-network

rocket's Introduction

ROCKET · MINIROCKET · HYDRA

ROCKET

ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels.

Data Mining and Knowledge Discovery / arXiv:1910.13051 (preprint)

Most methods for time series classification that attain state-of-the-art accuracy have high computational complexity, requiring significant training time even for smaller datasets, and are intractable for larger datasets. Additionally, many existing methods focus on a single type of feature such as shape or frequency. Building on the recent success of convolutional neural networks for time series classification, we show that simple linear classifiers using random convolutional kernels achieve state-of-the-art accuracy with a fraction of the computational expense of existing methods. Using this method, it is possible to train and test a classifier on all 85 ‘bake off’ datasets in the UCR archive in < 2 h, and it is possible to train a classifier on a large dataset of more than one million time series in approximately 1 h.
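For intuition: each kernel is sampled with random length, weights, bias, dilation, and padding, and each kernel yields two features per time series (the proportion of positive values, PPV, and the maximum of the convolution output). Below is a simplified sketch of the sampling scheme described in the paper, not the repository's optimised implementation:

import numpy as np

def sample_kernel(input_length):
    length = np.random.choice([7, 9, 11])      # kernel length
    weights = np.random.normal(0, 1, length)
    weights = weights - weights.mean()         # mean-centre the weights
    bias = np.random.uniform(-1, 1)
    # dilation is sampled on an exponential scale, capped so the
    # dilated kernel still fits within the input
    max_exponent = np.log2((input_length - 1) / (length - 1))
    dilation = int(2 ** np.random.uniform(0, max_exponent))
    # padding is applied for roughly half of all kernels
    padding = ((length - 1) * dilation) // 2 if np.random.randint(2) else 0
    return weights, bias, dilation, padding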

Please cite as:

@article{dempster_etal_2020,
  author  = {Dempster, Angus and Petitjean, Fran\c{c}ois and Webb, Geoffrey I},
  title   = {{ROCKET}: Exceptionally Fast and Accurate Time Series Classification Using Random Convolutional Kernels},
  journal = {Data Mining and Knowledge Discovery},
  year    = {2020},
  volume  = {34},
  number  = {5},
  pages   = {1454--1495}
}

sktime

An implementation of ROCKET (with basic multivariate capability) is available through sktime. See the examples.
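A minimal usage sketch (the import path below assumes a recent sktime version and may differ on yours):

import numpy as np
from sklearn.linear_model import RidgeClassifierCV
from sktime.transformations.panel.rocket import Rocket

[...] # load data, etc.

rocket = Rocket(num_kernels = 10_000)  # default matches the paper
X_training_transform = rocket.fit_transform(X_training)  # panel / 3D input

classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10))
classifier.fit(X_training_transform, Y_training)

predictions = classifier.predict(rocket.transform(X_test))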

MINIROCKET is up to 75× faster than ROCKET on larger datasets.

Results

UCR Archive

Scalability

Code

Requirements

  • Python;
  • Numba;
  • NumPy;
  • scikit-learn (or equivalent).
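All of these can be installed with pip, for example:

> pip install numba numpy scikit-learn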

Example

import numpy as np

from rocket_functions import generate_kernels, apply_kernels
from sklearn.linear_model import RidgeClassifierCV

[...] # load data, etc.

# generate random kernels
kernels = generate_kernels(X_training.shape[-1], 10_000)

# transform training set and train classifier
X_training_transform = apply_kernels(X_training, kernels)
classifier = RidgeClassifierCV(alphas = np.logspace(-3, 3, 10), normalize = True) # nb: `normalize` removed in scikit-learn >= 1.2; see note below
classifier.fit(X_training_transform, Y_training)

# transform test set and predict
X_test_transform = apply_kernels(X_test, kernels)
predictions = classifier.predict(X_test_transform)
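Note that the normalize parameter was deprecated in scikit-learn 1.0 and removed in 1.2. On newer versions, a roughly equivalent setup (a sketch, not part of the original code) is to standardise the features in a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# StandardScaler is a stand-in for the removed normalize=True option;
# it is not numerically identical, but serves the same purpose of
# putting the transformed features on a comparable scale
classifier = make_pipeline(
    StandardScaler(with_mean = False),
    RidgeClassifierCV(alphas = np.logspace(-3, 3, 10)),
)
classifier.fit(X_training_transform, Y_training)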

Reproducing the Experiments

reproduce_experiments_ucr.py

Arguments:
-d --dataset_names : txt file of dataset names
-i --input_path    : parent directory for datasets
-o --output_path   : path for results
-n --num_runs      : number of runs (optional, default 10)
-k --num_kernels   : number of kernels (optional, default 10,000)

Examples:
> python reproduce_experiments_ucr.py -d bakeoff.txt -i ./Univariate_arff -o ./
> python reproduce_experiments_ucr.py -d additional.txt -i ./Univariate_arff -o ./ -n 1 -k 1000

reproduce_experiments_scalability.py

Arguments:
-tr --training_path : training dataset (csv)
-te --test_path     : test dataset (csv)
-o  --output_path   : path for results
-k  --num_kernels   : number of kernels

Examples:
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 100
> python reproduce_experiments_scalability.py -tr training.csv -te test.csv -o ./ -k 1000

Acknowledgements

We thank Professor Eamonn Keogh and all the people who have contributed to the UCR time series classification archive. Figures in our paper showing the ranking of different classifiers and variants of ROCKET were produced using code from Ismail Fawaz et al. (2019).

🚀


rocket's Issues

How to handle NaN values in the UCR data sets?

Hello,

I recently came across the amazing ROCKET and MiniRocket. When I used them to classify some UCRArchive2018 datasets (such as 'DodgerLoopGame'), NaN values representing missing values caused errors. How do you handle these values? Is it appropriate to simply delete samples with NaN values?

Thanks for your help!
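One common workaround (not necessarily how the authors handled it) is to interpolate missing values before applying the kernels, rather than deleting whole samples. A sketch:

import numpy as np

def fill_nans_linear(X):
    # X: 2D array, shape (num_examples, input_length); NaNs are filled
    # by linear interpolation along each time series
    X = X.copy()
    for row in X:
        nans = np.isnan(row)
        if nans.any() and (~nans).any():
            row[nans] = np.interp(np.flatnonzero(nans),
                                  np.flatnonzero(~nans),
                                  row[~nans])
    return X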

Application of kernels to multivariate data

Hi,

I notice that the ROCKET transform implemented in sktime now supports multivariate datasets.
I was wondering whether you could explain how it is applied in the multivariate case?

Many thanks.

Per-feature normalization

Hi Angus,

First of all congrats on your new TSC method. I've used it and the results are very good, and super fast compared to other methods!

I've noticed a discrepancy between the bake-off and scalability code you've shared.
In the scalability code you perform a per-feature normalization, while you don't in the bake-off code.
Is there a reason for that? I've run a few tests on some bake-off datasets and found that this per-feature normalization hurts performance.

Memory Error

First of all: thank you very much for providing this new method! When it works, it works very well and produces good results (i.e. results comparable to other methods within minutes instead of weeks).

Unfortunately, it doesn't always work for me, especially when I use 10,000 kernels as recommended in your paper. This is what I end up with on a Windows 10 machine with an NVIDIA 1080 Ti GPU and 32 GB RAM:

Algorithms.trainAndEvaluateROCKET(...)
  File "PathToMyProject\MyClass.py", line 180, in trainAndEvaluateROCKET
    classifier.fit(X_training_transform,trainY)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1815, in fit
    _BaseRidgeCV.fit(self, X, Y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1528, in fit
    estimator.fit(X, y, sample_weight=sample_weight)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1436, in fit
    X_mean, *decomposition = decompose(X, y, sqrt_sw)
  File "PathToMyAnacondaFolder\lib\site-packages\sklearn\linear_model\ridge.py", line 1348, in _svd_decompose_design_matrix
    U, s, _ = linalg.svd(X, full_matrices=0)
  File "PathToMyAnacondaFolder\lib\site-packages\scipy\linalg\decomp_svd.py", line 129, in svd
    full_matrices=full_matrices, overwrite_a=overwrite_a)
MemoryError

My test dataset consists of 3,278 time series (each with a length of 1,041 amplitude values) and my training dataset consists of 43,264 time series (again, each of length 1,041). Any help is highly appreciated.

// edit: I forgot to mention that it works perfectly fine with 100 or 1,000 kernels.
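For training sets of this size, the paper itself switches from the ridge classifier to logistic regression trained with gradient descent, avoiding the memory-hungry SVD-based cross-validation visible in the traceback. A sketch using scikit-learn's SGDClassifier, training in mini-batches so the full feature matrix never has to be decomposed at once (batches and all_classes are hypothetical placeholders):

import numpy as np
from sklearn.linear_model import SGDClassifier

# loss="log_loss" gives logistic regression (use loss="log" on
# scikit-learn versions older than 1.1)
classifier = SGDClassifier(loss = "log_loss")
for X_batch, Y_batch in batches:  # hypothetical mini-batch iterator
    classifier.partial_fit(X_batch, Y_batch, classes = all_classes)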

Fixing the random state

Hi,

I just wanted to let you know that it is possible to fix the random seed with Numba. The only change would be to add a seed parameter to the generate_kernels function:

from numba import njit
import numpy as np

@njit
def generate_kernels(input_length, num_kernels, seed=42):
    np.random.seed(seed)

This way, the results would be perfectly reproducible (as it is, there is randomness when the kernels are generated that cannot be fixed).

Best,
Johann

Satellite Image Time Series (SITS) dataset

Hi, I'd like to reproduce the scalability experiment using the SITS dataset mentioned in the paper. Is this dataset publicly available? If so, where can I find it?

Thanks =)

normalized features

Hi,

I found that in MultiRocket the features are not normalized before being fed into the ridge classifier, while they are normalized in ROCKET and MiniRocket. Is there a reason the features are normalized, and, where they are, why are the MAX and PPV features normalized together instead of individually?

Thank you in advance.
Yunrui

Results_ucr_resamples

Hi,
I have been studying your paper these days and found it very enlightening. I have a question about Results_ucr_resamples: I do not know how to reproduce these results, or what "resample" means. Could you please give me some help?
