
timeseriesclassification.jl's Introduction

MLJTime

An MLJ compatible Julia toolbox for machine learning with time series.


Installation

To install MLJTime.jl, launch Julia and run:

]add "https://github.com/alan-turing-institute/MLJTime.jl.git"

MLJTime.jl requires Julia version 1.0 or greater.

Quickstart

using MLJTime

# load data
X, y = ts_dataset("Chinatown")

# split data into training and test set
train, test = partition(eachindex(y), 0.7, shuffle=true, rng=1234) #70:30 split
X_train, y_train = X[train], y[train];
X_test, y_test = X[test], y[test];

# train model
model = TimeSeriesForestClassifier(n_trees=3)
mach = machine(model, matrix(X_train), y_train)
fit!(mach)

# make predictions
y_pred = predict_mode(mach, matrix(X_test))
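
To gauge out-of-sample performance, the held-out predictions can be scored. A minimal sketch, assuming the quickstart above has been run and that `accuracy` is available (the package re-exports it from MLJBase):

```julia
# Score predictions on the held-out test set
# (assumes X_test, y_test, and mach from the quickstart above).
y_pred_test = predict_mode(mach, matrix(X_test))
acc = accuracy(y_pred_test, y_test)
```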

Documentation

To find out more, check out our:

Future work

In future work, we want to add:

  • Support for multivariate time series,
  • Shapelet based classification algorithms,
  • Enhancements to KNN (KDTree and BallTree algorithms),
  • Forecasting framework.

How to contribute

  • If you are interested, please raise an issue or get in touch with the MLJTime team on slack.

About the project

This project was originally developed as part of the Google Summer of Code 2020 with the support of the Julia community and my mentors Sebastian Vollmer and Markus Löning.

Active maintainers:

timeseriesclassification.jl's People

Contributors

aa25desh · mloning · ablaom · dilumaluthge · azev77 · vollmersj


timeseriesclassification.jl's Issues

Wrapping ARCHModels?

How hard would it be to wrap ARCHModels.jl?
It can fit ARMA{p,q} models (where p, q are the tuning parameters)

julia> using ARCHModels;
julia> fit_arma(df, p, q) = fit(ARCH{0}, df, meanspec=ARMA{p,q});
julia> fit_arma(BG96, 2, 3)
TGARCH{0,0,0} model with Gaussian errors, T=1974.
Mean equation parameters:
────────────────────────────────────────────────
       Estimate  Std.Error     z value  Pr(>|z|)
────────────────────────────────────────────────
c   -0.00983746  0.0354041  -0.277862     0.7811
φ₁   0.551574    0.589292    0.935995     0.3493
φ₂  -0.144346    1.92247    -0.0750838    0.9401
θ₁  -0.542057    0.591114   -0.91701      0.3591
θ₂   0.113263    1.92454     0.0588521    0.9531
θ₃   0.0501891   0.0282235   1.77828      0.0754
────────────────────────────────────────────────
Volatility parameters:
─────────────────────────────────────────
   Estimate  Std.Error  z value  Pr(>|z|)
─────────────────────────────────────────
ω  0.220462  0.0117617  18.7441    <1e-77
─────────────────────────────────────────
julia>

It comes with a self-tuning function to easily and quickly select the optimal ARMA{p,q}:

julia> auto_arma(df, bic) = selectmodel(ARCH{0}, df, meanspec=ARMA, criterion=bic, minlags=1, maxlags=3);
julia> auto_arma(BG96, bic)       # ARMA(1,1)
TGARCH{0,0,0} model with Gaussian errors, T=1974.
Mean equation parameters:
─────────────────────────────────────────────
      Estimate  Std.Error   z value  Pr(>|z|)
─────────────────────────────────────────────
c   -0.0266446  0.0174716  -1.52502    0.1273
φ₁  -0.621838   0.160741   -3.86857    0.0001
θ₁   0.643588   0.154303    4.17095    <1e-4
─────────────────────────────────────────────
Volatility parameters:
─────────────────────────────────────────
   Estimate  Std.Error  z value  Pr(>|z|)
─────────────────────────────────────────
ω  0.220848  0.0118061  18.7063    <1e-77
─────────────────────────────────────────
julia> auto_arma(DOW29[:,1], bic) # ARMA(2,2)
TGARCH{0,0,0} model with Gaussian errors, T=2785.
Mean equation parameters:
───────────────────────────────────────────────
      Estimate  Std.Error     z value  Pr(>|z|)
───────────────────────────────────────────────
c   -0.166748   0.0681206   -2.44783     0.0144
φ₁   0.0178542  0.0401341    0.444864    0.6564
φ₂  -0.932372   0.0803993  -11.5968      <1e-30
θ₁  -0.0191798  0.0463979   -0.413375    0.6793
θ₂   0.903732   0.0963863    9.37614     <1e-20
───────────────────────────────────────────────
Volatility parameters:
─────────────────────────────────────────
   Estimate  Std.Error  z value  Pr(>|z|)
─────────────────────────────────────────
ω   3.65185   0.220716  16.5455    <1e-60
─────────────────────────────────────────
julia> auto_arma(DOW29[:,3], bic) # ARMA(2,1)
TGARCH{0,0,0} model with Gaussian errors, T=2785.
Mean equation parameters:
────────────────────────────────────────────────
       Estimate  Std.Error     z value  Pr(>|z|)
────────────────────────────────────────────────
c    0.00192406  0.0345892   0.0556262    0.9556
φ₁  -0.371152    0.2418     -1.53496      0.1248
φ₂  -0.145134    0.0625429  -2.32055      0.0203
θ₁   0.231451    0.235409    0.983186     0.3255
────────────────────────────────────────────────
Volatility parameters:
─────────────────────────────────────────
   Estimate  Std.Error  z value  Pr(>|z|)
─────────────────────────────────────────
ω   2.20732   0.164313  13.4336    <1e-40
─────────────────────────────────────────
julia> auto_arma(DOW29[:,9], bic) # ARMA(1,2)
TGARCH{0,0,0} model with Gaussian errors, T=2785.
Mean equation parameters:
──────────────────────────────────────────────
      Estimate  Std.Error    z value  Pr(>|z|)
──────────────────────────────────────────────
c   -0.0184696  0.0215819  -0.855789    0.3921
φ₁   0.109868   0.306715    0.358211    0.7202
θ₁  -0.110765   0.308797   -0.358699    0.7198
θ₂  -0.107618   0.0347561  -3.09639     0.0020
──────────────────────────────────────────────
Volatility parameters:
─────────────────────────────────────────
   Estimate  Std.Error  z value  Pr(>|z|)
─────────────────────────────────────────
ω   1.79002   0.109611  16.3307    <1e-59
─────────────────────────────────────────

It takes any vector as input (it assumes the vector is in chronological order)...

auto_arma(randn(5000), bic) 

using MLJ;
X, y = @load_boston;
auto_arma(y, bic) #Nonsense bc y has no time structure in this dataset

Can you please help me wrap ARCHModels.jl?
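
For reference, the core of such a wrapper would likely be a model struct holding p and q plus a fit method delegating to ARCHModels. A very rough sketch — the type name `ARMAWrapper` and the supervised-style fit signature are my assumptions, since MLJ has no forecasting API yet:

```julia
import MLJModelInterface as MMI
using ARCHModels

# Hypothetical MLJ-style wrapper around ARCHModels' ARMA fitting;
# names and defaults here are illustrative only.
mutable struct ARMAWrapper <: MMI.Probabilistic
    p::Int
    q::Int
end
ARMAWrapper(; p=1, q=1) = ARMAWrapper(p, q)

# y: the univariate series, assumed to be in chronological order.
# X is unused; MLJ has no dedicated forecasting signature yet.
function MMI.fit(model::ARMAWrapper, verbosity::Int, X, y)
    fitresult = ARCHModels.fit(ARCH{0}, y; meanspec=ARMA{model.p, model.q})
    cache, report = nothing, NamedTuple()
    return fitresult, cache, report
end
```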

Partial code review

@aa25desh I've taken a look at the overall structure of the code and have some comments. I can see a lot of work has gone into this, particularly into core algorithms (which I have not, however, reviewed in any detail). Be great if you can look at this when you have some time.

cc @vollmersj @mloning

  • Please change the name of this repository to TimeSeriesClassification.jl,
    or something similar without MLJ prefix, which we are now
    reserving for packages providing core functionality.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/test/runtests.jl#L10

  • Where does the right-hand side of this test come from? If it is
    the output of some alternative implementation (e.g., sk-learn), then
    please state this clearly in a comment. Otherwise, explain why you
    know the right-hand side must be the correct output (independent of
    your implementation). In any case, the test is not robust because it
    compares floats using ==; please use `isapprox` (`≈`) with a tolerance instead.

  • I think a lack of unit tests here is still a serious issue. What
    about a unit test for these functions?

    • _discrete_fourier_transform,
    • transform (at this line),
    • _shorten_bags,
    • select_sort,
    • InvFeaturesGen (the version on this line),
    • apply_kernel,
    • apply_kernels.
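
To illustrate both points, a unit test for a DFT helper can compare against an independent naive reference implementation, using a tolerance rather than exact float equality. The reference implementation below is mine, not the package's:

```julia
using Test

# Naive O(n^2) reference DFT, used only as an independent check.
naive_dft(x) = [sum(x[n+1] * exp(-2im * π * k * n / length(x)) for n in 0:length(x)-1)
                for k in 0:length(x)-1]

@testset "naive DFT reference" begin
    # The DFT of a unit impulse is all ones.
    @test naive_dft([1.0, 0.0, 0.0, 0.0]) ≈ ones(ComplexF64, 4)
    # A unit test for the package would then compare with a tolerance, e.g.:
    # @test _discrete_fourier_transform(x) ≈ naive_dft(x) atol=1e-8
end
```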

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/Project.toml#L6

  • You should not have MLJBase as a dependency unless you
    absolutely need it. I see that you use accuracy. I'm guessing you only need it for a
    test? The idea is that the light-weight package MLJModelInterface
    should suffice. You will still need MLJBase as a dependency for
    testing, i.e., listed under [extras] and [targets]. Read
    this
    carefully. If you're not sure how to include dependencies for
    testing, look at the examples at MLJModels or elsewhere.

  • Perhaps review the inclusion of MultivariateStats and Distributions
    as dependencies, as these are pretty hefty. Note that you can use
    the UnivariateFinite constructor without Distributions or
    MLJBase.

  • Before registering your package, you will need to give every package
    in [deps] that is not part of the standard library an explicit
    [compat] entry. Then you can accelerate merging of patch and
    minor releases.
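
For example, a `[compat]` section in Project.toml might look like the following (the package names and version bounds here are purely illustrative):

```toml
[compat]
MLJModelInterface = "0.3"
StatsBase = "0.33"
julia = "1"
```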

https://github.com/alan-turing-institute/MLJTime.jl

  • At a minimum, documentation needs to include a description of each
    model provided (could be a table), ideally including an explanation
    of all hyperparameters. This could be as simple as a reproduction
    of the docstrings. I would put this directly in the README. It is
    difficult to find this information quickly from the other
    "documentation" that you provide. If the input data is a matrix rather
    than tabular, say whether your observations correspond to
    rows or columns. It is probably worth stating
    explicitly what input is allowed, since a lot of MLJ models use
    tabular data.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/MLJTime.jl#L12

  • You shouldn't need to export the methods predict, predict_mean,
    fitted_params, predict_mode, as you are overloading methods
    already defined in MLJModelInterface which are already exported by
    MLJ or MLJBase. The methods accuracy, fit!, and machine are
    also already exported by MLJ/MLJBase. At the moment, the user's
    work-flow would begin using MLJ; using MLJTime; .... After your package is
    registered, the work-flow will be using MLJ; @load TimeSeriesForestClassifier ... or similar.

  • predict_new looks like a private method; it is not part of the MLJ
    API; I don't think you need to export it.

  • ditto RandomForestClassifierFit.

  • I suggest exporting your model types here (for example,
    TimeSeriesForestClassifier). (Does the example in the README
    actually work without this export?)

  • We need model metadata here. Here's a suggestion for the first
    classifier:

MMI.input_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractMatrix{<:MMI.Continuous}
MMI.target_scitype(::Type{<:TimeSeriesForestClassifier}) = 
    AbstractVector{<:MMI.Finite}
MMI.load_path(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification.TimeSeriesForestClassifier"
MMI.package_name(::Type{<:TimeSeriesForestClassifier}) = 
    "TimeSeriesClassification"
MMI.package_uuid(::Type{<:TimeSeriesForestClassifier}) = 
    "2a643d68-9e49-4566-a2d5-26c3fb6c4a71"
MMI.package_url(::Type{<:TimeSeriesForestClassifier}) = "???"
MMI.is_pure_julia(::Type{<:TimeSeriesForestClassifier}) = true

I'm assuming here that your inputs are matrices of abstract floats. If
you change your requirements for the input type, then you'll need to
modify the input scitype declaration accordingly.

https://github.com/alan-turing-institute/MLJTime.jl/blob/b38e4b5dd1aba2d2b2b6402ec4568ee9b1c98970/src/interface.jl#L93

  • Add a signature for the constructor here. Just like you have for
    the preceding model.
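
MLJ model types conventionally get a keyword constructor that fills in defaults and then calls MMI.clean! to repair invalid hyperparameters. A minimal sketch using a stand-in model type — `MyClassifier` and its field are illustrative, not this package's actual model:

```julia
import MLJModelInterface as MMI

# Stand-in model; the field name and default are illustrative only.
mutable struct MyClassifier <: MMI.Probabilistic
    K::Int
end

# Keyword constructor: apply defaults, then clean invalid parameters,
# as is conventional for MLJ model types.
function MyClassifier(; K=5)
    model = MyClassifier(K)
    message = MMI.clean!(model)
    isempty(message) || @warn message
    return model
end

# clean! repairs invalid hyperparameters and returns a warning string.
function MMI.clean!(model::MyClassifier)
    warning = ""
    if model.K < 1
        warning *= "K must be ≥ 1; resetting to K=5."
        model.K = 5
    end
    return warning
end
```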

TS feature requests

Hi @aa25desh and welcome to MLJ.jl!
Here are some time series forecasting features I find very valuable:
Check out @robjhyndman's free book on forecasting: https://otexts.com/fpp2/

Univariate time-series:

  1. naive & seasonal naive models
  2. auto.arima model
  3. ets model
  4. thetam model
  5. nnetar model
  6. stlm model
  7. tbats model
  8. their hybrids. Also check out Forecast Benchmarks.
    All of the above belong to @robjhyndman's forecast.r, which is being refactored in the new fable.r.
    It is valuable to understand where forecast went wrong and what fable is doing differently, so we don't make the same mistakes.

Multivariate time-series:

  1. vector auto regression: @fipelle's TSAnalysis.jl is nice (also ElasticNetVAR.jl).
    PS: I've never seen automated multivariate models (VARIMA) the same way we have automated univariate models (auto.arima() etc).

Volatility models:

  1. @s-broda's ARCHModels.jl is very neat!
    Hansen has nice slides on volatility forecasting & a paper that compares 330 ARCH-type models.

Impulse Response Functions:
I usually do this in R.
Two Julia packages: VARmodels.jl & VectorAutoregressions.jl

In general, Julia has great libraries in many domains; unfortunately, time series is one of the least well organized.
That also means this is the area with the biggest opportunity to make a lasting impact on the world through open source!

Readme example doesn't work

julia> using MLJTime

julia> # load data
       X, y = ts_dataset("Chinatown");

julia> # split data into training and test set
       train, test = partition(eachindex(y), 0.7, shuffle=true, rng=1234); #70:30 split

julia> X_train, y_train = X[train], y[train];

julia> X_test, y_test = X[test], y[test];

julia> # train model
       model = TimeSeriesForestClassifier(n_trees=3)
TimeSeriesForestClassifier(
    n_trees = 3,
    random_state = nothing,
    min_interval = 3,
    max_depth = -1,
    min_samples_leaf = 1,
    min_samples_split = 2,
    min_purity_increase = 0.0,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0,
    pdf_smoothing = 0.0,
    display_depth = 5) @784

julia> mach = machine(model, X_train, y_train)
Machine{TimeSeriesForestClassifier} @035 trained 0 times.
  args: 
    1:  Source @887`ScientificTypes.Table{AbstractArray{ScientificTypes.Continuous,1}}`
    2:  Source @175`AbstractArray{ScientificTypes.Multiclass{2},1}`


julia> fit!(mach)
[ Info: Training Machine{TimeSeriesForestClassifier} @035.
┌ Error: Problem fitting the machine Machine{TimeSeriesForestClassifier} @035, possibly because an upstream node in a learning network is providing data of incompatible scitype. 
└ @ MLJBase ~/.julia/packages/MLJBase/Ov46j/src/machines.jl:422
[ Info: Running type checks... 
[ Info: Type checks okay. 
ERROR: MethodError: no method matching TimeSeriesForestClassifier(::TimeSeriesForestClassifier, ::IndexedTables.IndexedTable{StructArrays.StructArray{NTuple{24,Float64},1,NTuple{24,Array{Float64,1}},Int64}}, ::Array{UInt32,1})
Closest candidates are:
  TimeSeriesForestClassifier(::Any, ::Array, ::Array) at /Users/AZevelev/.julia/packages/MLJTime/61x0z/src/interval_based_forest.jl:12
Stacktrace:
 [1] fit_only!(::MLJBase.Machine{TimeSeriesForestClassifier}; rows::Nothing, verbosity::Int64, force::Bool) at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/machines.jl:433
 [2] fit_only! at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/machines.jl:386 [inlined]
 [3] #fit!#85 at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/machines.jl:478 [inlined]
 [4] fit!(::MLJBase.Machine{TimeSeriesForestClassifier}) at /Users/AZevelev/.julia/packages/MLJBase/Ov46j/src/machines.jl:476
 [5] top-level scope at none:1

julia> # make predictions
       y_pred = predict_mod(mach, X_test)
ERROR: UndefVarError: predict_mod not defined
Stacktrace:
 [1] top-level scope at none:1

julia> 

Improve test coverage

This is currently at 9%. I would say that until coverage is above 70%, adding tests should take priority over adding functionality.
