MLJ

A Machine Learning Toolbox for Julia.


MLJ is a machine learning framework for Julia aiming to provide a convenient way to use and combine a multitude of tools and models available in the Julia ML/Stats ecosystem. MLJ is released under the MIT license and is sponsored by the Alan Turing Institute.


Using MLJ · MLJ Universe · Contributing · Available Models · MLJ Cheatsheet · Citing MLJ

Key goals

  • Offer a consistent way to use, compose and tune machine learning models in Julia,
  • Promote the improvement of the Julia ML/Stats ecosystem by making it easier to use models from a wide range of packages,
  • Unlock performance gains by exploiting Julia's support for parallelism, automatic differentiation, GPUs, optimisation, etc.

Key features

  • Data agnostic: train models on any data supported by the Tables.jl interface,
  • Extensive support for model composition (pipelines and learning networks),
  • Convenient syntax to tune and evaluate (composite) models (see the sketch below),
  • Consistent interface for handling probabilistic predictions.
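
As a flavour of that syntax, here is a minimal sketch of evaluating a decision tree by cross-validation. It assumes MLJ and DecisionTree.jl are installed (as described in the next section); @load_iris is one of MLJ's built-in toy-dataset macros, and measure names can vary between versions:

using MLJ
@load DecisionTreeClassifier
X, y = @load_iris                  # a built-in toy dataset
tree = DecisionTreeClassifier()
mach = machine(tree, X, y)         # bind the model to the data
evaluate!(mach, resampling=CV(nfolds=6, shuffle=true), measure=cross_entropy)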

Using MLJ

It is a good idea to use a separate environment for MLJ in order to avoid version clashes with other packages you may be using. You can do so with

julia> using Pkg; Pkg.activate("My_MLJ_env", shared=true)

Installing MLJ is also done with the package manager:

julia> Pkg.add(["MLJ", "MLJModels"])

It is important to note that MLJ is essentially a big wrapper providing unified access to model-providing packages, so you will also need to make sure those packages are available in your environment. For instance, if you want to use a decision tree classifier, you need to have DecisionTree.jl installed:

julia> Pkg.add("DecisionTree");
julia> using MLJ;
julia> @load DecisionTreeClassifier
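
Once the model code is loaded, the typical workflow is to bind the model to data in a machine, fit it, and predict. The following is a minimal sketch, with the @load_iris toy dataset again standing in for real data and the max_depth hyperparameter used purely for illustration:

using MLJ
@load DecisionTreeClassifier
X, y = @load_iris
model = DecisionTreeClassifier(max_depth=2)
mach = machine(model, X, y)                        # bind model to data
train, test = partition(eachindex(y), 0.7, shuffle=true)
fit!(mach, rows=train)
yhat = predict(mach, rows=test)                    # probabilistic predictions
predict_mode(mach, rows=test)                      # point predictions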

For a list of models and their packages see the table below, or run

using MLJ
models()

We recommend you start with models marked as coming from mature packages such as DecisionTree, ScikitLearn or XGBoost.

Tutorials

The best place to get started with MLJ is the MLJ Tutorials website. Each tutorial can be downloaded as a notebook or Julia script to facilitate experimentation with the packages.

You're also welcome to join the #mlj Julia Slack channel to ask questions and make suggestions.


The MLJ Universe

The MLJ universe is made out of several repositories, some of which can be used independently of MLJ (indicated with a ⟂ symbol):

  • (⟂) MLJBase.jl offers essential tools to load and interpret data, describe ML models and use metrics; it is the repository you should interface with if you wish to make your package accessible via MLJ (a simplified sketch of this interface follows the list),
  • MLJ.jl offers tools to compose, tune and evaluate models,
  • MLJModels.jl contains interfaces to a number of important model-providing packages such as DecisionTree.jl, ScikitLearn.jl and XGBoost.jl, as well as a few built-in transformations (one-hot encoding, standardisation, ...); it also hosts the model registry, which keeps track of all models accessible via MLJ,
  • (⟂) ScientificTypes.jl, a lightweight package to help specify the interpretation of data beyond how the data is currently encoded (see the example following the list),
  • (⟂) MLJLinearModels.jl, an experimental package for a wide range of penalised linear models such as the Lasso, Elastic Net, robust regression, LAD regression, etc.,
  • MLJFlux.jl, an experimental package for using Flux within MLJ.
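
To give an idea of what interfacing with MLJBase involves, here is a heavily simplified, hypothetical sketch of the supervised model API. The name MyRegressor and its ridge-like fit are invented for illustration; the MLJBase documentation is the authoritative reference:

import MLJBase
using LinearAlgebra: I

# A hypothetical deterministic supervised model with one hyperparameter:
mutable struct MyRegressor <: MLJBase.Deterministic
    lambda::Float64
end
MyRegressor(; lambda=1.0) = MyRegressor(lambda)

# fit must return a (fitresult, cache, report) triple:
function MLJBase.fit(model::MyRegressor, verbosity::Int, X, y)
    x = MLJBase.matrix(X)                          # convert any table to a matrix
    fitresult = (x'x + model.lambda * I) \ (x'y)   # ridge-style coefficients
    return fitresult, nothing, nothing
end

# predict consumes the fitresult produced by fit:
MLJBase.predict(::MyRegressor, fitresult, Xnew) = MLJBase.matrix(Xnew) * fitresult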
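
And to illustrate the ScientificTypes.jl distinction between how data is encoded and how it is to be interpreted, a small sketch (the outputs shown in comments are indicative and may vary between versions):

using ScientificTypes

scitype(4.6)      # Continuous
scitype(42)       # Count
scitype("male")   # Textual

# coerce changes the interpretation (here, integer codes become categorical):
v = coerce([1, 2, 2, 1], Multiclass)
scitype(v)        # AbstractArray{Multiclass{2},1}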


Contributing to MLJ

MLJ is an ambitious project and we need all the help we can get! There are multiple ways you can contribute; the table below indicates where you can help and gives a subjective indication of the Julia and ML expertise required.

| Julia | ML | What to do |
| --- | --- | --- |
| = | = | use MLJ and give us feedback, help us write better tutorials, suggest missing features, test the less mature model packages |
| = | | package to facilitate visualising results in MLJ |
| | | add/improve data pre-processing tools |
| | | add/improve interfaces to other model-providing packages |
| | | functionalities for time series |
| | | functionalities for systematic benchmarking of models |
| | | functionalities for natural language processing (NLP) |
| ⭒⭒ | = | decrease the overhead incurred by MLJ |
| ⭒⭒ | = | improve support for sparse data |
| ⭒⭒ | | add parallelism and/or multithreading to MLJ (there is an ongoing effort to interface with Dagger.jl) |
| ⭒⭒ | | add an interface to probabilistic programming packages (there is an ongoing effort to interface with Soss.jl) |
| ⭒⭒ | ⭒⭒ | more sophisticated hyperparameter tuning (Bayesian optimisation, bandits, early stopping, ...), possibly as part of an external package, possibly integrating with Julia's optimisation and autodiff packages |

If you're interested in contributing beyond the first item, please get in touch with either Anthony Blaom or Thibaut Lienart on Slack and we can guide you further. Thank you!

You can also have a look at MLJ's release notes to get an idea of what's been happening recently.


Models available

There is a wide range of models accessible via MLJ. We are always looking for contributors to add new models or help us test existing ones. The table below indicates the models that are accessible at present along with a subjective indication of how mature the underlying package is.

  • experimental: the package is fairly new and/or under active development; you can help by testing these packages and making them more robust,
  • medium: the package is fairly mature but may benefit from optimisations and/or extra features; you can help by suggesting such improvements,
  • high: the package is very mature and its functionalities are expected to be well optimised and tested.

| Package | Models | Maturity | Note |
| --- | --- | --- | --- |
| Clustering.jl | KMeans, KMedoids | high | |
| DecisionTree.jl | DecisionTreeClassifier, DecisionTreeRegressor | high | |
| GLM.jl | LinearRegressor, LinearBinaryClassifier, LinearCountRegressor | medium | |
| LIBSVM.jl | LinearSVC, SVC, NuSVC, NuSVR, EpsilonSVR, OneClassSVM | high | also via ScikitLearn.jl |
| MLJModels.jl (built-ins) | StaticTransformer, FeatureSelector, FillImputer, UnivariateStandardizer, Standardizer, UnivariateBoxCoxTransformer, OneHotEncoder, ConstantRegressor, ConstantClassifier | medium | |
| MLJLinearModels.jl | LinearRegressor, RidgeRegressor, LassoRegressor, ElasticNetRegressor, QuantileRegressor, HuberRegressor, RobustRegressor, LADRegressor, LogisticClassifier, MultinomialClassifier | experimental | |
| MultivariateStats.jl | RidgeRegressor, PCA, KernelPCA, ICA, LDA, BayesianLDA, SubspaceLDA, BayesianSubspaceLDA | high | |
| NaiveBayes.jl | GaussianNBClassifier, MultinomialNBClassifier, HybridNBClassifier | low | |
| NearestNeighbors.jl | KNNClassifier, KNNRegressor | high | |
| ScikitLearn.jl | SVMClassifier, SVMRegressor, SVMNuClassifier, SVMNuRegressor, SVMLClassifier, SVMLRegressor, ARDRegressor, BayesianRidgeRegressor, ElasticNetRegressor, ElasticNetCVRegressor, HuberRegressor, LarsRegressor, LarsCVRegressor, LassoRegressor, LassoCVRegressor, LassoLarsRegressor, LassoLarsCVRegressor, LassoLarsICRegressor, LinearRegressor, OrthogonalMatchingPursuitRegressor, OrthogonalMatchingPursuitCVRegressor, PassiveAggressiveRegressor, RidgeRegressor, RidgeCVRegressor, SGDRegressor, TheilSenRegressor, LogisticClassifier, LogisticCVClassifier, PerceptronClassifier, RidgeClassifier, RidgeCVClassifier, PassiveAggressiveClassifier, SGDClassifier, GaussianProcessRegressor, GaussianProcessClassifier, AdaBoostRegressor, AdaBoostClassifier, BaggingRegressor, BaggingClassifier, GradientBoostingRegressor, GradientBoostingClassifier, RandomForestRegressor, RandomForestClassifier, GaussianNB, MultinomialNB, ComplementNB, BayesianLDA, BayesianQDA | high | |
| XGBoost.jl | XGBoostRegressor, XGBoostClassifier, XGBoostCount | high | |

Note (†): some models are missing; your help is welcome to complete the interface. Get in touch with Thibaut Lienart on Slack if you would like to help. Thanks!


Citing MLJ

@software{anthony_blaom_2019_3541506,
  author       = {Anthony Blaom and
                  Franz Kiraly and
                  Thibaut Lienart and
                  Sebastian Vollmer},
  title        = {alan-turing-institute/MLJ.jl: v0.5.3},
  month        = nov,
  year         = 2019,
  publisher    = {Zenodo},
  version      = {v0.5.3},
  doi          = {10.5281/zenodo.3541506},
  url          = {https://doi.org/10.5281/zenodo.3541506}
}

Contributors

Core design: A. Blaom, F. Kiraly, S. Vollmer

Active maintainers: A. Blaom, T. Lienart

Active collaborators: D. Arenas, D. Buchaca, J. Hoffimann, S. Okon, J. Samaroo, S. Vollmer

Past collaborators: D. Aluthge, E. Barp, G. Bohner, M. K. Borregaard, V. Churavy, H. Devereux, M. Giordano, M. Innes, F. Kiraly, M. Nook, Z. Nugent, P. Oleśkiewicz, A. Shridar, Y. Simillides, A. Sengupta, A. Stechemesser.

License

MLJ is supported by the Alan Turing Institute and released under the MIT "Expat" License.
