mljtutorial.jl's Introduction

MLJTutorial.jl

Notebooks for introducing the machine learning toolbox MLJ (Machine Learning in Julia)

Based on tutorials originally part of a 3.5 hour online workshop.

Prerequisites

  • Familiarity with basic data manipulation in Julia: vectors, tuples, dictionaries, arrays, generating random numbers, tabular data (e.g., DataFrames.jl), basic stats, Distributions.jl.

  • Familiarity with Machine Learning fundamentals and best practice.

Topics covered

Basic

  • Part 1 - Data Representation

  • Part 2 - Selecting, Training and Evaluating Models

  • Part 3 - Transformers and Pipelines

Advanced

  • Part 4 - Tuning hyper-parameters

  • Part 5 - Advanced model composition

The tutorials include links to external resources and exercises with solutions.

More about the tutorials

  • The tutorials focus on the machine learning part of the data science workflow, and less on exploratory data analysis and other conventional "data analytics" methodology.

  • Here "machine learning" is meant in a broad sense, and is not restricted to so-called deep learning (neural networks).

  • The tutorials are crafted to rapidly familiarize the user with what MLJ can do and how to do it, and are not a substitute for a course on machine learning fundamentals. Examples do not necessarily represent best practice or the best solution to a problem.

Additional resources

Credits

The author and maintainer of this repository is @ablaom. Pluto notebooks have been adapted from the Julia scripts by @roland-KA, who is also a maintainer.

mljtutorial.jl's Issues

Make model in tute 02 reproducible

@roland-KA has noticed that misclassification_rate is occasionally zero, which might confuse some users. Perhaps consider a reproducible model (this one uses Flux dropout, which cannot be passed an RNG at time of writing).

`evaluate!` mutating?

I have a general question concerning evaluate!: As the exclamation mark in its name shows, this is a mutating function. But its purpose is the calculation of different measures, which (at least from an abstract point of view) shouldn't require evaluate! to mutate the data it is given.

So, why is it nonetheless defined as mutating? Are there specific measures whose calculation requires mutating the data (so my assumption is wrong), or is it just a way to give implementors of this function more freedom in how to implement it?
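
One way to read the question: in Julia a trailing ! signals that a function mutates at least one of its arguments, and for evaluate! the argument that changes is the machine (it gets retrained on each train/test split), not the data. The sketch below illustrates that distinction with a toy in plain Julia. It is not MLJ's implementation; ToyMachine, toy_fit!, toy_predict and toy_evaluate! are hypothetical names invented for this illustration.

```julia
using Statistics: mean

# Hypothetical toy "machine": binds learned state to a fixed data set.
mutable struct ToyMachine
    X::Vector{Float64}
    y::Vector{Float64}
    slope::Float64      # learned parameter, overwritten by every fit
    fit_count::Int      # how many times the machine has been trained
end

ToyMachine(X, y) = ToyMachine(X, y, NaN, 0)

# Least-squares slope through the origin, fitted on the given rows only.
function toy_fit!(mach::ToyMachine; rows=eachindex(mach.y))
    x, y = mach.X[rows], mach.y[rows]
    mach.slope = sum(x .* y) / sum(x .^ 2)
    mach.fit_count += 1
    return mach
end

toy_predict(mach::ToyMachine; rows=eachindex(mach.y)) = mach.slope .* mach.X[rows]

# Holdout evaluation: retrains the machine on the train rows (this is the
# mutation), then scores on the test rows. X and y are only read.
function toy_evaluate!(mach::ToyMachine; train, test)
    toy_fit!(mach; rows=train)
    yhat = toy_predict(mach; rows=test)
    return mean(abs.(yhat .- mach.y[test]))   # mean absolute error
end

X = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
X_before, y_before = copy(X), copy(y)

mach = ToyMachine(X, y)
mae = toy_evaluate!(mach; train=1:2, test=3:4)
```

After the call, mach.slope and mach.fit_count have changed while X and y still equal their copies. In MLJ itself, evaluate(model, X, y) offers the same functionality without requiring the caller to construct a machine first.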

MLJ 0.17 breaks `fit!` in tutorial 02

I've updated the Pluto notebook for tutorial 02 to MLJ 0.17 as well as all other packages to their most recent versions.

Version changes in detail

Old versions:
CSV = "~0.9.10"
DataFrames = "~1.2.2"
Distributions = "~0.25.28"
MLJ = "~0.16.11"
MLJFlux = "~0.2.5"
Plots = "~1.23.6"
PlutoUI = "~0.7.19"
Tables = "~1.6.0"
UrlDownload = "~1.0.0"

New versions:
CSV = "~0.9.11"
DataFrames = "~1.3.1"
Distributions = "~0.25.37"
MLJ = "~0.17.0"
MLJFlux = "~0.2.6"
Plots = "~1.25.4"
PlutoUI = "~0.7.27"
Tables = "~1.6.1"
UrlDownload = "~1.0.0"

This breaks the first fit! statement in the tutorial (in 'Step 3'):

fit!(mach, rows=train, verbosity=2)

It comes up with the following error message:

UndefVarError: scitype not defined

reformat(::DataFrames.DataFrame)@core.jl:167
collate(::MLJFlux.NeuralNetworkClassifier{MLJFlux.Short, typeof(NNlib.softmax), Flux.Optimise.ADAM, typeof(Flux.Losses.crossentropy)}, ::DataFrames.DataFrame, ::CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}})@core.jl:247
fit(::MLJFlux.NeuralNetworkClassifier{MLJFlux.Short, typeof(NNlib.softmax), Flux.Optimise.ADAM, typeof(Flux.Losses.crossentropy)}, ::Int64, ::DataFrames.DataFrame, ::CategoricalArrays.CategoricalVector{String, UInt32, String, CategoricalArrays.CategoricalValue{String, UInt32}, Union{}})@mlj_model_interface.jl:56
var"#fit_only!#53"(::Vector{Int64}, ::Int64, ::Bool, ::typeof(MLJBase.fit_only!), ::MLJBase.Machine{MLJFlux.NeuralNetworkClassifier{MLJFlux.Short, typeof(NNlib.softmax), Flux.Optimise.ADAM, typeof(Flux.Losses.crossentropy)}, true})@machines.jl:592
#fit!#[email protected]:659[inlined]
top-level scope@Local: 1[inlined]

I can't relate this error to any of the changes coming with the new releases.

Tutorial 04: `predict(..., rows = 1:3)`

In part 04 of the tutorial, in section 'The tuning wrapper' there is a predict statement applied to the tuned model/machine:
predict(tuned_mach, rows = 1:3)

To me it is not quite clear why the parameter 1:3 is given to rows. I would have expected that XHorse (or a subset of it) would be used.
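
A possible explanation, sketched with a toy in plain Julia: an MLJ machine keeps a reference to the data it was constructed with, so a rows keyword can select observations from that bound data instead of requiring a new table. The code below is not MLJ code; BoundMachine and toy_predict are hypothetical names, and it assumes tuned_mach was constructed with XHorse as its data.

```julia
# Hypothetical toy machine that remembers the data it was constructed with.
struct BoundMachine
    X::Matrix{Float64}        # one observation per row
    weights::Vector{Float64}  # stand-in for learned parameters
end

# Predict on explicitly supplied data.
toy_predict(mach::BoundMachine, Xnew::AbstractMatrix) = Xnew * mach.weights

# Predict on selected rows of the data already bound to the machine.
toy_predict(mach::BoundMachine; rows=:) = toy_predict(mach, mach.X[rows, :])

X = [1.0 0.0; 0.0 1.0; 1.0 1.0; 2.0 2.0]
mach = BoundMachine(X, [10.0, 1.0])

by_rows = toy_predict(mach; rows=1:3)      # rows of the bound data
by_data = toy_predict(mach, X[1:3, :])     # same rows, passed explicitly
```

Both calls return the same predictions. Under the stated assumption, predict(tuned_mach, rows = 1:3) would correspond to predicting on the first three rows of XHorse.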
