Comments (6)
Hello @xiaodaigh, there's an ongoing PR to interface with GLM models, which I would think should be merged next week.
from mlj.jl.
In response to an offer of help from @tlienart. Some details:
How about you put your implementation of the MLJ "model interface" for
GLM.jl in a module that lives in `src/builtins/GLM.jl` (where we
currently have the toy `KNN.jl`), although your code will probably more
closely resemble the MultivariateStats.jl stub where I put the `RidgeRegressor`
model. (I think we will move away from lazily loaded interface
implementations; if it does not stay in builtins, your code might become a
separate package, or we might try to get GLM.jl to include your
interface in their code.)
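For concreteness, here is a rough sketch of what such a stub might look like, assuming the fit/predict contract described in adding_new_models.md (`fit` returns a `(fitresult, cache, report)` triple). The type name, the raw-matrix form of the input, and the residual-based estimate of σ are illustrative assumptions, not the settled interface:

```julia
# Rough sketch only: names, the raw-matrix input, and the sigma estimate
# are illustrative assumptions, not the settled interface.
module GLMModels

import GLM
using Distributions: Normal

mutable struct GLMProbabilisticRegressor
    fit_intercept::Bool
end
GLMProbabilisticRegressor(; fit_intercept=true) = GLMProbabilisticRegressor(fit_intercept)

# per the contract in adding_new_models.md, fit returns (fitresult, cache, report)
function fit(model::GLMProbabilisticRegressor, verbosity::Int,
             X::AbstractMatrix, y::AbstractVector)
    Xmat = model.fit_intercept ? hcat(ones(length(y)), X) : X
    fitresult = GLM.lm(Xmat, y)
    return fitresult, nothing, nothing
end

# probabilistic predictions: one Normal distribution per input pattern
function predict(model::GLMProbabilisticRegressor, fitresult, Xnew::AbstractMatrix)
    Xmat = model.fit_intercept ? hcat(ones(size(Xnew, 1)), Xnew) : Xnew
    mu = GLM.predict(fitresult, Xmat)
    sigma = sqrt(sum(abs2, GLM.residuals(fitresult)) / GLM.dof_residual(fitresult))
    return [Normal(m, sigma) for m in mu]
end

end # module
```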
I expect you will generally be predicting probabilities
rather than actual target values (this will probably be done in the
`RidgeRegressor` as well, but isn't at present). There has been some very
recent discussion about exactly what `predict` should return in these
cases; see the two linked threads.
We will go with @fkiraly's recommendations, which are not reflected in the adding_new_models.md document just yet. In particular:

- If an algorithm predicts probabilities, there is no need to implement
a second `predict` method that predicts values (i.e., means, or values obtained by
applying a threshold, etc.). So, only one `predict` method per model. (We will
dump `predict_proba`.)
- The `predict` method will return a vector of distribution objects,
one for each input pattern. (To get the probability of a specific
outcome for the target, one will need to call the object on the
outcome of interest, as Franz explains in the first thread
above. However, your interface isn't concerned with this.) I admit I
haven't thought too much about the details of this yet, but hopefully
we can just use Distributions.jl for this purpose. I will be turning
to this question first thing when I return from holiday in the new year.
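To make the idea concrete, here is what consuming such predictions might look like with plain Distributions.jl objects (just a sketch of the idea, not a committed API):

```julia
using Distributions

# a probabilistic "prediction" is a vector of distribution objects,
# one per input pattern:
yhat = [Normal(0.5, 0.1), Normal(1.2, 0.3)]

mean.(yhat)        # point predictions: [0.5, 1.2]
pdf(yhat[1], 0.6)  # density of the first prediction evaluated at 0.6
cdf(yhat[1], 0.5)  # probability that y <= 0.5 under the first prediction (0.5 here)
```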
Do keep in mind that in the case of nominal target data, the target
`y` will arrive at your model as a `CategoricalArray` whose pool may
include levels that are not actually realized in
the data, but which need to be incorporated in the distribution object
(with zero probability if they do not occur); see also the
adding_new_models.md doc.
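For example (using `Distributions.Categorical` here purely for illustration; the actual distribution type is still to be decided):

```julia
using CategoricalArrays
using Distributions

# the pool can carry levels never realized in the data:
y = categorical(["yes", "no", "yes"], levels=["yes", "no", "maybe"])
levels(y)   # ["yes", "no", "maybe"]; "maybe" is in the pool but not the data

# a predicted distribution must cover every level in the pool,
# assigning zero probability to unrealized ones:
d = Categorical([0.7, 0.3, 0.0])  # probabilities for "yes", "no", "maybe"
probs(d)    # [0.7, 0.3, 0.0]
```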
Note that you will need a separate model for each kind of target data
/ response type, because each model `SomeModel` can only have one value for
`metadata(SomeModel)[:outputs_are]`. (To the possible values "nominal", "ordinal", "multiclass"
and "multivariate" we will now add "probabilistic", meaning
probabilities are to be predicted.) So you might have these models:

- `GLMProbabilisticRegressor`
- `GLMProbabilisticClassifier`
- `GLMProbabilisticMulticlassClassifier`

and limit the allowed options for the "family" and "link" options
accordingly. Perhaps don't worry about models for multivariate targets
just now.
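For instance, the classifier might hard-code the Bernoulli family and expose only the link as an option (a sketch only; the field and type names are illustrative, not the final interface):

```julia
import GLM
using Distributions: Bernoulli

# sketch: each model restricts "family" and "link" to suit its target type;
# names are illustrative, not the final interface
mutable struct GLMProbabilisticClassifier
    link::GLM.Link   # e.g. GLM.LogitLink() or GLM.ProbitLink()
end
GLMProbabilisticClassifier(; link=GLM.LogitLink()) = GLMProbabilisticClassifier(link)

# the family is fixed by the model type, so only the link is user-facing
function fit(model::GLMProbabilisticClassifier, verbosity::Int,
             X::AbstractMatrix, y::AbstractVector)
    fitresult = GLM.glm(X, y, Bernoulli(), model.link)
    return fitresult, nothing, nothing
end
```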
No need for an R-style "formula". Your model already gets separate input
`X` and target `y`, and you fit to all input features (columns) of
`X`. Feature selection will be external to the model interface.
from mlj.jl.
Oops. Closed by accident. :-)
from mlj.jl.
Ok, I'll start this and probably open a NO-MERGE PR for guidance while I get familiar with the interface and more comfortable with the goal
from mlj.jl.
Is there an example of how to use linear models with MLJ.jl? Can anyone please show me a simple example of fitting `y = ax + b`,
where `a` and `b` are coefficients? E.g. in GLM it would be:

```julia
using GLM, DataFrames
x = rand(100)
y = rand(100)
data = DataFrame(x=x, y=y)
lm(@formula(y ~ x), data)
```
from mlj.jl.
For now you can use OLS (ordinary least squares regressor) or `RidgeRegressor`. For example:

```julia
julia> using MLJ

julia> X = (x1=rand(100), x2=rand(100));  # input must be a Tables.jl compatible table

julia> y = rand(100);

julia> @load OLSRegressor  # load code from external packages

julia> model = OLSRegressor()  # instantiate model
OLSRegressor(fit_intercept = true,) @ 4…70

julia> mach = machine(model, X, y)  # bind model to train/evaluation data
Machine{OLSRegressor} @ 1…97

julia> fit!(mach, rows=1:95)  # fit on selected rows
[ Info: Training Machine{OLSRegressor} @ 1…97.
Machine{OLSRegressor} @ 1…97

julia> predict(mach, rows=96:100)  # get (probabilistic) predictions on some other rows
5-element Array{Distributions.Normal{Float64},1}:
 Distributions.Normal{Float64}(μ=0.5573871503802207, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5910371492542903, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.4871839625605999, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.6031116815100634, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5461718402936951, σ=0.2789162731813959)

julia> predict_mean(mach, rows=1:5)  # get point predictions
5-element Array{Float64,1}:
 0.5573871503802207
 0.5910371492542903
 0.4871839625605999
 0.6031116815100634
 0.5461718402936951

julia> predict_mean(mach, (x1=rand(4), x2=rand(4)))  # get point predictions on new input data
4-element Array{Float64,1}:
 0.5483367654825207
 0.5948051723537034
 0.4847273704563324
 0.5892571004039957
```
from mlj.jl.