
Comments (6)

tlienart avatar tlienart commented on May 14, 2024 1

Hello @xiaodaigh , there's an ongoing PR to interface with GLM models which should be merged next week I would think.

from mlj.jl.

ablaom avatar ablaom commented on May 14, 2024

In response to an offer of help from @tlienart. Some details:

How about you put your implementation of the MLJ "model interface" for
GLM.jl in a module that lives in 'src/builtins/GLM.jl' (where we
currently have the toy "KNN.jl"), although your code will probably more
closely resemble the MultivariateStats.jl stub where I put the RidgeRegressor
model. (I think we will move away from lazily loaded interface
implementations; if it does not stay in builtins, your code might become a
separate package or, we might try to get GLM.jl to include your
interface in their code.)

I expect you will generally be predicting probabilities
rather than actual target values (this will probably be done in the
RidgeRegressor as well, but isn't at present). There has been some very
recent discussion about exactly what predict should return in these
cases; see

issue 34

and

issue 33

We will go with @fkiraly's recommendations, which are not reflected in the adding_new_models.md document just yet. In particular:

  • if an algorithm predicts probabilities, there is no need to implement
    a second predict method that predicts values (i.e., means, or values
    obtained by applying a threshold, etc.). So only one predict method
    per model. (We will dump predict_proba.)

  • the predict method will predict a vector of distribution-objects,
    one for each input pattern. (To get the probability of a specific
    outcome for the target one will need to call the object on the
    outcome of interest, as Franz explains in the first thread
    above. However, your interface isn't concerned with this.) I admit I
    haven't thought too much about the details of this yet but hopefully
    we can just use Distributions.jl for this purpose. I will be turning
    to this question first thing when I return from holiday in the new year.
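To make the proposal above concrete, here is a minimal sketch using Distributions.jl (an assumption on my part; MLJ's final design may differ). Each prediction is a distribution object, and the probability of a specific outcome is obtained by querying that object directly, with no separate predict_proba:

```julia
using Distributions

# One distribution object per input pattern (values illustrative):
predictions = [Normal(0.56, 0.28), Normal(0.59, 0.28)]

# Query the returned objects for whatever is needed:
pdf(predictions[1], 0.5)    # density of the first prediction at 0.5
cdf(predictions[2], 0.59)   # P(y ≤ 0.59) under the second prediction
mean(predictions[1])        # a point prediction, if one is wanted
```

The interface implementer only has to return the vector of distributions; how users extract probabilities or point values from them is MLJ's concern.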

Do keep in mind that in the case of nominal target data, the target
y will arrive to your model as a CategoricalArray which includes
levels in its pool that may or may not actually be realized in
the data, but which need to be incorporated in the distribution object
(with zero probability if they do not occur); see also the
adding_new_models.md doc.
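A small sketch of the point about pools, using CategoricalArrays.jl (the probability-table construction at the end is hypothetical and not MLJ API; it only illustrates covering every pooled level):

```julia
using CategoricalArrays

# The pool can contain levels that never occur in the data itself:
y = categorical(["yes", "no", "yes"], levels=["yes", "no", "maybe"])

levels(y)   # includes the unrealized level "maybe"

# Any fitted distribution over the target must cover every pooled level,
# assigning zero probability to those that never occur:
probs = Dict(l => count(==(l), y) / length(y) for l in levels(y))
```

Here probs["maybe"] comes out as 0.0 even though "maybe" never appears in y.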

Note that you will need a separate model for each kind of target data
/ response type because each model SomeModel can only have one value for
metadata(SomeModel)[:outputs_are]. (To the possible values "nominal", "ordinal", "multiclass"
and "multivariate" we will now add "probabilistic", meaning
probabilities are to be predicted.) So you might have these models:

GLMProbabilisticRegressor
GLMProbabilisticClassifier
GLMProbabilisticMulticlassClassifier

and limit the allowed options for the "family" and "link" options
accordingly. Perhaps not worry about models for multivariate targets
just now.

No need for an R-style "formula". Your model already gets separate input
X and target y and you fit to all input features (columns) of
X. Feature selection will be external to the model interface.
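As a sketch of this calling convention (all names illustrative; the point is only the X/y separation):

```julia
# Instead of an R-style formula, the model interface receives the input
# table and the target separately:
X = (x1 = rand(100), x2 = rand(100))   # any Tables.jl-compatible table
y = rand(100)

# All columns of X are used as features; feature selection happens by
# transforming X *before* it reaches the model, e.g.
Xsub = (x1 = X.x1,)                    # external "feature selection"
```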


ablaom avatar ablaom commented on May 14, 2024

Oops. Closed by accident. :-)


tlienart avatar tlienart commented on May 14, 2024

OK, I'll start on this and will probably open a NO-MERGE PR for guidance while I get familiar with the interface and more comfortable with the goal.


xiaodaigh avatar xiaodaigh commented on May 14, 2024

Is there an example of how to use linear models with MLJ.jl? Can anyone please show me a simple example of fitting y = ax+b where a and b are coefficients? E.g. in GLM it would be

using GLM, DataFrames
x = rand(100)
y = rand(100)
data = DataFrame(x=x, y=y)
lm(@formula(y ~ x), data)


ablaom avatar ablaom commented on May 14, 2024

For now you can use OLS (ordinary least squares regressor) or RidgeRegressor. For example:

julia> using MLJ
julia> X = (x1=rand(100), x2=rand(100));   # input must be a Tables.jl compatible table
julia> y = rand(100);
julia> @load OLSRegressor          # load code from external packages

julia> model =  OLSRegressor()   # instantiate model
OLSRegressor(fit_intercept = true,) @ 470

julia> mach = machine(model, X, y)  # bind model to train/evaluation data 
Machine{OLSRegressor} @ 197

julia> fit!(mach, rows=1:95)    # fit on selected rows
[ Info: Training Machine{OLSRegressor} @ 197.
Machine{OLSRegressor} @ 197

julia> predict(mach, rows=96:100) # get (probabilistic) predictions on some other rows
5-element Array{Distributions.Normal{Float64},1}:
 Distributions.Normal{Float64}(μ=0.5573871503802207, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5910371492542903, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.4871839625605999, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.6031116815100634, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5461718402936951, σ=0.2789162731813959)

julia> predict_mean(mach, rows=1:5) # get point predictions
5-element Array{Float64,1}:
 0.5573871503802207
 0.5910371492542903
 0.4871839625605999
 0.6031116815100634
 0.5461718402936951

julia> predict_mean(mach, (x1=rand(4), x2=rand(4)))  # get point predictions on new input data
4-element Array{Float64,1}:
 0.5483367654825207
 0.5948051723537034
 0.4847273704563324
 0.5892571004039957
