Comments (6)
Hello @xiaodaigh, there's an ongoing PR to interface with GLM models, which I would think should be merged next week.
from mlj.jl.
In response to an offer of help from @tlienart. Some details:
How about you put your implementation of the MLJ "model interface" for
GLM.jl in a module that lives in `src/builtins/GLM.jl` (where we
currently have the toy `KNN.jl`), although your code will probably more
closely resemble the MultivariateStats.jl stub where I put the `RidgeRegressor`
model. (I think we will move away from lazily loaded interface
implementations; if it does not stay in builtins, your code might become a
separate package, or we might try to get GLM.jl to include your
interface in their code.)
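For concreteness, here is a rough sketch of what such a stub might look like, assuming the fit/predict contract described in adding_new_models.md (`fit` returns a `(fitresult, cache, report)` triple). The type name, the raw-matrix form of the input, and the residual-based estimate of σ are illustrative assumptions, not the settled interface:

```julia
# Rough sketch only: names, the raw-matrix input, and the sigma estimate
# are illustrative assumptions, not the settled interface.
module GLMModels

import GLM
using Distributions: Normal

mutable struct GLMProbabilisticRegressor
    fit_intercept::Bool
end
GLMProbabilisticRegressor(; fit_intercept=true) = GLMProbabilisticRegressor(fit_intercept)

# per the contract in adding_new_models.md, fit returns (fitresult, cache, report)
function fit(model::GLMProbabilisticRegressor, verbosity::Int,
             X::AbstractMatrix, y::AbstractVector)
    Xmat = model.fit_intercept ? hcat(ones(length(y)), X) : X
    fitresult = GLM.lm(Xmat, y)
    return fitresult, nothing, nothing
end

# probabilistic predictions: one Normal distribution per input pattern
function predict(model::GLMProbabilisticRegressor, fitresult, Xnew::AbstractMatrix)
    Xmat = model.fit_intercept ? hcat(ones(size(Xnew, 1)), Xnew) : Xnew
    mu = GLM.predict(fitresult, Xmat)
    sigma = sqrt(sum(abs2, GLM.residuals(fitresult)) / GLM.dof_residual(fitresult))
    return [Normal(m, sigma) for m in mu]
end

end # module
```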
I expect you will generally be predicting probabilities
rather than actual target values (this will probably be done in the
`RidgeRegressor` as well, but isn't at present). There has been some very
recent discussion about exactly what `predict` should return in these
cases; see the two linked threads.
We will go with @fkiraly's recommendations, which are not reflected in the adding_new_models.md document just yet. In particular:

- If an algorithm predicts probabilities, there is no need to implement
a second `predict` method that predicts values (i.e., means, or values obtained by
applying a threshold, etc.). So, only one `predict` method per model. (We will
dump `predict_proba`.)
- The `predict` method will return a vector of distribution objects,
one for each input pattern. (To get the probability of a specific
outcome for the target, one will need to call the object on the
outcome of interest, as Franz explains in the first thread
above. However, your interface isn't concerned with this.) I admit I
haven't thought too much about the details of this yet, but hopefully
we can just use Distributions.jl for this purpose. I will be turning
to this question first thing when I return from holiday in the new year.
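To make the idea concrete, here is what consuming such predictions might look like with plain Distributions.jl objects (just a sketch of the idea, not a committed API):

```julia
using Distributions

# a probabilistic "prediction" is a vector of distribution objects,
# one per input pattern:
yhat = [Normal(0.5, 0.1), Normal(1.2, 0.3)]

mean.(yhat)        # point predictions: [0.5, 1.2]
pdf(yhat[1], 0.6)  # density of the first prediction evaluated at 0.6
cdf(yhat[1], 0.5)  # probability that y <= 0.5 under the first prediction (0.5 here)
```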
Do keep in mind that in the case of nominal target data, the target
`y` will arrive at your model as a `CategoricalArray` whose pool may
include levels that are not actually realized in
the data, but which need to be incorporated in the distribution object
(with zero probability if they do not occur); see also the
adding_new_models.md doc.
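For example (using `Distributions.Categorical` here purely for illustration; the actual distribution type is still to be decided):

```julia
using CategoricalArrays
using Distributions

# the pool can carry levels never realized in the data:
y = categorical(["yes", "no", "yes"], levels=["yes", "no", "maybe"])
levels(y)   # ["yes", "no", "maybe"]; "maybe" is in the pool but not the data

# a predicted distribution must cover every level in the pool,
# assigning zero probability to unrealized ones:
d = Categorical([0.7, 0.3, 0.0])  # probabilities for "yes", "no", "maybe"
probs(d)    # [0.7, 0.3, 0.0]
```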
Note that you will need a separate model for each kind of target data
/ response type, because each model `SomeModel` can only have one value for
`metadata(SomeModel)[:outputs_are]`. (To the possible values "nominal", "ordinal", "multiclass"
and "multivariate" we will now add "probabilistic", meaning
probabilities are to be predicted.) So you might have these models:

- `GLMProbabilisticRegressor`
- `GLMProbabilisticClassifier`
- `GLMProbabilisticMulticlassClassifier`

and limit the allowed options for the "family" and "link" options
accordingly. Perhaps don't worry about models for multivariate targets
just now.
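For instance, the classifier might hard-code the Bernoulli family and expose only the link as an option (a sketch only; the field and type names are illustrative, not the final interface):

```julia
import GLM
using Distributions: Bernoulli

# sketch: each model restricts "family" and "link" to suit its target type;
# names are illustrative, not the final interface
mutable struct GLMProbabilisticClassifier
    link::GLM.Link   # e.g. GLM.LogitLink() or GLM.ProbitLink()
end
GLMProbabilisticClassifier(; link=GLM.LogitLink()) = GLMProbabilisticClassifier(link)

# the family is fixed by the model type, so only the link is user-facing
function fit(model::GLMProbabilisticClassifier, verbosity::Int,
             X::AbstractMatrix, y::AbstractVector)
    fitresult = GLM.glm(X, y, Bernoulli(), model.link)
    return fitresult, nothing, nothing
end
```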
No need for an R-style "formula". Your model already gets separate input
`X` and target `y`, and you fit to all input features (columns) of
`X`. Feature selection will be external to the model interface.
from mlj.jl.
Oops. Closed by accident. :-)
from mlj.jl.
Ok, I'll start this and probably open a NO-MERGE PR for guidance while I get familiar with the interface and more comfortable with the goal
from mlj.jl.
Is there an example of how to use linear models with MLJ.jl? Can anyone please show me a simple example of fitting `y = ax + b`,
where `a` and `b` are coefficients? E.g. in GLM it would be:

```julia
using GLM, DataFrames
x = rand(100)
y = rand(100)
data = DataFrame(x=x, y=y)
lm(@formula(y ~ x), data)
```
from mlj.jl.
For now you can use OLS (ordinary least squares regressor) or `RidgeRegressor`. For example:

```julia
julia> using MLJ

julia> X = (x1=rand(100), x2=rand(100));  # input must be a Tables.jl compatible table

julia> y = rand(100);

julia> @load OLSRegressor  # load code from external packages

julia> model = OLSRegressor()  # instantiate model
OLSRegressor(fit_intercept = true,) @ 4…70

julia> mach = machine(model, X, y)  # bind model to train/evaluation data
Machine{OLSRegressor} @ 1…97

julia> fit!(mach, rows=1:95)  # fit on selected rows
[ Info: Training Machine{OLSRegressor} @ 1…97.
Machine{OLSRegressor} @ 1…97

julia> predict(mach, rows=96:100)  # get (probabilistic) predictions on some other rows
5-element Array{Distributions.Normal{Float64},1}:
 Distributions.Normal{Float64}(μ=0.5573871503802207, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5910371492542903, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.4871839625605999, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.6031116815100634, σ=0.2789162731813959)
 Distributions.Normal{Float64}(μ=0.5461718402936951, σ=0.2789162731813959)

julia> predict_mean(mach, rows=1:5)  # get point predictions
5-element Array{Float64,1}:
 0.5573871503802207
 0.5910371492542903
 0.4871839625605999
 0.6031116815100634
 0.5461718402936951

julia> predict_mean(mach, (x1=rand(4), x2=rand(4)))  # get point predictions on new input data
4-element Array{Float64,1}:
 0.5483367654825207
 0.5948051723537034
 0.4847273704563324
 0.5892571004039957
```
from mlj.jl.