Deterministic vs Probabilistic predictions for ensemble models
For each atomic model type `Atom`, we have an ensemble model type `Ensemble{Atom}`. (I'm omitting the fitresult type parameter.) Here's a proposal for ensembles:

If `Atom` is `Deterministic`, then so is `Ensemble{Atom}`. Predicting a distribution does not make any sense here, as far as I can see. The variability of the individual atomic predictions is an artifact of the algorithm used to generate the ensemble (e.g. bagging) and does not directly reflect the uncertainty of the ensemble model's final prediction. (Edit: That said, the random forest classifier in scikit-learn and elsewhere predicts a probability and not the ensemble mode.)
If `Atom` is `Probabilistic`, there are two cases:

(i) `Atom` has a nominal target. In this case it makes sense to average the discrete probability distributions (i.e., average the underlying measures), and so `Ensemble{Atom}` is also `Probabilistic`.
(ii) `Atom` has a numeric target. In this case averaging the measures almost always delivers a distribution of a different form from that of the atoms (even in the normal case: averaging the measures is not the same thing as averaging the associated random variables, and the latter doesn't make sense to me here). So I guess we make `Ensemble{Atom}` `Deterministic` in this case. We could take the mean of the means to get a point estimate, or randomly sample each atom's predicted probability distribution (exactly once, at the time of fitting the ensemble) and take the mean of these samples?
What do others think?
from mlj.jl.
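The combination rules in the proposal above can be sketched as follows (in Python rather than Julia, purely for illustration; the function names are invented and are not MLJ API):

```python
import numpy as np

def combine_deterministic(point_preds):
    """Ensemble of Deterministic atoms: average the point predictions."""
    return np.mean(point_preds, axis=0)

def combine_probabilistic_nominal(prob_preds):
    """Ensemble of Probabilistic atoms with a nominal target: average the
    discrete probability vectors (i.e., average the underlying measures)."""
    avg = np.mean(prob_preds, axis=0)
    return avg / avg.sum(axis=-1, keepdims=True)  # guard against rounding drift

# Three atoms' probability predictions for one input, three classes:
preds = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.6, 0.3, 0.1]])
print(combine_probabilistic_nominal(preds))
```

The averaged vector is again a probability distribution over the classes, which is why case (i) can remain `Probabilistic`.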
Further to (ii): I suppose that in the normal case, if we don't expect large variations, we could just approximate the averaged pdf by a normal one. The mean of this approximation would be the mean of the means. What is the most natural way to combine the standard deviations?
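One natural answer to the standard-deviation question: treat the averaged pdf as an equal-weight mixture. Its mean is the mean of the means, and its variance follows from the law of total variance, i.e., the mean of the atoms' variances plus the variance of the atoms' means. A quick numerical check (Python, illustrative only):

```python
import numpy as np

# Equal-weight mixture of normal atoms N(mu_i, sigma_i^2).
mus = np.array([1.0, 2.0, 3.0])
sigmas = np.array([0.5, 0.7, 0.6])

mix_mean = mus.mean()                      # mean of the means
mix_var = (sigmas**2).mean() + mus.var()   # law of total variance: E[Var] + Var[E]

# Monte Carlo check: sample each atom equally often and pool the samples.
rng = np.random.default_rng(0)
samples = np.concatenate(
    [rng.normal(m, s, 200_000) for m, s in zip(mus, sigmas)])
print(mix_mean, np.sqrt(mix_var))    # exact mixture mean / sd
print(samples.mean(), samples.std()) # should agree closely
```

So a normal approximation would use the mean of the means and this combined variance, not, say, the average of the standard deviations.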
"If Atom is Deterministic, then so is Ensemble{Atom}."
"Predicting a distribution does not make any sense here, as far as I can see. "
Hm, but it could make sense to attach a transformer to the deterministic ensemble of predictions to make it into a distribution (which need not be identical)? I.e., instead of using the re-sampled distribution as a probabilistic prediction, one could use it as a (distributional) feature in fitting one?
Though this is a bit hypothetical and non-standard.
But 150% agreed that the re-sampled distribution is not a good probabilistic supervised prediction in general. A lot of people make that mistake - I applaud you for being aware of the issue.
The natural way to do bagging on a probabilistic estimator is to average the pdfs; see section 6.3 of https://arxiv.org/abs/1801.00753
(again, a round of applause for @ablaom)
In case (ii), I would still average - in skpro, we've used a mixture distribution type for this.
Of course you may want to fuse this with a transformer-adaptor which makes it simple parametric, e.g., normal, with the mixture distribution's mean and variance. I'd consider this a composite strategy, though, not the "natural" bagging ensembler.
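The claim that bagged probabilistic prediction should average the pdfs can be made concrete with a minimal mixture-of-normals sketch (Python, purely illustrative; skpro and Distributions.jl provide full-featured mixture types): the mixture pdf is simply the weighted average of the component pdfs, and with uniform weights this is exactly the average of the atoms' pdfs.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, components, weights=None):
    """Density of a mixture of normals; uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(components)] * len(components)
    return sum(w * normal_pdf(x, mu, s)
               for w, (mu, s) in zip(weights, components))

# Two "atoms" predicting N(0, 1) and N(1, 4):
atoms = [(0.0, 1.0), (1.0, 2.0)]
print(mixture_pdf(0.5, atoms))
```

Note the result is a genuine distribution (it integrates to one), so case (ii) can stay `Probabilistic` with a mixture target type.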
PS: the expectation/mean of samples from the atoms' pdfs is the same as the mean of the atoms' pdfs' means
Actually, come to think of it,
"Hm, but it could make sense to attach a transformer to the deterministic ensemble of predictions to make into a distribution (which need not be identical)? I.e., instead of using the re-sampled distribution as a probabilistic prediction, one could use it as a (distributional) feature in fitting one?"
is not non-standard:
probability calibration is an instance of this!
https://scikit-learn.org/stable/modules/calibration.html
The natural type of probability calibration is a distribution->distribution target transformer.
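For concreteness, the simplest instance of such a calibration transformer is Platt scaling: fit a one-dimensional logistic map from the classifier's raw scores to calibrated probabilities on held-out data. A hedged sketch (Python; scikit-learn's `CalibratedClassifierCV` wraps the same idea, but this gradient-descent fit is just for illustration):

```python
import numpy as np

def platt_fit(scores, labels, lr=0.1, steps=5000):
    """Fit sigmoid(a*s + b) to 0/1 labels by minimizing log loss
    with plain gradient descent (not how sklearn implements it)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                  # d(log loss)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

def platt_predict(scores, a, b):
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))

# Toy held-out set with overconfident raw scores:
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 1, 0, 1, 1])
a, b = platt_fit(scores, labels)
print(platt_predict(scores, a, b))
```

The fitted map is monotone in the score, so it reorders nothing; it only reshapes point scores into better-calibrated probabilities, which is the distribution-to-distribution transformer idea in miniature.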
I see that Distributions has mixture models, yay. So I'll look into that for ensembles of numeric probabilistic models.
Well, that should make it pretty easy then?
Yes, now done. Here is the current doc string for the basic ensembling, as implemented:
    EnsembleModel(atom=nothing, weights=Float64[], bagging_fraction=0.8, rng_seed=0, n=100, parallel=true)

Create a model for training an ensemble of `n` learners, with optional
bagging, each with associated model `atom`. Useful if
`fit!(machine(atom, data...))` does not create identical models on
repeated calls (i.e., is a stochastic model, such as a decision tree
with randomized node selection criteria), or if `bagging_fraction` is
set to a value not equal to 1.0 (or both). The constructor fails if no
`atom` is specified.

Predictions are weighted according to the vector `weights` (to allow
for external optimization), except in the case that `atom` is a
`Deterministic` classifier. Uniform weights are used if `weights` has
zero length.

The ensemble model is `Deterministic` or `Probabilistic`, according to
the corresponding supertype of `atom`. In the case of classifiers, the
predictions are majority votes, and for regressors they are ordinary
averages. Probabilistic predictions are obtained by averaging the
atomic probability distribution functions; in particular, for
regressors, the ensemble prediction on each input pattern has the type
`MixtureModel{VF,VS,D}` from the Distributions.jl package, where `D`
is the type of predicted distribution for `atom`.
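The semantics described in the doc string (bagged subsampling, uniform or supplied weights, averaged predictions) can be mimicked in a few lines. A Python sketch with a stub constant-predicting "atom"; all names and signatures here are invented for illustration and none of this is the MLJ implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class MeanRegressor:
    """Stub deterministic atom: predicts the training-target mean."""
    def fit(self, X, y):
        self.c = y.mean()
        return self
    def predict(self, X):
        return np.full(len(X), self.c)

def fit_bagged_ensemble(atom_factory, X, y, n=100, bagging_fraction=0.8):
    """Fit n atoms, each on a random bagging_fraction subsample."""
    m = int(round(bagging_fraction * len(X)))
    atoms = []
    for _ in range(n):
        idx = rng.choice(len(X), size=m, replace=False)
        atoms.append(atom_factory().fit(X[idx], y[idx]))
    return atoms

def predict_ensemble(atoms, X, weights=None):
    """Weighted average of atomic predictions; uniform if no weights."""
    preds = np.stack([a.predict(X) for a in atoms])
    if weights is None:
        return preds.mean(axis=0)
    w = np.asarray(weights) / np.sum(weights)
    return w @ preds

X = np.arange(10.0).reshape(-1, 1)
y = 2 * X.ravel() + 1
atoms = fit_bagged_ensemble(MeanRegressor, X, y, n=50)
print(predict_ensemble(atoms, X)[:3])
```

Because the stub atom is deterministic, the ensemble prediction is a plain average of point predictions; the probabilistic case would instead collect the atoms' distributions into a mixture, as the doc string describes.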