Deterministic vs Probabilistic predictions for ensemble models
For each atomic model type `Atom`, we have an ensemble model type `Ensemble{Atom}`. (I'm omitting the fitresult type parameter.) Here's a proposal for ensembles:

If `Atom` is `Deterministic`, then so is `Ensemble{Atom}`. Predicting a distribution does not make any sense here, as far as I can see. The variability of the individual atomic predictions is an artifact of the algorithm used to generate the ensemble (e.g. bagging) and does not directly reflect the uncertainty of the ensemble model's final prediction. (Edit: That said, the random forest classifier in scikit-learn and elsewhere predicts a probability and not the ensemble mode.)
If `Atom` is `Probabilistic`, there are two cases:

(i) `Atom` has a nominal target. In this case it makes sense to average the discrete probability distributions (i.e., average the underlying measures), and so `Ensemble{Atom}` is also `Probabilistic`.
(ii) `Atom` has a numeric target. In this case averaging the measures almost always delivers a distribution of a different form from that of the atoms (even in the normal case: averaging the measures is not the same thing as averaging the associated random variables, and the latter doesn't make sense to me here). So I guess we make `Ensemble{Atom}` `Deterministic` in this case. We could take the mean of the means to get a point estimate, or randomly sample each atom's predicted probability distribution (exactly once, at the time of fitting the ensemble) and take the mean of these samples?
What do others think?
from mlj.jl.
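The combination rules in the proposal above can be sketched as follows (in Python rather than Julia, purely for illustration; the function names are invented and are not MLJ API):

```python
import numpy as np

def combine_deterministic(point_preds):
    """Ensemble of Deterministic atoms: average the point predictions."""
    return np.mean(point_preds, axis=0)

def combine_probabilistic_nominal(prob_preds):
    """Ensemble of Probabilistic atoms with a nominal target: average the
    discrete probability vectors (i.e., average the underlying measures)."""
    avg = np.mean(prob_preds, axis=0)
    return avg / avg.sum(axis=-1, keepdims=True)  # guard against rounding drift

# Three atoms' probability predictions for one input, three classes:
preds = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.3, 0.2],
                  [0.6, 0.3, 0.1]])
print(combine_probabilistic_nominal(preds))
```

The averaged vector is again a probability distribution over the classes, which is why case (i) can remain `Probabilistic`.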
Further to (ii): I suppose that in the normal case, if we don't expect large variations, we could just approximate the averaged pdf by a normal one. The mean of this approximation would be the mean of the means. What is the most natural way to combine the standard deviations?
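One natural answer to the standard-deviation question: treat the averaged pdf as an equal-weight mixture. Its mean is the mean of the means, and its variance follows from the law of total variance, i.e., the mean of the atoms' variances plus the variance of the atoms' means. A quick numerical check (Python, illustrative only):

```python
import numpy as np

# Equal-weight mixture of normal atoms N(mu_i, sigma_i^2).
mus = np.array([1.0, 2.0, 3.0])
sigmas = np.array([0.5, 0.7, 0.6])

mix_mean = mus.mean()                      # mean of the means
mix_var = (sigmas**2).mean() + mus.var()   # law of total variance: E[Var] + Var[E]

# Monte Carlo check: sample each atom equally often and pool the samples.
rng = np.random.default_rng(0)
samples = np.concatenate(
    [rng.normal(m, s, 200_000) for m, s in zip(mus, sigmas)])
print(mix_mean, np.sqrt(mix_var))    # exact mixture mean / sd
print(samples.mean(), samples.std()) # should agree closely
```

So a normal approximation would use the mean of the means and this combined variance, not, say, the average of the standard deviations.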
"If Atom is Deterministic, then so is Ensemble{Atom}."
"Predicting a distribution does not make any sense here, as far as I can see. "
Hm, but it could make sense to attach a transformer to the deterministic ensemble of predictions to make it into a distribution (which need not be identical)? I.e., instead of using the re-sampled distribution as a probabilistic prediction, one could use it as a (distributional) feature in fitting one?
Though this is a bit hypothetical and non-standard.
But 150% agreed that the re-sampled distribution is not a good probabilistic supervised prediction in general. A lot of people make that mistake - I applaud you for being aware of the issue.
The natural way to do bagging on a probabilistic estimator is to average the pdfs; see section 6.3 of https://arxiv.org/abs/1801.00753
(again, a round of applause for @ablaom)
In case (ii), I would still average - in skpro, we've used a mixture distribution type for this.
Of course you may want to fuse this with a transformer-adaptor which makes it simple parametric, e.g., normal, with the mixture distribution's mean and variance. I'd consider this a composite strategy, though, not the "natural" bagging ensembler.
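The claim that bagged probabilistic prediction should average the pdfs can be made concrete with a minimal mixture-of-normals sketch (Python, purely illustrative; skpro and Distributions.jl provide full-featured mixture types): the mixture pdf is simply the weighted average of the component pdfs, and with uniform weights this is exactly the average of the atoms' pdfs.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, components, weights=None):
    """Density of a mixture of normals; uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(components)] * len(components)
    return sum(w * normal_pdf(x, mu, s)
               for w, (mu, s) in zip(weights, components))

# Two "atoms" predicting N(0, 1) and N(1, 4):
atoms = [(0.0, 1.0), (1.0, 2.0)]
print(mixture_pdf(0.5, atoms))
```

Note the result is a genuine distribution (it integrates to one), so case (ii) can stay `Probabilistic` with a mixture target type.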
PS: the expectation/mean of samples from the atoms' pdfs is the same as the mean of the atoms' pdfs' means
Actually, come to think of it,
"Hm, but it could make sense to attach a transformer to the deterministic ensemble of predictions to make into a distribution (which need not be identical)? I.e., instead of using the re-sampled distribution as a probabilistic prediction, one could use it as a (distributional) feature in fitting one?"
is not non-standard:
probability calibration is an instance of this!
https://scikit-learn.org/stable/modules/calibration.html
The natural type of probability calibration is a distribution->distribution target transformer.
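For concreteness, the simplest instance of such a calibration transformer is Platt scaling: fit a one-dimensional logistic map from the classifier's raw scores to calibrated probabilities on held-out data. A hedged sketch (Python; scikit-learn's `CalibratedClassifierCV` wraps the same idea, but this gradient-descent fit is just for illustration):

```python
import numpy as np

def platt_fit(scores, labels, lr=0.1, steps=5000):
    """Fit sigmoid(a*s + b) to 0/1 labels by minimizing log loss
    with plain gradient descent (not how sklearn implements it)."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + b)))
        grad = p - labels                  # d(log loss)/d(logit)
        a -= lr * np.mean(grad * scores)
        b -= lr * np.mean(grad)
    return a, b

def platt_predict(scores, a, b):
    return 1.0 / (1.0 + np.exp(-(a * scores + b)))

# Toy held-out set with overconfident raw scores:
scores = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 1, 0, 1, 1])
a, b = platt_fit(scores, labels)
print(platt_predict(scores, a, b))
```

The fitted map is monotone in the score, so it reorders nothing; it only reshapes point scores into better-calibrated probabilities, which is the distribution-to-distribution transformer idea in miniature.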
I see that Distributions has mixture models, yay. So I'll look into that for ensembles of numeric probabilistic models.
Well, that should make it pretty easy then?
Yes, now done. Here is the current doc string for the basic ensembling, as implemented:
    EnsembleModel(atom=nothing, weights=Float64[], bagging_fraction=0.8, rng_seed=0, n=100, parallel=true)

Create a model for training an ensemble of `n` learners, with optional
bagging, each with associated model `atom`. Useful if
`fit!(machine(atom, data...))` does not create identical models on
repeated calls (i.e., is a stochastic model, such as a decision tree
with randomized node selection criteria), or if `bagging_fraction` is
set to a value not equal to 1.0 (or both). The constructor fails if no
`atom` is specified.

Predictions are weighted according to the vector `weights` (to allow
for external optimization), except in the case that `atom` is a
`Deterministic` classifier. Uniform weights are used if `weights` has
zero length.

The ensemble model is `Deterministic` or `Probabilistic`, according to
the corresponding supertype of `atom`. In the case of classifiers, the
predictions are majority votes, and for regressors they are ordinary
averages. Probabilistic predictions are obtained by averaging the
atomic probability distribution functions; in particular, for
regressors, the ensemble prediction on each input pattern has the type
`MixtureModel{VF,VS,D}` from the Distributions.jl package, where `D`
is the type of predicted distribution for `atom`.
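The semantics described in the doc string (bagged subsampling, uniform or supplied weights, averaged predictions) can be mimicked in a few lines. A Python sketch with a stub constant-predicting "atom"; all names and signatures here are invented for illustration and none of this is the MLJ implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class MeanRegressor:
    """Stub deterministic atom: predicts the training-target mean."""
    def fit(self, X, y):
        self.c = y.mean()
        return self
    def predict(self, X):
        return np.full(len(X), self.c)

def fit_bagged_ensemble(atom_factory, X, y, n=100, bagging_fraction=0.8):
    """Fit n atoms, each on a random bagging_fraction subsample."""
    m = int(round(bagging_fraction * len(X)))
    atoms = []
    for _ in range(n):
        idx = rng.choice(len(X), size=m, replace=False)
        atoms.append(atom_factory().fit(X[idx], y[idx]))
    return atoms

def predict_ensemble(atoms, X, weights=None):
    """Weighted average of atomic predictions; uniform if no weights."""
    preds = np.stack([a.predict(X) for a in atoms])
    if weights is None:
        return preds.mean(axis=0)
    w = np.asarray(weights) / np.sum(weights)
    return w @ preds

X = np.arange(10.0).reshape(-1, 1)
y = 2 * X.ravel() + 1
atoms = fit_bagged_ensemble(MeanRegressor, X, y, n=50)
print(predict_ensemble(atoms, X)[:3])
```

Because the stub atom is deterministic, the ensemble prediction is a plain average of point predictions; the probabilistic case would instead collect the atoms' distributions into a mixture, as the doc string describes.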