Comments (4)
Very interesting. For anyone looking for a mathematical reference, the Annals article is available on arXiv: https://arxiv.org/abs/0912.4554.
I am intrigued. Please confirm whether I understand this correctly:
- Lambert transformed distributions are actually dependent objects, i.e., the Lambert transform of a distribution D, which makes them compositional
- the practical intention of introducing them is making distributions more normal for modelling, which is a common assumption in machine learning
If I understand correctly, there are also multiple related "objects":
- the transformation itself, operating on empirical samples (matrices)
- the distribution depending on another distribution that gets transformed
- a regressor that applies the sample transformation to data, and inverts the transformation on a distribution that is predicted
The last one especially is related to the "transformed distribution" proposed in #30.
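To make the first object concrete, here is a minimal sketch of the skewed transformation on plain samples, assuming the location-scale form y = mu + sigma * u * exp(gamma * u) with u = (x - mu) / sigma. The function names `lambert_w0`, `skew` and `unskew` are made up for illustration and are not part of skpro; the inverse uses only the principal branch of Lambert W, so it recovers the input only where gamma * u >= -1.

```python
import math


def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch of the Lambert W function for x >= -1/e (Newton iteration)."""
    if x < -1.0 / math.e:
        raise ValueError("principal branch requires x >= -1/e")
    # Both starting points lie to the right of the root, where Newton's
    # method on w * exp(w) - x converges monotonically.
    w = math.log1p(x) if x >= 0 else x
    for _ in range(max_iter):
        ew = math.exp(w)
        denom = ew * (w + 1.0)
        if denom == 0.0:
            break
        step = (w * ew - x) / denom
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def skew(xs, mu, sigma, gamma):
    """Forward transform: standardise, apply u * exp(gamma * u), rescale."""
    out = []
    for x in xs:
        u = (x - mu) / sigma
        out.append(mu + sigma * u * math.exp(gamma * u))
    return out


def unskew(ys, mu, sigma, gamma):
    """Inverse transform via the principal branch: u = W(gamma * z) / gamma."""
    if gamma == 0:
        return list(ys)
    out = []
    for y in ys:
        z = (y - mu) / sigma
        out.append(mu + sigma * lambert_w0(gamma * z) / gamma)
    return out


xs = [-1.0, -0.3, 0.0, 0.7, 2.5]
ys = skew(xs, mu=0.5, sigma=2.0, gamma=0.5)
back = unskew(ys, mu=0.5, sigma=2.0, gamma=0.5)
```

In this sketch mu, sigma and gamma are passed in by hand; estimating them from data is a separate step that is not shown here.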
from skpro.
I'd be happy to open a PR to implement a first version of Lambert W x Gaussian distributions, but would like some guidance/pointers on best practices for skpro.
Thanks, that would be nice!
skpro generally follows sklearn extension patterns. The distribution extension contract is not that well documented at the moment, as it is still maturing. You could, however, look at the classes in distributions; all methods have proper docstrings. Perhaps Normal is the best template for now.
The one thing to note, perhaps, is that distributions are of matrix/table shape, i.e., a matrix/table with distributions (possibly dependent but usually independent) as entries. This is because in tabular probabilistic regression, this object is the output.
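As a rough picture of what "matrix/table shape" means, the following plain-Python stand-in builds a table whose entries are independent normal distributions, using `statistics.NormalDist` from the standard library. It is not skpro's Normal class; the helpers `normal_table`, `table_shape` and `table_mean` are hypothetical names used only for this illustration.

```python
from statistics import NormalDist


def normal_table(mu_rows, sigma):
    """Build a 2D table whose entries are independent normal distributions."""
    return [[NormalDist(mu, sigma) for mu in row] for row in mu_rows]


def table_shape(table):
    """Return (n_rows, n_columns) of the table of distributions."""
    return (len(table), len(table[0]) if table else 0)


def table_mean(table):
    """Entry-wise mean, returned with the same table layout."""
    return [[dist.mean for dist in row] for row in table]


# 3 rows (e.g. one per test instance) x 2 columns (e.g. one per target variable)
dists = normal_table([[0, 1], [2, 3], [4, 5]], sigma=1.0)
shape = table_shape(dists)
means = table_mean(dists)
```

A probabilistic regressor's prediction for three test rows and two targets would have this layout: one distribution per cell, with methods such as the mean evaluated entry-wise.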
Questions:
- would it not be nicer to have Lambert W x any distribution? Or, are the transforms of Gaussians more explicit than the arbitrary case?
- this would be representable in the interface, it would be a distribution that takes another distribution as argument.
@fkiraly yes to all your points in first reply.
re 2nd: yes, implementing Lambert W x Gaussian shouldn't be much different from just implementing a Lambert W x F abstract class and then inheriting/setting base_distribution=Gaussian. This is what I ended up doing for torchlambertw as well as the xgboostlss implementations.
I need to get more familiar with skpro first to see how this would actually work in this framework. Will take a look at this and see if I run into any issues trying to implement the generic LambertWDistribution class first, with LambertWGaussian, LambertWExponential, etc. as special cases.
shouldn't be much different from just implementing a Lambert W x F abstract class and then inheriting/setting base_distribution=Gaussian
I see!
I need to get more familiar with skpro first to see how this would actually work in this framework.
I would recommend looking at distributions.normal for an example. We have not gotten round to writing an extension template, but I hope the structure is self-explanatory.
Will take a look at this and see if I run into any issues trying to implement the generic LambertWDistribution class first, with LambertWGaussian, LambertWExponential
The way I imagined it would be something along the lines of:
any_inner_dist = InnerDist(a=a_arr, b=b_arr)
lambert_trafo_dist = LambertW(any_inner_dist, gamma=0.5)
That is, any distribution can be taken as an argument of LambertW; what is passed is the actual distribution, not a string.
In the example, InnerDist could be Gaussian or Laplace or anything else, and it provides the methods that all distributions have. Do you think it can be implemented in this high degree of generality, or do we need to make case distinctions for inner distributions, e.g., due to limitations in our knowledge of the explicit form of distribution generating functions?
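To make the question concrete, here is a minimal sketch of the compositional design in plain Python. It assumes the skewed form y = mu + sigma * u * exp(gamma * u) with u = (x - mu) / sigma, and it only covers sampling, which needs nothing from the inner distribution beyond its mean, standard deviation and a sampler. The class below is hypothetical and not skpro API; `statistics.NormalDist` stands in for the inner distribution.

```python
import math
from statistics import NormalDist


class LambertW:
    """Hypothetical skewed Lambert W x F wrapper around an inner distribution.

    The inner object is assumed to expose `mean`, `stdev` and
    `samples(n, seed=...)`, as `statistics.NormalDist` does.
    """

    def __init__(self, inner, gamma=0.0):
        self.inner = inner
        self.gamma = gamma

    def samples(self, n, seed=None):
        """Draw from the inner distribution, then push each draw through the transform."""
        mu, sigma = self.inner.mean, self.inner.stdev
        out = []
        for x in self.inner.samples(n, seed=seed):
            u = (x - mu) / sigma
            out.append(mu + sigma * u * math.exp(self.gamma * u))
        return out


inner = NormalDist(mu=1.0, sigma=2.0)
base = inner.samples(5, seed=0)
unchanged = LambertW(inner, gamma=0.0).samples(5, seed=0)
skewed = LambertW(inner, gamma=0.5).samples(5, seed=0)
```

Sampling is the easy part and works for any inner distribution. Methods such as the cdf or pdf are where case distinctions may come in: for gamma != 0 the map u -> u * exp(gamma * u) is not monotone, so more than one branch of Lambert W has to be handled.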