Comments (11)
Well, and I'm accepting applications for paid internships (send to me directly) or RA/postdoc positions (via Turing)...
Volunteers are welcome too, of course.
from skpro.
Replying to the pros in the list:
- Easier to use: The current interface is actually more powerful than a vector of distribution objects and, in a sense, easier. For example, calling `y_pred.pdf(x)` calls the `pdf(x)` function of each predicted distribution and returns the results as an array. If `y_pred` were a list of distribution objects, that would become something like `[q.pdf(x) for q in y_pred]`. Moreover, if `x` is itself a vector, the skpro interface supports elementwise mapping, meaning that x_i gets passed into pdf_i (see here). In the list-of-objects case the user would need to write something like `[q.pdf(x[i]) for i, q in enumerate(y_pred)]`.
- Easier to understand: That might be a valid point. An object list might not be easier to handle, but it is at least easier to understand. After all, everyone knows how to deal with arrays, and writing single-line for-loops might be more fun and zen-pythonic than reading skpro's documentation to learn that there is a magic interface for that. I don't know.
- Use existing, consolidated, and well-supported interface: One feature of skpro's interface is that one can choose to use an existing interface by simply forwarding the interface calls. That is in fact what happens in the `Parametric` model, which forwards to scipy.stats here. So it is possible to just take advantage of a standard interface, but you can also choose not to use it when things get more complicated, contrary to the list case.
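To make the first point concrete, here is a minimal sketch of the two list-based spellings. It does not use skpro; `statistics.NormalDist` from the standard library is only a stand-in for a predicted distribution, and the parameter values are made up for illustration.

```python
from statistics import NormalDist

# Assumption: one normal distribution per test point stands in for y_pred.
y_pred = [NormalDist(mu=0.0, sigma=1.0), NormalDist(mu=1.0, sigma=2.0)]

# Scalar x: every predicted pdf is evaluated at the same point.
x = 0.0
same_point = [q.pdf(x) for q in y_pred]

# Vector x: the i-th pdf is evaluated at the i-th entry of x.
xs = [0.0, 1.0]
elementwise = [q.pdf(xs[i]) for i, q in enumerate(y_pred)]

# The same elementwise mapping without index bookkeeping.
elementwise_zip = [q.pdf(x_i) for q, x_i in zip(y_pred, xs)]
```

With a list of objects, the caller has to pick the right comprehension depending on whether `x` is a scalar or a vector; that is the bookkeeping a vectorized `y_pred.pdf(x)` takes over.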
from skpro.
Regarding the first point "easier to use" - I see.
Doing this with statsmodels distributions would indeed not work out-of-the-box.
What is missing is a "product/independent" distribution type.
More precisely: an n-vector of pdfs/cdfs can equally be represented as a single, n-variate distribution whose components are independent.
On the distribution class side, there would have to be a "product" distribution type that the user would never see, e.g., one could make products as `y_pred = product(y_pred_list)`, or, even better, the prediction object would return a vector-valued distribution that sub-classes the "product" type and hence is known to factorize.
statsmodels doesn't currently have this and might not support it in the near future.
However, one could still make statsmodels a dependency and extend/subclass?
from skpro.
Um, ok, this attempt at embedded TeX didn't quite work out as planned.
... help?
from skpro.
Regarding point 2 "easier to understand":
would going with a distribution interface that supports products/vectors solve both issues?
Regarding point 3 "use consolidated":
In which case you could sub-class statsmodels or the custom interface?
from skpro.
I'm not sure if I understand the product type distribution. What would the corresponding Python object look like? What would `y_pred.pdf(x)` do?
And yes, you could sub-class statsmodels.
from skpro.
`y_pred`, i.e., the return object of predict, would be a vector-valued distribution with a length (e.g., `n`).
cdf and pdf could be called with a real argument vector `x`, also of length `n`. Thus, `y_pred.pdf(x)` would return a vector with `n` elements, the i-th entry of which would be the i-th predicted distribution's pdf evaluated at the i-th entry of `x`.
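A rough sketch of what such an object could look like, under stated assumptions: `ProductDistribution`, its `joint_pdf` method, and the `product` factory are hypothetical names for illustration, not part of skpro or statsmodels, and `statistics.NormalDist` stands in for the individual predicted distributions.

```python
from math import prod
from statistics import NormalDist


class ProductDistribution:
    """Hypothetical vector-valued distribution with independent components."""

    def __init__(self, components):
        self.components = list(components)

    def __len__(self):
        return len(self.components)

    def _check(self, x):
        if len(x) != len(self):
            raise ValueError("x must have the same length as the distribution")

    def pdf(self, x):
        # i-th marginal pdf evaluated at the i-th entry of x.
        self._check(x)
        return [q.pdf(x_i) for q, x_i in zip(self.components, x)]

    def cdf(self, x):
        # i-th marginal cdf evaluated at the i-th entry of x.
        self._check(x)
        return [q.cdf(x_i) for q, x_i in zip(self.components, x)]

    def joint_pdf(self, x):
        # Independence: the joint density is the product of the marginals.
        return prod(self.pdf(x))


def product(y_pred_list):
    """Hypothetical factory mirroring `product(y_pred_list)` from the thread."""
    return ProductDistribution(y_pred_list)


y_pred = product([NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)])
densities = y_pred.pdf([0.0, 1.0])  # vector with len(y_pred) entries
```

Because the type is known to factorize, a joint density such as `joint_pdf` comes for free from the marginals, which a plain list of distributions does not express.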
from skpro.
Maybe I misunderstand something, but functionally that sounds like what we currently have? It's only a semantic difference in the sense that the `y_pred` object is thought to represent a vector-valued distribution rather than a vector of distributions. Would anything change in the actual Python functionality?
from skpro.
Yes and no: yes, it would behave the same way, but no in the sense that it would be a self-contained hierarchy of classes. Currently, the distribution interface is inseparably fused with the learner interface. The alternative would be to have an isolated distribution interface (similar to, but not necessarily identical with, statsmodels distributions) and use its objects as return values.
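As a sketch of that separation (all class names here are hypothetical, and this is not how skpro is currently structured): the distribution class below knows nothing about learners, and the learner merely constructs and returns it from `predict`.

```python
from statistics import NormalDist, fmean, stdev


class IndependentNormals:
    """Hypothetical stand-alone distribution object; no reference to any learner."""

    def __init__(self, mus, sigmas):
        self.marginals = [NormalDist(m, s) for m, s in zip(mus, sigmas)]

    def pdf(self, x):
        # i-th marginal pdf evaluated at the i-th entry of x.
        return [q.pdf(x_i) for q, x_i in zip(self.marginals, x)]


class MeanBaseline:
    """Hypothetical learner: predicts the training mean and spread for every row."""

    def fit(self, X, y):
        self.mu_ = fmean(y)
        self.sigma_ = stdev(y)
        return self

    def predict(self, X):
        n = len(X)
        # The learner only builds the return object; pdf logic lives elsewhere.
        return IndependentNormals([self.mu_] * n, [self.sigma_] * n)


model = MeanBaseline().fit(X=[[0], [1], [2]], y=[1.0, 2.0, 3.0])
y_pred = model.predict([[3], [4]])
densities = y_pred.pdf([2.0, 2.0])
```

In this layout the distribution classes could be tested, extended, or swapped for another library's types without touching the learner code.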
from skpro.
Right, I see. That seems like a sensible evolution of skpro - a good amount of work though. I'm happily accepting pull-requests 😉
from skpro.
This is being addressed as one among multiple points by the major v2 refactor in #6, so closing the issue as a duplicate.
from skpro.