Re-opening the sub-issue opened in [...]


Distributions as return objects about skpro HOT 11 CLOSED

fkiraly commented on July 28, 2024

Distributions as return objects

from skpro.

Comments (11)

fkiraly commented on July 28, 2024

Well, and I'm accepting applications for paid internships (send to me directly) or RA/postdoc positions (via Turing)...
Volunteers are welcome too, of course.


frthjf commented on July 28, 2024

Replying to the pros in the list:

Easier to use: The current interface is actually more powerful than a vector of distribution objects and, in a sense, easier to use. For example, calling y_pred.pdf(x) calls the pdf(x) function of each predicted distribution and returns the results as an array. If y_pred were a list of distribution objects, that would become something like [q.pdf(x) for q in y_pred]. Moreover, if x is itself a vector, the skpro interface supports elementwise mapping, meaning that x_i gets passed into pdf_i (see here). In the list-of-objects case, the user would need to write something like [q.pdf(x[i]) for i, q in enumerate(y_pred)].
Easier to understand: That might be a valid point. An object list might not be easier to handle, but it is at least easier to understand. After all, everyone knows how to deal with arrays, and writing single-line for-loops might be more fun and zen-pythonic than reading skpro's documentation to learn that there is a magic interface for that. I don't know.
Use an existing, consolidated, and well-supported interface: One feature of skpro's interface is that one can choose to use an existing interface by simply forwarding the interface calls. That is in fact what happens in the Parametric model, which forwards to scipy.stats here. So it is possible to take advantage of a standard interface, but you can also choose not to use it in the more complicated cases that speak against a list.
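To make the comparison in the first point concrete, here is a minimal sketch of the plain list-of-objects case. It uses `statistics.NormalDist` from the Python standard library purely as a stand-in for a predicted distribution; it is not skpro's interface, and the parameter values are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical predictions for three test points (illustrative values only).
y_pred_list = [
    NormalDist(mu=0.0, sigma=1.0),
    NormalDist(mu=1.0, sigma=2.0),
    NormalDist(mu=-1.0, sigma=0.5),
]

# Case 1: a single scalar x is evaluated under every predicted distribution.
x = 0.5
pdf_at_scalar = [q.pdf(x) for q in y_pred_list]

# Case 2: a vector xs, where xs[i] is evaluated under the i-th distribution.
# zip is equivalent here to indexing with enumerate.
xs = [0.0, 1.0, -1.0]
pdf_elementwise = [q.pdf(xi) for q, xi in zip(y_pred_list, xs)]
```

With the elementwise interface described above, both cases would collapse into a single y_pred.pdf(...) call; with a plain list, the caller has to spell out the loop each time.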


fkiraly commented on July 28, 2024

Regarding the first point "easier to use" - I see.
Doing this with statsmodels distributions would indeed not work out-of-the-box.

What is missing is a "product/independent" distribution type.
More precisely: an n-vector of pdfs/cdfs can equally be represented as a single, $\mathbb{R}^n$-valued pdf/cdf that factorizes (which is what the skpro design implicitly does). That is, a vector of pdfs $(p_i)_{i=1..n}$ with $p_i:\mathbb{R}\rightarrow [0,\infty)$ can equally be represented by the product $p: \mathbb{R}^n \rightarrow [0,\infty)^n\;,\; x\mapsto (p_i(x_i))_{i=1..n}$, or by the multivariate pdf $p: \mathbb{R}^n \rightarrow [0,\infty)\;,\; x\mapsto \prod_{i=1}^n p_i(x_i)$. Similarly for cdfs, or for arrays rather than vectors.

On the distribution class side, there would have to be a "product" distribution type that the user would never see, e.g., one could make products as y_pred = product(y_pred_list) - or, even better, the prediction object would return a vector-valued distribution that sub-classes the "product" type, hence is known to factorize.
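A rough sketch of what such a "product" type could look like follows. The class name `ProductDistribution`, its `joint_pdf` method, and the `product` helper are hypothetical names chosen for illustration, and `statistics.NormalDist` again stands in for a component distribution; none of this is existing skpro or statsmodels API.

```python
import math
from statistics import NormalDist


class ProductDistribution:
    """Hypothetical vector-valued distribution with independent components."""

    def __init__(self, components):
        self.components = list(components)

    def __len__(self):
        return len(self.components)

    def _check_length(self, x):
        if len(x) != len(self):
            raise ValueError(f"expected {len(self)} values, got {len(x)}")

    def pdf(self, x):
        """Return the vector (p_i(x_i)) for i = 1..n."""
        self._check_length(x)
        return [p.pdf(xi) for p, xi in zip(self.components, x)]

    def cdf(self, x):
        """Return the vector of component cdfs, evaluated elementwise."""
        self._check_length(x)
        return [p.cdf(xi) for p, xi in zip(self.components, x)]

    def joint_pdf(self, x):
        """Return the multivariate density, i.e. the product of all p_i(x_i)."""
        return math.prod(self.pdf(x))


def product(y_pred_list):
    """Hypothetical helper mirroring y_pred = product(y_pred_list)."""
    return ProductDistribution(y_pred_list)


y_pred = product([NormalDist(0.0, 1.0), NormalDist(1.0, 2.0)])
marginals = y_pred.pdf([0.0, 1.0])    # vector of componentwise densities
joint = y_pred.joint_pdf([0.0, 1.0])  # single factorized density
```

Because the object is known to factorize, both the vector of marginal densities and the joint density come from the same components, which is the point of sub-classing a "product" type.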

statsmodels doesn't currently have this and might not support it in the near future.
However, one could still make statsmodels a dependency and extend/subclass it?


fkiraly commented on July 28, 2024

Um, ok, this attempt at embedded TeX didn't quite work out as planned.
... help?


fkiraly commented on July 28, 2024

Regarding point 2 "easier to understand":
Would going with a distribution interface that supports products/vectors solve both issues?

Regarding point 3 "use consolidated":
In which case you could sub-class statsmodels or the custom interface?


frthjf commented on July 28, 2024

I'm not sure if I understand the product type distribution. What would the corresponding Python object look like? What would y_pred.pdf(x) do?

And yes, you could sub-class statsmodels.


fkiraly commented on July 28, 2024

y_pred, i.e., the return object of predict, would be a vector-valued distribution with a length (e.g., n).
cdf and pdf could be called with a real argument vector x, also of length n. Thus, y_pred.pdf(x) would return a vector with n elements, the i-th entry of which would be the i-th predicted distribution's pdf evaluated at the i-th entry of x.


frthjf commented on July 28, 2024

Maybe I misunderstand something, but functionally that sounds like what we currently have? It's only a semantic difference in the sense that the y_pred object is thought to represent a vector-valued distribution rather than a vector of distributions. Would anything change in the actual Python functionality?


fkiraly commented on July 28, 2024

Yes and no: yes, it would behave the same way, but no in the sense that it would be a self-reliant hierarchy of classes. Currently, the distribution interface is inseparably fused with the learner interface. The alternative would be to have an isolated distribution interface (similar to, but not necessarily identical to, statsmodels distributions) and use its objects as return objects.
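As a sketch of that separation, assuming nothing about the eventual design: the distribution classes below know nothing about learners, and the learner only constructs and returns one. `VectorDistribution`, `NormalVector`, and `ConstantNormalBaseline` are hypothetical names, and the learner is a deliberately trivial toy.

```python
from abc import ABC, abstractmethod
from statistics import NormalDist, fmean, stdev


class VectorDistribution(ABC):
    """Hypothetical stand-alone interface, independent of any learner."""

    @abstractmethod
    def __len__(self): ...

    @abstractmethod
    def pdf(self, x): ...

    @abstractmethod
    def cdf(self, x): ...


class NormalVector(VectorDistribution):
    """Vector of independent normal distributions, evaluated elementwise."""

    def __init__(self, mus, sigmas):
        self._parts = [NormalDist(m, s) for m, s in zip(mus, sigmas)]

    def __len__(self):
        return len(self._parts)

    def pdf(self, x):
        return [p.pdf(xi) for p, xi in zip(self._parts, x)]

    def cdf(self, x):
        return [p.cdf(xi) for p, xi in zip(self._parts, x)]


class ConstantNormalBaseline:
    """Toy learner: predicts the training mean and spread for every row."""

    def fit(self, X, y):
        self.mu_ = fmean(y)
        self.sigma_ = stdev(y)
        return self

    def predict(self, X):
        # The learner only builds the return object; all pdf/cdf logic
        # lives in the distribution class.
        n = len(X)
        return NormalVector([self.mu_] * n, [self.sigma_] * n)


model = ConstantNormalBaseline().fit([[0], [1], [2]], [1.0, 2.0, 3.0])
y_pred = model.predict([[3], [4]])
densities = y_pred.pdf([2.0, 2.0])
```

From the caller's side, y_pred.pdf(x) behaves the same as before; the difference is only that the distribution hierarchy could be tested, extended, or swapped for a subclass of an existing library without touching the learner code.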


frthjf commented on July 28, 2024

Right, I see. That seems like a sensible evolution of skpro - a good amount of work though. I'm happily accepting pull-requests 😉


fkiraly commented on July 28, 2024

This is being addressed as one among multiple points by the major v2 refactor in #6, so closing the issue as a duplicate.
