Distributions as return objects (skpro issue, 11 comments, closed)

fkiraly commented on July 28, 2024
Distributions as return objects

from skpro.

Comments (11)

fkiraly commented on July 28, 2024

Well, and I'm accepting applications for paid internships (send to me directly) or RA/postdoc positions (via Turing)...
Volunteers are welcome too, of course.

frthjf commented on July 28, 2024

Replying to the pros in the list:

  • Easier to use: The current interface is actually more powerful than a vector of distribution objects and, in a sense, easier. For example, calling y_pred.pdf(x) calls the pdf(x) function of each of the predicted distributions and returns the results as an array. If y_pred were a list of distribution objects, that would become something like [q.pdf(x) for q in y_pred]. Moreover, if x is itself a vector, the skpro interface supports elementwise mapping, meaning that x_i gets passed into pdf_i (see here). In the list-of-objects case, the user would need to write something like [q.pdf(x[i]) for i, q in enumerate(y_pred)].

  • Easier to understand: That might be a valid point. A list of objects might not be easier to handle, but it is at least easier to understand. After all, everyone knows how to deal with arrays, and writing single-line for-loops might be more fun and more in the spirit of the Zen of Python than reading skpro's documentation to learn that there is a magic interface for that. I don't know.

  • Use an existing, consolidated, and well-supported interface: One feature of skpro's interface is that one can choose to use an existing interface by simply forwarding the interface calls. That is in fact what happens in the Parametric model, which forwards to scipy.stats here. So it is possible to take advantage of a standard interface, but you can also choose not to use it when things get more complicated (contra the list approach).
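
To make the "easier to use" comparison concrete, here is a minimal sketch that uses scipy.stats rather than skpro itself (an assumption for illustration). A frozen normal distribution with array-valued loc and scale already evaluates elementwise, mirroring the vectorized y_pred.pdf(x) behaviour described above, while the list-of-objects version needs the explicit comprehension. The parameter values mu and sigma are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical predicted parameters for n = 3 test points
mu = np.array([0.0, 1.0, -2.0])
sigma = np.array([1.0, 0.5, 2.0])
x = np.array([0.1, 1.2, -1.0])  # one evaluation point per prediction

# List-of-objects style: one frozen distribution per prediction
y_pred_list = [stats.norm(loc=m, scale=s) for m, s in zip(mu, sigma)]
pdf_list = np.array([q.pdf(x[i]) for i, q in enumerate(y_pred_list)])

# Vectorized style: one object whose pdf passes x[i] to the i-th distribution
y_pred_vec = stats.norm(loc=mu, scale=sigma)
pdf_vec = y_pred_vec.pdf(x)

print(np.allclose(pdf_list, pdf_vec))
print(pdf_vec)
```

Note that the vectorized form works out of the box here only because all predictions share one parametric family; a list of heterogeneous distribution objects has no such shortcut.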

fkiraly commented on July 28, 2024

Regarding the first point "easier to use" - I see.
Doing this with statsmodels distributions would indeed not work out-of-the-box.

What is missing is a "product/independent" distribution type.
More precisely: an n-vector of pdfs/cdfs can equally be represented as the pdf/cdf of a single $\mathbb{R}^n$-valued distribution that factorizes (which is what the skpro design implicitly does). Concretely, a vector of pdfs $(p_i)_{i=1..n}$ with $p_i:\mathbb{R}\rightarrow [0,\infty)$ can equally be represented by the product $p: \mathbb{R}^n \rightarrow [0,\infty)^n,\; x\mapsto (p_i(x_i))_{i=1..n}$, or by the multivariate pdf $p: \mathbb{R}^n \rightarrow [0,\infty),\; x\mapsto \prod_{i=1}^n p_i(x_i)$. The same holds for cdfs, or for arrays rather than vectors.
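
As a quick numerical check of the two representations of the vector of pdfs $(p_i)_{i=1..n}$, the sketch below assumes the $p_i$ are normal densities with made-up parameters (via scipy.stats). It evaluates the vector-valued map $x\mapsto (p_i(x_i))_{i=1..n}$ and the factorized joint density $x\mapsto \prod_{i=1}^n p_i(x_i)$, and compares the latter with scipy's multivariate normal with diagonal covariance, which factorizes in exactly this way.

```python
import numpy as np
from scipy import stats

# Made-up marginal parameters: p_i is the N(mu_i, sigma_i^2) density
mu = np.array([0.0, 1.0, -2.0])
sigma = np.array([1.0, 0.5, 2.0])
x = np.array([0.1, 1.2, -1.0])

# Vector-valued representation: R^n -> [0, inf)^n, x -> (p_i(x_i))_i
p_vec = stats.norm(loc=mu, scale=sigma).pdf(x)

# Multivariate representation: R^n -> [0, inf), x -> prod_i p_i(x_i)
p_joint = np.prod(p_vec)

# An independent Gaussian (diagonal covariance) factorizes the same way
mvn = stats.multivariate_normal(mean=mu, cov=np.diag(sigma**2))
print(np.isclose(p_joint, mvn.pdf(x)))
```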

On the distribution class side, there would have to be a "product" distribution type that the user would never see. For example, one could make products as y_pred = product(y_pred_list), or, even better, predict would return a vector-valued distribution that subclasses the "product" type and hence is known to factorize.
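
A minimal sketch of such a "product" type, assuming each component exposes scipy.stats-style scalar pdf/cdf methods. The names Product, product, and joint_pdf are hypothetical illustrations, not part of skpro, statsmodels, or scipy. One object serves both readings of the factorization: pdf returns the vector $(p_i(x_i))_{i=1..n}$, and joint_pdf returns their product.

```python
import numpy as np
from scipy import stats


class Product:
    """Hypothetical vector-valued distribution with independent components."""

    def __init__(self, components):
        # components: objects with scalar pdf/cdf, e.g. frozen scipy.stats distributions
        self.components = list(components)

    def __len__(self):
        return len(self.components)

    def _check(self, x):
        # The argument must be a real vector with one entry per component
        x = np.asarray(x, dtype=float)
        if x.shape != (len(self),):
            raise ValueError(f"expected shape ({len(self)},), got {x.shape}")
        return x

    def pdf(self, x):
        # Elementwise: entry i is the i-th component's pdf evaluated at x[i]
        x = self._check(x)
        return np.array([q.pdf(xi) for q, xi in zip(self.components, x)])

    def cdf(self, x):
        # Elementwise: entry i is the i-th component's cdf evaluated at x[i]
        x = self._check(x)
        return np.array([q.cdf(xi) for q, xi in zip(self.components, x)])

    def joint_pdf(self, x):
        # Independence: the multivariate density is the product of the marginals
        return float(np.prod(self.pdf(x)))


def product(distributions):
    """Hypothetical constructor for the y_pred = product(y_pred_list) idea."""
    return Product(distributions)


# Usage: heterogeneous component families in one vector-valued object
y_pred_list = [stats.norm(0, 1), stats.expon(scale=2.0), stats.uniform(loc=-1, scale=2)]
y_pred = product(y_pred_list)
x = np.array([0.0, 1.0, 0.5])
print(y_pred.pdf(x))
print(y_pred.joint_pdf(x))
```

Keeping the product structure explicit is what makes the object "known to factorize": joint quantities such as joint_pdf follow from the marginals without any learner-specific code.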

statsmodels doesn't currently have this and might not support it in the near future.
However, one could still make statsmodels a dependency and extend/subclass it?

fkiraly commented on July 28, 2024

Um, ok, this attempt at embedded TeX didn't quite work out as planned.
... help?

fkiraly commented on July 28, 2024

Regarding point 2 "easier to understand":
would going with a distribution interface that supports products/vectors solve both issues?

Regarding point 3 "use consolidated":
In which case you could sub-class statsmodels or the custom interface?

frthjf commented on July 28, 2024

I'm not sure I understand the product-type distribution. What would the corresponding Python object look like? What would y_pred.pdf(x) do?

And yes, you could sub-class statsmodels.

fkiraly commented on July 28, 2024

y_pred, i.e., the return object of predict, would be a vector-valued distribution with a length (e.g., n).
cdf and pdf could be called with a real argument vector x, also of length n. Thus, y_pred.pdf(x) would return a vector with n elements, the i-th entry of which would be the i-th predicted distribution's pdf evaluated at the i-th entry of x.

frthjf commented on July 28, 2024

Maybe I misunderstand something, but functionally that sounds like what we currently have? It's only a semantic difference in the sense that the y_pred object is thought to represent a vector-valued distribution rather than a vector of distributions. Would anything change in the actual Python functionality?

fkiraly commented on July 28, 2024

Yes and no: yes, it would behave the same way; but no, in the sense that it would be a self-contained hierarchy of classes. Currently, the distribution interface is inseparably fused with the learner interface. The alternative would be to have an isolated distribution interface (similar to, but not necessarily identical with, statsmodels distributions) and use its objects as return objects.
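
To illustrate the decoupling described above, here is a toy sketch in which the learner knows nothing about distribution internals and predict simply returns a distribution object. A frozen scipy.stats.norm with array-valued loc stands in for an isolated distribution interface. The class name GaussianLinearRegressor and the homoscedastic least-squares model are assumptions for illustration, not skpro's API.

```python
import numpy as np
from scipy import stats


class GaussianLinearRegressor:
    """Hypothetical probabilistic learner: least squares plus a constant noise scale."""

    def fit(self, X, y):
        X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        residuals = y - X1 @ self.coef_
        n, p = X1.shape
        self.sigma_ = np.sqrt(np.sum(residuals**2) / (n - p))  # noise scale estimate
        return self

    def predict(self, X):
        X1 = np.column_stack([np.ones(len(X)), X])
        # The return object is a distribution, not an array of point predictions;
        # loc is vector-valued, so pdf/cdf evaluate elementwise
        return stats.norm(loc=X1 @ self.coef_, scale=self.sigma_)


# Usage on synthetic data: y = 1 + 2x + noise
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 1))
y_train = 1.0 + 2.0 * X_train[:, 0] + rng.normal(scale=0.3, size=100)
model = GaussianLinearRegressor().fit(X_train, y_train)

y_pred = model.predict(np.array([[0.0], [1.0]]))
print(y_pred.mean())           # point predictions
print(y_pred.pdf([1.0, 3.0]))  # i-th density evaluated at the i-th point
print(y_pred.interval(0.9))    # elementwise 90% central intervals
```

Because predict returns a self-contained distribution object, downstream code (evaluation metrics, interval computation) only needs the distribution interface and never touches the learner.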

frthjf commented on July 28, 2024

Right, I see. That seems like a sensible evolution of skpro, though a good amount of work. I'm happily accepting pull requests 😉

fkiraly commented on July 28, 2024

This is being addressed as one among multiple points by the major v2 refactor in #6, so closing the issue as a duplicate.
