It would be desirable to be able to specify non-uniform distributions on uncertainty p

Non-uniform distributions for uncertainties about emaworkbench HOT 5 CLOSED

quaquel commented on August 22, 2024

Non-uniform distributions for uncertainties

from emaworkbench.

Comments (5)

quaquel commented on August 22, 2024

I agree that this is a useful addition. Others have been asking for it as well.

A few thoughts:

Rhodium has something like this already. It might be worth checking how they do it and see if we can use the same or similar API.
Alternatively, we might have a separate 'constructor' for non uniform parameters. This would keep the API the same for the most common use case, while offering the required flexibility. For example, we might do something like

import scipy as sp

dist = sp.stats.beta(2.31, 0.627)

uncertainties = [RealParameter.from_dist("unc1", dist),
                 RealParameter("unc2", 0, 5.5]]

Regarding infinite bounds, I doubt it will be much of a problem for most analysis stuff. All the plotting modules focus on outcomes. Only CART and PRIM might be used to visualise the sampled uncertainty space. I am thus inclined to go with your option 1: document it carefully and let the user deal with it. This is in line with Python's general philosophy that we are adults here and thus there is no need to bulletproof everything.

from emaworkbench.

jpn-- commented on August 22, 2024

I checked out Rhodium's non-uniform capabilities. It looks like beyond uniform they enable normal and lognormal by hard-coding specific derived classes for these distributions. Keeping the workbench API close to Rhodium is a good plan when possible, but I think we are in agreement this isn't the correct path for the workbench.

I also think it's preferable to keep things simple and maintain only one way to make random variable draws from a distribution in the back end. This means if we want to accept an arbitrary scipy.stats rv_frozen as the distribution, we should just create a uniform one on parameter construction if none is provided. I played with the alternate constructor idea, but I found it easier to have one constructor that accepts a dist argument, or lower and upper bounds (as before) but not both -- explicitly raising an exception if both are provided, as I've done in #48. The current implementation has lower and upper bounds as mandatory positional arguments, so making them optional but validating the arguments as I have done should break nothing in any existing code.

My preliminary use testing indicates no signification speed impact in attaching a rv_frozen to every parameter instead of current practice -- turns out current code technically creates a new uniform rv_frozen for every parameter anyhow, so this should generally be just as efficient (the complexity of the non-uniform distribution itself notwithstanding, as that complexity is baked in to using it in the first place).

from emaworkbench.

quaquel commented on August 22, 2024

Thanks for checking Rhodium. Clearly, that is not a route we want to take.

I have been doing some reading on the most pythonic way to have alternative constructors. Basically, we hare here 2 alternative constructors:

name, lower_bound, upper bound --> uniform distribution
name, initialised scipy dist --> whatever non uniform distribution you want

The consensus seems to be to use class methods, or so called factory functions. So you would need 2 distinct factory functions (the default one, needs a better name), and the from_dist approach. Next, the first factory function would create the uniform dist, followed by a call to the actual init, while 2 could do some checking (e.g. is the dist integers only for an IntegerParameter / CategoricalParameter), followed by a call to the underlying init.

I do see your point on simplicity and simply check if either lower_bound and upper_bound, or dist is provided. For now that might be sufficient, although it is possibly a bit more difficult to document. Moreover, it breaks the python idea that there should always be 1 obvious way to do something. Having separate factor methods for creating parameters from a dist, or with a uniform dist creates this clarity.

In short, I haven't made up my mind yet regarding the preferred API for this. The other changes are largely fine in your PR. Although, what is your idea behind the UniformLHS sampler?

from emaworkbench.

jpn-- commented on August 22, 2024

The thing about the factory methods is that the "default" constructor is usually the one where the arguments are closest to the attributes that are being set and require the least processing. E.g., a pandas DataFrame can be constructed most directly from data/index/columns, and has factory methods for compiling a dictionary or other things. If I were building this class from scratch, it might be simpler to have a direct constructor that takes a rv_frozen dist argument, and an alternate form that takes (lower, upper) bounds, either as a 2-tuple in the dist argument or as a classmethod factory. But prevent the change from breaking current code, the default constructor needs to be able
to still accept lower and upper bounds as arguments.

If you still think having two factory classmethods will be valuable, I can write them. Alternatively, I'll just make changes to the class documentation to explain the arguments, which I don't think is too hard -- see https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.interp.html for an example.

The thinking on the UniformLHS is to provide a simple way to ignore the selected distributional shapes, if that is desired for some kinds of analysis, particularly if the distributions are skewed. E.g. you may want to have distributions for a robust optimization task that should take account of the expected distribution, but want to sample more uniformly across the uncertainty space when searching for worst-case scenarios, as there will be better experimental coverage in the tails where a worst case scenario is more likely to appear.

from emaworkbench.

quaquel commented on August 22, 2024

Another relevant consideration is API design. In my view, the most common use case of the workbench is exploratory modelling using uniform distributions. Given that this is the most common use case, it makes sense to support this with the default init.

Documenting attributes of a class is done separately (see also the numpy doc standard).

I have to think about this some more. Leave the PR for now. I have a conference next week where I hope to have time to look at it again more closely. I might be able to find a compromise between your preferred approach, and my current thinking about using factory methods. It might for example be possible to use a meta class to resolve the issue.

from emaworkbench.

Non-uniform distributions for uncertainties about emaworkbench HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent