Giter Site home page Giter Site logo

Comments (5)

quaquel avatar quaquel commented on August 22, 2024

I agree that this is a useful addition. Others have been asking for it as well.

A few thoughts:

  1. Rhodium has something like this already. It might be worth checking how they do it and see if we can use the same or similar API.
  2. Alternatively, we might have a separate 'constructor' for non uniform parameters. This would keep the API the same for the most common use case, while offering the required flexibility. For example, we might do something like
import scipy as sp

dist = sp.stats.beta(2.31, 0.627)

uncertainties = [RealParameter.from_dist("unc1", dist),
                 RealParameter("unc2", 0, 5.5]]

Regarding infinite bounds, I doubt it will be much of a problem for most analysis stuff. All the plotting modules focus on outcomes. Only CART and PRIM might be used to visualise the sampled uncertainty space. I am thus inclined to go with your option 1: document it carefully and let the user deal with it. This is in line with Python's general philosophy that we are adults here and thus there is no need to bulletproof everything.

from emaworkbench.

jpn-- avatar jpn-- commented on August 22, 2024

I checked out Rhodium's non-uniform capabilities. It looks like beyond uniform they enable normal and lognormal by hard-coding specific derived classes for these distributions. Keeping the workbench API close to Rhodium is a good plan when possible, but I think we are in agreement this isn't the correct path for the workbench.

I also think it's preferable to keep things simple and maintain only one way to make random variable draws from a distribution in the back end. This means if we want to accept an arbitrary scipy.stats rv_frozen as the distribution, we should just create a uniform one on parameter construction if none is provided. I played with the alternate constructor idea, but I found it easier to have one constructor that accepts a dist argument, or lower and upper bounds (as before) but not both -- explicitly raising an exception if both are provided, as I've done in #48. The current implementation has lower and upper bounds as mandatory positional arguments, so making them optional but validating the arguments as I have done should break nothing in any existing code.

My preliminary use testing indicates no signification speed impact in attaching a rv_frozen to every parameter instead of current practice -- turns out current code technically creates a new uniform rv_frozen for every parameter anyhow, so this should generally be just as efficient (the complexity of the non-uniform distribution itself notwithstanding, as that complexity is baked in to using it in the first place).

from emaworkbench.

quaquel avatar quaquel commented on August 22, 2024

Thanks for checking Rhodium. Clearly, that is not a route we want to take.

I have been doing some reading on the most pythonic way to have alternative constructors. Basically, we hare here 2 alternative constructors:

  1. name, lower_bound, upper bound --> uniform distribution
  2. name, initialised scipy dist --> whatever non uniform distribution you want

The consensus seems to be to use class methods, or so called factory functions. So you would need 2 distinct factory functions (the default one, needs a better name), and the from_dist approach. Next, the first factory function would create the uniform dist, followed by a call to the actual init, while 2 could do some checking (e.g. is the dist integers only for an IntegerParameter / CategoricalParameter), followed by a call to the underlying init.

I do see your point on simplicity and simply check if either lower_bound and upper_bound, or dist is provided. For now that might be sufficient, although it is possibly a bit more difficult to document. Moreover, it breaks the python idea that there should always be 1 obvious way to do something. Having separate factor methods for creating parameters from a dist, or with a uniform dist creates this clarity.

In short, I haven't made up my mind yet regarding the preferred API for this. The other changes are largely fine in your PR. Although, what is your idea behind the UniformLHS sampler?

from emaworkbench.

jpn-- avatar jpn-- commented on August 22, 2024

The thing about the factory methods is that the "default" constructor is usually the one where the arguments are closest to the attributes that are being set and require the least processing. E.g., a pandas DataFrame can be constructed most directly from data/index/columns, and has factory methods for compiling a dictionary or other things. If I were building this class from scratch, it might be simpler to have a direct constructor that takes a rv_frozen dist argument, and an alternate form that takes (lower, upper) bounds, either as a 2-tuple in the dist argument or as a classmethod factory. But prevent the change from breaking current code, the default constructor needs to be able
to still accept lower and upper bounds as arguments.

If you still think having two factory classmethods will be valuable, I can write them. Alternatively, I'll just make changes to the class documentation to explain the arguments, which I don't think is too hard -- see https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.interp.html for an example.

The thinking on the UniformLHS is to provide a simple way to ignore the selected distributional shapes, if that is desired for some kinds of analysis, particularly if the distributions are skewed. E.g. you may want to have distributions for a robust optimization task that should take account of the expected distribution, but want to sample more uniformly across the uncertainty space when searching for worst-case scenarios, as there will be better experimental coverage in the tails where a worst case scenario is more likely to appear.

from emaworkbench.

quaquel avatar quaquel commented on August 22, 2024

Another relevant consideration is API design. In my view, the most common use case of the workbench is exploratory modelling using uniform distributions. Given that this is the most common use case, it makes sense to support this with the default init.

Documenting attributes of a class is done separately (see also the numpy doc standard).

I have to think about this some more. Leave the PR for now. I have a conference next week where I hope to have time to look at it again more closely. I might be able to find a compromise between your preferred approach, and my current thinking about using factory methods. It might for example be possible to use a meta class to resolve the issue.

from emaworkbench.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.