emukit / emukit

A Python-based toolbox of various methods in decision making, uncertainty quantification and statistical emulation: multi-fidelity, experimental design, Bayesian optimisation, Bayesian quadrature, etc.

Home Page: https://emukit.github.io/emukit/

License: Apache License 2.0

Python 99.86% Stan 0.14%
machine-learning bayesian-optimization uncertainty-quantification multi-fidelity experimental-design bayesian-quadrature sensitivity-analysis decision-making emulation python

emukit's Introduction

Emukit

Build Status | Documentation Status | Tests Coverage | GitHub License

Website | Documentation | Contribution Guide

Emukit is a highly adaptable Python toolkit for enriching decision making under uncertainty. This is particularly pertinent to complex systems where data is scarce or difficult to acquire. In these scenarios, propagating well-calibrated uncertainty estimates within a design loop or computational pipeline ensures that constrained resources are used effectively.

The main features currently available in Emukit are:

  • Multi-fidelity emulation: build surrogate models when data is obtained from multiple information sources that have different fidelity and/or cost;
  • Bayesian optimisation: optimise physical experiments and tune parameters of machine learning algorithms;
  • Experimental design/Active learning: design the most informative experiments and perform active learning with machine learning models;
  • Sensitivity analysis: analyse the influence of inputs on the outputs of a given system;
  • Bayesian quadrature: efficiently compute the integrals of functions that are expensive to evaluate.

Emukit is agnostic to the underlying modelling framework, which means you can use any tool of your choice in the Python ecosystem to build the machine learning model, and still be able to use Emukit.
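For example, a GPy model can be wrapped for Emukit with the GPyModelWrapper shipped in emukit.model_wrappers; any other framework can be used by implementing the same small model interface. A minimal sketch:

import numpy as np
import GPy
from emukit.model_wrappers import GPyModelWrapper

X = np.random.rand(10, 1)
Y = np.sin(6 * X) + np.random.randn(10, 1) * 0.05
gpy_model = GPy.models.GPRegression(X, Y)   # any GPy regression model
emukit_model = GPyModelWrapper(gpy_model)   # now usable by Emukit's methods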

Installation

To install emukit, simply run

pip install emukit

For other install options, see our documentation.

Dependencies / Prerequisites

Emukit's primary dependencies are NumPy and GPy. See requirements.

Getting started

For examples see our tutorial notebooks.
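For a flavour, a minimal Bayesian optimisation loop looks roughly like this (a sketch assuming GPy as the modelling backend and Emukit's current loop API):

import numpy as np
import GPy
from emukit.bayesian_optimization.loops import BayesianOptimizationLoop
from emukit.core import ContinuousParameter, ParameterSpace
from emukit.model_wrappers import GPyModelWrapper

def f(x):                                   # toy objective, minimised over [0, 1]
    return (x - 0.3) ** 2

space = ParameterSpace([ContinuousParameter('x', 0, 1)])
X_init = np.array([[0.1], [0.5], [0.9]])
model = GPyModelWrapper(GPy.models.GPRegression(X_init, f(X_init)))

loop = BayesianOptimizationLoop(model=model, space=space)
loop.run_loop(f, 10)                        # run 10 iterations of the loop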

Documentation

To learn more about Emukit, refer to our documentation.

To learn about emulation as a concept, check out the Emukit playground project.

Citing the library

If you are using emukit, we would appreciate it if you could cite our papers about Emukit in your research:

@inproceedings{emukit2019,
  author = {Paleyes, Andrei and Pullin, Mark and Mahsereci, Maren and McCollum, Cliff and Lawrence, Neil and González, Javier},
  title = {Emulation of physical processes with {E}mukit},
  booktitle = {Second Workshop on Machine Learning and the Physical Sciences, NeurIPS},
  year = {2019}
}

@article{emukit2023,
  title={Emukit: A {P}ython toolkit for decision making under uncertainty},
  author={Andrei Paleyes and Maren Mahsereci and Neil D. Lawrence},
  journal={Proceedings of the Python in Science Conference},
  year={2023}
}

The papers themselves can be found at these links: NeurIPS workshop 2019, SciPy conference 2023.

License

Emukit is licensed under Apache 2.0. Please refer to LICENSE and NOTICE for further license information.

emukit's People

Contributors

aaronkl, alpiges, apaleyes, brunokm, charelstoncrabb, clairecp, davidjanz, dekuenstle, dependabot[bot], eamanu, ekalosak, fheilz, henrymoss, hyandell, javiergonzalezh, jeannotalpin, jejjohnson, kurtmckee, marpulli, mashanaslidnyk, mmahsereci, neochaos12, ntenenz, onponomarev, polivucci, rns294, rvvincelli, sennendoko, sunnyszy, tpielok


emukit's Issues

Create the documentation

Emukit should have documentation up and running on some doc hosting, e.g. RTD. Here is the scope:

  • Generate docs for API
  • Add notebooks to the docs
  • Additional docs on architecture of the library
  • Index page for all this stuff above
  • Host the documentation

Result from loops

At the minute the loop does not return anything and there is no quick way to get a solution from a loop. We also can't access the model easily. We should fix this.
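One possible shape for such an API (a hypothetical sketch; none of these names exist yet):

results = loop.get_results()            # hypothetical accessor on the loop
print(results.minimum_location)         # best input found so far
print(results.minimum_value)            # corresponding objective value
model = loop.model_updaters[0].model    # one possible route to the model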

Multi-output sensitivity analysis

The tests suggest that the sensitivity analysis should work with multi-output functions because the test model used has 5-dimensional outputs. However, the total variance output is a scalar and I'd expect it to have 5 entries, one for each output. Is there a bug in the code or my understanding?

Add build for Python 3.7

Right now we build for Python 3.5 and 3.6. Python 3.7 has been around for some time now; we should add a build for it and make sure we support this version too.

Add ways to customize outer loop flow

It should be possible to customize the loop flow without rewriting it. For instance, if a user wants to store some benchmarking information between iterations, they should be able to add it into the existing loop without redefining it.

Right now we have self.custom_step() at the end of each iteration, which does the job to some extent, but obviously isn't very flexible and does not cover many scenarios.
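One alternative is an event/subscriber hook (a hypothetical sketch of the idea, not existing API):

def record_benchmark(loop, loop_state):
    # user-supplied handler: store benchmarking info between iterations
    print("iteration %d: %d points evaluated so far"
          % (loop_state.iteration, loop_state.X.shape[0]))

# hypothetical subscription point, called by the loop after every iteration:
loop.iteration_end_event.append(record_benchmark)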

Integrating model hyperparameters

Integrating over model hyper-parameters needs to be implemented and offered as an option when hyper-priors are used in the models. This can provide fundamentally different (and often better) results than optimizing the hyper-parameters, as is currently done.
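A generic sketch of the idea, assuming the model exposes some way to sample and fix hyper-parameters (both helpers below are hypothetical):

import numpy as np

def integrated_acquisition(acquisition, model, x, n_samples=10):
    # average the acquisition over posterior hyper-parameter samples
    # instead of evaluating it at a single optimised setting
    samples = model.generate_hyperparameters_samples(n_samples)  # hypothetical
    values = []
    for theta in samples:
        model.fix_model_hyperparameters(theta)                   # hypothetical
        values.append(acquisition.evaluate(x))
    return np.mean(values, axis=0)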

matplotlib as a requirement

If I install emukit via pip, matplotlib won't be installed. However, it seems to be a requirement of GPy: at least if I do from emukit.test_functions import forrester_function, it throws "ImportError: No module named 'matplotlib'" somewhere in GPy.

Could we add matplotlib to requirements.txt?

Get optional dependencies straight

We have a few models with optional dependencies at the moment. They are not tested, and result in empty doc pages. We need to find a good place for them to go, and think about how to manage their dependencies appropriately.

Parallel evaluations of the user function

Following some previous discussions, we should start considering cases where the UserFunction is evaluated simultaneously at several locations. This is common in experimental design, BayesOpt, etc. For 1.0 we don't need implementations for everything, but it will be really useful to settle the design and structure of how we are going to handle these cases in the future.

Add logging

It is currently difficult to keep track of what is happening when a loop is running. The loops potentially take a long time to run. We should add logging to allow users to monitor the loops.
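A minimal sketch using the standard library's logging module (the placement inside the loop is illustrative):

import logging

_log = logging.getLogger(__name__)

# inside each loop iteration, something like:
#     _log.info("Iteration %d: evaluating %d new point(s)", i, new_x.shape[0])

# users then opt in to whatever verbosity they need:
logging.basicConfig(level=logging.INFO)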

Tidy up example functions

The Forrester function is implemented in both the Bayesian optimization and multi-fidelity modules. Shall we put all test functions/toy simulators into one top-level module?

Examples for the landing page

The landing page will be the first thing users see when approaching the library. When writing the blogs for the methods it is very important that we send a clear message of how the library should be used. For the examples involving loops (BO, ED, BQ) this is what I propose:

  • In each card on the landing page we explain the idea of the method that the card links to, and we add a simple example with code.
  • We use the same objective function and the same pattern in all the examples.
  • All the examples use method-specific loops and have the same structure:
  1. Definition of the objective function (same in all examples)
  2. Definition of the model (same in all examples)
  3. Definition of the elements specific to each method (this changes across examples).
  4. Creation of the loop (also specific).
  5. Run the loop and show results (same).

The code for 1, 2, and 5 should be the same in all the examples.
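To make the proposed structure concrete, here is a sketch of what a card's example could look like, instantiated for experimental design (assumes GPy as the backend; imports follow the current package layout):

import numpy as np
import GPy
from emukit.core import ContinuousParameter, ParameterSpace
from emukit.experimental_design import ExperimentalDesignLoop
from emukit.model_wrappers import GPyModelWrapper

# 1. definition of the objective function (same in all examples)
def f(x):
    return (x - 0.3) ** 2

# 2. definition of the model (same in all examples)
space = ParameterSpace([ContinuousParameter('x', 0, 1)])
X_init = np.array([[0.1], [0.5], [0.9]])
model = GPyModelWrapper(GPy.models.GPRegression(X_init, f(X_init)))

# 3. + 4. method-specific elements and loop creation (this is what changes;
#         here we rely on the ED loop's default model-variance acquisition)
loop = ExperimentalDesignLoop(space=space, model=model)

# 5. run the loop and show results (same in all examples)
loop.run_loop(f, 10)
print(loop.loop_state.X, loop.loop_state.Y)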

Sensitivity analysis will also use 1 and 2, but as there is no loop, the results will be shown directly (a specific call in place of 5).

The multi-fidelity methods are just models that we offer. The way to go here is to do 1 and 2 for some example and then repeat 3-5 for one of the other applications (ED?).

It is important that we don't use any class from the /examples folder (like GPBayesianOptimization), as the idea of this blog post is to show the modularity of the library and be clear about the core structure. We can have a separate card for examples and how to wrap things up.

Does it make sense?

clean up epmgp.py

It is not easy to read. We might also want to move it out of the util folder.

Stopping condition for cost sensitive evaluations

Currently the stopping condition is only checked at the beginning of each loop step. If the evaluation is cost/budget sensitive, e.g. in multi-fidelity/multi-source models or models where the input location determines the evaluation cost, then we might want to check after point calculation too. Alternatively the point calculator could be smarter and only return points which are still below budget, but that is harder to do.
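A sketch of where the extra check could go (all names here are hypothetical):

def run_iteration(loop_state, point_calculator, user_function, cost_fn, budget_left):
    # re-check the budget *after* point calculation,
    # not only at the start of the loop step
    new_x = point_calculator.compute_next_points(loop_state)
    if cost_fn(new_x) > budget_left:   # would this evaluation overspend?
        return None                    # signal the loop to stop early
    return user_function.evaluate(new_x)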

Use abstractions appropriately

Before going 1.0 we should review our use of abstractions. Based on recent conversations I propose this approach:

  • On the OuterLoop level we should be dealing in ModelUpdater, CandidatePointCalculator and so on.
  • On the concrete method level we should be dealing in problem-specific terms: acquisition, update step size, etc.
  • If a concrete method does not suit a user's needs, it should be clear how to create their own implementation of the OuterLoop. In fact, users should be encouraged to do that.

There are a few places in the code base where we currently don't follow this approach; if everyone agrees with this proposal, we should identify and fix those places.

Exact integration in integrated variance acquisition

Investigate whether we can use the BQ package to compute the integral of the variance in the integrated variance acquisition function in experimental design, for a uniform integration measure with constant integration bounds. We currently use Monte Carlo integration, which we should keep for cases where we can't integrate the GP exactly.
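For reference, with \sigma^2_{n+1}(x' \mid x) denoting the GP's predictive variance at x' after hypothetically adding x, and p the uniform measure on the bounded domain, the acquisition and our current Monte Carlo estimator are:

a(x) \;=\; \int \sigma^2_{n+1}(x' \mid x)\, p(x')\, \mathrm{d}x' \;\approx\; \frac{1}{M} \sum_{m=1}^{M} \sigma^2_{n+1}(x_m \mid x), \qquad x_m \sim p

For kernels with tractable kernel means (e.g. RBF on a box) the left-hand integral has a closed form, which is where the BQ package could come in.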

Validate notebooks

We need a way to make sure our notebooks are tested:

  1. At the very least they should be valid json files
  2. One step up is to make sure they are valid jupyter notebook files
  3. Even higher goal is to make sure they execute without errors

Point 3 was already implemented in GPyOpt: https://github.com/SheffieldML/GPyOpt/blob/master/manual/notebooks_check.py . We can consider adopting this script. However, we need to make sure that whatever validation method we choose can run as part of the Travis CI build.
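A minimal sketch of all three levels using nbformat and nbconvert (the function name is ours; the timeout is arbitrary):

import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

def check_notebook(path):
    nb = nbformat.read(path, as_version=4)   # 1. fails on malformed JSON
    nbformat.validate(nb)                    # 2. fails on schema violations
    # 3. fails if any cell raises during execution
    ExecutePreprocessor(timeout=600).preprocess(nb, {'metadata': {'path': '.'}})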

Standard acquisition function test suite

We should be able to have some standard tests for all acquisition functions such as testing output shapes and numerical gradient checks. This would make it easier to create new acquisitions and make sure they are all tested.
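A sketch of such a check, assuming acquisitions expose evaluate and evaluate_with_gradients (the helper name is ours):

import numpy as np

def check_acquisition(acquisition, x, eps=1e-6, tol=1e-4):
    # shape check: one acquisition value per input row
    value = acquisition.evaluate(x)
    assert value.shape == (x.shape[0], 1)

    # numerical gradient check via central finite differences
    _, grad = acquisition.evaluate_with_gradients(x)
    for i in range(x.shape[1]):
        x_plus, x_minus = x.copy(), x.copy()
        x_plus[:, i] += eps
        x_minus[:, i] -= eps
        fd = (acquisition.evaluate(x_plus) - acquisition.evaluate(x_minus)) / (2 * eps)
        assert np.allclose(grad[:, [i]], fd, atol=tol)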

Updating acquisition function in the loop

Some acquisition functions, such as entropy search, require a recalculation of some parameters after each iteration. This doesn't neatly fit into our OuterLoop framework at the minute. We may need to add an update method to the acquisition that is called after observing new data.
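Schematically (the hook name is hypothetical):

class Acquisition:
    def update_parameters(self):
        """Hypothetical hook: recompute cached quantities (e.g. entropy
        search's internal parameters) after the model sees new data."""
        pass

# the outer loop would then call acquisition.update_parameters()
# once per iteration, right after the model update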

Dependency on GPy and GPyOpt

I'd like to open a discussion on Emukit's dependency on GPy and GPyOpt.

The GPy dependency causes us some trouble: it is the most likely cause of installation failures, it drags in matplotlib and plotly, and it has known issues with Python 3.7. For these reasons I would like to see if we could stop requiring that dependency.

We depend on GPy and GPyOpt in the following places:

  1. Model-free designs
    We could even copy over the relevant code pieces from GPyOpt and get rid of that dependency.
  2. Optimizer
    Same thing: we could copy, or re-implement.
  3. Multi-fidelity
    This is harder, but we could treat multi-fidelity as an optional extra of the package. People could successfully run decision loops in Emukit without the multi-fidelity feature.

Note that 1 and 2 together would let us remove GPyOpt as a dependency altogether.

Opinions welcome.

Verify Windows support

We may have Windows users, lots of them. Our builds currently run for Linux and macOS only. We should verify that emukit works fine on Windows, and see if we can have a Travis build for that.

cost sensitive loop

So far Emukit only considers the case where one has a single model of the objective function. However, there are many cases, for example EIperSec, Fabolas, MTBO, ...., where one has two models: one for the objective function and one for the cost of evaluating it.
It would be fairly straightforward to bootstrap from the existing BayesianOptimizationLoop module to implement these methods. However, a cost-sensitive loop seems to be a fundamental feature that might also be interesting for Bayesian quadrature or experimental design, and I am wondering whether it makes sense to implement a general CostSensitiveOuterLoop class instead?
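A sketch of the two-model idea (the class and all names are hypothetical; the cost model is any Emukit-style model with a predict method):

import numpy as np

class CostWeightedAcquisition:
    """Hypothetical sketch: divide an acquisition by the predicted evaluation cost."""

    def __init__(self, acquisition, cost_model):
        self.acquisition = acquisition
        self.cost_model = cost_model          # second model, e.g. a GP over cost

    def evaluate(self, x):
        value = self.acquisition.evaluate(x)
        cost_mean, _ = self.cost_model.predict(x)
        return value / np.maximum(cost_mean, 1e-8)   # e.g. EI per second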

WSABI for BQ package

Both WSABI-L and WSABI-M (Gunter et al. 2014), with their corresponding acquisition functions, might be a nice addition.

  • WSABI-L (is already useful without the variance implementation)
    • WSABI-L mean prediction
    • WSABI-L integral variance
    • WSABI-L acquisition function
  • WSABI-M (is already useful without the variance implementation)
    • WSABI-M mean prediction
    • WSABI-M integral variance
    • WSABI-M acquisition function
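For reference, both variants place a GP on the square root of the (non-negative) integrand, per Gunter et al. (2014), with \alpha a small offset and m, C the posterior mean and covariance of the warped GP:

f(x) = \alpha + \tfrac{1}{2}\,\tilde f(x)^2, \qquad \tilde f \sim \mathcal{GP}(m, C)

\text{WSABI-L (linearisation):}\quad \mathbb{E}[f(x)] \approx \alpha + \tfrac{1}{2} m(x)^2, \qquad \operatorname{Cov}[f(x), f(x')] \approx m(x)\, C(x, x')\, m(x')

\text{WSABI-M (moment matching):}\quad \mathbb{E}[f(x)] = \alpha + \tfrac{1}{2}\left(m(x)^2 + C(x, x)\right)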

PEP8 compliance

We say in our contribution guidelines that the code should adhere to the PEP8 standard. It currently does not. Let's fix it, and then check that it stays compliant.
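One way to check, and keep checking in CI (assuming flake8; any equivalent linter works, and the line-length limit is a placeholder):

pip install flake8
flake8 emukit/ --max-line-length=120 --count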
