ds4dm / ecole

Extensible Combinatorial Optimization Learning Environments

Home Page: https://www.ecole.ai

License: BSD 3-Clause "New" or "Revised" License

CMake 3.98% C++ 56.76% Python 7.32% Shell 2.28% JetBrains MPS 29.66%
combinatorial-optimization gym markov-decision-processes ml scip

ecole's People

Contributors

antoineprv, aurelienserre, benoitsteiner, dchetelat, gasse, lascavana, skylion007

ecole's Issues

How Ecole handles CTRL-C

Describe the bug

At the moment, pressing CTRL-C while Ecole is running results in one of the two following effects:

  1. SCIP catches the signal and interrupts the solving process silently, resulting in the episode terminating early, without any error message.
  2. Ecole catches the signal and terminates with an Exception.

I think behaviour 2) is OK: CTRL-C signals should stop everything. Behaviour 1) is OK if one runs SCIP in interactive mode via the console and wants to resume the optimization process afterwards. But with Ecole, this behaviour does not make much sense. It is also simply annoying: if I want to kill Ecole in a Jupyter notebook or in a console, I have to keep pressing CTRL-C until Python catches the signal and the current execution terminates.

Setting

All settings.

To Reproduce

Run an MDP loop (while True: reset, then while not done: step). Press CTRL-C during the execution.

Expected behavior

A Python KeyboardInterrupt.

Fix

A fix is to tell SCIP to not catch CTRL-C signals:

"misc/catchctrlc": False
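As a minimal sketch of how the fix above could be made a default, the helper below merges the parameter into a user-provided parameter dictionary. The helper name `with_ctrlc_disabled` is hypothetical and not part of Ecole; only the parameter name `misc/catchctrlc` comes from the issue.

```python
def with_ctrlc_disabled(scip_params=None):
    """Return a copy of scip_params in which SCIP does not catch CTRL-C."""
    params = dict(scip_params or {})
    # Keep an explicit user choice, otherwise disable SCIP's own handler so
    # that Python receives the signal and raises KeyboardInterrupt.
    params.setdefault("misc/catchctrlc", False)
    return params


params = with_ctrlc_disabled({"limits/time": 60})
```

Using `setdefault` rather than a plain assignment leaves room for users who do want SCIP to handle the signal.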

Rename the EnvironmentComposer Python class

Describe the problem or improvement suggested

In the documentation, we talk about environments, while referring to the class EnvironmentComposer. This can be confusing, as it could suggest there is actually an Environment class somewhere, but this class is nowhere to be found.

Describe the solution you would like

Change the Python class name EnvironmentComposer to Environment.

Describe alternatives you have considered

Have a proper Environment class, which inherits from EnvironmentComposer ?

Additional context

NA

Environment Event

Describe the problem or improvement suggested

Currently, given for instance a reward function, the process of calling reset and obtain_reward is not transparent to the user. When is reset called? Before or after the dynamics are themselves reset?
When writing a new function, it also leads to confusion. Should reset return a reward, or will obtain_reward be called right after?
Furthermore, the current setting offers no possibility to access the Model outside of very specific places.
Finally, the code for calling the callback is very similar between observation functions, reward functions, and soon info functions.

To alleviate this, we could generalize the idea of callback events to reset_pre, reset_post, step_pre, step_post, seed_pre, seed_post.
Ideally, such a callback would be created by simply defining a method with the same name.

struct AnyEvent: Event {
    void reset_pre(scip::Model const&) override { /* Brand new model not yet on initial state */ }
};

A reward function would then become an event, with a __call__/operator() method for the reward.

struct LpIterations: RewardFunction, Event {
    void reset_pre(scip::Model const&) { last_lp_iter = 0; }

    Reward operator()(scip::Model const& model) {
        auto reward = ... - last_lp_iter;
        return reward;
    }

private:
    double last_lp_iter = 0;
};

Describe the solution you would like

  • The base event class implements all callbacks as no-ops.
  • Make events virtual to limit code duplication with Python? Only useful if we are able to call them from C++. In Dynamics?
  • Overloads on scip::Model& and scip::Model const&

Describe alternatives you have considered

  • __call__/operator() becomes redundant with reset_post and step_post, so should it simply be a getter that is guaranteed to return the same value on two successive calls?
  • A more general scheme with arbitrary events that could be registered and fired at runtime (e.g. SCIP events, or PyTorch Ignite events). However, this seems to get in the way of simple use cases, and it is not clear what use cases it would serve here.
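The proposal above can be sketched in Python as follows. This is only an illustration under assumptions: `FakeModel` and its `lp_iterations` attribute are hypothetical stand-ins for scip::Model, and the reward is computed in `step_post` so that `__call__` is the simple getter discussed in the alternatives.

```python
class Event:
    """Base class: every callback exists and does nothing by default."""

    def reset_pre(self, model):
        pass

    def reset_post(self, model):
        pass

    def step_pre(self, model):
        pass

    def step_post(self, model):
        pass


class FakeModel:
    """Hypothetical stand-in for scip::Model with an LP iteration counter."""

    def __init__(self):
        self.lp_iterations = 0


class LpIterations(Event):
    """Reward equal to the LP iterations spent since the previous transition."""

    def __init__(self):
        self.last_lp_iter = 0
        self.reward = 0

    def reset_pre(self, model):
        self.last_lp_iter = 0
        self.reward = 0

    def step_post(self, model):
        # The reward is computed once per transition, not on every call.
        self.reward = model.lp_iterations - self.last_lp_iter
        self.last_lp_iter = model.lp_iterations

    def __call__(self, model):
        # Plain getter: two successive calls return the same value.
        return self.reward


model = FakeModel()
reward_function = LpIterations()
reward_function.reset_pre(model)
model.lp_iterations = 7
reward_function.step_post(model)
first = reward_function(model)
model.lp_iterations = 10
reward_function.step_post(model)
second = reward_function(model)
```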

Handle both randomization/randomseedshift and randomization/permutationseed

Currently, Model.seed() only sets randomization/randomseedshift. The permutation of rows / cols depends on a distinct seed, "randomization/permutationseed". We have to deal with that in Model, so that users only have to change one seed, which then affects both. I suggest setting both of them to the same value, or a shifted value, e.g.,
randomization/permutationseed = randomization/randomseedshift + 42
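A minimal sketch of the shifted-seed suggestion, written as a hypothetical helper `seed_params` that is not part of Ecole. It assumes that both parameters accept integers between 0 and INT_MAX, so the shifted value is wrapped around rather than allowed to overflow.

```python
INT_MAX = 2**31 - 1  # assumption: upper bound accepted by SCIP for both seeds


def seed_params(seed, shift=42):
    """Map one user-facing seed to both SCIP randomization parameters."""
    modulus = INT_MAX + 1
    return {
        "randomization/randomseedshift": seed % modulus,
        # Shifted copy of the same seed, wrapped to stay in the valid range.
        "randomization/permutationseed": (seed + shift) % modulus,
    }


params = seed_params(0)
```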

Python State functions composition

State functions could be composed easily to create new ones.

  • Sum, product, power (also with scalars) of reward functions
  • Logical operators of early termination functions
  • Cumulative Reward
  • TupleObservationFunction and DictObservationFunction for observation functions
  • Make Environment automatically create Dict/Tuple obs func if given a tuple/dict.
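The arithmetic and cumulative items in the list above could look like the sketch below. All class names here (`RewardFunction`, `Constant`, `Combined`, `Cumulative`) and the `cumsum` method are assumptions made for illustration; only the `reset` and `obtain_reward` method names are taken from the other issues on this page.

```python
import operator


class RewardFunction:
    """Hypothetical base class providing arithmetic composition."""

    def reset(self, model):
        pass

    def obtain_reward(self, model):
        raise NotImplementedError

    def __add__(self, other):
        return Combined(self, other, operator.add)

    def __mul__(self, other):
        return Combined(self, other, operator.mul)

    def __pow__(self, other):
        return Combined(self, other, operator.pow)

    def cumsum(self):
        return Cumulative(self)


class Constant(RewardFunction):
    def __init__(self, value):
        self.value = value

    def obtain_reward(self, model):
        return self.value


class Combined(RewardFunction):
    def __init__(self, left, right, op):
        self.left = left
        # Scalars are wrapped so that both operands share one interface.
        self.right = right if isinstance(right, RewardFunction) else Constant(right)
        self.op = op

    def reset(self, model):
        self.left.reset(model)
        self.right.reset(model)

    def obtain_reward(self, model):
        return self.op(self.left.obtain_reward(model), self.right.obtain_reward(model))


class Cumulative(RewardFunction):
    def __init__(self, inner):
        self.inner = inner
        self.total = 0

    def reset(self, model):
        self.inner.reset(model)
        self.total = 0

    def obtain_reward(self, model):
        self.total += self.inner.obtain_reward(model)
        return self.total


composed = (Constant(2) * 3 + Constant(1)) ** 2
cumulative = Constant(2).cumsum()
cumulative.reset(None)
totals = [cumulative.obtain_reward(None), cumulative.obtain_reward(None)]
```

Reflected operators such as `__radd__` are left out to keep the sketch short; they would be needed for expressions like `3 * reward`.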

C++ State function composition

Describe the problem or improvement suggested

Implement the same Python API #49 in C++

Describe the solution you'd like

Reward functions all derive from the same ABC, so this could be done with virtual calls (rather than templates) and bound to Python (minimizing code duplication).

For observation functions, templates (and hence code duplication) seem necessary.

Describe alternatives you've considered

Assertions ignored in CI

Because SCIP and Ecole are compiled in Release mode in CI, all assertions are silently ignored in CI tests.
This is problematic, especially since SCIP relies heavily on them.

  • Find the culprit: CMAKE_BUILD_TYPE? conda CXX_FLAGS?

Lift the GIL

The GIL is currently in place.

  • Use the PyBind feature to lift it;
  • Be careful to lock again in Trampoline functions;
  • Add a multi-threaded test in Python;
  • If possible, enable ThreadSanitizer on the Python extension

Don't compute observations in the final state

Describe the problem or improvement suggested

Environments should not ask for an observation in the final state (when done == True). This observation is most likely impossible to compute, since SCIP will likely not be in the SOLVING stage, and the final observation of an episode is not relevant for learning anyway. That way, observation functions would no longer have to deal with that situation by silently returning empty observations (the current behaviour).

Describe the solution you'd like

The Environment does not call the registered ObservationFunction when done==True, and the ObservationFunctions do not check any more for SCIP_STAGE_SOLVING.
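A small sketch of the proposed behaviour, with hypothetical stand-in classes rather than the real Ecole ones: the environment returns None instead of calling the observation function once the episode is done. The method name `obtain_observation` and the fixed episode length are assumptions made for the example.

```python
class CountingObservation:
    """Hypothetical observation function that records how often it is called."""

    def __init__(self):
        self.calls = 0

    def obtain_observation(self, model):
        self.calls += 1
        return {"call": self.calls}


class Environment:
    """Hypothetical environment whose episodes last a fixed number of steps."""

    def __init__(self, observation_function, n_steps):
        self.observation_function = observation_function
        self.remaining = n_steps
        self.model = None  # placeholder for the solver state

    def step(self, action):
        self.remaining -= 1
        done = self.remaining <= 0
        # In the final state, the observation function is not called at all.
        observation = None if done else self.observation_function.obtain_observation(self.model)
        return observation, done


observation_function = CountingObservation()
env = Environment(observation_function, n_steps=2)
first_obs, first_done = env.step(None)
last_obs, last_done = env.step(None)
```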

Describe alternatives you've considered

na

Additional context

na

Recover from exceptions in step

When the user passes a wrong action to Environment::step, the environment needs to be reset.
However, when using the environment interactively, it would be more natural for wrong values to be rejected while still letting the user try other values.

This requires either rolling back on exception, or checking action value at the start of step before attempting to step_state.
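The second option (checking the action before stepping) can be sketched as below. The `Environment` class is a hypothetical stand-in, and membership in a fixed action set stands in for whatever validation a real environment would need.

```python
class Environment:
    """Hypothetical environment that validates actions before mutating state."""

    def __init__(self, action_set):
        self.action_set = set(action_set)
        self.history = []

    def step(self, action):
        # Validate first: if this raises, the state has not been touched,
        # so the user can simply call step again with another action.
        if action not in self.action_set:
            raise ValueError(f"invalid action {action!r}")
        self.history.append(action)
        return len(self.history)


env = Environment(action_set=[0, 1, 2])
try:
    env.step(5)
    rejected = False
except ValueError:
    rejected = True
n_transitions = env.step(1)
```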

API for Action set

Some environments, such as selecting the branching variable or the next branching node, have a dynamic action set. This means that the set of possible actions changes between states of the episode.

This is not the same as the Action Space defined by OpenAI Gym because the latter is a static property of the environment and does not change during an episode (or between episodes).

The set of available actions is critical information that needs to be given to the user.

Here are some ideas I could think of, none is perfect. Any other ideas are welcome.

Option 1

The action set is returned as part of the transition in step and reset

obs, action_set, reward, done, info = env.step(action)

Downsides:

  • That's starting to be a lot of things in a tuple. I already have to make an effort to remember whether it is ..., reward, done, ... or ..., done, reward, ...; this will add much more confusion.
  • Degrades the API for environments that don't need an action set.

Option 2

An attribute of the environment. Something along the lines of

action = user_policy(obs, env.current_action_set())
obs, reward, done, info = env.step(action)

Downsides:

  • More implicit. Does not clearly tell the user that there is complexity to pay attention to. Nor does it tell the user when it changes (is it every episode? every transition?)

Option 3

Part of the Observation. After all, this is a POMDP: if the observation is not good enough to tell the agent what to do, it is not a good observation, period.
Some thoughts

  • For branching, NodeBipartite could have a flag for every variable indicating which variable is fractional, which would have the benefit of avoiding permutation issues with the action set.
  • However, this would be different for another observation, e.g. KhalilState.
  • What of NodeBipartite for other environments, now it needs to contain all action spaces?

Obtain an initial reward along with the initial observation (`env.reset()`)

Describe the problem or improvement suggested

Some environment MDPs may be over after reset(), without any call to step(). In such a case, no reward is available. This is annoying, as typically one would like to obtain cumulated rewards for every episode of the MDP, for example when benchmarking different policies. This situation happens often for example in branching, when the instance is solved in preprocessing. We still want to measure the running time, or the number of nodes, or the primal-dual integral etc.

Describe the solution you'd like

observation, action_set, reward, done = env.reset("path/to/problem")

Describe alternatives you've considered

Obtaining the cumulated reward at any moment by calling env.getCumulatedReward(). However, that cumulated reward would not match a cumulated reward manually computed by the user. Observing such a difference can be very confusing for users. Also, it introduces an additional function, which must be safe-guarded (can be called only during / after an episode).
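To illustrate why a reward returned by reset matters, here is a sketch with a hypothetical environment whose instance is solved during preprocessing. The class, the reward value, and the assumption that step returns the same four values as reset are all made up for the example.

```python
class PresolvedEnvironment:
    """Hypothetical environment whose episode is over right after reset."""

    def reset(self, instance):
        observation, action_set = None, None
        # For example, the negated time spent in preprocessing.
        reward, done = -0.5, True
        return observation, action_set, reward, done

    def step(self, action):
        raise RuntimeError("the episode is already over")


def run_episode(env, instance, policy):
    """Accumulate rewards over one episode, starting with the reset reward."""
    observation, action_set, reward, done = env.reset(instance)
    total = reward
    while not done:
        action = policy(observation, action_set)
        observation, action_set, reward, done = env.step(action)
        total += reward
    return total


total = run_episode(PresolvedEnvironment(), "path/to/problem", policy=None)
```

Without the reward in the reset tuple, `total` would have no value to report for this episode.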

Additional context

Add __repr__ for Python classes

  • Add fmtlib for simpler string creation
  • Specialize fmtlib for Ecole classes
  • Edit representations in doc code samples (doctest)

Bind ecole Exception to Python

  • Bind ecole::scip::Exception
  • Bind ecole::environment::Exception
  • Adapt Python Errors tests to test for bound exceptions

Compare against default SCIP

Describe the problem or improvement suggested

Provide a way to compare against a default SCIP baseline, in terms of Ecole metrics (reward functions).

Describe the solution you'd like

A new environment (DefaultSCIP? Default?) with a single-step MDP and an empty action set. That way SCIP is run with its default configuration, and one can extract Ecole reward functions to compare to other Environments. That DefaultSCIP environment could simply be a specialization of the Configuring environment, with an empty action set.

Describe alternatives you've considered

Simply using the Configuring environment to provide a default baseline. This can be a bit confusing for users though.

Add support for pseudo branching candidates

The branching environments should be able to toggle between using regular, fractional branching candidates (SCIPgetLPBranchCands) as hardcoded right now in Ecole, and using all non-fixed variables (SCIPgetPseudoBranchCands), which is what learn2branch uses. We will need it to reproduce learn2branch in Ecole.

Pseudocost observations

Describe the problem or improvement suggested

To reimplement learn2branch, we will need to reproduce pseudocost branching. This requires a new observation function that returns pseudocosts (similar to the current strong branching observation function).

Describe the solution you would like

I would like a new ObservationFunction that returns the pseudocosts, like the current StrongBranchingScores.

Parameter values are silently rounded in Configuring environment

Code to reproduce:

import ecole

env = ecole.environment.Configuring()
env.reset(filename='libecole/tests/data/enlight8.mps')
env.step({'presolving/maxrounds': 0.1})

I suggest we check types more carefully, e.g., with a narrow cast:

template<class Target, class Source>
Target narrow_cast(Source v) {
    auto r = static_cast<Target>(v);
    // Converting back must give the original value, otherwise data was lost.
    if (static_cast<Source>(r) != v)
        throw Exception("narrow_cast<>() failed");
    return r;
}
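The same check can be sketched on the Python side, for instance before a value is handed to an integer SCIP parameter. The function name `narrow_cast_int` is hypothetical and not part of Ecole.

```python
def narrow_cast_int(value):
    """Convert value to int, refusing any conversion that loses information."""
    result = int(value)
    if result != value:
        raise ValueError(f"narrow cast of {value!r} to int failed")
    return result


exact = narrow_cast_int(3.0)
try:
    narrow_cast_int(0.1)
    rejected = False
except ValueError:
    rejected = True
```

With such a check, the reproduction above would raise instead of silently passing 0 for presolving/maxrounds.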

Python tests are susceptible to running the wrong Ecole package

Currently, to run the tests, the user is expected to have Ecole importable.
This leaves room for running tests with outdated versions of Ecole.

#32 would make this easier to debug, but a better behavior would be to avoid that altogether.

Tox is a tool that can solve this issue but is hardly compatible with our building tools.

API for stage detection

Describe the problem or improvement suggested

Have a way to programmatically query/declare what SCIP stages are valid for a given observation/reward function, dynamics.
This has the following advantages:

  • Avoid out of date / incomplete documentation of compatibility;
  • Provide clear errors to the user rather than obscure exceptions, segfaults, and wrong results;
  • Automatically add optional on terminal states (if made constexpr), e.g. would help solve #59
  • Make possible automatic generation of integration tests.

Describe the solution you would like

An observation function for instance could have:

struct NodeBipartite: ObservationFunction<SomeObs> {
    static constexpr std::array<scip::Stage, 1> valid_stages = {scip::Stage::Solving};
    ...
};

Describe alternatives you have considered

  • Whether this is static/constexpr variable or could be deduced from constructor arguments
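A Python sketch of the same idea, under assumptions: the `Stage` enumeration below is a simplified stand-in for the SCIP stages, and `check_stage` is a hypothetical helper. It shows how a declared set of valid stages turns an obscure failure into a clear error.

```python
from enum import Enum, auto


class Stage(Enum):
    """Simplified, hypothetical subset of the SCIP stages."""

    PROBLEM = auto()
    PRESOLVING = auto()
    SOLVING = auto()
    SOLVED = auto()


def check_stage(function, stage):
    """Raise a clear error if the function is used in an unsupported stage."""
    if stage not in function.valid_stages:
        name = type(function).__name__
        raise RuntimeError(f"{name} is not valid in stage {stage.name}")


class NodeBipartite:
    # Declared once on the class, so it can also be queried by tests or docs.
    valid_stages = frozenset({Stage.SOLVING})

    def obtain_observation(self, stage):
        check_stage(self, stage)
        return "observation"


observation = NodeBipartite().obtain_observation(Stage.SOLVING)
try:
    NodeBipartite().obtain_observation(Stage.SOLVED)
    message = None
except RuntimeError as error:
    message = str(error)
```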

Setup pipeline to build documentation

  • Create doc.ecole.ai repository, and set proper DNS;
  • Add Doxygen, Sphinx, and Breathe to CMake;
  • Test building documentation in CI;
  • Deploy documentation to GitHub in specific branches.

Refactoring Environment

To simplify the Python bindings, we should bind components (ObservationFunction, RewardFunction, SimpleEnvironment...) independently, and combine them in a Python Environment class.

In practice, the DefaultEnvironment would not be bound but reproduced in Python.

This would remove the need to have common base class for all the state functions, as well as simplify the default Environment class.
Python Environment inheritance and composition would be much easier.

  • Change Environment to be standalone classes, rather than inheriting from DefaultEnvironment
  • Simplify Default Environment (remove pointer handling)
  • Create Python Environment class to mimic DefaultEnvironment
  • Evaluate relevance of clone methods and dereference
  • Remove bindings to None types, and manage them in Python

Implement InformationFunction

Describe the problem or improvement suggested

Similarly to RewardFunction and ObservationFunction, we want to have a composable/customizable type to return in the information dictionary.

Describe the solution you would like

An important question is what the type of the information dictionary is in C++, as it cannot be as dynamic as in Python.
Something with std::map<Key, Value> where:

  • Key is std::string? std::variant<std::string, long int>?
  • Value is std::variant<std::string, long int, double, xt::xarray>?

Additionally, we also want to construct InformationFunction from RewardFunction.

Describe alternatives you have considered

Alternatively, the Value type could be templated and the variants merged at compile time when merging different information functions. Unfortunately, it may mean wrapping information functions before binding them to Python.

Additional context

Node Bipartite Feature Memoization

Describe the problem or improvement suggested

Some features in the NodeBipartite observation only need to be computed on reset and not on all transitions.

Describe the solution you would like

Cache the features:

  • Cache the whole observation
  • Update features that can change
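The caching scheme above could be sketched as follows. The class `MemoizedObservation`, the dictionary-based model, and the feature names are hypothetical. The point is only that static features are computed once in reset while dynamic ones are refreshed on every transition.

```python
class MemoizedObservation:
    """Hypothetical observation function caching features that never change."""

    def __init__(self, static_features, dynamic_features):
        self.static_features = static_features
        self.dynamic_features = dynamic_features
        self._cache = None

    def reset(self, model):
        # Computed once per episode.
        self._cache = self.static_features(model)

    def obtain_observation(self, model):
        # Start from the cached part and update only what can change.
        observation = dict(self._cache)
        observation.update(self.dynamic_features(model))
        return observation


calls = {"static": 0, "dynamic": 0}


def static_features(model):
    calls["static"] += 1
    return {"objective_coefficients": model["objective"]}


def dynamic_features(model):
    calls["dynamic"] += 1
    return {"lp_solution": model["lp_solution"]}


model = {"objective": [1.0, 2.0], "lp_solution": [0.5, 0.0]}
function = MemoizedObservation(static_features, dynamic_features)
function.reset(model)
first = function.obtain_observation(model)
model["lp_solution"] = [1.0, 0.0]
second = function.obtain_observation(model)
```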

Describe alternatives you have considered

Additional context

Enable Model::set_param with std::string

Currently not possible via Cast_SFNIAE, because the std::string goes out of scope (and is deallocated) before the call to SCIP.

Possible solutions:

  • Add an overload rather than a specialization of the template;
  • Specialize the template with std::string (no &, no const): possible copy of string

Add Branching with Imitation Learning tutorial

The tutorial should showcase how Ecole can be used with the Branching environment, a node bipartite observation, and imitation learning to learn a branching policy.

The file should go under /examples

Write an integration test for inheritance

Write a test that verifies that spaces inherited from their base class or from an existing space have the desired behavior when given back to the C++ end (through an environment).

  • Create new file test_inheritance.py
  • Inherit some spaces from their base class
  • Inherit other spaces from an existing space
  • Create an environment with these spaces
  • Verify that the spaces are called correctly

Note:

  • The test may not work at first as not all the features are implemented yet, but it will act as test driven development for said feature
  • Look in other test files for conventions on how to organize tests (e.g. the model fixture to get a valid MILP problem)

Enable Source Distribution

Steps for PyPI:

  • Make Conan optional #121
  • Add top level setup.py
  • Use scikit-build to drive compilation with CMake
    • Set current version of Python
    • Extract compile flags from setuptools (or scikit-build?)
  • Use pyproject.toml and PEP518 to resolve build time dependencies
    • Scikit-build (and CMake)
    • Conan (will resolve libecole dependencies) or CMake FetchContent
    • PyBind11
    • Numpy
    • Scip (unknown so far)
    • xtensor-python (no plan so far xtensor-stack/xtensor-python#219)
  • Add Manifest.in
  • Use setuptools tests_require=["pytest", "pyscipopt"] for pytest
  • Remove dependencies from Dockerfile.src
  • Edit conda recipe to use new setup (--no-build-isolation)
  • Add a convenient way to run tests (Ctest ?)
  • Document how to use the development version (PYTHONPATH)
    • Seems not yet supported, should read environment variables
  • Document how to set SCIP_DIR using pip
  • Fix install Rpath
  • Add sdist releases to PyPI

Additionally

  • Switch to tox for automated testing

Python type hints

Add type hints for IDE completion, static check, and documentation purposes.

  • Investigate whether Pybind needs stubs;
  • The correct types for state function would be to define a Protocol but this is Python 3.8+

Implement an `ObservationAggregator` observation function

This would make it possible to combine several observation functions. The observations are then returned in a list, in the same order as the ObservationFunctions passed.

Pseudocode of the use case:

of = ObservationAggregator([NodeBipartite(), StrongBranchingScores()])
env = Environment(observation_function=of)

obs, done = env.reset()
bipartitegraph, sb_scores = obs
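A possible implementation sketch of the aggregator itself is given below. The method names `reset` and `obtain_observation` are assumed by analogy with the reward functions discussed on this page, and the two inner classes are dummy stand-ins rather than the real NodeBipartite and StrongBranchingScores.

```python
class ObservationAggregator:
    """Combine several observation functions into one returning a list."""

    def __init__(self, observation_functions):
        self.observation_functions = list(observation_functions)

    def reset(self, model):
        for function in self.observation_functions:
            function.reset(model)

    def obtain_observation(self, model):
        # Same order as the functions passed to the constructor.
        return [f.obtain_observation(model) for f in self.observation_functions]


class DummyGraph:
    """Stand-in for a graph observation function."""

    def reset(self, model):
        pass

    def obtain_observation(self, model):
        return "graph"


class DummyScores:
    """Stand-in for a score observation function."""

    def reset(self, model):
        pass

    def obtain_observation(self, model):
        return [0.1, 0.9]


aggregator = ObservationAggregator([DummyGraph(), DummyScores()])
aggregator.reset(None)
graph, scores = aggregator.obtain_observation(None)
```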

Allow for nested exceptions

Replace Exception with std::exception in order to allow std::throw_with_nested. That would improve error messages for users.

Convert xtensor to pyxtensor

Figure out whether it is possible to convert xtensor to pyxtensor without a copy.

This would make it possible to remove the template from the Observation and ObservationFunction classes.

Model.set_params/get_params

Describe the improvement suggested

As the plural suggests, these methods get and set multiple SCIP parameters in one call, for convenience.

Describe the solution you'd like

C++, use the ParamType variant to hold different parameters values.
It may also be an opportunity to bind the Python code using the variant as well to reduce code duplication.

Ecole observation features

Modularize feature extraction from scip::Model

Currently, features extracted in NodeBipartite cannot be easily reused by other classes, or customized in the current class.
On the other hand, VarProxy... creates confusion with SCIP variables.

  • Move the code extracting features from scip::Model to the observation namespace.
  • Define Feature classes that
    • Are typed depending on the feature (scalar, categorical, on all variable, global...)
    • Contain directive for preprocessing: one-hot encode, static function to normalize...
    • Know (statically? dynamically?) if they need to be recomputed on every transition
  • Take features to extract as a vector / array of names (enum)
  • Leverage range library for maximum re-usability? and std::span
  • Be performant, in particular maximize data locality and do compile-time computation when the features are given at compile-time.
  • Determine where to put the function to extract row matrix
  • Make it possible to reuse features to create new features, while efficiently caching the intermediate feature in the implementation

Python object ownership

Problem:
Python cannot give away ownership of its references. Using holder types such as std::unique_ptr is impossible without making copies (of base::XSpace, or scip::Model).
Move semantics from Python are work in progress in Pybind11, and unexpected to the Python user anyways.

  • Template a holder type to choose between unique_ptr or shared_ptr
  • Add constructor to work with the holder type
  • Add holder type to pybind space classes
  • Adapt Python bindings to use the shared_ptr for spaces
  • Use Model with the holder type
  • Change Model::reset to also use holder of model
  • Adapt Python bindings to use the shared_ptr of spaces for Env
  • Template holder type for Model
    • Remove full template specialization of member functions
  • Use Model of same holder type in Envs
    • Remove Using ptr<Model> in favor of Model<Holder>
  • Adapt Python bindings for Env.reset
  • Replace make_unique with a templated make_ptr to statically dispatch between shared and unique pointers

Documentation pages

  • Installation
  • First step / Environment example
  • State functions (full environment API)
  • MDP formulation and generalization
  • Difference with OpenAI (and motivation)
    • done flag on reset
    • initial prob. distribution on reset
    • None on terminal states
  • Extending an environment
  • Creating an environment
  • Compatibility with PySCIPOpt (seed and to/from Model), understanding Model, and accessing it through state (e.g. env.state.model.get_param("random/something"))

Lifetime of Model should exceed that of Thread

Currently, the lifetimes of the Controller and of the Model it uses are independent, but this is wrong because:

  • The Model should not be accessed without validation from the Controller;
  • The Controller needs a valid Model to be destructed, as it needs to terminate the solving (this cannot be prematurely terminated, as we want SCIP to properly free the solving resources).

The lifetimes are currently manually maintained, but this design is not foolproof.

Proposed solution:

  • Move State inside of Controller, as std::shared_ptr;
  • Pass std::shared_ptr<State> to the auxiliary thread, hence avoiding address errors;
  • Validate external access to State using Controller lock;
  • Change DefaultEnvironment to access State through Branching Controller (tied with #19)

Increase testing environments

Continuous integration should test the following configurations:

  • All python >= 3.6
  • gcc vs clang
  • scip >= 6.0

To simplify builds, environments should be completely defined by Docker images.

TODO:

Coding conventions

Make the following conventions consistent across the codebase:

  • Brace vs parenthesis constructors;
  • auto x = declaration (may be a copy in C++<17);
  • naming private attribute m_var
  • setter/getter convention:
    • void name(val&) and auto name()
    • auto& name(val)
    • explicit get and set prefixes
  • snake_case (std convention) vs PascalCase (Python convention) for types

Many of these things can be done by clang-tidy.

Environment composability

Environments are made composable by their observation, reward, termination spaces.

  • Add TerminationSpace
  • Add a TerminationSpace subclass
  • Python bindings for TerminationSpace
  • Finish RewardSpace API
  • Add a RewardSpace subclass
  • Python bindings for RewardSpace
  • Change handling of generic spaces at the generic (Env) level
    Maximum control is left to the user
    • Rethink Env as pure ABC, and combine with private inheritance mixin
  • Complete C++ documentation of base::Env
  • Add reset for all spaces
  • Provide a default reset that does nothing
  • Add Python bindings for reset and edit tests
