This repository provides the necessary code to reproduce all supplementary experiments of the "Enforcing Hard State-Dependent Action Bounds on Deep Reinforcement Learning Policies" paper [1, Appendix B].

License: BSD 3-Clause "New" or "Revised" License

State-dependent action bounds

An example implementation of state-dependent action bounds is provided for the soft actor-critic (SAC) method, using Stable-Baselines3 and PyTorch. Custom state-dependent action bounds are defined for the Pendulum-v1 and LunarLanderContinuous-v2 OpenAI Gym environments. Refer to the paper's supplementary material for further details.
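As a rough illustration of the idea, the sketch below forces a squashed policy output into a state-dependent interval, once by linear rescaling and once by clipping. It is not taken from this repository: the bound function `pendulum_bounds`, its constants, and the exact form of both mappings are assumptions made for the example. The bounds and rescaling functions actually used are described in [1, Appendix B].

```python
import math


def pendulum_bounds(state, max_torque=2.0):
    """Hypothetical bounds (assumption): shrink the torque range with speed.

    `state` is an (angle, angular_velocity) pair. Illustrative only.
    """
    _, velocity = state
    scale = 1.0 / (1.0 + abs(velocity))
    return -max_torque * scale, max_torque * scale


def rescale_linear(raw_action, lower, upper):
    """Affinely map raw_action from [-1, 1] onto [lower, upper]."""
    return lower + 0.5 * (raw_action + 1.0) * (upper - lower)


def clip(action, lower, upper):
    """Clamp an action to [lower, upper]."""
    return min(max(action, lower), upper)


def bounded_action(pre_activation, state, mode="lin", max_torque=2.0):
    """Return an action that always lies within the bounds for `state`."""
    raw = math.tanh(pre_activation)  # squashed policy output in [-1, 1]
    lower, upper = pendulum_bounds(state, max_torque)
    if mode == "lin":
        return rescale_linear(raw, lower, upper)
    if mode == "clip":
        # Scale to the full torque range first, then cut off at the bounds.
        return clip(max_torque * raw, lower, upper)
    raise ValueError(f"unknown mode: {mode}")
```

With linear rescaling the whole output range of the policy is used inside the interval, whereas clipping maps every out-of-range output onto the nearest bound.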

Installation

  1. Clone this repository.

    git clone https://github.com/dcbr/sdab

    cd sdab

  2. Install the required packages. Optionally, create a virtual environment first (using e.g. conda or venv).

    python -m pip install -r requirements.txt

Usage

Run the action_bounds.py script with suitable arguments to train the models or to evaluate and analyze their performance. For example,

    python action_bounds.py --mode train --envs LunarLanderContinuous-v2 --rescale lin hyp

trains on the lunar lander environment (with stabilizing action bounds) for both the linear and the hyperbolic rescaling functions.

To reproduce all results of Appendix B, first train all models with python action_bounds.py, then run the analysis with python action_bounds.py --mode analyze. Beware that this might take a while to complete, depending on your hardware!
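The two-step reproduction can also be driven from Python. The following is a minimal sketch, assuming it is run from the repository root. The helpers `build_command`, `reproduction_commands` and `run_all` are hypothetical names introduced here, and the sketch assumes that `--envs`, `--algs` and `--seeds` accept several values in the same way `--rescale` does in the example above.

```python
import subprocess
import sys


def build_command(mode=None, envs=None, algs=None, rescale=None, seeds=None):
    """Assemble an argument list for action_bounds.py from the given options."""
    cmd = [sys.executable, "action_bounds.py"]
    if mode is not None:
        cmd += ["--mode", mode]
    options = (("--envs", envs), ("--algs", algs),
               ("--rescale", rescale), ("--seeds", seeds))
    for flag, values in options:
        if values:
            cmd.append(flag)
            cmd.extend(str(value) for value in values)
    return cmd


def reproduction_commands():
    """Training run with default arguments, followed by the analysis run."""
    return [build_command(), build_command(mode="analyze")]


def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(reproduction_commands())` then starts training and, once it has finished successfully, the analysis.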

A summary of the script's most relevant parameters is provided below. Run python action_bounds.py --help for a full overview of the supported parameters.

| Parameter | Supported values | Description |
|---|---|---|
| --mode | train, eval, analyze | Run mode. Either train models, evaluate (and visualize) them or analyze and summarize the results (creating the plots shown in the paper). |
| --envs | Pendulum-v1, LunarLanderContinuous-v2 | OpenAI gym environment ID. |
| --algs | sac, bsac | Reinforcement learning algorithm to use. Either the bounded SAC algorithm (bsac), with enforced state-dependent action bounds, or the default SAC algorithm (sac), without enforcement of such bounds. |
| --rescale | lin, pwl, hyp, clip | Rescaling function σ to use. Either linear (lin), piecewise linear (pwl) or hyperbolic (hyp) rescaling; or clipping (clip). |
| --seeds | Any integer number N | Experiments are repeated for all of the provided seeds. Can also be a negative number -N, in which case N seeds are randomly chosen. |
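The parameters above could be parsed along the following lines. This is a sketch of an equivalent command-line interface, not the parser used in action_bounds.py: the defaults, the multi-value handling of `--envs`, `--algs` and `--seeds`, the helper `resolve_seeds`, and the range from which random seeds are drawn are all assumptions.

```python
import argparse
import random


def make_parser():
    """Parser mirroring the documented options (defaults are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented options")
    parser.add_argument("--mode", choices=["train", "eval", "analyze"], default="train")
    parser.add_argument("--envs", nargs="+",
                        choices=["Pendulum-v1", "LunarLanderContinuous-v2"])
    parser.add_argument("--algs", nargs="+", choices=["sac", "bsac"])
    parser.add_argument("--rescale", nargs="+", choices=["lin", "pwl", "hyp", "clip"])
    parser.add_argument("--seeds", nargs="+", type=int)
    return parser


def resolve_seeds(seeds, rng=None):
    """Turn the --seeds values into the list of seeds to run.

    A single negative value -N is read as a request for N random seeds.
    """
    rng = rng or random.Random()
    if len(seeds) == 1 and seeds[0] < 0:
        # Draw N distinct non-negative seeds (the range is an assumption).
        return rng.sample(range(2**31), -seeds[0])
    return list(seeds)
```

For instance, parsing `--seeds -3` yields `[-3]`, which `resolve_seeds` expands into three randomly drawn seeds, while `--seeds 1 2 3` is passed through unchanged.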

References

[1] De Cooman, B., Suykens, J., Ortseifen, A.: Enforcing hard state-dependent action bounds on deep reinforcement learning policies. Accepted for 8th International Conference on Machine Learning, Optimization & Data Science, LOD 2022.
