scvpo's Introduction

Super Constrained Variational Policy Optimization for Safe Reinforcement Learning

SCVPO is a novel Expectation-Maximization approach that naturally incorporates constraints during policy learning. It can take past strategies into account to make updates smoother.

Table of Contents

The structure of this repo is as follows:

Safe RL libraries
├── safe_rl  # core package folder
│   ├── policy # safe model-free RL methods
│   │   ├── model # stores the actor critic model architecture
│   │   ├── policy_name # RL algorithms implementation
│   ├── util # logger and pytorch utils
│   ├── worker # collect data from the environment
│   ├── runner.py # core module to connect policy and worker
├── script  # stores the training scripts.
│   ├── config # stores some configs of the env and policy
│   ├── run.py # launch a single experiment
│   ├── experiment.py # launch multiple experiments in parallel with ray
│   ├── button/circle/goal.py # hyper-parameters for each experimental env
├── data # stores experiment results

Installation

1. System requirements

  • Tested on Ubuntu 20.04; Ubuntu 18.04 should also work
  • We recommend using Anaconda3 for Python environment management

2. System-wise dependencies installation

Since we use mujoco and mujoco_py for the safety-gym environment experiments, some dependencies must be installed with sudo permissions. To install them, run

cd envs/safety-gym && bash setup_dependency.sh

Enter the sudo password when prompted to finish installing the dependencies.

3. Anaconda Python env setup

Activate an Anaconda virtual env with Python 3.6+, then return to the repo root folder and run

cd ../.. && bash install_all.sh

This installs the modified safety_gym and this repo's Python package dependencies, which are listed in requirement.txt. Then install PyTorch for your platform, following the official PyTorch installation instructions.

Some experiments (CarCircle, BallCircle) are run in BulletSafetyGym; to try those environments, install it with the following commands:

cd envs/Bullet-Safety-Gym
pip install -e .
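After the installation steps, a quick sanity check can confirm that the main packages are visible to the active Python environment. The following is a minimal sketch, not part of this repo; the importable module names (torch, mujoco_py, safety_gym, bullet_safety_gym) are assumptions based on the package names mentioned above and may need adjusting.

```python
import importlib.util

# Module names are assumptions based on the packages named in this README;
# adjust them if your installation exposes different import names.
REQUIRED = ["torch", "mujoco_py", "safety_gym"]
OPTIONAL = ["bullet_safety_gym"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_modules(names)
        status = "all found" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} packages: {status}")
```

The check only looks the modules up without importing them, so it does not verify that mujoco itself is licensed or correctly linked.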

Training

How to run a single experiment

Simply run

python script/run.py -p cvpo -e SafetyCarCircle-v0

where -p is the policy name and -e is the environment name. More configuration options can be found in the script/config folder, in run.py, and in safe_rl/runner.py.
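To train the same policy on several environments one after another, the command above can be wrapped in a small launcher. This is a hedged sketch that uses only the -p and -e flags described here; the helper names are hypothetical, and the environment name SafetyBallCircle-v0 is an assumption inferred from the BallCircle experiments mentioned in the installation section.

```python
import subprocess
import sys


def build_run_command(policy, env, script="script/run.py"):
    """Build the argv list for one training run using only -p and -e."""
    return [sys.executable, script, "-p", policy, "-e", env]


def run_all(policy, envs, dry_run=True):
    """Print one training command per environment; execute them if dry_run is False."""
    commands = [build_run_command(policy, env) for env in envs]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the loop at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "SafetyBallCircle-v0" is an assumed environment name, not confirmed here.
    run_all("cvpo", ["SafetyCarCircle-v0", "SafetyBallCircle-v0"], dry_run=True)
```

With dry_run=True the script only prints the commands, which makes it easy to check them before starting long training runs from the repo root folder.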

To evaluate a trained model, run:

python script/run.py -m eval -d /model_dir -e SafetyCarCircle-v0

Note that to render bullet_safety_gym environments, such as SafetyCarCircle-v0, you need to add the argument -e SafetyCarCircle-v0.

How to run multiple experiments in parallel

We use Ray Tune to run experiments in parallel. For instance, to run all the off-policy methods in the button environments, run:

python script/experiment.py cvpo sac_lag ddpg_lag td3_lag --env button --cpu 4 --thread 1

where --env is the environment name and should be one of button, circle, or goal; --cpu specifies the maximum number of CPUs used to run all the experiments; and --thread is the CPU resource allocated to each experiment trial. See the Ray documentation for more details.
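The relationship between --cpu and --thread can be illustrated with a little arithmetic. The sketch below assumes that trials are scheduled purely by CPU reservation, so that roughly cpu // thread trials run at once; how experiment.py actually passes these values to Ray is not confirmed here, and the function names are hypothetical.

```python
import math


def max_concurrent_trials(cpu, thread):
    """Estimate how many trials fit at once, assuming CPU-only scheduling."""
    if cpu < 1 or thread < 1:
        raise ValueError("cpu and thread must both be at least 1")
    return cpu // thread


def scheduling_waves(num_trials, cpu, thread):
    """Estimate how many sequential batches are needed to run all trials."""
    concurrent = max_concurrent_trials(cpu, thread)
    if concurrent == 0:
        raise ValueError("thread exceeds cpu, so no trial can be scheduled")
    return math.ceil(num_trials / concurrent)


if __name__ == "__main__":
    # Values from the example command: --cpu 4 --thread 1
    print(max_concurrent_trials(4, 1))
    # Assuming one trial per listed policy (4 policies), purely for illustration.
    print(scheduling_waves(4, 4, 1))
```

Under this assumption, raising --thread gives each trial more CPU but reduces how many trials run side by side for a fixed --cpu budget.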

Check experiment results

You can use either tensorboard or script/plot.py to monitor the results. All experiment results are stored in the data folder under the corresponding experiment name.

For example:

tensorboard --logdir data/experiment_folder
python script/plot.py data/experiment_folder -y EpRet EpCost
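For a quick numeric summary without plotting, the logged metrics can also be averaged directly. This is a rough sketch under a loud assumption: it supposes each run writes a tab-separated table with a header row containing EpRet and EpCost columns, which this README does not confirm. The function name and the sample numbers are made up for illustration.

```python
import csv
import io


def column_means(table_file, columns=("EpRet", "EpCost"), delimiter="\t"):
    """Average the requested columns of a delimited table with a header row."""
    reader = csv.DictReader(table_file, delimiter=delimiter)
    totals = {name: 0.0 for name in columns}
    rows = 0
    for row in reader:
        for name in columns:
            totals[name] += float(row[name])
        rows += 1
    if rows == 0:
        raise ValueError("table has no data rows")
    return {name: total / rows for name, total in totals.items()}


if __name__ == "__main__":
    # Made-up numbers, only to show the assumed shape of the input.
    sample = io.StringIO("Epoch\tEpRet\tEpCost\n0\t10.0\t4.0\n1\t20.0\t2.0\n")
    print(column_means(sample))
```

To use it on a real run, pass an open file handle instead of the in-memory sample, after checking the actual file name and delimiter in the experiment folder.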

scvpo's People

Contributors

liruiluo
