AIS based PORL

This repository contains the code for the PORL (partially observed reinforcement learning) experiments presented in the paper

J. Subramanian, A. Sinha, R. Seraj, and A. Mahajan, "Approximate Information State for Approximate Planning and Reinforcement Learning in Partially Observed Systems", arXiv:2010.08843, 2020.

Three classes of experiments are presented (with their Gym environment names):

  • Low-dimensional environments

    • Tiger: Tiger-v0
    • Voicemail: Voicemail-v0
    • Cheese Maze: CheeseMaze-v0
  • Moderate-dimensional environments

    • Rock Sample: RockSampling-v0
    • Drone Surveillance: DroneSurveillance-v0
  • High-dimensional environments

    Various grid-world models from gym-minigrid (used in the BabyAI platform) including

    • Simple crossing: MiniGrid-SimpleCrossingS9N1-v0, MiniGrid-SimpleCrossingS9N2-v0, MiniGrid-SimpleCrossingS9N3-v0, MiniGrid-SimpleCrossingS11N5-v0
    • Lava crossing: MiniGrid-LavaCrossingS9N1-v0, MiniGrid-LavaCrossingS9N2-v0
    • Key corridor: MiniGrid-KeyCorridorS3R1-v0, MiniGrid-KeyCorridorS3R2-v0, MiniGrid-KeyCorridorS3R3-v0
    • Obstructed maze: MiniGrid-ObstructedMaze-1Dl-v0, MiniGrid-ObstructedMaze-1Dlh-v0
    • Misc: MiniGrid-Empty-8x8-v0, MiniGrid-DoorKey-8x8-v0, MiniGrid-FourRooms-v0

Installation

To install all the dependencies of the code in a virtual environment, run the setup script:

bash bin/setup.sh

Usage

First, activate the virtual environment using:

source python-vms/ais/bin/activate

To run the AIS training algorithm for an environment, say Tiger-v0, run:

python src/main.py --env_name Tiger-v0 

This program accepts the following command line arguments:

Option              Description
--output_dir        Directory in which the results are stored.
--env_name          Environment name (in OpenAI Gym format).
--eval_frequency    Number of batch iterations per evaluation step.
--N_eps_eval        Number of episodes to evaluate in an evaluation step.
--beta              Discount factor.
--lmbda             Trade-off between the next-reward loss and the next-observation loss. It generally helps to keep this value low if the rewards of the environment are high.
--policy_LR         Learning rate used by the Adam optimizer for the policy.
--ais_LR            Learning rate used by the Adam optimizer for the AIS.
--batch_size        Number of samples used in a batch for every optimization step.
--num_batches       Number of batches to train on.
--AIS_state_size    Size of the vector used to represent the approximate information state.
--AIS_pred_ncomp    Number of components in the GMM for MiniGrid with the KL IPM.
--IPM               The IPM to use. MMD selects the kernel-based IPM in its squared L2-norm form; KL indirectly optimizes the Wasserstein IPM.
--seed              Random seed.
--models_folder     Directory to save/load models.
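
To make the role of --lmbda concrete, the sketch below combines a reward-prediction loss and an observation-prediction loss with a single weight. It is an illustration under assumptions, not the repository's code: the function name combined_ais_loss is hypothetical, and the convex form lmbda * reward_loss + (1 - lmbda) * obs_loss is assumed from the description of the option.

```python
def combined_ais_loss(reward_loss: float, obs_loss: float, lmbda: float) -> float:
    """Hypothetical convex combination of the two AIS prediction losses.

    reward_loss: loss on the predicted next reward.
    obs_loss:    loss on the predicted next observation (e.g. an MMD or KL term).
    lmbda:       weight on the reward term, assumed to lie in [0, 1].
    """
    if not 0.0 <= lmbda <= 1.0:
        raise ValueError("lmbda is assumed to lie in [0, 1]")
    return lmbda * reward_loss + (1.0 - lmbda) * obs_loss
```

Under this assumed form, a reward loss of 100 and an observation loss of 1 contribute 50 and 0.5 respectively when lmbda = 0.5, so the reward term dominates. With lmbda = 0.01 the two contributions are 1 and 0.99, which is why a low value can help when rewards are large.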

Reproducing results in the paper

The results presented in the paper can be obtained by running the following wrapper scripts:

  • Low-dimensional environments

    • Tiger: sh bin/lowdim/tiger_MMD.sh and sh bin/lowdim/tiger_KL.sh
    • Voicemail: sh bin/lowdim/voicemail_MMD.sh and sh bin/lowdim/voicemail_KL.sh
    • Cheese Maze: sh bin/lowdim/cheesemaze_MMD.sh and sh bin/lowdim/cheesemaze_KL.sh
  • Moderate-dimensional environments

    • Rock Sample: sh bin/moddim/rocksampling_MMD.sh and sh bin/moddim/rocksampling_KL.sh
    • Drone Surveillance: sh bin/moddim/dronesurveillance_MMD.sh and sh bin/moddim/dronesurveillance_KL.sh
  • High-dimensional environments

    • sh bin/highdim/minigrid_MMD.sh and sh bin/highdim/minigrid_KL.sh

      This runs the SimpleCrossingS9N1 environment. To run a different environment, change the environment name in the bin/highdim/minigrid_*.sh files.
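
As an alternative to editing the wrapper scripts by hand, the following sketch builds one python src/main.py command per MiniGrid environment, using only the --env_name, --IPM, and --seed options documented above. The helper build_commands is hypothetical, and the wrapper scripts may set additional hyperparameters that this sketch does not reproduce, so runs launched this way are not guaranteed to match the results in the paper.

```python
import shlex

# Environment names copied from the list of high-dimensional environments above.
MINIGRID_ENVS = [
    "MiniGrid-SimpleCrossingS9N1-v0",
    "MiniGrid-LavaCrossingS9N1-v0",
    "MiniGrid-KeyCorridorS3R1-v0",
]


def build_commands(env_names, ipm="MMD", seed=0):
    """Return one argument list per environment (hypothetical helper)."""
    if ipm not in ("MMD", "KL"):
        raise ValueError("IPM must be 'MMD' or 'KL'")
    return [
        ["python", "src/main.py", "--env_name", env, "--IPM", ipm, "--seed", str(seed)]
        for env in env_names
    ]


if __name__ == "__main__":
    for cmd in build_commands(MINIGRID_ENVS, ipm="KL", seed=42):
        # Only print the command here. To execute it, pass `cmd` to
        # subprocess.run(cmd, check=True) from the repository root with the
        # virtual environment activated.
        print(shlex.join(cmd))
```

Building the commands as argument lists rather than strings avoids shell quoting issues if they are later passed to subprocess.run.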

Citation

Please use the following citation to refer to the paper:

@misc{AIS,
      title={Approximate information state for approximate planning and reinforcement learning in partially observed systems}, 
      author={Jayakumar Subramanian and Amit Sinha and Raihan Seraj and Aditya Mahajan},
      year={2020},
      note={arXiv:2010.08843},
      url={https://arxiv.org/abs/2010.08843},
}
