
Policy Supervectors

Source code for the NeurIPS 2021 DRL workshop paper "General Characterization of Agents by States they Visit".

Long story short: policy supervectors are fixed-length vectors that describe a policy's behaviour (which states it visits) and can be quickly compared to each other, e.g. via a distance measure. With them, one can study how a training algorithm evolves policies, as demonstrated in the image above.
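To make the idea of a fixed-length behaviour vector concrete, here is a minimal sketch in plain Python. It is a simplified stand-in and not the code in this repository: it assumes a small set of shared reference centres, assigns each visited state to its nearest centre, and pulls each centre towards the mean of its assigned states. The names `policy_vector`, `euclidean`, `centers` and `relevance` are hypothetical and chosen only for this illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def policy_vector(states, centers, relevance=1.0):
    """Summarize visited states as one adapted centre per shared centre.

    Assumption for this sketch: hard nearest-centre assignment and a
    simple weighted average between the shared centre and the data mean.
    """
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for s in states:
        # Assign each visited state to its nearest shared centre.
        k = min(range(len(centers)), key=lambda i: euclidean(s, centers[i]))
        counts[k] += 1
        for d in range(dim):
            sums[k][d] += s[d]
    vector = []
    for c, total, n in zip(centers, sums, counts):
        # With no assigned states (n == 0) the centre is kept unchanged.
        for d in range(dim):
            vector.append((total[d] + relevance * c[d]) / (n + relevance))
    return vector


# Two toy "policies" that visit different numbers of 2-D states.
centers = [(0.0, 0.0), (10.0, 10.0)]
vec_a = policy_vector([(1.0, 1.0), (1.0, 1.0)], centers)
vec_b = policy_vector([(9.0, 9.0)], centers)

# Both vectors have len(centers) * dim entries, so they are comparable.
print(len(vec_a), len(vec_b))
print(euclidean(vec_a, vec_b))
```

The point of the sketch is only that the vector length depends on the shared centres and not on how many states a policy visited, which is what makes a direct distance between two policies possible.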

Code repositories used in this project

prettyNEAT code is originally copied from here

deep-neuroevolution code is originally copied from here

Running the code

Experiment code is in Python (tested on Python 3.7), with requirements listed in `requirements.txt`.

The easiest way to run the experiments is to use conda or other env-handler:

conda create -n pivector python=3.7
conda activate pivector
pip install -r requirements.txt
./scripts/run_all.sh

All data will be stored in the same directory as the code, under different directories. The final results will be printed in the console (by plot_paper.py and plot_appendix.py), and figures will be placed under the figures directory.

See ./scripts/run_all.sh for the scripts for the different experiments and how they should be run.

Note: Experiments with novelty search have to be done separately. See deep-neuroevolution/README.md for instructions.

Example code of the method

See plot_method_figures.py for mostly self-contained code that runs the method in the Pendulum-v0 environment with three different agents. The resulting plots will be placed in figures, and were used in the method figure of the main paper.

Gridworld illustrations of return, action and state-based characterizations

Run gridworld_experiments/experiments.py (requires numba) to create an illustration of how policies are characterized differently when measuring returns, visited states and actions taken (`gridworld_illustration.pdf`).
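As a rough intuition for why the three characterizations can disagree, the toy sketch below compares two fixed action sequences in a 3x3 grid. It is a hypothetical example and not the code in gridworld_experiments/experiments.py; the grid size, the reward of -1 per step, and the names `MOVES`, `GOAL` and `rollout` are assumptions made for this illustration.

```python
from collections import Counter

# Assumed toy setup: 3x3 grid, start at (0, 0), goal at (2, 2).
MOVES = {"right": (1, 0), "down": (0, 1)}
GOAL = (2, 2)


def rollout(actions, start=(0, 0)):
    """Follow a fixed action sequence; reward is -1 per step until the goal."""
    state = start
    visited = [start]
    taken = Counter()
    total = 0
    for action in actions:
        dx, dy = MOVES[action]
        state = (state[0] + dx, state[1] + dy)
        visited.append(state)
        taken[action] += 1
        total -= 1
        if state == GOAL:
            break
    return total, visited, taken


ret_a, states_a, acts_a = rollout(["right", "right", "down", "down"])
ret_b, states_b, acts_b = rollout(["down", "down", "right", "right"])

print("same return:", ret_a == ret_b)
print("same action counts:", acts_a == acts_b)
print("same visited states:", set(states_a) == set(states_b))
```

In this toy case both sequences reach the goal in four steps and use each action twice, so returns and action counts cannot tell them apart, while the sets of visited states differ.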

Hardware/Software requirements

The strict hardware/software requirements for running the experiments are:

  • Linux-based OS (will not run correctly on Windows)
  • At least 64GB of system memory
  • At least ~300GB of free space available (data is stored in this same directory)

No GPU or specific CPU is required. The only required software is Python and the libraries specified in requirements.txt (open and freely available).

Running all experiments takes roughly one to two weeks on a four-core Intel Xeon system. Note that this is without parallelizing experiments, and you can easily run separate parts by running scripts from the scripts directory.
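Before starting a multi-day run, it may help to check the machine against the requirements listed above. The following is a small optional sketch using only the Python standard library; it is not part of this repository, and the function names `find_problems`, `total_memory_gb` and `check_machine` are hypothetical. It treats 1 GB as 10^9 bytes, which is an assumption about how the thresholds are meant.

```python
import os
import platform
import shutil

GB = 10 ** 9  # Assumption: decimal gigabytes.


def find_problems(system, mem_gb, disk_gb, min_mem_gb=64, min_disk_gb=300):
    """Compare machine facts against the stated requirements."""
    problems = []
    if system != "Linux":
        problems.append("OS is %s, but a Linux-based OS is required" % system)
    # mem_gb is None when total memory could not be determined.
    if mem_gb is not None and mem_gb < min_mem_gb:
        problems.append("only %.0f GB of system memory" % mem_gb)
    if disk_gb < min_disk_gb:
        problems.append("only %.0f GB of free disk space" % disk_gb)
    return problems


def total_memory_gb():
    """Total physical memory in GB, or None if it cannot be queried."""
    try:
        return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE") / GB
    except (AttributeError, ValueError, OSError):
        return None


def check_machine(path="."):
    """Check the current machine; data is stored under `path`."""
    free_gb = shutil.disk_usage(path).free / GB
    return find_problems(platform.system(), total_memory_gb(), free_gb)


for problem in check_machine():
    print("WARNING:", problem)
```

Run it from the repository directory so that the free-space check applies to the disk where the data will be written.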
