
EVO: Population-Based Training (PBT) for Reinforcement Learning using MPI

Overview

Population-Based Training (PBT) is an approach to hyperparameter optimisation that jointly optimises a population of models and their hyperparameters to maximise performance. PBT takes its inspiration from genetic algorithms, where each member of the population can exploit information from the rest of the population.

PBT Illustration

Illustration of PBT training process (Liebig, Jan Frederik, Evaluating Population based Reinforcement Learning for Transfer Learning, 2021)

To scale the population of agents to extreme sizes on High-Performance Computing (HPC) clusters, this repo, namely EVO, provides a PBT implementation for RL using the Message Passing Interface (MPI).
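
The exploit/explore cycle behind PBT can be summarised in a few lines. The sketch below is a minimal, framework-agnostic illustration only; all names, including the `evaluate` callback and the dictionary layout, are hypothetical and are not the repo's actual API:

```python
import copy
import random

def pbt_generation(population, evaluate, cutoff_frac=0.25):
    """One PBT generation: evaluate, then let weak members exploit strong ones (illustrative sketch)."""
    # Evaluate every member of the population.
    for member in population:
        member["score"] = evaluate(member["weights"], member["hyperparams"])

    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * cutoff_frac))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]

    for member in bottom:
        donor = random.choice(top)
        # Exploit: copy weights and hyperparameters from a better member.
        member["weights"] = copy.deepcopy(donor["weights"])
        member["hyperparams"] = dict(donor["hyperparams"])
        # Explore: perturb the copied hyperparameters to keep searching.
        for key, value in member["hyperparams"].items():
            member["hyperparams"][key] = value * random.choice([0.8, 1.2])
    return population
```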

MPI (Message Passing Interface) and mpi4py

The Message Passing Interface (MPI) provides a powerful, efficient, and portable way to express parallel programs. It is the dominant model used in high-performance computing.

mpi4py provides a Python interface that closely follows the MPI standard, and hence allows Python programs to exploit multiple processors across multiple compute nodes.
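
As a minimal illustration (not part of this repo), each MPI rank below computes a local value and shares it with every other rank via a collective call; save it as, say, mpi_hello.py and launch it with mpiexec:

```python
# Launch with: mpiexec -n 4 python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # index of this process
size = comm.Get_size()   # total number of processes

# Pretend each rank produced an episodic reward.
local_reward = float(rank) * 10.0

# Collective operation: every rank receives the full list of rewards.
all_rewards = comm.allgather(local_reward)

if rank == 0:
    print(f"{size} ranks reported rewards: {all_rewards}")
```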

Get Started

Prerequisites:

  • Python 3.8
  • Conda
  • (Poetry)
  • (PyTorch)[^1]

Clone the repo:

git clone https://github.com/yyzpiero/evo.git

Create conda environment:

conda create -p ./venv python=3.8

and use poetry to install all Python packages:

poetry install

Please use pip or poetry to install mpi4py:

pip install mpi4py

or

poetry add mpi4py

Installing mpi4py via conda may lead to unexpected issues.

Basic Usage

Activate conda environment:

conda activate ./venv

Please use mpiexec or mpirun to run experiments:

mpiexec -n 4 python pbt_rl_wta.py --num-agents 4 --env-id CartPole-v1

Example

Tensorboard support

EVO also supports experiment monitoring with Tensorboard. Example command line to run an experiment with Tensorboard monitoring:

mpiexec -n 4 python pbt_rl_truct_collective.py --num-agents 4 --env-id CartPole-v1 --tb-writer True

Toy Model

The toy example is reproduced from Fig. 2 of the PBT paper.

PBT Illustration
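
As a reference for that figure, the paper's toy problem maximises a simple quadratic objective while each worker only observes a hyperparameter-weighted surrogate of it. The sketch below is our reading of the paper's setup, not code from this repo:

```python
import numpy as np

def true_objective(theta):
    """Q(theta) = 1.2 - sum(theta^2): the quantity we actually care about."""
    return 1.2 - np.sum(theta ** 2)

def surrogate_step(theta, h, lr=0.01):
    """One ascent step on the surrogate Q_hat(theta | h) = 1.2 - sum(h * theta^2)."""
    grad = -2.0 * h * theta            # gradient of the surrogate w.r.t. theta
    return theta + lr * grad

theta = np.array([0.9, 0.9])
h = np.array([1.0, 0.0])               # this worker's hyperparameters hide the second dimension
for _ in range(200):
    theta = surrogate_step(theta, h)
print(true_objective(theta))           # plateaus below the optimum of 1.2 without PBT's exploit/explore
```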

Reinforcement Learning Agent

A PPO agent from Stable-Baselines3 with default settings is used as the reinforcement learning agent.

self.model = PPO("MlpPolicy", env=self.env, verbose=0, create_eval_env=True)

However, it can be replaced by any other reinforcement learning algorithm.
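
For instance, swapping in SAC from Stable-Baselines3 would, under the same agent wrapper, only require changing the constructor; a hedged sketch:

```python
from stable_baselines3 import SAC

# Any SB3 algorithm exposing get_parameters()/set_parameters() can stand in for PPO here.
self.model = SAC("MlpPolicy", env=self.env, verbose=0)
```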


Selection Mechanism

"Winner-takes-all"

A simple selection mechanism: at each generation, only the best-performing agent is kept, and its NN parameters are copied to all other agents. pbt_rl_wta.py provides an implementation of this mechanism using collective communications.
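
A hedged sketch of how a winner-takes-all generation can be expressed with mpi4py collectives (names are illustrative, not the exact code in pbt_rl_wta.py; `agent` is assumed to be an SB3 model exposing get_parameters/set_parameters):

```python
import numpy as np
from mpi4py import MPI

def winner_takes_all(comm, agent, episodic_reward):
    """Copy the best rank's parameters to every other rank (illustrative sketch)."""
    rewards = comm.allgather(episodic_reward)     # every rank sees all episodic rewards
    best_rank = int(np.argmax(rewards))           # rank holding the best agent this generation
    payload = agent.get_parameters() if comm.Get_rank() == best_rank else None
    best_params = comm.bcast(payload, root=best_rank)
    if comm.Get_rank() != best_rank:
        agent.set_parameters(best_params)         # everyone else adopts the winner's weights
```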

Truncation selection

Truncation selection is the default strategy for RL training in the PBT paper and is widely used in other PBT-based methods.

All agents in the population are ranked by their episodic rewards. If an agent is in the bottom 25% of the population, another agent is sampled from the top 25%, and its NN parameters and hyperparameters are copied to the current agent. Different MPI communication methods[^2] are implemented.
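
A hedged sketch of this truncation step using point-to-point send/recv (illustrative only; pbt_rl_truct.py may differ in detail, and the shared RNG seed is just one way to make every rank agree on the donor pairing):

```python
import numpy as np
from mpi4py import MPI

def truncation_select(comm, agent, episodic_reward, hyperparams, generation):
    """Bottom-25% ranks receive weights/hyperparameters from a random top-25% rank."""
    rank, size = comm.Get_rank(), comm.Get_size()
    rewards = comm.allgather(episodic_reward)
    order = [int(r) for r in np.argsort(rewards)[::-1]]    # ranks sorted best to worst
    cutoff = max(1, size // 4)
    top, bottom = order[:cutoff], order[-cutoff:]

    rng = np.random.default_rng(generation)                # same seed on every rank -> same pairing
    donors = {b: int(rng.choice(top)) for b in bottom}     # bottom rank -> top rank it copies from

    for receiver, donor in donors.items():
        if rank == donor:
            comm.send((agent.get_parameters(), hyperparams), dest=receiver, tag=0)
        elif rank == receiver:
            params, hyperparams = comm.recv(source=donor, tag=0)
            agent.set_parameters(params)
    return hyperparams
```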

Implemented Variants

| Variant | Description |
| --- | --- |
| pbt_rl_truct.py | Implementation using point-to-point communications via send and recv. |
| pbt_rl_truct_collective.py | Implementation using collective communications. |

For small clusters with a limited number of nodes, we suggest the point-to-point method, which is faster than the collective method there. However, for large HPC clusters, the collective method is much faster and more robust.

Benchmarks

We used the continuous-control AntBulletEnv-v0 scenario from the PyBullet environments to test our implementations.

Results of the experiments are presented in the figure below:

Benchmark Results

Left Figure: Reward per generation using PBT | Right Figure: Reward per step using single SB3 agent

Some key observations:

  • Training PPO agents with PBT can achieve better results than a single SAC agent.

    • Note: SAC normally outperforms PPO (see OpenRL) in most PyBullet environments.
  • "Winner-takes-all" outperforms the truncation selection mechanism in this scenario.

Acknowledgements

This repo is inspired by graf and angusfung's population-based training repos.

Footnotes

[^1]: Please use the CPU-only version of PyTorch if possible, as most HPC clusters don't have GPUs.

[^2]: This article briefly introduces the difference between point-to-point and collective communications in MPI.
