Cooperative Multi-Stage Multi-Goal Multi-Agent Reinforcement Learning (CM3)

This repository provides code for the experiments in the CM3 paper [1], published at ICLR 2020. It contains the main algorithm, the baselines, and the three simulated Markov games on which the algorithms were evaluated.

Dependencies

  • All experiments were run on Ubuntu 16.04
  • Python 3.6
  • TensorFlow 1.10
  • SUMO
  • pygame: sudo apt-get install python-pygame
  • OpenAI Gym 0.12.1

Project structure

  • alg: Implementation of algorithms and config files. config.json is the main config file. config_particle_*.json specifies various instances of the cooperative navigation task. config_sumo_stage{1,2}.json specifies agent initial/goal lane configurations for SUMO. config_checkers_stage{1,2}.json specifies parameters of the Checkers game.
  • env: Python wrappers/definitions of the simulation environments.
  • env_sumo: XML files that define the road and traffic for the underlying SUMO simulator.
  • log: Each experiment run will create a subfolder that contains the reward values logged during the training or test run.
  • saved: Each experiment run will create a subfolder that contains the trained TensorFlow models.

Environments

There are three simulations, selected by the experiment field in alg/config.json.

  1. Cooperative navigation: particles must move to individual target locations while avoiding collisions.
    • Environment code located in env/multiagent-particle-envs/
  2. SUMO
    • Stage 1: single agent on empty road. Corresponds to setting "stage" : 1
    • Stage 2: two agents on empty road. Corresponds to setting "stage" : 2
    • Python wrappers located in env/. Entry point is env/multicar_simple.py
    • SUMO topology and traffic defined in env_sumo/simple/
  3. Checkers: two agents cooperate to collect rewards while avoiding penalties in a checkered map.
    • Implemented in env/checkers.py

Environment setup

  1. Cooperative navigation: run pip install -e . inside env/multiagent-particle-envs/
  2. SUMO: Install SUMO and add the following to your .bashrc
    • export PYTHONPATH=$PYTHONPATH:path/to/sumo
    • export PYTHONPATH=$PYTHONPATH:path/to/sumo/tools
    • export SUMO_HOME="path/to/sumo"
  3. Checkers: None required
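
Before launching a SUMO experiment, it can help to confirm that the exports above took effect. The following is a minimal sketch, not part of this repository: the helper names sumo_tools_dir and ensure_sumo_on_path are hypothetical, and the check only inspects SUMO_HOME and the Python import path.

```python
import os
import sys


def sumo_tools_dir(environ=None):
    """Return the SUMO tools directory implied by SUMO_HOME, or None if unset."""
    environ = os.environ if environ is None else environ
    sumo_home = environ.get("SUMO_HOME")
    if not sumo_home:
        return None
    return os.path.join(sumo_home, "tools")


def ensure_sumo_on_path(environ=None, path=None):
    """Append the SUMO tools directory to the import path if it is missing."""
    path = sys.path if path is None else path
    tools = sumo_tools_dir(environ)
    if tools is None:
        raise EnvironmentError("SUMO_HOME is not set; add the exports to .bashrc")
    if tools not in path:
        path.append(tools)
    return tools
```

Calling ensure_sumo_on_path() at the top of a script should make the modules under path/to/sumo/tools importable even when PYTHONPATH was not exported in the current shell.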

Training

Environment-specific examples

Cooperative navigation

  • In config.json, set
    • experiment: "particle"
    • particle_config should be one of config_particle_stage1.json, config_particle_stage2_antipodal.json, config_particle_stage2_cross.json, config_particle_stage2_merge.json
  • Inside alg/, execute python train_onpolicy.py
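
The config edits above can also be scripted. This is a hedged sketch rather than a tool shipped with the repository: set_key and update_config are hypothetical helpers, and because the exact layout of config.json is not described here, the code assumes only that it is valid JSON and searches for each key at any nesting depth.

```python
import json


def set_key(node, key, value):
    """Set `key` to `value` wherever it appears in nested dicts/lists.

    Returns the number of places that were updated.
    """
    hits = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                hits += 1
            else:
                hits += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            hits += set_key(item, key, value)
    return hits


def update_config(path, **overrides):
    """Rewrite the JSON file at `path` with the given key overrides."""
    with open(path) as f:
        config = json.load(f)
    for key, value in overrides.items():
        if set_key(config, key, value) == 0:
            # Refuse to silently add a key the file does not already have.
            raise KeyError(key)
    with open(path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

For example, update_config("config.json", experiment="particle", particle_config="config_particle_stage1.json") run inside alg/ would apply the two settings listed above, provided both keys already exist in the file.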

SUMO

  • In config.json, set
    • experiment: "sumo"
    • port: if multiple SUMO experiments are run in parallel, each experiment must use its own unique port number
  • Inside alg/, execute python train_offpolicy.py --env ../env_sumo/simple/merge.sumocfg
  • Include the option --gui to show SUMO GUI while training (at the cost of increased runtime)
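
When several SUMO experiments run in parallel, one way to pick a value for port is to ask the operating system for an unused one. The sketch below uses only the Python standard library; find_free_port is a hypothetical helper, not part of this repository, and the returned number still has to be written into config.json by hand.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port and return its number."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

The port is released as soon as the function returns, so another process could claim it before SUMO starts. For a handful of parallel runs this is usually acceptable, but assigning fixed, distinct ports by hand avoids the race entirely.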

Checkers

  • In config.json, set
    • experiment : "checkers"
  • Inside alg/, execute python train_offpolicy.py

General notes for running Stage 1 and 2 of CM3

  • stage: either 1 or 2
  • dir_restore: for Stage 2 of CM3, this must be equal to the string for dir_name when Stage 1 was run.
  • use_alg_credit: 1 for CM3
  • use_Q_credit: 1 for CM3. 0 for ablation that uses value function baseline.
  • train_from_nothing: 1 for Stage 1 of CM3, or the ablation that omits the curriculum. 0 to allow restoring a trained Stage 1 model.
  • model_name: when training Stage 2 and restoring a Stage 1 model, this must be the name of the model in Stage 1.
  • prob_random: 1.0 for Stage 1, 0.2 for Stage 2. Not applicable for Checkers.
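
The settings above can be summarized per stage as follows. This is an illustrative sketch only: cm3_stage_overrides is a hypothetical function, the key names and values are taken from the list above, and the stage 1 dir_name and model name are whatever was used in your own Stage 1 run. As noted, prob_random does not apply to Checkers.

```python
def cm3_stage_overrides(stage, stage1_dir_name=None, stage1_model_name=None):
    """Return the config values the notes above prescribe for a CM3 stage."""
    if stage not in (1, 2):
        raise ValueError("stage must be 1 or 2")
    overrides = {
        "stage": stage,
        "use_alg_credit": 1,  # 1 for CM3
        "use_Q_credit": 1,    # 1 for CM3; 0 is the value-function-baseline ablation
    }
    if stage == 1:
        overrides["train_from_nothing"] = 1
        overrides["prob_random"] = 1.0
    else:
        if not stage1_dir_name or not stage1_model_name:
            raise ValueError("Stage 2 needs the Stage 1 dir_name and model name")
        overrides["train_from_nothing"] = 0       # allow restoring Stage 1
        overrides["dir_restore"] = stage1_dir_name
        overrides["model_name"] = stage1_model_name
        overrides["prob_random"] = 0.2
    return overrides
```

Requiring both Stage 1 names for Stage 2 mirrors the two "must" conditions above, so a missing dir_restore or model_name is caught before training starts.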

Citation

@inproceedings{yang2019cm3,
  title={CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning},
  author={Yang, Jiachen and Nakhaei, Alireza and Isele, David and Fujimura, Kikuo and Zha, Hongyuan},
  booktitle={International Conference on Learning Representations},
  year={2019}
}

License

See LICENSE.

SPDX-License-Identifier: MIT
