
This project is forked from dreamchaser128/iuur-for-multi-agent-reinforcement-learning.


Two improvements based on MADDPG algorithm

1. Introduction

Based on the MADDPG algorithm, we make two main improvements:

  • maddpg_IU-master: the code implementation of updating the agents' policies by iterative update (IU).
  • maddpg_IUUR-master: the code implementation of updating the agents' policies by iterative update and unified representation (IUUR).
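A minimal sketch of the iterative-update idea, with hypothetical names (linear policies stand in for the actor networks; this is not the repository's API):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, act_dim = 3, 4, 2
lr = 0.01

# Stand-ins for the actor networks: one linear policy per agent.
policies = [rng.normal(size=(obs_dim, act_dim)) for _ in range(n_agents)]

def iterative_update(policies, grads, step):
    # Iterative update (IU): only one agent updates per step while the
    # others stay frozen, so the environment looks quasi-stationary
    # from the updating agent's point of view.
    k = step % len(policies)
    new = list(policies)
    new[k] = policies[k] - lr * grads[k]
    return new

grads = [np.ones((obs_dim, act_dim)) for _ in range(n_agents)]
updated = iterative_update(policies, grads, step=0)
# Only agent 0 moved; agents 1 and 2 keep their old parameters.
```

In a standard joint update, all agents' parameters move at once, so each agent learns against policies that are simultaneously changing; freezing all but one agent per step is what reduces that nonstationarity.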

The experimental environment is multiagent-particle-envs-master, a multi-agent environment installed in the same way as the MADDPG environment.

2. Environment

There are two types of environments: fully cooperative and mixed cooperative-competitive:

  • Fully cooperative environment (Spread): agents perceive the environment from their own perspectives and cooperate with each other to reach different destinations.
  • Mixed cooperative-competitive environment (Predator-Prey): agents are divided into predators and prey; the predators must cooperate with each other to catch the prey.
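For illustration, a Spread-style shared cooperative reward can be sketched as the negative sum, over landmarks, of each landmark's distance to its nearest agent. The helper name and positions below are made up for the sketch; this is not the repository's reward code:

```python
import numpy as np

def spread_reward(agent_pos, landmark_pos):
    # Shared cooperative reward: for each landmark, take the distance to the
    # closest agent; every agent receives the same (negative) total.
    dists = np.linalg.norm(agent_pos[:, None, :] - landmark_pos[None, :, :], axis=-1)
    return -dists.min(axis=0).sum()

agents = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
landmarks = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
reward = spread_reward(agents, landmarks)  # 0.0 when every landmark is covered
```

Because the reward is shared, every agent's return improves only when the team covers the landmarks, which is what makes the scenario fully cooperative.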

To compare the influence of the number of agents on each algorithm, we designed control groups by increasing the number of agents.

3. Install

Installation method and dependency versions are the same as for MADDPG:

  • To install the environment: cd into the root directory (multiagent-particle-envs-master) and run pip install -e .
  • Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), TensorFlow (1.8.0), NumPy (1.14.5)
  • To run the code, cd into the relevant root directory and run the corresponding Python file.
    For example, to run train_Cooperative_game.py under the scenario named spread_3:
    python train_Cooperative_game.py --scenario=spread_3

4. Results

①Fully-cooperative environment

We set up a simple environment with three agents (Spread_3) and a complex environment with ten agents (Spread_10).

(Figures: Spread_3 and Spread_10 environments)

We run five random seeds for each environment and compare the performance among MADDPG, IU and IUUR.

(Figures: Spread_3_comparison and Spread_10_comparison)

As the figures show, IUUR converges quickly; after 20,000 episodes it surpasses MADDPG and maintains a steady rise.

②Mixed cooperative-competitive environment (the baseline is MADDPG vs MADDPG)

We set up a simple three-chase-one scenario (Predator_3-Prey_1) and a complex six-chase-two scenario (Predator_6-Prey_2).

(Figures: Predator_3-Prey_1 and Predator_6-Prey_2 environments)

We run five random seeds for each environment and compare the performance among MADDPG, IU and IUUR.

  • Performance comparison in Predator_3-Prey_1

    • The prey is MADDPG while the predators are replaced by IU and IUUR:
      (Figure: Predator_3-Prey_1_predator_comparison)
    • The predators are MADDPG while the prey is replaced by IU and IUUR:
      (Figure: Predator_3-Prey_1_prey_comparison)

    IUUR outperforms MADDPG by a large margin, while IU performs slightly worse than MADDPG, which is contrary to our expectations.

  • Performance comparison in Predator_6-prey_2

    • The prey is MADDPG while the predators are replaced by IU and IUUR:
      (Figure: Predator_6-Prey_2_predator_comparison)
    • The predators are MADDPG while the prey is replaced by IU and IUUR:
      (Figure: Predator_6-Prey_2_prey_comparison)

    IU outperforms MADDPG by a large margin, while IUUR performs worse than MADDPG. The reason is that as the number of agents increases, the nonstationarity inherent in multi-agent reinforcement learning becomes more severe.

5. Conclusions

  • This work presents iterative update and unified representation. Iterative update stabilizes the environment, and unified representation takes advantage of tensor computation to save memory and speed up interaction with the environment.
  • Although our experiments are based on MADDPG, the method is also applicable to most multi-agent algorithms, such as IQL, VDN, and QMIX.
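A minimal sketch of the unified-representation idea: instead of n separate per-agent forward passes, stack the parameters into one tensor and produce all agents' actions in a single batched operation. Linear policies and einsum are used purely for illustration; this is not the repository's network code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, obs_dim, act_dim = 10, 4, 2

# Baseline: one parameter set per agent, n separate forward passes in a loop.
per_agent_weights = [rng.normal(size=(obs_dim, act_dim)) for _ in range(n_agents)]
obs = rng.normal(size=(n_agents, obs_dim))
actions_loop = np.stack([obs[i] @ per_agent_weights[i] for i in range(n_agents)])

# Unified representation (UR): one stacked parameter tensor, so all agents'
# actions come from a single batched tensor operation instead of a Python loop.
shared = np.stack(per_agent_weights)            # (n_agents, obs_dim, act_dim)
actions_batched = np.einsum('no,noa->na', obs, shared)
```

The two computations are numerically identical; the batched form simply lets the framework fuse the per-agent work into one kernel, which is where the memory and interaction-speed savings come from.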

6. Future Works

  • Due to limited computing resources, we only expanded the number of agents to a certain extent; the method can be further verified in more complex environments.
  • We currently set the learning frequency of the iterative-update hyperparameter K empirically; tuning it in a principled way is a direction for future research.
  • How to realize the iterative update method within the unified representation network will be further explored in future work. (The value-fixing method based on the Bellman equation can only guarantee a smaller L_2 norm of its gradients.)
