Based on the MADDPG algorithm, this repository contains two main improvements:
- maddpg_IU-master: the code implementation of updating the agents' strategies by iterative update (IU).
- maddpg_IUUR-master: the code implementation of updating the agents' strategies by iterative update and unified representation (IUUR).
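The core idea of iterative update is that only one agent's policy is trained at a time while the others are frozen, so the environment looks stationary to the learner. A minimal sketch of such a schedule (the function name and the hyperparameter `k` are illustrative; the actual training loop lives in maddpg_IU-master):

```python
def agent_to_update(episode, n_agents, k):
    """Return the index of the single agent whose policy is trained
    during `episode`; the remaining agents' policies stay frozen,
    which keeps the environment stationary from the learner's view."""
    return (episode // k) % n_agents

# Example: 3 agents, switching the learning agent every k = 1000 episodes.
schedule = [agent_to_update(ep, n_agents=3, k=1000)
            for ep in (0, 999, 1000, 2500, 3000)]
# schedule == [0, 0, 1, 2, 0]
```

Each agent thus gets a window of `k` episodes in which the other policies act as a fixed part of the environment, before the role rotates to the next agent.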
The experimental environment is multiagent-particle-envs-master, a multi-agent environment installed in the same way as the MADDPG environment.
There are two main kinds of environments: fully cooperative and mixed cooperative-competitive:
- Fully cooperative environments (Spread): agents perceive the environment from their own perspectives and cooperate with each other to reach different destinations.
- Mixed cooperative-competitive environments (Predator-Prey): agents are divided into predators and prey. The predators need to cooperate with each other to catch the prey.
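Both scenario types follow the usual particle-env interaction pattern: per-agent observation lists in, per-agent action lists out, with a shared reward in the cooperative case. The sketch below uses a hypothetical stub class in place of the real environment (which is created via the package's `make_env` helper) just to show the shape of the loop:

```python
import random

class DummySpreadEnv:
    """Hypothetical stand-in for a fully cooperative Spread scenario;
    the real environment comes from multiagent-particle-envs-master.
    Each of the n agents has its own observation, action, and reward."""
    def __init__(self, n_agents=3):
        self.n = n_agents
    def reset(self):
        return [[random.random()] * 4 for _ in range(self.n)]  # one obs per agent
    def step(self, actions):
        obs = [[random.random()] * 4 for _ in range(self.n)]
        # Fully cooperative: every agent receives the same shared reward.
        shared = -sum(abs(a[0]) for a in actions)
        return obs, [shared] * self.n, [False] * self.n, {}

env = DummySpreadEnv(n_agents=3)
obs = env.reset()
actions = [[0.0, 1.0] for _ in obs]          # one action vector per agent
obs, rewards, dones, _ = env.step(actions)
```

In the mixed cooperative-competitive scenarios the reward list is no longer shared: predators and prey receive opposing rewards.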
To compare the influence of the number of agents on the algorithms, we designed control groups by increasing the number of agents.
Installation method and dependency package versions are the same as for MADDPG:
- To install the environment, `cd` into the root directory (multiagent-particle-envs-master) and type `pip install -e .`
- Known dependencies: Python (3.5.4), OpenAI gym (0.10.5), tensorflow (1.8.0), numpy (1.14.5)
- To run the code, `cd` into the relevant root directory and run the corresponding Python file.
For example, to run `train_Cooperative_game.py` under the scenario named spread_3:

```
python train_Cooperative_game.py --scenario=spread_3
```
We set up a simple environment with three agents (Spread_3) and a complex environment with ten agents (Spread_10).
*(Figures: the Spread_3 and Spread_10 environments)*
We run five random seeds for each environment and compare the performance among MADDPG, IU and IUUR.
*(Figures: learning-curve comparison in Spread_3 and Spread_10)*
As can be seen from the figures, IUUR converges quickly; after 20,000 episodes it surpasses MADDPG and maintains a steady rise.
We set up three-chase-one as the simple scenario (Predator_3-Prey_1) and six-chase-two as the complex scenario (Predator_6-Prey_2).
*(Figures: the Predator_3-Prey_1 and Predator_6-Prey_2 environments)*
We run five random seeds for each environment and compare the performance among MADDPG, IU and IUUR.
- Performance comparison in Predator_3-Prey_1:
- The prey is MADDPG while the predators are replaced by IU and IUUR:
- The predator is MADDPG while the preys are replaced by IU and IUUR:
IUUR significantly outperforms MADDPG, while IU's performance is slightly worse than that of MADDPG, which is contrary to our expectation.
- Performance comparison in Predator_6-Prey_2:
- The prey is MADDPG while the predators are replaced by IU and IUUR:
- The predator is MADDPG while the preys are replaced by IU and IUUR:
IU significantly outperforms MADDPG, while IUUR's performance is worse than that of MADDPG. The reason is that as the number of agents increases, the nonstationarity that arises in multi-agent reinforcement learning becomes more serious.
- This paper presents iterative update and unified representation. Iterative update is used to stabilize the environment, and unified representation takes advantage of tensor computation to save memory and speed up interaction with the environment.
- Though our experiments are based on MADDPG, this method is also applicable to most multi-agent algorithms such as IQL, VDN, and QMIX.
- Due to limited computing resources, we only expanded the number of agents to a certain extent; the method could be further verified in more complex environments.
- We only control the learning frequency of the iterative-update hyperparameter K empirically; tuning it systematically is a direction for future research.
- How to realize the iterative update method within the unified-representation network will be further explored in future work. (The value-fixing method based on the Bellman equation can only guarantee a small L_2 norm of its gradients.)
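The unified-representation idea mentioned above can be sketched concretely: instead of maintaining one policy network per agent, a single shared network processes all agents' observations in one batched tensor operation. The following is an illustrative NumPy stand-in (the shapes and the single shared weight matrix are assumptions for illustration, not the exact architecture in maddpg_IUUR-master):

```python
import numpy as np

np.random.seed(0)
n_agents, obs_dim, act_dim = 10, 18, 5

# One weight matrix shared by all agents (unified representation),
# instead of n_agents separate policy networks.
W = np.random.randn(obs_dim, act_dim)
b = np.zeros(act_dim)

# All agents' observations stacked into a single tensor ...
obs_batch = np.random.randn(n_agents, obs_dim)
# ... so one matmul replaces n_agents separate forward passes.
actions = np.tanh(obs_batch @ W + b)
```

Because the per-agent forward passes collapse into one matrix multiplication, memory for duplicate parameters is saved and the interaction with the environment is vectorized, which is where the reported speedup comes from.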