sjtu-marl / malib

A parallel framework for population-based multi-agent reinforcement learning.

Home Page: https://malib.io

License: MIT License

Makefile 0.22% Python 99.24% Shell 0.54%
multiagent reinforcement-learning python ray games parallel distributed

malib's Introduction

MALib: A parallel framework for population-based reinforcement learning


MALib is a parallel framework for population-based learning nested with reinforcement learning methods, such as Policy Space Response Oracle (PSRO), Self-Play, and Neural Fictitious Self-Play. MALib provides higher-level abstractions of MARL training paradigms, which enable efficient code reuse and flexible deployment on different distributed computing paradigms.

architecture

Installation

Installing MALib is straightforward. We have tested MALib on Python 3.8 and above. This guide assumes Ubuntu 18.04 or above (currently, MALib can only run on Linux systems). We strongly recommend using conda to manage your dependencies and avoid version conflicts. Below is an example of building a Python 3.8 conda environment.

conda create -n malib python==3.8 -y
conda activate malib

# install dependencies
./install.sh

Environments

MALib integrates many popular reinforcement learning environments; some of them are listed below.

  • OpenSpiel: A framework for reinforcement learning in games; it provides many environments for game-theoretic research.
  • Gym: An open-source collection of environments for developing and comparing reinforcement learning algorithms.
  • Google Research Football: An RL environment based on the open-source game Gameplay Football.
  • SMAC: An environment for research in the field of collaborative multi-agent reinforcement learning (MARL) based on Blizzard's StarCraft II RTS game.
  • PettingZoo: A Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gymnasium.
  • DexterousHands: A collection of bimanual dexterous manipulation tasks.

See malib/envs for more details. In addition, users can customize environments with MALib's environment interfaces. Please refer to our documentation.
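
For illustration, here is a minimal sketch of the environment-description dictionary that the runner expects, mirroring the pattern used in the examples and issues later on this page. The env id, the factory, and the exact set of keys are assumptions, so treat it as a template and check the documentation for the authoritative interface.

# Hypothetical sketch of an env_description for a multi-agent, Gym-style environment.
# The env id and keys below are placeholders; see malib/envs and the docs for the
# interface your MALib version actually expects.
import gym

ENV_ID = "your-multiagent-env-v0"  # placeholder id for a custom environment
env = gym.make(ENV_ID)

env_description = {
    "creator": lambda **kwargs: gym.make(ENV_ID, **kwargs),  # factory that builds the env
    "config": {"env_id": ENV_ID},
    "possible_agents": env.possible_agents,         # agent ids exposed by the env
    "action_spaces": env.action_spaces,             # dict: agent id -> action space
    "observation_spaces": env.observation_spaces,   # dict: agent id -> observation space
}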

Algorithms and Scenarios

MALib integrates population-based reinforcement learning methods as well as popular deep reinforcement learning algorithms. See the algorithms table here. The supported learning scenarios are listed as follows:

  • Single-stream PSRO scenario: for single-stream population-based reinforcement learning algorithms, combined with empirical game-theoretic analysis methods. See scenarios/psro_scenario.py
  • Multi-stream PSRO scenario: for multi-stream population-based reinforcement learning algorithms, combined with empirical game-theoretic analysis methods. See scenarios/p2sro_scenario.py
  • Multi-agent Reinforcement Learning scenario: for multi-/single-agent reinforcement learning with distributed techniques. See scenarios/marl_scenario.py

Quick Start

Before running the examples, please make sure the project path is exported as follows:

cd malib

# if you installed malib with `pip install -e .`, you can skip the path export
export PYTHONPATH=./

  • Run the PSRO example to start training on the Kuhn Poker game: python examples/run_psro.py
  • Run the RL example to start training on the CartPole-v1 game: python examples/run_gym.py

Documentation

See the online documentation at MALib Docs, or compile a local version from the source files:

pip install -e .[dev]
make docs-compile

Then start a web server to serve the docs:

# execute following command, then the server will start at: http://localhost:8000
make docs-view

Contributing

Read CONTRIBUTING.md for more details.

Citing MALib

If you use MALib in your work, please cite the accompanying paper.

@article{JMLR:v24:22-0169,
  author  = {Ming Zhou and Ziyu Wan and Hanjing Wang and Muning Wen and Runzhe Wu and Ying Wen and Yaodong Yang and Yong Yu and Jun Wang and Weinan Zhang},
  title   = {MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2023},
  volume  = {24},
  number  = {150},
  pages   = {1--12},
  url     = {http://jmlr.org/papers/v24/22-0169.html}
}

malib's People

Contributors

kornbergfresnel, vegewong, ziyuwan


malib's Issues

[Question] How to debug `malib` (infinite loop when running `ray` in local mode)

Hi, I'm trying to run the PSRO algorithm with the quick start example on https://malib.io (PSRO PPO with leduc_holdem.env). I get the following errors:

2022-03-27 23:04:33,013    ERROR worker.py:79 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=185570, ip=127.0.0.1, repr=<malib.rollout.rollout_worker.RolloutWorker object at 0x7f1c52c37490>)
  File "/home/panxuehai/Projects/malib/malib/rollout/base_worker.py", line 441, in simulation
    raw_statistics, num_frames = self.sample(
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_worker.py", line 161, in sample
    for ret in rets:
  File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 65, in map
    yield self.get_next()
  File "/home/panxuehai/Miniconda3/envs/malib/lib/python3.8/site-packages/ray/util/actor_pool.py", line 178, in get_next
    return ray.get(future)
ray.exceptions.RayTaskError(TypeError): ray::Stepping.run() (pid=185523, ip=127.0.0.1, repr=<malib.rollout.rollout_func.Stepping object at 0x7f8814a20dc0>)
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 431, in run
    rollout_results = env_runner(
  File "/home/panxuehai/Projects/malib/malib/rollout/rollout_func.py", line 243, in env_runner
    rets = env.reset(
  File "/home/panxuehai/Projects/malib/malib/envs/vector_env.py", line 192, in reset
    _ret = env.reset(max_step=max_step, custom_reset_config=custom_reset_config)
TypeError: reset() got an unexpected keyword argument 'max_step'

Then I tried to debug this myself with ray.init(local_mode=True) in my debugger. All Ray actors run sequentially when "local mode" is on.

https://docs.ray.io/en/latest/ray-core/starting-ray.html#local-mode:

Local Mode

By default, Ray will parallelize its workload and run tasks on multiple processes and multiple nodes. However, if you need to debug your Ray program, it may be easier to do everything on a single process. You can force all Ray functions to occur on a single process by enabling local mode

The program gets stuck in an infinite loop here:

# wait
while len(self._agent_to_pids[env_aid]) == 0:
    pass

I wonder what the best practice is for debugging with malib. Thanks very much!

Ugly implementation of evaluation control

The changes from PR #12 added a new feature for policy evaluation, but it was left unpolished. Some cleanup is required.

settings

"test_num_episodes": 0,

parameter

test: bool = False,

some related logic

if self._offline_dataset is None and not self._test:

def get_test(self):

group="testing" if self._test else "rollout",

pickle5.pickle.PicklingError when importing PSROScenario

After following the readme.md to initialize the virtual environment and install the dependencies, executing python examples/run_psro.py raises an error: pickle5.pickle.PicklingError: Could not pickle object as excessively deep recursion required.

Creating a new .py file in the folder and executing just one line of code, from malib.scenarios.psro_scenario import PSROScenario, produces the same error message. How can I solve this problem?

Local_buffer_size is fixed at 10000

[mappo+gfootball]
agent_interface.py:142
capacity=local_buffer_config.get("size", 10000)
Should the buffer size be read from the config file?

Cannot run the examples on GPU

When I run examples such as psro_poker and maddpg_mpe with the config "use_cuda" set to True, I get the following error:

2022-04-20 19:30:20,818 ERROR worker.py:80 -- Unhandled error (suppress with RAY_IGNORE_UNHANDLED_ERRORS=1): ray::RolloutWorker.simulation() (pid=32314, ip=172.28.78.34, repr=<matf.rollout.rollout_worker.RolloutWorker object at 0x7f083bb10fd0>)
(pid=32320) File "/home/qianmd/work/test/matf/matf/rollout/base_worker.py", line 447, in simulation
(pid=32320) role="simulation",
(pid=32320) File "/home/qianmd/work/test/matf/matf/rollout/rollout_worker.py", line 161, in sample
(pid=32320) for ret in rets:
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 65, in map
(pid=32320) yield self.get_next()
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/ray/util/actor_pool.py", line 178, in get_next
(pid=32320) return ray.get(future)
(pid=32320) ray.exceptions.RayTaskError(RuntimeError): ray::Stepping.run() (pid=32309, ip=172.28.78.34, repr=<matf.rollout.rollout_func.Stepping object at 0x7f243ce38190>)
(pid=32320) File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 447, in run
(pid=32320) dataset_server=self._dataset_server if task_type == "rollout" else None,
(pid=32320) File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 276, in env_runner
(pid=32320) active_policy_inputs, agent_interfaces, episodes
(pid=32320) File "/home/qianmd/work/test/matf/matf/rollout/rollout_func.py", line 155, in _do_policy_eval
(pid=32320) ) = interface.compute_action(**inputs)  # compute the action from each env_agent_id's situational observation, rnn_state, and done flag
(pid=32320) File "/home/qianmd/work/test/matf/matf/envs/agent_interface.py", line 268, in compute_action
(pid=32320) rets = self.policies[policy_id].compute_action(*args, **kwargs)
(pid=32320) File "/home/qianmd/work/test/matf/matf/algorithm/dqn/policy.py", line 88, in compute_action
(pid=32320) logits = torch.softmax(self.critic(observation), dim=-1)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
(pid=32320) result = self.forward(*input, **kwargs)
(pid=32320) File "/home/qianmd/work/test/matf/matf/algorithm/common/model.py", line 106, in forward
(pid=32320) pi = self.net(obs)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
(pid=32320) result = self.forward(*input, **kwargs)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
(pid=32320) input = module(input)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in call
(pid=32320) result = self.forward(*input, **kwargs)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 87, in forward
(pid=32320) return F.linear(input, self.weight, self.bias)
(pid=32320) File "/home/qianmd/anaconda3/envs/malib/lib/python3.7/site-packages/torch/nn/functional.py", line 1370, in linear
(pid=32320) ret = torch.addmm(bias, input, weight.t())
(pid=32320) RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm
(pid=32314) Exception ignored in: 'ray._raylet.task_execution_handler'
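
The failure above is a generic PyTorch device mismatch: the observation tensor is still on the CPU while the network weights are on CUDA. A minimal, framework-agnostic sketch of the usual remedy (not MALib's actual code; names are illustrative) is to move inputs onto the model's device before the forward pass:

import torch

def compute_logits(critic: torch.nn.Module, observation: torch.Tensor) -> torch.Tensor:
    # Put the input on the same device as the network weights (CPU or CUDA),
    # so the underlying torch.addmm call no longer sees mixed devices.
    device = next(critic.parameters()).device
    return torch.softmax(critic(observation.to(device)), dim=-1)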

How to run on a cluster or across multiple machines?

Dear MALib support,

My questions are as follows:

  1. Can MALib run on a cluster or across multiple machines? How should the config be set?
  2. When running on a single machine, how do I configure the number of agents?

Thank you.

Example of running the MAAC algorithm?

The paper says MAAC is implemented, but I don't know how to run this algorithm. I could not find it in the agent folder, and searching did not turn anything up either. Thanks.

Docker image link in the documentation doesn't work

Hi, I followed the installation guide in the documentation and tried to run the quick start code, but I encountered several problems during installation. I noticed there is a link to a Docker image for cross-platform use; however, when I click the link, it redirects me to the same documentation page. Would you mind releasing the Docker image in the next version too? Thank you.

How to use DiagGaussianDistribution? The default is CategoricalDistribution

Hi, I am training my multi-agent environment (continuous actions, continuous observations) with PSRO+PPO on the "policy-support-baseline" branch; see the paper Emergent Complexity via Multi-agent Competition for the specific environment. I found that in the Policy's probability distribution, a DiagGaussianDistribution is returned if the action space is continuous (see distribution.py, line 876). But on line 125 of malib/rl/pg/policy.py the default is to use CategoricalDistribution (because only proba_distribution of the CategoricalDistribution class has the action_mask parameter); see the figure below. How do I use DiagGaussianDistribution? Can the logits obtained on line 122 in the figure help me to use it? I am looking forward to your answer, thank you!

(Screenshot 2023-03-09, 11:15 AM)
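
Not an authoritative answer, but a sketch of the direction the question points at: for a continuous action space the policy head's outputs parameterize the Gaussian, and proba_distribution is called without the action_mask argument (which only the categorical class accepts). The import path and signatures below are assumptions modeled on the referenced distribution.py and may differ on the policy-support-baseline branch.

# Hypothetical sketch -- the import path and signatures are assumptions; verify
# against distribution.py (line 876) on the policy-support-baseline branch.
import torch
from malib.rl.common.distributions import DiagGaussianDistribution  # assumed path

def sample_continuous_action(mean_actions: torch.Tensor, log_std: torch.Tensor) -> torch.Tensor:
    # mean_actions: the policy head output (the "logits" from line 122, reused as the mean);
    # log_std: a learned, state-independent log standard deviation with matching shape.
    dist = DiagGaussianDistribution(action_dim=mean_actions.shape[-1])
    dist = dist.proba_distribution(mean_actions=mean_actions, log_std=log_std)
    # No action_mask here: that argument only exists on the categorical distribution.
    return dist.sample()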

For SMAC Qmix/MADDPG config

It's really nice work, and according to your paper, running Qmix/MADDPG is really fast. But we didn't find a config for the Qmix/MADDPG algorithms and don't know how to run your Qmix/MADDPG program, so could you give us a config for the Qmix/MADDPG algorithm?
WeChat Work Screenshot_20211227124527

demo bugs

The demo has a bug:
image
When I add the 'env_id', there is another bug:
image
image

How to use PPO to train in psro_scenario

I cannot find the implementation of PPO in this project. Through the docs I know the policy is compatible with Tianshou, but what about the trainer? How can I use PPO to train in psro_scenario? I would appreciate it if you could answer my question.

Policy sampling issue in PSRO rollout module

I greatly appreciate your contribution! The lib helps a lot, especially considering the scarcity of open-source code related to PSRO in large games.

There seems to be a bug in the rollout module of PSRO, specifically in the following section:

spec_policy_id = spec.sample()

It appears that the policy of an agent is sampled every timestep, whereas I believe it should be sampled every episode.
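
If that reading is correct, the fix would be to hoist the sampling out of the step loop so each agent's policy is drawn once per episode. A minimal, framework-agnostic sketch of the intended control flow follows; the names are illustrative and this is not MALib's actual rollout code.

def rollout(env, policy_specs, policies, num_episodes):
    # Sketch: sample each agent's policy once per episode, not per timestep.
    for _ in range(num_episodes):
        # Draw one policy id per agent before the episode starts...
        spec_policy_ids = {aid: spec.sample() for aid, spec in policy_specs.items()}
        obs, done = env.reset(), False
        while not done:
            # ...and reuse the same policy ids for every step of this episode.
            actions = {
                aid: policies[aid][spec_policy_ids[aid]].compute_action(ob)
                for aid, ob in obs.items()
            }
            obs, rewards, done, info = env.step(actions)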

By the way, the code is written with Ray as well as multi-threading, and I find it difficult to debug (a naive ray.init(local_mode=True) doesn't work). Could you suggest an elegant way of debugging?

How to run the Mozi wargame environment?

Excellent repo! I noticed that the Mozi wargame environment is supported in MALib, as reported in the RLChina 2021 slides. But I didn't find the corresponding configuration in this repo. How should I run experiments in the Mozi wargame environment? Is this part open source now?

pickle5 PicklingError

Environment:
1. Linux version 3.10.0-1160.71.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) )
2. conda 4.5.4

command:
1. conda create -n malib python==3.7 -y
2. conda activate malib
3. ./install.sh
4. python examples/run_psro.py

bug report:

KeyError: (<class 'RuntimeError'>, ('numba jitted function aborted due to unresolved symbol',), None)
During handling of the above exception, another exception occurred:
RecursionError: maximum recursion depth exceeded while calling a Python object

The above exception was the direct cause of the following exception:
pickle5.pickle.PicklingError: Could not pickle object as excessively deep recursion required.

1.log

Performance Results

Throughput Comparison

All the experiment results listed are obtained with one of the following hardware settings: (1) System # 1: a 32-core computing node with dual graphics cards. (2) System # 2: a two-node cluster, with each node having 128 cores and a single graphics card. All the GPUs mentioned are of the same model (NVIDIA RTX3090).

Throughput comparison among existing RL frameworks and MALib. Due to resource limitations (32 cores, 256G RAM), RLlib fails under heavy loads (CPU case: #workers > 32; GPU case: #workers > 8). MALib outperforms the other frameworks with CPU only and achieves comparable performance to the highly tailored Sample-Factory framework with GPU, despite the higher level of abstraction introduced. To better illustrate the scalability of MALib, we show the MA-Atari and SC2 throughput on System # 2 under different worker settings; the 512-worker group on SC2 fails due to resource limitations.

merged_throughput_report

Additional comparisons between MALib and other distributed RL training frameworks. (Left): System # 3 cluster throughput of MALib in 2-player MA-Atari and 3-player SC2. (Middle): 4-player MA-Atari throughput comparison on System # 1 without GPU. (Right): 4-player MA-Atari throughput comparison on System # 1 with GPU.

merged_throughput_report_4p

Wall-time & Performance of PB-MARL Algorithm

Comparisons of PSRO between MALib and OpenSpiel. (a) indicates that MALib achieves the same exploitability performance as OpenSpiel; (b) shows that the convergence rate of MALib is 3x faster than OpenSpiel; (c) shows that MALib achieves higher execution efficiency than OpenSpiel, since it requires less time to iterate the same number of learning steps, which means MALib has the potential to scale up to more complex tasks that need to run for many more steps.

pb-marl_wall_time

Typical MARL Algorithms

Results on Multi-agent Particle Environments

Comparisons of MADDPG on simple adversary under different rollout worker settings. Figures in the top row depict each agent's episode reward w.r.t. the number of sampled episodes, indicating that MALib converges faster than RLlib for an equal number of sampled episodes. Figures in the bottom row show the average time and average episode reward at the same number of sampled episodes, indicating that MALib achieves a 5x speedup over RLlib.

simple_adversary

Scenario Crypto

simple_crypto

Simple Push

simple_push

Simple Reference

simple_reference

Simple Speaker Listener

simple_speaker_listener

Simple Tag

simple_tag

TypeError: load() missing 1 required positional argument: 'Loader' in main branch

When I use the command "python run_atari_game.py --config examples/configs/maatari/dqn_basketball_pong.yaml" to set up the Atari example, it raises the following error:

Traceback (most recent call last):
  File "run_atari_game.py", line 26, in <module>
    config = yaml.load(f)
TypeError: load() missing 1 required positional argument: 'Loader'
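
For reference, PyYAML 5.1 deprecated calling yaml.load without an explicit Loader, and PyYAML 6.0 made the argument mandatory, which is exactly the error above. Assuming the config file contains only plain data, a minimal fix for the quoted line would be:

import yaml

with open("examples/configs/maatari/dqn_basketball_pong.yaml") as f:
    # Pass a Loader explicitly (required since PyYAML 6.0) ...
    config = yaml.load(f, Loader=yaml.SafeLoader)
    # ... or equivalently use the shorthand: config = yaml.safe_load(f)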

An error in psro_leduc_poker.py

When I run psro_leduc_poker.py, an error about Ray occurs as follows:

/home/mm/anaconda3/envs/malib_330/bin/python /data/XXX/malib/examples/psro_leduc_poker.py
WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
[2022-03-30 21:53:36,067][INFO] registered request handler=optimization
[2022-03-30 21:53:36,067][INFO] registered request handler=simulation
[2022-03-30 21:53:36,068][INFO] registered request handler=evaluate
[2022-03-30 21:53:36,068][INFO] registered request handler=update_payofftable
[2022-03-30 21:53:36,068][INFO] registered request handler=rollout
2022-03-30 21:53:36,587 INFO services.py:1166 -- View the Ray dashboard at http://127.0.0.1:8265
[2022-03-30 21:53:37,672][INFO] Ray lauched: {'node_ip_address': '192.168.43.185', 'raylet_ip_address': '192.168.43.185', 'redis_address': '192.168.43.185:6379', 'object_store_address': '/tmp/ray/session_2022-03-30_21-53-36_069448_64379/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-03-30_21-53-36_069448_64379/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-03-30_21-53-36_069448_64379', 'metrics_export_port': 50956}
[2022-03-30 21:53:37,674][INFO] Ray cluster resources info: {XXXXXXX}
Logger server up!
(pid=64609) [2022-03-30 21:53:38,698][INFO] dataset server initialized with (table_capacity=200000 table_learning_start=64)
(pid=64635) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=64635) /data/XXX/malib/malib/evaluator/utils/payoff_table.py:20: DeprecationWarning: np.bool is a deprecated alias for the builtin bool. To silence this warning, use bool by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.bool_ here.
(pid=64635) Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
(pid=64635) self.simulation_flag = np.zeros([0] * len(self.agents), dtype=np.bool)
(pid=64621) [2022-03-30 21:53:39,902][INFO] ray.get_gpu_ids(): []
(pid=64621) [2022-03-30 21:53:39,902][INFO] CUDA_VISIBLE_DEVICES:
(pid=64634) [2022-03-30 21:53:39,868][INFO] ray.get_gpu_ids(): []
(pid=64634) [2022-03-30 21:53:39,868][INFO] CUDA_VISIBLE_DEVICES:
(pid=64634) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=64635) [2022-03-30 21:53:40,260][INFO] training manager launched, 2 learner(s) created
(pid=64635) [2022-03-30 21:53:40,261][INFO] set worker num as 2
(pid=64621) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=64635) [2022-03-30 21:53:40,282][INFO] RolloutWorker manager launched, 2 rollout worker(s) alives.
(pid=64635) [2022-03-30 21:53:40,423][INFO] Coordinator server started
(pid=64633) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=64585) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
2022-03-30 21:53:45,409 ERROR worker.py:1018 -- Possible unhandled error from worker: ray::CoordinatorServer.request() (pid=64635, ip=192.168.43.185)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/data/XXX/malib/malib/backend/coordinator/server.py", line 166, in request
f"Missing handler for task type {task_request.task_type.value}"
AttributeError: Missing handler for task type simulation
2022-03-30 21:53:46,410 ERROR worker.py:1018 -- Possible unhandled error from worker: ray::CoordinatorServer.request() (pid=64635, ip=192.168.43.185)
File "python/ray/_raylet.pyx", line 484, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 438, in ray._raylet.execute_task.function_executor
File "/data/XXX/malib/malib/backend/coordinator/server.py", line 166, in request
f"Missing handler for task type {task_request.task_type.value}"
AttributeError: Missing handler for task type simulation

I would appreciate it if you could give me some suggestions on how to solve it.

Additionally, I use ray==1.0.0 and cloudpickle==1.6.0, because ray==1.8.0 is incompatible with cloudpickle==1.6.0 (the same incompatibility is described at https://blog.csdn.net/weixin_42769131/article/details/121526206 ).

There may be performance issues when redefining lots of remote actors

for i in range(worker_num):
    worker_idx = _get_worker_hash_idx(i)
    worker_cls = rollout_worker_cls.as_remote(
        num_cpus=None,
        num_gpus=None,
        memory=None,
        object_store_memory=None,
        resources=None,
    )

It seems each definition of worker_cls will be pickled and exported through Redis.

Ref:
ray-project/ray#6240

It would be better to move the worker_cls definition out of the for loop :) (see the sketch below).
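
A sketch of that suggestion based on the snippet above: build the remote class once before the loop so its definition is pickled and exported a single time; the actor constructor arguments shown are placeholders.

# Define the remote class once, outside the loop.
worker_cls = rollout_worker_cls.as_remote(
    num_cpus=None,
    num_gpus=None,
    memory=None,
    object_store_memory=None,
    resources=None,
)

workers = {}
for i in range(worker_num):
    worker_idx = _get_worker_hash_idx(i)
    # Only actor instantiation happens per iteration now.
    workers[worker_idx] = worker_cls.remote(worker_index=worker_idx)  # args are placeholders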

'List' from 'malib.utils.typing'

Hi, I am trying to run a basic MARL setup using MAPPO.

Here's my yaml config file

name: "mappo_payload_carry"

training:
  interface:
    type: "centralized"
    population_size: -1
  config:
    # control the frequency of remote parameter update
    update_interval: 1
    saving_interval: 100
    batch_size: 32
    optimizer: "Adam"
    actor_lr: 5.e-4
    critic_lr: 5.e-4
    opti_eps: 1.e-5
    weight_decay: 0.0

rollout:
  type: "async"
  stopper: "simple_rollout"
  stopper_config:
    max_step: 10000
  metric_type: "simple"
  fragment_length: 100
  num_episodes: 4
  episode_seg: 1
  terminate: "any"
  num_env_per_worker: 1
  postprocessor_types:
    - copy_next_frame

env_description:
  #  scenario_name: "simple_spread"
  creator: "Gym"
  config:
    env_id: "urdf-env-v0"

algorithms:
  MAPPO:
    name: "MAPPO"
    model_config:
      initialization:
        use_orthogonal: True
        gain: 1.
      actor:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False
      critic:
        network: mlp
        layers:
          - units: 256
            activation: ReLU
          - units: 128
            activation: ReLU
          - units: 64
            activation: ReLU
        output:
          activation: False

    # set hyper parameter
    custom_config:
      gamma: 0.99
      use_cuda: False  # enable cuda or not
      use_q_head: False
      ppo_epoch: 4
      num_mini_batch: 1  # the number of mini-batches

      return_mode: gae
      gae:
        gae_lambda: 0.95
      vtrace:
        clip_rho_threshold: 1.0
        clip_pg_rho_threshold: 1.0


      use_rnn: False
      # this is not used, instead it is fixed to last hidden in actor/critic
      rnn_layer_num: 1
      rnn_data_chunk_length: 16

      use_feature_normalization: True
      use_popart: True
      popart_beta: 0.99999

      entropy_coef: 1.e-2



global_evaluator:
  name: "generic"

dataset_config:
  episode_capacity: 100
  fragment_length: 3001

I have a custom environment, which I create as follows:
env = gym.make("urdf-env-v0", dt=0.01, robots=robots, render=render)
possible_agents = env.possible_agents
action_spaces = env.possible_actions
observation_spaces = env.observation_spaces
env_desc = {"creator": env, "possible_agents": possible_agents, "action_spaces": action_spaces, "observation_spaces": observation_spaces}
run(
    group=config["group"],
    name=config["name"],
    env_description=env_desc,
    agent_mapping_func=lambda agent: agent[
                                     :6
                                     ],  # e.g. "team_0_player_0" -> "team_0"
    training=training_config,
    algorithms=config["algorithms"],
    rollout=rollout_config,
    evaluation=config.get("evaluation", {}),
    global_evaluator=config["global_evaluator"],
    dataset_config=config.get("dataset_config", {}),
    parameter_server=config.get("parameter_server", {}),
    # worker_config=config["worker_config"],
    use_init_policy_pool=False,
    task_mode="marl",
)

I tried to see if malib.utils.typing has the List and Dict types, but it looks like they don't exist there. How do I fix this?
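
Not a maintainer answer, but the usual workaround when a helper module stops re-exporting typing aliases is to import them from the standard library instead, i.e. change the failing import to:

# Standard-library fallback for the missing re-exports.
from typing import Dict, List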

Run example error

Hi, when I run run_gym.py, it reports the following error:

image
image
Could you give me some help?

[mappo+gfootball] Failed to run on a ray cluster

I tried to run this branch on a Ray cluster, but got the error messages below:

ray.exceptions.RayTaskError(_InactiveRpcError): ray::RolloutWorker.get_status()
  File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 422, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 456, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 459, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
  File "/home/malib_cls_1206/malib/rollout/rollout_worker.py", line 44, in __init__
    self, worker_index, env_desc, metric_type, remote, save, **kwargs
  File "/home/malib_cls_1206/malib/rollout/base_worker.py", line 102, in __init__
    **kwargs["exp_cfg"],
  File "/home/malib_cls_1206/malib/utils/logger/__init__.py", line 249, in get_logger
    primary=expr_group, secondary=expr_name
  File "/home/malib_cls_1206/malib/rpc/ExperimentManager/ExperimentClient.py", line 73, in create_table
    self._create_table_callback(future.result()[0])
  File "/home/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/home/anaconda3/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/home/anaconda3/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/malib_cls_1206/malib/rpc/ExperimentManager/ExperimentClient.py", line 47, in _create_table
    table_key = stub.CreateTable(table_name, **kwargs)
  File "/home/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/anaconda3/lib/python3.6/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "failed to connect to all addresses"
	debug_error_string = "{"created":"@1638866326.987521646","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4133,"referenced_errors":[{"created":"@1638866326.987518864","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"
>

I just changed runner.py:61 to let the Ray runtime attach to a Ray cluster built beforehand, and num_episodes and other resource-related parameters were set to small values.

3s5z in PyMARL quickly reaches a win_rate of 1; where is the SMAC config for malib?

Thanks for this nice repo. I'm interested in MARL for SMAC and have some questions about this repo.

  1. On page 18 of your arXiv paper, you mentioned that "For the scenario 3s5z, however, both of MALib and PyMARL cannot reach 80% win rate." However, when I run the original PyMARL code with its default config python3 src/main.py --config=qmix --env-config=sc2 with env_args.map_name=3s5z, the win rate quickly reaches 1 for both SMAC versions (4.10 and 4.6.2).
    image

  2. I would like to use this repo to run SMAC, but I can't find the corresponding config in examples; will this part be open sourced? Thank you.

Error in psro_leduc_poker.py

When running psro_leduc_poker.py, there is an error as follows:
……
(pid=262604) /opt/anaconda/envs/malib/lib/python3.7/site-packages/supersuit/__init__.py:20: UserWarning: You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.
(pid=262604) warnings.warn("You're using SuperSuit 3.0, released 7/7/21. The entire codebase has been rewritten or refactored as part of this release. While we've tested it thoroughly, please ensure everything you're doing still works properly and report any issues at https://github.com/PettingZoo-Team/SuperSuit. This warning will be removed 2 months after release.")
(pid=253991) ######### payoff:
(pid=253991) [[-1.01]]
(pid=253991) ######### equilibriumn: {'player_0': {'DQN_0': 1.0}, 'player_1': {'DQN_0': 1.0}}
(pid=253991) ######### exploitability: 4.729025763649188
(pid=253984) build SimpleRolloutStopper: 55
(pid=253985) build SimpleRolloutStopper: 55
(pid=253993) terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
(pid=253993) what(): thread: Resource temporarily unavailable
2021-09-10 07:33:32,796 WARNING worker.py:1115 -- The autoscaler failed with the following error:
Traceback (most recent call last):
File "/opt/anaconda/envs/malib/lib/python3.7/site-packages/ray/_private/monitor.py", line 284, in run
self._run()
File "/opt/anaconda/envs/malib/lib/python3.7/site-packages/ray/_private/monitor.py", line 175, in _run
self.update_load_metrics()
File "/opt/anaconda/envs/malib/lib/python3.7/site-packages/ray/_private/monitor.py", line 140, in update_load_metrics
request, timeout=4)
File "/opt/anaconda/envs/malib/lib/python3.7/site-packages/grpc/_channel.py", line 946, in call
return _end_unary_response_blocking(state, call, False, None)
File "/opt/anaconda/envs/malib/lib/python3.7/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1631259206.469934596","description":"Error received from peer ipv4:10.12.0.6:45239","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Deadline Exceeded","grpc_status":4}"

2021-09-10 07:33:33,449 WARNING worker.py:1115 -- A worker died or was killed while executing task ffffffffffffffffa047959d2ed953e0f622778b01000000.

Quick Start Error

Traceback (most recent call last):
  File "demo.py", line 3, in <module>
    from malib.envs.poker import poker_aec_env as leduc_holdem
  File "/root/malib/malib/envs/__init__.py", line 4, in <module>
    from .poker import PokerEnv
  File "/root/malib/malib/envs/poker/__init__.py", line 1, in <module>
    from .poker_aec_env import env as PokerEnv
  File "/root/malib/malib/envs/poker/poker_aec_env.py", line 11, in <module>
    from open_spiel.python.rl_environment import Environment as OPEN_SPIEL_ENV, TimeStep
ModuleNotFoundError: No module named 'open_spiel'

demo.py:
"""PSRO with PPO for Leduc Holdem"""

from malib.envs.poker import poker_aec_env as leduc_holdem
from malib.runner import run
from malib.rollout import rollout_func

env = leduc_holdem.env(fixed_player=True)

run(
agent_mapping_func=lambda agent_id: agent_id,
env_description={
"creator": leduc_holdem.env,
"config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
"possible_agents": env.possible_agents,
},
training={
"interface": {
"type": "independent",
"observation_spaces": env.observation_spaces,
"action_spaces": env.action_spaces
},
},
algorithms={
"PSRO_PPO": {
"name": "PPO",
"custom_config": {
"gamma": 1.0,
"eps_min": 0,
"eps_max": 1.0,
"eps_decay": 100,
},
}
},
rollout={
"type": "async",
"stopper": "simple_rollout",
"callback": rollout_func.sequential
}
)

A question for quick start

Thanks for the nice work.
I want to ask for some help with the quick start.
First, could you please give some instructions for running single-agent RL in a gym-based env, such as CartPole? I tried to run the file run_gym.py and load the yaml, but it doesn't work. Second, do we have to install open_spiel, or can we skip it?

Look forward to hearing from you. Cheers~

Best,
Yutong

How to deploy malib in GPU cluster server

I have a login node and four computing nodes.
Does malib need to be deployed on each node?
Are job tasks submitted from the login node?
Is there a detailed example of cluster server usage?
Looking forward to your help.

A Minor Error in Quick Start Demo Code

In the README Quick Start part:

  env_description={
        "creator": leduc_holdem.env,
        "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}
        "possible_agents": env.possible_agents,
    }

A comma is missing after "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"}; see the corrected snippet below.
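
With the comma added, the snippet reads:

  env_description={
        "creator": leduc_holdem.env,
        "config": {"scenario_configs": {"fixed_player": True}, "env_id": "leduc_holdem"},
        "possible_agents": env.possible_agents,
    }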

thread: Resource temporarily unavailable

The Ray cluster crashes when num_episodes is set to 64 or higher.

[2021-12-13 14:49:20,867][INFO] registered request handler=optimization
[2021-12-13 14:49:20,867][INFO] registered request handler=simulation
[2021-12-13 14:49:20,867][INFO] registered request handler=evaluate
[2021-12-13 14:49:20,867][INFO] registered request handler=update_payofftable
[2021-12-13 14:49:20,867][INFO] registered request handler=rollout
[2021-12-13 14:49:20,870][INFO] Pre launch checking for Coordinator server ... <function _request_simulation at 0x7fee8480d0d0>
2021-12-13 14:49:20,873 INFO worker.py:657 -- Connecting to existing Ray cluster at address:
(pid=219, 168) [2021-12-13 14:49:31,968][INFO] dataset server initialized with (table_capacity=256 table_learning_start=64)
(pid=220, 188) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=220, 188) [2021-12-13 14:49:32,388][INFO] registered request handler=optimization
(pid=220, 188) [2021-12-13 14:49:32,389][INFO] registered request handler=simulation
(pid=220, 188) [2021-12-13 14:49:32,389][INFO] registered request handler=evaluate
(pid=220, 188) [2021-12-13 14:49:32,389][INFO] registered request handler=update_payofftable
(pid=220, 188) [2021-12-13 14:49:32,389][INFO] registered request handler=rollout
(pid=380) [2021-12-13 14:49:35,107][INFO] ray.get_gpu_ids(): [7]
(pid=380) [2021-12-13 14:49:35,108][INFO] CUDA_VISIBLE_DEVICES: 7
(pid=220, 188) [2021-12-13 14:49:35,365][INFO] training manager launched, 1 learner(s) created
(pid=220, 188) [2021-12-13 14:49:35,366][INFO] set worker num as 1
(pid=220, 188) [2021-12-13 14:49:35,373][INFO] RolloutWorker manager launched, 1 rollout worker(s) alives.
(pid=220, 188) [2021-12-13 14:49:35,374][INFO] use_init_policy_pool: False
(pid=380) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=380) [2021-12-13 14:49:35,344][INFO] registered request handler=optimization
(pid=380) [2021-12-13 14:49:35,344][INFO] registered request handler=simulation
(pid=380) [2021-12-13 14:49:35,344][INFO] registered request handler=evaluate
(pid=380) [2021-12-13 14:49:35,344][INFO] registered request handler=update_payofftable
(pid=380) [2021-12-13 14:49:35,344][INFO] registered request handler=rollout
(pid=508) WARNING:root:Cannot import alpharank utils, if you wanna run meta game experiments, please install open_spiel before that.
(pid=508) [2021-12-13 14:49:37,428][INFO] registered request handler=optimization
(pid=508) [2021-12-13 14:49:37,428][INFO] registered request handler=simulation
(pid=508) [2021-12-13 14:49:37,428][INFO] registered request handler=evaluate
(pid=508) [2021-12-13 14:49:37,428][INFO] registered request handler=update_payofftable
(pid=508) [2021-12-13 14:49:37,428][INFO] registered request handler=rollout
(pid=220, 188) [2021-12-13 14:49:39,592][INFO] Coordinator server started
(pid=220, 188) [2021-12-13 14:49:39,635][INFO] request: TaskType.OPTIMIZE
(pid=220, 188) [2021-12-13 14:49:39,636][INFO] request: TaskType.ROLLOUT
(pid=219, 168) [2021-12-13 14:49:39,726][INFO] created data table: PSGFootball_team_0_MAPPO_0
(pid=219, 168) terminate called after throwing an instance of 'boost::wrapexcept<boost::system::system_error>'
(pid=219, 168) what(): thread: Resource temporarily unavailable
2021-12-13 14:51:18,750 ERROR worker.py:980 -- Possible unhandled error from worker: ray::Stepping.run() (pid=259, 94)
File "python/ray/_raylet.pyx", line 463, in ray._raylet.execute_task
File "python/ray/_raylet.pyx", line 415, in ray._raylet.execute_task.function_executor
File "/home/////malib/utils/logger/__init__.py", line 136, in wrapper
return func(*args, **kwargs)
File "/home/////malib/rollout/rollout_func.py", line 431, in run
dataset_server=self._dataset_server if task_type == "rollout" else None,
File "/home/////malib/rollout/rollout_func.py", line 291, in env_runner
batch = ray.get(dataset_server.get_producer_index.remote(buffer_desc))
ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task.
(The same ray::Stepping.run() traceback then repeats for many other worker pids.)
2021-12-13 14:51:37,933 WARNING worker.py:1034 -- The node with node id 81c5e01345f7d92b30121df0b3af788325462cb9 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.

Identify the difference between data still being saved and not enough data

OfflineDataset cannot distinguish between "sampled data is still being saved" and "not enough data yet" at the data-sampling stage, so the trainer may get stuck in request_data.

Proposal: use a table status to identify the differences

class Table:
    def save(self):
        self.status = Status.IN_PROCESS
        # ...
        self.status = Status.DONE

class OfflineDataset:
    # ...
    def sample(self, *args, **kwargs):
        # ...
        batch, info = None, None
        try:
            if table.status == Status.IN_PROCESS:
                raise BusyError
            if table.size < self._learn_start:
                raise NoEnoughDataError
            batch = table.sample(batch_size)
        except BusyError:
            info = BusyError
        except NoEnoughDataError:
            info = NoEnoughDataError
        except Exception as e:
            info = str(e)
        return batch, info
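
A possible consumer-side counterpart to the proposal (purely illustrative): the trainer could poll the returned status and back off instead of blocking inside request_data.

import time

# Illustrative trainer-side loop using the proposed status signalling.
batch, info = offline_dataset.sample(batch_size=64)
while info in (BusyError, NoEnoughDataError):
    time.sleep(1.0)  # back off while the table is busy or under-filled
    batch, info = offline_dataset.sample(batch_size=64)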

Fictitious Self-play and Self-play documentation needed

Hi, thanks for providing such a powerful open-source framework. This is an awesome repo for MARL training and is exactly what we needed to implement distributed training algorithms without building a framework from scratch. While I appreciate how easy it is to train agents with the PSRO algorithm using this repo, I would like to see more training algorithms in the future. I was reading the paper for this work and found the following sentence:
"In the initial implementation, we provided three PB-MARL algorithms support, they are Policy Space Response Oracle [27] (PSRO), Fictitious Self-play [8] (FSP), Self-play [9] (SP) and Population-based Training [14] (PBT)" (Doesn't that make four rather than three algorithms already?)
If my understanding is right, you have already integrated training algorithms such as Fictitious Self-play into this repo, right? However, I could not find any documentation about how to use the FSP or PBT algorithms to train my agents. Would you mind providing more examples of using other training algorithms in the future? Thanks very much.
