tonghanwang / dop

Codes accompanying the paper "DOP: Off-Policy Multi-Agent Decomposed Policy Gradients" (ICLR 2021, https://arxiv.org/abs/2007.12322)

License: Apache License 2.0

Shell 2.00% Python 98.00%

dop's Issues

Epsilon is applied twice?

Hello there, Thank you for sharing your code!

We are a bit confused about a possible bug in how epsilon is applied during action selection. In the controller's forward call, an epsilon floor is applied after the softmax. Here is that part of the code:

class BasicMAC:
    ...
    def forward(self, ep_batch, t, test_mode=False):
        ...
        # Softmax the agent outputs if they're policy logits
        if self.agent_output_type == "pi_logits":
            ...
            agent_outs = th.nn.functional.softmax(agent_outs, dim=-1)

            if not test_mode:
                # Epsilon floor
                epsilon_action_num = agent_outs.size(-1)
                if getattr(self.args, "mask_before_softmax", True):
                    # With probability epsilon, we will pick an available action uniformly
                    epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()

                agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                               + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
               ...

But then, during multinomial action selection, actions are again chosen at random with probability epsilon, as shown in the following part of the code:

class MultinomialActionSelector():
    ...
    def select_action(self, agent_inputs, avail_actions, t_env, test_mode=False):
            ....
            picked_actions = Categorical(masked_policies).sample().long()

            random_numbers = th.rand_like(agent_inputs[:, :, 0])
            pick_random = (random_numbers < self.epsilon).long()
            random_actions = Categorical(avail_actions.float()).sample().long()
            picked_actions = pick_random * random_actions + (1 - pick_random) * picked_actions
            ...

As we understand it, both of these code segments apply the epsilon probability of picking a random action. Because they are applied sequentially, we think the probability of picking a random action ends up greater than intended. Could you check whether this is right?
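
To make the concern concrete, here is a minimal sketch of the arithmetic, assuming every action is available and ignoring the masking steps elided in the snippets above. `mix_with_uniform` and `effective_epsilon` are hypothetical helpers written for illustration, not functions from the repository. Both the epsilon floor and the random pick in the selector mix the sampling distribution with a uniform one, so applying them back to back should behave like a single mix with a larger epsilon.

```python
def mix_with_uniform(policy, epsilon):
    """Return (1 - epsilon) * policy + epsilon * uniform over all actions.

    Assumption: all actions are available, so the uniform part is 1 / n.
    """
    n = len(policy)
    return [(1.0 - epsilon) * p + epsilon / n for p in policy]


def effective_epsilon(epsilon):
    """Uniform weight after two sequential mixes with the same epsilon."""
    return 1.0 - (1.0 - epsilon) ** 2


policy = [0.7, 0.2, 0.1]   # illustrative softmax output for one agent
epsilon = 0.1

# Step 1: epsilon floor, as in the forward call.
floored = mix_with_uniform(policy, epsilon)
# Step 2: the selector replaces the sampled action with a uniform one
# with probability epsilon, which is the same mix on the marginal distribution.
twice = mix_with_uniform(floored, epsilon)

# A single mix with the effective epsilon gives the same distribution.
once = mix_with_uniform(policy, effective_epsilon(epsilon))

for a, b in zip(twice, once):
    print(round(a, 6), round(b, 6))
```

Under these assumptions the weight on the uniform distribution is 1 - (1 - epsilon)^2 = 2 * epsilon - epsilon^2, so epsilon = 0.1 behaves like roughly 0.19.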

Thank you very much!

Query about deterministic DOP

The deterministic DOP algorithm is presented in the conference paper, but I haven't found it in this project.
I'm very interested in it. Could you please release its source code?
Thanks very much for your help!

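PyMARL code conventions should be followed

Hello author. Since the DOP algorithm is integrated into the PyMARL framework, it should be written to follow PyMARL's code structure, and the code in run.py should not be modified. Otherwise, conflicts arise when others integrate other algorithms.
The best approach is to implement all of the logic in the learner's train method.
Just a small suggestion, for reference only.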

CUDA error: device-side assert triggered

I set batch_size_run=8, map=3s5z_vs_3s6z, optimizer=adam, batch_size=32, and off_batch_size=64.
When training reaches 10 million steps, it looks like something crashes.
An easy map like 8m is OK.

File "/home/xxxx/xxxx/pymarl/src/run2.py", line 49, in run2
    run_sequential(args=args, logger=logger)
  File "/home/xxxx/xxxx/pymarl/src/run2.py", line 188, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/xxxx/xxxx/pymarl/src/runners/parallel_runner.py", line 107, in run
    actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, bs=envs_not_terminated, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/controllers/basic_controller.py", line 23, in select_actions
    chosen_actions = self.action_selector.select_action(agent_outputs[bs], avail_actions[bs], t_env, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/components/action_selectors.py", line 100, in select_action
    not (th.gather(avail_actions, dim=2, index=picked_actions.unsqueeze(2)) > 0.99).all():
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1019, in wait
    return self._wait(timeout=timeout)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1645, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['tee', '-a', '/tmp/tmpt92ayqy0']' timed out after 1 seconds

TypeError: expected str, bytes or os.PathLike object, not NoneType

Hi, thanks for sharing your excellent work!!
I followed your installation tutorial, but I ran into a problem with sacred == 0.7.2 (from oxwhirl/fork):
Traceback (most recent call last):
  File "src/main.py", line 19, in <module>
    ex = Experiment("pymarl")
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/experiment.py", line 72, in __init__
    super(Experiment, self).__init__(path=name,
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/ingredient.py", line 57, in __init__
    gather_sources_and_dependencies(_caller_globals)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 487, in gather_sources_and_dependencies
    sources = gather_sources(globs, experiment_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 440, in get_sources_from_imported_modules
    return get_sources_from_modules(iterate_imported_modules(globs), base_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 409, in get_sources_from_modules
    filename = os.path.abspath(mod.__file__)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/posixpath.py", line 374, in abspath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

How can I fix this?
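
The last frames of the traceback show `os.path.abspath(mod.__file__)` being called with `mod.__file__` set to `None`. The sketch below only reproduces that failure mode in isolation and shows one way to guard against it. `source_paths` is a hypothetical helper, not part of sacred, and the two modules are made up for illustration; whether this matches the actual cause in this environment is an assumption.

```python
import os
import types


def source_paths(modules):
    """Return absolute paths for modules that have a usable __file__.

    Hypothetical guard: modules whose __file__ is missing or None are skipped
    instead of being passed to os.path.abspath.
    """
    paths = []
    for mod in modules:
        filename = getattr(mod, "__file__", None)
        if filename is None:
            continue
        paths.append(os.path.abspath(filename))
    return paths


# Two stand-in modules: one with a file path, one whose __file__ is None.
with_file = types.ModuleType("with_file")
with_file.__file__ = "some_module.py"
without_file = types.ModuleType("without_file")
without_file.__file__ = None

# Unguarded call: raises the same TypeError as in the traceback.
try:
    os.path.abspath(without_file.__file__)
    raised = False
except TypeError as err:
    raised = True
    print(err)

# Guarded call: the module without a file is skipped.
print(source_paths([with_file, without_file]))
```

To find out which imported module triggers the error in a real environment, one could loop over `sys.modules` and print the names of entries whose `__file__` is `None`.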

About the didactic example in the paper

Hello!
I wonder what the input of the actor in the stateless game is in your paper.
thx!

The plot

Hello! Did you plot the picture in your paper by seaborn? It seems there's no code related to seaborn in this repo.
