tonghanwang / dop

Code accompanying the paper "DOP: Off-Policy Multi-Agent Decomposed Policy Gradients" (ICLR 2021, https://arxiv.org/abs/2007.12322)

License: Apache License 2.0

Shell 2.00% Python 98.00%

Contributors: tonghanwang

dop's Issues

Epsilon is applied twice?

Hello there, thank you for sharing your code!

We are a bit confused about a possible bug in how epsilon is applied during action selection. In the controller's forward call, an epsilon floor is applied after the softmax. Here is that part of the code:

class BasicMAC:
    ...
    def forward(self, ep_batch, t, test_mode=False):
        ...
        # Softmax the agent outputs if they're policy logits
        if self.agent_output_type == "pi_logits":
            ...
            agent_outs = th.nn.functional.softmax(agent_outs, dim=-1)

            if not test_mode:
                # Epsilon floor
                epsilon_action_num = agent_outs.size(-1)
                if getattr(self.args, "mask_before_softmax", True):
                    # With probability epsilon, we will pick an available action uniformly
                    epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()

                agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                              + th.ones_like(agent_outs) * self.action_selector.epsilon / epsilon_action_num)
                ...

But then, during multinomial action selection, actions are again chosen randomly with probability epsilon, as shown in the following part of the code:

class MultinomialActionSelector():
    ...
    def select_action(self, agent_inputs, avail_actions, t_env, test_mode=False):
        ...
        picked_actions = Categorical(masked_policies).sample().long()

        random_numbers = th.rand_like(agent_inputs[:, :, 0])
        pick_random = (random_numbers < self.epsilon).long()
        random_actions = Categorical(avail_actions.float()).sample().long()
        picked_actions = pick_random * random_actions + (1 - pick_random) * picked_actions
        ...

As we understand it, both of these code segments apply the epsilon probability of picking a random action. Because they are applied sequentially, we think the probability of picking a random action ends up greater than intended. Could you check whether this is right?

Thank you very much!
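
A rough way to check the concern raised in this issue: if both steps mix the policy with a uniform distribution using the same epsilon, the two mixes compose into a single mix with effective epsilon 1 - (1 - epsilon)^2. The sketch below is not code from the repository. It assumes every action is available, and `mix_with_uniform` and `effective_epsilon` are hypothetical helpers operating on plain Python lists rather than tensors.

```python
def mix_with_uniform(policy, epsilon):
    """Return (1 - epsilon) * policy + epsilon * uniform over all actions."""
    n = len(policy)
    return [(1 - epsilon) * p + epsilon / n for p in policy]


def effective_epsilon(epsilon):
    """Uniform-mixing weight after applying the same epsilon mix twice."""
    return 1 - (1 - epsilon) ** 2


policy = [0.7, 0.2, 0.1]  # illustrative softmax output, not real data
epsilon = 0.1

# Step 1: the epsilon floor, as in the forward() snippet.
floored = mix_with_uniform(policy, epsilon)
# Step 2: the random pick with probability epsilon, as in select_action().
# In distribution, this is another uniform mix applied to the floored policy.
twice = mix_with_uniform(floored, epsilon)

# A single mix with the effective epsilon yields the same distribution.
once = mix_with_uniform(policy, effective_epsilon(epsilon))

print(effective_epsilon(epsilon))
print(twice)
print(once)
```

Under these assumptions, epsilon = 0.1 behaves like an exploration rate of about 0.19. With unavailable actions the uniform part would be restricted to the available set, but the same composition argument should apply as long as both steps use the same set.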

Query about deterministic DOP

The deterministic DOP algorithm is presented in the conference paper, but I haven't found it in this project.
I'm very interested in it. Could you please release its source code?
Thank you very much for your help!

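PyMARL code conventions should be followed

Hello author. Since the DOP algorithm is integrated into the PyMARL framework, it should follow PyMARL's code structure, and the code in run.py should not be modified. Otherwise, conflicts arise when others integrate additional algorithms.
The best approach is to implement all of the logic in the learner's train method.
Just a small suggestion, for reference only.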

CUDA error: device-side assert triggered

I set batch_size_run=8, map=3s5z_vs_3s6z, optimizer=adam, batch_size=32, and off_batch_size=64.
When training reaches 10 million steps, it looks like something crashes with the traceback below.
An easy map such as 8m runs fine.

  File "/home/xxxx/xxxx/pymarl/src/run2.py", line 49, in run2
    run_sequential(args=args, logger=logger)
  File "/home/xxxx/xxxx/pymarl/src/run2.py", line 188, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/xxxx/xxxx/pymarl/src/runners/parallel_runner.py", line 107, in run
    actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, bs=envs_not_terminated, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/controllers/basic_controller.py", line 23, in select_actions
    chosen_actions = self.action_selector.select_action(agent_outputs[bs], avail_actions[bs], t_env, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/components/action_selectors.py", line 100, in select_action
    not (th.gather(avail_actions, dim=2, index=picked_actions.unsqueeze(2)) > 0.99).all():
RuntimeError: CUDA error: device-side assert triggered

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1019, in wait
    return self._wait(timeout=timeout)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1645, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['tee', '-a', '/tmp/tmpt92ayqy0']' timed out after 1 seconds

TypeError: expected str, bytes or os.PathLike object, not NoneType

Hi, thanks for sharing your excellent work!
I followed your installation tutorial, but I ran into the following error with sacred == 0.7.2 (from oxwhirl/fork):
Traceback (most recent call last):
  File "src/main.py", line 19, in <module>
    ex = Experiment("pymarl")
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/experiment.py", line 72, in __init__
    super(Experiment, self).__init__(path=name,
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/ingredient.py", line 57, in __init__
    gather_sources_and_dependencies(_caller_globals)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 487, in gather_sources_and_dependencies
    sources = gather_sources(globs, experiment_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 440, in get_sources_from_imported_modules
    return get_sources_from_modules(iterate_imported_modules(globs), base_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 409, in get_sources_from_modules
    filename = os.path.abspath(mod.__file__)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/posixpath.py", line 374, in abspath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

How can I fix this?

The plot

Hello! Did you use seaborn to plot the figures in your paper? There seems to be no seaborn-related code in this repo.
