tonghanwang / dop
Codes accompanying the paper "DOP: Off-Policy Multi-Agent Decomposed Policy Gradients" (ICLR 2021, https://arxiv.org/abs/2007.12322)
License: Apache License 2.0
Hello there, thank you for sharing your code!
We are a bit confused about a possible bug in how epsilon is applied during action selection. In the controller's forward call, an epsilon floor is applied after the softmax. Here is that part of the code:
class BasicMAC:
    ...
    def forward(self, ep_batch, t, test_mode=False):
        ...
        # Softmax the agent outputs if they're policy logits
        if self.agent_output_type == "pi_logits":
            ...
            agent_outs = th.nn.functional.softmax(agent_outs, dim=-1)
            if not test_mode:
                # Epsilon floor
                epsilon_action_num = agent_outs.size(-1)
                if getattr(self.args, "mask_before_softmax", True):
                    # With probability epsilon, we will pick an available action uniformly
                    epsilon_action_num = reshaped_avail_actions.sum(dim=1, keepdim=True).float()
                agent_outs = ((1 - self.action_selector.epsilon) * agent_outs
                              + th.ones_like(agent_outs) * self.action_selector.epsilon/epsilon_action_num)
        ...
But then, during multinomial action selection, actions are again chosen randomly with probability epsilon, as shown in the following part of the code:
class MultinomialActionSelector():
    ...
    def select_action(self, agent_inputs, avail_actions, t_env, test_mode=False):
        ....
        picked_actions = Categorical(masked_policies).sample().long()
        random_numbers = th.rand_like(agent_inputs[:, :, 0])
        pick_random = (random_numbers < self.epsilon).long()
        random_actions = Categorical(avail_actions.float()).sample().long()
        picked_actions = pick_random * random_actions + (1 - pick_random) * picked_actions
        ...
As we understand it, both code segments apply the epsilon probability of picking a random action. Because they are applied sequentially, we think the probability of picking a random action ends up greater than intended. Could you check whether this is right?
Thank you very much!
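For reference, the compounding described in this issue can be worked out directly. The following is a minimal sketch, not code from the repository: it assumes every action is available (so the uniform distribution is the same in both steps), and the helper names `epsilon_floor` and `effective_epsilon` are hypothetical. Mixing a policy with the uniform distribution twice leaves weight (1 - epsilon)^2 on the original policy, so under these assumptions the effective exploration rate would be 2*epsilon - epsilon^2 rather than epsilon.

```python
def epsilon_floor(policy, epsilon):
    """Mix a probability vector with the uniform distribution.

    Assumption: all actions are available, so uniform means 1 / len(policy).
    """
    n = len(policy)
    return [(1 - epsilon) * p + epsilon / n for p in policy]


def effective_epsilon(epsilon):
    """Weight on the uniform distribution after applying the same mix twice."""
    return 1 - (1 - epsilon) ** 2  # algebraically 2*epsilon - epsilon**2


if __name__ == "__main__":
    policy = [0.7, 0.2, 0.1]  # illustrative softmax output, not real data
    epsilon = 0.1

    # Step 1: epsilon floor applied to the softmax output (as in forward).
    floored = epsilon_floor(policy, epsilon)

    # Step 2: sampling from `floored` with probability 1 - epsilon and
    # uniformly otherwise has the same marginal distribution as a second mix.
    marginal = epsilon_floor(floored, epsilon)

    # A single mix with the effective epsilon gives the same distribution.
    single = epsilon_floor(policy, effective_epsilon(epsilon))

    print(marginal)
    print(single)
```

With epsilon = 0.1 this gives an effective rate of about 0.19, which matches the concern that random actions are picked more often than the configured epsilon suggests.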
The deterministic DOP algorithm is presented in the conference paper, but I could not find it in this project.
I'm very interested in it. Could you please release its source code?
Thanks very much for your help!
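Hello author, since the DOP algorithm is integrated into the PyMARL framework, it should follow PyMARL's code structure and should not modify the code in run.py. Otherwise, conflicts will arise when other algorithms are integrated.
The best approach is to implement all of the logic in the learner's train method.
Just a small suggestion, for reference only.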
I set batch_size_run=8, map=3s5z_vs_3s6z, optimizer=adam, batch_size=32, and off_batch_size=64.
Easy maps like 8m are OK.
On this map, when training reaches 10 million steps, it looks like something crashes:
  File "/home/xxxx/xxxx/pymarl/src/run2.py", line 49, in run2
    run_sequential(args=args, logger=logger)
  File "/home/xxxx/xxxx/pymarl/src/run2.py", line 188, in run_sequential
    episode_batch = runner.run(test_mode=False)
  File "/home/xxxx/xxxx/pymarl/src/runners/parallel_runner.py", line 107, in run
    actions = self.mac.select_actions(self.batch, t_ep=self.t, t_env=self.t_env, bs=envs_not_terminated, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/controllers/basic_controller.py", line 23, in select_actions
    chosen_actions = self.action_selector.select_action(agent_outputs[bs], avail_actions[bs], t_env, test_mode=test_mode)
  File "/home/xxxx/xxxx/pymarl/src/components/action_selectors.py", line 100, in select_action
    not (th.gather(avail_actions, dim=2, index=picked_actions.unsqueeze(2)) > 0.99).all():
RuntimeError: CUDA error: device-side assert triggered
During handling of the above exception, another exception occurred:
Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/contextlib.py", line 130, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1019, in wait
    return self._wait(timeout=timeout)
  File "/home/xxxx/anaconda3/envs/pymarl/lib/python3.7/subprocess.py", line 1645, in _wait
    raise TimeoutExpired(self.args, timeout)
subprocess.TimeoutExpired: Command '['tee', '-a', '/tmp/tmpt92ayqy0']' timed out after 1 seconds
Hi, thanks for sharing your excellent work!!
I followed your installation tutorial, but I ran into a problem with sacred == 0.7.2 (from the oxwhirl fork):
Traceback (most recent call last):
  File "src/main.py", line 19, in <module>
    ex = Experiment("pymarl")
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/experiment.py", line 72, in __init__
    super(Experiment, self).__init__(path=name,
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/ingredient.py", line 57, in __init__
    gather_sources_and_dependencies(_caller_globals)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 487, in gather_sources_and_dependencies
    sources = gather_sources(globs, experiment_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 440, in get_sources_from_imported_modules
    return get_sources_from_modules(iterate_imported_modules(globs), base_path)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/site-packages/sacred/dependencies.py", line 409, in get_sources_from_modules
    filename = os.path.abspath(mod.__file__)
  File "/home/public/anaconda3/envs/torch/lib/python3.8/posixpath.py", line 374, in abspath
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType
How can I fix this?
Hello!
I wonder what the input of the actor in the stateless game is in your paper.
thx!
Hello! Did you plot the figures in your paper with seaborn? There seems to be no seaborn-related code in this repo.