ioujenliu / pic

PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning

License: Other

Python 100.00%

pic's Issues

It seems that you didn't use actor_target at all

On line 233 of ddpg_vec.py, you use
next_action_batch = self.select_action(next_state_batch.view(-1, self.obs_dim), action_noise=self.train_noise)

This means the next action is selected by the current actor rather than the target actor, which differs from the DDPG paper. Was this a deliberate design choice, and if so, what is the reason?
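For readers comparing against the DDPG paper: there, the bootstrap target is computed with the target actor and the target critic, y = r + gamma * (1 - done) * Q'(s', mu'(s')). The following is a minimal sketch of that target only, not the repository's code. The function td_target and the toy actor and critic are hypothetical stand-ins introduced for illustration.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def td_target(reward: float,
              next_state: Vector,
              done: bool,
              target_actor: Callable[[Vector], Vector],
              target_critic: Callable[[Vector, Vector], float],
              gamma: float = 0.99) -> float:
    """DDPG-style target: y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    # The next action comes from the *target* actor, not the current one.
    next_action = target_actor(next_state)
    next_q = target_critic(next_state, next_action)
    # Terminal transitions do not bootstrap from the next state.
    return reward + gamma * (1.0 - float(done)) * next_q


# Toy stand-ins (assumptions, only here to make the sketch runnable).
def toy_target_actor(state: Vector) -> Vector:
    return [0.5 * x for x in state]


def toy_target_critic(state: Vector, action: Vector) -> float:
    return sum(state) + sum(action)


y = td_target(1.0, [2.0, 4.0], False, toy_target_actor, toy_target_critic, gamma=0.5)
y_terminal = td_target(1.0, [2.0, 4.0], True, toy_target_actor, toy_target_critic, gamma=0.5)
print(y, y_terminal)
```

Swapping the target actor for the current actor changes only the next_action line, which is the difference this issue asks about.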

episode rewards in simple_spread_n30 mismatch

Hello, I ran your code using python main_vec.py --exp_name coop_navigation_n30 --scenario simple_spread_n30 --critic_type gcn_max --cuda. However, the episode rewards start from -13000 instead of the -28000 shown in the training curve you reported. Could you tell me what may cause this mismatch?

gym version

Thank you very much for your work. Which version of gym did you use?

all six agents do not cover the landmarks, why?

Hello, I recently ran your code with python main_vec.py --exp_name coop_navigation_n6 --scenario simple_spread_n6 --critic_type gcn_max --cuda to test the coop_navigation model. All parameters were left at their defaults. However, during testing I found that the six agents do not all cover the landmarks, and the paper does not report a coverage success rate. What is the success rate of your model? Is there something wrong with my training? I look forward to your reply.

agent checkpoint in Scenario

Dear Authors,

I am trying to reproduce the experimental results in your paper and use your method as one of the baselines in our paper.

In your code, I cannot find the agent checkpoint for scenarios such as simple_tag_n3, yet this checkpoint seems to be required. How should I handle this?

Best

It seems that the vectorized force computation and collision handling are not used at all!

It seems that core_vec does not use vectorized operations at all. The code for simple_spread_n100 and n_200 uses the original MPE computation.

only update the last agent's critic?

Dear Author,

I noticed that your code seems to update only the last agent's critic. Is that correct?

More details:
On line 200 of main_vec.py:
value_losses.append(agent.update_critic_parameters(batch, i, args.shuffle))
Here i refers to the last agent, according to line 181.

In the function update_critic_parameters in ddpg_vec.py, agent_id is used to select this agent's reward.

Looking forward to hearing your feedback.
Best
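The concern above can be sketched in isolation. In Python, a for-loop variable stays bound after the loop ends, so reusing it later refers to the last index. The sketch below shows that behaviour and a per-agent alternative. It is not the code on lines 181 and 200 of main_vec.py, which are not quoted here. The names leaked_index, update_all_critics and update_critic are hypothetical.

```python
from typing import Callable, List


def leaked_index(n_agents: int) -> int:
    """Return the value a loop variable holds after its loop has finished."""
    for i in range(n_agents):
        pass  # stand-in for per-agent work done inside the loop
    # Python keeps `i` bound after the loop: it is the last index, n_agents - 1.
    return i


def update_all_critics(n_agents: int,
                       update_critic: Callable[[int], float]) -> List[float]:
    """Call the (hypothetical) update function once for every agent id."""
    return [update_critic(agent_id) for agent_id in range(n_agents)]


print(leaked_index(3))
print(update_all_critics(3, lambda agent_id: agent_id * 10.0))
```

Whether updating with only the last index matters depends on whether the agents share one reward signal, which this issue does not state.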

No module named 'ddpg' when loading the scripted agents

When creating simple_tag scenarios, I got the following error:

self.scripted_agents = torch.load(scripted_agent_ckpt)['agents']
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'ddpg'
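The traceback shows the failure inside the unpickler: pickled objects record the module their class was defined in, and loading fails when that module cannot be imported. A common workaround is to register an alias in sys.modules before loading. The sketch below demonstrates the mechanism with the standard pickle module only. The module names old_agents_mod_demo and new_agents_mod_demo and the Agent class are hypothetical. Whether the classes needed by the checkpoint now live in ddpg_vec.py is an assumption that would need checking against the repository.

```python
import pickle
import sys
import types

# Hypothetical module names for this demo; in the issue the missing module is 'ddpg'.
OLD_NAME = "old_agents_mod_demo"
NEW_NAME = "new_agents_mod_demo"

# A class that, at save time, claims to live in OLD_NAME.
Agent = type("Agent", (), {"__module__": OLD_NAME})

old_mod = types.ModuleType(OLD_NAME)
old_mod.Agent = Agent
sys.modules[OLD_NAME] = old_mod

agents = [Agent(), Agent()]
for agent_id, agent in enumerate(agents):
    agent.agent_id = agent_id
payload = pickle.dumps({"agents": agents})

# Simulate the module having been renamed since the checkpoint was written.
del sys.modules[OLD_NAME]
new_mod = types.ModuleType(NEW_NAME)
new_mod.Agent = Agent
sys.modules[NEW_NAME] = new_mod

try:
    pickle.loads(payload)
    load_failed = False
except ModuleNotFoundError:
    load_failed = True  # same error class as in the traceback above

# Workaround: make the old name resolve to the module that now holds the class.
sys.modules[OLD_NAME] = sys.modules[NEW_NAME]
restored = pickle.loads(payload)
print(load_failed, [a.agent_id for a in restored["agents"]])
```

Applied to the issue, the alias would have to be registered before the torch.load call, and it only helps if the aliased module defines every class the checkpoint refers to.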

It seems that you didn't use target_actor at all !

On line 233 of ddpg_vec.py, you use
next_action_batch = self.select_action(next_state_batch.view(-1, self.obs_dim), action_noise=self.train_noise)
This means the next action is selected by the current actor rather than the target actor, which differs from the DDPG paper. Was this a deliberate design choice, and if so, what is the reason?

The result about cooperative push

I ran the experiment without changing any hyperparameters. The cooperative-push-30 results show a large variance for both gcn_max and mlp. Moreover, gcn_max seems to achieve results similar to mlp.

This is quite different from the spread experiment. Could you provide the hyperparameters for coop-push so that I can verify the effect of the GCN?

reward

The GNN and MLP architectures seem to achieve exactly the same reward.
