ioujenliu / pic
PIC: Permutation Invariant Critic for Multi-Agent Deep Reinforcement Learning
License: Other
At line 233 of ddpg_vec.py, you use
next_action_batch = self.select_action(next_state_batch.view(-1, self.obs_dim), action_noise=self.train_noise)
This means next_action is selected with the current actor instead of the target actor, which differs from the DDPG paper. Is this a deliberate design choice, and if so, what is the reason?
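For reference, the target described in the DDPG paper is y = r + gamma * (1 - done) * Q'(s', mu'(s')), where mu' and Q' are the target actor and target critic. The following is only a minimal sketch of that formula with toy scalar functions; td_targets, target_actor, current_actor and target_critic are hypothetical names for illustration and are not taken from ddpg_vec.py.

```python
def td_targets(rewards, next_states, dones, actor, target_critic, gamma=0.95):
    """Compute y = r + gamma * (1 - done) * Q'(s', actor(s')) per transition."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        a_next = actor(s_next)                  # next action from the given actor
        q_next = target_critic(s_next, a_next)  # target critic evaluates (s', a')
        targets.append(r + gamma * (1.0 - float(done)) * q_next)
    return targets


# Toy stand-ins (assumptions for illustration only).
def current_actor(s):
    return 1.0


def target_actor(s):
    return 0.5


def target_critic(s, a):
    return s + a


rewards, next_states, dones = [1.0, 2.0], [0.0, 1.0], [False, True]

# DDPG paper: bootstrap with the target actor.
y_target = td_targets(rewards, next_states, dones, target_actor, target_critic, gamma=0.9)
# Variant asked about in this issue: bootstrap with the current actor.
y_current = td_targets(rewards, next_states, dones, current_actor, target_critic, gamma=0.9)
```

The two variants differ only on non-terminal transitions, where the bootstrapped value depends on which actor proposes the next action.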
Hello, I ran your code using python main_vec.py --exp_name coop_navigation_n30 --scenario simple_spread_n30 --critic_type gcn_max --cuda
However, the episode rewards start from about -13000 instead of the -28000 shown in the training curve you reported. Could you tell me what may cause this difference?
Thank you very much for your work. Could you tell me which version of gym you used?
Hello, I recently ran your code with python main_vec.py --exp_name coop_navigation_n6 --scenario simple_spread_n6 --critic_type gcn_max --cuda to test the coop_navigation model, with all parameters left at their defaults. However, during testing, the six agents fail to cover the landmarks, and the paper does not report a coverage success rate. What success rate does your model achieve? Is there something wrong with my training? I look forward to your reply.
Dear Authors,
I am trying to reproduce the experiment results in your paper and select your method as one of the baselines in our paper.
In your code, I cannot find the agent checkpoint for scenarios such as simple_tag_n3, although this checkpoint seems to be necessary. How should I handle this?
Best
It seems that core_vec does not use vectorized operations at all; the scenarios simple_spread_n100 and n_200 all use the original MPE computation.
Dear Author,
I noticed that your code seems to update only the last agent's critic. Is that correct?
More details:
At line 200 of main_vec.py:
value_losses.append(agent.update_critic_parameters(batch, i, args.shuffle))
the i here refers to the last agent, according to line 181.
In the function update_critic_parameters in ddpg_vec.py, agent_id is used to select this agent's reward.
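The behaviour described above would follow from how Python scopes loop variables: after a for loop ends, the loop variable stays bound to its last value. The snippet below is a toy sketch of that effect only; update_critic and n_agents are hypothetical stand-ins, and it does not reproduce the actual structure of main_vec.py.

```python
def update_critic(agent_id):
    """Hypothetical stand-in: report which agent's reward would be used."""
    return agent_id


n_agents = 3

for i in range(n_agents):
    pass  # per-agent work inside the loop

# Called after the loop: i is still bound to the last index, n_agents - 1.
after_loop = update_critic(i)

# Called once per agent: every index is used.
per_agent = [update_critic(j) for j in range(n_agents)]
```

Here after_loop is 2 while per_agent is [0, 1, 2], so a call placed after the loop only ever sees the last agent's index.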
Looking forward to hearing your feedback.
Best
When creating simple_tag scenarios, I got the following error:
self.scripted_agents = torch.load(scripted_agent_ckpt)['agents']
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'ddpg'
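The traceback shows the failure inside unpickler.load(): pickle stores classes by module path, so a checkpoint saved when its classes lived in a module named ddpg cannot be loaded if no such module is importable. One common workaround is to register an alias in sys.modules before loading. The sketch below demonstrates the mechanism with plain pickle and throwaway modules; the Agent class is a hypothetical stand-in, and whether ddpg_vec is the right module to alias for this repository is an assumption.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a class stored in the checkpoint,
# recorded as living in a module named 'ddpg'.
Agent = type("Agent", (), {"__module__": "ddpg"})

# Simulate saving the checkpoint while module 'ddpg' still existed.
old_module = types.ModuleType("ddpg")
old_module.Agent = Agent
sys.modules["ddpg"] = old_module
agent = Agent()
agent.name = "scripted"
payload = pickle.dumps(agent)

# Simulate the current environment, where 'ddpg' is no longer importable.
del sys.modules["ddpg"]
try:
    pickle.loads(payload)
    error_name = None
except ModuleNotFoundError as exc:
    error_name = type(exc).__name__

# Workaround: alias the old module name to a module that defines the class.
# In the repository this might look like `import ddpg_vec` followed by
# `sys.modules["ddpg"] = ddpg_vec` (an assumption, not verified).
renamed_module = types.ModuleType("ddpg_vec")
renamed_module.Agent = Agent
sys.modules["ddpg"] = renamed_module
restored = pickle.loads(payload)
```

The alias only helps if the module it points to defines classes with the same names as those stored in the checkpoint.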
I ran the experiment without changing any hyperparameters. The results for cooperative-push-30 show large variance for both gcn_max and mlp. Moreover, gcn_max seems to achieve results similar to mlp.
This is quite different from the spread experiment. Could you provide the hyperparameters for coop-push so that I can verify the influence of the GCN?
The GNN and MLP architectures seem to have achieved exactly the same reward