
pic's People

Contributors

ioujenliu, raymondyeh07

pic's Issues

It seems that you didn't use actor_target at all

At line 233 of ddpg_vec.py, you use:

next_action_batch = self.select_action(next_state_batch.view(-1, self.obs_dim), action_noise=self.train_noise)

This selects the next action with the current actor instead of the target actor, which differs from the DDPG paper. Is this a deliberate design choice, or is there some other reason?
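
For reference, the DDPG paper computes the bootstrap action with the target actor, a slowly updated copy of the online actor. Below is a minimal, self-contained sketch of that convention; the module names, sizes, and the concatenated critic input are illustrative assumptions, not the repo's actual code.

import torch
import torch.nn as nn

# Sketch of the standard DDPG target: the bootstrap action comes from the
# *target* actor, not the online actor.
obs_dim, act_dim, batch = 4, 2, 8           # illustrative sizes
actor = nn.Linear(obs_dim, act_dim)         # online actor (stand-in network)
actor_target = nn.Linear(obs_dim, act_dim)  # target actor
critic_target = nn.Linear(obs_dim + act_dim, 1)
actor_target.load_state_dict(actor.state_dict())  # targets start as copies

next_state_batch = torch.randn(batch, obs_dim)
reward_batch = torch.randn(batch, 1)
done_batch = torch.zeros(batch, 1)
gamma = 0.99

with torch.no_grad():
    next_action = actor_target(next_state_batch)  # target actor, per the paper
    next_q = critic_target(torch.cat([next_state_batch, next_action], dim=1))
    target_q = reward_batch + gamma * (1.0 - done_batch) * next_q

Using the online actor here still gives a consistent bootstrap, but it drops the slow-moving target that the paper uses to stabilize critic learning, which may be what this issue is asking about.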

episode rewards in simple_spread_n30 mismatch

Hello, I ran your code with python main_vec.py --exp_name coop_navigation_n30 --scenario simple_spread_n30 --critic_type gcn_max --cuda. However, episode rewards start from around -13000 instead of the -28000 in the training curve you reported. Could you tell me what might cause this discrepancy?

gym version

Thank you very much for your work. Could you tell me which version of gym you used?

all six agents do not cover the landmarks, why?

Hello, I recently ran your code with python main_vec.py --exp_name coop_navigation_n6 --scenario simple_spread_n6 --critic_type gcn_max --cuda to test the coop_navigation model, with all parameters at their defaults. However, during testing I found that none of the six agents covers a landmark, and the coverage success rate is not given in the paper. What success rate does your model achieve? Did something go wrong in training? Looking forward to your reply.
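
Since the paper reports episode rewards rather than a coverage rate, one would have to measure coverage directly. Below is a minimal sketch of such a metric; the 0.1 distance threshold and the position layout are assumptions, not anything taken from the repo or the paper.

import numpy as np

def coverage_rate(agent_pos, landmark_pos, threshold=0.1):
    # Fraction of landmarks that have at least one agent within `threshold`.
    # agent_pos: (n_agents, 2); landmark_pos: (n_landmarks, 2).
    d = np.linalg.norm(landmark_pos[:, None, :] - agent_pos[None, :, :], axis=-1)
    return float((d.min(axis=1) < threshold).mean())

# Example: 6 agents and 6 landmarks at random positions.
rng = np.random.default_rng(0)
print(coverage_rate(rng.uniform(-1, 1, (6, 2)), rng.uniform(-1, 1, (6, 2))))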

agent checkpoint in Scenario

Dear Authors,

I am trying to reproduce the experimental results in your paper and to use your method as one of the baselines in ours.

In your code I cannot find the agent checkpoint for scenarios such as simple_tag_n3, yet this checkpoint appears to be required. How can I handle this problem? (One possible workaround is sketched after this message.)

Best
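
Judging from how the scenario loads the file (torch.load(scripted_agent_ckpt)['agents'], per the traceback in a later issue), one possible workaround is to train the scripted prey agents yourself and save them in the same layout. A minimal sketch, where the file name and the placeholder agent object are assumptions:

import torch
import torch.nn as nn

# Hypothetical workaround: save your own trained scripted agents under the
# 'agents' key the scenario expects. The Linear module is only a placeholder
# for whatever agent objects your training actually produces.
trained_agents = [nn.Linear(4, 2)]  # placeholder, not the real agents
torch.save({"agents": trained_agents}, "simple_tag_n3_scripted.ckpt")  # assumed name

# The scenario would then load it as (on recent PyTorch, pickled modules
# additionally require weights_only=False):
scripted_agents = torch.load("simple_tag_n3_scripted.ckpt")["agents"]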

only update the last agent's critic?

Dear Author,

I noticed that your code seems to update only the last agent's critic. Is that correct?

More details:
At line 200 of main_vec.py:
value_losses.append(agent.update_critic_parameters(batch, i, args.shuffle))
Here i refers to the last agent, according to line 181.

In the function update_critic_parameters in ddpg_vec.py, agent_id is used to select that agent's reward. (A per-agent variant is sketched below for contrast.)

Looking forward to hearing your feedback.
Best
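
For contrast, an MADDPG-style implementation with one critic per agent would loop the update over every agent id. Below is a minimal, self-contained sketch of that loop, using stand-ins for the issue's agent, batch, and update_critic_parameters; note that PIC's critic is shared and permutation invariant, so a single call with one agent_id may well be intentional.

from types import SimpleNamespace

n_agents = 3

def update_critic_parameters(batch, agent_id, shuffle):
    # Stand-in: a real implementation would pick agent_id's reward out of
    # the batch and take one gradient step on that agent's critic.
    return 0.0

agent = SimpleNamespace(update_critic_parameters=update_critic_parameters)
batch, shuffle = None, False

# Per-agent variant of the single call at main_vec.py line 200.
value_losses = [agent.update_critic_parameters(batch, i, shuffle)
                for i in range(n_agents)]
print(value_losses)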

No module named 'ddpg' when loading the scripted agents

When creating simple_tag scenarios, I got the following error:

self.scripted_agents = torch.load(scripted_agent_ckpt)['agents']
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/Users/qizhg/miniconda3/envs/deeprl/lib/python3.7/site-packages/torch/serialization.py", line 773, in _legacy_load
result = unpickler.load()
ModuleNotFoundError: No module named 'ddpg'
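
This is the usual torch.load unpickling failure: the checkpoint pickles objects whose class was defined in a module named ddpg, so a module with that name must be importable at load time. A minimal sketch of two common workarounds; the paths are placeholders, and the ddpg_vec alias is an assumption about where the class might live now:

import sys
import torch

# Workaround 1: put the directory that contains ddpg.py on sys.path before
# loading, so pickle can import the module the checkpoint references.
sys.path.insert(0, "/path/to/pic")  # placeholder: wherever ddpg.py lives
scripted_agent_ckpt = "/path/to/checkpoint"  # placeholder path
agents = torch.load(scripted_agent_ckpt)["agents"]

# Workaround 2: if the pickled class now lives under a different module name
# (e.g. ddpg_vec), alias the old name so pickle can resolve it:
# import ddpg_vec
# sys.modules["ddpg"] = ddpg_vec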

The result about cooperative push

I just ran the experiment without changing any hyperparameters. The cooperative-push-30 results show large variance for both gcn_max and mlp, and gcn_max seems to achieve results similar to mlp.

This is quite different from the spread experiment. Would you provide the hyperparameters for coop-push so I can verify the influence of the GCN?

reward

The GNN and MLP architectures seem to have achieved exactly the same reward.
