floodsung / ddpg
Reimplementation of DDPG (Continuous Control with Deep Reinforcement Learning) based on OpenAI Gym + TensorFlow
License: MIT License
Hi song,
I have a small question about Reacher-v1. I found some discussions about Reacher-v1's reward, but I haven't found a suitable way to modify it. You left a comment in OpenAI Gym about changing the MuJoCo rewards, so I wanted to ask you about this.
Looking forward to your reply. :)
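For what it's worth, one common way to modify an environment's reward without patching gym itself is a reward wrapper. A minimal sketch, assuming a gym version that exposes gym.RewardWrapper (the hook name has varied across gym releases between reward and _reward, and the scaling below is purely illustrative):

import gym

class ScaledReacherReward(gym.RewardWrapper):
    # Purely illustrative shaping: rescale the reward before the agent sees it.
    def reward(self, reward):
        return 10.0 * reward

env = ScaledReacherReward(gym.make('Reacher-v1'))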
In the Humanoid-v1 env I also get this error after running for some time; if STEPS is smaller, the error appears later:
[2016-05-28 10:53:54,309] Starting new video recorder writing to /Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5/openaigym.video.None.2001.video000000.mp4
[2016-05-28 10:53:59,759] Finished writing results. You can upload them to the scoreboard via gym.upload('/Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5')
Traceback (most recent call last):
File "gym_ddpg.py", line 43, in
main()
File "gym_ddpg.py", line 33, in main
agent.set_feedback(observation,action,reward,done)
File "/Users/lmj/develop/DDPG/ddpg.py", line 98, in set_feedback
self.train()
File "/Users/lmj/develop/DDPG/ddpg.py", line 66, in train
self.critic_network.train(y_batch,state_batch,action_batch)
File "/Users/lmj/develop/DDPG/critic_network.py", line 93, in train
self.action_input:action_batch
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 553, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 64, 1) for Tensor u'Placeholder_4:0', which has shape '(?, 1)'
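The error means session.run is being fed a (1, 64, 1) array where the graph expects (batch, 1); most likely y_batch picks up an extra leading axis when it is assembled. A minimal sketch of the kind of reshape that clears this, assuming y_batch holds one scalar target per transition:

import numpy as np

y_batch = np.zeros((1, 64, 1))  # stand-in for the mis-shaped target batch
# Collapse to (64, 1) so it matches the placeholder's (?, 1) shape:
y_batch = np.reshape(y_batch, (-1, 1))
assert y_batch.shape == (64, 1)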
The filtering is incorrectly clipping the action values to between [-1,1].
Line 70 of FilteredEnv:
ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)
self.action_space.low and self.action_space.high are arrays of -1 and 1:
self.action_space = gym.spaces.Box(-np.ones_like(acsp.high),np.ones_like(acsp.high))
self.filter_action(action) correctly converts from the normalized range to the environment's range (e.g., in a 1D case, from [-1, 1] to [lower bound, upper bound]), but the result is then clipped back to [-1, 1], when it should really be clipped to [lower bound, upper bound].
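A minimal sketch of the corrected line, under the assumption that the wrapper keeps a reference to the real action space (here a hypothetical self.true_action_space = acsp saved in the constructor):

# Clip against the real environment's bounds, not the normalized [-1, 1] box:
ac_f = np.clip(self.filter_action(action),
               self.true_action_space.low,
               self.true_action_space.high)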
Hi,
Thanks for your code.
I tried to use it for training TORCS; however, my results are not good. To be specific, after a few steps the actions generated by the actor network increase to 1 and stay there, similar to the following (the top 10 rows, for example):
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
Gradients for that set:
[[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]]
Could you tell me what you think the problem is?
Traceback (most recent call last):
File "gym_ddpg.py", line 43, in <module>
main()
File "gym_ddpg.py", line 13, in main
env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'
>>> tf.__version__
'1.3.0'
>>> import gym
>>> gym.__version__
'0.9.3'
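For anyone hitting this: env.monitor was removed from gym around 0.9, with monitoring moved into a wrapper. A minimal sketch of the replacement call under gym 0.9.x (the environment name is illustrative; gym_ddpg.py sets its own ENV_NAME):

import gym
from gym import wrappers

ENV_NAME = 'InvertedPendulum-v1'  # illustrative stand-in
env = gym.make(ENV_NAME)
# Replaces env.monitor.start('experiments/' + ENV_NAME, force=True):
env = wrappers.Monitor(env, 'experiments/' + ENV_NAME, force=True)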
Hello, I've run into a problem. When my action dimension is one, the result is good, but when my action dimension is two (the activation functions are tanh and sigmoid), the output of the actor saturates.
Here is the result I described: https://github.com/m5823779/DDPG
By the way, I use batch normalization only in my actor network.
Has anyone met the same problem and solved it?
What versions of gym, tf, etc. did you use back then?
File "gym_ddpg.py", line 43, in
main()
File "gym_ddpg.py", line 13, in main
env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'
Hi, thank you for your work. It is very helpful and easy to understand!
But there seems to be a mistake in 'filter_env.py'.
The line 'ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)'
constrains all actions to [-1, 1] instead of the environment's real bounds.
Another question: have you ever used the XX_bn versions of the networks?
In my case the result is worse; I don't know whether that is normal or not.
Best
Thanks for your work! Is your input an image or just a vector? And what is the dimension of your output action?
Using Python 2.7 on Ubuntu 14.04:
Traceback (most recent call last):
File "gym_ddpg.py", line 2, in <module>
from ddpg import *
File "/home/truell20/Documents/ddpg-aigym/DDPG/ddpg.py", line 10, in <module>
from critic_network import CriticNetwork
File "/home/truell20/Documents/ddpg-aigym/DDPG/critic_network.py", line 3, in <module>
from utility import *
ImportError: No module named utility
When I call load_network and save_network, this error pops up:
AttributeError: 'ActorNetwork' object has no attribute 'save_network'
I've been trying to fix this error, but so far I haven't resolved it.
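If the checked-out revision really lacks those methods, equivalents can be rebuilt on tf.train.Saver. A minimal sketch under that assumption (the checkpoint directory is illustrative; create one tf.train.Saver() after the graph is built and pass it in):

import tensorflow as tf

def save_network(saver, sess, time_step):
    # Writes saved_actor_networks/actor-network-<time_step> checkpoints.
    saver.save(sess, 'saved_actor_networks/actor-network', global_step=time_step)

def load_network(saver, sess):
    # Restores the newest checkpoint if one exists.
    checkpoint = tf.train.get_checkpoint_state('saved_actor_networks')
    if checkpoint and checkpoint.model_checkpoint_path:
        saver.restore(sess, checkpoint.model_checkpoint_path)
    else:
        print('Could not find old network weights')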
Does the gradient for the actor need to be normalized over batch_size?
Line 37 in 18825ee
Sorry to bother you; when I run this code I get an error:
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 149 (RANDR)
Minor opcode of failed request: 8 (RRGetScreenResources)
Serial number of failed request: 14
Current serial number in output stream: 14
Note: I am running this code through vncserver.
mjmodel.h: no such file or directory
Hi, thank you for your implementation; it helped me write my own.
I have a question, though, about the action you use to compute gradients in ddpg.py line 71.
Why don't you use action_batch to compute the gradient? I haven't managed to get any agent working, so I can't test the difference.
Hi,
why did you include the minus sign in the grad_ys argument of the function below?
self.parameters_gradients = tf.gradients(self.action_output,self.parameters,-self.q_gradient_input/BATCH_SIZE)
As far as I understand, grad_ys weights the gradients of each of the actor outputs with the corresponding value (coming from the critic, in your case).
Thanks!
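Taken together with the two questions above (normalizing the gradient over batch_size, and which actions to differentiate at), that one line does three things at once: tf.gradients sums per-sample gradients over the batch, so dividing by BATCH_SIZE turns that sum into an average; the minus sign is needed because TensorFlow optimizers minimize, so feeding -dQ/da makes apply_gradients ascend Q; and q_gradient_input is meant to be dQ/da evaluated at a = mu(s) from the current actor rather than the replayed action_batch, as the deterministic policy gradient prescribes. A minimal runnable sketch of the wiring, with a toy one-layer actor standing in for the real network (all names and sizes illustrative):

import tensorflow as tf

BATCH_SIZE = 64
STATE_DIM, ACTION_DIM = 3, 1

# Toy stand-in actor so the gradient plumbing is runnable on its own.
state_input = tf.placeholder(tf.float32, [None, STATE_DIM])
W = tf.Variable(tf.random_uniform([STATE_DIM, ACTION_DIM], -0.1, 0.1))
action_output = tf.tanh(tf.matmul(state_input, W))
parameters = [W]

# dQ/da from the critic, evaluated at a = mu(s) for each sampled state.
q_gradient_input = tf.placeholder(tf.float32, [None, ACTION_DIM])

# grad_ys weights each output's gradient by -dQ/da: the minus sign turns the
# optimizer's descent into ascent on Q, and /BATCH_SIZE averages the
# per-sample gradients that tf.gradients would otherwise sum.
parameters_gradients = tf.gradients(action_output, parameters,
                                    -q_gradient_input / BATCH_SIZE)
optimizer = tf.train.AdamOptimizer(1e-4).apply_gradients(
    zip(parameters_gradients, parameters))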
Editing the previous issue, as the initial one was a misunderstanding on my side.
Hi,
I see that there are no target networks used in the implementation. Can it still be called DDPG without them? I thought target networks were one of the required features.
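For reference, DDPG's target networks are slowly-tracking copies of the actor and critic, updated as theta' <- tau*theta + (1 - tau)*theta' after each training step. A minimal runnable sketch of that soft update (TAU and the toy variables are illustrative):

import tensorflow as tf

TAU = 0.001  # soft-update rate from the DDPG paper

# Toy stand-ins for one layer's weights and their target-network copy.
param = tf.Variable(tf.random_uniform([4, 2]))
target_param = tf.Variable(param.initialized_value())

# Run this op after every training step so the target slowly tracks theta.
soft_update = target_param.assign(TAU * param + (1.0 - TAU) * target_param)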
Hi, when I run "python gym_ddpg.py", it always exits with "Segmentation fault (core dumped)".