floodsung / ddpg
Reimplementation of DDPG (Continuous Control with Deep Reinforcement Learning) based on OpenAI Gym + TensorFlow
License: MIT License
Hi song,
I have a small question about Reacher-v1. I found some discussions about Reacher-v1's reward, but I haven't found a suitable way to modify it. You left a comment in OpenAI Gym about changing the MuJoCo rewards, so I wanted to ask you about this.
Looking forward to your reply. :)
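For what it's worth, one common way to modify an environment's reward without patching gym itself is a reward wrapper. A minimal sketch, assuming a gym version that exposes gym.RewardWrapper (the hook name has varied across gym releases between reward and _reward, and the scaling below is purely illustrative):

import gym

class ScaledReacherReward(gym.RewardWrapper):
    # Purely illustrative shaping: rescale the reward before the agent sees it.
    def reward(self, reward):
        return 10.0 * reward

env = ScaledReacherReward(gym.make('Reacher-v1'))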
In the Humanoid-v1 env I also get this error after running for some time; if STEPS is smaller, the error appears later:
[2016-05-28 10:53:54,309] Starting new video recorder writing to /Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5/openaigym.video.None.2001.video000000.mp4
[2016-05-28 10:53:59,759] Finished writing results. You can upload them to the scoreboard via gym.upload('/Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5')
Traceback (most recent call last):
File "gym_ddpg.py", line 43, in
main()
File "gym_ddpg.py", line 33, in main
agent.set_feedback(observation,action,reward,done)
File "/Users/lmj/develop/DDPG/ddpg.py", line 98, in set_feedback
self.train()
File "/Users/lmj/develop/DDPG/ddpg.py", line 66, in train
self.critic_network.train(y_batch,state_batch,action_batch)
File "/Users/lmj/develop/DDPG/critic_network.py", line 93, in train
self.action_input:action_batch
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 553, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 64, 1) for Tensor u'Placeholder_4:0', which has shape '(?, 1)'
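The error means session.run is being fed a (1, 64, 1) array where the graph expects (batch, 1); most likely y_batch picks up an extra leading axis when it is assembled. A minimal sketch of the kind of reshape that clears this, assuming y_batch holds one scalar target per transition:

import numpy as np

y_batch = np.zeros((1, 64, 1))  # stand-in for the mis-shaped target batch
# Collapse to (64, 1) so it matches the placeholder's (?, 1) shape:
y_batch = np.reshape(y_batch, (-1, 1))
assert y_batch.shape == (64, 1)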
The filtering is incorrectly clipping the action values to between [-1,1].
Line 70 of FilteredEnv:
ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)
self.action_space.low and self.action_space.high are arrays of -1 and 1:
self.action_space = gym.spaces.Box(-np.ones_like(acsp.high),np.ones_like(acsp.high))
self.filter_action(action) correctly converts from the normalized range to the environment's range (e.g., in a 1D case, from [-1, 1] to [lower bound, upper bound]), but the result is then clipped back to [-1, 1], when it should really be clipped to [lower bound, upper bound].
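A minimal sketch of the corrected line, under the assumption that the wrapper keeps a reference to the real action space (here a hypothetical self.true_action_space = acsp saved in the constructor):

# Clip against the real environment's bounds, not the normalized [-1, 1] box:
ac_f = np.clip(self.filter_action(action),
               self.true_action_space.low,
               self.true_action_space.high)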
Hi,
Thanks for your code.
I tried to use it for training TORCS; however, my results are not good. To be specific, after a few steps the actions generated by the actor network increase to 1 and stay there, similar to the following (the top 10 rows, for example):
[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]
Gradients for that set:
[[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]]
Could you tell me what you think the problem is?
Traceback (most recent call last):
File "gym_ddpg.py", line 43, in <module>
main()
File "gym_ddpg.py", line 13, in main
env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'
>>> tf.__version__
'1.3.0'
>>> import gym
>>> gym.__version__
'0.9.3'
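For anyone hitting this: env.monitor was removed from gym around 0.9, with monitoring moved into a wrapper. A minimal sketch of the replacement call under gym 0.9.x (the environment name is illustrative; gym_ddpg.py sets its own ENV_NAME):

import gym
from gym import wrappers

ENV_NAME = 'InvertedPendulum-v1'  # illustrative stand-in
env = gym.make(ENV_NAME)
# Replaces env.monitor.start('experiments/' + ENV_NAME, force=True):
env = wrappers.Monitor(env, 'experiments/' + ENV_NAME, force=True)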
Hello, I've run into a problem. When my action dimension is one, the result is good, but when my action dimension is two (the activation functions are tanh and sigmoid), the output of the actor saturates.
Here is the result I described: https://github.com/m5823779/DDPG
By the way, I use batch normalization only in my actor network.
Has anyone met the same problem and solved it?
What versions of gym, tf, etc. did you use back then?
File "gym_ddpg.py", line 43, in
main()
File "gym_ddpg.py", line 13, in main
env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'
Hi, thank you for your work. It is very helpful and easy to understand!
But there seems to be a mistake in 'filter_env.py'.
The line 'ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)'
constrains all actions to [-1, 1] instead of the environment's real bounds.
Another question: have you ever used the XX_bn versions of the networks?
In my case the result is worse; I don't know whether that is normal or not.
Best
Thanks for your work! Is your input an image or just a vector? And what is the dimension of your output action?
Using Python 2.7 on Ubuntu 14.04:
Traceback (most recent call last):
File "gym_ddpg.py", line 2, in <module>
from ddpg import *
File "/home/truell20/Documents/ddpg-aigym/DDPG/ddpg.py", line 10, in <module>
from critic_network import CriticNetwork
File "/home/truell20/Documents/ddpg-aigym/DDPG/critic_network.py", line 3, in <module>
from utility import *
ImportError: No module named utility
When I call load_network and save_network, this error pops up:
AttributeError: 'ActorNetwork' object has no attribute 'save_network'
I've been trying to fix this error, but so far I haven't resolved it.
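If the checked-out revision really lacks those methods, equivalents can be rebuilt on tf.train.Saver. A minimal sketch under that assumption (the checkpoint directory is illustrative; create one tf.train.Saver() after the graph is built and pass it in):

import tensorflow as tf

def save_network(saver, sess, time_step):
    # Writes saved_actor_networks/actor-network-<time_step> checkpoints.
    saver.save(sess, 'saved_actor_networks/actor-network', global_step=time_step)

def load_network(saver, sess):
    # Restores the newest checkpoint if one exists.
    checkpoint = tf.train.get_checkpoint_state('saved_actor_networks')
    if checkpoint and checkpoint.model_checkpoint_path:
        saver.restore(sess, checkpoint.model_checkpoint_path)
    else:
        print('Could not find old network weights')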
Does the gradient for the actor need to be normalized over batch_size?
Line 37 in 18825ee
Sorry to bother you; when I run this code I get an error:
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 149 (RANDR)
Minor opcode of failed request: 8 (RRGetScreenResources)
Serial number of failed request: 14
Current serial number in output stream: 14
Note: I am running this code through vncserver.
mjmodel.h: no such file or directory
Hi, thank you for your implementation; it helped me write my own.
I have a question, though, about the action you use to compute gradients in ddpg.py line 71.
Why don't you use action_batch to compute the gradient? I haven't managed to get any agent working, so I can't test the difference.
Hi,
why did you include the minus sign in the grad_ys argument of the function below?
self.parameters_gradients = tf.gradients(self.action_output,self.parameters,-self.q_gradient_input/BATCH_SIZE)
As far as I understand, grad_ys weights the gradients of each of the actor outputs with the corresponding value (coming from the critic, in your case).
Thanks!
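Taken together with the two questions above (normalizing the gradient over batch_size, and which actions to differentiate at), that one line does three things at once: tf.gradients sums per-sample gradients over the batch, so dividing by BATCH_SIZE turns that sum into an average; the minus sign is needed because TensorFlow optimizers minimize, so feeding -dQ/da makes apply_gradients ascend Q; and q_gradient_input is meant to be dQ/da evaluated at a = mu(s) from the current actor rather than the replayed action_batch, as the deterministic policy gradient prescribes. A minimal runnable sketch of the wiring, with a toy one-layer actor standing in for the real network (all names and sizes illustrative):

import tensorflow as tf

BATCH_SIZE = 64
STATE_DIM, ACTION_DIM = 3, 1

# Toy stand-in actor so the gradient plumbing is runnable on its own.
state_input = tf.placeholder(tf.float32, [None, STATE_DIM])
W = tf.Variable(tf.random_uniform([STATE_DIM, ACTION_DIM], -0.1, 0.1))
action_output = tf.tanh(tf.matmul(state_input, W))
parameters = [W]

# dQ/da from the critic, evaluated at a = mu(s) for each sampled state.
q_gradient_input = tf.placeholder(tf.float32, [None, ACTION_DIM])

# grad_ys weights each output's gradient by -dQ/da: the minus sign turns the
# optimizer's descent into ascent on Q, and /BATCH_SIZE averages the
# per-sample gradients that tf.gradients would otherwise sum.
parameters_gradients = tf.gradients(action_output, parameters,
                                    -q_gradient_input / BATCH_SIZE)
optimizer = tf.train.AdamOptimizer(1e-4).apply_gradients(
    zip(parameters_gradients, parameters))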
Editing the previous issue, as the initial one was a misunderstanding on my side.
Hi,
I see that there are no target networks used in the implementation. Can it still be called DDPG without them? I thought target networks were one of the required features.
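For reference, DDPG's target networks are slowly-tracking copies of the actor and critic, updated as theta' <- tau*theta + (1 - tau)*theta' after each training step. A minimal runnable sketch of that soft update (TAU and the toy variables are illustrative):

import tensorflow as tf

TAU = 0.001  # soft-update rate from the DDPG paper

# Toy stand-ins for one layer's weights and their target-network copy.
param = tf.Variable(tf.random_uniform([4, 2]))
target_param = tf.Variable(param.initialized_value())

# Run this op after every training step so the target slowly tracks theta.
soft_update = target_param.assign(TAU * param + (1.0 - TAU) * target_param)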
Hi, when I run "python gym_ddpg.py", it always exits with "Segmentation fault (core dumped)".