ddpg's Issues

How to train a DDPG agent on the Reacher-v1 env?

Hi Song,

I have a small question about Reacher-v1. I found some discussions about Reacher-v1's reward, but I haven't found a suitable way to modify it. There is a comment by you in OpenAI Gym about changing the MuJoCo rewards, so I want to ask you about this.

Looking forward to your reply. :)
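
For reference, one common way to experiment with a modified reward without patching gym's own MuJoCo code is to wrap the environment and reshape the reward in step(). The sketch below is only an illustration, not part of this repository; ReacherRewardWrapper is a hypothetical name, and whether info carries a 'reward_dist' entry depends on the gym version, hence the fallback.

import gym

class ReacherRewardWrapper(gym.Wrapper):
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Example reshaping: keep only the distance term if the env exposes
        # it via info; otherwise fall back to the original reward.
        reward = info.get('reward_dist', reward)
        return obs, reward, done, info

env = ReacherRewardWrapper(gym.make('Reacher-v1'))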

Error when running the HumanoidStandup-v1 env

The same error also appears in the Humanoid-v1 env after running for some time; if STEPS is smaller, the error appears later.

[2016-05-28 10:53:54,309] Starting new video recorder writing to /Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5/openaigym.video.None.2001.video000000.mp4
[2016-05-28 10:53:59,759] Finished writing results. You can upload them to the scoreboard via gym.upload('/Users/lmj/Documents/t/ddpgout/1/HumanoidStandup-v1-DDPG-5')
Traceback (most recent call last):
  File "gym_ddpg.py", line 43, in <module>
    main()
  File "gym_ddpg.py", line 33, in main
    agent.set_feedback(observation,action,reward,done)
  File "/Users/lmj/develop/DDPG/ddpg.py", line 98, in set_feedback
    self.train()
  File "/Users/lmj/develop/DDPG/ddpg.py", line 66, in train
    self.critic_network.train(y_batch,state_batch,action_batch)
  File "/Users/lmj/develop/DDPG/critic_network.py", line 93, in train
    self.action_input:action_batch
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 340, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 553, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (1, 64, 1) for Tensor u'Placeholder_4:0', which has shape '(?, 1)'
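
For what it's worth, the mismatch between (1, 64, 1) and (?, 1) suggests the fed y_batch carries a stray extra dimension. A minimal sketch of one possible workaround, under the assumption that y_batch is just the batch of TD targets with an extra leading axis, is to reshape it before feeding:

import numpy as np

# Stand-in for a y_batch that arrives with a stray leading dimension.
y_batch = np.zeros((1, 64, 1))
# Flatten it to the (BATCH_SIZE, 1) shape the critic's placeholder expects.
y_batch = np.reshape(y_batch, (-1, 1))
print(y_batch.shape)  # (64, 1)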

Bug with FilteredEnv

The filtering incorrectly clips the action values to the range [-1, 1].

Line 70 of FilteredEnv:

ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)

self.action_space.low and self.action_space.high are arrays with values -1 and 1:

self.action_space = gym.spaces.Box(-np.ones_like(acsp.high),np.ones_like(acsp.high))

self.filter_action(action) correctly converts the action from [-1, 1] to the environment's range (e.g., in a 1D case, from [-1, 1] to [lower bound, upper bound]), but the clip then constrains the value back to [-1, 1], when it should really clip it to [lower bound, upper bound].
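
A minimal illustration of the difference, assuming a 1D environment whose real action range is [-2, 2] (the numbers here are made up for the example):

import numpy as np

low, high = np.array([-2.0]), np.array([2.0])
# Rescale a policy output of 0.9 from the filtered [-1, 1] space to [low, high].
rescaled = low + (0.9 + 1.0) / 2.0 * (high - low)   # -> [1.8]

wrong = np.clip(rescaled, -1.0, 1.0)   # clipped back to [1.0] (the reported bug)
right = np.clip(rescaled, low, high)   # stays [1.8]           (the intended behavior)
print(wrong, right)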

Actions generated by the actor network increase to 1.0 and stay there

Hi,

Thanks for your code.

I tried to use it for training on TORCS; however, my results are not good. To be specific, after a few steps the actions generated by the actor network increase to 1.0 and stay there, similar to the following (showing the top 10 as an example):

[[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]
[ 1. 1. 1.]]

Gradients for that set:
[[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]
[ 4.80426752e-05 1.51122265e-04 -1.96302353e-05]]

Could you tell me what you think the problem is?

AttributeError: 'FilteredEnv' object has no attribute 'monitor'

Traceback (most recent call last):
  File "gym_ddpg.py", line 43, in <module>
    main()
  File "gym_ddpg.py", line 13, in main
    env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'
>>> tf.__version__
'1.3.0'
>>> import gym
>>> gym.__version__
'0.9.3'

DDPG actor output saturates

Hello, I am running into a problem. When my action dimension equals one, the result is good, but when my action dimension is two (the activation functions are tanh and sigmoid), the output of the actor saturates.
Here is the result I mentioned: https://github.com/m5823779/DDPG
By the way, I use batch normalization only in my actor network.
Has anyone met the same problem and already solved it?

Error when running python gym_ddpg.py

What versions of gym, TensorFlow, etc. did you use back then?

File "gym_ddpg.py", line 43, in
main()
File "gym_ddpg.py", line 13, in main
env.monitor.start('experiments/' + ENV_NAME,force=True)
AttributeError: 'FilteredEnv' object has no attribute 'monitor'

Mistake found

Hi, thank you for your work. It is very helpful and easy to understand!

But it seems there's a mistake in 'filter_env.py'

The line 'ac_f = np.clip(self.filter_action(action),self.action_space.low,self.action_space.high)'

All the actions are constrained to [-1, 1] instead of the environment's real bounds.

And another question: have you ever used the XX_bn versions of the networks?

In my case, the result is worse. I do not know whether this is normal or not.

Best

Error: No module named utility

Using python2.7 on ubuntu 14.04:

Traceback (most recent call last):
  File "gym_ddpg.py", line 2, in <module>
    from ddpg import *
  File "/home/truell20/Documents/ddpg-aigym/DDPG/ddpg.py", line 10, in <module>
    from critic_network import CriticNetwork 
  File "/home/truell20/Documents/ddpg-aigym/DDPG/critic_network.py", line 3, in <module>
    from utility import *
ImportError: No module named utility

How to save the actor and critic weights

When I call load_network and save_network, this error pops up:
AttributeError: 'ActorNetwork' object has no attribute 'save_network'

I have been trying to fix this error, but it is not resolved yet.
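
In case it helps, a minimal self-contained sketch of saving and restoring TensorFlow variables with a plain tf.train.Saver; this is not this repository's API, and the checkpoint path below is arbitrary:

import tensorflow as tf

# Stand-in variable for the actor/critic weights.
w = tf.Variable(tf.zeros([3, 3]), name='actor_w1')

sess = tf.Session()
sess.run(tf.global_variables_initializer())

saver = tf.train.Saver()                        # covers all saveable variables
saver.save(sess, './ddpg.ckpt', global_step=0)

# Later, or in a new run: restore the latest checkpoint if one exists.
ckpt = tf.train.get_checkpoint_state('.')
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)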

X Error of failed request: BadRequest (invalid request code or no such operation)

Sorry to bother you; when I run this code, I get an error:

X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 149 (RANDR)
Minor opcode of failed request: 8 (RRGetScreenResources)
Serial number of failed request: 14
Current serial number in output stream: 14

Note: I run this code using vncserver.

Action used for gradient calculation

Hi, thank you for your implementation; it helped me write my own.

I have a question, though, about the action you use to compute the gradients, in ddpg.py line 71.

Why don't you use action_batch to compute the gradient? I haven't managed to get any agent working, so I can't test the difference.
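
For reference, the deterministic policy gradient that DDPG implements evaluates the critic's action gradient at the current policy's output rather than at the replayed action; action_batch only enters the critic's TD update. Roughly, in LaTeX notation:

\nabla_{\theta} J \approx \frac{1}{N} \sum_{i=1}^{N} \left. \nabla_{a} Q(s_i, a) \right|_{a = \mu(s_i \mid \theta)} \, \nabla_{\theta} \mu(s_i \mid \theta)

so the gradient has to be taken at a = mu(s_i), i.e. at actions recomputed by the current actor for the sampled states.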

Question regarding the calculation of the actor gradient.

Hi,

Why did you include the minus sign in the grad_ys argument of the call below?

self.parameters_gradients = tf.gradients(self.action_output,self.parameters,-self.q_gradient_input/BATCH_SIZE)

As far as I understand, grad_ys weights the gradients of each of the actor outputs with the corresponding value (coming from the critic in your case).

Thanks!
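
For what it's worth, one common reading: grad_ys lets tf.gradients chain the externally supplied dQ/da into d(mu)/d(theta), and because the optimizer that later consumes parameters_gradients performs gradient descent, the Q-gradient is negated so that descending on the result ascends on Q. Sketched in LaTeX:

\theta \leftarrow \theta - \alpha \, \nabla_{\theta} \left( -J(\theta) \right) = \theta + \alpha \, \nabla_{\theta} J(\theta)

The division by BATCH_SIZE just averages the per-sample gradients over the minibatch.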

No target networks in the implementation

Editing the previous issue, as the initial one was a misunderstanding on my side.

Hi,

I see that there are no target networks used in the implementation. Can this even be considered DDPG? I thought this was one of the required features.
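
For reference, the original DDPG paper does use slowly-updated target copies of both the actor and the critic, typically via a soft update with a small tau. A minimal self-contained sketch of such a soft update in TF1 style (the variable names below are illustrative, not taken from this repository):

import tensorflow as tf

TAU = 0.001  # soft-update rate from the DDPG paper

# Stand-ins for one online weight and its target copy.
online_w = tf.Variable(tf.random_normal([4, 4]), name='online_w')
target_w = tf.Variable(online_w.initialized_value(), name='target_w')

# target <- tau * online + (1 - tau) * target, run once per training step.
soft_update = target_w.assign(TAU * online_w + (1.0 - TAU) * target_w)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(soft_update)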
