
atavakol / action-branching-agents


(AAAI 2018) Action Branching Architectures for Deep Reinforcement Learning

Home Page: https://arxiv.org/abs/1711.08946

License: MIT License

Python 100.00%
aaai agents deep-reinforcement-learning reinforcement-learning

action-branching-agents's People

Contributors

atavakol


action-branching-agents's Issues

Requirements list required

Hi, could you provide a requirements list for this project? TensorFlow 2.0, for example, no longer includes tf.contrib.layers. As I am a bit of a newbie in RL, I would rather not modify the code until I know which versions it expects, hence the question.
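For what it's worth, the code's reliance on tf.contrib.layers effectively pins it to TensorFlow 1.x. A minimal requirements sketch along these lines should at least avoid the missing tf.contrib error; the exact pins are assumptions, not an official list from this repository:

```
# Assumed pins, not an official requirements list for this repository
tensorflow==1.15.*   # last 1.x release line; still ships tf.contrib.layers (removed in TF 2.x)
gym==0.15.4          # version listed in the environment report of another issue below
numpy<1.20           # newer NumPy releases are known to break TensorFlow 1.x
```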

A question about PyTorch version

Hi, I have tried several PyTorch implementations of the BDQ algorithm on GitHub, but their performance is not as good as the implementation in this repository. I compared the code and could not find any apparent differences so far. Do you have any suggestions for why there might be a performance difference?

Thanks a lot :)
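One detail where re-implementations often drift from the paper is how TD targets and losses are aggregated across the branches (the paper compares several schemes; averaging across branches is the one reported to work well). Below is a rough PyTorch sketch of a mean-across-branches loss, given purely as an illustration; the tensor names and shapes are assumptions, and the code is not taken from this repository or any particular re-implementation:

```python
import torch
import torch.nn.functional as F

# Assumed shapes: q_online / q_online_next / q_target_next are [batch, num_branches, num_bins]
# (per-branch Q-values); actions is a long tensor [batch, num_branches] with the chosen bin per branch.
def bdq_loss(q_online, q_online_next, q_target_next, actions, rewards, dones, gamma=0.99):
    # Q-value of the action actually taken, per branch: [batch, num_branches]
    q_taken = q_online.gather(2, actions.unsqueeze(-1)).squeeze(-1)

    # Double-DQN-style selection: argmax per branch from the online net,
    # evaluated by the target net.
    next_best = q_online_next.argmax(dim=2, keepdim=True)          # [batch, branches, 1]
    q_next = q_target_next.gather(2, next_best).squeeze(-1)        # [batch, branches]

    # Average the per-branch bootstrap values into a single shared TD target
    # (one of the aggregation schemes discussed in the paper).
    target = rewards + gamma * (1.0 - dones) * q_next.mean(dim=1)  # [batch]

    # Each branch regresses toward the same shared target; branch losses are averaged.
    target = target.unsqueeze(1).expand_as(q_taken).detach()
    return F.smooth_l1_loss(q_taken, target)
```

Checking how a given implementation handles this aggregation (and whether it rescales the shared trunk's gradients by the number of branches, as the paper also discusses) is a reasonable first place to look for performance gaps.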

Problems with loading custom environment

I tried generic environments such as "Humanoid-v2" and the save and load features work without an error. Now I am experimenting with action branching on a custom gym environment. My model is saved without a problem after training. However, I receive an error during loading when I run enjoy_continuous.py. I share the error message below (see also the diagnostic note after the traceback).

System

  • python = 3.6.10
  • gym = 0.15.4
  • tensorflow-gpu = 1.12
  • tensorflow = 1.15
  • cloudpickle = 1.2.1
  • dill = 0.3.1.1

2020-05-14 14:57:12.020087: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Adam/beta_1 not found in checkpoint
Traceback (most recent call last):
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key Adam/beta_1 not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1546, in restore
{self.saver_def.filename_tensor_name: save_path})
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key Adam/beta_1 not found in checkpoint
[[node save/RestoreV2 (defined at /home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py:301) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 84, in
main(args)
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 53, in main
act = deepq.load(args.model_dir)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 77, in load
return ActWrapper.load(path, num_cpu=num_cpu)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 38, in load
U.load_state(os.path.join(td, "model"))
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py", line 301, in load_state
saver = tf.train.Saver()
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in init
self.build()
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python
/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key Adam/beta_1 not found in checkpoint
[[node save/RestoreV2 (defined at /home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py:301) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1556, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1830, in object_graph_key_mapping
checkpointable.OBJECT_GRAPH_PROTO_KEY)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 371, in get_tensor
status)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 84, in
main(args)
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 53, in main
act = deepq.load(args.model_dir)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 77, in load
return ActWrapper.load(path, num_cpu=num_cpu)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 38, in load
U.load_state(os.path.join(td, "model"))
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py", line 302, in load_state
saver.restore(get_session(), fname)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1562, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Adam/beta_1 not found in checkpoint
[[node save/RestoreV2 (defined at /home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py:301) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 84, in
main(args)
File "action-branching-agents/agents/bdq/enjoy_continuous.py", line 53, in main
act = deepq.load(args.model_dir)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 77, in load
return ActWrapper.load(path, num_cpu=num_cpu)
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/deepq/procedure_continuous_tasks.py", line 38, in load
U.load_state(os.path.join(td, "model"))
File "/home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py", line 301, in load_state
saver = tf.train.Saver()
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1102, in init
self.build()
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1114, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1151, in _build
build_save=build_save, build_restore=build_restore)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 795, in _build_internal
restore_sequentially, reshape)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 862, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 1466, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
op_def=op_def)
File "/home/hbp/anaconda3/envs/rl_manip/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key Adam/beta_1 not found in checkpoint
[[node save/RestoreV2 (defined at /home/hbp/rl_man/ActionBranchingDQN/action-branching-agents/agents/bdq/common/tf_util.py:301) = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_INT64, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
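Not a fix, but a quick way to see which keys the checkpoint on disk actually holds versus which variables the freshly built graph expects to restore (a TF 1.x diagnostic sketch; the checkpoint path is a placeholder):

```python
import tensorflow as tf

ckpt = "/path/to/model"  # placeholder: the checkpoint prefix written at save time

# Variable keys stored in the checkpoint file on disk.
for name, shape in tf.train.list_variables(ckpt):
    print("checkpoint:", name, shape)

# Variables the current graph's Saver will try to restore
# (run this after the act graph has been built, e.g. inside load_state).
for var in tf.global_variables():
    print("graph:", var.op.name)
```

If the rebuilt graph expects optimizer slots such as Adam/beta_1 that the checkpoint never stored, the graph at load time differs from the one at save time. Possible culprits include mixing TensorFlow versions between training and loading (both tensorflow-gpu 1.12 and tensorflow 1.15 are listed above) or a custom environment that registers its own variables in the default graph, since tf.train.Saver() with no arguments tries to restore every global variable.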

Python Version/Requirements

Could you add the version of Python you were using (or add a requirements.txt or YAML file)?
I'm getting the following error when running python enjoy_continuous.py:

Traceback (most recent call last):
  File "enjoy_continuous.py", line 70, in <module>
    main()
  File "enjoy_continuous.py", line 42, in main
    act = deepq.load(model_file)
  File "../bdq/deepq/procedure_continuous_tasks.py", line 74, in load
    return ActWrapper.load(path, num_cpu=num_cpu)
  File "../bdq/deepq/procedure_continuous_tasks.py", line 26, in load
    act = deepq.build_act(**act_params)
  File "../bdq/deepq/build_graph.py", line 108, in build_act
    observations_ph = U.ensure_tf_input(make_obs_ph("observation"))
  File "../bdq/deepq/procedure_continuous_tasks.py", line 202, in make_obs_ph
    sess = U.make_session(num_cpu=num_cpu)
SystemError: unknown opcode

I've tried Python 3.7.0 and 3.6.7. I also tried Python 3.5.4 but was not able to install all the dependencies with it.

Update: after training models myself (with Python 3.7.0), I was able to use enjoy_continuous.py on the .pkl files I had generated (just not the ones uploaded to the repository).
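The "unknown opcode" error is consistent with the update above: the bundled .pkl files embed pickled function bytecode (the environment in another issue lists cloudpickle and dill, both of which serialize raw bytecode), and CPython bytecode is not portable across interpreter versions. A minimal illustration of the failure mode, not code from this repository:

```python
import cloudpickle

# A function's bytecode is serialized verbatim by cloudpickle.
scale = lambda x: x * 2
blob = cloudpickle.dumps(scale)

# Loading and calling it on the same interpreter works fine...
restored = cloudpickle.loads(blob)
print(restored(21))  # 42

# ...but a blob dumped under a different CPython version (say, dumped on 3.7
# and loaded on 3.5) can raise "SystemError: unknown opcode" the moment the
# restored function runs, because the bytecode format changes between Python
# versions. Retraining under your own interpreter, as noted in the update,
# regenerates .pkl files with compatible bytecode.
```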

Understanding of paper

Hi,

I was trying to understand your paper by relating it to my problem, where I have to select an action from the following Dict action space:

Dict(
attack:Discrete(2), 
back:Discrete(2), 
camera:Box(2,), 
forward:Discrete(2),
jump:Discrete(2), 
left:Discrete(2), 
place:Enum(none,dirt), 
right:Discrete(2), 
sneak:Discrete(2), 
sprint:Discrete(2)
)

Since simultaneous actions are possible (attack + jump, forward + jump), how do I select them using BDQ? As per my understanding, in BDQ we select the max Q-value from each action stream. For example, in the BipedalWalker environment the action space is a 4-dimensional box bounded by [-1, 1], so there are 4 action streams and we predict values for each stream. Is that correct?
I am also unable to understand the use of num_actions_pad = 33 (the number of discrete sub-actions per action dimension).

I would appreciate it if you could kindly explain this to me. Thanks!
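On the discretization question above: here is a minimal NumPy sketch (an illustration under assumed shapes, not code from this repository) of how each continuous action dimension becomes one branch with num_actions_pad = 33 sub-actions, and how the per-branch argmax maps back to a joint continuous action:

```python
import numpy as np

num_branches = 4        # e.g. a 4-dimensional continuous action space bounded by [-1, 1]
num_actions_pad = 33    # discrete sub-actions per branch (per action dimension)
low, high = -1.0, 1.0

# Each branch owns a fixed grid of 33 sub-actions over [low, high].
sub_actions = np.linspace(low, high, num_actions_pad)

# Stand-in for the network output: one row of 33 Q-values per branch.
q_values = np.random.randn(num_branches, num_actions_pad)

# Independent argmax per branch (one sub-action per action dimension)...
chosen_bins = q_values.argmax(axis=1)

# ...mapped back to a joint continuous action vector for the environment.
continuous_action = sub_actions[chosen_bins]
print(continuous_action)  # shape (4,), one value in [-1, 1] per dimension
```

Simultaneous actions fall out naturally: every branch takes its own argmax at every step, so if each key of the Dict space (attack, jump, forward, ...) is treated as its own branch, several of them can be "on" at the same time; discrete keys like Discrete(2) simply become branches with 2 sub-actions instead of 33.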

Unable to run the script

I followed the steps in the README, but I encounter numerous errors while importing modules in the code. I am unable to figure out why this is happening. Any insights would be helpful.
