
deep_q_rl's Introduction

Introduction

This package provides a Lasagne/Theano-based implementation of the deep Q-learning algorithm described in:

Playing Atari with Deep Reinforcement Learning Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller

and

Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

Here is a video showing a trained network playing breakout (using an earlier version of the code):

http://youtu.be/SZ88F82KLX4

Dependencies

The script dep_script.sh can be used to install all dependencies under Ubuntu.

Running

Use the scripts run_nips.py or run_nature.py to start all the necessary processes:

$ ./run_nips.py --rom breakout

$ ./run_nature.py --rom breakout

The run_nips.py script uses parameters consistent with the original NIPS workshop paper; training should take 2-4 days to complete. The run_nature.py script uses parameters consistent with the Nature paper; the final policies should be better, but training will take 6-10 days to finish.

Either script will store output files in a folder prefixed with the name of the ROM. Pickled versions of the network objects are stored after every epoch. The file results.csv will contain the testing output. You can plot the progress by executing plot_results.py:

$ python plot_results.py breakout_05-28-17-09_0p00025_0p99/results.csv

After training completes, you can watch the network play using the ale_run_watch.py script:

$ python ale_run_watch.py breakout_05-28-17-09_0p00025_0p99/network_file_99.pkl

Performance Tuning

Theano Configuration

Setting allow_gc=False in THEANO_FLAGS or in the .theanorc file significantly improves performance at the expense of a slight increase in memory usage on the GPU.
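
For example (a minimal sketch; adapt the command and paths to your own setup), the flag can be passed on the command line:

$ THEANO_FLAGS='allow_gc=False' ./run_nature.py --rom breakout

or placed in ~/.theanorc:

    [global]
    allow_gc = False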

Getting Help

The deep Q-learning web-forum can be used for discussion and advice related to deep Q-learning in general and this package in particular.

deep_q_rl's People

Contributors

davidsj, edersantana, gaoyuankidult, hiyorimi, ivanopolo, jleni, john-a-m, npow, spragunr, stone8oy, udibr

deep_q_rl's Issues

ALE assertion failed

Have you seen this error before? RL-Glue and ALE seem to initialize properly, but then "ALE RL-Glue" dies with a strange malloc error.

RL-Glue Version 3.04, Build 909
RL-Glue is listening for connections on port=4096
A.L.E: Arcade Learning Environment (version 0.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:  /opt/ale_0.4.4/ale_0_4/roms/space_invaders.bin
  Cart Name: Space Invaders (1978) (Atari) [!]
  Cart MD5:  72ffbef6504b75e69ee1045af9075f66
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        4096
  Bankswitch Type: AUTO-DETECT ==> 4K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
    RL-Glue :: Experiment connected.
Initializing ALE RL-Glue ...
ale: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Aborted (core dumped)
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
     Agent Codec Connected
    RL-Glue :: Agent connected.

Mean Q calculations different from paper (or incorrect?)

Hi,

The code that calculates the mean Q value written to results.csv (i.e. the code starting around line 410 of rl_glue_ale_agent.py) has a couple of issues, I think. The main one is that it outputs the mean Q value across all actions instead of the mean of the max-action Q values, as the DeepMind paper does. From the look of the code, I believe the intention was to output the mean of the max-action Q values.

The other, related issue is that the mean is calculated across only 100 phis instead of across 100 batches of 32 phis, as the code seems to intend. The code asks for

self.holdout_data = self.data_set.random_batch(holdout_size * self.batch_size)[0]

and that returns something of shape (3200, 4, 80, 80), but then it iterates with:

for i in range(holdout_size):
    holdout_sum += np.mean(self.network.q_vals(self.holdout_data[i, ...]))

presumably assuming that random_batch indexes each batch separately and that q_vals takes a batch at a time, but random_batch just numbers each phi set separately and q_vals only takes one example. E.g. the shape of self.holdout_data[0, ...] is (4, 80, 80), and network.q_vals then returns something of shape (18,) (i.e. one value per action).

I believe the following code should work

        for i in range(holdout_size * self.batch_size):
            holdout_sum += np.max(
                self.network.q_vals(self.holdout_data[i, ...]))

        self._update_results_file(epoch, self.episode_counter,
                                  holdout_sum / (holdout_size * self.batch_size))

(Thank you very much for releasing this, by the way. I am still struggling through the theano parts in cnn_q_learner.py. It does my head in. Do you think it'd be worth switching to the cuDNN theano wrapper instead of the cuda_convnet now that that's been released?)

What is the memory requirement for this program

I had an error while running python ale_run.py:

python ale_run.py

I encountered a memory error; I think the problem is that I don't have enough memory. So could you tell me what the memory requirement is to run this program?

Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 454, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 450, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 137, in <lambda>
    Network.kAgentInit: lambda self: self.onAgentInit(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 43, in onAgentInit
    self.agent.agent_init(taskSpec)
  File "./rl_glue_ale_agent.py", line 150, in agent_init
    phi_length=self.phi_length)
  File "/home/gao/Desktop/dqn/deep_q_rl/deep_q_rl/ale_data_set.py", line 38, in __init__
    self.states = np.zeros((self.capacity, height, width), dtype='uint8')
MemoryError
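
For a rough estimate: the traceback shows the replay dataset preallocating a (capacity, height, width) uint8 array, so the requirement can be worked out directly. A back-of-the-envelope sketch, assuming a replay capacity of one million 80x80 frames (an assumption; check the defaults in your run script):

    # Back-of-the-envelope estimate; capacity/height/width are assumptions,
    # substitute the values your run script actually uses.
    capacity, height, width = 1000000, 80, 80
    states_bytes = capacity * height * width              # one byte per uint8 pixel
    print("states array: %.1f GB" % (states_bytes / 1e9))  # ~6.4 GB, before the rest of the dataset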

ValueError: total size of new array must be unchanged

Hi Nathan,

I'm trying to get the minimal actions branch working with a fresh pull. I've done a fresh install of ALE with the patch as well.

It's weird, as now neither master nor the minimal actions branch is working. Maybe you have a few minutes spare?

I'm guessing it's a problem with the ALE patch, as everything was good until I rebuilt ALE fresh with it.

I get:

/usr/bin/python2.7 /home/ajay/PythonProjects/deep_q_rl-minimal_actions/deep_q_rl/ale_run.py
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
ROM file: /home/ajay/ale_0_4/roms/pong.bin
Cart Name: Video Olympics (1978) (Atari)
Cart MD5: 60e0ea3cbe0913d39803477945e9e5ec
Display Format: AUTO-DETECT ==> NTSC
ROM Size: 2048
Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 0: GeForce GTX 570
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
/home/ajay/bin/Theano-master/theano/gof/cmodule.py:289: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
rval = __import__(module_name, {}, {}, [module_name])
OPENING _01-28-04-27_0p0001_0p9/results.csv
training epoch: 1 steps_left: 50000
Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 430, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 426, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 138, in <lambda>
    Network.kAgentStart: lambda self: self.onAgentStart(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 51, in onAgentStart
    action = self.agent.agent_start(observation)
  File "./rl_glue_ale_agent.py", line 245, in agent_start
    self.last_img = np.array(self._resize_observation(observation.intArray))
  File "./rl_glue_ale_agent.py", line 263, in _resize_observation
    img = observation.reshape(IMG_WIDTH, IMG_HEIGHT)
ValueError: total size of new array must be unchanged
Segmentation fault (core dumped)
training epoch: 1 steps_left: 49995
training epoch: 1 steps_left: 49993
training epoch: 1 steps_left: 49991

Segmentation fault

Hi,

I've been trying to get your code working, and I'm almost there, but I still am getting a seg fault. The system is running, but it's not saving out any results. Here is what I am getting when I run your script.
Thanks for your help!

  • Sridhar

python ale_run.py --exp_pref data | more
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
ROM file: /home/mahadeva/Documents/code/deep_rl/roms/breakout.bin
Cart Name: Breakout - Breakaway IV (1978) (Atari)
Cart MD5: f34f08e5eb96e500e851a80be3277a56
Display Format: AUTO-DETECT ==> NTSC
ROM Size: 2048
Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 1: GeForce GTX 980
In file included from /usr/include/python2.7/numpy/ndarraytypes.h:1761:0,
from /usr/include/python2.7/numpy/ndarrayobject.h:17,
from /usr/include/python2.7/numpy/arrayobject.h:4,
from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it by "
^
In file included from /usr/include/python2.7/numpy/ndarrayobject.h:26:0,
from /usr/include/python2.7/numpy/arrayobject.h:4,
from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c:239:
/usr/include/python2.7/numpy/multiarray_api.h:1629:1: warning: '_import_array' defined but not used [-Wunused-function]
_import_array(void)
^
In file included from /usr/include/python2.7/numpy/ufuncobject.h:327:0,
from /home/mahadeva/.pyxbld/temp.linux-x86_64-2.7/pyrex/shift.c
--More--Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 430, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 426, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 137, in <lambda>
    Network.kAgentInit: lambda self: self.onAgentInit(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 43, in onAgentInit
    self.agent.agent_init(taskSpec)
  File "./rl_glue_ale_agent.py", line 158, in agent_init
    self.network = self._init_network()
  File "./rl_glue_ale_agent.py", line 192, in _init_network
    approximator='cuda_conv')
  File "/home/mahadeva/Documents/code/deep_rl/deep_q_rl/cnn_q_learner.py", line 168, in __init__
    target = theano.gradient.consider_constant(target)
AttributeError: 'module' object has no attribute 'consider_constant'
Segmentation fault (core dumped)
:240:
/usr/include/python2.7/numpy/__ufunc_api.h:241:1: warning: '_import_umath' defined but not used [-Wunused-function]
_import_umath(void)
^
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
Connecting to 127.0.0.1 on port 4096...
Agent Codec Connected
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
training epoch: 1 steps_left: 50000
training epoch: 1 steps_left: 49995
training epoch: 1 steps_left: 49993
training epoch: 1 steps_left: 49991
training epoch: 1 steps_left: 49989
training epoch: 1 steps_left: 49987
training epoch: 1 steps_left: 49985
training epoch: 1 steps_left: 49983
training epoch: 1 steps_left: 49981
training epoch: 1 steps_left: 49979
training epoch: 1 steps_left: 49977
training epoch: 1 steps_left: 49975
training epoch: 1 steps_left: 49973
training epoch: 1 steps_left: 49971
training epoch: 1 steps_left: 49969
training epoch: 1 steps_left: 49967
training epoch: 1 steps_left: 49965
training epoch: 1 steps_left: 49963
training epoch: 1 steps_left: 49961
training epoch: 1 steps_left: 49959
training epoch: 1 steps_left: 49957
training epoch: 1 steps_left: 49955
training epoch: 1 steps_left: 49953
training epoch: 1 steps_left: 49951
training epoch: 1 steps_left: 49949
training epoch: 1 steps_left: 49947
training epoch: 1 steps_left: 49945
training epoch: 1 steps_left: 49943
training epoch: 1 steps_left: 49941
training epoch: 1 steps_left: 49939

Illegal instruction (core dumped) when loading the ROM

Hi, I got the "Illegal instruction (core dumped) error" when loading in launcher.py, can you please help?

ale.loadROM(full_rom_path, full_core_path)

using gdb debugger it shows the following error:

R.L.E: Retro Learning Environment (version 1.0.0)
[Based upon the Arcade Learning Environment (A.L.E)]
[Powered by LibRetro]
Use -help for help screen.
[inf] Frontend supports RGB565 - will use that instead of XRGB1555.
Sound buffer size: 128000 (32000 samples)
Core loaded
[inf] No ROM file header found.
Map_LoROMMap
PPU.RenderSub = 0
PPU.FullClipping = 1
Settings.Transparency = 1
Settings.SpeedhackGameID = 0
PPU.SFXSpeedupHack = 0
coldata_update_screen = 1
[inf] "MORTAL KOMBAT" [checksum ok] LoROM, 16Mbits, ROM, NTSC, SRAM:0Kbits, ID:____, CRC32:0BD8EC55
Running ROM file...
Random seed is 65

Program received signal SIGILL, Illegal instruction.
core_audio_sample_batch (data=0x7fffc32019b0 <S9xAudioCallback.audio_buf>, frames=511)
    at /home/ben/deep_q_rl/build/RLE/src/environment/RetroAgent.cpp:316
316	/home/ben/deep_q_rl/build/RLE/src/environment/RetroAgent.cpp: No such file or directory.

Gradient zero when diff is clipped

This code in q_network.py,

if self.clip_delta > 0:
    diff = diff.clip(-self.clip_delta, self.clip_delta)

results in gradients going to zero whenever the diff is clipped. In other words, gradients vanish precisely when they should be largest.

Simplified example of the problem:

>>> a = T.arange(-2, 2, 0.25)
>>> loss = T.sum(a.clip(-1, 1) ** 2)
>>> theano.gradient.grad(loss, a).eval()
array([-0. , -0. , -0. , -0. , -2. , -1.5, -1. , -0.5,  0. ,  0.5,  1. ,
        1.5,  2. ,  0. ,  0. ,  0. ], dtype=float32)

The Nature paper is actually misleading in its discussion of clipping:

We also found it helpful to clip the error term ... to be between -1 and 1. Because the absolute value loss function |x| has a derivative of -1 for all negative values of x and a derivative of 1 for all positive values of x, clipping the squared error to be between -1 and 1 corresponds to using an absolute value loss function for errors outside of the (-1,1) interval.

First, clipping the "squared error" to (-1, 1) isn't even sensible since the squared error is always positive. They presumably mean clipping the error, but even that is wrong since, as we see above, that would incorrectly lead to a zero gradient.

In their Torch implementation they use their clipped error term directly as the derivative of an implicit loss function with respect to network output, and propagate that backwards to determine the gradient. So their loss, implicitly, is quadratic when error is in (-1, 1), and linear (rather than flat) outside of that region.

I have a patch for this, which I'm currently testing to verify improved learning performance. Results are promising so far.
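
For reference, a sketch in Theano of the kind of fix described above (not necessarily the exact patch): keep the loss quadratic inside the clip interval and extend it linearly outside, so the gradient saturates at +/-clip_delta instead of vanishing. Here diff stands for the TD error from the snippet above:

    import numpy as np
    import theano
    import theano.tensor as T

    diff = T.vector('diff')   # TD error, target - Q(s, a)
    clip_delta = 1.0

    # Quadratic inside [-clip_delta, clip_delta], linear outside, so the
    # gradient saturates at +/-clip_delta instead of vanishing.
    quadratic_part = T.minimum(abs(diff), clip_delta)
    linear_part = abs(diff) - quadratic_part
    loss = T.sum(0.5 * quadratic_part ** 2 + clip_delta * linear_part)

    grad_fn = theano.function([diff], theano.grad(loss, diff))
    print(grad_fn(np.arange(-2, 2, 0.25).astype(theano.config.floatX)))
    # prints diff clipped to [-1, 1]: large errors keep a constant, non-zero gradient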

ROM file won't load

Hi,

I just set up the library and ran the install_dependencies.sh file. Unfortunately, when I enter the command /home/aidan/Desktop/deep_q_rl-master/deep_q_rl/run_nips.py --rom Breakout.bin into a terminal in Ubuntu, I get:

A.L.E: Arcade Learning Environment (version 0.5.0)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
No ROM File specified or the ROM file was not found.

Any thoughts on how to fix this?

Memory error

When I do:
python main.py --is_train=False --display=True --use_gpu=False

I get:

python main.py --is_train=False --display=True --use_gpu=False
[*] GPU : 1.0000
[2018-05-23 17:17:55,692] Making new env: Breakout-v0
{'_save_step': 500000,
'_test_step': 50000,
'action_repeat': 4,
'backend': 'tf',
'batch_size': 32,
'cnn_format': 'NHWC',
'discount': 0.99,
'display': True,
'double_q': False,
'dueling': False,
'env_name': 'Breakout-v0',
'env_type': 'detail',
'ep_end': 0.1,
'ep_end_t': 1000000,
'ep_start': 1.0,
'history_length': 4,
'learn_start': 50000.0,
'learning_rate': 0.00025,
'learning_rate_decay': 0.96,
'learning_rate_decay_step': 50000,
'learning_rate_minimum': 0.00025,
'max_delta': 1,
'max_reward': 1.0,
'max_step': 50000000,
'memory_size': 1000000,
'min_delta': -1,
'min_reward': -1.0,
'model': 'm1',
'random_start': 30,
'scale': 10000,
'screen_height': 84,
'screen_width': 84,
'target_q_update_step': 10000,
'train_frequency': 4}
Traceback (most recent call last):
File "main.py", line 70, in
tf.app.run()
File "/Tuto_DQN/env/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 62, in main
agent = Agent(config, env, sess)
File "/Tuto_DQN/tuto_dqn/DQN-tensorflow/dqn/agent.py", line 23, in init
self.memory = ReplayMemory(self.config, self.model_dir)
File "/Tuto_DQN/tuto_dqn/DQN-tensorflow/dqn/replay_memory.py", line 18, in init
self.screens = np.empty((self.memory_size, config.screen_height, config.screen_width), dtype = np.float16)
MemoryError

I installed all the dependencies according to the issue "Add requirements.txt or alternative".
I am running it on my laptop, which is a Samsung Series 7 Ultra notebook.

Could someone advise me on how to overcome this issue? Any comment would be highly appreciated.
Thanks a lot!!

Training results don't match Deepmind implementation

In principle, there should be no significant algorithmic differences between this implementation and the DeepMind code. This code was developed to be consistent with the description in the Nature paper. Where the paper was ambiguous, I've looked into their Torch/Lua code for clarification. All of the default parameters are set to match those described in the Nature paper.

For some unknown reason(s) the learning dynamics for this implementation don't match the reference lua implementation. In general, the current code seems to learn well for a while, then the quality of the policy drops off with continued training.

Choose action with epsilon set to self.epsilon instead of 1.0 prior to hitting replay size

Before the agent has seen replay_start_size frames, it collects frames while acting randomly (epsilon hardcoded to 1.0). Changing this to choose_action with epsilon at self.epsilon would help with resuming runs, since you could set the epsilon_start to some number approximately equal to what it was when the run got interrupted, and replay_start_size to a million, and it would be very close to being a smooth resume. When starting a new run, self.epsilon would be 1.0 anyway so nothing would change.

The one-line patch explains it better and more succinctly:


--- a/deep_q_rl/rl_glue_ale_agent.py
+++ b/deep_q_rl/rl_glue_ale_agent.py
@@ -571,7 +571,7 @@ class NeuralAgent(Agent):
                 self.loss_averages.append(loss)
         else:
             # save the data and pick one at random since we haven't hit the replay size
-            int_action, considered = self.choose_action(self.data_set, 1.0,
+            int_action, considered = self.choose_action(self.data_set, self.epsilon,
                                  current_image, np.clip(reward, -1, 1))
 
         # Map it back to ALE's actions

q_layers[5].b copied twice in load_weights

def load_weights(self, file_name):
        net_file = open(file_name, 'r')
        net = cPickle.load(net_file)
        # initial convolution layer
        self.q_layers[2].W.set_value(net.q_layers[2].W.get_value())
        self.q_layers[2].b.set_value(net.q_layers[2].b.get_value())
        # second convolution layer
        self.q_layers[3].W.set_value(net.q_layers[3].W.get_value())
        self.q_layers[3].b.set_value(net.q_layers[3].b.get_value())
        # hidden layer
        self.q_layers[5].b.set_value(net.q_layers[5].b.get_value())
        self.q_layers[5].b.set_value(net.q_layers[5].b.get_value())
        net_file.close()

This line appears twice:

self.q_layers[5].b.set_value(net.q_layers[5].b.get_value())

Shouldn't the first one be replaced by

self.q_layers[5].W.set_value(net.q_layers[5].W.get_value())

I am not quite sure how the DenseLayer differs in terms of parameters, but this seems like an issue to me.
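
If that reading is correct, the hidden-layer block would presumably become:

    # Presumed intent: copy both W and b of the hidden DenseLayer
    self.q_layers[5].W.set_value(net.q_layers[5].W.get_value())
    self.q_layers[5].b.set_value(net.q_layers[5].b.get_value())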

Importing host_from_gpu does not work with Theano 0.7.0

cc_layers.py tries to import host_from_gpu from theano.sandbox.cuda. This throws an import error. Changing the import statement to

from theano.sandbox.cuda.basic_ops import host_from_gpu

resolved the problem for me.

ImportError: No module named cuda.var in theano version 0.9.0

I have installed Theano version 0.9.0.
Since importing as_cuda_ndarray_variable from theano.sandbox.cuda.basic_ops is not available in release 0.9.0, I guessed the equivalent is from theano.sandbox.cuda.var import CudaNdarraySharedVariable, but that gives an import error, as the title says: No module named cuda.var.

Similarly, I am searching for equivalents of the modules below:

  1. host_from_gpu
  2. gpu_alloc_empty

The exact line which throws the error is as shown below:
from theano.sandbox.cuda.basic_ops import (as_cuda_ndarray_variable,
                                           host_from_gpu,
                                           gpu_contiguous, HostFromGpu,
                                           gpu_alloc_empty)

Thank you in advance.

No output files on Mac

I finally got it (what appears to be) up and running, and it creates a data/ directory, but it never populates any files in the folder.

Output:

python ale_run.py --exp_pref data
RL-Glue Version 3.04, Build 909
RL-Glue is listening for connections on port=4096
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
    RL-Glue :: Experiment connected.
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
     Agent Codec Connected
    RL-Glue :: Agent connected.

(then it just hangs, can't tell if it's doing anything or not)

Not sure if you have any idea what might be going wrong,
but thought I'd check. Thanks.

Training on nips paper doesn't run

Simple issue: an ale_nips_run_watch.py is needed. Training with the NIPS settings and then watching with the Nature test settings simply doesn't work.

Neural net organization and parameters

Hi spragunr,

I am really interested in deep reinforcement learning and am trying to build a similar agent for Breakout on my own (though it's getting really tough to get my agent to learn well). I have gone through your code once and have a fair understanding of it. Could you please spare some time for a few of my questions? Here they are:

  1. What is the organization of layers in your neural net? From my understanding, you have a convolution layer, then another convolution layer, then a fully connected layer with ReLU activation, and finally a fully connected layer with linear activation units. Specifically, I cannot see a ReLU activation after the first and second conv. layers. Is that so, or am I missing something? Also, if you aren't using any activation after conv. layers 1 and 2, could you explain the reasoning for that?
  2. Which is better (for training an agent for Breakout): rmsprop, or rmsprop with Nesterov momentum? Or are both fairly good and give similar results?
  3. Is your learning rate of 0.0002 fixed, or does it decay after some fixed number of steps? If it decays, could you tell me the number of steps after which it decays?
  4. What is decay in your code? I can see that you are using a default value of 0.9 for decay, but I cannot see where it is used.
  5. According to my understanding, the filter units in your neural net's layers are initialized with a normal distribution with mean 0 and standard deviation 0.01 for the weights, and a fixed value of 0.1 for the biases. Is that correct?
  6. Finally, these are some of the parameters for Breakout that you are using:
    rho = 0.9
    discount = 0.9
    epsilon = 10e-6
    Am I correct?

Sorry for so many questions. Looking at some of the work by you and others has really confused me about parameter settings and neural net organization. It would be really helpful and nice of you to answer my questions.
Thanks for your time.
Regards.

learning.csv doesn't contain average loss per epoch

I don't know if this is a real issue, and I don't know of any other appropriate place to ask about it, but it caused some confusion for me. I assumed it was the loss per epoch, since it didn't say explicitly what it was. I'm still not sure whether I'm correct about this, and I want to find out how to plot the mean_loss per epoch.

The first column says it contains mean_loss. If this were per epoch, it should contain 100 values, assuming I run with the default parameters. However, it contains a lot more values. This number seems to depend on the total number of episodes in training: the more episodes, the more mean_loss values.

So I followed the loss all the way down the rabbit hole. Assuming the default parameters, each epoch is at most 50000 steps. It tries to run as many episodes as possible, making each episode run for as many steps as possible. At the start of a new episode we have self.loss_averages = []. The self.loss_averages.append(loss) in def step() in ale_agent.py happens every time step. This continues until we reach 50000 steps or until the agent dies, at which point we take the mean of self.loss_averages and update learning.csv via self._update_learning_file() in def end_episode(). Then, if we have steps left, we start a new episode with self.loss_averages = []. So learning.csv actually contains mean_loss per episode. However, when I sum the number of episodes per epoch over all epochs, this number is not equal to the number of losses, so it can't be the mean_loss per episode either.

So what is mean_loss in the learning.csv and how can I plot mean_loss per epoch?
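
For what it's worth, a sketch of one way to aggregate per-episode losses into per-epoch averages. This assumes (and these are assumptions, not something the code guarantees) that learning.csv has one mean_loss row per episode and that results.csv has a num_episodes column giving the episode count per epoch; since the counts may not line up exactly, as noted above, the sketch truncates to the shorter length:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Column names and file layout here are assumptions; adjust to what your
    # version of the code actually writes.
    learning = pd.read_csv('learning.csv')   # one mean_loss row per episode (assumed)
    results = pd.read_csv('results.csv')     # one row per epoch with num_episodes (assumed)

    # Label each episode row with its epoch index, then average within epochs.
    epoch_of_episode = np.repeat(np.arange(len(results)), results['num_episodes'])
    n = min(len(epoch_of_episode), len(learning))   # the counts may not line up exactly
    per_epoch = learning['mean_loss'][:n].groupby(epoch_of_episode[:n]).mean()

    per_epoch.plot()
    plt.xlabel('epoch')
    plt.ylabel('mean loss')
    plt.show()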

Memory error, sysmalloc: Assertion failed

Hi there,
The previous version of your code using RL-Glue works fine for me, but when I try to get the new version up and running I receive the following error:

> ./run_nips.py --rom breakout
Using gpu device 0: GeForce GTX 770
A.L.E: Arcade Learning Environment (version 0.5.0)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:  ../../../roms/breakout.bin
  Cart Name: Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:  f34f08e5eb96e500e851a80be3277a56
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        2048
  Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: 123
INFO:root:OPENING breakout_07-21-16-20_0p0002_0p95/results.csv
INFO:root:training epoch: 1 steps_left: 50000
python: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) *
2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) 
>= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 
1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & 
pagemask) == 0)' failed.
Aborted (core dumped)

Just wondering if anyone knows the cause of or a fix for this problem. Thanks in advance.

No results.csv or .pkl files?

Hi, for some reason I'm not getting any results.csv or .pkl files. The data folder is created, and ALE and RL-Glue seem to be starting. It's a fresh build with ALE from Git.

I had to change one line in ale_run.py, as ale was not found:

#p2 = subprocess.Popen('ale -game_controller rlglue -frame_skip 4 '+ ROM_PATH, shell=True, env=my_env)
p2 = subprocess.Popen('/home/ajay/ale/ale -game_controller rlglue -frame_skip 4 '+ ROM_PATH, shell=True, env=my_env)

Here's the output

/usr/bin/python2.7 /home/ajay/PythonProjects/deep_q_rl/ale_run.py
RL-Glue Version 3.04, Build 909
A.L.E: Arcade Learning Environment (version 0.4.4)
[Powered by Stella]
Use -help for help screen.
Warning: couldn't load settings file: ./stellarc
Game console created:
  ROM file:  /home/ajay/ale/roms/breakout.bin
  Cart Name: Breakout - Breakaway IV (1978) (Atari)
  Cart MD5:  f34f08e5eb96e500e851a80be3277a56
  Display Format:  AUTO-DETECT ==> NTSC
  ROM Size:        2048
  Bankswitch Type: AUTO-DETECT ==> 2K

Running ROM file...
Random Seed: Time
Game will be controlled through RL-Glue.
RL-Glue Python Experiment Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
Initializing ALE RL-Glue ...
Using gpu device 0: GeForce GTX 570
RL-Glue Python Agent Codec Version: 2.02 (Build 738)
    Connecting to 127.0.0.1 on port 4096...
     Agent Codec Connected
(32, 4, 80, 80)
(4, 80, 80, 32)
(16, 19.0, 19.0, 32)
(32, 9.0, 9.0, 32)
(32, 32, 9.0, 9.0)
(32, 256)
(32, 18)
Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 427, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 423, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 137, in <lambda>
    Network.kAgentInit: lambda self: self.onAgentInit(),
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 43, in onAgentInit
    self.agent.agent_init(taskSpec)
  File "./rl_glue_ale_agent.py", line 155, in agent_init
    self.network = self._init_network()
  File "./rl_glue_ale_agent.py", line 189, in _init_network
    approximator='cuda_conv')
  File "/home/ajay/PythonProjects/deep_q_rl/cnn_q_learner.py", line 168, in __init__
    target = theano.gradient.consider_constant(target)
AttributeError: 'module' object has no attribute 'consider_constant'
training epoch:  1 steps_left:  50000
Segmentation fault (core dumped)
training epoch:  1 steps_left:  49995
training epoch:  1 steps_left:  49993

Just in case it helps, here's my PYTHONPATH from .bashrc

PYTHONPATH="/home/ajay/pylearn2:/home/ajay/pylearn2/pylearn2/scripts:/home/ajay/ale:$PYTHONPATH"
export PYTHONPATH

Thanks a lot 👍 Happy New Year 👍

Default parameters values are defined in multiple source files

Hi

I'm very interested in this project and I'm also working on reinforcement learning on another project using this repository.

But since there are many parameters and they are defined in multiple source files (e.g., default decay values are defined in function headers), it is very hard to track the values.

What is your opinion on organizing the parameters so that we can easily track the values?

Thanks.

UnusedInputError

Hi,

Thanks for reading this post.

Currently, I am trying to create my own network for reinforcement learning. To this end, I have adapted the Q network from
Playing Atari with Deep Reinforcement Learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis
Antonoglou, Daan Wierstra, Martin Riedmiller
and
Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.

When Theano tries to compile the functions for loss and q_vals,

self._train = theano.function([], [loss, q_vals], updates=updates,
                              givens=givens_train)
self._q_vals = theano.function([], q_vals,
                               givens=givens_q_val)

it keeps returning
UnusedInputError: theano.function was asked to create a function computing outputs given certain inputs, but the provided input variable at index 0 is not part of the computational graph needed to compute the outputs: <CudaNdarrayType(float32, 4D)>.
To make this error into a warning, you can pass the parameter on_unused_input='warn' to theano.function. To disable it completely, use on_unused_input='ignore'.

I have been debugging the code many times, but I cannot understand why the inputs (from givens) are not used as part of the computation.

Many thanks in advance for your explanation.

Here is my full source code for the network:

"""
import lasagne
import numpy as np
import theano
import theano.tensor as T
from updates import deepmind_rmsprop
import logging

class DeepQLearner:
"""
Deep Q-learning network using Lasagne.
"""
def __init__(self, width_img,
height_img,
width_loc,
height_loc,
width_his,
height_his,
target_dis_size,
num_actions, num_frames, discount, learning_rate, rho,
rms_epsilon, momentum, clip_delta, freeze_interval,
batch_size, network_type, update_rule,
batch_accumulator, rng, input_scale=8.0):

    self.width_img = width_img
    self.height_img = height_img
    self.width_loc = width_loc
    self.height_loc = height_loc
    self.width_his = width_his
    self.height_his = height_his
    self.target_dis_size = target_dis_size

    self.num_actions = num_actions
    self.num_frames = num_frames
    self.batch_size = batch_size
    self.discount = discount
    self.rho = rho
    self.lr = learning_rate
    self.rms_epsilon = rms_epsilon
    self.momentum = momentum
    self.clip_delta = clip_delta
    self.freeze_interval = freeze_interval
    self.rng = rng

    self.logger = logging.getLogger(__name__)
    if not getattr(self.logger, 'handler_set', None):

        self.logger.setLevel(logging.DEBUG)
        # create a file handler

        handler = logging.FileHandler('toy.log', mode='a')
        handler.setLevel(logging.DEBUG)

        # create a logging format

        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
        handler.setFormatter(formatter)

        # add the handlers to the logger

        self.logger.addHandler(handler)
        self.logger.handler_set = True
    self.logger.info('initialise a Q network.')

    lasagne.random.set_rng(self.rng)

    self.update_counter = 0

    self.l_out = self.build_network(network_type, num_actions, num_frames, batch_size)
    if self.freeze_interval > 0:
        self.next_l_out = self.build_network(network_type, num_actions,
                                             num_frames, batch_size)
        self.reset_q_hat()

    #states = T.tensor4('states')
    #next_states = T.tensor4('next_states')
    imgs = T.tensor4('imgs')
    next_imgs = T.tensor4('next_imgs')
    locs = T.tensor4('locs')
    next_locs = T.tensor4('next_locs')
    hiss = T.tensor4('hiss')
    next_hiss = T.tensor4('next_hiss')

    # Unused, presumably leftover declarations:
    # target_distribution = T.tensor('target_distribution')
    # next_target_distribution = T.tensor('next_target_distribution')

    sds = T.icol('sds')
    next_sds  = T.icol('next_sds')

    rewards = T.col('rewards')
    actions = T.icol('actions')
    terminals = T.icol('terminals')

    # Presumably leftover from the original single-input DeepQLearner
    # (input_height / input_width are not defined in this class):
    # self.states_shared = theano.shared(
    #     np.zeros((batch_size, num_frames, input_height, input_width),
    #              dtype=theano.config.floatX))
    # self.next_states_shared = theano.shared(
    #     np.zeros((batch_size, num_frames, input_height, input_width),
    #              dtype=theano.config.floatX))

    self.imgs_shared = theano.shared(
            np.zeros((batch_size, num_frames, width_img, height_img),
                     dtype=theano.config.floatX))
    self.next_imgs_shared = theano.shared(
            np.zeros((batch_size, num_frames, width_img, height_img),
                     dtype=theano.config.floatX))

    self.locs_shared = theano.shared(
    np.zeros((batch_size, num_frames, width_loc, height_loc),
             dtype=theano.config.floatX))
    self.next_locs_shared = theano.shared(
    np.zeros((batch_size, num_frames, width_loc, height_loc),
             dtype=theano.config.floatX))
    self.hiss_shared = theano.shared(
    np.zeros((batch_size, num_frames, width_his, height_his),
             dtype=theano.config.floatX))
    self.next_hiss_shared = theano.shared(
    np.zeros((batch_size, num_frames, width_his, height_his),
             dtype=theano.config.floatX))
    self.sds_shared = theano.shared(
        np.zeros((batch_size, 1), dtype='int32'),
        broadcastable=(False, True))
    self.next_sds_shared= theano.shared(
        np.zeros((batch_size, 1), dtype='int32'),
        broadcastable=(False, True))

    self.rewards_shared = theano.shared(
        np.zeros((batch_size, 1), dtype=theano.config.floatX),
        broadcastable=(False, True))

    self.actions_shared = theano.shared(
        np.zeros((batch_size, 1), dtype='int32'),
        broadcastable=(False, True))

    self.terminals_shared = theano.shared(
        np.zeros((batch_size, 1), dtype='int32'),
        broadcastable=(False, True))

    # q_vals = lasagne.layers.get_output(self.l_out, states / input_scale)

    # massage/ unpack states into the right form for multi input network


    q_vals = lasagne.layers.get_output(self.l_out, {'l_in':imgs,
              'l_loc1':locs,  'l_his':hiss,
              'l_dis': sds})

    if self.freeze_interval > 0:

        # next_q_vals = lasagne.layers.get_output(self.next_l_out,
        #                                         next_states / input_scale)

        next_q_vals = lasagne.layers.get_output(self.next_l_out,
                                                {'l_in':next_imgs,
                              'l_loc1':next_locs,  'l_his':next_hiss,
                              'l_dis': next_sds})
    else:
        next_q_vals = lasagne.layers.get_output(self.l_out,
                                                {'l_in':next_imgs,
                              'l_loc1':next_locs,  'l_his':next_hiss,
                              'l_dis': next_sds})

        next_q_vals = theano.gradient.disconnected_grad(next_q_vals)

    target = (rewards +
              (T.ones_like(terminals) - terminals) *
              self.discount * T.max(next_q_vals, axis=1, keepdims=True))
    diff = target - q_vals[T.arange(batch_size),
                           actions.reshape((-1,))].reshape((-1, 1))

    if self.clip_delta > 0:
        # If we simply take the squared clipped diff as our loss,
        # then the gradient will be zero whenever the diff exceeds
        # the clip bounds. To avoid this, we extend the loss
        # linearly past the clip point to keep the gradient constant
        # in that regime.
        # 
        # This is equivalent to declaring d loss/d q_vals to be
        # equal to the clipped diff, then backpropagating from
        # there, which is what the DeepMind implementation does.
        quadratic_part = T.minimum(abs(diff), self.clip_delta)
        linear_part = abs(diff) - quadratic_part
        loss = 0.5 * quadratic_part ** 2 + self.clip_delta * linear_part
    else:
        loss = 0.5 * diff ** 2

    if batch_accumulator == 'sum':
        loss = T.sum(loss)
    elif batch_accumulator == 'mean':
        loss = T.mean(loss)
    else:
        raise ValueError("Bad accumulator: {}".format(batch_accumulator))

    params = lasagne.layers.helper.get_all_params(self.l_out)  
    givens_train = {

        # states: self.states_shared,
        # next_states: self.next_states_shared,

        imgs :self.imgs_shared,
        next_imgs :self.next_imgs_shared, 
        locs: self.locs_shared ,
        next_locs :self.next_locs_shared,
        hiss :self.hiss_shared,
        next_hiss :self.next_hiss_shared ,
        sds : self.sds_shared ,
        next_sds : self.next_sds_shared ,
        rewards: self.rewards_shared,
        actions: self.actions_shared,
        terminals: self.terminals_shared
    }
    givens_q_val = {

        # states: self.states_shared,
        # next_states: self.next_states_shared,

        imgs :self.imgs_shared,
        locs: self.locs_shared ,
        hiss :self.hiss_shared,
        sds : self.sds_shared

        # rewards: self.rewards_shared,
        # actions: self.actions_shared,
        # terminals: self.terminals_shared

    }
    if update_rule == 'deepmind_rmsprop':
        updates = deepmind_rmsprop(loss, params, self.lr, self.rho,
                                   self.rms_epsilon)
    elif update_rule == 'rmsprop':
        updates = lasagne.updates.rmsprop(loss, params, self.lr, self.rho,
                                          self.rms_epsilon)
    elif update_rule == 'sgd':
        updates = lasagne.updates.sgd(loss, params, self.lr)
    else:
        raise ValueError("Unrecognized update: {}".format(update_rule))

    if self.momentum > 0:
        updates = lasagne.updates.apply_momentum(updates, None,
                                                 self.momentum)

    self._train = theano.function([], [loss, q_vals], updates=updates,
                                  givens=givens_train)
    self._q_vals = theano.function([], q_vals,
                                   givens=givens_q_val)

def build_network(self, network_type, output_dim, num_frames, batch_size):
    if network_type == "myOwn":
        return self.build_myNetwork(output_dim, num_frames, batch_size)
    else:
        raise ValueError("Unrecognized network: {}".format(network_type))


def build_myNetwork(self, output_dim, num_frames, batch_size):

    from lasagne.layers import dnn
    l_in = lasagne.layers.InputLayer(
        shape=(batch_size, num_frames, self.width_img, self.height_img)
    )

    l_conv1 = dnn.Conv2DDNNLayer(
        l_in,
        num_filters=32,
        filter_size=(8, 8),
        stride=(3, 3),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.HeUniform(),
        b=lasagne.init.Constant(.1)
    )

    l_conv2 = dnn.Conv2DDNNLayer(
        l_conv1,
        num_filters=64,
        filter_size=(4, 4),
        stride=(1, 1),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.HeUniform(),
        b=lasagne.init.Constant(.1)
    )

    l_conv3 = dnn.Conv2DDNNLayer(
        l_conv2,
        num_filters=64,
        filter_size=(3, 3),
        stride=(1, 1),
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.HeUniform(),
        b=lasagne.init.Constant(.1)
    )

    l_loc1 = lasagne.layers.InputLayer(
        shape=(batch_size, num_frames, self.width_loc, self.height_loc)
    )

    n = 64
    l_loc2 = lasagne.layers.DenseLayer(l_loc1, num_units=n)



    #history = np.zeros((batch_size, num_frames, 4, 24*24), dtype=int)


    l_his = lasagne.layers.InputLayer(
        shape=(batch_size, num_frames, self.width_his, self.height_his)
    )

    l_his2 = lasagne.layers.DenseLayer(l_his, num_units=n)


    l_dis =  lasagne.layers.InputLayer(
        shape=(batch_size, num_frames, self.target_dis_size)
    )

    l_dis2 =  lasagne.layers.DenseLayer(l_dis, num_units=n)

    l_conv4 = lasagne.layers.ReshapeLayer(l_conv3, (batch_size, 1, -1))
    l_loc2 = lasagne.layers.ReshapeLayer(l_loc2, (batch_size,1,-1))
    l_his2 = lasagne.layers.ReshapeLayer(l_his2, (batch_size,1,-1))
    l_dis2 = lasagne.layers.ReshapeLayer(l_dis2, (batch_size,1,-1))
    l_merge = lasagne.layers.ElemwiseSumLayer((l_conv4,l_loc2, l_his2, l_dis2 ))

    print (l_conv4.output_shape)
    print l_loc2.output_shape
    print l_his2.output_shape
    print l_dis2.output_shape
    print l_merge.output_shape

    l_hidden1 = lasagne.layers.DenseLayer(
        l_merge,
        num_units=320,
        nonlinearity=lasagne.nonlinearities.rectify,
        W=lasagne.init.HeUniform(),
        b=lasagne.init.Constant(.1)
    )
    #
    l_out = lasagne.layers.DenseLayer(
        l_hidden1,
        num_units=output_dim,
        nonlinearity=None,
        W=lasagne.init.HeUniform(),
        b=lasagne.init.Constant(.1)
    )
    return l_out
def train(self,  imgs ,next_imgs ,  locs,  next_locs ,hiss,
                 next_hiss, sds ,
                 next_sds,
                    actions, rewards, terminals):
    """
    Train one batch.

    Arguments:

    states - b x f x h x w numpy array, where b is batch size,
             f is num frames, h is height and w is width.
    actions - b x 1 numpy array of integers
    rewards - b x 1 numpy array
    next_states - b x f x h x w numpy array
    terminals - b x 1 numpy boolean array (currently ignored)

    Returns: average loss
    """
    self.imgs_shared.set_value(imgs)
    self.next_imgs_shared.set_value(next_imgs)
    self.locs_shared.set_value(locs)
    self.next_locs_shared.set_value(next_locs)
    self.hiss_shared.set_value(hiss)
    self.next_hiss_shared.set_value(next_hiss)
    self.sds_shared.set_value(sds)
    self.next_sds_shared.set_value(next_sds)

    # self.states_shared.set_value(states)
    # self.next_states_shared.set_value(next_states)

    self.actions_shared.set_value(actions)
    self.rewards_shared.set_value(rewards)
    self.terminals_shared.set_value(terminals)
    if (self.freeze_interval > 0 and
        self.update_counter % self.freeze_interval == 0):
        self.reset_q_hat()
    loss, _ = self._train()
    self.update_counter += 1
    return np.sqrt(loss)

def q_vals(self, img , loc,  his, sd):

    # Presumably leftover from the original single-input q_vals:
    # states = np.zeros((self.batch_size, self.num_frames, self.input_height,
    #                    self.input_width), dtype=theano.config.floatX)
    # states[0, ...] = state
    # self.states_shared.set_value(states)

    imgs = np.zeros((self.batch_size, self.num_frames, self.height_img, 
                     self.width_img), dtype=theano.config.floatX)
    imgs[0, ...] = img
    locs = np.zeros((self.batch_size, self.num_frames, self.height_loc, 
                     self.width_loc), dtype=theano.config.floatX)
    locs[0, ...] = loc

    hiss = np.zeros((self.batch_size, self.num_frames, self.height_his, 
                     self.width_his), dtype=theano.config.floatX)

    hiss[0,...] = his

    sds = np.zeros((self.batch_size, self.num_frames, self.target_dis_size),
                    dtype='int32')

    sds[0, ...] = sd

    self.imgs_shared.set_value(imgs)
    self.locs_shared.set_value(locs)
    self.hiss_shared.set_value(hiss)
    self.sds_shared.set_value(sds)

    return self._q_vals()[0]

def choose_action(self, img , loc,  his, sd, epsilon):
    if self.rng.rand() < epsilon:
        return self.rng.randint(0, self.num_actions)
    q_vals = self.q_vals(img , loc,  his, sd)
    return np.argmax(q_vals)

def reset_q_hat(self):
    all_params = lasagne.layers.helper.get_all_param_values(self.l_out)
    lasagne.layers.helper.set_all_param_values(self.next_l_out, all_params)

def main():

    # Leftover call with the original DeepQLearner signature (presumably meant
    # to be commented out):
    # net = DeepQLearner(84, 84, 16, 4, .99, .00025, .95, .95, 10000,
    #                    32, 'nature_cuda')

    width_img = 24
    height_img = 24
    width_loc = 1
    height_loc = 3
    width_his = width_img * height_img
    height_his = 4
    target_dis_size = 1
    num_actions = 9
    num_frames = 1
    discount = 0.99
    learning_rate = .00025
    rho = 0.95
    rms_epsilon = 0.95
    momentum = 0.95
    clip_delta = 1
    freeze_interval = 100
    batch_size = 100
    network_type = 'myOwn'
    update_rule = 'deepmind_rmsprop'
    batch_accumulator = 'sum'
    rng = np.random.RandomState(123456)

    net = DeepQLearner(width_img,
                       height_img,
                       width_loc,
                       height_loc,
                       width_his,
                       height_his,
                       target_dis_size,
                       num_actions, num_frames, discount, learning_rate, rho,
                       rms_epsilon, momentum, clip_delta, freeze_interval,
                       batch_size, network_type, update_rule,
                       batch_accumulator, rng)

if __name__ == '__main__':
    main()

something wrong with referenced Lasagne

The Lasagne.layers.Conv2DCCLayer uses the default nonlinearity (activation), which relies on theano.tensor.nnet.relu, and that does not exist in the referenced Theano version.

The following code is at line 142 of Lasagne's nonlinearities.py (the rectify function):

def rectify(x):
    """Rectify activation function :math:`\\varphi(x) = \\max(0, x)`

    Parameters
    ----------
    x : float32
        The activation (the summed, weighted input of a neuron).

    Returns
    -------
    float32
        The output of the rectify function applied to the activation.
    """
    return theano.tensor.nnet.relu(x)
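
If upgrading Theano isn't an option, one possible workaround (a sketch, assuming the cuda_convnet layer is the one being built, as in this repository) is to pass an explicit rectifier that doesn't rely on theano.tensor.nnet.relu:

    import theano.tensor as T
    import lasagne
    from lasagne.layers.cuda_convnet import Conv2DCCLayer

    def rectify(x):
        # Same as ReLU, but without theano.tensor.nnet.relu
        return T.maximum(x, 0)

    l_in = lasagne.layers.InputLayer(shape=(32, 4, 80, 80))
    l_conv = Conv2DCCLayer(l_in, num_filters=16, filter_size=(8, 8),
                           stride=(4, 4), nonlinearity=rectify)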

Reproducibility

The current implementation does not support reproducibility, which might be useful for research purposes. I have already implemented and tested this (I get 100% reproducible results). If it's of interest, please let me know and I can open a PR.

segfault when pickling network file

Here's the relevant part of the output from ale_run.py:

Initializing ALE RL-Glue ...
Using gpu device 0: GeForce GTX 780
/usr/local/lib/python2.7/dist-packages/theano/tensor/opt.py:2536: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
  if (replace_x == replace_y and
/usr/local/lib/python2.7/dist-packages/theano/gof/cmodule.py:289: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  rval = __import__(module_name, {}, {}, [module_name])
Traceback (most recent call last):
  File "./rl_glue_ale_agent.py", line 427, in <module>
    main()
  File "./rl_glue_ale_agent.py", line 423, in main
    AgentLoader.loadAgent(NeuralAgent())
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/AgentLoader.py", line 58, in loadAgent
    client.runAgentEventLoop()
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 144, in runAgentEventLoop
    switch[agentState](self)
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 142, in <lambda>
    Network.kAgentMessage: lambda self: self.onAgentMessage() }
  File "/usr/local/lib/python2.7/dist-packages/rlglue/agent/ClientAgent.py", line 87, in onAgentMessage
    reply = self.agent.agent_message(message)
  File "./rl_glue_ale_agent.py", line 396, in agent_message
    cPickle.dump(self.network, net_file, -1)
RuntimeError: maximum recursion depth exceeded
Segmentation fault (core dumped)

The FutureWarning isn't a big deal, and the Theano mailing list says the RuntimeWarning isn't an issue either.

The real problem is on the line cPickle.dump(self.network, net_file, -1), when it hits the recursion limit and segfaults. It managed to produce about 15MB of output in network_file_1.pkl before crashing, but when I try to load the pickle it complains that the object is truncated.
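
One workaround that often avoids this kind of failure is raising Python's recursion limit before dumping (another option would be to pickle only the layer parameter values rather than the whole network object). A sketch, reusing the names from the traceback above:

    import sys
    import cPickle

    # Deeply nested Theano/Lasagne graphs can blow past the default recursion
    # limit inside cPickle; raising it before the dump is a common workaround.
    sys.setrecursionlimit(50000)

    # self.network and net_file as in the agent_message code from the traceback.
    cPickle.dump(self.network, net_file, -1)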

setup.py numpy includes

In order to get setup.py to work properly, I needed to add:

import numpy

and

include_dirs=[numpy.get_include()]
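
For context, a minimal sketch of where those two additions typically go (module and file names here are illustrative, not the project's actual ones):

    import numpy
    from distutils.core import setup
    from distutils.extension import Extension

    setup(
        name='example',
        ext_modules=[
            Extension('example_ext',
                      sources=['example_ext.c'],
                      # NumPy headers are needed when the extension uses the NumPy C API
                      include_dirs=[numpy.get_include()]),
        ],
    )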
