
jonathanraiman / theano_lstm


:microscope: Nano size Theano LSTM module

License: Other

Python 57.99% Jupyter Notebook 42.01%
machine-learning recurrent-networks theano python lstm gru adadelta dropout automatic-differentiation neural-network

theano_lstm's Introduction

Small Theano LSTM recurrent network module

@author: Jonathan Raiman @date: December 10th 2014

Implements most of the great things that came out in 2014 concerning recurrent neural networks, and some good optimizers for these types of networks.

Key Features

This module contains several Layer types that are useful for prediction and modeling from sequences:

  • A non-recurrent Layer, with a connection matrix W, and bias b
  • A recurrent RNN Layer that takes as input its previous hidden activation and has an initial hidden activation
  • A recurrent LSTM Layer that takes as input its previous hidden activation and memory cell values, and has initial values for both of those
  • An Embedding layer that contains an embedding matrix and takes integers as input and returns slices from its embedding matrix (e.g. word vectors)
  • A non-recurrent GatedInput, with a connection matrix W and bias b, that multiplies each input by a single scalar (jointly gating multiple inputs)
  • Deals with exploding and vanishing gradients with a subgradient optimizer (Adadelta) and element-wise gradient clipping (à la Alex Graves)

This module also contains the SGD, AdaGrad, and AdaDelta gradient descent methods, which are constructed from an objective function and a set of theano variables and return an updates dictionary to pass to a theano function.
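As a minimal sketch of how these pieces fit together (sizes are placeholders, the Embedding constructor is assumed here to take a vocabulary size and an embedding size, and the cost is left symbolic):

import theano.tensor as T
from theano_lstm import Embedding, LSTM, StackedCells, Layer, create_optimization_updates

vocab_size, embed_size, hidden_size, out_size = 1000, 50, 100, 10   # hypothetical sizes

embedding = Embedding(vocab_size, embed_size)   # maps integer ids to word vectors
model = StackedCells(embed_size, layers=[hidden_size], activation=T.tanh, celltype=LSTM)
model.layers.append(Layer(hidden_size, out_size, lambda x: T.nnet.softmax(x)[0]))

# once a scalar `cost` has been defined over the model's outputs, the optimizer
# updates come from a single call and are handed to theano.function:
# updates, gsums, xsums, lr, max_norm = create_optimization_updates(
#     cost, model.params, method='adadelta')
# (if the embedding should also be trained, include its parameters in the list as well)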

Quick Tutorial

See a short tutorial for sequence forecasting here. Or read on for some usage examples.

Usage

Here is an example of usage with stacked LSTM units, using Adadelta to optimize, and using a scan operation from Theano (a symbolic loop for backpropagation through time).

dropout = 0.0

model = StackedCells(4, layers=[20, 20], activation=T.tanh, celltype=LSTM)
model.layers[0].in_gate2.activation = lambda x: x
model.layers.append(Layer(20, 2, lambda x: T.nnet.softmax(x)[0]))

# in this example dynamics is a random function that takes our
# output along with the current state and produces an observation
# for t + 1

def step(x, *prev_hiddens):
    new_states = model.forward(x, prev_hiddens, dropout)
    return [dynamics(x, new_states[-1])] + new_states[:-1]

initial_obs = T.vector()
timesteps = T.iscalar()

result, updates = theano.scan(step,
                          n_steps=timesteps,
                          outputs_info=[dict(initial=initial_obs, taps=[-1])] + [dict(initial=layer.initial_hidden_state, taps=[-1]) for layer in model.layers if hasattr(layer, 'initial_hidden_state')])

target = T.vector()

cost = (result[0][:,[0,2]] - target[[0,2]]).norm(L=2) / timesteps

updates, gsums, xsums, lr, max_norm = \
	create_optimization_updates(cost, model.params, method='adadelta')

update_fun = theano.function([initial_obs, target, timesteps], cost, updates = updates, allow_input_downcast=True)
predict_fun = theano.function([initial_obs, timesteps], result[0], allow_input_downcast=True)

for example, label in training_set:
	c = update_fun(example, label, 10)

Minibatch usage

Suppose you now have many sequences (of equal length -- we'll generalize this later). Then training can be done in batches:

model = StackedCells(4, layers=[20, 20], activation=T.tanh, celltype=LSTM)
model.layers[0].in_gate2.activation = lambda x: x
model.layers.append(Layer(20, 2, lambda x: T.nnet.softmax(x)[0]))

# in this example dynamics is a function that simulates the behavior of a double
# pendulum and takes our current state and produces an observation
# for t + 1
def dynamics(x, u):
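    # M1, M2, L1, L2, G and dt below are the pendulum masses, rod lengths,
    # gravitational constant and integration time step (assumed to be defined elsewhere)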
    dydx = T.alloc(0.0, 4)
    dydx = T.set_subtensor(dydx[0], x[1])
    del_ = x[2]-x[0]
    den1 = (M1+M2)*L1 - M2*L1*T.cos(del_)*T.cos(del_)
    dydx = T.set_subtensor(dydx[1],
        (  M2*L1      *  x[1] * x[1] * T.sin(del_) * T.cos(del_)
           + M2*G       *  T.sin(x[2]) * T.cos(del_) +
             M2*L2      *  x[3] * x[3] * T.sin(del_)
           - (M1+M2)*G  *  T.sin(x[0]))/den1 )
    dydx = T.set_subtensor(dydx[2], x[3])

    den2 = (L2/L1)*den1
    dydx = T.set_subtensor(dydx[3], (-M2*L2  *   x[3]*x[3]*T.sin(del_) * T.cos(del_)
               + (M1+M2)*G   *   T.sin(x[0])*T.cos(del_)
               - (M1+M2)*L1  *   x[1]*x[1]*T.sin(del_)
               - (M1+M2)*G   *   T.sin(x[2]))/den2  + u )
    return x + dydx * dt

def step(x, *prev_hiddens):
    new_states = model.forward(x, prev_hiddens, dropout)
    return [dynamics(x, new_states[-1])] + new_states[:-1]

# switch to a matrix of observations:
initial_obs = T.imatrix()
timesteps = T.iscalar()

result, updates = theano.scan(step,
                          n_steps=timesteps,
                          outputs_info=[dict(initial=initial_obs, taps=[-1])] + [dict(initial=layer.initial_hidden_state, taps=[-1]) for layer in model.layers if hasattr(layer, 'initial_hidden_state')])

target = T.imatrix()

cost = (result[0][:,:,[0,2]] - target[:,[0,2]]).norm(L=2) / timesteps

updates, gsums, xsums, lr, max_norm = \
	create_optimization_updates(cost, model.params, method='adadelta')

update_fun = theano.function([initial_obs, target, timesteps], cost, updates = updates, allow_input_downcast=True)
predict_fun = theano.function([initial_obs, timesteps], result[0], allow_input_downcast=True)

for minibatch, labels in minibatches:
	c = update_fun(minibatch, labels, 10)

Minibatch usage with different sizes

Generalization to different sequence lengths can be made if we accept the minor cost of forward-propagating parts of the graph we don't care about. To do this, we make all sequences the same length by padding the end of the shorter ones with some symbol. Then we use a binary matrix of the same size as all the minibatch sequences. The matrix has a 1 at positions where the error should be calculated, and a 0 otherwise. Elementwise-multiply this mask with your output, then apply your objective function to the masked output. The error is computed everywhere, but is zero at the masked positions, yielding the correct error function. While some computation is wasted, the gain from parallelization can offset this cost and make the overall computation faster.
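A minimal sketch of this masking idea (tensor names are hypothetical; the predictions would come from a scan like the ones above, and the objective here is a plain squared error for illustration):

import theano
import theano.tensor as T

predictions = T.tensor3()   # (batch, time, output_dim) produced by the scan
targets     = T.tensor3()   # (batch, time, output_dim), padded with zeros
mask        = T.matrix()    # (batch, time): 1 where the error counts, 0 on padding

# broadcast the mask over the output dimension, zero out padded positions,
# and normalize by the number of real (unmasked) entries instead of the padded length
masked_error = ((predictions - targets) ** 2) * mask.dimshuffle(0, 1, 'x')
cost = masked_error.sum() / mask.sum()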

MaskedLoss usage

To use different length sequences, consider the following approach:

  • you have sequences y_1, y_2, ..., y_n, and labels l_1, l_2, ..., l_n.

  • pad all the sequences to the longest sequence y_k, and form a matrix Y of all padded sequences

  • similarly form the labels at each timestep for each padded sequence (with zeros, or some other symbol for labels in padded areas)

  • then record the length of the true labels (codelengths) needed before padding c_1, c_2, ..., c_n, and the length of the sequences before padding l_1, l_2, ..., l_n

  • pass the lengths, targets, and predictions to the masked loss as follows:

      predictions, updates = theano.scan(prediction_step, etc...)
    
      error = masked_loss(
              predictions,
              padded_labels,
              codelengths,
              label_starts).mean()
    

Visually this goes something like this, for the case with three inputs, three outputs, but a single label for the final output:

inputs [ x_1 x_2 x_3 ]

outputs [ p_1 p_2 p_3 ]

labels [ ... ... l_1 ]

then we would have a matrix x with x_1, x_2, x_3, and predictions in the code above would contain p_1, p_2, p_3. We would then pass to masked_loss the codelengths [ 1 ], since there is only "l_1" to predict, and the label_starts [ 2 ], indicating that errors should be computed starting at the third prediction (zero-indexed).
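For that tiny example, the extra arguments are just length-one integer vectors (a hypothetical sketch; predictions and padded_labels are whatever the scan and padding steps produced):

import numpy as np

codelengths  = np.array([1], dtype=np.int32)   # only l_1 has to be predicted
label_starts = np.array([2], dtype=np.int32)   # start scoring at the third (zero-indexed) prediction

# error = masked_loss(predictions, padded_labels, codelengths, label_starts).mean()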

Dropout Usage in Theano Scan

To get dropout to work and be dynamically modifiable without recompiling, let's consider the following usage example.

First we define a variable with the likelihood that a neuron will be dropped (randomly set to 0):

dropout = theano.shared(np.float64(0.3).astype(theano.config.floatX))
deterministic = False # for now

Create some model:

model = theano_lstm.StackedCells(50, layers=[100], celltype=theano_lstm.LSTM, activation=T.tanh)

Now we want to introduce dropout noise between the input and the LSTM. To use dropout outside of a Theano scan loop you could simply multiply elementwise by a binomial random variable (see examples here), but if you plan on using recurrent networks with a Theano scan you need to sample your random numbers outside of the loop.
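Outside of a scan, such a mask can be drawn directly from Theano's random streams; a sketch (where `x` stands for whatever symbolic input you want to corrupt):

import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

srng = RandomStreams(seed=1234)
x = T.matrix()
keep_prob = 0.7   # i.e. 1 - dropout probability

# each call draws a fresh binary mask with the same shape as x;
# multiplying by it randomly zeroes out units
mask = srng.binomial(size=x.shape, p=keep_prob, dtype=theano.config.floatX)
x_dropped = x * mask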

In order to keep track of these dropout activations we'll generate masks. Masks are a list holding the realizations of those binomial draws. We generate this list with MultiDropout, a special function in the theano_lstm module that takes the different hidden layer sizes and returns a list of matrices filled with binomial random variable realizations:

if dropout.get_value() > 0:
    if deterministic:
        # at evaluation time, just multiply by the likelihood of being kept:
        masks = [np.float32(1.) - dropout for i in range(2)]
    else:
        # x is the symbolic input to the network (a matrix when working with minibatches)
        shapes = [50, 100]
        masks = theano_lstm.MultiDropout([(x.shape[0], shape) for shape in shapes] if x.ndim > 1 else shapes,
                                         dropout)
else:
    masks = []

Now our loop forward function is as follows:

def step(obs, hidden_state, *masks):
    new_state = model.forward(obs, [hidden_state], list(masks))
    return new_state[1]

We pass it to Theano's scan:

result, _ = theano.scan(step,
	sequences     = seq,
	non_sequences = masks,
	outputs_info  = [dict(initial=model.layers[0].initial_hidden_state, taps=[-1])]
	)

And we're done.

Note: to not use masks, pass an empty list [] instead.
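Since dropout was declared as a theano.shared variable above, its value can be changed between calls without recompiling anything that refers to it symbolically. A hedged sketch (`seq` is the sequence input used in the scan above and the compiled function is hypothetical):

run = theano.function([seq], result, allow_input_downcast=True)

dropout.set_value(0.0)   # e.g. switch dropout off for evaluation
dropout.set_value(0.3)   # and back on for the next training updates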

theano_lstm's People

Contributors

jonathanraiman, matpalm, mheilman


theano_lstm's Issues

license?

Hi, sorry to bother you, but I was wondering if you might put a license on the code.

(You're probably already aware of the following very useful site, but just in case: http://choosealicense.com/)

Thanks.

Question about fixing node value

Hi,

I was wondering whether there is a way to fix a node's value during training? For example, I want to fix the values of the top two nodes to 0 and 1 when I feed in one category of data, and to 1 and 0 when I feed in another category of data.

Masking operation erroneous

Hi @JonathanRaiman , first off thanks for setting up this repo. Learned a lot from your implementation.

I have a question regarding the masking operation for variable length inputs. You mentioned in docs that

Elementwise mutliply this mask with your output, and then apply your objective function to this masked output. The error will be obtained everywhere, but will be zero in areas that were masked, yielding the correct error function.

But by doing so you only cut off the back-propagation (BP) paths from masked outputs, whereas the BP paths from unmasked outputs via masked hidden units remain. The resulting loss function is still erroneous.

I notice from other LSTM implementations (e.g. Lasagne) that the usual approach is to use the mask to switch between the previous hidden state (if the current input is masked) and the computed hidden state (otherwise). In this way, both types of unwanted BP paths (masked outputs -> masked hidden -> weights, and unmasked outputs -> masked hidden -> weights) are removed.

Please correct me if I'm wrong. Am I missing something?

Issue with running tutorial

Everything goes fine until the very end

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    858         try:
--> 859             outputs = self.fn()
    860         except Exception:

ValueError: softmaxes.shape[0] (100) != y_lengths.shape[0] (0)

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-13-8890488df5a1> in <module>()
      1 # train:
      2 for i in range(10000):
----> 3     error = model.update_fun(numerical_lines, numerical_lengths)
      4     if i % 100 == 0:
      5         print("epoch %(epoch)d, error=%(error).2f" % ({"epoch": i, "error": error}))

/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    869                     node=self.fn.nodes[self.fn.position_of_error],
    870                     thunk=thunk,
--> 871                     storage_map=getattr(self.fn, 'storage_map', None))
    872             else:
    873                 # old-style linkers raise their own exceptions

/usr/local/lib/python3.4/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    312         # extra long error message in that case.
    313         pass
--> 314     reraise(exc_type, exc_value, exc_trace)
    315 
    316 

/usr/local/lib/python3.4/dist-packages/six.py in reraise(tp, value, tb)
    683             value = tp()
    684         if value.__traceback__ is not tb:
--> 685             raise value.with_traceback(tb)
    686         raise value
    687 

/usr/local/lib/python3.4/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    857         t0_fn = time.time()
    858         try:
--> 859             outputs = self.fn()
    860         except Exception:
    861             if hasattr(self.fn, 'position_of_error'):

ValueError: softmaxes.shape[0] (100) != y_lengths.shape[0] (0)
Apply node that caused the error: <theano_lstm.masked_loss.MaskedLoss object at 0x7f374891eb70>(InplaceDimShuffle{2,0,1}.0, Subtensor{::, int64::}.0, Elemwise{add,no_inplace}.0, Alloc.0)
Toposort index: 145
Inputs types: [TensorType(float64, 3D), TensorType(int32, matrix), TensorType(int32, vector), TensorType(int32, vector)]
Inputs shapes: [(100, 55, 51), (100, 55), (0,), (0,)]
Inputs strides: [(8, 40800, 800), (224, 4), (4,), (4,)]
Inputs values: ['not shown', 'not shown', array([], dtype=int32), array([], dtype=int32)]
Outputs clients: [[Sum{acc_dtype=float64}(<theano_lstm.masked_loss.MaskedLoss object at 0x7f374891eb70>.0), Shape_i{0}(<theano_lstm.masked_loss.MaskedLoss object at 0x7f374891eb70>.0)]]

Backtrace when the node is created:
  File "/home/simon/Programs/theano_lstm/theano_lstm/masked_loss.py", line 327, in make_node
    T.Tensor(dtype=softmaxes.dtype, broadcastable=[False])()])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Versions of software below

numpy==1.10.4
scipy==0.17.0
Theano==0.7.0
theano-lstm==0.0.14

Error with tutorial local_argmax_pushdown

Hello,

I got this issue with the following config:
Windows 64-bit with LAPACK available
scipy==0.16.0
Cython==0.22.1
numpy==1.9.2
Theano==0.7.0
theano-lstm==0.0.15

Do you have any idea?

Error Message:

D:\_devs\Python01\WinPython-64-2710\python-2.7.10.amd64\lib\site-packages\theano\tensor\var.py:422: UserWarning: Warning, Cannot compute test value: input 0 (<TensorType(int32, matrix)>) of Op Subtensor{::, int64:int64:}(<TensorType(int32, matrix)>, Constant{0}, Constant{-1}) missing default value
lambda entry: isinstance(entry, Variable)))
D:\_devs\Python01\WinPython-64-2710\python-2.7.10.amd64\lib\site-packages\theano\tensor\var.py:273: UserWarning: Warning, Cannot compute test value: input 0 (Subtensor{::, int64:int64:}.0) of Op Shape(Subtensor{::, int64:int64:}.0) missing default value
shape = property(lambda self: theano.tensor.basic.shape(self))
D:\_devs\Python01\WinPython-64-2710\python-2.7.10.amd64\lib\site-packages\theano\tensor\var.py:422: UserWarning: Warning, Cannot compute test value: input 0 (Shape.0) of Op Subtensor{int64}(Shape.0, Constant{0}) missing default value
lambda entry: isinstance(entry, Variable)))
D:\_devs\Python01\WinPython-64-2710\python-2.7.10.amd64\lib\site-packages\theano\scan_module\scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
from scan_perform.scan_perform import *
ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR:theano.gof.opt:Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR:theano.gof.opt:TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):

Question in Example Code

I am new to theano, so maybe my question does not make any sense.
In the example here, in the create_prediction function,

if greedy:
    outputs_info = [dict(initial=self.priming_word, taps=[-1])] + [initial_state_with_taps(layer) for layer in self.model.layers[1:-1]]
    result, _ = theano.scan(fn=step,
                            n_steps=200,
                            outputs_info=outputs_info)
else:
    outputs_info = [initial_state_with_taps(layer, num_examples) for layer in self.model.layers[1:]]
    result, _ = theano.scan(fn=step,
                            sequences=[inputs.T],  # slice over each time step, so the step fn. gets a sentence
                            outputs_info=outputs_info)

Why did you choose n_steps to be 200 for greedy? From what I understand it should be the same as the length of the longest sentence (because you padded things in the input matrix).

Simple example

Hello!

Can you please provide a simple example of usage?

Thanks!

Speed Benchmark

Thanks for making this code available. LSTMs are so hot right now.

I'm wondering if you know how your code compares to the rnnlm package in terms of speed (using yours with or without a GPU)? I've found rnnlm impossibly slow if you don't want to do their class-based lm trick.

Optimization failure in Tutorial

Hallo Jonathan,

I was excited to test your library. After installing, I wanted to test the lib by running the Tutorial code. However, I get several

ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown

from my bleeding-edge Theano version, and strangely, after some compilation the error rises rapidly. The complete output follows:

/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/scan_module/scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/opt.py", line 1488, in process_node
    replacements = lopt.transform(node)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/nnet/nnet.py", line 1471, in local_argmax_pushdown
    return tensor._max_and_argmax(pre_x, axis)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/opt.py", line 1488, in process_node
    replacements = lopt.transform(node)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/nnet/nnet.py", line 1477, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/opt.py", line 1488, in process_node
    replacements = lopt.transform(node)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/nnet/nnet.py", line 1477, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/gof/op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/home/karen/.local/lib/python2.7/site-packages/Theano-0.7.0-py2.7.egg/theano/tensor/basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

epoch 0, error=4619.00
the catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult catapult
epoch 100, error=392563.62
epoch 200, error=660598.00
epoch 300, error=914458.50
epoch 400, error=1159042.62
epoch 500, error=1393995.12
the calendar can , the lantern stole .
epoch 600, error=1621452.25
epoch 700, error=1842611.12
epoch 800, error=2058382.50
epoch 900, error=2269960.25
epoch 1000, error=2478081.75
the calendar can , a paper duck , the wrangler can .

Could you provide the exact lib versions you are using
(i.e. theano, numpy, scipy, blas, etc.) so that I can track down what might be going wrong for me. Thank you very much for your help.

Karen

Examples are probably outdated

Hi, the first example contains this line:
new_states = stacked_rnn.forward(x, prev_hiddens, dropout)

But there is no mention of a stacked_rnn variable anywhere.
Also there is no mention of dynamics from the line:
return [dynamics(x, new_states[-1])] + new_states[:-1]

Problem running the example code in 32-bit OS

First and foremost, THANK YOU for this AWESOME LSTM lib!!
I am new to all of these, RNN, Theano, and even Python, so please forgive my ignorant question:

I downloaded the Tutorial.ipynb and ran it line by line; however, it got stuck at:

# construct model & theano functions:
model = Model(
    input_size=10,
    hidden_size=10,
    vocab_size=len(vocab),
    stack_size=1, # make this bigger, but makes compilation slow
    celltype=RNN # use RNN or LSTM
)
model.stop_on(vocab.word2index["."])

where the error message is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-13e620ce56bd> in <module>()
      5     vocab_size=len(vocab),
      6     stack_size=1, # make this bigger, but makes compilation slow
----> 7     celltype=RNN # use RNN or LSTM
      8 )
      9 model.stop_on(vocab.word2index["."])

<ipython-input-8-398b63091e52> in __init__(self, hidden_size, input_size, vocab_size, stack_size, celltype)
     62         self.srng = T.shared_randomstreams.RandomStreams(np.random.randint(0, 1024))
     63         # create symbolic variables for prediction:
---> 64         self.predictions = self.create_prediction()
     65         # create symbolic variable for greedy search:
     66         self.greedy_predictions = self.create_prediction(greedy=True)

<ipython-input-8-398b63091e52> in create_prediction(self, greedy)
    107                                 outputs_info=outputs_info)
    108         else:
--> 109             outputs_info = [initial_state_with_taps(layer, num_examples) for layer in self.model.layers[1:]]
    110             result, _ = theano.scan(fn=step,
    111                                 sequences=[inputs.T],

<ipython-input-8-398b63091e52> in initial_state_with_taps(layer, dimensions)
     34 def initial_state_with_taps(layer, dimensions = None):
     35     """Optionally wrap tensor variable into a dict with taps=[-1]"""
---> 36     state = initial_state(layer, dimensions)
     37     if state is not None:
     38         return dict(initial=state, taps=[-1])

<ipython-input-8-398b63091e52> in initial_state(layer, dimensions)
     30         return layer.initial_hidden_state if has_hidden(layer) else None
     31     else:
---> 32         return matrixify(layer.initial_hidden_state, dimensions) if has_hidden(layer) else None
     33 
     34 def initial_state_with_taps(layer, dimensions = None):

<ipython-input-8-398b63091e52> in matrixify(vector, n)
     18 
     19 def matrixify(vector, n):
---> 20     return T.repeat(T.shape_padleft(vector), n, axis=0)
     21 
     22 def initial_state(layer, dimensions = None):

C:\Anaconda\lib\site-packages\theano\tensor\extra_ops.pyc in repeat(x, repeats, axis)
    358     .. versionadded:: 0.6
    359     """
--> 360     return RepeatOp(axis=axis)(x, repeats)
    361 
    362 

C:\Anaconda\lib\site-packages\theano\gof\op.pyc in __call__(self, *inputs, **kwargs)
    397         """
    398         return_list = kwargs.pop('return_list', False)
--> 399         node = self.make_node(*inputs, **kwargs)
    400         if self.add_stack_trace_on_call:
    401             self.add_tag_trace(node)

C:\Anaconda\lib\site-packages\theano\tensor\extra_ops.pyc in make_node(self, x, repeats)
    257                     ("dtypes %s are not supported by numpy.repeat "
    258                      "for the 'repeats' parameter, "
--> 259                      % numpy_unsupported_dtypes), repeats.dtype)
    260 
    261         if self.axis is None:

TypeError: not all arguments converted during string formatting

EDIT 1:
In the beginning, I thought maybe my Theano 0.6.0 was a bit old, so I updated it to 0.7.0, but the error remains.

EDIT 2:
I fixed the syntax typo and got the real error message:

TypeError: ("dtypes ('uint32', 'int64', 'uint64') are not supported by numpy.repeat for the 'repeats' parameter, ", 'int64')

And then I noticed:

        ptr_bitwidth = theano.gof.local_bitwidth()
        if ptr_bitwidth == 64:
            numpy_unsupported_dtypes = ('uint64',)
        if ptr_bitwidth == 32:
            numpy_unsupported_dtypes = ('uint32', 'int64', 'uint64') 

It is because I use a 32-bit OS, where the default int64 type is somehow not supported,
so you guys with a 64-bit OS shouldn't have experienced this problem.

EDIT 3:
So I tried to force cast the type to get over this error:

num_examples = T.cast(num_examples, 'int32')

Then I just got blown over by a huge load of errors and warnings

WARNING (theano.gof.compilelock): Overriding existing lock by dead process '4952' (I am process '4684')
WARNING:theano.gof.compilelock:Overriding existing lock by dead process '4952' (I am process '4684')
C:\Anaconda\lib\site-packages\theano\scan_module\scan_perform_ext.py:133: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility
  from scan_perform.scan_perform import *
ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR:theano.gof.opt:Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR:theano.gof.opt:TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1448, in local_argmax_pushdown
    return tensor._max_and_argmax(pre_x, axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR:theano.gof.opt:Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1448, in local_argmax_pushdown
    return tensor._max_and_argmax(pre_x, axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR:theano.gof.opt:Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR:theano.gof.opt:TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1454, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR:theano.gof.opt:Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1454, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR (theano.gof.opt): Optimization failure due to: local_argmax_pushdown
ERROR:theano.gof.opt:Optimization failure due to: local_argmax_pushdown
ERROR (theano.gof.opt): TRACEBACK:
ERROR:theano.gof.opt:TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1454, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

ERROR:theano.gof.opt:Traceback (most recent call last):
  File "C:\Anaconda\lib\site-packages\theano\gof\opt.py", line 1493, in process_node
    replacements = lopt.transform(node)
  File "C:\Anaconda\lib\site-packages\theano\tensor\nnet\nnet.py", line 1454, in local_argmax_pushdown
    ('x', 0))(pre_bias), axis)
  File "C:\Anaconda\lib\site-packages\theano\gof\op.py", line 507, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "C:\Anaconda\lib\site-packages\theano\tensor\basic.py", line 1252, in make_node
    raise TypeError("MaxAndArgmax needs a constant axis")
TypeError: MaxAndArgmax needs a constant axis

EDIT 4:
Just found out the new errors are similar to issue 12, which mentioned that the error has something to do with scipy. So I updated it:
scipy: 0.15.1-np19py27_0 --> 0.16.0-np19py27_0
Ran the example code again, and the error remains. Still stuck. Frustrating.

These bugs are really newbie-unfriendly. Last time I used pandas and got stuck with some pytables bugs, I spent way too much time trying to develop workarounds. I don't want to spend time fighting the tools; I just want to get them up and running FAST so I can do my actual work.

Bidirectional RNN

Hi @JonathanRaiman
Your RNN package looks clean and easily understandable. I am gonna test it out, but I have a question: do you intend to implement a bidirectional RNN?

Sequences of vectors

Hello,

I have some sequences of vectors of the same length. I would like to train an LSTM on a training sample of these sequences and then have the network predict a new sequence of vectors of the same length as the input, based on several priming vectors.

I cannot find an easily applicable example anywhere on the internet, and I thought your code could be suitable. Could you please edit your word example or add a new one that does what I'm looking for?

Thank you

LSTM model equations

The code says it implements the version of the LSTM from Graves et al. (2013), which I assume is this http://www.cs.toronto.edu/~graves/icassp_2013.pdf or http://www.cs.toronto.edu/~graves/asru_2013.pdf. However, it looks like the LSTM equations in those papers have both the output layer values and memory cell values from the previous time step as input to the gates.

E.g., in equation 3 of http://www.cs.toronto.edu/~graves/icassp_2013.pdf:

i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)

However, it looks like the code is doing the following:

i_t = σ(W_xi x_t + W_hi h_{t-1} + b_i)

Am I missing something here? Is there another LSTM paper this is based on?

I doubt there's much of a practical difference between these two formulations, but it would be good if the documentation were accurate. Sorry if I'm misunderstanding something here (also sorry for the messy equations above).
