element-research / rnn

Recurrent Neural Network library for Torch7's nn

License: BSD 3-Clause "New" or "Revised" License

CMake 0.14% Lua 99.86%

rnn's Issues

Adding minimalistic examples

This is not a bug report, but it would be useful to have a few minimal working toy examples showing usage with and without Sequencer.

Adding Examples for Different RNN Structures

Hi,

I wanted to suggest adding examples for different architectures using Recurrent structures.

For example:

Image Link.

I've seen that you've added some examples in some of the issues (#51, #21).
Maybe adding them explicitly in the examples folder can reduce confusion.

Siavash

How to use MaskZero with LSTM and nn.ClassNLLCriterion for variable length sequences

Hi Guys,
I tried to use LSTM to deal with variable-length sequences, but I could not get it to work with MaskZero. Could you please help me out? Thanks a lot!!

Here is a minimal code example of what I mean:

require 'rnn'
require 'optim'

inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses=5
numSeq = 30

x, y1 = {}, {}

for i = 1, numSeq do
   local seqLength = torch.random(1, seqLengthMax)
   local temp = torch.zeros(seqLengthMax, inSize)
   local targets
   if seqLength == seqLengthMax then
      targets = (torch.rand(seqLength) * numTargetClasses):ceil()
   else
      -- left-pad the targets with zeros so they line up with the zero-padded inputs
      targets = torch.cat(torch.zeros(seqLengthMax - seqLength), (torch.rand(seqLength) * numTargetClasses):ceil())
   end
   -- left-pad the inputs: only the last seqLength rows are non-zero
   temp[{{seqLengthMax - seqLength + 1, seqLengthMax}}] = torch.randn(seqLength, inSize)
   table.insert(x, temp)
   table.insert(y1, targets)
end

model = nn.Sequencer(
   nn.Sequential()
      :add(nn.MaskZero(nn.FastLSTM(inSize,hiddenSize),1))
      :add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses),1))
      :add(nn.MaskZero(nn.LogSoftMax(),1))
)

criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(),1))

output = model:forward(x)
print(output[1])

err = criterion:forward(output, y1)
print(err)

return undeclared variable

Not sure if it matters, but in LSTM.lua, lines 279 and 300 return a global variable gradInput, which I guess is nil since it is undeclared.
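For readers unfamiliar with this class of bug, the sketch below shows how returning an undeclared name behaves in plain Lua. The `Module` table and its two functions are hypothetical and are not the code from LSTM.lua; they only illustrate why such a return value would be nil.

```lua
-- Hypothetical module table for illustration; not the actual LSTM.lua code.
local Module = {}

function Module.buggy(input)
   local gradinput = input * 2   -- result stored in a local with a different spelling
   return gradInput              -- undeclared name: looked up as a global, which is nil here
end

function Module.fixed(input)
   local gradInput = input * 2   -- declared local, so the return refers to this value
   return gradInput
end

print(Module.buggy(3))  -- nil
print(Module.fixed(3))  -- 6
```

Lua raises no error for the read of an undeclared global, which is why this kind of slip can go unnoticed until a caller uses the nil result.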

Recurrent/Modules With Table Output

Hi,

I just noticed that Recurrent apparently can't be used with modules whose outputs are tables rather than tensors. The error comes in AbstractRecurrent.lua:48 where 'new' can't be called on a table. Was this intentional?

Thanks,
Shawn

Unit test fails.

The current repository (fdc3b21) fails its unit tests.

$ th -lrnn -e "dofile('test/test.lua'); rnn.test()"
Running 18 tests
________*_____**__  ==> Done Completed 1876 asserts in 18 tests with 22 errors
--------------------------------------------------------------------------------
Recurrence
Recurrence fwd err 2
 TensorEQ(==) violation   val=0.1917265403829, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:2134: in function <test/test.lua:2105>

--------------------------------------------------------------------------------
Recurrence
Recurrence fwd err 3
 TensorEQ(==) violation   val=0.18550556665035, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:2134: in function <test/test.lua:2105>

--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 1
 TensorEQ(==) violation   val=0.19031328301695, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:2141: in function <test/test.lua:2105>

--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 2
 TensorEQ(==) violation   val=0.12799707568363, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:2141: in function <test/test.lua:2105>

--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 3
 TensorEQ(==) violation   val=0.16196581750727, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:2141: in function <test/test.lua:2105>

--------------------------------------------------------------------------------
Recurrence
 Function call failed
...rs/Calvin/torch/install/share/lua/5.1/rnn/Recurrence.lua:167: expecting at least one updateOutput
stack traceback:
    [C]: in function 'assert'
    ...rs/Calvin/torch/install/share/lua/5.1/rnn/Recurrence.lua:167: in function 'updateGradInputThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
    test/test.lua:2144: in function <test/test.lua:2105>
    [C]: in function 'xpcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    test/test.lua:2399: in function 'test'
    [string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
    [C]: in function 'pcall'
    ...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
    [C]: at 0x0101c582f0

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 1
 TensorEQ(==) violation   val=0.0055855133430601, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1955: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 2
 TensorEQ(==) violation   val=0.20695973077736, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1954: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 2
 TensorEQ(==) violation   val=0.063109996521219, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1955: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 3
 TensorEQ(==) violation   val=0.18835985089823, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1954: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 3
 TensorEQ(==) violation   val=0.087992354034426, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1955: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 4
 TensorEQ(==) violation   val=0.20735178234061, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1954: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 4
 TensorEQ(==) violation   val=0.062143137089905, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1955: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 5
 TensorEQ(==) violation   val=0.21002205166161, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1954: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 5
 TensorEQ(==) violation   val=0.072276017437007, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1955: in function <test/test.lua:1905>

--------------------------------------------------------------------------------
Recursor
 Function call failed
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
    [C]: in function 'assert'
    /Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
    test/test.lua:1958: in function <test/test.lua:1905>
    [C]: in function 'xpcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    test/test.lua:2399: in function 'test'
    [string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
    [C]: in function 'pcall'
    ...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
    [C]: at 0x0101c582f0

--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
 TensorEQ(==) violation   val=0.17625401664154, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1136: in function <test/test.lua:1078>

--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
 TensorEQ(==) violation   val=0.18676221856625, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1136: in function <test/test.lua:1078>

--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
 TensorEQ(==) violation   val=0.18772510482446, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1136: in function <test/test.lua:1078>

--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
 TensorEQ(==) violation   val=0.18784248574533, condition=1e-07
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1136: in function <test/test.lua:1078>

--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) gradInput err
 TensorEQ(==) violation   val=0.073757910583675, condition=1e-06
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
    test/test.lua:1138: in function <test/test.lua:1078>

--------------------------------------------------------------------------------
Repeater
 Function call failed
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
    [C]: in function 'assert'
    /Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
    ...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
    /Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
    test/test.lua:1141: in function <test/test.lua:1078>
    [C]: in function 'xpcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
    /Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
    test/test.lua:2399: in function 'test'
    [string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
    [C]: in function 'pcall'
    ...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
    [C]: at 0x0101c582f0

--------------------------------------------------------------------------------

Multiple batches LSTM

Hi,
Does the LSTM support batched input directly (for performance benchmarking)? I tried to implement this. It didn't raise an error, but the results seem to be inconsistent: if I put identical sequences in one batch, I get different outputs. (I am also confused because the LSTM code file states that the expected input is either 1D or 2D, yet the Penn Treebank example uses batches. Here I want to time forward and backward separately, though.)
BTW: Thanks for making this available to the public!

Here is a minimal code example of what I mean (the MaskZero part can also be removed):

require "rnn"
require "cunn"

torch.manualSeed(123)

batch_size= 2
maxLen = 4
wordVec = 5
nWords = 100
mode = 'CPU'

-- create random data, with zeros used as the empty indicator
inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords)
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s

-- not all sequences have the same length; pad the start with 0 placeholders
for i=1, batch_size do
    local n_zeros = torch.random(maxLen-2)
    inp1[{{i},{1, n_zeros}}] = torch.zeros(n_zeros)
end

-- make the second sequence the same as the first
inp1[{{2},{}}] = inp1[{{1},{}}]:clone()


lstm = nn.Sequential()
lstm:add(nn.LookupTableMaskZero(10000, wordVec, batch_size))  -- convert indices to word vectors
lstm:add(nn.SplitTable(1))  -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.MaskZero(nn.LSTM(wordVec, wordVec), 1))) -- Seq to Seq', 0-Seq to 0-Seq

if mode == 'GPU' then
    lstm:cuda()
    -- no criterion is defined in this snippet, so only the model and data are moved
    labels = labels:cuda()
    inp1 = inp1:cuda()
end

out = lstm:forward(inp1)

print('input 1', inp1[1])
print('lstm out 1', out[1])  


print('input 2', inp1[2])  -- should be the same as above
print('lstm out 2', out[2])  -- should be the same as above

output:

input 1   0
  0
 29
 43
[torch.DoubleTensor of size 4]

lstm out 1   0.0000  0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000  0.0000
-0.0226  0.0012  0.1373  0.0064  0.0766
 0.1174  0.1793  0.0684  0.0029  0.0138
[torch.DoubleTensor of size 4x5]

input 2   0
  0
 29
 43
[torch.DoubleTensor of size 4]

lstm out 2   0.0000  0.0000  0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  0.0000  0.0000
-0.0325  0.0143  0.2019  0.0113  0.1202
 0.1606  0.2348  0.1093  0.0045  0.0208
[torch.DoubleTensor of size 4x5]
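
One possible explanation, offered as an assumption rather than a diagnosis confirmed in this thread: `inp1` is laid out as batch x time, so after the lookup table the tensor is batch x time x features, and `nn.SplitTable(1)` splits it along the batch dimension. The Sequencer would then treat each batch row as a time step, and the second row would be processed with the hidden state left by the first, which would give different values for identical inputs. The sketch below only compares the shapes produced by splitting along dimension 1 and along dimension 2; the variable names are illustrative.

```lua
require 'nn'

local batchSize, maxLen, wordVec = 2, 4, 5
-- stand-in for the lookup table output: batch x time x features
local embedded = torch.randn(batchSize, maxLen, wordVec)

-- splitting along dimension 1 yields one entry per batch row
local byBatch = nn.SplitTable(1):forward(embedded)
-- splitting along dimension 2 yields one entry per time step
local byTime = nn.SplitTable(2):forward(embedded)

print(#byBatch, byBatch[1]:size(1), byBatch[1]:size(2))  -- 2 entries, each maxLen x wordVec
print(#byTime, byTime[1]:size(1), byTime[1]:size(2))     -- 4 entries, each batchSize x wordVec
```

If this assumption holds, the printed `out[1]` and `out[2]` in the report would be two consecutive time steps of one length-2 sequence rather than two independent sequences.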

LSTM Example

I'm working on a simple LSTM example, and I have a problem with model validation.
My idea is to train using mini-batches and validate at each epoch. Validation is done example by example rather than in batches, so that I can use the same code base for prediction.

However, I got this error:

th lstm_early_stop.lua
error for iteration 100 is 0.11727129280201 
/Users/fabiofumarola/torch/install/bin/luajit: ...biofumarola/torch/install/share/lua/5.1/nn/CAddTable.lua:12: inconsistent tensor size at /Users/fabiofumarola/torch/pkg/torch/lib/TH/generic/THTensorMath.c:456
stack traceback:
    [C]: in function 'add'
    ...biofumarola/torch/install/share/lua/5.1/nn/CAddTable.lua:12: in function 'updateOutput'
    ...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    ...ofumarola/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
    ...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
    ...s/fabiofumarola/torch/install/share/lua/5.1/rnn/LSTM.lua:162: in function 'updateOutput'
    ...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
    lstm_early_stop.lua:85: in function 'validate'
    lstm_early_stop.lua:106: in main chunk
    [C]: in function 'dofile'
    ...rola/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010e1c1180

I suppose I am missing something about how the LSTM's internal states are initialised. Can someone help me with this?

require 'rnn'
require 'optim'

batchSize = 50
rho = 10
hiddenSize = 64
inputSize = 4
outputSize = 1

seriesSize = 10000
seriesEval = 1000

model = nn.Sequential()
model:add(nn.FastLSTM(inputSize, hiddenSize, rho))
--model:add(nn.Linear(inputSize, hiddenSize))
--model:add(nn.Tanh())
model:add(nn.Linear(hiddenSize, outputSize))

criterion = nn.MSECriterion()


-- dummy dataset (task predict the next item)
dataset = torch.randn(seriesSize, inputSize)
evalset = torch.randn(seriesEval, inputSize)

-- define the index of the batch elements
offsets = {}
for i= 1, batchSize do
   table.insert(offsets, math.ceil(math.random() * batchSize))
end
offsets = torch.LongTensor(offsets)

-- method to compute a batch
function nextBatch()
    --get a batch of inputs
    local inputs = dataset:index(1, offsets)
    -- shift the batch indexes by one
    offsets:add(1)
    for j=1,batchSize do
        if offsets[j] > seriesSize then
            offsets[j] = 1
        end
    end
    -- a batch of targets
    local targets = dataset[{{},{outputSize}}]:index(1,offsets)
    return inputs, targets
end

-- get weights and loss wrt weights from the model
x, dl_dx = model:getParameters()

-- In the following code, we define a closure, feval, which computes
-- the value of the loss function at a given point x_new, and the gradient of
-- that function with respect to the weights. x is the vector of trainable weights;
-- each call draws a mini-batch via the nextBatch function
feval = function(x_new)
    -- copy the weights if they have changed
    if x ~= x_new then
        x:copy(x_new)
    end

    -- select a training batch
    local inputs, targets = nextBatch()

    -- reset gradients (gradients are always accumulated, to accommodate
    -- batch methods)
    dl_dx:zero()

    -- evaluate the loss function and its derivative wrt x, given mini batch
    local prediction = model:forward(inputs)
    local loss_x = criterion:forward(prediction, targets)
    model:backward(inputs, criterion:backward(prediction, targets))

    return loss_x, dl_dx
end

-- function for validation
validate = function(data)
    local maxPosition = data:size()[1] - 1
    local cumulatedError = 0
    for i = 1, maxPosition do
        local x = data[i]
        local y = torch.DoubleTensor{data[i+1][4]}
        local prediction = model:forward(x)
        local err = criterion:forward(prediction, y)
        cumulatedError = cumulatedError + err
    end
    return cumulatedError / maxPosition
end

sgd_params = {
   learningRate = 0.1,
   learningRateDecay = 1e-4,
   weightDecay = 0,
   momentum = 0
}

lr = 0.1
for i = 1, 10e3 do
    -- train a mini_batch of batchSize in parallel
    _, fs = optim.sgd(feval,x, sgd_params)

    if sgd_params.evalCounter % 100 == 0 then
        print('error for iteration ' .. sgd_params.evalCounter  .. ' is ' .. fs[1] / rho)
        local validationError = validate(evalset)
        print('error on validation ' .. validationError)
    end
end
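
A possible cause, stated as an assumption since the thread does not confirm it: the recurrent module still holds the hidden state and cell from the last training batch (batch size 50) when validation feeds it a single example, so the stored state and the new input have different sizes. Calling `forget()` before switching batch size resets the stored state. A minimal sketch with illustrative sizes:

```lua
require 'rnn'

local inputSize, hiddenSize = 4, 8
local lstm = nn.FastLSTM(inputSize, hiddenSize)

-- one step with a mini-batch of 50 examples
local batchOut = lstm:forward(torch.randn(50, inputSize))

-- reset the stored hidden state and cell before changing the batch size
lstm:forget()

-- one step with a single example, kept 2D (1 x inputSize)
local singleOut = lstm:forward(torch.randn(1, inputSize))
print(singleOut:size(1), singleOut:size(2))
```

In the script above this would mean calling `forget()` on the FastLSTM at the start of `validate` and again before training resumes, assuming the diagnosis is right.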

Numerical gradient check fails?

So this is my first time toying around with Torch modules and the like, so there's a big chance I'm overlooking something obvious. I was trying to implement an attention model, but when testing the gradients using optim.checkgrad they didn't match. I later realised that even for this simple model, I can't get them to match:

nn = require 'nn'
require 'rnn'
require 'optim'

hiddenSize = 2
nIndex = 2
r = nn.Recurrent(hiddenSize, nn.LookupTable(nIndex, hiddenSize),
                 nn.Linear(hiddenSize, hiddenSize))

rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())

criterion = nn.ClassNLLCriterion()

function f(x)
  parameters:copy(x)
  -- Do the forward prop
  rnn:zeroGradParameters()
  local err = 0
  for i = 1, sequence:size(1) - 1 do
     local output = rnn:forward(sequence[i])
     err = err + criterion:forward(output, sequence[i + 1])
     local gradOutput = criterion:backward(output, sequence[i + 1])
     rnn:backward(sequence[i], gradOutput)
  end
  r:backwardThroughTime()
  r:forget()
  return err, grads
end

parameters, grads = rnn:getParameters()

sequence = torch.Tensor{1, 2, 1, 2}:resize(4, 1)
local err = optim.checkgrad(f, parameters:clone())
print(err)

This gives me errors anywhere between 0.1 and 0.01, which is way too big. After some digging I got to these lines in Recurrent.lua. Removing them seems to fix the problem: the gradient error falls to around 1e-7.

         -- startModule's gradParams shouldn't be step-averaged
         -- as it is used only once. So un-step-average it
         local params, gradParams = self.startModule:parameters()
         if gradParams then
            for i,gradParam in ipairs(gradParams) do
               gradParam:mul(rho)
            end
         end

I fail to see where gradParams get averaged, so I don't really understand the logic behind these lines. They seem to just scale the gradients for the initial hidden states with the number of steps?
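
To make the effect of such a rescaling concrete, here is a library-free sketch of a finite-difference gradient check on a toy objective. It does not reproduce Recurrent.lua; the objective, the helper names, and the choice `rho = 4` are all assumptions made for illustration. It only shows that multiplying an otherwise correct gradient by `rho` makes a relative error of the form ||a - b|| / ||a + b|| large.

```lua
-- Toy objective: f(w) = sum_i w_i^2, whose exact gradient is 2 * w.
local function f(w)
   local s = 0
   for i = 1, #w do s = s + w[i] * w[i] end
   return s
end

-- Analytic gradient, multiplied by a scale (scale ~= 1 is deliberately wrong).
local function analyticGrad(w, scale)
   local g = {}
   for i = 1, #w do g[i] = 2 * w[i] * scale end
   return g
end

-- Central finite differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2 * eps).
local function numericalGrad(fn, w, eps)
   local g = {}
   for i = 1, #w do
      local orig = w[i]
      w[i] = orig + eps
      local fPlus = fn(w)
      w[i] = orig - eps
      local fMinus = fn(w)
      w[i] = orig                 -- restore the parameter
      g[i] = (fPlus - fMinus) / (2 * eps)
   end
   return g
end

-- Relative error ||a - b|| / ||a + b||.
local function relativeError(a, b)
   local num, den = 0, 0
   for i = 1, #a do
      num = num + (a[i] - b[i]) ^ 2
      den = den + (a[i] + b[i]) ^ 2
   end
   return math.sqrt(num) / math.sqrt(den)
end

local w = {0.5, -1.5, 2.0}
local rho = 4
local numeric = numericalGrad(f, w, 1e-6)
local errExact = relativeError(analyticGrad(w, 1), numeric)
local errScaled = relativeError(analyticGrad(w, rho), numeric)
print(errExact)   -- tiny: rounding error only
print(errScaled)  -- about (rho - 1) / (rho + 1) = 0.6
```

In a real model only some parameters would be rescaled, so the overall error would be smaller than in this toy case while still being far above rounding error.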

Is there any google group + docs for this repository?

Prefer if examples dont need dp

I reckon the examples would be easier to use if one didn't need to learn dp to understand them fully.

Currently one has to learn both rnn and dp together to understand the examples. I reckon the learning curve would be gentler if the examples in rnn assumed only knowledge of rnn and the core Torch libraries, such as nn.

Implementing many-to-one RNN

Hi,
Thanks for the great RNN package! I am trying to implement a many-to-one RNN (an LSTM, specifically), where each sequence of inputs produces only a single output, and I am finding it difficult to use the rnn package for this case. This is useful, for example, in sentiment analysis, where each review (a sequence of words) is mapped to a sentiment (positive or negative).

Any help regarding this would be appreciated. I am not sure if I should have raised an issue for this since this is a personal question but hoping others would benefit too.

Thanks a lot for your help!
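
A minimal sketch of one way to build such a model, using the Sequencer followed by `nn.SelectTable(-1)` pattern that appears elsewhere in these issues. All sizes and the random data are illustrative assumptions, and behaviour may differ between versions of the package.

```lua
require 'rnn'

local inputSize, hiddenSize, nClasses = 5, 8, 2
local seqLen, batchSize = 6, 3

-- many-to-one: run the LSTM over every step, keep only the last output
local model = nn.Sequential()
   :add(nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize)))
   :add(nn.SelectTable(-1))                 -- last time step only
   :add(nn.Linear(hiddenSize, nClasses))
   :add(nn.LogSoftMax())
local criterion = nn.ClassNLLCriterion()

-- input: a table of seqLen tensors, each batchSize x inputSize
local inputs = {}
for t = 1, seqLen do
   inputs[t] = torch.randn(batchSize, inputSize)
end
-- one class label per sequence in the batch
local targets = torch.Tensor(batchSize):random(1, nClasses)

-- one training step
local output = model:forward(inputs)
local loss = criterion:forward(output, targets)
local gradOutput = criterion:backward(output, targets)
model:zeroGradParameters()
model:backward(inputs, gradOutput)
model:updateParameters(0.1)
print(loss)
```

The output is a single batchSize x nClasses tensor of log-probabilities per forward call, so the criterion sees one target per sequence rather than one per step.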

Growing weights

Hi,

thanks for the great recurrent package. I'm still new to Torch, so the problem is probably in my usage. However, when I try to stack LSTMs into a multilayer network using Sequencer, the weights keep growing. Without Sequencer, with only one recurrent layer, it works fine.

network:
-- LSTM

    recurrent = nn.LSTM(inSize, hiddenSize1, rho)
    recurrent2 = nn.LSTM(hiddenSize1, hiddenSize2, rho)
    recurrent.scales = torch.Tensor(rho):fill(1)
    recurrent2.scales = torch.Tensor(rho):fill(1)
    linear = nn.Linear(hiddenSize2, outSize)

    model = nn.Sequential()
    sequencer = nn.Sequencer(recurrent)
    sequencer2 = nn.Sequencer(recurrent2)
    sequencer3 = nn.Sequencer(linear)
    model:add(sequencer)
    model:add(sequencer2)
    model:add(sequencer3)

-- TRAINING

for t = 1, (trainSize - batchSize) do

        -- load new sample
        local inputs, targets, gradOutputs = {}, {}, {}
        for step = 1, rho do
            local index = t + step
            inputs[step] = inputData:sub(1, inputData:size(1), index, index + batchSize - 1):transpose(1,2)
            targets[step] = outputData:sub(1, outputData:size(1), index, index + batchSize - 1):transpose(1, 2)

        end

        local outputs = model:forward(inputs)

        for step = 1, rho do
            local err = criterion:forward(outputs[step], targets[step])
            trainError = trainError + err
            gradOutputs[step] = criterion:backward(outputs[step], targets[step])
        end

        model:backward(inputs, gradOutputs)
        model:updateParameters(lrRNN)
        linear:zeroGradParameters()
    end

I have tried lowering the learning rate (1e-2, 1e-3, 1e-4) and using different values for scales (1, 1/rho), batch size (1, 10) and rho (1, 5, 10, 100), but none of these modifications seemed to work. Any ideas?

testJacobian fails on Missing gradInput

When trying to test a simple network (to get the last state of an LSTM)

local sequenceLength = math.random(5, 10)
local vectorSize = math.random(5, 10)
local hiddenSize = math.random(5, 10)

local input = torch.rand(sequenceLength, vectorSize)

-- testJacobian doesn't support table inputs so use a split table module on the input.
local module = nn.Sequential()
module:add(nn.SplitTable(1, 2))
module:add(nn.Sequencer(nn.FastLSTM(vectorSize, hiddenSize)))
module:add(nn.SelectTable(-1))

nn.Jacobian.testJacobian(module, input)

nn.Jacobian.testJacobian fails at https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L61

Am I missing something?
Can you provide a pointer on how to solve this?

Thanks

LSTM implementation

Question: Why does the LSTM implementation not inherit from Recurrent? It seems that (something like) this is equivalent to the current LSTM implementation, but avoids a lot of code duplication.

nn = require 'nn'
require 'rnn'

local hiddenSize = 2
local nIndex = 2

-- A silly hack to make sure LSTM.recurrentModule is fed zeros at step 1
local Start = torch.class('nn.Start', 'nn.Identity')

function Start:updateOutput(input)
  self.output = {input, torch.zeros(2), torch.zeros(2)}
  return self.output
end

function Start:updateGradInput(input, gradOutput)
  self.gradInput = gradOutput[1]
  return self.gradInput
end

-- The LSTM network
-- The input and feedback modules are unused
-- Merge basically turns {input, {output, cell}} into {input, output, cell}
-- The transfer module is the full LSTM module
local r = nn.Recurrent(nn.Start(), nn.Identity(), nn.Identity(),
                       nn.LSTM(hiddenSize, hiddenSize).recurrentModule,
                       9999, nn.FlattenTable())

local rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.SelectTable(1))  -- Since both the output and the cell is given
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())

backward() vs backwardThroughTime()

I'm confused about how to use this library in such a way that it is API-compatible with the rest of torch. For other parts of torch, eg for simple MLPs, it seems that the standard pattern for training models is to do something like:

local parameters, gradParameters = model:getParameters()
local inputs, targets = getMiniBatch()

local function fEval(x)
   if parameters ~= x then parameters:copy(x) end
   model:zeroGradParameters()
   local output = model:forward(inputs)
   local err = criterion:forward(output, targets)
   local df_do = criterion:backward(output, targets)
   model:backward(inputs, df_do)
   return err, gradParameters
end

optim.optimMethod(fEval, parameters)

I'd like to be able to use this package's RNN code, but train using training code I already have. However, it seems that backward() does not perform the full backward pass and we have to call backwardThroughTime() instead.

Is this true? Why not make the RNN stuff API-compatible so that we can call backward()?
If I'm going to add a backwardThroughTime() call to my training code, where do I put it? Do I call backward() and then backwardThroughTime()? Suppose there are LookupTable layers below the RNN. After calling backwardThroughTime(), would I need to call backward() on them?

Thanks,
David

rnn with cltorch?

Hi, are you planning to add support for using the library with OpenCL through cltorch and clnn?
Thanks

Reproduce RAM results

Hi,
thanks for the nice package for Torch.
I was trying to reproduce your RAM results on the MNIST dataset (< 1% error). I am not able to reproduce such results: the early stopping criterion ends training after ~800 epochs at 98.6%. Could you provide the exact parameters that were used for training?

The second question is about using LSTM. How can this model be used with LSTM units? Is it enough to replace nn.Recurrent with nn.LSTM, or is something more needed (I am not able to run such a model)?

Question: does this package work with optim? (yes)

Thank you for this awesome package. Much wow.

I cannot use dp which all examples are based on (#60).
So I have to make sure optim is supported for architectures like these:

Sequencer( 
  Sequential( 
    module1, 
    LSTM,
    module2, 
    ...
  ) 
)

Sequential( 
  module1, 
  Sequencer( 
    LSTM 
  ), 
  module2, 
  ... 
)

The second one was suggested for a many-to-one transducer in #21.

AbstractRecurrent says that BPTT happens in updateParameters() (https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L87). But optim manipulates the parameters directly, never calls updateParameters().

UPDATE: Sorry, a little confused about how similar stuff happens in different places.
Apparently, as long as you decorate with the Sequencer container, optim should work fine.

optim just wants you to call backward() on your net and provide the gradParams. backward() on your Sequencer should handle all the BPTT:

Module:backward() -> 
    Sequencer:updateGradInput() -> BPTT( LSTM:updateGradInput() )
    Sequencer:accGradParameters() -> BPTT( LSTM:accGradParameters() )

If someone with a bit more insight could verify this, I would be super happy. Thanks!

Going to close this issue then.
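The flow described above can be sketched as a small optim closure. This is only an illustration under stated assumptions: the layer sizes, the random data, and the names feval, inputs, and targets are made up for the example, and it assumes a version of rnn in which Sequencer:backward() performs the full BPTT.

```lua
require 'rnn'     -- also loads nn
require 'optim'

-- Toy sizes, chosen only for illustration.
local batchSize, inputSize, hiddenSize, nClass, seqLen = 2, 4, 5, 3, 3

-- The first architecture from the question: Sequencer(Sequential(...)).
local model = nn.Sequencer(
   nn.Sequential()
      :add(nn.LSTM(inputSize, hiddenSize))
      :add(nn.Linear(hiddenSize, nClass))
      :add(nn.LogSoftMax())
)
local criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

-- optim works on the flattened parameter and gradient tensors.
local params, gradParams = model:getParameters()

-- Random dummy sequence: a table with one batch per time step.
local inputs, targets = {}, {}
for t = 1, seqLen do
   inputs[t] = torch.randn(batchSize, inputSize)
   targets[t] = torch.Tensor(batchSize):random(1, nClass)
end

-- Closure expected by optim: returns the loss and dloss/dparams.
local function feval(x)
   if x ~= params then params:copy(x) end
   gradParams:zero()
   local outputs = model:forward(inputs)
   local loss = criterion:forward(outputs, targets)
   local gradOutputs = criterion:backward(outputs, targets)
   model:backward(inputs, gradOutputs)   -- no updateParameters() call needed
   return loss, gradParams
end

local _, fx = optim.sgd(feval, params, {learningRate = 0.1})
print('loss: ' .. fx[1])
```

Note that updateParameters() is never called here; optim.sgd updates params in place from the gradients that backward() accumulated.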

Inconsistent behavior of the Sequencer module

Hi,
In our model there is a Sequencer with a dropout module.
In the testing phase we call model:evaluate(), but the Sequencer's sharedClones field is not updated correctly: train is false only in the first module of sharedClones, and it remains true in the rest. As a result, the dropout module keeps its training behavior instead of its testing behavior.
Could you please check it out?
Many thanks for your help,
Einat

recurrent-visual-attention.lua issue

Hello, thank you for the great project!
I trained a model using recurrent-visual-attention.lua.

How do I use the model to get a picture's visual attention?
When I ran:
th recurrent-visual-attention.lua --cuda --xpPath /home/silva/save/silva-XPS-8300:1441270258:1.dat
I got the following error:
/home/silva/torch/install/bin/luajit: /home/silva/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <optim.ConfusionMatrix>
stack traceback:
[C]: in function 'error'
/home/silva/torch/install/bin/luajit: /home/silva/torch/install/share/lua/5.1/torch/File.lua:262: unknown Torch class <optim.ConfusionMatrix>

Request for GRU implementation.

GRU is an efficient model that can be used in place of LSTM.

Sharing params using clone() on Sequencer(LSTM())

Hi,
I have the following architecture:

    lstm_seq = nn.Sequential()
    lstm_seq:add(nn.Sequencer(l))
    lstm_seq:add(nn.SelectTable(args.state_dim))
    lstm_seq:add(nn.Linear(n_hid, n_hid))
    lstm_seq:add(nn.Rectifier())

    parallel_flows = nn.ParallelTable()
    for f=1, 2 do
        parallel_flows:add(lstm_seq:clone("weight","bias"))
    end

    lstm = nn.Sequential()
    lstm:add(parallel_flows)

If I check the parameters by using:

    w, dw = lstm:getParameters()

I get inconsistent sizes (dw seems to have almost twice the number of params as w).
However, when I turn off sharing params (in lstm_seq:clone()), the sizes are consistent. Do you have any idea why?

Thanks!
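One possible explanation, offered as an assumption rather than a confirmed diagnosis: clone("weight","bias") shares the parameters but not the gradient buffers, so flattening sees one copy of the weights and two copies of the gradients. A minimal sketch with a plain nn.Linear (the sizes and variable names are illustrative only) that shares all four tensors:

```lua
require 'nn'

local lin = nn.Linear(3, 2)   -- 3*2 weights + 2 biases = 8 parameters

-- Share the gradient buffers as well as the parameters.
local shared = lin:clone('weight', 'bias', 'gradWeight', 'gradBias')

local net = nn.ParallelTable():add(lin):add(shared)
local w, dw = net:getParameters()

-- Both flattened tensors now describe the same 8 shared values.
print(w:nElement(), dw:nElement())
```

If this matches the cause, adding "gradWeight" and "gradBias" to the clone() call in the original model should make the two sizes agree.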

Sequence labelling with rnn

I want to recognize online handwritten characters with your LSTM. Is there an example? The one at https://github.com/nicholas-leonard/dp/blob/master/examples/recurrentlanguagemodel.lua is a language model and is not suited to my task.

Nested Sequencer

Hey guys,

I am currently writing and testing some minimal examples for this repo.
I have trouble understanding how I should handle Sequencers in models, e.g. nested Sequencers.

Can anybody tell me how I get the inner LSTMs to work?

require 'nn'
require 'rnn'

local inputsize = 10
local outputsize = 12

local inputdata_t = torch.rand(10)

local innermodel = 
nn.Sequencer(
 nn.Sequential()
  :add(FastLSTM(inputsize, 2))
  :add(FastLSTM(2, 5))
)

local model = 
nn.Sequential()
  :add(nn.CAddTable())
  :add(innermodel)
  :add(nn.Linear(5,12))

model =  nn.Recurrence(model, 12, 1)

local inputs = {}
for ii=1,3 do
 table.insert(inputs, inputdata_t[ii])
end

local outputs = model:forward(inputs)

I guess I need to wrap the complete module in another Sequencer? But how do I use different rhos for different modules in the same model?

Is there some kind of hold unit which catches the outputs?

Thx for helping

LSTM numerical gradient check

Haven't been able to figure out what's going on here... With the new LinearBias I can get one step to give the correct gradient, but as soon as I have multiple steps weird stuff starts happening. The numerical gradients are generally much, much larger (1e3 vs. 1e-3). Any ideas as to what is happening?

nn = require 'nn'
require 'rnn'
require 'optim'

hiddenSize = 2
nIndex = 2
r = nn.LSTM(hiddenSize, hiddenSize)

rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())

criterion = nn.ClassNLLCriterion()

function f(x)
  parameters:copy(x)
  -- Do the forward prop
  rnn:zeroGradParameters()
  -- With or without fastBackward doesn't matter
  r.fastBackward = false
  local err = 0
  for i = 1, inputs:size(1) do
     local output = rnn:forward(inputs[i])
     err = err + criterion:forward(output, targets[i])
     local gradOutput = criterion:backward(output, targets[i])
     rnn:backward(inputs[i], gradOutput)
  end
  r:backwardThroughTime()
  r:forget()
  return err, grads
end

parameters, grads = rnn:getParameters()

-- This works:
-- targets = torch.Tensor{1}:resize(1, 1)
-- inputs = torch.randn(1, 2)

targets = torch.Tensor{1, 2}:resize(2, 1)
inputs = torch.randn(2, 2)
local err, dC, dC_est = optim.checkgrad(f, parameters:clone())
-- Print the exact and numerical gradients side by side
print(torch.cat(dC:view(dC:size(1), 1), dC_est:view(dC_est:size(1), 1), 2))
assert(err < 0.0001, "failed")
print("passed")

How to check out the parameters?

Hi,

It seems that the library has not implemented parameters() from nn.Module, so I can hardly access the parameters of RNN modules without changing the code of the library itself.

I wonder whether there is any way to read the parameters from outside the class, like getParameters() in nn.Module?

Edward

Recurrent Neural Network update on sequence of length 1

The Recurrent class of 'rnn' does not allow training updates on sequences of length 1. Minimal example:

  require 'rnn'

  x = torch.rand(200)
  target = torch.rand(1)

  rho = 5
  hiddenSize = 100
  -- RNN
  r = nn.Recurrent(
     hiddenSize, nn.Linear(200,hiddenSize), 
     nn.Linear(hiddenSize, hiddenSize), nn.Sigmoid(), 
     rho
  )

  seq = nn.Sequential()
  seq:add(r)
  seq:add(nn.Linear(hiddenSize, 1))

  criterion = nn.MSECriterion()

  output = seq:forward(x)
  err = criterion:forward(output,target)
  gradOutput = criterion:backward(output,target)
  seq:backward(x,gradOutput)

  seq:updateParameters(0.01)

As far as I understand this should not be an issue, yet when run this gives something like:

  /Users/hroosterhuis/torch/install/bin/luajit: /Users/hroosterhuis/torch/install/share/lua/5.1/nn/Add.lua:62: bad argument #1 to 'size' (dimension 1 out of range of 0D tensor at /Users/hroosterhuis/torch/pkg/torch/generic/Tensor.c:17)
  stack traceback:
  [C]: in function 'size'
  /Users/hroosterhuis/torch/install/share/lua/5.1/nn/Add.lua:62: in function 'accGradParameters'
  ...s/hroosterhuis/torch/install/share/lua/5.1/nn/Module.lua:53: in function 'accUpdateGradParameters'
  ...oosterhuis/torch/install/share/lua/5.1/rnn/Recurrent.lua:247: in function 'accUpdateGradParametersThroughTime'
  ...is/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:73: in function 'backwardUpdateThroughTime'
  ...is/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:83: in function 'updateParameters'
  ...roosterhuis/torch/install/share/lua/5.1/nn/Container.lua:31: in function 'updateParameters'
  testRNN.lua:26: in main chunk
  [C]: in function 'dofile'
  ...huis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
  [C]: at 0x0103773780

Encoder-Decoder Architectures

Hi,

I was looking to implement an encoder-decoder LSTM architecture (like http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). But the problem I have is that there doesn't seem to be a good way to pass the output of the encoder network to the decoder network as the hidden state.

More precisely, in LSTM:updateOutput, prevOutput is initialized to zero:

   if self.step == 1 then
      prevOutput = self.zeroTensor

However, I would need a way to pass in output[-1] from the encoder network into the decoder network as prevOutput. Of course, I will also need the gradients to flow back into the encoder properly.

Is there a way to achieve this setup with your current architecture?

Thanks a lot!

GRU init problem

Hi,
probably a dumb issue, but still can't figure it out -
trying to init a GRU (e.g., r = nn.GRU(1,1)) gives an 'attempt to call field 'GRU' (a nil value)' error.

Same code for LSTM works fine. Would appreciate any help.

AbstractRecurrent forget() execute unwanted assertion?

In this section of the code:
https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L113

function AbstractRecurrent:forget(offset)
   offset = offset or 0
   if self.train ~= false then
      -- bring all states back to the start of the sequence buffers
      local lastStep = self.step - 1

      if lastStep > self.rho + offset then
         local i = 1 + offset

forget() checks the boolean value self.train, but since AbstractRecurrent's parent class is nn.Container, there is no self.train, so the code below ends up executing whether you called training() or evaluate().

Sequencer Problem since last update

I recently updated the rnn package, and a script that had worked up to that point no longer runs.
To make sure the problem is not (completely) on my side, I tested the script from the README:

require 'rnn'

batchSize = 8
rho = 5
hiddenSize = 10
nIndex = 10000

mlp = nn.Sequential()
   :add(nn.Recurrent(
      hiddenSize, nn.LookupTable(nIndex, hiddenSize), 
      nn.Linear(hiddenSize, hiddenSize), nn.Sigmoid(), 
      rho
   )
   :add(nn.Linear(hiddenSize, nIndex))
   :add(nn.LogSoftMax())

rnn = nn.Sequencer(mlp)

criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

-- dummy dataset (task is to predict next item, given previous)
sequence = torch.randperm(nIndex)

offsets = {}
for i=1,batchSize do
   table.insert(offsets, math.ceil(math.random()*batchSize))
end
offsets = torch.LongTensor(offsets)

lr = 0.1
i = 1
while true do
   -- prepare inputs and targets
   local inputs, targets = {},{}
   for step=1,rho do
      -- a batch of inputs
      table.insert(inputs, sequence:index(1, offsets))
      -- increment indices
      offsets:add(1)
      for j=1,batchSize do
         if offsets[j] > nIndex then
            offsets[j] = 1
         end
      end
      -- a batch of targets
      table.insert(targets, sequence:index(1, offsets))
   end

   local outputs = rnn:forward(inputs)
   local err = criterion:forward(outputs, targets)
   print(i, err/rho)
   i = i + 1
   local gradOutputs = criterion:backward(outputs, targets)
   rnn:backward(inputs, gradOutputs)
   rnn:updateParameters(lr)
   rnn:zeroGradParameters()
end

After adding the missing ')' in line 12 after rho (is this a bug?) the script should run.
Instead it gave me the same error message as my private script, which was only recently corrected by Mr. Léonard himself (the problem with the Sequencer):

.../sebastian/Torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
        [C]: in function 'assert'
        .../sebastian/Torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
        ...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
        ...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
        ...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
        ...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
        torchrnntest.lua:55: in main chunk
        [C]: in function 'dofile'
        ...tian/Torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
        [C]: at 0x00405ea0

Every luarock was updated to the latest version.

I looked in the code and I think it happened in commit 1fc81a4
with changing:

-         if step > 1 then
-            self.gradCells[step-1] = gradCell
-         end
+         self.gradCells[step-1] = gradCell

I tried manipulating it but I think there is also a change in the sequencer part.

Can anybody reproduce the error?

I'm pretty (totally) new to Git and I'm just getting started with torch and rnn so please forgive me if this is no bug or if I am doing anything wrong.

Thank you Mr. Léonard for providing this awesome repo!

backward not working for LookupTable

I have an example using LookupTable where backward returns empty tensors (not computed).
Is this a bug?
gradInputs:
{
1 : DoubleTensor - size: 2x5
2 :
{
1 : DoubleTensor - empty
2 : DoubleTensor - empty
3 : DoubleTensor - empty
}
}

code :

require 'nn'
require 'rnn'
require 'cutorch'
require 'cunn'

batchSize = 2
rho = 3
embeddingSize = 4
dictionarySize = 10
nbfeatures=5

inputs, targets = {}, {} -- inputs and outputs
for i = 1, nbfeatures do
   local featureTensor=torch.Tensor(batchSize,1)
   for j=1,batchSize do
      featureTensor[j][1]=torch.random(1,dictionarySize)
   end
   table.insert(inputs, featureTensor)
end
for i = nbfeatures+1, rho+nbfeatures do
   local measure=torch.Tensor(batchSize)
   for j=1,batchSize do
      measure[j]=torch.random(1,dictionarySize)
   end
   table.insert(inputs, measure)
end
for i = 1, rho do
   local measure=torch.Tensor(batchSize)
   for j=1,batchSize do
      measure[j]=torch.random(1,dictionarySize)
   end
   table.insert(targets, measure)
end

premodel=nn.Sequential()
b1=nn.Sequential()
b1:add(nn.NarrowTable(1,nbfeatures))
b1:add(nn.JoinTable(2)) -- ->Tensor(batchSize X nbfeatures)
b2=nn.Sequential()
b2:add(nn.NarrowTable(nbfeatures+1,rho))
c=nn.ConcatTable()
premodel:add(c)
c:add(b1)
c:add(b2) -- ->{tensorF , {list of tensor(i)}}

inputsA=premodel:forward(inputs)
print('inputsA')
print(inputsA)

model=nn.Sequential()
p=nn.ParallelTable()
p:add(nn.Identity())
p:add(nn.Sequencer(nn.LookupTable(dictionarySize, embeddingSize))) -- ->ListofTensor(batchSize X embeddingSize)
model:add(p)
SliceList=nn.ConcatTable() -- purpose: create a list tensor created by joining tensorF & tensor(i)
for i=1, rho do
   local Slice =nn.Sequential()
   SliceList:add(Slice)
   local cc=nn.ConcatTable() -- contains the 2 tensors to join
   Slice:add(cc)
   local a=nn.Sequential()
   cc:add(a)
   a:add(nn.SelectTable(2)) -- we select list of tensor(i)
   a:add(nn.SelectTable(i)) -- we select a tensor(i)
   local b=nn.Sequential()
   cc:add(b)
   b:add(nn.SelectTable(1)) -- we select tensorF
   Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
model:add(SliceList)
model:add(nn.Sequencer(nn.FastLSTM(embeddingSize+nbfeatures, embeddingSize, rho)))
model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
model:add(nn.Sequencer(nn.LogSoftMax()))

criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())

prediction = model:forward(inputsA)
err = criterion:forward(prediction, targets)
print('err=' .. err)
gradOutputs = criterion:backward(prediction, targets)
gradInputs=model:backward(inputsA, gradOutputs)
print('gradInputs')
print(gradInputs)

nn.ReverseTable

Hey Nicholas,
Thanks for adding in the BiSequencer. I see that code contains nn.ReverseTable() but I'm unable to find it in the nn or nnx packages. Am I missing something? I get the following error, of course:

rnn/BiSequencer.lua:48: attempt to call field 'ReverseTable' (a nil value)

Support long sequences

Sequencers should have a method to turn forgetting between sequences on or off.

Dropout "sizes do not match" error on backward pass

I am trying to train a stacked LSTM by calling forward/backward one step at a time. I need to do this because I run out of memory if I use a Sequencer on the whole mini-batch of sequences. If I just run the forward pass all is well; but when I add in the backward pass I get the error:

...Dropout.lua:42: bad argument #2 to 'cmul' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/THCTensorMathPointwise.cu:132)

Here is a code snippet for my model and feval function (I am using optim for training):

model = nn.Sequential()
model:add(nn.LookupTable(vocab_size, 512))
model:add(nn.FastLSTM(512, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.FastLSTM(opt.rnn_size, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.FastLSTM(opt.rnn_size, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.Linear(opt.rnn_size, vocab_size))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
model:cuda()
criterion:cuda()

x, dl_dx = model:getParameters()

function feval(x_new)
    if x ~= x_new then
        x:copy(x_new)
    end

    ------------------ get minibatch -------------------
    local inputs, targets = loader:next_batch(1)
    ------------------- forward pass -------------------
    dl_dx:zero()

    local loss_x = 0

    outputs = {}

    for i = 1,opt.seq_length do
        local lst = model:forward(inputs[i])
        table.insert(outputs, lst)
        loss_x = loss_x + criterion:forward(lst, targets[i])
    end

    loss_x = loss_x / opt.seq_length

    for i = opt.seq_length,1,-1 do
        model:backward(inputs[i], criterion:backward(outputs[i], targets[i]))
    end

    dl_dx = torch.clamp(dl_dx,-opt.grad_clip,opt.grad_clip)

    return loss_x, dl_dx
end

Sequencer remember/forget with "eval" mode

Hi,

Recent changes to the Sequencer remember/forget mechanism introduced modes like "both" and "eval", which is very convenient. However, in "eval" mode, a forward step during evaluation will set the maximum number of BPTT steps (rho value) to the size of the input. Then, a subsequent epoch of training on a sequence of different size will fail in the backward step. Before the change, remember() worked fine.

The reason is probably the setting of rho in the recurrent module (in this case LSTM), which then causes the backward step during training to stop before reaching the beginning of the sequence. See LSTM:updateGradInputThroughTime().

Note: I know that the README says it is recommended to set mode="both" for LSTM, but I prefer the "eval" mode because each training example is independent. In any case, I suppose both modes should be possible for any AbstractRecurrent instance.

A minimal working example with LSTMs:

lstm = nn.LSTM(5,5)
seq = nn.Sequencer(lstm)
inputTrain = {torch.randn(5), torch.randn(5), torch.randn(5)}
inputEval = {torch.randn(5)}


modes = {'both', 'eval'}
for i, mode in ipairs(modes) do
  print('\nmode: ' .. mode)
  seq:remember(mode)

  -- do one epoch of training
  seq:training()
  seq:forward(inputTrain)
  seq:backward(inputTrain, inputTrain)

  -- evaluate
  seq:evaluate()
  seq:forward(inputEval)

  -- do another epoch of training
  seq:training()
  seq:forward(inputTrain)
  -- this will fail when mode = 'eval'
  seq:backward(inputTrain, inputTrain)
end

Could you look into that?

Many thanks for your help.

Fast Tensor to Sequence Creation

There are a few datasets available saved as tensors. I've been converting them to sequences with something like the following:

    -- conversion loop
    local pixels = {}
    for k = 1, raster_size do
        pixels[k] = torch.Tensor({raw_data[k]})
    end
    collectgarbage()

...

   sequenced_layer = nn.Sequencer(...)
   sequenced_layer:forward(pixels)

However the conversion loop seems to be relatively slow. If this is a common operation (converting Tensor data to Sequences), perhaps a faster Lua/C helper function would be a useful feature.
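Until such a helper exists, one way to avoid the per-element Lua loop is to let nn.SplitTable slice the tensor. The sketch below assumes raw_data is a 1D tensor with one scalar per time step (the original post does not say), and the variable names are reused from the snippet above for illustration only.

```lua
require 'nn'

local raster_size = 6                      -- illustrative length
local raw_data = torch.rand(raster_size)   -- stand-in for the stored data

-- View the data as raster_size x 1, then split along the first dimension.
-- Each table entry is a 1-element tensor, like torch.Tensor({raw_data[k]}).
local pixels = nn.SplitTable(1):forward(raw_data:view(raster_size, 1))

print(#pixels, pixels[1]:size(1))
```

The entries are views into raw_data rather than copies, so this allocates no new storage per step; clone them if the sequence must be modified independently of the source tensor.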

time series prediction

hi all,
just wondering if this library can be used for either multivariate or univariate time series prediction.
Is there already an example of this?

Many thanks,
best,
Andrew

Simple LSTM sequence to single output for multiple batches

Hi,
We tried to implement an LSTM similar to the one for the IMDB dataset, for a batch of multiple sequences. For some reason the backpropagation does not work. Here is a minimal sample of the code that causes the error, with the corresponding error message. I used a Sequencer for the recurrent part, which is supposed to be compatible with the forward and backward functions (according to the documentation). Thank you for providing this module, by the way :).

require "rnn"
require "cunn"

batch_size= 5
maxLen = 17
wordVec = 128
nWords = 10000
mode = 'GPU'

inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords) -- 
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s

lstm = nn.Sequential()
lstm:add(nn.LookupTable(nWords,wordVec, batch_size))  -- convert indices to word vectors
lstm:add(nn.SplitTable(1))  -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.LSTM(wordVec, wordVec))) -- lstm, no batch size here
lstm:add(nn.JoinTable(1)) -- stack list to tensor
lstm:add(nn.View(batch_size, -1, 128)) -- reshape tensor arbitrary y (maxLen)
lstm:add(nn.Mean(2))  -- average over words
lstm:add(nn.Linear(wordVec, 2)) -- project to two classes
lstm:add(nn.LogSoftMax())

criterion = nn.ClassNLLCriterion()

if mode == 'GPU' then
    lstm:cuda()
    criterion:cuda()
    labels = labels:cuda()
    inp1 = inp1:cuda()
end

out = lstm:forward(inp1)

print('out', #out)  --- prints (bsize, classes) here 5,2
print('labels', labels)  -- vector of 1s and 2s with len batch size here 5

out_crit = criterion:forward(out, labels)
print('loss', out_crit) -- scalar

gradOut = criterion:backward(out, labels)

print('gradout', #gradOut)  -- same as out 5,2

lstm:backward(inp1, gradOut) -- does not work

ERROR:
torch/install/share/lua/5.1/torch/Tensor.lua:460: expecting a contiguous tensor
stack traceback:
    [C]: in function 'assert'
    /home/../torch/install/share/lua/5.1/torch/Tensor.lua:460: in function 'view'
    /home/../torch/install/share/lua/5.1/nn/View.lua:85: in function 'updateGradInput'
    /home/../torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
    /home/../torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
    [string "require "rnn"..."]:45: in main chunk
    [C]: in function 'xpcall'
    /home/../torch/install/share/lua/5.1/itorch/main.lua:179: in function </home/../torch/install/share/lua/5.1/itorch/main.lua:143>
    /home/../
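The trace shows view() rejecting a non-contiguous gradOutput inside nn.View's backward pass. A plausible source, stated here as an assumption, is a gradient that was produced by expanding a smaller tensor, for example in the backward pass of the averaging layer. The sketch below reproduces the same failure with plain tensors; the shapes are illustrative only.

```lua
require 'torch'

-- expand() repeats data with a zero stride, so the result is not contiguous.
local g = torch.rand(5, 1, 4):expand(5, 3, 4)
print(g:isContiguous())

-- view() on it raises "expecting a contiguous tensor".
local ok, msg = pcall(function() return g:view(15, 4) end)
print(ok, msg)

-- Copying into contiguous memory first makes the reshape legal.
local v = g:contiguous():view(15, 4)
print(v:size(1), v:size(2))
```

In a model, the equivalent step would be to make the gradient contiguous before it reaches nn.View, for instance with a layer that copies its gradOutput.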

Getting rnn ready for torch

The objective is to build a general recurrent library for torch

how can I install this library?

I am new to Lua and Torch. How can I install this library in Torch? Can I install it with luarocks install?

nn.Periodic decorator

For example, apply ZeroGrad every 3 forwards:

nn.Periodic(nn.ZeroGrad(), 3)

Could be useful for truncated BPTT.

Help understanding what encInSeq, decInSeq, decOutSeq are in the encoder and decoder example.

Hello! I just found this library and have been trying to use seq2seq learning to flip a sequence of numbers. However, I'm stuck trying to understand how each of these tensors in the encoder-decoder example...

encInSeq
decInSeq
decOutSeq

...each relate to a seq2seq model like this:

My current understanding is that encInSeq is the input tensor given to the encoder network, decInSeq is the input tensor given to the decoder network, and decOutSeq is the expected output tensor for the decoder. Is this correct? The fact that all of these tensors are 2x3 and 2x4 doesn't seem to match this understanding. I'm sorry if this question seems obvious, but I have been trying to figure this out for a few days now. (I'm new to using Torch for RNNs.) Thanks!

Support for masking (variable length input)

Thank you for this great RNN implementation.

I am trying to batch input data with a variable number of timesteps through an LSTM.
Is there a simple way to support this feature with your implementation ?

I am thinking about something like the mask_zero in http://keras.io/layers/recurrent/ and http://keras.io/layers/embeddings/ where the embedding would be a nn.LookupTable always returning a zero norm vector for padding.

Thanks
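For readers looking for this behavior, here is a minimal sketch using the nn.MaskZero decorator mentioned elsewhere in this tracker. It assumes a version of rnn that ships nn.MaskZero(module, nInputDim) and that padded time steps are all-zero input rows; the sizes and data are made up for the example.

```lua
require 'rnn'

local batchSize, inputSize, hiddenSize = 2, 3, 4

-- nInputDim = 1: each sample in the batch is a 1D vector.
local seq = nn.Sequencer(nn.MaskZero(nn.LSTM(inputSize, hiddenSize), 1))

-- Two time steps; sample 1 is padded (all zeros) at the first step.
local step1 = torch.randn(batchSize, inputSize)
step1[1]:zero()
local step2 = torch.randn(batchSize, inputSize)

local outputs = seq:forward({step1, step2})

-- The padded row produces a zero output; the real row does not.
print(outputs[1][1]:abs():sum(), outputs[1][2]:abs():sum())
```

With an embedding layer in front, the padding index would need to map to a zero vector for the mask to trigger, which is the mask_zero idea described above.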

Seems like opt.accUpdate in recurrent-language-model.lua is no longer used?

Seems like opt.accUpdate in recurrent-language-model.lua is no longer an available option, but still appears in the code?

lookup = nn.LookupTable(ds:vocabularySize(), opt.hiddenSize[1], opt.accUpdate)

   acc_update = opt.accUpdate,

Strange interaction with nn.Container

Below is a minimal piece of code demonstrating a strange bug I have experienced.

Here, I have a typical LSTM sequence encoder. I also have a dummy class that inherits from nn.Container. The following code crashes. However, if you move the
parent:__init(self) line to go after mapper:forward(), then everything is ok. It fails in dpnn.Module:

   if moduleClones then
      assert(self.modules == nil)
      self.modules = modules
      clone.modules = moduleClones
   end

Since the container has a member called modules, this code crashes. It seems like the self pointer is wrong here or something. Any ideas about what is going on?

require 'nn'
require 'rnn'

local vocabSize = 25
local embeddingDim = 10
local rnnHidSize = 15

local lstm = nn.Sequencer(nn.LSTM(embeddingDim, rnnHidSize))
local mapper = nn.Sequential():add(nn.LookupTable(vocabSize,embeddingDim)):add(nn.SplitTable(2)):add(lstm):add(nn.SelectTable(-1))

--this is a minibatch of 'sentences'
local data = torch.rand(32,16):mul(vocabSize):ceil()


local NoopContainer, parent = torch.class('nn.NoopContainer', 'nn.Container')

function NoopContainer:__init()
    parent:__init(self)

    local length = 12
    local dd = torch.rand(32,length):mul(vocabSize):ceil()
    mapper:forward(dd)

end


local noop = nn.NoopContainer()
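One reading of the crash, offered as an assumption rather than a confirmed diagnosis, is the colon in parent:__init(self). In Lua, a colon call passes the object on the left as the first argument, so the constructor would run on the parent class table instead of the new instance, and any field it sets (such as modules) would land on the class. The pure-Lua sketch below uses hypothetical stand-ins (Parent, child) to show the difference; Torch classes conventionally call parent.__init(self) with a dot.

```lua
-- Hypothetical stand-ins for a parent class table and a new instance.
local Parent = {}
function Parent.__init(self)
   self.modules = {}   -- what a container-style constructor might do
end

local child = {}

-- Colon call: sugar for Parent.__init(Parent, child), so self is Parent.
Parent:__init(child)
local colonSetParent = (Parent.modules ~= nil)
local colonSetChild = (child.modules ~= nil)

-- Undo the side effect, then use the dot call: self is child.
Parent.modules = nil
Parent.__init(child)

print(colonSetParent, colonSetChild, child.modules ~= nil)
```

If this is what happens in the report, a modules field set on the class table would be visible from every instance that inherits from it, which is consistent with the failing assert(self.modules == nil).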

Bug in new AbstractRecurrent? updateGradInputStep not reset

In method updateGradInput, I find:
self.updateGradInputStep = self.updateGradInputStep or self.step

The first BPTT pass is fine. However, if I do a second BPTT pass, self.updateGradInputStep is already initialized (it is 1 after the first pass) and is decreased further, even to negative values. Shouldn't updateGradInputStep be reset at some point between BPTT passes?

Possible problem in Sequencer backward on inputs of different lengths

Hi,

I just updated the rnn package and am now having an error from the backward step in the Sequencer module. I have a network with some LSTMs in it, wrapped inside Sequencer modules. When I try to train my network on inputs of variable length, I get this error:
Sequencer.lua:81: gradOutput should have as many elements as input

If I train my networks on inputs all the same length, I don't run into the error.

I wasn't able to replicate the problem in a simple contained network, so I cannot provide a minimal working example. However, this problem did not occur prior to the update.
I do notice that before updating, my network wasn't using Recursor modules (they did not exist in my version), and after updating it is using them. For example, if I add a non-recurrent module inside a Sequencer (e.g. Dropout), it gets printed with Recursor when I print the model (nn.Sequencer @ nn.Recursor @ nn.Dropout), whereas before updating it was not printed (nn.Sequencer @ nn.Dropout).

I know this is not a very detailed description, but do you have any idea what might be the source of this problem?
I will keep debugging this.

Question on memory usage for LSTM?

Platform: OSX 10.10.3.
Revision: Reasonably current (17/5) git pull of Torch7 and rnn.
Runtime: CPU (not GPU - yet).

Background: I'm modelling events that happen over a 30-day window, with each window divided into 3 minute time-steps. So each distinct user has 30*24*(60/3)=14,400 events with a lot of these simply encoding the fact that nothing happened. I've got 60k users to look at initially and an 80:20 split between train and test, so 48k training users and 12k test users.
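The arithmetic behind those figures, written out as a small check. The variable names are illustrative; the total at the end counts time steps only and says nothing about bytes, since the per-step feature size is not given.

```lua
local days, hoursPerDay, minutesPerHour, stepMinutes = 30, 24, 60, 3

-- 20 steps per hour, 480 per day, 14400 per 30-day window.
local stepsPerUser = days * hoursPerDay * (minutesPerHour / stepMinutes)

local users = 60000
local trainUsers = users * 80 / 100    -- 48000
local testUsers = users - trainUsers   -- 12000

-- Number of (user, time step) events in the training set.
local trainSteps = trainUsers * stepsPerUser

print(string.format('%d steps/user, %d train users, %d test users, %d train steps',
   stepsPerUser, trainUsers, testUsers, trainSteps))
```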

Problem: I inevitably run out of memory about 60 minutes into training, hitting the 1 GB luajit cap. Adding collectgarbage() inside the forward / backward loop helped, but only to delay the out-of-memory error from 1 minute to 60 minutes. I'm using Tensors throughout my code, not tables.

My planned solutions: I'm working on restructuring my code to load the training data on the fly from storage and then using subsets of the data as mini-batches to hopefully remain within the 1 GB limit as well as evaluating more compressed / efficient ways of modelling the events themselves (but part of my research is to see how good LSTM is at extracting features from the events through time without having to preprocess those events..).

Question: Is this behaviour (exceeding the 1 GB luajit mem limit) simply expected behaviour in the current LSTM code for a dataset of this size - that it's using a table somewhere to maintain state (e.g. unrolling through time etc.) or is it more likely that there is a bug somewhere in my code manifesting itself as this problem and RNN / LSTM should have a stable / reasonable mem usage profile?