element-research / rnn
Recurrent Neural Network library for Torch7's nn
License: BSD 3-Clause "New" or "Revised" License
This is not a bug report, but it would be useful to have a few minimal working toy examples showing usage with and without Sequencer.
Hi guys,
I tried to use LSTM to handle variable-length sequences, but I failed to do so with the MaskZero module. Could you please help me out? Thanks a lot!
Here is a minimal code example of what I mean:
require 'rnn'
require 'optim'
inSize = 20
batchSize = 2
hiddenSize = 10
seqLengthMax = 11
numTargetClasses=5
numSeq = 30
x, y1 = {}, {}
for i = 1, numSeq do
local seqLength = torch.random(1,seqLengthMax)
local temp = torch.zeros(seqLengthMax, inSize)
local targets ={}
if seqLength == seqLengthMax then
targets = (torch.rand(seqLength)*numTargetClasses):ceil()
else
targets = torch.cat(torch.zeros(seqLengthMax-seqLength),(torch.rand(seqLength)*numTargetClasses):ceil())
end
temp[{{seqLengthMax-seqLength+1,seqLengthMax}}] = torch.randn(seqLength,inSize)
table.insert(x, temp)
table.insert(y1, targets)
end
model = nn.Sequencer(
nn.Sequential()
:add(nn.MaskZero(nn.FastLSTM(inSize,hiddenSize),1))
:add(nn.MaskZero(nn.Linear(hiddenSize, numTargetClasses),1))
:add(nn.MaskZero(nn.LogSoftMax(),1))
)
criterion = nn.SequencerCriterion(nn.MaskZero(nn.ClassNLLCriterion(),1))
output = model:forward(x)
print(output[1])
err = criterion:forward(output, y1)
print(err)
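In the code above, each whole sequence is inserted into x as one table element, but Sequencer treats each element of its input table as one time step (normally a batchSize x inSize tensor). A minimal plain-Lua sketch, without Torch, of rearranging variable-length sequences into a time-major, left-zero-padded layout. toTimeMajor is a hypothetical helper, and each feature vector is a Lua table standing in for a tensor row.

```lua
-- Hypothetical helper (not part of rnn): convert a list of variable-length
-- sequences into a time-major layout, left-padded with zero vectors.
-- seqs[b][t] is the feature vector (a Lua table) of sequence b at step t.
-- Returns steps[t][b], i.e. one "batch" per time step.
local function toTimeMajor(seqs, maxLen, featSize)
  local steps = {}
  for t = 1, maxLen do
    steps[t] = {}
    for b = 1, #seqs do
      local seq = seqs[b]
      local pad = maxLen - #seq          -- number of leading zero steps
      local row = {}
      if t > pad then
        -- copy the real feature vector for this step
        for k = 1, featSize do row[k] = seq[t - pad][k] end
      else
        -- all-zero row: the kind of row MaskZero treats as padding
        for k = 1, featSize do row[k] = 0 end
      end
      steps[t][b] = row
    end
  end
  return steps
end

-- Usage: two sequences of lengths 1 and 3, feature size 2
local seqs = {
  { {1, 2} },
  { {3, 4}, {5, 6}, {7, 8} },
}
local steps = toTimeMajor(seqs, 3, 2)
print(#steps, #steps[1])  -- 3 time steps, batch of 2
```

In Torch, each steps[t] would then be stacked into one batchSize x inSize tensor, and the targets would need the same per-step layout.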
Not sure if it matters, but in LSTM.lua, lines 279 and 300 return a global variable gradInput, which I guess is nil since it is undeclared.
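In Lua, a name that is never declared with local (or assigned anywhere) resolves to a global, which is nil by default, so a function returning an undeclared gradInput returns nil. A minimal plain-Lua sketch of the pitfall and the usual fix; the function names are illustrative and not taken from LSTM.lua. A linter such as luacheck reports this kind of access to an undefined global.

```lua
-- Illustrative only: mimics returning an undeclared name.
local function buggyBackward()
  -- 'gradInput' is never assigned, so it is looked up in the global
  -- table and evaluates to nil
  return gradInput
end

local function fixedBackward(self)
  -- return the field stored on the module instead of a global
  return self.gradInput
end

local module = { gradInput = {0.1, 0.2} }
print(buggyBackward())          -- nil
print(#fixedBackward(module))   -- 2
```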
Hi,
I just noticed that Recurrent apparently can't be used with modules whose outputs are tables rather than tensors. The error comes in AbstractRecurrent.lua:48 where 'new' can't be called on a table. Was this intentional?
Thanks,
Shawn
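The error comes from code written for a single tensor calling a tensor method on whatever the module outputs. A common pattern for supporting table outputs is to recurse through nested tables and apply the tensor operation only at the leaves. A minimal plain-Lua sketch of that pattern; recursiveApply is hypothetical, and numbers stand in for tensors (in real Torch code the leaf test would use torch.isTensor).

```lua
-- Hypothetical helper: apply 'fn' to every leaf of a possibly nested table,
-- returning a new structure with the same shape.
local function recursiveApply(x, fn)
  if type(x) == "table" then
    local out = {}
    for k, v in pairs(x) do
      out[k] = recursiveApply(v, fn)
    end
    return out
  end
  -- leaf: a tensor in real Torch code, a number here
  return fn(x)
end

-- A module output that is a table, e.g. {output, {cell, extra}}
local output = {1, {2, 3}}
-- e.g. "create a zeroed copy" at each leaf
local zeros = recursiveApply(output, function(v) return 0 * v end)
print(zeros[1], zeros[2][1], zeros[2][2])
```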
At the current commit (fdc3b21), the unit tests fail:
$ th -lrnn -e "dofile('test/test.lua'); rnn.test()"
Running 18 tests
________*_____**__ ==> Done Completed 1876 asserts in 18 tests with 22 errors
--------------------------------------------------------------------------------
Recurrence
Recurrence fwd err 2
TensorEQ(==) violation val=0.1917265403829, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:2134: in function <test/test.lua:2105>
--------------------------------------------------------------------------------
Recurrence
Recurrence fwd err 3
TensorEQ(==) violation val=0.18550556665035, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:2134: in function <test/test.lua:2105>
--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 1
TensorEQ(==) violation val=0.19031328301695, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:2141: in function <test/test.lua:2105>
--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 2
TensorEQ(==) violation val=0.12799707568363, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:2141: in function <test/test.lua:2105>
--------------------------------------------------------------------------------
Recurrence
Recurrence bwd err 3
TensorEQ(==) violation val=0.16196581750727, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:2141: in function <test/test.lua:2105>
--------------------------------------------------------------------------------
Recurrence
Function call failed
...rs/Calvin/torch/install/share/lua/5.1/rnn/Recurrence.lua:167: expecting at least one updateOutput
stack traceback:
[C]: in function 'assert'
...rs/Calvin/torch/install/share/lua/5.1/rnn/Recurrence.lua:167: in function 'updateGradInputThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
test/test.lua:2144: in function <test/test.lua:2105>
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
test/test.lua:2399: in function 'test'
[string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
[C]: in function 'pcall'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
[C]: at 0x0101c582f0
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 1
TensorEQ(==) violation val=0.0055855133430601, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1955: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 2
TensorEQ(==) violation val=0.20695973077736, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1954: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 2
TensorEQ(==) violation val=0.063109996521219, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1955: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 3
TensorEQ(==) violation val=0.18835985089823, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1954: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 3
TensorEQ(==) violation val=0.087992354034426, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1955: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 4
TensorEQ(==) violation val=0.20735178234061, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1954: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 4
TensorEQ(==) violation val=0.062143137089905, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1955: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) fwd err 5
TensorEQ(==) violation val=0.21002205166161, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1954: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Recursor(Recurrent) bwd err 5
TensorEQ(==) violation val=0.072276017437007, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1955: in function <test/test.lua:1905>
--------------------------------------------------------------------------------
Recursor
Function call failed
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
[C]: in function 'assert'
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
test/test.lua:1958: in function <test/test.lua:1905>
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
test/test.lua:2399: in function 'test'
[string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
[C]: in function 'pcall'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
[C]: at 0x0101c582f0
--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
TensorEQ(==) violation val=0.17625401664154, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1136: in function <test/test.lua:1078>
--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
TensorEQ(==) violation val=0.18676221856625, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1136: in function <test/test.lua:1078>
--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
TensorEQ(==) violation val=0.18772510482446, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1136: in function <test/test.lua:1078>
--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) output err
TensorEQ(==) violation val=0.18784248574533, condition=1e-07
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1136: in function <test/test.lua:1078>
--------------------------------------------------------------------------------
Repeater
Repeater(Recursor) gradInput err
TensorEQ(==) violation val=0.073757910583675, condition=1e-06
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:61: in function 'assertTensorEq'
test/test.lua:1138: in function <test/test.lua:1078>
--------------------------------------------------------------------------------
Repeater
Function call failed
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
[C]: in function 'assert'
/Users/Calvin/torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
...in/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
/Users/Calvin/torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
test/test.lua:1141: in function <test/test.lua:1078>
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:115: in function 'pcall'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:186: in function '_run'
/Users/Calvin/torch/install/share/lua/5.1/torch/Tester.lua:161: in function 'run'
test/test.lua:2399: in function 'test'
[string "dofile('test/test.lua'); rnn.test()"]:1: in main chunk
[C]: in function 'pcall'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:117: in main chunk
[C]: at 0x0101c582f0
--------------------------------------------------------------------------------
Hi,
Does the LSTM support mini-batches directly (for performance benchmarking)? I tried to implement this. It didn't raise an error, but the results seem to be inconsistent. (I am also confused because the LSTM code states that the expected input is either 1D or 2D, but in the Penn Treebank example mini-batches can be used. Here I want to time forward and backward separately, though.) If I input identical sequences in one batch, I get different outputs.
BTW: Thanks for making this available to the public!
Here is a minimal code example of what I mean (the MaskZero part can also be removed):
require "rnn"
require "cunn"
torch.manualSeed(123)
batch_size= 2
maxLen = 4
wordVec = 5
nWords = 100
mode = 'CPU'
-- create random data with zeros as empty indicator
inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords) --
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s
-- not all sequences have the same length; 0 is the placeholder
for i=1, batch_size do
n_zeros = torch.random(maxLen-2)
inp1[{{i},{1, n_zeros}}] = torch.zeros(n_zeros)
end
-- make the first sequence the same as the second
inp1[{{2},{}}] = inp1[{{1},{}}]:clone()
lstm = nn.Sequential()
lstm:add(nn.LookupTableMaskZero(10000, wordVec, batch_size)) -- convert indices to word vectors
lstm:add(nn.SplitTable(1)) -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.MaskZero(nn.LSTM(wordVec, wordVec), 1))) -- Seq to Seq', 0-Seq to 0-Seq
if mode == 'GPU' then
lstm:cuda()
criterion:cuda()
labels = labels:cuda()
inp1 = inp1:cuda()
end
out = lstm:forward(inp1)
print('input 1', inp1[1])
print('lstm out 1', out[1])
print('input 2', inp1[2]) -- should be the same as above
print('lstm out 2', out[2]) -- should be the same as above
output:
input 1 0
0
29
43
[torch.DoubleTensor of size 4]
lstm out 1 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000
-0.0226 0.0012 0.1373 0.0064 0.0766
0.1174 0.1793 0.0684 0.0029 0.0138
[torch.DoubleTensor of size 4x5]
input 2 0
0
29
43
[torch.DoubleTensor of size 4]
lstm out 2 0.0000 0.0000 0.0000 0.0000 0.0000
0.0000 0.0000 0.0000 0.0000 0.0000
-0.0325 0.0143 0.2019 0.0113 0.1202
0.1606 0.2348 0.1093 0.0045 0.0208
[torch.DoubleTensor of size 4x5]
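One likely cause: inp1 is batch_size x maxLen, so after the lookup table the tensor is batch_size x maxLen x wordVec, and nn.SplitTable(1) splits along the batch dimension. The Sequencer then sees two "time steps" (one per sequence), each of size maxLen x wordVec (matching the 4x5 outputs above), and the second is processed with the LSTM state left over from the first, which would explain the different outputs. Splitting along the time dimension instead (nn.SplitTable(2) here, or transposing the input to maxLen x batch_size first) should give one batch_size x wordVec tensor per step. The plain-Lua sketch below only illustrates what splitting a 2-D table along each dimension yields; splitTable is a hypothetical stand-in for nn.SplitTable.

```lua
-- Hypothetical stand-in for nn.SplitTable on a 2-D nested table m[i][j]:
-- dim = 1 returns the rows, dim = 2 returns the columns.
local function splitTable(m, dim)
  local parts = {}
  if dim == 1 then
    for i = 1, #m do
      local row = {}
      for j = 1, #m[i] do row[j] = m[i][j] end
      parts[i] = row
    end
  else
    for j = 1, #m[1] do
      local col = {}
      for i = 1, #m do col[i] = m[i][j] end
      parts[j] = col
    end
  end
  return parts
end

-- batch_size = 2 sequences (rows) of maxLen = 4 word indices (columns)
local inp = {
  {0, 0, 29, 43},
  {0, 0, 29, 43},
}
local byBatch = splitTable(inp, 1)  -- one element per sequence
local byTime  = splitTable(inp, 2)  -- one element per time step
print(#byBatch, #byTime)            -- 2  4
```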
I'm working on a simple LSTM example, and I have an issue with model validation.
My idea is to train with mini-batches and validate at each epoch. Validation is done example by example rather than in batches, so that I can use the same code base for prediction.
However, I got this error:
th lstm_early_stop.lua
error for iteration 100 is 0.11727129280201
/Users/fabiofumarola/torch/install/bin/luajit: ...biofumarola/torch/install/share/lua/5.1/nn/CAddTable.lua:12: inconsistent tensor size at /Users/fabiofumarola/torch/pkg/torch/lib/TH/generic/THTensorMath.c:456
stack traceback:
[C]: in function 'add'
...biofumarola/torch/install/share/lua/5.1/nn/CAddTable.lua:12: in function 'updateOutput'
...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...ofumarola/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function 'updateOutput'
...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'updateOutput'
...s/fabiofumarola/torch/install/share/lua/5.1/rnn/LSTM.lua:162: in function 'updateOutput'
...iofumarola/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
lstm_early_stop.lua:85: in function 'validate'
lstm_early_stop.lua:106: in main chunk
[C]: in function 'dofile'
...rola/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x010e1c1180
I suppose I am missing something about the initialisation of the LSTM's internal states. Can someone help me with this?
require 'rnn'
require 'optim'
batchSize = 50
rho = 10
hiddenSize = 64
inputSize = 4
outputSize = 1
seriesSize = 10000
seriesEval = 1000
model = nn.Sequential()
model:add(nn.FastLSTM(inputSize, hiddenSize, rho))
--model:add(nn.Linear(inputSize, hiddenSize))
--model:add(nn.Tanh())
model:add(nn.Linear(hiddenSize, outputSize))
criterion = nn.MSECriterion()
-- dummy dataset (task: predict the next item)
dataset = torch.randn(seriesSize, inputSize)
evalset = torch.randn(seriesEval, inputSize)
-- define the index of the batch elements
offsets = {}
for i= 1, batchSize do
table.insert(offsets, math.ceil(math.random() * batchSize))
end
offsets = torch.LongTensor(offsets)
-- method to compute a batch
function nextBatch()
--get a batch of inputs
local inputs = dataset:index(1, offsets)
-- shift of one batch indexes
offsets:add(1)
for j=1,batchSize do
if offsets[j] > seriesSize then
offsets[j] = 1
end
end
-- a batch of targets
local targets = dataset[{{},{outputSize}}]:index(1,offsets)
return inputs, targets
end
-- get weights and loss wrt weights from the model
x, dl_dx = model:getParameters()
-- In the following code, we define a closure, feval, which computes
-- the value of the loss function at a given point x, and the gradient of
-- that function with respect to x. x is the vector of trainable weights;
-- the closure extracts a mini-batch via the nextBatch method
feval = function(x_new)
-- copy the weights if they have changed
if x ~= x_new then
x:copy(x_new)
end
-- select a training batch
local inputs, targets = nextBatch()
-- reset gradients (gradients are always accumulated, to accommodate
-- batch methods)
dl_dx:zero()
-- evaluate the loss function and its derivative wrt x, given mini batch
local prediction = model:forward(inputs)
local loss_x = criterion:forward(prediction, targets)
model:backward(inputs, criterion:backward(prediction, targets))
return loss_x, dl_dx
end
--function for validation
validate = function(data)
local maxPosition = data:size()[1] - 1
local cumulatedError = 0
for i = 1, maxPosition do
local x = data[i]
local y = torch.DoubleTensor{data[i+1][4]}
local prediction = model:forward(x)
local err = criterion:forward(prediction, y)
cumulatedError = cumulatedError + err
end
return cumulatedError / maxPosition
end
sgd_params = {
learningRate = 0.1,
learningRateDecay = 1e-4,
weightDecay = 0,
momentum = 0
}
lr = 0.1
for i = 1, 10e3 do
-- train a mini_batch of batchSize in parallel
_, fs = optim.sgd(feval,x, sgd_params)
if sgd_params.evalCounter % 100 == 0 then
print('error for iteration ' .. sgd_params.evalCounter .. ' is ' .. fs[1] / rho)
local validationError = validate(evalset)
print('error on validation ' .. validationError)
end
end
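FastLSTM remembers its previous output, so after training on batches of 50 rows its state is 50 x hiddenSize. A single 1-D validation example then presumably meets that stale state inside the LSTM's CAddTable, which matches the "inconsistent tensor size" error. Calling forget() on the recurrent module (model.modules[1]:forget() in this code) before validation, and between independent sequences, is the usual way to reset it. The toy below only mimics the mechanism in plain Lua; ToyRecurrent is hypothetical and much simpler than rnn's implementation.

```lua
-- Hypothetical toy recurrent module: it remembers the previous output,
-- whose size is tied to the batch size of the previous call.
local ToyRecurrent = {}
ToyRecurrent.__index = ToyRecurrent

function ToyRecurrent.new()
  return setmetatable({ prevOutput = nil }, ToyRecurrent)
end

-- input is a list of scalars, one per batch row
function ToyRecurrent:forward(input)
  local out = {}
  for b = 1, #input do
    local prev = 0
    if self.prevOutput then
      -- mimics CAddTable: input and previous state must have the same size
      assert(#self.prevOutput == #input, "inconsistent tensor size")
      prev = self.prevOutput[b]
    end
    out[b] = input[b] + 0.5 * prev
  end
  self.prevOutput = out
  return out
end

-- reset the hidden state, in the spirit of AbstractRecurrent:forget()
function ToyRecurrent:forget()
  self.prevOutput = nil
end

local r = ToyRecurrent.new()
r:forward({1, 2, 3})                  -- "training" batch of 3
local ok = pcall(r.forward, r, {1})   -- validation example of size 1
print(ok)                             -- false: stale state of size 3
r:forget()
print(r:forward({1})[1])              -- works after forget()
```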
This is my first time toying around with Torch modules and the like, so there's a good chance I'm overlooking something obvious. I was trying to implement an attention model, but when testing the gradients using optim.checkgrad, they didn't match. I later realised that even for this simple model, I can't get them to match:
nn = require 'nn'
require 'rnn'
require 'optim'
hiddenSize = 2
nIndex = 2
r = nn.Recurrent(hiddenSize, nn.LookupTable(nIndex, hiddenSize),
nn.Linear(hiddenSize, hiddenSize))
rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
function f(x)
parameters:copy(x)
-- Do the forward prop
rnn:zeroGradParameters()
local err = 0
for i = 1, sequence:size(1) - 1 do
local output = rnn:forward(sequence[i])
err = err + criterion:forward(output, sequence[i + 1])
local gradOutput = criterion:backward(output, sequence[i + 1])
rnn:backward(sequence[i], gradOutput)
end
r:backwardThroughTime()
r:forget()
return err, grads
end
parameters, grads = rnn:getParameters()
sequence = torch.Tensor{1, 2, 1, 2}:resize(4, 1)
local err = optim.checkgrad(f, parameters:clone())
print(err)
This gives me errors anywhere between 0.1 and 0.01, which is way too big. After some digging I got to these lines in Recurrent.lua. Removing them seems to fix the problem, bringing the gradient error down to around 1e-7.
-- startModule's gradParams shouldn't be step-averaged
-- as it is used only once. So un-step-average it
local params, gradParams = self.startModule:parameters()
if gradParams then
for i,gradParam in ipairs(gradParams) do
gradParam:mul(rho)
end
end
I fail to see where gradParams get averaged, so I don't really understand the logic behind these lines. They seem to just scale the gradients for the initial hidden state by the number of steps?
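For reference, optim.checkgrad compares the analytic gradient with a central finite-difference estimate and reports a relative error of the form norm(dC - dC_est) / norm(dC + dC_est). A plain-Lua sketch of the same idea on a toy quadratic loss, showing that multiplying part of an otherwise correct gradient by a step count rho makes the relative error large. If, as the question suggests, those lines rescale a gradient that was never averaged, the effect would resemble this toy. checkgrad and f here are hypothetical re-implementations, not the optim code.

```lua
-- Hypothetical central-difference gradient check.
-- f(x) must return (loss, grad) where x and grad are Lua arrays.
local function checkgrad(f, x, eps)
  eps = eps or 1e-6
  local _, dC = f(x)
  local num, den = 0, 0
  for i = 1, #x do
    local orig = x[i]
    x[i] = orig + eps
    local c1 = f(x)
    x[i] = orig - eps
    local c2 = f(x)
    x[i] = orig
    local est = (c1 - c2) / (2 * eps)   -- numerical estimate of dC/dx_i
    num = num + (dC[i] - est) ^ 2
    den = den + (dC[i] + est) ^ 2
  end
  return math.sqrt(num) / math.sqrt(den)
end

-- Toy loss C(x) = sum_i x_i^2 with exact gradient 2 * x_i
local function f(x)
  local loss, grad = 0, {}
  for i = 1, #x do
    loss = loss + x[i] ^ 2
    grad[i] = 2 * x[i]
  end
  return loss, grad
end

-- Same loss, but the first gradient entry (think: the start module's
-- parameters) is multiplied by a step count rho
local rho = 5
local function fScaled(x)
  local loss, grad = f(x)
  grad[1] = grad[1] * rho
  return loss, grad
end

local x = {0.3, -0.7, 1.1}
print(checkgrad(f, x))        -- close to 0
print(checkgrad(fScaled, x))  -- clearly non-zero
```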
Is there a Google Group or documentation for this repository?
I reckon it would be easier to use the examples if they didn't require learning dp to understand them fully.
Basically, one currently has to learn both rnn and dp together to understand the examples. I reckon the learning curve would be faster if the examples in rnn assumed knowledge of only rnn and core Torch libraries such as nn.
Hi,
Thanks for the great RNN package! I am trying to implement a many-to-one RNN, specifically an LSTM, where each input sequence produces only a single output, and I am finding it difficult to use the rnn package for this case. This is useful, e.g., in sentiment analysis, where each review (a set of words) is mapped to a sentiment (positive or negative).
Any help with this would be appreciated. I am not sure I should have raised an issue for this, since it is a personal question, but I hope others will benefit too.
Thanks a lot for your help!
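One way to build a many-to-one model with this package is a Sequencer followed by nn.SelectTable(-1), so only the last step's output reaches the classifier. In the backward pass, SelectTable passes the gradient to the selected element and zeros to the others. A plain-Lua sketch of the per-step gradient table that results; manyToOneGradOutputs is a hypothetical helper.

```lua
-- Hypothetical helper: the per-step gradients a Sequencer would receive in a
-- many-to-one setup, where only the last step's output feeds the loss.
-- gradLast is the criterion's gradient w.r.t. the final output.
local function manyToOneGradOutputs(seqLen, outSize, gradLast)
  local gradOutputs = {}
  for t = 1, seqLen do
    local g = {}
    for k = 1, outSize do
      -- earlier steps receive zero gradient; only the last is non-zero
      g[k] = (t == seqLen) and gradLast[k] or 0
    end
    gradOutputs[t] = g
  end
  return gradOutputs
end

local grads = manyToOneGradOutputs(4, 2, {0.25, -0.5})
for t = 1, #grads do print(t, grads[t][1], grads[t][2]) end
```

Even though the earlier steps get a zero gradOutput, backpropagation through time still carries the last step's gradient back through the recurrent connections to every step.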
Hi,
thanks for the great recurrent package. I'm still new to Torch, so the problem is probably in my usage. However, when I try to stack LSTMs into a multilayer network using Sequencer, the weights keep growing. Without Sequencer, with only one recurrent layer, it works fine.
network:
-- LSTM
recurrent = nn.LSTM(inSize, hiddenSize1, rho)
recurrent2 = nn.LSTM(hiddenSize1, hiddenSize2, rho)
recurrent.scales = torch.Tensor(rho):fill(1)
recurrent2.scales = torch.Tensor(rho):fill(1)
linear = nn.Linear(hiddenSize2, outSize)
model = nn.Sequential()
sequencer = nn.Sequencer(recurrent)
sequencer2 = nn.Sequencer(recurrent2)
sequencer3 = nn.Sequencer(linear)
model:add(sequencer)
model:add(sequencer2)
model:add(sequencer3)
-- TRAINING
for t = 1, (trainSize - batchSize) do
-- load new sample
local inputs, targets, gradOutputs = {}, {}, {}
for step = 1, rho do
local index = t + step
inputs[step] = inputData:sub(1, inputData:size(1), index, index + batchSize - 1):transpose(1,2)
targets[step] = outputData:sub(1, outputData:size(1), index, index + batchSize - 1):transpose(1, 2)
end
local outputs = model:forward(inputs)
for step = 1, rho do
local err = criterion:forward(outputs[step], targets[step])
trainError = trainError + err
gradOutputs[step] = criterion:backward(outputs[step], targets[step])
end
model:backward(inputs, gradOutputs)
model:updateParameters(lrRNN)
linear:zeroGradParameters()
end
I have tried lowering the learning rate (1e-2, 1e-3, 1e-4) and varying the scales (1, 1/rho), the batch size (1, 10) and rho (1, 5, 10, 100), but none of these modifications seemed to help. Any ideas?
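In the loop above only linear:zeroGradParameters() is called, so the gradients of the two LSTMs are never reset and would keep accumulating across iterations; calling model:zeroGradParameters() each iteration should zero them all. A plain-Lua toy showing why accumulation makes updates grow: with a constant gradient of 1, the n-th step without zeroing moves the weight by lr * n instead of lr. The train function is hypothetical and only models the bookkeeping.

```lua
-- Toy SGD on one weight with a constant per-step gradient of 1.
-- 'zeroEachStep' mimics calling zeroGradParameters() before backward().
local function train(nSteps, lr, zeroEachStep)
  local w, gradAcc = 0, 0
  for _ = 1, nSteps do
    if zeroEachStep then gradAcc = 0 end
    gradAcc = gradAcc + 1        -- backward() accumulates into the gradient
    w = w - lr * gradAcc         -- updateParameters(lr)
  end
  return w
end

print(train(10, 0.1, true))   -- about -1.0 (10 steps of size 0.1)
print(train(10, 0.1, false))  -- about -5.5 (0.1 * (1 + 2 + ... + 10))
```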
I am trying to test a simple network (to get the last state of an LSTM):
local sequenceLength = math.random(5, 10)
local vectorSize = math.random(5, 10)
local hiddenSize = math.random(5, 10)
local input = torch.rand(sequenceLength, vectorSize)
-- testJacobian doesn't support table inputs so use a split table module on the input.
local module = nn.Sequential()
module:add(nn.SplitTable(1, 2))
module:add(nn.Sequencer(nn.FastLSTM(vectorSize, hiddenSize)))
module:add(nn.SelectTable(-1))
nn.Jacobian.testJacobian(module, input)
nn.Jacobian.testJacobian fails at https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L61
Am I missing something?
Can you provide a pointer to solve this?
Thanks
Question: Why does the LSTM implementation not inherit from Recurrent? It seems that (something like) the following is equivalent to the current LSTM implementation, but avoids a lot of code duplication.
nn = require 'nn'
require 'rnn'
local hiddenSize = 2
local nIndex = 2
-- A silly hack to make sure LSTM.recurrentModule is fed zeros at step 1
local Start = torch.class('nn.Start', 'nn.Identity')
function Start:updateOutput(input)
self.output = {input, torch.zeros(2), torch.zeros(2)}
return self.output
end
function Start:updateGradInput(input, gradOutput)
self.gradInput = gradOutput[1]
return self.gradInput
end
-- The LSTM network
-- The input and feedback modules are unused
-- Merge basically turns {input, {output, cell}} into {input, output, cell}
-- The transfer module is the full LSTM module
local r = nn.Recurrent(nn.Start(), nn.Identity(), nn.Identity(),
nn.LSTM(hiddenSize, hiddenSize).recurrentModule,
9999, nn.FlattenTable())
local rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.SelectTable(1)) -- Since both the output and the cell are given
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())
I'm confused about how to use this library in a way that is API-compatible with the rest of Torch. For other parts of Torch, e.g. simple MLPs, the standard pattern for training models seems to be something like:
local parameters, gradParameters = model:getParameters()
local inputs, targets = getMiniBatch()
local function fEval(x)
if parameters ~= x then parameters:copy(x) end
model:zeroGradParameters()
local output = model:forward(inputs)
local err = criterion:forward(output, targets)
local df_do = criterion:backward(output, targets)
model:backward(inputs, df_do)
return err, gradParameters
end
optim.optimMethod(fEval, parameters)
I'd like to use this package's RNN code but train it with training code I already have. However, it seems that backward() doesn't do anything and we have to call backwardThroughTime() instead.
Thanks,
David
Hi, are you planning to add support for using the library with OpenCL through cltorch and clnn?
Thanks
Hi,
thanks for the nice package for Torch.
I was trying to reproduce your RAM results on the MNIST dataset (< 1% error), but I am not able to: the early-stopping criterion ends training after ~800 epochs at 98.6%. Could you provide the exact parameters that were used for training?
My second question is about using LSTM. How can this model be used with LSTM units? Is it just a matter of replacing nn.Recurrent with nn.LSTM, or is something more needed (I am not able to run such a model)?
Thank you for this awesome package. Much wow.
I cannot use dp, on which all the examples are based (#60).
So I have to make sure optim is supported for architectures like these:
Sequencer(
Sequential(
module1,
LSTM,
module2,
...
)
)
Sequential(
module1,
Sequencer(
LSTM
),
module2,
...
)
The second one was suggested for a many-to-one transducer in #21.
AbstractRecurrent says that BPTT happens in updateParameters() (https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L87). But optim manipulates the parameters directly and never calls updateParameters().
UPDATE: Sorry, I was a little confused about how similar things happen in different places.
Apparently, as long as you decorate with the Sequencer container, optim should work fine.
optim just wants you to call backward() on your net and provide the gradParams. backward() on your Sequencer should handle all the BPTT:
Module:backward() ->
Sequencer:updateGradInput() -> BPTT( LSTM:updateGradInput() )
Sequencer:accGradParameters() -> BPTT( LSTM:accGradParameters() )
If someone with a bit more insight could verify this, I would be super happy. Thanks!
Going to close this issue then.
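In nn, Module:backward(input, gradOutput, scale) calls updateGradInput and then accGradParameters, which is why an optim-style closure that calls backward() and returns the gradient vector is enough once Sequencer handles BPTT. A plain-Lua toy of that decomposition for a scalar recurrence h_t = w * h_{t-1} + x_t (with h_0 = 0), where the backward pass visits the steps in reverse order. ToySequencer is hypothetical and far simpler than the real Sequencer.

```lua
local ToySequencer = {}
ToySequencer.__index = ToySequencer

function ToySequencer.new(w)
  return setmetatable({ w = w, gradW = 0 }, ToySequencer)
end

-- forward over a table of scalar inputs, storing hidden states for BPTT
function ToySequencer:forward(inputs)
  self.h = { [0] = 0 }
  local outputs = {}
  for t = 1, #inputs do
    self.h[t] = self.w * self.h[t - 1] + inputs[t]
    outputs[t] = self.h[t]
  end
  return outputs
end

-- gradient w.r.t. inputs, walking the steps backwards (BPTT)
function ToySequencer:updateGradInput(inputs, gradOutputs)
  self.gradInput, self.gradH = {}, {}
  local fromNext = 0
  for t = #inputs, 1, -1 do
    local g = gradOutputs[t] + fromNext
    self.gradH[t] = g
    self.gradInput[t] = g           -- dh_t/dx_t = 1
    fromNext = g * self.w           -- dh_t/dh_{t-1} = w
  end
  return self.gradInput
end

-- accumulate dL/dw using the per-step gradients computed above
function ToySequencer:accGradParameters(inputs, gradOutputs, scale)
  scale = scale or 1
  for t = 1, #inputs do
    self.gradW = self.gradW + scale * self.gradH[t] * self.h[t - 1]
  end
end

-- mirrors nn.Module:backward: updateGradInput, then accGradParameters
function ToySequencer:backward(inputs, gradOutputs, scale)
  self:updateGradInput(inputs, gradOutputs)
  self:accGradParameters(inputs, gradOutputs, scale)
  return self.gradInput
end

local seq = ToySequencer.new(0.5)
local inputs = {1, 2, 3}
seq:forward(inputs)
-- loss L = h_3, so only the last step has a non-zero gradOutput
seq:backward(inputs, {0, 0, 1})
print(seq.gradW, seq.gradInput[1])
```

Here L = w^2 * x_1 + w * x_2 + x_3, so dL/dw = 2 * w * x_1 + x_2 = 3 and dL/dx_1 = w^2 = 0.25, which is what the reverse loop accumulates.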
Hi,
In our model there is a Sequencer with a dropout module.
In the testing phase we call model:evaluate(), but in the Sequencer the sharedClones field is not updated correctly: the train field is false only in the first module of sharedClones and remains true in the rest. As a result, the dropout module keeps its training behavior instead of its evaluation behavior.
Could you please check it out?
Many thanks for your help,
Einat
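evaluate() only has the intended effect if every clone used at each time step sees train = false. A plain-Lua sketch of propagating the flag to all shared clones; the container and sharedClones structure here is a simplified, hypothetical stand-in for Sequencer's internals, not the actual fix in rnn.

```lua
-- Hypothetical toy Dropout-like module whose behavior depends on 'train'
local function newDropout()
  return { train = true }
end

-- Hypothetical container holding one clone per time step
local container = { train = true, sharedClones = {} }
for t = 1, 3 do container.sharedClones[t] = newDropout() end

-- Buggy variant: only the first clone is switched
local function evaluateFirstOnly(c)
  c.train = false
  c.sharedClones[1].train = false
end

-- Fixed variant: every clone gets the same mode as the container
local function setMode(c, train)
  c.train = train
  for _, clone in pairs(c.sharedClones) do
    clone.train = train
  end
end

evaluateFirstOnly(container)
print(container.sharedClones[2].train)  -- true: step 2 still in training mode
setMode(container, false)
print(container.sharedClones[2].train)  -- false
```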
Hello, thank you for the great project!
After I trained a model using recurrent-visual-attention.lua,
GRU is an efficient model that can be used in place of LSTM.
Hi,
I have the following architecture:
lstm_seq = nn.Sequential()
lstm_seq:add(nn.Sequencer(l))
lstm_seq:add(nn.SelectTable(args.state_dim))
lstm_seq:add(nn.Linear(n_hid, n_hid))
lstm_seq:add(nn.Rectifier())
parallel_flows = nn.ParallelTable()
for f=1, 2 do
parallel_flows:add(lstm_seq:clone("weight","bias"))
end
lstm = nn.Sequential()
lstm:add(parallel_flows)
If I check the parameters by using:
w, dw = lstm:getParameters()
I get inconsistent sizes (dw seems to have almost twice the number of params as w).
However, when I turn off sharing params (in lstm_seq:clone()), the sizes are consistent. Do you have any idea why?
Thanks!
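A likely explanation: clone("weight", "bias") shares only the weights and biases, so getParameters() sees the shared weight storage once while each clone still has its own gradWeight and gradBias, which makes dw larger than w. Passing "gradWeight", "gradBias" to clone() as well is the usual way to keep the two consistent. The plain-Lua toy below counts parameters the way a flattening step that de-duplicates shared storages would; countUnique and newModule are hypothetical, and Lua tables stand in for tensor storages.

```lua
-- Count parameters, counting each shared storage (table) only once.
local function countUnique(tensors)
  local seen, n = {}, 0
  for _, t in ipairs(tensors) do
    if not seen[t] then
      seen[t] = true
      n = n + #t
    end
  end
  return n
end

-- One module with a 3-element weight and its own 3-element gradWeight
local function newModule(weight)
  return { weight = weight, gradWeight = {0, 0, 0} }
end

local sharedWeight = {0.1, 0.2, 0.3}
-- two clones sharing 'weight' but NOT 'gradWeight'
local clones = { newModule(sharedWeight), newModule(sharedWeight) }

local ws, dws = {}, {}
for _, m in ipairs(clones) do
  table.insert(ws, m.weight)
  table.insert(dws, m.gradWeight)
end
print(countUnique(ws), countUnique(dws))  -- 3  6

-- sharing gradWeight too makes the sizes match
clones[2].gradWeight = clones[1].gradWeight
dws = { clones[1].gradWeight, clones[2].gradWeight }
print(countUnique(ws), countUnique(dws))  -- 3  3
```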
I want to recognize online handwritten characters with your LSTM. Is there an example? The example at https://github.com/nicholas-leonard/dp/blob/master/examples/recurrentlanguagemodel.lua is a language model and is not suited to my task.
Hey guys,
I am currently writing and testing some minimal examples for this repo.
I have trouble understanding how I should handle Sequencer in models, e.g. nested Sequencers.
Can anybody tell me how to get the inner LSTMs to work?
require 'nn'
require 'rnn'
local inputsize = 10
local outputsize = 12
local inputdata_t = torch.rand(10)
local innermodel =
nn.Sequencer(
nn.Sequential()
:add(nn.FastLSTM(inputsize, 2))
:add(nn.FastLSTM(2, 5))
)
local model =
nn.Sequential()
:add(nn.CAddTable())
:add(innermodel)
:add(nn.Linear(5,12))
model = nn.Recurrence(model, 12, 1)
local inputs = {}
for ii=1,3 do
table.insert(inputs, inputdata_t[ii])
end
local outputs = model:forward(inputs)
I guess I need to wrap the complete module in another Sequencer? But how do I use different rhos for different modules in the same model?
Is there some kind of hold unit that catches the outputs?
Thanks for helping
I haven't been able to figure out what's going on here... With the new LinearBias I can get one step to give the correct gradient, but as soon as I have multiple steps, weird stuff starts happening. The numerical gradients are generally much, much larger (1e3 vs. 1e-3). Any ideas as to what is happening?
nn = require 'nn'
require 'rnn'
require 'optim'
hiddenSize = 2
nIndex = 2
r = nn.LSTM(hiddenSize, hiddenSize)
rnn = nn.Sequential()
rnn:add(r)
rnn:add(nn.Linear(hiddenSize, nIndex))
rnn:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
function f(x)
parameters:copy(x)
-- Do the forward prop
rnn:zeroGradParameters()
-- With or without fastBackward doesn't matter
r.fastBackward = false
local err = 0
for i = 1, inputs:size(1) do
local output = rnn:forward(inputs[i])
err = err + criterion:forward(output, targets[i])
local gradOutput = criterion:backward(output, targets[i])
rnn:backward(inputs[i], gradOutput)
end
r:backwardThroughTime()
r:forget()
return err, grads
end
parameters, grads = rnn:getParameters()
-- This works:
-- targets = torch.Tensor{1}:resize(1, 1)
-- inputs = torch.randn(1, 2)
targets = torch.Tensor{1, 2}:resize(2, 1)
inputs = torch.randn(2, 2)
local err, dC, dC_est = optim.checkgrad(f, parameters:clone())
-- Print the exact and numerical gradients side by side
print(torch.cat(dC:view(dC:size(1), 1), dC_est:view(dC_est:size(1), 1), 2))
assert(err < 0.0001, "failed")
print("passed")
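For reference, the numerical column in a gradient check is a finite-difference estimate of the same derivative as the analytic column. The pure-Lua sketch below uses a central difference on a toy objective, f(x) = sum of x_i^2, whose exact gradient is 2*x_i. `numericalGradient` and `sumOfSquares` are hypothetical names, and this is not a claim about how optim.checkgrad is implemented internally. For the estimate to be meaningful, every call of f has to start from the same state. For a recurrent module, that means forgetting between evaluations, which the f above does with r:forget().

```lua
-- Central-difference estimate: (f(x + eps*e_i) - f(x - eps*e_i)) / (2*eps)
local function numericalGradient(f, x, eps)
  eps = eps or 1e-6
  local grad = {}
  for i = 1, #x do
    local orig = x[i]
    x[i] = orig + eps
    local fPlus = f(x)
    x[i] = orig - eps
    local fMinus = f(x)
    x[i] = orig  -- restore the perturbed coordinate before moving on
    grad[i] = (fPlus - fMinus) / (2 * eps)
  end
  return grad
end

-- Toy objective f(x) = sum_i x_i^2, with exact gradient 2 * x_i
local function sumOfSquares(x)
  local s = 0
  for i = 1, #x do s = s + x[i] * x[i] end
  return s
end

local x = {1.0, -2.0, 0.5}
local numeric = numericalGradient(sumOfSquares, x)
for i = 1, #x do
  print(2 * x[i], numeric[i])  -- exact vs. estimated, side by side
end
```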
Hi,
It seems that the library does not implement parameters() from nn.Module, so I can hardly access the parameters of RNN modules without changing the code of the library itself.
I'm wondering if there is any way to get at the parameters from outside the class, like getParameters() in nn.Module?
Edward
The Recurrent class of 'rnn' does not allow training updates on sequences of length 1. Minimal example:
require 'rnn'
x = torch.rand(200)
target = torch.rand(1)
rho = 5
hiddenSize = 100
-- RNN
r = nn.Recurrent(
hiddenSize, nn.Linear(200,hiddenSize),
nn.Linear(hiddenSize, hiddenSize), nn.Sigmoid(),
rho
)
seq = nn.Sequential()
seq:add(r)
seq:add(nn.Linear(hiddenSize, 1))
criterion = nn.MSECriterion()
output = seq:forward(x)
err = criterion:forward(output,target)
gradOutput = criterion:backward(output,target)
seq:backward(x,gradOutput)
seq:updateParameters(0.01)
As far as I understand, this should not be an issue, yet running it gives something like:
/Users/hroosterhuis/torch/install/bin/luajit: /Users/hroosterhuis/torch/install/share/lua/5.1/nn/Add.lua:62: bad argument #1 to 'size' (dimension 1 out of range of 0D tensor at /Users/hroosterhuis/torch/pkg/torch/generic/Tensor.c:17)
stack traceback:
[C]: in function 'size'
/Users/hroosterhuis/torch/install/share/lua/5.1/nn/Add.lua:62: in function 'accGradParameters'
...s/hroosterhuis/torch/install/share/lua/5.1/nn/Module.lua:53: in function 'accUpdateGradParameters'
...oosterhuis/torch/install/share/lua/5.1/rnn/Recurrent.lua:247: in function 'accUpdateGradParametersThroughTime'
...is/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:73: in function 'backwardUpdateThroughTime'
...is/torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:83: in function 'updateParameters'
...roosterhuis/torch/install/share/lua/5.1/nn/Container.lua:31: in function 'updateParameters'
testRNN.lua:26: in main chunk
[C]: in function 'dofile'
...huis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x0103773780
Hi,
I was looking to implement an encoder-decoder LSTM architecture (like http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf). But the problem I have is that there doesn't seem to be a good way to pass the output of the encoder network to the decoder network as the hidden state.
More precisely, in LSTM:updateOutput, prevOutput is initialized to zero:
if self.step == 1 then
prevOutput = self.zeroTensor
However, I would need a way to pass output[-1] from the encoder network into the decoder network as prevOutput. Of course, I will also need the gradients to flow back into the encoder properly.
Is there a way to achieve this setup with your current architecture?
Thanks a lot!
Hi,
probably a dumb issue, but still can't figure it out -
trying to init a GRU (e.g., r = nn.GRU(1,1)) gives an 'attempt to call field 'GRU' (a nil value)' error.
Same code for LSTM works fine. Would appreciate any help.
In this section of the code:
https://github.com/Element-Research/rnn/blob/master/AbstractRecurrent.lua#L113
function AbstractRecurrent:forget(offset)
offset = offset or 0
if self.train ~= false then
-- bring all states back to the start of the sequence buffers
local lastStep = self.step - 1
if lastStep > self.rho + offset then
local i = 1 + offset
forget() checks the boolean value self.train, but since AbstractRecurrent's parent class is nn.Container, there is no self.train, so the code below ends up executing whether you called training() or evaluate().
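The reasoning above depends on a Lua detail. Only an explicit `false` fails the test `self.train ~= false`. A missing field is `nil`, and `nil ~= false` is true. The minimal pure-Lua illustration below uses a plain table as a stand-in for the module, not the library's class. `fakeModule` is a hypothetical name.

```lua
-- Stand-in for a module whose class never sets a `train` field
local fakeModule = {}

-- nil ~= false is true, so the guard passes while `train` is unset
local guardWhenUnset = (fakeModule.train ~= false)

-- Only an explicit false (what an evaluate() call would set) fails the guard
fakeModule.train = false
local guardWhenEvaluating = (fakeModule.train ~= false)

fakeModule.train = true
local guardWhenTraining = (fakeModule.train ~= false)

print(guardWhenUnset, guardWhenEvaluating, guardWhenTraining)
```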
Recently I updated the rnn package, and a script that had been working up to this point no longer runs.
To make sure the problem is not (completely) on my side, I tested the script from the README:
require 'rnn'
batchSize = 8
rho = 5
hiddenSize = 10
nIndex = 10000
mlp = nn.Sequential()
:add(nn.Recurrent(
hiddenSize, nn.LookupTable(nIndex, hiddenSize),
nn.Linear(hiddenSize, hiddenSize), nn.Sigmoid(),
rho)
)
:add(nn.Linear(hiddenSize, nIndex))
:add(nn.LogSoftMax())
rnn = nn.Sequencer(mlp)
criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())
-- dummy dataset (task is to predict next item, given previous)
sequence = torch.randperm(nIndex)
offsets = {}
for i=1,batchSize do
table.insert(offsets, math.ceil(math.random()*batchSize))
end
offsets = torch.LongTensor(offsets)
lr = 0.1
i = 1
while true do
-- prepare inputs and targets
local inputs, targets = {},{}
for step=1,rho do
-- a batch of inputs
table.insert(inputs, sequence:index(1, offsets))
-- increment indices
offsets:add(1)
for j=1,batchSize do
if offsets[j] > nIndex then
offsets[j] = 1
end
end
-- a batch of targets
table.insert(targets, sequence:index(1, offsets))
end
local outputs = rnn:forward(inputs)
local err = criterion:forward(outputs, targets)
print(i, err/rho)
i = i + 1
local gradOutputs = criterion:backward(outputs, targets)
rnn:backward(inputs, gradOutputs)
rnn:updateParameters(lr)
rnn:zeroGradParameters()
end
After adding the missing ')' in line 12 after rho (is this a bug?), the script should run.
Instead, it gave me the same error message as my private script, which had just recently been corrected by Mr. Léonard himself (the problem with the sequencer):
.../sebastian/Torch/install/share/lua/5.1/rnn/Recurrent.lua:148: expecting at least one updateOutput
stack traceback:
[C]: in function 'assert'
.../sebastian/Torch/install/share/lua/5.1/rnn/Recurrent.lua:148: in function 'updateGradInputThroughTime'
...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:107: in function 'backwardUpdateThroughTime'
...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:120: in function 'updateParameters'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
...an/Torch/install/share/lua/5.1/rnn/AbstractRecurrent.lua:117: in function 'updateParameters'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'func'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:25: in function 'applyToModules'
...e/sebastian/Torch/install/share/lua/5.1/nn/Container.lua:34: in function 'updateParameters'
torchrnntest.lua:55: in main chunk
[C]: in function 'dofile'
...tian/Torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00405ea0
Every luarock was updated to the latest version.
I looked in the code, and I think the problem started with commit 1fc81a4, which changed:
- if step > 1 then
- self.gradCells[step-1] = gradCell
- end
+ self.gradCells[step-1] = gradCell
I tried manipulating it but I think there is also a change in the sequencer part.
Can anybody reproduce the error?
I'm pretty (totally) new to Git and I'm just getting started with Torch and rnn, so please forgive me if this is not a bug or if I am doing something wrong.
Thank you Mr. Léonard for providing this awesome repo!
I have an example using LookupTable where the backward pass returns empty tensors (the gradients are not computed).
Is this a bug?
gradInputs :
{
1 : DoubleTensor - size: 2x5
2 :
{
1 : DoubleTensor - empty
2 : DoubleTensor - empty
3 : DoubleTensor - empty
}
}
require 'nn'
require 'rnn'
require 'cutorch'
require 'cunn'
batchSize = 2
rho = 3
embeddingSize = 4
dictionarySize = 10
nbfeatures=5
inputs, targets = {}, {} -- inputs and outputs
for i = 1, nbfeatures do
local featureTensor=torch.Tensor(batchSize,1)
for j=1,batchSize do
featureTensor[j][1]=torch.random(1,dictionarySize)
end
table.insert(inputs, featureTensor)
end
for i = nbfeatures+1, rho+nbfeatures do
local measure=torch.Tensor(batchSize)
for j=1,batchSize do
measure[j]=torch.random(1,dictionarySize)
end
table.insert(inputs, measure)
end
for i = 1, rho do
local measure=torch.Tensor(batchSize)
for j=1,batchSize do
measure[j]=torch.random(1,dictionarySize)
end
table.insert(targets, measure)
end
premodel=nn.Sequential()
b1=nn.Sequential()
b1:add(nn.NarrowTable(1,nbfeatures))
b1:add(nn.JoinTable(2)) -- ->Tensor(batchSize X nbfeatures)
b2=nn.Sequential()
b2:add(nn.NarrowTable(nbfeatures+1,rho))
c=nn.ConcatTable()
premodel:add(c)
c:add(b1)
c:add(b2) -- ->{tensorF , {list of tensor(i)}}
inputsA=premodel:forward(inputs)
print('inputsA')
print(inputsA)
model=nn.Sequential()
p=nn.ParallelTable()
p:add(nn.Identity())
p:add(nn.Sequencer(nn.LookupTable(dictionarySize, embeddingSize))) -- ->ListofTensor(batchSize X embeddingSize)
model:add(p)
SliceList=nn.ConcatTable() -- purpose: create a list tensor created by joining tensorF & tensor(i)
for i=1, rho do
local Slice =nn.Sequential()
SliceList:add(Slice)
local cc=nn.ConcatTable() -- contains the 2 tensors to join
Slice:add(cc)
local a=nn.Sequential()
cc:add(a)
a:add(nn.SelectTable(2)) -- we select list of tensor(i)
a:add(nn.SelectTable(i)) -- we select a tensor(i)
local b=nn.Sequential()
cc:add(b)
b:add(nn.SelectTable(1)) -- we select tensorF
Slice:add(nn.JoinTable(2)) -- we create a single tensor = tensorF & tensor(i)
end
model:add(SliceList)
model:add(nn.Sequencer(nn.FastLSTM(embeddingSize+nbfeatures, embeddingSize, rho)))
model:add(nn.Sequencer(nn.Linear(embeddingSize, dictionarySize)))
model:add(nn.Sequencer(nn.LogSoftMax()))
criterion = nn.SequencerCriterion(nn.ClassNLLCriterion())
prediction = model:forward(inputsA)
err = criterion:forward(prediction, targets)
print('err=' .. err)
gradOutputs = criterion:backward(prediction, targets)
gradInputs=model:backward(inputsA, gradOutputs)
print('gradInputs')
print(gradInputs)
Hey Nicholas,
Thanks for adding the BiSequencer. I see that the code contains nn.ReverseTable(), but I'm unable to find it in the nn or nnx packages. Am I missing something? I get the following error, of course:
rnn/BiSequencer.lua:48: attempt to call field 'ReverseTable' (a nil value)
Sequencers should have a method to turn forgetting between sequences on or off.
I am trying to train a stacked LSTM by calling forward/backward one step at a time. I need to do this because I run out of memory if I use a Sequencer on the whole mini-batch of sequences. If I just run the forward pass all is well; but when I add in the backward pass I get the error:
...Dropout.lua:42: bad argument #2 to 'cmul' (sizes do not match at /home/ubuntu/torch/extra/cutorch/lib/THC/THCTensorMathPointwise.cu:132)
Here is a code snippet for my model and feval function (I am using Optim for training):
model = nn.Sequential()
model:add(nn.LookupTable(vocab_size, 512))
model:add(nn.FastLSTM(512, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.FastLSTM(opt.rnn_size, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.FastLSTM(opt.rnn_size, opt.rnn_size, opt.seq_length))
model:add(nn.Dropout(opt.dropout))
model:add(nn.Linear(opt.rnn_size, vocab_size))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
model:cuda()
criterion:cuda()
x, dl_dx = model:getParameters()
function feval(x_new)
if x ~= x_new then
x:copy(x_new)
end
------------------ get minibatch -------------------
local inputs, targets = loader:next_batch(1)
------------------- forward pass -------------------
dl_dx:zero()
local loss_x = 0
outputs = {}
for i = 1,opt.seq_length do
local lst = model:forward(inputs[i])
table.insert(outputs, lst)
loss_x = loss_x + criterion:forward(lst, targets[i])
end
loss_x = loss_x / opt.seq_length
for i = opt.seq_length,1,-1 do
model:backward(inputs[i], criterion:backward(outputs[i], targets[i]))
end
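-- clamp in place so dl_dx stays a view of the model's flattened gradients
-- (reassigning it would make the next dl_dx:zero() miss the real buffer)
dl_dx:clamp(-opt.grad_clip, opt.grad_clip)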
return loss_x, dl_dx
end
Hi,
Recent changes to the Sequencer remember/forget mechanism introduced modes like "both" and "eval", which is very convenient. However, in "eval" mode, a forward step during evaluation will set the maximum number of BPTT steps (rho value) to the size of the input. Then, a subsequent epoch of training on a sequence of different size will fail in the backward step. Before the change, remember() worked fine.
The reason is probably the setting of rho in the recurrent module (in this case LSTM), which then causes the backward step during training to stop before reaching the beginning of the sequence. See LSTM:updateGradInputThroughTime().
Note: I know that the README says it is recommended to set mode="both" for LSTM, but I prefer the "eval" mode because each training example is independent. In any case, I suppose both modes should be possible for any AbstractRecurrent instance.
A minimal working example with LSTMs:
lstm = nn.LSTM(5,5)
seq = nn.Sequencer(lstm)
inputTrain = {torch.randn(5), torch.randn(5), torch.randn(5)}
inputEval = {torch.randn(5)}
modes = {'both', 'eval'}
for i, mode in ipairs(modes) do
print('\nmode: ' .. mode)
seq:remember(mode)
-- do one epoch of training
seq:training()
seq:forward(inputTrain)
seq:backward(inputTrain, inputTrain)
-- evaluate
seq:evaluate()
seq:forward(inputEval)
-- do another epoch of training
seq:training()
seq:forward(inputTrain)
-- this will fail when mode = 'eval'
seq:backward(inputTrain, inputTrain)
end
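The toy loop below illustrates the suspected mechanism. It is not the library's LSTM:updateGradInputThroughTime. It assumes that backward visits only the last `rho` steps of a sequence. If the one-step inputEval leaves rho at 1, a three-step training sequence would get a backward step only at its last step. `backwardSteps` is a hypothetical name.

```lua
-- Simplified model of truncated BPTT: visit steps seqLen down to seqLen - rho + 1
local function backwardSteps(seqLen, rho)
  local visited = {}
  for step = seqLen, math.max(seqLen - rho + 1, 1), -1 do
    table.insert(visited, step)
  end
  return visited
end

local full = backwardSteps(3, 3)       -- rho covers the whole training sequence
local truncated = backwardSteps(3, 1)  -- rho left at 1 by a length-1 eval input
print(table.concat(full, " "))       -- 3 2 1
print(table.concat(truncated, " "))  -- 3
```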
Could you look into that?
Many thanks for your help.
There are a few datasets available that are saved as tensors. I've been converting them to sequences with something like the following:
-- conversion loop
local pixels = {}
for k = 1, raster_size do
pixels[k] = torch.Tensor({raw_data[k]})
end
collectgarbage()
...
sequenced_layer = nn.Sequencer(...)
sequenced_layer:forward(pixels)
However the conversion loop seems to be relatively slow. If this is a common operation (converting Tensor data to Sequences), perhaps a faster Lua/C helper function would be a useful feature.
hi all,
just wondering if this library can be used for either multivariate or univariate time series prediction.
Is there already an example of this?
Many thanks,
best,
Andrew
Hi,
We tried to implement an LSTM for the IMDB dataset that processes a batch of multiple sequences. For some reason, backpropagation does not work. Here is a minimal sample of the code that causes the error, along with the corresponding error message. I used a Sequencer for the recurrent part, which is supposed to be compatible with the forward and backward functions (according to the documentation). Thank you for providing this module, by the way :).
require "rnn"
require "cunn"
batch_size= 5
maxLen = 17
wordVec = 128
nWords = 10000
mode = 'GPU'
inp1 = torch.ceil(torch.rand(batch_size, maxLen)*nWords) -- random word indices
labels = torch.ceil(torch.rand(batch_size)*2) -- create labels of 1s and 2s
lstm = nn.Sequential()
lstm:add(nn.LookupTable(nWords,wordVec, batch_size)) -- convert indices to word vectors
lstm:add(nn.SplitTable(1)) -- convert tensor to list of subtensors
lstm:add(nn.Sequencer(nn.LSTM(wordVec, wordVec))) -- lstm, no batch size here
lstm:add(nn.JoinTable(1)) -- stack list to tensor
lstm:add(nn.View(batch_size, -1, 128)) -- reshape to (batch_size, -1, 128); -1 infers the remaining size
lstm:add(nn.Mean(2)) -- average over words
lstm:add(nn.Linear(wordVec, 2)) -- map to 2 classes
lstm:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
if mode == 'GPU' then
lstm:cuda()
criterion:cuda()
labels = labels:cuda()
inp1 = inp1:cuda()
end
out = lstm:forward(inp1)
print('out', #out) -- prints (bsize, classes), here 5,2
print('labels', labels) -- vector of 1s and 2s with len batch size here 5
out_crit = criterion:forward(out, labels)
print('loss', out_crit) -- scalar
gradOut = criterion:backward(out, labels)
print('gradout', #gradOut) -- same as out 5,2
lstm:backward(inp1, gradOut) -- does not work
ERROR:
torch/install/share/lua/5.1/torch/Tensor.lua:460: expecting a contiguous tensor
stack traceback:
[C]: in function 'assert'
/home/../torch/install/share/lua/5.1/torch/Tensor.lua:460: in function 'view'
/home/../torch/install/share/lua/5.1/nn/View.lua:85: in function 'updateGradInput'
/home/../torch/install/share/lua/5.1/nn/Module.lua:31: in function 'backward'
/home/../torch/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
[string "require "rnn"..."]:45: in main chunk
[C]: in function 'xpcall'
/home/../torch/install/share/lua/5.1/itorch/main.lua:179: in function </home/../torch/install/share/lua/5.1/itorch/main.lua:143>
/home/../
The objective is to build a general recurrent library for Torch.
I am new to Lua and Torch. How can I install the library in Torch? Can I install it with luarocks install?
For example, apply ZeroGrad every 3 forwards:
nn.Periodic(nn.ZeroGrad(), 3)
Could be useful for truncated BPTT.
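nn.Periodic and nn.ZeroGrad in this request are proposed names, not existing modules. The scheduling part of the idea can be sketched in plain Lua as a closure that forwards to a function only on every n-th call. `makePeriodic` is a hypothetical helper, and the counter stands in for zeroing gradients.

```lua
-- Hypothetical helper (not part of rnn): run `fn` only on every `period`-th call
local function makePeriodic(fn, period)
  local calls = 0
  return function(...)
    calls = calls + 1
    if calls % period == 0 then
      return fn(...)
    end
  end
end

local zeroed = 0
local zeroGradEvery3 = makePeriodic(function() zeroed = zeroed + 1 end, 3)
for _ = 1, 7 do
  zeroGradEvery3()  -- stands in for one forward step
end
print(zeroed)  -- 2 (fired on calls 3 and 6)
```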
Hello! I just found this library and have been trying to use seq2seq learning to flip sequences of numbers. However, I'm stuck trying to understand how each of these tensors in the encoder-decoder example...
encInSeq
decInSeq
decOutSeq
...relates to a seq2seq model.
My current understanding is that encInSeq is the tensor input given to the encoder network, decInSeq is the tensor input given to the decoder network, and finally decOutSeq is the expected output tensor for the decoder. Is this correct? The fact that all of these tensors are 2x3 and 2x4 doesn't seem to match this understanding. I'm sorry if this question seems obvious, but I have been trying to figure this out for a few days now. (I'm new to using Torch for RNNs.) Thanks!
Thank you for this great RNN implementation.
I am trying to batch input data with a variable number of timesteps through an LSTM.
Is there a simple way to support this feature with your implementation?
I am thinking about something like the mask_zero option in http://keras.io/layers/recurrent/ and http://keras.io/layers/embeddings/, where the embedding would be an nn.LookupTable that always returns a zero-norm vector for padding.
Thanks
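As a starting point, the batch itself can be made rectangular by left-padding each index sequence with zeros, so that the real tokens line up at the last time-steps. The plain-Lua sketch below assumes left-padding and 0 as the padding value. `padLeft` is a hypothetical helper. A plain nn.LookupTable expects indices starting at 1, so the 0 entries would still need the zero-vector handling asked about here.

```lua
-- Hypothetical helper: left-pad variable-length index sequences with padValue
local function padLeft(sequences, padValue)
  padValue = padValue or 0
  local maxLen = 0
  for _, seq in ipairs(sequences) do
    maxLen = math.max(maxLen, #seq)
  end
  local padded = {}
  for i, seq in ipairs(sequences) do
    local row = {}
    for _ = 1, maxLen - #seq do
      table.insert(row, padValue)  -- padding goes before the real tokens
    end
    for _, token in ipairs(seq) do
      table.insert(row, token)
    end
    padded[i] = row
  end
  return padded
end

local batch = padLeft({{4, 7, 2}, {9}, {3, 5}})
for _, row in ipairs(batch) do print(table.concat(row, " ")) end
-- 4 7 2
-- 0 0 9
-- 0 3 5
```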
Seems like opt.accUpdate in recurrent-language-model.lua is no longer an available option, but still appears in the code?
lookup = nn.LookupTable(ds:vocabularySize(), opt.hiddenSize[1], opt.accUpdate)
acc_update = opt.accUpdate,
Below is a minimal piece of code demonstrating a strange bug I have experienced.
Here, I have a typical LSTM sequence encoder. I also have a dummy class that inherits from nn.Container. The following code crashes. However, if you move the parent:__init(self) line after mapper:forward(), everything works. It fails in dpnn.Module:
if moduleClones then
assert(self.modules == nil)
self.modules = modules
clone.modules = moduleClones
end
Since the container has a member called modules, this code crashes. It seems like the self pointer is wrong here or something. Any ideas about what is going on?
require 'nn'
require 'rnn'
local vocabSize = 25
local embeddingDim = 10
local rnnHidSize = 15
local lstm = nn.Sequencer(nn.LSTM(embeddingDim, rnnHidSize))
local mapper = nn.Sequential():add(nn.LookupTable(vocabSize,embeddingDim)):add(nn.SplitTable(2)):add(lstm):add(nn.SelectTable(-1))
--this is a minibatch of 'sentences'
local data = torch.rand(32,16):mul(vocabSize):ceil()
local NoopContainer, parent = torch.class('nn.NoopContainer', 'nn.Container')
function NoopContainer:__init()
parent:__init(self)
local length = 12
local dd = torch.rand(32,length):mul(vocabSize):ceil()
mapper:forward(dd)
end
local noop = nn.NoopContainer()
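One plausible explanation follows, offered as a hypothesis rather than a confirmed diagnosis. `parent:__init(self)` uses colon syntax, which in Lua means `parent.__init(parent, self)`. The constructor therefore runs with the class table `parent` as `self`. If that constructor sets `self.modules`, the field lands on the shared class table, where other instances can find it through their metatables. The mock below uses plain tables standing in for nn.Container to show the effect. Torch code usually writes `parent.__init(self)` instead. `Parent` and `newInstance` are hypothetical names.

```lua
-- Mock class table playing the role of nn.Container
local Parent = {}
Parent.__index = Parent
function Parent.__init(self)
  self.modules = {}  -- per-instance field, as a container constructor would set
end

local function newInstance()
  return setmetatable({}, Parent)  -- missing fields are looked up in Parent
end

-- Colon call: self inside __init is Parent, not obj
local obj = newInstance()
Parent:__init(obj)
local instanceGotModules = rawget(obj, "modules") ~= nil  -- false
local classPolluted = rawget(Parent, "modules") ~= nil    -- true

-- A brand-new instance now "has" modules through __index, which would trip
-- a check such as assert(self.modules == nil)
local fresh = newInstance()
local freshSeesModules = fresh.modules ~= nil             -- true
print(instanceGotModules, classPolluted, freshSeesModules)

-- Dot call passes the instance explicitly
Parent.modules = nil
local fixed = newInstance()
Parent.__init(fixed)
print(rawget(fixed, "modules") ~= nil, rawget(Parent, "modules") == nil)
```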
In method updateGradInput, I find:
self.updateGradInputStep = self.updateGradInputStep or self.step
The first backward pass is fine. However, if I do a second backward pass, self.updateGradInputStep is already initialized (it is 1 after the first pass), so it keeps being decremented, even to negative values. Shouldn't updateGradInputStep be reset at some point between backward passes?
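The pattern quoted above falls back to self.step only while the field is nil. In Lua only nil and false are falsy, so once the counter holds any number (including 0 or a negative value), the `or` never resets it. The simplified mock below reproduces the 1-then-negative behaviour, and it shows that setting the field back to nil restores the default. The decrement order and the `runBackwardPass` helper are assumptions, not the library's code.

```lua
-- Simplified mock: one decrement per backward step, initialised with `or`
local function runBackwardPass(self, nSteps)
  local seen = {}
  for _ = 1, nSteps do
    self.updateGradInputStep = self.updateGradInputStep or self.step
    self.updateGradInputStep = self.updateGradInputStep - 1
    table.insert(seen, self.updateGradInputStep)
  end
  return seen
end

local m = {step = 4}                  -- after 3 forward steps
local first = runBackwardPass(m, 3)   -- 3 2 1
m.step = 7                            -- 3 more forward steps
local second = runBackwardPass(m, 3)  -- 0 -1 -2: self.step is ignored
m.updateGradInputStep = nil           -- a reset between passes
local third = runBackwardPass(m, 3)   -- 6 5 4: starts from self.step again
print(table.concat(first, " "))
print(table.concat(second, " "))
print(table.concat(third, " "))
```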
Hi,
I just updated the rnn package and am now having an error from the backward step in the Sequencer module. I have a network with some LSTMs in it, wrapped inside Sequencer modules. When I try to train my network on inputs of variable length, I get this error:
Sequencer.lua:81: gradOutput should have as many elements as input
If I train my networks on inputs all the same length, I don't run into the error.
I wasn't able to replicate the problem in a simple contained network, so I cannot provide a minimal working example. However, this problem did not occur prior to update.
I do notice that before updating, my network wasn't using Recursor modules (they did not exist in my version), and after updating it is. For example, if I add a non-recurrent module inside a Sequencer (e.g. Dropout), it gets printed with a Recursor when I print the model (nn.Sequencer @ nn.Recursor @ nn.Dropout), whereas before updating it was not (nn.Sequencer @ nn.Dropout).
I know this is not a very detailed description, but do you have any idea what might be the source of this problem?
I will keep debugging this.
Platform: OSX 10.10.3.
Revision: Reasonably current (17/5) git pull of Torch7 and rnn.
Runtime: CPU (not GPU - yet).
Background: I'm modelling events that happen over a 30-day window, with each window divided into 3-minute time-steps. So each distinct user has 30 × 24 × (60/3) = 14,400 events, many of which simply encode the fact that nothing happened. I've got 60k users to look at initially and an 80:20 split between train and test, so 48k training users and 12k test users.
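For scale, the numbers above work out as follows. This is plain Lua arithmetic, with the 48k training users taken from the text.

```lua
local days, hoursPerDay, stepsPerHour = 30, 24, 60 / 3  -- 3-minute steps: 20 per hour
local stepsPerUser = days * hoursPerDay * stepsPerHour  -- 14,400 time-steps
local trainUsers = 48000                                -- 80% of 60k users
local stepsPerEpoch = trainUsers * stepsPerUser         -- 691,200,000 time-steps
print(string.format("%d steps per user", stepsPerUser))
print(string.format("%d steps per training epoch", stepsPerEpoch))
```

If the full window were back-propagated as one sequence, the unrolled history would cover all 14,400 steps per user. How much memory each step costs depends on the model and is not estimated here.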
Problem: I inevitably run out of memory about 60 minutes into training, hitting the 1 GB LuaJIT cap. Adding collectgarbage() inside the forward/backward loop helped, but it only delayed the out-of-memory error from 1 minute to 60 minutes. I'm using Tensors throughout my code, not tables.
My planned solutions: I'm working on restructuring my code to load the training data on the fly from storage and then using subsets of the data as mini-batches to hopefully remain within the 1 GB limit as well as evaluating more compressed / efficient ways of modelling the events themselves (but part of my research is to see how good LSTM is at extracting features from the events through time without having to preprocess those events..).
Question: Is this behaviour (exceeding the 1 GB luajit mem limit) simply expected behaviour in the current LSTM code for a dataset of this size - that it's using a table somewhere to maintain state (e.g. unrolling through time etc.) or is it more likely that there is a bug somewhere in my code manifesting itself as this problem and RNN / LSTM should have a stable / reasonable mem usage profile?