Comments (23)

junhyukoh commented on July 21, 2024

@mostafa-saad data is the input to the LSTM. Clip is a binary indicator of the continuity of the data (sequence).
For example, you can give two different input sequences (e.g., [1 2 3 4] and [1 2 3]) as one input blob as follows: data = [1 2 3 4 1 2 3], clip = [0 1 1 1 0 1 1]. A "0" marks the head of a sequence. By default, clip is [0 1 1 1 ... 1], which assumes that exactly one sequence, starting from its head, is given as input.
You can also do several forward passes for a very long or variable-length sequence.
For example, data = [1 2 3 4 5] can be divided into 5 forward passes as follows:

  1. data=[1], clip=[0]
  2. data=[2], clip=[1]
  3. data=[3], clip=[1]
  4. data=[4], clip=[1]
  5. data=[5], clip=[1]
Although this seems very inefficient, it is actually necessary, especially when a prediction is used as the input for the next time step (e.g., text modelling).
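The head-marking rule described above can be sketched in plain Python (an illustrative sketch only, not Caffe code; the helper name `make_clip` is made up):

```python
def make_clip(seq_lengths):
    """Return a clip indicator: 0 at the head of each sequence, 1 elsewhere."""
    clip = []
    for length in seq_lengths:
        clip.extend([0] + [1] * (length - 1))
    return clip

# Two concatenated sequences of lengths 4 and 3, as in the example above:
print(make_clip([4, 3]))  # [0, 1, 1, 1, 0, 1, 1]
```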

I guess you don't have to set "clip" explicitly, because the input is always taken from the data layer and each input sequence is complete (it starts from its head and is continuous). So the default clip value should work for your case.

from caffe-lstm.

nakosung commented on July 21, 2024

https://github.com/jeffdonahue/caffe/tree/recurrent would be helpful.


junhyukoh commented on July 21, 2024

I think it's no different from the example I provided.
You can define an LSTM layer on top of the K images in the prototxt.
As @nakosung mentioned, Jeff Donahue has another LSTM implementation (it seems it will be merged into the master branch soon) with examples on images.
You can find the prototxt in his branch.
Thanks.


mostafa-saad commented on July 21, 2024

@junhyukoh
Thanks so much. One more question: what are the inputs to your LSTM layer? In the example, it takes two inputs, data and clip?


mostafa-saad commented on July 21, 2024

Thanks. Just to make sure I understand you: assume I am extending AlexNet with one LSTM layer, and say I have 3 videos for training, one with 4 frames, another with 3 frames, and a third with 5 frames. Clip should be as follows:
clip = [0 1 1 1 0 1 1 0 1 1 1 1]?
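The proposed vector can be checked with a quick sketch (illustrative Python, not Caffe code), marking a 0 at the head of each video:

```python
clip = []
for frames in [4, 3, 5]:          # frames per training video
    clip.extend([0] + [1] * (frames - 1))
print(clip)  # [0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
```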

How can I use LevelDB to feed the clip input from disk rather than from memory? Is it possible to just provide a text file?

I am just a novice in Caffe and still learning, sorry for the many questions.


junhyukoh commented on July 21, 2024

That's correct.

The current data layer implementation (src/caffe/layers/data_layer.cpp) doesn't support clip.
So you may have to implement your own data layer whose outputs are data/clip if you want to use LevelDB.
Another way is to feed data/clip directly from your own program, as in my example code (lstm_sequence.cpp), without using LevelDB; but this doesn't run on a separate thread, so it might be slower than implementing a new data layer.


mostafa-saad commented on July 21, 2024

What about an ImageData input layer with <image, label> pairs, where the images are dummies and the labels are the binary clip input? Do you think this would work?


junhyukoh commented on July 21, 2024

I think it would work if you give the pair correctly.


mecp commented on July 21, 2024

Excuse me if this is a very simple question, but I am just starting to learn neural networks with Caffe.

Is it possible to use this network to train on a continuous sequence of 2 variables, e.g. [(2.77, 9.03), (2.01, 10.48), ...], and then predict the next element in the sequence for a supplied input?

So for training I could have the sequence [t0 ... t9] (10 time steps) as input and [t10] as the expected output, and then do prediction in the same manner.


junhyukoh commented on July 21, 2024

@mecp Yes. It's possible to train the network on multi-dimensional input/output.
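As an illustration of one way to arrange such data (a plain NumPy sketch; the shapes and window size are assumptions, not part of caffe-lstm), 10-step windows of the 2-variable series can serve as inputs, with the step that follows each window as the target:

```python
import numpy as np

series = np.random.rand(100, 2)   # 100 time steps of 2 variables
window = 10
# Each training input is a 10-step slice; the target is the next step.
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]               # the step that follows each window
print(X.shape, y.shape)           # (90, 10, 2) (90, 2)
```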


HaiboShi commented on July 21, 2024

@junhyukoh what's the difference between the batch size N_ and the sequence length T_?


junhyukoh commented on July 21, 2024

@HaiboShi In RNN training, a training example is a sequence x_{1}, x_{2}, ..., x_{T_}. We can define N_ such sequences as a mini-batch.
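Concretely (an illustrative NumPy sketch, not Caffe's internal layout), such a mini-batch can be pictured as a T_ x N_ x D array:

```python
import numpy as np

T_, N_, D = 10, 4, 32             # sequence length, batch size, feature size
batch = np.zeros((T_, N_, D))
# batch[t, n] is step t of the n-th sequence in the mini-batch:
# N_ counts independent sequences, T_ counts steps within each sequence.
print(batch.shape)                # (10, 4, 32)
```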


HaiboShi commented on July 21, 2024

@junhyukoh and the diffs of that mini-batch are summed up for updating the weights?


junhyukoh commented on July 21, 2024

@HaiboShi Yes, the diff is accumulated over the mini-batch.
However, loss layers usually pass normalized diffs to the bottom blobs (by dividing them by the size of the mini-batch).
So the weight diff is effectively normalized by the mini-batch size.
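A toy sketch of that normalization (plain NumPy with made-up numbers, not Caffe's loss-layer code): with a squared-error loss, the per-example gradient is divided by the mini-batch size before it reaches the bottom blob, so accumulating over the batch yields a mean rather than a sum:

```python
import numpy as np

pred = np.array([1.0, 2.0, 3.0, 4.0])         # one prediction per example
target = np.array([1.5, 2.0, 2.0, 5.0])
batch_size = len(pred)

per_example_diff = pred - target              # gradient of 0.5 * (pred - target)^2
bottom_diff = per_example_diff / batch_size   # what a normalizing loss layer emits

# Summing the normalized diffs over the batch equals the mean raw gradient:
print(np.isclose(bottom_diff.sum(), per_example_diff.mean()))  # True
```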


HaiboShi commented on July 21, 2024

@junhyukoh Thanks, that helps a lot. There's another specific question:
in the LSTM layer class, what does the member Blob h_to_h_; stand for? I noticed that it appears only in the backward propagation step. Thanks!


HaiboShi commented on July 21, 2024

@junhyukoh Also, it seems there's no top diff data in the backward_cpu() function; I wonder how the gradient from the layer above passes to the LSTM layer? Thanks! 💯


junhyukoh commented on July 21, 2024

@HaiboShi h_to_h_ is an intermediate blob that holds the h_{t+1} -> h_{t} gradient.
There is a top diff in the backward_cpu() function, at line 209:
Dtype* top_diff = top_.mutable_cpu_diff();
top_ shares its memory with the actual top blob.
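Conceptually (an illustrative NumPy sketch with toy weights, not the actual lstm_layer.cpp code), the hidden-state gradient at step t is the sum of the top diff at t and the contribution propagated back from step t+1, which is the role played by h_to_h_:

```python
import numpy as np

T, H = 3, 2
top_diff = np.ones((T, H))        # gradient arriving from the layer above
W_hh = 0.5 * np.eye(H)            # toy hidden-to-hidden weights

dh = np.zeros((T, H))
h_to_h = np.zeros(H)              # gradient flowing back from step t+1
for t in reversed(range(T)):
    dh[t] = top_diff[t] + h_to_h  # combine top diff with the future-step term
    h_to_h = dh[t] @ W_hh         # propagate to step t-1 (gate nonlinearities omitted)
# Earlier steps accumulate gradient from all later steps:
print(dh[:, 0])                   # [1.75 1.5 1.]
```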


HaiboShi commented on July 21, 2024

@junhyukoh Hi, thanks for your reply. One more question:
what does clipping_threshold_ stand for? Is it related to pregate_gradient?

I notice you do an accumulation for the batch:
caffe_add(H_, dh_t_1, h_to_h, dh_t_1);
Does it mean that the h_{t} gradient is composed of the h_{t+1} gradients of all elements in one batch?


kimshao commented on July 21, 2024

@junhyukoh Hi, I am new to Caffe. I have read your example and I have two questions:
Firstly, in your example TotalLength = seq_length = 320, which means there is only one input sequence. However, if I have more sequences and train thousands of times, after the first time the clip array turns to all 1s. What does it mean when clip is [1, 1, ...]? Do I continue to input another sequence after the first one, with a 0 at its head in clip? (I mean this line: train_clip_blob->mutable_cpu_data()[0] = seq_idx > 0;)
The second question: during the test phase you reshape the input data, which I cannot fully understand; also, there is no input data during testing, is there? Can you explain it, please? (These lines: test_data_blob->Reshape(shape); test_clip_blob->Reshape(shape);)
I'll appreciate your answer, thanks a lot!


kimshao commented on July 21, 2024

@junhyukoh Also, what is the difference between data and label? There is an object named 'data', but it is not mentioned in your code!


pciang commented on July 21, 2024

Hi @junhyukoh, I have a question about the "clip" array. Let's say that during the training phase my input "data" is [A B C (EOS)] and the desired label is [W X Y Z (EOS)]. Do "data", "label", and "clip" become something like this:

Data:  A B C (EOS) W X Y Z
Label: W X Y Z (EOS)
Clip:  0 0 0 0 1 1 1 1


gabriellapizzuto commented on July 21, 2024

@junhyukoh What is the sequence length if I have a feature blob of shape (10, 50, 4, 4)?


robosmith commented on July 21, 2024

@junhyukoh When training an LSTM with a single (long) repeated sequence and multiple epochs, should the clip value be 0 at the start of each epoch/data sequence, or just the first epoch?

