Comments (23)

junhyukoh commented on July 21, 2024

@mostafa-saad data is the input to the LSTM. Clip is a binary indicator of the continuity of the data (sequence).
For example, you can give two different input sequences (e.g., [1 2 3 4] and [1 2 3]) as one input blob as follows: data = [1 2 3 4 1 2 3], clip = [0 1 1 1 0 1 1]. A "0" marks the head of a sequence. By default, clip is [0 1 1 1 ... 1], which assumes that exactly one sequence, starting from its head, is given as input.
You can also do several forward passes for a very long or variable-length sequence.
For example, data = [1 2 3 4 5] can be divided into 5 forward passes as follows:

  1. data=[1], clip=[0]
  2. data=[2], clip=[1]
  3. data=[3], clip=[1]
  4. data=[4], clip=[1]
  5. data=[5], clip=[1]
Although this seems very inefficient, it is actually necessary, especially when a prediction is used as the input for the next time step (e.g., text modelling).
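The head-marking rule described above can be sketched in plain Python (an illustrative sketch only, not Caffe code; the helper name `make_clip` is made up):

```python
def make_clip(seq_lengths):
    """Return a clip indicator: 0 at the head of each sequence, 1 elsewhere."""
    clip = []
    for length in seq_lengths:
        clip.extend([0] + [1] * (length - 1))
    return clip

# Two concatenated sequences of lengths 4 and 3, as in the example above:
print(make_clip([4, 3]))  # [0, 1, 1, 1, 0, 1, 1]
```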

I guess you don't have to set "clip" explicitly, because the input is always taken from the data layer and each input sequence is complete (it starts from its head and is continuous). So the default clip value should work for your case.

from caffe-lstm.

nakosung commented on July 21, 2024

https://github.com/jeffdonahue/caffe/tree/recurrent would be helpful.


junhyukoh commented on July 21, 2024

I think it's no different from the example I provided.
You can define an LSTM layer on top of the K images in the prototxt.
As @nakosung mentioned, Jeff Donahue has another LSTM implementation (it seems it will be merged into the master branch soon) with examples on images.
You can find the prototxt in his branch.
Thanks.


mostafa-saad commented on July 21, 2024

@junhyukoh
Thanks so much. One more question: what are the inputs to your LSTM layer? In the example, it takes two inputs, data and clip?


mostafa-saad commented on July 21, 2024

Thanks. Just to make sure I understand you: assume I am extending AlexNet with one LSTM layer, and say I have 3 videos for training, one with 4 frames, another with 3 frames, and a third with 5 frames. Clip should be as follows:
clip = [0 1 1 1 0 1 1 0 1 1 1 1]?
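The proposed vector can be checked with a quick sketch (illustrative Python, not Caffe code), marking a 0 at the head of each video:

```python
clip = []
for frames in [4, 3, 5]:          # frames per training video
    clip.extend([0] + [1] * (frames - 1))
print(clip)  # [0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
```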

How can I use LevelDB to feed the clip input from disk rather than from memory? Is it possible to just provide a text file?

I am just a novice in Caffe and still learning, sorry for the many questions.


junhyukoh commented on July 21, 2024

That's correct.

The current data layer implementation (src/caffe/layers/data_layer.cpp) doesn't support clip.
So you may have to implement your own data layer whose outputs are data/clip if you want to use LevelDB.
Another way is to feed data/clip directly from your own program, as in my example code (lstm_sequence.cpp), without using LevelDB; but this doesn't run on a separate thread, so it might be slower than implementing a new data layer.


mostafa-saad commented on July 21, 2024

What about an ImageData input layer with <image, label> pairs, where the images are dummies and the labels are the binary clip input? Do you think this would work?


junhyukoh commented on July 21, 2024

I think it would work if you give the pair correctly.


mecp commented on July 21, 2024

Excuse me if this is a very simple question, but I am just starting to learn neural networks with Caffe.

Is it possible to use this network to train on a continuous sequence of 2 variables, e.g. [(2.77, 9.03), (2.01, 10.48), ...], and then predict the next element in the sequence for a supplied input?

So for training I could have the sequence [t0 ... t9] (10 time steps) as input and [t10] as the expected output, and then do prediction in the same manner.


junhyukoh commented on July 21, 2024

@mecp Yes. It's possible to train the network on multi-dimensional input/output.
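As an illustration of one way to arrange such data (a plain NumPy sketch; the shapes and window size are assumptions, not part of caffe-lstm), 10-step windows of the 2-variable series can serve as inputs, with the step that follows each window as the target:

```python
import numpy as np

series = np.random.rand(100, 2)   # 100 time steps of 2 variables
window = 10
# Each training input is a 10-step slice; the target is the next step.
X = np.stack([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]               # the step that follows each window
print(X.shape, y.shape)           # (90, 10, 2) (90, 2)
```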


HaiboShi commented on July 21, 2024

@junhyukoh what's the difference between the batch size N_ and the sequence length T_?


junhyukoh commented on July 21, 2024

@HaiboShi In RNN training, a training example is a sequence x_{1}, x_{2}, ..., x_{T_}. We can define N_ such sequences as a mini-batch.
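Concretely (an illustrative NumPy sketch, not Caffe's internal layout), such a mini-batch can be pictured as a T_ x N_ x D array:

```python
import numpy as np

T_, N_, D = 10, 4, 32             # sequence length, batch size, feature size
batch = np.zeros((T_, N_, D))
# batch[t, n] is step t of the n-th sequence in the mini-batch:
# N_ counts independent sequences, T_ counts steps within each sequence.
print(batch.shape)                # (10, 4, 32)
```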


HaiboShi commented on July 21, 2024

@junhyukoh and the diffs of that mini-batch are summed up for updating the weights?


junhyukoh commented on July 21, 2024

@HaiboShi Yes, the diff is accumulated over the mini-batch.
However, loss layers usually pass normalized diffs to the bottom blobs (by dividing them by the size of the mini-batch).
So the weight diff is effectively normalized by the mini-batch size.
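A toy sketch of that normalization (plain NumPy with made-up numbers, not Caffe's loss-layer code): with a squared-error loss, the per-example gradient is divided by the mini-batch size before it reaches the bottom blob, so accumulating over the batch yields a mean rather than a sum:

```python
import numpy as np

pred = np.array([1.0, 2.0, 3.0, 4.0])         # one prediction per example
target = np.array([1.5, 2.0, 2.0, 5.0])
batch_size = len(pred)

per_example_diff = pred - target              # gradient of 0.5 * (pred - target)^2
bottom_diff = per_example_diff / batch_size   # what a normalizing loss layer emits

# Summing the normalized diffs over the batch equals the mean raw gradient:
print(np.isclose(bottom_diff.sum(), per_example_diff.mean()))  # True
```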


HaiboShi commented on July 21, 2024

@junhyukoh Thanks, that helps a lot. There's another specific question:
in the LSTM layer class, what does the member Blob h_to_h_; stand for? I noticed that it appears only in the backward propagation step. Thanks!


HaiboShi commented on July 21, 2024

@junhyukoh Also, it seems there's no top diff data in the backward_cpu() function; I wonder how the gradient from the layer above passes to the LSTM layer? Thanks! 💯


junhyukoh commented on July 21, 2024

@HaiboShi h_to_h_ is an intermediate blob that holds the h_{t+1} -> h_{t} gradient.
There is a top diff in the backward_cpu() function, at line 209:
Dtype* top_diff = top_.mutable_cpu_diff();
top_ shares its memory with the actual top blob.
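Conceptually (an illustrative NumPy sketch with toy weights, not the actual lstm_layer.cpp code), the hidden-state gradient at step t is the sum of the top diff at t and the contribution propagated back from step t+1, which is the role played by h_to_h_:

```python
import numpy as np

T, H = 3, 2
top_diff = np.ones((T, H))        # gradient arriving from the layer above
W_hh = 0.5 * np.eye(H)            # toy hidden-to-hidden weights

dh = np.zeros((T, H))
h_to_h = np.zeros(H)              # gradient flowing back from step t+1
for t in reversed(range(T)):
    dh[t] = top_diff[t] + h_to_h  # combine top diff with the future-step term
    h_to_h = dh[t] @ W_hh         # propagate to step t-1 (gate nonlinearities omitted)
# Earlier steps accumulate gradient from all later steps:
print(dh[:, 0])                   # [1.75 1.5 1.]
```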


HaiboShi commented on July 21, 2024

@junhyukoh Hi, thanks for your reply. One more question:
what does clipping_threshold_ stand for? Is it related to pregate_gradient?

I notice you do an accumulation for the batch:
caffe_add(H_, dh_t_1, h_to_h, dh_t_1);
Does it mean that the h_{t} gradient is composed of the h_{t+1} gradients of all elements in one batch?


kimshao commented on July 21, 2024

@junhyukoh Hi, I am new to Caffe. I have read your example and I have two questions:
Firstly, in your example TotalLength = seq_length = 320, which means there is only one input sequence. However, if I have more sequences and train thousands of times, after the first time the clip array turns to all 1s. What does it mean when clip is [1, 1, ...]? Do I continue to input another sequence after the first one, with a 0 at its head in clip? (I mean this line: train_clip_blob->mutable_cpu_data()[0] = seq_idx > 0;)
The second question: during the test phase you reshape the input data, which I cannot fully understand; also, there is no input data during testing, is there? Can you explain it, please? (These lines: test_data_blob->Reshape(shape); test_clip_blob->Reshape(shape);)
I'll appreciate your answer, thanks a lot!


kimshao commented on July 21, 2024

@junhyukoh Also, what is the difference between data and label? There is an object named 'data', but it is not mentioned in your code!


pciang commented on July 21, 2024

Hi @junhyukoh, I have a question about the "clip" array. Let's say that during the training phase my input "data" is [A B C (EOS)] and the desired label is [W X Y Z (EOS)]. Do "data", "label", and "clip" become something like this:

Data:  A B C (EOS) W X Y Z
Label: W X Y Z (EOS)
Clip:  0 0 0 0 1 1 1 1


gabriellapizzuto commented on July 21, 2024

@junhyukoh What is the sequence length if I have a feature blob of shape (10, 50, 4, 4)?


robosmith commented on July 21, 2024

@junhyukoh When training an LSTM with a single (long) repeated sequence and multiple epochs, should the clip value be 0 at the start of each epoch/data sequence, or just the first epoch?

