nicodjimenez / lstm
Minimal, clean example of LSTM neural network training in Python, for learning purposes.
@nicodjimenez Could you please upload a schematic diagram of the LSTM architecture? Will help a lot.
Thanks.
Conceptually, why don't y_list and input_val_arr have the same dimensions? Is your toy example intended as a multi-input-single-output network?
Changing x_dim = 1 had poor results on convergence (a feeble attempt to make y_list and input_val_arr look more familiar by having equal size). However, replicating y_list, such that for each x there is an associated y (single-input-single-output), had adequate convergence and looked more familiar to common examples I've seen around the web:
yy = [-0.5, 0.2, 0.1, -0.5]
y_list = [[y for _ in range(x_dim)] for y in yy]
input_val_arr = [np.random.random(x_dim) for _ in yy]
and then I made very minimal changes to the ToyLossLayer:
@classmethod
def loss(self, pred, label):
    return np.sum((pred[:len(label)] - label) ** 2)

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[:len(label)] = 2 * (pred[:len(label)] - label)
    return diff
I'm very new to NNs and so far have only seen single-input-single-output cases where x and y are equally sized vectors (for example, Karpathy's word prediction RNN: https://gist.github.com/karpathy/d4dee566867f8291f086). I thought to check with you on your intention for example_0() to ensure I was understanding your work fully. Am I close?
Thanks! This is the easiest-to-read code I've found on LSTMs.
when: y_list = [-0.8333333333, 0.33333, 0.166666667, -0.8]
result: iter 9999: y_pred = [-0.76083, -0.00007, -0.00007, -0.79996], loss: 1.442e-01
import numpy as np
from lstm import LstmParam, LstmNetwork

class ToyLossLayer:
    def __init__(self, mem_cell_ct):
        # learnable readout vector that maps the hidden state to a scalar prediction
        self.v = np.zeros(mem_cell_ct)

    def value(self, pred):
        out = self.v.dot(pred)
        return out

    def loss(self, pred, label):
        out = self.value(pred)
        return (out - label) ** 2

    def bottom_diff(self, pred, label):
        out = self.value(pred)
        df = 2 * (out - label) / self.v.shape[0]
        diff = df * self.v
        # also take a small gradient step on the readout vector itself
        self.v -= 0.1 * pred * df
        return diff

def example_0():
    # learns to repeat simple sequence from random inputs
    np.random.seed(0)

    # parameters for input data dimension and lstm cell count
    mem_cell_ct = 1000
    x_dim = 50
    lstm_param = LstmParam(mem_cell_ct, x_dim)
    lstm_net = LstmNetwork(lstm_param)
    y_list = [-8.8, 80.2, 3.1, 8000.8]
    input_val_arr = [np.random.random(x_dim) for _ in y_list]
    loss = ToyLossLayer(mem_cell_ct)

    for cur_iter in range(10000):
        # print(y_list)
        print("iter", "%2s" % str(cur_iter), end=": ")
        for ind in range(len(y_list)):
            lstm_net.x_list_add(input_val_arr[ind])

        print("y_pred = [" +
              ", ".join(["% 2.5f" % loss.value(lstm_net.lstm_node_list[ind].state.h) for ind in range(len(y_list))]) +
              "]", end=", ")

        lossv = lstm_net.y_list_is(y_list, loss)
        print("loss:", "%.3e" % lossv)
        lstm_param.apply_diff(lr=0.1)
        lstm_net.x_list_clear()

if __name__ == "__main__":
    example_0()
when: y_list = [-8.8, 80.2, 3.1, 8000.8]
result: iter 9999: y_pred = [-8.72629, 80.11621, 3.70385, 8000.36304], loss: 5.988e-01
Hi, would you please explain what bottom and top mean in your code? Thank you very much.
@classmethod
def loss(self, pred, label):
    return (pred[0] - label) ** 2

@classmethod
def bottom_diff(self, pred, label):
    diff = np.zeros_like(pred)
    diff[0] = 2 * (pred[0] - label)
    return diff
Why use pred[0] to compute the loss and bottom_diff?
Hey,
I ported your example to D. Maybe it's useful to reference here in your readme.
Issue: lstm.py, line 98.
There is a problem with this line of code: self.state.h = self.state.s * self.state.o. You forgot the tanh function. The formula is h_{t} = o_{t} * tanh(s_{t}), so the correct line of code is:
self.state.h = tanh(self.state.s) * self.state.o
The relevant part of the code is pasted below.
def bottom_data_is(self, x, s_prev = None, h_prev = None):
    # if this is the first lstm node in the network
    if s_prev is None: s_prev = np.zeros_like(self.state.s)
    if h_prev is None: h_prev = np.zeros_like(self.state.h)
    # save data for use in backprop
    self.s_prev = s_prev
    self.h_prev = h_prev

    # concatenate x(t) and h(t-1)
    xc = np.hstack((x, h_prev))
    self.state.g = np.tanh(np.dot(self.param.wg, xc) + self.param.bg)
    self.state.i = sigmoid(np.dot(self.param.wi, xc) + self.param.bi)
    self.state.f = sigmoid(np.dot(self.param.wf, xc) + self.param.bf)
    self.state.o = sigmoid(np.dot(self.param.wo, xc) + self.param.bo)
    self.state.s = self.state.g * self.state.i + s_prev * self.state.f
    self.state.h = self.state.s * self.state.o
Hi @nicodjimenez, I would like to work on your code, but while executing the test.py file I am getting: global name 'ToyLossLayer' is not defined.
Kindly explain what ToyLossLayer is and why we pass it to y_list_is() in lstm.py.
Also, I don't understand what "end" is in the print statement:
print("iter", "%2s" % str(cur_iter), end=": ")
self.param.wi_diff += np.outer(di_input, self.xc)
self.param.wf_diff += np.outer(df_input, self.xc)
self.param.wo_diff += np.outer(do_input, self.xc)
self.param.wg_diff += np.outer(dg_input, self.xc)
self.param.bi_diff += di_input
self.param.bf_diff += df_input
self.param.bo_diff += do_input
self.param.bg_diff += dg_input
I am not clear about this code
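For anyone else puzzled by these lines (my own reading, not the author's explanation): each gate pre-activation has the form wi . xc + bi, so by the chain rule the gradient of the loss with respect to the weight matrix is the outer product of the pre-activation gradient and the input, i.e. d(loss)/d(wi[j][k]) = di_input[j] * xc[k], which is exactly np.outer(di_input, self.xc). The += accumulates these gradients over all time steps of the sequence before apply_diff() uses them.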
lstm.py, line 97:
self.state.s = self.state.g * self.state.i + s_prev * self.state.f
should be
self.state.s = np.tanh(self.state.g * self.state.i + s_prev * self.state.f)
If I want to run the code on a new dataset, and the data range is not in [-1, 1], what should I do?
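One common workaround (a minimal sketch on my part, not something this repo provides): rescale the targets into [-1, 1] before training and invert the scaling when reading predictions back out.
raw_targets = [3.0, 250.0, -17.5, 42.0]                           # hypothetical data
lo, hi = min(raw_targets), max(raw_targets)
y_list = [2.0 * (y - lo) / (hi - lo) - 1.0 for y in raw_targets]  # now inside [-1, 1]
# train exactly as in test.py, then map a prediction p = pred[0] back with:
# y = (p + 1.0) / 2.0 * (hi - lo) + lo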
Why self.state.h = self.state.s * self.state.o and not h = tanh(s) * o?
Hi,
I'm very new to LSTMs and I'm quite confused about how to make use of them.
What do the random input values mean here, when you're teaching it the small y_list sequence?
I ported the code in C++ and I would basically like to teach it how to speak, after reading some text at the character-level (as discussed here). My purpose is to understand what my input values and y_list should be.
Thanks a lot.
Hi,
I implemented your code (with some changes) in Julia here if you'd like to include it in the README along with the D port.
Hi,
I was wondering how you chose the number of memory cells.
And why and how did you choose 100 as the value for that parameter?
And finally, just to be clear: do you create 4 LSTM nodes in the code because of the 4 "input sequence - target value" pairs?
And what is the difference between a memory cell and an LSTM node? I've quickly read the paper that you recommend in your blog, but it's still not clear to me.
Thanks in advance for your answers.
Ick
If a new sequence comes in without the true label, how do I predict it? Thanks!
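A minimal sketch of how I would do it with this code (my reading of test.py, not an official API): feed the new inputs with x_list_add() and read the prediction off each node's hidden state; y_list_is() is only needed to compute the loss and gradients during training.
lstm_net.x_list_clear()
for x in new_input_seq:                  # new_input_seq: a list of x_dim-sized arrays (assumed)
    lstm_net.x_list_add(x)
preds = [node.state.h[0] for node in lstm_net.lstm_node_list]    # pred[0], as in ToyLossLayer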
In LstmState __init__, x_dim looks to be unnecessary. Is that correct?
https://github.com/nicodjimenez/lstm/blob/master/lstm.py#L65
Referencing the first line in your backward pass:
# notice that top_diff_s is carried along the constant error carousel
ds = self.state.o * top_diff_h + top_diff_s
Mathematically, where does the + top_diff_s come from? Is my guess accurate that it is purely a fudge factor to prevent the gradient from going to zero (hence preventing the vanishing gradient)? Or is there more math behind it that I'm overlooking?
Thanks again for your clarification!
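For reference, my understanding of that term (a sketch of the math, not the author's own explanation): s(t) affects the loss through two paths. It acts directly through h(t) = s(t) * o(t) in this implementation, contributing o(t) * top_diff_h, and indirectly through the next cell state s(t+1) = g(t+1) * i(t+1) + s(t) * f(t+1), whose gradient with respect to s(t) is f(t+1) * ds(t+1). top_diff_s is exactly that second contribution, handed back from the next node, so the sum is not a fudge factor but the total derivative along both paths.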
Thanks for your great work on this blog; it's the clearest and simplest blog on LSTM I have read.
In your blog, the derivative of s(t) is obtained by this equation, and I wonder why the left side of the equation is equal to the right side. In my opinion, left = 0.5 * right. And why do we need top_diff_s if we could calculate it directly?
I am new to NNs and LSTMs; your blog is the best and clearest on the implementation and derivation of LSTM. Thanks for your great work.
def top_diff_is(self, top_diff_h, top_diff_s):
    # notice that top_diff_s is carried along the constant error carousel
    ds = self.state.o * top_diff_h + top_diff_s
This is your code above. I want to ask: why add top_diff_s? I do not understand your comment.
Hello. I have a question about the dimension of the input.
In the case
mem_cell_ct = 2
x_dim = 1
(biases not considered), I would expect:
Input of first cell: [x_value, 0]
Input of second cell: [cell_1_output, 0]
But in your case:
Input of first cell is: [x_value, 0, 0]
Input of second cell: [x_value, cell_1_output, 0]
Is it correct that cells in hidden layers get the actual input?
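A sketch of the shapes as I read the code quoted earlier on this page (my interpretation, using the mem_cell_ct = 2, x_dim = 1 example):
# xc concatenates the real input with the previous hidden state, so every
# memory cell sees the actual x(t) plus all of h(t-1); cells are not chained within a step.
x_dim, mem_cell_ct = 1, 2
xc_len = x_dim + mem_cell_ct              # 3: [x_value, h_prev[0], h_prev[1]]
# each gate weight matrix has shape (mem_cell_ct, xc_len), e.g. wi is 2 x 3,
# so both cells are computed from the same concatenated vector in one matrix product.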
This implementation is for 50 inputs and 1 output. I'm trying to use it with 2 outputs and it gives the following error:
File "test.py", line 43, in example_0
loss = lstm_net.y_list_is(y_list, ToyLossLayer)
File "lstm.py", line 155, in y_list_is
diff_h = loss_layer.bottom_diff(self.lstm_node_list[idx].state.h, y_list[idx])
File "test.py", line 17, in bottom_diff
diff[0] = 2 * (pred[0] - label)
ValueError: setting an array element with a sequence.
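For what it's worth, the error comes from diff[0] = 2 * (pred[0] - label) once label is a length-2 array: a sequence cannot be assigned to the single slot diff[0]. A sketch of one way to handle a vector label (essentially the modified ToyLossLayer quoted earlier on this page, under a hypothetical name):
import numpy as np

class VectorLossLayer:
    @classmethod
    def loss(self, pred, label):
        return np.sum((pred[:len(label)] - label) ** 2)

    @classmethod
    def bottom_diff(self, pred, label):
        diff = np.zeros_like(pred)
        diff[:len(label)] = 2 * (pred[:len(label)] - label)
        return diff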
When running test.py, there is an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Changing the '==' in the two offending lines to 'is' fixes it, e.g. 'if s_prev is None: s_prev = np.zeros_like(self.state.s)'.
Hi,
I am new to LSTM, and I want to know whether your code can be applied to action recognition on skeleton data. Also, which deep learning framework, such as Caffe, Theano, or TensorFlow, could I use?
Thanks
I think that, given the definition of the network and the loss function, the backpropagation should be computed automatically; why do we explicitly define it in the code?
Hi, sorry to bother you again.
I read this line in your code: self.state.h = self.state.s * self.state.o
but in the paper it seems it should be like this:
self.state.h = np.tanh(self.state.s) * self.state.o
Would you tell me which one is right?
In line 95 of lstm.py, as far as I can see, you omit applying tanh() to the new cell state before multiplying it with the squashed o(t).
As referenced in the article you mention in your readme in the last equation on page 20, and in this excellent tutorial page I found (https://colah.github.io/posts/2015-08-Understanding-LSTMs/), you have to apply tanh() to your new cell state before you multiply it with o(t). I don't see you doing that in your code, so unless this is being corrected somewhere else I failed to notice, it should be corrected.
Otherwise, this is an excellent resource, thanks a lot :)
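If someone wants to apply that fix locally, a sketch of the two places that would change (my own derivation, not a patch from the author): the forward line becomes h = tanh(s) * o, and in top_diff_is() the cell-state and output-gate gradients pick up the tanh terms.
# forward (bottom_data_is):
self.state.h = np.tanh(self.state.s) * self.state.o

# backward (top_diff_is): with h = tanh(s) * o, dh/ds = o * (1 - tanh(s)**2)
tanh_s = np.tanh(self.state.s)
ds = self.state.o * (1. - tanh_s ** 2) * top_diff_h + top_diff_s
do = tanh_s * top_diff_h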
What does self.wi = rand_arr(-0.1, 0.1, mem_cell_ct, concat_len) signify? Does wi hold the input-gate weights for both the input x_{t} and h_{t-1}? What does mem_cell_ct contain, and what is concat_len?
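My reading of the code quoted earlier on this page, for what it's worth: the gates act on xc = np.hstack((x, h_prev)), so concat_len = x_dim + mem_cell_ct, and wi holds the input-gate weights for both x_{t} and h_{t-1} in a single matrix; mem_cell_ct is simply the number of memory cells, i.e. the size of the hidden and cell states.
mem_cell_ct, x_dim = 100, 50              # the values used in the example discussed above
concat_len = x_dim + mem_cell_ct          # 150 = length of xc = hstack((x, h_prev))
# wi has shape (mem_cell_ct, concat_len): row j holds cell j's input-gate weights;
# the first x_dim columns act on x(t), the remaining mem_cell_ct columns on h(t-1).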
Hello,
I made a Swift port over here: https://github.com/emilianbold/swift-lstm
Thanks for the code! I hope to use it for some practical application soon.
PS: The repository has no license, so is it safe to assume some sort of public-domain license? I want to be able to use my port and tweaks...
hi,
I'm very new to lstm.
I used test.py to finish the training; how can I then feed in some test samples to get the accuracy or something similar?
thank you!
I am very interested in this LSTM project, but I found that I could not understand the "M" in your blog very well. In your blog, you said that "M" is the total number of memory cells, but I think "M" equals the number of time steps. I do note that there are LSTM networks with multiple memory cells, but I think the LSTM network you discussed is comprised of only one memory cell.