
siamese-lstm's Introduction

Siamese-LSTM

Download the word2vec model from https://code.google.com/archive/p/word2vec/ (file: GoogleNews-vectors-negative300.bin.gz). Set training=False if you want to load trained weights.

Files:

  1. semtrain.p - training data (SemEval 2014)
  2. semtest.p - testing data (SemEval 2014)
  3. stsallrmf.p - all STS data
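
For reference, a minimal sketch of loading the downloaded vectors with gensim (this is an assumption; the repo's own loader may differ):

    from gensim.models import KeyedVectors

    # binary=True because GoogleNews-vectors-negative300.bin.gz is a binary dump;
    # gensim reads the .gz compression transparently.
    model = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin.gz", binary=True
    )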

Scripts: (in examples folder)

  1. example1.py : loads the trained model to predict sentence similarity on a scale of 1.0-5.0
  2. example2.py : loads the trained model and reports Pearson, Spearman, and MSE
  3. example3.py : trains the model (compiling the gradients takes a long time)
  4. examples.ipynb : explanation of the MaLSTM code (IPython notebook)
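
For orientation, the tracebacks in the issues below suggest the entry point looks roughly like this (reconstructed from users' tracebacks, so the exact signature may differ):

    from lstm import lstm

    # load=True restores the pickled weights; training=False skips compiling
    # the gradient updates (reconstructed from tracebacks, not official docs)
    sls = lstm("bestsem.p", load=True, training=False)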

Mueller, J. and Thyagarajan, A. Siamese Recurrent Architectures for Learning Sentence Similarity. Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI 2016). http://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12195

siamese-lstm's People

Contributors: aditya1503

siamese-lstm's Issues

Low Pearson scores using the given "bestsem.p" model

Hi. I really enjoy your paper and implementation; they are impressive. But when I run your code with the given model "bestsem.p" using the script example2.py, it ends up with Pearson scores around 0.715, which is far from the best scores in your paper. Could you please help me solve this? Thank you very much!

Prediction calibration step

Hi aditya1503, thanks for sharing your code. Your paper states (p. 4) that a prediction calibration step was implemented to convert the predictions into values comparable to the input labels. I have copied the exact text from your paper below:

Due to the simple construction of our similarity function, the predictions of our model are constrained to follow the exp(−x) curve and are thus not suited for these evaluation metrics. After training our model, we apply an additional nonparametric regression step to obtain better-calibrated predictions (with respect to MSE). Over the training set, the given labels (under original [1, 5] scale) are regressed against the univariate MaLSTM g-predicted relatedness as the sole covariate, and the fitted regression function is evaluated on the MaLSTM-predicted relatedness of the test pairs to produce adjusted final predictions. We use the classical local-linear estimator discussed in Fan and Gijbels (1992) with bandwidth selected using leave-one-out cross-validation. This calibration step serves as a minor correction for our restrictively simple similarity function (which is necessary to retain interpretability of the sentence representations).

I have two clarifications:

  1. Could you please point me to the part of the code where this is implemented? Sorry, I'm not sure where to find it.
  2. Would converting the input labels at the start to lie in [0, 1], i.e. (relatedness_score - 1) / 4, have the same effect? (See also the sketch below.)
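
For concreteness, here is my own minimal reconstruction of the calibration step as described in the quoted paragraph: Gaussian-kernel local-linear regression with leave-one-out bandwidth selection. This is an assumption about the method, not the authors' code:

    import numpy as np

    def local_linear(x_tr, y_tr, x_eval, h):
        """Local-linear estimator (Fan & Gijbels, 1992) of E[y|x], Gaussian kernel."""
        preds = np.empty(len(x_eval))
        for k, x0 in enumerate(x_eval):
            w = np.exp(-0.5 * ((x_tr - x0) / h) ** 2)             # kernel weights
            X = np.column_stack([np.ones_like(x_tr), x_tr - x0])  # local design matrix
            sw = np.sqrt(w)
            beta, *_ = np.linalg.lstsq(X * sw[:, None], y_tr * sw, rcond=None)
            preds[k] = beta[0]                                    # intercept = fit at x0
        return preds

    def loocv_bandwidth(x_tr, y_tr, candidates):
        """Pick the bandwidth minimizing leave-one-out squared error on the train set."""
        n = len(x_tr)
        errs = []
        for h in candidates:
            err = 0.0
            for i in range(n):
                mask = np.arange(n) != i
                pred = local_linear(x_tr[mask], y_tr[mask], x_tr[i:i + 1], h)[0]
                err += (y_tr[i] - pred) ** 2
            errs.append(err)
        return candidates[int(np.argmin(errs))]

    # g_train/g_test: MaLSTM outputs exp(-||h1 - h2||_1); y_train: labels on [1, 5]
    # h = loocv_bandwidth(g_train, y_train, candidates=np.linspace(0.02, 0.5, 25))
    # calibrated = local_linear(g_train, y_train, g_test, h)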

Cheers,
Kurt

Getting an error while loading the dwords.p file

Traceback (most recent call last):
File "/home/bindu/Desktop/text similarity/Siamese-LSTM-master/SiameseLSTM.py", line 403, in
dtr=pickle.load(open("./dwords.p",'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 4: ordinal not in range(128)
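
This usually means a Python 2 pickle is being loaded under Python 3. A common workaround (untested against this repo) is to pass an explicit encoding:

    import pickle

    # Python 2 pickles containing non-ASCII bytes need an explicit encoding
    # under Python 3; 'latin1' maps every byte, so nothing raises here.
    with open("./dwords.p", "rb") as f:
        dtr = pickle.load(f, encoding="latin1")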

How can I get the best result in the paper?

Hi,
I ran example3.py and only got a Pearson correlation of 0.82. There is a big gap compared with the best result in the paper. How can I get the best result in the paper? Thanks!

Failed to train a new model with Theano error

During training in example3, an error occurs that I have failed to figure out. Could you please help me?
Thanks!


TypeError Traceback (most recent call last)
/home/liu/Siamese-LSTM/example3.py in ()
3 #Syn_aug=True # it False faster but does slightly worse on Test dataset
4
----> 5 sls=lstm("new.p",load=False,training=True)
6
7 train=pickle.load(open("stsallrmf.p","rb"))#[:-8]

/home/liu/Siamese-LSTM/lstm.pyc in init(self, nam, load, training)
297 grads.append(grads[i])
298
--> 299 self.f_grad_shared, self.f_update = adadelta(lr, tnewp, grads,emb11,mask11,emb21,mask21,y, cost)
300
301

/home/liu/Siamese-LSTM/lstm.pyc in adadelta(lr, tparams, grads, emb11, mask11, emb21, mask21, y, cost)
186
187 f_grad_shared = theano.function([emb11,mask11,emb21,mask21,y], cost, updates=zgup + rg2up,
--> 188 name='adadelta_f_grad_shared')
189
190 updir = [-tensor.sqrt(ru2 + 1e-6) / tensor.sqrt(rg2 + 1e-6) * zg

/home/liu/.pyenv/versions/anaconda2-4.1.0/lib/python2.7/site-packages/theano/compile/function.pyc in function(inputs, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input)
318 on_unused_input=on_unused_input,
319 profile=profile,
--> 320 output_keys=output_keys)
321 # We need to add the flag check_aliased inputs if we have any mutable or
322 # borrowed used defined inputs

/home/liu/.pyenv/versions/anaconda2-4.1.0/lib/python2.7/site-packages/theano/compile/pfunc.pyc in pfunc(params, outputs, mode, updates, givens, no_default_updates, accept_inplace, name, rebuild_strict, allow_input_downcast, profile, on_unused_input, output_keys)
440 rebuild_strict=rebuild_strict,
441 copy_inputs_over=True,
--> 442 no_default_updates=no_default_updates)
443 # extracting the arguments
444 input_variables, cloned_extended_outputs, other_stuff = output_vars

/home/liu/.pyenv/versions/anaconda2-4.1.0/lib/python2.7/site-packages/theano/compile/pfunc.pyc in rebuild_collect_shared(outputs, inputs, replace, updates, rebuild_strict, copy_inputs_over, no_default_updates)
205 ' function to remove broadcastable dimensions.')
206
--> 207 raise TypeError(err_msg, err_sug)
208 assert update_val.type == store_into.type
209

TypeError: ('An update must have the same type as the original shared variable (shared_var=1lstm1_U_rgrad2, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
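
This float32/float64 mismatch is typically a Theano configuration issue rather than a bug in the model code. A common remedy (an assumption based on the error text, not a repo-specific fix) is to force floatX to float32 before Theano is imported:

    import os

    # Must be set before `import theano`: keeps every intermediate in float32
    # so updates match the float32 shared variables (e.g. 1lstm1_U_rgrad2).
    os.environ["THEANO_FLAGS"] = "floatX=float32"

    import theano  # imported after setting the flag, on purpose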

Error while training

Kindly help ASAP to resolve this issue. I could not figure out the problem while training and running example3.

File "/home/seemab/seemab/LSTM/example3.py", line 6, in
sls=lstm("new.p",load=False,training=True)

File "/home/seemab/seemab/LSTM/lstm.py", line 289, in init
gradi = tensor.grad(cost, wrt=tnewp.values())#/bts

File "/usr/local/lib/python3.5/dist-packages/theano/gradient.py", line 502, in grad
" of type " + str(type(elem)))

TypeError: Expected Variable, got odict_values([1lstm1_U, 1lstm1_W, 1lstm1_b, 2lstm1_U, 2lstm1_W, 2lstm1_b]) of type <class 'odict_values'>
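
Under Python 3, dict.values() returns a view object, which theano.tensor.grad does not accept. Wrapping it in a list (my reading of the fix, mirroring the call in the traceback) should resolve this:

    # lstm.py, line 289: materialize the odict_values view into a list
    gradi = tensor.grad(cost, wrt=list(tnewp.values()))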

Understanding gradient manipulation

Hello,

I'm trying to implement your code in Keras and achieve the same results as you. I've mimicked the LSTM initialization and checked the math and constants to make them fit yours, although I'm still 20% away from your results (using your input data). My guess is that the problem lies in the optimizer (it seems different from the Keras Adadelta), and I don't understand what's happening in:

https://github.com/aditya1503/Siamese-LSTM/blob/master/lstm.py#L289

            gradi = tensor.grad(cost, wrt=tnewp.values())  # /bts
            grads = []
            l = len(gradi)
            for i in range(0, l / 2):  # Python 2 integer division
                # combine the gradient of each weight with that of its mirror
                # copy in the second LSTM (the two halves of tnewp correspond)
                gravg = (gradi[i] + gradi[i + l / 2]) / 4.0
                grads.append(gravg)
            for i in range(0, len(tnewp.keys()) / 2):
                grads.append(grads[i])  # both copies receive the same update

            self.f_grad_shared, self.f_update = adadelta(lr, tnewp, grads, emb11, mask11, emb21, mask21, y, cost)

I don't know where to implement this, in Keras logic (I presume this code only runs once, upon network definition), but I've tried:

def get_updates(self, loss, params):        
    gradi = self.get_gradients(loss, params)

    grads = []      
    l = len(gradi)   # for 2 LSTMs, l = 6, 3 'weights' per each
    half_l = int(l / 2)
    print(half_l)
    for i in range(0, half_l):
        gravg = (gradi[i] + gradi[i + half_l]) / (4.0)
        grads.append(gravg)

    alt_half_l = int(len(params) / 2)
    print(alt_half_l)
    for i in range(0, alt_half_l):
        grads.append(grads[i])

    shapes = [K.int_shape(p) for p in params]
    ...

in my own optimizer (based on Keras original Adadelta, again mimicking your constants).

However, the loss/cost per batch went from 0.08 (with a single LSTM applied to both inputs, as suggested by Keras for the Siamese logic) to 0.4, so there must be a logic error.

I guess the gradient manipulation is being applied frequently in my code, such as per batch (per Keras logic), while in your code it is applied once, on Adadelta initialization/definition.

Can someone help me understand what's happening in the above code? What is it for, is it run per batch, and why not a single shared LSTM, as Keras suggests in
https://keras.io/getting-started/functional-api-guide/#shared-layers ?
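
For context, here is a minimal sketch of the shared-LSTM wiring I am using on the Keras side (dimensions are placeholders, not the paper's exact setup):

    from keras.models import Model
    from keras.layers import Input, LSTM, Lambda
    import keras.backend as K

    # One LSTM instance applied to both inputs shares a single set of weights,
    # so Keras accumulates the two branches' gradients automatically.
    left = Input(shape=(None, 300))
    right = Input(shape=(None, 300))
    shared_lstm = LSTM(50)
    h_left, h_right = shared_lstm(left), shared_lstm(right)

    # MaLSTM similarity: exp(-||h_left - h_right||_1), bounded in (0, 1]
    similarity = Lambda(
        lambda t: K.exp(-K.sum(K.abs(t[0] - t[1]), axis=1, keepdims=True))
    )([h_left, h_right])

    model = Model(inputs=[left, right], outputs=similarity)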

Best,
Pedro

Missing a function to save the newly trained model

Hi, here is a suggestion about your code.
In example3.py it seems you intend to train a new model under the file name "new.p", but in the end this "new.p" is never created or saved. If possible, could you please add a few lines so that the new model is saved after training finishes? I am also working on it, but it seems I still need more time. Just a suggestion; I would be glad to see a new version.
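
Something like this is what I have in mind (tnewp and the shared-variable layout are guesses based on the tracebacks elsewhere in this tracker):

    import pickle

    # tnewp appears to be the OrderedDict of Theano shared variables holding the
    # weights; dumping their current values lets a later run reload them.
    weights = {name: var.get_value() for name, var in tnewp.items()}
    with open("new.p", "wb") as f:
        pickle.dump(weights, f)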

Thank you very much!

Entailment code not yet added?

The paper mentions training and testing the embeddings on an entailment prediction task. Has the code or function for entailment prediction not been added? If it has, please point me towards it, as I am unable to find it.

Also, how should I get the LSTM embeddings of any text using the trained model?

Thank you.

No model is saved

Can anyone please explain where this model gets saved? If I remove the provided bestsem.p file, then there is no model holding the weights.

Hi, when I set training=True an error arises (type mismatch?)

mike@Vostro-3653:~/works/Siamese-LSTM$ python main.py 
Loading Word2Vec
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    sls=lstm("bestsem.p",load=True,training=True)
  File "/home/mike/works/Siamese-LSTM/lstm.py", line 299, in __init__
    self.f_grad_shared, self.f_update = adadelta(lr, tnewp, grads,emb11,mask11,emb21,mask21,y, cost)
  File "/home/mike/works/Siamese-LSTM/lstm.py", line 188, in adadelta
    name='adadelta_f_grad_shared')
  File "/home/mike/.local/lib/python2.7/site-packages/theano/compile/function.py", line 320, in function
    output_keys=output_keys)
  File "/home/mike/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 442, in pfunc
    no_default_updates=no_default_updates)
  File "/home/mike/.local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 207, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=1lstm1_U_rgrad2, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')
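
This is the same float32/float64 mismatch reported above under "Failed to train a new model with Theano error"; forcing THEANO_FLAGS="floatX=float32" before Theano is imported (see the sketch there) should apply here as well.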

training on new data

While training on a new data file I got this error.

File "", line 1, in
debugfile('/home/seemab/seemab/Siamese-LSTM-master/trainQA.py', wdir='/home/seemab/seemab/Siamese-LSTM-master')

File "/home/seemab/.local/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 728, in debugfile
debugger.run("runfile(%r, args=%r, wdir=%r)" % (filename, args, wdir))

File "/home/seemab/anaconda2/lib/python2.7/bdb.py", line 400, in run
exec cmd in globals, locals

File "", line 1, in

File "/home/seemab/.local/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
execfile(filename, namespace)

File "/home/seemab/.local/lib/python2.7/site-packages/spyder/utils/site/sitecustomize.py", line 94, in execfile
builtins.execfile(filename, *where)

File "/home/seemab/seemab/Siamese-LSTM-master/trainQA.py", line 17, in
sls.train_lstm(train,66)

File "lstmQA.py", line 350, in train_lstm
ls2.append(embed(x2[j]))

File "sentencesQA.py", line 78, in embed
dmtr[count]=model[stmx[count]]

File "/home/seemab/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 169, in getitem
return self.get_vector(entities)

File "/home/seemab/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 277, in get_vector
return self.word_vec(word)

File "/home/seemab/anaconda2/lib/python2.7/site-packages/gensim/models/keyedvectors.py", line 274, in word_vec
raise KeyError("word '%s' not in vocabulary" % word)

KeyError: "word 'functions,' not in vocabulary"

This error is raised for many different words, for example:

KeyError: "word 'given,' not in vocabulary"

How should I deal with it? Kindly reply. (A possible workaround is sketched below.)
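
The failing tokens ('functions,', 'given,') carry trailing punctuation, which is why they miss the word2vec vocabulary. Below is a minimal sketch of a guard; safe_vector is a hypothetical helper, not part of the repo:

    import numpy as np

    def safe_vector(model, token, dim=300):
        """Look up a word2vec vector, stripping punctuation and tolerating OOV words."""
        token = token.strip(".,!?;:'\"()")  # 'functions,' -> 'functions'
        if token in model:                  # gensim KeyedVectors supports `in`
            return model[token]
        return np.zeros(dim)                # or skip the token entirely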

Question on custom synonym library

Hi, great work on this; I've enjoyed going through the code and learning from your implementation. I was wondering whether you had any thoughts on how to integrate a custom synonym library (in addition to, not instead of, the existing one) that could help boost some of the similarity metrics?

Thanks

-pankaj

please give a solution

While running the code I'm getting the following errors. Can anyone suggest what changes I need to make?

Loading Word2Vec
Traceback (most recent call last):
File "/home/mcis-lap-40/Downloads/Siamese-LSTM-master/main.py", line 5, in
sls=lstm("bestsem.p",load=True,training=True)
File "/home/mcis-lap-40/Downloads/Siamese-LSTM-master/lstm.py", line 299, in init
self.f_grad_shared, self.f_update = adadelta(lr, tnewp, grads,emb11,mask11,emb21,mask21,y, cost)
File "/home/mcis-lap-40/Downloads/Siamese-LSTM-master/lstm.py", line 188, in adadelta
name='adadelta_f_grad_shared')
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 326, in function
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 449, in pfunc
no_default_updates=no_default_updates)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 208, in rebuild_collect_shared
raise TypeError(err_msg, err_sug)
TypeError: ('An update must have the same type as the original shared variable (shared_var=1lstm1_U_rgrad2, shared_var.type=TensorType(float32, matrix), update_val=Elemwise{add,no_inplace}.0, update_val.type=TensorType(float64, matrix)).', 'If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.')

Process finished with exit code 1
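
(This is again the float32/float64 mismatch discussed above; the THEANO_FLAGS="floatX=float32" sketch there should apply.)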

License

What license is this work under? Great implementation!

Error while running example3

I get the following error when I run python examples/example3.py after uncommenting the line "training=True # Set to false to load weights".
Epoch 65
Epoch 65 Update 22840 Cost 0.030776206
Epoch 65 Update 22880 Cost 0.032375146
Epoch 65 Update 22920 Cost 0.024936032
Epoch 65 Update 22960 Cost 0.030370418
Epoch 65 Update 23000 Cost 0.03584883
Epoch 65 Update 23040 Cost 0.042079
Epoch 65 Update 23080 Cost 0.033460055
Epoch 65 Update 23120 Cost 0.023712454
Epoch 65 Update 23160 Cost 0.038346637
epoch took: 45.9180030823
Pre-training done
Traceback (most recent call last):
File "examples/example3.py", line 14, in
train=expand(train)
File "/home/projects/Siamese-LSTM/sentences.py", line 146, in expand
sa,cnt1=chsyn(i[0],data)
File "/home/projects/Siamese-LSTM/sentences.py", line 93, in chsyn
if q in cachedStopWords or q.title() in cachedStopWords or q.lower() in cachedStopWords:
NameError: global name 'cachedStopWords' is not defined
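
The name cachedStopWords is referenced in sentences.py but apparently never defined there. A likely fix (an assumption; NLTK's English stopword list matches what the name suggests) is to define it near the top of sentences.py:

    # sentences.py: define the missing global before chsyn() uses it
    from nltk.corpus import stopwords

    cachedStopWords = set(stopwords.words("english"))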
