hans / glove.py

A Python implementation of the GloVe word embedding algorithm (Pennington et al., 2014), for educational purposes

Python 100.00%

glove.py's Introduction

This repository contains an implementation of the GloVe word vector learning algorithm in Python 2 (NumPy + SciPy). (A contributed Python 3 version is available here.)

You can follow along with the accompanying tutorial on my blog.

The implementation is for educational purposes only; if you need an efficient or robust solution, look elsewhere.

glove.py's People

Contributors

Stargazers

Watchers

Forkers

leapease akkineniramesh aimran clijda prateekmehta adammenges likaiguo rahulmohan sonu5623 hitluobin xsongx wowuq karimlabib manasrk bertomartin zxsted ashhher3 dustinmayeda himmelstein sevinjyolchuyeva jackhannnnnn anmolchachra theo-m jianzhengming ravitejaanantha leondz maraimelbadri ownway22 yumy-yumy mostafaashraf413 maierhofert mrknight21 lxiangge1126 janeshenyy lefft moherx mengyingzh mght x-lai dingvale nzigel selvamshan vaibhavbarmy kellywzhang xflee lalish99 abhinay141 mividalocas dunovank 111x0m7 casually-pylearner tantelitiana22 zjhao666 wxc1884 dhp-cherry liuheng0111 aybkao xray2lan cp736421469 eigenz1 epapagia zhenwang9102 qizhengsun zora-zjj eedreamer yacaikk chawit albertkao227 dengcfei iq-scm

glove.py's Issues

Bug in gradient updating

Your article and sample code are very useful! However, I think there is a bug in run_iter: the vectors are updated in the same loop in which the gradients are calculated. I don't think that is correct. Instead, all gradients should be calculated in one loop, and all updates should be applied in a second loop.
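The two update schedules contrasted in this issue can be sketched side by side. In the hypothetical functions below, `epoch_in_loop` applies each co-occurrence pair's update as soon as its gradient is computed (the stochastic schedule the issue describes in `run_iter`), while `epoch_two_pass` first sums the gradients over all pairs and then applies a single update (full-batch gradient descent). Biases and AdaGrad scaling are left out for brevity, and the toy data is made up; this is an illustration under those assumptions, not the repository's code.

```python
import numpy as np


def pair_grads(W_main, W_context, i, j, x_ij, x_max=100.0, alpha=0.75):
    """Gradients of 0.5 * f(X_ij) * (w_i . w~_j - log X_ij)**2 (biases omitted)."""
    weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
    cost_inner = W_main[i].dot(W_context[j]) - np.log(x_ij)
    # Both gradients are computed from the current values before any update.
    return weight * cost_inner * W_context[j], weight * cost_inner * W_main[i]


def total_cost(W_main, W_context, data, x_max=100.0, alpha=0.75):
    """Half the weighted squared error, summed over all pairs."""
    cost = 0.0
    for i, j, x_ij in data:
        weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
        cost += 0.5 * weight * (W_main[i].dot(W_context[j]) - np.log(x_ij)) ** 2
    return cost


def epoch_in_loop(W_main, W_context, data, learning_rate):
    """Stochastic schedule: update right after each pair's gradient."""
    for i, j, x_ij in data:
        grad_main, grad_context = pair_grads(W_main, W_context, i, j, x_ij)
        W_main[i] -= learning_rate * grad_main
        W_context[j] -= learning_rate * grad_context


def epoch_two_pass(W_main, W_context, data, learning_rate):
    """Full-batch schedule: sum every gradient first, then update once."""
    total_main = np.zeros_like(W_main)
    total_context = np.zeros_like(W_context)
    for i, j, x_ij in data:
        grad_main, grad_context = pair_grads(W_main, W_context, i, j, x_ij)
        total_main[i] += grad_main
        total_context[j] += grad_context
    W_main -= learning_rate * total_main
    W_context -= learning_rate * total_context


# Hypothetical toy data: (main word id, context word id, co-occurrence count).
data = [(0, 1, 5.0), (0, 2, 3.0), (1, 2, 8.0), (2, 0, 2.0)]
rng = np.random.default_rng(0)
init_main = rng.normal(scale=0.5, size=(3, 4))
init_context = rng.normal(scale=0.5, size=(3, 4))

results, finals = {}, {}
for name, run_epoch in [("in_loop", epoch_in_loop), ("two_pass", epoch_two_pass)]:
    W_main, W_context = init_main.copy(), init_context.copy()
    start = total_cost(W_main, W_context, data)
    for _ in range(300):
        run_epoch(W_main, W_context, data, learning_rate=0.1)
    results[name] = (start, total_cost(W_main, W_context, data))
    finals[name] = W_main
    print(name, "cost before/after:", results[name])
```

On this toy data both schedules lower the same cost, but they follow different paths and end with different parameters: updating inside the loop is the usual stochastic-gradient schedule, while the two-pass version is full-batch gradient descent.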

Should grad_bias be multiplied by learning_rate?

    # Compute gradients for bias terms
    grad_bias_main = weight * cost_inner
    grad_bias_context = weight * cost_inner
    # In the Stanford C version, these should be multiplied by learning_rate:
    # grad_bias_main = weight * cost_inner * learning_rate
    # grad_bias_context = weight * cost_inner * learning_rate
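Where the learning rate is applied matters because of the AdaGrad accumulator. The hypothetical helpers below compare two conventions for a single bias term: scaling only the step by `learning_rate` (so the accumulator stores raw squared gradients), versus folding `learning_rate` into the gradient before it is squared and accumulated, which is one reading of the commented-out lines in the snippet. Both assume the accumulator starts at 1.0; this is a sketch of the arithmetic, not either codebase's exact update.

```python
import math


def step_lr_at_update(bias, gradsq, grad, learning_rate):
    """AdaGrad step; learning_rate scales only the step, gradsq stores grad**2."""
    bias -= learning_rate * grad / math.sqrt(gradsq)
    gradsq += grad ** 2
    return bias, gradsq


def step_lr_in_grad(bias, gradsq, grad, learning_rate):
    """AdaGrad step; the gradient is pre-scaled, so gradsq stores (learning_rate * grad)**2."""
    scaled = learning_rate * grad
    bias -= scaled / math.sqrt(gradsq)
    gradsq += scaled ** 2
    return bias, gradsq


def run(step, learning_rate, grads):
    """Apply a sequence of bias gradients starting from bias=0.0, gradsq=1.0."""
    bias, gradsq = 0.0, 1.0
    trajectory = []
    for grad in grads:
        bias, gradsq = step(bias, gradsq, grad, learning_rate)
        trajectory.append(bias)
    return trajectory


grads = [1.0, 1.0, 1.0]
a = run(step_lr_at_update, 0.05, grads)
b = run(step_lr_in_grad, 0.05, grads)
print(a)
print(b)
```

The two conventions agree on the first step and then drift apart, because the pre-scaled version's accumulator grows by `learning_rate**2` times as much; they coincide only when `learning_rate` is 1.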

About the gradient

I have read your blog post about the code, and I'm wondering whether the gradient computation is missing a factor of 2. That would make the effective learning rate half of the original. Am I correct?
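The factor of 2 can be checked numerically. The sketch below uses hypothetical names and made-up values. It computes the gradient with respect to the main-word vector in the same `weight * cost_inner` form as the bias snippet in the previous issue (here multiplied by the context vector) and compares it with a finite-difference derivative of the unhalved per-pair cost f(X_ij) * (J')**2. The weighting parameters x_max = 100 and alpha = 0.75 are the GloVe paper's defaults, assumed here.

```python
import numpy as np


def weighting(x_ij, x_max=100.0, alpha=0.75):
    """GloVe weighting function f(X_ij)."""
    return (x_ij / x_max) ** alpha if x_ij < x_max else 1.0


def unhalved_cost(w_main, w_context, b_main, b_context, x_ij):
    """f(X_ij) * (J')**2 for one co-occurrence pair, with no 0.5 factor."""
    cost_inner = w_main.dot(w_context) + b_main + b_context - np.log(x_ij)
    return weighting(x_ij) * cost_inner ** 2


def grad_main_without_two(w_main, w_context, b_main, b_context, x_ij):
    """Gradient w.r.t. w_main written as weight * cost_inner * w_context."""
    cost_inner = w_main.dot(w_context) + b_main + b_context - np.log(x_ij)
    return weighting(x_ij) * cost_inner * w_context


def numeric_grad_main(w_main, w_context, b_main, b_context, x_ij, eps=1e-6):
    """Central finite differences of unhalved_cost w.r.t. w_main."""
    grad = np.zeros_like(w_main)
    for k in range(w_main.size):
        step = np.zeros_like(w_main)
        step[k] = eps
        plus = unhalved_cost(w_main + step, w_context, b_main, b_context, x_ij)
        minus = unhalved_cost(w_main - step, w_context, b_main, b_context, x_ij)
        grad[k] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
w_main, w_context = rng.normal(size=5), rng.normal(size=5)
b_main, b_context, x_ij = 0.1, -0.2, 7.0  # made-up values

analytic = grad_main_without_two(w_main, w_context, b_main, b_context, x_ij)
numeric = numeric_grad_main(w_main, w_context, b_main, b_context, x_ij)
print(np.allclose(numeric, 2 * analytic, rtol=1e-5, atol=1e-8))
```

The `weight * cost_inner` form is exactly half the derivative of the unhalved cost, i.e. it is the gradient of 0.5 * f(X_ij) * (J')**2. Relative to the unhalved objective the step is therefore halved; relative to the halved objective no factor is missing.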

How to read a vocabulary file

I have applied glove.py to the BBCNews dataset. I built the corpus as a single file with a single space between words, and the vocabulary file was generated. Can you please explain how to read it?

vocabulary.txt

I passed the following arguments at the command prompt:

    C:\Users\JAYASHREE\Documents\NLP>python Glove_python_bbc.py "C:/Users/JAYASHREE/Documents/NLP/text-corpus.txt" --vocab-path C:/Users/JAYASHREE/Documents/NLP/vocabulary.txt --cooccur-path C:/Users/JAYASHREE/Documents/NLP/cooccur_matrix.txt -w 10 --min-count 10 --vector-path C:/Users/JAYASHREE/Documents/NLP/word-vector.txt -s 40 --iterations 10 --learning-rate 0.1 --save-often

text-corpus.zip
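The issue does not state the vocabulary file's format. As a hedged sketch, assuming glove.py wrote the vocabulary with Python's `pickle` module as a dictionary mapping each word to a `(word_id, frequency)` pair, it could be read back as below, despite the `.txt` extension. `read_vocab`, `id_to_word`, and `most_frequent` are hypothetical helpers, not functions confirmed to exist in the repository, and the sample vocabulary is made up.

```python
import os
import pickle
import tempfile


def read_vocab(path):
    """Load a vocabulary assumed to be a pickled dict: word -> (word_id, frequency)."""
    with open(path, "rb") as vocab_file:
        # encoding="latin1" lets Python 3 unpickle str objects written by Python 2;
        # non-ASCII words may then need .encode("latin1").decode("utf-8").
        return pickle.load(vocab_file, encoding="latin1")


def id_to_word(vocab):
    """Invert the mapping so a word ID (a row index in the vectors) gives back the word."""
    return {word_id: word for word, (word_id, _freq) in vocab.items()}


def most_frequent(vocab, n=10):
    """Return the n (word, frequency) pairs with the highest corpus frequency."""
    ranked = sorted(vocab.items(), key=lambda item: item[1][1], reverse=True)
    return [(word, freq) for word, (_word_id, freq) in ranked[:n]]


# Round-trip a small made-up vocabulary to show the assumed structure.
sample = {"the": (0, 120), "news": (1, 40), "bbc": (2, 75)}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vocabulary.txt")
    with open(path, "wb") as f:
        pickle.dump(sample, f, protocol=2)
    vocab = read_vocab(path)

print(most_frequent(vocab, 2))
```

With a real file, the call would be `read_vocab("C:/Users/JAYASHREE/Documents/NLP/vocabulary.txt")`; if unpickling fails, the file was written in some other format and this sketch does not apply.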

Load glove

How do I load the saved model (bin file)?
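A minimal loading sketch, assuming the saved model is the NumPy weight matrix `W` pickled from Python 2, with the main-word vectors in the first `vocab_size` rows and the context-word vectors in the remaining rows. Under that assumption, the hypothetical helpers `load_vectors` and `combine_vectors` below load the matrix and combine each word's two vectors by summing them, which is how the GloVe paper forms its final word vectors. The demo matrix is random, made-up data.

```python
import os
import pickle
import tempfile

import numpy as np


def load_vectors(path):
    """Unpickle the saved weight matrix (assumed to be a NumPy array)."""
    with open(path, "rb") as vector_file:
        # encoding="latin1" is needed to read NumPy arrays pickled by Python 2.
        return pickle.load(vector_file, encoding="latin1")


def combine_vectors(W, normalize=True):
    """Sum main and context vectors, assuming W stacks them as (2 * V, d)."""
    vocab_size = W.shape[0] // 2
    combined = W[:vocab_size] + W[vocab_size:]
    if normalize:
        # Scale each row to unit length so dot products are cosine similarities.
        combined = combined / np.linalg.norm(combined, axis=1, keepdims=True)
    return combined


# Demo with a made-up 3-word, 4-dimensional matrix saved the assumed way.
W = np.random.default_rng(0).normal(size=(6, 4))
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "vectors.bin")
    with open(path, "wb") as f:
        pickle.dump(W, f, protocol=2)
    loaded = load_vectors(path)

vectors = combine_vectors(loaded)
print(vectors.shape)
```

Row `i` of `vectors` then corresponds to word ID `i` in the vocabulary, assuming the vocabulary's IDs index the rows of `W`.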

Why do we maintain two sets of embeddings for one word?

Hello Jon:
Very nice implementation and a very clear tutorial. I now understand nearly all of this pretty work except for one point: why do we maintain two sets of embeddings for one word? As you explained in the tutorial, one is used when the word appears as the main word and the other when it appears as a context word. I saw a similar setup in the original GloVe code, but it is still unclear (unintuitive) to me why we do this. What benefits do we obtain from it, and have you tried what happens if we use just one set of embeddings per word?
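The GloVe paper states that when the co-occurrence matrix X is symmetric, the main vectors W and context vectors W̃ are equivalent and differ only because of their random initialization, and that it uses the sum W + W̃ as the final word vectors. The sketch below (hypothetical names, made-up data, the paper's weighting defaults x_max = 100 and alpha = 0.75) checks the first point numerically: for a symmetric X, swapping the main and context parameter sets leaves the cost unchanged.

```python
import numpy as np


def glove_cost(W_main, W_context, b_main, b_context, X, x_max=100.0, alpha=0.75):
    """Half the weighted squared error over all nonzero entries of X."""
    cost = 0.0
    rows, cols = np.nonzero(X)
    for i, j in zip(rows, cols):
        x_ij = X[i, j]
        weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
        inner = W_main[i].dot(W_context[j]) + b_main[i] + b_context[j] - np.log(x_ij)
        cost += 0.5 * weight * inner ** 2
    return cost


rng = np.random.default_rng(1)
vocab_size, dim = 4, 3
counts = rng.integers(0, 10, size=(vocab_size, vocab_size)).astype(float)
X = counts + counts.T  # made-up symmetric co-occurrence counts

W_main = rng.normal(size=(vocab_size, dim))
W_context = rng.normal(size=(vocab_size, dim))
b_main = rng.normal(size=vocab_size)
b_context = rng.normal(size=vocab_size)

original = glove_cost(W_main, W_context, b_main, b_context, X)
# Swap the roles of the two parameter sets (vectors and biases together).
swapped = glove_cost(W_context, W_main, b_context, b_main, X)
print(np.isclose(original, swapped))

# Final word vectors formed as in the GloVe paper: the sum of both sets.
word_vectors = W_main + W_context
```

Relabeling i and j in the sum shows why the check holds: each term f(X_ij)(w̃_i · w_j + b̃_i + b_j − log X_ij)² becomes the original term for the pair (j, i), and X_ji = X_ij when X is symmetric.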
