maciejkula / glove-python
Toy Python implementation of http://www-nlp.stanford.edu/projects/glove/
License: Apache License 2.0
I tried to load the pre-trained vectors from the GloVe website http://www-nlp.stanford.edu/data/glove.6B.50d.txt.gz
I could load them after converting the txt file into a pickle .p file. I got a "Glove" object.
However, when I try the function most_similar, an error occurs:
AttributeError: 'Glove' object has no attribute 'word_vectors'
What is the solution? (I am a beginner...)
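For anyone hitting the same AttributeError, here is a minimal sketch of reading the Stanford text format directly instead of pickling it. It assumes each line is a word followed by its space-separated vector components, and it only builds a word-to-index dictionary and a list of vectors. `load_glove_text` is a hypothetical helper, not part of glove-python.

```python
import io


def load_glove_text(path):
    """Parse a GloVe-style text file into (dictionary, vectors).

    Hypothetical helper: assumes each line is `word v1 v2 ... vN`.
    """
    dictionary = {}   # word -> row index
    vectors = []      # row index -> list of floats
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            dictionary[parts[0]] = len(vectors)
            vectors.append([float(x) for x in parts[1:]])
    return dictionary, vectors
```

The two return values correspond to the kind of data a model object would need for similarity queries: a lookup from word to row, and the rows themselves.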
When I try to import glove, I get this. Does anyone know a solution?
On a mac (osx 10.12) both clang and gcc seem to fail. output below ...
Building wheels for collected packages: glove-python
Running setup.py bdist_wheel for glove-python ... error
Complete output from command /Users/barry/miniconda3/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/31/yd2dv9h54m7_9fp95r8llkk80000gn/T/pip-build-i_roohw3/glove-python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /var/folders/31/yd2dv9h54m7_9fp95r8llkk80000gn/T/tmpj98_zqpmpip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-3.5
creating build/lib.macosx-10.9-x86_64-3.5/glove
copying glove/__init__.py -> build/lib.macosx-10.9-x86_64-3.5/glove
copying glove/corpus.py -> build/lib.macosx-10.9-x86_64-3.5/glove
copying glove/glove.py -> build/lib.macosx-10.9-x86_64-3.5/glove
running build_ext
building 'glove.glove_cython' extension
creating build/temp.macosx-10.9-x86_64-3.5
creating build/temp.macosx-10.9-x86_64-3.5/glove
gcc -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/barry/miniconda3/include -arch x86_64 -I/Users/barry/miniconda3/include/python3.5m -c glove/glove_cython.c -o build/temp.macosx-10.9-x86_64-3.5/glove/glove_cython.o -fopenmp -ffast-math -march=native
clang: error: unsupported option '-fopenmp'
error: command 'gcc' failed with exit status 1
In Python 3, dict.keys() returns a dict_keys object and dict.values() returns a dict_values object instead of a list.
So in glove.py line 165 should be changed from:
word_ids = np.array(cooccurrence.keys(), dtype=np.int32)
to
word_ids = np.array(list(cooccurrence), dtype=np.int32)
and line 166 should be changed from:
values = np.array(cooccurrence.values(), dtype=np.float64)
to
values = np.array(list(cooccurrence.values()), dtype=np.float64)
Btw, I'm using python 3.5 because pickle gives problems when working with big models in python 2.7.
I successfully trained my own GloVe model using the following link.
To load a pre-trained GloVe model from Stanford, I do it this way:
https://stackoverflow.com/questions/37793118/load-pretrained-glove-vectors-in-python
I am running the command pip install glove_python
C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.13.26128\bin\HostX86\x86\link.exe /nologo /INCREMENTAL:NO /LTCG /nodefaultlib:libucrt.lib ucrt.lib /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:c:\users\hi5an\appdata\local\programs\python\python35-32\libs /LIBPATH:c:\users\hi5an\appdata\local\programs\python\python35-32\PCbuild\win32 "/LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.13.26128\Lib\x86" "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.16299.0\um\x86" /LIBPATH:C:\WINDOWS\Microsoft.NET\Framework\v4.0.30319 "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.16299.0\ucrt\x86" stdc++.lib /EXPORT:PyInit_corpus_cython build\temp.win32-3.5\Release\glove/corpus_cython.obj /OUT:build\lib.win32-3.5\glove\corpus_cython.cp35-win32.pyd /IMPLIB:build\temp.win32-3.5\Release\glove\corpus_cython.cp35-win32.lib -fopenmp -ffast-math -march=native
LINK : warning LNK4044: unrecognized option '/fopenmp'; ignored
LINK : warning LNK4044: unrecognized option '/ffast-math'; ignored
LINK : warning LNK4044: unrecognized option '/march=native'; ignored
LINK : fatal error LNK1181: cannot open input file 'stdc++.lib'
error: command 'C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\VC\Tools\MSVC\14.13.26128\bin\HostX86\x86\link.exe' failed with exit status 1181
Complete logs in the attached file
log.txt
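Both failing logs above show GCC-style options reaching a toolchain that does not accept them: clang rejects -fopenmp on macOS, and link.exe cannot find stdc++.lib on Windows. A hedged sketch of how a setup.py could pick options per platform follows. `choose_build_options` is a hypothetical helper, not the project's actual setup.py, and building without OpenMP on macOS assumes the extension still compiles (single-threaded) without it.

```python
import sys


def choose_build_options(platform=None):
    """Return compiler/linker options for the given platform string.

    Hypothetical helper; the flag choices below are assumptions.
    """
    platform = platform or sys.platform
    if platform.startswith("win"):
        # MSVC ignores GCC-style flags and ships no stdc++.lib.
        return {"extra_compile_args": ["/EHsc"],
                "extra_link_args": [],
                "libraries": []}
    if platform == "darwin":
        # Apple clang rejects -fopenmp, so build without OpenMP.
        return {"extra_compile_args": ["-ffast-math"],
                "extra_link_args": [],
                "libraries": []}
    # Assume GCC elsewhere (e.g. Linux).
    return {"extra_compile_args": ["-fopenmp", "-ffast-math"],
            "extra_link_args": ["-fopenmp"],
            "libraries": ["stdc++"]}
```

The returned dict uses the keyword names that `setuptools.Extension` accepts, so it could be unpacked into each extension definition.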
Not sure if I am missing something here but thought I'd ask for clarification - the loss function is not squared.
loss = entry_weight * (prediction - c_log(count))
Also, does this implementation not generate separate vectors for when a word is used as context?
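A possible reading of the first point, assuming the quoted `loss` variable feeds the parameter update rather than being reported as the objective (the surrounding code is not quoted here, so this is an assumption): the GloVe objective is a weighted squared error, and its derivative with respect to a word vector is, up to a constant factor of 2, the unsquared expression times the context vector.

```latex
J = \sum_{i,j} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2

\frac{\partial J}{\partial w_i}
  = \sum_{j} 2 \, f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right) \tilde{w}_j
```

Under that reading, `entry_weight` plays the role of f(X_ij), `prediction` the role of the dot product plus biases, and `count` the role of X_ij, so the quoted line would be the shared factor of the gradient, with the factor of 2 absorbed into the learning rate.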
I am trying to install glove-python on RHEL 7.
I am using Python 2.7.5
I have GCC version 4.8.5 installed on my machine.
When I run pip install glove-python==0.1.0, I get the following error:
Running setup.py install for glove-python ... error
Complete output from command /home/vaibhavtulsyan/mar_12/my-dir/.pyenv/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-9Uc3Dm/glove-python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-record-pSuN67/install-record.txt --single-version-externally-managed --compile --install-headers /home/vaibhavtulsyan/mar_12/my-dir/.pyenv/include/site/python2.7/glove-python:
running install
running build
running build_py
running build_ext
building 'glove.corpus_cython' extension
gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/python2.7 -c glove/corpus_cython.cpp -o build/temp.linux-x86_64-2.7/glove/corpus_cython.o -fopenmp -ffast-math -march=native
gcc: error trying to exec 'cc1plus': execvp: No such file or directory
error: command 'gcc' failed with exit status 1
Output of gcc -v:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-36) (GCC)
C:\Users\BINVI01>pip install glove_python
Collecting glove_python
Using cached glove_python-0.1.0.tar.gz
Requirement already satisfied: numpy in c:\python27\lib\site-packages (from glov
e_python)
Requirement already satisfied: scipy in c:\python27\lib\site-packages (from glov
e_python)
Installing collected packages: glove-python
Running setup.py install for glove-python ... error
Complete output from command c:\python27\python.exe -u -c "import setuptools
, tokenize;__file__='c:\users\binvi01\appdata\local\temp\pip-build-_lfd7z
\glove-python\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read
().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" instal
l --record c:\users\binvi01\appdata\local\temp\pip-sozxqa-record\install-record.
txt --single-version-externally-managed --compile:
running install
running build
running build_py
creating build
creating build\lib.win32-2.7
creating build\lib.win32-2.7\glove
copying glove\corpus.py -> build\lib.win32-2.7\glove
copying glove\glove.py -> build\lib.win32-2.7\glove
copying glove\__init__.py -> build\lib.win32-2.7\glove
running build_ext
building 'glove.glove_cython' extension
creating build\temp.win32-2.7
creating build\temp.win32-2.7\Release
creating build\temp.win32-2.7\Release\glove
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic:\python27\include
-Ic:\python27\PC /Tcglove/glove_cython.c /Fobuild\temp.win32-2.7\Release\glove/g
love_cython.obj -fopenmp -ffast-math -march=native
cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
cl : Command line warning D9002 : ignoring unknown option '-march=native'
glove_cython.c
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:c:\python27\libs /L
IBPATH:c:\python27\PCbuild /LIBPATH:c:\python27\PC\VS9.0 /EXPORT:initglove_cytho
n build\temp.win32-2.7\Release\glove/glove_cython.obj /OUT:build\lib.win32-2.7\g
love\glove_cython.pyd /IMPLIB:build\temp.win32-2.7\Release\glove\glove_cython.li
b /MANIFESTFILE:build\temp.win32-2.7\Release\glove\glove_cython.pyd.manifest -fo
penmp
LINK : warning LNK4044: unrecognized option '/fopenmp'; ignored
Creating library build\temp.win32-2.7\Release\glove\glove_cython.lib and
object build\temp.win32-2.7\Release\glove\glove_cython.exp
building 'glove.metrics.accuracy_cython' extension
creating build\temp.win32-2.7\Release\glove\metrics
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic:\python27\include
-Ic:\python27\PC /Tcglove/metrics/accuracy_cython.c /Fobuild\temp.win32-2.7\Rele
ase\glove/metrics/accuracy_cython.obj -fopenmp -ffast-math -march=native
cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
cl : Command line warning D9002 : ignoring unknown option '-march=native'
accuracy_cython.c
creating build\lib.win32-2.7\glove\metrics
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:c:\python27\libs /L
IBPATH:c:\python27\PCbuild /LIBPATH:c:\python27\PC\VS9.0 /EXPORT:initaccuracy_cy
thon build\temp.win32-2.7\Release\glove/metrics/accuracy_cython.obj /OUT:build\l
ib.win32-2.7\glove\metrics\accuracy_cython.pyd /IMPLIB:build\temp.win32-2.7\Rele
ase\glove/metrics\accuracy_cython.lib /MANIFESTFILE:build\temp.win32-2.7\Release
\glove/metrics\accuracy_cython.pyd.manifest -fopenmp
LINK : warning LNK4044: unrecognized option '/fopenmp'; ignored
Creating library build\temp.win32-2.7\Release\glove/metrics\accuracy_cyth
on.lib and object build\temp.win32-2.7\Release\glove/metrics\accuracy_cython.exp
building 'glove.corpus_cython' extension
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic:\python27\include
-Ic:\python27\PC /Tpglove/corpus_cython.cpp /Fobuild\temp.win32-2.7\Release\glov
e/corpus_cython.obj -fopenmp -ffast-math -march=native
cl : Command line warning D9002 : ignoring unknown option '-fopenmp'
cl : Command line warning D9002 : ignoring unknown option '-ffast-math'
cl : Command line warning D9002 : ignoring unknown option '-march=native'
corpus_cython.cpp
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Include\xlocale(342) : warning C4530: C++ exception handler used, but
unwind semantics are not enabled. Specify /EHsc
glove/corpus_cython.cpp(1894) : warning C4018: '>=' : signed/unsigned mismat
ch
glove/corpus_cython.cpp(2225) : warning C4018: '<' : signed/unsigned mismatc
h
glove/corpus_cython.cpp(2496) : warning C4018: '<' : signed/unsigned mismatc
h
glove/corpus_cython.cpp(3403) : warning C4244: 'argument' : conversion from
'double' to 'float', possible loss of data
glove/corpus_cython.cpp(3431) : warning C4244: 'argument' : conversion from
'double' to 'float', possible loss of data
C:\Users\BINVI01\AppData\Local\Programs\Common\Microsoft\Visual C++ for Pyth
on\9.0\VC\Bin\link.exe /DLL /nologo /INCREMENTAL:NO /LIBPATH:c:\python27\libs /L
IBPATH:c:\python27\PCbuild /LIBPATH:c:\python27\PC\VS9.0 stdc++.lib /EXPORT:init
corpus_cython build\temp.win32-2.7\Release\glove/corpus_cython.obj /OUT:build\li
b.win32-2.7\glove\corpus_cython.pyd /IMPLIB:build\temp.win32-2.7\Release\glove\c
orpus_cython.lib /MANIFESTFILE:build\temp.win32-2.7\Release\glove\corpus_cython.
pyd.manifest -fopenmp -ffast-math -march=native
LINK : warning LNK4044: unrecognized option '/fopenmp'; ignored
LINK : warning LNK4044: unrecognized option '/ffast-math'; ignored
LINK : warning LNK4044: unrecognized option '/march=native'; ignored
LINK : fatal error LNK1181: cannot open input file 'stdc++.lib'
error: command 'C:\Users\BINVI01\AppData\Local\Programs\Common\Micros
oft\Visual C++ for Python\9.0\VC\Bin\link.exe' failed with exit status 1181
----------------------------------------
Command "c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:
\users\binvi01\appdata\local\temp\pip-build-_lfd7z\glove-python\setup.py'
;f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n')
;f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\binv
i01\appdata\local\temp\pip-sozxqa-record\install-record.txt --single-version-ext
ernally-managed --compile" failed with error code 1 in c:\users\binvi01\appdata\
local\temp\pip-build-_lfd7z\glove-python\
I was trying to train on the latest Wikipedia dump (15 GB), which obviously has a large corpus and token count (approx. 360M). Since the co-occurrence matrix needs to live in memory, I want to set a minimum frequency count for words when creating the vocabulary, which in turn determines the size of the co-occurrence matrix. I could not find any parameter for that. Also, the code is in Cython, so it's hard to understand for a noob like me. Any idea how I can build the vocabulary and co-occurrence matrix in a memory-efficient way?
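One workaround that does not require touching the Cython code is to drop rare tokens before the corpus is handed to the library. The sketch below is only an illustration: `filter_rare_tokens` and `make_documents` are hypothetical names, and `make_documents` is assumed to be a callable that returns a fresh iterator over token lists each time, so the corpus can be streamed twice without being held in memory.

```python
from collections import Counter


def filter_rare_tokens(make_documents, min_count=5):
    """Yield documents with tokens seen fewer than `min_count` times removed.

    `make_documents` is a hypothetical callable returning a fresh iterator
    over token lists; it is called twice (count pass, then filter pass).
    """
    counts = Counter()
    for doc in make_documents():          # first pass: count tokens
        counts.update(doc)
    for doc in make_documents():          # second pass: stream filtered docs
        yield [token for token in doc if counts[token] >= min_count]
```

Note that removing tokens changes which words end up next to each other, so window-based co-occurrence counts will differ slightly from those of an implementation that filters inside the counting step.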
Hi, I am new to NLP and interested in exploring the hype around word2vec. I want to carry out some intrinsic evaluation such as "man - woman = father - mother". In the gensim package, we can do so directly with the most_similar function. I do not know how to do that in glove-python. In addition, I wonder whether I can use glove-python for document classification, and how I can use functions like similar_paragraph, transform_paragraph, etc. I look forward to your help, thanks!
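The analogy test itself is plain vector arithmetic followed by a nearest-neighbour search, so it can be done on any word-to-vector mapping. The sketch below uses made-up two-dimensional vectors purely for illustration. `analogy` and `cosine` are hypothetical helpers, not glove-python functions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Toy 2-d vectors, made up for illustration only.
toy = {
    "man": [1.0, 0.0], "woman": [1.0, 1.0],
    "father": [2.0, 0.0], "mother": [2.0, 1.0],
    "apple": [-1.0, 0.5],
}
answer = analogy(toy, "man", "woman", "father")
```

With real embeddings, `vectors` would be built from the model's dictionary and word vectors. The query words are excluded from the candidates because they are often the nearest neighbours of the target themselves.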
Here's the output of this one,
> python setup.py install
running install
running bdist_egg
running egg_info
writing pbr to glove.egg-info/pbr.json
writing requirements to glove.egg-info/requires.txt
writing glove.egg-info/PKG-INFO
writing top-level names to glove.egg-info/top_level.txt
writing dependency_links to glove.egg-info/dependency_links.txt
reading manifest file 'glove.egg-info/SOURCES.txt'
writing manifest file 'glove.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-intel/egg
running install_lib
running build_py
running build_ext
building 'glove.glove_cython' extension
gcc-4.9 -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c glove/glove_cython.c -o build/temp.macosx-10.10-intel-2.7/glove/glove_cython.o -fopenmp
gcc-4.9: error: unrecognized command line option '-Wshorten-64-to-32'
error: command 'gcc-4.9' failed with exit status 1
Trying to use llvm-gcc results in a missing -lgomp library, which I assume is related to the -fopenmp flag in the setup file.
I think cython introduces the shorten-64-to-32 flag based on my python version; I may have to use a different version of that.
How can I make an edited glove.py work?
I have a Mac and I am running Python 3.5.2 and gcc version 7.1.0.
When I ran the pip command I got the following error:
/usr/bin/clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I/Users/hsethi/anaconda/include -arch x86_64 -I/Users/xxxxx/anaconda/include/python3.5m -c glove/glove_cython.c -o build/temp.macosx-10.6-x86_64-3.5/glove/glove_cython.o -fopenmp -ffast-math
clang: error: unsupported option '-fopenmp'
error: command '/usr/bin/clang' failed with exit status 1
----------------------------------------
Command "/Users/xxxx/anaconda/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/2y/442qhylj3bs2mln_h3lh4zvrbrhh_y/T/pip-build-1wyf65em/glove-python/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/2y/442qhylj3bs2mln_h3lh4zvrbrhh_y/T/pip-bgxjtwil-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/2y/442qhylj3bs2mln_h3lh4zvrbrhh_y/T/pip-build-1wyf65em/glove-python/
examples/analogy_tasks_evaluation.py only works if metrics is in the top level module namespace, as in from glove import Glove, metrics. At least in my most recent pip installed version, metrics is not in the module's exported namespace.
I'm running Python 2.7.6 on Ubuntu.
What if I want to return the word/paragraph vector instead of the word similarities?
The current implementation uses double (float64) everywhere.
This is probably overkill: single precision (float32) may be enough and would cut memory use a lot, both for the C++ and the scipy.sparse matrices.
Is there a specific reason for using double? Are there numerical problems with single?
(word2vec uses single precision everywhere, for example)
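To put a rough number on the saving for the dense word-vector matrix, the arithmetic is simply rows times columns times bytes per value. The vocabulary size and dimensionality below are illustrative, not measurements from this library.

```python
def dense_matrix_bytes(n_rows, n_cols, bytes_per_value):
    """Memory needed for the values of a dense n_rows x n_cols matrix."""
    return n_rows * n_cols * bytes_per_value


# Illustrative sizes only: 400,000 words x 100 components.
as_float64 = dense_matrix_bytes(400000, 100, 8)   # 320,000,000 bytes
as_float32 = dense_matrix_bytes(400000, 100, 4)   # 160,000,000 bytes
```

For a sparse matrix in coordinate format the saving is smaller than half, because each stored entry also carries a row and a column index alongside its value.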
Hi, I am new to Glove in Python. I was wondering how to load the pre-trained Stanford Glove word vectors.
Request your help.
Regards
SBS
I'm getting the following error when trying to load http://nlp.stanford.edu/data/glove.840B.300d.zip
In [1]: import glove
In [2]: %time glv = glove.Glove.load_stanford("glove.840B.300d.txt")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-5e84d129b242> in <module>()
----> 1 get_ipython().magic(u'time glv = glove.Glove.load_stanford("glove.840B.300d.txt")')
virtualEnv/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in magic(self, arg_s)
2161 magic_name, _, magic_arg_s = arg_s.partition(' ')
2162 magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2163 return self.run_line_magic(magic_name, magic_arg_s)
2164
2165 #-------------------------------------------------------------------------
virtualEnv/local/lib/python2.7/site-packages/IPython/core/interactiveshell.pyc in run_line_magic(self, magic_name, line)
2082 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
2083 with self.builtin_trap:
-> 2084 result = fn(*args,**kwargs)
2085 return result
2086
<decorator-gen-60> in time(self, line, cell, local_ns)
virtualEnv/Executionr/local/lib/python2.7/site-packages/IPython/core/magic.pyc in <lambda>(f, *a, **k)
191 # but it's overkill for just that one bit of state.
192 def magic_deco(arg):
--> 193 call = lambda f, *a, **k: f(*a, **k)
194
195 if callable(arg):
virtualEnv/local/lib/python2.7/site-packages/IPython/core/magics/execution.pyc in time(self, line, cell, local_ns)
1175 else:
1176 st = clock2()
-> 1177 exec(code, glob, local_ns)
1178 end = clock2()
1179 out = None
<timed exec> in <module>()
virtualEnv/local/lib/python2.7/site-packages/glove/glove.pyc in load_stanford(cls, filename)
265 instance.word_vectors = (np.array(vectors)
266 .reshape(no_vectors,
--> 267 no_components))
268 instance.word_biases = np.zeros(no_vectors)
269 instance.add_dictionary(dct)
ValueError: total size of new array must be unchanged
Any suggestions on how to load the vectors?
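A reshape failure like this would fit a file in which not every line splits into the same number of fields, for instance when a token itself contains a space. That is an assumption about this particular file, not something verified here. The sketch below first checks for it, then shows a split from the right that keeps such tokens in one piece. Both functions are hypothetical helpers.

```python
import io
from collections import Counter


def field_count_histogram(path):
    """Count how many lines split into each number of space-separated fields."""
    histogram = Counter()
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            histogram[len(line.rstrip("\n").split(" "))] += 1
    return histogram


def split_word_and_vector(line, no_components):
    """Split from the right so a token containing spaces stays in one piece."""
    parts = line.rstrip("\n").rsplit(" ", no_components)
    return parts[0], [float(x) for x in parts[1:]]
```

If the histogram has more than one key, the file contains irregular lines, and a loader that splits on every space will collect a different number of values than `no_vectors * no_components`.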
I am getting this error on trying to import glove.
I am using Python 2.7 and Anaconda 4.2 and Ubuntu 14.04.
Solved it by installing libgcc on Anaconda.
I posted this just in case anybody runs into the same issue in future. Thanks!
I've found no documentation about usage of this package, so I don't understand how to correctly tune this model. The example just mentions the code below:
glove = Glove(no_components=100, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=4, verbose=True)
But it doesn't explain anything. I don't understand what no_components and learning_rate mean. And what effect does the number of epochs have on the result? Thank you.
I think this implementation is missing a parameter to discard words that appeared less than a given number of times, at least I couldn't find such a parameter in the code.
I was trying to install glove on an ec2 instance. I have python3.6 and have already installed gcc. Even then, it is failing to install. The displayed message is:
glove/glove_cython.c:262:10: fatal error: omp.h: No such file or directory
#include <omp.h>
^~~~~~~
compilation terminated.
error: command 'gcc' failed with exit status 1
Hi
I have a corpus in Hindi and want to train the Glove model on this dataset.
My corpus is in the form of a folder of text documents.
How can I do this? Please provide appropriate code. Thanks.
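As a starting point, the folder can be turned into a stream of token lists, one per file. The sketch assumes the files are UTF-8 encoded .txt files and that splitting on whitespace is an acceptable first tokenization for Hindi (a proper tokenizer may do better). `iter_documents` is a hypothetical helper.

```python
import io
import os


def iter_documents(folder):
    """Yield one list of whitespace-separated tokens per .txt file in `folder`."""
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(folder, name)
        with io.open(path, encoding="utf-8") as handle:
            yield handle.read().split()
```

The resulting token lists are the kind of input a co-occurrence fitting step would consume. Whether the library accepts a generator directly is an assumption worth checking before training.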
Hi all,
Thanks for this nice implementation. However, the dict constructed from the input data by the Corpus() method has the form:
{
word_1: ID_1,
word_2: ID_2 ... }
So what about a word that appears often in the corpus? Is the last ID just replaced? Shouldn't it be a list?
One more thing: when training on multiple chunks of documents, the add_dictionary() method simply replaces the old dict created from chunk 1 with the one created from chunk 2. Do you want me to submit a pull request that merges the two dicts instead?
It is more RAM-friendly to iterate through a generator when the input corpus is huge as hell ...
Thanks !
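A sketch of the merging behaviour being proposed, under the assumption that the dictionary is only meant to map each distinct word to one stable integer ID: a word that is already present keeps its ID, and only unseen words receive new ones, so IDs stay consistent across chunks. `update_dictionary` is a hypothetical helper, not the library's method.

```python
def update_dictionary(dictionary, documents):
    """Assign the next free integer ID to each unseen word; keep existing IDs."""
    for doc in documents:
        for word in doc:
            # setdefault leaves the ID untouched if the word is already known.
            dictionary.setdefault(word, len(dictionary))
    return dictionary


dictionary = {}
update_dictionary(dictionary, [["a", "b", "a"]])   # chunk 1
update_dictionary(dictionary, [["b", "c"]])        # chunk 2
```

Because `documents` is only iterated once, it can be a generator, which matches the RAM-friendly streaming idea mentioned above.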
Lines 166 and 167:
word_ids = np.array(cooccurrence.keys(), dtype=np.int32)
values = np.array(cooccurrence.values(), dtype=np.float64)
give:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'dict_keys'
Changing them to the following solved the issue:
word_ids = np.array(list(cooccurrence.keys()), dtype=np.int32)
values = np.array(list(cooccurrence.values()), dtype=np.float64)
Specifically
Hello,
I was wondering if someone managed to reproduce the results of the sentence similarity scores on the STS benchmark dataset (http://ixa2.si.ehu.es/stswiki/index.php/STSbenchmark). I tried to do that using your function transform_paragraph together with tokenizing the sentences with StanfordTokenizer from the NLTK library, but I managed to get to the Pearson coef. of only a bit over 0.3 on the testing set (the STS shows around 0.4).
I know the transform_paragraph function is only experimental, but I was wondering whether you implemented it completely yourself or you used an official GloVe sentence embedding (I myself do not know how exactly they weighted the individual words to get the sentence vector).
Thanks :)
I am consistently getting the following error:
gcc-4.9: error: unrecognized command line option '-Wshorten-64-to-32'
Any help ?
Hey there!
Have you thought about implementing __contains__ for "man" in model or __getitem__ for model["man"] support? That would be really neat. I tried monkey patching them, but it did not work (not sure why, to be honest). The functions could be something like this (I think):
def __getitem__(self, word):
    try:
        word_idx = self.dictionary[word]
    except KeyError:
        raise Exception('Word not in dictionary')
    return self.word_vectors[word_idx]

def __contains__(self, word):
    return word in self.dictionary
I would have done a pull request but I am too uncertain about cython and stuff.
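One possible explanation for the failed monkey patch: Python looks up special methods such as `__contains__` and `__getitem__` on the class, not on the instance, so assigning them to a model object has no effect on `in` or `[]`. The sketch below demonstrates this with a stand-in class. `ToyModel` is hypothetical and only mimics the `dictionary` and `word_vectors` attributes used above.

```python
class ToyModel(object):
    """Stand-in for a model with `dictionary` and `word_vectors` attributes."""

    def __init__(self):
        self.dictionary = {"man": 0}
        self.word_vectors = [[0.1, 0.2]]


model = ToyModel()

# Patching the instance: `in` ignores it and still raises TypeError.
model.__contains__ = lambda word: True
try:
    "man" in model
    instance_patch_worked = True
except TypeError:
    instance_patch_worked = False

# Patching the class: both operators now work on existing instances.
ToyModel.__contains__ = lambda self, word: word in self.dictionary
ToyModel.__getitem__ = (
    lambda self, word: self.word_vectors[self.dictionary[word]])
```

Applied to the real class, the same approach would be to assign the two functions to the class object after importing it rather than to a loaded model.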
Errors:
-Failed building wheel for glove-python
-Failed cleaning build dir for glove-python
-twisted 18.7.0 requires PyHamcrest>=1.9.0, which is not installed.
Hi there, I would like to try your example.py, but I have no idea which corpus to use. I have just started learning Python and machine learning and I am really confused. Your usage guide gives this example: ipython -i -- examples/example.py -c my_corpus.txt -t 10
I tried using the link that you have provided (http://www-nlp.stanford.edu/projects/glove) under "Download pre-trained word vectors" and chose Wikipedia 2014 + Gigaword 5 (glove.6B.zip). This glove.6B.zip file contains 4 files (glove.6B.50d, glove.6B.100d, glove.6B.200d and glove.6B.300d).
I ran the command with -i -- examples/example.py -c my_corpus.txt -t 10, after renaming one of the files (glove.6B.50d) to my_corpus.txt.
I get an error message that says: No module named corpus_cython. Did I do any of the steps wrong?
I was wondering if you could provide a link to the "my_corpus.txt" that produces this result:
In [1]: glove.most_similar('physics')
Out[1]:
[('biology', 0.89425889335342257),
('chemistry', 0.88913708236100086),
('quantum', 0.88859617025616333),
('mechanics', 0.88821824562025431)
Thank you.
I'd like to get the context vectors out too (not just word vectors). Is this possible with current implementation?
The idea is to try to decouple words and contexts completely, ala Levy&Goldberg's "dependency based embeddings", to experiment with functional similarities.
After successfully installing glove_python with pip install on Mac Sierra 10.12.6, I get the following import error when trying to import the package:
from glove import Glove
from glove import Corpus
ImportError: dlopen(/Users/thomas/anaconda/lib/python3.6/site-packages/glove/glove_cython.cpython-36m-darwin.so, 2): Symbol not found: _GOMP_parallel
Referenced from: /Users/thomas/anaconda/lib/python3.6/site-packages/glove/glove_cython.cpython-36m-darwin.so
Expected in: flat namespace
in /Users/thomas/anaconda/lib/python3.6/site-packages/glove/glove_cython.cpython-36m-darwin.so
Does anyone know how to resolve this issue?
I'm new to GloVe. I was able to install glove-python using pip. Can someone please point me to sample code that uses the glove.6B.zip pre-trained model to get vectors for new sentences? I tried converting it to word2vec format but ran into issues. Could someone help by providing sample code for this?
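A common baseline for sentence vectors, independent of any particular library, is to average the vectors of the words that are in the vocabulary. The sketch assumes `vectors` is a plain dict from word to list of floats (for example, parsed from one of the glove.6B text files). `average_vector` is a hypothetical helper and is not the library's paragraph transform.

```python
def average_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None
    size = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(size)]
```

Unknown words are skipped rather than treated as zeros, so a sentence made up entirely of out-of-vocabulary words yields `None` instead of a misleading zero vector.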
Hello.
I would like to see the loss of the vector optimization procedure to see if my glove model has relatively converged (and add a tolerance to make an early stopping). All I would need is the loss from the call to the cython module/function fit_vectors. I guess you would need to modify the C code of glove and somehow return the loss. Is there any easy way to achieve this?
Thanks in advance
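If a per-epoch loss were available, the stopping rule itself would be small. The sketch below assumes a hypothetical `run_epoch` callable that trains for one epoch and returns that epoch's loss. Nothing in this thread indicates the library exposes such a hook, so this only illustrates the tolerance check.

```python
def train_with_early_stopping(run_epoch, max_epochs=100, tolerance=1e-4):
    """Call `run_epoch()` until the loss improves by less than `tolerance`.

    `run_epoch` is a hypothetical callable returning the loss of one epoch.
    Returns (number of epochs run, last loss).
    """
    previous = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch()
        if previous - loss < tolerance:
            return epoch, loss      # improvement too small (or loss went up)
        previous = loss
    return max_epochs, previous
```

The check also stops training when the loss increases, since the improvement is then negative and therefore below the tolerance.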
Hi Maciej!
Thanks for sharing this project; I've been pursuing the idea of wrapping the reference stanfordnlp/GloVe source code as a Cython extension. Your library has been very useful in learning more about Cython!
I just wanted to ask what your motivation was for writing the algorithm from scratch, and what you would think of trying to thinly-wrap the original distribution. Thanks!
I have trained a model using glove-python, and I want to finetune it using other data. Is it possible?
If I load a trained glove model and train it with fit(), it starts to train from scratch.
I'd like to get a word vector from its associated word and vice-versa.
Is it possible to do this in existing version of the code?
What is the input data structure of the paragraph_transform function? Is it a list of words? I got a KeyError when I gave a list of words as input.
I hope this is a simple question, because I'm new to NLP.
Thanks. This is great project.
UPDATE: did I understand it wrong? I'm using pretrained model and just use this function to convert short sentence/paragraph to a vector. But reading the code, it seems this function does gradient update, which seems like a training process...
Hi, I'm getting a MemoryError when I try to run an example script (probably during creation of the COO matrix). Is there a way to save intermediate results to a file (or some other method) to decrease the memory usage?
I see that in glove.py there is a transform_paragraph function, which as the name suggests transforms a paragraph into a vector. However, at the end of the function I see it calling another transform_paragraph, this time from glove_cython.pyx. What is the purpose of this last call, it seems to be working without the call just as well.
None of the solutions actually work in IOS. Can someone help, please?
I am trying to build a GloVe model of French words, but accented letters disappear. For example, 'président' is replaced by 'prsident', etc.
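One way this exact symptom can arise is an ASCII decode that ignores errors somewhere in the preprocessing. This is an assumption, since the report does not show how the corpus was read. The sketch reproduces the effect and shows the lossless alternative.

```python
raw = u"président".encode("utf-8")

# Decoding as ASCII while ignoring errors silently drops the accented letter.
lossy = raw.decode("ascii", errors="ignore")

# Decoding with the text's real encoding keeps it.
intact = raw.decode("utf-8")
```

If this is the cause, opening the corpus with an explicit `encoding="utf-8"` (for example via `io.open`) and avoiding `errors="ignore"` should preserve the accents.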
Hi there, I wanted to modify the corpus_cython.pyx script to take into account the left context of each word. The problem is that after I run python setup.py cythonize and pip install -e . those changes have no effect. What might be the cause?
I'm running it on Windows, using Anaconda.
Cheers.
For a cooc matrix with dimensionality 1.5 million * 1.5 million I get the following error:
File "build/bdist.linux-x86_64/egg/glove/corpus.py", line 70, in fit
max_map_size)
File "glove/corpus_cython.pyx", line 279, in glove.corpus_cython.construct_cooccurrence_matrix (glove/corpus_cython.cpp:3465)
File "glove/corpus_cython.pyx", line 145, in glove.corpus_cython.Matrix.to_coo (glove/corpus_cython.cpp:2290)
ValueError: negative dimensions are not allowed
It's a bit weird cause the dimensions use integer and int.max >> (1.5 mio)^2.
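The comparison with int.max holds for 64-bit integers but not for 32-bit ones, which may matter if any size or index in the matrix code is stored in a 32-bit integer. Whether that is the actual cause of this error is an assumption. The arithmetic can be checked directly:

```python
import ctypes

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1

n = 1500000
cells = n * n                      # cells in a dense n x n matrix

fits_in_int32 = cells <= INT32_MAX
fits_in_int64 = cells <= INT64_MAX

# What a 32-bit signed integer holds after wrapping around (illustrative value).
wrapped = ctypes.c_int32(3000000000).value
```

A count that wraps around like this becomes negative, which is the kind of value that would trigger a "negative dimensions" complaint when passed on as a shape.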
Hi,
I am using Python 2.7. When I tried to install glove_python using the pip install glove_python command, I was asked to download the Microsoft Visual C++ compiler. I downloaded it, but the installation still fails. I am attaching the logs. Please help me with the installation.
glove_python_error.txt
While this is not an issue with glove-python, it's worth noting that pickling large corpora/models causes the following error:
SystemError: error return without exception set
According to this numpy issue, it is a bug in pickle that has been fixed in Python 3.3.
I think it would be worth pointing that out on the README for future reference.
Whether I use python setup.py install or pip install glove-python from the Anaconda prompt, every installation attempt fails with the error 2 below:
c:\Anaconda>pip install glove-python
Collecting glove-python
Downloading glove_python-0.1.0.tar.gz (263kB)
100% |################################| 266kB 744kB/s
Requirement already satisfied (use --upgrade to upgrade): numpy in c:\anaconda\l
ib\site-packages (from glove-python)
Requirement already satisfied (use --upgrade to upgrade): scipy in c:\anaconda\l
ib\site-packages (from glove-python)
Building wheels for collected packages: glove-python
Running setup.py bdist_wheel for glove-python ... error
Complete output from command c:\anaconda\python.exe -u -c "import setuptools,
tokenize;__file__='c:\\users\\captain\\appdata\\local\\temp\\pip-build-jiom9g\\g
love-python\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).re
ad().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d c:\users\captain\
appdata\local\temp\tmpbo71fdpip-wheel- --python-tag cp27:
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-2.7
creating build\lib.win-amd64-2.7\glove
copying glove\corpus.py -> build\lib.win-amd64-2.7\glove
copying glove\glove.py -> build\lib.win-amd64-2.7\glove
copying glove\__init__.py -> build\lib.win-amd64-2.7\glove
running build_ext
building 'glove.glove_cython' extension
error: [Error 2] The system cannot find the file specified
Macbook Air
El Capitan 10.11.3
Python 2.7.11
Was able to (apparently) successfully install glove-python. However, when attempting to:
from glove import Corpus, Glove
I receive the following error:
Traceback (most recent call last):
File "build_glove.py", line 15, in <module>
from glove import Glove
File "/Users/tyler/gitlab/parsing/quero/venv/lib/python2.7/site-packages/glove/__init__.py", line 1, in <module>
from .corpus import Corpus
File "/Users/tyler/gitlab/parsing/quero/venv/lib/python2.7/site-packages/glove/corpus.py", line 10, in <module>
from .corpus_cython import construct_cooccurrence_matrix
ImportError: dlopen(/Users/tyler/gitlab/parsing/quero/venv/lib/python2.7/site-packages/glove/corpus_cython.so, 2): no suitable image found. Did find:
/Users/tyler/gitlab/parsing/quero/venv/lib/python2.7/site-packages/glove/corpus_cython.so: mach-o, but wrong architecture
Any thoughts or comments on the matter? I have yet to find a solution.