planeshifter / node-word2vec
Node.js interface to the Google word2vec tool.
License: Apache License 2.0
I downloaded the 'Word2Vec' embeddings from http://nilc.icmc.usp.br/embeddings,
specifically this file: http://143.107.183.175:22980/download.php?file=embeddings/word2vec/cbow_s50.zip
Then I tried to load it:
var w2v2 = require( 'word2vec' );
w2v2.loadModel( 'C:\Users\Ranieri\Documents\Projetos\playground\datas.txt', function( error, model ) {
    console.log( error );
    console.log( model );
});
But nothing happens: the callback never prints anything.
I am using Node 8.11.
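One thing worth ruling out with a path like the one above: in a JavaScript string literal a backslash starts an escape sequence, so a Windows path written with single backslashes does not contain the separators it appears to. Whether this is the actual cause of the silent failure is an assumption; the sketch below only shows the string behavior, using a shortened version of the path and Node's built-in path module.

```javascript
const path = require('path');

// Unrecognized escapes such as \U, \R or \D silently drop the backslash.
const broken = 'C:\Users\Ranieri\Documents\datas.txt';

// Option 1: escape every backslash.
const escaped = 'C:\\Users\\Ranieri\\Documents\\datas.txt';

// Option 2: String.raw keeps backslashes literally.
const raw = String.raw`C:\Users\Ranieri\Documents\datas.txt`;

// Option 3: build the path from segments (Windows separators forced here
// so the result is the same on every platform).
const joined = path.win32.join('C:\\', 'Users', 'Ranieri', 'Documents', 'datas.txt');

console.log(broken);
console.log(escaped);
```

Forward slashes are also accepted by Node's file APIs on Windows, which avoids the escaping question entirely.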
This line here was removing the last item from each vector, which made the vector lengths differ and caused a whole bunch of chaos down the line (multiplying by undefined).
I have no idea why it's there, but I did notice that it works for the example vector.txt in this project. Maybe it has something to do with \r and \n?
I also added this:
if ( isNaN( words ) || isNaN( size ) ) {
    throw new Error("First line of input text file should be <number of words> <length of vector>. See example data 'vectors.txt' in repo");
}
I put it after this line, since a missing header line caused me a lot of trouble (I don't think the required format is mentioned anywhere in the readme).
Thanks for the awesome library!
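The header check described above can be sketched as a standalone function. This is a minimal illustration, not the library's code: parseHeader is a hypothetical name, and the only assumption is the header format quoted in the error message, "<number of words> <length of vector>".

```javascript
// Hypothetical helper, not part of the word2vec package.
// Parses the first line of a text-format vector file.
function parseHeader(firstLine) {
  // trim() also removes a trailing \r left over from Windows line endings.
  const parts = firstLine.trim().split(/\s+/);
  const words = parseInt(parts[0], 10);
  const size = parseInt(parts[1], 10);
  if (parts.length !== 2 || isNaN(words) || isNaN(size)) {
    throw new Error('First line should be "<number of words> <length of vector>"');
  }
  return { words: words, size: size };
}

console.log(parseHeader('3 50\r\n'));
```

Failing early here gives a readable error instead of undefined values turning up later in the vector arithmetic.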
var w2v = require( 'word2vec' );
w2v.word2vec( __dirname + '/input.txt', __dirname + '/output.txt', {
    cbow: 1,
    size: 200,
    window: 8,
    negative: 25,
    hs: 0,
    sample: 1e-4,
    threads: 20,
    iter: 15,
    minCount: 2
});
The example doesn't seem to work. It only prints "Child process exited with code null", and output.txt is empty.
Hello,
It seems some files needed for "npm test" are missing.
Are you still maintaining this project?
Thanks,
Thomas
As the title says, my loadModel callback never gets called and no error messages get thrown.
I had to copy model.js from 0.9.2 and make the changes mentioned in #8 to get it working again.
Cheers
As the title says, .mostSimilar() returns an array of objects with word = undefined and dist = -1 for an out-of-dictionary word. The array is as long as the number of entries requested in the second argument of mostSimilar. I would expect it to return an empty array or null if the word is not found in the dictionary.
Cheers
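Until the behavior changes, a caller-side guard is one way to get the empty array described above. This is a sketch under the assumption that results have the shape reported in this issue (objects with word and dist); validNeighbors is a hypothetical helper, not a library function.

```javascript
// Hypothetical helper: keeps only entries that carry a real word.
function validNeighbors(results) {
  if (!Array.isArray(results)) {
    return [];
  }
  return results.filter(function (r) {
    return r && typeof r.word === 'string' && r.word.length > 0;
  });
}

// Shape reported for an out-of-dictionary word.
const reported = [
  { word: undefined, dist: -1 },
  { word: undefined, dist: -1 }
];
console.log(validNeighbors(reported).length);
```

In use, the result of model.mostSimilar( word, n ) would be passed through validNeighbors before it is displayed.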
Hi there. I am running a very basic example, but something is not working. I get exit code 126 every time and I am not sure what is happening. Here is my code:
var w2v = require('word2vec');
w2v.word2phrase( 'in.txt', 'out.txt', {
    threshold: 100,
    debug: 2,
    minCount: 5
}, done);
function done(data)
{
    console.log(data);
}
make --directory=src
'make' is not recognized as an internal or external command,
operable program or batch file.
npm ERR! Windows_NT 10.0.14393
npm ERR! argv "F:\studies\node\node.exe" "F:\studies\node\node_modules\npm\bin\npm-cli.js" "install" "word2vec"
npm ERR! node v6.11.0
npm ERR! npm v3.10.10
npm ERR! code ELIFECYCLE
npm ERR! [email protected] postinstall: make --directory=src
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] postinstall script 'make --directory=src'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the word2vec package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! make --directory=src
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs word2vec
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls word2vec
npm ERR! There is likely additional logging output above.
npm ERR! Please include the following file with any support request:
npm ERR! F:\git repository\npm-debug.log
Are there pre-computations that cause this library to use so much RAM? I'm using a 160 MB text file of word vectors, and the Node process takes up more than 900 MB of RAM.
Just wondering whether there is a good reason for this, or whether I should dig about looking for some inefficiencies somewhere.
Thanks
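As a rough back-of-the-envelope aid for this kind of question, the raw size of the vectors can be estimated from the vocabulary size and dimension. The sketch below is not a measurement of this library: the vocabulary size and dimension are made-up numbers, and it only relies on the facts that a Float32Array stores 4 bytes per value and a JavaScript number (or Float64Array element) is an 8-byte double.

```javascript
// Raw payload of the vectors only; word strings, object overhead and any
// temporary strings created while parsing come on top of this.
function estimateBytes(words, size, bytesPerValue) {
  return words * size * bytesPerValue;
}

const words = 100000; // hypothetical vocabulary size
const size = 100;     // hypothetical vector length

const asFloat32 = estimateBytes(words, size, Float32Array.BYTES_PER_ELEMENT);
const asFloat64 = estimateBytes(words, size, Float64Array.BYTES_PER_ELEMENT);

console.log((asFloat32 / 1e6) + ' MB as Float32Array');
console.log((asFloat64 / 1e6) + ' MB as 64-bit doubles');
```

Comparing such an estimate with process.memoryUsage() before and after loading shows how much of the footprint is overhead rather than vector data.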
Is it possible to call the word2vec method with raw training data and get the vectors back as raw data as well?
The documentation seems to suggest you can only pass paths to the input and output data files.
Any suggestions for working around this?
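If only file paths are accepted, one workaround is to write the raw text to a temporary file first. The sketch below covers just that step with Node's built-in fs, os and path modules; writeTempCorpus is a hypothetical helper, and the sample sentence is a placeholder.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: writes raw training text to a fresh temporary
// directory and returns the path of the created file.
function writeTempCorpus(text) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'w2v-'));
  const file = path.join(dir, 'input.txt');
  fs.writeFileSync(file, text, 'utf8');
  return file;
}

const inputPath = writeTempCorpus('the quick brown fox jumps over the lazy dog');
console.log(inputPath);
```

The returned path can then be passed as the first argument to w2v.word2vec, in the same way as the file paths in the examples in this thread, and the output file can be read back with fs.readFileSync once the callback fires.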
Hi there, I trained a model on the google news corpus and was able to successfully create an output to load, however when I use the loadModel function with the output dataset I'm getting a memory overflow from node.
I'm running Node v18 on Linux.
I also tried including the memory allocation flag in the npm run script. It runs for longer before overflowing, but it still doesn't complete with up to 12 GB allocated.
The machine has 16 GB of RAM, and I was wondering if I'm missing an optimization step. The word embeddings file is only 3 GB.
If there's anything that can be done to better utilize memory I feel like it should work, as I was able to train this model on the same machine.
Any help would be much appreciated! Thanks.
Thanks for your effort, but when I try to do a "npm install word2vec", I get an error, as mentioned in the title of the issue:
"clang: error: the clang compiler does not support '-march=native'". I also have a computer with an Apple M1 chip, a MacBook Air (M1, 2020). I was trying to figure it out, but it ended up not being so straightforward for me. Could you please give me a hint to help me solve this? Thanks!
Hello, when I start Node, I get this error:
Error: spawn ./word2vec ENOENT
at _errnoException (util.js:1024:11)
at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
at onErrorNT (internal/child_process.js:372:16)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
at Function.Module.runMain (module.js:678:11)
at startup (bootstrap_node.js:187:16)
at bootstrap_node.js:608:3
I really do not know how to fix it.
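An ENOENT from spawn means the executable at the given path could not be found, which would be consistent with the postinstall build (make --directory=src) not having produced the binary. That diagnosis is an assumption. A minimal check, using a hypothetical helper checkBinary and only Node's fs module, could look like this:

```javascript
const fs = require('fs');

// Hypothetical helper: reports whether a file exists and can be executed.
function checkBinary(file) {
  if (!fs.existsSync(file)) {
    return 'missing';
  }
  try {
    // X_OK tests execute permission (on Windows it only tests existence).
    fs.accessSync(file, fs.constants.X_OK);
    return 'ok';
  } catch (err) {
    return 'not executable';
  }
}

// The Node binary itself is used here as a file that certainly exists.
console.log(checkBinary(process.execPath));
```

Pointing checkBinary at the word2vec file inside the package's src directory would tell a failed build ("missing") apart from a permissions problem ("not executable", the case a POSIX shell reports as exit status 126).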
Hi,
First of all, thanks for the awesome work!
I am trying to import the pre-trained files from the fasttext repo: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
The model loads without a problem; however, when I try mostSimilar, the most similar words appear to be numbers:
loadedModel.mostSimilar('hi')
> [ { word: '73301', dist: 0.4461598818767161 },
{ word: '266', dist: 0.44462500361860946 },
{ word: '399', dist: 0.44260747560473973 },
{ word: '-0.13061', dist: 0.4250619904094889 },
{ word: '745', dist: 0.4089746546859616 },
{ word: '7', dist: 0.39388342200258686 },
{ word: '233', dist: 0.38675386429631425 },
{ word: '.33347', dist: 0.38672456155896373 },
{ word: '999', dist: 0.3798941950492955 },
{ word: '.5158', dist: 0.3761412428047805 },
{ word: '4785', dist: 0.3756878374324986 },
{ word: '', dist: 0.3753017613199615 },
{ word: '4091', dist: 0.3728785618174816 },
{ word: '0.18393', dist: 0.3702285209309231 },
{ word: '5', dist: 0.3694416515730196 },
{ word: '', dist: 0.3682340927295216 },
{ word: '2', dist: 0.3682152969462404 },
{ word: '68', dist: 0.36721353813091373 },
{ word: '10285', dist: 0.36564681449501635 },
{ word: '', dist: 0.36526450978156066 },
{ word: '014575', dist: 0.36389461240841203 },
{ word: '468', dist: 0.36371019302454455 },
{ word: '-0.00046764', dist: 0.3637013226972051 },
{ word: '.012665', dist: 0.36367885124101007 },
{ word: '142', dist: 0.3636392745394945 },
{ word: '574', dist: 0.36060934864973193 },
{ word: '0.6865', dist: 0.3602319353978014 },
{ word: '91', dist: 0.357913584485305 },
{ word: '53', dist: 0.35790250493633724 },
{ word: '925', dist: 0.3576282053138198 },
{ word: '1942', dist: 0.35588944804722655 },
{ word: '', dist: 0.3558833583782604 },
{ word: '3', dist: 0.3546257354328858 },
{ word: '-0.059739', dist: 0.3546232535404894 },
{ word: '', dist: 0.35400407472165496 },
{ word: '08', dist: 0.3536348589615367 },
{ word: '093', dist: 0.35353088901048624 },
{ word: '0.11736', dist: 0.3529077373455495 },
{ word: '.12359', dist: 0.3511316591255266 },
{ word: '10224', dist: 0.35079793819829935 } ]
I also tried hello, and it says the word is out of the dictionary. How can I import the fastText files so that this won't happen?
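Vector components showing up as "words" suggests that the lines of the file are being split at the wrong positions, although the report alone does not confirm the cause. The sketch below shows a tolerant way to parse one line of a text-format vector file: it trims trailing whitespace and \r, splits on any run of whitespace, and checks the number of values. parseVectorLine is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: parses "<word> <v1> <v2> ... <vN>" into its parts.
function parseVectorLine(line, size) {
  // trim() removes trailing spaces and \r\n that would otherwise
  // produce an empty last token.
  const parts = line.trim().split(/\s+/);
  const word = parts[0];
  const values = parts.slice(1).map(Number);
  if (values.length !== size || values.some(Number.isNaN)) {
    throw new Error('Expected ' + size + ' numeric values for "' + word +
      '", got ' + values.length);
  }
  return { word: word, values: values };
}

console.log(parseVectorLine('hi 0.5 -0.25 \r\n', 2));
```

Running a check like this over the first few lines of a downloaded file is a quick way to see whether its layout matches what the loader expects.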
Great implementation! I didn't understand from the description if I have to train the model on an existing corpus or if I can also use the default values to calculate similarity. If so, where do the default values come from then? Thanks!
I tried installing and got this error:
distance.c:18:10: fatal error: 'malloc.h' file not found
#include <malloc.h>
^
1 error generated.
make: *** [distance] Error 1
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "word2vec"
npm ERR! node v0.12.2
npm ERR! npm v2.7.4
npm ERR! code ELIFECYCLE
npm ERR! [email protected] postinstall: `make --directory=src`
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] postinstall script 'make --directory=src'.
npm ERR! This is most likely a problem with the word2vec package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! make --directory=src
npm ERR! You can get their info via:
npm ERR! npm owner ls word2vec
npm ERR! There is likely additional logging output above.
Thank you very much for the node-port of w2v. Very helpful and very much appreciated!
A minor quirk I just stumbled upon: The similarity between two equal terms results in null rather than 1. I checked the same model in gensim and there I get a similarity of 1.0 as expected.
Why is that?
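For reference, the cosine similarity of a non-zero vector with itself is 1 by definition, which matches the gensim result. Why the library returns null is not clear from this report. The sketch below is a plain implementation of the formula, not the library's code; returning null for zero vectors is a choice made here for illustration.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    return null; // similarity is undefined for a zero vector
  }
  return dot / Math.sqrt(normA * normB);
}

const v = [1, 2, 3];
console.log(cosineSimilarity(v, v));
```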
Given the example provided:
vector('king') - vector('man') + vector('woman')
it would be nice to be able to pass in the resulting raw vector and get back the closest word.
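The lookup requested here can be sketched independently of the library: compute the target vector, then return the vocabulary word whose vector has the highest cosine similarity to it. Everything below is illustrative. The two-dimensional vectors are made up, and nearestWord is a hypothetical function name.

```javascript
// Toy vocabulary with made-up 2-d vectors.
const vocab = {
  king: [0.9, 0.8],
  queen: [0.9, 0.2],
  man: [0.1, 0.8],
  woman: [0.1, 0.2]
};

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / Math.sqrt(na * nb);
}

// Hypothetical helper: word whose vector is most similar to the target.
function nearestWord(vectors, target) {
  let best = null;
  let bestSim = -Infinity;
  for (const word of Object.keys(vectors)) {
    const sim = cosine(vectors[word], target);
    if (sim > bestSim) {
      best = word;
      bestSim = sim;
    }
  }
  return best;
}

// king - man + woman, computed component by component.
const target = vocab.king.map(function (x, i) {
  return x - vocab.man[i] + vocab.woman[i];
});
console.log(nearestWord(vocab, target));
```

A linear scan like this is fine for a sketch; for a large vocabulary the input words are usually excluded from the candidates.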
When I ran:
const w2v = require( 'word2vec' );
const corpusFilePath = 'cleared_words.txt';
w2v.word2vec(corpusFilePath, "vectors.txt", { size: 300 }, () => {
    console.log("DONE");
});
It returns:
Child process exited with code 126
DONE
leaving vectors.txt unchanged. Is this because I'm running it on macOS?
The function is written as .word2phrases(), but it is actually .word2phrase().
I tried to install this package on Windows and it failed with the following error:
npm ERR! [email protected] postinstall: `make --directory=src`
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] postinstall script 'make --directory=src'.
npm ERR! Make sure you have the latest version of node.js and npm installed.
npm ERR! If you do, this is most likely a problem with the word2vec package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR! make --directory=src
npm ERR! You can get information on how to open an issue for this project with:
npm ERR! npm bugs word2vec
npm ERR! Or if that isn't available, you can get their info via:
npm ERR! npm owner ls word2vec
npm ERR! There is likely additional logging output above.
npm ERR! Please include the following file with any support request:
npm ERR! C:\rbws\learning\nlp\projects\nlpnode\npm-debug.log
Hi Philipp,
I downloaded the file GoogleNews-vectors-negative300.bin.gz from https://code.google.com/p/word2vec/ and tried to load it:
w2v = require('word2vec');
{ word2vec: [Function: word2vec],
  word2phrase: [Function: word2phrase],
  loadModel: [Function: loadModel],
  WordVector: [Function: WordVector] }
w2v.loadModel("/home/marco/crawlscrape/bashUtilitiesDir/GoogleNews-vectors-negative300.bin", function(err, model) {
... console.log(model);
... });
undefined
TypeError: Cannot read property 'length' of undefined
    at /home/marco/node_modules/word2vec/lib/model.js:408:30
    at FSReqWrap.wrapper as oncomplete
w2v.loadModel("/home/marco/crawlscrape/bashUtilitiesDir/GoogleNews-vectors-negative300.bin", function(err, model) {
... console.log(model);
... });
undefined
TypeError: undefined is not a function
    at readOne (/home/marco/node_modules/word2vec/lib/model.js:433:55)
    at FSReqWrap.wrapper as oncomplete
What do I have to do in order to successfully load the GoogleNews-vectors-negative300 model?
Looking forward to your kind help.
Marco
warning "@tensorflow/tfjs > @tensorflow/[email protected]" has unmet peer dependency "seedrandom@^3.0.5".
[4/4] Building fresh packages...
error C:\Users\lenov\Desktop\apps\test\s\a\node_modules\word2vec: Command failed.
Exit code: 1
Command: make --directory=src
Arguments:
Directory: C:\Users\lenov\Desktop\apps\test\s\a\node_modules\word2vec
Output:
'make' is not recognized as an internal or external command,
operable program or batch file.
info Visit https://yarnpkg.com/en/docs/cli/add for documentation about this command.
Hello dev guys! :)
I would appreciate your help with an install failure. I have GnuWin32 installed, and its path is set in both the system and user PATH settings. I still get this error:
C:\Users\kolyk\WebstormProjects\whislabackend>npm install word2vec
> [email protected] postinstall C:\Users\kolyk\WebstormProjects\whislabackend\node_modules\word2vec
> make --directory=src
make: Entering directory `C:/Users/kolyk/WebstormProjects/whislabackend/node_modules/word2vec/src'
gcc word2vec.c -o word2vec -lm -pthread -O3 -march=native -Wall -funroll-loops -Wno-unused-result -fno-stack-protector
process_begin: CreateProcess(NULL, gcc word2vec.c -o word2vec -lm -pthread -O3 -march=native -Wall -funroll-loops -Wno-unused-result -fno-stack-protector, ...) failed.
make (e=2): The system cannot find the file specified.
make: *** [word2vec] Error 2
make: Leaving directory `C:/Users/kolyk/WebstormProjects/whislabackend/node_modules/word2vec/src'
npm ERR! code ELIFECYCLE
npm ERR! errno 2
npm ERR! [email protected] postinstall: `make --directory=src`
npm ERR! Exit status 2
npm ERR!
npm ERR! Failed at the [email protected] postinstall script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR! C:\Users\kolyk\AppData\Roaming\npm-cache\_logs\2017-08-20T14_11_30_352Z-debug.log
C:\Users\kolyk\WebstormProjects\whislabackend>
Any ideas?
Thanks