
loretoparisi / fasttext.js

186 stars · 8 watchers · 28 forks · 3.39 MB

FastText for Node.js

License: MIT License

JavaScript 98.16% Shell 0.84% Dockerfile 0.10% Python 0.40% Makefile 0.49%
deeplearning word2vec machinelearning text-classification javascript nodejs neural-networks language-detection word-embeddings fasttext

fasttext.js's People

Contributors

j-waters · kanghyeongmin · liquid36 · loretoparisi · microaijp · tk-pietsch · woozyking



fasttext.js's Issues

Executable not found in path on Windows

Hi !

Node version: 12.13.1
OS: Windows 10

I'm trying to create a simple fastText model for text classification, but the library does not work on Windows. As soon as I add this code:

const fastText = new FastText({
      serializeTo: './models/fastText',
      trainFile: './trainingData.txt'
});

I have the error:

Error: executable not found in path
    at new FastText (C:\Users\madelpech\[...]\node_modules\fasttext.js\lib\index.js:104:23)

I have looked at the code: in index.js, at line 99, you call this._options.bin = Util.GetBinFolder('fasttext');. I checked the binaries, and the Windows binary is not named fasttext but fastText.exe. That should be the issue here.

Can you have a look and tell me if I am right?

Thanks!

Node installing problem

Hello :)

When I run the server file with node server.js, I get this error:


events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at _errnoException (util.js:1022:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:880:14)

Please advise.
node -v
v8.11.1

npm -v
5.6.0



Trying to get in touch regarding a security issue

Hey there!

I'd like to report a security issue but cannot find contact instructions on your repository.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

I cannot run this project on Apple M1 silicon

This is Train.js:

(screenshot)

This is 100_normal_stemming_train.tsv:

(screenshot)

This is the running result:

(screenshot)

When I try to test the dataset with the model 100_normal_stemming.bin trained by the fasttext Python package, it shows as below:

(screenshot)

Time to scale up: fastText.js with Redis and clustering mode

Node.js Cluster Module

Node.js ships a core cluster module that allows applications to run on more than one core.

With the cluster module, a parent/master process can fork any number of child/worker processes and communicate with them by sending messages over IPC.

We could also add a caching layer using Redis, which is a more feature-rich version of Memcached.

Redis serves and modifies data in the server's main memory, so frequently needed data is retrieved quickly. This reduces page load times and improves overall load performance.

With Redis we can store cached values using SET and GET; beyond that, Redis also works with complex data types such as Lists, Sets, ordered data structures, and so forth.

We can also manage Node.js clustering with PM2.

PM2 is a production process manager for Node.js applications with a built-in load balancer. It allows you to keep applications alive forever, to reload them without downtime, and to facilitate common system admin tasks. One of its nicer features is automatic use of Node's Cluster API: PM2 lets your application run as multiple processes without any code modifications.

I hope this interests you; these methods would be a great addition to this wonderful API.

Source: https://goo.gl/MMDe3m

async/await version

This would be easy to do with promisify in newer versions of Node.

const {promisify} = require('util');
exports.exists = promisify(train);
...

Personally I far prefer async/await to the callback syntax.

ERROR: Cannot run server

Steps to reproduce

  1. Clone project
  2. Run train
    $ node examples/train.js
  3. Run server
    $ node examples/server.js

Expected behaviour

The server starts with the message below:

model loaded
server is listening on 3000

Actual behaviour

The server fails to start with the message below:

events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at exports._errnoException (util.js:1024:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:851:14)

I will fix this and make a PR.

bug in FastText.prototype.nn()

fasttext.js/lib/index.js

Current (buggy):

FastText.prototype.nn = function (data) {
  // ...
  self.dataAppendCallback = onDataCallback;
}

Should be:

FastText.prototype.nn = function (data) {
  // ...
  self.dataAppendCallback = onDataAppendCallback;
}

Move repo to a GitHub *org*

Have you considered moving to an org repo?

I have github.com/fasttext. We can do steps like this:

  1. I can make you admin.

  2. I remove myself as admin.

  3. You can transfer this repo there.

  4. You can rename this repo js.

Then it will be at github.com/fasttext/js.

Other related tools can go in other repos, for example fastText in an AWS Lambda.

npm?

Firstly, bravo and thank you for this.

I think for this lib to gain momentum, which will ultimately benefit all users, it will be good to put it on npm.

Is it in the plans? Is there any reason not to?

ERROR: Cannot install via Docker

Steps to reproduce

  1. Clone project
  2. Build the docker image
    $ docker build -t fasttext.js .
  3. Run server by docker
    $ docker run --rm -it -p 3000:3000 fasttext.js node fasttext.js/examples/server.js

Expected behaviour

The server starts on Docker with the message below:

model loaded
server is listening on 3000

Actual behaviour

The server fails to start on Docker with the message below:

events.js:183
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at _errnoException (util.js:1022:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:880:14)

Fail to load WASM error

Heya, I am currently facing this error when calling FastText.loadWASM():

TypeError: Failed to parse URL from /home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.wasm
/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:230
      throw ex;
      ^

RuntimeError: abort(TypeError: Failed to parse URL from /home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.wasm) at Error
    at jsStackTrace (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1937:19)
    at stackTrace (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1954:16)
    at process.abort (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1653:44)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
    at process.abort (/home/brendan/internal-sourcing-tool/node_modules/fasttext.js/lib/wasm/fasttext_wasm.js:1659:11)
    at process.emit (node:events:513:28)
    at emit (node:internal/process/promises:149:20)
    at processPromiseRejections (node:internal/process/promises:283:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)

Node.js v18.15.0

Below is my code snippet, following the example given in this issue:

async function main() {
    const modelPath = path.resolve(__dirname, "../model");
    console.log(modelPath + `/300-dim-10-epoch.bin`);
    let FastText = require("fasttext.js");
    const ft =  new FastText({
        loadModel: modelPath + `/300-dim-10-epoch.bin`
    })
    try {
        await ft.loadWASM();
        const vec = ft.getWordVector("hello");
        console.log(vec);
    }catch(err){
        console.log(err);
    }
}

main();

There are no issues with my model directory, and I tried running the snippet with both the npm-installed package and a git-clone installation.

Is there any plan for StarSpace :)

Dear loretoparisi,

I loved your FastText project, and I believe you could aim at StarSpace and load it into memory too :) Is there any plan to simplify StarSpace? I feel it is not as clear as fastText: there are no clear docs and examples.

I have great faith in your style, and I hope to see a step from you in this regard.

With respect

Is this library outdated?

It looks like the last commit was last year, and I am unable to run this library. My Node version is 21.5.0, on Windows 10, and anything I try gives me this error:

TypeError: fetch failed
C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:230
      throw ex;
      ^

RuntimeError: abort(TypeError: fetch failed) at Error
    at jsStackTrace (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1937:19)
    at stackTrace (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1954:16)
    at process.abort (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1653:44)
    at process.emit (node:events:519:28)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)
    at process.abort (C:\Users\d.malugin\Desktop\2Di\tgCrowler\node_modules\fasttext.js\lib\wasm\fasttext_wasm.js:1659:11)
    at process.emit (node:events:519:28)
    at emit (node:internal/process/promises:150:20)
    at processPromiseRejections (node:internal/process/promises:284:27)
    at process.processTicksAndRejections (node:internal/process/task_queues:96:32)

So the question is: is this library still supported?

Load into Memory

Dear @loretoparisi,
I installed your fasttext.js in order to solve the memory problem that we discussed in facebookresearch/fastText#276 (comment).

Now when I run:
node fasttext_predict.js
it takes about 5 seconds to load the module.

"use strict";

(function() {

var DATA_ROOT='./data';

var FastText = require('./fasttext.js/lib/index');
var fastText = new FastText({
    loadModel: DATA_ROOT + '/model_gender.bin' // must specify filename and ext
});

var sample="Bashar Al Masri";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="Hisahm al mjude";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
   fastText.unload();
})
.catch(error => {
    console.error("predict error",error);
});

}).call(this);

It returns the prediction to stdout and exits, due to fastText.unload().
Now I need to call this file as node fasttext_predict.js UserName from anywhere, passing an argument [UserName], and have it print the result directly to stdout. Since you said the model would be loaded into memory, I want to be able to get this result from the PHP web server.

It is the same problem as with the C++ loading: I need it to run in the background!

Question: usage of pretrainedVectors?

When I try to train a text classifier using pretrainedVectors: './wiki.da.vec', the resulting model is 2.79 GB and training takes more time than usual.

var fastText = new FastText({
        serializeTo: './output_model',
        trainFile: './input.txt',
        debug: true,
        train: {
            dim: 300,
            pretrainedVectors: './wiki.da.vec'
        }
    });

I did the same with the fasttext Python library:
model = fasttext.train_supervised('input.txt', dim=300, pretrainedVectors='wiki.da.vec', verbose=4)

That gave a model of 300+ MB with proper classification.

Did I miss anything while using fasttext.js?

Using in the browser with WASM

Are the uploaded binaries for WASM usable in the browser? I tried following https://fasttext.cc/docs/en/webassembly-module.html#build-a-webpage-that-uses-fasttext to load fasttext using the provided binaries but I got import errors:

<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no">
</head>
<body>
    <script type="module">
        import {FastText, addOnPostRun} from "./fasttext.js";

        addOnPostRun(() => {
            let ft = new FastText();
            console.log(ft);
        });

    </script>
</body>
</html>

I get "Uncaught SyntaxError: import not found: addOnPostRun" in the browser.

Different prediction for the same keyword same model

Dear author,

Thank you for this wonderful Node add-on.
I have a strange problem.

When I test predictions directly with fasttext, I mean without Node, I have no problem.

But when I pass the same keyword to the Node server, each time I get a different label with a different accuracy.

wget -qO- http://localhost:3030/?text=beshoo

Each time I send this URL I get a different label.

Regards

INFO: Could not find files for the given pattern(s).

I'm trying to run the module on Windows. When I try to initialize it like so:

const FastText = require('fasttext.js');

const fastText = new FastText({
    serializeTo: './model',
    trainFile: './train.txt'
});

I receive the error: INFO: Could not find files for the given pattern(s)..
Yes, I definitely have a train.txt file in the same folder as index.js (where I execute this code). I even tried using path.join(__dirname, '/train.txt'), but got the same result.

calculate distance feature?

Maybe add an API to calculate vector distance.

fasttext print-word-vectors trainresult.bin < queries.txt

住宅 -0.3543 -0.36086 -0.1972 -0.48346 -0.4279 0.084653 -0.74038 -0.77876 -0.69068 -0.42149 0.41304 0.9636 -0.11907 -0.081701 0.27681 -0.15278 -0.17322 -0.27368 -0.69611 0.42335 0.11701 -0.43995 0.1868 0.38824 0.42387 0.46397 0.38974 -0.59129 0.69363 0.26292 -0.36955 -0.27438 1.0732 0.0046569 -0.39709 0.44935 0.67039 -0.39564 -0.080179 0.0036072 -0.48187 -0.66577 0.27598 -0.54607 1.0294 -0.29769 0.52144 -0.044384 0.15926 -1.0104 0.80332 -0.60356 0.40641 -0.039965 0.41868 -0.0072699 0.069652 -0.12544 -0.30716 0.21804 -0.36222 -0.51133 -0.24029 -0.7333 0.26404 -0.30949 -0.17224 -0.52331 -1.1139 -0.26803 0.4566 0.28051 -0.50781 0.26043 0.11501 0.17622 -0.1344 -0.46 0.00035005 0.13337 0.50925 -0.82658 0.32135 -0.33323 0.75423 -0.60863 0.42117 0.35665 -0.17826 -0.82987 0.53353 -0.12717 -0.46963 0.15568 0.4642 -0.16868 -0.18377 0.65137 -0.0067536 1.4116

别墅 0.00094935 0.0073073 -0.00094808 -0.0010876 0.0012463 0.0014312 -0.0026107 0.0041731 0.0024454 -0.00093893 0.0045996 0.00050681 -0.00040101 0.0015428 0.0065499 -0.0007207 -0.0022505 -0.0046939 0.0039677 0.0047148 -0.0031379 0.0042863 -0.0056759 -0.0031934 0.0037867 0.006272 0.0050499 -0.0022674 0.0062237 0.00062629 0.0033722 -0.0027245 0.0016423 -0.0037467 -0.00014838 -0.0048198 0.0043823 0.002268 -0.00093589 -0.0034395 -0.0021894 0.0013966 -0.0010953 -0.00073448 0.0012601 0.00037782 -0.0012559 -0.00079777 0.0022461 -0.00085852 -0.001242 0.0039883 0.0017836 -0.00036524 -0.0013768 -0.0036831 0.0023176 0.0027225 0.0010305 0.0020299 0.00057907 -3.4135e-05 0.0029027 -0.00064469 7.3418e-05 -0.0051284 -0.0001829 -0.004983 -0.0024 -0.002313 -2.4026e-05 0.0068082 -0.0062092 0.0045259 -0.0023891 0.0015408 0.00077602 -0.0024638 0.0056508 0.0036942 -0.00089141 -0.0031128 -0.0040772 -0.00063497 -0.006542 -0.0016326 0.002223 -0.0040703 -3.8115e-05 -0.0020506 -0.003437 0.0037226 -0.0062743 0.00098213 0.00030893 0.0013302 -0.002533 0.0038249 -0.0050515 0.0025223
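Agreed that this could sit on top of print-word-vectors output. A minimal sketch of the math, with an illustrative parser for lines like the ones above (none of this is part of the library):

```javascript
// Cosine similarity between two word vectors, as a possible basis
// for a distance API (distance = 1 - similarity).
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Parse one line of `fasttext print-word-vectors` output: "word v1 v2 ...".
function parseVectorLine(line) {
  const [word, ...nums] = line.trim().split(/\s+/);
  return { word, vector: nums.map(Number) };
}
```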

npm module: loadSentence is not a function

Given a script, after installing fasttext.js with npm:

import FastText from 'fasttext.js';

const ft = new FastText({
        loadModel: '/wiki.simple.vec'
});
console.log(ft);
ft.loadSentence();

My output is

FastText {
  samplesCallbacks: {},
  dataCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  dataErrorCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  onExitCallbacks: Deque { _capacity: 16, _length: 0, _front: 0 },
  dataAppendCallback: null,
  onErrorDataAppendCallback: null,
  _options: {
    bin: '.../node_modules/fasttext.js/lib/bin/darwin/fasttext',
    child: { detached: false },
    debug: false,
    preprocess: true,
    trainFile: '',
    testFile: '',
    serializeTo: '',
    loadModel: '/wiki.simple.vec',
    train: {
      wordNgrams: 2,
      minCount: 1,
      minCountLabel: 1,
      minn: 3,
      maxn: 6,
      t: 0.0001,
      bucket: 10000000,
      dim: 10,
      lr: 0.1,
      ws: 5,
      loss: 'ns',
      lrUpdateRate: 100,
      epoch: 5,
      thread: 5
    },
    trainIncremental: false,
    test: { precisionRecall: 1, verbosity: 2 },
    predict: { mostlikely: 2, verbosity: 2, normalize: true }
  },
  exec: [Function (anonymous)],
  send: [Function (anonymous)],
  sendEOF: [Function (anonymous)],
  learn: [Function (anonymous)]
}
file:///.../test.js:7
ft.loadSentence();
   ^

TypeError: ft.loadSentence is not a function
    at file:///.../test.js:7:4
    at ModuleJob.run (node:internal/modules/esm/module_job:197:25)
    at async Promise.all (index 0)
    at async ESMLoader.import (node:internal/modules/esm/loader:337:24)
    at async loadESM (node:internal/process/esm_loader:88:5)
    at async handleMainPromise (node:internal/modules/run_main:61:12)

I do see that loadSentence is added to the prototype in the lib/index.js file on GitHub, but it's not in the node_modules version.


Nearest Neighbor sometimes returns only 1 result

Using nearest neighbor, every once in a while locally (and always when hosted on beanstalk with docker), the result set will only contain a single result. Running the same request again may return the full result set.

I identified the issue as this line self.dataAppendCallback = onDataCallback; where it should be onAppendDataCallback instead.

I cannot do a pull request at the moment, but may be able to in the future.
