richtr / guesslanguage.js

A natural language detection library based on trigram statistical analysis for Node.js and the Web.

Home Page: http://richtr.github.com/guessLanguage.js/

Makefile 0.05% JavaScript 98.59% HTML 1.36%

guesslanguage.js's People

Contributors

autowp, richtr

guesslanguage.js's Issues

Example code not working.

In the example HTML, if I remove the entire text and add something else, or even re-enter the same text, it always returns unknown. This makes me think the plugin doesn't work at all.

document trigram building process

It would be nice if the trigram building process were documented. For example, the German data contains strings like didiescheincheichdenin, which doesn't look like a trigram. Also, is it known what was originally used as input?
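
One possible reading of that string, offered as an assumption rather than documented behavior: the data may pack three-character trigrams back to back, with spaces treated as significant characters that mark word boundaries. If the quoted string originally had a leading and a trailing space, it splits cleanly into eight trigrams. The helper name `splitTrigrams` is hypothetical and not part of the library.

```javascript
// Hypothetical helper: split a packed string into consecutive 3-character chunks.
// Assumption: trigrams are stored back to back, and spaces are significant
// characters (word boundaries), not separators.
function splitTrigrams(packed) {
  const trigrams = [];
  for (let i = 0; i + 3 <= packed.length; i += 3) {
    trigrams.push(packed.slice(i, i + 3));
  }
  return trigrams;
}

// The string from the issue, with an assumed leading and trailing space restored.
const sample = " didiescheincheichdenin ";
console.log(splitTrigrams(sample));
```

Under this assumption the chunks read " di", "die", "sch", "ein", "che", "ich", "den", "in ", which are plausible German fragments. Whether the actual data file is laid out this way would need to be confirmed against the source.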

detection of zh_TW and zh_CN

This text, copied from http://www.gov.tw, is detected as zh:
駕駛執照替代役水質補(捐)助資訊出國疫苗地價稅牌照稅退休金技能檢定生育補助交通事故天災事變生活扶助健康檢查消費者身心障礙健康保險勞工保險教育補助
It would be nice if the detection were more specific (zh-TW).
The documentation claims to detect both "zh" (Chinese) and "zh-TW" (Chinese, Taiwan).
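
As a rough illustration of how a generic "zh" result could be refined, here is a minimal sketch that counts characters that differ between Traditional and Simplified script. This is not the library's method. The function name `refineChinese` and the two small character sets are assumptions chosen for illustration, and Traditional script is also used outside Taiwan, so the "zh-TW" label is only an approximation.

```javascript
// Illustrative, non-exhaustive character sets (assumption: a real
// implementation would need a much larger table).
const TRADITIONAL_ONLY = new Set("國資訊價檢險勞質");
const SIMPLIFIED_ONLY = new Set("国资讯价检险劳质");

// Hypothetical helper: refine a generic "zh" guess by script.
function refineChinese(text) {
  let traditional = 0;
  let simplified = 0;
  for (const ch of text) {
    if (TRADITIONAL_ONLY.has(ch)) traditional++;
    else if (SIMPLIFIED_ONLY.has(ch)) simplified++;
  }
  if (traditional > simplified) return "zh-TW";
  if (simplified > traditional) return "zh-CN";
  return "zh"; // no evidence either way
}

console.log(refineChinese("健康保險勞工保險"));
```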

Source of language datasets

Where is the source text dataset for the n-grams of those 100 languages? I would like to see whether it differs from the UDHR data discussed in wooorm/franc#78, and whether it is more accurate.

Strange exports of module in node

After installing with npm (npm install guesslanguage), I get strange behaviour when importing.
The module seems to export a JS object, so I have to repeat the name and write guessLanguage.guessLanguage.detect.

Here is an example in CoffeeScript:

guessLanguage = require 'guesslanguage'
guessLanguage.guessLanguage.detect text, (language) ->
        console.log "language is : "+language
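
A small workaround sketch in JavaScript, assuming the nested export shape reported in this issue. The helper `unwrapDetector` and the stand-in `fakeModule` are hypothetical; the stub only mimics the reported shape so the snippet runs without the package installed.

```javascript
// Hypothetical helper: return the detector object whether the module exports
// it directly or nested under a `guessLanguage` property.
function unwrapDetector(mod) {
  const nested = mod && mod.guessLanguage;
  return nested !== null && typeof nested === "object" ? nested : mod;
}

// Stand-in for require('guesslanguage'), mimicking the reported nested shape.
const fakeModule = {
  guessLanguage: {
    // Stub: always reports "en"; the real detect() analyses the text.
    detect(text, callback) {
      callback("en");
    },
  },
};

const detector = unwrapDetector(fakeModule);
detector.detect("some text", (language) => {
  console.log("language is : " + language);
});
```

If the export really is nested as described, destructuring on import (`const { guessLanguage } = require('guesslanguage')`) would avoid the repeated name as well.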

Licensing and the Quail project

We are discussing using guessLanguage in the Quail accessibility project here:

quailjs/quail#58

One issue is license incompatibility, since our project is MIT-licensed. We are really interested in using your existing trigram database, as there are some edge cases we have had to capture with our own language-guessing project. Would you be amenable to us including the trigram database in the project, or to discussing other options?

Wrong Classification for an English Text

I tried this text on the demo page:
aim at your goals, how many words do you want I can't understand?????

And the result is:
Detected language of provided text is Afrikaans [af].

Norsk will never match

Norsk (Norwegian) is listed in the trigram database as "nb" (Bokmål), but as "no" (Both Bokmål and Nynorsk) in the guessLanguage.js file.

One fix would be to rename the property in the trigram database ("nb" to "no"), but that might produce incorrect results, although the difference between Bokmål and Nynorsk is not that big at the trigram level. A better fix (without changing the database) would probably be to classify as "nb" instead of "no".
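
A third option, sketched here under assumptions, is to leave both tables untouched and resolve the mismatch at lookup time with an alias map. The names `CODE_ALIASES`, `lookupLanguageName`, and the sample `names` table are hypothetical and do not reflect the library's actual data structures; only the nb/no pair comes from this issue.

```javascript
// Hypothetical alias table: codes used in the trigram data mapped to the
// codes used by the language-name table.
const CODE_ALIASES = { nb: "no" };

// Hypothetical lookup: try the code as-is, then its alias, else "unknown".
function lookupLanguageName(code, names) {
  const has = (key) => Object.prototype.hasOwnProperty.call(names, key);
  if (has(code)) return names[code];
  const alias = CODE_ALIASES[code];
  if (alias !== undefined && has(alias)) return names[alias];
  return "unknown";
}

// Assumed sample of a name table keyed by "no", as described in the issue.
const names = { no: "Norwegian", en: "English" };
console.log(lookupLanguageName("nb", names));
```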