franc

Detect the language of text.

What’s so cool about franc?

  1. franc supports more languages(†) than any other library, or Google;
  2. franc is easily forked to support 339 languages;
  3. franc is just as fast as the competition.

† - If humans write in the language, on the web, and the language has more than one million speakers, franc detects it.

Installation

npm:

npm install franc

franc is also available pre-built in three versions, supporting 75, 176, and 339 languages. Each works as an AMD, CommonJS, or globals module.

Usage

var franc = require('franc');

franc('Alle menslike wesens word vry'); // "afr"
franc('এটি একটি ভাষা একক IBM স্ক্রিপ্ট'); // "ben"
franc('Alle mennesker er født frie og'); // "nno"
franc(''); // "und"

franc.all('O Brasil caiu 26 posições');
/*
 * [
 *  [ 'por', 1 ],
 *  [ 'src', 0.8948665297741273 ],
 *  [ 'glg', 0.8862422997946612 ],
 *  [ 'snn', 0.8804928131416838 ],
 *  [ 'bos', 0.8394250513347022 ],
 *  [ 'hrv', 0.8336755646817249 ],
 *  [ 'lav', 0.833264887063655 ],
 *  [ 'cat', 0.8303901437371664 ],
 *  [ 'spa', 0.8242299794661191 ],
 *  [ 'bam', 0.8242299794661191 ],
 *  [ 'sco', 0.8069815195071869 ],
 *  [ 'rmy', 0.7839835728952772 ],
 *   ...
 * ]
 */

/* "und" is returned for too-short input: */
franc('the'); // 'und'

/* You can change what’s too short (default: 10): */
franc('the', {'minLength': 3}); // 'sco'

/* Provide a whitelist: */
franc.all('O Brasil caiu 26 posições', {
    'whitelist' : ['por', 'src', 'glg', 'spa']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'src', 0.8948665297741273 ],
 *   [ 'glg', 0.8862422997946612 ],
 *   [ 'spa', 0.8242299794661191 ]
 * ]
 */

/* Provide a blacklist: */
franc.all('O Brasil caiu 26 posições', {
    'blacklist' : ['src', 'glg', 'lav']
});
/*
 * [
 *   [ 'por', 1 ],
 *   [ 'snn', 0.8804928131416838 ],
 *   [ 'bos', 0.8394250513347022 ],
 *   [ 'hrv', 0.8336755646817249 ],
 *   [ 'cat', 0.8303901437371664 ],
 *   [ 'spa', 0.8242299794661191 ],
 *   [ 'bam', 0.8242299794661191 ],
 *   [ 'sco', 0.8069815195071869 ],
 *   [ 'rmy', 0.7839835728952772 ],
 *   ...
 * ]
 */
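
Because `franc.all` returns `[code, score]` pairs ordered from best to worst, a caller can decide how much to trust the top guess. The following is a minimal sketch, not part of franc: `pickConfident` and its `minMargin` parameter are hypothetical names, and the code assumes the list is already sorted by descending score, as in the outputs above.

```javascript
// Hypothetical helper (not part of franc): accept the top guess only when it
// beats the runner-up by at least `minMargin`; otherwise report 'und'.
// `results` is assumed to be an array of [code, score] pairs sorted by
// descending score, shaped like the output of `franc.all` shown above.
function pickConfident(results, minMargin) {
  if (!results || results.length === 0) {
    return 'und';
  }
  if (results.length === 1) {
    return results[0][0];
  }
  var margin = results[0][1] - results[1][1];
  return margin >= minMargin ? results[0][0] : 'und';
}

// Sample data copied from the whitelist example above.
var sample = [
  ['por', 1],
  ['src', 0.8948665297741273],
  ['glg', 0.8862422997946612],
  ['spa', 0.8242299794661191]
];

console.log(pickConfident(sample, 0.1)); // 'por'
console.log(pickConfident(sample, 0.2)); // 'und'
```

With the library loaded, this could be called as `pickConfident(franc.all(text), 0.1)`. The right margin depends on the input and on how costly a wrong guess is.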

CLI

Install:

npm install --global franc

Use:

Usage: franc [options] <string>

Detect the language of text

Options:

  -h, --help                    output usage information
  -v, --version                 output version number
  -m, --min-length <number>     minimum length to accept
  -w, --whitelist <string>      allow languages
  -b, --blacklist <string>      disallow languages

Usage:

# output language
$ franc "Alle menslike wesens word vry"
# afr

# output language from stdin (expects utf8)
$ echo "এটি একটি ভাষা একক IBM স্ক্রিপ্ট" | franc
# ben

# blacklist certain languages
$ franc --blacklist por,glg "O Brasil caiu 26 posições"
# src

# output language from stdin with whitelist
$ echo "Alle mennesker er født frie og" | franc --whitelist nob,dan
# nob
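
The `--whitelist` and `--blacklist` flags take comma-separated codes, while the JavaScript API takes arrays. A script that wants to accept the same input format could convert one into the other roughly as follows. This is a sketch only: `parseCodes` is a hypothetical helper, and it is not necessarily how the franc CLI parses its flags.

```javascript
// Hypothetical helper (not part of franc): turn "por, glg" into
// ['por', 'glg'], trimming whitespace and dropping empty entries.
function parseCodes(value) {
  return String(value || '')
    .split(',')
    .map(function (code) { return code.trim(); })
    .filter(function (code) { return code.length > 0; });
}

var options = {blacklist: parseCodes('por,glg')};

console.log(options.blacklist); // [ 'por', 'glg' ]
```

The resulting object has the shape of the options shown under Usage, so it could be passed as the second argument to `franc` or `franc.all`.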

Supported languages

franc supports 176 “languages”, by default. For a complete list, check out supported-languages.md.

Supporting more or fewer languages

Supporting more or fewer languages is easy: fork the project and run the following:

npm install # Install development dependencies.
export THRESHOLD=100000 # Set the minimum number of speakers to 100,000.
npm run build # Run the `build` script.

The above creates a version of franc that supports every language with 100,000 or more speakers. To support all languages, even dead ones like Latin, set THRESHOLD to -1.
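
To illustrate what the threshold means, here is a rough sketch of a speaker-count filter. It is not franc's build script: `selectLanguages`, the `languages` list, and every code and number in it are placeholders invented for the example.

```javascript
// Placeholder data: the codes and speaker counts are invented for
// illustration and are not real figures.
var languages = [
  {code: 'lang-a', speakers: 5000000},
  {code: 'lang-b', speakers: 250000},
  {code: 'lang-c', speakers: 0} // stands in for a dead language
];

// Hypothetical filter: keep languages with at least `threshold` speakers.
// A threshold of -1 keeps everything, since no count is below zero.
function selectLanguages(list, threshold) {
  return list
    .filter(function (language) { return language.speakers >= threshold; })
    .map(function (language) { return language.code; });
}

console.log(selectLanguages(languages, 1000000)); // [ 'lang-a' ]
console.log(selectLanguages(languages, 100000)); // [ 'lang-a', 'lang-b' ]
console.log(selectLanguages(languages, -1)); // [ 'lang-a', 'lang-b', 'lang-c' ]
```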

Browser

I’ve compiled three versions of franc for use in the browser. They’re UMD compliant: they work with AMD, CommonJS, and <script>s.

  • franc.js: franc with support for languages with 8 million or more speakers (75 languages);

  • franc-most.js: franc with support for languages with 1 million or more speakers (176 languages, the same as the npm version);

  • franc-all.js: franc with support for all languages (339 languages; careful, it's huge!).

Derivation

franc is a derivative work of guess-language (Python, LGPL), guesslanguage (C++, LGPL), and Language::Guess (Perl, GPL). Their creators, respectively Kent S. Johnson, Jacob R. Rideout, and Maciej Ceglowski, granted me the rights to distribute franc under the MIT license.

License

MIT © Titus Wormer
