spacy-fastlang's Introduction

spacy_fastlang

Install

Assuming you have a working Python environment, install it with pip:

pip install spacy_fastlang

Usage

The library exports a pipeline component called language_detector that sets two spaCy extension attributes on each Doc:

  • doc._.language: ISO code of the detected language, or xx as a fallback
  • doc._.language_score: confidence of the detection

import spacy
import spacy_fastlang  # noqa: F401 # pylint: disable=unused-import

nlp = spacy.load("...")
nlp.add_pipe("language_detector")
doc = nlp(en_text)  # en_text is any English string you supply

doc._.language == "..."
doc._.language_score >= ...
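
The xx fallback can be pictured as a small pure-Python function. This is only a sketch of the behaviour, not the library's code: the helper name `resolve_language`, its parameters, and the rule that a label outside `supported_languages` is replaced by xx are all assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple

FALLBACK = "xx"  # fallback code named in the text above


def resolve_language(
    label: str,
    score: float,
    supported_languages: Optional[Iterable[str]] = None,
) -> Tuple[str, float]:
    """Hypothetical helper: map a raw (label, score) prediction to the pair
    that would be stored in doc._.language and doc._.language_score.

    Assumption: a label outside supported_languages is replaced by "xx".
    """
    if supported_languages is not None and label not in set(supported_languages):
        return FALLBACK, score
    return label, score


print(resolve_language("fr", 0.87))
print(resolve_language("it", 0.44, ["fr", "en"]))
```

With no restriction the label passes through unchanged; with a restriction, any label outside the list is reported as xx while the score is kept.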

Options

Check the tests for more examples and the available options.

License

Everything is under the MIT license except the default model, which is distributed by Facebook under the Creative Commons Attribution-Share-Alike License 3.0.

spacy-fastlang's People

Contributors

magsen, abbethales, thomasthiebaud, tteodoro

spacy-fastlang's Issues

French number words not detected as French

French words for numbers are not detected as the 'fr' language.
The small test program below yields this result:

Text: ceci est du francais LANG: fr SCORE: 0.8692260980606079
Text: donnez quarante cinq cinquante euros LANG: xx SCORE: 0.44422411918640137
Text: donnez 45 50 euros LANG: en SCORE: 0.20330046117305756

import spacy
import spacy_fastlang

nlp_lang = spacy.blank("xx")
nlp_lang.add_pipe("language_detector", config={"supported_languages": ["fr", "en"]})

texts = [
    "ceci est du francais",
    "donnez quarante cinq cinquante euros",
    "donnez 45 50 euros",
]

for t in texts:
    doc = nlp_lang(t)
    print("Text: {} LANG: {} SCORE: {}".format(t, doc._.language, doc._.language_score))
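
One possible caller-side workaround is to treat any low-score detection as undetermined. The sketch below applies that idea to the three results reported above; `trusted_language` and the 0.5 cut-off are hypothetical choices made for illustration, not library options or defaults.

```python
from typing import List, Tuple

# (text, language, score) triples copied from the output reported above.
REPORTED: List[Tuple[str, str, float]] = [
    ("ceci est du francais", "fr", 0.8692260980606079),
    ("donnez quarante cinq cinquante euros", "xx", 0.44422411918640137),
    ("donnez 45 50 euros", "en", 0.20330046117305756),
]

MIN_SCORE = 0.5  # hypothetical cut-off, not a library default


def trusted_language(language: str, score: float, min_score: float = MIN_SCORE) -> str:
    """Return the detected language only when its score reaches min_score."""
    return language if score >= min_score else "xx"


for text, language, score in REPORTED:
    print(f"{text!r}: {trusted_language(language, score)}")
```

This does not make the number phrases come out as fr; it only prevents the low-confidence en result from being taken at face value.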

Lower spaCy version requirements

Currently the required spaCy version is set to 3.6.0 or higher. However, AFAIK the LanguageDetector does not use any spaCy features specific to that version, so the version constraint can safely be relaxed to a lower version (probably >= 3.0.0).

I personally tried it with spaCy 3.4.4 and it worked without any issues.
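
To make the effect of the two constraints concrete, here is a minimal sketch that compares the spaCy version tried above against each lower bound. The helpers `parse_version` and `satisfies_minimum` are hypothetical and only handle plain dotted numeric versions.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as "3.4.4" into (3, 4, 4)."""
    return tuple(int(part) for part in version.split("."))


def satisfies_minimum(installed: str, minimum: str) -> bool:
    """True when installed >= minimum, compared component by component."""
    return parse_version(installed) >= parse_version(minimum)


print(satisfies_minimum("3.4.4", "3.6.0"))  # against the current lower bound
print(satisfies_minimum("3.4.4", "3.0.0"))  # against the proposed lower bound
```

For real dependency checks, `packaging.version.Version` is a sturdier choice because it also understands pre-release and post-release suffixes.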

Not good at detecting English?

Hi, I tried this against YouTube transcripts generated by Whisper, and I found that it doesn't do a good job of predicting English.

Is this expected behavior? Is it better at some languages than others?

Thanks for any help, Kastan
