Comments (5)
Thanks, I'll have to give that a try and share some rough results here. I do think it would be useful to present such stats in the official benchmark comparisons, since otherwise there's no way to know what "noticeably slower" actually means. I know that fastText and CLD2 tend to be exceptionally fast, so perhaps "noticeably slower" is still quite acceptable. But if it's a difference of 0.001 s vs. 1 s, then that's obviously a problem.
In chapter 9.5 of the README it says:
Lingua's high detection accuracy comes at the cost of being noticeably slower than other language detectors.
The statistical models in Lingua are larger than those of similar libraries. So querying them takes more time.
There is a benchmark script in this repo that gives you an idea of how performant the library is. You can run it locally with Poetry:
poetry run python3 scripts/benchmark.py
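To put a number on "noticeably slower" for your own texts, one option is to time detector calls directly. The following is a minimal sketch, not the repo's benchmark script: it assumes a detector is any callable that takes a string, and the helper name `time_detector` is hypothetical (it is not part of lingua-py). It reports average seconds per text across several runs, using only the standard library.

```python
import statistics
import time
from typing import Callable, Sequence


def time_detector(
    detect: Callable[[str], object],
    texts: Sequence[str],
    repeats: int = 5,
) -> dict:
    """Hypothetical helper: time `detect` over `texts`, `repeats` times.

    Returns per-text latency statistics in seconds, so that results
    such as "0.001 s vs. 1 s" can be compared across detectors.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    if repeats < 1:
        raise ValueError("repeats must be >= 1")

    per_text = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in texts:
            detect(text)
        elapsed = time.perf_counter() - start
        # Average seconds per text for this run.
        per_text.append(elapsed / len(texts))

    return {
        "runs": repeats,
        "texts": len(texts),
        "mean_s_per_text": statistics.mean(per_text),
        "median_s_per_text": statistics.median(per_text),
        "min_s_per_text": min(per_text),
    }


if __name__ == "__main__":
    # Stand-in detector so the sketch runs without lingua installed.
    def dummy_detect(text: str) -> str:
        return "ENGLISH" if text.isascii() else "UNKNOWN"

    sample = ["languages are awesome", "Sprachen sind großartig"] * 100
    print(time_detector(dummy_detect, sample))
```

With lingua-py installed, you could pass `detector.detect_language_of` as the callable, where `detector = LanguageDetectorBuilder.from_all_languages().build()`. Because lingua loads its language models lazily by default, the first run may include model-loading time; the median and minimum across runs are less affected by that than the mean.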
@nickchomey I'm relatively new to this repo, but it supports more languages than the translation repo I have been using. I could help test and produce an "output chart", or help draft and then submit a PR for this, so I'm willing to collaborate with you on a few options for generating the stats.
@datatalking This isn't a focus for me at the moment and probably won't be for at least a few months, so I'm not able to collaborate on anything. But if you have the time and desire to do so, that would be great!
Performance metrics are now provided in the README.