Comments (4)
I used a maxLength of 4, and it improved my detection results quite a bit (together with setting maxNgrams to 9000).
The library also uses "only" the Universal Declaration of Human Rights as the basis for language detection, so many common phrases and words are missing (country names, idioms, greetings, specific nouns). Depending on the use case, it makes sense to add additional words or sentences to the language samples so they are factored in. I did this for my use case (analyzing emails), and it worked very well.
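To make the effect of maxLength, maxNgrams and extra samples concrete, here is a minimal, self-contained Python sketch of a character n-gram profile. It is not the library's code (the library is PHP): build_profile and its min_length/max_length/max_ngrams parameters are hypothetical names that only mirror the options discussed in this thread, and padding words with "_" is an assumption of the sketch.

```python
import re
from collections import Counter


def build_profile(text, min_length=1, max_length=3, max_ngrams=300):
    """Return the max_ngrams most frequent character n-grams, most frequent first.

    Hypothetical sketch: parameter names only mirror the options discussed
    in this thread; this is not the library's implementation.
    """
    counts = Counter()
    # Keep only runs of letters (Unicode-aware) and pad each word with "_"
    # so n-grams at the start and end of a word are distinguishable.
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(min_length, max_length + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count, then alphabetically, so ranks are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]


# Article 1 of the UDHR as a (tiny) base sample, plus an email-style sentence.
base_sample = "All human beings are born free and equal in dignity and rights."
extra_sample = "Hello, thanks for your email. Best regards from London."

base = build_profile(base_sample, max_length=4, max_ngrams=9000)
extended = build_profile(base_sample + " " + extra_sample, max_length=4, max_ngrams=9000)

print("_hel" in base, "_hel" in extended)  # False True
```

Appending the email-style sentence adds n-grams such as "_hel" (from "hello") that the base sample never produces, which is the kind of gain you get from extending the language samples as described above.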
Keep in mind that English is very similar to other languages and the sentences you chose are not very long; these two things make it very hard to detect the right language. On the other hand, this library is more or less optimized for speed. If you want to improve detection, please have a look here, where I described how to do it.
Maybe I should improve the default detection accuracy a little bit?
If you need any further help let me know. :)
What about setting maxLength? Does a bigger number improve detection?
A longer n-gram length does not improve detection much. Setting maxLength to 4 or 5 may make a difference, but I wouldn't go any higher; beyond that it gets even worse.
Update: If you want to improve detection, be sure to set maxNgrams to a value greater than 300. A good value would be 3000. For more information, look here.
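Why a larger maxNgrams can help is easier to see with a generic n-gram ranking detector that uses Cavnar and Trenkle's "out-of-place" distance. This is an illustrative Python stand-in, not this library's implementation, whose internals may differ; build_profile, out_of_place, detect and the two tiny training texts (Article 1 of the UDHR in English and German) are assumptions made for the example. Each language profile keeps only its max_ngrams most frequent n-grams, so a small value discards the rarer, more distinctive n-grams and keeps mostly the single letters that many languages share.

```python
import re
from collections import Counter


def build_profile(text, max_length=3, max_ngrams=300):
    """Top max_ngrams character n-grams (lengths 1..max_length), most frequent first."""
    counts = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"  # mark word boundaries (assumption of this sketch)
        for n in range(1, max_length + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:max_ngrams]]


def out_of_place(sample, reference):
    """Sum of rank differences; n-grams missing from reference get the max penalty."""
    ranks = {gram: rank for rank, gram in enumerate(reference)}
    penalty = len(reference)
    return sum(abs(rank - ranks[gram]) if gram in ranks else penalty
               for rank, gram in enumerate(sample))


def detect(text, training, max_length=3, max_ngrams=300):
    """Return language codes sorted by distance, closest first (hypothetical API)."""
    sample = build_profile(text, max_length, max_ngrams)
    scores = {lang: out_of_place(sample, build_profile(corpus, max_length, max_ngrams))
              for lang, corpus in training.items()}
    return sorted(scores, key=scores.get)


training = {
    "en": "All human beings are born free and equal in dignity and rights.",
    "de": "Alle Menschen sind frei und gleich an Würde und Rechten geboren.",
}

ranking = detect("free and equal rights", training, max_ngrams=3000)
print(ranking)
```

With training texts this small, max_ngrams=300 and 3000 produce identical profiles because neither text has that many distinct n-grams; the difference only appears with realistically sized language samples, where the cut-off actually removes n-grams.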
Related Issues (20)
- Support for Kazakh language
- How can the library detect the wrong language on such simple text?
- the word "LOL" is not an english word ?
- Compatible for PHP 8
- Language detection with php 5.6
- The detected languages seem wrong very often
- where is project amdvbflash?
- English text recognition
- Feature Request - Min language's values
- Detection of english string does not work correctly
- What's the right way of checking whether or not the text is in a specific language?
- Deprecation notice with PHP 8.1
- What dataset?
- Unable to detect Chinese if there is only 1 character
- Incorrect language is being returned for specific words
- Testing
- How can I add a new language?
- Can you recommend any article data to train better? The default data is too small
- Is there any way to get the full name of the language along with the language code?
- "ia" language?