Comments (5)
@maximium Thanks for bringing this to our attention and for the PR.
@Nick-S-2018 pls review the PR.
from manticoresearch.
@maximium The reason Thai chars are currently in 'cjk' is that our 'cjk' charset in fact covers the languages that don't use spaces between words. We did that to make it easier for users to set ngram_chars for such languages.
@sanikolaev It appears that the optimal solution is to move the Thai script to a separate charset that can be used along with cjk and non_cjk.
> @sanikolaev It appears that the optimal solution is to move the Thai script to a separate charset that can be used along with cjk and non_cjk.
Indeed. The problem with having Thai in cjk is that every letter is tokenized as a separate word, which is incorrect and returns a lot of irrelevant results. The only way I see to handle Thai text is to split it into separate words in the application before indexing, using a dictionary-based tokenizer. A separate Thai charset fits this approach perfectly.
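To illustrate the pre-indexing step described above, here is a minimal sketch of a dictionary-based tokenizer using greedy longest-match. The function name `segment`, the tiny `WORDS` dictionary, and the choice of greedy longest-match are all assumptions made for illustration; a real application would more likely rely on a dedicated Thai word-segmentation library with a full dictionary.

```python
# Hypothetical sketch: greedy longest-match segmentation against a word list.
# WORDS is a toy dictionary used only for this example.
WORDS = {"ภาษา", "ไทย"}


def segment(text, words):
    """Split text into dictionary words, preferring the longest match.

    Characters that do not start any dictionary word are emitted
    one at a time so that no input is lost.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in words:
                match = candidate
                break
        if match is None:
            match = text[i]  # fallback: unknown single character
        tokens.append(match)
        i += len(match)
    return tokens


# Join tokens with spaces so a whitespace-based tokenizer sees real words.
prepared = " ".join(segment("ภาษาไทย", WORDS))
print(prepared)
```

The space-separated output could then be sent for indexing in place of the raw text, so that words rather than individual letters become the indexed tokens.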