Comments (2)
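This structure uses UTF-16 as its code table, which makes it a poor fit for storing large dictionaries. The Unicode range for Chinese characters is 0x4E00--0x9FA5, which is fairly sparse. You could try using bytes as the code table instead.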
from ahocorasickdoublearraytrie.
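To make the suggestion above concrete, here is a minimal sketch comparing the two code tables for one word. The class and method names are hypothetical and are not part of the library; the sketch only measures how large a code each keying scheme must index, and does not claim that switching to bytes would fix the reported memory usage.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of AhoCorasickDoubleArrayTrie.
public class CodeTableSketch {

    /** Largest UTF-16 code unit in the text; a char-keyed table must be able to index this value. */
    static int maxUtf16Unit(String text) {
        int max = 0;
        for (int i = 0; i < text.length(); i++) {
            max = Math.max(max, text.charAt(i));
        }
        return max;
    }

    /** Largest unsigned UTF-8 byte in the text; a byte-keyed table never indexes past 255. */
    static int maxUtf8Byte(String text) {
        int max = 0;
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            max = Math.max(max, b & 0xFF);
        }
        return max;
    }

    /** Number of transitions needed when the key is walked byte by byte. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String word = "\u4E2D\u6587"; // two CJK characters, U+4E2D and U+6587
        System.out.println("max UTF-16 code unit: " + maxUtf16Unit(word));
        System.out.println("max UTF-8 byte:       " + maxUtf8Byte(word));
        System.out.println("UTF-16 transitions:   " + word.length());
        System.out.println("UTF-8 transitions:    " + utf8Length(word));
    }
}
```

The trade-off is that byte keys bound the alphabet at 256 values but make each CJK character cost three transitions instead of one, so keys get longer while the code range gets denser.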
Compared with a hashmap, a DAT should consume less memory. However, a hashmap of 100000000 docs can be built in memory, while a DAT with 10000000 docs leads to OOM. Why?
Related Issues (20)
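- Match results contain too much redundancy and need a second filtering pass
- OOM when building dat/acdat. Compared with a hashmap, a DAT should consume less memory. Why can a hashmap of 100000000 docs be built, while a DAT with 10000000 docs leads to OOM?
- Is there a problem with the storage code?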
- Thread safe HOT 2
- parse when building? HOT 1
- ArrayIndexOutOfBoundsException, when build with an empty map HOT 1
- Is there a Python wrapper? HOT 1
- With more than 1 million entries, it reports Requested array size exceeds VM limit
- Inspiration in newer papers about double array tries HOT 5
- parseText throws ArrayIndexOutOfBoundsException when built with an empty map HOT 2
- Subclassing AhoCorasickDoubleArrayTrie to ignore specific characters when matching HOT 2
- Not that fast? HOT 2
- Publish PGP key ID.
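- Change the String parameter of the matches methods to CharSequence HOT 1
- Match results include empty strings
- Where can I download the source package for version 1.2.3? HOT 3
- How to automatically replace matched words in an article
- Feature request: could the parseLongestText method from hanlp be ported to this library?
- Push new release containing support for JPMS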