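## ansj configuration
ansj:
  dic_path: "ansj/dic/user/" ##user dictionary location
  ambiguity_path: "ansj/dic/ambiguity.dic" ##ambiguity dictionary
  enable_name_recognition: true ##person-name recognition
  enable_num_recognition: true ##number recognition
  enable_quantifier_recognition: false ##quantifier recognition
  enabled_stop_filter: true ##whether to filter tokens using the stop dictionary
  stop_path: "ansj/dic/stopLibrary.dic" ##stop-word filter dictionary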
This is in the documentation on the project home page; press Ctrl+F and search for "分词文件配置" (tokenizer file configuration).
from elasticsearch-analysis-ansj.
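I configured it exactly like this, but after testing I still don't get the expected result. Is there anything else I need to watch out for?
Here are my configuration and test results:
1. Configuration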
################################## ANSJ PLUG CONFIG ################################
#default analyzer, indexing
index.analysis.analyzer.default.type: index_ansj
#default analyzer, search
index.analysis.analyzer.default_search.type: query_ansj
ansj:
  dic_path: "ansj/dic/user/"
  ambiguity_path: "ansj/dic/ambiguity.dic"
  enable_name_recognition: true
  enable_num_recognition: true
  enable_quantifier_recognition: false
  enabled_stop_filter: true
  stop_path: "ansj/dic/stopLibrary.dic"
2. Startup log
[2016-05-27 13:42:04,546][INFO ][DICLOG ] init user userLibrary ok path is : D:\IDEA_APL\elasticsearch-2.3.1\config\ansj\dic\user\userlib.dic
[2016-05-27 13:42:04,548][WARN ][DICLOG ] init ambiguity warning :D:\IDEA_APL\elasticsearch-2.3.1\config\ansj\dic\ambiguity.dic because : file not found or failed to read !
[2016-05-27 13:42:05,550][INFO ][DICLOG ] init user userLibrary ok path is : D:\IDEA_APL\elasticsearch-2.3.1\plugins\elasticsearch-analysis-ansj\default.dic
3. Test results
http://127.0.0.1:9200/_cat/test_index1/analyze?text=斌斌$强强$庆雨&analyzer=dic_ansj
斌 0 1 0 word
斌 1 2 1 word
$ 2 3 2 word
强强 3 5 3 word
$ 5 6 4 word
庆 6 7 5 word
雨 7 8 6 word
The custom dictionary contains:
D:\IDEA_APL\elasticsearch-2.3.1\config\ansj\dic\user\userlib.dic
斌斌 a 37557
强强 a 37557
庆雨 a 37557
You didn't change the configuration at all?
Change it to this:
1. Configuration
################################## ANSJ PLUG CONFIG ################################
#default analyzer, indexing
index.analysis.analyzer.default.type: index_ansj
#default analyzer, search
index.analysis.analyzer.default_search.type: query_ansj
ansj:
  dic_path: "D:\IDEA_APL\elasticsearch-2.3.1\config\ansj\dic\user\userlib.dic"
Remember to get the YAML indentation right.
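Right now it looks like the dictionary isn't even being found... Also, the fields on each dictionary line must be separated by a tab (\t).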
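Still not working.
1. I fixed the config indentation and switched to the full path:
ansj:
  dic_path: "D:/IDEA_APL/elasticsearch-2.3.1/config/ansj/dic/user/userlib.dic"
2. The dictionary fields are already separated by tabs (\t), so that part is fine. If I put these same three entries into D:\IDEA_APL\elasticsearch-2.3.1\plugins\elasticsearch-analysis-ansj\default.dic, I get the expected result.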
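A quick pre-flight check can catch the dictionary problems raised in this thread (wrong separators, Windows line endings, non-UTF-8 encoding) before restarting Elasticsearch. This is a hypothetical Python helper, not part of the plugin; it assumes the three-column layout shown above (word, nature, frequency separated by tabs) and that the plugin wants UTF-8 with Unix line endings, which is what eventually fixed the issue here.

```python
from pathlib import Path
from typing import List

UTF8_BOM = b"\xef\xbb\xbf"


def check_user_dic(path: str) -> List[str]:
    """Return a list of likely format problems in a user dictionary file.

    Hypothetical helper. Assumes: word<TAB>nature<TAB>frequency per line,
    UTF-8 encoding, Unix (LF) line endings.
    """
    raw = Path(path).read_bytes()
    problems: List[str] = []
    if raw.startswith(UTF8_BOM):
        # Java's UTF-8 reader does not strip a BOM, so it would likely
        # end up glued to the first entry.
        problems.append("file starts with a UTF-8 BOM")
        raw = raw[len(UTF8_BOM):]
    if b"\r" in raw:
        problems.append("file contains CR characters (Windows line endings?)")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (saved as ANSI/GBK?): {exc}")
        return problems
    for lineno, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            problems.append(
                f"line {lineno}: expected 3 tab-separated fields, got {len(fields)}"
            )
        elif not fields[2].strip().isdigit():
            problems.append(f"line {lineno}: frequency {fields[2]!r} is not an integer")
    return problems
```

Running `check_user_dic(r"D:\IDEA_APL\elasticsearch-2.3.1\config\ansj\dic\user\userlib.dic")` on a file saved as ANSI would report the encoding error even though the tabs are correct.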
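This problem is solved.
Cause: my custom dictionary file had file type PC (Windows CRLF line endings) and ANSI encoding.
Fix: converting it to file type Unix (LF line endings) and UTF-8 encoding made it work.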
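The conversion can also be scripted instead of done in an editor. A minimal Python sketch, assuming "ANSI" means the Simplified Chinese Windows code page 936 (GBK); pass a different `source_encoding` if the file was saved under another locale. The function name is illustrative, not from the plugin.

```python
from pathlib import Path


def convert_to_unix_utf8(src: str, dst: str, source_encoding: str = "gbk") -> None:
    """Re-encode a dictionary file as UTF-8 (no BOM) with LF line endings.

    Assumption: the source was saved as "ANSI" on Simplified Chinese
    Windows, i.e. code page 936 / GBK.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # Normalize Windows (CRLF) and stray CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # newline="" keeps Python from turning "\n" back into "\r\n" on Windows.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Because the whole file is read before writing, `convert_to_unix_utf8("userlib.dic", "userlib.dic")` can overwrite the file in place.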
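Now there is a new problem: the stop-word dictionary has no effect.
1. Configuration:
ansj:
  dic_path: "ansj/dic/user/" ##user dictionary location
  ambiguity_path: "ansj/dic/ambiguity.dic" ##ambiguity dictionary
  enabled_stop_filter: true ##whether to filter tokens using the stop dictionary
  stop_path: "ansj/dic/stopLibrary.dic" ##stop-word filter dictionary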
2. D:\IDEA_APL\elasticsearch-2.3.1_1\config\ansj\dic\stopLibrary.dic
File type Unix, encoding UTF-8. Contents:
"
.
。
,
、
!
?
3. Test results (the stop words were not filtered out)
http://127.0.0.1:9200/_cat/test_index1/analyze?text=斌斌"强强"庆雨&analyzer=dic_ansj
斌斌 0 2 0 word
" 2 3 1 word
强强 3 5 2 word
" 5 6 3 word
庆雨 6 8 4 word
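To make the expected behaviour concrete: with the stop list above applied, the `"` tokens should disappear and only 斌斌, 强强 and 庆雨 should remain. The Python sketch below only illustrates that effect on the `_cat` output; it is not the plugin's implementation (which, per the discussion below, may not call its stop filter at all). It assumes each output line is `term start_offset end_offset position type` separated by whitespace, as in the results above.

```python
from typing import Iterable, List, Set


def load_stop_words(lines: Iterable[str]) -> Set[str]:
    """Read a stop list with one entry per line, skipping blank lines."""
    return {line.strip() for line in lines if line.strip()}


def filter_tokens(cat_output: str, stop_words: Set[str]) -> List[str]:
    """Return the terms from _cat analyze output that are not stop words.

    Assumes the first whitespace-separated column of each line is the term.
    """
    kept: List[str] = []
    for line in cat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        term = fields[0]
        if term not in stop_words:
            kept.append(term)
    return kept
```

For the output above, `filter_tokens(output, load_stop_words(open("stopLibrary.dic", encoding="utf-8")))` would return `["斌斌", "强强", "庆雨"]`.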
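The log shows the stop-word dictionary loaded successfully (the first line reads "ansj stop dictionary loaded!", the last "ansj analyzer warmed up, ready to use!"):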
[2016-05-27 17:15:16,785][INFO ][ansj-initializer ] ansj停止词典加载完毕!
[2016-05-27 17:15:17,722][INFO ][DICLOG ] init core library ok use time :918
[2016-05-27 17:15:18,504][INFO ][DICLOG ] init ngram ok use time :753
[2016-05-27 17:15:18,510][INFO ][ansj-initializer ] ansj分词器预热完毕,可以使用!
I looked through the elasticsearch-analysis-ansj-master source, and it seems AnsjElasticConfigurator.filter is never called.
Quite possibly; I think that feature was removed. I'll confirm later.
+1
I'm running into the same problem on version 2.3.1. How can I fix it?