nlpchina / elasticsearch-analysis-ansj
License: Apache License 2.0
Creating the index works fine, but an error occurs as soon as I start indexing data:
[2016-04-23 17:45:27,284][WARN ][rest.suppressed ] /products/product/13 Params: {index=products, id=13, type=product}
RemoteTransportException[[Lightbright][127.0.0.1:9300][indices:data/write/index[p]]]; nested: NullPointerException;
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at org.ansj.util.AnsjReader.<init>(AnsjReader.java:34)
at org.ansj.util.AnsjReader.<init>(AnsjReader.java:49)
at org.ansj.splitWord.analysis.IndexAnalysis.<init>(IndexAnalysis.java:133)
at org.ansj.lucene5.AnsjAnalyzer.getTokenizer(AnsjAnalyzer.java:95)
at org.ansj.elasticsearch.index.analysis.AnsjAnalysis$1.create(AnsjAnalysis.java:59)
at org.elasticsearch.index.analysis.CustomAnalyzer.createComponents(CustomAnalyzer.java:83)
at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:176)
at org.apache.lucene.document.Field.tokenStream(Field.java:562)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:628)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:365)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:321)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:234)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:450)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1477)
at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1256)
at org.elasticsearch.index.engine.InternalEngine.innerIndex(InternalEngine.java:530)
at org.elasticsearch.index.engine.InternalEngine.index(InternalEngine.java:457)
at org.elasticsearch.index.shard.IndexShard.index(IndexShard.java:601)
at org.elasticsearch.index.engine.Engine$Index.execute(Engine.java:836)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:237)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:158)
at org.elasticsearch.action.index.TransportIndexAction.shardOperationOnPrimary(TransportIndexAction.java:66)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:639)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:279)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:271)
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:75)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:376)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
My configuration file is as follows:
index:
  analysis:
    analyzer:
      customer_ansj_index:
        tokenizer: index_ansj
      customer_ansj_query:
        tokenizer: query_ansj
Another suggestion: change the redis address in the default configuration file to 127.0.0.1.
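In the example below, the input is on the left and the indexed terms are on the right. I would like the terms in each group to share the positions 0, 1, 2, so that a phrase search can match any combination. For example, the terms at position 1 would be [liu, l], at position 2 [de, d], and at position 3 [hua, h].
In other words, I would like the indexing result to look like this: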
{
"tokens": [
{
"token": "l",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "d",
"start_offset": 1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "h",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
},
{
"token": "liu",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "de",
"start_offset":1,
"end_offset": 2,
"type": "word",
"position": 1
},
{
"token": "hua",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 2
}
]
}
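One way to check whether an analyzer produces the layout requested above is to group the returned tokens by their `position` value. The following is a minimal Python sketch, not part of the plugin; the helper name `group_by_position` is hypothetical, and the token list is copied from the desired output above with the offsets left out.

```python
from collections import defaultdict


def group_by_position(tokens):
    """Map each position to the list of terms emitted at that position."""
    groups = defaultdict(list)
    for tok in tokens:
        groups[tok["position"]].append(tok["token"])
    # Sort by position so the result is easy to read and compare.
    return dict(sorted(groups.items()))


# Tokens taken from the desired _analyze output (offsets omitted).
desired = [
    {"token": "l", "position": 0},
    {"token": "d", "position": 1},
    {"token": "h", "position": 2},
    {"token": "liu", "position": 0},
    {"token": "de", "position": 1},
    {"token": "hua", "position": 2},
]

print(group_by_position(desired))
```

If the full-pinyin and first-letter terms of each syllable land in the same group, a phrase query can mix the two forms freely.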
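In index_ansj mode, how can I stop words from the ambiguity dictionary from being split again? The ambiguity dictionary contains 不确定, but the tokenization result contains both 不确定 and 确定. I have set enable_skip_user_define: false.
I configured the plugin following the author's documentation, but when ES starts it keeps reporting: 没有找到redis相关配置! (redis configuration not found).
To clarify: I started redis first and ES afterwards, and both run on the same server, so there is no network problem.
Could someone help me with this? Thanks!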
{"error":{"root_cause":[{"type":"null_pointer_exception","reason":null}],"type":"null_pointer_exception","reason":null},"status":500}
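Hello, and thank you for developing this tokenizer plugin. I ran into a problem while using it and would like to ask about it.
For example, I have two records:
1. 雪野新村
2. 上南新村
If I search with 雪 as the keyword, the result is empty, but with 雪野 as the keyword, record 1 is found.
If I search with 上 as the keyword, record 2 is found.
How is single-character search handled? I don't quite understand these results. Is this related to tokenization?
I checked the dictionary and this word is not in it, so why is such an unreasonable word produced?
When the redis server fails, does the plugin reconnect automatically? If so, how should it be configured?
I am using Elasticsearch 2.4.4, for which there is no matching plugin version. Will there be problems if I use the plugin built for 2.4.1?
Log: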
[2014-09-25 22:15:09,251][INFO ][cluster.service ] [Atalon] new_master [Atalon][oeOrI-8rTuSh9AQiADXVpw][db01.mst365.cn][inet[/10.171.229.120:9300]], reason: zen-disco-join (elected_as_master)
[2014-09-25 22:15:09,281][INFO ][http ] [Atalon] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.171.229.120:9200]}
[2014-09-25 22:15:09,282][INFO ][node ] [Atalon] started
[2014-09-25 22:15:11,947][INFO ][ansj-analyzer ] ansj分词器预热完毕,可以使用!
[2014-09-25 22:15:11,947][INFO ][ansj-analyzer ] 没有找到redis相关配置!
[2014-09-25 22:15:12,417][INFO ][gateway ] [Atalon] recovered [1] indices into cluster_state
[2014-09-25 22:23:57,245][INFO ][node ] [Atalon] stopping ...
[2014-09-25 22:23:57,279][INFO ][node ] [Atalon] stopped
[2014-09-25 22:23:57,280][INFO ][node ] [Atalon] closing ...
[2014-09-25 22:23:57,288][INFO ][node ] [Atalon] closed
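Could you provide a raw configuration file? Copying the configuration directly from the README easily breaks the formatting!
Also, YAML is usually indented with 2 spaces.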
2 errors
at org.elasticsearch.common.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:361) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.inject.InjectorBuilder.initializeStatically(InjectorBuilder.java:137) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:93) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:96) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.inject.Guice.createInjector(Guice.java:70) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.common.inject.ModulesBuilder.createInjector(ModulesBuilder.java:43) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:482) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.node.Node.<init>(Node.java:238) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap$6.<init>(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:242) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:360) ~[elasticsearch-5.3.0.jar:5.3.0]
at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:123) ~[elasticsearch-5.3.0.jar:5.3.0]
... 6 more
ip: master.redis.yao.com:6379
With this configuration, redis cannot be found...
On ES 1.7.2, after installing with the 1.x instructions, the plugin does not work. Running:
curl '127.0.0.1:9200/_analyze?analyzer=query_ansj' -d '中华人民共和国'
produces the following error:
{"error":"ElasticsearchIllegalArgumentException[failed to find analyzer [query_ansj]]","status":400}
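Switching to search_ansj or index_ansj produces the same error.
The log shows that the stop word dictionary was loaded successfully, but stop words still appear in the tokenization results. Please help!
I tried one word per line, as well as the following: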
与 p 1000
专业 n 1000
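These are the two stop word dictionary formats I tried.
ES version: 2.3.3
Why does highlighting still split text into single characters after I installed ansj on Elasticsearch 2.1.1? And why is there no stop word dictionary?
Elasticsearch version: 2.3.3
elasticsearch.yml is configured as follows:
index :
  analysis :
    tokenizer :
      index_ansj :
        type : index_ansj
    filter :
      ini_synonym :
        type : synonym
        synonyms_path: ansj/dic/synonym.txt
    analyzer :
      custom1 :
        tokenizer : index_ansj
        filter : [ini_synonym]
index.analysis.analyzer.default.type: custom1
The following log is printed at startup: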
[.kibana] IndexCreationException[failed to create index]; nested: IllegalArgumentException[Unknown Analyzer type [custom1] for [default]];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:362)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: Unknown Analyzer type [custom1] for [default]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:320)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:233)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:105)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:143)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
curl -XGET 'localhost:9200/_analyze?tokenizer=index_ansj&filter=pinyin&pretty' -d '你好,我是小明的同学小强'
{
"error" : {
"root_cause" : [ {
"type" : "remote_transport_exception",
"reason" : "[node1][127.0.0.1:9300][indices:admin/analyze[s]]"
} ],
"type" : "null_pointer_exception",
"reason" : null
},
"status" : 500
}
Tokenizing through the analyzer works normally:
curl -XGET 'localhost:9200/_analyze?analyzer=index_ansj&pretty' -d '你好,我是小明的同学小强'
{
"tokens" : [ {
"token" : "你好",
"start_offset" : 0,
"end_offset" : 2,
"type" : "word",
"position" : 0
}, {
"token" : ",",
"start_offset" : 2,
"end_offset" : 3,
"type" : "word",
"position" : 1
}, {
"token" : "我",
"start_offset" : 3,
"end_offset" : 4,
"type" : "word",
"position" : 2
}, {
"token" : "是",
"start_offset" : 4,
"end_offset" : 5,
"type" : "word",
"position" : 3
}, {
"token" : "小明",
"start_offset" : 5,
"end_offset" : 7,
"type" : "word",
"position" : 4
}, {
"token" : "的",
"start_offset" : 7,
"end_offset" : 8,
"type" : "word",
"position" : 5
}, {
"token" : "同学",
"start_offset" : 8,
"end_offset" : 10,
"type" : "word",
"position" : 6
}, {
"token" : "小强",
"start_offset" : 10,
"end_offset" : 12,
"type" : "word",
"position" : 7
} ]
}
Are synonyms supported yet?
ES 5.0.1 seems to have changed many interfaces, and the plugin does not work with it.
For example:
publish ansj_term u:c:视康
publish ansj_term u:d:视康
publish ansj_term a:c:减肥瘦身-减肥,nr,瘦身,v
u:c
u:d
...
What do these prefixes mean?
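The messages above have the shape `<dictionary>:<operation>:<payload>`. A plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that `u` and `a` select the user dictionary and the ambiguity dictionary, while `c` and `d` stand for create and delete. The Python sketch below only splits a message under that assumed reading; `parse_ansj_message` and both lookup tables are hypothetical names, not plugin code.

```python
# Assumed meanings of the prefixes; these are guesses, not documented facts.
DICTIONARIES = {"u": "user dictionary", "a": "ambiguity dictionary"}
OPERATIONS = {"c": "create", "d": "delete"}


def parse_ansj_message(message):
    """Split '<dict>:<op>:<payload>' into its three parts.

    Only the first two colons are treated as separators, so the payload
    may itself contain colons, commas or hyphens.
    """
    dict_code, op_code, payload = message.split(":", 2)
    return (
        DICTIONARIES.get(dict_code, dict_code),
        OPERATIONS.get(op_code, op_code),
        payload,
    )


for msg in ["u:c:视康", "u:d:视康", "a:c:减肥瘦身-减肥,nr,瘦身,v"]:
    print(parse_ansj_message(msg))
```

Under the same assumption, the ambiguity payload `减肥瘦身-减肥,nr,瘦身,v` would pair the ambiguous string with the desired split and a part-of-speech tag for each piece.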
Elasticsearch version: 2.3.3
Plugin version: 2.3.3.2
I added -Des.security.manager.enabled=false
An error is reported at startup:
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:22)
at org.ansj.elasticsearch.pubsub.redis.RedisUtils.getConnection(RedisUtils.java:22)
at org.ansj.elasticsearch.index.config.AnsjElasticConfigurator$1.run(AnsjElasticConfigurator.java:85)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: ValidateObject failed
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1203)
at redis.clients.util.Pool.getResource(Pool.java:20)
... 3 more
[2016-07-02 21:23:23,838][ERROR][ansj-redis-utils ] Could not get a resource from the pool
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at redis.clients.util.Pool.getResource(Pool.java:22)
at org.ansj.elasticsearch.pubsub.redis.RedisUtils.getConnection(RedisUtils.java:22)
at org.ansj.elasticsearch.index.config.AnsjElasticConfigurator$1.run(AnsjElasticConfigurator.java:85)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Could not create a validated object, cause: ValidateObject failed
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1203)
at redis.clients.util.Pool.getResource(Pool.java:20)
... 3 more
[2016-07-02 21:23:23,839][INFO ][ansj-initializer ] redis守护线程准备完毕,ip:127.0.0.1:6379,port:6379,channel:ansj_term
Exception in thread "Thread-3" java.lang.NullPointerException
at java.util.Objects.requireNonNull(Objects.java:203)
at org.ansj.elasticsearch.index.config.AnsjElasticConfigurator$1.run(AnsjElasticConfigurator.java:89)
at java.lang.Thread.run(Thread.java:745)
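After publishing through the channel, no ext.dic file was generated and the added terms had no effect.
Elasticsearch 1.7.2
ansj plugin, 1.x version
Field type string, not_analyzed not specified, value fd04fe9b5225461204e75837f1616575, analyzer index_ansj
The value is tokenized as follows: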
{'tokens': [{'end_offset': 2,
'position': 1,
'start_offset': 0,
'token': 'fd',
'type': 'word'},
{'end_offset': 4,
'position': 2,
'start_offset': 2,
'token': '04',
'type': 'word'},
{'end_offset': 6,
'position': 3,
'start_offset': 4,
'token': 'fe',
'type': 'word'},
{'end_offset': 7,
'position': 4,
'start_offset': 6,
'token': '9',
'type': 'word'},
{'end_offset': 8,
'position': 5,
'start_offset': 7,
'token': 'b',
'type': 'word'},
{'end_offset': 18,
'position': 6,
'start_offset': 8,
'token': '5225461204',
'type': 'word'},
{'end_offset': 19,
'position': 7,
'start_offset': 18,
'token': 'e',
'type': 'word'},
{'end_offset': 24,
'position': 8,
'start_offset': 19,
'token': '75837',
'type': 'word'},
{'end_offset': 25,
'position': 9,
'start_offset': 24,
'token': 'f',
'type': 'word'},
{'end_offset': 32,
'position': 10,
'start_offset': 25,
'token': '1616575',
'type': 'word'}]}
What is the basis for tokenizing it this way? Thanks!
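The tokens and offsets listed above are consistent with cutting the string at every boundary between a run of letters and a run of digits. The Python sketch below reproduces that pattern with a regular expression. It only illustrates the observed output and makes no claim about how ansj implements it; `split_alnum_runs` is a hypothetical helper.

```python
import re


def split_alnum_runs(text):
    """Return (token, start_offset, end_offset) for each letter or digit run."""
    return [
        (m.group(), m.start(), m.end())
        for m in re.finditer(r"[A-Za-z]+|[0-9]+", text)
    ]


value = "fd04fe9b5225461204e75837f1616575"
for token, start, end in split_alnum_runs(value):
    print(token, start, end)
```

If the field is meant to be matched as a single opaque identifier, leaving it unanalyzed in the mapping avoids this splitting altogether.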
Hello, the analyze API currently returns:
}, {
"token" : "大爷",
"start_offset" : 55,
"end_offset" : 57,
"type" : "word",
"position" : 32
}, {
"token" : "的",
"start_offset" : 57,
"end_offset" : 58,
"type" : "word",
"position" : 33
} ]
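How can I make the type field, or an additional field, carry part-of-speech information in the response?
If detailed documentation exists, please let me know; I searched for a long time and could not find any...
Is there a description of the specific commands for remote push through redis?
I want to use it to manage the default dictionary, the stop word dictionary, and so on. How should I configure that?
How do I configure the synonym dictionary?
ansj keeps punctuation and even spaces. That is useful for terms such as .Net, C++, and C#, but how should the many useless symbols be handled? Should they go into the stop word dictionary as stop words?
Apart from ambiguity.dic, which I found at http://maven.nlpcn.org/down/library/, I could not find examples of the other dictionaries. Could you tell me where to learn about them?
Support for ElasticSearch 2.4.4
The configuration file is as follows:
ambiguity_path: "/<absolute path>/config/ansj/dic/ambiguity.dic"
Published through redis-cli: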
publish ansj_term a:c:减肥瘦身-减肥,nr,瘦身,v
The error log is as follows:
[2016-09-19 09:13:06,710][ERROR][ansj-redis-msg-file ] appendAMB exception
java.security.PrivilegedActionException: java.io.FileNotFoundException: ansj/dic/ambiguity.dic (没有那个文件或目录)
at java.security.AccessController.doPrivileged(Native Method)
at org.ansj.elasticsearch.pubsub.redis.FileUtils.appendFile(FileUtils.java:71)
at org.ansj.elasticsearch.pubsub.redis.FileUtils.appendAMB(FileUtils.java:58)
at org.ansj.elasticsearch.pubsub.redis.AddTermRedisPubSub.onMessage(AddTermRedisPubSub.java:38)
at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:113)
at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:83)
at redis.clients.jedis.Jedis.subscribe(Jedis.java:1974)
at org.ansj.elasticsearch.index.config.AnsjElasticConfigurator$1.run(AnsjElasticConfigurator.java:93)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: ansj/dic/ambiguity.dic (没有那个文件或目录)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileWriter.<init>(FileWriter.java:107)
at org.ansj.elasticsearch.pubsub.redis.FileUtils$1.run(FileUtils.java:74)
... 9 more
java.security.PrivilegedActionException: java.io.FileNotFoundException: ansj/dic/ambiguity.dic (没有那个文件或目录)
at java.security.AccessController.doPrivileged(Native Method)
at org.ansj.elasticsearch.pubsub.redis.FileUtils.appendFile(FileUtils.java:71)
at org.ansj.elasticsearch.pubsub.redis.FileUtils.appendAMB(FileUtils.java:58)
at org.ansj.elasticsearch.pubsub.redis.AddTermRedisPubSub.onMessage(AddTermRedisPubSub.java:38)
at redis.clients.jedis.JedisPubSub.process(JedisPubSub.java:113)
at redis.clients.jedis.JedisPubSub.proceed(JedisPubSub.java:83)
at redis.clients.jedis.Jedis.subscribe(Jedis.java:1974)
at org.ansj.elasticsearch.index.config.AnsjElasticConfigurator$1.run(AnsjElasticConfigurator.java:93)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: ansj/dic/ambiguity.dic (没有那个文件或目录)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileWriter.<init>(FileWriter.java:107)
at org.ansj.elasticsearch.pubsub.redis.FileUtils$1.run(FileUtils.java:74)
... 9 more
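ES 1.5.1.
The zip downloaded from the Maven private repository runs fine, but the zip built with mvn package fails at runtime with java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.ansjindex.AnsjIndexAnalyzerProvider. The path looks strange, and I have not found the cause.
Please take a look, thanks!
The full stack trace is as follows: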
org.elasticsearch.indices.IndexCreationException: [group_20160108_172451] failed to create index
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:330)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:311)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:180)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:467)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:188)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: failed to find analyzer type [ansj_index] or tokenizer for [index_ansj]
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:372)
at org.elasticsearch.common.inject.AbstractModule.configure(AbstractModule.java:60)
at org.elasticsearch.common.inject.spi.Elements$RecordingBinder.install(Elements.java:204)
at org.elasticsearch.common.inject.spi.Elements.getElements(Elements.java:85)
at org.elasticsearch.common.inject.InjectorShell$Builder.build(InjectorShell.java:130)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:99)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:131)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:69)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:328)
... 8 more
Caused by: org.elasticsearch.common.settings.NoClassSettingsException: Failed to load class setting [type] with value [ansj_index]
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:476)
at org.elasticsearch.common.settings.ImmutableSettings.getAsClass(ImmutableSettings.java:464)
at org.elasticsearch.index.analysis.AnalysisModule.configure(AnalysisModule.java:356)
... 16 more
Caused by: java.lang.ClassNotFoundException: org.elasticsearch.index.analysis.ansjindex.AnsjIndexAnalyzerProvider
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.elasticsearch.common.settings.ImmutableSettings.loadClass(ImmutableSettings.java:474)
... 18 more
As the title says: after cloning the project, running mvn clean install fails:
[ERROR] Failed to execute goal on project elasticsearch-analysis-ansj: Could not resolve dependencies for project org.ansj:elasticsearch-analysis-ansj:jar:2.1.1: The following artifacts could not be resolved: org.ansj:ansj_seg:jar:3.6, org.ansj:ansj_lucene5_plug:jar:3.0: Failure to find org.ansj:ansj_seg:jar:3.6 in http://repo1.maven.org/maven2/ was cached in the local repository, resolution will not be reattempted until the update interval of repo1 has elapsed or updates are forced -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
For example, through a configuration like the one below?
PUT /my_index
{
"settings":{
"analysis":{
"analyzer":{
"default":{
"type": "search_ansj",
"enabled_stop_filter": false
}
}
}
}
}
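ElasticSearch 5.0 is out; please add support for it!
It looks like the highest supported version is only 5.0.1.
Please upgrade the ansj version; the old version can no longer be found. The build also needs a separate dependency on tree_split, which surprisingly is not referenced from ansj itself...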
<dependency>
<groupId>org.ansj</groupId>
<artifactId>ansj_seg</artifactId>
<version>1.4</version>
<classifier>min</classifier>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.ansj</groupId>
<artifactId>tree_split</artifactId>
<version>1.2</version>
<scope>compile</scope>
</dependency>
Could you write a detailed example of setting the mapping with curl?
Debugging is really inconvenient...
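As a rough starting point for the mapping request asked about above, here is a minimal Python sketch that builds a mapping body and can send it with the standard library. It assumes Elasticsearch 2.x conventions (a `string` field with `analyzer` and `search_analyzer`, sent to `PUT /<index>/_mapping/<type>`); the field name `title` and both helper functions are hypothetical, while `index_ansj` and `query_ansj` are the analyzer names used elsewhere on this page.

```python
import json
import urllib.request


def build_mapping(field, index_analyzer="index_ansj", search_analyzer="query_ansj"):
    """Build a mapping body for one analyzed string field (ES 2.x style)."""
    return {
        "properties": {
            field: {
                "type": "string",
                "analyzer": index_analyzer,
                "search_analyzer": search_analyzer,
            }
        }
    }


def put_mapping(base_url, index, doc_type, mapping):
    """Send the mapping to a running node; not called in this sketch."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_mapping/{doc_type}",
        data=json.dumps(mapping).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "title" is a placeholder field name chosen for illustration.
body = build_mapping("title")
print(json.dumps(body, indent=2))
```

The printed JSON can equally be passed to curl as the body of a PUT request against the same URL, for example with index products and type product.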
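Tokenizing in index_ansj mode also splits out single characters. Taking the author's example 六味地黄丸软胶囊, the result includes 六, 味, 地, 黄, 丸, 软, 胶, 囊, which does not match the author's description.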
ERROR: Plugin [elasticsearch.analysis.ansj] is incompatible with Elasticsearch [2.2.1]. Was designed for version [2.1.1]
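I am using version 2.3.4. The Chinese articles contain many kinds of punctuation. When I highlight with fvh, the highlights are always scrambled and unrelated to the input keywords, so the token positions in the index are probably off. I tried filtering out punctuation, but it still fails.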
http://localhost:9200/docs/_analyze?text=ce%20shi&analyzer=index_ansj
Testing shows that the space is also counted as a token. How can I remove it?
{"tokens":[{"token":"ce","start_offset":0,"end_offset":2,"type":"en","position":0},{"token":" ","start_offset":2,"end_offset":3,"type":"null","position":1},{"token":"shi","start_offset":3,"end_offset":6,"type":"en","position":2}]}
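Whether the plugin offers a server-side option for this is not established in this thread. As a client-side workaround only, the Python sketch below drops tokens that consist purely of whitespace from an `_analyze` response. `drop_blank_tokens` is a hypothetical helper, the sample response is the one shown above, and this does not change what is actually written to the index.

```python
def drop_blank_tokens(analyze_response):
    """Return a copy of the response without whitespace-only tokens."""
    kept = [
        tok for tok in analyze_response["tokens"]
        if tok["token"].strip()  # empty after strip() means whitespace only
    ]
    return {"tokens": kept}


# The response shown above for the text "ce shi".
response = {
    "tokens": [
        {"token": "ce", "start_offset": 0, "end_offset": 2, "type": "en", "position": 0},
        {"token": " ", "start_offset": 2, "end_offset": 3, "type": "null", "position": 1},
        {"token": "shi", "start_offset": 3, "end_offset": 6, "type": "en", "position": 2},
    ]
}

for tok in drop_blank_tokens(response)["tokens"]:
    print(tok["token"], tok["position"])
```

The remaining tokens keep their original positions, so the gap left by the removed space stays visible to the caller.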
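How do I configure the path of the user-defined dictionary in elasticsearch.yml?
Could someone give an example?
Does the ansj section of the tokenizer configuration need to be written into elasticsearch.yml? I wrote it there and it still reports that the configuration was not found: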
[2016-09-14 18:16:24,400][INFO ][ansj-initializer ] 没有找到redis相关配置!
里皮带国家队 is tokenized as 里/皮带/国家队.
How can this be solved?