infinilabs / analysis-stconvert

🚲 STConvert is an analyzer that converts Chinese characters between Traditional and Simplified forms.

License: Apache License 2.0

Java 100.00%
Topics: analyzer, traditional, elasticsearch, convert-chinese-characters

analysis-stconvert's People

Contributors

amooncake, jlleitschuh, medcl, nomoa, zeeshanasghar


analysis-stconvert's Issues

elasticsearch-analysis-stconvert 1.8.3

I installed the plugin elasticsearch-analysis-stconvert 1.8.3, updated elasticsearch.yml, restarted Elasticsearch, and encountered the following issue:

[2016-07-27 14:27:02,163][WARN ][cluster.action.shard ] [Nitro] [test][1] received shard failed for target shard [[test][1], node[0tGrnF7ZQZyKV__YKdwq_Q], [P], v[3], s[INITIALIZING], a[id=I03NjNN7SCaezVLXd9PNpw], unassigned_info[[reason=CLUSTER_RECOVERED], at[2016-07-27T06:27:01.631Z]]], indexUUID [nb_j7plXT4q6vqH9OZnkTQ], message [failed to create index], failure [IndexCreationException[failed to create index]; nested: IllegalStateException[[index.version.created] is not present in the index settings for index with uuid: [null]]; ]
[test] IndexCreationException[failed to create index]; nested: IllegalStateException[[index.version.created] is not present in the index settings for index with uuid: [null]];
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:360)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyNewIndices(IndicesClusterStateService.java:294)
at org.elasticsearch.indices.cluster.IndicesClusterStateService.clusterChanged(IndicesClusterStateService.java:163)
at org.elasticsearch.cluster.service.InternalClusterService.runTasksForExecutor(InternalClusterService.java:610)
at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:772)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:231)
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: [index.version.created] is not present in the index settings for index with uuid: [null]
at org.elasticsearch.Version.indexCreated(Version.java:580)
at org.elasticsearch.index.analysis.Analysis.parseAnalysisVersion(Analysis.java:99)
at org.elasticsearch.index.analysis.AbstractTokenizerFactory.<init>(AbstractTokenizerFactory.java:40)
at org.elasticsearch.index.analysis.STConvertTokenizerFactory.<init>(STConvertTokenizerFactory.java:38)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.FactoryProxy.get(FactoryProxy.java:54)
at org.elasticsearch.common.inject.InjectorImpl$4$1.call(InjectorImpl.java:823)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.InjectorImpl$4.get(InjectorImpl.java:818)
at org.elasticsearch.common.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:236)
at com.sun.proxy.$Proxy15.create(Unknown Source)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:95)
at org.elasticsearch.index.analysis.AnalysisService.<init>(AnalysisService.java:70)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.elasticsearch.common.inject.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:50)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:86)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
at org.elasticsearch.common.inject.SingleParameterInjector.inject(SingleParameterInjector.java:42)
at org.elasticsearch.common.inject.SingleParameterInjector.getAll(SingleParameterInjector.java:66)
at org.elasticsearch.common.inject.ConstructorInjector.construct(ConstructorInjector.java:85)
at org.elasticsearch.common.inject.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:104)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:47)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:886)
at org.elasticsearch.common.inject.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:43)
at org.elasticsearch.common.inject.Scopes$1$1.get(Scopes.java:59)
at org.elasticsearch.common.inject.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:46)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:201)
at org.elasticsearch.common.inject.InjectorBuilder$1.call(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorImpl.callInContext(InjectorImpl.java:879)
at org.elasticsearch.common.inject.InjectorBuilder.loadEagerSingletons(InjectorBuilder.java:193)
at org.elasticsearch.common.inject.InjectorBuilder.injectDynamically(InjectorBuilder.java:175)
at org.elasticsearch.common.inject.InjectorBuilder.build(InjectorBuilder.java:110)
at org.elasticsearch.common.inject.InjectorImpl.createChildInjector(InjectorImpl.java:157)
at org.elasticsearch.common.inject.ModulesBuilder.createChildInjector(ModulesBuilder.java:55)
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:358)
... 9 more

stconvert char filter causing highlighted search errors

char filter definition:

"ts_char_filter" : {
    "type" : "stconvert",
    "delimiter" : "#",
    "keep_both" : false,
    "convert_type" : "t2s"
 }

Without the char filter, a plain highlighted match_all query returns the matched document; the query JSON is shown below:

{
    "_source": {"exclude": ["text"]},
    "query" : {
        "match_all" : {}
    },
    "highlight": {
        "encoder": "html",
        "pre_tags": ["<span style='color: red;'>"],
        "post_tags": ["</span>"],
        "fields": {
            "text": {
                "fragment_size": 120
            }
        }
    }
}

However, after adding the traditional-to-simplified char filter, an error is returned:

"failures": [
    {
        "shard": 3,
        "index": "my_index",
        "node": "59j1d4i9Qdm0RcVfIN2UoQ",
        "reason": {
            "type": "invalid_token_offsets_exception",
            "reason": "Token www.victorycity.com.hk exceeds length of provided text sized 90702"
        }
    }
]

raw text is attached below:
test.txt

elasticsearch log:

at java.lang.Thread.run(Thread.java:748) [?:1.8.0_141]
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token www.victorycity.com.hk  exceeds length of provided text sized 90702

Could you please look into it? Thanks.

6.8.0 normalizer error

{
	"settings": {
		"analysis": {
			"analyzer": {
				"my_analyzer": {
					"type": "custom",
					"tokenizer": "ik_max_word",
					"char_filter": [
						"tsconvert"
					],
					"filter": [
						"lowercase"
					]
				}
			},
			"char_filter": {
				"tsconvert": {
					"type": "stconvert",
					"convert_type": "t2s"
				}
			},
			"normalizer": {
				"my_normalizer": {
					"type": "custom",
					"char_filter": [
						"tsconvert"
					],
					"filter": [
						"lowercase"
						, "asciifolding"
					]
				}
			}
		}
	},
	"mappings": {
		"_doc": {
			"properties": {
				"foo": {
					"type": "keyword",
					"normalizer": "my_normalizer"
				},
				"name": {
					"type": "text",
					"analyzer": "my_analyzer"
				}
			}
		}
	}
}
{
	"error": {
		"root_cause": [
			{
				"type": "remote_transport_exception",
				"reason": "[node-1][172.19.0.1:9301][indices:admin/create]"
			}
		],
		"type": "illegal_argument_exception",
		"reason": "Custom normalizer [my_normalizer] may not use char filter [tsconvert]"
	},
	"status": 400
}

Please take a look.

Unknown char_filter type [stconvert] when trying to add to index settings

Using Elastic Cloud, ES version is 6.4.1.
elasticsearch-analysis-stconvert-7.0.0.zip was installed as an installable custom plugin, and then enabled in "Manage plugins and settings".

Then I ran the following (similar to the example):

PUT /index_name/_settings
{
    "analysis" : {
            "char_filter" : {
                "tsconvert" : {
                    "type" : "stconvert",
                    "convert_type" : "t2s"
                }
            }
        }
}

Got response:

{
  "status": 400,
  "error": {
    "root_cause": [
      {
        "reason": "Unknown char_filter type [stconvert] for [tsconvert]",
        "type": "illegal_argument_exception"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Unknown char_filter type [stconvert] for [tsconvert]"
  }
}

Incorrect t2s conversion of “電子”

As the title says, the term “電子” converts to Simplified incorrectly. Apart from terms such as “電子鐘錶” and “電子錶”, which are already in the dictionary, other terms such as “電子報” convert incorrectly: the output is still “電子”.

{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["tsconvert"],
  "text" : "電子"
}

Output:
{
    "tokens": [
        {
            "token": "電子",
            "start_offset": 0,
            "end_offset": 2,
            "type": "word",
            "position": 0
        }
    ]
}

How can a char filter get the name of the field being processed?

I am studying Elasticsearch string filtering based on the stconvert char filter, and I have found where the character processing happens. However, I need to process different fields of a single document differently. For example, for the index below:
(image)
I only want to process the field word, not foo. At the moment all I can get is the input string; I cannot tell which field the string being processed belongs to. Does anyone know a way to get it? @medcl

How to configure stconvert in django-dsl?

html_strip = analyzer(
    'ik',
    tokenizer="ik_smart",
    filter=["standard", "lowercase", "stop", "snowball"],
    # char_filter=["html_strip"]
    # char_filter={
    #     "tsconvert": {
    #         "convert_type": "t2s",
    #         "type": "stconvert"
    #     }
    # }
    char_filter=["stconvert"]
)

I want to configure convert_type: t2s, but I get this error:
elasticsearch.exceptions.TransportError: TransportError(500, 'settings_exception', 'Failed to load settings from [{"number_of_shards":3,"analysis":{"analyzer":{"ik":{"filter":["standard","lowercas e","stop","snowball"],"char_filter":[{"tsconvert":{"convert_type":"t2s","type":"stconvert"}}],"typ e":"custom","tokenizer":"ik_smart"}}},"number_of_replicas":1}]')
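The serialized settings in the error show the char filter definition nested as an object inside the analyzer's char_filter list. Elasticsearch instead expects the definition under analysis.char_filter and only its name inside the analyzer. A sketch of the settings shape the DSL call would need to produce (names taken from the snippet above; this is an assumption about the cause, not a verified fix):

```json
{
  "analysis": {
    "char_filter": {
      "tsconvert": {
        "type": "stconvert",
        "convert_type": "t2s"
      }
    },
    "analyzer": {
      "ik": {
        "type": "custom",
        "tokenizer": "ik_smart",
        "char_filter": ["tsconvert"],
        "filter": ["standard", "lowercase", "stop", "snowball"]
      }
    }
  }
}
```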

Implement a Simplified/Traditional conversion CharFilter

Provide a Simplified/Traditional conversion char filter that normalizes text to either Simplified or Traditional before tokenization. This simplifies the subsequent tokenization step and avoids incorrect splits by the tokenizer.
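The idea can be sketched with a toy character map. The mapping entries and class name here are purely illustrative: the real plugin loads full dictionaries from bundled files and also performs word-level (phrase) conversion, not just character-by-character replacement.

```java
import java.util.HashMap;
import java.util.Map;

public class PreTokenizeConvertSketch {
    // Tiny illustrative mapping; the real plugin ships full t2s/s2t dictionaries.
    private static final Map<Character, Character> T2S = new HashMap<>();
    static {
        T2S.put('國', '国');
        T2S.put('電', '电');
        T2S.put('視', '视');
    }

    // Normalize Traditional characters to Simplified before tokenization,
    // so the tokenizer only ever sees one script.
    public static String toSimplified(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            sb.append(T2S.getOrDefault(c, c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSimplified("中國電視")); // -> 中国电视
    }
}
```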

Want the effect of stconvert plus ik

Modeled on the pinyin+ik analyzer medcl wrote earlier:

PUT http://localhost:9200/medcl/ -d'
{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "custom_stconvert_analyzer" : {
                    "tokenizer" : "ik_smart",
                    "filter" : "stconvert"
                }
            }
        }
    }
}'

But it has no effect. Any ideas?
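A configuration worth trying (a sketch, not a verified fix) declares the conversion as a char filter and runs it before the ik tokenizer, the same shape used elsewhere on this page; note the original request references a filter named stconvert without declaring it anywhere in the settings:

```json
PUT http://localhost:9200/medcl/
{
    "index": {
        "analysis": {
            "char_filter": {
                "tsconvert": {
                    "type": "stconvert",
                    "convert_type": "t2s"
                }
            },
            "analyzer": {
                "custom_stconvert_analyzer": {
                    "type": "custom",
                    "char_filter": ["tsconvert"],
                    "tokenizer": "ik_smart"
                }
            }
        }
    }
}
```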

The Properties class causes lock contention and degrades concurrent indexing performance

When writing data to Elasticsearch concurrently, this plugin becomes a performance bottleneck. The hotspot is STConverter.java:104, in java.util.Hashtable.containsKey.
This easily causes lock contention: once write concurrency rises, threads pile up here.
In fact, after the dictionary is loaded at initialization via Properties (which extends Hashtable), it is never modified anywhere else, so the lock is unnecessary.
Don't use Hashtable (or Properties) for lookups; switch to HashMap instead.
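A minimal sketch of the suggested change: after the dictionary has been loaded once, copy the Properties into a plain HashMap so every subsequent lookup is unsynchronized (the class and method names here are hypothetical, not the plugin's actual code):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class DictSketch {
    // Proposed fix: copy the dictionary out of the synchronized
    // Properties/Hashtable once, right after loading, so that every later
    // containsKey/get runs on an unsynchronized HashMap.
    public static Map<String, String> toLockFreeMap(Properties props) {
        Map<String, String> map = new HashMap<>(props.size() * 2);
        for (String key : props.stringPropertyNames()) {
            map.put(key, props.getProperty(key));
        }
        // Wrap read-only: the dictionary is never mutated after initialization.
        return Collections.unmodifiableMap(map);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("電", "电"); // illustrative entry, not the real dictionary
        Map<String, String> dict = toLockFreeMap(props);
        System.out.println(dict.get("電")); // lock-free lookup
    }
}
```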

How to build for ElasticSearch 7.17.7?

There are no release binaries.

Setting the Elasticsearch version manually in pom.xml generates errors. Shouldn't the elasticsearch-analysis-stconvert version match the target Elasticsearch version?

[ERROR] /root/elasticsearch-analysis-stconvert/src/main/java/org/elasticsearch/index/analysis/STConvertAnalyzerProvider.java:[28,9] constructor AbstractIndexAnalyzerProvider in class org.elasticsearch.index.analysis.AbstractIndexAnalyzerProvider<T> cannot be applied to given types;

Error in t2s conversion of 恭弘

Using the tsconvert tokenizer or char_filter, "恭弘" gets converted to "叶 叶 恭弘:叶:叶".

I think the cause is line 4050 of t2s.properties, which is oddly formatted and is the only line in the whole file containing "=":

恭弘=叶 恭弘:叶

This causes downstream problems with SmartCN, and the character counts are wrong for every token after this entry on the same line.

I recompiled a copy of v1.8.5 without this line, and 恭弘 now passes through unchanged, which is definitely better, if not ideal.
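For reference, java.util.Properties treats the first unescaped "=" or ":" on a line as the key/value separator (another issue on this page notes the dictionary is loaded via Properties). If this line is parsed that way, it yields exactly the bogus mapping reported above. A standalone demonstration of the Properties behavior (class and method names are illustrative):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropsParseDemo {
    // java.util.Properties splits a line at the first unescaped '=' or ':',
    // so "恭弘=叶 恭弘:叶" parses as key "恭弘" with value "叶 恭弘:叶".
    public static String parseValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(parseValue("恭弘=叶 恭弘:叶", "恭弘"));
    }
}
```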

How to use stconvert in ik-analyzed indexes?

Here are my settings and mapping:

settings index: { number_of_shards: 1 } do
    mappings do
      indexes :seq_in_nb, type: :integer
      indexes :likes_count, type: :integer
      indexes :title, analyzer: :ik
      indexes :content_without_markup, analyzer: :ik
      indexes :shared, analyzer: :keyword, type: :boolean
      indexes :locked, analyzer: :keyword, type: :boolean
    end
  end

The content under this setting includes both Traditional and Simplified Chinese, so we want searches in either script to return the same results.

Under the current setting, my contents are split into two groups for each Chinese word (e.g. 中國 and 中国).

Here are two questions:

  1. Could I collapse words with the same meaning into a single term (e.g. so documents containing "中國" or "中国" map to the same index term)?
  2. I use the following query setting:
...
{ match_phrase: {content_without_markup: { analyzer: :t2s_convert, query: keyword, slop: 10} } }
...

I could use "國" to search for "国", but can't use "中國" to search for "**". Why this situation happen? It seems almost worked. But why phrase failed?

Support es 2.4.0

medcl, could you please release a build for es 2.4.0? Many thanks. orz

7.17.7 zip release

Hi,

The 7.17.7 release is missing the zip. Can you please build that?
Thank you very much.

Failed to find analyzer tsconvert/tsconvert_keep_both/stconvert_keep_both

stconvert

curl -XGET http://localhost:9200/index/_analyze\?text\=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&analyzer\=stconvert

{"tokens":[{"token":"北京國際電視檯","start_offset":0,"end_offset":7,"type":"word","position":0},{"token":"北京國際電視臺"


tsconvert

curl -XGET http://localhost:9200/index/_analyze\?text\=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&analyzer\=tsconvert

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[XYXYHAu][127.0.0.1:9300][indices:admin/analyze[s]]"}],"type":"illegal_argument_exception","reason":"failed to find analyzer [tsconvert]"},"status":400}


tsconvert_keep_both

curl -XGET http://localhost:9200/index/_analyze\?text\=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&analyzer\=tsconvert_keep_both

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[XYXYHAu][127.0.0.1:9300][indices:admin/analyze[s]]"}],"type":"illegal_argument_exception","reason":"failed to find analyzer [tsconvert_keep_both]"},"status":400}


stconvert_keep_both

curl -XGET http://localhost:9200/index/_analyze\?text\=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&analyzer\=stconvert_keep_both

{"error":{"root_cause":[{"type":"remote_transport_exception","reason":"[XYXYHAu][127.0.0.1:9300][indices:admin/analyze[s]]"}],"type":"illegal_argument_exception","reason":"failed to find analyzer [stconvert_keep_both]"},"status":400}


version info

  • elasticsearch version: 5.3.0
  • elasticsearch-analysis-stconvert: 5.3.0

Built 8.10.2, 8.10.3, 8.10.4 and 7.17.14 packages

Build notes for new elasticsearch-analysis-stconvert packages

The author of elasticsearch-analysis-stconvert presumably has no time to maintain this plugin since starting a company, and no package supporting 8.10.2 has been built. So I forked the code, pulled it locally, and changed the version number in pom.xml, e.g.:
<elasticsearch.version>8.10.2</elasticsearch.version>
I also upgraded nlp-lang-1.7.jar to nlp-lang-1.7.9.jar, since the pinyin data in the nlp-lang component has been updated.

After these changes, run the build command mvn clean compile package. The build succeeds and produces elasticsearch-analysis-stconvert-8.10.2.zip.
Example plugin install command:
./elasticsearch-plugin -v install file:///var/services/homes/lizongbo/esplugins/elasticsearch-analysis-stconvert-8.10.2.zip

For everyone's convenience, I have built packages for the following versions:

8.10.2, 8.10.3, 8.10.4, 7.17.14

https://github.com/lizongbo/elasticsearch-analysis-stconvert/releases/tag/v8.10.4
https://github.com/lizongbo/elasticsearch-analysis-stconvert/releases/tag/v8.10.3
https://github.com/lizongbo/elasticsearch-analysis-stconvert/releases/tag/v8.10.2
https://github.com/lizongbo/elasticsearch-analysis-stconvert/releases/tag/v7.17.14

If you need packages for other versions, ask here and I can build and upload them.

Could there be a whitelist or blacklist?

We use simplified-to-traditional conversion in queries, but found that some Simplified characters should not be converted. For example, one character is automatically converted into another even though both characters exist in Traditional Chinese, which causes problems.

Is there any solution?

ES 5.6.8 support

Can you please update stconvert to support ES 5.6.8 as you did for ik and pinyin?

Cannot find tsconvert_keep_both / stconvert_keep_both

tsconvert and stconvert can be found, but tsconvert_keep_both and stconvert_keep_both cannot:

curl -XGET http://localhost:9200/stconvert/_analyze?text=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&tokenizer=tsconvert_keep_both\&pretty

{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[HadoopTemp][192.168.1.208:9300][indices:admin/analyze[s]]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "failed to find tokenizer under [tsconvert_keep_both]"
  },
  "status" : 400
}

curl -XGET http://localhost:9200/stconvert/_analyze?text=%e5%8c%97%e4%ba%ac%e5%9b%bd%e9%99%85%e7%94%b5%e8%a7%86%e5%8f%b0%2c%e5%8c%97%e4%ba%ac%e5%9c%8b%e9%9a%9b%e9%9b%bb%e8%a6%96%e8%87%ba\&tokenizer=stconvert_keep_both\&pretty
{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[HadoopTemp][192.168.1.208:9300][indices:admin/analyze[s]]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "failed to find tokenizer under [stconvert_keep_both]"
  },
  "status" : 400
}

Tokenization is not ideal after configuring the converter

After configuring the converter, tokenization is not ideal. For example, searching for 清華 returns no results, while 清華大學 does. What could be the reason? My settings are as follows:
{
  "analysis": {
    "char_filter": {
      "tsconvert": {
        "type": "stconvert",
        "convert_type": "t2s"
      }
    },
    "analyzer": {
      "my_analyzer": {
        "type": "custom",
        "char_filter": [
          "tsconvert"
        ],
        "tokenizer": "ik_smart",
        "filter": [
          "lowercase"
        ]
      }
    }
  }
}
Any suggestions?
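One way to narrow this down is to inspect what the analyzer actually emits via the _analyze API: if ik_smart keeps 清華大學 as a single token at index time, a query for the standalone term 清華 will not match it. A sketch of the request (index name assumed):

```json
GET /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "清華大學"
}
```

Comparing the token list for 清華大學 against the one for 清華 shows whether the miss comes from the conversion step or from ik_smart's segmentation.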

Mapping definition for [key_word] has unsupported parameters: [analyzer : tsconvert]

I get an error when creating an index. Here is my request:

curl -XPUT my_index
{
    "mappings": {
        "item": {
            "properties": {
                "key_word": {
                    "type": "keyword", 
                    "analyzer": "tsconvert"
                }, 
                "title": {
                    "type": "keyword",
                    "analyzer": "tsconvert"
                }
            }
        }
    }
}

Submitting it produces the error below, even though the plugin is installed.

{
    "error": {
        "root_cause": [
            {
                "type": "mapper_parsing_exception",
                "reason": "Mapping definition for [key_word] has unsupported parameters:  [analyzer : tsconvert]"
            }
        ],
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [item]: Mapping definition for [key_word] has unsupported parameters:  [analyzer : tsconvert]",
        "caused_by": {
            "type": "mapper_parsing_exception",
            "reason": "Mapping definition for [key_word] has unsupported parameters:  [analyzer : tsconvert]"
        }
    },
    "status": 400
}
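The error occurs because keyword fields do not accept an analyzer parameter (they accept a normalizer instead); an analyzer applies only to text fields. A hedged sketch of a mapping that would be accepted, assuming a tsconvert analyzer is defined in the index settings:

```json
{
  "mappings": {
    "item": {
      "properties": {
        "key_word": {
          "type": "text",
          "analyzer": "tsconvert"
        },
        "title": {
          "type": "text",
          "analyzer": "tsconvert"
        }
      }
    }
  }
}
```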

Cannot use this char_filter in a normalizer

Version: 6.3.1

"st_normalizer": {
  "type": "custom",
  "char_filter": [
    "tsconvert"
  ]
}

STConvertCharFilterFactory.class should implement MultiTermAwareComponent; otherwise the check in the Elasticsearch code fails and the filter cannot be used in a normalizer:

if (charFilter instanceof MultiTermAwareComponent == false) {
    throw new IllegalArgumentException("Custom normalizer [" + name() + "] may not use char filter [" + charFilterName + "]");
}

The new version still has this problem.
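The check and the proposed fix can be sketched standalone. The real interface is org.elasticsearch.index.analysis.MultiTermAwareComponent (ES 6.x); both it and the factory are redeclared here in miniature only so the example compiles on its own:

```java
public class NormalizerCheckSketch {

    public interface CharFilterFactory {
        String name();
    }

    // Miniature stand-in for Elasticsearch's MultiTermAwareComponent.
    public interface MultiTermAwareComponent {
        Object getMultiTermComponent();
    }

    // The proposed change: have the char filter factory also implement
    // MultiTermAwareComponent so the normalizer's instanceof check passes.
    public static class STConvertCharFilterFactorySketch
            implements CharFilterFactory, MultiTermAwareComponent {
        @Override
        public String name() {
            return "tsconvert";
        }

        @Override
        public Object getMultiTermComponent() {
            // The conversion is not multi-term sensitive, so the factory can
            // return itself unchanged.
            return this;
        }
    }

    // Mirrors the check that throws
    // "Custom normalizer [...] may not use char filter [...]".
    public static boolean allowedInNormalizer(CharFilterFactory charFilter) {
        return charFilter instanceof MultiTermAwareComponent;
    }

    public static void main(String[] args) {
        System.out.println(allowedInNormalizer(new STConvertCharFilterFactorySketch())); // true
    }
}
```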
