Jmorphy2

Java port of pymorphy2.

Clone project:

git clone https://github.com/anti-social/jmorphy2
cd jmorphy2

Compile project, build jars and run tests:

./gradlew build

Elasticsearch plugin

Plugin installation

  • From a Debian package:
curl -SLO https://github.com/anti-social/jmorphy2/releases/download/v0.2.3-es7.14.2/elasticsearch-analysis-jmorphy2-plugin_0.2.3-es7.14.2_all.deb
dpkg -i elasticsearch-analysis-jmorphy2-plugin_0.2.3-es7.14.2_all.deb
  • Using the elasticsearch-plugin command:
# Specify the correct path to your Elasticsearch installation
export es_home=/usr/share/elasticsearch
${es_home}/bin/elasticsearch-plugin install "https://github.com/anti-social/jmorphy2/releases/download/v0.2.3-es7.14.2/analysis-jmorphy2-0.2.3-es7.14.2.zip"

Building the plugin

By default the plugin is built against Elasticsearch 7.14.2.

To build for a specific Elasticsearch version, run:

./gradlew assemble -PesVersion=7.13.4

Supported Elasticsearch versions: 6.6.x, 6.7.x, 6.8.x, 7.0.x, 7.1.x, 7.2.x, 7.3.x, 7.4.x, 7.5.x, 7.6.x, 7.7.x, 7.8.x, 7.9.x, 7.10.x, 7.11.x, 7.12.x, 7.13.x, 7.14.x

For older Elasticsearch versions, use the dedicated branches:

  • es-5.4 for Elasticsearch 5.4.x, 5.5.x and 5.6.x
  • es-5.1 for Elasticsearch 5.1.x, 5.2.x and 5.3.x

Then install the assembled plugin:

# Specify the correct path to your Elasticsearch installation
export es_home=/usr/share/elasticsearch
sudo ${es_home}/bin/elasticsearch-plugin install file:jmorphy2-elasticsearch/build/distributions/analysis-jmorphy2-0.2.2-SNAPSHOT-es7.13.2.zip

Or just run Elasticsearch inside a container (this only works for a plugin built for the default Elasticsearch version):

# build the container and run Elasticsearch with the jmorphy2 plugin
vagga elastic

Using podman or docker:

podman build -t elasticsearch-jmorphy2 -f Dockerfile.elasticsearch .github
podman run --name elasticsearch-jmorphy2 -p 9200:9200 -e "ES_JAVA_OPTS=-Xmx1g" -e "discovery.type=single-node" elasticsearch-jmorphy2

Test Elasticsearch with the jmorphy2 plugin

Create an index with specific analyzers and test them:

curl -X PUT -H 'Content-Type: application/yaml' 'localhost:9200/test_index' -d '---
settings:
  index:
    analysis:
      filter:
        delimiter:
          type: word_delimiter
          preserve_original: true
        jmorphy2_russian:
          type: jmorphy2_stemmer
          name: ru
        jmorphy2_ukrainian:
          type: jmorphy2_stemmer
          name: uk
      analyzer:
        text_ru:
          tokenizer: standard
          filter:
          - delimiter
          - lowercase
          - jmorphy2_russian
        text_uk:
          tokenizer: standard
          filter:
          - delimiter
          - lowercase
          - jmorphy2_ukrainian
'

# Test the Russian analyzer
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_ru
text: Привет, лошарики!
'
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_ru
text: ёж еж ежики
'
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_ru
text: путин
'

# Test the Ukrainian analyzer
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_uk
text: Пригоди Котигорошка
'
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_uk
text: їжаки
'
curl -X GET -H 'Content-Type: application/yaml' 'localhost:9200/test_index/_analyze' -d '---
analyzer: text_uk
text: комп'\''ютером
'

Contributors

anti-social, genme, kirillpampam, koopey, maksaimer

jmorphy2's Issues

Working with UnknownWordUnit

I have the impression that jmorphy cannot inflect out-of-dictionary words.

For example, morph.parse("няшка") returns

<ParsedWord: "няшка", "UNKN", "няшка", "няшка", 1,000000, class net.uaprom.jmorphy2.AnalyzerUnit$UnknownWordUnit>

with a single element in getLexeme() and no way to inflect it.

pymorphy2, by contrast, answers morph.parse(u'няшка') with

[Parse(word=u'няшка', tag=OpencorporaTag('NOUN,inan,femn sing,nomn'), normal_form=u'няшка', score=1.0, methods_stack=((<FakeDictionary>, u'няшка', 9, 0), (<KnownSuffixAnalyzer>, u'няшка')))]

and morph.parse(u'няшка')[0].inflect({'gent'}).word returns the expected u'няшки'.

Is this the expected behavior of jmorphy? Or am I simply using it wrong?

Parsing word -> abbreviation, abbreviation -> word

Is it possible to control the output of lemmas/lexemes that are linked by the Full-Contracted relation?
Given, say, the word "век" as input, iterating over all lexemes also yields the letter "в", which is a preposition. As a result, a search for the word "век" finds every text containing the preposition "в", which is not at all what is expected.
The reverse is also true.
Can this be controlled somehow? Can the lexeme output be restricted when abbreviations are processed?
And if so, how? At the code level and/or the dictionary level?
Pymorphy has the same problem.
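
One possible code-level approach is to drop lexeme entries whose tag carries the abbreviation grammeme before using them for search. The sketch below does not use the jmorphy2 API: the Form class, the method names, and the sample tag strings are hypothetical, and it assumes that contracted forms are marked with the OpenCorpora grammeme Abbr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of jmorphy2.
final class AbbrFilter {
    // Hypothetical holder for one lexeme entry: a word form and its tag string.
    static final class Form {
        final String word;
        final String tag;

        Form(String word, String tag) {
            this.word = word;
            this.tag = tag;
        }
    }

    // True if the tag string (grammemes separated by commas or spaces) contains the grammeme.
    static boolean hasGrammeme(String tag, String grammeme) {
        return Arrays.asList(tag.split("[ ,]+")).contains(grammeme);
    }

    // Keeps only the word forms that are not marked as abbreviations (assumed grammeme: Abbr).
    static List<String> withoutAbbreviations(List<Form> lexeme) {
        List<String> words = new ArrayList<>();
        for (Form form : lexeme) {
            if (!hasGrammeme(form.tag, "Abbr")) {
                words.add(form.word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // Illustrative entries only; the Cyrillic words are written as Unicode escapes.
        List<Form> lexeme = Arrays.asList(
                new Form("\u0432\u0435\u043a", "NOUN,inan,masc sing,nomn"),
                new Form("\u0432", "NOUN,inan,masc,Fixd,Abbr sing,nomn"));
        System.out.println(withoutAbbreviations(lexeme));
    }
}
```

Filtering after analysis keeps the dictionary untouched, at the cost of losing abbreviation matches entirely; whether that trade-off is acceptable depends on the search use case.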

Parsing the out-of-dictionary word "аминовен"

It is parsed with the lemma "амин", with the suffix "овен" dropped. As far as I understand, the opencorpora dictionary needs information about a paradigm in which the word form "аминовен" can be the nominative case?

Initialization in the default encoding

On Windows, MorphAnalyzer reads the dictionaries in the system default encoding (Windows-1251), so the parser does not work and throws all sorts of obscure exceptions.
Judging by their contents, the dictionaries are in UTF-8.
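
A common way to avoid depending on the platform default is to pass the charset explicitly when opening a reader. The sketch below is not jmorphy2's actual loading code: the class and method names are hypothetical, and ISO-8859-1 merely stands in for a mismatched default such as Windows-1251.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of jmorphy2.
final class ExplicitCharsetRead {
    // Reads the first line of the stream, decoding bytes with the given charset.
    static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A two-letter Cyrillic word, written as Unicode escapes so the
        // encoding of this source file does not matter.
        String word = "\u0451\u0436";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        String decodedAsUtf8 = firstLine(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        String decodedWrong = firstLine(new ByteArrayInputStream(utf8), StandardCharsets.ISO_8859_1);

        // The explicit UTF-8 reader round-trips the word; the mismatched charset does not.
        System.out.println(word.equals(decodedAsUtf8));
        System.out.println(word.equals(decodedWrong));
    }
}
```

As a workaround that needs no code change, the JVM can also be started with -Dfile.encoding=UTF-8, which changes the default charset for the whole process.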

BUILD FAILED

Hello, I ran into a problem. I am trying to build the plugin for ELK version 7.10.1, following the instructions.

vagga assemble -PesVersion=7.10.1

Starting a Gradle Daemon, 1 incompatible and 3 stopped Daemons could not be reused, use --status for details

FAILURE: Build failed with an exception.

  • Where:
    Build file '/work/jmorphy2-elasticsearch/build.gradle' line: 17

  • What went wrong:
    A problem occurred evaluating project ':jmorphy2-elasticsearch'.

Failed to apply plugin [class 'org.elasticsearch.gradle.info.GlobalBuildInfoPlugin']
Gradle 6.6.1+ is required

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

  • Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
Use '--warning-mode all' to show the individual deprecation warnings.
See https://docs.gradle.org/6.5.1/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 6s

gradle --version

Gradle 6.6.1

Build time: 2020-08-25 16:29:12 UTC
Revision: f2d1fb54a951d8b11d25748e4711bec8d128d7e3

Kotlin: 1.3.72
Groovy: 2.5.12
Ant: Apache Ant(TM) version 1.10.8 compiled on May 10 2020
JVM: 11.0.9.1 (Debian 11.0.9.1+1-post-Debian-1deb10u2)
OS: Linux 4.19.0-8-amd64 amd64
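
The failed build links to the Gradle 6.5.1 documentation while the Elasticsearch build plugin demands 6.6.1+, which suggests that the build inside vagga may have run with an older Gradle than the system one printed by gradle --version. That is an inference from the log, not something the issue confirms. The kind of minimum-version check involved can be sketched as follows; the class and method names are hypothetical and this is not the Elasticsearch plugin's actual check.

```java
// Hypothetical helper illustrating a "version X+ is required" check.
final class GradleVersionCheck {
    // Compares dotted numeric versions such as "6.5.1" and "6.6.1".
    // Missing components count as 0, so "7.0" equals "7.0.0".
    static int compare(String a, String b) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        int length = Math.max(partsA.length, partsB.length);
        for (int i = 0; i < length; i++) {
            int x = i < partsA.length ? Integer.parseInt(partsA[i]) : 0;
            int y = i < partsB.length ? Integer.parseInt(partsB[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True if the actual version is at least the required minimum.
    static boolean satisfies(String actual, String minimum) {
        return compare(actual, minimum) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(satisfies("6.5.1", "6.6.1"));
        System.out.println(satisfies("6.6.1", "6.6.1"));
    }
}
```

Under that reading, running ./gradlew --version in the same environment as the failing build would show which Gradle version was actually used.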

Elasticsearch does not see the jmorphy2_stemmer filter

Maybe I am using the plugin incorrectly. I am trying this analyzer config:

"rus_analyzer": {
    "type": "custom",
    "tokenizer": "standard",
    "filter": ["lowercase", "jmorphy2_stemmer"]
}

The result:

IllegalArgumentException[Custom Analyzer [rus_analyzer] failed to find filter under name [jmorphy2_stemmer]]; ","status":400}

On startup Elasticsearch does seem to see the plugin: loaded [analysis-jmorphy2, ... ].

How do I deploy the library on Elasticsearch correctly? Also, where should the dictionaries be copied?

mvn test failed

I am trying to build jmorphy2 and have installed Maven 3.2.3.

mvn test outputs:


T E S T S

Running net.uaprom.jmorphy2.MorphAnalyzerTest
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.859 sec <<< FAILURE! - in net.uaprom.jmorphy2.MorphAnalyzerTest
test(net.uaprom.jmorphy2.MorphAnalyzerTest) Time elapsed: 0.8 sec <<< ERROR!
java.util.regex.PatternSyntaxException: Unknown character property name {Latin} near index 11
[\p{IsLatin}\d\p{Punct}]+
^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.charPropertyNodeFor(Pattern.java:2437)
at java.util.regex.Pattern.family(Pattern.java:2412)
at java.util.regex.Pattern.range(Pattern.java:2335)
at java.util.regex.Pattern.clazz(Pattern.java:2268)
at java.util.regex.Pattern.sequence(Pattern.java:1818)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
at net.uaprom.jmorphy2.MorphAnalyzer$RegexUnit.<init>(MorphAnalyzer.java:279)
at net.uaprom.jmorphy2.MorphAnalyzer$LatinUnit.<init>(MorphAnalyzer.java:307)
at net.uaprom.jmorphy2.MorphAnalyzer.<init>(MorphAnalyzer.java:69)
at net.uaprom.jmorphy2.Jmorphy2TestsHelpers.newMorphAnalyzer(Jmorphy2TestsHelpers.java:24)
at net.uaprom.jmorphy2.Jmorphy2TestsHelpers.newMorphAnalyzer(Jmorphy2TestsHelpers.java:20)
at net.uaprom.jmorphy2.MorphAnalyzerTest.setUp(MorphAnalyzerTest.java:25)
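
The \p{IsLatin} script syntax was added to java.util.regex in Java 7, so this exception is what an older runtime would throw when compiling the pattern from the trace. A minimal sketch of a guarded compile follows; the class name and the ASCII-only fallback pattern are assumptions made for illustration and are not taken from jmorphy2.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of jmorphy2.
final class LatinPatternCheck {
    // The pattern from the stack trace; needs Unicode script support (Java 7+).
    static final String SCRIPT_PATTERN = "[\\p{IsLatin}\\d\\p{Punct}]+";
    // Assumed fallback: ASCII letters only, which is narrower than the Latin script.
    static final String FALLBACK_PATTERN = "[a-zA-Z\\d\\p{Punct}]+";

    // Compiles the script-based pattern, falling back if the runtime rejects it.
    static Pattern compileLatin() {
        try {
            return Pattern.compile(SCRIPT_PATTERN);
        } catch (PatternSyntaxException e) {
            return Pattern.compile(FALLBACK_PATTERN);
        }
    }

    public static void main(String[] args) {
        Pattern latin = compileLatin();
        System.out.println(latin.matcher("jmorphy2-0.2.3").matches());
        // A Cyrillic word, written as Unicode escapes.
        System.out.println(latin.matcher("\u0451\u0436").matches());
    }
}
```

Running java -version in the shell that launches Maven shows which runtime the tests actually use.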

jar hell exception after installing plugin

OS: Astra Linux 1.6 Smolensk (aka Debian 9)
Packages:
elasticsearch 7.17.6
elasticsearch-analysis-jmorphy2-plugin 0.2.3~es7.17.6

After installing the jmorphy2 deb package, Elasticsearch fails to start with this exception:

[ERROR][o.e.b.Bootstrap          ] [xxxxxx] Exception
java.lang.IllegalStateException: failed to load plugin analysis-jmorphy2 due to jar hell
        at org.elasticsearch.plugins.PluginsService.checkBundleJarHell(PluginsService.java:691) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:531) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:170) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.node.Node.<init>(Node.java:411) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.node.Node.<init>(Node.java:309) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Bootstrap$5.<init>(Bootstrap.java:234) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:234) ~[elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:434) [elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:169) [elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:160) [elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:77) [elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:112) [elasticsearch-cli-7.17.6.jar:7.17.6]
        at org.elasticsearch.cli.Command.main(Command.java:77) [elasticsearch-cli-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:125) [elasticsearch-7.17.6.jar:7.17.6]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:80) [elasticsearch-7.17.6.jar:7.17.6]
Caused by: java.lang.IllegalStateException: jar hell!
class: org.apache.lucene.analysis.ar.ArabicAnalyzer$DefaultSetHolder
jar1: /usr/share/elasticsearch/plugins/analysis-jmorphy2/lucene-analyzers-common-8.9.0.jar
jar2: /usr/share/elasticsearch/lib/lucene-analyzers-common-8.11.1.jar
        at org.elasticsearch.jdk.JarHell.checkClass(JarHell.java:297) ~[elasticsearch-core-7.17.6.jar:7.17.6]
        at org.elasticsearch.jdk.JarHell.checkJarHell(JarHell.java:192) ~[elasticsearch-core-7.17.6.jar:7.17.6]
        at org.elasticsearch.plugins.PluginsService.checkBundleJarHell(PluginsService.java:689) ~[elasticsearch-7.17.6.jar:7.17.6]
        ... 14 more
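
The trace shows the same class present in two jars: one bundled with the plugin and one shipped with Elasticsearch. A small diagnostic sketch that lists artifacts appearing in both places is given below. It only compares jar file names, which is a simplification (the real check in Elasticsearch compares classes), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of Elasticsearch or jmorphy2.
final class JarOverlap {
    // Splits "lucene-analyzers-common-8.9.0.jar" into an artifact name and a version.
    private static final Pattern JAR_NAME = Pattern.compile("^(.+)-(\\d[\\w.]*)\\.jar$");

    // Reports every artifact that occurs in both the plugin and the core jar lists.
    static List<String> overlaps(List<String> pluginJars, List<String> coreJars) {
        Map<String, String> coreVersions = new HashMap<>();
        for (String jar : coreJars) {
            Matcher m = JAR_NAME.matcher(jar);
            if (m.matches()) {
                coreVersions.put(m.group(1), m.group(2));
            }
        }
        List<String> result = new ArrayList<>();
        for (String jar : pluginJars) {
            Matcher m = JAR_NAME.matcher(jar);
            if (m.matches() && coreVersions.containsKey(m.group(1))) {
                result.add(m.group(1) + ": plugin " + m.group(2)
                        + ", core " + coreVersions.get(m.group(1)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Jar names taken from the stack trace above.
        List<String> plugin = Arrays.asList("lucene-analyzers-common-8.9.0.jar");
        List<String> core = Arrays.asList(
                "lucene-analyzers-common-8.11.1.jar", "elasticsearch-7.17.6.jar");
        System.out.println(overlaps(plugin, core));
    }
}
```

In practice the two lists would come from the plugins/analysis-jmorphy2 and lib directories of the Elasticsearch installation named in the trace.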

Support for ES 7.10 and 7.11

Support for the newer ES versions is needed :) Thank you.
