German Morphology (corenlp issue, 10 comments, closed)

GeorgeS2019 commented on June 15, 2024
German Morphology

from corenlp.

Comments (10)

AngledLuffa commented on June 15, 2024

It's not shown on the demo page, but it's definitely part of the API.

AngledLuffa commented on June 15, 2024

I think (although I can't swear to it, having not worked on the Arabic normalization) that this is pursuing the wrong angle. The Arabic datasets have the morphology attached to the words themselves as extra pieces. The German datasets, such as the UD dataset you linked, have it as an entirely separate column in the training data. So the preprocessing code wouldn't be a one-to-one language swap; it would have to be something entirely different.
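To make the German side of that contrast concrete: in a UD treebank such as UD German GSD, each token line of the .conllu file has ten tab-separated columns, and the morphological features sit in the sixth one (FEATS) as Name=Value pairs joined by `|`, with `_` meaning "no features". A minimal sketch of reading that column (the helper names are hypothetical, not part of CoreNLP or Stanza):

```python
# Minimal sketch: pull the FEATS column (column 6) out of a CoNLL-U token line.
# Helper names are hypothetical; only the CoNLL-U layout itself is standard.

def parse_feats(feats: str) -> dict:
    """Turn a UD FEATS string like 'Case=Nom|Number=Sing' into a dict.

    '_' (or an empty value) means the token has no features.
    """
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def parse_conllu_token(line: str) -> dict:
    """Split one CoNLL-U token line into form, lemma, UPOS and features."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 10:
        raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
    return {
        "form": cols[1],
        "lemma": cols[2],
        "upos": cols[3],
        "feats": parse_feats(cols[5]),  # FEATS is the 6th column (index 5)
    }
```

For a hand-written example line (not copied from the treebank), `parse_conllu_token("1\tDer\tder\tDET\tART\tCase=Nom|Gender=Masc|Number=Sing\t2\tdet\t_\t_")["feats"]` gives `{'Case': 'Nom', 'Gender': 'Masc', 'Number': 'Sing'}`. In the Arabic data, by contrast, there is no such column to read; the analysis is embedded in the tokens.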

This really is a project that no one at Stanford is going to take on, or even help with beyond the most cursory level. If having it in Java is important enough that repeated suggestions of "just use Stanza" aren't sufficient, you can try contacting @manning to see whether an arrangement can be made to sponsor one of the group's (very few) Java programmers to build the morphological analyzer you need in Java.

Stanza is available here:

https://github.com/stanfordnlp/stanza

The morphological analyzer is part of the POS tagger, which is documented here (look for feats):

https://stanfordnlp.github.io/stanza/pos.html

The version that uses the transformer is the default_accurate package, so something like:

import stanza
pipe = stanza.Pipeline("de", package="default_accurate")
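Reading the features out of Stanza's output might look like this sketch. It assumes Stanza is installed and relies on the documented `Word.feats` attribute, which holds a UD-style string such as `Case=Nom|Number=Sing` (or `None` when a word has no features). The helper names and the `processors` choice are illustrative assumptions, not part of Stanza itself.

```python
def feats_to_dict(feats):
    """Convert Stanza's UD-style feats string (or None) into a dict."""
    if not feats:
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def german_morphology(text, nlp=None):
    """Return (text, upos, features) for every word Stanza finds in `text`.

    Pass an existing Pipeline as `nlp` to avoid rebuilding it on every call.
    """
    if nlp is None:
        # Imported lazily so feats_to_dict can be used without Stanza installed.
        import stanza
        # Assumption: only tokenization, multi-word tokens and POS/feats are needed.
        nlp = stanza.Pipeline("de", package="default_accurate",
                              processors="tokenize,mwt,pos")
    doc = nlp(text)
    return [
        (word.text, word.upos, feats_to_dict(word.feats))
        for sentence in doc.sentences
        for word in sentence.words
    ]
```

With a pipeline already built, `german_morphology("Der Hund schläft.", nlp=pipe)` returns one triple per word; the exact feature values depend on the model.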

AngledLuffa commented on June 15, 2024

GeorgeS2019 commented on June 15, 2024

I tried Stanza online, but I don't see any morphology information.

Without the morphology, it is hard to program matchers for some levels of German grammar.

[Screenshot: IMG_20231031_174513]
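As a rough illustration of why the features matter for a grammar matcher, here is a toy check (hypothetical helpers, not from Stanza or CoreNLP) that flags a determiner directly followed by a noun when the two clash in Case, Gender, or Number. It takes (text, UPOS, features) triples, which could be built from a tagger's `upos` and `feats` output. Real German agreement also involves adjectives, declension classes, and ambiguous forms, which this sketch ignores.

```python
AGREEMENT_FEATURES = ("Case", "Gender", "Number")


def _values(feats, name):
    """Set of values for a feature; UD allows several, e.g. 'Gender=Masc,Neut'."""
    value = feats.get(name)
    return set(value.split(",")) if value else None


def agrees(det_feats, noun_feats, features=AGREEMENT_FEATURES):
    """True unless the two tokens share no value for some listed feature.

    A feature missing on either side counts as compatible, because taggers
    often leave features unspecified (e.g. Gender in the plural).
    """
    for name in features:
        a, b = _values(det_feats, name), _values(noun_feats, name)
        if a is not None and b is not None and not (a & b):
            return False
    return True


def find_disagreements(words):
    """Return (det, noun) text pairs where a DET right before a NOUN clashes."""
    problems = []
    for (t1, pos1, f1), (t2, pos2, f2) in zip(words, words[1:]):
        if pos1 == "DET" and pos2 == "NOUN" and not agrees(f1, f2):
            problems.append((t1, t2))
    return problems
```

For the ungrammatical "den Frau" (masculine accusative article with a feminine noun), the Gender values share nothing and the pair is reported; "die Frau" passes.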

GeorgeS2019 commented on June 15, 2024

I found the morphology features. They seem similar to those I found in spaCy.

Do the morphology features come from a source independent of spaCy?

I'm curious about the source, especially for German.

AngledLuffa commented on June 15, 2024

GeorgeS2019 commented on June 15, 2024

@AngledLuffa

It seems this only supports Arabic.

Would it make sense to support German too?

AngledLuffa commented on June 15, 2024

There is roughly a 0% chance of someone here doing that. It would require someone who knows German, knows Java, and isn't satisfied with the Python Stanza toolkit. If someone outside Stanford built such a project and sent us a PR, we would be happy to integrate it.

GeorgeS2019 commented on June 15, 2024

@AngledLuffa

"UD German GSD"

"transformer-based version of the tagger."

Could you please share the link?

Additional relevant references

GeorgeS2019 commented on June 15, 2024

isMorphTreeFile

ArabicTreeReaderFactory

"Morph file is gold tree file with morph analyses in the pre-terminals."

@AngledLuffa

  • Could you please suggest where I can read more about how to create this morph file?
  • Do I need GermanTreeReaderFactory? Is it already included in CoreNLP?
