Comments (10)
It's not shown on the demo page, but it's definitely part of the API
from corenlp.
I think (although I can't swear to it, having not done anything with the Arabic normalization) that this is pursuing the wrong angle. The Arabic datasets have morphology added as extra pieces to the words in question. The German datasets such as the UD dataset you linked have it as an entirely separate column in the training data. The preprocessing code wouldn't be a one-to-one language swap, but rather something entirely different.
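To illustrate the "separate column" point: UD treebanks are distributed in CoNLL-U format, where each token line has ten tab-separated columns and the morphological features sit in the sixth one (FEATS). Below is a minimal sketch of pulling that column out. The sample sentence is hand-written for illustration, not copied from UD German GSD, and the helper name `read_form_and_feats` is hypothetical.

```python
# Hand-written CoNLL-U sample for illustration only; NOT taken from UD German GSD.
SAMPLE = (
    "# text = Der Hund schläft.\n"
    "1\tDer\tder\tDET\tART\tCase=Nom|Definite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n"
    "2\tHund\tHund\tNOUN\tNN\tCase=Nom|Gender=Masc|Number=Sing\t3\tnsubj\t_\t_\n"
    "3\tschläft\tschlafen\tVERB\tVVFIN\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t0\troot\t_\tSpaceAfter=No\n"
    "4\t.\t.\tPUNCT\t$.\t_\t3\tpunct\t_\t_\n"
)


def read_form_and_feats(conllu_text):
    """Return (FORM, FEATS) pairs from CoNLL-U text (hypothetical helper)."""
    pairs = []
    for line in conllu_text.splitlines():
        # Skip sentence separators and comment lines such as "# text = ...".
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        # Column 2 (index 1) is FORM, column 6 (index 5) is FEATS.
        pairs.append((cols[1], cols[5]))
    return pairs


for form, feats in read_form_and_feats(SAMPLE):
    print(form, feats)
```

The word form stays untouched and the morphology travels alongside it, which is why preprocessing written for data with morphology attached to the words themselves would not carry over directly.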
It really is a project that no one at Stanford is going to take on, or even help with in any but the most cursory manner. If it's so important that it be in Java that repeated suggestions of "just use Stanza" aren't sufficient, you can try contacting @manning to see if an arrangement can be made to sponsor one of the group's (very few) Java programmers to figure out how to build the morphological analyzer you need in Java.
Stanza is available here:
https://github.com/stanfordnlp/stanza
The morphological analyzer is part of the POS tagger, which is documented here (look for feats):
https://stanfordnlp.github.io/stanza/pos.html
The version that uses the transformer is the default_accurate package, so something like:
pipe = stanza.Pipeline("de", package="default_accurate")
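A slightly fuller sketch of reading the features off the pipeline output, assuming Stanza is installed and the German models can be downloaded. `word.feats` comes back as a UD-style string (or None when a word has no features), so a small helper turns it into a dict. The names `feats_to_dict` and `print_morphology` are made up for this example and are not part of Stanza.

```python
def feats_to_dict(feats):
    """Turn a UD feature string like 'Case=Nom|Number=Sing' into a dict.

    Hypothetical helper; returns {} for None, '' or the CoNLL-U placeholder '_'.
    """
    if not feats or feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def print_morphology(text, lang="de", package="default_accurate"):
    """Print each word with its UPOS tag and morphological features."""
    # Imported here so the helper above can be used without Stanza installed.
    import stanza

    # Same constructor call as in the comment above; the first run may
    # download the models, which can take a while.
    pipe = stanza.Pipeline(lang, package=package)
    doc = pipe(text)
    for sentence in doc.sentences:
        for word in sentence.words:
            print(word.text, word.upos, feats_to_dict(word.feats))
```

Calling `print_morphology("Der Hund schläft.")` should list one line per word; the exact feature values depend on the model version, so none are shown here.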
I tried Stanza online, but I fail to see any morphology information.
Without the morphology, it is challenging to program some levels of German grammar matchers.
I found the morphology features. They seem similar to those I found in spaCy.
Do the morphology features come from a source independent of spaCy?
I am curious about the source, especially for German.
It seems this supports only the Arabic language.
Would it make sense to support German too?
There is roughly 0% chance of that happening from someone here. It would require someone who knows German, who knows Java, and who isn't satisfied with the Python Stanza toolkit. If someone outside Stanford produced such a project and sent us a PR, we would be happy to integrate it.
UD German GSD
transformer-based version of the tagger.
Could you please share the link?
Additional relevant references
isMorphTreeFile
ArabicTreeReaderFactory
Morph file is gold tree file with morph analyses in the pre-terminals.
- Could you please suggest where I could read more about how to create this morph file?
- Do I need GermanTreeReaderFactory? Is it already included in CoreNLP?