Comments (8)
Hopefully @antalvdb can shed some light on this?
from frog.
This paper from 2007 has the basic performance estimations for POS tagging, morphological analysis, and dependency parsing. The latter is computed by a predecessor of the CoNLL dependency parsing evaluator. The paper does not specify scores for the lemmatizer (but see this paper), the shallow parser / XP chunker (current score on test data: 91.3 precision, 92.5 recall, 91.9 F-score) or the named entity recognizer (current score on test data: overall F-score 82.1, persons 81.7, locations 90.9, organizations 75.1). The latter scores have not been published yet.
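As a quick sanity check on the chunker numbers above, the sketch below recomputes the F-score from the quoted precision and recall. It assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the comment does not state explicitly; the function name `f_score` is only illustrative.

```python
def f_score(precision, recall):
    """Balanced F1: harmonic mean of precision and recall.

    Assumption: the scores quoted in the thread use this balanced variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Chunker scores quoted above: 91.3 precision, 92.5 recall.
chunker_f = f_score(91.3, 92.5)
print(round(chunker_f, 1))  # prints 91.9
```

Under that assumption the result matches the reported 91.9.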
Thank you for links to the papers and the scores. That already provides some information.
I'm trying to compare frog to the model accuracies reported for the udpipe models (https://ufal.mff.cuni.cz/udpipe/users-manual#universal_dependencies_20_models_performance), which were built on either the UD_Dutch or the UD_Dutch-LassySmall corpus (see http://universaldependencies.org/treebanks/nl-comparison.html for details on these 2 corpora). As the numbers reported in the papers are very likely driven by the corpus used, I wonder whether accuracy metrics (precision/recall/F-score/UAS/LAS) are available for a model that was also trained on these Universal Dependencies corpora. Or is it wishful thinking that someone would have done this?
Unfortunately we do not have the time to do these types of comparative evaluations. We always welcome anyone willing to put in time to do these types of exercises, and are happy to assist where possible.
Frog's parser is described in more detail here. The memory-based parser emulates the Alpino parser. Its inference (constraint satisfaction inference) is fast, but produces parses that are less accurate than those of Alpino. We trade accuracy for (predictable, relatively high) speed.
Thank you for the input. I understand completely that you don't have time for this. It's not a small task.
I'm basically asking because recently I wrote an R wrapper around UDPipe (https://github.com/bnosac/udpipe) and I'm now investigating how good UDPipe is in comparison to other similar parsers, e.g. the Alpino parser or frog for Dutch, opennlp or the python pattern nlp package.
I recently made a comparison between UDPipe and spaCy (https://github.com/jwijffels/udpipe-spacy-comparison), but I would like to add Frog as well, along with the Python pattern library, Alpino and OpenNLP. Do you know whether such research has already been done, so that I can maybe take a shortcut in this analysis?
As far as I know, it has not been done, and it would be a very welcome study.
Last week, I got into contact with Gertjan van Noord & Gosse Bouma as they had a paper on evaluating Alpino versus Parsey/Parseysaurus on the UD_Lassy-Small treebank http://aclweb.org/anthology/W17-0403 . I received the output of the Alpino results in CONLLU format which allowed a comparison to UDPipe. There were some nitty-gritty details on the evaluation but it already gave an indication of accuracy.
All it takes to make a comparison is the annotation output, in CoNLL-U format, for some text whose gold annotation is known. The evaluation script used by the CoNLL 2017 shared task, available at https://github.com/ufal/conll2017/blob/master/evaluation_script/conll17_ud_eval.py, can then be applied to it. The tricky part is getting the annotation result into CoNLL-U format :)
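To give a rough idea of what such a comparison computes, here is a minimal sketch of unlabeled and labeled attachment scores (UAS/LAS) over two CoNLL-U annotations of the same text. It is not the official evaluation script: it assumes gold and system output have identical tokenization, and it simply skips multiword-token ranges and empty nodes. The helper names (`conllu_row`, `read_conllu`, `attachment_scores`) and the toy sentence are made up for illustration.

```python
def conllu_row(idx, form, head, deprel):
    """Build one 10-column CoNLL-U line; unused columns are set to '_'."""
    return "\t".join([str(idx), form, "_", "_", "_", "_", str(head), deprel, "_", "_"])


def read_conllu(text):
    """Return a list of (HEAD, DEPREL) pairs, one per syntactic word."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines separate sentences, '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError("expected 10 tab-separated columns: %r" % line)
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        pairs.append((cols[6], cols[7]))
    return pairs


def attachment_scores(gold_text, system_text):
    """Return (UAS, LAS), assuming both inputs contain the same tokens."""
    gold = read_conllu(gold_text)
    system = read_conllu(system_text)
    if not gold or len(gold) != len(system):
        raise ValueError("gold and system must have the same, non-zero word count")
    uas_hits = sum(g[0] == s[0] for g, s in zip(gold, system))  # head only
    las_hits = sum(g == s for g, s in zip(gold, system))        # head and label
    return uas_hits / len(gold), las_hits / len(gold)


# Toy example (hypothetical annotations of "Ik lees een boek").
GOLD = "\n".join([
    conllu_row(1, "Ik", 2, "nsubj"),
    conllu_row(2, "lees", 0, "root"),
    conllu_row(3, "een", 4, "det"),
    conllu_row(4, "boek", 2, "obj"),
])
SYSTEM = "\n".join([
    conllu_row(1, "Ik", 2, "nsubj"),
    conllu_row(2, "lees", 0, "root"),
    conllu_row(3, "een", 4, "det"),
    conllu_row(4, "boek", 2, "nmod"),  # correct head, wrong label
])

print(attachment_scores(GOLD, SYSTEM))  # prints (1.0, 0.75)
```

For real comparisons the official script is the better choice, since it also aligns system and gold tokens when the tokenization differs.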
As this is not a real issue, I am closing it.