This project aims to evaluate the effects of changes to a corpus annotation on POS tagging, using cross-validation.
The Czech corpus DESAM (with its attributive tagset) is assumed, as well as RFTagger. Originally developed as part of my master’s thesis.
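For readers unfamiliar with cross-validation, the general idea can be sketched as below. This is only a generic k-fold illustration, not necessarily how this project splits the corpus; the function name `k_fold_splits` and the toy sentence data are hypothetical. It avoids f-strings so that it stays within the Python 3.5 baseline.

```python
def k_fold_splits(sentences, k=10):
    """Yield (train, test) pairs; every sentence is in exactly one test fold.

    Hypothetical helper for illustration only; the project's own
    splitting scheme may differ.
    """
    if k < 2 or k > len(sentences):
        raise ValueError("k must be between 2 and the number of sentences")
    for i in range(k):
        # Every k-th sentence, starting at offset i, forms the test fold.
        test = sentences[i::k]
        # All remaining sentences form the training set for this fold.
        train = [s for j, s in enumerate(sentences) if j % k != i]
        yield train, test


if __name__ == "__main__":
    # Toy stand-in for tagged sentences; real data would come from the corpus.
    toy = ["sentence-{}".format(n) for n in range(7)]
    for fold, (train, test) in enumerate(k_fold_splits(toy, k=3)):
        print(fold, len(train), len(test))
```

In such a setup, a tagger would be trained on each `train` part and evaluated on the matching `test` part, and the per-fold results would then be averaged.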
MIT licensed, except for the third-party files, which have their own licences.
Currently, the code includes parts of my (unreleased) chart parser “ijáček”. It should be released as well and the common code should be shared across the projects.
To be written, but you need at least Python 3.5, RFTagger, and GNU Make, plus the DESAM corpus or any other corpus using the Czech attributive tagset. The same tagset is used by Majka, a free morphological analyzer.
There may be some useful description in readme.html.
(Optional) Python 3 packages, available in Arch Linux AUR:
python-beautifulsoup4 4.5.1-1 (required by compare_evaluation.py)
python-tabulate (convert_to_latex.py, just a helper script)
python-colorlog (optional)
python-pygments (pygments_lexer.py, also an unnecessary part)
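To see which of the optional packages are present, a small check like the following may help. It is a sketch that is not part of the project: the function name `missing_optional` is hypothetical, and the import names (`bs4`, `tabulate`, `colorlog`, `pygments`) are assumed to be the modules that the packages listed above provide.

```python
import importlib.util

# Import names assumed to correspond to the packages listed above.
OPTIONAL_MODULES = {
    "bs4": "python-beautifulsoup4 (compare_evaluation.py)",
    "tabulate": "python-tabulate (convert_to_latex.py)",
    "colorlog": "python-colorlog",
    "pygments": "python-pygments (pygments_lexer.py)",
}


def missing_optional(modules=OPTIONAL_MODULES):
    """Return the sorted import names that cannot be found on this system."""
    return sorted(
        name for name in modules
        if importlib.util.find_spec(name) is None
    )


if __name__ == "__main__":
    for name in missing_optional():
        print("missing: {} -> {}".format(name, OPTIONAL_MODULES[name]))
```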
Firefox >= 51 is advised for colourful emojis, which help navigate the generated HTML tables with better visual cues than plain shapes/glyphs.
Czech comments in the code do not contain anything important.