Check changes in your OCR Ground Truth
If the Ground Truth data is version controlled via git repository, you can use "GTCheck" to validate and commit your modification. Therefore GTCheck will display for any line the original text, the modified version as well as the corresponding image. A virtual keyboard supports character replacements and the transcription of missing text.
This installation is tested with Ubuntu and we expect that it should work for other similar environments.
- Python> 3.6
git clone https://github.com/UB-Mannheim/GTCheck.git
cd GTCheck
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip setuptools
pip install -r requirements.txt
python setup.py install
gtcheck run-server {parameters..}
gtcheck add-repo path/to/git-repo {parameters..}
gtcheck run-single path/to/git-repo {parameters..}
Extract page-xml to gt-linepairs The working directory is either the path to a mets.xml file or the folder with page-xml files. It is possible to pass a mets.xml if it already exists, else a temporary mets-file will be created.
gtcheck extract-page path/to/working-directory {-m path/to/mets.xml} -I {INPUTGROUPNAME} -O {OUTPUTGROUNAME} {parameters...}
Add to server and check! (see above)
Update page-xml files
gtcheck update-page path/to/repo -I {./ GROUPNAME} {parameters...}
In the first page you can set up your git credentials and select the branch or create a new branch for committing the modifications.
In this page you can see and edit the modifications and the original text.
The modifications can be committed (with the commit message), skipped (if not clear what to do), added to the stage mode and later can be committed all at once or can be stashed (keep the original version).
A virtual keyboard supports character replacements and the transcription of missing text
TIFF images
Not all browser support tif images.
The workaround atm is installing browser extension.
E.g. Firefox you can find tiff viewer:
https://addons.mozilla.org/de/firefox/addon/
UTF-8 Foldername:
git config core.quotepath off
Spellchecking:
This app uses the browser spellchecking.
E.g. Firefox:
https://support.mozilla.org/en-US/kb/how-do-i-use-firefox-spell-checker
https://addons.mozilla.org/de/firefox/language-tools/
Copyright (c) 2020 Universitätsbibliothek Mannheim
Author:
GTCheck is Free Software. You may use it under the terms of the Apache 2.0 License. See LICENSE for details.