Import the data from various OCR formats (Olive XML, Mets/Alto, etc.) into a canonical JSON format defined by the Impresso project (see documentation of schemas). A sample of the input data that this importer can deal with can be found in sample_data/.
Code documentation, including a brief guide on how to write a custom importer, can be found at https://impresso.github.io/impresso-text-acquisition/.
Version
3.6
Documentation
Python docstring style https://pythonhosted.org/an_example_pypi_project/sphinx.html
Sphinx configuration file (docs/conf.py
) generated with:
sphinx-quickstart --ext-githubpages
To compile the documentation
cd docs/
make html
To view locally:
Install http-sever
(a node-js package):
npm install http-server -g
Then:
cd docs
http-server
And you'll be able to browse it at http://127.0.0.1:8080.
Testing
Python pytest framework: https://pypi.org/project/pytest/
Tox: https://tox.readthedocs.io/en/latest/
Passing arguments
Doctopt: http://docopt.org/
Style
4 space indentation