

OCR-D Manager

OCR-D Manager is a server that mediates between Kitodo and OCR-D. It resides on the site of the Kitodo installation (so the actual OCR server can be managed independently) but runs in its own container (so Kitodo can be managed independently).

Specifically, it gets called by Kitodo.Production or Kitodo.Presentation to handle OCR for a document, and in turn calls the OCR-D Controller for workflow processing.

For an integration as a service container, orchestrated with other containers (Kitodo+Controller+Monitor), see this meta-repo.

OCR-D Manager is responsible for

  • data transfer from Kitodo to Manager to Controller and back,
  • delegation to Controller,
  • signalling/reporting,
  • result validation,
  • result extraction (putting ALTO files in the process directory where Kitodo.Production expects them, or updating the METS for Kitodo.Presentation).

It is currently implemented as an SSH login server with an installation of OCR-D core and an SSH client to connect to the Controller.

Usage

Building

Build or pull the Docker image:

make build # or docker pull ghcr.io/slub/ocrd_manager

Starting and mounting

Then run the container – providing a host-side directory for the volumes …

  • DATA: directory for data processing (including images or existing workspaces),
    defaults to current working directory
  • WORKFLOWS: directory for scripts (preconfigured workflows),
    defaults to ./workflows in current working directory

… but also files …

  • KEYS: public key credentials for log-in to the manager
  • PRIVATE: private key credentials for log-in to the controller …

… and (optionally) some environment variables

  • UID: numerical user identifier to be used by programs in the container
    (will affect the files modified/created); defaults to current user
  • GID: numerical group identifier to be used by programs in the container
    (will affect the files modified/created); defaults to current group
  • UMASK: numerical user mask to be used by programs in the container
    (will affect the files modified/created); defaults to 0002
  • PORT: numerical TCP port to expose the SSH server on the host side;
    defaults to 9022 (for non-privileged access)
  • CONTROLLER: network address:port of the Controller for the controller client (must be reachable from the container network)
  • ACTIVEMQ: network address:port of the ActiveMQ server listening for result status (must be reachable from the container network)
  • NETWORK: name of the Docker network to use;
    defaults to bridge (the default Docker network)

… thus, for example:

make run DATA=/mnt/workspaces WORKFLOWS=/mnt/workflows KEYS=~/.ssh/id_rsa.pub PORT=9022 PRIVATE=~/.ssh/id_rsa

(You can also run the service via docker-compose manually: just cp .env.example .env and edit it to your needs.)
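The UMASK default of 0002 keeps files created by the container group-writable, presumably so that other members of the group set via GID can keep working on the results in DATA. The following standalone bash sketch (assuming GNU coreutils for `stat -c`) shows the permission bits that result from 0002 compared with the common 0022:

```shell
#!/usr/bin/env bash
# Demonstrates how a umask shapes the permissions of newly created files and
# directories. Works in a throwaway directory; assumes GNU coreutils (stat -c).

demo_umask() {
  local mask="$1" tmp
  tmp="$(mktemp -d)"
  (
    umask "$mask"          # only affects this subshell
    cd "$tmp" || exit 1
    touch newfile          # requested mode 666, minus the mask bits
    mkdir newdir           # requested mode 777, minus the mask bits
    printf 'file=%s dir=%s\n' "$(stat -c %a newfile)" "$(stat -c %a newdir)"
  )
  rm -rf "$tmp"
}

demo_umask 0002   # prints: file=664 dir=775 (group-writable)
demo_umask 0022   # prints: file=644 dir=755
```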

General management

Then you can log in remotely as user ocrd (in the following, manager stands for the Manager's host name, without loss of generality):

ssh -p 9022 ocrd@manager bash -i

(Typically though, you will run a non-interactive script, see next section.)

Processing

In the Manager, you can run shell scripts that do

  • data management and validation via ocrd CLIs
  • OCR processing by running workflows in the controller via ssh ocrd@ocrd_controller log-ins

The data management will depend on which Kitodo context you want to integrate into (Production 2 / 3 or Presentation).
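Scripts in the Manager reach the Controller over SSH. As a minimal sketch (the helper name controller_ssh and the DRY_RUN switch are hypothetical, not part of ocrd_manager), the CONTROLLER variable in host:port form could be turned into an ssh call like this:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command on the Controller via SSH.
# CONTROLLER is expected as host:port (e.g. ocrd-controller:22);
# with DRY_RUN=1 the ssh command line is printed instead of executed.
controller_ssh() {
  if [ -z "${CONTROLLER:-}" ]; then
    echo "CONTROLLER must be set to host:port" >&2
    return 2
  fi
  local host="${CONTROLLER%:*}"    # everything before the last colon
  local port="${CONTROLLER##*:}"   # everything after the last colon
  if [ "$host" = "$CONTROLLER" ]; then
    port=22                        # no ":port" given: use the standard SSH port
  fi
  # -T: no pseudo-terminal, as for the non-interactive calls in this README
  local cmd=(ssh -T -p "$port" "ocrd@$host" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

For example, `CONTROLLER=ocrd-controller:22 controller_ssh ocrd --version` would query the OCR-D version installed on the Controller.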

From image to ALTO files

For Kitodo.Production, there is a preconfigured script process_images.sh (or for_production.sh) which takes the following arguments:

SYNOPSIS:

process_images.sh [OPTIONS] DIRECTORY

where OPTIONS can be any/all of:
 --lang LANGUAGE    overall language of the material to process via OCR
 --script SCRIPT    overall script of the material to process via OCR
 --workflow FILE    workflow file to use for processing, default:
                    ocr-workflow-default.sh
 --no-validate      skip comprehensive validation of workflow results
 --img-subdir IMG   name of the subdirectory to read images from, default:
                    images
 --ocr-subdir OCR   name of the subdirectory to write OCR results to, default:
                    ocr/alto
 --proc-id ID       process ID to communicate in ActiveMQ callback
 --task-id ID       task ID to communicate in ActiveMQ callback
 --help             show this message and exit

and DIRECTORY is the local path to process. The script will import
the images from DIRECTORY/IMG into a new (temporary) METS and
transfer this to the Controller for processing. After resyncing back
to the Manager, it will then extract OCR results and export them to
DIRECTORY/OCR.

If ActiveMQ is used, the script will exit directly after initialization,
and run processing in the background. Completion will then be signalled
via ActiveMQ network protocol (using the proc and task ID as message).

ENVIRONMENT VARIABLES:

 CONTROLLER: host name and port of OCR-D Controller for processing
 ACTIVEMQ: URL of ActiveMQ server for result callback (optional)
 ACTIVEMQ_CLIENT: path to ActiveMQ client library JAR file (optional)
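To illustrate how the options above fit together, here is a hypothetical bash sketch of the argument handling, with defaults copied from the help text. It is not the actual process_images.sh code; the variable names (OCR_LANG, IMG_DIR, OCR_DIR and so on) are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parsing the process_images.sh options; NOT the real
# implementation. Defaults come from the help text; results are left in
# global variables for the caller.
parse_process_images_args() {
  OCR_LANG= OCR_SCRIPT= WORKFLOW=ocr-workflow-default.sh VALIDATE=1
  IMG_SUBDIR=images OCR_SUBDIR=ocr/alto PROC_ID= TASK_ID= DIRECTORY=
  while [ $# -gt 0 ]; do
    case "$1" in
      --no-validate) VALIDATE=0; shift ;;
      --help) echo "usage: process_images.sh [OPTIONS] DIRECTORY"; return 1 ;;
      --lang|--script|--workflow|--img-subdir|--ocr-subdir|--proc-id|--task-id)
        # options that take a value must not be the last argument
        if [ $# -lt 2 ]; then echo "option $1 needs a value" >&2; return 2; fi
        case "$1" in
          --lang)       OCR_LANG="$2" ;;
          --script)     OCR_SCRIPT="$2" ;;
          --workflow)   WORKFLOW="$2" ;;
          --img-subdir) IMG_SUBDIR="$2" ;;
          --ocr-subdir) OCR_SUBDIR="$2" ;;
          --proc-id)    PROC_ID="$2" ;;
          --task-id)    TASK_ID="$2" ;;
        esac
        shift 2 ;;
      -*) echo "unknown option: $1" >&2; return 2 ;;
      *) DIRECTORY="$1"; shift ;;
    esac
  done
  if [ -z "$DIRECTORY" ]; then echo "missing DIRECTORY" >&2; return 2; fi
  IMG_DIR="$DIRECTORY/$IMG_SUBDIR"   # images are imported from here
  OCR_DIR="$DIRECTORY/$OCR_SUBDIR"   # ALTO results are exported here
}
```

With only a DIRECTORY argument such as testdata, images would be read from testdata/images and ALTO files written to testdata/ocr/alto.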

The workflow parameter is optional and defaults to the preconfigured script ocr-workflow-default.sh which contains a trivial workflow:

  • import of the images into a new OCR-D workspace
  • preprocessing, layout analysis and text recognition with a single Tesseract processor call
  • format conversion of the result from PAGE-XML to ALTO-XML

It can be replaced with the (path) name of any workflow script mounted under /workflows or /data.

For example (assuming testdata is a directory mounted under /data with image files in its images subdirectory):

ssh -T -p 9022 ocrd@manager process_images.sh --proc-id 1 --task-id 3 --lang deu --script Fraktur --workflow myocr.sh testdata
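The steps of the default workflow correspond to OCR-D core CLI calls which the Manager sends to the Controller; they appear verbatim in the log of the issue quoted at the end of this page. The sketch below simply replays those calls in the current directory. It assumes an OCR-D installation with ocrd_tesserocr, ocrd_fileformat and the frak2021 model; the function name and the DRY_RUN switch are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: replay the default workflow's OCR-D calls (copied from the Manager
# log) in the current directory. Assumes OCR-D core, ocrd_tesserocr,
# ocrd_fileformat and the frak2021 model. DRY_RUN=1 prints the calls instead.
run_default_workflow() {
  # task 1: preprocessing, layout analysis and recognition in one Tesseract call
  # task 2: PAGE-XML to ALTO-XML conversion into the FULLTEXT fileGrp
  local tasks=(
    "tesserocr-recognize -P segmentation_level region -P model frak2021 -I OCR-D-IMG -O OCR-D-OCR"
    "fileformat-transform -P from-to \"page alto\" -P script-args \"--no-check-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT"
  )
  local run=() ov=()
  if [ "${DRY_RUN:-0}" = 1 ]; then run=(echo); fi
  if [ -f mets.xml ]; then
    ov=(--overwrite)                            # workspace exists: re-run in place
  else
    "${run[@]}" ocrd-import -j 1 -i || return   # import images into a new workspace
  fi
  "${run[@]}" ocrd validate tasks "${ov[@]}" --workspace . "${tasks[@]}" || return
  "${run[@]}" ocrd process "${ov[@]}" "${tasks[@]}"   # recognition, then conversion
}
```

A custom workflow script such as myocr.sh would replace the task strings, while the surrounding import and validation calls stay the same.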

From METS to METS file

For Kitodo.Presentation, there is a preconfigured script process_mets.sh (or for_presentation.sh) which takes the following arguments:

SYNOPSIS:

process_mets.sh [OPTIONS] METS

where OPTIONS can be any/all of:
 --workflow FILE    workflow file to use for processing, default:
                    ocr-workflow-default.sh
 --no-validate      skip comprehensive validation of workflow results
 --pages RANGE      selection of physical page range to process
 --img-grp GRP      fileGrp to read input images from, default:
                    DEFAULT
 --ocr-grp GRP      fileGrp to write output OCR text to, default:
                    FULLTEXT
 --url-prefix URL   convert result text file refs from local to URL
                    and prefix them
 --help             show this message and exit

and METS is the path of the METS file to process. The script will copy
the METS into a new (temporary) workspace and transfer this to the
Controller for processing. After resyncing back, it will then extract
OCR results and copy them to METS (adding file references to the file
and copying files to the parent directory).

ENVIRONMENT VARIABLES:

 CONTROLLER: host name and port of OCR-D Controller for processing

The same applies to the workflow parameter as above.

For example (assuming testdata is a directory mounted under /data that contains a METS file mets.xml):

ssh -T -p 9022 ocrd@manager process_mets.sh --lang deu --script Fraktur --workflow myocr.sh testdata/mets.xml

Data transfer

For sharing data between the Manager and Controller, it is recommended to transfer files explicitly (as this will make the costs more measurable and controllable).

(This is currently implemented via rsync.)
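A minimal sketch of such an explicit transfer, assuming rsync over SSH with CONTROLLER in host:port form and a per-job directory under /data on the Controller (the log in the issue below shows rsync creating /data/KitodoJob_91_testdata-kitodo there). The helper name, the remote layout and the DRY_RUN switch are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a job directory to or from the Controller with
# rsync over SSH. Remote layout /data/<job>/ is an assumption. DRY_RUN=1 only
# prints the rsync command.
controller_rsync() {
  local direction="$1" local_dir="$2" job="$3"
  if [ -z "${CONTROLLER:-}" ] || [ -z "$local_dir" ] || [ -z "$job" ]; then
    echo "usage: CONTROLLER=host:port controller_rsync to|from LOCAL_DIR JOB" >&2
    return 2
  fi
  local host="${CONTROLLER%:*}" port="${CONTROLLER##*:}"
  if [ "$host" = "$CONTROLLER" ]; then port=22; fi   # default SSH port
  local remote="ocrd@$host:/data/$job/"
  local -a cmd=()
  # trailing slashes: copy the directory contents, not the directory itself
  case "$direction" in
    to)   cmd=(rsync -av -e "ssh -p $port" "$local_dir/" "$remote") ;;
    from) cmd=(rsync -av -e "ssh -p $port" "$remote" "$local_dir/") ;;
    *)    echo "direction must be 'to' or 'from'" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "${cmd[*]}"; else "${cmd[@]}"; fi
}
```

Because rsync only sends what changed, resyncing the results back with the from direction transfers mostly the newly created OCR files.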

The data lifecycle should be:

  • on Controller: short-lived
  • on Manager: as long as process is active in Production

(This is currently not managed.)
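Since this is not managed yet, one hypothetical way to keep Controller-side data short-lived would be a periodic cleanup of stale job directories. The sketch below assumes job directories named KitodoJob_* directly under a common root (as in the log below) and uses the directory's modification time as the age criterion; the function name is made up:

```shell
#!/usr/bin/env bash
# Hypothetical cleanup: delete job directories not modified for more than
# DAYS days. Only top-level directories named KitodoJob_* under ROOT are
# considered; their paths are printed before removal.
prune_stale_jobs() {
  local root="$1" days="$2"
  if [ ! -d "$root" ] || ! [[ "$days" =~ ^[0-9]+$ ]]; then
    echo "usage: prune_stale_jobs ROOT DAYS" >&2
    return 2
  fi
  find "$root" -mindepth 1 -maxdepth 1 -type d -name 'KitodoJob_*' \
       -mtime +"$days" -print -exec rm -rf {} +
}
```

A directory's modification time only changes when entries directly inside it are added or removed, so a long-running job that writes only into subdirectories could look stale; a generous DAYS value (or a check for a live ocrd.pid) would be safer.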

Logging

All logs are accumulated on standard output, which can be inspected via Docker:

docker logs ocrd_manager

Logs for all services can also be viewed on the Monitor web server.
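Because all scripts write to the same stream via syslog, it can help to filter the Docker log by script name. A small sketch (the helper is hypothetical; it relies on the line format visible in the log below, where rsyslog escapes carriage returns from the Controller's terminal as #015):

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only the messages logged by one script (e.g.
# process_images.sh), dropping the Docker and syslog prefixes and the
# trailing "#015" that rsyslog writes for a carriage return.
script_messages() {
  local name="$1"
  sed -n "/ ${name}: /{s/^.* ${name}: //;s/#015\$//;p;}"
}
```

Usage: `docker logs ocrd_manager 2>&1 | script_messages process_images.sh` (the `2>&1` also captures what the container wrote to standard error).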

Testing

After building and starting, you can use the test target for a round-trip:

make test DATA=/mnt/workspaces

This will download sample data and run the default workflow on it. (All logging still goes to the Docker output, so the shell itself will not print any; see Logging above.)

(If the Manager has been started externally already, make sure to pass the correct value for the NETWORK variable – the makefile will then attempt to use docker exec instead of ssh ocrd@localhost to connect.)

To clean up the results, use:

make clean-testdata

Maintainers

If you have any questions or encounter any problems, please do not hesitate to contact us.


ocrd_manager's Issues

Processing is slow: maybe there is some potential for optimizing with simple means

Processing OCR for six tif files is very slow. It takes 2:45 minutes to finish (roughly 27 seconds per page).
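To see where the time goes, phase durations can be read off the timestamps in the log below. A small sketch (the helper names are hypothetical; it takes HH:MM:SS times, drops fractional seconds and assumes both times fall on the same day):

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS[.fff] to seconds since midnight (fraction dropped).
# 10# forces base 10, so values like "08" or "09" are not read as octal.
to_secs() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  s="${s%%.*}"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds elapsed from START to END (same day assumed).
secs_between() {
  echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

# commands sent to the Controller -> ocrd-import reports success
secs_between 17:39:22 17:40:09             # prints 47
# tesserocr-recognize: start of page 1 -> start of page 4
secs_between 17:40:38.606 17:40:42.814     # prints 4
```

Applied to the excerpt below, the import phase alone accounts for 47 seconds, while tesserocr-recognize moves from one page to the next within one to two seconds.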

2024-02-23T17:39:00.625388219Z # ocrd-controller:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
2024-02-23T17:39:00.644442553Z # ocrd-controller:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
2024-02-23T17:39:00.712902085Z # ocrd-controller:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
2024-02-23T17:39:00.720364377Z # ocrd-controller:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
2024-02-23T17:39:00.721615933Z # ocrd-controller:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.11
2024-02-23T17:39:00.965586869Z  * Starting enhanced syslogd rsyslogd [ OK ]
2024-02-23T17:39:00.996466810Z  * Starting OpenBSD Secure Shell server sshd [ OK ]
2024-02-23T17:39:02.999967028Z Feb 23 17:39:01 ocrd-manager rsyslogd: rsyslogd's groupid changed to 106
2024-02-23T17:39:03.000001113Z Feb 23 17:39:01 ocrd-manager rsyslogd: rsyslogd's userid changed to 105
2024-02-23T17:39:03.000102256Z Feb 23 17:39:01 ocrd-manager rsyslogd: [origin software="rsyslogd" swVersion="8.2001.0" x-pid="41" x-info="https://www.rsyslog.com"] start
2024-02-23T17:39:21.001845171Z Feb 23 17:39:20 ocrd-manager process_images.sh: ocr_init initialize variables and directory structure
2024-02-23T17:39:21.001870972Z Feb 23 17:39:20 ocrd-manager process_images.sh: running with --proc-id testdata-kitodo --task-id 1 /data/testdata-kitodo CONTROLLER=ocrd-controller:22 ACTIVEMQ=kitodo-mq:61616
2024-02-23T17:39:21.001876461Z Feb 23 17:39:20 ocrd-manager process_images.sh: using workflow '/workflows/ocr-workflow-default.sh':
2024-02-23T17:39:21.001880104Z Feb 23 17:39:20 ocrd-manager process_images.sh: "tesserocr-recognize -P segmentation_level region -P model frak2021 -I OCR-D-IMG -O OCR-D-OCR" "fileformat-transform -P from-to \"page alto\" -P script-args \"--no-check-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT" 
2024-02-23T17:39:22.001851250Z Feb 23 17:39:21 ocrd-manager process_images.sh: {
2024-02-23T17:39:22.001878423Z Feb 23 17:39:21 ocrd-manager process_images.sh:   acknowledged: true,
2024-02-23T17:39:22.001883055Z Feb 23 17:39:21 ocrd-manager process_images.sh:   insertedId: ObjectId("65d8d849a9a529735a6555b3")
2024-02-23T17:39:22.001886405Z Feb 23 17:39:21 ocrd-manager process_images.sh: }
2024-02-23T17:39:22.001889203Z Feb 23 17:39:21 ocrd-manager process_images.sh: ocr_exit in async mode - immediate termination of the script
2024-02-23T17:39:22.001892372Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images' -> 'ocr-d//data/testdata-kitodo/images'
2024-02-23T17:39:22.001895671Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000009.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000009.tif.original.jpg'
2024-02-23T17:39:22.001898677Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000010.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000010.tif.original.jpg'
2024-02-23T17:39:22.001901520Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000011.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000011.tif.original.jpg'
2024-02-23T17:39:22.001914627Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000012.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000012.tif.original.jpg'
2024-02-23T17:39:22.001918256Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000013.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000013.tif.original.jpg'
2024-02-23T17:39:22.001921254Z Feb 23 17:39:21 ocrd-manager process_images.sh: '/data/testdata-kitodo/images/00000014.tif.original.jpg' -> 'ocr-d//data/testdata-kitodo/images/00000014.tif.original.jpg'
2024-02-23T17:39:22.001924138Z Feb 23 17:39:21 ocrd-manager process_images.sh: sending incremental file list
2024-02-23T17:39:22.001926858Z Feb 23 17:39:21 ocrd-manager process_images.sh: created directory /data/KitodoJob_91_testdata-kitodo
2024-02-23T17:39:22.001929509Z Feb 23 17:39:21 ocrd-manager process_images.sh: ./
2024-02-23T17:39:22.001932006Z Feb 23 17:39:21 ocrd-manager process_images.sh: ocrd.log
2024-02-23T17:39:22.001934460Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/
2024-02-23T17:39:22.001936996Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000009.tif.original.jpg
2024-02-23T17:39:22.001939606Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000010.tif.original.jpg
2024-02-23T17:39:22.001943330Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000011.tif.original.jpg
2024-02-23T17:39:22.001946124Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000012.tif.original.jpg
2024-02-23T17:39:22.001948693Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000013.tif.original.jpg
2024-02-23T17:39:22.001951396Z Feb 23 17:39:21 ocrd-manager process_images.sh: images/00000014.tif.original.jpg
2024-02-23T17:39:22.001953868Z Feb 23 17:39:21 ocrd-manager process_images.sh: 
2024-02-23T17:39:22.001956375Z Feb 23 17:39:21 ocrd-manager process_images.sh: sent 2,485,610 bytes  received 221 bytes  4,971,662.00 bytes/sec
2024-02-23T17:39:22.001959075Z Feb 23 17:39:21 ocrd-manager process_images.sh: total size is 2,484,374  speedup is 1.00
2024-02-23T17:39:23.002055787Z Feb 23 17:39:22 ocrd-manager process_images.sh: WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-02-23T17:39:23.002082296Z Feb 23 17:39:22 ocrd-manager process_images.sh: 2024-02-23 17:39:22 INFO  KitodoActiveMQClient:76 - Sending of message for taskId='1' was successful
2024-02-23T17:39:23.002086840Z Feb 23 17:39:22 2024-02-23 17: 39:22 INFO  KitodoActiveMQClient:76 - Sending of message for taskId='1' was successful
2024-02-23T17:39:23.002090172Z Feb 23 17:39:22 ocrd-manager process_images.sh: execute 3 commands via SSH by the controller
2024-02-23T17:39:23.002092977Z Feb 23 17:39:22 ocrd-manager process_images.sh: set -Ee#015
2024-02-23T17:39:23.002095618Z Feb 23 17:39:22 ocrd-manager process_images.sh: cd 'KitodoJob_91_testdata-kitodo'#015
2024-02-23T17:39:23.002098283Z Feb 23 17:39:22 ocrd-manager process_images.sh: echo $$ > ocrd.pid#015
2024-02-23T17:39:23.002101086Z Feb 23 17:39:22 ocrd-manager process_images.sh: if test -f mets.xml; then OV=--overwrite; else OV=; ocrd-import -j 1 -i; fi#015
2024-02-23T17:39:23.002117815Z Feb 23 17:39:22 ocrd-manager process_images.sh: ocrd validate tasks $OV --workspace . "tesserocr-recognize -P segmentation_level region -P model frak2021 -I OCR-D-IMG -O OCR-D-OCR" "fileformat-transform -P from-to \"page alto\" -P script-args \"--no-check-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT" #015
2024-02-23T17:39:23.002122169Z Feb 23 17:39:22 ocrd-manager process_images.sh: ocrd process $OV "tesserocr-recognize -P segmentation_level region -P model frak2021 -I OCR-D-IMG -O OCR-D-OCR" "fileformat-transform -P from-to \"page alto\" -P script-args \"--no-check-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT" #015
2024-02-23T17:39:23.002125797Z Feb 23 17:39:22 ocrd-manager process_images.sh: /data$ set -Ee#015
2024-02-23T17:39:23.002128389Z Feb 23 17:39:22 ocrd-manager process_images.sh: /data$ cd 'KitodoJob_91_testdata-kitodo'#015
2024-02-23T17:39:23.002131015Z Feb 23 17:39:22 ocrd-manager process_images.sh: /data/KitodoJob_91_testdata-kitodo$ echo $$ > ocrd.pid#015
2024-02-23T17:39:23.002133778Z Feb 23 17:39:22 ocrd-manager process_images.sh: /data/KitodoJob_91_testdata-kitodo$ #015<n OV=--overwrite; else OV=; ocrd-import -j 1 -i; fi#015
2024-02-23T17:39:34.003322229Z Feb 23 17:39:33 ocrd-manager process_images.sh: 17:39:33.646 INFO ocrd.resolver.workspace_from_nothing - Writing METS to /data/KitodoJob_91_testdata-kitodo/mets.xml#015
2024-02-23T17:39:34.003346640Z Feb 23 17:39:33 ocrd-manager process_images.sh: /data/KitodoJob_91_testdata-kitodo#015
2024-02-23T17:39:41.004753306Z Feb 23 17:39:40 ocrd-manager process_images.sh: 17:39:40.542 INFO ocrd-import - adding -g p0001 -G OCR-D-IMG -m image/jpeg -i f00000009_tif_original 'images/00000009.tif.original.jpg'#015
2024-02-23T17:39:45.005192068Z Feb 23 17:39:44 ocrd-manager process_images.sh: 17:39:44.961 INFO ocrd-import - adding -g p0002 -G OCR-D-IMG -m image/jpeg -i f00000010_tif_original 'images/00000010.tif.original.jpg'#015
2024-02-23T17:39:50.006037044Z Feb 23 17:39:49 ocrd-manager process_images.sh: 17:39:49.257 INFO ocrd-import - adding -g p0003 -G OCR-D-IMG -m image/jpeg -i f00000011_tif_original 'images/00000011.tif.original.jpg'#015
2024-02-23T17:39:54.006668326Z Feb 23 17:39:53 ocrd-manager process_images.sh: 17:39:53.540 INFO ocrd-import - adding -g p0004 -G OCR-D-IMG -m image/jpeg -i f00000012_tif_original 'images/00000012.tif.original.jpg'#015
2024-02-23T17:39:58.007133350Z Feb 23 17:39:57 ocrd-manager process_images.sh: 17:39:57.962 INFO ocrd-import - adding -g p0005 -G OCR-D-IMG -m image/jpeg -i f00000013_tif_original 'images/00000013.tif.original.jpg'#015
2024-02-23T17:40:03.007542176Z Feb 23 17:40:02 ocrd-manager process_images.sh: 17:40:02.234 INFO ocrd-import - adding -g p0006 -G OCR-D-IMG -m image/jpeg -i f00000014_tif_original 'images/00000014.tif.original.jpg'#015
2024-02-23T17:40:05.007814476Z Feb 23 17:40:04 ocrd-manager process_images.sh: 17:40:04.329 WARNING ocrd-import - converting 'ocrd.pid' to 'OCR-D-IMG/ocrd_*.tif' prior to import#015
2024-02-23T17:40:05.007833500Z Feb 23 17:40:04 ocrd-manager process_images.sh: convert-im6.q16: no decode delegate for this image format `PID' @ error/constitute.c/ReadImage/560.#015
2024-02-23T17:40:05.007836843Z Feb 23 17:40:04 ocrd-manager process_images.sh: convert-im6.q16: no images defined `OCR-D-IMG/ocrd_%04d.tif' @ error/convert.c/ConvertImageCommand/3258.#015
2024-02-23T17:40:07.008328248Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.557 WARNING ocrd-import - unknown type of file 'ocrd.pid'#015
2024-02-23T17:40:07.008386281Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.843 INFO ocrd.cli.workspace.bulk-add - [   1/6] OCR-D-IMG image/jpeg p0001 f00000009_tif_original images/00000009.tif.original.jpg#015
2024-02-23T17:40:07.008399735Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.852 INFO ocrd.cli.workspace.bulk-add - [   2/6] OCR-D-IMG image/jpeg p0002 f00000010_tif_original images/00000010.tif.original.jpg#015
2024-02-23T17:40:07.008408520Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.853 INFO ocrd.cli.workspace.bulk-add - [   3/6] OCR-D-IMG image/jpeg p0003 f00000011_tif_original images/00000011.tif.original.jpg#015
2024-02-23T17:40:07.008416634Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.853 INFO ocrd.cli.workspace.bulk-add - [   4/6] OCR-D-IMG image/jpeg p0004 f00000012_tif_original images/00000012.tif.original.jpg#015
2024-02-23T17:40:07.008424149Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.853 INFO ocrd.cli.workspace.bulk-add - [   5/6] OCR-D-IMG image/jpeg p0005 f00000013_tif_original images/00000013.tif.original.jpg#015
2024-02-23T17:40:07.008431502Z Feb 23 17:40:06 ocrd-manager process_images.sh: 17:40:06.854 INFO ocrd.cli.workspace.bulk-add - [   6/6] OCR-D-IMG image/jpeg p0006 f00000014_tif_original images/00000014.tif.original.jpg#015
2024-02-23T17:40:10.008738430Z Feb 23 17:40:09 ocrd-manager process_images.sh: 17:40:09.125 INFO ocrd-import - Success on '.'#015
2024-02-23T17:40:10.008766132Z Feb 23 17:40:09 ocrd-manager process_images.sh: /data/KitodoJob_91_testdata-kitodo$ #015<ck-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT" #015
2024-02-23T17:40:37.012009422Z Feb 23 17:40:23 ocrd-manager process_images.sh: /data/KitodoJob_91_testdata-kitodo$ #015<ck-border --dummy-word\" -I OCR-D-OCR -O FULLTEXT" #015
2024-02-23T17:40:37.012029683Z Feb 23 17:40:36 ocrd-manager process_images.sh: 17:40:36.727 INFO ocrd.task_sequence.run_tasks - Start processing task 'tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -p '{"segmentation_level": "region", "model": "frak2021", "dpi": 0, "padding": 0, "textequiv_level": "word", "overwrite_segments": false, "overwrite_text": true, "shrink_polygons": false, "block_polygons": false, "find_tables": true, "find_staves": false, "sparse_text": false, "raw_lines": false, "char_whitelist": "", "char_blacklist": "", "char_unblacklist": "", "tesseract_parameters": {}, "xpath_parameters": {}, "xpath_model": {}, "auto_model": false, "oem": "DEFAULT"}''#015
2024-02-23T17:40:39.012485331Z Feb 23 17:40:38 ocrd-manager process_images.sh: 17:40:38.556 INFO processor.TesserocrRecognize - Using model 'frak2021' in /models/ocrd-resources/ocrd-tesserocr-recognize/ for recognition at the word level#015
2024-02-23T17:40:39.012509370Z Feb 23 17:40:38 ocrd-manager process_images.sh: 17:40:38.606 INFO processor.TesserocrRecognize - INPUT FILE 0 / p0001#015
2024-02-23T17:40:39.012512864Z Feb 23 17:40:38 ocrd-manager process_images.sh: 17:40:38.683 INFO processor.TesserocrRecognize - Page 'p0001' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:39.012515490Z Feb 23 17:40:38 ocrd-manager process_images.sh: 17:40:38.684 INFO processor.TesserocrRecognize - Processing page 'p0001'#015
2024-02-23T17:40:40.012524426Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.552 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0001.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0001.IMG-BIN.png#015
2024-02-23T17:40:40.012542302Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.556 INFO processor.TesserocrRecognize - Detected region 'region0000': 732,859 782,859 782,877 732,877 (CAPTION_TEXT)#015
2024-02-23T17:40:40.012545869Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.609 INFO processor.TesserocrRecognize - Detected line 'region0000_line0000': 732,859 782,859 782,877 732,877#015
2024-02-23T17:40:40.012548271Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.612 INFO processor.TesserocrRecognize - Detected region 'region0001': 466,949 1063,949 1063,1028 466,1028 (CAPTION_TEXT)#015
2024-02-23T17:40:40.012550513Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.612 INFO processor.TesserocrRecognize - Detected line 'region0001_line0000': 466,949 1063,949 1063,1028 466,1028#015
2024-02-23T17:40:40.012552749Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.709 INFO processor.TesserocrRecognize - Detected region 'region0002': 357,1126 1166,1126 1166,1179 357,1179 (FLOWING_TEXT)#015
2024-02-23T17:40:40.012554943Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.711 INFO processor.TesserocrRecognize - Detected line 'region0002_line0000': 357,1126 1166,1126 1166,1179 357,1179#015
2024-02-23T17:40:40.012557066Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.715 INFO processor.TesserocrRecognize - Detected region 'region0003': 494,1564 1039,1564 1039,1768 494,1768 (FLOWING_TEXT)#015
2024-02-23T17:40:40.012559149Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.716 INFO processor.TesserocrRecognize - Detected line 'region0003_line0000': 683,1564 838,1564 838,1607 683,1607#015
2024-02-23T17:40:40.012561314Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.716 INFO processor.TesserocrRecognize - Detected line 'region0003_line0001': 494,1635 1039,1635 1039,1688 494,1688#015
2024-02-23T17:40:40.012563557Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.718 INFO processor.TesserocrRecognize - Detected line 'region0003_line0002': 648,1715 863,1715 863,1768 648,1768#015
2024-02-23T17:40:40.012567962Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.728 INFO processor.TesserocrRecognize - Detected region 'region0004': 254,0 1564,0 1564,2280 254,2280 (PULLOUT_IMAGE)#015
2024-02-23T17:40:40.012570649Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.729 INFO processor.TesserocrRecognize - INPUT FILE 1 / p0002#015
2024-02-23T17:40:40.012573026Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.893 INFO processor.TesserocrRecognize - Page 'p0002' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:40.012575397Z Feb 23 17:40:39 ocrd-manager process_images.sh: 17:40:39.893 INFO processor.TesserocrRecognize - Processing page 'p0002'#015
2024-02-23T17:40:41.012739038Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.780 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0002.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0002.IMG-BIN.png#015
2024-02-23T17:40:41.012757514Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.782 INFO processor.TesserocrRecognize - Detected region 'region0000': 0,0 122,0 122,2280 0,2280 (FLOWING_IMAGE)#015
2024-02-23T17:40:41.012772160Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.810 INFO processor.TesserocrRecognize - Detected region 'region0001': 334,895 1305,895 1305,1164 334,1164 (FLOWING_TEXT)#015
2024-02-23T17:40:41.012776075Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.810 INFO processor.TesserocrRecognize - Detected line 'region0001_line0000': 334,895 1305,895 1305,980 334,980#015
2024-02-23T17:40:41.012779078Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.911 INFO processor.TesserocrRecognize - Detected line 'region0001_line0001': 335,992 1303,992 1303,1038 335,1038#015
2024-02-23T17:40:41.012782217Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.913 INFO processor.TesserocrRecognize - Detected line 'region0001_line0002': 335,1055 1304,1055 1304,1101 335,1101#015
2024-02-23T17:40:41.012785069Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.914 INFO processor.TesserocrRecognize - Detected line 'region0001_line0003': 539,1119 1102,1119 1102,1164 539,1164#015
2024-02-23T17:40:41.012787904Z Feb 23 17:40:40 ocrd-manager process_images.sh: 17:40:40.916 INFO processor.TesserocrRecognize - INPUT FILE 2 / p0003#015
2024-02-23T17:40:42.012921896Z Feb 23 17:40:41 ocrd-manager process_images.sh: 17:40:41.040 INFO processor.TesserocrRecognize - Page 'p0003' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:42.012944639Z Feb 23 17:40:41 ocrd-manager process_images.sh: 17:40:41.041 INFO processor.TesserocrRecognize - Processing page 'p0003'#015
2024-02-23T17:40:43.012950988Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.426 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0003.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0003.IMG-BIN.png#015
2024-02-23T17:40:43.012981836Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.428 INFO processor.TesserocrRecognize - Detected region 'region0000': 507,58 581,58 581,88 507,88 (FLOWING_IMAGE)#015
2024-02-23T17:40:43.012986766Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.429 INFO processor.TesserocrRecognize - Detected region 'region0001': 678,267 1242,267 1242,302 678,302 (FLOWING_TEXT)#015
2024-02-23T17:40:43.012989760Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.429 INFO processor.TesserocrRecognize - Detected line 'region0001_line0000': 678,267 1242,267 1242,302 678,302#015
2024-02-23T17:40:43.012992612Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.431 INFO processor.TesserocrRecognize - Detected region 'region0002': 275,304 1243,304 1243,315 275,315 (HORZ_LINE)#015
2024-02-23T17:40:43.012995106Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.431 INFO processor.TesserocrRecognize - Detected region 'region0003': 621,671 889,671 889,724 621,724 (FLOWING_TEXT)#015
2024-02-23T17:40:43.012997686Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.432 INFO processor.TesserocrRecognize - Detected line 'region0003_line0000': 621,671 889,671 889,724 621,724#015
2024-02-23T17:40:43.013000353Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.509 INFO processor.TesserocrRecognize - Detected region 'region0004': 268,798 1244,798 1244,1861 268,1861 (FLOWING_TEXT)#015
2024-02-23T17:40:43.013016480Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.511 INFO processor.TesserocrRecognize - Detected line 'region0004_line0000': 276,798 1244,798 1244,875 276,875#015
2024-02-23T17:40:43.013020511Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.615 INFO processor.TesserocrRecognize - Detected line 'region0004_line0001': 273,892 1241,892 1241,938 273,938#015
2024-02-23T17:40:43.013023129Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.619 INFO processor.TesserocrRecognize - Detected line 'region0004_line0002': 272,954 1241,954 1241,999 272,999#015
2024-02-23T17:40:43.013027151Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.621 INFO processor.TesserocrRecognize - Detected line 'region0004_line0003': 272,1016 1238,1016 1238,1061 272,1061#015
2024-02-23T17:40:43.013029840Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.623 INFO processor.TesserocrRecognize - Detected line 'region0004_line0004': 272,1078 1238,1078 1238,1123 272,1123#015
2024-02-23T17:40:43.013033133Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.624 INFO processor.TesserocrRecognize - Detected line 'region0004_line0005': 272,1140 1237,1140 1237,1185 272,1185#015
2024-02-23T17:40:43.013035616Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.626 INFO processor.TesserocrRecognize - Detected line 'region0004_line0006': 271,1201 1237,1201 1237,1248 271,1248#015
2024-02-23T17:40:43.013038394Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.627 INFO processor.TesserocrRecognize - Detected line 'region0004_line0007': 272,1263 1238,1263 1238,1306 272,1306#015
2024-02-23T17:40:43.013040825Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.629 INFO processor.TesserocrRecognize - Detected line 'region0004_line0008': 271,1324 1236,1324 1236,1373 271,1373#015
2024-02-23T17:40:43.013043382Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.631 INFO processor.TesserocrRecognize - Detected line 'region0004_line0009': 271,1387 1236,1387 1236,1434 271,1434#015
2024-02-23T17:40:43.013046272Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.633 INFO processor.TesserocrRecognize - Detected line 'region0004_line0010': 269,1448 1236,1448 1236,1493 269,1493#015
2024-02-23T17:40:43.013048751Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.635 INFO processor.TesserocrRecognize - Detected line 'region0004_line0011': 270,1511 1235,1511 1235,1557 270,1557#015
2024-02-23T17:40:43.013051506Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.710 INFO processor.TesserocrRecognize - Detected line 'region0004_line0012': 269,1573 1236,1573 1236,1621 269,1621#015
2024-02-23T17:40:43.013053915Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.716 INFO processor.TesserocrRecognize - Detected line 'region0004_line0013': 268,1636 1236,1636 1236,1683 268,1683#015
2024-02-23T17:40:43.013056462Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.724 INFO processor.TesserocrRecognize - Detected line 'region0004_line0014': 268,1697 1236,1697 1236,1743 268,1743#015
2024-02-23T17:40:43.013059302Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.738 INFO processor.TesserocrRecognize - Detected line 'region0004_line0015': 268,1758 1234,1758 1234,1807 268,1807#015
2024-02-23T17:40:43.013061957Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.740 INFO processor.TesserocrRecognize - Detected line 'region0004_line0016': 268,1820 401,1820 401,1861 268,1861#015
2024-02-23T17:40:43.013068061Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.811 INFO processor.TesserocrRecognize - Detected region 'region0005': 1468,0 1564,0 1564,2280 1468,2280 (FLOWING_IMAGE)#015
2024-02-23T17:40:43.013071489Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.814 INFO processor.TesserocrRecognize - INPUT FILE 3 / p0004#015
2024-02-23T17:40:43.013074227Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.908 INFO processor.TesserocrRecognize - Page 'p0004' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:43.013076939Z Feb 23 17:40:42 ocrd-manager process_images.sh: 17:40:42.908 INFO processor.TesserocrRecognize - Processing page 'p0004'#015
2024-02-23T17:40:45.013527572Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.693 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0004.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0004.IMG-BIN.png#015
2024-02-23T17:40:45.013597856Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.696 INFO processor.TesserocrRecognize - Detected region 'region0000': 0,0 154,0 154,2280 0,2280 (FLOWING_IMAGE)#015
2024-02-23T17:40:45.013608608Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.706 INFO processor.TesserocrRecognize - Detected region 'region0001': 333,254 913,254 913,289 333,289 (FLOWING_TEXT)#015
2024-02-23T17:40:45.013617840Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.708 INFO processor.TesserocrRecognize - Detected line 'region0001_line0000': 333,254 913,254 913,289 333,289#015
2024-02-23T17:40:45.013625319Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.810 INFO processor.TesserocrRecognize - Detected region 'region0002': 336,286 1305,286 1305,304 336,304 (HORZ_LINE)#015
2024-02-23T17:40:45.013632199Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.813 INFO processor.TesserocrRecognize - Detected region 'region0003': 334,358 1316,358 1316,1103 334,1103 (FLOWING_TEXT)#015
2024-02-23T17:40:45.013639158Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.813 INFO processor.TesserocrRecognize - Detected line 'region0003_line0000': 416,358 1307,358 1307,411 416,411#015
2024-02-23T17:40:45.013646455Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.815 INFO processor.TesserocrRecognize - Detected line 'region0003_line0001': 335,421 1309,421 1309,477 335,477#015
2024-02-23T17:40:45.013653297Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.817 INFO processor.TesserocrRecognize - Detected line 'region0003_line0002': 334,485 1309,485 1309,539 334,539#015
2024-02-23T17:40:45.013659974Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.818 INFO processor.TesserocrRecognize - Detected line 'region0003_line0003': 337,548 1310,548 1310,601 337,601#015
2024-02-23T17:40:45.013666774Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.820 INFO processor.TesserocrRecognize - Detected line 'region0003_line0004': 337,612 1312,612 1312,662 337,662#015
2024-02-23T17:40:45.013676602Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.821 INFO processor.TesserocrRecognize - Detected line 'region0003_line0005': 337,672 1312,672 1312,721 337,721#015
2024-02-23T17:40:45.013684516Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.823 INFO processor.TesserocrRecognize - Detected line 'region0003_line0006': 338,735 1312,735 1312,790 338,790#015
2024-02-23T17:40:45.013727261Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.825 INFO processor.TesserocrRecognize - Detected line 'region0003_line0007': 340,798 1314,798 1314,853 340,853#015
2024-02-23T17:40:45.013736629Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.827 INFO processor.TesserocrRecognize - Detected line 'region0003_line0008': 340,858 1313,858 1313,916 340,916#015
2024-02-23T17:40:45.013744291Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.829 INFO processor.TesserocrRecognize - Detected line 'region0003_line0009': 341,924 1314,924 1314,971 341,971#015
2024-02-23T17:40:45.013751883Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.830 INFO processor.TesserocrRecognize - Detected line 'region0003_line0010': 342,986 1316,986 1316,1041 342,1041#015
2024-02-23T17:40:45.013759317Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.832 INFO processor.TesserocrRecognize - Detected line 'region0003_line0011': 345,1051 1252,1051 1252,1103 345,1103#015
2024-02-23T17:40:45.013768181Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.835 INFO processor.TesserocrRecognize - Detected region 'region0004': 347,1121 1326,1121 1326,1862 347,1862 (FLOWING_TEXT)#015
2024-02-23T17:40:45.013778933Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.835 INFO processor.TesserocrRecognize - Detected line 'region0004_line0000': 429,1121 1316,1121 1316,1176 429,1176#015
2024-02-23T17:40:45.013786523Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.911 INFO processor.TesserocrRecognize - Detected line 'region0004_line0001': 347,1184 1316,1184 1316,1241 347,1241#015
2024-02-23T17:40:45.013794362Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.913 INFO processor.TesserocrRecognize - Detected line 'region0004_line0002': 347,1248 1318,1248 1318,1303 347,1303#015
2024-02-23T17:40:45.013801771Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.915 INFO processor.TesserocrRecognize - Detected line 'region0004_line0003': 347,1305 1320,1305 1320,1363 347,1363#015
2024-02-23T17:40:45.013812482Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.917 INFO processor.TesserocrRecognize - Detected line 'region0004_line0004': 348,1375 1319,1375 1319,1428 348,1428#015
2024-02-23T17:40:45.013824992Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.919 INFO processor.TesserocrRecognize - Detected line 'region0004_line0005': 350,1434 1321,1434 1321,1491 350,1491#015
2024-02-23T17:40:45.013834200Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.920 INFO processor.TesserocrRecognize - Detected line 'region0004_line0006': 350,1498 1322,1498 1322,1553 350,1553#015
2024-02-23T17:40:45.013844309Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.922 INFO processor.TesserocrRecognize - Detected line 'region0004_line0007': 351,1561 1322,1561 1322,1616 351,1616#015
2024-02-23T17:40:45.013855983Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.924 INFO processor.TesserocrRecognize - Detected line 'region0004_line0008': 352,1623 1324,1623 1324,1673 352,1673#015
2024-02-23T17:40:45.013863636Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.926 INFO processor.TesserocrRecognize - Detected line 'region0004_line0009': 352,1685 1323,1685 1323,1736 352,1736#015
2024-02-23T17:40:45.013871353Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.928 INFO processor.TesserocrRecognize - Detected line 'region0004_line0010': 352,1747 1325,1747 1325,1798 352,1798#015
2024-02-23T17:40:45.013893445Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.930 INFO processor.TesserocrRecognize - Detected line 'region0004_line0011': 354,1810 1326,1810 1326,1862 354,1862#015
2024-02-23T17:40:45.013902608Z Feb 23 17:40:44 ocrd-manager process_images.sh: 17:40:44.935 INFO processor.TesserocrRecognize - INPUT FILE 4 / p0005#015
2024-02-23T17:40:46.013598583Z Feb 23 17:40:45 ocrd-manager process_images.sh: 17:40:45.109 INFO processor.TesserocrRecognize - Page 'p0005' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:46.013619323Z Feb 23 17:40:45 ocrd-manager process_images.sh: 17:40:45.109 INFO processor.TesserocrRecognize - Processing page 'p0005'#015
2024-02-23T17:40:47.013695887Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.395 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0005.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0005.IMG-BIN.png#015
2024-02-23T17:40:47.013723303Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.396 INFO processor.TesserocrRecognize - Detected region 'region0000': 657,290 1234,290 1234,324 657,324 (FLOWING_TEXT)#015
2024-02-23T17:40:47.013727050Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.406 INFO processor.TesserocrRecognize - Detected line 'region0000_line0000': 657,290 1234,290 1234,324 657,324#015
2024-02-23T17:40:47.013729857Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.407 INFO processor.TesserocrRecognize - Detected region 'region0001': 269,324 1235,324 1235,338 269,338 (HORZ_LINE)#015
2024-02-23T17:40:47.013732659Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.506 INFO processor.TesserocrRecognize - Detected region 'region0002': 263,396 1234,396 1234,1259 263,1259 (FLOWING_TEXT)#015
2024-02-23T17:40:47.013735126Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.508 INFO processor.TesserocrRecognize - Detected line 'region0002_line0000': 267,396 1234,396 1234,446 267,446#015
2024-02-23T17:40:47.013737653Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.512 INFO processor.TesserocrRecognize - Detected line 'region0002_line0001': 264,459 1234,459 1234,509 264,509#015
2024-02-23T17:40:47.013740472Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.513 INFO processor.TesserocrRecognize - Detected line 'region0002_line0002': 265,521 1231,521 1231,571 265,571#015
2024-02-23T17:40:47.013743477Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.514 INFO processor.TesserocrRecognize - Detected line 'region0002_line0003': 264,583 1232,583 1232,633 264,633#015
2024-02-23T17:40:47.013745955Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.516 INFO processor.TesserocrRecognize - Detected line 'region0002_line0004': 264,646 1232,646 1232,693 264,693#015
2024-02-23T17:40:47.013748533Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.517 INFO processor.TesserocrRecognize - Detected line 'region0002_line0005': 264,708 1232,708 1232,755 264,755#015
2024-02-23T17:40:47.013752841Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.519 INFO processor.TesserocrRecognize - Detected line 'region0002_line0006': 263,770 1232,770 1232,820 263,820#015
2024-02-23T17:40:47.013770292Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.521 INFO processor.TesserocrRecognize - Detected line 'region0002_line0007': 263,832 873,832 873,873 263,873#015
2024-02-23T17:40:47.013774253Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.522 INFO processor.TesserocrRecognize - Detected line 'region0002_line0008': 346,904 1229,904 1229,951 346,951#015
2024-02-23T17:40:47.013776728Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.524 INFO processor.TesserocrRecognize - Detected line 'region0002_line0009': 264,965 1228,965 1228,1016 264,1016#015
2024-02-23T17:40:47.013779259Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.525 INFO processor.TesserocrRecognize - Detected line 'region0002_line0010': 264,1027 1226,1027 1226,1083 264,1083#015
2024-02-23T17:40:47.013781824Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.527 INFO processor.TesserocrRecognize - Detected line 'region0002_line0011': 263,1090 1228,1090 1228,1141 263,1141#015
2024-02-23T17:40:47.013784167Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.528 INFO processor.TesserocrRecognize - Detected line 'region0002_line0012': 264,1151 1226,1151 1226,1200 264,1200#015
2024-02-23T17:40:47.013786625Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.530 INFO processor.TesserocrRecognize - Detected line 'region0002_line0013': 265,1212 1058,1212 1058,1259 265,1259#015
2024-02-23T17:40:47.013789009Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.531 INFO processor.TesserocrRecognize - Detected region 'region0003': 748,1333 1176,1333 1176,1384 748,1384 (FLOWING_TEXT)#015
2024-02-23T17:40:47.013791509Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.532 INFO processor.TesserocrRecognize - Detected line 'region0003_line0000': 748,1333 1176,1333 1176,1384 748,1384#015
2024-02-23T17:40:47.013793926Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.533 INFO processor.TesserocrRecognize - Detected region 'region0004': 1478,0 1564,0 1564,2280 1478,2280 (FLOWING_IMAGE)#015
2024-02-23T17:40:47.013796491Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.611 INFO processor.TesserocrRecognize - INPUT FILE 5 / p0006#015
2024-02-23T17:40:47.013799230Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.720 INFO processor.TesserocrRecognize - Page 'p0006' images will use 300 DPI from image meta-data#015
2024-02-23T17:40:47.013802047Z Feb 23 17:40:46 ocrd-manager process_images.sh: 17:40:46.720 INFO processor.TesserocrRecognize - Processing page 'p0006'#015
2024-02-23T17:40:49.014288903Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.013 INFO ocrd.workspace.save_image_file - created file ID: OCR-D-OCR_p0006.IMG-BIN, file_grp: OCR-D-OCR, path: OCR-D-OCR/OCR-D-OCR_p0006.IMG-BIN.png#015
2024-02-23T17:40:49.014352253Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.016 INFO processor.TesserocrRecognize - Detected region 'region0000': 346,275 964,275 964,322 346,322 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014362689Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.016 INFO processor.TesserocrRecognize - Detected line 'region0000_line0000': 346,275 964,275 964,322 346,322#015
2024-02-23T17:40:49.014369898Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.018 INFO processor.TesserocrRecognize - Detected region 'region0001': 344,306 1313,306 1313,328 344,328 (HORZ_LINE)#015
2024-02-23T17:40:49.014403959Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.019 INFO processor.TesserocrRecognize - Detected region 'region0002': 634,451 1030,451 1030,511 634,511 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014412236Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.019 INFO processor.TesserocrRecognize - Detected line 'region0002_line0000': 634,451 1030,451 1030,511 634,511#015
2024-02-23T17:40:49.014418865Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.022 INFO processor.TesserocrRecognize - Detected region 'region0003': 0,0 158,0 158,2280 0,2280 (FLOWING_IMAGE)#015
2024-02-23T17:40:49.014425152Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.023 INFO processor.TesserocrRecognize - Detected region 'region0004': 350,618 480,618 480,646 350,646 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014431469Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.023 INFO processor.TesserocrRecognize - Detected line 'region0004_line0000': 350,618 480,618 480,646 350,646#015
2024-02-23T17:40:49.014437539Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.024 INFO processor.TesserocrRecognize - Detected region 'region0005': 352,630 890,630 890,707 352,707 (HEADING_TEXT)#015
2024-02-23T17:40:49.014443711Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.106 INFO processor.TesserocrRecognize - Detected line 'region0005_line0000': 352,630 890,630 890,707 352,707#015
2024-02-23T17:40:49.014452999Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.108 INFO processor.TesserocrRecognize - Detected region 'region0006': 353,712 701,712 701,795 353,795 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014459511Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.109 INFO processor.TesserocrRecognize - Detected line 'region0006_line0000': 353,712 627,712 627,749 353,749#015
2024-02-23T17:40:49.014465865Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.111 INFO processor.TesserocrRecognize - Detected line 'region0006_line0001': 354,756 701,756 701,795 354,795#015
2024-02-23T17:40:49.014472412Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.209 INFO processor.TesserocrRecognize - Detected region 'region0007': 355,800 762,800 762,837 355,837 (HEADING_TEXT)#015
2024-02-23T17:40:49.014478546Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.211 INFO processor.TesserocrRecognize - Detected line 'region0007_line0000': 355,800 762,800 762,837 355,837#015
2024-02-23T17:40:49.014484692Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.213 INFO processor.TesserocrRecognize - Detected region 'region0008': 355,847 717,847 717,884 355,884 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014490851Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.214 INFO processor.TesserocrRecognize - Detected line 'region0008_line0000': 355,847 717,847 717,884 355,884#015
2024-02-23T17:40:49.014496937Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.216 INFO processor.TesserocrRecognize - Detected region 'region0009': 356,860 1028,860 1028,928 356,928 (PULLOUT_TEXT)#015
2024-02-23T17:40:49.014503181Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.216 INFO processor.TesserocrRecognize - Detected line 'region0009_line0000': 356,860 1028,860 1028,928 356,928#015
2024-02-23T17:40:49.014509055Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.220 INFO processor.TesserocrRecognize - Detected region 'region0010': 357,939 711,939 711,1200 357,1200 (FLOWING_TEXT)#015
2024-02-23T17:40:49.014524775Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.220 INFO processor.TesserocrRecognize - Detected line 'region0010_line0000': 357,939 711,939 711,974 357,974#015
2024-02-23T17:40:49.014531988Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.221 INFO processor.TesserocrRecognize - Detected line 'region0010_line0001': 357,980 701,980 701,1017 357,1017#015
2024-02-23T17:40:49.014539639Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.222 INFO processor.TesserocrRecognize - Detected line 'region0010_line0002': 359,1030 670,1030 670,1065 359,1065#015
2024-02-23T17:40:49.014546520Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.223 INFO processor.TesserocrRecognize - Detected line 'region0010_line0003': 358,1076 626,1076 626,1110 358,1110#015
2024-02-23T17:40:49.014553582Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.224 INFO processor.TesserocrRecognize - Detected line 'region0010_line0004': 360,1118 624,1118 624,1155 360,1155#015
2024-02-23T17:40:49.015124218Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.226 INFO processor.TesserocrRecognize - Detected line 'region0010_line0005': 362,1161 663,1161 663,1200 362,1200#015
2024-02-23T17:40:49.015221514Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.228 INFO processor.TesserocrRecognize - Detected region 'region0011': 362,1205 762,1205 762,1244 362,1244 (HEADING_TEXT)#015
2024-02-23T17:40:49.015233540Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.228 INFO processor.TesserocrRecognize - Detected line 'region0011_line0000': 362,1205 762,1205 762,1244 362,1244#015
2024-02-23T17:40:49.015241804Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.230 INFO processor.TesserocrRecognize - Detected region 'region0012': 363,1254 723,1254 723,1468 363,1468 (FLOWING_TEXT)#015
2024-02-23T17:40:49.015248957Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.231 INFO processor.TesserocrRecognize - Detected line 'region0012_line0000': 363,1254 720,1254 720,1287 363,1287#015
2024-02-23T17:40:49.015255630Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.232 INFO processor.TesserocrRecognize - Detected line 'region0012_line0001': 364,1296 708,1296 708,1335 364,1335#015
2024-02-23T17:40:49.015262127Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.233 INFO processor.TesserocrRecognize - Detected line 'region0012_line0002': 364,1340 720,1340 720,1377 364,1377#015
2024-02-23T17:40:49.015268812Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.234 INFO processor.TesserocrRecognize - Detected line 'region0012_line0003': 364,1389 720,1389 720,1417 364,1417#015
2024-02-23T17:40:49.015275032Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.307 INFO processor.TesserocrRecognize - Detected line 'region0012_line0004': 366,1432 723,1432 723,1468 366,1468#015
2024-02-23T17:40:49.015281370Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.312 INFO processor.TesserocrRecognize - Detected region 'region0013': 367,1447 949,1447 949,1557 367,1557 (HEADING_TEXT)#015
2024-02-23T17:40:49.015287655Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.313 INFO processor.TesserocrRecognize - Detected line 'region0013_line0000': 368,1447 903,1447 903,1513 368,1513#015
2024-02-23T17:40:49.015294050Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.315 INFO processor.TesserocrRecognize - Detected line 'region0013_line0001': 367,1518 949,1518 949,1557 367,1557#015
2024-02-23T17:40:49.015316774Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.317 INFO processor.TesserocrRecognize - Detected region 'region0014': 368,1569 516,1569 516,1596 368,1596 (FLOWING_TEXT)#015
2024-02-23T17:40:49.015328283Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.318 INFO processor.TesserocrRecognize - Detected line 'region0014_line0000': 368,1569 516,1569 516,1596 368,1596#015
2024-02-23T17:40:49.015334864Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.319 INFO processor.TesserocrRecognize - Detected region 'region0015': 479,1604 1027,1604 1027,1689 479,1689 (PULLOUT_TEXT)#015
2024-02-23T17:40:49.015340947Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.319 INFO processor.TesserocrRecognize - Detected line 'region0015_line0000': 514,1604 1026,1604 1026,1640 514,1640#015
2024-02-23T17:40:49.015346918Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.322 INFO processor.TesserocrRecognize - Detected line 'region0015_line0001': 479,1649 1027,1649 1027,1689 479,1689#015
2024-02-23T17:40:49.015353148Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.325 INFO processor.TesserocrRecognize - Detected region 'region0016': 749,1768 965,1768 965,1822 749,1822 (FLOWING_TEXT)#015
2024-02-23T17:40:49.015359290Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.326 INFO processor.TesserocrRecognize - Detected line 'region0016_line0000': 749,1768 965,1768 965,1822 749,1822#015
2024-02-23T17:40:49.015368842Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.795 INFO ocrd.task_sequence.run_tasks - Finished processing task 'tesserocr-recognize -I OCR-D-IMG -O OCR-D-OCR -p '{"segmentation_level": "region", "model": "frak2021", "dpi": 0, "padding": 0, "textequiv_level": "word", "overwrite_segments": false, "overwrite_text": true, "shrink_polygons": false, "block_polygons": false, "find_tables": true, "find_staves": false, "sparse_text": false, "raw_lines": false, "char_whitelist": "", "char_blacklist": "", "char_unblacklist": "", "tesseract_parameters": {}, "xpath_parameters": {}, "xpath_model": {}, "auto_model": false, "oem": "DEFAULT"}''#015
2024-02-23T17:40:49.015384562Z Feb 23 17:40:48 ocrd-manager process_images.sh: 17:40:48.795 INFO ocrd.task_sequence.run_tasks - Start processing task 'fileformat-transform -I OCR-D-OCR -O FULLTEXT -p '{"from-to": "page alto", "script-args": "--no-check-border --dummy-word", "ext": ""}''#015
2024-02-23T17:41:44.021769800Z Feb 23 17:41:43 ocrd-manager process_images.sh: 17:41:43.028 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0004 (p0004)#015
2024-02-23T17:41:45.021765996Z Feb 23 17:41:44 ocrd-manager process_images.sh: 17:41:44.924 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0002 (p0002)#015
2024-02-23T17:41:46.021892392Z Feb 23 17:41:45 ocrd-manager process_images.sh: 17:41:45.919 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0001 (p0001)#015
2024-02-23T17:41:47.022207668Z Feb 23 17:41:46 ocrd-manager process_images.sh: 17:41:46.819 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0003 (p0003)#015
2024-02-23T17:41:59.023698340Z Feb 23 17:41:58 ocrd-manager process_images.sh: 17:41:58.821 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:01.023873329Z Feb 23 17:42:00 ocrd-manager process_images.sh: 17:42:00.113 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:01.023905136Z Feb 23 17:42:00 ocrd-manager process_images.sh: 17:42:00.921 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:02.023992883Z Feb 23 17:42:01 ocrd-manager process_images.sh: 17:42:01.530 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:15.025091442Z Feb 23 17:42:14 ocrd-manager process_images.sh: 17:42:14.417 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0001.xml FULLTEXT/FULLTEXT_p0001.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:15.025116538Z Feb 23 17:42:14 ocrd-manager process_images.sh: 17:42:14.922 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0004.xml FULLTEXT/FULLTEXT_p0004.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:16.025250517Z Feb 23 17:42:16 ocrd-manager process_images.sh: 17:42:16.022 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0002.xml FULLTEXT/FULLTEXT_p0002.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:17.025245182Z Feb 23 17:42:16 ocrd-manager process_images.sh: 17:42:16.616 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0003.xml FULLTEXT/FULLTEXT_p0003.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:28.027309542Z Feb 23 17:42:27 ocrd-manager process_images.sh: 17:42:27.638 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0006 (p0006)#015
2024-02-23T17:42:28.027335273Z Feb 23 17:42:27 ocrd-manager process_images.sh: 17:42:27.823 INFO ocrd-fileformat-transform - page --> alto: input file OCR-D-OCR_p0005 (p0005)#015
2024-02-23T17:42:33.027851074Z Feb 23 17:42:32 ocrd-manager process_images.sh: 17:42:32.907 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:33.027888757Z Feb 23 17:42:32 ocrd-manager process_images.sh: 17:42:32.933 WARNING page-to-alto - PAGE-XML has neither Border nor PrintSpace - PrintSpace will fill the image#015
2024-02-23T17:42:39.028697937Z Feb 23 17:42:38 ocrd-manager process_images.sh: 17:42:38.108 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0005.xml FULLTEXT/FULLTEXT_p0005.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:39.028721437Z Feb 23 17:42:38 ocrd-manager process_images.sh: 17:42:38.117 INFO ocrd-fileformat-transform - Successfully executed: ocr-transform page alto OCR-D-OCR/OCR-D-OCR_p0006.xml FULLTEXT/FULLTEXT_p0006.xml -- --no-check-border --dummy-word#015
2024-02-23T17:42:41.029255211Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.749 INFO ocrd.cli.workspace.bulk-add - [   1/6] p0001 FULLTEXT_p0001 FULLTEXT/FULLTEXT_p0001.xml#015
2024-02-23T17:42:41.029284932Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.750 INFO ocrd.cli.workspace.bulk-add - [   2/6] p0002 FULLTEXT_p0002 FULLTEXT/FULLTEXT_p0002.xml#015
2024-02-23T17:42:41.029288151Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.750 INFO ocrd.cli.workspace.bulk-add - [   3/6] p0003 FULLTEXT_p0003 FULLTEXT/FULLTEXT_p0003.xml#015
2024-02-23T17:42:41.029304665Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.750 INFO ocrd.cli.workspace.bulk-add - [   4/6] p0004 FULLTEXT_p0004 FULLTEXT/FULLTEXT_p0004.xml#015
2024-02-23T17:42:41.029307567Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.751 INFO ocrd.cli.workspace.bulk-add - [   5/6] p0005 FULLTEXT_p0005 FULLTEXT/FULLTEXT_p0005.xml#015
2024-02-23T17:42:41.029310078Z Feb 23 17:42:40 ocrd-manager process_images.sh: 17:42:40.751 INFO ocrd.cli.workspace.bulk-add - [   6/6] p0006 FULLTEXT_p0006 FULLTEXT/FULLTEXT_p0006.xml#015
2024-02-23T17:42:42.030070403Z Feb 23 17:42:41 ocrd-manager process_images.sh: 17:42:41.068 INFO ocrd.task_sequence.run_tasks - Finished processing task 'fileformat-transform -I OCR-D-OCR -O FULLTEXT -p '{"from-to": "page alto", "script-args": "--no-check-border --dummy-word", "ext": ""}''#015
2024-02-23T17:42:42.030173322Z Feb 23 17:42:41 ocrd-manager process_images.sh: 17:42:41.069 INFO ocrd.cli.process - Finished#015

document HTTP interface

So far, we only document the SSH/CLI calls, but our HTTP server offers a fully transparent wrapper around them, so the README and the MkDocs documentation should reflect that.

SSH to Manager does not get the exit status so Kitodo.Production script does not run asynchronously

Currently the exit status is not returned and therefore the script in Kitodo.Production does not run asynchronously anymore.
I have checked several possible causes. The changed alias of the for_production script does not seem to play a role.

When I remove the first pipe with the tee command, it works fine. So I think detaching no longer works correctly.
https://github.com/slub/ocrd_manager/blob/main/process_images.sh#L101

Maybe we have to wrap the processing in a separate inner subshell that uses tee to stream to ocrd.log, while the outer shell only runs the last command, which detaches from that subshell.
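The inner/outer subshell idea can be sketched as follows. This is a hypothetical helper, not the actual code in process_images.sh. An SSH session typically stays open as long as some process still holds its stdout/stderr, so the background part must redirect all three standard streams away from the session, while tee inside that subshell still writes to the log. A command substitution `$(...)` waits for EOF in the same way, which makes the behavior easy to check locally.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not taken from process_images.sh):
# run_job LOGFILE ASYNC COMMAND [ARGS...]
run_job() {
    local logfile=$1 async=$2
    shift 2
    if [ "$async" = 1 ]; then
        # Inner subshell: tee streams the job's output into the log file.
        # Outer redirections: the subshell keeps no copy of the caller's
        # stdin/stdout/stderr, so an SSH channel (or a $(...) capture)
        # is not held open and can close right away.
        ( "$@" 2>&1 | tee -a "$logfile" ) </dev/null >/dev/null 2>&1 &
        echo "$!"   # PID of the detached subshell, e.g. for ocrd.pid
    else
        # Synchronous: stream to the caller and to the log at the same time,
        # then return the job's own exit status (not tee's).
        "$@" 2>&1 | tee -a "$logfile"
        return "${PIPESTATUS[0]}"
    fi
}
```

In synchronous mode, returning `${PIPESTATUS[0]}` also passes the real exit status back to the SSH client, which tee would otherwise mask.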

Some questions on folder structure

Hi,

I was able to test your setup with a (non-Docker) installation of Kitodo; great work so far. I made some local adjustments to make it work for me, and while doing so, a few questions arose.

  1. I tried to reduce the number of copy operations, which, if I am not mistaken, currently are:
  • copy the images from the Kitodo process folder to the "WORKDIR", which is located on the Manager server
  • copy the images from the "WORKDIR" to the "REMOTE_DIR" on the processing server
  • after OCR is done, copy all the OCR data back to the "WORKDIR"
  • copy the OCR results (ALTO) from the "WORKDIR" to the Kitodo process folder

What is the rationale behind the "WORKDIR", for example, and why does the data have to be copied so many times? I reduced the number of copy operations by using shared volumes between the servers, e.g. by copying directly from the process folder to the remote folder, but I would like to be sure that I am not violating some deeper architectural ideas here.

  2. I am currently running ocrd_manager standalone. For that, I run docker compose up separately for the ocrd_manager and the ocrd_monitor components. Presumably, the idea is that both services use a shared volume to store job data:

https://github.com/markusweigelt/ocrd_manager/blob/317de6b17e6f1701ea2f6d1bda16277d9eaaf24a/docker-compose.yml#L40-L41

https://github.com/markusweigelt/ocrd_manager/blob/317de6b17e6f1701ea2f6d1bda16277d9eaaf24a/docker-compose.yml#L36

But right now I have two folders in /var/lib/docker/volumes, named ocrd_manager_shared and ocrd_monitor_shared. What can I do so that both services actually use the same shared folder?

Thanks a lot for the support.
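Docker Compose prefixes each volume with the project name (by default, the name of the directory containing docker-compose.yml), which would explain the two separate volumes. A minimal sketch of one way around this, assuming the volume key in both compose files is `shared` (inferred from the volume names above) and using the hypothetical name `ocrd_shared`: create the volume once, then declare it as external with a fixed name in a docker-compose.override.yml next to each docker-compose.yml, which Compose reads by default.

```bash
#!/usr/bin/env bash
# Sketch: make a Compose project use one pre-created, explicitly named volume.
# Assumptions: the volume key is "shared"; "ocrd_shared" is a hypothetical name.
write_shared_volume_override() {
    local project_dir=$1 volume_name=${2:-ocrd_shared}
    # "external: true": Compose neither creates nor prefixes the volume;
    # "name" is used verbatim instead of "<project>_shared".
    cat > "$project_dir/docker-compose.override.yml" <<EOF
volumes:
  shared:
    external: true
    name: $volume_name
EOF
}

# Usage (requires Docker):
#   docker volume create ocrd_shared
#   write_shared_volume_override path/to/ocrd_manager
#   write_shared_volume_override path/to/ocrd_monitor
```

This overwrites an existing override file, so merging by hand may be preferable if one is already in use.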

add alternative callback mechanism besides ActiveMQ

To run asynchronously, we currently rely on ActiveMQ exclusively. It is well integrated with Kitodo.Production, but may not be the first choice for other use cases (like DFG Viewer / OCR-on-demand). So perhaps the most versatile option would be to allow passing a simple HTTP URL, to which a POST request is sent on termination?

(This could apply to both the SSH interface and the additional HTTP interface in #63.)
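A minimal sketch of such a callback, assuming a hypothetical callback URL argument and a hypothetical JSON payload carrying the process ID and the exit status; curl sends the POST request and reads the body from stdin.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an HTTP callback on termination.
# callback_payload ID STATUS: build a small JSON body. ID is not JSON-escaped,
# so it should be a plain identifier such as a Kitodo process ID.
callback_payload() {
    printf '{"process_id": "%s", "status": %d}\n' "$1" "$2"
}

# notify_callback URL ID STATUS: POST the payload if a URL was given.
notify_callback() {
    local url=$1 id=$2 status=$3
    [ -n "$url" ] || return 0   # no callback requested: nothing to do
    callback_payload "$id" "$status" |
        curl --silent --show-error --fail -X POST \
             -H 'Content-Type: application/json' \
             --data-binary @- "$url"
}

# Example: register on exit, so the callback also fires on failures
# (CALLBACK_URL and PROC_ID are hypothetical variable names):
# trap 'notify_callback "$CALLBACK_URL" "$PROC_ID" $?' EXIT
```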

refactor for_production.sh

Refactor for_production.sh (into functions and included files) to make it reusable for different scenarios (for_presentation, ...).

Symlink for_production.sh and for_presentation.sh

Inside the container, these scripts are not symbolic links to process_images.sh and process_mets.sh but separate copies. Probably the copy step in the Dockerfile resolves the symbolic links. Therefore, the links must be created within the Dockerfile, and the existing symbolic links in the repository can be removed.
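A sketch of what a Dockerfile `RUN` step could execute after the scripts have been copied into the image. It assumes all scripts live in the same bin directory (`/usr/bin` is an assumption) and that for_production.sh maps to process_images.sh and for_presentation.sh to process_mets.sh, as inferred from the pairing above.

```bash
#!/usr/bin/env bash
# Sketch: create the entry-point symlinks inside the image.
# link_entrypoints [BINDIR]  (BINDIR defaults to the assumed /usr/bin)
link_entrypoints() {
    local bindir=${1:-/usr/bin}
    # Relative targets resolve inside the same directory as the link;
    # -f replaces any regular-file copy that the COPY step left behind.
    ln -sf process_images.sh "$bindir/for_production.sh"
    ln -sf process_mets.sh "$bindir/for_presentation.sh"
}
```

In the Dockerfile itself, the same effect would be a single `RUN ln -sf ...` line per script.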

Monitor: show job success status

We currently only show the termination status by looking at the PID on the Controller (which is communicated via a file $WORKDIR/ocrd.pid):
https://github.com/markusweigelt/ocrd_manager/blob/daed8299411dfb4f3476c5d8ea602ab9ac20c3a4/ocrd_monitor/serve.py#L182-L184

But more importantly, we should also communicate the exit status (of the Manager script, including the ocrd process command, resynchronization, the ocrd workspace validate command, etc.).

To differentiate between "total" and "partial" failures, and at which stage they occurred, we still need the last lines of the log (persisted job-locally under $WORKDIR/ocrd.log), though.

To achieve that, we could:

  • amend logexit and close by persisting $? into (say) $WORKDIR/ocrd.ret
  • evaluate that file in the Monitor in case the process has already terminated
  • add a link for the local logs in the job view
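A sketch of the first two points, assuming WORKDIR is set as in the existing scripts. The helper names are hypothetical, and the existing logexit/close functions are not reproduced here. The Monitor itself is written in Python; the `job_status` function only illustrates the evaluation logic it would need.

```shell
#!/usr/bin/env bash
# Sketch: persist the script's exit status next to ocrd.pid / ocrd.log, so the
# Monitor can tell success from failure once the process has terminated.
# Helper names are hypothetical.

# Call once near the top of an entry-point script (WORKDIR must already be set).
persist_exit_status() {
    # inside an EXIT trap, $? is the status the script is exiting with
    trap 'echo $? > "$WORKDIR/ocrd.ret"' EXIT
}

# Monitor-side logic, for a job whose PID is already gone:
# prints succeeded, "failed (exit N)" or unknown.
job_status() {
    local workdir=$1 ret
    if [ ! -f "$workdir/ocrd.ret" ]; then
        echo unknown   # terminated without writing a status (e.g. killed by SIGKILL)
        return
    fi
    ret=$(<"$workdir/ocrd.ret")
    if [ "$ret" -eq 0 ]; then
        echo succeeded
    else
        echo "failed (exit $ret)"
    fi
}
```

Together with the last lines of $WORKDIR/ocrd.log, the non-zero status would then allow the job view to show at which stage a job failed.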

default workflow(s): utilise --lang and --script info

For example, with ocrd-tesserocr-recognize we could do something like:

shopt -s nocasematch
MODEL=eng
case "$LANGUAGE" in
  de|deu|ger) MODEL=deu
    case "$SCRIPT" in
      Fraktur) MODEL=frak2021;;
      # ...
    esac;;
  fr|fre|fra) MODEL=fra;;
  hsb) MODEL=hsb
    case "$SCRIPT" in
      Fraktur) MODEL=hsbfraktur;;
      # ...
    esac;;
  # ...
esac

The question is: do we only apply this when no --workflow is supplied, or should we assume that all workflow files themselves may contain placeholders, e.g. $TESSMODEL, which we must replace on the fly?
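To illustrate the second option, here is a sketch that wraps the case statement above in a function and then substitutes the result for a literal `$TESSMODEL` placeholder in a workflow file. The function names and the reduced model table are illustrative only; the placeholder name follows the proposal.

```shell
#!/usr/bin/env bash
# Sketch: derive a Tesseract model from --lang / --script and substitute it for a
# $TESSMODEL placeholder in a workflow file. Function names are hypothetical and
# the model table is only the excerpt given above.

# Print a model name for a language code and a script name (case-insensitive).
select_tesseract_model() {
    local language=$1 script=$2 model=eng
    shopt -s nocasematch
    case "$language" in
        de|deu|ger)
            model=deu
            case "$script" in Fraktur) model=frak2021;; esac;;
        fr|fre|fra) model=fra;;
        hsb)
            model=hsb
            case "$script" in Fraktur) model=hsbfraktur;; esac;;
    esac
    shopt -u nocasematch   # assumes nocasematch was off before
    echo "$model"
}

# Replace every literal $TESSMODEL in the workflow file; result goes to stdout.
render_workflow() {
    local workflow=$1 model=$2 line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line//\$TESSMODEL/$model}"
    done < "$workflow"
}
```

A workflow without the placeholder passes through unchanged, so the substitution could be applied to every workflow file, with or without --workflow.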

add minimal REST API for entry points

In order to be able to run workflows from the Monitor (without the need for SSH login and the accompanying auth hassles), and as a principal alternative for external interfaces, we should wrap process_images.sh and process_mets.sh into analogous HTTP endpoints (including all parameters).

(As long as everything is based on bash, we can easily create a web server via socat...)
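A rough sketch of the socat idea: socat accepts the connection and runs a handler script per request, with the socket connected to the handler's stdin/stdout, e.g. `socat TCP-LISTEN:8080,reuseaddr,fork EXEC:/path/to/http_handler.sh` (port and path are placeholders). The handler below only parses the request line and answers; the endpoint names mirror the two scripts, while the query handling, the response text, and the absence of authentication and escaping are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical per-connection HTTP handler for socat EXEC.
# A handler script would define these functions and end with: handle_request

# Write a minimal plain-text HTTP response to stdout.
respond() {
    local status=$1 body=$2 reason
    case "$status" in 202) reason=Accepted;; 404) reason="Not Found";; *) reason=Error;; esac
    printf 'HTTP/1.0 %s %s\r\nContent-Type: text/plain\r\nContent-Length: %d\r\n\r\n%s\n' \
        "$status" "$reason" $(( ${#body} + 1 )) "$body"
}

# Read one HTTP request from stdin and answer it.
handle_request() {
    local method target version line path query=
    read -r method target version || return 1
    # skip the headers, which end at an empty (CR-only) line
    while IFS= read -r line && [ -n "${line%$'\r'}" ]; do :; done
    path=${target%%\?*}
    if [ "$path" != "$target" ]; then query=${target#*\?}; fi
    case "$method $path" in
        "GET /process_images"|"GET /process_mets")
            # a real implementation would start the entry point with arguments from $query
            respond 202 "accepted ${path#/} ${query}";;
        *)
            respond 404 "not found";;
    esac
}
```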

Benchmarking workflows

  • measurement of workflow runtime
  • maybe integration of a measurement tool
  • provide benchmark data
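For the first point, a minimal sketch that times one run of an arbitrary command (for instance a workflow script) and appends a tab-separated record. The record layout, the output file, and the function name are assumptions; resolution is whole seconds.

```shell
#!/usr/bin/env bash
# Sketch: measure the wall-clock runtime of one workflow run and append a record
# (label, UTC timestamp, seconds, exit status) to a TSV file for later comparison.
# Usage: benchmark_run OUTFILE LABEL COMMAND [ARGS...]   (hypothetical helper)
benchmark_run() {
    local outfile=$1 label=$2 start end status
    shift 2
    start=$(date +%s)
    "$@"; status=$?
    end=$(date +%s)
    printf '%s\t%s\t%d\t%d\n' "$label" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        "$((end - start))" "$status" >> "$outfile"
    return "$status"   # pass the workflow's own status through
}
```

Recording the exit status alongside the runtime keeps failed runs from silently skewing the comparison.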

Manager should be an SSH server itself

Since Kitodo.Production is unlikely to have access to a Docker installation (i.e. to be able to docker run ocrd_manager something from a script task), or even to a native OCR-D core installation (i.e. to be able to install core on the system and then sh initialsetup.sh something), we should use the same base recipe as in https://github.com/bertsky/ocrd_controller.

  • install openssh-server in Dockerfile, provide unprivileged access for pseudo-user ocrd
  • provide some callable (say: shell script taking parameters) which will
    • (perhaps: transfer the process data / Vorgangsdaten to the controller)
    • log into the controller and run the predefined workflow
    • (perhaps: retransfer the results from the controller)
    • post-process the results by
      • validating the whole OCR-D workspace
      • determining which are the result fileGrps (from either the workflow definition or the position of the fileGrps in the METS or the timestamps of the subdirectories)
      • copying/moving the result files (for now: ALTO, later: PAGE / PDF / TEI / ...) to a path in the process directory where Production expects them (ocr/%08d.xml I guess), by iterating over the METS with ocrd workspace find
    • signalling the exit status via ActiveMQ
  • provide a smoke test for the callable (preferably via a Makefile):
    • start up the server
    • download test data
    • call the controller
    • (perhaps somehow receive/register the signal)
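The copying step of the post-processing can be sketched independently of OCR-D: given the ALTO files of the result fileGrp in page order (e.g. as listed via `ocrd workspace find`), copy them to sequentially numbered files. The target layout follows the "ocr/%08d.xml" guess above and is not confirmed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy ALTO files, read in page order (one path per line on stdin), into
# PROCESS_DIR/ocr/00000001.xml, 00000002.xml, ... and print how many were copied.
# Usage: <list of ALTO paths> | copy_results PROCESS_DIR   (hypothetical helper)
copy_results() {
    local procdir=$1 file n=0
    mkdir -p "$procdir/ocr" || return
    while IFS= read -r file; do
        [ -n "$file" ] || continue   # ignore empty lines
        n=$((n + 1))
        cp -- "$file" "$procdir/ocr/$(printf '%08d' "$n").xml" || return
    done
    echo "$n"
}
```

Numbering by position assumes the list is already in physical page order; if Production expects the original image numbers instead, the index would have to be derived from the page IDs.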

Unbound variable COLORTERM

While running for_production.sh, the following error occurs:

Mar 28 12:09:18 ocrd_manager for_production.sh: /usr/share/ocr-fileformat/lib.sh: line 4: COLORTERM: unbound variable
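"unbound variable" is the message bash prints when `set -u` (nounset) is active and an unset variable is expanded. Assuming for_production.sh runs with `set -u` and lib.sh expands $COLORTERM, which a non-interactive session need not set, a minimal workaround is to give the variable an empty default before sourcing the library. The cleaner fix would be `${COLORTERM:-}` inside lib.sh itself.

```shell
#!/usr/bin/env bash
# Sketch of a workaround for "COLORTERM: unbound variable" under `set -u`.
# ':=' assigns an empty default only if COLORTERM is unset, so a real value is kept.
: "${COLORTERM:=}"
export COLORTERM
# ...then source the library as before:
# source /usr/share/ocr-fileformat/lib.sh
```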

manage discrete repo of workflows, share as volume

The Monitor must have read access (soon also: write access) to the repository of workflow scripts. Currently they are deployed in the Manager container only.

Proposal: a dedicated path within the shared volume, optionally under git version control. Workflows are copied to the process directory prior to processing.
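A sketch of the proposal: the workflow is looked up in the shared repository directory and copied into the process directory, so each job keeps the exact version it ran with; if the repository is a git checkout, the current commit is recorded as well. The function name and the workflow.rev file are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: stage a workflow script from the shared repository into the process
# directory before processing and print the staged path.
# Usage: stage_workflow WORKFLOW_DIR WORKFLOW_NAME PROCESS_DIR   (hypothetical helper)
stage_workflow() {
    local repo=$1 name=$2 procdir=$3
    local src="$repo/$name" dst="$procdir/$name"
    [ -f "$src" ] || { echo "no such workflow: $name" >&2; return 1; }
    cp -- "$src" "$dst" || return
    # optional: remember which commit of the workflow repository was used
    if git -C "$repo" rev-parse --git-dir >/dev/null 2>&1; then
        git -C "$repo" rev-parse HEAD > "$procdir/workflow.rev" 2>/dev/null || true
    fi
    echo "$dst"
}
```

Copying rather than referencing the shared file means later edits from the Monitor cannot change a job that is already running.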

Conflicting SysLogHandler in ocrd_logging.conf

The following error occurs when running make test:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
    self.socket.send(msg)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 855, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 937, in emit
    self._connect_unixsocket(self.address)
  File "/usr/lib/python3.8/logging/handlers.py", line 866, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Call stack:
  File "/usr/local/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/cli/workspace.py", line 61, in workspace_cli
    initLogging()
  File "/usr/local/lib/python3.8/site-packages/ocrd_utils/logging.py", line 112, in initLogging
    logging.getLogger('').critical('initLogging was called multiple times. Source of latest call:')
Message: 'initLogging was called multiple times. Source of latest call:'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
    self.socket.send(msg)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 855, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 937, in emit
    self._connect_unixsocket(self.address)
  File "/usr/lib/python3.8/logging/handlers.py", line 866, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Call stack:
  File "/usr/local/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/cli/workspace.py", line 61, in workspace_cli
    initLogging()
  File "/usr/local/lib/python3.8/site-packages/ocrd_utils/logging.py", line 114, in initLogging
    logging.getLogger('').critical(line)
Message: '  File "/usr/local/lib/python3.8/site-packages/ocrd/cli/workspace.py", line 61, in workspace_cli'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 934, in emit
    self.socket.send(msg)
OSError: [Errno 9] Bad file descriptor

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 855, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/handlers.py", line 937, in emit
    self._connect_unixsocket(self.address)
  File "/usr/lib/python3.8/logging/handlers.py", line 866, in _connect_unixsocket
    self.socket.connect(address)
FileNotFoundError: [Errno 2] No such file or directory
Call stack:
  File "/usr/local/bin/ocrd", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/ocrd/cli/workspace.py", line 61, in workspace_cli
    initLogging()
  File "/usr/local/lib/python3.8/site-packages/ocrd_utils/logging.py", line 114, in initLogging
    logging.getLogger('').critical(line)
Message: '    initLogging()'
Arguments: ()

The temporary solution was to replace the handler with a NullHandler.

[handler_consoleHandler]
#class=StreamHandler
#class=logging.handlers.SysLogHandler
class=logging.NullHandler
formatter=defaultFormatter
#args=(sys.stderr,)
#args=('/dev/log', 'user')
args=()
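The traceback shows SysLogHandler failing to connect to the /dev/log Unix socket, which apparently does not exist in the test environment. A hypothetical alternative to discarding all output via NullHandler is to use syslog only when the socket exists and fall back to stderr otherwise. Generating the section from a script is an assumption; the two variants are the ones already commented out in the config above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the [handler_consoleHandler] section, using syslog only
# if the given socket exists and falling back to stderr otherwise.
console_handler_section() {
    local socket=${1:-/dev/log}
    echo '[handler_consoleHandler]'
    if [ -S "$socket" ]; then
        echo 'class=logging.handlers.SysLogHandler'
        echo 'formatter=defaultFormatter'
        echo "args=('$socket', 'user')"
    else
        echo 'class=StreamHandler'
        echo 'formatter=defaultFormatter'
        echo 'args=(sys.stderr,)'
    fi
}
```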

handle missing controller gracefully

Currently, if the admin forgets to activate the Controller service, or to point to an external instance via CONTROLLER_HOST / CONTROLLER_PORT_SSH, both entry points (for_production.sh and for_presentation.sh) behave very unfortunately:

  • Manager logs show the workflow is started, nothing more (no error that Controller cannot be reached)
  • Monitor shows the job as terminated (because no connection can be established to Controller at all)
  • script does not terminate
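A sketch of a fail-fast check the entry points could run before starting the workflow. It opens a plain TCP connection through bash's /dev/tcp pseudo-device (available in Debian's bash build) under coreutils `timeout`. The helper name and the error message are hypothetical; CONTROLLER_HOST and CONTROLLER_PORT_SSH are the variables named above.

```shell
#!/usr/bin/env bash
# Sketch: fail fast if the Controller's SSH port is unreachable, instead of hanging.
# Usage: check_controller HOST PORT [SECONDS]   (hypothetical helper)
check_controller() {
    local host=$1 port=$2 secs=${3:-5}
    # $0 and $1 inside the inner script are bound to host and port
    if ! timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "cannot reach Controller at $host:$port" >&2
        return 1
    fi
}

# Hypothetical use at the start of an entry point:
#   check_controller "$CONTROLLER_HOST" "$CONTROLLER_PORT_SSH" || exit 1
```

The ssh call itself could additionally be given `-o ConnectTimeout=...` and `-o BatchMode=yes`, so that a Controller disappearing after this check still produces an error instead of a hanging script.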
