tesseract's Introduction

tesseract

Bindings to Tesseract-OCR: a powerful optical character recognition (OCR) engine that supports over 100 languages. The engine is highly configurable in order to tune the detection algorithms and obtain the best possible results.

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

Hello World

Simple example

# Simple example
text <- ocr("https://jeroen.github.io/images/testocr.png")
cat(text)

# Get XML HOCR output
xml <- ocr("https://jeroen.github.io/images/testocr.png", HOCR = TRUE)
cat(xml)

Roundtrip test: render PDF to image and OCR it back to text

# Full roundtrip test: render PDF to image and OCR it back to text
curl::curl_download("https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf", "R-intro.pdf")
orig <- pdftools::pdf_text("R-intro.pdf")[1]

# Render the first PDF page to a TIFF image
img_file <- pdftools::pdf_convert("R-intro.pdf", format = 'tiff', pages = 1, dpi = 400)

# Extract text from the TIFF image
text <- ocr(img_file)
unlink(img_file)
cat(text)

Installation

On Windows and macOS the binary package can be installed from CRAN:

install.packages("tesseract")

Installation from source on Linux or macOS requires the Tesseract library (see below).

Install from source

On Debian or Ubuntu install libtesseract-dev and libleptonica-dev. Also install tesseract-ocr-eng to run examples.

sudo apt-get install -y libtesseract-dev libleptonica-dev tesseract-ocr-eng

On Ubuntu you can optionally use this PPA to get the latest version of Tesseract:

sudo add-apt-repository ppa:alex-p/tesseract-ocr-devel
sudo apt-get install -y libtesseract-dev tesseract-ocr-eng

On Fedora we need tesseract-devel and leptonica-devel:

sudo yum install tesseract-devel leptonica-devel

On RHEL and CentOS we need tesseract-devel and leptonica-devel from EPEL:

sudo yum install epel-release
sudo yum install tesseract-devel leptonica-devel

On macOS use tesseract from Homebrew:

brew install tesseract

Tesseract uses training data to perform OCR. Most systems default to English training data. To improve OCR results for other languages you need to install the appropriate training data. On Windows and macOS you can do this in R using tesseract_download():

tesseract_download('fra')

On Linux you need to install the appropriate training data from your distribution. For example, to install the Spanish training data, install tesseract-ocr-spa (Debian, Ubuntu) or tesseract-langpack-spa (Fedora, EPEL).

Alternatively, you can manually download training data from GitHub and store it in a path on disk that you pass in the datapath parameter, or set a default path via the TESSDATA_PREFIX environment variable. Note that Tesseract 4 and Tesseract 3 use different training data formats. Make sure to download training data from the branch that matches your libtesseract version.

tesseract's People

Contributors

dmi3kno, jeroen, kant, maelle, samuel-rosa

tesseract's Issues

Setting engine options (variables)

Adds support for setting engine parameters. For example to simulate the "numeric" parameter from the tesseract command line tool which only recognizes numbers:

tes <- tesseract(options = list(tessedit_char_whitelist = "0123456789"))
ocr("image.png", engine = tes)

Implemented in 6809fbd.

Use on Windows 7 with German training data ("umlauts")?

How do I use the package with German training data? For example, this won't work as expected:

library(tesseract)
library(magick)
library(magrittr)
packageVersion("tesseract")
# [1] ‘1.2’

# download german training data
dir.create(dir <- file.path(tempdir(), "tessdata"))
fn <- "https://github.com/tesseract-ocr/tessdata/raw/master/deu.traineddata"
download.file(fn, file.path(dir, basename(fn)), mode="wb")

# download and crop example jpg image with umlauts
download.file("https://i.ytimg.com/vi/mr-mCMtISfA/maxresdefault.jpg", tf <- tempfile(fileext = ".jpg"), mode="wb")
tf %>% image_read %>% image_crop("900x600+500+300") %>% image_write(tf2 <<- tempfile(fileext = ".jpg"))

# initialize and predict: 
tesseract("deu", tempdir())
# <tesseract engine>
#  loaded: deu 
#  datapath: C:\Windows\TEMP\Rtmpa4DpNt/tessdata/ 
#  available: deu 

cat(ocr(tf2))
# 56H

## which is wrong. Correct would be äöü

I dunno if this is the right place to ask, but... I'll give it a try. And thanks for the package(s) btw.
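One detail in the snippet above may explain the result: the engine returned by tesseract("deu", tempdir()) is printed but never passed to ocr(), so ocr(tf2) falls back to its default English engine. A minimal sketch of the corrected call follows. The image path umlauts.jpg is a hypothetical stand-in for tf2, and the datapath is kept as in the post; with other libtesseract versions it may need to point at the tessdata folder itself, which is an assumption to verify locally.

```r
# Sketch: keep the engine object and hand it to ocr() explicitly.
# `image_path` is a hypothetical stand-in for the cropped image (tf2 in the post).
image_path <- "umlauts.jpg"
data_dir <- tempdir()  # same datapath argument as in the post

# Only run the OCR step when the package, the image and the German
# training data are all present.
ready <- requireNamespace("tesseract", quietly = TRUE) &&
  file.exists(image_path) &&
  file.exists(file.path(data_dir, "tessdata", "deu.traineddata"))

if (ready) {
  deu <- tesseract::tesseract(language = "deu", datapath = data_dir)
  # Passing `engine = deu` is the important part; without it the
  # default English engine is used.
  cat(tesseract::ocr(image_path, engine = deu))
}
```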

Output hocr and pdf

First of all, thanks for this great functionality by simplifying the usage of tesseract from R and the possibility to download the language files with a single line of r-code. This is powerful!

It would also be really nice if it were possible to output an OCR'ed document in hOCR format or as a searchable PDF directly. This would make the package even simpler to use for people (like me) who don't have the skills to configure or override the settings in tesseract.
With an additional parameter "output" to the ocr function that could be one of {"text", "hocr", "pdf"}, it could look like this:
out <- ocr("test.tif", engine = tesseract("swe"), output = "hocr")

I think this would make this R package very strong in terms of how widely it could be used.

Again, thanks for a great work!
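For the hOCR half of this request, the HOCR = TRUE argument shown in the README already returns the markup as a string, so saving it to disk is a small extra step. A sketch, assuming the hypothetical input test.tif from the post and that Swedish training data is installed; hocr_path() is an illustrative helper and not part of the package. Searchable PDF output is not covered by this sketch.

```r
# Illustrative helper (not part of the tesseract package):
# derive the output file name by swapping the extension for .hocr
hocr_path <- function(image) {
  paste0(tools::file_path_sans_ext(image), ".hocr")
}

image <- "test.tif"  # hypothetical input file, as in the request above

if (requireNamespace("tesseract", quietly = TRUE) && file.exists(image)) {
  swe <- tesseract::tesseract("swe")  # assumes 'swe' training data is available
  xml <- tesseract::ocr(image, engine = swe, HOCR = TRUE)
  writeLines(xml, hocr_path(image))   # writes test.hocr next to the image
}
```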

How to use the user-words file to improve my Tesseract OCR

Hi all,

I'm working on recognizing some PDF images that are not of excellent quality, and I want to get better OCR results using the third dictionary, called user-words, but I don't know what the procedure is.

I searched the web but didn't find anything. Could you help me?

Thank you very much, have a nice day

Processing bitmap with Tesseract

I'm working on a project with bitmap images - specifically raw bitmap arrays of PDF pages extracted through the pdf_render_page command in the pdftools package. I'd like to be able to pass them directly to Tesseract via this package for OCR - is this something that's possible? Are there decent workarounds if it's not possible?

If my question's unclear, let me know and I'll provide additional info.
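One possible workaround, sketched under the assumption that going through a temporary file is acceptable: write the bitmap to a PNG with png::writePNG() and pass that file to ocr(). The helper name ocr_bitmap() is illustrative, and R-intro.pdf is the file from the README roundtrip example.

```r
# Sketch: write the rendered bitmap to a temporary PNG, then OCR that file.
# `ocr_bitmap` is an illustrative helper, not part of any package.
ocr_bitmap <- function(bitmap, engine = tesseract::tesseract("eng")) {
  tmp <- tempfile(fileext = ".png")
  on.exit(unlink(tmp))        # remove the temporary file afterwards
  png::writePNG(bitmap, tmp)
  tesseract::ocr(tmp, engine = engine)
}

pdf_file <- "R-intro.pdf"  # file downloaded in the README roundtrip example
have_pkgs <- all(vapply(c("pdftools", "png", "tesseract"),
                        requireNamespace, logical(1), quietly = TRUE))

if (have_pkgs && file.exists(pdf_file)) {
  # A higher dpi than the default usually gives the OCR more to work with
  bitmap <- pdftools::pdf_render_page(pdf_file, page = 1, dpi = 300)
  cat(ocr_bitmap(bitmap))
}
```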

changing value of "minimum characters to try"

I'm working on text extraction from vehicle number plate images and want to avoid the "Too few characters, skipping this page" error. I'm working in RStudio. Tesseract-OCR has an option to change the variable minCharactersToTry, but how do I set it from RStudio? Could you please spare a little time to guide me through that?

Any assistance is appreciated.

Utilising the user_patterns_suffix option

Hi, I was wondering if anyone can help me figure out how to use the user_patterns_suffix option when setting up the engine? I'd like to parse PDFs for a specific code that has a certain pattern, and have been trying to do something like what this guide does: http://www.philhack.com/ocr/.

What I've tried:
I tried setting up a file in my tessdata folder called eng.user-patterns that has my patterns, one pattern per line. I then tried tes <- tesseract(options = list(user_patterns_suffix = "user-patterns")), but I see no differences in the output. I also tried disabling all other dawgs by using
list(load_system_dawg="F",load_freq_dawg="F",load_punc_dawg="F",load_number_dawg="F",load_unambig_dawg="F",load_bigram_dawg="F",load_fixed_length_dawgs="F"), however again the results are the same.

Any help much appreciated.
Nas

Tesseract_download() Error

Dear all,

When I use tesseract_download() to download training data for a new language, it returns an error. The demo is like this:

library(tesseract)
tesseract_download("chi_tra")

# Error in curl::curl_fetch_memory(url, curl::new_handle(progressfunction = progress_fun,  :   
#  Could not resolve host: raw.githubusercontent.com

The problem has nothing to do with the package. I fixed it by modifying the hosts file, adding the line 199.232.68.133 raw.githubusercontent.com at the end.

Sorry for bothering you, and thank you for reading!

Not installable for R 3.6

$ install.packages("tesseract")
“package ‘tesseract’ is not available (for R version 3.6.0)”

non character options

FYI: it looks like the package can only handle character options. Something like this will not work:

engine <- tesseract(language = "eng", options = list(load_system_dawg = 0L, load_freq_dawg = 0L))

And does this set the option correctly?
engine <- tesseract(language = "eng", options = list(load_system_dawg = "0", load_freq_dawg = "0"))
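One way to sidestep the type question is to convert every value to a string before handing the list to tesseract(). The sketch below uses an illustrative helper, as_tesseract_options(), which is not part of the package. That Tesseract reads "1" and "0" as true and false for boolean variables is an assumption here.

```r
# Illustrative helper (not part of the package): turn every option value
# into a string, mapping logicals to "1"/"0".
as_tesseract_options <- function(options) {
  lapply(options, function(value) {
    if (is.logical(value)) {
      if (value) "1" else "0"
    } else {
      as.character(value)
    }
  })
}

opts <- as_tesseract_options(list(load_system_dawg = 0L, load_freq_dawg = FALSE))
str(opts)

# Only build the engine when the package and English data are available
if (requireNamespace("tesseract", quietly = TRUE) &&
    "eng" %in% tesseract::tesseract_info()$available) {
  engine <- tesseract::tesseract(language = "eng", options = opts)
  print(engine)
}
```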

Getting character confidence

Hey.

First of all, please allow me to thank and congratulate you guys on the excellent work you have done in this repository.

Now, on to the issue.
Is there any way of getting the confidence level of every character in the OCR result, as displayed in the Tesseract training wiki?

Currently, I'm doing the following
library(tesseract)

tes <- tesseract(language = "neelam", options = list(tessedit_dump_choices = "1", textord_heavy_nr = "1"))

text <- ocr("/home/sampurn/projects/ICR-R/training/cursive.neelam.exp5.png", engine = tes)

..which yields the following:
Info in pixReadStreamPng: converting (cmap + alpha) ==> RGBA Info in pixReadStreamPng: converting 8 bpp cmap with alpha ==> RGBA Pass1: px [p [70 ] x [78 ] ] Pass1: B [B [42 ] ] Pass1: c [c [63 ] ] Pass1: D [D [44 ] ] Pass1: t [t [74 ] ] Pass1: N [N [4e ] ] Pass1: G [G [47 ] ] Pass1: H [H [48 ] ] Pass1: I [I [49 ] ] Pass1: J [J [4a ] ] Pass1: K [K [4b ] ] Pass1: l [l [6c ] ] Pass1: M [M [4d ] ] Pass1: NopQRQTUvave [N [4e ] o [6f ] p [70 ] Q [51 ] R [52 ] Q [51 ] T [54 ] U [55 ] va [76 61 ] v [76 ] e [65 ] ] Pass1: k [k [6b ] ] Pass1: l [l [6c ] ] Pass1: M [M [4d ] ] Pass1: a [a [61 ] ] Pass1: b [b [62 ] ] Pass1: e [e [65 ] ] Pass1: d [d [64 ] ] Pass1: E [E [45 ] ] Pass1: f [f [66 ] ] Pass1: g [g [67 ] ] Pass1: h [h [68 ] ] Pass1: lj [l [6c ] j [6a ] ] Pass1: n [n [6e ] ] Pass1: o [o [6f ] ] Pass1: p [p [70 ] ] Pass1: q [q [71 ] ] Pass1: ri [r [72 ] i [69 ] ] Pass1: g [g [67 ] ] Pass1: t [t [74 ] ] Pass1: u [u [75 ] ] Pass1: v [v [76 ] ] Pass1: w [w [77 ] ] Pass1: x [x [78 ] ] Pass1: p [p [70 ] ] Pass1: e [e [65 ] ] Pass1: N [N [4e ] ] Pass1: .ie. [. [2e ] i [69 ] e [65 ] . [2e ] ] Pass1: B [B [42 ] ] Pass1: ep [e [65 ] p [70 ] ] Pass1: J [J [4a ] ] Pass1: c [c [63 ] ] Pass1: T [T [54 ] ] Pass1: B [B [42 ] ] Pass1: oC [o [6f ] C [43 ] ] Pass1: xo [x [78 ] o [6f ] ] Pass1: Q [Q [51 ] ] Pass1: a [a [61 ] ] Pass1: C [C [43 ] ] Pass1: D [D [44 ] ] Pass1: E. [E [45 ] . [2e ] ] Pass1: J [J [4a ] ] Pass1: Ig [I [49 ] g [67 ] ] Pass1: The [T [54 ] h [68 ] e [65 ] ] Pass1: quilck [q [71 ] u [75 ] i [69 ] l [6c ] c [63 ] k [6b ] ] Pass1: brown [b [62 ] r [72 ] o [6f ] w [77 ] n [6e ] ] Pass1: dog [d [64 ] o [6f ] g [67 ] ] Pass1: dumped [d [64 ] u [75 ] m [6d ] p [70 ] e [65 ] d [64 ] ] Pass1: over [o [6f ] v [76 ] e [65 ] r [72 ] ] Pass1: the [t [74 ] h [68 ] e [65 ] ] Pass1: lazy [l [6c ] a [61 ] z [7a ] y [79 ] ] Pass1: fox [f [66 ] o [6f ] x [78 ] ] Pass1: m [m [6d ] ] Pass1: . [. 
[2e ] ] Pass1: d [d [64 ] ] Pass1: the [t [74 ] h [68 ] e [65 ] ] Pass1: quick [q [71 ] u [75 ] i [69 ] c [63 ] k [6b ] ] Pass1: brown [b [62 ] r [72 ] o [6f ] w [77 ] n [6e ] ] Pass1: dog [d [64 ] o [6f ] g [67 ] ] Pass1: jumpe [j [6a ] u [75 ] m [6d ] p [70 ] e [65 ] ] Pass1: over [o [6f ] v [76 ] e [65 ] r [72 ] ] Pass1: the [t [74 ] h [68 ] e [65 ] ] Pass1: lazy [l [6c ] a [61 ] z [7a ] y [79 ] ] Pass1: fox [f [66 ] o [6f ] x [78 ] ] Pass1: ROWN [R [52 ] O [4f ] W [57 ] N [4e ] ] Pass1: DOG [D [44 ] O [4f ] G [47 ] ] Pass1: JUMPED [J [4a ] U [55 ] M [4d ] P [50 ] E [45 ] D [44 ] ] Pass1: THE [T [54 ] H [48 ] E [45 ] ] Pass1: QUIcK [Q [51 ] U [55 ] I [49 ] c [63 ] K [4b ] ] Pass1: B [B [42 ] ] Pass1: ox [o [6f ] x [78 ] ] Pass1: E [E [45 ] ] Pass1: wrcv [w [77 ] r [72 ] c [63 ] v [76 ] ] Pass1: P [P [50 ] ] Pass1: ovER [o [6f ] v [76 ] E [45 ] R [52 ] ] Pass1: TH [T [54 ] H [48 ] ] Pass2: px [p [70 ] x [78 ] ] Pass2: B [B [42 ] ] Pass2: c [c [63 ] ] Pass2: D [D [44 ] ] Pass2: t [t [74 ] ] Pass2: N [N [4e ] ] Pass2: G [G [47 ] ] Pass2: H [H [48 ] ] Pass2: I [I [49 ] ] Pass2: J [J [4a ] ] Pass2: K [K [4b ] ] Pass2: l [l [6c ] ] Pass2: M [M [4d ] ] Pass2: NoopQRQTUvave [N [4e ] o [6f ] o [6f ] p [70 ] Q [51 ] R [52 ] Q [51 ] T [54 ] U [55 ] va [76 61 ] v [76 ] e [65 ] ] Pass2: k [k [6b ] ] Pass2: l [l [6c ] ] Pass2: M [M [4d ] ] Pass2: a [a [61 ] ] Pass2: b [b [62 ] ] Pass2: e [e [65 ] ] Pass2: d [d [64 ] ] Pass2: E [E [45 ] ] Pass2: f [f [66 ] ] Pass2: g [g [67 ] ] Pass2: h [h [68 ] ] Pass2: lj [l [6c ] j [6a ] ] Pass2: n [n [6e ] ] Pass2: o [o [6f ] ] Pass2: p [p [70 ] ] Pass2: q [q [71 ] ] Pass2: ri [r [72 ] i [69 ] ] Pass2: g [g [67 ] ] Pass2: t [t [74 ] ] Pass2: u [u [75 ] ] Pass2: v [v [76 ] ] Pass2: w [w [77 ] ] Pass2: x [x [78 ] ] Pass2: p [p [70 ] ] Pass2: e [e [65 ] ] Pass2: N [N [4e ] ] Pass2: .ie. [. [2e ] i [69 ] e [65 ] . 
[2e ] ] Pass2: B [B [42 ] ] Pass2: ep [e [65 ] p [70 ] ] Pass2: J [J [4a ] ] Pass2: c [c [63 ] ] Pass2: T [T [54 ] ] Pass2: B [B [42 ] ] Pass2: oC [o [6f ] C [43 ] ] Pass2: xo [x [78 ] o [6f ] ] Pass2: Q [Q [51 ] ] Pass2: a [a [61 ] ] Pass2: C [C [43 ] ] Pass2: D [D [44 ] ] Pass2: E. [E [45 ] . [2e ] ] Pass2: J [J [4a ] ] Pass2: Ig [I [49 ] g [67 ] ] Pass2: The [T [54 ] h [68 ] e [65 ] ] Pass2: quilck [q [71 ] u [75 ] i [69 ] l [6c ] c [63 ] k [6b ] ] Pass2: brown [b [62 ] r [72 ] o [6f ] w [77 ] n [6e ] ] Pass2: dog [d [64 ] o [6f ] g [67 ] ] Pass2: dumped [d [64 ] u [75 ] m [6d ] p [70 ] e [65 ] d [64 ] ] Pass2: over [o [6f ] v [76 ] e [65 ] r [72 ] ] Pass2: the [t [74 ] h [68 ] e [65 ] ] Pass2: lazy [l [6c ] a [61 ] z [7a ] y [79 ] ] Pass2: fox [f [66 ] o [6f ] x [78 ] ] Pass2: m [m [6d ] ] Pass2: . [. [2e ] ] Pass2: d [d [64 ] ] Pass2: the [t [74 ] h [68 ] e [65 ] ] Pass2: quick [q [71 ] u [75 ] i [69 ] c [63 ] k [6b ] ] Pass2: brown [b [62 ] r [72 ] o [6f ] w [77 ] n [6e ] ] Pass2: dog [d [64 ] o [6f ] g [67 ] ] Pass2: jumpe [j [6a ] u [75 ] m [6d ] p [70 ] e [65 ] ] Pass2: over [o [6f ] v [76 ] e [65 ] r [72 ] ] Pass2: the [t [74 ] h [68 ] e [65 ] ] Pass2: lazy [l [6c ] a [61 ] z [7a ] y [79 ] ] Pass2: fox [f [66 ] o [6f ] x [78 ] ] Pass2: ROWN [R [52 ] O [4f ] W [57 ] N [4e ] ] Pass2: DOG [D [44 ] O [4f ] G [47 ] ] Pass2: JUMPED [J [4a ] U [55 ] M [4d ] P [50 ] E [45 ] D [44 ] ] Pass2: THE [T [54 ] H [48 ] E [45 ] ] Pass2: QUIcK [Q [51 ] U [55 ] I [49 ] c [63 ] K [4b ] ] Pass2: B [B [42 ] ] Pass2: ox [o [6f ] x [78 ] ] Pass2: E [E [45 ] ] Pass2: wrcv [w [77 ] r [72 ] c [63 ] v [76 ] ] Pass2: P [P [50 ] ] Pass2: ovER [o [6f ] v [76 ] E [45 ] R [52 ] ] Pass2: TH [T [54 ] H [48 ] ]

Please find my tesseract information here.

PS. Let me know if I ought to ask this in a google forum/group.
Once again, thanks for your work.
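A partial answer, for what it is worth: ocr_data(), which appears in another report on this page, returns one row per word with a confidence column. It works at word level, so it does not give per-character values. The sketch below filters such a data frame; low_confidence(), the example data and the image path cursive.png are all hypothetical.

```r
# Illustrative helper: keep the rows of an ocr_data()-style data frame
# whose confidence is below a threshold (in percent).
low_confidence <- function(words, threshold = 60) {
  words[words$confidence < threshold, , drop = FALSE]
}

# Hypothetical stand-in for ocr_data() output, using the same column names
example <- data.frame(
  word = c("quick", "brown", "wrcv"),
  confidence = c(95.1, 88.9, 16.7),
  bbox = c("10,10,90,40", "100,10,190,40", "200,10,260,40"),
  stringsAsFactors = FALSE
)
print(low_confidence(example))

image <- "cursive.png"  # hypothetical image path
if (requireNamespace("tesseract", quietly = TRUE) && file.exists(image)) {
  words <- tesseract::ocr_data(image, engine = tesseract::tesseract("eng"))
  print(low_confidence(words))
}
```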

trouble installing tesseract

Hi - I'm having trouble installing tesseract. Can you please guide me? Thanks.

devtools::install_github("ropensci/tesseract")
Downloading GitHub repo ropensci/tesseract@master
√ checking for file '/state/partition1/scratch/6883612.cusco.hpcc.uh.edu/Rtmp5FZjEt/remotes39e25d4a4e5d/ropensci-tesseract-ee10219/DESCRIPTION' (428ms)

  • preparing 'tesseract':
    √ checking DESCRIPTION meta-information
  • cleaning src
  • running 'cleanup'
  • checking for LF line-endings in source and make files and shell scripts
  • checking for empty or unneeded directories
  • building 'tesseract_4.0.tar.gz'
    Warning: invalid uid value replaced by that for user 'nobody'

Installing package into '/home/svaid/R/x86_64-pc-linux-gnu-library/3.4'
(as 'lib' is unspecified)

  • installing source package 'tesseract' ...
    Package tesseract was not found in the pkg-config search path.
    Perhaps you should add the directory containing `tesseract.pc'
    to the PKG_CONFIG_PATH environment variable
    No package 'tesseract' found
    Using PKG_CFLAGS=-I/usr/include/tesseract -I/usr/include/leptonica
    Using PKG_LIBS=-ltesseract
    Using CXX11CPP: g++ -E -std=gnu++0x
    ------------------------- ANTICONF ERROR ---------------------------
    Configuration failed because tesseract was not found. Try installing:
  • deb: libtesseract-dev libleptonica-dev (Debian, Ubuntu, etc)
  • rpm: tesseract-devel leptonica-devel (Fedora, CentOS, RHEL)
  • csw: libtesseract_dev (Solaris)
  • brew: tesseract (Mac OSX)
    If tesseract is already installed, check that 'pkg-config' is in your
    PATH and PKG_CONFIG_PATH contains a tesseract.pc file. If pkg-config
    is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
    R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'

ERROR: configuration failed for package 'tesseract'

  • removing '/home/svaid/R/x86_64-pc-linux-gnu-library/3.4/tesseract'
    Error in i.p(...) :
    (converted from warning) installation of package '/scratch/6883612.cusco.hpcc.uh.edu/Rtmp5FZjEt/file39e26a76a536/tesseract_4.0.tar.gz' had non-zero exit status
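The log above shows pkg-config failing to find tesseract. A quick way to repeat that check from the same R session, so that the same PATH and PKG_CONFIG_PATH apply, might look like the sketch below. The helper name pkg_config_finds() is illustrative.

```r
# Sketch: ask pkg-config whether it can locate a library.
# Returns NA when pkg-config itself is not on the PATH.
pkg_config_finds <- function(lib = "tesseract") {
  pc <- unname(Sys.which("pkg-config"))
  if (!nzchar(pc)) {
    return(NA)
  }
  # `pkg-config --exists <lib>` exits with status 0 when the .pc file is found
  status <- suppressWarnings(
    system2(pc, c("--exists", lib), stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

print(pkg_config_finds("tesseract"))
print(Sys.getenv("PKG_CONFIG_PATH"))  # empty unless set in this session
```

If this prints FALSE while the development packages are installed, the directory containing tesseract.pc is probably missing from PKG_CONFIG_PATH, as the configure message suggests.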

HOCR support

Hi all,

Great work on getting this ported.
Quick question: does the tessedit_create_hocr option work? I've tried the following code and also a few alternative values, but I am getting just the text.

engine1 <- tesseract(options = list(tessedit_create_hocr = "1"))
text <- ocr("http://jeroenooms.github.io/images/testocr.png", engine = engine1)
cat(text)

Thanks

parsing bbox into geometry or {width, height, x_off, y_off}

The comma-separated bbox in ocr_data() is nice, but I usually want to extract some words from an image, so I convert the bbox to a geometry string like this:

bbox_to_geom <- function(bbox){
  coord_list <- lapply(strsplit(bbox, ","), as.numeric)
  sapply(coord_list, function(x){paste0(x[3] - x[1], "x", x[4] - x[2], "+", x[1], "+", x[2])})
}

P.S. I see that you went along the path of facilitating magick geometry selection with helper functions. bbox is a case of geometry_area.

#from magick help file
geometry_area(width = NULL, height = NULL, x_off = 0, y_off = 0)

Maybe it makes sense, then, to parse bbox and return it as these four columns?
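A sketch of what such parsing could look like in base R, assuming the bbox string is ordered x1,y1,x2,y2 as in the function above. parse_bbox() is an illustrative name, not an existing function of the package.

```r
# Illustrative helper: split "x1,y1,x2,y2" strings into the four
# geometry columns used by magick::geometry_area().
parse_bbox <- function(bbox) {
  coords <- do.call(rbind, lapply(strsplit(bbox, ",", fixed = TRUE), as.numeric))
  data.frame(
    width  = coords[, 3] - coords[, 1],
    height = coords[, 4] - coords[, 2],
    x_off  = coords[, 1],
    y_off  = coords[, 2]
  )
}

parsed <- parse_bbox(c("205,207,572,344", "614,207,1122,344"))
print(parsed)
```

The resulting columns can be bound to the ocr_data() output with cbind(), which keeps the original bbox column available.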

Unable to install R package with tesseract 4.0.0 (Ubuntu 16.04)

Hi! I could use this package without any problems while having tesseract 3 installed. However, after getting tesseract 4 (from the PPA) I can no longer install the R package.

This is what I get in the command line with tesseract -v:

tesseract 4.0.0-beta.3-195-ge9cd
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found SSE

And here's the error after devtools::install_github("ropensci/tesseract") (being basically the same after install.packages("tesseract")):

Downloading GitHub repo ropensci/tesseract@master
from URL https://api.github.com/repos/ropensci/tesseract/zipball/master
Installing tesseract
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet CMD INSTALL  \
  '/tmp/RtmpZhNFsR/devtools180279de2ce5/ropensci-tesseract-6514c58'  \
  --library='/home/andres/R/x86_64-pc-linux-gnu-library/3.4' --install-tests 

* installing *source* package ‘tesseract’ ...
Found pkg-config cflags and libs!
Using PKG_CFLAGS=-I/usr/include/leptonica
Using PKG_LIBS=-ltesseract
Using CXX11CPP: g++ -E -std=gnu++11
------------------------- ANTICONF ERROR ---------------------------
Configuration failed because tesseract was not found. Try installing:
 * deb: libtesseract-dev libleptonica-dev (Debian, Ubuntu, etc)
 * rpm: tesseract-devel leptonica-devel (Fedora, CentOS, RHEL)
 * csw: libtesseract_dev (Solaris)
 * brew: tesseract (Mac OSX)
If tesseract is already installed, check that 'pkg-config' is in your
PATH and PKG_CONFIG_PATH contains a tesseract.pc file. If pkg-config
is unavailable you can set INCLUDE_DIR and LIB_DIR manually via:
R CMD INSTALL --configure-vars='INCLUDE_DIR=... LIB_DIR=...'
--------------------------------------------------------------------
ERROR: configuration failed for package ‘tesseract’
* removing ‘/home/andres/R/x86_64-pc-linux-gnu-library/3.4/tesseract’
Installation failed: Command failed (1)

However, those libraries are installed. Here's the output of apt list --installed | egrep "(libtess)|(libleptonica)":

libleptonica-dev/xenial,now 1.76.0-1+nmu1ppa1~xenial1 amd64 [installed]
libtesseract-dev/xenial,now 4.00~git2790-e9cd6024-1ppa1~xenial1 amd64 [installed]
libtesseract4/xenial,now 4.00~git2790-e9cd6024-1ppa1~xenial1 amd64 [installed,automatic]

I'm not really familiar with pkg-config, but in case it's useful, this is the output of pkg-config --list-all | grep ^tess in the command line:

tesseract                      tesseract - An OCR Engine that was developed at HP Labs between 1985 and 1995... and now at Google.

Thanks in advance!

Failing to install tesseract on OSX

I have OSX 10.13.3 and R 3.4.1.

I have installed tesseract with homebrew (including all the required components) and it seems OK. I have used it in python and it works.
I try to install the R package (I tried to install it using R at the command line as well as with RStudio Server, installed with homebrew, with no changes). This is what I get:

> install.packages("tesseract")
Installing package into ‘/usr/local/lib/R/3.4/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cran.rstudio.com/src/contrib/tesseract_1.9.tar.gz'
Content type 'application/x-gzip' length 11477 bytes (11 KB)
==================================================
downloaded 11 KB

* installing *source* package ‘tesseract’ ...
** package ‘tesseract’ successfully unpacked and MD5 sums checked
Found pkg-config cflags and libs!
Using PKG_CFLAGS=-I/usr/local/Cellar/tesseract/3.05.01/include/tesseract -I/usr/local/Cellar/leptonica/1.75.3/include/leptonica
Using PKG_LIBS=-L/usr/local/Cellar/tesseract/3.05.01/lib -ltesseract
Using CXX11CPP: clang++ -E -std=gnu++11
** libs
rm -f tesseract.so RcppExports.o tesseract.o
clang++ -std=gnu++11 -I/usr/local/Cellar/r/3.4.3_1/lib/R/include -DNDEBUG -I/usr/local/Cellar/tesseract/3.05.01/include/tesseract -I/usr/local/Cellar/leptonica/1.75.3/include/leptonica -I"/usr/local/lib/R/3.4/site-library/Rcpp/include" -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/include   -fPIC  -I/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/include/darwin -c RcppExports.cpp -o RcppExports.o
clang++ -std=gnu++11 -I/usr/local/Cellar/r/3.4.3_1/lib/R/include -DNDEBUG -I/usr/local/Cellar/tesseract/3.05.01/include/tesseract -I/usr/local/Cellar/leptonica/1.75.3/include/leptonica -I"/usr/local/lib/R/3.4/site-library/Rcpp/include" -I/usr/local/opt/gettext/include -I/usr/local/opt/readline/include -I/usr/local/include   -fPIC  -I/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk-9.0.1.jdk/Contents/Home/include/darwin -c tesseract.cpp -o tesseract.o
clang++ -std=gnu++11 -dynamiclib -Wl,-headerpad_max_install_names -undefined dynamic_lookup -single_module -multiply_defined suppress -L/usr/local/opt/openblas/lib -L/usr/local/opt/gettext/lib -L/usr/local/opt/readline/lib -L/usr/local/lib -L/usr/local/Cellar/r/3.4.3_1/lib/R/lib -L/usr/local/opt/openblas/lib -L/usr/local/opt/gettext/lib -L/usr/local/opt/readline/lib -L/usr/local/lib -o tesseract.so RcppExports.o tesseract.o -L/usr/local/Cellar/gettext/0.19.8.1/lib -L/usr/local/Cellar/r/3.4.3_1/lib/R/lib -lR -lintl -Wl,-framework -Wl,CoreFoundation
installing to /usr/local/lib/R/3.4/site-library/tesseract/libs
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
Error: package or namespace load failed for ‘tesseract’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/usr/local/lib/R/3.4/site-library/tesseract/libs/tesseract.so':
  dlopen(/usr/local/lib/R/3.4/site-library/tesseract/libs/tesseract.so, 6): Symbol not found: __ZN6STRINGC1ERKS_
  Referenced from: /usr/local/lib/R/3.4/site-library/tesseract/libs/tesseract.so
  Expected in: flat namespace
 in /usr/local/lib/R/3.4/site-library/tesseract/libs/tesseract.so
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/usr/local/lib/R/3.4/site-library/tesseract’
Warning in install.packages :
  installation of package ‘tesseract’ had non-zero exit status

The downloaded source packages are in
	‘/private/var/folders/9v/251f7k_x6hn8t9wsz8v21rgh0000gn/T/RtmpwO56rV/downloaded_packages’

Trying to research the problem, I added the following line to my .bash_profile:

export LD_LIBRARY_PATH=/usr/local/lib

I also tried to add the following line to /etc/rstudio/rserver.conf:

rsession-ld-library-path=/usr/local/lib/

Not sure what to try next.

tesseract very slow in R

Hello there,

Thanks for this amazing binding! I am running into some performance issues and I wonder if you have some hints or ideas.

Basically, the R wrapper works fine but it is very slow. I tried to use furrr and multiprocessing, but I have read on the internet that it is not that easy to run many tesseract processes in parallel. Is that true? Were you able to run tesseract in parallel already?

Thanks~
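One pattern that may be worth trying is a socket cluster from the parallel package in which every worker builds its own engine. The sketch assumes that engine objects, which wrap a pointer into libtesseract, cannot be shipped to another process. ocr_one() and ocr_parallel() are illustrative names, and whether this actually speeds things up depends on the images and the machine.

```r
library(parallel)

# Default worker: the engine is created inside the worker process rather
# than being passed in from the parent session.
ocr_one <- function(path) {
  tesseract::ocr(path, engine = tesseract::tesseract("eng"))
}

# Sketch: OCR a vector of files on several worker processes.
# `worker` is injectable so the plumbing can be tried without tesseract.
ocr_parallel <- function(files, worker = ocr_one, cores = 2) {
  cl <- makeCluster(cores)
  on.exit(stopCluster(cl))  # always shut the workers down
  unlist(parLapply(cl, files, worker))
}

# Dry run with a stand-in worker instead of real OCR
print(ocr_parallel(c("a.png", "b.png"), worker = toupper))
```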

TIFF Read Encoded Strip Error

Hi,
When I loop over my PDFs and use ocr_data(), after a while (about 2 hours) it produces the following error:

TIFFReadEncodedStrip Error

Read error at scanline 0; got 0 bytes, expected 9918

OK

It is a popup in Windows - it is NOT an R error. After hitting "OK" it doesn't go away and the same popup appears again. Thus it completely stops my script.

I did see another issue raised about a memory leak when using ocr_data() - perhaps this is related?

Thanks
Oneiricer

Set config variables

On the command line, tesseract lets one set config variables that otherwise have to be set in the config file of the library. I'm not sure where this lives in the C++ API, but would it be possible to allow setting these variables in tesseract()? My specific application is a character whitelist, which on the command line would be

tesseract stdin stdout -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyz

I'm thinking something like

ocr("file.tiff", tesseract(options=list(tessedit_char_whitelist="abcdefghijklmnopqrstuvwxyz0123456789")))

train own tesseract model

I have some training data from scans of 16th-century Dutch texts containing a mixture of Dutch, French and Latin, recording people who paid for renting land or a house.
I want to train a new model instead of using the existing pretrained models, as I need to compare such a model to other OCR models which I have trained locally.

I can put the training data in the format defined at https://github.com/tesseract-ocr/tesstrain, namely the format of https://github.com/tesseract-ocr/tesstrain/blob/master/ocrd-testset.zip, in order to train a model myself.
Does anyone know whether, once trained, it will be possible at all to use this model with this R package? Will there be version conflicts? Has anyone tried this already?
I'm running Ubuntu with tesseract 4.1.1.

Memory leak using ocr_data method

I am reading in a number of pdfs, performing pre-processing and then getting the text from the ocr_data method (from a temp PNG file). During my loops to go through each pdf and to get all the pdfs in a directory, I am removing unused variables and performing garbage collection (gc()) to reduce the memory load of my program. However, as I continue to read in more and more pdfs, the memory load on my system keeps increasing, eventually running out of memory and throwing an error. I can load the library and comment out the ocr_data() method call in my script and the memory load doesn't appear to increase (minimum load is fairly consistent over a time period where I have been observing an increase). If I replace ocr_data() with ocr(), I don't see an increase in memory load on my system. This would suggest that whatever is causing this issue originates with the ocr_data() method (probably something isn't being deleted on the C++ side that might be related to saving the data frame values that are returned, but it's just a guess).

Failed to extract text~~

Hi all,

I tried this package to extract text from a simple picture, however, the results are not as good as expected, here is my pic with 300dpi:
path_ziji_300

library(tesseract)
eng <- tesseract("eng")
results2 <- tesseract::ocr_data("path_ziji_600.png", engine = eng)
results2
# A tibble: 18 x 3
   word            confidence bbox               
   <chr>                <dbl> <chr>              
 1 Hippo                 95.1 205,207,572,344    
 2 pathway               96.1 614,207,1122,344   
 3 DCHS1/2               88.9 468,515,919,631    
 4 FAT                   16.7 1237,522,1427,601  
 5 1/2/3/4               16.7 1426,515,1762,631  
 6 TAOK                  18.4 2037,514,2314,630  
 7 1/2/3                 18.4 2313,514,2581,630  
 8 SAV1                  82.5 1061,1156,1316,1237
 9 ||                    83.3 1345,1132,1525,1276
10 STK3/4                92.3 1573,1154,1933,1237
11 --                   55.0 2098,1288,2286,1344
12 CRB1/2                78.2 485,2040,860,2189  
13 an                    33.0 1795,1975,2077,2200
14 TEAD2                 91.9 1311,2624,1671,2704
15 Cell                  96.2 703,2995,891,3078  
16 proliferation         95.7 922,2995,1507,3101 
17 and                   96.5 1537,2995,1702,3078
18 differentiation       96.0 1734,2995,2398,3078

Actually, it does not recognize every text in this picture! That is a little strange, because this picture is not complex at all. Could anyone give me some suggestions about this?

Thanks a lot^_^

Best,
Shisheng

> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936  LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936 LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] magick_2.1           tesseract_4.1        DO.db_2.9            AnnotationDbi_1.46.0 IRanges_2.18.1       S4Vectors_0.22.0    
[7] Biobase_2.44.0       BiocGenerics_0.30.0

Bug with Japanese

I cannot get tesseract to work with Japanese. These lines work as expected

tesseract_download("fra")
french <- tesseract("fra")
text <- ocr("http://ocrapiservice.com/static/images/examples/french_text.png", engine = french)

However, simply putting in "jpn" . . .

tesseract_download("jpn")
 Downloaded: 42.32 MB  (100%)
[1] "/Users/histmr/Library/R/3.3/library/tesseract/tessdata/jpn.traineddata"
japanese <- tesseract("jpn")
Failed loading language 'jpn'
Tesseract couldn't load any languages!
Error in tesseract_engine_internal(datapath, language) : 
Unable to find training data for: jpn

Yet tesseract can partly see the files!

tesseract_info()
$datapath
[1] "/Users/histmr/Library/R/3.3/library/tesseract/tessdata/"

$available
[1] "eng" "fra" "jpn"

$version
[1] "3.05.00"

Check for installed languages

Try to find a way to test for installed languages without initializing tesseract. On Debian a common problem is that people forget to install tesseract-ocr-eng.
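
One possible approach, sketched below, is to look at the file system only. It assumes the layout seen in the reports on this page, where each language is a single `<lang>.traineddata` file inside the tessdata directory. The helper name `installed_languages` and the throwaway demo directory are hypothetical and are not part of the package.

```r
# Hypothetical helper: list the language codes that have a .traineddata file
# in a tessdata directory. Only the file system is inspected; no OCR engine
# is started.
installed_languages <- function(datapath) {
  files <- list.files(datapath, pattern = "\\.traineddata$")
  sort(sub("\\.traineddata$", "", files))
}

# Demo on a temporary directory so the sketch runs without Tesseract installed.
demo_dir <- tempfile("tessdata")
dir.create(demo_dir)
invisible(file.create(file.path(
  demo_dir, c("fra.traineddata", "osd.traineddata", "notes.txt")
)))

langs <- installed_languages(demo_dir)
if (!"eng" %in% langs) {
  message("English training data not found; on Debian install tesseract-ocr-eng")
}
```

In real use the argument would be the actual tessdata directory rather than `demo_dir`. A directory that does not exist simply yields an empty character vector, so the check is safe to run before anything else.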

Failed loading language 'osd' with tessedit_pageseg_mode = 1

Hi,

I'm experiencing an issue when setting the page segmentation mode to 1 (auto + OSD), where the following call results in an error message:

engine <- tesseract("osd+eng", options = list(tessedit_pageseg_mode = 1))
text1 <- ocr("http://jeroen.github.io/images/testocr.png", engine = engine)

Failed loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script detection requested, but osd language failed to load

As suggested in some related posts, I set TESSDATA_PREFIX to the tessdata directory. This does not seem to resolve the issue, even though the .traineddata files are located in this directory.

Sys.setenv(TESSDATA_PREFIX="/Users/name/Library/Application Support/tesseract4/tessdata")

Setting the language argument to "osd" instead of "osd+eng" does prevent the error message, but the quality of the text recognition suffers:

丁h5s 5৪ а ന്ന or オ2 podnr reod ro test the
ocr соde аnd see ㅠ 耽 ννorкs оп аи types
or 佩е টিоrmat

Any ideas how to solve this?
Thank you very much for your help!
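
A quick way to rule out a missing file is to check every language in the "+"-separated specification against the directory. The sketch below assumes one `<lang>.traineddata` file per language directly inside the directory; the helper name `missing_traineddata` and the demo directory are hypothetical. It only tests for file presence and says nothing about whether the files match the installed Tesseract version.

```r
# Hypothetical helper: return the languages in a "+"-separated specification
# (for example "osd+eng") that have no <lang>.traineddata file in `datapath`.
missing_traineddata <- function(spec, datapath) {
  langs <- strsplit(spec, "+", fixed = TRUE)[[1]]
  present <- file.exists(file.path(datapath, paste0(langs, ".traineddata")))
  langs[!present]
}

# Demo on a temporary directory that contains English data only.
demo_dir <- tempfile("tessdata")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, "eng.traineddata")))

absent <- missing_traineddata("osd+eng", demo_dir)
print(absent)
```

To check the directory from the report, one would pass `Sys.getenv("TESSDATA_PREFIX")` as `datapath` instead of `demo_dir`.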

Turning on legacy OCR engine mode

Whitelist and blacklist are not implemented in version 4.0 (issue) and user patterns do not work (issue). In this issue people recommend turning on the legacy OEM using the --oem 0 option flag. This option is not part of the configs, but rather belongs to the engine itself, like the language.

Could we please enable more OCR options as arguments in tesseract::tesseract(), including oem, so that it is possible to temporarily switch to the older version of the engine?

Poorer performance

I'm getting poorer performance. I suspect I'm not understanding something. Here is a link to a gist with code, pic, and output: https://gist.github.com/trinker/c24a0b56be1aa2cd22d4999a4e0f822c

if (!require("pacman")) install.packages("pacman")
pacman::p_load_current_gh('ropensci/tesseract')

img_url <- 'https://cloud.githubusercontent.com/assets/1763278/20309374/b47bd736-ab15-11e6-82df-215f79d0e8d2.png'


# Extract text from images
out <- ocr(img_url)
cat(out)

results in:

Dog
Fm m, me use W
Tins made )5 about me aomesuo aog For related speaes Imown as "dogs", see Camdae For Other uses, see Dag
"Dogg)e"IedIIec{s here Folme uamsn must, see Daggre (arm)
The domestic dog {Cams lupus rammans or Cams [Emma/Is)“ s 2 member 0' genus cams {canines) Inzl lorms pan
onne wolHIke cams“! and 5 me most wney abundant canmore ““5"“! The dog and me extant gray wonzre ssler
laxammm wnn modern wolves not closely related Io me wolves that were first domesticated I‘ll?! smoe ns
domestication, me dog has been seleclwely bred over mlllennla [or vznous benzvms, sensory capamlmes, 2nd
pnysxzal zllnbules “"1
Their long assoclallon wnn humans has led Io dogs belllg umquely zlluned Io numzn benzvmm‘ and are me lo
Hume on 2 slarcnrncn dlel much would be Inadequate [or other can} speoes “1| Dogs are ago the oldest
domesticated ammal Dogs vzly wney m shape, SlZe and colours “3! Dogs peflorm many roles [or people, such as
nummg, nemmg, puumg loads, prolecllon, asslsllllg pollce 2nd mnnary, companlonsmp and, more reoenw, mug
handicapped mammals Tns Influence on human soclew nzs gwen Inem me sobnquel, "man‘s best mend"

language nld, windows

FYI: I tried using the nld language with tesseract 1.3 on Windows.

> ## Tesseract version and available languages
> tess_info <- tesseract_info()
> tess_info$version
[1] "3.04.01"
> ## Download the trained data for your language 
> ## for example dutch: https://github.com/tesseract-ocr/tessdata/blob/3.04.00/nld.traineddata
> download.file(url = "https://github.com/tesseract-ocr/tessdata/raw/3.04.00/nld.traineddata",
+               destfile = file.path(tess_info$datapath, "nld.traineddata"))
trying URL 'https://github.com/tesseract-ocr/tessdata/raw/3.04.00/nld.traineddata'
Content type 'application/octet-stream' length 17098919 bytes (16.3 MB)
downloaded 16.3 MB

> list.files(tess_info$datapath)
[1] "eng.traineddata" "fra.traineddata" "nld.traineddata" "osd.traineddata"

The language file nld.traineddata is apparently incompatible with version 3.04.01. The traineddata file obtained with tesseract_download is for version 4.0 and fails completely.
Are there compatible xxx.traineddata language files for tesseract 3.04.01, or is there another alternative on Windows? FYI, I also tested this on Ubuntu with Tesseract version 3.0.3, and there it works fine.

tesseract("nld")

ParamsModel::Incomplete line 
[... the same "ParamsModel::Incomplete line" message repeats for several hundred more lines, each followed by a few unprintable bytes ...]
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line @�
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line 
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line 
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line 
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line 
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
ParamsModel::Incomplete line �
Missing field PTRAIN_DIGITS_SHORT.
Failed loading language 'nld'
Tesseract couldn't load any languages!
Error in tesseract_engine_internal(datapath, language) : 
  Unable to find training data for: nld

The text is not recognized from png

I have to pull data from a PDF published at a URL. The PDF is a scanned image (effectively a .png) rather than text, so I used the tesseract package, but a few of the lines were not recognized.

The code:
library(rvest)
library(dplyr)
library(stringr)   # provides str_replace() and fixed()
library(pdftools)
library(tesseract)

url <- "https://www.hindustancopper.com/Page/PriceCircular"
links <- url %>%
  # read the HTML of the page
  read_html() %>%
  # select the first link in the list and extract its href attribute
  html_nodes("#viewTable li:nth-child(1) a") %>%
  html_attr("href") %>%
  # strip the relative-path prefix (matched literally, not as a regex)
  str_replace(fixed("../.."), "")
str(links)

# build the absolute URL of the PDF
base_url <- "https://www.hindustancopper.com"
event_url <- paste0(base_url, links)
event_url

# the link points to a scanned copy rather than a text PDF,
# so render the page to an image and run OCR on it
pdf_convert(event_url,
            pages = 1,
            dpi = 850,
            filenames = "page1.png")
text <- ocr("page1.png")
cat(text)

The actual output reads the list of products and its prices as:
CONTINUOUS CAST COPPER WIRE ROD 11 MM 44567
CONTINUOUS CAST COPPER WIRE ROD NS 439678
CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc.

The expected output should be:
CONTINUOUS CAST COPPER WIRE ROD 11 MM 441567
CATHODE FULL 434122
CONTINUOUS CAST COPPER WIRE ROD NS 439678
CONTINUOUS CAST COPPER WIRE ROD 16 MM 443056...etc

I have tried changing the value of the dpi argument several times, but that did not help much. Is there another argument to these functions that I might be missing? Thanks in advance!
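One thing that may be worth trying is cleaning the rendered page with magick before passing it to ocr(). The sketch below is only an illustration, not a confirmed fix for this document: the helper name preprocess_and_ocr and the values for width and fuzz are assumptions, and it presumes that the magick package is installed and that "page1.png" is the file written by pdf_convert() above.

```r
# Sketch only: preprocess_and_ocr() is a hypothetical helper, and the
# width/fuzz values are guesses that would need tuning for a real scan.
preprocess_and_ocr <- function(path, width = "2000x") {
  img <- magick::image_read(path)
  img <- magick::image_resize(img, width)                 # scale to a fixed pixel width
  img <- magick::image_convert(img, type = "Grayscale")   # drop colour information
  img <- magick::image_trim(img, fuzz = 40)               # crop near-uniform margins
  tesseract::ocr(img, engine = tesseract::tesseract("eng"))
}

# Only run when the input file and both packages are actually available
if (file.exists("page1.png") &&
    requireNamespace("magick", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE)) {
  cat(preprocess_and_ocr("page1.png"))
}
```

A very high dpi is not necessarily better; comparing the output at a few moderate values may be more informative than going up to 850.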

Vignette suggestions

Thanks for a great package @jeroen! I have two suggestions regarding the vignette.

Vignette access

To read the vignette, one has to either read the CRAN version or install the package. Would you be interested in a PR that sets up a pkgdown website for tesseract? That way the vignette would be rendered.

hOCR

I looked at the posts on rOpenSci blog via https://ropensci.org/tags/tesseract/ in particular https://ropensci.org/technotes/2018/02/14/tesseract-18/ -> should this mention of hOCR be added to the vignette? I see it is already mentioned very briefly in the README.

Too few characters

When I try the example from the rOpenSci page, it gives an error. Is this intended? Previously, it was possible to recognize text with fewer than 50 characters.

text <- ocr("http://jeroen.github.io/files/inlove.png")
cat(text) 
# Too few characters. Skipping this page

image resolution in ocr and ocr_data

I noticed reduced recognition quality when I used ocr_data() instead of ocr(HOCR=TRUE), and sure enough ocr_data() saves its temp file at a lower (72dpi) resolution. Is there any reason for that other than speed?

It does not actually matter what resolution I use when scanning my image: it will be downsampled to 300dpi in the temporary file. One way of circumventing this is of course to save my pre-processed image to a file before passing the path to ocr().

On the topic of temp files, I recently ran into an issue with allocating new temp file names on Windows (bug report). I see that you use vapply to save each magick "sub-image" (channel/layer?) to a temp file (using magick::image_write) and immediately read it back. What is the reason for doing that (since it is inside the same loop)? Could we get away with magick::image_convert/image_resize piped to OCR? I suspect it has something to do with speed, so you want to enforce lower density and bitmap format. But you have virtually no control over density or format when reading an image directly from a file, so why bother? I am strongly of the opinion that if an image comes in as a magick object, it should be the user's responsibility to flatten and downsample it to get acceptable OCR speed.

On the subject of processing magick images with lapply/vapply, maybe we could make an exception for "one-page flat magick images" (length(img)==1, where inherits(img, "magick-image")) and treat them as raw? Is it multi-page tiffs that you intend to catch with that vapply?

quality in ocr on magick-image

Would it make sense to replace

magick::image_write(x, tmp, format = "tiff")
with
magick::image_write(x, tmp, format = "tiff", quality = 100)
to avoid quality loss?

magick dependency

This is probably not an issue, but rather a suggestion and an invitation to discussion.

ImageMagick installation on Windows may be ... a little involved. Is there a reason you want to keep the dependency on magick in tesseract? You only seem to use magick for saving magick images; if they were saved using other tools, are you afraid of losing the class (and do you even need to save them, as argued in #28)? I imagine that removing the dependency on magick would help wider adoption of tesseract, particularly by folks who tend to use imager and other R image processing platforms.

Avoid leptonica

The api->SetImage() function also has a method for bitmaps. Perhaps we can use this to pass bitmaps generated from magick() so we can avoid leptonica altogether.

Tesseract example not working due to errors in tiff:writeTIFF

I am unable to run the Tesseract example

library(pdftools)
library(tiff)
library(tesseract)  # provides ocr()

# A PDF file with some text
setwd(tempdir())
news <- file.path(Sys.getenv("R_DOC_DIR"), "NEWS.pdf")
orig <- pdf_text(news)[1]

# Render pdf to jpeg/tiff image
bitmap <- pdf_render_page(news, dpi = 300)
tiff::writeTIFF(bitmap, "page.tiff")

# Extract text from images
out <- ocr("page.tiff")
cat(out)

tiff::writeTIFF triggers an error

tiff::writeTIFF(bitmap, "page.tiff")
Error in tiff::writeTIFF(bitmap, "page.tiff") : 
  INTEGER() can only be applied to a 'integer', not a 'raw'
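
The error message suggests that writeTIFF() received the raw bitmap returned by pdf_render_page() rather than the numeric data it expects. One way to sidestep the conversion, following the roundtrip example in the package README, is to let pdftools::pdf_convert() write the image file itself. The helper name render_and_ocr below is hypothetical, and the sketch assumes that pdftools and tesseract are installed and that the local R installation ships NEWS.pdf.

```r
# Sketch only: render_and_ocr() is a hypothetical helper name.
news <- file.path(R.home("doc"), "NEWS.pdf")

render_and_ocr <- function(pdf, dpi = 300) {
  # pdf_convert() renders the page and writes it to disk,
  # so no separate call to tiff::writeTIFF() is needed
  img_file <- pdftools::pdf_convert(pdf, format = "tiff", pages = 1, dpi = dpi)
  on.exit(unlink(img_file))   # remove the temporary image afterwards
  tesseract::ocr(img_file)
}

# Only run when the PDF and both packages are actually available
if (file.exists(news) &&
    requireNamespace("pdftools", quietly = TRUE) &&
    requireNamespace("tesseract", quietly = TRUE)) {
  cat(render_and_ocr(news))
}
```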

Options

It seems that options are not recognized, when trying
options = list(classify_bln_numeric_mode = "0")

Warning. Invalid resolution 0 dpi. Using 70 instead.

When reading my TIFF files, which are set at 600 DPI, tesseract cannot find the DPI setting and reverts to the default of 70 DPI.

Can we change the default resolution and/or set resolution manually?

I see you referenced this problem earlier in a different thread @jeroen

Feature Request: Get all characters with confidence >x

This is related to #8 and #39 (or more accurately, the underlying ideas within them).

With the upstream issue that the whitelist and blacklist are not implemented in tesseract 4 (discussed in #39), it is difficult to extract all-numeric values. More generally, I have some text that follows very rigid formatting with columns of person identifiers (that are a mix of alpha-numeric and dash characters) and floating point numbers. The person identifiers will be hard to limit the values for, but the floating point numbers are easy as they come from the set 0-9, ".", and "-".

Is it possible within the ocr_data() function to get a vector of all characters that matched with >x confidence and the confidence values of those characters (where x is input by the user)?

That way, I could manually implement whitelist or blacklist functionality.
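
Until something like this exists in the package, a rough approximation can be built on top of the data frame that ocr_data() returns. The sketch below works at the word level, not the character level, and runs on mock data so that it does not need tesseract at all. The helper name filter_by_confidence is hypothetical, and the column names word, confidence and bbox are assumed to match the ocr_data() output.

```r
# Hypothetical helper: keep rows of an ocr_data()-style data frame whose
# confidence exceeds `min_conf`, optionally restricted to words that match
# an allowed pattern (a manual whitelist).
filter_by_confidence <- function(words, min_conf, allowed = NULL) {
  keep <- words$confidence > min_conf
  if (!is.null(allowed)) {
    keep <- keep & grepl(allowed, words$word)
  }
  words[keep, , drop = FALSE]
}

# Mock data in the shape ocr_data() is assumed to return
words <- data.frame(
  word = c("AB-12", "3.14", "-0.5", "l2.7"),
  confidence = c(91.2, 96.5, 88.0, 41.3),
  bbox = c("10,10,60,30", "80,10,120,30", "140,10,180,30", "200,10,240,30"),
  stringsAsFactors = FALSE
)

# Keep only confident words made of digits, "." and "-"
numeric_only <- filter_by_confidence(words, min_conf = 80, allowed = "^[0-9.-]+$")
print(numeric_only$word)
```

With real data, `words` would be replaced by the result of ocr_data() on the image, and the low-confidence rows could be inspected separately instead of being dropped.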

Linux users should install tesseract-ocr & related first

Users installing on Linux machines may see:

Error opening data file /usr/share/tesseract-ocr/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!

This is easily solved by first installing tesseract-ocr and related language packs (via apt-get or whatever).

Tesseract package installation issue from R studio in CentOS 7 server

Hi team,
I am running RStudio on a CentOS 7 server.
Below are the details of the tesseract installation at the OS level:
tesseract --version
tesseract 4.1.0
leptonica-1.75.3
libjpeg 6b (libjpeg-turbo 1.2.90) : libpng 1.5.13 : libtiff 4.0.3 : zlib 1.2.7

I am trying to install the tesseract package from RStudio.

It shows the error below:

    * installing *source* package ‘tesseract’ ...
    ** package ‘tesseract’ successfully unpacked and MD5 sums checked
    Package tesseract was not found in the pkg-config search path.
    Perhaps you should add the directory containing `tesseract.pc'
    to the PKG_CONFIG_PATH environment variable

I do not understand what the next troubleshooting steps for this error are. Could you please help?

R version details:
version
               _
platform       x86_64-redhat-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          3
minor          5.0
year           2018
month          04
day            23
svn rev        74626
language       R
version.string R version 3.5.0 (2018-04-23)
nickname       Joy in Playing

Tesseract in R not recognizing “&” aka Ampersand

I need to write code that reads text from images using R. I am using the tesseract and magick packages for this, and I am facing an issue where an "&" is converted to "8:". I have attached the image that I am using as input.
Below is the code that I am running:
test2 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
  image_resize("2000") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr()
cat(test2)
write.table(test2, "C:/Users/admin/Desktop/output2.txt", sep="\t")

I have also tried modifying it as below, but the result is still the same:
wl = paste(paste(letters, LETTERS, collapse="", sep=""), "0123456789&;")
engine <- tesseract(options = list(tessedit_char_whitelist = wl), cache=FALSE)
test3 <- image_read("C:/Users/admin/Desktop/testimage.jpg") %>%
  image_resize("500") %>%
  image_convert(colorspace = 'gray') %>%
  image_trim() %>%
  image_ocr()
engine <- tesseract(options = list(tessedit_char_whitelist = ";&"))
cat(test3)

Below is the output that I am getting:-
No relation between boycotting
panchayat polls 8: Article 35A:
Subramanian Swamy

I have gone through this website and have also posted the same question on Stack Overflow, but several hours have passed without a solution.

If someone can help, that will be a great help.

Unable to install tesseract (R package) on docker

I am trying to install the R package tesseract in Docker and get the following error message. Please help!

/usr/bin/ld: cannot find -larchive
collect2: error: ld returned 1 exit status
make: *** [/usr/share/R/share/make/shlib.mk:6: tesseract.so] Error 1
ERROR: compilation failed for package ‘tesseract’

* removing ‘/usr/local/lib/R/site-library/tesseract’

The downloaded source packages are in
‘/tmp/RtmpXMY0C6/downloaded_packages’
Warning message:
In install.packages("tesseract") :
installation of package ‘tesseract’ had non-zero exit status


FROM rocker/r-base

RUN apt-get update -qq && apt-get install -y \
  tesseract-ocr \
  libtesseract-dev \
  libleptonica-dev \
  tesseract-ocr-eng \
  libpoppler-cpp-dev \
  libmagick++-dev \
  libxml2 \
  libcurl4-openssl-dev \
  libxml2-dev \
  git-core \
  libssl-dev \
  libgtk2.0-dev \
  libcairo2-dev \
  libxt-dev \
  xvfb \
  xauth \
  libfftw3-dev \
  libx11-dev \
  libtiff-dev \
  xfonts-base

#RUN R -e "install.packages('tesseract')"
RUN R -e "install.packages('plumber')"
RUN R -e "install.packages('stringr')"
RUN R -e "install.packages('lubridate')"
RUN R -e "install.packages('stringdist')"
RUN R -e "install.packages('mime')"
RUN R -e "install.packages('Rook')"
RUN R -e "install.packages('magick')"
RUN R -e "install.packages('tm')"
RUN R -e "install.packages('pdftools')"
RUN R -e "install.packages('devtools')"
RUN R -e "install.packages('tesseract')"
#RUN R -e "devtools::install_github('ropensci/tesseract')"

Add ocr'ed text back to image and generate a PDF

It would be great if this package supported adding back the retrieved text from a raster to PDF format.

For example, using tesseract directly from the command line makes this possible in one single command:

tesseract --dpi 600 --oem 2 input_01.png output_01 pdf

Paragraphs and lines in ocr_data()

There are paragraph and line tags in xml content of ocr(HOCR=TRUE) ("above" word tags in the hierarchy). Although it is possible to infer some of the page structure from bounding boxes, I consider tesseract's own page parsing an important piece of metadata which might help quickly locate words that belong together. Right now I am manually aligning bboxes along the axis to infer rows and columns.
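
For reference, the manual workaround described above (inferring rows from bounding boxes) can be sketched in base R as follows. This is not tesseract's own page segmentation, only a heuristic. The helper name assign_lines and the tolerance tol are hypothetical, the input must be non-empty, and the bbox strings are assumed to have the form "x1,y1,x2,y2".

```r
# Hypothetical helper: assign a line number to each word from its bounding box.
assign_lines <- function(words, tol = 10) {
  # split "x1,y1,x2,y2" into a numeric matrix with one row per word
  coords <- do.call(rbind, lapply(strsplit(words$bbox, ","), as.numeric))
  y_mid <- (coords[, 2] + coords[, 4]) / 2
  ord <- order(y_mid)
  # start a new line whenever the vertical gap to the previous word exceeds `tol`
  line_sorted <- cumsum(c(TRUE, diff(y_mid[ord]) > tol))
  words$line <- integer(nrow(words))
  words$line[ord] <- line_sorted
  words$x1 <- coords[, 1]
  # sort top to bottom, then left to right within each line
  words[order(words$line, words$x1), ]
}

# Mock data in the shape ocr_data() is assumed to return, in shuffled order
words <- data.frame(
  word = c("world", "Second", "Hello", "line"),
  bbox = c("70,12,130,32", "10,50,80,70", "10,10,60,30", "90,51,130,71"),
  stringsAsFactors = FALSE
)

result <- assign_lines(words)
lines_text <- tapply(result$word, result$line, paste, collapse = " ")
print(lines_text)
```

A fixed vertical tolerance breaks down on skewed scans or mixed font sizes, which is part of why exposing the engine's own line and paragraph structure would be preferable.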

actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53

I downloaded training data from here, put it here:
/usr/share/tesseract-ocr/tessdata/

and got an error while loading the Polish training data:

> library(tesseract)
> tesseract::tesseract_info()
$datapath
[1] "/usr/share/tesseract-ocr/tessdata/"

$available
[1] "slk" "osd" "pol" "equ" "eng"

$version
[1] "3.03"

> config <- tesseract(language = "pol")
actual_tessdata_num_entries_ <= TESSDATA_NUM_ENTRIES:Error:Assert failed:in file tessdatamanager.cpp, line 53

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .Call("tesseract_tesseract_engine_internal", PACKAGE = "tesseract",     datapath, language)
 2: tesseract_engine_internal(datapath, language)
 3: tesseract_engine(datapath, language, options)
 4: tesseract(language = "pol")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

Can you help me? My OS is Ubuntu.

Multi-language text

Hi,
I have text files in two languages (Hebrew and English).
When using the ocr function with either engine (heb or eng), the parts of the text in that language are identified correctly; the rest, however, is wrong.
I know that there is a way to use two languages in Tesseract itself, but is there a way to do so using the R package?

Thanks a lot for your help!
Tomer
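
Tesseract itself accepts several languages joined with "+". The sketch below assumes that the R wrapper passes such a language string through to the engine unchanged, that both training data sets are installed, and that "scan.png" is a hypothetical input file; none of this is confirmed in the thread.

```r
# Sketch only: "scan.png" is a placeholder, and passing "heb+eng" through
# tesseract(language = ...) is an assumption about the R wrapper.
langs <- c("heb", "eng")
lang_string <- paste(langs, collapse = "+")   # "heb+eng"

# Only run when the package, both training data sets and the file are available
if (requireNamespace("tesseract", quietly = TRUE) &&
    all(langs %in% tesseract::tesseract_info()$available) &&
    file.exists("scan.png")) {
  engine <- tesseract::tesseract(language = lang_string)
  cat(tesseract::ocr("scan.png", engine = engine))
}
```

The order of the languages may influence the result, so it can be worth trying both "heb+eng" and "eng+heb".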
