ocropus-git's Introduction

## Version 0.4
## Compile on Ubuntu 14.04
    scons
    sudo scons install
    ocropus


OCRopus - open source document analysis and OCR system (www.ocropus.org)

Version 0.3 (2008-10-15)


--------------------------------------------------------------------------------
Building OCRopus (quick start)
--------------------------------------------------------------------------------
1) make sure you have these packages installed (current Ubuntu/Debian versions should work):
    libpng (with headers)
    libjpeg (with headers)
    libtiff (with headers)

2) install iulib from http://code.google.com/p/iulib

3) install a current version of tesseract from the Subversion repository
    (http://code.google.com/p/tesseract-ocr)

4) from the release directory, run
    ./configure
    make
    sudo make install

Please refer to the file INSTALL for more help on building OCRopus from source.
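
Before running ./configure, it can help to confirm that the headers from step 1 are actually present. The following is a minimal sketch, not part of OCRopus: the function name missing_headers is hypothetical, and the header file names (png.h, jpeglib.h, tiffio.h) and the default search root /usr/include are assumptions that may not match every distribution.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OCRopus): print the name of each
# image-library header that cannot be found below an include directory.
# Assumed header names: png.h (libpng), jpeglib.h (libjpeg), tiffio.h (libtiff).
missing_headers() {
    root="${1:-/usr/include}"
    for header in png.h jpeglib.h tiffio.h; do
        # Search subdirectories too, because distributions place headers differently.
        if [ -z "$(find "$root" -name "$header" 2>/dev/null | head -n 1)" ]; then
            echo "$header"
        fi
    done
}
```

Calling missing_headers with no argument searches /usr/include; pass another directory such as /usr/local/include if the libraries were installed there. An empty result means all three headers were found.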


--------------------------------------------------------------------------------
Executing OCRopus
--------------------------------------------------------------------------------
After building and installing OCRopus, use "ocroscript" to recognize
document images, for example:
    ocroscript recognize data/pages/alice_1.png
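
To process several pages at once, the same command can be wrapped in a loop. This is a minimal sketch under two assumptions: that "ocroscript recognize" writes its result to standard output, and that one output file per page is wanted. The function name recognize_pages and the .html output naming are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run "ocroscript recognize" on every PNG in a directory
# and store each result next to its image as <name>.html.
# Assumes the recognizer prints its output to standard output.
recognize_pages() {
    dir="$1"
    for page in "$dir"/*.png; do
        # Skip the unexpanded pattern when the directory holds no PNG files.
        [ -e "$page" ] || continue
        ocroscript recognize "$page" > "${page%.png}.html"
    done
}
```

For example, recognize_pages data/pages would process every PNG in the sample directory used above.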


--------------------------------------------------------------------------------
Documentation
--------------------------------------------------------------------------------
Please refer to http://www.ocropus.org for the most recent documentation.


--------------------------------------------------------------------------------
Background
--------------------------------------------------------------------------------
OCRopus is a state-of-the-art document analysis and OCR system, featuring
    * pluggable layout analysis,
    * pluggable character recognition,
    * statistical natural language modeling and
    * multi-lingual capabilities.
OCRopus development is sponsored by Google and is initially intended for
high-throughput, high-volume document conversion efforts. We expect that
it will also be an excellent OCR system for many other applications.

OCRopus is mainly based on research projects of Thomas Breuel and the Image
Understanding and Pattern Recognition (IUPR) group of the German Research
Center for Artificial Intelligence (DFKI) located in Kaiserslautern, Germany.

OCRopus uses data structures and algorithms from iulib, the open source
Image Understanding Library (http://code.google.com/p/iulib/), which was
part of OCRopus until June 2008.


--------------------------------------------------------------------------------
Online Resources
--------------------------------------------------------------------------------
Homepage:
    http://www.ocropus.org

Forum / Mailing list:
    http://groups.google.com/group/ocropus

Public Issue Tracker:
    http://code.google.com/p/ocropus/issues

OCRopus is made by IUPR:
    http://www.iupr.org

IUPR is a part of DFKI:
    http://www.dfki.de

hOCR Output Format:
    http://docs.google.com/View?docid=dfxcv4vc_67g844kf
