
harish2704 / pottan-ocr


A stupid OCR for malayalam language

Home Page: https://harish2704.github.io/pottan-demo/

License: MIT License

Python 39.47% Shell 4.12% JavaScript 16.67% CSS 0.13% HTML 9.69% Jupyter Notebook 29.93%
machine-learning ocr malayalam devanagari crnn pytorch keras complex-scripts

pottan-ocr's Introduction

Notice: This project is no longer relevant, since the latest version of Tesseract OCR uses the same technology (CNN-RNN models) and can recognize complex scripts with very high accuracy. A web demo of the latest Tesseract OCR is available at the link below.

https://harish2704.github.io/ml-tesseract-demo/

Join the chat at https://gitter.im/pottan-ocr/Lobby

pottan-ocr

A stupid OCR for the Malayalam language. It can easily be configured to process other languages with complex scripts.

Client-side web demo of individual line recognition

https://harish2704.github.io/pottan-demo/

Screenshot of complete page OCR

Screenshot

Installation

Clone the project

git clone https://github.com/harish2704/pottan-ocr
cd pottan-ocr

Run the installer bash script to complete the installation.

  • For Debian
    env DISTRO=debian ./tools/install-dependencies.sh
  • For Fedora
    env DISTRO=fedora ./tools/install-dependencies.sh
  • For OpenSUSE
    env DISTRO=opensuse ./tools/install-dependencies.sh
  • For Ubuntu
    ./tools/install-dependencies.sh

By default, the installer installs the dependencies needed to run the OCR. To train the OCR, pass the string for_training as the first argument to the installer.

  ./tools/install-dependencies.sh for_training

Usage

  1. Download the latest pre-trained model file from the pottan-ocr-data repository
wget 'https://github.com/harish2704/pottan-ocr-data/raw/master/crnn_11032020_171631_5.h5' -O pottan_ocr_latest.h5
  2. Create the configuration file
cp ./config.yaml.sample ./config.yaml
  3. Run the OCR on any PNG/JPEG image
./bin/pottan ocr <trained_model.h5> <image_path> [ pottan_ocr_output.html ]
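
When the OCR has to be called from a script, the usage line above can be wrapped in a small helper. The following is only a sketch: the function names build_ocr_command and run_ocr are hypothetical and not part of pottan-ocr, and it assumes the script is started from the repository root so that ./bin/pottan resolves.

```python
import subprocess


def build_ocr_command(model_path, image_path, output_html=None, pottan_bin="./bin/pottan"):
    """Assemble the argument list for `pottan ocr`, mirroring the usage line.

    The output HTML path is optional, as in the usage line; it is simply
    left out of the command when not given.
    """
    command = [pottan_bin, "ocr", model_path, image_path]
    if output_html is not None:
        command.append(output_html)
    return command


def run_ocr(model_path, image_path, output_html=None):
    """Run the OCR command and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_ocr_command(model_path, image_path, output_html), check=True)
```

For example, run_ocr("pottan_ocr_latest.h5", "page.png", "page.html") would invoke the same command as step 3 with those three paths.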

For more details, see the --help of bin/pottan and its subcommands

Usage:
./pottan <command> [ arguments ]

List of available commands ( See '--help' of individual command for more details ):

    extractWikiDump - Extract words from wiki xml dump ( most of the text corpus ). Output is written to stdout.

    datagen         - Prepare training data from data/train.txt & data/validate.txt. ( Deprecated. Used only for manual verification of training data )

    train           - Run the training

    ocr             - Run character recognition with a pre-trained model and image file

Training

  • Training uses synthetic images generated on the fly from a text corpus. For this to work, enough fonts must be installed on the system. In short, the fonts listed in ./config.yaml.sample should appear in the output of the command fc-list :lang=ml
    • The generated images can also be written to disk; the datagen sub-command does exactly this. During training, if the images already exist in the cache directory (e.g. when the cache directory points to the directory of generated images), they are used instead of generating new ones. This reduces CPU load during production training sessions.
  • A GPU is also recommended for training the OCR.
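
The two ideas in the list above, checking that the required fonts are installed and reusing images that are already in a cache directory, can be sketched in Python as follows. This is not the project's own code. All function names are hypothetical, the font check expects the text printed by `fc-list :lang=ml family`, and the one-PNG-per-key cache layout is an assumption made only for illustration.

```python
import os


def installed_families(fc_list_output):
    """Parse the text printed by `fc-list :lang=ml family` into a set of names."""
    families = set()
    for line in fc_list_output.splitlines():
        # fontconfig may print several comma-separated names for one font.
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def missing_fonts(required, fc_list_output):
    """Return the required font families that fontconfig did not report."""
    available = installed_families(fc_list_output)
    return [font for font in required if font not in available]


def load_or_generate(cache_dir, key, generate):
    """Return cached image bytes for `key`; on a miss, generate and store them.

    Assumed layout (hypothetical): one file named <key>.png per sample.
    `generate` is any zero-argument callable that returns image bytes.
    """
    path = os.path.join(cache_dir, key + ".png")
    if os.path.exists(path):
        with open(path, "rb") as handle:
            return handle.read()
    data = generate()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(data)
    return data
```

The text for missing_fonts can be obtained with subprocess.run(["fc-list", ":lang=ml", "family"], capture_output=True, text=True).stdout. An empty result means every font in the list passed as required is available.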

Try training on Google colab

Open notebook

For more details, see wiki

Getting involved

Credits

  • Authors of http://arxiv.org/abs/1507.05717
  • Jieru Mei, who created the PyTorch implementation of the above-mentioned model. Repo: https://github.com/meijieru/crnn.pytorch. The model used in pottan-ocr is taken from this project.
  • Tom and the contributors of the Ocropy project ( https://github.com/tmbdev/ocropy ), which is the backbone of pottan-ocr.
    • The pottan-ocr code base can do only one thing: convert an image of a single line into a single line of text.
    • Everything else, including layout detection, line segmentation and output generation, is handled by Ocropy.
    • pottan-ocr just works as the core engine, replacing the default engine, Tesseract OCR, used in Ocropy.
  • PyTorch https://pytorch.org/
  • Leon Chen and the team behind KerasJS.
    • KerasJS is used to create the web-based demo application of the OCR.
    • KerasJS does its job very well by running Keras models in browsers with WebGL2 acceleration.
    • It also has great features, such as visualizing each stage of the process, which have not been explored yet.

Thanks

pottan-ocr's People

Contributors

dependabot[bot], gitter-badger, harish2704


pottan-ocr's Issues

How did you convert the PyTorch model to Keras?

@harish2704 I cannot convert a PyTorch LSTMCell to Keras. Could you please provide some help?
Here is the PyTorch LSTMCell:
self.att_lstm = nn.LSTMCell(1536,512)
h_att, c_att = self.att_lstm(att_lstm_input, (state[0][0], state[1][0]))
state[0][0] and state[1][0] are tensors of shape (2, 512)

Recognition issues

For the image given below,
010010 bin

output is as follows
'വികുസ്തിപ്ലീക്കൂമ്പഠ (ൃവ൭൭ന്രിക്കൂന്ന റിധ്രിത ന്ത്രജ്ജ൭ത്തിമ്രത്ഠ൬ ഉങ്കല്ലൂ'

Setup integration with Kraken / Ocropy

This tool will work as an engine without doing any sort of

  • Layout analysis
  • Column detection
  • Output document generation

Etc.

It will simply convert an image of a line of text into text.
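
The division of labour described here can be sketched in Python as follows. Everything in the block is hypothetical and not an existing pottan-ocr, Kraken or Ocropy API. It assumes that the surrounding tool has already segmented the page into line images, and that the engine is exposed as a callable that maps one line image to one string.

```python
from typing import Callable, Iterable, List


def ocr_page(line_images: Iterable[bytes], recognize_line: Callable[[bytes], str]) -> List[str]:
    """Apply a line-level recognizer to lines that were segmented elsewhere.

    Layout analysis, column detection and output document generation are
    deliberately absent: in this sketch they belong to the calling tool.
    """
    return [recognize_line(image) for image in line_images]
```

With this split, swapping the recognition engine only means passing a different recognize_line callable; the segmentation and output steps stay untouched.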

Need more info on how to do the training?

I am trying to train for Tamil fonts.

I created this repo:
https://github.com/nithyadurai87/pottan-ocr-tamil

and added this file:
https://github.com/nithyadurai87/pottan-ocr-tamil/blob/master/config.yaml

It seems the install step for training needs a GPU, CUDA, etc.
While installing them and enabling the NVIDIA driver, my Ubuntu crashed.
I reinstalled Ubuntu.

Is there any way to train in the cloud?
You used FloydHub.

Could you add a tutorial on how to train using FloydHub?

Thanks.
