unproject_text - Perspective recovery of text using transformed ellipses
unpaper - A post-processing tool for scanned sheets of paper, especially book pages scanned from photocopies.
deskew - Library used to deskew a scanned document
deskewing - Contains code to deskew images using MLPs, LSTMs and LLS transformations
skew_correction - De-skewing images with slanted content by finding the deviation using Canny Edge Detection.
page_dewarp - Page dewarping and thresholding using a "cubic sheet" model
text_deskewing - Rotate text images if they are not straight for better text detection and recognition.
galfar/deskew - Deskew is a command line tool for deskewing scanned text documents. It uses the Hough transform to detect "text lines" in the image. As output, you get an image rotated so that the lines are horizontal.
TableNet - Unofficial implementation of the ICDAR 2019 paper "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images".
ocr_attention - Robust Scene Text Recognition with Automatic Rectification.
masktextspotter.caffe2 - The code of "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes".
InceptText-Tensorflow - An implementation of the algorithm in the paper "IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection".
textspotter - An End-to-End TextSpotter with Explicit Alignment and Attention
RRD - RRD: Rotation-Sensitive Regression for Oriented Scene Text Detection.
SSTDNet - Implementation of "Single Shot Text Detector with Regional Attention" (ICCV 2017 Spotlight).
R2CNN - Caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection.
RRPN - Source code of RRPN: Arbitrary-Oriented Scene Text Detection via Rotation Proposals
Tensorflow_SceneText_Oriented_Box_Predictor - This project modifies the TensorFlow Object Detection API code to predict oriented bounding boxes. It can be used for scene text detection.
DeepSceneTextReader - A C++ project that deploys a deep scene text reading pipeline with TensorFlow. It reads text from natural scene images using frozen TensorFlow graphs. The detector locates scene text, and the recognizer reads the word in each detected bounding box.
DeRPN - A novel region proposal network for more general object detection (including scene text detection).
Bartzi/see - SEE: Towards Semi-Supervised End-to-End Scene Text Recognition
Bartzi/stn-ocr - Code for the paper STN-OCR: A single Neural Network for Text Detection and Text Recognition
beacandler/R2CNN - Caffe re-implementation of R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection
EAST (official) - A TensorFlow implementation of the EAST text detector
AdvancedEAST - AdvancedEAST is an algorithm for scene image text detection. It is primarily based on EAST, with significant improvements that make long text predictions more accurate.
kurapan/EAST - Implementation of the EAST scene text detector in Keras
songdejia/EAST - This is a PyTorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.
cnn_lstm_ctc_ocr - TensorFlow-based CNN+LSTM trained with CTC loss for OCR.
PassportScanner - Scan the MRZ code of a passport and extract the first name, last name, passport number, nationality, date of birth, expiration date and personal number.
GRCNN-for-OCR - This is the implementation of the paper "Gated Recurrent Convolution Neural Network for OCR"
go-ocr - A tool for extracting text from scanned documents (via OCR), with user-defined post-processing.
insightocr - MXNet OCR implementation, including text recognition and detection.
ocr_densenet - First-place solution in the first Xi'an Jiaotong University Artificial Intelligence Practice Contest (2018 AI Practice Contest, image text recognition); uses only DenseNet to recognize Chinese characters
emedvedev/attention-ocr - A TensorFlow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.
scenetext - This is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout.
Transkribus - Transkribus is a comprehensive platform for the digitisation, AI-powered recognition, transcription and searching of historical documents.
open-semantic-search - Open Semantic Search Engine and Open Source Text Mining & Text Analytics platform (integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for full-text search, faceted search & knowledge graph)
ocrserver - A simple OCR API server that is very easy to deploy with Docker, and on Heroku as well
cosc428-structor - ~1000 book pages + OpenCV + python = page regions identified as paragraphs, lines, images, captions, etc.