daoos / image-ocr---extraction-to-excel---workflow

This project forked from mohanraobc83/image-ocr---extraction-to-excel---workflow

An attempt to automate the process of extracting transaction details from bank forms and storing them in an Excel file

License: GNU General Public License v3.0

Jupyter Notebook 100.00%

image-ocr---extraction-to-excel---workflow's Introduction

Image-OCR---Extraction-to-Excel---Workflow

An attempt to automate the process of extracting transaction details from bank forms and storing them in an Excel file, using Optical Character Recognition with the Tesseract engine and the PyTesseract library.

Initial declaration

Most of the code snippets are sourced from the PyImageSearch site, with various inputs from Stack Overflow. The sequencing of the steps and the modifications for the needs of the project were done by me, but I am not claiming anything as my original work in the pure sense. I thank all the experts who have solved the tough problems and were gracious enough to share them in the public domain.

There is a lot of scope for an automated OCR solution that reads data from images of bank forms and converts it into an Excel file. The objective of this project was to read the images of the bank forms in a folder, extract the required fields from each form, and then add everything to an Excel file.

The workflow has 3 steps:

Step 1 - Standardize the incoming image files and correct their errors. The code does this in batch (20200110_Image_Standardization_codes_V1.0).

Step 2 - Read the standardized images and extract all data to a set of text files using the Tesseract engine and the pytesseract library.

Step 3 - Extract the required fields and data elements from the text files and consolidate the data from all input image files into an Excel file.
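Steps 2 and 3 could be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks. The field names and regular expressions in FIELD_PATTERNS are hypothetical, since the actual bank form layout is not described here, and the helper names (ocr_image_to_text, extract_fields, consolidate, write_excel) are made up for this example. The OCR and Excel helpers assume that Pillow, pytesseract, pandas and openpyxl are installed; they are imported lazily so the parsing helpers work without them.

```python
import re
from pathlib import Path

# Hypothetical field labels and patterns; adjust them to the real form layout.
FIELD_PATTERNS = {
    "account_number": re.compile(
        r"Account\s*(?:No|Number)\.?\s*[:\-]?\s*(\d+)", re.IGNORECASE
    ),
    "amount": re.compile(r"Amount\s*[:\-]?\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE),
    "date": re.compile(
        r"Date\s*[:\-]?\s*(\d{1,2}[/-]\d{1,2}[/-]\d{2,4})", re.IGNORECASE
    ),
}


def ocr_image_to_text(image_path):
    """Step 2: run Tesseract on one standardized image and return its text."""
    # Imported lazily so the parsing helpers below work without OCR installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path))


def extract_fields(text):
    """Step 3a: pull the required fields out of OCR text (None if not found)."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1).strip() if match else None
    return row


def consolidate(text_dir):
    """Step 3b: build one row per text file in the folder."""
    rows = []
    for txt_path in sorted(Path(text_dir).glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8", errors="replace")
        row = {"source_file": txt_path.name}
        row.update(extract_fields(text))
        rows.append(row)
    return rows


def write_excel(rows, out_path):
    """Step 3c: write all rows to one Excel sheet (assumes pandas + openpyxl)."""
    import pandas as pd

    pd.DataFrame(rows).to_excel(out_path, index=False)
```

A run over a folder of OCR text files would then look like write_excel(consolidate("ocr_text"), "transactions.xlsx"), where both names are placeholders. Fields that Tesseract misreads come back as None, which makes them easy to spot for manual review in the resulting sheet.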

It was a pilot project done to test my knowledge and explore the possibilities. It is in no way perfect. I hope somebody can make it perfect, and I also hope it helps and inspires programmers trying image OCR.

image-ocr---extraction-to-excel---workflow's People

Contributors

mohanraobc83
