
anuraggupta29 / document-scanner


Desktop based GUI Document Scanner with OCR

Python 100.00%


Document Scanner and OCR

For the GUI version of this project, go to: https://github.com/anuraggupta29/DocScan-electronJS

In Progress

This is a desktop based GUI Document Scanner.
It can scan multiple images at once.
It automatically detects the 'document' part from the image.
It corrects its orientation and perspective.
The GUI will have the feature to manually select the 'document' part from each image, in case there's an error in automatic detection.
It can save the image in multiple modes and resolutions.
It can also save all the images as a single PDF.
It will also have the feature of optical character recognition.

How does document detection work on each image?

It resizes a copy of the image for manipulation.
It then grayscales the image.
Then it applies a bilateral filter to it (blurs the image while preserving edges).
It produces a Canny edge image (binary, edges only), with thresholds derived from the median intensity of the image's pixels.
It then detects the contours in the image (the curves).
Then it takes the largest contour based on perimeter (as the document part will have the largest perimeter).
Note : The document will also have the largest area, but most of the time the detected contour is not a closed curve, so its area becomes very small while its perimeter remains large.
Next it takes the convex hull of the selected contour (the smallest convex polygon enclosing the contour).
Then it uses OpenCV's approxPolyDP to approximate the convex hull as a four-sided polygon.
Thus it obtains the 4 corners of our document.
Then it scales the coordinates back to the original image size.
From those 4 coordinates it identifies which one is the top-left (tl), top-right (tr), bottom-right (br) and bottom-left (bl) corner.
Then it calculates the height and width of the document portion.
Then it applies a perspective transform to the original image using those corner coordinates and produces the transformed image.

Note : Rest of the features are yet to be added.
