OpticalCharacterRecognition by Mudit2003

Overview: The Optical Character Recognition (OCR) project converts images of text into machine-readable text, so that scanned documents or photos containing text become editable and searchable.

Technologies Used:

Python: The primary programming language used for the project.
TensorFlow and Keras: Used for building and training the Convolutional Neural Network (CNN) for character recognition.
OpenCV: Utilized for image processing tasks such as converting images to grayscale, blurring, and edge detection.
PyTesseract: A Python wrapper for Google's Tesseract-OCR Engine, used for extracting text from images.
PyQt5: Used for building the graphical user interface (GUI) of the application.

Project Structure:

model.h5: Pre-trained model file for the CNN.
OCR.py: Main Python script containing the implementation for loading the model, preprocessing the images, and recognizing text.
README.md: Documentation file providing an overview and setup instructions.
sample_image: Directory containing sample images used for testing the OCR.

Implementation Details:

Model Training:
    Dataset: The Extended MNIST (EMNIST) dataset is used, which includes handwritten digits and letters.
    Preprocessing: The images are resized, normalized, and split into training and testing sets.
    CNN Architecture: A Keras Sequential CNN is built from convolutional, max-pooling, and dense layers.
    Training: The model is trained on the EMNIST dataset, and the training performance is visualized using matplotlib.
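
The preprocessing step above can be illustrated with a small, dependency-free sketch. It assumes 8-bit grayscale pixels (0 to 255) and an 80/20 split; the function names, the split ratio, and the fixed seed are illustrative assumptions rather than details taken from the project, which most likely relies on NumPy or Keras utilities for this.

```python
import random


def normalize(image):
    """Scale 8-bit pixel values (0-255) into the range [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def train_test_split(samples, labels, test_fraction=0.2, seed=0):
    """Shuffle (sample, label) pairs and split them into train and test sets."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)  # seeded so the split is reproducible
    n_test = int(len(pairs) * test_fraction)
    test, train = pairs[:n_test], pairs[n_test:]
    return train, test


# Tiny stand-in data: ten 2x2 "images" with hypothetical labels.
images = [[[i, 255 - i], [0, 255]] for i in range(10)]
labels = [i % 3 for i in range(10)]

normalized = [normalize(img) for img in images]
train, test = train_test_split(normalized, labels, test_fraction=0.2)
print(len(train), len(test))
```

Scaling pixels to [0, 1] keeps the network's inputs in a small, consistent range, and holding out a test set gives an estimate of accuracy on characters the model has not seen.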

Image Recognition:
    Image Loading and Preprocessing: The input image is converted to grayscale and blurred to reduce noise.
    Edge Detection: Canny edge detection is used to highlight the edges of the characters in the image.
    Contour Detection: Contours of the characters are identified and filtered based on size.
    Character Extraction: Each contour is extracted, thresholded, and resized to match the input size expected by the CNN.
    Prediction: The pre-trained CNN model predicts the character for each contour, and the results are displayed on the image.
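
The contour filtering and prediction steps above can be sketched without OpenCV or Keras. The sketch assumes each contour has already been reduced to a bounding box `(x, y, w, h)`, the form returned by `cv2.boundingRect`, and that the model outputs one score per class. The size limits, the helper names, and the 62-class label order (digits, then uppercase, then lowercase, as in the EMNIST ByClass split) are assumptions; the source does not state which EMNIST split or thresholds the project uses.

```python
import string

# Assumed label order (EMNIST ByClass style): 0-9, then A-Z, then a-z.
LABELS = string.digits + string.ascii_uppercase + string.ascii_lowercase


def filter_boxes(boxes, min_w=5, max_w=150, min_h=15, max_h=120):
    """Keep boxes whose width and height fall inside the given limits."""
    return [
        (x, y, w, h)
        for (x, y, w, h) in boxes
        if min_w <= w <= max_w and min_h <= h <= max_h
    ]


def sort_left_to_right(boxes):
    """Order boxes by their left edge so characters read in sequence."""
    return sorted(boxes, key=lambda box: box[0])


def decode(scores, labels=LABELS):
    """Return the label with the highest score, together with that score."""
    if len(scores) != len(labels):
        raise ValueError("expected one score per label")
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best], scores[best]


# Hypothetical boxes: two character-sized, one speck of noise, one page border.
boxes = [(120, 10, 20, 30), (2, 3, 2, 2), (40, 12, 18, 28), (0, 0, 400, 300)]
kept = sort_left_to_right(filter_boxes(boxes))

# Hypothetical model output with the highest score at class index 10.
scores = [0.0] * len(LABELS)
scores[10] = 0.9
char, confidence = decode(scores)
print(kept, char, confidence)
```

Filtering by size discards both tiny noise contours and oversized regions such as page borders before any crop is passed to the CNN, which avoids spending predictions on shapes that cannot be characters.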

Setup Instructions:

    Clone the Repository:
    
        git clone https://github.com/Mudit2003/OpticalCharacterRecognition.git
        cd OpticalCharacterRecognition

    Install Dependencies:

        pip install tensorflow matplotlib opencv-python pytesseract pyqt5

    Run the Application:

        python OCR.py

Usage:

Modes: The application can operate in two modes: 'image' mode for recognizing text in static images and 'webcam' mode for live video capture and text recognition.
Pre-trained Model: The pre-trained model (model.h5) can be used to skip the training process and directly perform recognition.
