agirishkumar / cudaocr



Building an OCR engine using CUDA

License: MIT License

Cuda 92.28% Makefile 7.26% C 0.46%

cuda-programming ocr


Enhanced CUDA-Accelerated OCR Pipeline for Printed English Text

Issues

Some issues with the preprocessed image: too much noise, and the bottom of the image is blacked out. Compare sample.png with preprocessed_sample.png.
Tried to use texture memory for blurring and median filtering, but it failed, so I removed it. Should try again later.
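Since the texture-memory version is parked, a plain global-memory median filter is one way to keep denoising working in the meantime. Below is a minimal sketch, assuming a single-channel 8-bit image already in device memory; `clampedPixel`, `median3x3` and `medianFilter` are hypothetical names, not code from this repo. Two details matter at the borders: reads are clamped to the nearest edge pixel, and the grid size uses ceiling division. A grid computed with truncating division (e.g. `h / 16`) never launches the last partial block of rows, which is one possible (unconfirmed) explanation for a black strip along the bottom of an output buffer.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

// Clamp-to-edge read: out-of-range coordinates reuse the nearest border pixel.
__device__ __forceinline__ uint8_t clampedPixel(const uint8_t* img, int w, int h, int x, int y) {
    x = min(max(x, 0), w - 1);
    y = min(max(y, 0), h - 1);
    return img[y * w + x];
}

// 3x3 median filter on a single-channel 8-bit image, reading global memory directly.
__global__ void median3x3(const uint8_t* in, uint8_t* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;  // threads of partial edge blocks that fall outside the image

    uint8_t v[9];
    int k = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            v[k++] = clampedPixel(in, w, h, x + dx, y + dy);

    // Insertion sort of the 9 samples; the median is the 5th smallest.
    for (int i = 1; i < 9; ++i) {
        uint8_t key = v[i];
        int j = i - 1;
        while (j >= 0 && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
    out[y * w + x] = v[4];
}

// Host wrapper: d_in and d_out are device buffers of w * h bytes (error checks omitted).
void medianFilter(const uint8_t* d_in, uint8_t* d_out, int w, int h) {
    assert(w > 0 && h > 0);
    dim3 block(16, 16);
    // Ceiling division: the last partial column/row of blocks is still launched.
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    median3x3<<<grid, block>>>(d_in, d_out, w, h);
    cudaDeviceSynchronize();
}
```

Texture objects configured with `cudaAddressModeClamp` give the same edge behaviour in hardware, so this version can double as a reference output when revisiting the texture attempt.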

Plan of Action:

1. Image Acquisition

Load image from file or capture from camera
Transfer image to GPU memory
Implement robust error handling for different image formats
Add image quality assessment to filter out low-quality images early
Use CUDA streams for asynchronous data transfer when processing multiple images (==for later==)
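For the transfer and error-handling items, here is a minimal sketch of a checked, stream-ordered upload. `CUDA_CHECK` and `uploadImageAsync` are hypothetical helpers, and the image is assumed to be already decoded into interleaved 8-bit pixels on the host (decoding the file formats is not shown).

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Abort with file/line context on any CUDA runtime error.
#define CUDA_CHECK(call)                                                   \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                         cudaGetErrorString(err_), __FILE__, __LINE__);    \
            std::exit(EXIT_FAILURE);                                       \
        }                                                                  \
    } while (0)

// Allocate a device buffer and enqueue the host-to-device copy on `stream`.
// The caller must synchronize the stream (or order later work on it) before use.
uint8_t* uploadImageAsync(const uint8_t* host, int w, int h, int channels, cudaStream_t stream) {
    assert(w > 0 && h > 0 && channels > 0);
    size_t bytes = static_cast<size_t>(w) * h * channels;
    uint8_t* dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream));
    return dev;
}
```

`cudaMemcpyAsync` only overlaps with host work and with other streams when the host buffer is page-locked (for example allocated with `cudaMallocHost`); from ordinary pageable memory it is staged and gives little of that benefit.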

2. Preprocessing (GPU)

Utilize NVIDIA Performance Primitives (NPP) for efficient image processing (==later if required==)
Implement parameter tuning for each step (e.g., kernel size, thresholds)

Color to Grayscale Conversion
- Average method
- Luminosity Method
- Desaturation Method
Image Denoising
- Apply Gaussian blur
- Apply a median filter
Contrast Enhancement
- Implement adaptive histogram equalization
- Implement CLAHE (contrast-limited adaptive histogram equalization)
Binarization
- Implement Otsu's thresholding
- Implement adaptive thresholding
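Two of the steps above map neatly onto small kernels: the luminosity grayscale conversion and Otsu's thresholding. A minimal sketch, assuming interleaved 8-bit RGB input and dark text on a light background; the kernel and function names are hypothetical. The 256-bin histogram is built on the GPU (shared-memory atomics per block, merged into global memory), while the threshold search runs on the host because it only loops over 256 bins.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

// Luminosity method: ITU-R BT.601 weights on interleaved RGB, rounded to nearest.
__global__ void rgbToGray(const uint8_t* rgb, uint8_t* gray, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float r = rgb[3 * i], g = rgb[3 * i + 1], b = rgb[3 * i + 2];
    gray[i] = static_cast<uint8_t>(0.299f * r + 0.587f * g + 0.114f * b + 0.5f);
}

// 256-bin histogram: per-block counts in shared memory, then merged into global memory.
__global__ void histogram256(const uint8_t* gray, unsigned int* hist, int n) {
    __shared__ unsigned int local[256];
    for (int b = threadIdx.x; b < 256; b += blockDim.x) local[b] = 0;
    __syncthreads();
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
        atomicAdd(&local[gray[i]], 1u);
    __syncthreads();
    for (int b = threadIdx.x; b < 256; b += blockDim.x) atomicAdd(&hist[b], local[b]);
}

// Otsu on the host: pick t maximising w0 * w1 * (mu0 - mu1)^2, where class 0 holds
// levels [0, t] and class 1 holds levels (t, 255].
int otsuThreshold(const unsigned int hist[256]) {
    double total = 0.0, sumAll = 0.0;
    for (int i = 0; i < 256; ++i) { total += hist[i]; sumAll += static_cast<double>(i) * hist[i]; }
    assert(total > 0.0);
    double w0 = 0.0, sum0 = 0.0, best = -1.0;
    int bestT = 0;
    for (int t = 0; t < 256; ++t) {
        w0 += hist[t];
        sum0 += static_cast<double>(t) * hist[t];
        double w1 = total - w0;
        if (w0 == 0.0 || w1 == 0.0) continue;
        double d = sum0 / w0 - (sumAll - sum0) / w1;
        double between = w0 * w1 * d * d;
        if (between > best) { best = between; bestT = t; }
    }
    return bestT;
}

// Dark pixels (<= t) become 255 so that ink is the non-zero foreground for later stages.
__global__ void binarize(const uint8_t* gray, uint8_t* out, int n, int t) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = gray[i] <= t ? 255 : 0;
}
```

Adaptive thresholding swaps the single global `t` for a per-pixel threshold derived from a local window (for example the window mean minus a constant), which copes better with uneven lighting.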

3. Page Layout Analysis (GPU)

Use cuCIM library for faster processing (==Optional==)
Implement methods to handle various document layouts (e.g., multi-column) (==for later==)

Skew Detection and Correction
- Calculate the skew and correct the rotation
Document Structure Analysis
- Identify text blocks, images, tables, etc.
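Skew can be estimated with the projection-profile method (one common option; Hough-transform approaches are another): for each candidate angle, project the ink pixels onto the rotated vertical axis and keep the angle whose profile is most sharply peaked, because text rows then fall into a few tall bins. A minimal sketch, assuming a binarized image with ink pixels non-zero; the candidate range (plus or minus `maxDeg` in `stepDeg` increments) and all names are hypothetical, and the rotation step itself is not shown.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// blockIdx.y selects the candidate angle; each ink pixel votes for the bin of its
// rotated row coordinate y' = y*cos(a) - x*sin(a), rounded to the nearest integer.
__global__ void projectRows(const uint8_t* bin, int w, int h, const float* angles,
                            unsigned int* profiles, int numBins) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= w * h || bin[idx] == 0) return;
    float a = angles[blockIdx.y];
    int x = idx % w, y = idx / w;
    float yr = y * cosf(a) - x * sinf(a);
    int b = static_cast<int>(floorf(yr + 0.5f)) + w;   // +w keeps the index non-negative
    if (b >= 0 && b < numBins) atomicAdd(&profiles[blockIdx.y * numBins + b], 1u);
}

// Returns the candidate angle (radians) with the largest sum of squared bin counts.
// Every angle sees the same number of ink pixels, so this maximises profile variance.
float estimateSkew(const uint8_t* d_bin, int w, int h, float maxDeg, float stepDeg) {
    assert(stepDeg > 0.0f && maxDeg >= 0.0f);
    const float kPi = std::acos(-1.0f);
    int numAngles = static_cast<int>(2.0f * maxDeg / stepDeg) + 1;
    std::vector<float> angles(numAngles);
    for (int i = 0; i < numAngles; ++i) angles[i] = (-maxDeg + i * stepDeg) * kPi / 180.0f;
    int numBins = h + 2 * w + 1;
    size_t profBytes = static_cast<size_t>(numAngles) * numBins * sizeof(unsigned int);
    float* d_angles;
    unsigned int* d_prof;
    cudaMalloc(&d_angles, numAngles * sizeof(float));
    cudaMalloc(&d_prof, profBytes);
    cudaMemcpy(d_angles, angles.data(), numAngles * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_prof, 0, profBytes);
    dim3 block(256), grid((w * h + 255) / 256, numAngles);
    projectRows<<<grid, block>>>(d_bin, w, h, d_angles, d_prof, numBins);
    std::vector<unsigned int> prof(static_cast<size_t>(numAngles) * numBins);
    cudaMemcpy(prof.data(), d_prof, profBytes, cudaMemcpyDeviceToHost);
    cudaFree(d_angles);
    cudaFree(d_prof);
    int best = 0;
    double bestScore = -1.0;
    for (int i = 0; i < numAngles; ++i) {
        double s = 0.0;
        for (int b = 0; b < numBins; ++b) {
            double c = prof[static_cast<size_t>(i) * numBins + b];
            s += c * c;
        }
        if (s > bestScore) { bestScore = s; best = i; }
    }
    return angles[best];
}
```

The sign of the returned angle depends on the convention of the rotation code it is paired with, so it is worth checking once against a synthetically rotated page.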

4. Text Line Detection (GPU)

Advanced Morphological Operations
- Handle diverse fonts and text sizes
Connected Component Analysis
- Implement the CCA
Text Line Extraction
- Group connected components into text lines

Investigate deep learning-based approaches for more accurate detection (==later==)
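For CCA, the simplest GPU formulation is iterative label propagation: every ink pixel starts with its own linear index as its label and repeatedly takes the minimum label among its 4-connected ink neighbours until nothing changes. A minimal sketch, assuming a binarized image with ink non-zero; the names are hypothetical. Long, snaking components can need many sweeps, so union-find based GPU labelling algorithms are usually much faster on full pages.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

// Ink pixels start with their own index as label; background gets -1.
__global__ void initLabels(const uint8_t* bin, int* labels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) labels[i] = bin[i] ? i : -1;
}

// One sweep: each ink pixel adopts the smallest label among itself and its
// 4-connected ink neighbours. Labels only ever decrease, so this converges.
__global__ void propagate(int* labels, int w, int h, int* changed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= w * h) return;
    int cur = labels[i];
    if (cur < 0) return;
    int x = i % w, y = i / w, best = cur;
    if (x > 0     && labels[i - 1] >= 0) best = min(best, labels[i - 1]);
    if (x < w - 1 && labels[i + 1] >= 0) best = min(best, labels[i + 1]);
    if (y > 0     && labels[i - w] >= 0) best = min(best, labels[i - w]);
    if (y < h - 1 && labels[i + w] >= 0) best = min(best, labels[i + w]);
    if (best < cur) {
        labels[i] = best;   // only this thread writes labels[i]
        *changed = 1;
    }
}

// Labels components in place (error checks omitted); returns the number of sweeps.
int labelComponents(const uint8_t* d_bin, int* d_labels, int w, int h) {
    assert(w > 0 && h > 0);
    int n = w * h, threads = 256, blocks = (n + threads - 1) / threads;
    int* d_changed;
    cudaMalloc(&d_changed, sizeof(int));
    initLabels<<<blocks, threads>>>(d_bin, d_labels, n);
    int changed = 1, sweeps = 0;
    while (changed) {
        cudaMemset(d_changed, 0, sizeof(int));
        propagate<<<blocks, threads>>>(d_labels, w, h, d_changed);
        cudaMemcpy(&changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
        ++sweeps;
    }
    cudaFree(d_changed);
    return sweeps;
}
```

Each component ends up labelled with the smallest linear index it contains, so per-component bounding boxes can be gathered afterwards by grouping pixels by label.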

5. Word Segmentation (GPU)

Inter-word Space Detection
- Implement edge detection methods
Word Bounding Box Extraction
- Use DBSCAN clustering for better word grouping
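For word boxes, DBSCAN over the character boxes of one text line can be kept very simple: if the distance between two boxes is the horizontal gap between them and `minPts = 1`, every box is a core point and the clusters are exactly the runs of boxes separated by gaps wider than `eps`. A minimal host-side sketch under those assumptions (the `Box` layout and `groupWords` are hypothetical); how to choose `eps`, for example relative to the typical gap between letters, is left open.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>

struct Box { int x0, y0, x1, y1; };   // hypothetical layout: inclusive pixel bounds

// Sweep boxes left to right; a box joins the current word when the number of empty
// columns between it and the word's right edge is <= eps, otherwise it starts a new word.
std::vector<Box> groupWords(std::vector<Box> chars, int eps) {
    assert(eps >= 0);
    std::vector<Box> words;
    if (chars.empty()) return words;
    std::sort(chars.begin(), chars.end(), [](const Box& a, const Box& b) { return a.x0 < b.x0; });
    Box cur = chars[0];
    for (size_t i = 1; i < chars.size(); ++i) {
        const Box& c = chars[i];
        int gap = c.x0 - cur.x1 - 1;
        if (gap <= eps) {                   // same cluster: grow the word box
            cur.x1 = std::max(cur.x1, c.x1);
            cur.y0 = std::min(cur.y0, c.y0);
            cur.y1 = std::max(cur.y1, c.y1);
        } else {                            // gap too wide: close the word
            words.push_back(cur);
            cur = c;
        }
    }
    words.push_back(cur);
    return words;
}
```

Comparing against the word's running right edge (rather than only the previous box) keeps the result identical to the DBSCAN clusters even when boxes overlap.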

6. Character Segmentation (GPU)

Vertical Projection Analysis
Character Bounding Box Extraction

Implement techniques to handle touching or overlapping characters
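Vertical projection analysis can be sketched as a per-column ink count on the GPU followed by a host-side scan that cuts at (near-)empty columns. A minimal sketch, assuming a binarized word crop with ink non-zero; the names and the `minInk` parameter are hypothetical. Raising `minInk` above 0 lets thin joins between touching characters count as gaps, which is a crude first step toward the touching-character problem rather than a solution to it.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// One thread per column; adjacent threads read adjacent bytes, so loads coalesce.
__global__ void columnProjection(const uint8_t* bin, int w, int h, int* proj) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= w) return;
    int count = 0;
    for (int y = 0; y < h; ++y) count += bin[y * w + x] != 0;
    proj[x] = count;
}

// Columns with more than minInk ink pixels are "inside" a character; returns [start, end) spans.
std::vector<std::pair<int, int>> splitColumns(const std::vector<int>& proj, int minInk) {
    std::vector<std::pair<int, int>> spans;
    int start = -1, n = static_cast<int>(proj.size());
    for (int x = 0; x <= n; ++x) {
        bool ink = x < n && proj[x] > minInk;
        if (ink && start < 0) start = x;
        if (!ink && start >= 0) { spans.emplace_back(start, x); start = -1; }
    }
    return spans;
}

// d_bin is a w x h device buffer (error checks omitted).
std::vector<std::pair<int, int>> segmentCharacters(const uint8_t* d_bin, int w, int h, int minInk) {
    assert(w > 0 && h > 0);
    int* d_proj;
    cudaMalloc(&d_proj, w * sizeof(int));
    columnProjection<<<(w + 127) / 128, 128>>>(d_bin, w, h, d_proj);
    std::vector<int> proj(w);
    cudaMemcpy(proj.data(), d_proj, w * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_proj);
    return splitColumns(proj, minInk);
}
```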

7. Feature Extraction (GPU)

Character Normalization
- Resize and center each character
Feature Computation
- Experiment with various techniques (e.g., HOG, pixel intensity patterns)
- Ensure robustness to font style and size variations
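Character normalization can be done in one kernel: scale the crop uniformly so its longer side fills an N x N tile minus a margin, centre it, and sample nearest-neighbour. A minimal sketch; `N = 28` in the test and the `pad` margin are arbitrary choices, and `normalizeGlyph`/`normalizeCharacter` are hypothetical names. Centring here uses the bounding box; centring on the centre of mass is a common alternative.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdint>

// Each output pixel maps back into the crop with one uniform scale (aspect ratio kept);
// output pixels that land outside the crop stay background (0).
__global__ void normalizeGlyph(const uint8_t* crop, int cw, int ch, uint8_t* out, int N, int pad) {
    int ox = blockIdx.x * blockDim.x + threadIdx.x;
    int oy = blockIdx.y * blockDim.y + threadIdx.y;
    if (ox >= N || oy >= N) return;
    float scale = static_cast<float>(N - 2 * pad) / max(cw, ch);   // output px per crop px
    float offX = (N - cw * scale) * 0.5f;                           // centring offsets
    float offY = (N - ch * scale) * 0.5f;
    // Sample at the output pixel centre, mapped back into crop coordinates.
    float sx = (ox + 0.5f - offX) / scale;
    float sy = (oy + 0.5f - offY) / scale;
    uint8_t v = 0;
    if (sx >= 0.0f && sy >= 0.0f && sx < cw && sy < ch)
        v = crop[static_cast<int>(sy) * cw + static_cast<int>(sx)];
    out[oy * N + ox] = v;
}

// d_crop is cw x ch, d_out is N x N, both on the device (error checks omitted).
void normalizeCharacter(const uint8_t* d_crop, int cw, int ch, uint8_t* d_out, int N, int pad) {
    assert(cw > 0 && ch > 0 && N > 2 * pad);
    dim3 block(16, 16), grid((N + 15) / 16, (N + 15) / 16);
    normalizeGlyph<<<grid, block>>>(d_crop, cw, ch, d_out, N, pad);
    cudaDeviceSynchronize();
}
```

For example, a 10x20 crop with `N = 28` and `pad = 2` is scaled by 1.2 to roughly 12x24 and placed in the middle of the tile.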

8. Character Recognition (GPU with cuDNN)

Evaluate different models (CNN, LSTM) for optimal accuracy and speed
Use transfer learning with pre-trained models
Implement model quantization for faster inference
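As a starting point for quantization, symmetric per-tensor int8 quantization of a weight array fits in a few lines: `scale = max|w| / 127`, `q = clamp(round(w / scale), -127, 127)`, and `w ≈ q * scale`. A minimal sketch with hypothetical names; it only converts weights, and running actual int8 inference (including activation calibration) would normally go through a library such as TensorRT rather than hand-written kernels.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// q = clamp(round(w * invScale), -127, 127); rintf rounds to nearest, ties to even.
__global__ void quantizeInt8(const float* w, int8_t* q, int n, float invScale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float v = rintf(w[i] * invScale);
    q[i] = static_cast<int8_t>(fminf(fmaxf(v, -127.0f), 127.0f));
}

// Quantizes host weights on the GPU and returns the scale for dequantizing (w ~ q * scale).
float quantize(const std::vector<float>& w, std::vector<int8_t>& q) {
    assert(!w.empty());
    float maxAbs = 0.0f;
    for (float v : w) maxAbs = std::max(maxAbs, std::fabs(v));
    float scale = maxAbs > 0.0f ? maxAbs / 127.0f : 1.0f;   // all-zero tensor: any scale works
    int n = static_cast<int>(w.size());
    float* d_w;
    int8_t* d_q;
    cudaMalloc(&d_w, n * sizeof(float));
    cudaMalloc(&d_q, n * sizeof(int8_t));
    cudaMemcpy(d_w, w.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    quantizeInt8<<<(n + 255) / 256, 256>>>(d_w, d_q, n, 1.0f / scale);
    q.resize(n);
    cudaMemcpy(q.data(), d_q, n * sizeof(int8_t), cudaMemcpyDeviceToHost);
    cudaFree(d_w);
    cudaFree(d_q);
    return scale;
}
```

Per-tensor scaling is the simplest choice; per-output-channel scales usually lose less accuracy in convolution layers at the cost of storing one scale per channel.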

9. Post-processing

Language Model Application (GPU/CPU)
- Use advanced models like BERT or GPT for context understanding
Word Formation and Validation
Text Line Formation

Implement a feedback loop to refine earlier stages based on language model output
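Before (or alongside) a large language model, a cheap validation step is to snap each recognized word to the nearest lexicon entry by edit distance, and the lexicon search parallelizes naturally with one thread per entry. A minimal sketch, assuming ASCII words no longer than a hypothetical `kMaxLen = 32`; `editDistance`, `lexiconDistances` and `closestWord` are illustrative names. In practice a maximum allowed distance would keep genuine out-of-vocabulary words from being "corrected".

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

constexpr int kMaxLen = 32;   // hypothetical cap on word length (DP rows and lexicon slots)

// Levenshtein distance with two rolling rows; usable on both host and device.
__host__ __device__ int editDistance(const char* a, int la, const char* b, int lb) {
    int prev[kMaxLen + 1], cur[kMaxLen + 1];
    for (int j = 0; j <= lb; ++j) prev[j] = j;
    for (int i = 1; i <= la; ++i) {
        cur[0] = i;
        for (int j = 1; j <= lb; ++j) {
            int sub = prev[j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            int del = prev[j] + 1, ins = cur[j - 1] + 1;
            int m = sub < del ? sub : del;
            cur[j] = m < ins ? m : ins;
        }
        for (int j = 0; j <= lb; ++j) prev[j] = cur[j];
    }
    return prev[lb];
}

// One thread per lexicon entry; entries are packed into fixed kMaxLen-byte slots.
__global__ void lexiconDistances(const char* word, int len, const char* lex, const int* lexLen,
                                 int numWords, int* dist) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < numWords) dist[k] = editDistance(word, len, lex + static_cast<size_t>(k) * kMaxLen, lexLen[k]);
}

// Returns the lexicon entry closest to `word` (the earliest one on ties; error checks omitted).
std::string closestWord(const std::string& word, const std::vector<std::string>& lexicon) {
    assert(!lexicon.empty() && word.size() <= static_cast<size_t>(kMaxLen));
    int n = static_cast<int>(lexicon.size());
    std::vector<char> slots(static_cast<size_t>(n) * kMaxLen, 0);
    std::vector<int> lens(n);
    for (int k = 0; k < n; ++k) {
        assert(lexicon[k].size() <= static_cast<size_t>(kMaxLen));
        std::memcpy(&slots[static_cast<size_t>(k) * kMaxLen], lexicon[k].data(), lexicon[k].size());
        lens[k] = static_cast<int>(lexicon[k].size());
    }
    char *d_word, *d_lex;
    int *d_lens, *d_dist;
    cudaMalloc(&d_word, kMaxLen);
    cudaMalloc(&d_lex, slots.size());
    cudaMalloc(&d_lens, n * sizeof(int));
    cudaMalloc(&d_dist, n * sizeof(int));
    cudaMemcpy(d_word, word.data(), word.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lex, slots.data(), slots.size(), cudaMemcpyHostToDevice);
    cudaMemcpy(d_lens, lens.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    lexiconDistances<<<(n + 127) / 128, 128>>>(d_word, static_cast<int>(word.size()), d_lex, d_lens, n, d_dist);
    std::vector<int> dist(n);
    cudaMemcpy(dist.data(), d_dist, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_word); cudaFree(d_lex); cudaFree(d_lens); cudaFree(d_dist);
    int best = 0;
    for (int k = 1; k < n; ++k)
        if (dist[k] < dist[best]) best = k;
    return lexicon[best];
}
```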

10. Output Generation

Text Formatting
- Match original layout
Result Visualization
- Highlight recognized text on the original image
Multi-format Output
- Support various formats (e.g., JSON, PDF) with metadata
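For JSON output with metadata, a small serializer is enough to start. A minimal host-side sketch, assuming a hypothetical `Word` record (text, inclusive pixel box, confidence) and page-level fields `source`, `width` and `height`; the schema is illustrative, not a format this project defines.

```cuda
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical per-word result: text, inclusive pixel box, confidence in [0, 1].
struct Word {
    std::string text;
    int x0, y0, x1, y1;
    float confidence;
};

// Escape the characters JSON does not allow raw inside a string literal.
std::string jsonEscape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {   // remaining control characters as \u00XX
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Serialise page metadata plus recognised words as one compact JSON object.
std::string toJson(const std::string& source, int width, int height, const std::vector<Word>& words) {
    assert(width > 0 && height > 0);
    std::string j = "{\"source\":\"" + jsonEscape(source) + "\",\"width\":" + std::to_string(width) +
                    ",\"height\":" + std::to_string(height) + ",\"words\":[";
    for (size_t i = 0; i < words.size(); ++i) {
        const Word& w = words[i];
        char conf[32];
        std::snprintf(conf, sizeof conf, "%.3f", w.confidence);
        if (i > 0) j += ",";
        j += "{\"text\":\"" + jsonEscape(w.text) + "\",\"bbox\":[" + std::to_string(w.x0) + "," +
             std::to_string(w.y0) + "," + std::to_string(w.x1) + "," + std::to_string(w.y1) +
             "],\"confidence\":" + conf + "}";
    }
    return j + "]}";
}
```

A JSON library would handle escaping and number formatting more thoroughly; this hand-rolled escaper only covers quotes, backslashes and control characters.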

11. Quality Assurance

Confidence Scoring
Error Detection and Correction
User Feedback Mechanism
- Continuously improve OCR accuracy based on corrections
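Confidence scoring can start from the recognizer's raw outputs: the softmax probability of the predicted class for each character, aggregated into a word score. A minimal sketch, assuming one row of logits per character position; taking the minimum character confidence as the word score is one conservative choice among several (the product or mean are others), and the names are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// One thread per character position: arg-max class and its softmax probability.
// Subtracting the row maximum before expf keeps the exponentials from overflowing.
__global__ void softmaxConfidence(const float* logits, int numRows, int numClasses,
                                  int* label, float* conf) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRows) return;
    const float* z = logits + static_cast<size_t>(r) * numClasses;
    int best = 0;
    for (int c = 1; c < numClasses; ++c)
        if (z[c] > z[best]) best = c;
    float sum = 0.0f;
    for (int c = 0; c < numClasses; ++c) sum += expf(z[c] - z[best]);
    label[r] = best;
    conf[r] = 1.0f / sum;   // softmax of the arg-max class: exp(0) / sum
}

// Word score as the weakest character's confidence.
float wordConfidence(const std::vector<float>& charConf) {
    assert(!charConf.empty());
    float m = charConf[0];
    for (float c : charConf) m = std::min(m, c);
    return m;
}
```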

12. User Interface (Optional)

Responsive Input Interface
Interactive Result Display
Manual Correction Tools
Accessibility Features

Additional Considerations

Benchmarking: Continuously profile and benchmark each stage
Parallelization: Optimize pipeline to fully utilize GPU capabilities
Modularization: Develop each stage as an independent, easily updatable component
Error Handling: Implement robust error management throughout the pipeline
Scalability: Design the system to handle varying workloads efficiently
Data Augmentation: For training and testing, augment data to improve robustness
Version Control: Use Git for tracking changes and collaborating
Documentation: Maintain comprehensive documentation for each module
Testing: Implement unit tests and integration tests for each component
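For benchmarking each stage, CUDA events time GPU work on the stream itself rather than host wall-clock time. A minimal sketch; `timeGpuMs` and the `invert` demo kernel are hypothetical, and the warm-up launch is there so one-off costs of a first launch are not counted in the average.

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Average GPU time (ms) of whatever `launch` enqueues on the default stream,
// over `iters` runs after one warm-up run.
template <typename F>
float timeGpuMs(F launch, int iters = 10) {
    assert(iters > 0);
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    launch();                                  // warm-up
    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                // wait until the timed work has finished
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms / iters;
}

// Trivial stage used only to demonstrate the timer.
__global__ void invert(unsigned char* img, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}
```

Once per-stage numbers point at a bottleneck, a profiler such as Nsight Systems shows the full timeline across stages and transfers.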

To run the program:

Clone the repo, then build and run:

gh repo clone agirishkumar/CudaOCR
cd CudaOCR
make
./app

This project is making me go crazy... fucked my sleep cycle 🥲... but it's fun!!
