Text-Extraction-Table-Image

This project aims to extract text from a table image into Python objects. Below is a sample result of the detection:

[Image: test case]

Prerequisites/Dependencies

  • OpenCV >= 2.4.8
  • Numpy
  • PyTesseract

Idea Behind The Code

I've published the documentation on my website. Please read it to understand the idea behind the code.

For Refinement

Once your algorithm detects the text successfully, you can save it into a Python object such as a dictionary or a list. Some region names (in the “Kabupaten/Kota” column) are not detected precisely, since they are not included in the Tesseract training data. However, this shouldn’t be a problem, as the regions’ indexes are detected precisely. The text extraction might also fail on other fonts, depending on the font used. In case of misinterpretation, such as a “5” being detected as an “8”, you can apply image processing such as erosion and dilation.
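As a rough illustration of the first point, the sketch below stores recognized cell text as a list of dictionaries, one per table row. The column names and sample strings are made up for the example and are not taken from the project; in practice each string would come from the OCR step.

```python
def rows_to_records(header, rows):
    """Pair each row's cell strings with the column names."""
    records = []
    for row in rows:
        # Strip whitespace that OCR output often carries around a cell.
        cells = [cell.strip() for cell in row]
        records.append(dict(zip(header, cells)))
    return records


# Hypothetical OCR output: one list of strings per table row.
header = ["index", "region", "value"]
rows = [
    ["1", " Region A ", "12"],
    ["2", "Region B\n", "7"],
]

records = rows_to_records(header, rows)

# Key the rows by index, since indexes are detected more reliably
# than region names.
by_index = {record["index"]: record for record in records}
```

Looking rows up by index rather than by region name means a misread name does not prevent you from finding the row.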

My code is far from perfect. If you find an error or room for refinement, write me a comment!

Contributors

fazlurnu

text-extraction-table-image's Issues

List index out of range

Hi there, thanks for sharing this wonderful algorithm, but I'm getting this error:

File "c:\Program Files\Sublime Text 3\Text-Extraction-Table-Image-master\scripts\ROI_selection.py", line 113, in get_ROI
x1 = vertical[left_line_index][2] + offset
IndexError: list index out of range
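
The traceback suggests that `left_line_index` points past the end of the `vertical` list, which could happen if fewer vertical lines were detected than the requested cell needs. A minimal sketch of a bounds check follows, assuming each line is stored as `[x1, y1, x2, y2]`. The helper name `line_coord` and the sample data are hypothetical and not part of the repository.

```python
def line_coord(lines, index, coord):
    """Return lines[index][coord], or raise a descriptive error."""
    if not 0 <= index < len(lines):
        raise ValueError(
            f"line index {index} requested, "
            f"but only {len(lines)} lines were detected"
        )
    return lines[index][coord]


# Made-up data: two detected vertical lines.
vertical = [[10, 0, 10, 200], [90, 0, 90, 200]]
offset = 4

x1 = line_coord(vertical, 1, 2) + offset  # valid index

try:
    line_coord(vertical, 5, 2)  # index past the end of the list
except ValueError as err:
    message = str(err)
```

A message like this points at line detection as the thing to inspect, rather than the ROI arithmetic.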

images/source.png is jpg file

images/source.png has the wrong file extension: it is a JPEG file, not a PNG.
Also, main.py expects to find source7.png, so IMO this should be changed to:

diff --git a/scripts/main.py b/scripts/main.py
index de29972..d6dad3d 100644
--- a/scripts/main.py
+++ b/scripts/main.py
@@ -10,7 +10,7 @@ from ROI_selection import detect_lines, get_ROI
 import cv2 as cv

 def main(display = False, print_text = False, write = False):
-    filename = '../images/source7.png'
+    filename = '../images/source.jpg'

     src = cv.imread(cv.samples.findFile(filename))
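
A mismatch like this can be checked without trusting the extension, by reading the first bytes of the file. The sketch below compares them with the standard PNG and JPEG signatures. The function names `sniff_image_type` and `sniff_image_file` are hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def sniff_image_type(data):
    """Guess the image type from the leading bytes of a file."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None


def sniff_image_file(path):
    """Read just enough of the file to check its signature."""
    with open(path, "rb") as handle:
        return sniff_image_type(handle.read(len(PNG_SIGNATURE)))
```

For a file like the one described here, `sniff_image_file` would be expected to return `"jpeg"` even though the name ends in `.png`.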
