
edge detection algorithm (pdfplumber issue, closed)

jsvine commented on September 2, 2024
edge detection algorithm

from pdfplumber.

Comments (6)

jsfenfen commented on September 2, 2024

Hey @wanghaisheng, dunno what @jsvine has planned, but at the moment this project doesn't do machine-vision-type algorithms at all. Instead it sits on top of pdfminer, a library that reads the PDF file internals and returns all the "things" in the PDF file. PDF is a display-oriented language, so most of what's in the files are instructions, like "show an 'e' in Arial-10 font at this x-y position". pdfminer returns those details, and it's important to note that they aren't present in image-based PDFs that haven't been OCR'ed. So the input is characters with bounding boxes. Finding the horizontal edges of a row of text is an operation performed on those bounding boxes. Not quite sure if that's what you're asking?
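
To make the bounding-box idea concrete, here is a minimal sketch in plain Python. It is not pdfplumber's or pdfminer's actual code: the dictionary keys (x0, x1, top, bottom), the row_edges helper, and the tolerance value are all assumptions chosen for illustration. It groups characters whose top coordinates are close together into a row, then takes the extremes of each row's boxes as that row's edges.

```python
from typing import Dict, List

# Hypothetical character record: one bounding box per character.
Char = Dict[str, float]


def row_edges(chars: List[Char], tolerance: float = 2.0) -> List[Dict[str, float]]:
    """Group character boxes into rows and return each row's outer edges."""
    rows: List[List[Char]] = []
    for ch in sorted(chars, key=lambda c: c["top"]):
        # Join the current row if this char's top is close to the row's first char.
        if rows and abs(ch["top"] - rows[-1][0]["top"]) <= tolerance:
            rows[-1].append(ch)
        else:
            rows.append([ch])
    return [
        {
            "x0": min(c["x0"] for c in row),
            "x1": max(c["x1"] for c in row),
            "top": min(c["top"] for c in row),
            "bottom": max(c["bottom"] for c in row),
        }
        for row in rows
    ]


# Made-up sample data: two characters on one line, one on the next.
chars = [
    {"x0": 10, "x1": 15, "top": 100.0, "bottom": 110.0},
    {"x0": 16, "x1": 21, "top": 100.5, "bottom": 110.5},
    {"x0": 10, "x1": 15, "top": 120.0, "bottom": 130.0},
]
edges = row_edges(chars)
```

No pixels are involved at any point: the "edges" are computed from coordinates the PDF already states.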

wanghaisheng commented on September 2, 2024

@jsfenfen thx very much.
Now I can say you guys implement a kind of edge detection and table detection using the native PDF data instead of treating the PDF like an image. Do you have an estimate of how long it would take to implement the vision approach using something like OpenCV?

jsfenfen commented on September 2, 2024

> do you have estimation about how long it takes implement the vison type using such opencv ?

Hey @wanghaisheng, that's something I definitely don't know! I think the idea of using programs like pdfminer is that the extracted data alone is pretty powerful, or at least helpful, in capturing really repetitive, machine-generated structures. But doing fancier things with OpenCV is solving a different problem (probably?). A lot of work on images and text is deep-learning-based OCR, though arguably what's needed more than that is deep-learning-based layout detection. But you might look into the work of David Doermann.

wanghaisheng commented on September 2, 2024

@jsfenfen no, that is not what I am looking for.
To my knowledge, we can use tools like pdfminer to get the x-y coordinates, and use fonts, distances, and positions as one-dimensional features to determine the layout of our domain-specific data. But by supplementing these with the two-dimensional features we get from an image converted from the PDF, we can get more precise table detection and, later on, data extraction.

jsvine commented on September 2, 2024

Hi @wanghaisheng, and thanks @jsfenfen. pdfplumber implements a version of Nurminen's approach in table.py. One major difference, as @jsfenfen notes, is that Nurminen's original approach uses computer vision to detect edges, whereas pdfplumber uses the lines and character bounding boxes explicitly provided by fully digital PDFs.

I'm working on a major update to pdfplumber that will change some aspects of how the user specifies which types of edges to use when finding tables. But the general approach — and, especially, the conversion of these edges to tables — will remain the same.
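
As a rough illustration of what converting edges to tables can involve, here is a simplified sketch in plain Python. It is not the code in table.py: the tuple layouts and the intersections and cells helpers are assumptions made for this example, and it skips the check that the corners of a cell are actually joined by edges. It finds the points where horizontal and vertical edges cross, then reports the smallest rectangle hanging off each point.

```python
from typing import List, Set, Tuple

HEdge = Tuple[float, float, float]  # hypothetical layout: (x0, x1, y)
VEdge = Tuple[float, float, float]  # hypothetical layout: (x, top, bottom)
Point = Tuple[float, float]
Cell = Tuple[float, float, float, float]  # (x0, top, x1, bottom)


def intersections(h_edges: List[HEdge], v_edges: List[VEdge]) -> Set[Point]:
    """Return every point where a horizontal edge crosses a vertical edge."""
    points: Set[Point] = set()
    for x0, x1, y in h_edges:
        for x, top, bottom in v_edges:
            if x0 <= x <= x1 and top <= y <= bottom:
                points.add((x, y))
    return points


def cells(points: Set[Point]) -> List[Cell]:
    """Treat each point as a top-left corner and find its smallest rectangle."""
    found: List[Cell] = []
    for x, y in sorted(points):
        rights = sorted(px for px, py in points if py == y and px > x)
        belows = sorted(py for px, py in points if px == x and py > y)
        for by in belows:
            # Nearest point to the right whose bottom-right corner also exists.
            hit = next((rx for rx in rights if (rx, by) in points), None)
            if hit is not None:
                found.append((x, y, hit, by))
                break
    return found


# Made-up 2x2 grid: three horizontal and three vertical rules.
h_edges = [(0, 20, 0), (0, 20, 10), (0, 20, 20)]
v_edges = [(0, 0, 20), (10, 0, 20), (20, 0, 20)]
grid_cells = cells(intersections(h_edges, v_edges))
```

Contiguous cells would then be grouped into tables. That grouping step is left out here.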

wanghaisheng commented on September 2, 2024

@jsvine @jsfenfen little code for you guys

import sys

import cv2


class detectTable(object):
    def __init__(self, src_img):
        self.src_img = src_img

    def run(self):
        if len(self.src_img.shape) == 2:  # already a grayscale image
            gray_img = self.src_img
        elif len(self.src_img.shape) == 3:  # color image: convert to grayscale
            gray_img = cv2.cvtColor(self.src_img, cv2.COLOR_BGR2GRAY)
        else:
            raise ValueError("expected a 2-D grayscale or 3-D color image")

        # Invert so dark ruling lines become foreground, then binarize.
        thresh_img = cv2.adaptiveThreshold(
            ~gray_img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 15, -2)
        h_img = thresh_img.copy()
        v_img = thresh_img.copy()
        scale = 15

        # Horizontal lines: erode then dilate with a wide, 1-pixel-high
        # structuring element, so only long horizontal runs survive.
        h_size = int(h_img.shape[1] / scale)
        h_structure = cv2.getStructuringElement(cv2.MORPH_RECT, (h_size, 1))
        h_erode_img = cv2.erode(h_img, h_structure, iterations=1)
        h_dilate_img = cv2.dilate(h_erode_img, h_structure, iterations=1)
        # cv2.imshow("h_erode", h_dilate_img)

        # Vertical lines: the same operation with a tall, 1-pixel-wide element.
        v_size = int(v_img.shape[0] / scale)
        v_structure = cv2.getStructuringElement(cv2.MORPH_RECT, (1, v_size))
        v_erode_img = cv2.erode(v_img, v_structure, iterations=1)
        v_dilate_img = cv2.dilate(v_erode_img, v_structure, iterations=1)

        # Saturating add: plain "+" on uint8 arrays wraps around where lines cross.
        mask_img = cv2.add(h_dilate_img, v_dilate_img)
        # Joints are the pixels where horizontal and vertical lines overlap.
        joints_img = cv2.bitwise_and(h_dilate_img, v_dilate_img)
        joints_img = cv2.dilate(joints_img, None, iterations=3)
        cv2.imwrite("joints.png", ~joints_img)
        cv2.imwrite("mask.png", ~mask_img)


if __name__ == '__main__':
    img = cv2.imread(sys.argv[1])
    if img is None:  # cv2.imread returns None when the file cannot be read
        sys.exit("could not read image: " + sys.argv[1])
    detectTable(img).run()
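
For readers unfamiliar with the erode-then-dilate step in the script above, it is a morphological opening. The following plain-Python sketch shows the idea on a tiny made-up binary grid. It does not call OpenCV and ignores OpenCV's border handling, and open_row and transpose are hypothetical helper names. Opening a row with a flat element of length size keeps only runs of set pixels at least that long, so long rules survive and short strokes such as text disappear.

```python
from typing import List

Row = List[int]
Grid = List[Row]


def open_row(row: Row, size: int) -> Row:
    """1-D morphological opening: keep only runs of 1s at least `size` long."""
    out = [0] * len(row)
    start = None
    for i, v in enumerate(row + [0]):  # trailing 0 closes a run at the end
        if v and start is None:
            start = i
        elif not v and start is not None:
            if i - start >= size:
                out[start:i] = [1] * (i - start)
            start = None
    return out


def transpose(grid: Grid) -> Grid:
    return [list(col) for col in zip(*grid)]


# Made-up image: one horizontal rule on top, two vertical rules below it.
image = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
]

# Horizontal lines: open each row. Vertical lines: open each column.
horizontal = [open_row(r, 4) for r in image]
vertical = transpose([open_row(c, 3) for c in transpose(image)])

# Joints: pixels set in both results, like cv2.bitwise_and in the script.
joints = [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]
```

Here the joints land where the two vertical rules meet the horizontal one, which is the information a table detector needs to locate cell corners.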
