
Comments (2)

tmbdev commented on June 25, 2024

ocropus-gpageseg assumes that text lines are roughly the same scale. In return, it can detect even touching text lines in noisy documents pretty well. But that's only one of many strategies and possible tradeoffs. Your documents look like they are quite clean but have large variations in font size.

The best way to do text line recognition reliably is probably to run multiple different line detectors and combine their outputs.

As a simple version of that, you could run ocropus-gpageseg at several different scales, try to recognize all the candidate text lines from the different parameter settings, and throw away those that come out as gibberish because lines were merged or split up.

Obviously, that is not going to be cheap. But ultimately, the only arbiter of whether a text line has been correctly segmented is whether you can recognize it, so for general purpose text line segmentation, invoking a recognizer somewhere is necessary.
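To make the "recognize everything, keep what reads well" idea concrete, here is a minimal sketch of just the selection step. It is not part of ocropy: the `Candidate` type, the word-list based `plausibility` score, and the thresholds are all hypothetical, and it assumes you have already run the segmenter at several settings and fed each candidate line through a recognizer.

```python
from typing import List, NamedTuple, Set, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical box format


class Candidate(NamedTuple):
    box: Box       # where the candidate line sits on the page
    text: str      # recognizer output for this candidate
    setting: str   # label of the segmenter setting that produced it


def plausibility(text: str, vocabulary: Set[str]) -> float:
    """Fraction of tokens found in a word list (a crude gibberish test)."""
    tokens = [t.strip(".,;:!?()\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in vocabulary for t in tokens) / len(tokens)


def overlap_fraction(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    smaller = min(area_a, area_b)
    return (w * h) / smaller if smaller > 0 else 0.0


def select_lines(candidates: List[Candidate], vocabulary: Set[str],
                 min_score: float = 0.5,
                 max_overlap: float = 0.3) -> List[Candidate]:
    """Drop implausible candidates, then greedily keep the best-scoring
    ones that do not overlap a candidate that was already kept."""
    scored = [(plausibility(c.text, vocabulary), c) for c in candidates]
    scored = [sc for sc in scored if sc[0] >= min_score]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    kept: List[Candidate] = []
    for _, cand in scored:
        if all(overlap_fraction(cand.box, k.box) <= max_overlap for k in kept):
            kept.append(cand)
    # Return in top-to-bottom reading order.
    return sorted(kept, key=lambda c: c.box[1])
```

In practice a language model or the recognizer's own confidence would probably be a better score than a plain word list; the word list just keeps the sketch self-contained.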

For Latin script, you can also try to classify individual connected components as text/non-text and then attempt to group those together.
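A rough sketch of that component-based approach, working on bounding boxes only. Everything here is an assumption made for illustration: the `is_text` rule is a hand-written size and aspect-ratio heuristic standing in for a real text/non-text classifier, and the grouping simply attaches each component to the first line whose vertical extent contains the component's center.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); hypothetical box format


def is_text(box: Box, min_area: int = 4, max_aspect: float = 10.0) -> bool:
    """Placeholder classifier: reject specks and very elongated shapes
    such as rules. A trained classifier would replace this."""
    w = box[2] - box[0]
    h = box[3] - box[1]
    if w <= 0 or h <= 0:
        return False
    if w * h < min_area:
        return False
    return max(w, h) / min(w, h) <= max_aspect


def group_into_lines(boxes: List[Box],
                     classify: Callable[[Box], bool] = is_text
                     ) -> List[List[Box]]:
    """Group text-like components into lines by vertical position."""
    text_boxes = [b for b in boxes if classify(b)]
    lines: List[List[Box]] = []
    # Visit components top to bottom by vertical center.
    for b in sorted(text_boxes, key=lambda b: (b[1] + b[3]) / 2):
        center = (b[1] + b[3]) / 2
        for line in lines:
            top = min(x[1] for x in line)
            bottom = max(x[3] for x in line)
            if top <= center <= bottom:
                line.append(b)
                break
        else:
            # No existing line contains this center: start a new line.
            lines.append([b])
    # Order the components within each line left to right.
    return [sorted(line, key=lambda b: b[0]) for line in lines]
```

Because the grouping does not assume a single font size, it tolerates lines at different scales, but it will not separate touching lines the way ocropus-gpageseg can.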

I'm planning on releasing a 2D LSTM based segmenter at some point, but that will still take a while.


zuphilip commented on June 25, 2024

Actually, in my example above the layout segmentation is perfect with ocropus-gpageseg --vscale 2.

