Comments (6)
You can use the C API to retrieve only the page segmentation, without doing any character recognition. Set the segmentation mode with TessBaseAPISetPageSegMode, supply the image (for example with TessBaseAPISetImage2; as far as I can tell, TessBaseAPIProcessPages would run the full recognition pass), and then retrieve the page iterator with TessBaseAPIAnalyseLayout. Iterate with TessPageIteratorNext at the lowest level and use TessPageIteratorIsAtBeginningOf to check whether the current symbol is at the start of a new block. All in all it shouldn't take more than a few lines of C code, and you skip the recognition part of Tesseract completely.
For support, please use the tesseract-ocr user forum. See the FAQ [1].
[1] https://github.com/tesseract-ocr/tesseract/wiki/FAQ#rules-and-advice
@zdenop thanks for clarifying. Here is the link to my forum post (which contains another answer): https://groups.google.com/forum/#!topic/tesseract-ocr/1Frh-5ggNxg
This issue is currently the top search result for 'ocr_float'; it lacks a simple summary: Tesseract (currently) does not support ocr_float.
@jimregan Cheers! I'm reproducing your answer on the linked forum page (the preferred help location).
That seems a bit redundant; I was merely summarising what you were told there :)