Comments (9)
Did it happen with the official Google traineddata file, or with custom training?
from tesseract.
It happened with custom training.
Try setting LC_NUMERIC to C during training.
Hello,
I found that Tesseract had a patch for this problem (https://code.google.com/p/tesseract-ocr/issues/detail?id=910).
Why is it not in the new version, Tesseract 3.04?
Will it be in the next version?
Thanks
Btw, the custom training data I use is not mine, so I cannot rerun the training with LC_NUMERIC=C.
Why do you think this patch is not in the current version? Issue 910, which you are referring to, was a problem with the official Google traineddata file. That was fixed.
AFAIR, the problem is in the custom training.
OK, my bad.
But I just tried with the eng.traineddata from the official Google traineddata files and I got the same error:
"Error: Illegal min or max specification!
"Fatal error encountered!" == NULL:Error:Assert failed:in file globaloc.cpp, line 75"
I'm having a hard time seeing how this could go wrong due to locale with the current code. The actual error is signaled here: https://github.com/tesseract-ocr/tesseract/blob/master/classify/clusttool.cpp#L89 which happens when the code is unhappy with the results that tfscanf gets for the feature parameters. tfscanf is a private, locale-independent version of fscanf. It in turn calls the private tvfscanf, which implements its own float parsing with a hard-coded decimal separator of '.'.
One thing that definitely could cause it, though, is a bad or corrupted feature parameter file.
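To illustrate why a parser with a hard-coded '.' separator rejects a file written with decimal commas, here is a minimal Python sketch. It is not Tesseract's code: `parse_min_max` is a hypothetical helper, and the two-field line layout is an assumption made only for the example. Python's `float()` is locale-independent and accepts only '.', much like the parser described above.

```python
from typing import Tuple


def parse_min_max(line: str) -> Tuple[float, float]:
    """Hypothetical helper: read a min and a max value from one line.

    Assumes two whitespace-separated floats. float() always expects '.'
    as the decimal separator, regardless of the process locale.
    """
    fields = line.split()
    if len(fields) != 2:
        raise ValueError("expected exactly two fields: min and max")
    try:
        return float(fields[0]), float(fields[1])
    except ValueError as exc:
        # A token such as "0,750000" ends up here.
        raise ValueError("Illegal min or max specification!") from exc
```

With this sketch, `parse_min_max("-0.250000 0.750000")` succeeds, while `parse_min_max("-0,250000 0,750000")` raises `ValueError`. That is the kind of failure a parameter file written under a comma-decimal locale would be expected to trigger.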
I just tested with the stock tesseract 3.03 on a brand new Debian 8 installation with the locale set to fr_FR.UTF-8 and everything worked perfectly.
If you still can't get this to work, please post the output of the following commands:
uname -a
tesseract -v
locale
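Alongside the output of those commands, it can help to see which decimal separator a given LC_NUMERIC setting implies. The sketch below uses only Python's standard `locale` module. `decimal_point_for` is a hypothetical helper name, and any locale other than "C" must be installed on the system for `setlocale` to accept it.

```python
import locale


def decimal_point_for(locale_name: str) -> str:
    """Return the decimal separator used by LC_NUMERIC under locale_name."""
    previous = locale.setlocale(locale.LC_NUMERIC)  # query the current setting
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
        return locale.localeconv()["decimal_point"]
    finally:
        # Restore the caller's LC_NUMERIC so the check has no side effects.
        locale.setlocale(locale.LC_NUMERIC, previous)
```

`decimal_point_for("C")` returns `"."`. On a system where fr_FR.UTF-8 is installed, `decimal_point_for("fr_FR.UTF-8")` would be expected to return a comma, which is the separator that ends up in files written by locale-aware formatting.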
@oelleo: unfortunately, Tesseract currently requires training data to use a dot as the decimal separator, so you need to correct your custom training data.
I think it could be possible without retraining. Try to unpack your data (combine_tessdata -u eng.traineddata tmp/eng.) and fix the decimal separator in eng.normproto (replace eng with the name of your custom training).
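The separator fix on the unpacked file could be scripted roughly as follows. This is a sketch under assumptions: that the numeric fields are whitespace-separated tokens, and that a comma between two digit runs in such a token is always a misplaced decimal separator. `fix_decimal_commas` and `fix_file` are hypothetical names, and the file layout should be checked by eye before trusting the result.

```python
import re
from pathlib import Path

# A standalone token like "0,250000" or "-1,5e-03". The lookarounds skip
# comma-separated lists such as "1,2,3" and anything glued to a word.
_DECIMAL_COMMA = re.compile(
    r"(?<![\w.,])([+-]?\d+),(\d+(?:[eE][+-]?\d+)?)(?![\w.,])"
)


def fix_decimal_commas(text: str) -> str:
    """Replace decimal commas with dots in standalone numeric tokens."""
    return _DECIMAL_COMMA.sub(r"\1.\2", text)


def fix_file(path: str) -> int:
    """Rewrite the file in place, keeping a .bak copy; return the fix count."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    fixed, count = _DECIMAL_COMMA.subn(r"\1.\2", original)
    if count:
        target.with_name(target.name + ".bak").write_text(original, encoding="utf-8")
        target.write_text(fixed, encoding="utf-8")
    return count
```

For example, `fix_file("tmp/eng.normproto")` would rewrite the unpacked file from the command above and return how many values were changed. A return value of 0 suggests the separator is not the problem in that file.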