Comments (6)
Maybe there are too few entries.
Please upload your words.txt so that I can have a look.
from simplehtr.
So there are only 8 entries (lines)?
You need many more. A batch contains 50 samples (image-text pairs). 95% of the batches are used for training, 5% for evaluation (can be changed here).
When using this 95% / 5% split, we need at least 20 batches of data, that is at least 20*50=1000 samples. You can change the split to e.g. 90% / 10%, then 500 samples are enough.
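The arithmetic above can be sketched as follows. This is only an illustration of the calculation, assuming that whole batches are assigned to the evaluation split and that at least one evaluation batch is needed; `min_samples` is a hypothetical helper, not a function from SimpleHTR.

```python
def min_samples(batch_size: int, eval_percent: int, min_eval_batches: int = 1) -> int:
    """Smallest dataset size that leaves `min_eval_batches` full batches for evaluation.

    Hypothetical helper: `eval_percent` is the share of batches used for
    evaluation, given as a whole-number percentage (e.g. 5 for a 95% / 5% split).
    """
    # Number of batches needed so that eval_percent of them is at least
    # min_eval_batches (integer ceiling division avoids float rounding).
    total_batches = (100 * min_eval_batches + eval_percent - 1) // eval_percent
    return total_batches * batch_size


print(min_samples(batch_size=50, eval_percent=5))   # 95% / 5% split
print(min_samples(batch_size=50, eval_percent=10))  # 90% / 10% split
```

With a batch size of 50 this gives the 1000 and 500 samples mentioned above.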
Regarding the 'X' entries: the IAM format contains information which we don't need for this text recognition system. Therefore I just put nonsense values ('X') in these columns.
Instead of converting the data to IAM format, you could of course also change the DataLoader.py module to directly read your data. It's up to you - simply choose the solution which is easier or which better fits your needs.
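As a rough sketch of what the dummy columns mean in practice, the snippet below writes and reads one line in the layout shown in this thread: a sample id, seven placeholder columns, then the text. The function names are hypothetical and this is not the actual DataLoader.py code; it only assumes that fields are separated by single spaces.

```python
def make_words_line(sample_id: str, text: str) -> str:
    """Build one line: sample id, seven dummy 'X' columns, then the text."""
    return sample_id + ' X X X X X X X ' + text + '\n'


def parse_words_line(line: str) -> tuple:
    """Return (sample id, ground-truth text), ignoring the placeholder columns."""
    fields = line.strip().split(' ')
    if len(fields) < 9:
        raise ValueError('expected an id, 7 placeholder columns and a text: %r' % line)
    sample_id = fields[0]
    # Everything from the 9th field on is the text (it may contain spaces).
    text = ' '.join(fields[8:])
    return sample_id, text


line = make_words_line('a01-000u-00-00', '개발과')
print(parse_words_line(line))
```

The values in the seven middle columns are never inspected here, which is why any placeholder such as 'X' works.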
Seems like the dataset contains 0 entries? In that case, the total number of characters in your ground truth text is 0, and a division by 0 takes place when computing the character error rate.
See the README for how the output should look if the dataset contains entries:
Epoch: 1
Train NN
Batch: 1 / 500 Loss: 130.354
Batch: 2 / 500 Loss: 66.6619
...
Validate NN
Batch: 1 / 115
Ground truth -> Recognized
[OK] "," -> ","
[ERR:1] "Di" -> "D"
...
Character error rate: 13.956289%. Word accuracy: 67.721739%.
See section 1.2 of this article for how to create a compatible dataset.
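To illustrate the division by zero mentioned above, here is a minimal sketch of a character error rate computation with an explicit check for an empty dataset. It is not the code used in SimpleHTR; `edit_distance` and `character_error_rate` are hypothetical helpers, and the error rate is assumed to be the summed edit distance divided by the total number of ground-truth characters.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming, row by row)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(pairs: list) -> float:
    """`pairs` is a list of (ground truth, recognized) strings."""
    num_chars = sum(len(gt) for gt, _ in pairs)
    if num_chars == 0:
        # An empty dataset would otherwise cause a division by zero below.
        raise ValueError('dataset contains no ground-truth characters')
    num_errors = sum(edit_distance(gt, rec) for gt, rec in pairs)
    return num_errors / num_chars


# The two sample pairs from the log above: one error in three characters.
print(character_error_rate([(',', ','), ('Di', 'D')]))
```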
I'm using my own dataset (Korean letters), generated using your approach. I couldn't find where my mistake is. Should I modify this code:
# write filename, dummy-values and text
line = 'words-words-%d' % ctr + ' X X X X X X X ' + sample[0] + '\n'
But my 'words' file has the same structure as the one given on the website.
Yes, few entries, like:
a01-000u-00-00 X X X X X X X 개발과
a01-000u-00-01 X X X X X X X 제3항의
a01-000u-00-02 X X X X X X X 100일을.
a01-000u-00-03 X X X X X X X 자유와
a01-000u-00-04 X X X X X X X 25일부터
a01-000u-00-05 X X X X X X X 정치는
a01-000u-00-06 X X X X X X X 타인을
a01-000u-01-00 X X X X X X X 정책을
...............
But I wanted to check whether it works at all. I have another file that contains many more entries. I really didn't understand the 'X X X X X X' part.
Thank you very much @githubharald. I got it!