Comments (7)
Hi sinai. I am afraid there is no image file or ALTO file in the archive.zip. To see what is going wrong, we need to see the input data.
from kraken.
Document.zip
Sorry, here they are. These are not standard ALTO files but the legacy files from NLI, which the code accesses for the structural information.
Uploading pdf-images.zip…
Correcting what I asked above: a solution that takes font size into account should be adapted to each line, not to each block, because font size also varies inside blocks, especially in Advertisements.
There should not be any issue with font size if the font size is the same across one line and the height of an average aleph is above a minimum. Lines are normalized to 120 px, so it does not matter whether your line has a height of 20k pixels, 2k, or 200. If it is 20 px, it would be too small. If there are very big letters and very small letters in the same line, then there can be a problem with the SMALL letters due to the normalization, but not with the big letters.
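To make the normalization argument concrete, here is a minimal arithmetic sketch. It is not kraken's code: the helper name `normalized_glyph_height` and the example pixel values are made up for illustration, and it assumes the whole line image is scaled uniformly so that its height becomes 120 px.

```python
TARGET_LINE_HEIGHT = 120  # px, the normalization height mentioned above


def normalized_glyph_height(glyph_px: float, line_px: float,
                            target: int = TARGET_LINE_HEIGHT) -> float:
    """Height of a glyph after the whole line is scaled to `target` px.

    Hypothetical helper: assumes one uniform scale factor per line.
    """
    if line_px <= 0:
        raise ValueError("line height must be positive")
    return glyph_px * target / line_px


# A uniform line: the absolute size does not matter, only the ratio.
# A glyph filling half the line ends up 60 px tall in both cases.
tall_line = normalized_glyph_height(1000, 2000)
short_line = normalized_glyph_height(100, 200)

# A mixed line (made-up numbers): the line is 400 px tall because of the
# big letters, so the small letters shrink along with everything else.
big_letters = normalized_glyph_height(300, 400)
small_letters = normalized_glyph_height(40, 400)

print(tall_line, short_line, big_letters, small_letters)
```

In the mixed line the big letters keep most of the 120 px, while the small ones are reduced to about a tenth of it, which is the case described above as problematic.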
To me, your problem is not kraken related but rather rooted in an imperfect conversion of the underlying legacy XMLs, which is your local project task. The structure is too complex to dive into from the outside without further explanations. It is even hard for me to understand which XML belongs to which image.
Please join me on Gitter, as this is not kraken related.
On a private channel on Gitter.