Comments (9)
@Needles404 PR #26 fixes the issue for your PDF by using a new image overlay algorithm:
However, now that I have tinkered with this new algorithm, I am beginning to understand why the standard deviation was used in the first place.
Say you have a PDF with multiple pages that share the same styling, e.g. a PowerPoint presentation with a logo at the same position on each slide. The standard deviation will then strip away the recurring parts like the logo. This is most often what you want, as only the differing content is interesting.
In your case, however, the pages are almost identical on purpose, so this approach doesn't work very well. It may therefore make sense to offer the new algorithm as a fallback when the other one does not work.
from briss-2.0.
I have verified this bug. The current selection boundaries are not clear when the document is loaded.
@Needles404 Have you tried using the original 0.9 version of Briss? It would be interesting to know if the bug is a regression or if it was present in the original source that was forked. The original Briss, awkward as it was to use, always did work perfectly, IMHO.
I could verify that the images generated by PdfDecoder match the pages of the document. However, the BufferedImages that are being shown and which are used to produce the rectangles are already "broken". Even if I tune some parameters of the algorithms and the rectangle is shown, there's no point in cropping the document as the preview itself is "broken".
That's kinda the most I can do as I still barely understand the algorithms in all the steps that Briss uses.
@Needles404 I think I have tracked down the issue. Yeah I know, a little bit late, but maybe it's helpful for you nevertheless.
As @cleydyr wrote, the PdfDecoder returns a perfect image. And even if you apply cropping, the resulting PDF looks perfectly fine.
It looks like the issue lies in the algorithm used to calculate the overlay image in ClusterImageData.calculateSdOfImages.
I am guessing, but from the naming and from looking at the algorithm I suppose the sd stands for standard deviation and that it basically computes how far each pixel value is from the mean.
Because every page in this particular PDF looks the same except for some parts of the barcode, the black parts basically cancel each other out, as the standard deviation is very small or zero.
I verified this with this prepared PDF which has 4 identical pages and the result is empty as suspected:
To fix this issue I think this algorithm needs to be replaced. I will check how this could be done.
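The cancellation described in this comment can be reproduced with a small sketch. This is not the code from ClusterImageData.calculateSdOfImages; it is a minimal per-pixel standard deviation over grayscale pages, written under the assumption that the overlay is built roughly this way. The class name SdOverlaySketch and the helpers max and blackBoxPage are made up for the illustration.

```java
import java.awt.image.BufferedImage;
import java.util.List;

// Illustration only: a hypothetical stand-in, not the Briss implementation.
final class SdOverlaySketch {

    /** Per-pixel standard deviation of the gray values across equally sized pages. */
    static double[][] standardDeviation(List<BufferedImage> pages) {
        int width = pages.get(0).getWidth();
        int height = pages.get(0).getHeight();
        double[][] sd = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double sum = 0;
                for (BufferedImage page : pages) {
                    sum += page.getRaster().getSample(x, y, 0);
                }
                double mean = sum / pages.size();
                double squaredDistance = 0;
                for (BufferedImage page : pages) {
                    double distance = page.getRaster().getSample(x, y, 0) - mean;
                    squaredDistance += distance * distance;
                }
                sd[y][x] = Math.sqrt(squaredDistance / pages.size());
            }
        }
        return sd;
    }

    /** Largest value in the overlay; 0 means the overlay is completely empty. */
    static double max(double[][] values) {
        double max = 0;
        for (double[] row : values) {
            for (double value : row) {
                max = Math.max(max, value);
            }
        }
        return max;
    }

    /** Hypothetical test page: white square page with a black box from start (inclusive) to end (exclusive). */
    static BufferedImage blackBoxPage(int size, int start, int end) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                boolean inBox = x >= start && x < end && y >= start && y < end;
                image.getRaster().setSample(x, y, 0, inBox ? 0 : 255);
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage page = blackBoxPage(8, 2, 6);
        // Four identical pages: every pixel equals its mean, so the overlay is empty.
        System.out.println(max(standardDeviation(List.of(page, page, page, page)))); // 0.0
        // Two pages with shifted boxes: only the non-overlapping parts survive.
        System.out.println(max(standardDeviation(List.of(page, blackBoxPage(8, 4, 8))))); // 127.5
    }
}
```

With identical pages the black box disappears completely from the overlay, which matches the empty result seen with the 4-page test PDF.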
@Needles404 I finally had the time to finish this. You can find the new version here:
https://github.com/mbaeuerle/Briss-2.0/releases/tag/v2.0-alpha-3
Give it a shot and let me know if this works for you :)
With this new version, the preview basically falls back to another algorithm if a certain amount of the content is similar on all pages.
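A rough sketch of what such a fallback could look like is below. Everything in it is an assumption made for illustration: the class name OverlayFallbackSketch, the 0.5 threshold, the "standard deviation below 1" test for unchanged pixels, and the choice of the averaged page darkness as the fallback overlay. The actual release may decide and fall back differently.

```java
import java.awt.image.BufferedImage;
import java.util.List;

// Illustration only: hypothetical names and thresholds, not the Briss implementation.
final class OverlayFallbackSketch {

    /** Assumed cutoff: share of inked pixels that are identical on all pages. */
    static final double SIMILARITY_THRESHOLD = 0.5;

    /** Returns an overlay in which larger values mean "more likely content". */
    static double[][] overlay(List<BufferedImage> pages) {
        int width = pages.get(0).getWidth();
        int height = pages.get(0).getHeight();
        int n = pages.size();
        double[][] mean = new double[height][width];
        double[][] sd = new double[height][width];
        int inked = 0;      // pixels that are not pure white on every page
        int unchanged = 0;  // inked pixels that look the same on every page
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                double sum = 0;
                double sumOfSquares = 0;
                for (BufferedImage page : pages) {
                    double gray = page.getRaster().getSample(x, y, 0);
                    sum += gray;
                    sumOfSquares += gray * gray;
                }
                mean[y][x] = sum / n;
                double variance = Math.max(0, sumOfSquares / n - mean[y][x] * mean[y][x]);
                sd[y][x] = Math.sqrt(variance);
                if (mean[y][x] < 255) {
                    inked++;
                    if (sd[y][x] < 1.0) {
                        unchanged++;
                    }
                }
            }
        }
        boolean mostlyIdentical = inked > 0 && (double) unchanged / inked >= SIMILARITY_THRESHOLD;
        if (!mostlyIdentical) {
            return sd; // pages differ enough: keep the standard deviation overlay
        }
        // Fallback (assumed): darkness of the averaged page, so shared content stays visible.
        double[][] darkness = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                darkness[y][x] = 255 - mean[y][x];
            }
        }
        return darkness;
    }

    /** Hypothetical test page: white square page with a black box from start (inclusive) to end (exclusive). */
    static BufferedImage blackBoxPage(int size, int start, int end) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                boolean inBox = x >= start && x < end && y >= start && y < end;
                image.getRaster().setSample(x, y, 0, inBox ? 0 : 255);
            }
        }
        return image;
    }
}
```

Calling overlay with four copies of blackBoxPage(8, 2, 6) takes the fallback path, so the box stays visible in the result. Two pages with boxes at different positions stay on the standard deviation path, where the overlapping part still cancels out.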
Thanks again, it is now showing a usable preview, which is excellent. The preview appears to be missing some text, which I can only attribute to the use of language packs since the missing characters are mainly Asian; this would probably be a different issue.