Comments (9)

mbaeuerle commented on July 3, 2024

@Needles404 PR #26 fixes the issue for your PDF by using a new image overlay algorithm:
[image]

However, now that I have tinkered with this new algorithm, I am beginning to understand why the standard deviation was used in the first place.
Say you have a PDF with multiple pages but the same styling, e.g. a PowerPoint presentation with a logo at the same position on each slide. The standard deviation will then strip away the recurring parts, such as the logo. This is usually what you want, as only the differing content is interesting.
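A minimal sketch of this effect, written in Python for brevity rather than Briss's Java: each rendered page is treated as a grayscale grid, and the overlay is the per-pixel standard deviation across all pages. The function name `sd_overlay` and the list-of-rows image format are assumptions for illustration, not Briss's actual code. A logo at the same spot on every slide gets a deviation of zero and disappears, while text that changes from slide to slide remains.

```python
from statistics import pstdev

# A rendered page as a grayscale grid: list of rows, 0 = black ink, 255 = white paper.
Page = list[list[int]]


def sd_overlay(pages: list[Page]) -> list[list[float]]:
    """Per-pixel population standard deviation across all pages.

    Pixels that are the same on every page (e.g. a logo at a fixed position)
    get 0.0; pixels whose value changes between pages get a larger value.
    """
    height, width = len(pages[0]), len(pages[0][0])
    return [
        [pstdev([page[y][x] for page in pages]) for x in range(width)]
        for y in range(height)
    ]


# Three one-row "slides": a logo pixel at x=0 on every slide, changing text at x=2..3.
pages = [
    [[0, 255, 0, 255]],
    [[0, 255, 255, 0]],
    [[0, 255, 0, 0]],
]
overlay = sd_overlay(pages)
print(overlay[0][0])  # 0.0 -> the recurring logo disappears
print(overlay[0][2])  # large value -> the changing text remains
```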

In your case, however, the pages are almost identical on purpose, so this approach doesn't work very well. It may therefore make sense to offer the new algorithm as a fallback when the other one does not work.
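The thread doesn't describe which overlay algorithm PR #26 introduces. As one hypothetical alternative, a "darkest pixel" overlay keeps the minimum grayscale value at each position across all pages, so anything inked on any page stays visible, even when it is identical on every page. A minimal Python sketch, with `min_overlay` and the image format as illustrative assumptions:

```python
# A rendered page as a grayscale grid: list of rows, 0 = black ink, 255 = white paper.
Page = list[list[int]]


def min_overlay(pages: list[Page]) -> Page:
    """Keep the darkest value seen at each pixel across all pages."""
    height, width = len(pages[0]), len(pages[0][0])
    return [
        [min(page[y][x] for page in pages) for x in range(width)]
        for y in range(height)
    ]


# Four identical pages, like the prepared test PDF: the ink survives.
identical = [[[0, 255, 0]] for _ in range(4)]
overlay = min_overlay(identical)
print(overlay)  # [[0, 255, 0]]
```

The trade-off mirrors the one described above: recurring elements such as logos are kept as well, which is why an overlay like this suits a fallback better than a default.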

from briss-2.0.

Needles404 commented on July 3, 2024

cleydyr commented on July 3, 2024

I have verified this bug. The current selection boundaries are not clear when the document is loaded.

fatso83 commented on July 3, 2024

@Needles404 Have you tried using the original 0.9 version of Briss? It would be interesting to know if the bug is a regression or if it was present in the original source that was forked. The original Briss, awkward as it was to use, always did work perfectly, IMHO.

cleydyr commented on July 3, 2024

I was able to verify that the images generated by PdfDecoder match the pages of the document. However, the BufferedImages that are shown, and which are used to produce the rectangles, are already "broken". Even if I tune some parameters of the algorithms so that the rectangle is shown, there's no point in cropping the document, as the preview itself is "broken".
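To illustrate why a "broken" overlay leaves nothing useful to crop, here is a hypothetical, simplified way of turning an overlay image into a crop rectangle. It is not Briss's actual rectangle detection: `content_bbox` and its `threshold` parameter are assumptions. Pixels darker than the threshold count as content and the rectangle is their bounding box, so a blank overlay yields no rectangle at all.

```python
from typing import Optional

Page = list[list[int]]            # grayscale rows, 0 = black, 255 = white
Rect = tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def content_bbox(overlay: Page, threshold: int = 128) -> Optional[Rect]:
    """Bounding box of all pixels darker than `threshold` (hypothetical parameter)."""
    xs, ys = [], []
    for y, row in enumerate(overlay):
        for x, value in enumerate(row):
            if value < threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # blank overlay: nothing to build a rectangle from
    return (min(xs), min(ys), max(xs), max(ys))


overlay = [
    [255, 255, 255, 255],
    [255,   0,  40, 255],
    [255, 255,  10, 255],
]
print(content_bbox(overlay))                   # (1, 1, 2, 2)
print(content_bbox([[255, 255], [255, 255]]))  # None
```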

That's kinda the most I can do, as I still barely understand the algorithms Briss uses in each step.

mbaeuerle commented on July 3, 2024

@Needles404 I think I have tracked down the issue. Yeah, I know, a little bit late, but maybe it's helpful for you nevertheless.

As @cleydyr wrote, the PdfDecoder returns a perfect image. And even if you apply cropping, the resulting PDF looks perfectly fine.

It looks like the issue lies in the algorithm used to calculate the overlay image in ClusterImageData.calculateSdOfImages.
I am guessing, but judging from the naming and the algorithm, I suppose "sd" stands for standard deviation: it basically computes how far each pixel value is from the mean across all pages.
Because every page in this particular PDF looks the same except for some parts of the barcode, the black parts basically cancel each other out, as their standard deviation is very small or zero.
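As a worked check of this reasoning, assume calculateSdOfImages computes the usual population standard deviation per pixel (the thread doesn't confirm whether it divides by $N$ or $N-1$; the conclusion is the same either way). With $N$ pages and grayscale value $p_i(x,y)$ of page $i$ at pixel $(x,y)$:

$$
\mu(x,y) = \frac{1}{N}\sum_{i=1}^{N} p_i(x,y), \qquad
\sigma(x,y) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(p_i(x,y) - \mu(x,y)\bigr)^2}
$$

If a pixel has the same value $p(x,y)$ on every page, then $\mu(x,y) = p(x,y)$, every term in the sum vanishes, and $\sigma(x,y) = 0$. Barcode bars that are identical on all pages therefore contribute nothing to the overlay.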

I verified this with a prepared PDF that has 4 identical pages, and, as suspected, the result is empty:
[image]

To fix this issue, I think this algorithm needs to be replaced. I will check how this could be done.

Needles404 commented on July 3, 2024

mbaeuerle commented on July 3, 2024

@Needles404 I finally had time to finish this. You can find the new version here:
https://github.com/mbaeuerle/Briss-2.0/releases/tag/v2.0-alpha-3
Give it a shot and let me know if this works for you :)

With this new version, the preview basically falls back to another algorithm if a certain amount of the content is similar on all pages.
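The fallback idea can be sketched roughly as follows. This is not the code from the v2.0-alpha-3 release: the "darkest pixel" alternative, measuring similarity as the share of inked pixels whose standard deviation stays near zero, and the threshold values are all assumptions chosen for illustration.

```python
from statistics import pstdev

Page = list[list[int]]  # grayscale rows, 0 = black ink, 255 = white paper

# Hypothetical tuning values, not taken from Briss.
INK = 128            # values below this count as ink
SD_TOLERANCE = 1.0   # a deviation at or below this counts as "unchanged"
SIMILAR_SHARE = 0.9  # fall back when at least 90% of inked pixels are unchanged


def sd_overlay(pages: list[Page]) -> list[list[float]]:
    """Per-pixel standard deviation across pages (high = varying content)."""
    h, w = len(pages[0]), len(pages[0][0])
    return [[pstdev([p[y][x] for p in pages]) for x in range(w)] for y in range(h)]


def min_overlay(pages: list[Page]) -> Page:
    """Darkest value per pixel across pages (low = ink on some page)."""
    h, w = len(pages[0]), len(pages[0][0])
    return [[min(p[y][x] for p in pages) for x in range(w)] for y in range(h)]


def similar_share(pages: list[Page], sd: list[list[float]]) -> float:
    """Share of inked pixels whose value barely changes across pages."""
    inked = unchanged = 0
    for y, row in enumerate(sd):
        for x, deviation in enumerate(row):
            if min(p[y][x] for p in pages) < INK:  # inked on at least one page
                inked += 1
                if deviation <= SD_TOLERANCE:
                    unchanged += 1
    return unchanged / inked if inked else 1.0


def choose_overlay(pages: list[Page]):
    """Use the SD overlay unless most of the content is the same on all pages."""
    sd = sd_overlay(pages)
    if similar_share(pages, sd) >= SIMILAR_SHARE:
        return "min", min_overlay(pages)
    return "sd", sd


identical = [[[0, 255, 0]] for _ in range(4)]  # like the 4-page test PDF
varying = [[[0, 255]], [[255, 0]]]             # ink moves between pages
print(choose_overlay(identical)[0])  # min
print(choose_overlay(varying)[0])    # sd
```

Counting only inked pixels keeps the white page background, which never changes, from making every document look "similar".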

Needles404 commented on July 3, 2024

Thanks again, it now shows a usable preview, which is excellent. The preview appears to be missing some text, which I can only attribute to the use of language packs, since the missing text is mainly Asian characters; this is probably a separate issue.

[image]

Package - FBA15CWS6NN8.pdf
