red6 / pdfcompare

A simple Java library to compare two PDF files

License: Apache License 2.0


PdfCompare

A simple Java library to compare two PDF files. Files are rendered and compared pixel by pixel. There is no text comparison.

Usage with Maven

Just include it as a dependency. Please check for the most current version available:

<dependencies>
  <dependency>
    <groupId>de.redsix</groupId>
    <artifactId>pdfcompare</artifactId>
    <version>...</version> <!-- see current version in the maven central tag above -->
  </dependency>
</dependencies>

Simple Usage via UI or Commandline

Starting the jar file without any additional arguments (which runs the class de.redsix.pdfcompare.Main) opens a simple interactive UI. It allows you to choose files to compare, to mark areas to ignore, and to write those areas to an ignore file.

Besides the UI, you can provide an expected and an actual file and additional parameters via a CLI. To get help for the CLI, use the -h or --help option.

usage: java -jar pdfcompare-x.x.x-full.jar [EXPECTED] [ACTUAL]
 -h,--help              Displays this text and exit
 ...

Usage as a library

The focus of PdfCompare, however, is on embedded usage as a library.

new PdfComparator("expected.pdf", "actual.pdf").compare().writeTo("diffOutput");

This will produce an output PDF, which may include markings for the differences found. PdfCompare renders a page from expected.pdf and the same page from actual.pdf to bitmap images and compares these two images pixel by pixel. Pixels that are equal are faded a bit. Pixels that differ are marked in red and green: green for pixels that were in expected.pdf but are not present in actual.pdf, red for pixels that are present in actual.pdf but were not in expected.pdf. There are also magenta markings at the edge of the paper, which make it quick to find the areas that differ. Ignored areas are marked with a yellow background. Pages that were expected but are missing are marked with a red border. Pages that appear but were not expected are marked with a green border.
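
The marking scheme described above can be illustrated with a small, JDK-only sketch. This is not PdfCompare's actual code: the class PixelDiffSketch and its methods are hypothetical, and the rule used to choose between green and red (a white pixel counts as "no content") is an assumption made for the example.

```java
import java.awt.image.BufferedImage;

/** Illustrative sketch only; this is not PdfCompare's actual implementation. */
class PixelDiffSketch {
    static final int EXPECTED_COLOR = 0x00B400; // green, the documented default
    static final int ACTUAL_COLOR = 0xD20000;   // red, the documented default
    static final int WHITE = 0xFFFFFF;

    /** Creates a white page image of the given size. */
    static BufferedImage whitePage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, WHITE);
            }
        }
        return img;
    }

    /** Reads a pixel as RGB; positions outside the image count as white (assumption). */
    static int pixelOrWhite(BufferedImage img, int x, int y) {
        boolean inside = x < img.getWidth() && y < img.getHeight();
        return inside ? img.getRGB(x, y) & 0xFFFFFF : WHITE;
    }

    /** Moves every colour channel halfway towards white. */
    static int fade(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        r += (255 - r) / 2;
        g += (255 - g) / 2;
        b += (255 - b) / 2;
        return (r << 16) | (g << 8) | b;
    }

    /** Paints the comparison into 'out' and returns the number of differing pixels. */
    static int diff(BufferedImage expected, BufferedImage actual, BufferedImage out) {
        int differing = 0;
        for (int y = 0; y < out.getHeight(); y++) {
            for (int x = 0; x < out.getWidth(); x++) {
                int e = pixelOrWhite(expected, x, y);
                int a = pixelOrWhite(actual, x, y);
                if (e == a) {
                    out.setRGB(x, y, fade(e)); // equal pixels are faded
                } else {
                    differing++;
                    // Assumption: white in actual means the content exists only in expected.
                    out.setRGB(x, y, a == WHITE ? EXPECTED_COLOR : ACTUAL_COLOR);
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) {
        BufferedImage expected = whitePage(4, 4);
        BufferedImage actual = whitePage(4, 4);
        expected.setRGB(0, 0, 0x000000); // content only in expected
        actual.setRGB(3, 3, 0x000000);   // content only in actual
        BufferedImage out = whitePage(4, 4);
        System.out.println("Differing pixels: " + diff(expected, actual, out));
    }
}
```

The result image is sized by the caller; sizing it to the larger of the two inputs matches the description in the internals section further down.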

The compare-method returns a CompareResult, which can be queried:

final CompareResult result = new PdfComparator("expected.pdf", "actual.pdf").compare();
if (result.isNotEqual()) {
    System.out.println("Differences found!");
}
if (result.isEqual()) {
    System.out.println("No Differences found!");
}
if (result.hasDifferenceInExclusion()) {
    System.out.println("Differences in excluded areas found!");
}
result.getDifferences(); // returns page areas, where differences were found
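
To give an idea of what such a page area is, here is a JDK-only sketch that computes the smallest rectangle enclosing all differing pixels of two equally sized images. The class DifferenceAreaSketch is hypothetical, and how PdfCompare itself derives its areas is not shown here; the sketch only illustrates the concept.

```java
import java.awt.image.BufferedImage;

/** Illustrative sketch only; not how PdfCompare necessarily computes its areas. */
class DifferenceAreaSketch {

    /**
     * Returns {x1, y1, x2, y2} of the smallest rectangle that contains
     * every differing pixel, or null when the compared region is equal.
     */
    static int[] boundingBox(BufferedImage expected, BufferedImage actual) {
        int width = Math.min(expected.getWidth(), actual.getWidth());
        int height = Math.min(expected.getHeight(), actual.getHeight());
        int x1 = Integer.MAX_VALUE;
        int y1 = Integer.MAX_VALUE;
        int x2 = -1;
        int y2 = -1;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    x1 = Math.min(x1, x);
                    y1 = Math.min(y1, y);
                    x2 = Math.max(x2, x);
                    y2 = Math.max(y2, y);
                }
            }
        }
        return x2 < 0 ? null : new int[] {x1, y1, x2, y2};
    }

    public static void main(String[] args) {
        BufferedImage expected = new BufferedImage(5, 5, BufferedImage.TYPE_INT_RGB);
        BufferedImage actual = new BufferedImage(5, 5, BufferedImage.TYPE_INT_RGB);
        actual.setRGB(1, 2, 0xFFFFFF);
        actual.setRGB(3, 4, 0xFFFFFF);
        int[] box = boundingBox(expected, actual);
        System.out.println("x1=" + box[0] + " y1=" + box[1] + " x2=" + box[2] + " y2=" + box[3]);
    }
}
```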

For convenience, writeTo also returns the equals status:

boolean isEquals = new PdfComparator("expected.pdf", "actual.pdf").compare().writeTo("diffOutput");
if (!isEquals) {
    System.out.println("Differences found!");
}

The PdfComparator can be created with the files given as filenames (Strings), Files, Paths or InputStreams.

Exclusions

It is also possible to define rectangular areas that are ignored during comparison. For that, a file needs to be created, which defines areas to ignore. The file format is JSON (or actually a superset called HOCON) and has the following form:

exclusions: [
    {
        page: 2
        x1: 300 // entries without a unit are in pixels. Pdfs are rendered by default at 300DPI
        y1: 1000
        x2: 550
        y2: 1300
    },
    {
        // page is optional. When not given, the exclusion applies to all pages.
        x1: 130.5mm // entries can also be given in units of cm, mm or pt (DTP-Point defined as 1/72 Inches)
        y1: 3.3cm
        x2: 190mm
        y2: 3.7cm
    },
    {
        page: 7
        // coordinates are optional. When not given, the whole page is excluded.
    }
]

When the provided exclusion file is not found, it is ignored and the compare is done without the exclusions.
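
The units used in the exclusion file relate to pixels through the rendering DPI. The following sketch shows that arithmetic with plain JDK code. The class ExclusionUnitSketch and its toPixels method are hypothetical, and rounding to the nearest pixel is an assumption; the library may round differently.

```java
/** Illustrative sketch only; unit handling in PdfCompare itself may differ in detail. */
class ExclusionUnitSketch {

    /** Converts a coordinate such as "300", "130.5mm", "3.3cm" or "72pt" to pixels. */
    static int toPixels(String value, int dpi) {
        String v = value.trim();
        double inches;
        if (v.endsWith("mm")) {
            inches = number(v) / 25.4;   // 25.4 mm per inch
        } else if (v.endsWith("cm")) {
            inches = number(v) / 2.54;   // 2.54 cm per inch
        } else if (v.endsWith("pt")) {
            inches = number(v) / 72.0;   // DTP point, 1/72 inch
        } else {
            return (int) Math.round(Double.parseDouble(v)); // no unit: already pixels
        }
        return (int) Math.round(inches * dpi);
    }

    /** Strips the two-letter unit suffix and parses the remaining number. */
    private static double number(String valueWithUnit) {
        return Double.parseDouble(valueWithUnit.substring(0, valueWithUnit.length() - 2).trim());
    }

    public static void main(String[] args) {
        int dpi = 300; // the documented default rendering resolution
        for (String s : new String[] {"300", "130.5mm", "3.3cm", "72pt"}) {
            System.out.println(s + " -> " + toPixels(s, dpi) + " px");
        }
    }
}
```

At the default of 300 DPI, 72pt and 25.4mm both come out as 300 pixels, which is a quick way to sanity-check coordinates in an ignore file.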

Exclusions are provided in the code as follows:

new PdfComparator("expected.pdf", "actual.pdf").withIgnore("ignore.conf").compare();

Alternatively an Exclusion can be added via the API as follows:

new PdfComparator("expected.pdf", "actual.pdf")
	.withIgnore(new PageArea(1, 230, 350, 450, 420))
	.withIgnore(new PageArea(2))
	.compare();
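
Conceptually, an exclusion answers one question: does a given pixel on a given page fall inside an ignored area? The sketch below models that check with plain JDK code. ExclusionAreaSketch is a hypothetical stand-in, not the library's PageArea class, and treating the rectangle edges as inclusive is an assumption.

```java
/** Illustrative sketch only; a hypothetical stand-in for the library's PageArea. */
final class ExclusionAreaSketch {
    static final int ALL_PAGES = -1; // hypothetical marker for "page not given"

    final int page;
    final int x1;
    final int y1;
    final int x2;
    final int y2;

    ExclusionAreaSketch(int page, int x1, int y1, int x2, int y2) {
        this.page = page;
        this.x1 = x1;
        this.y1 = y1;
        this.x2 = x2;
        this.y2 = y2;
    }

    /** No coordinates given: the whole page is excluded. */
    static ExclusionAreaSketch wholePage(int page) {
        return new ExclusionAreaSketch(page, 0, 0, Integer.MAX_VALUE, Integer.MAX_VALUE);
    }

    /** No page given: the area applies to every page. */
    static ExclusionAreaSketch onAllPages(int x1, int y1, int x2, int y2) {
        return new ExclusionAreaSketch(ALL_PAGES, x1, y1, x2, y2);
    }

    /** True when the pixel (x, y) on the given page lies inside this area. */
    boolean contains(int pageNumber, int x, int y) {
        boolean pageMatches = page == ALL_PAGES || page == pageNumber;
        return pageMatches && x >= x1 && x <= x2 && y >= y1 && y <= y2;
    }

    public static void main(String[] args) {
        ExclusionAreaSketch header = onAllPages(0, 0, 2000, 468);
        System.out.println("Header covers (100, 100) on page 7: " + header.contains(7, 100, 100));
    }
}
```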

Encrypted PDF files

When you want to compare password protected PDF files, you can give the password to the Comparator through the withExpectedPassword(String password) or withActualPassword(String password) methods respectively.

new PdfComparator("expected.pdf", "actual.pdf")
    .withExpectedPassword("somePwd")
    .withActualPassword("anotherPwd")
    .compare();

Configuring PdfCompare

PdfCompare can be configured with a config file. The default config file is called "application.conf" and it must be located in the root of the classpath.

PdfCompare uses Lightbend Config (previously called TypeSafe Config) to read its configuration files. If you want to specify another configuration file, you can find out more about that here: https://github.com/lightbend/config#standard-behavior. In particular you can specify a replacement config file with the -Dconfig.file=path/to/file command line argument.

Alternatively, you can specify parameters either through system environment variables or as a JVM parameter with -DvariableName=value.

Another way to specify a different config location programmatically is to create a new ConfigFileEnvironment(...) and pass it to PdfComparator.withEnvironment(...).

Configuring PdfCompare through an API

All the settings that can be changed through the application.conf file can also be changed programmatically through the API. To do so, you can use the following code:

new PdfComparator("expected.pdf", "actual.pdf")
	.withEnvironment(new SimpleEnvironment()
        .setActualColor(Color.green)
        .setExpectedColor(Color.blue))
	.compare();

The SimpleEnvironment delegates all settings that were not assigned to the default Environment.

Configuration options

Through the environment you can configure the memory settings (see below) and the following settings:

DPI=300

Sets the DPI that Pdf pages are rendered with. Default is 300.
expectedColor=00B400 (GREEN)

The expected color is the color that is used for pixels that were expected, but are not there. The colors are specified in HTML-style format (without a leading '#'): the first two characters define the red portion of the color in hexadecimal, the next two characters define the green portion, and the last two characters define the blue portion.
actualColor=D20000 (RED)

The actual color is the color that is used for pixels that are there, but were not expected. The colors are specified in HTML-style format (without a leading '#'): the first two characters define the red portion of the color in hexadecimal, the next two characters define the green portion, and the last two characters define the blue portion.
tempDir=System.property("java.io.tmpdir")

Sets the directory where to write temporary files. Defaults to the java default for java.io.tmpdir, which usually determines a system specific default, like /tmp on most unix systems.
allowedDifferenceInPercentPerPage=0.2

Percent of pixels that may differ per page. Default is 0. If for some reason your rendering is a little off or you allow for some error margin, you can configure a percentage of pixels that are ignored during comparison. That way a difference is only reported when more than the given percentage of pixels differ. The percentage is calculated per page. Note that the differences are still marked in the output file when you addEqualPagesToResult.
parallelProcessing=true

When set to false, disables all parallel processing and processes everything in a single thread.
addEqualPagesToResult=true

When set to false, only pages with differences are added to the result and thus to the resulting difference PDF document.
failOnMissingIgnoreFile=false

When set to true, a missing ignore file leads to an exception. Otherwise it is ignored and only an info-level log message is written.
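
Two of the options above involve a little arithmetic: the six-digit colour format and the per-page percentage threshold. The following JDK-only sketch shows both. The class ConfigValueSketch and its method names are hypothetical and only illustrate the documented semantics; they are not the library's own code.

```java
import java.awt.Color;

/** Illustrative sketch only; shows the documented semantics, not the library's code. */
class ConfigValueSketch {

    /** Parses a colour such as "00B400": two hex digits each for red, green and blue. */
    static Color parseColor(String hex) {
        if (hex.length() != 6) {
            throw new IllegalArgumentException("Expected six hex digits without '#': " + hex);
        }
        return new Color(Integer.parseInt(hex, 16));
    }

    /**
     * A page counts as different only when MORE than the allowed
     * percentage of its pixels differ.
     */
    static boolean pageCountsAsDifferent(long differingPixels, long totalPixels, double allowedPercent) {
        double percent = 100.0 * differingPixels / totalPixels;
        return percent > allowedPercent;
    }

    public static void main(String[] args) {
        Color expected = parseColor("00B400");
        System.out.println("expectedColor green channel: " + expected.getGreen());
        // 1 of 1000 pixels is 0.1 percent, which is within an allowance of 0.2 percent.
        System.out.println("Different: " + pageCountsAsDifferent(1, 1000, 0.2));
    }
}
```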

Different CompareResult Implementations

There are a few different implementations of CompareResult with different characteristics. They can be used to control certain aspects of the system behaviour, in particular memory consumption.

Internals about memory consumption

It is good to know a few internals when using PdfCompare. Here is, in a nutshell, what PdfCompare does when it compares two PDFs.

PdfCompare uses the Apache PdfBox Library to read and write Pdfs.

The two Pdfs to compare are opened with PdfBox.
A page from each Pdf is read and rendered into a BufferedImage by default at 300dpi.
A new empty BufferedImage is created to take the result of the comparison. It has the maximum size of the expected and the actual image.
When the comparison is finished, the new BufferedImage, which holds the result of the comparison, is kept in memory in a CompareResult object. Holding on to the CompareResult means that the images are also kept in memory. If memory consumption is a problem, a CompareResultWithPageOverflow or a CompareResultWithMemoryOverflow can be used. Those classes store images to a temporary folder on disk when certain thresholds are reached.
After all pages are compared, a new Pdf is created and the images are written page by page into the new Pdf.

So comparing large Pdfs can use up a lot of memory. I have not yet found a way to write the difference Pdf page by page incrementally with PdfBox, but there are some workarounds.
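
A rough calculation shows why. The sketch below estimates the size of one rendered page image, assuming an A4 page (210 x 297 mm) and 4 bytes per pixel; both are assumptions for illustration, and the class PageMemorySketch is hypothetical. Real usage depends on the page size, the image type and the JVM.

```java
/** Illustrative back-of-the-envelope estimate; real memory usage will vary. */
class PageMemorySketch {
    static final long BYTES_PER_PIXEL = 4; // assumption: one int per pixel

    /** Converts a length in millimetres to pixels at the given DPI. */
    static int pixels(double millimetres, int dpi) {
        return (int) Math.round(millimetres / 25.4 * dpi);
    }

    /** Estimated size in bytes of one rendered page image. */
    static long imageBytes(double widthMm, double heightMm, int dpi) {
        return (long) pixels(widthMm, dpi) * pixels(heightMm, dpi) * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        long perImage = imageBytes(210, 297, 300); // A4 at the default 300 DPI
        // Each page comparison involves three images: expected, actual and result.
        long perComparison = 3 * perImage;
        System.out.println("One image:      " + perImage / (1024 * 1024) + " MiB");
        System.out.println("One comparison: " + perComparison / (1024 * 1024) + " MiB");
    }
}
```

Under these assumptions a single A4 image is roughly 35 MB, so the three images of one page comparison come to about 100 MB, and every result image that stays in memory adds another 35 MB or so.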

CompareResults with Overflow

There are currently two different CompareResults that use different strategies for swapping pages to disk, thereby limiting memory consumption.

CompareResultWithPageOverflow - stores a bunch of pages into a partial Pdf and merges the resulting Pdfs in the end. The default is to swap every 10 pages, which is a good balance between memory usage and performance.
CompareResultWithMemoryOverflow - tries to keep as many images in memory as possible and swaps, when a critical amount of memory is consumed by the JVM. As a default, pages are swapped, when 70% of the maximum available heap is filled.
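
The two swap conditions can be pictured as simple predicates. This is a JDK-only sketch of the idea, not the code of the two classes: OverflowPolicySketch and its methods are hypothetical, and the exact comparison operators are assumptions.

```java
/** Illustrative sketch only; not the code of the two overflow classes. */
class OverflowPolicySketch {

    /** Page strategy: swap once a fixed number of pages is held in memory. */
    static boolean pageOverflowReached(int pagesInMemory, int pagesPerSwap) {
        return pagesInMemory >= pagesPerSwap;
    }

    /** Memory strategy: swap once the used heap reaches a share of the maximum heap. */
    static boolean memoryOverflowReached(long usedBytes, long maxBytes, double thresholdPercent) {
        return usedBytes * 100.0 / maxBytes >= thresholdPercent;
    }

    /** Heap currently in use by this JVM. */
    static long usedHeap() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Swap after 10 pages reached at 10: " + pageOverflowReached(10, 10));
        System.out.println("70% heap threshold reached now: " + memoryOverflowReached(usedHeap(), max, 70));
    }
}
```

The page-based rule is predictable but ignores how large the pages are; the memory-based rule adapts to page size at the cost of depending on the JVM's heap settings.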

A different CompareResult implementation can be used as follows:

new PdfComparator("expected.pdf", "actual.pdf", new CompareResultWithPageOverflow()).compare();

There are also some internal settings for memory limits that can be changed. Just add a file called "application.conf" to the root of the classpath. This file can have some or all of the following settings to overwrite the defaults given here:

imageCacheSizeCount=30

How many images are cached by PdfBox
maxImageSizeInCache=100000

A rough maximum size of images that are cached, to prevent very big images from being cached
mergeCacheSizeMB=100

When Pdfs are partially written and later merged, this is the memory cache that is configured for the PdfBox instance that does the merge.
swapCacheSizeMB=100

When Pdfs are partially written, this is the memory cache that is configured for the PdfBox instance that does the partial writes.
documentCacheSizeMB=200

This is the cache size configured for the PdfBox instance, that loads the documents that are compared.
parallelProcessing=true

When set to false, disables all parallel processing and processes everything in a single thread.
overallTimeoutInMinutes=15

Set the overall timeout. This is a safety measure to detect possible deadlocks. Complex comparisons might take longer, so this value might have to be increased.
executorTimeoutInSeconds=60

Sets the timeout to wait for the executors to finish after the overallTimeout was reached. It's unlikely that you ever need to change this.
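
The overall timeout works like a bounded wait on a completion signal. The sketch below shows that pattern with a CountDownLatch from the JDK. That the library uses a latch is inferred from its log message "Awaiting Latch 'FullCompare' timed out", and LatchTimeoutSketch itself is a hypothetical class, not library code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Illustrative sketch only; shows a bounded wait, not the library's code. */
class LatchTimeoutSketch {

    /** Waits for the latch; returns true if the work finished, false on timeout. */
    static boolean awaitOrTimeout(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            boolean finished = latch.await(timeout, unit);
            if (!finished) {
                System.err.println("Awaiting latch timed out after " + timeout + " " + unit);
            }
            return finished;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag for callers
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(1);
        // A worker that signals completion; a deadlocked worker would never do so.
        new Thread(done::countDown).start();
        System.out.println("Finished in time: " + awaitOrTimeout(done, 1, TimeUnit.SECONDS));
    }
}
```

If the signal never arrives, the wait returns false after the timeout instead of blocking forever, which is why a long but healthy comparison may need a larger overallTimeoutInMinutes.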

So in this default configuration, PdfBox should use up to 400MB of RAM for its caches (100MB merge cache + 100MB swap cache + 200MB document cache) before swapping to disk. I have had good experience with granting the JVM 2GB of heap space.

Acknowledgements

Big thanks to Chethan Rao for helping me diagnose out of memory problems and providing the idea of partial writes and merging of the generated PDFs.

pdfcompare's Issues

How to compare PDFs with and without overprint?

Hello
I found that when I compare two PDF files, one with overprint (as used in the printing industry) and one without, the result file shows no difference.

Is there any way to solve this?

Thank you.

Awaiting Latch 'FullCompare' timed out

I'm just comparing the first page of my PDFs, running in Eclipse on Linux (java 8 openjdk).

After a couple of files, I get ERROR Awaiting Latch 'FullCompare' timed out after 15 MINUTES

exclude header and footer x1,y1.....x3, y3 x4, y4

HI,

Firstly, thanks for the library. As many have said, it helps a lot.

I've read #46, which is closed.

I have managed to exclude the header region.

How do I add a second exclusion for the footer region?

This is my header exclusion for every page:

exclusions: [
{
x1: 0
y1: 0
x2: 2000
y2: 468
},
]

how is the excluded footer defined? x3 , y3 .... ??

again thanks for the advice.

Update:

would it be like this?

#############
exclusions: [
{ //header for all pages
x1: 0
y1: 0
x2: 2000
y2: 468
},
{ //footer for all pages
x1: 0
y1: 2620
x2: 1980
y2: 2800
},

]

#############

Encoding issue

When I switched from version 1.1.40 to 1.1.44, I started getting encoding errors: text that used to render fine is now garbled. I'm using a report in French.

for example

appears when it should be

Specify a diff threshold

I'm regression testing a lot of PDFs that are rendered to and from many different formats, so there will always be a certain diff.

At the moment I have to go through every file manually to ensure that the diff isn't too large, which is quite time consuming.

It would be very useful if I could specify that a 2 mm difference would be ok, for example. Would this be possible to implement?

Best/Easiest way to exclude whole pages from the comparison?

exclusions: [
    {
        page: 1
        x1: 0
        y1: 0
        x2: 9999
        y2: 9999
    }
]

works fine. I'm just wondering if there is a better/easier way?

Changing Java code to our requirement

Hi,
Can you please let me know how to change java code of this library to our requirement.

I have imported this maven library java class files. How can I change in my workspace.

Can I extend these classes and use for custom requirement.

e.g. 1. Ignoring formatting changes
2. Ignoring Green color fonts for pixels that are there where not expected

Regards
Mohan

WARN de.redsix.pdfcompare.PdfComparator - No files found to compare

Hi,

I'm getting the following message "WARN de.redsix.pdfcompare.PdfComparator - No files found to compare" even though the PDF files are present in the path specified.

The following is my code to compare two sample PDFs:

package PDFcomparison;

import de.redsix.pdfcompare.PdfComparator;

public class pdfcomparisontest {

public static void main(String[] args) throws Exception {
	
	String requirement = "D:\\PDFComparisonPDFs\\doc1.pdf";
	String actual = "D:\\PDFComparisonPDFs\\doc2.pdf";
	String results = "D:\\PDFComparisonPDFs\\Results\\output";
	new PdfComparator(requirement, actual).compare().writeTo(results);
	System.out.println("Completed");

}

}

Following is the dependencies added to POM.xml

<dependency>
  <groupId>de.redsix</groupId>
  <artifactId>pdfcompare</artifactId>
  <version>1.1.25</version>
</dependency>

If i compare two PDF with 139 pages, diffOutput PDf is not generated

PDF which i have used is attached
document (3).pdf
using Pdfcompare version 1.1.33 & PDFbox 2.0.11,
i have two questions
what is the limit for the pages that can compare?
what is limit for PDF size?

Allow for a difference in percent per page via API

Any way we can set this difference in percentage via the API, I have JUnit tests and would like some pdfs to allow different percentages.

Also having a hard time figuring out gradle to allow the application.conf file to be added to the build from the test sources folder to be used in my unit tests, any help on that would be great, but just being able to set the percentage difference through code would solve my issue. Thanks.

Choose PDF files dynamically instead of hardcode

Hi,

This Java library works for standalone PDF input files stored on my desktop. How can I pass the input PDF files at runtime, i.e. dynamically, so that the end user selects the files and then calls this library?

Also how do I change the java for my requirement.

In Detail explanation: I am trying to select the input PDF files from below jsp code and reading pdf file getParameter

Jsp home page code:

//browsing & selecting 1st pdf file //browsing & selecting 2nd pdf file // submit button

getting pdf file as parameter
String DocumentFile1 = request.getParameter("DocumentFile1");
String DocumentFile2 = request.getParameter("DocumentFile2");

passing to PdfComparator method:
new PdfComparator(DocumentFile1, DocumentFile2).compare().writeTo(DocumentDiffResult);

Can I write the result pdf file into local drive after submit, I am not able to get the result after submit
If not, how can I get the result pdf file in the browser after user click submit or option to download pdf

Regards
Mohan

Can we set the color in which diff are highlighted? Or may be Expected is faded image and actual is some color?

Support encrypted PDF file in the UI

Thanks a lot for the wonderful library!
I feel the UI screen is very handy to compare two files. However it does not support encrypted PDF file.
Please consider to fix it when you find time:)

I'm compare two same files but got [Diff-3-thread-1] INFO de.redsix.pdfcompare.DiffImage - Differences found at { page: 1, x1: 424, y1: 863, x2: 2356, y2: 1817 }

Hi, I'm trying to compare a downloaded file with a goldenCopy (they are the same), but at the end I get the message "differences found".
Observation: when I compare the files locally, all is fine. The problems start when I get the goldenCopy from GitHub and compare it with the same file downloaded from the browser.

Maybe You know how I can get around this problem?
Thank You!

Return compare result as java.io.File

Hi there

First of all thanks for working on this! I'm currently writing a little project which accepts two PDF's through a REST service and returns the result. What I'm struggling with is to return the result.pdf as a file.

Consulting the API there isn't any way to get this done. What are your suggestions to do this? Should I temporarily store it?

Thanks

Side by Side view report

HI,
Is it possible to build a feature of viewing the differences in a side-by - side pane? That is expected pdf on left half of the screen & actual pdf on the right half of the screen with the differences highlighted in different color codes.

Utility fails to create complete result for large size pdf files.

On comparing a PDF with size 1800 pages, It chokes up the CPU and 12 GB of RAM for 15 minutes. The result varies with 75-150 pages instead of 1800 pages.

compile time error for ch.qos.logback:logback-classic:jar:1.2.3

I get this error when I run mvn eclipse:eclipse
[ERROR] Duplicate classes found:
[ERROR]
[ERROR] Found in:
[ERROR] org.slf4j:slf4j-log4j12:jar:1.6.1:runtime
[ERROR] ch.qos.logback:logback-classic:jar:1.2.3:compile
[ERROR] Duplicate classes:
[ERROR] org/slf4j/impl/StaticMDCBinder.class
[ERROR] org/slf4j/impl/StaticMarkerBinder.class
[ERROR] org/slf4j/impl/StaticLoggerBinder.class

There is compile time error for ch.qos.logback:logback-classic:jar:1.2.3
If I exclude this jar, there is no error. But I'm not sure if its the right way

Option to find if differences are ONLY in Excluded Areas

Hi,
I would like to report the test as PASS if differences are there BUT ONLY in Excluded Areas.
Please let me know if there is a way to achieve this currently.
If not please guide me to customize the library or suggest a workaround.
Any help would be great !

Regards
Anuradha

Excluding header and footer

Is it possible exclude headers from comparison? My PDF include date and time in header which will always show diff

Defining allowed difference

Hi,
We've started using pdfcompare recently for testing our templating service and it works great!
One issue we're facing is that the PDFs generated on our local machines (OSX) are different from what is generated on our servers. The difference is very small (0.1%), but they are still not equal, and so our tests fail.
I was thinking of contributing a way of defining the allowed difference, I started going over the code and thought maybe you'll have ideas or leads on how to implement.

inclusions and exclusions

Hi,
It goes without saying that the exclusions array in the HOCON file is incredibly useful.
Would it also be feasible to specify content areas that should be tested as well as ones that shouldn't?

An array of pages that should be tested would also be great. Possibly also a random selection of pages.

Don't get me wrong, I will also look into the source and see if these features are something I could contribute to the project I just thought I'd ask to see if any work has started on them.

is there a way to exclude the differences of the actual data between two PDF reports?

I am trying to compare two PDF reports irrelevant of the actual data. I am more interested to ensure the coordinates of the data between the two reports are identical. is there a way to exclude the difference of data between two PDF reports using the pdfCompare library?

Compare with password protected PDF file

I found this pdf comparator so awesome and really helpful for my project testing, but recently I have a problem with the password protected file, Is it possible to add a method in to insert the password so the file become unprotected?

Feature: Colourblind mode

Hi,

First of all thanks for the library.
We currently use this for testing our pdf files. Although I can see the differences perfectly, I have a colleague who is colourblind (red-green), and for him the differences are harder to see. Is there a feature planned for the colourblind among us, or can you point me to where the colours are set so I can create a pull request with configurable colours?

Thanks in advance
Ben Maes

Implementation of DiskUsingCompareResult is missing

Hi,
the readme.md mentions the possibility of using the DiskUsingCompareResult, which comes into play when memory consumption is an issue.
Unfortunately, this class is not available in the current master distribution.

Can you please provide this CompareResult strategy again, as it is very useful when memory consumption is an issue?

Thanks & BR,
Peter

Readme mentions CompareResultWithDiskStorage but class is (no longer?) in jar

as of version 1.1.17

java.lang.NoSuchMethodError when attempting to compare 2 different pdfs

I have 2 Pdfs which differ in page count and content. I compare them as such:
Boolean isEquals = new PdfComparator(pdf1, pdf2).compare().writeTo("./diff.pdf");.........line 82

but encounter this error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.pdfbox.pdmodel.PDDocument.load(Ljava/io/InputStream;Lorg/apache/pdfbox/io/MemoryUsageSetting;)Lorg/apache/pdfbox/pdmodel/PDDocument;
at de.redsix.pdfcompare.PdfComparator.compare(PdfComparator.java:155)
at com.company.ctg.fps.automation.integration.main(FPS.java:82)

command line app

Hi. Could you please provide instructions on how to compile this into a command line application for comparing two pdfs? Thanks.

Comparison output readability (Ability to generate two diff documents)

We have tried to compare two PDF files, and output came in the below format. It seems like we cannot read the difference from the output. Is there any other way we can make it more simpler and easy to understand?

Use specific configuration file OR prefix configuration entries with e.g. "pdfcompare"

Since pdfcompare uses the default configuration file name "application.conf", it will probably share this file with other libraries that also use Typesafe config. To avoid confusion, pdfcompare should either use a custom file (e.g. "pdfcompare.conf"), or add prefixes to its property keys (e.g. "pdfcompare.allowedDifferenceInPercentPerPage"). Typesafe config allows easy handling of such namespacing prefixes.

Can't get difference PDF

We are using this tool to compare some of the PDF reports in our organization. For some of the PDFs, we can't get the difference output.

I am very thankful to the person who has created this awesome tool, if you look into this issue then it will solve my major problems. I am current using v1.1.45.

Question about full page exclusions

Hello

Let's say I have 2 files which can each have from 2 to X pages. They can have 2 and 2, or 3 and 3, or 15 and 15 pages, etc. I always want to compare only the first 2 pages and exclude all others to the end. I want to use only one conf file. What is the best way to do that? Also, can the source and target files be in separate directories? What are the changes in the last version vs 2 versions back?

Thanks a lot in advance

Jeff

Add exclusions via API and without config file

Currently it is necessary to create a config file to define exclusions. It would be useful to add exclusions directly without a config file.

Example:

// ignores a single page
withIgnore(int page)

// ignores an area on all pages
withIgnore(String x1, String y1, String x2, String y2)

// ignores an area on a single page
withIgnore(int page, String x1, String y1, String x2, String y2)

Should it be named "withIgnore" or "withExclusion"?

SLF4J Errors

I am using the pdfcompare library to compare two pdf documents. However, I got the following error: SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". I included the slf4j api to my class path when compiling and running my code but I am still getting the same error. How can this be resolved?

Report the expected and actual pixel differences between two reports in log info

Greetings,

Friendly request. Is there a way to report the expected and actual pixel differences within the report?

Currently, it outputs the location of the difference as described in the example log below.

{ page: 1, x1: 163, y1: 193, x2: 2423, y2: 1002 }

It would be extremely helpful if there were expected and actual pixel differences. something that looks like this.

{ page: 1, Expected {x1: 163, y1: 193, x2: 2423, y2: 1002} Actual: {x1: 164, y1: 194, x2: 2424, y2: 1003} }

FYI, This is a wonderful library! Thanks.

withIgnore(new pageArea()) can not be resolved

Hi, I am trying to use the withIgnore function, but the code does not compile.
boolean isEquals = new PdfComparator(ref, act).withIgnore(new PageArea(1, 1600, 2210, 2200,2300)).compare().writeTo(diffPath);
Error in Eclipse:
The method withIgnore(String) in the type PdfComparator is not applicable for the arguments (PageArea)

Can you please help me resolve this one?

application.conf location

Hi there,

First off, I would like to thank the developers for this library. It's exactly what I'm looking for. I'm completely new to Java and I installed Eclipse/Maven for the first time yesterday just to get this library to run!

What I'm having an issue with is follows.
The docs specify an application.conf be created with additional parameters such as 'allowedDifferenceInPercentPerPage' and the file should be placed in the classes root. Due to my inexperience with Java, I'm having a hard time locating this classes root. I compile a .jar with Maven, is the classes root the same folder as the .jar?

Additionally, would it be possible to specify an application.conf at runtime? I have situations where various 'allowedDifference' thresholds are needed.

Thanks!

Run with jdk 1.7

Hi,

can you let us know what version is compatible with jdk 1.7 ?

Thanks
Deep

Challenging Compare

Hi,
in my use case I have to compare two very large PDF documents.
Each of them is around 610 MB and consists of more than 14,800 pages.

I tried all available CompareResult strategies, of which
CompareResultWithPageOverflow (with 50 or 100 pages) progressed the furthest.
Unfortunately, not a single strategy finished the job.
Even the overflow strategies raised Java heap space exceptions and got stuck in the end.

I also tuned the library through the application.conf as follows:
imageCacheSizeCount=30
maxImageSizeInCache=100000
mergeCacheSizeMB=100
swapCacheSizeMB=100
documentCacheSizeMB=200
parallelProcessing=true
overallTimeoutInMinutes=3

Finally (with CompareResultWithPageOverflow) some partial.pdfs are stored in my %APPDATA%/local/Temp directory, but the application does not respond anymore.

So what are your suggestions to meet this challenge?
What else can I provide? Unfortunately I am not allowed to give you these large PDF files because of data governance restrictions (besides their size).

Here are some code snippets for my usage:
...
public static void main(String[] args) {

    SpringApplication.run(PdfcompareTestApplication.class, args);

    if (args != null && args.length > 0)
        parseArguments(args);

    try {
        executeCompare();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

private static void executeCompare() throws IOException, InterruptedException {

    CompareResult result = null;

    System.out.println("Start test with following settings: ");
    System.out.println("==========================================================================");
    System.out.println("    f1 : " + file1  + " <--> " + "  f2 : " + file2);
    System.out.println("    mode : " + mode);
    System.out.println("    loops : " + loops);
    System.out.println("    writeDiff : " + writeDiffs);
    System.out.println("    printMemory: " + printMemory);
    System.out.println("==========================================================================");

    Instant start = Instant.now();
    for (int i = 0; i < loops; i++) {
        Instant startLoop = Instant.now();
        if (mode == 3) {
            result = executeCompareResultWithPageOverflow();
        }
        else if (mode == 2)
        {
            result = executeCompareResultWithMemoryOverflow();
        }

        else
            result = executeCompareResult(null);

        Instant endLoop = Instant.now();
        System.out.println("Duration of loop " + i + ": " + Duration.between(startLoop, endLoop).toMillis() + "ms. # of Pages: " + (result.getNumberOfPages() + 1) + ". Differences: " + (result.isNotEqual() ? "yes" : "no"));
        if (result.isNotEqual() && writeDiffs) {
            result.writeTo("diff");
            System.out.println("Wrote ./diff.pdf");
        }

        result = null;

        if (printMemory)
            printMemory("intermediate");
   }

    Instant end = Instant.now();
    System.out.println("Total duration: " + Duration.between(start, end).toMillis() + "ms");

    if (printMemory)
        printMemory("finished");

    System.exit(0);
}

private static CompareResult executeCompareResultWithPageOverflow() throws IOException {
    CompareResult result = new CompareResultWithPageOverflow(50);
    return executeCompareResult(result);
}

private static CompareResult executeCompareResultWithMemoryOverflow() throws IOException {
    CompareResult result = new CompareResultWithMemoryOverflow();
    return executeCompareResult(result);
}

private static CompareResult executeCompareResult(CompareResult result) throws IOException {
    if (result != null) {
        return new PdfComparator(file1, file2, result).compare();
    }
    return new PdfComparator(file1, file2).compare();
}

...

Many thanks and BR,
Peter

Getting an error when my document has more than 20 pages

[Draw-1-thread-1] ERROR de.redsix.pdfcompare.PdfComparator - Error while rendering page 3 for expected document
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at de.redsix.pdfcompare.PdfComparator.getImage(PdfComparator.java:238)
at de.redsix.pdfcompare.PdfComparator.lambda$drawImage$11(PdfComparator.java:219)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.awt.image.DataBufferInt.<init>(Unknown Source)
at java.awt.image.Raster.createPackedRaster(Unknown Source)
at java.awt.image.DirectColorModel.createCompatibleWritableRaster(Unknown Source)
at java.awt.image.BufferedImage.<init>(Unknown Source)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:205)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:150)
at de.redsix.pdfcompare.PdfComparator.renderPageAsImage(PdfComparator.java:284)
at de.redsix.pdfcompare.PdfComparator.lambda$null$8(PdfComparator.java:216)
at de.redsix.pdfcompare.PdfComparator$$Lambda$6/5238162.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
... 3 common frames omitted
[Draw-1-thread-1] ERROR de.redsix.pdfcompare.PdfComparator - Error while rendering page 4 for expected document
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at de.redsix.pdfcompare.PdfComparator.getImage(PdfComparator.java:238)
at de.redsix.pdfcompare.PdfComparator.lambda$drawImage$11(PdfComparator.java:219)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.awt.image.DataBufferInt.<init>(Unknown Source)
at java.awt.image.Raster.createPackedRaster(Unknown Source)
at java.awt.image.DirectColorModel.createCompatibleWritableRaster(Unknown Source)
at java.awt.image.BufferedImage.<init>(Unknown Source)
at org.apache.pdfbox.rendering.PDFRenderer.renderImage(PDFRenderer.java:205)
at org.apache.pdfbox.rendering.PDFRenderer.renderImageWithDPI(PDFRenderer.java:150)
at de.redsix.pdfcompare.PdfComparator.renderPageAsImage(PdfComparator.java:284)
at de.redsix.pdfcompare.PdfComparator.lambda$null$8(PdfComparator.java:216)
at de.redsix.pdfcompare.PdfComparator$$Lambda$6/5238162.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
... 3 common frames omitted
[Draw-1-thread-1] ERROR de.redsix.pdfcompare.PdfComparator - Error while rendering page 5 for expected document
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(Unknown Source)
at java.util.concurrent.FutureTask.get(Unknown Source)
at de.redsix.pdfcompare.PdfComparator.getImage(PdfComparator.java:238)
at de.redsix.pdfcompare.PdfComparator.lambda$drawImage$11(PdfComparator.java:219)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
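
The earlier traces fail inside the BufferedImage constructor while allocating a DataBufferInt, which stores one 4-byte int per pixel. As a rough sketch of why large documents can exhaust the heap, the following estimates the size of one rendered page. It assumes an A4 page (595 x 842 points), the 300 DPI rendering mentioned elsewhere on this page, and simple rounding of the pixel dimensions. The class and method names are made up for illustration and are not part of PdfCompare or PDFBox.

```java
// Hypothetical helper, not part of PdfCompare or PDFBox.
final class PageMemoryEstimate {

    private static final double POINTS_PER_INCH = 72.0;
    private static final int BYTES_PER_PIXEL = 4; // one int per pixel, as in DataBufferInt

    /** Estimated bytes for one page rendered to an int-backed image. */
    static long bytesPerPage(double widthPt, double heightPt, int dpi) {
        long widthPx = Math.round(widthPt * dpi / POINTS_PER_INCH);
        long heightPx = Math.round(heightPt * dpi / POINTS_PER_INCH);
        return widthPx * heightPx * BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        long bytes = bytesPerPage(595, 842, 300); // A4 at 300 DPI (assumption)
        System.out.printf("~%.1f MiB per rendered page%n", bytes / (1024.0 * 1024.0));
    }
}
```

Under these assumptions a single page needs about 33 MiB, and a comparison holds at least two such images (expected and actual) for every page in flight. Raising the heap with the standard JVM -Xmx option is therefore one thing to try.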

Missing page not listed in `getDifferences`

When comparing two PDFs with different numbers of pages, the missing pages are not included in the result of getDifferences, although they are present in the PDF file produced by writeTo.
Unfortunately I can't attach the PDFs I use here, but I can send them to you by any means you see fit if necessary.

Steps to reproduce :

Create two PDFs, one with a single page (let's name it onePage.pdf) and one with two pages (twoPages.pdf)
Programmatically compare them with new PdfComparator<CompareResultImpl>("onePage.pdf", "twoPages.pdf").compare()
Get the list of differences with getDifferences()

Current result :
The list has one null element

Expected result :
The list has one PageArea element representing the whole missing page

From a (very) quick look through the code (so I am probably wrong), I think that in DiffImage#diffImages the mark function is called correctly but the PageArea is not created, which leaves it null.

Can we get difference output in JSON format instead of PDF?

Hey, I was wondering whether we can get the difference output as JSON instead of PDF, so that I can just read the difference areas from the JSON output.
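
Whether the library offers JSON output is not stated here. As a minimal sketch, assuming you already have the page number and pixel coordinates of each difference (the same fields that appear in the library's "Differences found at" log lines), a small hand-rolled serializer could look like this. The Area class and the toJson method are hypothetical and not part of PdfCompare.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical sketch, not part of PdfCompare.
final class DiffJson {

    /** Local stand-in for one difference area; the fields mirror the log output. */
    static final class Area {
        final int page, x1, y1, x2, y2;

        Area(int page, int x1, int y1, int x2, int y2) {
            this.page = page;
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }
    }

    /** Serializes the areas as a JSON array. Only integers are emitted, so no escaping is needed. */
    static String toJson(List<Area> areas) {
        return areas.stream()
                .map(a -> String.format(Locale.ROOT,
                        "{\"page\":%d,\"x1\":%d,\"y1\":%d,\"x2\":%d,\"y2\":%d}",
                        a.page, a.x1, a.y1, a.x2, a.y2))
                .collect(Collectors.joining(",", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(toJson(Arrays.asList(new Area(24, 105, 354, 2137, 2787))));
    }
}
```

For anything beyond flat integer fields, an established JSON library would be a safer choice than string formatting.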

[QUESTION]: How to interpret the log of differences?

This is a question, not an issue. How do I interpret the log of differences that is generated?
Consider the following log line:
"Differences found at { page: 24, x1: 105, y1: 354, x2: 2137, y2: 2787 }"
The reason I ask is that I tried to use the PDFBox API with the same region to retrieve the text from the actual PDF file, and I don't get the content where the difference is.
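
One possible explanation, offered as an assumption rather than a confirmed answer: the logged coordinates may be pixels in the rendered page image (this page mentions a fixed 300 DPI), whereas PDF page coordinates are measured in points, 72 per inch. Under that assumption the values need scaling by 72/300 before they are used as a text-extraction region. The helper below is a hypothetical sketch and not part of PdfCompare or PDFBox. Depending on the API the region is passed to, the y axis may additionally need flipping, which is not handled here.

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper, not part of PdfCompare or PDFBox.
final class DiffRegion {

    private static final double POINTS_PER_INCH = 72.0;

    /** Converts a pixel rectangle (x1, y1)-(x2, y2) rendered at the given DPI to PDF points. */
    static Rectangle2D.Double toPoints(int x1, int y1, int x2, int y2, int dpi) {
        double scale = POINTS_PER_INCH / dpi;
        return new Rectangle2D.Double(x1 * scale, y1 * scale, (x2 - x1) * scale, (y2 - y1) * scale);
    }

    public static void main(String[] args) {
        // Values from the log line quoted above, assuming 300 DPI.
        Rectangle2D.Double r = toPoints(105, 354, 2137, 2787, 300);
        System.out.println(r.x + ", " + r.y + ", " + r.width + ", " + r.height);
    }
}
```

For the logged area this gives a region starting at roughly (25.2, 84.96) with a size of about 487.68 x 583.92 points.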

Unable to use setAllowedDiffInPercent()

I have the following code, in which I am trying to set the allowed difference in percent. The code does not compile and shows an error at compare() saying "Cannot invoke compare() on the primitive type void". Please help me understand what is wrong with the code.
SimpleEnvironment e= new SimpleEnvironment();
e.setAllowedDiffInPercent(0.2);
boolean isEquals = new PdfComparator(ref, act)
.withIgnore(new PageArea(1, 1600, 2000, 2400, 2300))
.withIgnore(new PageArea(2, 1, 3900, 400))
.setEnvironment(e)
.compare()
.writeTo(diffPath);
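
The compiler message means that one call in the chain returns void, so nothing can be invoked on its result; in the snippet above that is the call directly before compare(). The self-contained sketch below reproduces the pattern with a made-up class (FluentComparator is hypothetical and only mimics the shape of the API) and shows the generic workaround of calling the void setter in its own statement. Whether the library also offers a chainable variant of the setter is not confirmed here.

```java
// Hypothetical stand-in that only mimics the shape of the API in the question.
final class FluentComparator {

    private double allowedDiffInPercent;
    private int ignoredAreas;

    /** Chainable: returns this, so further calls can follow. */
    FluentComparator withIgnore(int page) {
        ignoredAreas++;
        return this;
    }

    /** Not chainable: returns void, so ".compare()" cannot follow it. */
    void setAllowedDiffInPercent(double percent) {
        this.allowedDiffInPercent = percent;
    }

    String compare() {
        return "ignored=" + ignoredAreas + ", allowedDiff=" + allowedDiffInPercent;
    }

    public static void main(String[] args) {
        // new FluentComparator().withIgnore(1).setAllowedDiffInPercent(0.2).compare();
        // ^ does not compile: cannot invoke compare() on the primitive type void

        FluentComparator comparator = new FluentComparator().withIgnore(1).withIgnore(2);
        comparator.setAllowedDiffInPercent(0.2); // void call in its own statement
        System.out.println(comparator.compare());
    }
}
```

The same split works for any builder-style chain: keep the chainable calls together, assign the result to a variable, and call the void methods on that variable before continuing.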

No pdf file found

I am getting a "no file found" error, although the path I have given contains the required PDF file.

Allow custom DPI in the settings

Currently, the DPI of the comparison images is fixed at 300. It would be good to have an option in the settings (application.conf and Java API) to change it to other values.

Easier way to get differences found in pdf

This is more of a question or a polite request: is there a way to capture the differences that the DiffImage.diffImages() method writes to the log, so that I can pass them to my exception message? Or could you return them from some method as a String or something similar?

Add API method to set the allowed difference

In a single run I want to compare multiple PDF files. Each file has a different "allowed difference". Currently it is only possible to set a global "allowed difference" via the application.conf. It should be possible to configure it per run.

Feature: output only pages with differences

I have huge documents that need to be compared (1000-2000 pages). Most of the pages are identical, but sometimes 1 or 2 pages show differences. To make these differences easier to find, would it be possible to output only the pages with differences?

Greetings
Ben