mkl-public / testarea-pdfbox2

Test area for public PDFBox v2 issues on Stack Overflow etc.

License: Apache License 2.0

Java 100.00%

pdfbox java pdf

testarea-pdfbox2's Issues

Tables with Merged cells

I'm trying to resolve the issue from a question on Stack Overflow. Any help would be appreciated.
Tables with Merged cells (question 78001237)

One case fails to remove invisible texts or symbols

Hi, there is one case in which PDFVisibleTextStripper.java fails to remove invisible text.
It occurs on page one of this PDF:

00000000000005fw6q.pdf

One issue with BreakLongString.java example

Hi there,

I posted this question on Stack Overflow, but it was deleted by an administrator.
I'm following the BreakLongString.java example to show my content in a signature field.
The problem is that the example does not yet handle text that contains a very long word.

For example, this is my text:

String text = "I am trying toTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTM create a PDF file with a lot of text contents in the document. I am using PDFBox";

The text is truncated in the generated PDF file.

Could you please have a look at the issue?
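A minimal sketch of one way to handle such a word, not the BreakLongString.java code itself: greedy wrapping that splits any word wider than the line into chunks that fit. The class `LongWordWrapper` and the `widthOf` measure are hypothetical names. With PDFBox the measure would typically be built on `PDFont.getStringWidth` scaled by the font size (and would have to deal with its checked `IOException`); here it is passed in as a plain function so the logic can be shown in isolation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper: greedy line wrapping that also splits overlong words. */
final class LongWordWrapper {

    /**
     * Wraps text into lines no wider than maxWidth.
     * widthOf is an assumed measuring function (e.g. based on font metrics).
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            String candidate = line.length() == 0 ? word : line + " " + word;
            if (widthOf.applyAsDouble(candidate) <= maxWidth) {
                // The word still fits on the current line.
                line.setLength(0);
                line.append(candidate);
                continue;
            }
            // The word does not fit: finish the current line first.
            if (line.length() > 0) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Split a word that is wider than a whole line into fitting chunks.
            String rest = word;
            while (rest.length() > 1 && widthOf.applyAsDouble(rest) > maxWidth) {
                int cut = 1; // take at least one character to guarantee progress
                while (cut < rest.length()
                        && widthOf.applyAsDouble(rest.substring(0, cut + 1)) <= maxWidth) {
                    cut++;
                }
                lines.add(rest.substring(0, cut));
                rest = rest.substring(cut);
            }
            line.append(rest);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Called as `LongWordWrapper.wrap(text, fieldWidth, measure)`, every returned line fits the width as long as a single character does. Hyphenation and preserving the original whitespace are deliberately left out.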

How to add image to pdf

Hi,

I have seen your test case for creating a button and adding an image appearance.

I'm looking to use an existing button (as a placeholder) on the PDF to draw an image. Is this the correct approach?

If so, does this code look correct? (It is based on the test you defined.)

        try (InputStream resource = getClass().getClassLoader().getResourceAsStream("2x2colored.png")) {
            BufferedImage bufferedImage = ImageIO.read(resource);
            PDImageXObject pdImageXObject = LosslessFactory.createFromImage(acrobatDocument, bufferedImage);
            float width = 10 * pdImageXObject.getWidth();
            float height = 10 * pdImageXObject.getHeight();

            PDAppearanceStream pdAppearanceStream = new PDAppearanceStream(acrobatDocument);
            pdAppearanceStream.setResources(new PDResources());
            try (PDPageContentStream pdPageContentStream = new PDPageContentStream(acrobatDocument, pdAppearanceStream)) {
                pdPageContentStream.drawImage(pdImageXObject, 0, 0, width, height);
            }
            pdAppearanceStream.setBBox(new PDRectangle(width, height));

            //get button you want to replace
            PDButton button  = (PDPushButton)acrobatAcroForm.getField("PushButton");

            List<PDAnnotationWidget> widgets = button.getWidgets();
            for (PDAnnotationWidget pdAnnotationWidget : widgets) {

                PDAppearanceDictionary pdAppearanceDictionary = pdAnnotationWidget.getAppearance();
                if (pdAppearanceDictionary == null) {
                    pdAppearanceDictionary = new PDAppearanceDictionary();
                    pdAnnotationWidget.setAppearance(pdAppearanceDictionary);
                }

                //add appearance to button
                pdAppearanceDictionary.setNormalAppearance(pdAppearanceStream);
            }
            button.setReadOnly(true);
            acrobatAcroForm.getFields().add(button);

            acrobatDocument.save(new File("build", "imageWithButton.pdf"));
        }

Thanks,
Shane.

How to set multiple language fonts on a text form field?

I see that mkl has answered many PDF-related questions and is a PDF expert. We have run into issues when using PDFBox.
1. How can fonts for multiple languages (dynamic fonts) be set on a form field? I saw that your implementation works well for "showtext": https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/AddTextWithDynamicFonts.java
2. Regarding this method:
```
public static PDType0Font load(org.apache.pdfbox.pdmodel.PDDocument doc,
                               java.io.InputStream input,
                               boolean embedSubset)
embedSubset – True if the font will be subset before embedding. Set this to false when creating a font for AcroForm.
```

How can a subset of a font be embedded when creating a font for AcroForm? A CJK font can be very large.

How to Attach an Image to a PDF Form Field Without Invalidating an Existing Signature?

Use case: I'm working on a feature that requires attaching an image to a PDF without invalidating a pre-existing digital signature. The PDF should have a form field designated for image attachment, which can be populated later. I want to implement this using PDFBox.

Implementation: Since PDFs lack a dedicated image form field, I’m utilizing a PDPushButton as a workaround, following the method outlined in the following issue;

Additionally, the PDF includes a signature form field. The process involves signing the signature field first and subsequently attaching the image to the PDPushButton field. However, this sequence is causing the signature to become invalid.

Here is the code for attaching an image to the PDPushButton.

@SneakyThrows
public static void fillInitialField(String inputFilePath, String outputFilePath) {
    // Load input file
    PDDocument document = PDDocument.load(new File(inputFilePath));

    // Find and link the relevant signature field
    PDPushButton initial = PdfService.findInitial(document, "132323423180965");

    PDImageXObject pdImageXObject = PDImageXObject.createFromFile("initial.png", document);
    float width = 10 * pdImageXObject.getWidth();
    float height = 10 * pdImageXObject.getHeight();

    PDAppearanceStream pdAppearanceStream = new PDAppearanceStream(document);
    pdAppearanceStream.setResources(new PDResources());
    try (PDPageContentStream pdPageContentStream = new PDPageContentStream(document, pdAppearanceStream)) {
        pdPageContentStream.drawImage(pdImageXObject, 200, 300, width, height);
    }
    pdAppearanceStream.setBBox(new PDRectangle(width, height));

    List<PDAnnotationWidget> widgets = initial.getWidgets();
    for (PDAnnotationWidget pdAnnotationWidget : widgets) {

        PDAppearanceDictionary pdAppearanceDictionary = pdAnnotationWidget.getAppearance();
        if (pdAppearanceDictionary == null) {
            pdAppearanceDictionary = new PDAppearanceDictionary();
            pdAnnotationWidget.setAppearance(pdAppearanceDictionary);
        }

        pdAppearanceDictionary.setNormalAppearance(pdAppearanceStream);
    }
    initial.setReadOnly(true);

    // Save and close the document
    FileOutputStream fos = new FileOutputStream(outputFilePath);
    document.save(fos);
    document.close();
}

I’ve created a repository that replicates this issue, which can be found here: https://github.com/ContractSPAN/ImageFormFieldIssue

How can I implement an image attachment to a PDF form field in such a way that it doesn’t invalidate an existing signature? I am open to alternative approaches to achieve this functionality.

cosstream has been closed and cannot be read. perhaps its enclosing pddocument has been closed

Hi!

Thanks for the tool.
I've got an issue when calling the merge method: sometimes (not always, but quite frequently) it throws an error saying "cosstream has been closed and cannot be read. perhaps its enclosing pddocument has been closed" on line 50 of DenseMerge, and the same happens with VeryDenseTool.

My usage is this:

PdfVeryDenseMergeTool pdfDenseMergeTool = new PdfVeryDenseMergeTool(PDRectangle.A4, dim1, dim2, dim3);

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        
pdfDenseMergeTool.merge(byteArrayOutputStream, listOfDocuments);

Any idea what is happening SOMETIMES? It feels like a synchronization problem between the streams, but I don't know.

The problem is that it is not deterministic: with exactly the same input it may or may not throw the error.

OptimizeAfterMerge breaks PDF file on native Firefox pdf reader

Hi there,

Regarding the issue title:

Reproduction:

Starting from this PDF, form-empty.pdf,
I create a merged PDF to which I apply the optimize() method; the result is form-filled.pdf.
When you open it in Chrome you see 22 pages, but when you open it in Firefox you see only 4 pages.
When you open this PDF file with Adobe Reader you see 22 pages, but if you scroll down you get error 14 after page 3:

If I create the same merged PDF without the optimize() method, Firefox reads it correctly with all 22 pages: form-filled-NO-optimized.pdf

Temporary workaround:

We cannot keep a file this large uncompressed, while with optimize() the PDF size is reduced by 5. We keep optimizing and ask users to read the PDF in Adobe Reader or to use Chrome/Edge.

Expected behavior:

The optimization method seems too aggressive. How can we enhance optimize() so that it does not break the Firefox reader, or how should we adapt our initial pdf-empty.pdf file to avoid this situation? It seems my initial PDF, form-empty.pdf, was not created correctly, maybe due to copy/paste of AcroForm fields. Can this be caught or fixed by the optimize method?

Additional info:

We have other PDFs similar to form-filled2.pdf that work with the optimize() method, but for this one I can understand why it breaks Firefox's built-in PDF reader.

How to extract all the lines in PDF？

Not all lines are extracted

pdfbox 2.0.21
linux 18.04
java:jdk1.8.0

test file:src/test/resources/mkl/testarea/pdfbox2/extract/demo.pdf
test code:src/test/java/mkl/testarea/pdfbox2/extract/ExtractLinesWithDir.java

This method extracts some lines, but not all of them. In the figure, a green line represents an extracted line, and a red "?" mark represents a line that was not extracted.
How to extract all the lines in PDF?

how to modify form object stream?

https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/RemoveText.java#L49
```
public void processForm(PDFormXObject form) throws IOException {
    final PDStream formContentStream = form.getContentStream();
    final PDResources resources = form.getResources();
    formReplacementStream = formContentStream.createOutputStream(COSName.FLATE_DECODE);
    formReplacement = new ContentStreamWriter(formReplacementStream);
    super.showForm(form);
}
```
How can I update the PDFormXObject's stream?

PdfVeryDenseMergeTool

Hello, we have been using your PdfVeryDenseMergeTool from here: https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/merge/PdfVeryDenseMergeTool.java

We want to port this functionality to PDFBox 2. Can you please guide us on how it can be done? Or are you already working on this?

Thank you!

Nirmal

Performance improvement for OptimizeAfterMerge

I have tested the code with a PDF of 50k pages.
The total time it took was 3:10 min.

If you replace the following code on line 292 of the equals() method:

if (keys.equals(bDict.keySet())) {

with this:

if (keys.size() == bDict.keySet().size()) {

then it takes only 1:08 min. The logic of the code stays the same: once we know that the two sets have equal size, the body of the if statement compares each element of the first set with the corresponding element of the second one. If the second one does not have the key, its value is null, so the comparison does not happen and the method returns false on line 295.
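The equivalence argued above can be illustrated with a small sketch that uses plain `java.util.Map` as a stand-in for the dictionaries compared in OptimizeAfterMerge. The class `DictEquality` and both method names are hypothetical, and the actual equals() method is not reproduced here. If both maps have the same number of keys and every key of the first is present in the second, the key sets must be identical, so the explicit key set comparison is redundant.

```java
import java.util.Map;
import java.util.Objects;

/** Hypothetical stand-in for the dictionary comparison discussed above. */
final class DictEquality {

    /** Baseline: compare the key sets first, then the values key by key. */
    static <K, V> boolean equalsViaKeySet(Map<K, V> a, Map<K, V> b) {
        if (!a.keySet().equals(b.keySet())) {
            return false;
        }
        for (K key : a.keySet()) {
            if (!Objects.equals(a.get(key), b.get(key))) {
                return false;
            }
        }
        return true;
    }

    /** Variant: compare only the sizes first, then look up each key of a in b. */
    static <K, V> boolean equalsViaSize(Map<K, V> a, Map<K, V> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (Map.Entry<K, V> entry : a.entrySet()) {
            // containsKey keeps a missing key distinct from a key mapped to null.
            if (!b.containsKey(entry.getKey())
                    || !Objects.equals(entry.getValue(), b.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }
}
```

The variant skips building and hashing a second key set for every comparison, which is presumably where the reported time saving comes from. The `containsKey` guard is an addition of this sketch; it matters only if a dictionary can hold null values.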

Error on execute

Hello, I'm trying to run the tests and I always get the attached error. Can you help me?

PDFRenderer.renderImage render pdf to image failed

https://stackoverflow.com/questions/78105967/i-use-pdfbox-pdfrenderer-renderimage-render-pdf-to-image-failed-the-result-imag
pdf file Uploading hk.pdf…
actual output image
expecting image:

Remove text behind image

Help needed:
I need to split a PDF into two, one with the images and one with the text. I don't want to remove text that lies behind an image; it should remain part of the image PDF. I want to extract only the top-layer text in the PDF. Can anyone help with this?

I have already extracted the images and the text into two PDFs by looping through the PDF operators. My trouble is deciding when not to remove text, namely text that lies behind an image.

https://stackoverflow.com/questions/52334071/pdfbox-remove-text-behind-image
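One possible way to frame the "behind an image" decision, sketched under strong assumptions: every drawn item is reduced to a bounding box in page coordinates, the items are kept in drawing order, and a text item counts as hidden when an image drawn later fully contains its box. The names `CoveredTextFilter` and `Item` are hypothetical. Collecting the boxes from the PDF operators is not shown, and transparency, clipping, image masks, and partial overlap are ignored.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: find text that is not covered by a later-drawn image. */
final class CoveredTextFilter {

    /** One drawn item: a label, its bounding box, and whether it is an image. */
    static final class Item {
        final String label;
        final Rectangle2D bounds;
        final boolean image;

        Item(String label, Rectangle2D bounds, boolean image) {
            this.label = label;
            this.bounds = bounds;
            this.image = image;
        }
    }

    /** Returns the labels of text items that no later image fully covers. */
    static List<String> visibleText(List<Item> itemsInDrawingOrder) {
        List<String> visible = new ArrayList<>();
        for (int i = 0; i < itemsInDrawingOrder.size(); i++) {
            Item item = itemsInDrawingOrder.get(i);
            if (item.image) {
                continue;
            }
            boolean covered = false;
            // Only items drawn after the text can paint over it.
            for (int j = i + 1; j < itemsInDrawingOrder.size(); j++) {
                Item later = itemsInDrawingOrder.get(j);
                if (later.image && later.bounds.contains(item.bounds)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                visible.add(item.label);
            }
        }
        return visible;
    }
}
```

Text that this check reports as covered would stay in the image PDF, and only the remaining text would go into the text PDF.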