mkl-public / testarea-pdfbox2
Test area for public PDFBox v2 issues on stackoverflow etc
License: Apache License 2.0
I'm trying to resolve an issue from a question on Stack Overflow. Any help would be appreciated.
Tables with Merged cells question 78001237
Hi, there is one case in which PDFVisibleTextStripper.java fails to remove invisible text.
It occurs on page one of the PDF.
Hi there,
I posted this question on Stack Overflow, but it was deleted by an administrator.
I'm following the BreakLongString.java example to show my content in a signature field.
The problem is that the example does not yet handle text that contains a very long word.
For example, this is my text:
String text = "I am trying toTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTM create a PDF file with a lot of text contents in the document. I am using PDFBox";
The text is truncated in the generated PDF file.
Could you please have a look at the issue?
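One way to handle an over-long word is to cut it into pieces that each fit the available width before the usual word wrapping. The sketch below is not the BreakLongString.java code: the class `LongWordWrapper`, its methods, and the `widthOf` parameter are hypothetical names, and `widthOf` stands in for a real measurement, which with PDFBox would typically be `font.getStringWidth(s) / 1000 * fontSize`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical helper, not part of PDFBox or of BreakLongString.java.
final class LongWordWrapper {

    /**
     * Wraps text into lines no wider than maxWidth. A word that is wider
     * than maxWidth on its own is cut into pieces that fit.
     * widthOf must measure a string in the same unit as maxWidth.
     */
    static List<String> wrap(String text, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for empty input
            }
            for (String piece : splitToFit(word, maxWidth, widthOf)) {
                String candidate = line.length() == 0 ? piece : line + " " + piece;
                if (line.length() > 0 && widthOf.applyAsDouble(candidate) > maxWidth) {
                    // The piece does not fit on the current line: start a new one.
                    lines.add(line.toString());
                    line.setLength(0);
                    line.append(piece);
                } else {
                    line.setLength(0);
                    line.append(candidate);
                }
            }
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }

    /** Cuts a single word into pieces that each fit into maxWidth. */
    static List<String> splitToFit(String word, double maxWidth, ToDoubleFunction<String> widthOf) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = start + 1; // always take at least one char so the loop makes progress
            while (end < word.length()
                    && widthOf.applyAsDouble(word.substring(start, end + 1)) <= maxWidth) {
                end++;
            }
            // Simplification: cuts by char index and may split a surrogate pair.
            pieces.add(word.substring(start, end));
            start = end;
        }
        return pieces;
    }
}
```

For example, `LongWordWrapper.wrap("ab cdefghij k", 4, String::length)` treats every character as one unit wide and produces the lines `ab`, `cdef`, `ghij`, and `k`. Each resulting line could then be drawn with its own `showText` call, as in the original example.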
Hi,
I saw your test case for creating a button and adding an image appearance.
I'm looking to use an existing button (as a placeholder) in the PDF to draw an image. Is this the correct approach?
If so, does this code look correct? (It is based on the test you defined.)
try (InputStream resource = getClass().getClassLoader().getResourceAsStream("2x2colored.png")) {
    BufferedImage bufferedImage = ImageIO.read(resource);
    PDImageXObject pdImageXObject = LosslessFactory.createFromImage(acrobatDocument, bufferedImage);
    float width = 10 * pdImageXObject.getWidth();
    float height = 10 * pdImageXObject.getHeight();

    PDAppearanceStream pdAppearanceStream = new PDAppearanceStream(acrobatDocument);
    pdAppearanceStream.setResources(new PDResources());
    try (PDPageContentStream pdPageContentStream = new PDPageContentStream(acrobatDocument, pdAppearanceStream)) {
        pdPageContentStream.drawImage(pdImageXObject, 0, 0, width, height);
    }
    pdAppearanceStream.setBBox(new PDRectangle(width, height));

    // get the button you want to replace
    PDButton button = (PDPushButton) acrobatAcroForm.getField("PushButton");
    List<PDAnnotationWidget> widgets = button.getWidgets();
    for (PDAnnotationWidget pdAnnotationWidget : widgets) {
        PDAppearanceDictionary pdAppearanceDictionary = pdAnnotationWidget.getAppearance();
        if (pdAppearanceDictionary == null) {
            pdAppearanceDictionary = new PDAppearanceDictionary();
            pdAnnotationWidget.setAppearance(pdAppearanceDictionary);
        }
        // add the appearance to the button
        pdAppearanceDictionary.setNormalAppearance(pdAppearanceStream);
    }

    button.setReadOnly(true);
    acrobatAcroForm.getFields().add(button);
    acrobatDocument.save(new File("build", "imageWithButton.pdf"));
}
Thanks,
Shane.
I see that mkl has answered many PDF-related questions and is a PDF expert. We have run into some issues when using PDFBox.
1. How can we set multi-language (dynamic) fonts on a form field? I saw that your implementation works well with "showtext": https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/AddTextWithDynamicFonts.java
2. The documentation of PDType0Font.load says:
```
public static PDType0Font load(org.apache.pdfbox.pdmodel.PDDocument doc,
                               java.io.InputStream input,
                               boolean embedSubset)
```
embedSubset – True if the font will be subset before embedding. Set this to false when creating a font for AcroForm.
How can we embed a subset of a font when creating a font for an AcroForm? A CJK font can be very large.
Use case: I'm working on a feature that requires attaching an image to a PDF without invalidating a pre-existing digital signature. The PDF should have a form field designated for image attachment, which can be populated later. I want to implement this using PDFBox.
Implementation: Since PDFs lack a dedicated image form field, I'm using a PDPushButton as a workaround, following the method outlined in the following issue:
Additionally, the PDF includes a signature form field. The process involves signing the signature field first and subsequently attaching the image to the PDPushButton field. However, this sequence is causing the signature to become invalid.
Here is the code for attaching an image to the PDPushButton.
@SneakyThrows
public static void fillInitialField(String inputFilePath, String outputFilePath) {
    // Load input file
    PDDocument document = PDDocument.load(new File(inputFilePath));

    // Find and link the relevant signature field
    PDPushButton initial = PdfService.findInitial(document, "132323423180965");

    PDImageXObject pdImageXObject = PDImageXObject.createFromFile("initial.png", document);
    float width = 10 * pdImageXObject.getWidth();
    float height = 10 * pdImageXObject.getHeight();

    PDAppearanceStream pdAppearanceStream = new PDAppearanceStream(document);
    pdAppearanceStream.setResources(new PDResources());
    try (PDPageContentStream pdPageContentStream = new PDPageContentStream(document, pdAppearanceStream)) {
        pdPageContentStream.drawImage(pdImageXObject, 200, 300, width, height);
    }
    pdAppearanceStream.setBBox(new PDRectangle(width, height));

    List<PDAnnotationWidget> widgets = initial.getWidgets();
    for (PDAnnotationWidget pdAnnotationWidget : widgets) {
        PDAppearanceDictionary pdAppearanceDictionary = pdAnnotationWidget.getAppearance();
        if (pdAppearanceDictionary == null) {
            pdAppearanceDictionary = new PDAppearanceDictionary();
            pdAnnotationWidget.setAppearance(pdAppearanceDictionary);
        }
        pdAppearanceDictionary.setNormalAppearance(pdAppearanceStream);
    }
    initial.setReadOnly(true);

    // Save and close the document
    FileOutputStream fos = new FileOutputStream(outputFilePath);
    document.save(fos);
    document.close();
}
I’ve created a repository that replicates this issue, which can be found here: https://github.com/ContractSPAN/ImageFormFieldIssue
How can I implement an image attachment to a PDF form field in such a way that it doesn’t invalidate an existing signature? I am open to alternative approaches to achieve this functionality.
Hi!
Thanks for the tool.
I've got an issue when calling the merge method: sometimes (not always, but quite frequently) it throws an error saying "COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?" on line 50 of DenseMerge, and the same happens with VeryDenseTool.
My usage is this:
PdfVeryDenseMergeTool pdfDenseMergeTool = new PdfVeryDenseMergeTool(PDRectangle.A4, dim1, dim2, dim3);
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
pdfDenseMergeTool.merge(byteArrayOutputStream, listOfDocuments);
Any idea why this happens only sometimes? It feels like a synchronization problem between the streams, but I don't know.
The problem is that it is not deterministic: with exactly the same input, it may or may not throw the error.
Hi there,
Regarding the issue title:
Starting from this PDF, form-empty.pdf,
I create a merged PDF to which I apply the optimize() method. The result is form-filled.pdf.
When you open it in Chrome you can see 22 pages, but when you open it in Firefox you can only see 4 pages.
When you open this PDF file with Adobe Reader you can see 22 pages, but if you scroll down you get error 14 after page 3:
We cannot save a file this big without compression, whereas optimize() reduces the PDF size by a factor of 5. For now we keep optimizing and ask users to read the PDF in Adobe Reader or in Chrome/Edge.
It seems the optimization method is too aggressive. How can we enhance optimize() so that it does not break the Firefox reader, or how should we adapt our initial pdf-empty.pdf file to avoid this situation? It seems my initial PDF form-empty.pdf was not created correctly, maybe due to copy/paste of AcroForm fields. Can this be caught or fixed by the optimize method?
We have other PDFs similar to form-filled2.pdf that work with the optimize() method, but for this one I can't understand why it breaks Firefox's built-in PDF reader.
pdfbox 2.0.21
linux 18.04
java: jdk1.8.0
test file: src/test/resources/mkl/testarea/pdfbox2/extract/demo.pdf
test code: src/test/java/mkl/testarea/pdfbox2/extract/ExtractLinesWithDir.java
https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/content/RemoveText.java#L49
```
public void processForm(PDFormXObject form) throws IOException {
    final PDStream formContentStream = form.getContentStream();
    final PDResources resources = form.getResources();
    formReplacementStream = formContentStream.createOutputStream(COSName.FLATE_DECODE);
    formReplacement = new ContentStreamWriter(formReplacementStream);
    super.showForm(form);
}
```
How can the stream of a PDFormXObject be updated?
Hello, we have been using your PdfVeryDenseMergeTool from here: https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/merge/PdfVeryDenseMergeTool.java
We want to port this functionality to PDFBox 2. Can you please guide us on how it can be done? Or are you already working on this?
Thank you!
Nirmal
I have tested the code with a PDF of 50k pages.
The total time it took was 3:10 min.
If you change the following code in the equals() method on line 292:
if (keys.equals(bDict.keySet())) {
to this:
if (keys.size() == bDict.keySet().size()) {
then it takes only 1:08 min. The logic of the code stays the same: once the two sets are known to have equal size, the body of the if statement compares each element of the first set with the corresponding element of the second one. If the second one does not have the key, the value would be null, the comparison would not happen, and the method would return false on line 295.
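The equivalence claimed above can be checked outside PDFBox with plain maps. The sketch below is not the repository's code: `DictEquality` and its method names are hypothetical, `java.util.Map` stands in for the dictionary type, and all values are assumed to be non-null. Under those assumptions the size check followed by a per-key lookup rejects every pair that the key-set comparison rejects, because a key missing from the second map yields null on lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the dictionary comparison discussed above.
final class DictEquality {

    /** Variant that first compares the complete key sets. */
    static boolean equalsViaKeySets(Map<String, Object> a, Map<String, Object> b) {
        if (a.keySet().equals(b.keySet())) {
            return allValuesMatch(a, b);
        }
        return false;
    }

    /** Variant that only compares the key set sizes up front. */
    static boolean equalsViaSize(Map<String, Object> a, Map<String, Object> b) {
        if (a.keySet().size() == b.keySet().size()) {
            return allValuesMatch(a, b);
        }
        return false;
    }

    /** Compares every entry of a with the entry of b under the same key. */
    private static boolean allValuesMatch(Map<String, Object> a, Map<String, Object> b) {
        for (Map.Entry<String, Object> entry : a.entrySet()) {
            Object other = b.get(entry.getKey()); // null when b lacks the key
            if (!entry.getValue().equals(other)) { // assumes non-null values in a
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new HashMap<>();
        a.put("x", 1);
        a.put("y", 2);
        Map<String, Object> sameSizeOtherKey = new HashMap<>();
        sameSizeOtherKey.put("x", 1);
        sameSizeOtherKey.put("z", 2);
        // Both variants reject maps of equal size whose keys differ.
        System.out.println(equalsViaKeySets(a, sameSizeOtherKey));
        System.out.println(equalsViaSize(a, sameSizeOtherKey));
    }
}
```

If values may be null, the per-key lookup can no longer tell a missing key from a key mapped to null, so the two variants would need a closer look before being treated as interchangeable.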
https://stackoverflow.com/questions/78105967/i-use-pdfbox-pdfrenderer-renderimage-render-pdf-to-image-failed-the-result-imag
pdf file Uploading hk.pdf…
Actual output image:
Expected image:
Help needed:
I have a requirement to split a PDF into two: one with the images and one with the text. I don't want to remove text that is behind an image; it should become part of the image PDF. I want to extract only the top-layer text of the PDF. Can anyone help with this?
I have already extracted the images and the text into two PDFs by looping through the PDF operators. I am having trouble with not removing the text that is behind an image.
https://stackoverflow.com/questions/52334071/pdfbox-remove-text-behind-image