borb-examples's Issues
feasibility question of generating annotated pdf files
Hello,
I would like to create PDFs and remember directly where content is positioned in the PDF (via the bounding box). Therefore I would like to know if it is possible to retrieve the calculated bounding box of objects directly when creating PDFs with Borb. I assume that it should be possible somehow as soon as you add an element like a paragraph to a layout.
Images in PDF created by borb disappear on additional manipulation
Hi,
thanks for the great lib!!!
I'm running into a problem that seems quite strange to me, not sure what the reason could be.
I have an image I added to a pdf, which acts as the template. Here's the code to do so, closely following your ebook:
def convert_png_to_pdf(png_file, output_file):
    doc: Document = Document()
    # create Page
    page: Page = Page(width=Decimal(595.2), height=Decimal(375.12))
    page._padding_top = Decimal(0)
    page._padding_left = Decimal(0)
    page._padding_right = Decimal(0)
    page._padding_bottom = Decimal(0)
    page._margin_top = Decimal(0)
    page._margin_bottom = Decimal(0)
    page._margin_left = Decimal(0)
    page._margin_right = Decimal(0)
    # add Page to Document
    doc.add_page(page)
    layout: PageLayout = SingleColumnLayout(page)
    layout._padding_top = Decimal(0)
    layout._padding_left = Decimal(0)
    layout._padding_right = Decimal(0)
    layout._padding_bottom = Decimal(0)
    layout._margin_top = Decimal(0)
    layout._margin_bottom = Decimal(0)
    layout._margin_left = Decimal(0)
    layout._margin_right = Decimal(0)
    layout._border_width = Decimal(0)
    layout._column_widths = [Decimal(595.2)]
    # add an Image
    layout.add(
        Image(
            Path(png_file),
            width=Decimal(595.2), height=Decimal(375.12),
            margin_top=Decimal(0), margin_bottom=Decimal(0), margin_left=Decimal(0), margin_right=Decimal(0),
            padding_top=Decimal(0), padding_bottom=Decimal(0), padding_left=Decimal(0), padding_right=Decimal(0),
            border_width=Decimal(0)
        )
    )
    # store
    with open(f"editable/{output_file}.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)
This adds the image to a PDF. If I open the PDF file, everything looks correct.
Now, I need to add text to this template. The problem is - when opening the pdf file again, the image disappears. The following code generates a blank page:
with open('pdf.pdf', 'rb') as f:
    doc = PDF.loads(f)
with open('output.pdf', 'wb') as f:
    PDF.dumps(f, doc)
Steps to reproduce:
Add an image to a newly created PDF, then load and save it using this library.
Can you help?
Thanks a lot!
Form data not printing correct values
When creating a form with multiple TextFields, their values overwrite each other. Each of the TextFields in my code has a different field_name value (which I assume is similar to an ID in CSS).
Shown below is the PDF as seen in Chrome. I have added unique inputs for each TextField.
Shown below is the print preview for this pdf with the same set of inputs as above.
When I tried to print this pdf (with all unique values), it printed the "print preview" version instead. If you save the PDF with unique values, it will also save the "print preview" version.
Example code: (please note: actual table is much bigger, but this should produce the same result)
layout.add(
    FixedColumnWidthTable(
        number_of_columns=4,
        number_of_rows=2,
        margin_top=80,
        column_widths=[Decimal(2.1), Decimal(3), Decimal(0.5), Decimal(3)],
        vertical_alignment=Alignment.TOP,
    )
    .add(
        Paragraph(
            "User Name:",
            font="Helvetica-bold",
            font_size=11,
            padding_bottom=6
        )
    )
    .add(
        TextField(
            field_name="username",
            padding_top=Decimal(4),
            font_size=11,
            padding_bottom=6
        )
    )
    .add(
        Paragraph(
            "ID:",
            font="Helvetica-bold",
            font_size=11,
            padding_bottom=6
        )
    )
    .add(
        TextField(
            field_name="eid",
            padding_top=Decimal(4),
            font_size=11,
            padding_bottom=6
        )
    )
    .add(
        Paragraph(
            "Computer Name:",
            font="Helvetica-bold",
            font_size=11,
            padding_bottom=6
        )
    )
    .add(
        TableCell(
            TextField(
                field_name="newpcname",
                padding_top=Decimal(4),
                font_size=11,
                padding_bottom=6
            ),
            col_span=3
        )
    )
    .set_padding_on_all_cells(
        Decimal(3), Decimal(2), Decimal(0), Decimal(2))
    .no_borders()
)
"Font %s can not represent '%s'" %
Is there any solution for this?
Exception has occurred: AssertionError
Font SDGWMI+OpenSans-Semibold can not represent ''
File "C:\Users\pc\Desktop\archive.py", line 16, in main
doc = SimpleFindReplace.sub("Jots", "Joris", doc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\pc\Desktop\archive.py", line 23, in <module>
main()
AssertionError: Font SDGWMI+OpenSans-Semibold can not represent ' '
Traceback (most recent call last):
File "C:\Python311\Lib\runpy.py", line 198, in _run_module_as_main
return _run_code(code, main_globals, None,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\runpy.py", line 88, in _run_code
exec(code, run_globals)
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
cli.main()
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
run()
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
page.apply_redact_annotations()
page.apply_redact_annotations()
File "C:\Python311\Lib\site-packages\borb\pdf\page\page.py", line 152, in apply_redact_annotations
.read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\canvas_stream_processor.py", line 300, in read
raise e
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\canvas_stream_processor.py", line 294, in read
operator.invoke(self, operands, event_listeners)
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\redacted_canvas_stream_processor.py", line 302, in invoke
self._write_chunk_of_text(
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\redacted_canvas_stream_processor.py", line 230, in _write_chunk_of_text
)._write_text_bytes()
^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\layout\text\chunk_of_text.py", line 200, in _write_text_bytes
return self._write_text_bytes_in_hex()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Python311\Lib\site-packages\borb\pdf\canvas\layout\text\chunk_of_text.py", line 242, in _write_text_bytes_in_hex
assert cid is not None, "Font %s can not represent '%s'" % (
^^^^^^^^^^^^^^^
AssertionError: Font SDGWMI+OpenSans-Semibold can not represent
The code is:
from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace
import typing

def main():
    doc: typing.Optional[Document] = None
    with open("archive.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)
    assert doc is not None
    doc = SimpleFindReplace.sub("Jots", "Joris", doc)
    with open("output2.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

if __name__ == "__main__":
    main()
Form displaying incorrectly
Form displaying incorrectly:
When creating forms, entered values will not show correctly after PDF is saved and reopened. When you enter the textfield in edit-mode, the value previously entered will show. Looks like the last entered value is displayed in all the textfields. The same effect can be created with the following code:
layout.add(
    FixedColumnWidthTable(number_of_columns=2, number_of_rows=4)
    .add(Paragraph("First Name:"))
    .add(TextField(field_name="firstname"))
    .add(Paragraph("Last Name:"))
    .add(TextField(field_name="lastname"))
    .add(Paragraph("Country"))
    .add(TextField(field_name="country",value="NaN"))
    .set_padding_on_all_cells(Decimal(2), Decimal(2), Decimal(2), Decimal(2))
    .no_borders()
)
When opening this pdf, all fields will show "NaN".
Python 3.10.3
borb-2.0.21
Platform Windows 10
PDF tested in Chrome and Acrobat Reader
output_form.pdf
How to define custom page margins?
How do I define custom page margins for all pages in a document?
I would like to use non-standard margins (5x5x5x5) on an A4 page.
CSS isn't applied when creating a PDF from HTML.
I created some PDFs from HTML, but the styling was not applied. I tried both a <style> tag and inline styles; neither worked.
KeyError in borb when Replacing Text in an Existing PDF
Hi Joris
Thank you for borb. I tried the example Replacing Text in an Existing PDF today with borb 2.1.10 installed via pip. It terminates with this error message:
File "borb/toolkit/text/simple_find_replace.py", line 77, in sub
page.apply_redact_annotations()
File "borb/pdf/page/page.py", line 145, in apply_redact_annotations
for x in self["Annots"]
KeyError: 'Annots'
I tried adding a guard if "Annots" in self, but then the resulting PDF is rendered without any text.
Unfortunately, I cannot share the specific PDF with you since it contains sensitive information.
Kind Regards
Hermann
Turkish characters are not supported.
Turkish characters are not supported.
Arabic fonts cannot represent several characters when using the bidi algorithm and reshaper
I am trying to print some Arabic text, and since we need to support bidirectional text, we are using the bidi algorithm and Arabic Reshaper. We get an exception saying
assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font HSNIbtisam can not represent 'ﺕ'
Note that the font supports all Arabic characters.
def format(self, delivery_voucher: DeliveryVoucher):
    document = Document()
    page = Page()
    document.add_page(page)
    layout: PageLayout = SingleColumnLayout(page)
    font_path = Path(__file__).parent / 'misc/fonts/Ibtisam.ttf'
    custom_font = TrueTypeFont.true_type_font_from_file(font_path)
    print(font_path)
    arabic_text= "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
    reshaped_text = arabic_reshaper.reshape(arabic_text)
    bidirectional_text = algorithm.get_display(reshaped_text)
    # add a Paragraph
    layout.add(borb.pdf.Paragraph(bidirectional_text, font=custom_font))
    # store
    with open("output.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, document)
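The character named in the assertion appears to be a presentation form produced by reshaping rather than one of the base letters in arabic_text. The following is a minimal diagnostic sketch using only the standard library; the helper names describe and presentation_forms are made up for illustration, and whether the font's character map actually lacks these code points is an assumption that the issue does not confirm.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def presentation_forms(text: str) -> list:
    """List characters that fall in the Arabic Presentation Forms-A/B blocks."""
    return [
        ch
        for ch in text
        if 0xFB50 <= ord(ch) <= 0xFDFF or 0xFE70 <= ord(ch) <= 0xFEFF
    ]


base_teh = "\u062A"      # the letter as typed in arabic_text
reshaped_teh = "\uFE95"  # the isolated presentation form of the same letter

print(describe(base_teh))
print(describe(reshaped_teh))

# Compatibility normalization maps the presentation form back to the base letter.
print(unicodedata.normalize("NFKC", reshaped_teh) == base_teh)
```

Running presentation_forms on the reshaped string shows which code points the font would need to cover in addition to the base Arabic block.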
Long tables
Hi, what approach would you recommend to split long tables into multiple pages? I didn't find any built-in solution, but there must be a way, right?
Support for non-english language
AssertionError: Font Helvetica can not represent '৳'
I tried my custom font for the language "Bangla". But it doesn't work.
Code Provided for Example 5.7's pdf is not working with borb 2.1.17
I am currently following the tutorials and have discovered that the code for example 5.7, which extracts all colors from a given PDF file, is not working on my machine.
Below is the full traceback output from my attempts to run the code
reading page
Traceback (most recent call last):
File "c:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\Image and color Extraction.py", line 80, in <module>
t.add(
File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 394, in add
first_incomplete_row: int = min(
^^^^
ValueError: min() arg is an empty sequence
The code I am running:
#!chapter_005/src/snippet_013.py
from borb.io.read.types import Name, String, Dictionary
from borb.pdf import SingleColumnLayout
from borb.pdf import PageLayout
from borb.pdf import Paragraph
from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import PDF
from borb.pdf import HexColor
from borb.pdf import Image
from borb.pdf import HexColor, RGBColor, Color
from borb.pdf.canvas.geometry.rectangle import Rectangle
from borb.pdf import ConnectedShape
from borb.pdf import Alignment
from borb.pdf import FlexibleColumnWidthTable
from borb.pdf import LineArtFactory
from borb.toolkit import ColorExtraction
import typing
from decimal import Decimal
# # create Document
# doc: Document = Document()
# # create Page
# page: Page = Page()
# # add Page to Document
# doc.add_page(page)
# # set a PageLayout
# layout: PageLayout = SingleColumnLayout(page)
# # the following code adds 3 paragraphs, each in a different color
# layout.add(Paragraph("Hello World!", font_color=HexColor("FF0000")))
# layout.add(Paragraph("Hello World!", font_color=HexColor("00FF00")))
# layout.add(Paragraph("Hello World!", font_color=HexColor("0000FF")))
# # the following code adds 1 image
# layout.add(
# Image(
# "https://images.unsplash.com/photo-1589606663923-283bbd309229?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8",
# width=Decimal(256),
# height=Decimal(256),
# )
# )
# # store
# with open("ImageOutput.pdf", "wb") as out_file_handle:
# PDF.dumps(out_file_handle, doc)
##############################################################################################
doc: typing.Optional[Document] = None
l: ColorExtraction = ColorExtraction()
with open("ImageOutput.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [l])
# extract colors
colors: typing.Dict[Color, Decimal] = l.get_color()[0]
# create output Document
doc_out: Document = Document()
# add Page
p: Page = Page()
doc_out.add_page(p)
# add PageLayout
l: PageLayout = SingleColumnLayout(p)
# add Paragraph
l.add(Paragraph("These are the colors used in the input PDF:"))
# add Table
t: FlexibleColumnWidthTable = FlexibleColumnWidthTable(
    number_of_rows=3, number_of_columns=3, horizontal_alignment=Alignment.CENTERED
)
for c in colors.keys():
    t.add(
        ConnectedShape(
            LineArtFactory.droplet(
                Rectangle(Decimal(0), Decimal(0), Decimal(32), Decimal(32))
            ),
            stroke_color=c,
            fill_color=c,
        )
    )
t.set_padding_on_all_cells(Decimal(5), Decimal(5), Decimal(5), Decimal(5))
t.no_borders()
l.add(t)
# store
with open("outputImage_Colors.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, doc_out)
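The traceback fails inside the table's add call while it looks for the first incomplete row, and the table is fixed at 3 rows by 3 columns. One plausible reading, which is an assumption and not confirmed in the issue, is that the input PDF yields more than nine colors, so the table is already full when the next shape is added. Under that assumption, the row count could be derived from the number of colors. The helper name rows_needed is hypothetical.

```python
import math


def rows_needed(item_count: int, columns: int) -> int:
    """Return how many rows a grid needs to hold item_count cells.

    At least one row is returned so an empty input still gives a valid grid.
    """
    if columns <= 0:
        raise ValueError("columns must be positive")
    return max(1, math.ceil(item_count / columns))


# Nine colors fit a 3x3 grid exactly; a tenth color needs a fourth row.
print(rows_needed(9, 3))
print(rows_needed(10, 3))
```

The result could then be passed as number_of_rows in place of the hard-coded 3. If the item count is not a multiple of the column count, the last row stays partly empty and may need filler cells.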
Can't read a specific PDF file
I have some trouble.
I'm getting the message "Unexpected character at end of dictionary." when trying to read this specific PDF file.
Form fields created by borb are not visible in PDF-XChange Editor
Hi, I tried creating a sample PDF with form elements using the sample code.
The elements created were some text fields.
When I opened the PDF in PDF-XChange Editor, the form fields were not visible; only rectangular boxes were visible, and they were not editable.
When I opened the file in Acrobat, the form fields were visible, and after I entered some text there and saved, the form fields were also visible when I reopened the file in PDF-XChange Editor.
May I know what the issue is here? I do not face the same problem when the form is created using other modules such as pdfrw.
Extract highlighted text from PDF document?
Does this library support extracting highlighted text from a PDF document?
KeyError: 'XRef' when creating editable form
While trying to recreate a form we use in office (currently filled out by hand), I came across this XRef error. The form consists of a logo/title sharing the top of the page, followed by the inputs shown below. Unfortunately I cannot share the other elements (logo, title, subtitle) without redacting them to oblivion.
The script executes perfectly when commenting out this specific table.
Edit: Let me also mention that I'd prefer to match the original layout (with mostly two label/field combinations per line). Having nine rows was a compromise when I thought that might be the issue.
The offending FlexibleColumnWidthTable:
layout.add(
    FlexibleColumnWidthTable(number_of_columns=2, number_of_rows=9)
    .add(Paragraph("User Name: "))
    .add(TextField(field_name="username"))
    .add(Paragraph("ID: "))
    .add(TextField(field_name="eid"))
    .add(Paragraph("Computer Name: "))
    .add(TextField(field_name="newpcname"))
    .add(Paragraph("Replacing Computer: "))
    .add(TextField(field_name="oldpcname"))
    .add(Paragraph("S/N: "))
    .add(TextField(field_name="oldserial"))
    .add(Paragraph("Keep in Service"))
    .add(TextField(field_name="service"))
    .add(Paragraph("Location"))
    .add(TextField(field_name="location"))
    .add(Paragraph("Model: "))
    .add(TextField(field_name="model"))
    .add(Paragraph("S/N: "))
    .add(TextField(field_name="serial"))
    .set_padding_on_all_cells(Decimal(2), Decimal(2), Decimal(2), Decimal(2))
    .no_borders()
)
Stack trace:
Traceback (most recent call last):
File "c:\Users\user\Documents\python\PDFGen\main.py", line 122, in <module>
main()
File "c:\Users\user\Documents\python\PDFGen\main.py", line 92, in main
layout.add(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\page_layout\multi_column_layout.py", line 194, in add
layout_rect = layout_element.layout(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 290, in layout
return self.calculate_layout_box_and_do_layout(page, bounding_box)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 303, in calculate_layout_box_and_do_layout
layout_box = self._calculate_layout_box(page, bounding_box)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
returned_layout_box = self._calculate_layout_box_without_padding(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 241, in _calculate_layout_box_without_padding
layout_rect = self._do_layout_without_padding(page, bounding_box)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\flexible_column_width_table.py", line 87, in _do_layout_without_padding
t.calculate_min_and_max_width()
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 86, in calculate_min_and_max_width
max_bounding_box: Rectangle = self._calculate_layout_box(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
returned_layout_box = self._calculate_layout_box_without_padding(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 123, in _calculate_layout_box_without_padding
return self._layout_element._calculate_layout_box(page, bounding_box)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
returned_layout_box = self._calculate_layout_box_without_padding(
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 241, in _calculate_layout_box_without_padding
layout_rect = self._do_layout_without_padding(page, bounding_box)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\forms\text_field.py", line 161, in _do_layout_without_padding
self._init_widget_dictionary(page, layout_rect)
File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\forms\text_field.py", line 98, in _init_widget_dictionary
catalog: Dictionary = page.get_root()["XRef"]["Trailer"]["Root"] # type: ignore [attr-defined]
KeyError: 'XRef'
How to manipulate pdf outline
snippet_013.py error?
chapter_005/src/snippet_013.py fails with the following error:
from borb.toolkit.color.color_spectrum_extraction import ColorSpectrumExtraction
ModuleNotFoundError: No module named 'borb.toolkit.color.color_spectrum_extraction'
borb.toolkit.color doesn't appear to contain a color_spectrum_extraction module.
Should these two references to ColorSpectrumExtraction in snippet_013.py be changed to ColorExtraction? Or is something else amiss?
from borb.toolkit.color.color_spectrum_extraction import ColorSpectrumExtraction
... some code removed ...
l: ColorSpectrumExtraction = ColorSpectrumExtraction()
No support for Chinese
I found that borb doesn't support Chinese characters when adding Chinese text to a PDF page. Any idea how to solve this?
Footer
Hello! Great library, thanks for your work. I studied the documentation through a translator but did not find this: how do I add a header and footer to pages?
How to select column manually in MultiColumnLayout page?
I have created a MultiColumnLayout PDF page and am adding a table and a graph to it.
Is there any way to place the graph in the second column manually, whether or not the first column is full?
Exception when opening certain files
Hi, I'm trying to open a set of PDFs.
A subset of them makes the library crash.
The stack trace of the exception is below.
Traceback (most recent call last):
File "/home/scampese/Repository/AuraTests/borb_test.py", line 36, in <module>
main()
File "/home/scampese/Repository/AuraTests/borb_test.py", line 26, in main
doc = PDF.loads(in_file_handle, [l1])
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/pdf/pdf.py", line 54, in loads
return ReadAnyObjectTransformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/xref_transformer.py", line 140, in transform
trailer = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
transformed_referenced_object = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/page/root_dictionary_transformer.py", line 84, in transform
transformed_root_dictionary = t.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
transformed_referenced_object = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/array_transformer.py", line 46, in transform
object_to_transform[i] = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
transformed_referenced_object = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/page/page_dictionary_transformer.py", line 62, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
v = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
transformed_referenced_object = self.get_root_transformer().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
return super().transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
out = h.transform(
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/stream_transformer.py", line 53, in transform
object_to_transform = decode_stream(object_to_transform)
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/filter/stream_decode_util.py", line 74, in decode_stream
transformed_bytes = RunLengthDecode.decode(transformed_bytes)
File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/filter/run_length_decode.py", line 34, in decode
n = bytes_in[i + 1]
IndexError: index out of range
Process finished with exit code 1
N.B.: with other libraries like pdfquery or textextract I'm able to open the same file without problems
Do you have a suggestion on how to deal with it?
My env:
- python 3.8.12
Thanks!
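The traceback ends in the RunLengthDecode filter reading bytes_in[i + 1] past the end of the data, which suggests a stream that is truncated or lacks its end-of-data marker. The following is a minimal sketch of a tolerant decoder that follows the RunLengthDecode rules in the PDF specification. It is not borb's implementation, and the function name run_length_decode is chosen here for illustration.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode PDF RunLengthDecode data, stopping cleanly on truncated input."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            # 128 is the end-of-data marker.
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes as they are.
            # Slicing tolerates a run that is cut short by the end of the data.
            out += data[i : i + length + 1]
            i += length + 1
        else:
            # Repeated run: the next byte is repeated 257 - length times.
            if i >= len(data):
                # Truncated stream: there is no byte left to repeat.
                break
            out += bytes([data[i]]) * (257 - length)
            i += 1
    return bytes(out)


# Literal run "ABC", then "D" repeated three times, then end-of-data.
print(run_length_decode(bytes([2, 65, 66, 67, 254, 68, 128])))
```

A decoder written this way returns whatever could be recovered from a damaged stream instead of raising IndexError, which may be how the other libraries mentioned manage to open the same file.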
TextField Object filling the same answer in all textfields
I have been trying to use the code available in the chapter 4 about form fields. So, in the output PDF when I try to fill any TextField, the following Fields contain the same information. I would like to have them independent but that doesn't seem to work no matter how I try.
I am using borb version 2.0.27.
Does anyone know whether the issue actually comes from the TextField object or from something else?
the use of col_span in example snippets.
The documentation and examples have col_span but the correct key word is column_span
Trigger javascript on change of TextField value
Dear Joris,
Many thanks for creating such a great library for building PDF documents from Python! I really enjoy playing with it!
In one of the chapters, you provide an example in which a javascript function is called when pushing a button.
Would it be also possible to trigger a piece of script when exiting the edit mode of a text field / when the value of a field is changed?
Many thanks in advance for your answer!
Best,
Hendrik
Missing Documentation
A good document should be written to understand the package.
Extract unicode text
Hi, I used borb to extract text from a PDF using SimpleTextExtraction in the example code below to learn how the tool works.
I did get the text out, but there seems to be unicode errors in the text, for example
Dossiê da Unidade Curricular
instead of:
Dossiê da Unidade Curricular
Is there a way to add a codec somewhere?
#!chapter_005/src/snippet_005.py
import typing
from borb.pdf.document.document import Document
from borb.pdf.pdf import PDF
from borb.toolkit.text.simple_text_extraction import SimpleTextExtraction

def main():
    # read the Document
    doc: typing.Optional[Document] = None
    l: SimpleTextExtraction = SimpleTextExtraction()
    with open("output.pdf", "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle, [l])
    # check whether we have read a Document
    assert doc is not None
    # print the text on the first Page
    print(l.get_text_for_page(0))

if __name__ == "__main__":
    main()
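If the garbling is the common case where UTF-8 bytes were decoded as Latin-1 (an assumption; the issue does not confirm which encodings are involved), the extracted string can be repaired after the fact without any codec setting in the library. The sketch below uses only the standard library, and the helper name fix_mojibake is hypothetical.

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-Latin-1 garbling; return the input unchanged otherwise."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way, so leave it alone.
        return text


# Simulate the garbling: encode correctly, then decode with the wrong codec.
garbled = "Dossi\u00ea".encode("utf-8").decode("latin-1")
print(garbled)
print(fix_mojibake(garbled))
```

Text that is already correct passes through untouched, because re-encoding it as Latin-1 does not produce valid UTF-8 and the function falls back to the original string.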
Export PDF to JPG(PIL) with different DPI
Hi, Joris!
Can I change the image quality when exporting to JPG? Ex. DPI=300 or more
Thank you!
Issue while running snippets 002 from chapter 007
Hi, I'm running Python 3.8 on Ubuntu 18.04
and am hitting an issue with the snippet #!chapter_007/src/snippet_002.py.
The error occurs at:
with open("data/ama_logistic_236523.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [l])
Error: AssertionError: A Rectangle must have a non-negative width.
When I try another PDF, the first error goes away and I get the other error shown below.
# get page
p: Page = doc.get_page(0)
Error: TypeError: element indices must be integers
Error: Paragraph object has no attribute layout
I am getting an error when running snippets
chapter_002/src/snippet_018.py
...
chapter_002/src/snippet_027.py
: 'Paragraph' object has no attribute 'layout'
For example:
chapter_002/src/snippet_022.py
crashed with error:
Traceback (most recent call last):
  File "/home/.../cgi-bin/test_borb_sn022.py", line 47, in <module>
    main()
  File "/home/.../cgi-bin/test_borb_sn022.py", line 39, in main
    ).layout(page, r)
AttributeError: 'Paragraph' object has no attribute 'layout'
Distributor ID: Ubuntu
Description: Linux Lite 6.0
Release: 22.04
Codename: jammy
Python 3.10.4
borb-2.1.0
Pillow-9.2.0
certifi-2022.6.15.1
charset-normalizer-2.1.1
fonttools-4.37.1
idna-3.3
python-barcode-0.14.0
qrcode-7.3.1
requests-2.28.1
urllib3-1.26.12
Why I use SimpleTextExtraction extracting text but pages not right
Dear all,
I followed the instructions to extract text using the code below, but I found that the order of the pages variable does not match my original PDF file. I have attached the code and a picture below.
import typing
from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction
import os
import numpy as np
import argparse
doc: typing.Optional[Document] = None
l: SimpleTextExtraction = SimpleTextExtraction()
with open('1.pdf', "rb") as in_file_handle:
    doc = PDF.loads(in_file_handle, [l])
assert doc is not None
pages = l.get_text()
pages
As you can see in the picture below, pages[1] is page 4 of my PDF file, so the order does not match. Could you please tell me how to solve this? There is no page-number information written in my PDF file.
cannot import name 'Document' from 'borb.pdf'
Hi, while trying to import the following, I got an error
from pathlib import Path
from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import SingleColumnLayout
from borb.pdf import Paragraph
from borb.pdf import PDF
ImportError: cannot import name 'Document' from 'borb.pdf' (/home/niko/anaconda3/envs/py39/lib/python3.9/site-packages/borb/pdf/__init__.py)
version: '2.1.5.2'
Excessive Compartmentalization
Hi, Joris
I am an Indian programmer working on some AI functionalities, such as OCR on scanned PDFs. I have just discovered borb, and I must say, I am deeply impressed by both the power and the level of detail in the documentation of this library. I would love to see this project grow and become a very well-integrated part of the Python family, recognized and respected by the developers who use Python. I think Borb holds the key to making PDFs easier to work with for all programmers around the world, and heaven knows there are a ton of scanned PDFs lying around which need work.
But there is something that I think is going to hold Borb back a lot. And that is the monstrous level of undue complexity in the package tree of the framework. I mean, just look at this:
from borb.pdf.document.document import Document
Now you have to understand, for anyone who is familiar with the way the Python developers build their frameworks, this is about the ugliest single line of text in human history. And it will remain so embedded in their minds as the ugliest piece of writing they have seen until they import the PageLayout.
For instance, the borb.pdf.document package contains just two things - a name tree and the borb.pdf.document.document sub-package. This latter subpackage contains only one thing, and that is the Document class. So here is what I would like to know - was there really a need for the subpackage to wrap the class?
Instead of this thing -
flowchart TD
A[borb.pdf.document] --> B[borb.pdf.document.name_tree];
A --> C[borb.pdf.document.document];
C --> D[borb.pdf.document.document.Document];
B --> E[borb.pdf.document.name_tree.NameTree];
style D fill:#afe, stroke:#2b7;
style E fill:#afe, stroke:#2b7;
Why can we not simply have:
flowchart TD
A[borb.pdf.document] --> B[borb.pdf.document.NameTree];
A --> C[borb.pdf.document.Document];
style C fill:#afe, stroke:#2b7;
style B fill:#afe, stroke:#2b7;
That would have made importing the modules so much simpler and easier, not to mention made so much more sense. The borb.pdf.document.document is a wrapper that only serves to wrap a single class. Can we not remove it altogether and replace it by the class it wraps? Does it hold the place for some other classes which will be entered in future? Was there a plan, since abandoned, to put more classes in there?
The Central Problem
As it stands, the borb package tree is immense, with most of the nodes having only one descendant. It is a cardinal rule in making a good tree that every non-leaf node should have more than one descendant. The borb import statements are tiresome to remember, compared to other import statements.
Consider how the skimage.io and skimage.draw modules work. There is no further io subpackage inside skimage.io - it directly yields us its classes and methods. We need something like that inside the borb module, a simple and easy-to-recall package tree which we can all love and use by instinct, like all the beloved packages in Python are used.
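One way to get the flatter import path without moving any files is for a package's `__init__.py` to re-export the class from the nested module. The sketch below builds a throwaway package on disk to show the mechanism; `demo_pdf` and its layout are hypothetical stand-ins, not borb's actual source tree.

```python
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write a tiny hypothetical package that mirrors the nested layout."""
    inner = root / "demo_pdf" / "document"
    inner.mkdir(parents=True)
    (root / "demo_pdf" / "__init__.py").write_text("")
    # The deeply nested module that actually defines the class.
    (inner / "document.py").write_text("class Document:\n    pass\n")
    # The re-export: one line in the package's __init__.py.
    (inner / "__init__.py").write_text("from .document import Document\n")


tmp_dir = tempfile.mkdtemp()
build_demo_package(Path(tmp_dir))
sys.path.insert(0, tmp_dir)

# Both import paths now resolve to the same class object.
from demo_pdf.document import Document as ShortPath
from demo_pdf.document.document import Document as LongPath

print(ShortPath is LongPath)
```

The long path keeps working, so existing code is not broken, while new code can use the shorter import.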
How to change attributes of checkbox.
Hi, @jorisschellekens. Thank you for an amazing implementation. I am using checkboxes in my PDF, but I need checkboxes of different shapes and sizes. As the implementation is still marked as to-do, I do not know which attributes the checkbox class accepts. How can I achieve this?
Markdown to PDF for long document
Hello
Thank you for this great library. I am using the example from the Markdown section in the documentation.
I am trying to convert a markdown article to PDF but am getting a:
AssertionError: BlockFlow is too tall to fit inside column / page.
I assume this is because the layout element is too large for the page. What would you suggest for long Markdown documents? My thought was to add each paragraph as a separate layout element, but I am unsure how to do this.
Many thanks
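The splitting half of that idea can be sketched with the standard library alone. This assumes blocks are separated by blank lines; `split_markdown_blocks` is a hypothetical helper, and how each block is then converted and added to a borb layout is not shown because it depends on the borb version in use.

```python
import re
from typing import List


def split_markdown_blocks(markdown_text: str) -> List[str]:
    """Split Markdown into blocks separated by one or more blank lines."""
    blocks = re.split(r"\n\s*\n", markdown_text.strip())
    return [block.strip() for block in blocks if block.strip()]


sample = "# Title\n\nFirst paragraph\nstill first.\n\n\nSecond paragraph.\n"
for block in split_markdown_blocks(sample):
    print(repr(block))
```

This naive split also cuts through fenced code blocks that contain blank lines, so a real converter would need to treat fences as a single block.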