jorisschellekens / borb-examples


408.0 408.0 58.0 31.92 MB

Python 99.97% HTML 0.03%

borb-examples's People

Stargazers

Watchers

Forkers

corneliuscob sigmakappa tezheng simrit1 constructionware terragord7 techthiyanes milan-chicago eng-rsmy chorseng vinodkumarkaplan ssahgal vbsoftpl cob05 wandrys-dev shalevy1 stjordanis shaikficus musicologyman mindaugasvaitkus2 rustydigg918 wzoungrana strickvl pavansai018 jjbiggins lambutty amks1 phothok asweigart bassman76jazz frankg1 pensebien cuong-max elijahahianyo davidbradway zachvaliant etrimby kbrown01 topic2k rjschave fischer-ben konahart ridingroad shifeng1111 michallebeda jesusoctavioas lchcapitalhumain olexsyn chrisburch vineetp6 kiranacd jiaxiaoniu swit1983 nooobkevin vikingpathak warkanlock

borb-examples's Issues

feasibility question of generating annotated pdf files

Hello,

I would like to create PDFs and remember directly where content is positioned in the PDF (via the bounding box). Therefore I would like to know if it is possible to retrieve the calculated bounding box of objects directly when creating PDFs with Borb. I assume that it should be possible somehow as soon as you add an element like a paragraph to a layout.

Images in PDF created by borb disappear on additional manipulation

Hi,

thanks for the great lib!!!

I'm running into a problem that seems quite strange to me, not sure what the reason could be.

I have an image I added to a pdf, which acts as the template. Here's the code to do so, closely following your ebook:

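from decimal import Decimal
from pathlib import Path

from borb.pdf import Document, Image, Page, PageLayout, PDF, SingleColumnLayout

def convert_png_to_pdf(png_file, output_file):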
    doc: Document = Document()

    # create Page
    page: Page = Page(width=Decimal(595.2), height=Decimal(375.12))
    page._padding_top = Decimal(0)
    page._padding_left = Decimal(0)
    page._padding_right = Decimal(0)
    page._padding_bottom = Decimal(0)
    page._margin_top = Decimal(0)
    page._margin_bottom = Decimal(0)
    page._margin_left = Decimal(0)
    page._margin_right = Decimal(0)
    # add Page to Document
    doc.add_page(page)

    layout: PageLayout = SingleColumnLayout(page)
    layout._padding_top = Decimal(0)
    layout._padding_left = Decimal(0)
    layout._padding_right = Decimal(0)
    layout._padding_bottom = Decimal(0)
    layout._margin_top = Decimal(0)
    layout._margin_bottom = Decimal(0)
    layout._margin_left = Decimal(0)
    layout._margin_right = Decimal(0)
    layout._border_width = Decimal(0)
    layout._column_widths = [Decimal(595.2)]

    # add an Image
    layout.add(
        Image(
            Path(png_file),
            width=Decimal(595.2), height=Decimal(375.12),
            margin_top=Decimal(0), margin_bottom=Decimal(0), margin_left=Decimal(0), margin_right=Decimal(0),
            padding_top=Decimal(0), padding_bottom=Decimal(0), padding_left=Decimal(0), padding_right=Decimal(0),
            border_width=Decimal(0)
        )
    )

    # store
    with open(f"editable/{output_file}.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

This adds the image to a PDF. If I open the PDF file, everything looks correct.

Now I need to add text to this template. The problem is that when I open the PDF file again, the image disappears. The following code produces a blank page:

from borb.pdf import PDF

with open('pdf.pdf', 'rb') as f:
    doc = PDF.loads(f)
with open('output.pdf', 'wb') as f:
    PDF.dumps(f, doc)

Steps to reproduce:

Add an image to a newly created PDF, then load and save it using this library.

Can you help?

Thanks a lot!

Form data not printing correct values

When creating a form with multiple TextFields, their values overwrite each other. Each of the TextFields in my code has a different field_name value (which I assume is similar to an ID in CSS).

Shown below is the pdf as seen in Chrome. I have added unique inputs for each TextField.

Shown below is the print preview for this pdf with the same set of inputs as above.

When I tried to print this pdf (with all unique values), it printed the "print preview" version instead. If you save the PDF with unique values, it will also save the "print preview" version.

Example code: (please note: actual table is much bigger, but this should produce the same result)

layout.add(
  FixedColumnWidthTable(
    number_of_columns=4,
    number_of_rows=2,
    margin_top=80,
    column_widths=[Decimal(2.1), Decimal(3), Decimal(0.5), Decimal(3)],
    vertical_alignment=Alignment.TOP,
  )
  .add(
    Paragraph(
      "User Name:",
      font="Helvetica-bold",
      font_size=11,
      padding_bottom=6
    )
  )
  .add(
    TextField(
      field_name="username",
      padding_top=Decimal(4),
      font_size=11,
      padding_bottom=6
    )
  )
  .add(
    Paragraph(
      "ID:",
      font="Helvetica-bold",
      font_size=11,
      padding_bottom=6
    )
  )
  .add(
    TextField(
      field_name="eid",
      padding_top=Decimal(4),
      font_size=11,
      padding_bottom=6
    )
  )
  .add(
    Paragraph(
      "Computer Name:",
      font="Helvetica-bold",
      font_size=11,
      padding_bottom=6
    )
  )
  .add(
    TableCell(
      TextField(
        field_name="newpcname",
        padding_top=Decimal(4),
        font_size=11,
        padding_bottom=6
      ),
      col_span=3
    )
  )
  .set_padding_on_all_cells(
  Decimal(3), Decimal(2), Decimal(0), Decimal(2))
  .no_borders()
)

"Font %s can not represent '%s'" %

Is there any solution for this?

Exception has occurred: AssertionError
Font SDGWMI+OpenSans-Semibold can not represent ''
  File "C:\Users\pc\Desktop\archive.py", line 16, in main
    doc = SimpleFindReplace.sub("Jots", "Joris", doc)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\pc\Desktop\archive.py", line 23, in <module>
    main()
AssertionError: Font SDGWMI+OpenSans-Semibold can not represent ' '


Traceback (most recent call last):
  File "C:\Python311\Lib\runpy.py", line 198, in _run_module_as_main
    return _run_code(code, main_globals, None,      
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^      
  File "C:\Python311\Lib\runpy.py", line 88, in _run_code
    exec(code, run_globals)
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy\__main__.py", line 39, in <module>
    cli.main()
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 430, in main
    run()
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\adapter/../..\debugpy\launcher/../..\debugpy/..\debugpy\server\cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,      
  File "c:\Users\pc\.vscode\extensions\ms-python.python-2023.8.0\pythonFiles\lib\python\debugpy\_vendored\pydevd\_pydevd_bundle\pydevd_runpy.py", line 124, in _run_code
    page.apply_redact_annotations()
  File "C:\Python311\Lib\site-packages\borb\pdf\page\page.py", line 152, in apply_redact_annotations
    .read(io.BytesIO(self["Contents"]["DecodedBytes"]), [])
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\canvas_stream_processor.py", line 300, in read
    raise e
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\canvas_stream_processor.py", line 294, in read
    operator.invoke(self, operands, event_listeners)
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\redacted_canvas_stream_processor.py", line 302, in invoke
    self._write_chunk_of_text(
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\redacted_canvas_stream_processor.py", line 230, in _write_chunk_of_text
    )._write_text_bytes()
      ^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\layout\text\chunk_of_text.py", line 200, in _write_text_bytes
    return self._write_text_bytes_in_hex()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Python311\Lib\site-packages\borb\pdf\canvas\layout\text\chunk_of_text.py", line 242, in _write_text_bytes_in_hex
    assert cid is not None, "Font %s can not represent '%s'" % (
           ^^^^^^^^^^^^^^^
AssertionError: Font SDGWMI+OpenSans-Semibold can not represent

The code is:


from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleFindReplace
import typing

def main():
    doc: typing.Optional[Document] = None
    with open("archive.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle)

    assert doc is not None

    doc = SimpleFindReplace.sub("Jots", "Joris", doc)

    with open("output2.pdf", "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

if __name__ == "__main__":
    main()

Form displaying incorrectly

When creating forms, entered values do not show correctly after the PDF is saved and reopened. When you enter a text field in edit mode, the value previously entered is shown, but it looks like the last entered value is displayed in all the text fields. The same effect can be reproduced with the following code:
layout.add(
    FixedColumnWidthTable(number_of_columns=2, number_of_rows=4)
    .add(Paragraph("First Name:"))
    .add(TextField(field_name="firstname"))
    .add(Paragraph("Last Name:"))
    .add(TextField(field_name="lastname"))
    .add(Paragraph("Country"))
    .add(TextField(field_name="country", value="NaN"))
    .set_padding_on_all_cells(Decimal(2), Decimal(2), Decimal(2), Decimal(2))
    .no_borders()
)

When opening this pdf, all fields will show "NaN".

Python 3.10.3
borb-2.0.21
Platform Windows 10
PDF tested in Chrome and Acrobat Reader
output_form.pdf

How to define custom page margins?

How do I define custom page margins for all pages in a document?
I want to use non-standard margins (5x5x5x5) on an A4 page.
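The question does not say which unit "5x5x5x5" is in. Assuming millimetres, each margin first has to be converted to PDF points (1 inch = 25.4 mm = 72 pt). The stdlib sketch below only does that arithmetic; how the resulting values are handed to a borb page or layout depends on your borb version and is deliberately not shown.

```python
from decimal import Decimal

# 1 inch = 25.4 mm and 1 PDF point = 1/72 inch
MM_PER_INCH = Decimal("25.4")
POINTS_PER_INCH = Decimal(72)


def mm_to_pt(mm: Decimal) -> Decimal:
    """Convert millimetres to PDF user-space points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH


# A4 is 210 mm x 297 mm
a4_width = mm_to_pt(Decimal(210))
a4_height = mm_to_pt(Decimal(297))

# a uniform 5 mm margin on every side (the unit is an assumption)
margin = mm_to_pt(Decimal(5))

# width and height left for content once both opposing margins are subtracted
content_width = a4_width - 2 * margin
content_height = a4_height - 2 * margin

print(f"margin: {margin:.2f} pt")
print(f"content area: {content_width:.2f} x {content_height:.2f} pt")
```

A 5 mm margin comes out at roughly 14.17 pt, which is the kind of value a layout expects when its margins are expressed in points.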

CSS doesn't apply when creating a PDF from HTML

I created some PDFs from HTML, but the styling did not apply. I tried it in a <style> tag and as inline style as well. Neither worked.

KeyError in borb when Replacing Text in an Existing PDF

Hi Joris

Thank you for borb. I tried the example Replacing Text in an Existing PDF today with borb 2.1.10 installed via pip. It terminates with this error message:

File "borb/toolkit/text/simple_find_replace.py", line 77, in sub
  page.apply_redact_annotations()
File "borb/pdf/page/page.py", line 145, in apply_redact_annotations
  for x in self["Annots"]
KeyError: 'Annots'

I tried adding a guard if "Annots" in self, but then the resulting PDF is rendered without all text.

Unfortunately, I cannot share the specific PDF with you since it contains sensitive information.

Kind Regards
Hermann

Turkish characters are not supported.

Arabic fonts cannot represent several characters when using the bidi algorithm and reshaper

I am trying to print some Arabic text, and since we need to support bidirectional text, we are using the bidi algorithm and Arabic Reshaper. We are getting an exception that says:
assert cid is not None, "Font %s can not represent '%s'" % (
AssertionError: Font HSNIbtisam can not represent 'ﺕ'

This happens even though the font supports all Arabic characters.

def format(self, delivery_voucher: DeliveryVoucher):
        document = Document()
        page = Page()
        document.add_page(page)

        layout: PageLayout = SingleColumnLayout(page)
        font_path = Path(__file__).parent / 'misc/fonts/Ibtisam.ttf'
        custom_font = TrueTypeFont.true_type_font_from_file(font_path)
        print(font_path)

        arabic_text= "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
        reshaped_text = arabic_reshaper.reshape(arabic_text)
        bidirectional_text = algorithm.get_display(reshaped_text)

        # add a Paragraph
        layout.add(borb.pdf.Paragraph(bidirectional_text, font=custom_font))

        # store
        with open("output.pdf", "wb") as pdf_file_handle:
            PDF.dumps(pdf_file_handle, document)
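A possible explanation, offered as an assumption rather than a confirmed diagnosis: arabic_reshaper replaces base letters such as U+062A ARABIC LETTER TEH with contextual presentation forms from the Arabic Presentation Forms-B block (U+FE70 to U+FEFF), and the 'ﺕ' in the error is one of those, U+FE95. A font can cover every base Arabic letter and still have no glyph mapped to these code points. The stdlib sketch below only inspects the characters; it does not read the font.

```python
import unicodedata

# Arabic Presentation Forms-B occupies U+FE70..U+FEFF
PRESENTATION_FORMS_B = range(0xFE70, 0xFF00)


def describe(ch: str) -> str:
    """Return the code point, Unicode name and NFKC base form of one character."""
    base = unicodedata.normalize("NFKC", ch)
    # some presentation forms (e.g. ligatures) decompose to more than one character
    base_points = " ".join(f"U+{ord(c):04X}" for c in base)
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} -> {base!r} ({base_points})"


def presentation_form_chars(text: str) -> list:
    """Distinct characters of text that lie in the Presentation Forms-B block."""
    return sorted({ch for ch in text if ord(ch) in PRESENTATION_FORMS_B})


# the character reported in the AssertionError above
print(describe("\ufe95"))
```

Running presentation_form_chars(bidirectional_text) lists every character the font would need in addition to the base letters. Comparing that list with the font's character map (for example via fontTools' `TTFont(path).getBestCmap()`) would confirm or rule out this explanation.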

Long tables

Hi, what approach would you recommend to split long tables into multiple pages? I didn't find any built-in solution, but there must be a way, right?
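One workaround, not a built-in borb feature and shown only as a stdlib sketch, is to split the data rows into page-sized chunks up front and build a separate table for each chunk, repeating the header row. `chunk_rows` and `rows_per_page` are hypothetical names; the right chunk size depends on font size, padding and page height.

```python
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def chunk_rows(header: Row, rows: Sequence[Row], rows_per_page: int) -> List[List[Row]]:
    """Split rows into chunks of at most rows_per_page data rows,
    prefixing every chunk with the header row."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    return [
        [header, *rows[i : i + rows_per_page]]
        for i in range(0, len(rows), rows_per_page)
    ]


header = ("Name", "Qty")
rows = [(f"item {n}", n) for n in range(1, 8)]

for page_number, chunk in enumerate(chunk_rows(header, rows, rows_per_page=3), start=1):
    # each chunk would become one table with len(chunk) rows and len(header) columns
    print(page_number, len(chunk), chunk[-1])
```

Each chunk then maps onto one table whose row count is known in advance, so no single table has to be taller than a page.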

Support for non-English languages

AssertionError: Font Helvetica can not represent '৳'

I tried my custom font for the Bangla language, but it doesn't work.
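For context on the Helvetica error: Helvetica is one of the PDF standard 14 fonts, which are normally used with a single-byte encoding such as WinAnsiEncoding and can therefore show at most 256 characters, mostly Latin. The Bengali taka sign '৳', as well as Turkish letters such as 'ğ', 'ş' and 'ı', fall outside that set, which would explain both this error and the Turkish report above. The sketch below uses Python's cp1252 codec as an approximation of WinAnsiEncoding (the two are close but not identical) to list the characters a standard font will likely reject; it says nothing about whether a given TrueType font will work, and complex scripts such as Bangla may need more than glyph coverage.

```python
def unsupported_in_winansi(text: str) -> list:
    """Distinct characters of text, in order of first appearance,
    that cp1252 (an approximation of WinAnsiEncoding) cannot encode."""
    missing = []
    for ch in text:
        if ch in missing:
            continue
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


print(unsupported_in_winansi("Türkçe ağaç şişe ı"))
print(unsupported_in_winansi("৳ 100"))
```

Characters such as 'ü' and 'ç' pass the check, while 'ğ', 'ş', 'ı' and '৳' do not.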

Code Provided for Example 5.7's pdf is not working with borb 2.1.17

I am currently following the tutorials and have discovered that the Example 5.7 code to extract all colors from a given PDF file is not working on my machine.

Below is the full Traceback output for my attempts to run the code

reading page
Traceback (most recent call last):
  File "c:\Users\Lenovo\OneDrive\Documents\LI ZHUOXI\ITE- College West\Lessons\Industrial Attachment Program\IAP Higher Nitec AI Applications\HumanKind Design Pte Ltd\Image and color Extraction.py", line 80, in <module>
    t.add(
  File "C:\Users\Lenovo\anaconda3\envs\HumanKind\Lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 394, in add
    first_incomplete_row: int = min(
                                ^^^^
ValueError: min() arg is an empty sequence

The code I am running:

#!chapter_005/src/snippet_013.py
from borb.io.read.types import Name, String, Dictionary
from borb.pdf import SingleColumnLayout
from borb.pdf import PageLayout
from borb.pdf import Paragraph
from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import PDF
from borb.pdf import HexColor
from borb.pdf import Image
from borb.pdf import HexColor, RGBColor, Color
from borb.pdf.canvas.geometry.rectangle import Rectangle
from borb.pdf import ConnectedShape
from borb.pdf import Alignment
from borb.pdf import FlexibleColumnWidthTable
from borb.pdf import LineArtFactory
from borb.toolkit import ColorExtraction
import typing

from decimal import Decimal

# # create Document
# doc: Document = Document()

# # create Page
# page: Page = Page()

# # add Page to Document
# doc.add_page(page)

# # set a PageLayout
# layout: PageLayout = SingleColumnLayout(page)

# # the following code adds 3 paragraphs, each in a different color
# layout.add(Paragraph("Hello World!", font_color=HexColor("FF0000")))
# layout.add(Paragraph("Hello World!", font_color=HexColor("00FF00")))
# layout.add(Paragraph("Hello World!", font_color=HexColor("0000FF")))

# # the following code adds 1 image
# layout.add(
#     Image(
#         "https://images.unsplash.com/photo-1589606663923-283bbd309229?ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8",
#         width=Decimal(256),
#         height=Decimal(256),
#     )
# )

# # store
# with open("ImageOutput.pdf", "wb") as out_file_handle:
#     PDF.dumps(out_file_handle, doc)

##############################################################################################

doc: typing.Optional[Document] = None
l: ColorExtraction = ColorExtraction()
with open("ImageOutput.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [l])
# extract colors
colors: typing.Dict[Color, Decimal] = l.get_color()[0]

# create output Document
doc_out: Document = Document()

# add Page
p: Page = Page()
doc_out.add_page(p)

# add PageLayout
l: PageLayout = SingleColumnLayout(p)

# add Paragraph
l.add(Paragraph("These are the colors used in the input PDF:"))

# add Table
t: FlexibleColumnWidthTable = FlexibleColumnWidthTable(
    number_of_rows=3, number_of_columns=3, horizontal_alignment=Alignment.CENTERED
)
for c in colors.keys():
    t.add(
        ConnectedShape(
            LineArtFactory.droplet(
                Rectangle(Decimal(0), Decimal(0), Decimal(32), Decimal(32))
            ),
            stroke_color=c,
            fill_color=c,
        )
    )
t.set_padding_on_all_cells(Decimal(5), Decimal(5), Decimal(5), Decimal(5))
t.no_borders()
l.add(t)

# store
with open("outputImage_Colors.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, doc_out)
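A likely cause, judging from the traceback but not confirmed by the maintainer: the table is declared with `number_of_rows=3, number_of_columns=3`, i.e. 9 cells, and `Table.add` appears to fail when no incomplete row is left, which would happen as soon as the input PDF yields more than 9 colors. Sizing the table from the number of colors avoids overfilling it. The arithmetic, as a stdlib sketch with a hypothetical helper name:

```python
import math
from typing import Tuple


def table_shape(number_of_items: int, number_of_columns: int) -> Tuple[int, int]:
    """Return (number_of_rows, number_of_filler_cells) for a table that must
    hold number_of_items cells in number_of_columns columns."""
    if number_of_columns < 1:
        raise ValueError("number_of_columns must be at least 1")
    # at least one row, so that an empty color list still yields a valid table
    number_of_rows = max(1, math.ceil(number_of_items / number_of_columns))
    filler_cells = number_of_rows * number_of_columns - number_of_items
    return number_of_rows, filler_cells


# e.g. 11 colors in a 3-column table need 4 rows, leaving 1 cell empty
print(table_shape(11, 3))
```

You would then pass `number_of_rows` (computed from `len(colors)`) to `FlexibleColumnWidthTable` instead of the hard-coded 3 and, if your borb version expects every cell to be filled, add `filler_cells` placeholder cells (for example a Paragraph containing a single space) after the loop.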

Can't read a specific PDF file

I'm having some trouble.
I'm getting the message "Unexpected character at end of dictionary." when trying to read this specific PDF file.

Form fields created by borb are not visible in PDF-XChange Editor

Hi, I tried creating a sample PDF with form elements using the sample code.
The elements created were some text fields.
When I opened the PDF in PDF-XChange Editor, the form fields were not visible; only rectangular boxes were shown, and they were not editable.
When I opened the file in Acrobat, the form fields were visible, and after I entered some text there and saved, the form fields were visible again when I opened the file in PDF-XChange Editor.

May I know what the issue is here? I do not face this when the form is created with other modules such as pdfrw.

Extract highlighted text from PDF document?

Does this library support extracting highlighted text from a PDF document?
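In the PDF format, a highlight is a `/Highlight` annotation whose `/QuadPoints` array stores 8 numbers per highlighted region (four x, y corner pairs). Whether borb exposes these values directly is not established in this thread. Assuming you have obtained the quad points and a list of words with their bounding boxes by some means, picking out the highlighted words is a plain geometric test. A stdlib sketch with hypothetical helper names and made-up input data:

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def quads_to_rects(quad_points: Sequence[float]) -> List[Rect]:
    """Turn a /QuadPoints array (8 numbers per quadrilateral) into axis-aligned rectangles."""
    if len(quad_points) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    rects = []
    for i in range(0, len(quad_points), 8):
        # corner order varies between producers, so take min/max over all four corners
        xs = quad_points[i : i + 8 : 2]
        ys = quad_points[i + 1 : i + 8 : 2]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def highlighted_words(words: Sequence[Tuple[str, Rect]], quad_points: Sequence[float]) -> List[str]:
    """Return the words whose bounding-box centre falls inside any highlighted region."""
    rects = quads_to_rects(quad_points)
    selected = []
    for text, (x0, y0, x1, y1) in words:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if any(rx0 <= cx <= rx1 and ry0 <= cy <= ry1 for rx0, ry0, rx1, ry1 in rects):
            selected.append(text)
    return selected


# hypothetical data: three words on one line, the middle one highlighted
words = [("Hello", (72, 700, 110, 712)), ("big", (115, 700, 135, 712)), ("world", (140, 700, 180, 712))]
quad = [113, 713, 137, 713, 113, 699, 137, 699]
print(highlighted_words(words, quad))
```

Testing the word centre rather than full containment tolerates the small overshoot that highlight boxes usually have around the text.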

KeyError: 'XRef' when creating editable form

While trying to recreate a form we use in office (currently filled out by hand), I came across this XRef error. The form consists of a logo/title sharing the top of the page, followed by the inputs shown below. Unfortunately I cannot share the other elements (logo, title, subtitle) without redacting them to oblivion.
The script executes perfectly when commenting out this specific table.

Edit: Let me also mention that I'd prefer to match the original layout (with mostly two label/field combinations per line). Having nine rows was a compromise when I thought that might be the issue.

The offending FlexibleColumnWidthTable:

layout.add(
    FlexibleColumnWidthTable(number_of_columns=2, number_of_rows=9)
    .add(Paragraph("User Name: "))
    .add(TextField(field_name="username"))
    .add(Paragraph("ID: "))
    .add(TextField(field_name="eid"))
    .add(Paragraph("Computer Name: "))
    .add(TextField(field_name="newpcname"))
    .add(Paragraph("Replacing Computer: "))
    .add(TextField(field_name="oldpcname"))
    .add(Paragraph("S/N: "))
    .add(TextField(field_name="oldserial"))
    .add(Paragraph("Keep in Service"))
    .add(TextField(field_name="service"))
    .add(Paragraph("Location"))
    .add(TextField(field_name="location"))
    .add(Paragraph("Model: "))
    .add(TextField(field_name="model"))
    .add(Paragraph("S/N: "))
    .add(TextField(field_name="serial"))
    .set_padding_on_all_cells(Decimal(2), Decimal(2), Decimal(2), Decimal(2))
    .no_borders()
)

Stack trace:

Traceback (most recent call last):
  File "c:\Users\user\Documents\python\PDFGen\main.py", line 122, in <module>
    main()
  File "c:\Users\user\Documents\python\PDFGen\main.py", line 92, in main 
    layout.add(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\page_layout\multi_column_layout.py", line 194, in add
    layout_rect = layout_element.layout(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 290, in layout     
    return self.calculate_layout_box_and_do_layout(page, bounding_box)    
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 303, in calculate_layout_box_and_do_layout
    layout_box = self._calculate_layout_box(page, bounding_box)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
    returned_layout_box = self._calculate_layout_box_without_padding(     
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 241, in _calculate_layout_box_without_padding
    layout_rect = self._do_layout_without_padding(page, bounding_box)     
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\flexible_column_width_table.py", line 87, in _do_layout_without_padding
    t.calculate_min_and_max_width()  
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 86, in calculate_min_and_max_width
    max_bounding_box: Rectangle = self._calculate_layout_box(
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
    returned_layout_box = self._calculate_layout_box_without_padding(     
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\table\table.py", line 123, in _calculate_layout_box_without_padding
    return self._layout_element._calculate_layout_box(page, bounding_box) 
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 213, in _calculate_layout_box
    returned_layout_box = self._calculate_layout_box_without_padding(     
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\layout_element.py", line 241, in _calculate_layout_box_without_padding
    layout_rect = self._do_layout_without_padding(page, bounding_box)     
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\forms\text_field.py", line 161, in _do_layout_without_padding
    self._init_widget_dictionary(page, layout_rect)
  File "C:\Users\user\AppData\Local\Programs\Python\Python310\lib\site-packages\borb\pdf\canvas\layout\forms\text_field.py", line 98, in _init_widget_dictionary
    catalog: Dictionary = page.get_root()["XRef"]["Trailer"]["Root"]  # type: ignore [attr-defined]
KeyError: 'XRef'

How to manipulate the PDF outline

snippet_013.py error?

chapter_005/src/snippet_013.py fails with the following error:
from borb.toolkit.color.color_spectrum_extraction import ColorSpectrumExtraction
ModuleNotFoundError: No module named 'borb.toolkit.color.color_spectrum_extraction'

borb.toolkit.color doesn't appear to contain a color_spectrum_extraction function.

Should these two references to ColorSpectrumExtraction in snippet_013.py be changed to ColorExtraction? Or is something else amiss?

from borb.toolkit.color.color_spectrum_extraction import ColorSpectrumExtraction
... some code removed ...
l: ColorSpectrumExtraction = ColorSpectrumExtraction()

No support for Chinese

I found that borb doesn't support Chinese characters when adding Chinese text to a PDF page. Any idea how to solve this?

Footer

Hello! Great library, thanks for your work. Using a translator, I read through the documentation but did not find this: how do I add a header and footer to pages?

How to select column manually in MultiColumnLayout page?

I have created a MultiColumnLayout pdf page and adding table and graph in the page.

Is there any way I can add the graph in second column (manually) of page whether the first column is full or not?

Exception when opening certain files

Hi, I'm trying to open a set of PDFs.
A subset of these files makes the library crash.

The stack trace of the exception is below.

Traceback (most recent call last):
  File "/home/scampese/Repository/AuraTests/borb_test.py", line 36, in <module>
    main()
  File "/home/scampese/Repository/AuraTests/borb_test.py", line 26, in main
    doc = PDF.loads(in_file_handle, [l1])
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/pdf/pdf.py", line 54, in loads
    return ReadAnyObjectTransformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/xref_transformer.py", line 140, in transform
    trailer = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
    transformed_referenced_object = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/page/root_dictionary_transformer.py", line 84, in transform
    transformed_root_dictionary = t.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
    transformed_referenced_object = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/array_transformer.py", line 46, in transform
    object_to_transform[i] = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
    transformed_referenced_object = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/page/page_dictionary_transformer.py", line 62, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/dictionary_transformer.py", line 46, in transform
    v = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/reference/reference_transformer.py", line 103, in transform
    transformed_referenced_object = self.get_root_transformer().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/any_object_transformer.py", line 100, in transform
    return super().transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/transformer.py", line 123, in transform
    out = h.transform(
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/read/object/stream_transformer.py", line 53, in transform
    object_to_transform = decode_stream(object_to_transform)
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/filter/stream_decode_util.py", line 74, in decode_stream
    transformed_bytes = RunLengthDecode.decode(transformed_bytes)
  File "/home/scampese/anaconda3/envs/python_3_8/lib/python3.8/site-packages/borb/io/filter/run_length_decode.py", line 34, in decode
    n = bytes_in[i + 1]
IndexError: index out of range

Process finished with exit code 1

N.B.: with other libraries like pdfquery or textextract I'm able to open the same file without problems

Do you have a suggestion on how to deal with it?

My env:

python 3.8.12

Thanks!
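For context, the trace ends in borb's RunLengthDecode filter. Per the PDF specification, a RunLengthDecode stream is a series of runs, each starting with a length byte n: 0 to 127 means copy the next n + 1 bytes literally, 129 to 255 means repeat the next single byte 257 - n times, and 128 marks end of data. The IndexError suggests, though this is unconfirmed, that these files end in the middle of a run or lack the 128 marker. A lenient decoder that stops at truncated input instead of raising could look like the stdlib sketch below; it is an illustration of the filter, not borb's implementation.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode a PDF RunLengthDecode stream, tolerating a missing EOD marker
    or a run that is cut short at the end of the data."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        if n == 128:  # end-of-data marker
            break
        if n < 128:
            # literal run: copy the next n + 1 bytes (fewer if the data is truncated)
            out += data[i + 1 : i + 2 + n]
            i += 2 + n
        else:
            # repeat run: the next byte is repeated 257 - n times
            if i + 1 >= len(data):
                break  # truncated: a length byte with no byte left to repeat
            out += bytes([data[i + 1]]) * (257 - n)
            i += 2
    return bytes(out)


# literal run "AB", then "C" repeated 257 - 254 = 3 times, then EOD
print(run_length_decode(bytes([1, 65, 66, 254, 67, 128])))
```

Whether silently accepting a truncated stream is acceptable depends on the use case; for text extraction, partial content is often better than an exception.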

TextField Object filling the same answer in all textfields

I have been trying to use the code from chapter 4 about form fields. In the output PDF, when I fill in any TextField, the other fields show the same information. I would like them to be independent, but that doesn't seem to work no matter what I try.

I am using borb version 2.0.27.

Does anyone know whether the issue comes from the TextField object or from something else?

The use of col_span in example snippets

The documentation and examples use col_span, but the correct keyword is column_span.

Trigger javascript on change of TextField value

Dear Joris,
Many thanks for creating such a great library for building PDF documents from Python! I really enjoy playing with it!
In one of the chapters, you provide an example in which a JavaScript function is called when a button is pressed.
Would it also be possible to trigger a script when the user leaves the edit mode of a text field, or when the value of a field changes?
Many thanks in advance for your answer!
Best,
Hendrik
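At the PDF level (ISO 32000-1), both triggers exist in a field's additional-actions (`/AA`) dictionary. `/V` runs when the field's value changes, and the widget annotation key `/Bl` runs when the field loses input focus. We have not confirmed how, or whether, borb exposes these entries. The sketch below only builds the raw PDF syntax as a string, assuming a merged field/widget dictionary. All function names are hypothetical.

```python
def pdf_literal_string(text: str) -> str:
    """Wrap text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def javascript_action(script: str) -> str:
    """A JavaScript action dictionary in PDF syntax."""
    return f"<< /S /JavaScript /JS {pdf_literal_string(script)} >>"


def additional_actions(on_value_change: str, on_blur: str) -> str:
    """/AA entry for a merged text-field/widget dictionary (assumed layout):
    /V fires when the field's value changes, /Bl when the field loses focus."""
    return (
        "/AA << "
        f"/V {javascript_action(on_value_change)} "
        f"/Bl {javascript_action(on_blur)} "
        ">>"
    )


print(additional_actions("app.alert('value changed');", "app.alert('left field');"))
```

Escaping the backslash first matters. Otherwise the backslashes added for the parentheses would themselves be doubled.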

Missing Documentation

Good documentation should be written to help users understand the package.

Extract unicode text

Hi, I used borb to extract text from a PDF with SimpleTextExtraction, using the example code below, to learn how the tool works.
I did get the text out, but there seem to be Unicode errors in it, for example

DossiÃª da Unidade Curricular

instead of:

Dossiê da Unidade Curricular

Is there a way to add a codec somewhere?

#!chapter_005/src/snippet_005.py
import typing
from borb.pdf.document.document import Document
from borb.pdf.pdf import PDF
from borb.toolkit.text.simple_text_extraction import SimpleTextExtraction

def main():

    # read the Document
    doc: typing.Optional[Document] = None
    l: SimpleTextExtraction = SimpleTextExtraction()
    with open("output.pdf", "rb") as in_file_handle:
        doc = PDF.loads(in_file_handle, [l])

    # check whether we have read a Document
    assert doc is not None

    # print the text on the first Page
    print(l.get_text_for_page(0))


if __name__ == "__main__":
    main()
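The garbling has a recognizable pattern. "ê" is stored in UTF-8 as the two bytes C3 AA, and reading those bytes as Latin-1 yields "Ã" followed by "ª". Because that mistake is reversible, the extracted text can be repaired afterwards. This is a post-processing workaround and not a borb codec setting. We have not checked where in borb the wrong decoding happens. The helper name is hypothetical.

```python
def repair_mojibake(text: str, wrong_codec: str = "latin-1") -> str:
    """Undo UTF-8 text that was mistakenly decoded with a single-byte codec.

    Returns the text unchanged when it does not round-trip cleanly,
    e.g. when it was already decoded correctly."""
    try:
        return text.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake("DossiÃª da Unidade Curricular"))  # Dossiê da Unidade Curricular
```

If the garbled text contains characters such as "€" or "™", try `wrong_codec="cp1252"` instead. Rarely, correct accented text can also happen to decode as UTF-8, so apply the repair only to text that actually shows the pattern.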

Export PDF to JPG(PIL) with different DPI

Hi, Joris!

Can I change the image quality when exporting to JPG, for example DPI=300 or more?

Thank you!
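Some background that applies whatever renderer is used: PDF page sizes are given in points, and the default user-space unit is 1/72 inch. A page therefore has to be rasterized at width × DPI / 72 by height × DPI / 72 pixels to get a given DPI. Passing `dpi=(300, 300)` to Pillow's `Image.save` only records DPI metadata in the JPEG and adds no detail. The helper below only computes the target size. How to make borb's exporter render at that size is not shown, because we have not verified its options.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def pixel_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions needed to rasterize a page of the given size at `dpi`."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


print(pixel_size(612, 792, 300))  # US Letter at 300 DPI -> (2550, 3300)
```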

Issue while running snippet 002 from chapter 007

Hi, I'm running Python 3.8 on Ubuntu 18.04 and am getting an error with snippet #!chapter_007/src/snippet_002.py.
The error occurs at

    with open("data/ama_logistic_236523.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle, [l])

Error: AssertionError: A Rectangle must have a non-negative width.

When I try another PDF, the first error goes away, but I get a different one, shown below.

    # get page
    p: Page = doc.get_page(0)

Error: TypeError: element indices must be integers
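A possible cause of the first error, which we have not confirmed for this particular file, is a rectangle stored "backwards". The PDF specification (ISO 32000-1) allows a rectangle array to name any two diagonally opposite corners and expects readers to normalize it. Without normalization, the computed width can come out negative. The hypothetical helper below shows the normalization step.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    x: float       # lower-left x
    y: float       # lower-left y
    width: float   # non-negative after normalization
    height: float  # non-negative after normalization


def normalize_rect(coords: Sequence[float]) -> Box:
    """Convert a PDF rectangle [x1 y1 x2 y2], given by ANY two opposite corners,
    into its lower-left corner plus a non-negative width and height."""
    x1, y1, x2, y2 = (float(c) for c in coords)
    return Box(min(x1, x2), min(y1, y2), abs(x2 - x1), abs(y2 - y1))


print(normalize_rect([100, 200, 50, 20]))  # Box(x=50.0, y=20.0, width=50.0, height=180.0)
```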

Error: Paragraph object has no attribute layout

I am getting an error when running the snippets

chapter_002/src/snippet_018.py
...
chapter_002/src/snippet_027.py

The error is: 'Paragraph' object has no attribute 'layout'

For example:
chapter_002/src/snippet_022.py

crashed with error:

Traceback (most recent call last):
  File "/home/.../cgi-bin/test_borb_sn022.py", line 47, in <module>
    main()
  File "/home/.../cgi-bin/test_borb_sn022.py", line 39, in main
    ).layout(page, r)
AttributeError: 'Paragraph' object has no attribute 'layout'

Distributor ID: Ubuntu
Description: Linux Lite 6.0
Release: 22.04
Codename: jammy

Python 3.10.4

borb-2.1.0
Pillow-9.2.0
certifi-2022.6.15.1
charset-normalizer-2.1.1
fonttools-4.37.1
idna-3.3
python-barcode-0.14.0
qrcode-7.3.1
requests-2.28.1
urllib3-1.26.12
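As far as we know, borb 2.1 renamed the layout element method `layout(page, rectangle)` to `paint(page, rectangle)`, while the chapter 2 snippets were written against the older API. Please verify this against the installed version's source. Until the snippets are updated, calling `.paint(page, r)` instead of `.layout(page, r)` may be enough. The hypothetical shim below calls whichever method exists so that a script runs on both:

```python
from typing import Any


def draw(element: Any, page: Any, available_space: Any) -> Any:
    """Render `element` with paint() if present, else fall back to layout().

    Assumes both methods take (page, rectangle), as in the chapter 2 snippets."""
    method = getattr(element, "paint", None) or getattr(element, "layout", None)
    if method is None:
        raise AttributeError(f"{type(element).__name__} has neither paint() nor layout()")
    return method(page, available_space)
```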

Pages out of order when extracting text with SimpleTextExtraction

Dear all,

I followed the instructions to extract text using the code below, but I found that the order of the variable pages does not match my original PDF file. I attached the code and a picture below.

import typing
from borb.pdf import Document
from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

doc: typing.Optional[Document] = None
l: SimpleTextExtraction = SimpleTextExtraction()
with open("1.pdf", "rb") as in_file_handle:
    doc = PDF.loads(in_file_handle, [l])

assert doc is not None

# text extracted from every page
pages = l.get_text()
print(pages)

As you can see in the picture below, pages[1] corresponds to page 4 of my PDF file, so they do not match. Could you please tell me how to solve this? My PDF file has no page numbers written in it.
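Some background that may help narrow this down: a PDF's logical page order is defined by a depth-first, left-to-right walk of its page tree (`/Pages` nodes with `/Kids` arrays). It is not defined by object numbers or by the order in which page objects appear in the file. If pages come back shuffled, one thing worth checking is whether they were collected in file order instead. That is our guess, not a confirmed diagnosis. The sketch models the page tree as nested dicts with a hypothetical `id` field standing in for the object number.

```python
from typing import Any, Dict, List


def page_order(node: Dict[str, Any]) -> List[Any]:
    """Return page ids in logical order: depth-first, left to right over /Kids."""
    if node["Type"] == "Page":
        return [node["id"]]  # 'id' stands in for the page object's number (hypothetical)
    order: List[Any] = []
    for kid in node.get("Kids", []):
        order.extend(page_order(kid))
    return order


# Object numbers deliberately out of numeric order
tree = {
    "Type": "Pages",
    "Kids": [
        {"Type": "Page", "id": 12},
        {"Type": "Pages", "Kids": [{"Type": "Page", "id": 3}, {"Type": "Page", "id": 7}]},
    ],
}
print(page_order(tree))  # [12, 3, 7]: the first page is object 12
```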

cannot import name 'Document' from 'borb.pdf'

Hi, I got an error while trying to run the following imports:

from pathlib import Path

from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import SingleColumnLayout
from borb.pdf import Paragraph
from borb.pdf import PDF

ImportError: cannot import name 'Document' from 'borb.pdf' (/home/niko/anaconda3/envs/py39/lib/python3.9/site-packages/borb/pdf/__init__.py)

version: '2.1.5.2'
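The error means the installed `borb/pdf/__init__.py` does not export `Document`. Another snippet on this page (chapter 5) imports it from the longer path `borb.pdf.document.document` instead. Until the cause is clear, a hypothetical helper that tries candidate import paths in order lets a script work with either layout:

```python
import importlib
from typing import Any, Iterable, List, Tuple


def import_first(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from the (module, name) pairs, in order."""
    errors: List[str] = []
    for module_name, attribute in candidates:
        try:
            return getattr(importlib.import_module(module_name), attribute)
        except (ImportError, AttributeError) as exc:
            errors.append(f"{module_name}.{attribute}: {exc}")
    raise ImportError("no candidate could be imported:\n" + "\n".join(errors))


# Usage (requires borb to be installed):
# Document = import_first([("borb.pdf", "Document"),
#                          ("borb.pdf.document.document", "Document")])
```

If neither path works, the collected messages show which path failed and why.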

Excessive Compartmentalization

Hi, Joris

I am an Indian programmer working on some AI functionality, such as OCR on scanned PDFs. I have just discovered borb, and I must say, I am deeply impressed by both the power and the level of detail in the documentation of this library. I would love to see this project grow and become a well-integrated part of the Python family, recognized and respected by the developers who use Python. I think Borb holds the key to making PDFs easier to work with for programmers around the world, and heaven knows there are a ton of scanned PDFs lying around that need work.

But there is something that I think is going to hold Borb back a lot. And that is the monstrous level of undue complexity in the package tree of the framework. I mean, just look at this:

from borb.pdf.document.document import Document

Now you have to understand, for anyone who is familiar with the way the Python developers build their frameworks, this is about the ugliest single line of text in human history. And it will remain so embedded in their minds as the ugliest piece of writing they have seen until they import the PageLayout.

For instance, the borb.pdf.document package contains just two things - a name tree and the borb.pdf.document.document sub-package. This latter subpackage contains only one thing, and that is the Document class. So here is what I would like to know - was there really a need for the subpackage to wrap the class?

Instead of this thing -

flowchart TD
    A[borb.pdf.document] --> B[borb.pdf.document.name_tree];
    A --> C[borb.pdf.document.document];
    C --> D[borb.pdf.document.document.Document];
    B --> E[borb.pdf.document.name_tree.NameTree];
    style D fill:#afe, stroke:#2b7;
    style E fill:#afe, stroke:#2b7;

Why can we not simply have:

flowchart TD
    A[borb.pdf.document] --> B[borb.pdf.document.NameTree];
    A --> C[borb.pdf.document.Document];
    style C fill:#afe, stroke:#2b7;
    style B fill:#afe, stroke:#2b7;

That would have made the importing of the modules so much simpler and easier, not to mention made so much more sense. The borb.pdf.document.document is a wrapper that only serves to wrap a single class. Can we not remove it altogether and replace it by the class it wraps? Does it hold the place for some other classes which will be entered in future? Was there a plan, since abandoned, to put more classes in there?

The Central Problem

As it stands, the borb package tree is immense, and most of its nodes have only one descendant. It is a cardinal rule of good tree design that every non-leaf node should have more than one descendant. The borb import statements are tiresome to remember compared to those of other packages.

Consider how the skimage.io and skimage.draw modules work. There is no further io subpackage inside the skimage.io - it directly yields us its classes and methods. We need something like that inside the borb module, a simple and easy-to-recall package tree which we can all love and use by instinct, like all the beloved packages in Python are used.
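The usual Python technique for this kind of flattening is to leave the files where they are and re-export names in each `__init__.py`. That way, the short and the long import paths both work and refer to the same class. The sketch below is generic Python, not borb's code. It builds a throwaway package named `demo_pdf` (hypothetical) in a temporary directory, with the same deep layout, and imports it both ways:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Write demo_pdf/document/document.py plus __init__.py files that re-export."""
    sub = root / "demo_pdf" / "document"
    sub.mkdir(parents=True)
    # the class itself stays in the deep module
    (sub / "document.py").write_text("class Document:\n    pass\n")
    # re-export one level up, so demo_pdf.document.Document works
    (sub / "__init__.py").write_text("from .document import Document\n")
    # and again at the top, so `from demo_pdf import Document` works
    (root / "demo_pdf" / "__init__.py").write_text("from .document import Document\n")


with tempfile.TemporaryDirectory() as tmp:
    build_demo_package(Path(tmp))
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        flat = importlib.import_module("demo_pdf")
        deep = importlib.import_module("demo_pdf.document.document")
        same_class = flat.Document is deep.Document
    finally:
        sys.path.remove(tmp)

print(same_class)  # True: both import paths name the same class object
```

Because the deep modules remain in place, existing code using the long paths keeps working while new code can use the short one.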

How to change the attributes of a checkbox

Hi, @jorisschellekens. Thank you for an amazing implementation. I am using checkboxes in my PDF, but I need checkboxes with different shapes and sizes. Since this part is still marked as to-do, I don't know which attributes the checkbox class accepts. How can I achieve this?

Markdown to PDF for long document

Hello

Thank you for this great library. I am using the example from the Markdown section in the documentation.

I am trying to convert a Markdown article to PDF, but I am getting:

AssertionError: BlockFlow is too tall to fit inside column / page.

I assume this is because the layout element is too large for the page. What would you suggest for long Markdown documents? My idea was to add each paragraph as a separate layout element, but I am unsure how to do this.
Many thanks
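The splitting half of that idea can be sketched without borb. The hypothetical helper below cuts a Markdown string into blocks at blank lines but keeps fenced code blocks intact, so each block could then be converted and added to the layout separately. The per-block conversion call depends on borb's Markdown API and is not shown. Loose lists, which contain blank lines, would also be split by this simple rule.

```python
from typing import List

FENCE = "`" * 3  # code-fence marker, built indirectly so it cannot end this block


def split_markdown_blocks(text: str) -> List[str]:
    """Split Markdown into blank-line-separated blocks, keeping fenced code intact."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            # a blank line outside a code fence ends the current block
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


print(split_markdown_blocks("# Title\n\nFirst paragraph\n\nSecond paragraph"))
# ['# Title', 'First paragraph', 'Second paragraph']
```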