
tutorcruncher / pydf

PDF generation in python using wkhtmltopdf for heroku and docker

License: MIT License

Languages: Python 68.19%, HTML 26.74%, Makefile 4.07%, Dockerfile 1.00%
Topics: asyncio, docker, heroku, html-to-pdf, pdf, pdf-generation, python, wkhtmltopdf

pydf's People

Contributors: damienalexandre, dependabot[bot], samuelcolvin, tomhamiltonstubber

pydf's Issues

pages

Hi! I love your work. I am trying to make page numbers alternately appear at the bottom left for odd pages and bottom right for even pages. Is there a way to do that?
What I have so far is this...

import pdfkit

wkhtmltopdf_path = 'C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe'
config = pdfkit.configuration(wkhtmltopdf=wkhtmltopdf_path)
options = {
    'footer-center': '~ [page] of [topage] ~',
}

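One way to get alternating positions, sketched under assumptions: wkhtmltopdf's --footer-html option loads an HTML page for each footer and appends the page variables (page, topage and others) to that page's query string, so a small script can choose the alignment per page. The helper below only writes that footer file; write_footer and the file name are hypothetical, and the JavaScript sticks to old syntax because wkhtmltopdf renders with an older QtWebKit.

```python
from pathlib import Path

# Footer page for wkhtmltopdf's --footer-html option. wkhtmltopdf loads it once
# per page with variables such as ?page=3&topage=10 in the query string.
FOOTER_HTML = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<script>
function subst() {
  var vars = {};
  var pairs = document.location.search.substring(1).split('&');
  for (var i = 0; i < pairs.length; i++) {
    var kv = pairs[i].split('=');
    vars[kv[0]] = decodeURIComponent(kv[1] || '');
  }
  var page = parseInt(vars.page, 10);
  var el = document.getElementById('pagenum');
  el.textContent = '~ ' + vars.page + ' of ' + vars.topage + ' ~';
  // odd pages: bottom left, even pages: bottom right
  el.style.textAlign = (page % 2 === 1) ? 'left' : 'right';
}
</script>
</head>
<body style="margin:0" onload="subst()">
<div id="pagenum"></div>
</body>
</html>
"""


def write_footer(path: str) -> str:
    """Write the alternating footer page to `path` and return its absolute path."""
    target = Path(path).resolve()
    target.write_text(FOOTER_HTML, encoding="utf-8")
    return str(target)
```

With pdfkit, the returned path would replace the 'footer-center' entry, e.g. options = {'footer-html': write_footer('footer.html')}, passed to pdfkit.from_string(html, 'out.pdf', options=options, configuration=config).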
Can't use all command arguments via the API

We can set wkhtmltopdf options via HTTP headers,
but I'm afraid that only works for arguments that take a value.

HTTP headers are read here:

for k, v in request.headers.items():
    if k.startswith('Pdf-') or k.startswith('Pdf_'):
        config[k[4:].lower()] = v.lower()

Then in:

pydf/pydf/wkhtmltopdf.py

Lines 33 to 42 in 8f9ce76

def _convert_args(**py_args):
    cmd_args = []
    for name, value in py_args.items():
        if value in {None, False}:
            continue
        arg_name = '--' + name.replace('_', '-')
        if value is True:
            cmd_args.append(arg_name)
        else:
            cmd_args.extend([arg_name, str(value)])

The only way to get a bare flag onto the command line is for the value to be a real True, but every value read from a header is a string.

So via the API we can't get options like --disable-javascript or even --lowquality.
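A minimal sketch of one possible fix, assuming the header values "true" and "false" should be treated as flags: coerce just those two strings to real booleans before they reach the argument converter, and leave every other value alone. coerce_header_value is a hypothetical helper; convert_args repeats the quoted _convert_args logic, with the return statement after the excerpt assumed.

```python
def coerce_header_value(value: str):
    """Turn 'true'/'false' (any case) into booleans; return anything else unchanged.

    Numbers are deliberately not coerced, so a zoom of "1" stays "1" rather
    than becoming a bare --zoom flag.
    """
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return value


def convert_args(**py_args):
    """Same logic as the quoted _convert_args, plus an explicit return."""
    cmd_args = []
    for name, value in py_args.items():
        if value in {None, False}:
            continue  # option omitted entirely
        arg_name = "--" + name.replace("_", "-")
        if value is True:
            cmd_args.append(arg_name)  # bare flag such as --disable-javascript
        else:
            cmd_args.extend([arg_name, str(value)])
    return cmd_args
```

For example, a "Pdf-Disable-Javascript: true" header would then produce --disable-javascript on its own, while "Pdf-Orientation: Landscape" still produces --orientation Landscape.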

Trouble getting Windows version going....

Hi there

It could be that I'm new to Python, but I think I've got plenty of decades of programming to make up for that bit of ignorance.

I was attracted to PYDF because it can run async.

I'm already up and saving PDF files using the PDFKIT module.

I was sorta surprised that I needed to modify the source code to get pydf to run on Windows.

I needed to add an environment variable pointing to the wkhtmltopdf.exe file. OK, not such a bad thing, but it would have been nice to see that in the Windows installation notes. I also changed the wkhtmltopdf.py file because the executable name it used did not have the required .EXE extension at the end.

OK, I got past those initial install problems, but it still crashes.

So I'm taking a moment to log an issue to see whether I've bitten off more than I can chew. I was looking for something off the shelf that I wouldn't need to modify the package source for.

Maybe I didn't read the docs carefully enough?

Sorry to be a moron. It's my third week on Python and GitHub, etc. Still, I've come a very long way!

==========================================

Curious what IDE folks use on these projects. I've been using PyCharm and I LOVE it! Wow, what a treat to have a full debugger with breakpoints and a great variable browser. I tried Thromber, but it didn't work quite as well.

==========================================

Thank you to the authors of this package just the same! And I appreciate anyone that takes the time to answer stupid questions.
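A sketch of how a script could locate the executable on Windows without editing the package: check an explicit environment variable first, then search PATH. The variable name WKHTMLTOPDF_PATH is an assumption here (the installed wkhtmltopdf.py shows which name, if any, pydf actually reads), and find_wkhtmltopdf is a hypothetical helper. shutil.which consults PATHEXT on Windows, so searching for "wkhtmltopdf" also finds wkhtmltopdf.exe.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_wkhtmltopdf(env_var: str = "WKHTMLTOPDF_PATH") -> Optional[str]:
    """Return a usable wkhtmltopdf path, or None if none is found.

    The default env_var name is an assumption, not a confirmed pydf setting.
    """
    # 1. An explicit override wins, but only if it points at a real file.
    override = os.environ.get(env_var)
    if override and Path(override).is_file():
        return override
    # 2. Otherwise search PATH. On Windows, shutil.which also tries the
    #    PATHEXT extensions, so "wkhtmltopdf" matches "wkhtmltopdf.exe".
    return shutil.which("wkhtmltopdf")
```

A typical value for the variable would be the installer location used in the pdfkit snippet earlier, C:/Program Files/wkhtmltopdf/bin/wkhtmltopdf.exe.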

Image isn't rendering

When an image link is present in the HTML, the renderer just ignores it.
HTML:

<html><style>
         @page {
               size: 9.25in 6.25in;
               margin: 0;
           }
 
        @font-face {
          font-family: "Arial";
          src: url("https://f001.backblazeb2.com/file/inkit-cdn/arial.ttf")
            format("truetype");
        }
        
        @font-face {
          font-family: "Times New Roman";
          src: url("https://f001.backblazeb2.com/file/inkit-cdn/times-new-roman.ttf")
            format("truetype");
        }
        
        @font-face {
          font-family: "Courier New";
          src: url("https://f001.backblazeb2.com/file/inkit-cdn/courier-new.ttf")
            format("truetype");
        }
    </style><body style="user-select:none;margin:0"><div style="width:9.25in;height:6.25in;position:relative"><div><div style="position:absolute;left:0in;top:0in;z-index:5002;right:0;bottom:0;padding:0.1875in;background:url(file:///home/user/Projects/services/pdf-renderer/service/eef55f7f-7bed-44c1-ab2d-0d11c61d4fd7.png) no-repeat center center / cover"><div style="position:relative"></div></div></div><div style="position:absolute;left:0.1875in;top:0.1875in;right:0.1875in;bottom:0.1875in"></div></div></body></html>

I also tried a base64 string.
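One possible cause, offered as an assumption rather than a confirmed diagnosis: wkhtmltopdf renders with an older QtWebKit that may not understand the "position / size" form of the background shorthand, in which case the whole declaration is dropped regardless of whether the URL is a file path or base64. The hypothetical helper below expresses the same intent as separate longhand properties. Separately, if the wkhtmltopdf build is 0.12.6 or newer, file:// URLs are blocked by default unless --enable-local-file-access is passed, which through pydf's keyword arguments would be enable_local_file_access=True.

```python
def cover_background_style(image_url: str) -> str:
    """Inline CSS for a centred, non-repeating, cover-sized background.

    Uses longhand properties (plus the -webkit- prefix) instead of the
    `background: url(...) no-repeat center center / cover` shorthand.
    """
    declarations = [
        "background-image:url('{}')".format(image_url),
        "background-repeat:no-repeat",
        "background-position:center center",
        "-webkit-background-size:cover",
        "background-size:cover",
    ]
    return ";".join(declarations)
```

The result would go into the style attribute of the absolutely positioned div in place of the background shorthand.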

Heroku-18 stack failure

pydf is now incompatible with the Heroku-18 stack.

RuntimeError: error running wkhtmltopdf, command: ['--cache-dir', '/tmp/pydf_cache', '--margin-bottom', '5mm', '--margin-top', '5mm', '--orientation', 'Portrait', '--page-size', 'Legal', '-', '-']
response: "/app/.heroku/python/lib/python3.7/site-packages/pydf/bin/wkhtmltopdf: error while loading shared libraries: libXrender.so.1: cannot open shared object file: No such file or directory"

@samuelcolvin are you still maintaining this? thanks....
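To see every missing library at once rather than one error per deploy, a sketch like the one below runs ldd against the bundled binary (the path printed in the error above) and collects the "not found" lines. The helper names are hypothetical, and how to install what it reports depends on the stack, so that part is left out.

```python
import re
import subprocess
from typing import List

# ldd prints unresolved dependencies as e.g. "\tlibXrender.so.1 => not found"
_NOT_FOUND = re.compile(r"^[ \t]*(\S+)[ \t]*=>[ \t]*not found", re.MULTILINE)


def missing_libraries(ldd_output: str) -> List[str]:
    """Return the sonames that ldd reported as 'not found'."""
    return _NOT_FOUND.findall(ldd_output)


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary (Linux only) and list its missing shared libraries."""
    result = subprocess.run(
        ["ldd", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return missing_libraries(result.stdout)
```

For example, check_binary("/app/.heroku/python/lib/python3.7/site-packages/pydf/bin/wkhtmltopdf") on the dyno.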

error in windows

Hi.
I have a script that works fine on Linux (openSUSE).
I tried to use it on a Windows machine and it doesn't work.

I even tried the simple example:
pdf = pydf.generate_pdf('<h1>this is html</h1>')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Miniconda3\lib\site-packages\pydf\wkhtmltopdf.py", line 145, in generate_pdf
    p = _execute_wk(*cmd_args, input=html.encode())
  File "C:\ProgramData\Miniconda3\lib\site-packages\pydf\wkhtmltopdf.py", line 30, in _execute_wk
    return subprocess.run(wk_args, input=input, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\ProgramData\Miniconda3\lib\subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\ProgramData\Miniconda3\lib\subprocess.py", line 707, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Miniconda3\lib\subprocess.py", line 990, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

Note: I translated the last line (OSError ...); my machine reported it in Spanish.

Any trick to avoid this error?
Thanks.

How to render external files?

How do I render external files such as CSS, images and fonts?

ar_template = """
<html>
<head>
	<title>arabic</title>
	<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
	<style type="text/css">
		@font-face {{
  			font-family: "Font";
  			src: url("font.ttf") format("truetype");
		}}
		#ar{{
		    font-family:Font;
			font-size: 36px;
			margin: 30px;
			border: 1px solid black;
		}}
	</style>
</head>
<body>
	<p id="ar" dir="rtl" lang="ar">{content}</p>
</body>
</html>
"""

I referenced an external font file, but it didn't work.
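A likely reason, given that pydf feeds the HTML on stdin (the '-' arguments in the wkhtmltopdf commands quoted in other issues here), is that a relative URL such as "font.ttf" has no base document to resolve against. A sketch of one workaround is to insert an absolute file:// URL built with pathlib; render_ar_html and the font_url placeholder are hypothetical, and the template is a trimmed version of the one above.

```python
import html
from pathlib import Path

AR_TEMPLATE = """
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
  <style type="text/css">
    @font-face {{
      font-family: "Font";
      src: url("{font_url}") format("truetype");
    }}
    #ar {{ font-family: Font; font-size: 36px; }}
  </style>
</head>
<body>
  <p id="ar" dir="rtl" lang="ar">{content}</p>
</body>
</html>
"""


def render_ar_html(content: str, font_path: str) -> str:
    """Fill the template with an absolute file:// URL for the font."""
    font_url = Path(font_path).resolve().as_uri()  # e.g. file:///home/me/font.ttf
    # Escape the text so characters like & or < cannot break the markup.
    return AR_TEMPLATE.format(font_url=font_url, content=html.escape(content))
```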

Consider keeping header case as given?

I'm using pydf with the wkhtmltopdf footer text options (e.g. "footer-right"),
but since every parameter value is lowercased in:

config[k[4:].lower()] = v.lower()

my text content is not passed through to wkhtmltopdf correctly.

Is there a reason for this lower() call on all argument values? It's fine for the key, I guess.

Reproducer

curl -d '<h1>this is html</h1>' -H "pdf-footer-right: UPPERCASE [page]/[topage]" http://localhost:8000/generate.pdf > created.pdf

Result

uppercase 1/1

Expected

UPPERCASE 1/1
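A minimal sketch of the suggested change, assuming the same Pdf-/Pdf_ prefix convention as the quoted loop: lowercase only the option name and keep the value exactly as sent. headers_to_config is a hypothetical standalone version of that loop, taking (name, value) pairs such as request.headers.items().

```python
from typing import Dict, Iterable, Tuple


def headers_to_config(headers: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Collect Pdf-* / Pdf_* headers as wkhtmltopdf options.

    The option name is lowercased; the value is passed through unchanged,
    so footer text keeps its original case.
    """
    config = {}
    for key, value in headers:
        if key.lower().startswith(("pdf-", "pdf_")):
            config[key[4:].lower()] = value
    return config
```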

Error while loading shared libraries: libjpeg.so.8

I can't get pydf working inside Docker.
I'm using Poetry, and I installed packages like this:

FROM python:3.8
RUN mkdir /src
WORKDIR /src
COPY . /src
RUN curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
ENV PATH="/root/.poetry/bin:$PATH"
COPY pyproject.toml poetry.lock /opt/project/
RUN poetry config virtualenvs.create false && \
    poetry install --no-dev

COPY . /opt/project/
RUN poetry install  --no-dev

This is the code where I use pydf:

from pydf import AsyncPydf
from io import BytesIO

apydf = AsyncPydf()

async def pdf_file():
    pdf_content = await apydf.generate_pdf("<h1>hello world</h1>")
    bytes_ = BytesIO(pdf_content)
    bytes_name = "file.pdf"
    return bytes_

It raises this error:

  File "/src/ptime/core/tools.py", line 189, in pdf_file
    pdf_content = await apydf.generate_pdf("<h1>hello world</h1>")
  File "/usr/local/lib/python3.8/site-packages/pydf/wkhtmltopdf.py", line 73, in generate_pdf
    raise RuntimeError('error running wkhtmltopdf, command: {!r}\n'
RuntimeError: error running wkhtmltopdf, command: ['/usr/local/lib/python3.8/site-packages/pydf/bin/wkhtmltopdf', '--cache-dir', '/tmp/pydf_cache', '-', '-']
response: "/usr/local/lib/python3.8/site-packages/pydf/bin/wkhtmltopdf: error while loading shared libraries: libjpeg.so.8: cannot open shared object file: No such file or directory"

How can I fix this?
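A quick check that could run inside the image (for example in a RUN python -c step or at container start) is to ask the dynamic loader for the exact soname from the error. This sketch uses ctypes; the helper names are hypothetical. One plausible reason for the failure, not verified here: the python:3.8 image is Debian-based, and Debian generally ships libjpeg62-turbo (libjpeg.so.62) rather than libjpeg.so.8, so installing a generic libjpeg package may not provide the soname the bundled binary expects.

```python
import ctypes
from typing import List, Sequence

# Exact sonames from the loader errors reported in these issues.
REQUIRED_SONAMES = ("libjpeg.so.8", "libXrender.so.1")


def can_load(soname: str) -> bool:
    """True if the dynamic loader can find and load this exact soname."""
    try:
        ctypes.CDLL(soname)
    except OSError:
        return False
    return True


def missing_sonames(sonames: Sequence[str] = REQUIRED_SONAMES) -> List[str]:
    """Return the sonames from `sonames` that cannot be loaded."""
    return [name for name in sonames if not can_load(name)]
```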

Not working with uvloop

#! /usr/bin/python
import asyncio

import aiohttp
import uvloop
from pydf import AsyncPydf

html = "<html>hello</html>"

uvloop.install() # toggle this line on/off


async def generate_async():
    apydf = AsyncPydf()
    await apydf.generate_pdf(html)


asyncio.run(generate_async())

Error:

RuntimeError: error running wkhtmltopdf, command: ['/home/sevaho/.local/share/virtualenvs/test-hTe5GLvs/lib/python3.8/site-packages/pydf/bin/wkhtmltopdf', '--cache-dir', '/tmp/pydf_cache', '-', '-']
response: "Loading pages (1/6)
QPainter::begin(): Returned false============================] 100%
Error: Unable to write to destination
Exit with code 1, due to unknown error."
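A workaround to try, not a confirmed fix: keep uvloop but call the synchronous pydf.generate_pdf in a worker thread, so wkhtmltopdf is started with the plain subprocess module (as the synchronous tracebacks in other issues show) instead of through the event loop's subprocess support. generate_in_thread is a hypothetical helper; it takes the generator as a parameter so it can be exercised without wkhtmltopdf installed.

```python
import asyncio
import functools
from typing import Callable


async def generate_in_thread(generate: Callable[..., bytes], html: str, **kwargs) -> bytes:
    """Run a blocking PDF generator in the loop's default thread pool.

    The event loop (uvloop or the stdlib one) never manages the child
    process; the worker thread does.
    """
    loop = asyncio.get_running_loop()
    call = functools.partial(generate, html, **kwargs)
    return await loop.run_in_executor(None, call)
```

In the reproducer above, the body of generate_async would become pdf = await generate_in_thread(pydf.generate_pdf, html).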

Why doesn't it load images in the PDF?

Images are not loaded in the rendered PDF.

Here is main.py:

import pydf


html_str = """
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width,initial-scale=1" />
    <title>بن اعتباری</title>
  </head>
  <body>
      <img src="./pizza.jpg" />
  </body>
</html>
"""



pdf = pydf.generate_pdf(html_str)
with open("test_doc.pdf", "wb") as f:
    f.write(pdf)

The image exists in that directory, and everything looks fine in the HTML file.
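Because the HTML is passed as a string, "./pizza.jpg" most likely has no directory to be resolved against. One way around that, sketched below, is to embed the image as a base64 data: URI; image_data_uri is a hypothetical helper, and the MIME type is guessed from the file extension. An absolute file:// URL is the other common option.

```python
import base64
import mimetypes
from pathlib import Path


def image_data_uri(path: str) -> str:
    """Read an image file and return it as a data: URI for an <img src>."""
    mime, _ = mimetypes.guess_type(path)
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return "data:{};base64,{}".format(mime or "application/octet-stream", encoded)
```

In main.py the tag would then be built as '<img src="{}" />'.format(image_data_uri("pizza.jpg")).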

Exception while generating pdf

Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/home/bogdan/Projects/inkit_microservices/services/pdf-renderer/service/consumer.py", line 19, in run
    PDFRenderer(self.campaign_id)
  File "/home/bogdan/Projects/inkit_microservices/services/pdf-renderer/service/renderer.py", line 58, in __init__
    self.ioloop.run_until_complete(self.process_contacts())
  File "/usr/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "/home/bogdan/Projects/inkit_microservices/services/pdf-renderer/service/renderer.py", line 67, in process_contacts
    await asyncio.gather(*tasks)
  File "/home/bogdan/Projects/inkit_microservices/services/pdf-renderer/service/renderer.py", line 100, in render_pdf
    front_pdf = await self.apydf.generate_pdf(front_html)
  File "/home/bogdan/Projects/inkit_microservices/services/pdf-renderer/.env/lib/python3.6/site-packages/pydf/wkhtmltopdf.py", line 65, in generate_pdf
    loop=self.loop
  File "/usr/lib/python3.6/asyncio/subprocess.py", line 225, in create_subprocess_exec
    stderr=stderr, **kwds)
  File "/usr/lib/python3.6/asyncio/base_events.py", line 1194, in subprocess_exec
    bufsize, **kwargs)
  File "/usr/lib/python3.6/asyncio/unix_events.py", line 203, in _make_subprocess_transport
    self._child_watcher_callback, transp)
  File "/usr/lib/python3.6/asyncio/unix_events.py", line 867, in add_child_handler
    "Cannot add child handler, "
RuntimeError: Cannot add child handler, the child watcher does not have a loop attached
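The traceback is from Python 3.6, whose default Unix child watcher has to be attached to an event loop created in the main thread, so asyncio subprocesses started from a loop in a worker thread fail like this. Python 3.8 made ThreadedChildWatcher the default, which has no such requirement. The sketch below, assuming Python 3.8 or newer, starts a subprocess from a fresh event loop inside a thread; run_child and run_in_worker_thread are hypothetical names, and the child is the current Python interpreter standing in for wkhtmltopdf.

```python
import asyncio
import sys
import threading
from typing import List


async def run_child() -> bytes:
    """Start a child process via asyncio and return its stdout."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('ok')",
        stdout=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout


def run_in_worker_thread() -> bytes:
    """Run run_child() on a new event loop owned by a separate thread."""
    results: List[bytes] = []

    def worker() -> None:
        # asyncio.run creates, uses and closes a loop local to this thread
        results.append(asyncio.run(run_child()))

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results[0]
```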

OSError: [Errno 8] Exec format error

Not sure if this library is still supported, but FWIW:

Python 3.6.2 (default, Jul 17 2017, 16:44:45) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pydf

In [2]: p = pydf.generate_pdf('http://www.google.com')
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-2-911a3ffb27a6> in <module>()
----> 1 p = pydf.generate_pdf('http://www.google.com')

~/.virtualenvs/barberscore-api/lib/python3.6/site-packages/pydf/wkhtmltopdf.py in generate_pdf(html, cache_dir, grayscale, lowquality, margin_bottom, margin_left, margin_right, margin_top, orientation, page_height, page_width, page_size, image_dpi, image_quality, **extra_kwargs)
    143     cmd_args = _convert_args(**py_args)
    144 
--> 145     p = _execute_wk(*cmd_args, input=html.encode())
    146     pdf_content = p.stdout
    147 

~/.virtualenvs/barberscore-api/lib/python3.6/site-packages/pydf/wkhtmltopdf.py in _execute_wk(input, *args)
     28     """
     29     wk_args = (WK_PATH,) + args
---> 30     return subprocess.run(wk_args, input=input, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
     31 
     32 

/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    401         kwargs['stdin'] = PIPE
    402 
--> 403     with Popen(*popenargs, **kwargs) as process:
    404         try:
    405             stdout, stderr = process.communicate(input, timeout=timeout)

/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    705                                 c2pread, c2pwrite,
    706                                 errread, errwrite,
--> 707                                 restore_signals, start_new_session)
    708         except:
    709             # Cleanup if the child failed starting.

/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
   1331                             else:
   1332                                 err_msg += ': ' + repr(orig_executable)
-> 1333                     raise child_exception_type(errno_num, err_msg)
   1334                 raise child_exception_type(err_msg)
   1335 

OSError: [Errno 8] Exec format error
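Both this error and WinError 193 in the Windows reports are what the OS says when asked to run a binary built for another platform. Assuming the wkhtmltopdf shipped in pydf's bin directory is a Linux build (the package targets Heroku and Docker), a quick way to confirm is to read the first bytes of the file. executable_format is a hypothetical helper that recognises a few common magic numbers.

```python
from pathlib import Path

# Leading bytes of common executable formats.
_MAGIC = [
    (b"\x7fELF", "ELF (Linux)"),
    (b"\xcf\xfa\xed\xfe", "Mach-O 64-bit (macOS)"),
    (b"\xca\xfe\xba\xbe", "Mach-O universal (macOS); Java class files share this magic"),
    (b"MZ", "PE (Windows .exe)"),
]


def executable_format(path) -> str:
    """Guess an executable's platform from its first four bytes."""
    with Path(path).open("rb") as f:
        head = f.read(4)
    for magic, name in _MAGIC:
        if head.startswith(magic):
            return name
    return "unknown"
```

For example, executable_format(Path(pydf.__file__).parent / "bin" / "wkhtmltopdf"), matching the bin/wkhtmltopdf path in the tracebacks.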

OSError - %1 is not a valid Win32 application

I am running the basic example as follows:

import pydf
pdf = pydf.generate_pdf('<h1>this is html</h1>')
with open('test_doc.pdf', 'wb') as f:
    f.write(pdf)

and I get the following error:

Traceback (most recent call last):
  File "pdytest.py", line 2, in <module>
    pdf = pydf.generate_pdf('<h1>this is html</h1>')
  File "C:\Program Files\Python37\lib\site-packages\pydf\wkhtmltopdf.py", line 145, in generate_pdf
    p = _execute_wk(*cmd_args, input=html.encode())
  File "C:\Program Files\Python37\lib\site-packages\pydf\wkhtmltopdf.py", line 30, in _execute_wk
    return subprocess.run(wk_args, input=input, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Program Files\Python37\lib\subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Program Files\Python37\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files\Python37\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application

I am using python 3.7.
Could you please help?
Thank you.

Unable to write to destination on Linux — tmp dir should be a file

When running the following:

async def show(request, response):
    response.body = await pydf.AsyncPydf().generate_pdf(
        my_html, print_media_type=True
    )

it works well on macOS, but I get the following error on Linux:

Loading pages (1/6)
QPainter::begin(): Returned false============================] 100%
Error: Unable to write to destination
Exit with code 1, due to unknown error.

This is probably due to the following difference in behavior:

wkhtmltopdf http://google.com /tmp/pydf_cache

works on macOS but returns the error above on Linux.

To work around this when calling wkhtmltopdf directly:

wkhtmltopdf http://google.com /tmp/pydf_cache/myfile

I suppose pydf should create a temporary file instead of writing to the tmp folder, so that it works cross-platform.

Is there any option I missed?
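If the destination is indeed the problem, a sketch of the suggestion above is to call wkhtmltopdf directly with a file inside a fresh temporary directory as the output, then read the bytes back. Note that the commands quoted in other issues pass /tmp/pydf_cache only as --cache-dir and write the PDF to stdout ('-'), so the directory-as-destination theory may not be the whole story. This bypasses pydf entirely; build_command and html_to_pdf_via_tempfile are hypothetical names, and wk_path must point at a working wkhtmltopdf binary.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import List, Sequence


def build_command(wk_path: str, out_file: Path, extra_args: Sequence[str] = ()) -> List[str]:
    """wkhtmltopdf [options] - <out_file>: read HTML from stdin, write to a file."""
    return [wk_path, *extra_args, "-", str(out_file)]


def html_to_pdf_via_tempfile(html: str, wk_path: str = "wkhtmltopdf",
                             extra_args: Sequence[str] = ()) -> bytes:
    with tempfile.TemporaryDirectory() as tmp:
        out_file = Path(tmp) / "output.pdf"  # a file path, never a bare directory
        subprocess.run(
            build_command(wk_path, out_file, extra_args),
            input=html.encode(),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,  # raise CalledProcessError on a non-zero exit
        )
        return out_file.read_bytes()
```

The handler above would then call html_to_pdf_via_tempfile(my_html, extra_args=["--print-media-type"]), ideally through run_in_executor so the blocking call stays off the event loop.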

how to build wkhtmltopdf from source

Dear Samuel Colvin,

I am using your library to generate PDF files. However, I found a bug when generating this link:
https://www.highcharts.com/blog/news/175-highcharts-performance-boost.

Error response: "wkhtmltopdf: /home/sysadmin/wkhtmltopdf/qt/src/3rdparty/harfbuzz/src/harfbuzz-shaper.cpp:484: void HB_HeuristicSetGlyphAttributes(HB_ShaperItem*): Assertion `glyph_pos == item->num_glyphs' failed."

I think the library needs a new patch; please refer to this fix: ariya/phantomjs#11513. Two files, harfbuzz-hebrew.c and harfbuzz-shaper.cpp, need to be fixed.

Could you give me some guidance on how to rebuild the wkhtmltopdf binary for AWS Lambda, or, if you don't mind, fix this bug and upload a new wkhtmltopdf binary?

Thank you very much and I am looking forward to your reply.

Best Regards,
Andy
