alvarcarto / url-to-pdf-api

Web page PDF/PNG rendering done right. Self-hosted service for rendering receipts, invoices, or any content.

License: MIT License

URL to PDF Microservice

Web page PDF rendering done right. Microservice for rendering receipts, invoices, or any content. Packaged to an easy API.

⚠️ WARNING ⚠️ Don't serve this API publicly to the internet unless you are aware of the risks. It allows API users to run any JavaScript code inside a Chrome session on the server. It's fairly easy to expose the contents of files on the server. You have been warned! See #12 for background.

⭐️ Features:

Converts any URL or HTML content to a PDF file or an image (PNG/JPEG)
Rendered with Headless Chrome, using Puppeteer. The PDFs should match those generated with desktop Chrome.
Sensible defaults but everything is configurable.
Single-page app (SPA) support. Waits until all network requests are finished before rendering.
Easy deployment to Heroku with the Deploy to Heroku button. We love Lambda, but...
Renders lazy loaded elements. (scrollPage option)
Supports optional x-api-key authentication. (API_TOKENS env var)

Usage is as simple as https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com. There's also a POST /api/render if you prefer to send options in the body.

🔍 Why?

This microservice is useful when you need to automatically produce PDF files for whatever reason. The files could be receipts, weekly reports, invoices, or any content.

PDFs can be generated in many ways, but one of them is to convert HTML+CSS content to a PDF. This API does just that.

🚀 Shortcuts:

Examples
API
I want to run this myself

How it works

Local setup is identical, except that the Express API runs on your machine and requests connect to it directly.

Good to know

By default, the page's @media print CSS rules are ignored. We set Chrome to emulate @media screen to make the default PDFs look more like actual sites. To get results closer to desktop Chrome, add the &emulateScreenMedia=false query parameter. See more at Puppeteer API docs.
Chrome is launched with the --no-sandbox --disable-setuid-sandbox flags to make it work on Heroku. See this issue.
Heavy pages may cause Chrome to crash if the server doesn't have enough RAM.
Docker image for this can be found here: https://github.com/restorecommerce/pdf-rendering-srv

Examples

⚠️ Restrictions ⚠️:

For security reasons the urls have been restricted and HTML rendering is disabled. For full demo, run this app locally or deploy to Heroku.
The demo Heroku app runs on a free dyno, which sleeps when idle. A request to a sleeping dyno may take up to 30 seconds.

The most minimal example, render google.com

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com

The most minimal example, render google.com as PNG image

https://url-to-pdf-api.herokuapp.com/api/render?output=screenshot&url=http://google.com

Use the default @media print instead of @media screen.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&emulateScreenMedia=false

Use scrollPage=true which tries to reveal all lazy loaded elements. Not perfect but better than without.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://www.andreaverlicchi.eu/lazyload/demos/lazily_load_lazyLoad.html&scrollPage=true

Render only the first page.

https://url-to-pdf-api.herokuapp.com/api/render?url=https://en.wikipedia.org/wiki/Portable_Document_Format&pdf.pageRanges=1

Render A5-sized PDF in landscape.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&pdf.format=A5&pdf.landscape=true

Add 2cm margins to the PDF.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&pdf.margin.top=2cm&pdf.margin.right=2cm&pdf.margin.bottom=2cm&pdf.margin.left=2cm

Wait for extra 1000ms before render.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&waitFor=1000

Download the PDF with a given attachment name

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&attachmentName=google.pdf

Wait until an element matching the selector input appears.

https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com&waitFor=input

Render HTML sent in JSON body

NOTE: Demo app has disabled html rendering for security reasons.

curl -o html.pdf -XPOST -d'{"html": "<body>test</body>"}' -H"content-type: application/json" http://localhost:9000/api/render

Render HTML sent as text body

NOTE: Demo app has disabled html rendering for security reasons.

curl -o html.pdf -XPOST -d@test/resources/large.html -H"content-type: text/html" http://localhost:9000/api/render

API

To understand the API options, it's useful to know how Puppeteer is internally used by this API. The render code is quite simple, check it out. Render flow:

1. page.setViewport(options) where options matches viewport.*.
2. Possibly page.emulateMedia('screen') if emulateScreenMedia=true is set.
3. Render url or html.
   If url is defined, page.goto(url, options) is called and options match goto.*. Otherwise page.setContent(html, options) is called where html is taken from request body, and options match goto.*.
4. Possibly page.waitFor(numOrStr) if e.g. waitFor=1000 is set.
5. Possibly scroll the whole page to the end before rendering if e.g. scrollPage=true is set.
   Useful if you want to render a page which lazy loads elements.
6. Render the output.
   If output is pdf, rendering is done with page.pdf(options) where options matches pdf.*.
   Else if output is screenshot, rendering is done with page.screenshot(options) where options matches screenshot.*.
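
The render flow above can be sketched roughly as follows. This is a minimal illustration and not the project's actual render code: renderPage is a hypothetical name, opts is assumed to have the same shape as the JSON body of POST /api/render, and page is assumed to be a Puppeteer Page-like object. The Puppeteer calls are the ones named in this README; some of them (page.emulateMedia, page.waitFor) may be named differently or be missing in newer Puppeteer releases. The scroll step is deliberately simplified to a single jump to the bottom.

```javascript
// Hypothetical sketch of the render flow; `page` is a Puppeteer Page-like object.
async function renderPage(page, opts) {
  // 1. Viewport options come from viewport.*
  await page.setViewport(opts.viewport);

  // 2. Optionally emulate @media screen instead of @media print
  if (opts.emulateScreenMedia) {
    await page.emulateMedia('screen');
  }

  // 3. Load either the URL or the raw HTML; both receive the goto.* options
  if (opts.url) {
    await page.goto(opts.url, opts.goto);
  } else {
    await page.setContent(opts.html, opts.goto);
  }

  // 4. Optional extra wait (milliseconds or a selector)
  if (opts.waitFor !== undefined && opts.waitFor !== null) {
    await page.waitFor(opts.waitFor);
  }

  // 5. Optional scroll to trigger lazy loading (simplified: one jump to the end)
  if (opts.scrollPage) {
    await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
  }

  // 6. Produce the output
  if (opts.output === 'screenshot') {
    return page.screenshot(opts.screenshot);
  }
  return page.pdf(opts.pdf);
}
```

Because the function only depends on the methods it calls, it can be exercised with a stub object in place of a real browser page.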

GET /api/render

All options are passed as query parameters. Parameter names match Puppeteer options.

These options are exactly the same as in the POST counterpart, but nested options are expressed with dot notation, e.g. ?pdf.scale=2 instead of { pdf: { scale: 2 }}.

The only required parameter is url.

Parameter	Type	Default	Description
url	string	-	URL to render as PDF. (required)
output	string	pdf	Specify the output format. Possible values: `pdf`, `screenshot` or `html`.
emulateScreenMedia	boolean	`true`	Emulates `@media screen` when rendering the PDF.
enableGPU	boolean	`false`	When set, enables the Chrome GPU. For Windows users, this is always false. See https://developers.google.com/web/updates/2017/04/headless-chrome
ignoreHttpsErrors	boolean	`false`	Ignores possible HTTPS errors when navigating to a page.
scrollPage	boolean	`false`	Scroll page down before rendering to trigger lazy loading elements.
waitFor	number or string	-	Number of milliseconds to wait before rendering, or a selector for an element to wait for before rendering.
attachmentName	string	-	When set, the `content-disposition` headers are set and browser will download the PDF instead of showing inline. The given string will be used as the name for the file.
viewport.width	number	`1600`	Viewport width.
viewport.height	number	`1200`	Viewport height.
viewport.deviceScaleFactor	number	`1`	Device scale factor (could be thought of as dpr).
viewport.isMobile	boolean	`false`	Whether the meta viewport tag is taken into account.
viewport.hasTouch	boolean	`false`	Specifies if viewport supports touch events.
viewport.isLandscape	boolean	`false`	Specifies if viewport is in landscape mode.
cookies[0][name]	string	-	Cookie name (required)
cookies[0][value]	string	-	Cookie value (required)
cookies[0][url]	string	-	Cookie url
cookies[0][domain]	string	-	Cookie domain
cookies[0][path]	string	-	Cookie path
cookies[0][expires]	number	-	Cookie expiry in unix time
cookies[0][httpOnly]	boolean	-	Cookie httpOnly
cookies[0][secure]	boolean	-	Cookie secure
cookies[0][sameSite]	string	-	`Strict` or `Lax`
goto.timeout	number	`30000`	Maximum navigation time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout.
goto.waitUntil	string	`networkidle0`	When to consider navigation succeeded. Options: `load`, `domcontentloaded`, `networkidle0`, `networkidle2`. `load` - consider navigation to be finished when the load event is fired. `domcontentloaded` - consider navigation to be finished when the `DOMContentLoaded` event is fired. `networkidle0` - consider navigation to be finished when there are no more than 0 network connections for at least `500` ms. `networkidle2` - consider navigation to be finished when there are no more than 2 network connections for at least `500` ms.
pdf.scale	number	`1`	Scale of the webpage rendering.
pdf.printBackground	boolean	`false`	Print background graphics.
pdf.displayHeaderFooter	boolean	`false`	Display header and footer.
pdf.headerTemplate	string	-	HTML template to use as the header of each page in the PDF. Currently Puppeteer basically only supports a single line of text and you must use pdf.margins+CSS to make the header appear! See #77.
pdf.footerTemplate	string	-	HTML template to use as the footer of each page in the PDF. Currently Puppeteer basically only supports a single line of text and you must use pdf.margins+CSS to make the footer appear! See #77.
pdf.landscape	boolean	`false`	Paper orientation.
pdf.pageRanges	string	-	Paper ranges to print, e.g., '1-5, 8, 11-13'. Defaults to the empty string, which means print all pages.
pdf.format	string	`A4`	Paper format. If set, takes priority over width or height options.
pdf.width	string	-	Paper width, accepts values labeled with units.
pdf.height	string	-	Paper height, accepts values labeled with units.
pdf.fullPage	boolean	-	Create PDF in a single page
pdf.margin.top	string	-	Top margin, accepts values labeled with units.
pdf.margin.right	string	-	Right margin, accepts values labeled with units.
pdf.margin.bottom	string	-	Bottom margin, accepts values labeled with units.
pdf.margin.left	string	-	Left margin, accepts values labeled with units.
screenshot.fullPage	boolean	`true`	When true, takes a screenshot of the full scrollable page.
screenshot.type	string	`png`	Screenshot image type. Possible values: `png`, `jpeg`
screenshot.quality	number	-	The quality of the JPEG image, between 0-100. Only applies when `screenshot.type` is `jpeg`.
screenshot.omitBackground	boolean	`false`	Hides default white background and allows capturing screenshots with transparency.
screenshot.clip.x	number	-	Specifies x-coordinate of top-left corner of clipping region of the page.
screenshot.clip.y	number	-	Specifies y-coordinate of top-left corner of clipping region of the page.
screenshot.clip.width	number	-	Specifies width of clipping region of the page.
screenshot.clip.height	number	-	Specifies height of clipping region of the page.
screenshot.selector	string	-	Specifies css selector to clip the screenshot to.

Example:

curl -o google.pdf https://url-to-pdf-api.herokuapp.com/api/render?url=http://google.com
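
If the options already exist as a nested object (the shape used by the POST endpoint), the dot-notation query string can be generated instead of written by hand. The sketch below is one possible way to do it; flattenOptions and buildRenderUrl are hypothetical helper names, not part of this project. Arrays are skipped, so the bracket-style cookies[0][name] parameters are not covered.

```javascript
// Flatten { pdf: { scale: 2 } } into [['pdf.scale', '2']].
// Hypothetical helper; arrays (e.g. cookies) are intentionally skipped.
function flattenOptions(options, prefix = '') {
  const pairs = [];
  for (const [key, value] of Object.entries(options)) {
    if (value === undefined || value === null || Array.isArray(value)) {
      continue;
    }
    const name = prefix ? `${prefix}.${key}` : key;
    if (typeof value === 'object') {
      pairs.push(...flattenOptions(value, name));
    } else {
      pairs.push([name, String(value)]);
    }
  }
  return pairs;
}

// Build a GET /api/render URL from a base URL and nested options.
function buildRenderUrl(baseUrl, options) {
  const query = flattenOptions(options)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join('&');
  return `${baseUrl}/api/render?${query}`;
}

const example = buildRenderUrl('http://localhost:9000', {
  url: 'http://google.com',
  pdf: { format: 'A5', landscape: true, margin: { top: '2cm' } },
});
console.log(example);
```

Values are percent-encoded with encodeURIComponent, which matters when the target url itself contains a query string.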

POST /api/render - (JSON)

All options are passed in a JSON body object. Parameter names match Puppeteer options.

These options are exactly the same as its GET counterpart.

Body

Either url or html is required; all other parameters are optional.

{
  // Url to render. Either url or html is required
  url: "https://google.com",

  // Either "pdf" or "screenshot"
  output: "pdf",

  // HTML content to render. Either url or html is required
  html: "<html><head></head><body>Your content</body></html>",

  // If we should emulate @media screen instead of print
  emulateScreenMedia: true,

  // If we should ignore HTTPS errors
  ignoreHttpsErrors: false,

  // If true, page is scrolled to the end before rendering
  // Note: this makes rendering a bit slower
  scrollPage: false,

  // Passed to Puppeteer page.waitFor()
  waitFor: null,

  // Passed to Puppeteer page.setCookie()
  cookies: [{ ... }],

  // Passed to Puppeteer page.setViewport()
  viewport: { ... },

  // Passed to Puppeteer page.goto() as the second argument after url
  goto: { ... },

  // Passed to Puppeteer page.pdf()
  pdf: { ... },

  // Passed to Puppeteer page.screenshot()
  screenshot: { ... },
}

Example:

curl -o google.pdf -XPOST -d'{"url": "http://google.com"}' -H"content-type: application/json" http://localhost:9000/api/render

curl -o html.pdf -XPOST -d'{"html": "<body>test</body>"}' -H"content-type: application/json" http://localhost:9000/api/render
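
The same JSON request can be prepared from Node. The sketch below only builds the request pieces and sends nothing; buildJsonRenderRequest is a hypothetical helper name. The x-api-key header is included on the assumption that the optional API_TOKENS authentication mentioned in the feature list expects the key in a header of that name.

```javascript
// Build the pieces of a JSON POST to /api/render (hypothetical helper).
// Nothing is sent here; pass the result to an HTTP client of your choice.
function buildJsonRenderRequest(baseUrl, options, apiKey) {
  if (!options.url && !options.html) {
    throw new Error('Either url or html is required');
  }
  const headers = { 'content-type': 'application/json' };
  if (apiKey) {
    // Assumption: the key travels in the x-api-key header
    headers['x-api-key'] = apiKey;
  }
  return {
    url: `${baseUrl}/api/render`,
    method: 'POST',
    headers,
    body: JSON.stringify(options),
  };
}

const request = buildJsonRenderRequest(
  'http://localhost:9000',
  { html: '<body>test</body>', pdf: { format: 'A4' } }
);
console.log(request.method, request.url);
```

Serializing the body with JSON.stringify and setting content-type to application/json mirrors what the curl examples do with -d and -H.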

POST /api/render - (HTML)

HTML to render is sent in the body. All options are passed in query parameters. Supports exactly the same query parameters as GET /api/render, except the url parameter.

Remember that relative links do not work.

Example:

curl -o receipt.html https://rawgit.com/wildbit/postmark-templates/master/templates_inlined/receipt.html
curl -o html.pdf -XPOST -d@receipt.html -H"content-type: text/html" http://localhost:9000/api/render?pdf.scale=1

GET /healthcheck

Health check endpoint used for monitoring if the service is still up and running.

curl -XGET http://localhost:9000/healthcheck

Development

To get this thing running, you have two options: run it in Heroku, or locally.

The code requires Node 8+ (async, await).

1. Heroku deployment

Scroll up this README to the Deploy to Heroku button. Click it and follow the instructions.

WARNING: Heroku dynos have a very low amount of RAM. Rendering heavy pages may cause the Chrome instance to crash inside a Heroku dyno. 512MB should be enough for most real-life use cases such as receipts. Some news sites may need as much as 2GB of RAM.

2. Local development

First, clone the repository and cd into it.

cp .env.sample .env
Fill in the blanks in .env
npm install
npm start (starts the Express server locally)
Server runs at http://localhost:9000 or whatever port the $PORT env variable defines

Techstack

Node 8+ (async, await), written in ES7
Express.js app with a nice internal architecture, based on these conventions.
Hapi-style Joi validation with express-validation
Heroku + Puppeteer buildpack
Puppeteer to control Chrome

url-to-pdf-api's Issues

Font weight ignored

I have an issue: whatever font-weight property I set, it's being ignored. When I open the HTML in Chrome it looks fine, but when I generate a PDF from it everything has 'regular' font weight. Has anyone experienced this issue? Is there a workaround?

Add support to 0 timeout requests

As written inside README.md, you can pass the value 0 to goto.timeout when performing a request to /api/render. However, this is supported from puppeteer 0.12.0, while this project currently uses 0.11.0.
I've seen that there is a branch for updating to the newest puppeteer, so this is maybe a non-issue (but the README could be updated before we actually merge the new branch)

Do not support Chinese Website

Such as Baidu, a search engine like google in China. It cannot parse Chinese characters on this web page.
https://url-to-pdf-api.herokuapp.com/api/render?url=http://www.baidu.com

Remove '--no-sandbox --disable-setuid-sandbox' flags

These options make Chrome more vulnerable. See: puppeteer/puppeteer#290

ERR_CERT_AUTHORITY_INVALID

In my corporation we have self-signed certs, which causes errors to be thrown. How do I disable SSL?

2017-10-11T20:44:32.919Z - info: [pdf-core.js] Set browser viewport..
2017-10-11T20:44:32.920Z - info: [pdf-core.js] Emulate @media screen..
2017-10-11T20:44:32.921Z - info: [pdf-core.js] Goto url http://google.com ..
2017-10-11T20:44:33.689Z - error: [pdf-core.js] Error when rendering page: Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
2017-10-11T20:44:33.689Z - error: [pdf-core.js] Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
    at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7)
2017-10-11T20:44:33.690Z - info: [pdf-core.js] Closing browser..
2017-10-11T20:44:33.708Z - error: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-11T20:44:33.708Z - error: [error-logger.js] Request parameters:
2017-10-11T20:44:33.709Z - error: [error-logger.js] Request body:
2017-10-11T20:44:33.710Z - error: [error-logger.js] Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID
    at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:188:7) 'Error: SSL Certificate error: ERR_CERT_AUTHORITY_INVALID\n    at NavigatorWatcher.waitForNavigation (/usr/src/app/node_modules/puppeteer/lib/NavigatorWatcher.js:73:20)\n    at <anonymous>\n    at process._tickCallback (internal/process/next_tick.js:188:7)'
GET /api/render?url=http://google.com&pdf.margin.top=2cm&pdf.margin.right=2cm&pdf.margin.bottom=2cm&pdf.margin.left=2cm 500 1021.139 ms - -

Run through Chrome Dom Distiller

The Dom Distiller is a nice tool to remove clutter on web pages: https://github.com/chromium/dom-distiller
It would be great to be able to use it.

https bug?

do we support screenshot with oauth2 or other login system?

Chrome tab pooling

Each request starts a new instance with Puppeteer. We should use a pool of e.g. 4 tabs to make rendering as a service more reliable.

See: puppeteer/puppeteer#518
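
One way to picture the pooling idea is a small generic resource pool that lets at most N tasks run at once. The sketch below is an illustration under assumptions, not code from this project: ResourcePool is a hypothetical class, and in a real setup the resources would presumably be Puppeteer pages (tabs) created up front.

```javascript
// Minimal resource pool: at most `resources.length` tasks run concurrently.
// Hypothetical sketch; resources stand in for pre-opened browser tabs.
class ResourcePool {
  constructor(resources) {
    this.idle = [...resources];
    this.waiters = [];
  }

  // Resolve with an idle resource, or wait until one is released.
  acquire() {
    if (this.idle.length > 0) {
      return Promise.resolve(this.idle.pop());
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hand the resource to the next waiter, or mark it idle.
  release(resource) {
    const next = this.waiters.shift();
    if (next) {
      next(resource);
    } else {
      this.idle.push(resource);
    }
  }

  // Run `task` with a pooled resource and always give it back.
  async use(task) {
    const resource = await this.acquire();
    try {
      return await task(resource);
    } finally {
      this.release(resource);
    }
  }
}
```

A render handler would then wrap its work in pool.use((tab) => ...), so a burst of requests queues up instead of launching one browser per request. Resetting tab state between uses (cookies, navigation) is not handled here.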

how to generate a PDF with automatic height?

I use this Puppeteer microservice to generate receipts in PDF. For each receipt, width is always the same, but height changes, according to the article count in the order.

For now, I'm using the article count to approximate the required height for my receipt. It kind of works, but it's not perfect and is a dirty way to do it.
Is there a way to tell the Puppeteer API: "Please automatically find the right PDF height, according to the HTML body height, in order to generate a perfectly sized PDF"?
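
One approach that might work when calling Puppeteer directly is to measure the rendered document height in the page and pass it to page.pdf as an explicit height. The sketch below assumes a Puppeteer Page-like object; pdfWithContentHeight is a hypothetical name. The measured height comes from the screen layout, so the result may be slightly off if print styles change the layout.

```javascript
// Measure the document height in the page, then request a PDF of that height.
// Hypothetical sketch; `page` is assumed to be a Puppeteer Page-like object.
async function pdfWithContentHeight(page, width) {
  const height = await page.evaluate(
    () => document.documentElement.scrollHeight
  );
  // page.pdf accepts width/height as strings labeled with units
  return page.pdf({ width, height: `${height}px` });
}
```

With this microservice as it stands, the equivalent would be computing the height on the caller's side and sending it as pdf.height together with pdf.width.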

Feature request: Support rendering images

Hi,

First of all, thanks for this awesome project. It seems to be really well thought-out, so thank you for your efforts. I also really like the ability to render logged in pages by setting a cookie in the POST request.

Since you are using Puppeteer, which also supports rendering pages to images via "screenshot", it would be possible to render images as well. Is this something you're interested in? We have some users who would like this, for example for dashboards that are displayed on monitors.

URL rendering itself

I think the following shouldn't be allowed as it might put load on the server:

https://url-to-pdf-api.herokuapp.com/api/render?url=https://url-to-pdf-api.herokuapp.com/api/render?url=https://google.com
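
A simple guard against this kind of request could compare the host of the target URL with the hosts the service itself answers on. The sketch below is an assumption about how such a check might look; isSelfReferencing and the serviceHosts list are hypothetical. It does not catch a third-party URL that redirects back to the service.

```javascript
// Return true when targetUrl points back at the rendering service itself.
// Hypothetical helper; `serviceHosts` lists the service's own host names.
function isSelfReferencing(targetUrl, serviceHosts) {
  let host;
  try {
    host = new URL(targetUrl).host.toLowerCase();
  } catch (err) {
    // Not a valid absolute URL; leave that case to normal validation
    return false;
  }
  return serviceHosts.some((h) => h.toLowerCase() === host);
}

const ownHosts = ['url-to-pdf-api.herokuapp.com'];
console.log(isSelfReferencing(
  'https://url-to-pdf-api.herokuapp.com/api/render?url=https://google.com',
  ownHosts
));
```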

Internal Server Error

Some requests to the demo Heroku app return:

{
  status: 500,
  statusText: "Internal Server Error",
  messages: [
    "Internal Server Error"
  ]
}

API key authentication

Hi folks! Could someone please point me to some documentation on how to do API key authentication. There's mention of it in the README, but no instructions yet, and I didn't see anything relevant in the Puppeteer docs. Any help appreciated!

'https://url-to-pdf-api.herokuapp.com' doesn't work!

grayscale

How can I convert my HTML to a grayscale PDF?

Security issues

It's easy to make Chrome display any file:// link. A couple of ways:

Redirect
window.location.href

Let's figure out if we could have a few ways in Puppeteer to block as many of these as possible. In any case, I'm quite confident that it's not possible to catch all of them. I would definitely recommend serving this API only for "trusted" users, e.g. inside your organization.
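
As a first line of defense, the requested URL's scheme can be checked before Chrome ever sees it. The sketch below is a hypothetical helper (isAllowedUrl is not part of this project) that accepts only https, plus http when explicitly allowed. As noted above, it cannot stop a page from redirecting or setting window.location.href to a file:// link after navigation starts.

```javascript
// Allow only https: targets, and http: when explicitly enabled.
// Hypothetical helper; blocks file:, javascript:, data: and similar schemes.
function isAllowedUrl(targetUrl, allowHttp = false) {
  let protocol;
  try {
    protocol = new URL(targetUrl).protocol;
  } catch (err) {
    return false; // not a valid absolute URL
  }
  if (protocol === 'https:') {
    return true;
  }
  return allowHttp && protocol === 'http:';
}

console.log(isAllowedUrl('file:///etc/passwd'));
console.log(isAllowedUrl('https://google.com'));
```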

can i use the css style 'print' to set the page pdf style

I want the content to appear on a separate page, not with other content

Only HTTPS allowed?

http://localhost:9000/api/render?url=http://google.com


2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request parameters:
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Request body:
2017-10-05T16:05:58.491Z - warn: [error-logger.js] Error: Only HTTPS allowed.
GET /api/render?url=https://google.com 403 0.824 ms - 74

Improve error handling

Puppeteer await calls are not throwing all errors. Some errors can only be caught from the page.on('error', cb) callback. We should be able to surface these errors better in the responses. Currently almost all errors except validation errors are 500 Internal Server Error. The only place to see what happened is the application logs.

Font size decrease when pdf.width and pdf.height parameters are passed.

If the height and width parameters are passed while rendering an HTML page, the font size is somehow reduced, but the size of the content boxes is not affected.
Is this the expected result? If not, is there any solution (any flag) to make sure the PDF rendering does not affect any applied styles (CSS)?

Upgrade Puppeteer to latest

Hi, please upgrade Puppeteer to the latest version (1.2.0).
Thank you.

scrollPage bug

There are some websites, such as example, that use a special lazy loading strategy.

When users scroll quickly (<300ms), the images are not loaded. They load only when users stop to look at the content.

So I think we need another option (scrollInterval) to let users test and choose the interval.

Related discussions:

puppeteer/puppeteer#338 (comment)

Thanks!
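
The proposed scrollInterval behaviour could be sketched as stepping through the page one viewport at a time and pausing between steps. This is only an illustration of the idea: scrollInSteps and its parameters are hypothetical, and the actual scrolling is injected as a scrollTo callback so the pacing logic does not depend on a browser.

```javascript
// Scroll in viewport-sized steps, pausing `intervalMs` after each step.
// Hypothetical sketch; `scrollTo(y)` performs the real scrolling.
async function scrollInSteps({ totalHeight, viewportHeight, intervalMs, scrollTo }) {
  if (viewportHeight <= 0) {
    throw new Error('viewportHeight must be positive');
  }
  const positions = [];
  for (let y = 0; y < totalHeight; y += viewportHeight) {
    positions.push(y);
    await scrollTo(y);
    // Give lazy loaders time to notice the new scroll position
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return positions;
}
```

With Puppeteer, scrollTo could be something like (y) => page.evaluate((top) => window.scrollTo(0, top), y), and an interval above the site's threshold (more than 300ms in the case described) would be passed as intervalMs.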

Searching for maintainers

Hi,

I'm searching for a few helping hands with the maintenance. This repo is definitely on my top open source maintenance priorities and I'll continue to be a maintainer also but I haven't had enough time to do good maintenance lately. I think it's healthy for any project to have at least 2 persons with collaboration rights. If you'd like to join the effort, please respond to this issue describing a bit your background in open source.

Support cookies

I would not want to pass the hosted version auth cookies but locally I would like to pass in a url and a cookie to be set. This would allow me to generate, locally, pdfs of my authenticated pages.

Thanks. It looks neat.

Becker

Issues with header and footer templates

This issue gathers a lot of issues with PDF header and footer templates. They are not as flexible as I and apparently many others have thought.

Headers and footers are not appearing

Remember to set pdf.displayHeaderFooter to true.
Add margins for the document: puppeteer/puppeteer#1853

Working example: https://url-to-pdf-api.herokuapp.com/api/render?url=https://github.com&pdf.margin.bottom=100px&pdf.displayHeaderFooter=true&pdf.footerTemplate=%3Cp%20style=%22font-size:20px%22%3EFooter%20text%3C/p%3E

Styling is not working

See puppeteer/puppeteer#2916 and puppeteer/puppeteer#2388

Serverless Support

Deploy as AWS/Azure Function

Why aren't all images loading?

Some images are not loading correctly. Test case: https://url-to-pdf-api.herokuapp.com/api/render?emulateScreenMedia=false&url=https://medium.com/@e_mad_ehsan/getting-started-with-puppeteer-and-chrome-headless-for-web-scrapping-6bf5979dee3e

Crash after 'read ECONNRESET' error

Hi,

I get this error randomly when I try to generate a pdf from my local url-to-pdf.

What I get

The server crashes with the following error: Error: read ECONNRESET at exports._errnoException (util.js:1018:11) at TCP.onread (net.js:568:26).
curl prints curl: (52) Empty reply from server

What I do

curl -o test_.pdf -XPOST [email protected] -H"content-type: text/html" http://localhost:9000/api/render\?emulateScreenMedia=false\&goto.waitUntil\=load

Solution?

This bug only happens AFTER the PDF generation, when browser.close() is called. I don't know whether it is caused by Puppeteer closing its connection to Chrome or by the connection to one of the page's assets. Because the error happens after the PDF generation, I'm inclined to ignore it, which can be done by adding a callback with process.on('uncaughtException', (error) => {}). I'm not sure that's the correct thing to do, but for now it's the only solution I can provide.
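
A narrower variant of that workaround would ignore only the ECONNRESET case and keep every other uncaught error fatal. The sketch below is an assumption about how this could be written, with hypothetical names isConnectionReset and installResetHandler. Continuing after an uncaught exception is generally discouraged in Node, so this remains a stopgap, not a fix for the underlying cause.

```javascript
// True for the socket reset error described above (error.code is set by Node).
function isConnectionReset(error) {
  return Boolean(error) && error.code === 'ECONNRESET';
}

// Install a handler that swallows ECONNRESET only; anything else stays fatal.
// Hypothetical sketch of the workaround, not the project's code.
function installResetHandler(log = console.warn) {
  process.on('uncaughtException', (error) => {
    if (isConnectionReset(error)) {
      log('Ignoring ECONNRESET:', error.message);
      return;
    }
    console.error(error);
    process.exit(1);
  });
}
```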

The html file I use

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Test</title>
  <!-- Normalize or reset CSS with your favorite library -->
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/normalize/3.0.3/normalize.css">

  <!-- Load paper.css for happy printing -->
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/paper-css/0.2.3/paper.css">
  <style>
    @page { 
      size: A4; 
    }
    img {
      display: block;
      position: absolute;
    }

    img:nth-of-type(1) {
      left: 200px;
      top: 200px;
      transform: rotate(30deg);
    }
    img:nth-of-type(2) {
      left: 10%;
      top: 70%;
      transform: rotate(200deg);
    }
    img:nth-of-type(2) {
      float: right;
    }
  </style>
</head>
<body class="A4">
  <section class="sheet">
    <h1>Lorem ipsum dolor sit amet, consectetur adipisicing elit. Cum, laboriosam!</h1>
    <p>Lorem ipsum dolor sit amet, <u>consectetur</u> adipisicing elit. <em>Officia</em> <strong>aspernatur sed</strong> <i>quis</i> veniam! Itaque fugiat voluptas rerum necessitatibus iste, <b>dolores id eligendi minus! <i>Velit <u>alias</u></i> quos</b> , deleniti optio quod numquam perspiciatis sequi. Hic autem omnis non ipsam odio. Sit nostrum officia, ea officiis corporis tempore ut illum minus placeat repellat similique natus facere iusto aperiam rerum magni inventore in vero error, quisquam nihil dolore culpa optio necessitatibus, dicta? Sit quos enim, id quidem ea amet voluptas vitae odit sequi, ex aliquid commodi illum aperiam odio suscipit reiciendis</p>
    <img src="https://placehold.it/400x400" alt="placeholder">
    <img src="https://placehold.it/400x400" alt="placeholder">
    <img src="https://placehold.it/400x400" alt="placeholder">
    <img src="https://placehold.it/400x400" alt="placeholder">
    <img src="https://placehold.it/400x400" alt="placeholder">
  </section>
  <section class="sheet">
    <h1>Such wow</h1>
    <h2>Such wow</h2>
    <h3>Such wow</h3>
    <h4>Such wow</h4>
    <h5>Such wow</h5>
    <h6>Such wow</h6>
    <p style="text-align: left">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Molestiae ipsa inventore laborum rem deserunt placeat, praesentium soluta exercitationem corporis at, voluptatibus id atque amet voluptate mollitia nam sunt nisi, excepturi facilis nemo! Maiores deserunt qui, quia soluta culpa accusantium distinctio numquam eaque asperiores maxime suscipit, iusto inventore. Adipisci, quasi corporis!</p>
    <p style="text-align: right">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Laborum, suscipit? Officia rem dolorum, quisquam autem expedita ea odio aliquam dicta amet corporis voluptatum ipsam sequi ipsa accusantium enim molestiae nemo, qui, et odit quod corrupti ab? Odio, quisquam voluptatem aperiam totam illum repellendus temporibus harum dolores, laboriosam alias, doloremque et?</p>
    <p style="text-align: center">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Sapiente ipsam consectetur omnis ut repellendus, amet commodi minus fugit consequatur recusandae necessitatibus explicabo quasi nostrum eveniet dolores similique eligendi, expedita blanditiis doloremque nemo nobis. Sint aspernatur, mollitia expedita nulla est, rerum aliquam error. Provident saepe similique, dignissimos quia explicabo ab, nihil.</p>
    <p style="text-align: justify;">Lorem ipsum dolor sit amet, consectetur adipisicing elit. Minima, tempora? Lorem ipsum dolor sit amet, consectetur adipisicing elit. Sapiente ipsam consectetur omnis ut repellendus, amet commodi minus fugit consequatur recusandae necessitatibus explicabo quasi nostrum eveniet dolores similique eligendi, expedita blanditiis doloremque nemo nobis. Sint aspernatur, mollitia expedita nulla est, rerum aliquam error. Provident saepe similique, dignissimos quia explicabo ab, nihil.</p>
    <h1 style="transform: rotate(180deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
    <h1 style="transform: rotate(50deg);text-align: center;">AMAZING</h1>
    <h1 style="transform: rotate(80deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
    <h1 style="transform: rotate(300deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
    <h1 style="transform: rotate(260deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
    <h1 style="transform: rotate(120deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
    <h1 style="transform: rotate(190deg);text-align: center;">WOOOOOOOOOOOOOOW</h1>
  </section>

</body>
</html>

URL and HTML issues with POST

Hi there,

I'm having trouble getting a POST request in Mithril.js to a locally hosted version of this repo to generate a PDF from the URL I pass through. The URL field is undefined on the server side.

This is what my call looks like:

m.request({
		method: "POST",
		url: "http://localhost:9000/api/render",
		headers: {
			"content-type": "application/json",
		},
		data: {
			"url": "http://www.google.com",
		},
	})
	.then(function (result) {
		try{
			console.log('Worked');
		} catch (error) {
			console.log('Error:' + error);
		}
	})
	.catch(function (result) {
		console.log('Error: ' + result);
	})

On the server side I output the opts. I get this:

{ cookies: [],
  scrollPage: false,
  emulateScreenMedia: true,
  ignoreHttpsErrors: false,
  html: {},
  viewport:
   { width: 1600,
     height: 1200,
     deviceScaleFactor: undefined,
     isMobile: undefined,
     hasTouch: undefined,
     isLandscape: undefined },
  goto:
   { waitUntil: 'networkidle',
     networkIdleTimeout: 2000,
     timeout: undefined,
     networkIdleInflight: undefined },
  pdf:
   { format: 'A4',
     printBackground: true,
     scale: undefined,
     displayHeaderFooter: undefined,
     landscape: undefined,
     pageRanges: undefined,
     width: undefined,
     height: undefined,
     margin:
      { top: undefined,
        right: undefined,
        bottom: undefined,
        left: undefined } },
  url: undefined,
  attachmentName: undefined,
  waitFor: undefined }

When I do a curl command it works as expected, html is null and url contains the expected url.

What am I doing wrong? Thanks in advance!

Fails to navigate on a non-.com

We have an internal site that I'm trying to grab PDFs from on the fly. The app works fine on any public URL, but not on our internal one.

2017-10-12T14:48:19.108Z - info: [pdf-core.js] Set browser viewport..
2017-10-12T14:48:19.109Z - info: [pdf-core.js] Emulate @media screen..
2017-10-12T14:48:19.109Z - info: [pdf-core.js] Goto url https://cef.erwf.nin.asn/ ..
2017-10-12T14:48:21.395Z - error: [pdf-core.js] Error when rendering page: Error: Failed to navigate: https://cef.erwf.nin.asn/
2017-10-12T14:48:21.396Z - error: [pdf-core.js] Error: Failed to navigate: https://cef.erwf.nin.asn/
    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)
    at <anonymous>
2017-10-12T14:48:21.396Z - info: [pdf-core.js] Closing browser..
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request headers: host=localhost:9000, user-agent=Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:56.0) Gecko/20100101 Firefox/56.0, accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8, accept-language=en-US,en;q=0.5, accept-encoding=gzip, deflate, connection=keep-alive, upgrade-insecure-requests=1
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request parameters:
2017-10-12T14:48:21.407Z - error: [error-logger.js] Request body:
2017-10-12T14:48:21.408Z - error: [error-logger.js] Error: Failed to navigate: https://cef.erwf.nin.asn/
    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)
    at <anonymous> 'Error: Failed to navigate: https://cef.erwf.nin.asn/\n    at Page.goto (/usr/src/app/node_modules/puppeteer/lib/Page.js:390:13)\n    at <anonymous>'
GET /api/render?url=https://cef.erwf.nin.asn/ 500 2484.461 ms - -

options in .env not used

When I alter the options in .env they are not used:

export NODE_ENV=development
export PORT=9990
export ALLOW_HTTP=true

When I use them as a prefix for the start command it works just fine:


ALLOW_HTTP=true PORT=9990 npm start

What am I doing wrong?

BTW, very nice piece of software!

Prevent hidden overflowing text

How can I prevent text from overflowing and getting hidden, as happens in the profile paragraph at http://resume.josephrex.me/?

random errors when rendering pdf from html via POST

First - thank you so much for creating and working on this project.

I've deployed to Heroku. Most of the time the PDF is generated, but sometimes there is an error and the entire Node server crashes.

Here is the log: url_to_pdf_api_error-01-25-2018.log

Is there a good way to debug this problem? Currently it crashes about 20% of the time. I was running on "hobby" initially, but had the same results on 1x and 2x instance types.

All fonts are not rendered correctly in Heroku

E.g. Helvetica Neue is not rendered on Heroku. Url: https://url-to-pdf-api.herokuapp.com/api/render?url=https://github.com/kimmobrunfeldt/url-to-pdf-api/issues/5

My macbook:

macbook.pdf

Heroku:

heroku.pdf

Insufficient URL checking

I think it might be a good idea to restrict URLs to http:// and https:// protocols. The current demo allows file:// type URLs and can therefore be used to read information from the file system.

Try for example https://url-to-pdf-api.herokuapp.com/api/render?url=file:///etc/passwd

There might be issues with other protocols as well. I only tested file:// URLs.
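
The restriction suggested above could look roughly like the following. This is only a minimal sketch: `isAllowedUrl` and `ALLOWED_PROTOCOLS` are hypothetical names, not part of the project's code, and the check relies on the WHATWG `URL` class that is global in recent Node.js versions.

```javascript
// Hypothetical helper, not taken from the project's source.
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:']);

function isAllowedUrl(input) {
  let parsed;
  try {
    // Throws a TypeError when the input is not an absolute URL.
    parsed = new URL(input);
  } catch (err) {
    return false;
  }
  // The parser lowercases the protocol, so "HTTP://..." is accepted too.
  return ALLOWED_PROTOCOLS.has(parsed.protocol);
}

console.log(isAllowedUrl('https://example.com/invoice'));
console.log(isAllowedUrl('file:///etc/passwd'));
```

A protocol check like this only covers the `file://` case described here. It does not stop requests to internal hosts that are reachable over http or https.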

500 error

The following url returns a 500 error, with no useful details.

https://url-to-pdf-api.herokuapp.com/api/render?url=https://drivetexas.org/#/7/32.340/-99.500?future%3Dfalse%26print%3Dtrue

{
  "status": 500,
  "statusText": "Internal Server Error",
  "messages": [
    "Internal Server Error"
  ]
}
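
One thing worth checking, though it may not be the cause of the 500: the target URL contains a `#`, and when it is appended unencoded, everything after the `#` becomes the fragment of the API request itself and is never sent to the server. The sketch below uses the standard `URL` class to show the difference. The variable names are illustrative only.

```javascript
const target = 'https://drivetexas.org/#/7/32.340/-99.500?future%3Dfalse%26print%3Dtrue';

// Unencoded: the "#..." part is parsed as the fragment of the API URL.
const unencoded = new URL('https://url-to-pdf-api.herokuapp.com/api/render?url=' + target);
const seenByServer = unencoded.searchParams.get('url');
console.log(seenByServer);   // the url parameter stops before the "#"
console.log(unencoded.hash); // the rest ends up here and stays in the client

// Encoded: searchParams.set() percent-encodes the whole target URL.
const encoded = new URL('https://url-to-pdf-api.herokuapp.com/api/render');
encoded.searchParams.set('url', target);
const roundTripped = encoded.searchParams.get('url');
console.log(encoded.href);
```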

Cloudflare and 301 redirects

Hello, has this software been tested with 301 redirects?
Cloudflare issues them, and other tools don't seem to follow them.

CSS filters causing image distortions

Just take a look at https://url-to-pdf-api.herokuapp.com/api/render?url=http://bennettfeely.com/filters&waitFor=1000

The first image (no css) has smooth lines but other images have glitches because of the filters applied to them. If you open the source in Chrome it has no such distortions.

Pass Header Parameters with the URL

How can I send an authentication header along with the URL?

I'm trying something like this, but it fails:
http://abc.abc.com:9000/api/render?url=http://abc.abc.com&setExtraHTTPHeaders={'header': 'value'}

Security issue

https://url-to-pdf-api.herokuapp.com/api/render?url=file:///etc/passwd

Actually shows the content of the file... as mentioned by erdbeerkaese, here:
https://news.ycombinator.com/item?id=15408217

How to specify a file name?

I noticed the default file name is render.pdf, but how can I specify a custom file name?

CORS config is missing

CORS_ORIGIN is missing in config.js, and it is used in app.js:

  const corsOpts = {
    origin: config.CORS_ORIGIN, //undefined
    methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD', 'PATCH'],
  };
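
A possible way to make the missing value explicit is sketched below. `buildCorsOptions` is a hypothetical helper, and the fallback to `'*'` is an assumption rather than the project's documented default. The real config.js may be structured differently.

```javascript
// Hypothetical sketch, not the project's actual config.js.
function buildCorsOptions(env) {
  // Assumed default: allow any origin when CORS_ORIGIN is not set.
  const origin = env.CORS_ORIGIN || '*';
  return {
    origin,
    methods: ['GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'HEAD', 'PATCH'],
  };
}

const corsOpts = buildCorsOptions(process.env);
console.log(corsOpts.origin);
```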

Cookies support

I was having difficulties getting the cookies to be sent with my request, and I think I may have found the problem. This function here is missing cookies assignment, and therefore the resulting cookies array is always empty.

Am I missing something? Thanks for the library by the way - it's just awesome!

setting this project up without Heroku

I'm interested in running this project from an AWS EC2 instance.

I should have no problems doing that by following these instructions, correct?
https://github.com/alvarcarto/url-to-pdf-api#development

Issue passing more than one parameter.

If I try to pass more than one parameter, I get an error on the second parameter.
Example: https://urltopdf2.herokuapp.com/api/render?url=https://server1.outsystemscloud.com/automatedterritoryas/PDFEmail.aspx?Tenantid=109&Territoryid=564

If I browse to the URL directly, it works with no problem. When I try to use url-to-pdf-api, I get the following error:
{"status":400,"statusText":"Bad Request","errors":[{"field":["Territoryid"],"location":"query","messages":["\"Territoryid\" is not allowed"],"types":["object.allowUnknown"]}]}

Again, if I leave off the last &Territoryid=564 it works with no error. If I add it, I get the error.
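
The error message suggests that the unencoded `&` ends the `url` parameter, so `Territoryid` is read as a separate query parameter of the API itself. A minimal sketch of percent-encoding the target URL before appending it, using the standard `encodeURIComponent` function (the variable names are illustrative):

```javascript
const target = 'https://server1.outsystemscloud.com/automatedterritoryas/PDFEmail.aspx?Tenantid=109&Territoryid=564';

// Encode the whole target so its "?" and "&" stay inside the url parameter.
const requestUrl = 'https://urltopdf2.herokuapp.com/api/render?url=' + encodeURIComponent(target);
console.log(requestUrl);

// Parsing the result back confirms the API receives a single "url" parameter.
const parsed = new URL(requestUrl);
console.log(parsed.searchParams.get('url') === target);
```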

Large HTML body causes Error: Navigation Timeout Exceeded: 30000ms exceeded

We are using a workaround to render raw HTML. This workaround is needed to wait until all external resources are loaded. See the issue here: puppeteer/puppeteer#728

I suspect that this workaround is causing errors with large HTML bodies. In my tests, ~2MB of HTML worked, but 4MB didn't. I tested on a Heroku server with 512MB RAM.

Is it possible to remove links on PDF?

Is it possible to remove links from HTML components rendered into the PDF?
Thanks!

cookies

I am confused about how to set cookies in the API. Could anyone help me? I have 3 cookies,
e.g. Evnetid = 6235765; sessionid = jshdak; documentID= sjdh. How do I enter these in the API?

I read the documentation and tried to put the values in, but I get an error every time. Could anyone help, please?
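
As a rough sketch, the cookie string from the question can be turned into an array of name/value objects and placed in a JSON body. This assumes that POST /api/render accepts a `cookies` array in that shape, which is not confirmed here, so check the API documentation. `parseCookieString` is a hypothetical helper and `http://example.com` is a placeholder URL.

```javascript
// Hypothetical helper: splits "a = 1; b = 2" into [{ name, value }, ...].
function parseCookieString(cookieString) {
  return cookieString
    .split(';')
    .map((pair) => pair.trim())
    .filter((pair) => pair.includes('=')) // skip empty or malformed parts
    .map((pair) => {
      const idx = pair.indexOf('=');
      return {
        name: pair.slice(0, idx).trim(),
        value: pair.slice(idx + 1).trim(),
      };
    });
}

const cookies = parseCookieString('Evnetid = 6235765; sessionid = jshdak; documentID= sjdh');

// Assumed body shape for a JSON POST; the url is a placeholder.
const body = JSON.stringify({ url: 'http://example.com', cookies });
console.log(body);
```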

Adding a footer and header on every page

Great work here. It's 2017 and generating PDFs is still unnecessarily complicated. I'm currently using wkhtmltopdf. I'd love to stop using it and use this project as a microservice to handle all my PDF needs. However, the one thing I can't figure out is how to add a footer and/or header to every generated page. The header/footer would need to contain things like a logo, page number, warning, date, and invoice number, so it needs to be more customizable than the standard pdf.displayHeaderFooter option allows.

Does anyone have any experience with this? Is there something I'm missing? Thanks again for this awesome project.
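
For reference, newer Puppeteer versions accept `headerTemplate` and `footerTemplate` options in `page.pdf()`, where elements with the classes `date`, `title`, `url`, `pageNumber` and `totalPages` are filled in by Chrome. The sketch below only builds such an options object. `buildPdfOptions` is a hypothetical helper, and whether this service forwards these options to Puppeteer is an assumption that depends on the installed versions.

```javascript
// Hypothetical helper: builds options for Puppeteer's page.pdf().
// invoiceNumber is inserted as-is, so it must not contain untrusted HTML.
function buildPdfOptions(invoiceNumber) {
  const style = 'font-size:10px; width:100%; text-align:center;';
  return {
    format: 'A4',
    printBackground: true,
    displayHeaderFooter: true,
    // Chrome fills in the element with class "date".
    headerTemplate: `<div style="${style}">Invoice ${invoiceNumber}, <span class="date"></span></div>`,
    // Chrome fills in "pageNumber" and "totalPages" on every page.
    footerTemplate: `<div style="${style}">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>`,
    // Margins leave room for the header and footer.
    margin: { top: '60px', bottom: '60px' },
  };
}

const pdfOptions = buildPdfOptions('INV-001');
console.log(pdfOptions.footerTemplate);
```

With Puppeteer used directly, the object would be passed as `await page.pdf(pdfOptions)`.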
