zenato / puppeteer-renderer

Puppeteer(Chrome headless node API) based web page renderer

License: MIT License

JavaScript 11.57% Dockerfile 7.16% TypeScript 81.27%
chrome-headless puppeteer server-side-rendering

puppeteer-renderer's Introduction

Hello 👋

puppeteer-renderer's People

Contributors

7a6163, carbogninalberto, chaelli, dependabot[bot], diskopete, ihipop, johnroyer, ksdme, pionl, scharfie, weph, zenato

puppeteer-renderer's Issues

Expected options.clip.x to be a number but found undefined

As in title

Error: Expected options.clip.x to be a number but found undefined
at Object.exports.assert (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
at Page.screenshot (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1070:25)
at Renderer.screenshot (/app/src/renderer.js:120:33)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/src/index.js:64:44

Maybe last fix for #36 broke something?
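
The stack trace shows Puppeteer rejecting a clip object whose x field is missing. A minimal sketch of one way to guard against that, assuming the clip arrives as query-string values: pass a clip to page.screenshot only when x, y, width and height are all numeric. The helper name normalizeClip is hypothetical and is not part of puppeteer or puppeteer-renderer.

```javascript
// Hypothetical helper, not part of puppeteer or puppeteer-renderer.
// Returns a complete clip rectangle with numeric fields, or undefined
// when any of the four required fields is missing or not a number.
function normalizeClip(clip) {
  if (!clip || typeof clip !== 'object') return undefined;
  const result = {};
  for (const name of ['x', 'y', 'width', 'height']) {
    const raw = clip[name];
    // Query-string values arrive as strings; reject missing and blank ones.
    if (raw === undefined || raw === null || raw === '') return undefined;
    const value = Number(raw);
    if (!Number.isFinite(value)) return undefined;
    result[name] = value;
  }
  return result;
}

// Possible use: only add the clip option when it is complete.
// const clip = normalizeClip(query.clip);
// const options = clip ? { clip } : {};
```

With this guard, a partial clip such as { y: '120' } is dropped instead of reaching Puppeteer's assertion.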

UnhandledPromiseRejectionWarning: Error: Page crashed!

I ran into this issue and was wondering if this problem is already known.

(node:25) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 178)
(node:25) UnhandledPromiseRejectionWarning: Error: Page crashed!
    at Page._onTargetCrashed (/app/node_modules/puppeteer/lib/Page.js:209:28)
    at CDPSession.Page.client.on (/app/node_modules/puppeteer/lib/Page.js:129:57)
    at CDPSession.emit (events.js:189:13)
    at CDPSession._onMessage (/app/node_modules/puppeteer/lib/Connection.js:166:18)
    at Connection._onMessage (/app/node_modules/puppeteer/lib/Connection.js:83:25)
    at WebSocketTransport._ws.addEventListener (/app/node_modules/puppeteer/lib/WebSocketTransport.js:25:32)
    at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:125:16)
    at WebSocket.emit (events.js:189:13)
    at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:797:20)
    at Receiver.emit (events.js:189:13)

Once the error is thrown, the service doesn't recover and keeps throwing the same error until I restart the Docker container.

According to these issues, it's related to the shared memory space being too small:
https://github.com/puppeteer/puppeteer/issues/1321
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#tips

I haven't debugged the problem thoroughly, but it should be solved with this simple flag:

const browser = await puppeteer.launch({
  args: ['--disable-dev-shm-usage']
});

If you think it's a good idea, I am happy to prepare a pull request.

Thanks!

Defaults for screenshots are broken

Puppeteer now defaults screenshots to PNG, which means the defaults here are broken, since quality can't be set for PNG screenshots. Also, since puppeteer-renderer already uses the type parameter to choose the output, you can't set the screenshot type through the extra options.

quality: Number(quality) || 100,
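
A minimal sketch of how the defaults could avoid this, assuming a separate query parameter carries the image format: quality is only set when the format is JPEG. The parameter name screenshotType and the helper buildScreenshotOptions are hypothetical and are not taken from the project's source.

```javascript
// Hypothetical helper: builds options for page.screenshot().
// `screenshotType` is an assumed parameter name, kept separate from the
// existing `type` parameter that selects between pdf and screenshot.
function buildScreenshotOptions({ screenshotType, quality } = {}) {
  const type = screenshotType === 'jpeg' ? 'jpeg' : 'png';
  const options = { type };
  if (type === 'jpeg') {
    // Only JPEG accepts a quality value; fall back to 100 when it is
    // missing, blank, or outside the 0-100 range.
    const parsed = quality === undefined || quality === '' ? NaN : Number(quality);
    const valid = Number.isFinite(parsed) && parsed >= 0 && parsed <= 100;
    options.quality = valid ? parsed : 100;
  }
  return options;
}
```

For PNG the returned object has no quality key at all, so the default can no longer conflict with Puppeteer's PNG handling.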

Error 500

Some websites (mainly internal ones, so it may be a routing issue) give me error 500 when I try to take a screenshot. Is it possible to display what caused the 500 error, or to access any logs?

API to generate jpeg file

Hi, I want to use the API to generate a JPEG file, but I can't use the screenshot option 'type', because the URL already has a 'type' parameter for choosing between PDF and screenshot. Please help, thanks.

How should it work?

Hi, @zenato

I'm just wondering about the idea (I'm surprised I can't find this question in the issues).

We have the following in the middleware:

let isRender = false

module.exports = function(options) {
  if (!options || !options.url) {
    throw new Error('Must set url.')
  }

  let rendererUrl = options.url

  const userAgentPattern = options.userAgentPattern || new RegExp(botUserAgents.join('|'), 'i')
  const excludeUrlPattern =
    options.excludeUrlPattern || new RegExp(`\\.(${staticFileExtensions.join('|')})$`, 'i')
  const timeout = options.timeout || 10 * 1000

  return (req, res, next) => {
    if (isRender) return next()
    isRender = true

My concern is the isRender flag. On the first request it's false, which is fine: we can run our logic to detect bots and so on. After the first call you set it to true, and that makes the middleware ignore all following requests.

What's the idea?

Zombie chromium processes

Hi,

We are running the zenato/puppeteer-renderer Docker image (version 2.1.5) on Ubuntu 18.04.3 and are experiencing zombie Chromium processes that slowly eat up all the memory and CPU of the host machine. I suspect that the issue could be due to the Puppeteer Process problem. The problem should be fixed in the latest Puppeteer release, v10.4.0.

Would it be possible to bump the Puppeteer version in puppeteer-renderer, e.g. to v10.4.0, or alternatively try one of the workarounds suggested in the above problem thread, e.g. adding --no-zygote to the Puppeteer launch arguments as suggested in this comment?
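
A minimal sketch of how such a workaround flag could be supplied without rebuilding the image, assuming the launch arguments are read from an environment variable. The variable name PUPPETEER_EXTRA_ARGS and the helper parseExtraArgs are hypothetical; whether --no-zygote actually resolves the zombie processes is not confirmed here.

```javascript
// Hypothetical helper: turns a whitespace-separated string of Chromium
// flags into an array suitable for the `args` launch option.
function parseExtraArgs(raw) {
  if (typeof raw !== 'string') return [];
  const seen = new Set();
  for (const token of raw.split(/\s+/)) {
    // Keep only tokens that look like flags and drop duplicates.
    if (token.startsWith('--')) seen.add(token);
  }
  return [...seen];
}

// Possible use (PUPPETEER_EXTRA_ARGS is an assumed variable name):
// const browser = await puppeteer.launch({
//   args: parseExtraArgs(process.env.PUPPETEER_EXTRA_ARGS),
// });
```

Running the container with that variable set to "--no-zygote" would then pass the flag through to Chromium.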

type=pdf not actually returning a PDF and hanging

Hi, I like the approach you take a lot: using the official Chrome and puppeteer libraries to stay as close to "real Chrome" as possible.

But I have a pretty fundamental issue. I started the master branch locally. Trying your example URL with curl -v localhost:3000/?url=http://www.google.com&type=pdf (or trying the same in the browser) does not return a PDF but the HTML, just as if "type=pdf" had not been passed.

In addition, the server does not close the connection and does not send the Content-Length header, so the download dialog in the browser hangs and can only say "X kB of unknown, duration unknown".

Any ideas?
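
One thing worth ruling out with the curl command: in a shell, an unquoted & ends the command and runs it in the background, so type=pdf would never reach the server. This may not be the cause here, since the browser shows the same behaviour. A minimal sketch of building the request URL with the target URL percent-encoded, so that its own characters cannot be confused with the renderer's parameters; buildRenderUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: builds a renderer request URL with every
// parameter value percent-encoded by URLSearchParams.
function buildRenderUrl(base, targetUrl, params = {}) {
  const url = new URL(base);
  url.searchParams.set('url', targetUrl);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

console.log(buildRenderUrl('http://localhost:3000/', 'http://www.google.com', { type: 'pdf' }));
// prints "http://localhost:3000/?url=http%3A%2F%2Fwww.google.com&type=pdf"
```

When calling curl, the resulting URL should still be wrapped in quotes so the shell does not interpret the & character.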

Inclusion of iframes in rendered output

Hi,

This could be considered more of an enhancement on your side and maybe more of a bug on the puppeteer side.
I am using version 2.2.0, and it seems that rendering thumbnails of embedded YouTube videos in the web page I am screenshotting is no longer possible, while it was in version 2.1.5. I guess the bump in the puppeteer version is to blame?
I saw this article regarding iframe rendering with puppeteer and wonder if this is something that could be incorporated on your side of the code to facilitate the rendering of embedded iframes.

In my use case, I am using your Docker image version 2.2.0 and calling it without any middleware, straight through a URL like http://localhost:{port}/?url=https://www.google.com, to render a web page screenshot.

Thank you in advance for your time and work.

Custom DNS servers

Hello, would it be possible to add an option to provide custom DNS servers?
Currently Docker is using 8.8.8.8/8.8.4.4.

I would like to provide other DNS servers at runtime.

I can edit those in etc/resolv, but it resets every time I restart the host PC.

Render started to act as proxy

I just ran a fresh install on Docker: sudo docker run -d --restart unless-stopped --name webscreenshot -p 8050:3000 zenato/puppeteer-renderer:latest

But instead of an image, I get the website's source HTML (which renders the website). It started to act as a proxy, although it was working fine before. No idea what happened. Most probably after a recent upgrade?

http://localhost:8050/?url=http://google.pl No idea what I'm doing wrong. It worked fine just yesterday.

animationTimeout shouldn't cause exceptions with fullPage (before the timeout passes, at least)

I've noticed occasional exceptions in
https://github.com/zenato/puppeteer-renderer/blob/master/src/wait-for-animations.js#L18

I think what's happening is that JS is causing the page to change size, so the image returned by page.screenshot is a different size than the previous one. This results in an exception from pixelmatch:
https://github.com/mapbox/pixelmatch/blob/master/bin/pixelmatch#L26

Off the top of my head I'd prefer the loop just keep checking until the timeout expires, then return false if for some reason it still didn't match.
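
The suggestion above can be sketched as follows, assuming each captured frame is an object with width and height fields and that a comparison callback returns the number of differing pixels. The names framesMatch, waitForStableFrames, captureFrame and countDifferentPixels are hypothetical and do not come from the project's wait-for-animations.js.

```javascript
// Hypothetical helper: treats frames of different sizes as "not matching"
// instead of handing them to a comparison that would throw.
function framesMatch(previous, current, countDifferentPixels) {
  if (!previous || !current) return false;
  if (previous.width !== current.width || previous.height !== current.height) {
    return false;
  }
  return countDifferentPixels(previous, current) === 0;
}

// Keeps capturing frames until two consecutive frames match or the
// timeout expires. Resolves to false on timeout rather than throwing.
async function waitForStableFrames(captureFrame, countDifferentPixels, timeoutMs, intervalMs = 100) {
  const deadline = Date.now() + timeoutMs;
  let previous = await captureFrame();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
    const current = await captureFrame();
    if (framesMatch(previous, current, countDifferentPixels)) return true;
    previous = current;
  }
  return false;
}
```

The size check runs before the pixel comparison, so a page that is still resizing simply counts as not yet stable.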

no old tags available on docker hub

problem

a newly released version of this image breaks my tests. admittedly i use it in a pretty hacky way! i would like to stay pinned to the last release, not the latest release

discussion

can we publish tagged images on dockerhub vs just latest?

thx!

Authentication

Is there any way to add authentication to the docker service api? For example a secret Bearer.
Currently if I run the docker image, everybody can use the service - which is really an issue - even more as it can act as a proxy.
Many thanks for your great work!
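
A minimal sketch of what a Bearer check could look like, assuming the expected token is provided to the service through configuration. The helper isAuthorized is hypothetical and does not exist in puppeteer-renderer; it uses only Node's built-in crypto module.

```javascript
const crypto = require('crypto');

// Hypothetical helper: checks an Authorization header value against an
// expected token. Both sides are hashed first so the buffers passed to
// timingSafeEqual always have the same length.
function isAuthorized(authorizationHeader, expectedToken) {
  if (typeof authorizationHeader !== 'string' || !expectedToken) return false;
  const match = /^Bearer\s+(.+)$/i.exec(authorizationHeader);
  if (!match) return false;
  const given = crypto.createHash('sha256').update(match[1]).digest();
  const expected = crypto.createHash('sha256').update(expectedToken).digest();
  return crypto.timingSafeEqual(given, expected);
}

// Possible use in a request handler (API_TOKEN is an assumed variable name):
// if (!isAuthorized(req.headers.authorization, process.env.API_TOKEN)) {
//   res.statusCode = 401;
//   return res.end('Unauthorized');
// }
```

Until something like this exists in the service itself, a reverse proxy in front of the container can enforce the same check.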

How to pass additional options for clipping

I've looked at the extra options, but they do not seem to work. According to the docs, clip.y should work with screenshots, but I get a 500 error. How should I build the query to be able to clip (or scroll) the site?

<?php
$params['width'] = 1200;
$params['height'] = 600;
$params['margin.top'] = 100;
$params['options.clip.y'] = 120; // No effect
$params['clip.y'] = 120; // Error 500
$params['clip']['y'] = 120; // Error 500
$params['type'] = 'screenshot';
// header('Content-type: image/png');
echo file_get_contents(SCREENSHOT_API_URL . http_build_query($params));

Cannot find module 'fs/promises'

Error with latest update 2.4.2
Container fails to start
Reverting back to 2.4.0 solves the issue

Error: Cannot find module 'fs/promises'
Require stack:
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js
- /app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js
- /app/src/renderer.js
- /app/src/index.js
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:815:15)
    at Function.Module._load (internal/modules/cjs/loader.js:667:27)
    at Module.require (internal/modules/cjs/loader.js:887:19)
    at require (internal/modules/cjs/helpers.js:74:18)
    at Object.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js:36:20)
    at Module._compile (internal/modules/cjs/loader.js:999:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
    at Module.load (internal/modules/cjs/loader.js:863:32)
    at Function.Module._load (internal/modules/cjs/loader.js:708:14)
    at Module.require (internal/modules/cjs/loader.js:887:19) {
  code: 'MODULE_NOT_FOUND',
  requireStack: [
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js',
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js',
    '/app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js',
    '/app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js',
    '/app/src/renderer.js',
    '/app/src/index.js'
  ]
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `node src/index.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

Add more options

Hi, thank you for building this! I have been using it and I think it has great potential, especially if it were more configurable via the URL. Right now it only has url and type; I think it would be good to have more options:

  • waitUntil
  • waitForSelector
  • width
  • height
  • fullPage

Add health check endpoint

Adding a health check endpoint would allow this to be used as-is in cloud environments such as AWS fargate/ECS/EBS.
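
A minimal sketch of the logic behind such an endpoint, assuming the service keeps a reference to its Puppeteer browser and that the object exposes an isConnected() method. The helper healthStatus and the /healthz path are hypothetical and are not part of the project.

```javascript
// Hypothetical helper: derives an HTTP status and JSON body from the
// state of a browser-like object. Anything without a working
// isConnected() method is reported as unavailable.
function healthStatus(browser) {
  const connected =
    Boolean(browser) &&
    typeof browser.isConnected === 'function' &&
    browser.isConnected() === true;
  return {
    statusCode: connected ? 200 : 503,
    body: JSON.stringify({ status: connected ? 'ok' : 'unavailable' }),
  };
}

// Possible use in a request handler (the /healthz path is an assumption):
// if (req.url === '/healthz') {
//   const { statusCode, body } = healthStatus(browser);
//   res.writeHead(statusCode, { 'Content-Type': 'application/json' });
//   return res.end(body);
// }
```

Returning 503 when the browser is gone lets an orchestrator restart the container instead of routing traffic to it.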

media for pdf

Hi, I am trying to understand whether I can export the PDF with emulateMedia set to screen, but it seems this can't be changed because it is hardcoded to always use print. Or am I not using the URL variables properly?

I tried media, emulateMedia, mediaType & emulateMedia.mediaType none of which gave me any different results.

If I am reading this line

page = await this.createPage(url, { timeout, waitUntil, credentials, emulateMedia: 'print' })

correctly and it is hardcoded, is there a chance you might expose it as an option in the URL and, if it is not provided, use 'print' by default?
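
A minimal sketch of the requested behaviour, assuming the value would arrive as a query parameter named emulateMedia: accept 'screen' or 'print' and fall back to 'print' otherwise. The helper resolveMediaType is hypothetical and is not taken from the project's renderer.

```javascript
// Hypothetical helper: picks the media type for PDF rendering from the
// query, defaulting to 'print' when the value is missing or unknown.
function resolveMediaType(query = {}) {
  const requested = String(query.emulateMedia || '').toLowerCase();
  return requested === 'screen' || requested === 'print' ? requested : 'print';
}

// Possible use in place of the hardcoded value:
// page = await this.createPage(url, {
//   timeout, waitUntil, credentials,
//   emulateMedia: resolveMediaType(query),
// })
```

Existing callers that pass nothing would keep getting 'print', so the change would be backward compatible.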

Thank you for taking the time to build and maintain this image!

Oops, An expected error seems to have occurred.

Error: net::ERR_CERT_AUTHORITY_INVALID at https://localhost/c5934.html
    at navigate (/app/node_modules/puppeteer/lib/FrameManager.js:108:45)
    at process._tickCallback (internal/process/next_tick.js:68:7)
  -- ASYNC --
    at Frame.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:105:23)
    at Page.goto (/app/node_modules/puppeteer/lib/Page.js:615:53)
    at Page.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:106:31)
    at Renderer.createPage (/app/src/renderer.js:22:16)
    at process._tickCallback (internal/process/next_tick.js:68:7)

Failed to start

Fresh installation failed

renderer    | 
renderer    | > [email protected] start /app
renderer    | > node src/index.js
renderer    | 
renderer    | Fail to initialze renderer. Error: Failed to launch chrome!
renderer    | /app/node_modules/puppeteer/.local-chromium/linux-515411/chrome-linux/chrome: error while loading shared libraries: libgconf-2.so.4: cannot open shared object file: No such file or directory
renderer    | 
renderer    | 
renderer    | TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
renderer    | 
renderer    |     at onClose (/app/node_modules/puppeteer/lib/Launcher.js:211:14)
renderer    |     at Interface.helper.addEventListener (/app/node_modules/puppeteer/lib/Launcher.js:200:50)
renderer    |     at emitNone (events.js:111:20)
renderer    |     at Interface.emit (events.js:208:7)
renderer    |     at Interface.close (readline.js:370:8)
renderer    |     at Socket.onend (readline.js:149:10)
renderer    |     at emitNone (events.js:111:20)
renderer    |     at Socket.emit (events.js:208:7)
renderer    |     at endReadableNT (_stream_readable.js:1064:12)
renderer    |     at _combinedTickCallback (internal/process/next_tick.js:138:11)

Puppeteer cache disable or clear

Is there a way to clear the puppeteer/Docker cache? Preferably by parameter.

I render a preview, and then when I change an image on the server, the renderer does not see it. It holds the HTML image in cache.

Is there a way to force the puppeteer Docker API to disable caching or to refresh it?
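
A minimal sketch of how a per-request switch could work, assuming a query parameter controls it. Puppeteer's page.setCacheEnabled(false) turns off the cache for a page; the parameter name disableCache and the helper shouldDisableCache are hypothetical.

```javascript
// Hypothetical helper: decides from the query whether the page cache
// should be turned off for this request.
function shouldDisableCache(query = {}) {
  const value = String(query.disableCache || '').toLowerCase();
  return value === 'true' || value === '1';
}

// Possible use before navigating (disableCache is an assumed parameter name):
// if (shouldDisableCache(query)) {
//   await page.setCacheEnabled(false);
// }
// await page.goto(url);
```

A request such as ?url=...&disableCache=true would then fetch the changed image again instead of reusing the cached copy.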

Docker container fails to run

> [email protected] start /app
> node src/index.js

(node:18) ExperimentalWarning: The fs.promises API is experimental
Fail to initialze renderer. Error: Could not find browser revision 756035. Run "npm install" or "yarn install" to download a browser binary.
    at ChromeLauncher.launch (/app/node_modules/puppeteer/lib/Launcher.js:59:23)

After building a custom Docker image from the source using docker build -t imagename . and then running docker run imagename, I'm getting the above error message. It looks like the dependencies aren't being installed properly.

Any ideas how to fix this?
