zenato / puppeteer-renderer
Puppeteer (Chrome headless Node API) based web page renderer
License: MIT License
As in title
Error: Expected options.clip.x to be a number but found undefined
at Object.exports.assert (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/assert.js:26:15)
at Page.screenshot (/app/node_modules/puppeteer/lib/cjs/puppeteer/common/Page.js:1070:25)
at Renderer.screenshot (/app/src/renderer.js:120:33)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:97:5)
at async /app/src/index.js:64:44
Maybe the last fix for #36 broke something?
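The assertion in the trace suggests that a `clip` object reached `page.screenshot` without a numeric `x`. Puppeteer expects `clip` to carry numeric `x`, `y`, `width` and `height`. One possible guard, as a minimal sketch: `normalizeClip` is a hypothetical helper, not the renderer's actual code, and it assumes clip values arrive as query-string text.

```javascript
// Hypothetical helper, not part of puppeteer-renderer: coerce query-string
// clip values to numbers and drop the clip entirely when any field is missing.
function normalizeClip(clip) {
  if (!clip || typeof clip !== 'object') return undefined;
  const out = {};
  for (const key of ['x', 'y', 'width', 'height']) {
    const raw = clip[key];
    // Number('') is 0, so reject empty strings explicitly.
    if (raw === undefined || raw === null || raw === '') return undefined;
    const value = Number(raw);
    if (!Number.isFinite(value)) return undefined;
    out[key] = value;
  }
  return out;
}
```

A caller would pass the result to `page.screenshot` only when it is defined, so an incomplete clip falls back to an unclipped screenshot instead of throwing.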
I ran into this issue and was wondering if this problem is already known.
(node:25) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 178)
(node:25) UnhandledPromiseRejectionWarning: Error: Page crashed!
at Page._onTargetCrashed (/app/node_modules/puppeteer/lib/Page.js:209:28)
at CDPSession.Page.client.on (/app/node_modules/puppeteer/lib/Page.js:129:57)
at CDPSession.emit (events.js:189:13)
at CDPSession._onMessage (/app/node_modules/puppeteer/lib/Connection.js:166:18)
at Connection._onMessage (/app/node_modules/puppeteer/lib/Connection.js:83:25)
at WebSocketTransport._ws.addEventListener (/app/node_modules/puppeteer/lib/WebSocketTransport.js:25:32)
at WebSocket.onMessage (/app/node_modules/ws/lib/event-target.js:125:16)
at WebSocket.emit (events.js:189:13)
at Receiver.receiverOnMessage (/app/node_modules/ws/lib/websocket.js:797:20)
at Receiver.emit (events.js:189:13)
Once the error is thrown, it doesn't recover and keeps throwing the same error until I restart the Docker container.
According to these issues, it's related to shared memory (/dev/shm) being too small:
https://github.com/puppeteer/puppeteer/issues/1321
https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md#tips
I haven't debugged the problem thoroughly, but it should be solved with this simple flag:
// Write shared memory files to /tmp instead of /dev/shm
const browser = await puppeteer.launch({
  args: ['--disable-dev-shm-usage']
});
If you think it's a good idea, I am happy to prepare a pull request.
Thanks!
Puppeteer now defaults screenshots to PNG, which means the defaults here are broken, since quality can't be set for PNG. Also, since puppeteer-renderer already uses the type parameter, you can't override the screenshot type through extraOptions.
puppeteer-renderer/src/renderer.js
Line 65 in 1f9ed18
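One way the defaults could be made safe, as a sketch rather than the project's actual code: merge caller options over the defaults, then drop `quality` unless the final type accepts it. `buildScreenshotOptions` is a hypothetical name, and the sketch assumes `quality` is only meaningful for `jpeg` and `webp`.

```javascript
// Hypothetical helper: merge caller options over a PNG default, then drop
// `quality` unless the final image type is jpeg or webp.
function buildScreenshotOptions(extraOptions = {}) {
  const options = { type: 'png', ...extraOptions };
  if (options.type !== 'jpeg' && options.type !== 'webp') {
    delete options.quality;
  }
  return options;
}
```

Because the URL's `type` parameter already selects pdf or screenshot, the image type would need a separate, differently named parameter before it could be passed in here.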
Some websites (mainly internal ones, so it may be a routing issue) give me a 500 error when I try to take a screenshot. Is it possible to display what caused the 500 somehow, or to access any logs?
Hi, I want to use the API to generate a JPEG file, but the screenshot option 'type' can't be used because the URL already has a 'type' parameter for choosing between pdf and screenshot. Please help, thanks.
Hi, @zenato
I'm just wondering about the idea here (I'm surprised I can't find this question in the issues).
We have the following in the middleware:
let isRender = false
module.exports = function(options) {
  if (!options || !options.url) {
    throw new Error('Must set url.')
  }
  let rendererUrl = options.url
  const userAgentPattern = options.userAgentPattern || new RegExp(botUserAgents.join('|'), 'i')
  const excludeUrlPattern =
    options.excludeUrlPattern || new RegExp(`\\.(${staticFileExtensions.join('|')})$`, 'i')
  const timeout = options.timeout || 10 * 1000
  return (req, res, next) => {
    if (isRender) return next()
    isRender = true
My concern is the isRender flag. On the first request it's false. That's fine: we can run our logic to detect bots and so on. After the first call you set it to true, and that makes the middleware ignore all subsequent requests.
What's the idea?
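Whatever the flag was meant to guard against, a module-level boolean is shared by every request in the process. A sketch of a per-request decision with no shared state follows; `shouldRender` is a hypothetical helper, and the two patterns are the ones built in the snippet above.

```javascript
// Hypothetical sketch: decide per request whether to hand off to the
// renderer, without any module-level state.
function shouldRender(req, { userAgentPattern, excludeUrlPattern }) {
  const userAgent = (req.headers && req.headers['user-agent']) || '';
  // Only bot user agents are rendered.
  if (!userAgentPattern.test(userAgent)) return false;
  // Strip the query string so extension checks like /\.js$/ still apply.
  const path = String(req.url || '').split('?')[0];
  if (excludeUrlPattern.test(path)) return false;
  return true;
}
```

Inside the middleware this would read `if (!shouldRender(req, { userAgentPattern, excludeUrlPattern })) return next()`, so each request is evaluated on its own.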
Hi,
We are running the zenato/puppeteer-renderer Docker image (version 2.1.5) on Ubuntu 18.04.3 and are experiencing zombie Chromium processes that slowly eat up all memory and CPU on the host machine. I suspect the issue could be due to the Puppeteer Process problem. The problem should be fixed in the latest Puppeteer release, v10.4.0.
Would it be possible to bump the Puppeteer version in puppeteer-renderer, e.g. to v10.4.0, or alternatively try one of the workarounds suggested in the above problem thread, e.g. adding --no-zygote to the Puppeteer launch arguments as suggested in this comment?
Hi, I like the approach you take a lot: using the official Chrome and Puppeteer libraries to stay as near to "real Chrome" as possible.
But I have a pretty fundamental issue. I started the master branch locally. Trying your example URL, curl -v localhost:3000/?url=http://www.google.com&type=pdf (or trying the same in the browser), does not return a PDF but the HTML, just as if "type=pdf" were not passed.
In addition, the server is not closing the connection and not sending the Content-Length header, so the download dialog in the browser hangs and can only say "X kb of unknown, duration unknown".
Any idea?
Hi,
This could be considered more of an enhancement on your side and maybe more of a bug on the Puppeteer side.
I am using version 2.2.0, and it seems that rendering thumbnails from embedded YouTube videos in the webpage I am screenshotting is no longer possible, while it was in version 2.1.5. I guess the bump in Puppeteer version is to blame?
I saw this article regarding iframe rendering with Puppeteer and wonder if this is something that could be incorporated on your side of the code to facilitate the rendering of embedded iframes.
In my use case, I am using your Docker version 2.2.0 and calling it without any middleware, straight through a URL like http://localhost:{port}/?url=https://www.google.com, to render a webpage screenshot.
Thank you in advance for your time and work.
Just wanted you to know that I forked this here: https://github.com/nodeone/alpine-puppeteer-renderer. I changed it to Alpine, updated Puppeteer and added some more options. But I made a blunder and failed to bring your git history along; I will fix that.
Hello, would it be possible to add an option to provide custom DNS servers?
Currently Docker is using 8.8.8.8/8.8.4.4.
I would like to provide other DNS servers at runtime.
I can edit them in etc/resolv, but it resets every time I restart the host PC.
I just ran a fresh install on Docker: sudo docker run -d --restart unless-stopped --name webscreenshot -p 8050:3000 zenato/puppeteer-renderer:latest
But instead of an image, I get the website's source HTML (which renders the website). It started to act as a proxy, whereas it was working fine before. No idea what happened. Most probably it was the recent upgrade?
http://localhost:8050/?url=http://google.pl
No idea what I'm doing wrong. It worked fine just yesterday.
Uncompressed, it's a whopping 2.29 GB. Is there anything more we could do to trim this down?
Random idea: Maybe use a slimmer chrome alternative e.g. https://github.com/alixaxel/chrome-aws-lambda?
I've noticed occasional exceptions in
https://github.com/zenato/puppeteer-renderer/blob/master/src/wait-for-animations.js#L18
I think what's happening is that JS is causing the page to change size, so the image returned by page.screenshot is a different size than the previous one. This results in an exception from pixelmatch:
https://github.com/mapbox/pixelmatch/blob/master/bin/pixelmatch#L26
Off the top of my head I'd prefer the loop just keep checking until the timeout expires, then return false if for some reason it still didn't match.
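The suggestion above could look roughly like this. It is a sketch under assumptions, not the code in wait-for-animations.js: `capture` is a hypothetical function resolving to a decoded frame `{ width, height, data }`, and `countDiffPixels` is an injected comparison (for example a thin wrapper around pixelmatch) that returns the number of differing pixels.

```javascript
// Hypothetical helper: treat a size change as "not stable yet" instead of
// letting the pixel comparison throw on mismatched dimensions.
function framesMatch(prev, next, countDiffPixels) {
  if (prev.width !== next.width || prev.height !== next.height) return false;
  return countDiffPixels(prev.data, next.data, prev.width, prev.height) === 0;
}

// Keep polling until two consecutive frames match or the timeout expires.
async function waitForStableFrame(capture, countDiffPixels, { timeout = 10000, interval = 100 } = {}) {
  const deadline = Date.now() + timeout;
  let prev = await capture();
  while (Date.now() < deadline) {
    await new Promise((resolve) => setTimeout(resolve, interval));
    const next = await capture();
    if (framesMatch(prev, next, countDiffPixels)) return true;
    prev = next;
  }
  return false; // never settled within the timeout
}
```

With this shape, a page that resizes mid-animation just costs another iteration, and the caller decides what to do when `false` comes back.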
newly released version of this image break my tests. admittedly i use it in a pretty hacky way! i would like to stay pinned to the last release, not the latest release
can we publish tagged images on dockerhub vs just latest?
thx!
I would like to report a security issue regarding Path traversal. Attackers can use the file protocol in URL parameters to view sensitive information.
Payload:
http://localhost:8080/html?url=file:///etc/passwd
Root cause:
The root cause is that validation of the url parameter is not sufficiently strict.
Patch:
#96
BTW, Can I apply for a CVE number?
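For illustration only (this is not the content of the patch in #96): a protocol allowlist built on Node's WHATWG `URL` parser would reject the payload above. `isAllowedTarget` is a hypothetical name.

```javascript
// Hypothetical guard: accept only http(s) targets; anything unparseable
// or using another scheme (file:, data:, ...) is rejected.
function isAllowedTarget(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl);
  } catch {
    return false;
  }
  return parsed.protocol === 'http:' || parsed.protocol === 'https:';
}
```

A scheme check like this does not address redirects or requests to internal network addresses, which would need separate handling.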
Is there any way to add authentication to the Docker service API? For example, a secret Bearer token.
Currently, if I run the Docker image, everybody can use the service, which is really an issue, even more so as it can act as a proxy.
Many thanks for your great work!
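A shared-secret check might be sketched as follows. Everything here is an assumption rather than an existing feature: `isAuthorized` is a hypothetical helper, and the secret would have to be supplied by the operator, for instance through an environment variable.

```javascript
const crypto = require('crypto');

// Hypothetical check for an "Authorization: Bearer <secret>" header.
// Both sides are hashed first so timingSafeEqual gets equal-length buffers.
function isAuthorized(authorizationHeader, secret) {
  const prefix = 'Bearer ';
  if (typeof authorizationHeader !== 'string' || !authorizationHeader.startsWith(prefix)) {
    return false;
  }
  const token = authorizationHeader.slice(prefix.length);
  const given = crypto.createHash('sha256').update(token).digest();
  const expected = crypto.createHash('sha256').update(secret).digest();
  return crypto.timingSafeEqual(given, expected);
}
```

A request handler would call `isAuthorized(req.headers.authorization, secret)` before rendering and answer with 401 when it returns `false`.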
I've looked at the extra options, but they do not seem to work. According to the docs, clip.y should work with screenshots, but I get a 500 error. How should I build the query to be able to clip (or scroll) the site?
<?php
$params['width'] = 1200;
$params['height'] = 600;
$params['margin.top'] = 100;
$params['options.clip.y'] = 120; // No effect
$params['clip.y'] = 120; // Error 500
$params['clip']['y'] = 120; // Error 500
$params['type'] = 'screenshot';
// header('Content-type: image/png');
echo file_get_contents(SCREENSHOT_API_URL . http_build_query($params));
Are there any plans to provide arm64 images?
Error with latest update 2.4.2
Container fails to start
Reverting back to 2.4.0 solves the issue
Error: Cannot find module 'fs/promises'
Require stack:
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js
- /app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js
- /app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js
- /app/src/renderer.js
- /app/src/index.js
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:815:15)
at Function.Module._load (internal/modules/cjs/loader.js:667:27)
at Module.require (internal/modules/cjs/loader.js:887:19)
at require (internal/modules/cjs/helpers.js:74:18)
at Object.<anonymous> (/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js:36:20)
at Module._compile (internal/modules/cjs/loader.js:999:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
at Module.load (internal/modules/cjs/loader.js:863:32)
at Function.Module._load (internal/modules/cjs/loader.js:708:14)
at Module.require (internal/modules/cjs/loader.js:887:19) {
code: 'MODULE_NOT_FOUND',
requireStack: [
'/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/BrowserFetcher.js',
'/app/node_modules/puppeteer-core/lib/cjs/puppeteer/node/node.js',
'/app/node_modules/puppeteer-core/lib/cjs/puppeteer/puppeteer-core.js',
'/app/node_modules/puppeteer/lib/cjs/puppeteer/puppeteer.js',
'/app/src/renderer.js',
'/app/src/index.js'
]
}
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `node src/index.js`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
Hi, thank you for building this! I have been using it and I think it has great potential, especially if it were more configurable via the URL. Right now it only has url and type. I think it would be good to have more:
waitUntil
waitForSelector
width
height
fullPage
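A sketch of how such parameters might be read from the query string. `parseRenderOptions` is a hypothetical helper, the parameter names are the ones proposed in the list above, and the `waitUntil` values are the lifecycle events Puppeteer accepts for navigation.

```javascript
// Lifecycle events Puppeteer accepts for the navigation waitUntil option.
const WAIT_UNTIL = ['load', 'domcontentloaded', 'networkidle0', 'networkidle2'];

// Hypothetical parser: turn query-string text into typed render options.
function parseRenderOptions(query) {
  const toInt = (value) => {
    const n = parseInt(value, 10);
    return Number.isNaN(n) ? undefined : n;
  };
  return {
    // Fall back to 'load' for missing or unknown values.
    waitUntil: WAIT_UNTIL.includes(query.waitUntil) ? query.waitUntil : 'load',
    waitForSelector: query.waitForSelector || undefined,
    width: toInt(query.width),
    height: toInt(query.height),
    fullPage: query.fullPage === 'true',
  };
}
```

The result could then feed `page.setViewport`, `page.goto`, `page.waitForSelector` and `page.screenshot` respectively.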
Thank you very much for creating and maintaining this project.
Is there perhaps a way to POST data to the renderer instead of using GET, to allow for things like bigger header/footer templates that can make the GET URL and its parameters too long?
Adding a health check endpoint would allow this to be used as-is in cloud environments such as AWS Fargate/ECS/EBS.
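A health check can be very small. In this sketch the `/healthz` path and the `handleHealth` name are hypothetical, and the handler only needs Node-style `req` and `res` objects.

```javascript
// Hypothetical handler: answer GET /healthz with 200 and a small JSON body.
// Returns true when it handled the request so the caller can stop there.
function handleHealth(req, res) {
  if (req.method === 'GET' && req.url === '/healthz') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'ok' }));
    return true;
  }
  return false;
}
```

It would be called first in the request handler, with normal rendering continuing only when it returns `false`. A stricter version could also confirm that the browser instance is still connected before reporting `ok`.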
Hi, I am trying to understand whether I can export the PDF with emulateMedia set to screen, but it seems that this can't be altered because it is hardcoded to always use print. Or am I not using the URL variables properly?
I tried media, emulateMedia, mediaType and emulateMedia.mediaType, none of which gave me any different results.
If I am reading this line
puppeteer-renderer/src/renderer.js
Line 46 in 8287d18
correctly and it is hardcoded, is there a chance you could expose it as an option in the URL, defaulting to 'print' when it is not provided?
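The requested behaviour could be sketched like this. The `emulateMedia` query parameter name is an assumption (one of the names tried above), both function names are hypothetical, and `page.emulateMediaType` is the method in recent Puppeteer releases, while older ones exposed `page.emulateMedia` instead.

```javascript
// Hypothetical resolver: honour an explicit 'screen', otherwise keep 'print'.
function resolveMediaType(query) {
  return query.emulateMedia === 'screen' ? 'screen' : 'print';
}

// Apply the resolved media type to a Puppeteer page before generating the PDF.
async function applyMediaType(page, query) {
  const mediaType = resolveMediaType(query);
  await page.emulateMediaType(mediaType);
  return mediaType;
}
```

Defaulting to `print` keeps existing requests unchanged; only callers that pass `emulateMedia=screen` see a difference.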
Thank you for taking the time to build and maintain this image!
Error: net::ERR_CERT_AUTHORITY_INVALID at https://localhost/c5934.html
at navigate (/app/node_modules/puppeteer/lib/FrameManager.js:108:45)
at process._tickCallback (internal/process/next_tick.js:68:7)
-- ASYNC --
at Frame.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:105:23)
at Page.goto (/app/node_modules/puppeteer/lib/Page.js:615:53)
at Page.<anonymous> (/app/node_modules/puppeteer/lib/helper.js:106:31)
at Renderer.createPage (/app/src/renderer.js:22:16)
at process._tickCallback (internal/process/next_tick.js:68:7)
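For self-signed or internal certificates like the one in this trace, Puppeteer versions of this era accept an `ignoreHTTPSErrors` launch option. A sketch of making it opt-in: the `IGNORE_HTTPS_ERRORS` environment variable and the `buildLaunchOptions` name are assumptions for illustration, not documented settings of the image.

```javascript
// Hypothetical builder: enable ignoreHTTPSErrors only when explicitly
// requested, so certificate checks stay on by default.
function buildLaunchOptions(env = process.env) {
  const options = { args: [] };
  if (env.IGNORE_HTTPS_ERRORS === 'true') {
    options.ignoreHTTPSErrors = true;
  }
  return options;
}
```

The result would be passed to `puppeteer.launch(buildLaunchOptions())`. Skipping certificate validation is only reasonable for trusted internal hosts.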
Fresh installation failed
renderer |
renderer | > [email protected] start /app
renderer | > node src/index.js
renderer |
renderer | Fail to initialze renderer. Error: Failed to launch chrome!
renderer | /app/node_modules/puppeteer/.local-chromium/linux-515411/chrome-linux/chrome: error while loading shared libraries: libgconf-2.so.4: cannot open shared object file: No such file or directory
renderer |
renderer |
renderer | TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md
renderer |
renderer | at onClose (/app/node_modules/puppeteer/lib/Launcher.js:211:14)
renderer | at Interface.helper.addEventListener (/app/node_modules/puppeteer/lib/Launcher.js:200:50)
renderer | at emitNone (events.js:111:20)
renderer | at Interface.emit (events.js:208:7)
renderer | at Interface.close (readline.js:370:8)
renderer | at Socket.onend (readline.js:149:10)
renderer | at emitNone (events.js:111:20)
renderer | at Socket.emit (events.js:208:7)
renderer | at endReadableNT (_stream_readable.js:1064:12)
renderer | at _combinedTickCallback (internal/process/next_tick.js:138:11)
url=https://www.baidu.com?id=1&search=name
We can't get the second parameter, search.
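When the target URL has its own query string, its `&` is read as a separator in the renderer's query, so `search=name` becomes a parameter of the renderer request rather than part of `url`. Percent-encoding the target avoids that. A minimal sketch, where `buildRenderUrl` and the renderer address are hypothetical:

```javascript
// Build a renderer request URL. URLSearchParams percent-encodes the target,
// so its own '?', '=' and '&' survive as part of the `url` value.
function buildRenderUrl(rendererBase, targetUrl, params = {}) {
  const query = new URLSearchParams({ url: targetUrl, ...params });
  return `${rendererBase}/?${query.toString()}`;
}

// Example with a hypothetical renderer address:
const requestUrl = buildRenderUrl(
  'http://localhost:3000',
  'https://www.baidu.com?id=1&search=name',
  { type: 'pdf' }
);
```

On the receiving side, the `url` parameter then decodes back to the full target, including `search=name`.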
Is there a way to clear the Puppeteer/Docker cache? Preferably via a parameter.
I render a preview, and when I then change an image on the server, the renderer does not see the change. It keeps the old image from the HTML in its cache.
Is there a way to make the Puppeteer Docker API disable caching or force a refresh?
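Puppeteer pages expose `page.setCacheEnabled(false)`, which could be wired to a request parameter. In this sketch the `cache` parameter name and the `applyCachePolicy` helper are assumptions, not existing options of the service.

```javascript
// Hypothetical helper: disable the page's HTTP cache when the request
// asks for it with cache=false (or cache=0). Call before page.goto().
async function applyCachePolicy(page, query) {
  const disable = query.cache === 'false' || query.cache === '0';
  if (disable) {
    await page.setCacheEnabled(false);
  }
  return disable;
}
```

Requests without the parameter keep the default caching behaviour, so existing callers are unaffected.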
> [email protected] start /app
> node src/index.js
(node:18) ExperimentalWarning: The fs.promises API is experimental
Fail to initialze renderer. Error: Could not find browser revision 756035. Run "npm install" or "yarn install" to download a browser binary.
at ChromeLauncher.launch (/app/node_modules/puppeteer/lib/Launcher.js:59:23)
After building a custom Docker container from source using docker build -t imagename . and then running docker run imagename, I'm getting the above error message. It looks like the dependencies aren't being installed properly.
Any ideas how to fix this?