
ebidel / try-puppeteer


Run Puppeteer code in the cloud

License: Apache License 2.0

JavaScript 55.44% Shell 1.49% HTML 18.03% CSS 17.36% Dockerfile 7.68%
puppeteer headless-chrome appengine docker node

try-puppeteer's Introduction

Try Puppeteer!

Run Puppeteer scripts in the cloud.

Develop

Installation:

yarn; yarn install-backend
# or npm i

Backend

The backend is a Docker container which installs the latest Chrome package that works with Puppeteer on Linux.

Note: You'll need to have Docker running before attempting each step in this section.

Building it

yarn build

Running the container

The container can be run in two modes: standalone as an executable, or as a web service.

1. Using the standalone CLI

The first is a "standalone" mode that runs a Puppeteer script from the CLI. It takes a script file as an argument and runs it in the container.

./backend/run_puppeteer.sh your-puppeteer-script.js

2. Running the web service

The second option is running the container as a web server. The endpoint accepts file uploads for running your Puppeteer scripts in the cloud:

Start the server:

cd backend
yarn serve
# yarn restart is handy too. It rebuilds the container and starts the server.

Example - running a Puppeteer script

// Uploads a script to the backend's /run endpoint and returns the parsed JSON result.
async function runCode(code) {
  const form = new FormData();
  form.append('file', new Blob([code], {type: 'text/javascript'}));
  const resp = await fetch('http://localhost:8080/run', {method: 'POST', body: form});
  return await resp.json();
}

const code = `
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  console.log(await page.content());
  await browser.close();
`;

runCode(code).then(result => {
  if (result.errors) {
    console.error(result.errors);
  }
  console.log(result.log);
});

Notes:

  • There's no need to require('puppeteer'). This is done for you on the backend.
  • Top-level async/await is supported.
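The two notes above suggest the backend injects `puppeteer` and wraps the submitted code so `await` works at the top level. A minimal sketch of that idea, assuming Node's built-in `vm` module; `runUserScript` is a hypothetical name and this is not necessarily how `backend/server.js` does it. As the "Demo page VM can be escaped" issue further down points out, `vm` is not a security boundary.

```javascript
const vm = require('vm');

// Hypothetical helper: runs user code that uses top-level `await` and the
// global `puppeteer` without a require() call. `vm` is NOT a sandbox.
async function runUserScript(code, puppeteer) {
  const logs = [];
  // Capture console.log output so it can be returned as {log: ...}.
  const capturedConsole = {
    log: (...args) => logs.push(args.map(String).join(' ')),
  };
  const context = vm.createContext({puppeteer, console: capturedConsole});
  // An async IIFE makes `await` legal at the "top level" of the user's code.
  const wrapped = `(async () => {\n${code}\n})()`;
  await vm.runInContext(wrapped, context);
  return {log: logs.join('\n')};
}
```

Errors thrown by the user's code surface as a rejected promise, which a caller could translate into the `errors` field that the frontend example checks.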

Code editor frontend

Fire up the code editor UI from the main directory:

yarn serve

Then navigate to http://localhost:8081.

Deployment

  1. Update the version of Puppeteer referenced in index.html, including the doc link. TODO: make this automatic.

  2. yarn deploy deploys both the frontend and backend services to App Engine Flex. The apps can also be deployed individually:

yarn deploy-frontend
yarn deploy-backend

Notes & Limitations

  • By default, Puppeteer launches and uses its own bundled version of Chromium. To use the google-chrome-unstable installed by the container, pass executablePath:

    const browser = await puppeteer.launch({
      executablePath: 'google-chrome-unstable'
    });

try-puppeteer's People

Contributors

ambayim, chocolateboy, dependabot[bot], ebidel


try-puppeteer's Issues

OpenShift

Can I run this code on OpenShift?

Missing fonts

I've managed to set up Puppeteer on GAE and got it working. However, some fonts are missing: where the HTML font-family specifies Georgia, Tahoma, Trebuchet MS, or Verdana, those fonts are unavailable and PDF conversion falls back to Times New Roman. Times New Roman, Arial, and Courier New all work fine.

Any ideas on how to get the missing fonts installed and working with Puppeteer on GAE?

Thanks.

Broken flexbox layout in Firefox/Edge

STR: Run the default example

AR:

  • Result panel stays fixed and doesn't resize to 50%
  • Header offset on the right is misaligned.

[Screenshots: Chrome Canary | Firefox Nightly]

Analysis (thanks to @dholbert):

The overall height of the elements in .puppeteer-results makes it taller than the viewport. The layout relies on a Chrome bug that is being fixed: https://bugs.chromium.org/p/chromium/issues/detail?id=596743

Solution:

Adding min-height: 0 to .puppeteer-results fixes the issue.


Editor typeface

The editor typeface is sans-serif and has odd ligatures. That might be nice for other uses, but it is not great for writing code.

Fix backend dockerfile

The build script is no longer working.

Some needed changes:

  • Update node version on the Dockerfile.
  • The command RUN npm install puppeteer installs Puppeteer 3.x.x, which can have breaking changes.

Here is a suggested Dockerfile that worked for me. I changed the Node image to Alpine and added the "puppeteer": "^2.0.0" dependency to package.json.

I also changed the following lines in server.js from:

    code = code.replace(/\.launch\([\w\W]*?\)/g,
        ".launch({args: ['--no-sandbox', '--disable-dev-shm-usage']})");

to:

    code = code.replace(/\.launch\([\w\W]*?\)/g,
        ".launch({executablePath: '/usr/bin/chromium-browser', args: ['--disable-dev-shm-usage']})");

Dockerfile:

FROM node:alpine

# Tell Puppeteer to skip installing Chrome. We'll be using the installed package.
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true

# Installs latest Chromium (79) package.
RUN apk add --no-cache \
      chromium \
      nss \
      freetype \
      freetype-dev \
      harfbuzz \
      ca-certificates \
      ttf-freefont

ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.1/dumb-init_1.2.1_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

WORKDIR /usr/src/app

COPY . .
COPY package*.json ./

# Install deps for server.
RUN npm install --only=production

# building the image.
ARG CACHEBUST=1

# Add pptr user.
RUN addgroup -S pptruser && adduser -S -G pptruser pptruser \
    && mkdir -p /home/pptruser/Downloads \
    && chown -R pptruser:pptruser /home/pptruser \
    && chown -R pptruser:pptruser ../app

# Run user as non privileged.
USER pptruser

EXPOSE 8080

ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "server.js"]
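To see what the server.js replacement above does to a script, here is a small sketch that applies the same regular expression to two inputs. `rewriteLaunch` is a hypothetical wrapper; only the regex and the replacement string come from the issue. It shows that any options passed to `launch()` are discarded, and that because the lazy match stops at the first `)`, a call containing nested parentheses is cut short and leaves invalid code behind.

```javascript
// Hypothetical wrapper around the replacement proposed in the issue above.
function rewriteLaunch(code) {
  return code.replace(/\.launch\([\w\W]*?\)/g,
      ".launch({executablePath: '/usr/bin/chromium-browser', args: ['--disable-dev-shm-usage']})");
}

// User options (here slowMo) are silently dropped.
console.log(rewriteLaunch('await puppeteer.launch({slowMo: 250});'));

// The lazy match ends at the first ')', so the `})` after getArgs() is left
// behind and the rewritten code no longer parses.
console.log(rewriteLaunch('await puppeteer.launch({args: getArgs()});'));
```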

Debugging the website locally

First of all, this is a great demo! I'm curious what the steps are to debug the website on localhost. Also, do you have to run yarn restart for any of your code changes to take effect? It seems like a very slow workflow, so I wanted to be sure this was the correct approach. If there is any other documentation on authoring/debugging this, I would greatly appreciate it.

launch() options get nuked

If the user launches with:

const browser = await puppeteer.launch({
    slowMo: 250, // slow down by 250ms
    headless: false,
    ignoreHTTPSErrors: true,
    dumpio: true,
});

Our rewrite that adds --no-sandbox wipes out all of these options. Instead, the options should be captured, parsed, and reset.
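One way to avoid rewriting the user's source at all is to hand the script a wrapped `puppeteer` whose `launch()` merges the container's required flags into whatever options the user passed. This is a minimal sketch, not the project's implementation; `withRequiredLaunchArgs` is a hypothetical name, and it assumes the only thing the backend must enforce is a set of extra `args`.

```javascript
// Hypothetical helper: returns a view of `puppeteer` whose launch() keeps the
// user's options and appends the flags the container needs.
function withRequiredLaunchArgs(puppeteer, requiredArgs) {
  return new Proxy(puppeteer, {
    get(target, prop, receiver) {
      if (prop !== 'launch') {
        // Everything except launch() is passed through unchanged.
        return Reflect.get(target, prop, receiver);
      }
      return (options = {}) => {
        const userArgs = Array.isArray(options.args) ? options.args : [];
        // User args first, then required ones; the Set removes duplicates.
        const args = [...new Set([...userArgs, ...requiredArgs])];
        return target.launch({...options, args});
      };
    },
  });
}
```

The backend would then expose something like `withRequiredLaunchArgs(puppeteer, ['--no-sandbox', '--disable-dev-shm-usage'])` as the script's `puppeteer` global. Options such as `headless: false` are now passed through, though they may still fail in a container without a display.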

How can I build a dependency into the try-puppeteer Docker image?

Error: Cannot find module 'axios'
at Function.Module._resolveFilename (module.js:538:15)
at Function.Module._load (module.js:468:25)
at Module.require (module.js:587:17)
at require (internal/module.js:11:18)
at [eval]:2:11
at ContextifyScript.Script.runInThisContext (vm.js:50:33)
at Object.runInThisContext (vm.js:139:38)
at Object. ([eval]-wrapper:6:22)
at Module._compile (module.js:643:30)
at evalScript (bootstrap_node.js:462:27)

Broken example: proxy.js

It looks like the default proxy.js example is broken right now.

Just wanted to report it so y'all are aware! 🙏

Using puppeteer.connect(options)

Would it be possible to use this with puppeteer.connect(options)? It would be nice to be able to connect to a Chromium instance running in Docker.
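Puppeteer does expose `puppeteer.connect()`, which attaches to an already running browser through its DevTools WebSocket endpoint instead of launching a new one. A minimal sketch, assuming the Dockerized Chromium was started with remote debugging enabled and its `browserWSEndpoint` is known to the caller; `getBrowser` is a hypothetical helper.

```javascript
// Hypothetical helper: attach to an existing Chromium (e.g. one running in
// Docker) when an endpoint is given, otherwise launch a local browser.
async function getBrowser(puppeteer, browserWSEndpoint) {
  if (browserWSEndpoint) {
    // The endpoint looks like ws://<host>:<port>/devtools/browser/<id>.
    return puppeteer.connect({browserWSEndpoint});
  }
  return puppeteer.launch();
}
```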

/dev/shm size

The default 64 MB /dev/shm size is too small for full-page screenshots of large pages (the Washington Post or CNN, for example).

I'm aware of the following options at the moment but nothing seems turnkey yet:

  1. There's talk in Chromium channels of potentially accepting a tmp dir as a CLI parameter.
  2. GCP could enable increasing the size of dev/shm on docker run. I am assuming they use default size to keep things simple, but IMO it's going to cause an unnecessary and premature exodus of users from app engine onto compute engine.
  3. Could modify puppeteer to handle streaming differently.
  4. Migrate off app engine custom/docker entirely and onto container engine proper.

Seems like this project could merge with browserless? Browserless is "run anywhere" so could offer a plugin interface to that layer, and "Google App Engine" could be the first implementation of an adapter.

PSA: Taking down try-puppeteer.appspot.com

Hi folks, we're taking down https://try-puppeteer.appspot.com/ as it's running a quite old version of puppeteer and it's not being maintained.

Of course, this try-puppeteer open source project is still here, which allows anyone to put up their own version. But the public instance will be offline.

Apologies and thanks for understanding.

Run on localhost

How can I install headless Chrome (like “try-puppeteer.appspot.com”) on my localhost so that other clients on my LAN can access it?

Is there any documentation that explains this?

TypeError: Failed to fetch

Hi, I installed the backend and frontend on my VPS. Visiting the backend shows:

"it works!"

which I assume means it works. But when I run the frontend, I always get:

TypeError: Failed to fetch

I thought something might be wrong with my backend, so I pointed BACKEND_HOST in app.js at try-puppeteer.appspot.com:

const BACKEND_HOST = ( 'https://backend-dot-try-puppeteer.appspot.com');

and I still got the same TypeError: Failed to fetch.

Any idea what I got wrong?

Thanks!

Catch responses

This code returns {"log":""}:

const browser = await puppeteer.launch();
const page = await browser.newPage();
page.on('response', (response) => {
    console.log(response.url());
    browser.close();
});
await page.goto('some-test-uri', { waitUntil: 'networkidle2' });

So how can I get the response URLs?
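One likely problem with the snippet above is that `browser.close()` runs inside the first `response` handler, which tears the browser down while `page.goto()` is still navigating. A sketch of the alternative: only record URLs in the handler and close the browser after navigation finishes. `collectResponseUrls` is a hypothetical helper that takes the `page` as a parameter so it can be exercised without a real browser.

```javascript
// Hypothetical helper: record every response URL seen while loading `url`.
async function collectResponseUrls(page, url) {
  const urls = [];
  // Only record here; closing the browser in this handler would abort goto().
  page.on('response', (response) => urls.push(response.url()));
  await page.goto(url, {waitUntil: 'networkidle2'});
  return urls;
}
```

In a try-puppeteer script you would call it between `browser.newPage()` and `await browser.close()`, then `console.log` the returned array.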

Run without Docker

Is there any straightforward way of running the 'backend service' without docker?

Cheers
Lasse

Potential Memory Leak??

Hi, thank you for the nice Puppeteer project.

I finished my Puppeteer project using backend/server.js as a reference.
In my case, I had a small memory leak because I did not call browser.disconnect() before reusing the browser object.

I read backend/server.js again and did not find any code that disconnects the browser, so I suspect try-puppeteer may have the same memory leak problem.
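For a backend that shares one Chromium across requests, a common pattern is to connect per job and always disconnect in a `finally` block, so a failing script cannot leave its connection behind. A minimal sketch, assuming a shared browser reachable at `browserWSEndpoint`; `runWithSharedBrowser` is hypothetical and is not taken from `backend/server.js`.

```javascript
// Hypothetical helper: run one job against a shared browser and always
// release this client's connection, even if the job throws.
async function runWithSharedBrowser(puppeteer, browserWSEndpoint, job) {
  const browser = await puppeteer.connect({browserWSEndpoint});
  try {
    return await job(browser);
  } finally {
    // disconnect() detaches this client but leaves Chromium running;
    // close() would shut the browser down for every other client.
    await browser.disconnect();
  }
}
```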

page.accessibility not working on https://try-puppeteer.appspot.com/

on https://try-puppeteer.appspot.com/

const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.google.com');
console.log(!!page.accessibility);
console.log(!!page.coverage);
console.log(!!page.keyboard);
console.log(!!page.mouse);
console.log(!!page.touchscreen);
console.log(!!page.tracing);
await browser.close();

logs:

false
true
true
true
true
true

Going by https://pptr.dev/#?product=Puppeteer&version=v1.19.0&show=api-class-page, I would expect them all to be true.

To verify that it is possible: adding this to the code above does return a tree:

const snapshot = await page._client.send('Accessibility.getFullAXTree');
console.log(JSON.stringify(snapshot,null,'  '));
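`page._client` is a private field and may change between releases. A sketch of a feature-detecting alternative, assuming a Puppeteer version where `page.target().createCDPSession()` is available: use `page.accessibility.snapshot()` when it exists, otherwise open a CDP session and send `Accessibility.getFullAXTree`. The two calls return different shapes (a tree rooted at one node versus a flat `{nodes: [...]}` list), so callers must handle both. `getAccessibilityTree` is a hypothetical name.

```javascript
// Hypothetical helper: prefer the public accessibility API, fall back to a
// raw CDP call on Puppeteer builds where page.accessibility is missing.
async function getAccessibilityTree(page) {
  if (page.accessibility) {
    // Resolves to the root node of a serialized accessibility tree.
    return page.accessibility.snapshot();
  }
  const client = await page.target().createCDPSession();
  try {
    // Resolves to {nodes: [...]}, a flat list of CDP AXNode objects.
    return await client.send('Accessibility.getFullAXTree');
  } finally {
    await client.detach();
  }
}
```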

Demo page VM can be escaped

Because the demo page uses Node's "vm" module, code in the context can reach the host's Function constructor through the constructor chain of the context's global object, which means we can escape the context and do malicious things (sorry about that):

Running:

new Function(`this.constructor.constructor('return process')().exit()`)

Will kill node.

There's more discussion about this in gf3/sandbox#50.

I'd recommend switching it out for "vm2", which is more secure and has a similar API.

Page content log gets trimmed

Hi,

Not sure if you still support the online version of this project, but I just noticed that the "LOG" window always shows a trimmed version of the final rendered content. No matter which page is loaded or how tall it is, the last lines are always cut off.

Am I right?

Kind regards,

search.js not working on https://try-puppeteer.appspot.com/

Steps to reproduce

Tell us about your environment:

Puppeteer version: v1.9.0
Platform / OS version: Windows 10/latest chrome/latest firefox as of today
URLs (if applicable): https://try-puppeteer.appspot.com/
Node.js version: nil

What steps will reproduce the problem?

Go to https://try-puppeteer.appspot.com/
Select search.js from examples selection
Hit Run

Please include code that reproduces the issue.

No code needed. Just try search example.

What is the expected result?
Extract the results from the page.

What happens instead?
Error running your code. Error: No node found for selector: #searchbox input

Hosting services

Hey,

Where is the demo page hosted? It runs pretty well on large pages.

Thanks

Failed to Launch Chrome on https://try-puppeteer.appspot.com/

On Canary Build

Google Chrome is up to date
Version 66.0.3344.0 (Official Build) canary (64-bit)

I get the same error on the Official Build:

Google Chrome is up to date
Version 66.0.3344.0 (Official Build) canary (64-bit)

Getting this error in the log section (Try Puppeteer v1.0.0, google.com example):

const browser = await puppeteer.launch();

const page = await browser.newPage();
await page.goto('http://www.google.com');

console.log(await page.content());
await page.screenshot({path: 'screenshot.png'});

await browser.close();

Error running your code. Error: Failed to launch chrome!
[0210/102207.499068:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499588:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499629:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499653:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499671:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499690:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499709:ERROR:platform_thread_posix.cc(123)] pthread_create: Resource temporarily unavailable (11)
[0210/102207.499726:FATAL:browser_main_loop.cc(1136)] Failed to start the browser thread: id == 6
#0 0x564e021fa2ec base::debug::StackTrace::StackTrace()
#1 0x564e0221173c logging::LogMessage::~LogMessage()
#2 0x564e00f12353 content::BrowserMainLoop::CreateThreads()
#3 0x564e0124a8e7 content::StartupTaskRunner::RunAllTasksNow()
#4 0x564e00f119fc content::BrowserMainLoop::CreateStartupTasks()
#5 0x564e00f162b8 content::BrowserMainRunnerImpl::Initialize()
#6 0x564e060e2059 headless::HeadlessContentMainDelegate::RunProcess()
#7 0x564e01f36f60 content::RunNamedProcessTypeMain()
#8 0x564e01f37975 content::ContentMainRunnerImpl::Run()
#9 0x564e01f40c5f service_manager::Main()
#10 0x564e01f36481 content::ContentMain()
#11 0x564e060e11fa headless::(anonymous namespace)::RunContentMain()
#12 0x564e060e126e headless::HeadlessBrowserMain()
#13 0x564e01f3da6a headless::HeadlessShellMain()
#14 0x564e007c31c5 ChromeMain
#15 0x7f42943a8b45 __libc_start_main
#16 0x564e007c302a _start

Received signal 6
#0 0x564e021fa2ec base::debug::StackTrace::StackTrace()
#1 0x564e021f9e51 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#2 0x7f429a223890 <unknown>
#3 0x7f42943bc067 gsignal
#4 0x7f42943bd448 abort
#5 0x564e021f8aa5 base::debug::BreakDebugger()
#6 0x564e02211b3f logging::LogMessage::~LogMessage()
#7 0x564e00f12353 content::BrowserMainLoop::CreateThreads()
#8 0x564e0124a8e7 content::StartupTaskRunner::RunAllTasksNow()
#9 0x564e00f119fc content::BrowserMainLoop::CreateStartupTasks()
#10 0x564e00f162b8 content::BrowserMainRunnerImpl::Initialize()
#11 0x564e060e2059 headless::HeadlessContentMainDelegate::RunProcess()
#12 0x564e01f36f60 content::RunNamedProcessTypeMain()
#13 0x564e01f37975 content::ContentMainRunnerImpl::Run()
#14 0x564e01f40c5f service_manager::Main()
#15 0x564e01f36481 content::ContentMain()
#16 0x564e060e11fa headless::(anonymous namespace)::RunContentMain()
#17 0x564e060e126e headless::HeadlessBrowserMain()
#18 0x564e01f3da6a headless::HeadlessShellMain()
#19 0x564e007c31c5 ChromeMain
#20 0x7f42943a8b45 __libc_start_main
#21 0x564e007c302a _start
  r8: ffffb302ba74cd30  r9: ffffb302ba74cd20 r10: 0000000000000008 r11: 0000000000000202
 r12: 00007ffe68213798 r13: 00007ffe68213788 r14: 00007ffe68213790 r15: 00007ffe682132f0
  di: 00000000000069c8  si: 00000000000069c8  bp: 00007ffe68213280  bx: 00007ffe682132f0
  dx: 0000000000000006  ax: 0000000000000000  cx: 00007f42943bc067  sp: 00007ffe68213148
  ip: 00007f42943bc067 efl: 0000000000000202 cgf: 002b000000000033 erf: 0000000000000000
 trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.


TROUBLESHOOTING: https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md

Where do I submit security related issues?

Hello,

I'd like to report a potential security vulnerability, but I don't want to make it immediately visible to all.

What is the most appropriate way to submit it privately?

Page crashed

I get this error when trying to run the example:

Error running your code. Error: Page crashed!

Error: net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH

I tried the code below but get an error:

const puppeteer = require('puppeteer');
(async () => {
    const browser = await puppeteer.launch({ignoreHTTPSErrors: true, acceptInsecureCerts: true, args: ['--proxy-bypass-list=*', '--disable-gpu', '--disable-dev-shm-usage', '--disable-setuid-sandbox', '--no-first-run', '--no-sandbox', '--no-zygote', '--single-process', '--ignore-certificate-errors', '--ignore-certificate-errors-spki-list', '--enable-features=NetworkService']});
    const page = await browser.newPage();
    try {

        await page.goto('https://www.hostwpsolutions.com/', {waitUntil: 'networkidle2', timeout: 59000});
        const cookies = await page._client.send('Network.getAllCookies');
        console.log(JSON.stringify(cookies, null, 4));
    } catch (e) {
        console.log(e);
    }

    await browser.close();
})();

FR: Add "share snippet" button

It would be awesome to have a "share snippet" button in try-puppeteer that generates a URL for the snippet.

For example, it can embed snippet in the URL itself:

try-puppeteer.appspot.com/?code=require('puppeteer')...
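A minimal sketch of the URL-embedding idea, using the standard `URL` and `URLSearchParams` APIs so the snippet is percent-encoded automatically. `buildShareUrl` and `readSharedCode` are hypothetical names and the `code` parameter follows the example URL above. Very long snippets may exceed practical URL length limits, so a real implementation might compress the code or store it server-side instead.

```javascript
// Hypothetical helpers: put a snippet into a `code` query parameter and read it back.
function buildShareUrl(baseUrl, code) {
  const url = new URL(baseUrl);
  url.searchParams.set('code', code); // percent-encodes quotes, newlines, etc.
  return url.toString();
}

function readSharedCode(href) {
  // Returns null when the URL carries no shared snippet.
  return new URL(href).searchParams.get('code');
}
```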

your-puppeteer-script.js

Hey,

Thanks for the awesome code, but what does your-puppeteer-script.js look like? I'm having trouble following the setup guide to start the app.

Can someone help me please ?
