
zfcsoftware / puppeteer-real-browser


This package is designed to keep Puppeteer from being flagged by bot-detecting captchas such as Cloudflare's. It acts like a real browser and can be controlled through Puppeteer.

Home Page: https://www.npmjs.com/package/puppeteer-real-browser

License: MIT License

JavaScript 94.73% Dockerfile 5.27%
cloudflare-bypass puppeteer puppeteer-cloudflare-captcha puppeteer-extra-plugin puppeteer-fingerprint puppeteer-real-browser undetected-browser puppeteer-undetected-browser undetected-puppeteer undetected

puppeteer-real-browser's Introduction



Puppeteer Real Browser

This package prevents Puppeteer from being detected as a bot in services like Cloudflare and allows you to pass captchas without any problems. It behaves like a real browser.

If you are only interested in Cloudflare WAF, please check this repo:
https://github.com/zfcsoftware/cf-clearance-scraper



Installation

If you are using a Linux operating system, xvfb must be installed for the library to work correctly.

npm i puppeteer-real-browser

If you are using Linux:

sudo apt-get install xvfb

Include

CommonJS

const start = async () => {
    var { connect } = await import('puppeteer-real-browser')
    const { page, browser } = await connect({})
}

Module

import { connect } from 'puppeteer-real-browser'

const { page, browser } = await connect({})

Usage

import { connect } from 'puppeteer-real-browser'

connect({
    headless: 'auto',
    args: [],
    customConfig: {},
    skipTarget: [],
    fingerprint: false,
    turnstile: true,
    connectOption: {},
    fpconfig: {},
    // proxy:{
    //     host:'<proxy-host>',
    //     port:'<proxy-port>',
    //     username:'<proxy-username>',
    //     password:'<proxy-password>'
    // }
})
.then(async response => {
    const {browser, page} = response
    await page.goto('<url>')
})
.catch(error=>{
    console.log(error.message)
})

headless: Accepts true, false, or 'auto'. With 'auto', the library uses the mode that is stable on the operating system in use.

args: Additional flags to pass when starting Chromium, given as an array of strings.

customConfig: The variables in this object are added when launch is executed. For example, you can specify the browser path with executablePath.

skipTarget: The library uses a target filter to avoid detection. You can pass the targets you want to allow. This feature is in beta and its use is not recommended.

fingerprint: If set to true, a unique fingerprint ID is injected into the page every time the browser is launched, to help avoid detection. Not recommended unless it is required, because it may itself cause detection. Runs the puppeteer-afp library.

turnstile: If set to true, Cloudflare Turnstile captchas are clicked automatically.

connectOption: The variables in this object are added when connecting, via puppeteer.connect, to the Chromium instance that was created.

fpconfig: This setting allows you to reuse fingerprint values that you have previously saved in the puppeteer-afp library. Please refer to the puppeteer-afp library documentation for details.
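As a rough illustration of how the options above fit together, the sketch below assembles an options object for connect(). The helper name buildConnectOptions and the example browser path are assumptions made for this example; only the option names (headless, args, customConfig, turnstile, proxy) come from the description above.

```javascript
// Hypothetical helper, not part of puppeteer-real-browser: builds the
// options object that would be passed to connect().
function buildConnectOptions({ executablePath, extraArgs = [], proxy } = {}) {
  const options = {
    headless: 'auto',
    args: [...extraArgs],
    customConfig: {},
    turnstile: true,
  };
  // Set executablePath only when a custom browser binary is requested.
  if (executablePath) {
    options.customConfig.executablePath = executablePath;
  }
  // Attach the proxy block only when a host is actually provided.
  if (proxy && proxy.host) {
    options.proxy = proxy;
  }
  return options;
}

const options = buildConnectOptions({
  executablePath: '/usr/bin/google-chrome', // assumed path, adjust for your system
  extraArgs: ['--window-size=1280,720'],
});
console.log(options);
```

The resulting object would then be passed as connect(options). Leaving proxy out entirely when no host is set mirrors the commented-out proxy block in the usage example.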

For example, to open a second page, you can use this library as follows.

import { connect } from 'puppeteer-real-browser'

connect({
    turnstile: true
})
.then(async response => {
        const { page, browser, setTarget } = response

        page.goto('https://nopecha.com/demo/cloudflare', {
            waitUntil: 'domcontentloaded'
        })

        setTarget({ status: false })

        let page2 = await browser.newPage();

        setTarget({ status: true })

        await page2.goto('https://nopecha.com/demo/cloudflare');
})
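The setTarget({ status: false }) and setTarget({ status: true }) calls in the example above always come as a pair. One way to keep them paired, even when opening the page throws, is a small wrapper like the sketch below. The name withTargetFilterOff is hypothetical and not part of the library; it only assumes that setTarget accepts an object with a status field, as shown above.

```javascript
// Hypothetical helper, not part of puppeteer-real-browser: runs `fn` with the
// target filter switched off, then switches it back on even if `fn` throws.
async function withTargetFilterOff(setTarget, fn) {
  setTarget({ status: false });
  try {
    return await fn();
  } finally {
    setTarget({ status: true });
  }
}

// Usage, assuming `browser` and `setTarget` come from connect():
// const page2 = await withTargetFilterOff(setTarget, () => browser.newPage());
```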

Docker

You can use the Dockerfile in the main directory to run this library with Docker. It has been tested with Docker on Ubuntu Server.

To run a test, follow these steps:

git clone https://github.com/zfcsoftware/puppeteer-real-browser
cd puppeteer-real-browser
docker build -t puppeteer-real-browser-project .
docker run puppeteer-real-browser-project

Support Us

This library is completely open source and is constantly being updated. Please star this repo to keep these updates coming. Starring the repo helps us improve it.

License

Distributed under the MIT License. See LICENSE for more information.

Thank You

  • Jimmy Laurent - inspired by the cloudflare-scraper library
  • CrispyyBaconx - contributed to converting this library to TypeScript
  • Pavle Aleksic (pavlealeksic) - the fingerprint is changed with the puppeteer-afp library

Disclaimer of Liability

No responsibility is accepted for the use of this software. This software is intended for educational and informational purposes only. Users should use this software at their own risk. The developer cannot be held liable for any damages that may result from the use of this software.

This software is not intended to bypass Cloudflare Captcha or any other security measure. It must not be used for malicious purposes. Malicious use may result in legal consequences.

This software is not officially endorsed or guaranteed. Users can visit the GitHub page to report bugs or contribute to the software, but they are not entitled to make any claims or request service fixes.

By using this software, you agree to this disclaimer.

puppeteer-real-browser's People

Contributors

rtritto, zfcsoftware


puppeteer-real-browser's Issues

it is not working exactly like a real browser because the website is detecting this

It is not working exactly like a real browser. The website I am using for testing detects that I am browsing through Puppeteer: it opens a video player if you visit it manually in Google Chrome, but with Puppeteer it shows a message and does not display the player.

I'm using the latest version of node

const url = 'https://brbeast.com/video/f79921bbae40a577928b76d2fc3edc2a';
const sleep = ms => new Promise(res => setTimeout(res, ms));

const start = async () => {
    var { puppeteerRealBrowser } = await import('puppeteer-real-browser')
    const { page, browser } = await puppeteerRealBrowser({
        headless: false, // (optional) The default is false. If true is sent, the browser opens incognito. If false is sent, the browser opens visible.
        action:'default', // (optional) If default, it connects with puppeteer by opening the browser and returns you the page and browser. if socket is sent, it returns you the browser url to connect to. 
        executablePath:'default', // (optional) If you want to use a different browser instead of Chromium, you can pass the browser path with this variable.
        // (optional) If you are using a proxy, you can send it as follows.
        // proxy:{
        //     host:'<proxy-host>',
        //     port:'<proxy-port>',
        //     username:'<proxy-username>',
        //     password:'<proxy-password>'
        // }
    })
    console.log('Running tests..')
    // You should use it if you want the fingerprint values of the page to be changed.
    // puppeteerAfp(page);

    await page.goto(url)
    await sleep(5000)
    await page.screenshot({ path: 'testresult.png', fullPage: true })
   // await browser.close()
    console.log(`All done, check the screenshot. ✨`)
}

start();


TF = true isnt detecting CF Turnstile

Greetings,

When I use tf = true, it does not detect Turnstile at all. When I turn it off, it does detect Turnstile but cannot pass the captcha.

PS: I just used the example script from your starting page, nothing else added

Timeout error when waiting for browser response

Basically, during execution, after I call page.goto again with a new URL, Puppeteer loses the browser reference and returns nothing except a TimeoutError.


export async function login(page, credentials) {
  try {
    await page.locator('[name="Username"]').fill(credentials.user)
    await page.locator('[name="Password"]').fill(credentials.password)
    await page.click('[data-qa="submit"]')
    await page.waitForTimeout(10000)
    await page.goto('https://br.betano.com/live/')
    // await page.waitForNavigation({ waitUntil: 'domcontentloaded' });
    // await page.evaluate(() => {
    //   window.location.href = 'https://br.betano.com/live/'
    // })
    await page.waitForTimeout(10000)
    await page.reload()
    await page.waitForTimeout(10000)
  } catch (error) {
    console.log(error)
  }
}

The test does not start

OS: Ubuntu 22.04.3 LTS
xvfb is installed

I cloned the repository, ran npm i and then node ./src/test, and it doesn't work. There are no console errors. Thank you.

Question related Puppeteer-real-browser

  1. FingerPrint
    Which one is best?
    "fingerprint-generator": "^2.1.30",
    "fingerprint-injector": "^2.1.30",
    or
    puppeteer-afp

  2. Extra-plugins
    Does puppeteer-real-browser need puppeteer-extra and puppeteer-extra-plugin-stealth, or can it mask all parameters itself without extra plugins?

  3. Puppeteer Version and Cluster tool
    My bot uses "puppeteer": "^21.5.0" with Puppeteer Cluster. Does puppeteer-real-browser work with any version, or does it need a specific one?

Thank you in advance :)

Cant Open Developer Tools

In the latest version, I can't open developer tools, right-click -> Inspect, or open the console on a page.

request for new option

Hi, your package is really amazing. It truly acts like a real browser, but when I checked whoer.net it showed a different timezone and language. If this function were added in the next update, I think it would be the best package. Thanks for creating such a nice package.

Running Puppeteer in Headless Mode for Captcha Solving

When I activate headless mode, Puppeteer can't solve the captcha. Is there a way to run it in headless mode?

Additionally, the following solution is more accurate. Since we disable Puppeteer's access to Cloudflare, there won't be iframe access, so it would be more appropriate to manipulate the response and communicate with the iframe.

const {
  RequestInterceptionManager,
} = require("puppeteer-intercept-and-modify-requests");
const puppeteer = require("puppeteer-core");
const script = `<script>const targetSelector = 'input[type="checkbox"]';
const observer = new MutationObserver((mutationsList) => {
  for (const mutation of mutationsList) {
    if (mutation.type === 'childList') {
      const addedNodes = Array.from(mutation.addedNodes);
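      for (const addedNode of addedNodes) {
        // Text and comment nodes have no querySelector, so skip them.
        if (typeof addedNode.querySelector !== 'function') continue;
        const node = addedNode.querySelector(targetSelector);
        if (node) {
          setTimeout(()=>{node.parentElement.click();},1000);
        }
      }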
    }
  }
});

const targetElement = document.documentElement;
const observerOptions = {
  childList: true,
  subtree: true,
};
observer.observe(targetElement, observerOptions);</script>`;
function targetFilter(target) {
  if (target._getTargetInfo().type !== "iframe") {
    return true;
  }
  return false;
}
const main = async () => {
  const browser = await puppeteer.launch({
    executablePath:
      "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe",

    targetFilter,
    headless: false,
  });
  const page = await browser.newPage();

  const client = await page.target().createCDPSession();
  const interceptManager = new RequestInterceptionManager(client);
  await interceptManager.intercept({
    urlPattern: `https://challenges.cloudflare.com/*`,
    resourceType: "Document",
    modifyResponse({ body }) {
      return {
               body: body.replaceAll("<head>", "<head>" + script),
      };
    },
  });
  console.log("Connected to browser");
  await page.goto("https://nopecha.com/demo/cloudflare", {
    waitUntil: "domcontentloaded",
  });
  try {
    await page.waitForSelector(".link_row", {
      timeout: 100000,
    });
  } catch (error) {
    console.error(error);
  }
  await page.screenshot({ path: "example.png" });
  await browser.close();
};
main();

Puppeteer-real-browser as headless: "true"

I've been trying to run puppeteer-real-browser with headless: true, but the browser still opens as if headless: false. With headless: 'auto' the browser also launches as headless: false.
Below are the connect params I'm passing:

connect({
    headless: 'true',
    args: [],
    customConfig: {},
    skipTarget: [],
    fingerprint: true,
    turnstile: true,
    connectOption: {},
    tf: true,
})

How can I launch the browser with headless: true?
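The README lists true, false and 'auto' as the values for headless, while the snippet above passes the string 'true'. Whether the library treats that string differently from the boolean is not confirmed here, so the following is only a sketch of one way to rule it out: a hypothetical normalizeHeadless helper (not part of the library) that coerces common string forms to the documented values before calling connect().

```javascript
// Hypothetical helper, not part of puppeteer-real-browser: coerces a
// user-supplied headless value to one of true, false, or 'auto'.
function normalizeHeadless(value) {
  if (value === true || value === false) return value;
  if (typeof value === 'string') {
    const v = value.trim().toLowerCase();
    if (v === 'true') return true;
    if (v === 'false') return false;
    if (v === 'auto') return 'auto';
  }
  throw new TypeError(`Unsupported headless value: ${String(value)}`);
}

console.log(normalizeHeadless('true'));
console.log(normalizeHeadless('auto'));

// Usage, assuming connect() from puppeteer-real-browser:
// connect({ headless: normalizeHeadless(process.env.HEADLESS || 'auto') })
```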

Not bypass captcha ClouldFlare

Hello everyone!
I've been trying to use Puppeteer with the current browser but can't bypass the captcha on this particular page: https://jobnib.com/book/i-am-the-luna-chapter-32
Here is my code.

// Helper used by the snippet below
const sleep = ms => new Promise(res => setTimeout(res, ms));

(async () => {
    var { connect } = await import('puppeteer-real-browser')
    const { page, browser } = await connect({
        headless: 'auto',
        args: [],
        customConfig: {},
        skipTarget: [],
        fingerprint: false,
        turnstile: true,
        connectOption: {},
        fpconfig: {},
    })

    try {
        await page.goto('https://jobnib.com/book/i-am-the-luna-chapter-32');
        await sleep(30000)
    } catch (error) {
        console.error('Error:', error);
    } finally {
        await browser.close();
        pool.end();
        process.exit(1);
    }
})();

Thanks, all.

Does this work on headless mode

For me it does not: it times out every time. I guess headless mode can't pass Cloudflare. Is it just me, or is it universal? Thanks.

send({
    url: domain,            
})
.then(resp=>{
    return res.status(200).json(resp);
})
 
 
const send = ({ url = '', proxy = {} }) => {
    return new Promise(async (resolve, reject) => {
        try {
            var { puppeteerRealBrowser } = await import('puppeteer-real-browser')
            var data = {}
            if (proxy && proxy.host && proxy.host.length > 0) {
                data.proxy = proxy
            }
            data.headless = true; // 👈 headless here
 
            puppeteerRealBrowser = await puppeteerRealBrowser(data)
            var browser = puppeteerRealBrowser.browser
            var page = puppeteerRealBrowser.page
 
            try {
                var st = setTimeout( async () => {
                    await browser.close();
                    resolve({
                        code: 504,
                        message: 'Time Out'
                    })
 
                }, 55000);
 
                await page.goto(url, { waitUntil: 'domcontentloaded' })
                ......
            }
        }
    }
})

Error: require() of ES Module ./node_modules/puppeteer-real-browser/src/index.js from ./server.ts not supported. Instead change the require of index.js in ./server.ts to a dynamic import() which is available in all CommonJS modules.

When trying to run the application, the following error appears:

Error: require() of ES module /Users/jhonatabonadio/stfnoticias/node_modules/puppeteer-real-browser/src/index.js from /Users/jhonatabonadio/stfnoticias/server.ts not supported.
Instead, change the require from index.js in /Users/jhonatabonadio/stfnoticias/server.ts to a dynamic import() that is available in all CommonJS modules.

server.ts


import dotenv from 'dotenv'
import { connect } from 'puppeteer-real-browser'

dotenv.config()

puppeteer.use(StealthPlugin())

async function consultarProcesso(cpf: string) {
  connect({
    turnstile: true,
    fingerprint: true,
    headless: 'auto',
  }).then(async (response: any) => {
    const { page, browser } = response

    await page.goto('https://nopecha.com/demo/cloudflare')
  })
}

aws lambda

Hi =)
I'll admit right away that I'm a beginner in development.
I can't quite figure out whether it's possible to use puppeteer-real-browser in AWS Lambda, or whether I'm doing something wrong. I'm getting a lot of errors.

Making a Cloudflare Proxy

I'm pretty sure most web scrapers use APIs to scrape CF challenge cookies so that they can make requests to specific servers. Since your project is very powerful and revolves around the idea of bypassing Cloudflare, my idea is that you could make a web server that accepts a URL and a proxy as arguments and returns the CF page content, cookies and headers using your project. I think it would be very interesting, since most libraries no longer work.

Handling Captcha Before doing other Actions

An issue I've encountered involves bypassing the captcha and ensuring it's fully processed before attempting any interactions with the page, such as inputting login credentials. Even with the turnstile: true and fingerprint: true configurations enabled, my script attempts to interact with the page immediately after navigation, before the captcha can be resolved. To work around this, I've had to introduce a fixed delay (waitForTimeout(20000)) after the initial goto, but I'm wondering whether it can wait only when it sees the Cloudflare captcha. Currently it solves the captcha, but it tries to find elements on the screen that are not yet available because the captcha has not finished. The other issue is that the Cloudflare captcha can pop up at different times in the process, so I can't wait 20 seconds after each step. Is there a way to detect the captcha appearing through puppeteer-real-browser?

Thank you.

This is a snippet from my code:

const {
  browser: browserInstance,
  page,
  setTarget,
} = await connect({
  headless: false,
  fingerprint: true, // Injects a unique fingerprint ID into the page
  turnstile: true, // Automatically clicks on Captchas
  tf: true, // Use targetfilter to avoid detection initially
});

browser = browserInstance; 

setTarget({ status: false });
const page2 = await browser.newPage();
setTarget({ status: true });

// Navigate to the appointment page
await page2.goto(
  "https://xxx",
  { waitUntil: "domcontentloaded" }
);
// Wait for 20 seconds
await page2.waitForTimeout(20000);


// Login
await page2.focus("#username");
await page2.type("#username", "[email protected]");

await page2.focus("#password");
await page2.type("#password", "xxxxx");

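Instead of a fixed delay, one option is to poll for a condition that only becomes true once the challenge has cleared, for example the login field being present. Puppeteer's own page.waitForSelector covers the single-selector case; the sketch below is a more general polling helper. The name waitUntil and its options are hypothetical, and it assumes that "#username" only exists after the captcha is finished. It is not a captcha-detection API of puppeteer-real-browser.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls `check` repeatedly until it returns a truthy
// value or the timeout elapses. Resolves to true on success, false on timeout.
async function waitUntil(check, { timeout = 30000, interval = 500 } = {}) {
  const deadline = Date.now() + timeout;
  while (true) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(interval);
  }
}

// Usage, assuming `page2` is a Puppeteer page:
// const ready = await waitUntil(async () => (await page2.$('#username')) !== null);
// if (!ready) throw new Error('Login form did not appear in time');
```

Because the same helper can be called before each interaction, it also covers the case where the challenge appears later in the flow, without paying a fixed 20 seconds at every step.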

Adding TypeScript Support.

It would be very nice to have TypeScript support, both for autocomplete and because the package currently can't be used in TS without modifying the tsconfig file.

Custom User data directory

Hi everyone!
I am trying to use Puppeteer with an existing browser, but my profile does not load properly.
Here is my code:

const { page, browser } = await connect({
    // Backslashes in Windows paths must be escaped inside JavaScript strings
    executablePath: "D:\\GPMLogin\\gpm_browser\\gpm_browser_chromium_core_119\\chrome.exe",
    headless: 'false',
    args: [
        '--user-data-dir=D:\\GPM\\9eea9e17-d2e9-400a-bd0f-249c49a0a7b4-3624',
        '--window-size=640,640',
    ],
    customConfig: {},
    skipTarget: [],
    fingerprint: true,
    turnstile: true,
    connectOption: {}
})

Thank you so much!

Doesn't work with multiple tabs

I was wondering what is different about this project. I get that it helps to bootstrap the real browser and connect to it.
What I don't get is why you launch two browsers: first one that is hidden, and then the one that is actually used.
Wouldn't launching a single browser directly and connecting to it have the same effect with much less overhead?

HELP/SUGGESTION: Multi - Threading

Hello, how would you deal with multi-threading? There are some good libraries like crawlee and puppeteer-cluster, but I can't see a way to integrate them with puppeteer-real-browser.
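In Node this is usually a question of concurrency rather than threads. Without integrating crawlee or puppeteer-cluster, a minimal approach is a small promise pool that runs a limited number of jobs at once. The sketch below is generic JavaScript: runWithConcurrency is a hypothetical helper, and the idea that each task would call connect() to get its own browser is an assumption, not something the library documents.

```javascript
// Hypothetical helper: runs async task functions with at most `limit`
// (assumed >= 1) running at the same time; results keep the task order.
async function runWithConcurrency(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    // Each worker keeps pulling the next unclaimed task until none remain.
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Usage, assuming each task opens its own browser through connect():
// const urls = ['<url-1>', '<url-2>', '<url-3>'];
// const tasks = urls.map((url) => async () => {
//   const { page, browser } = await connect({ turnstile: true });
//   try { await page.goto(url); return await page.title(); }
//   finally { await browser.close(); }
// });
// const titles = await runWithConcurrency(tasks, 2);
```

If any task rejects, the returned promise rejects as well, so callers that want partial results would need to catch errors inside each task.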

popup / new tab issue

Whenever I click on a URL that opens a popup, or Ctrl+click to open it in a new tab,
it doesn't open the URL but shows about:blank.

Any idea how to fix this?

Bug with reCAPTCHA v2

Hey,

Sorry, me again. When it detects a reCAPTCHA v2, it starts clicking it and then presses only the first picture.
Maybe you could add support for capsolver/capmonster to solve this kind of captcha. That would be awesome.

Manageable Usage doesn't work

The first method, with an actual browser opening, works fine, but the second method, with headless: true, doesn't work. I tried the example from the documentation and it just opens two browsers, one with two blank pages and the other with one blank page.

Multiple Browser issue

When I run the script twice at the same time from different terminals, using different userDataDir values, both runs get linked to the same Chrome instance.
How can I open two different browsers?

CF looping

Greetings,

When using an extension to generate a new UA and headers, Cloudflare sometimes seems to flag it, and then it loops (solving again and again).
Is there a way to add a check so that it only tries 1-3 times to solve Cloudflare, and afterwards closes and gives an error?
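A bounded retry along the lines requested above could be sketched as follows. The helper name solveWithLimit is hypothetical, and the caller has to supply the `attempt` function that decides whether the challenge was passed (for instance by checking for an element that only exists on the real page). Nothing here is a built-in option of puppeteer-real-browser.

```javascript
// Hypothetical helper: calls `attempt` up to `maxAttempts` times and resolves
// with the number of attempts used, or throws once the limit is reached.
async function solveWithLimit(attempt, maxAttempts = 3) {
  for (let i = 1; i <= maxAttempts; i++) {
    if (await attempt(i)) return i;
  }
  throw new Error(`Challenge not passed after ${maxAttempts} attempts`);
}

// Usage, assuming `page` and `browser` come from connect() and that
// '.link_row' only appears once the challenge has been passed:
// try {
//   await solveWithLimit(async () => {
//     await page.goto('<url>', { waitUntil: 'domcontentloaded' });
//     return page.waitForSelector('.link_row', { timeout: 15000 })
//       .then(() => true, () => false);
//   }, 3);
// } catch (error) {
//   await browser.close();
//   throw error;
// }
```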

Update chromium

Current chromium version: 3.2171.3008 - published: 2016/04/27
Latest chromium version: 3.0.3 - published: 2021/10/18 (latest publish)

Change package.json:

  "dependencies": {
-    "chromium": "^3.0.3",
+    "chromium": "3.0.3",
  }

I got this error on start:

/node_modules/chromium/Chromium.js:2
  if(global.__TINT.Chromium) {
                   ^

TypeError: Cannot read properties of undefined (reading 'Chromium')

Failed to install deps for v1.2.17

When installing puppeteer-real-browser, I got a failed checksum when installing v1.2.17 (v1.2.16 works).

pnpm i puppeteer-real-browser@1.2.17 --ignore-pnpmfile --no-lockfile -w
 WARN  A pnpm-lock.yaml file exists. The current configuration prohibits to read or write a lockfile
 WARN  deprecated @nx/[email protected]: Package no longer supported. Contact Support at https://www.npmjs.com/support for more info.
 WARN  deprecated @types/[email protected]: This is a stub types definition. helmet provides its own type definitions, so you do not need this installed.
 WARN  deprecated @aws-sdk/[email protected]: This package has moved to @smithy/signature-v4
 WARN  deprecated @aws-sdk/[email protected]: This package has moved to @smithy/util-buffer-from
 WARN  deprecated @prisma/[email protected]: Deprecated: @prisma/sdk was an internal package which doesn't follow semver and can include breaking changes without a warning. We renamed it to @prisma/internals to make it clearer.

If you're using this package it would be helpful if you could help us understand where, how, and why you are using it by ginving us feedback in https://github.com/prisma/prisma/discussions/13877). Your feedback will be valuable to us in defining a better API.
 WARN  deprecated [email protected]: The `subscriptions-transport-ws` package is no longer maintained. We recommend you use `graphql-ws` instead. For help migrating Apollo software to `graphql-ws`, see https://www.apollographql.com/docs/apollo-server/data/subscriptions/#switching-from-subscriptions-transport-ws    For general help using `graphql-ws`, see https://github.com/enisdenjo/graphql-ws/blob/master/README.md
 WARN  GET https://github.com/zfcsoftware/puppeteer-afp/tree/master error (ERR_PNPM_TARBALL_EXTRACT). Will retry in 10 seconds. 2 retries left.
 WARN  GET https://github.com/zfcsoftware/puppeteer-afp/tree/master error (ERR_PNPM_TARBALL_EXTRACT). Will retry in 1 minute. 1 retries left.
 ERR_PNPM_TARBALL_EXTRACT  Failed to add tarball from "https://github.com/zfcsoftware/puppeteer-afp/tree/master" to store: Invalid checksum for TAR header at offset 0. Expected NaN, got 42982

This error happened while installing the dependencies of [email protected]
Progress: resolved 4547, reused 4427, downloaded 0, added 0

Not entirely sure of the issue, but noticed v1.2.17 was released three days ago.

Question for viewport

Any specific reason why you have width and height hard coded?

    await page.setViewport({
        width: 1920,
        height: 1080
    });

Using fingerprints is actually worse than using a normal browser?


Thanks for the work you are doing. Just trying to help with some testing. Anyway, creeper.js doesn't pass.
The fingerprint should try to give realistic numbers. Just inspect the lies detected 😉

For example, it's bad to just use Math.random() to specify the amount of RAM.

`Object.defineProperty(navigator, 'deviceMemory', {get: () => Math.floor(Math.random() * 8) + 4, });`

Better to give some realistic values:

// Pick one element of the array at random
function randomEl(array) {
    const ridx = Math.floor(array.length * Math.random());
    return array[ridx];
}
randomEl([4,8,16,32])
