oxylabs / selenium-bypass-captcha

See how to easily bypass CAPTCHA tests using Selenium in Python.

Home Page: https://oxylabs.io/blog/selenium-bypass-captcha

How to Bypass CAPTCHA With Selenium & Python

In this tutorial, you’ll learn how to handle CAPTCHA tests in Selenium and Python using undetected-chromedriver and Oxylabs’ Web Unblocker. See the full blog post for more details and tips.

Bypass CAPTCHA with Selenium and Python

First, install Python if you haven’t already. You can download it from the official website. Use the latest version, or at least a version newer than 3.6; otherwise, undetected-chromedriver won’t work properly.
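If you are unsure which interpreter your environment uses, a small check like the one below can confirm it before you continue. This is a minimal sketch: the helper name python_is_supported is hypothetical, and treating 3.6 as the minimum is an assumption based on the version guidance above.

```python
import sys

# Assumption: 3.6 is treated as the lowest acceptable version.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = tuple((version_info or sys.version_info)[:2])
    return current >= minimum


if __name__ == "__main__":
    if not python_is_supported():
        raise SystemExit("Python %d.%d or newer is required" % MIN_VERSION)
    print("Python version OK:", sys.version.split()[0])
```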

Step 1 - Install dependencies

Install the undetected-chromedriver and requests modules using the pip command below:

pip install undetected-chromedriver requests
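To confirm that the installation worked, you could check whether both modules can be found on the import path. The sketch below uses only the standard library; the helper name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The pip package undetected-chromedriver is imported as undetected_chromedriver.
    missing = missing_modules(["undetected_chromedriver", "requests"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed")
```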

Step 2 - Import libraries

Now that you’ve installed undetected-chromedriver, you can import it as shown below:

import undetected_chromedriver as webdriver

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")

# use_subprocess is an argument of undetected-chromedriver's Chrome class, not a Chrome flag
browser = webdriver.Chrome(options=chrome_options, use_subprocess=True)

Note that this code also creates a browser instance, which launches Chrome in headless mode, that is, without a visible window.
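If you expect to create browsers in more than one place, it may help to collect the Chrome arguments in a single helper. The following is a sketch under assumptions: build_chrome_args and make_browser are hypothetical names, and the 1280x800 window size is an arbitrary example value passed through Chrome’s --window-size flag.

```python
def build_chrome_args(headless=True, window_size=None):
    """Return a list of Chrome command-line arguments.

    `window_size` is an optional (width, height) tuple in pixels.
    """
    args = []
    if headless:
        args.append("--headless")
    if window_size is not None:
        width, height = window_size
        args.append("--window-size=%d,%d" % (width, height))
    return args


def make_browser(headless=True, window_size=(1280, 800)):
    """Create a browser configured with the arguments from build_chrome_args."""
    # Imported lazily so build_chrome_args works without the package installed.
    import undetected_chromedriver as webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(headless, window_size):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

When you are done with a browser created this way, call its quit() method so the Chrome process does not keep running in the background.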

Step 3 - Navigate to webpage

Use the browser instance to navigate to your target website. For this tutorial, let’s use https://sandbox.oxylabs.io/products as the target.

browser.get("https://sandbox.oxylabs.io/products")

Step 4 - Take a screenshot

Take a screenshot to verify the page is loaded properly without showing any CAPTCHA or bot protection screen. You can use the save_screenshot method of Selenium.

browser.save_screenshot("screenshot.png")

Your screenshot might vary slightly due to screen size, but it should show the products page rather than a CAPTCHA or bot protection screen.

The page has loaded properly without showing any CAPTCHA, and undetected-chromedriver has rendered the JavaScript files.
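Checking a screenshot by eye does not scale to many pages. As a rough complement, you could scan the page source for phrases that often appear on challenge pages. This is only a heuristic sketch: the function name and the marker strings are hypothetical, and a real site may use different wording.

```python
# Hypothetical marker strings; adjust them for the sites you work with.
CAPTCHA_MARKERS = ("captcha", "verify you are human", "unusual traffic")


def looks_like_captcha(html, markers=CAPTCHA_MARKERS):
    """Return True if any marker appears in the HTML, ignoring case."""
    lowered = html.lower()
    return any(marker in lowered for marker in markers)
```

With the browser instance from the earlier steps, you could call looks_like_captcha(browser.page_source) right after loading the page.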

Bypass CAPTCHA with Web Unblocker

Large-scale web scraping calls for a tool that handles blocks for you. Web Unblocker, an AI-powered proxy solution for bypassing IP blocks and CAPTCHAs, automatically rotates proxies, so you don’t have to manage a list of proxies for your bots manually.

Step 1 - Import libraries

Let’s use the requests module to set up Web Unblocker.

import requests

Step 2 - Get Web Unblocker credentials

Create an account to get the Web Unblocker credentials. Within a few clicks, you can sign up and get a 1-week free trial to develop and thoroughly test the solution.

Step 3 - Prepare Web Unblocker

Web Unblocker’s host is unblock.oxylabs.io and its port is 60000. Replace USERNAME and PASSWORD with your own credentials.

proxy = 'http://{}:{}@unblock.oxylabs.io:60000'.format("USERNAME", "PASSWORD")

proxies = {
    'http': proxy,
    'https': proxy
}

If you get any authentication-related errors in the later steps, check the Web Unblocker response codes in the Oxylabs documentation.
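Characters such as @ or : in a username or password can make the proxy URL ambiguous, so it may be worth percent-encoding the credentials before building it. The sketch below is one way to do that, assuming the host and port given above; build_proxies and the environment variable names are hypothetical.

```python
import os
from urllib.parse import quote


def build_proxies(username, password, host="unblock.oxylabs.io", port=60000):
    """Return a proxies dict for requests with percent-encoded credentials."""
    proxy = "http://{}:{}@{}:{}".format(
        quote(username, safe=""), quote(password, safe=""), host, port
    )
    return {"http": proxy, "https": proxy}


if __name__ == "__main__":
    # WEB_UNBLOCKER_USERNAME and WEB_UNBLOCKER_PASSWORD are hypothetical names.
    proxies = build_proxies(
        os.environ.get("WEB_UNBLOCKER_USERNAME", "USERNAME"),
        os.environ.get("WEB_UNBLOCKER_PASSWORD", "PASSWORD"),
    )
    # Print only the host and port so credentials do not end up in logs.
    print(proxies["https"].split("@")[-1])
```

Reading credentials from environment variables also keeps them out of your source code.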

Step 4 - Fetch content

Now, pass the proxies dict you created to the get method of the requests module. Web Unblocker also requires an extra parameter, verify=False, which turns off TLS certificate verification for the request.

page = "https://sandbox.oxylabs.io/products"
response = requests.get(page, proxies=proxies, verify=False)

print(response.status_code)
content = response.content

You should see the status code 200 if everything works as expected. The page content is stored in the content object, which you can process later with an HTML parsing library such as Beautiful Soup or parse using the Custom Parser. Web Unblocker also renders JavaScript for you, so you can use this method for dynamic websites as well.
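To illustrate the processing step, here is a minimal sketch that extracts the page title and link targets from raw HTML bytes. It uses Python’s built-in html.parser as a stand-in for Beautiful Soup so that it runs without extra dependencies. The class name, the function name, and the sample HTML are hypothetical and do not reflect the real structure of the target page.

```python
from html.parser import HTMLParser


class TitleAndLinks(HTMLParser):
    """Collect the <title> text and the href values of all <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(content):
    """Parse HTML given as bytes (such as response.content) or as a string."""
    if isinstance(content, bytes):
        content = content.decode("utf-8", errors="replace")
    parser = TitleAndLinks()
    parser.feed(content)
    parser.close()
    return parser.title.strip(), parser.links


# Hypothetical sample standing in for response.content.
sample = (
    b"<html><head><title> Demo </title></head>"
    b"<body><a href='/products/1'>One</a></body></html>"
)
title, links = parse_page(sample)
```

In the tutorial’s flow, you would pass the content object from the previous step to parse_page instead of the sample bytes.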
