proxy_random

Proxy Random is a tool that helps small web scrapers avoid getting their IP banned by the target site.

The default proxy lists are free-proxy-list.net and sslproxies.org, but you can also add your own proxy list websites and supply an extractor for each of them.

Installation

$ pip install proxy-random

Documentation

https://proxy-random.readthedocs.io/

Usage

Here are some examples of how to use proxy-random.

Example 1:

import requests

from proxy_random import RandomProxy

# if you want to load the proxy lists through a proxy, use this instead:
# rp = RandomProxy(proxy='http://example.com:8080')
rp = RandomProxy()
proxies = rp.extract_proxies() # you can also use rp.proxy_query.

# keep the proxies which use port 80 or 443 and check if they work
# you can filter by multiple parameters at once or provide your own filter function(s)
workings = proxies.filter(port=[80, 443]) \
            .check_health(timeout=5).filter(working=True)

print(workings.random().url) # print a random working proxy

# or iterate through proxies and use them
for proxy in workings:
    # do something with the proxy
    requests.get("https://httpbin.org/ip", proxies={"http": proxy.url, "https": proxy.url})

ProxyQuery objects are reusable, so you can filter them as many times as needed.

Here is another example that shows how to add a custom provider.
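A proxy that passed the health check can still fail by the time the real request is made. The following is a minimal sketch of falling back to the next proxy, assuming only the `proxy.url` strings shown in Example 1. `fetch_with_rotation` and its `fetch` parameter are hypothetical helpers written for this illustration and are not part of proxy-random.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_rotation(
    proxy_urls: Iterable[str],
    fetch: Callable[[dict], T],
    errors: tuple = (Exception,),
) -> Optional[T]:
    """Try each proxy URL in turn and return the first successful result.

    `fetch` receives a requests-style proxies mapping and performs the
    actual request. Returns None if every proxy raised one of `errors`.
    """
    for url in proxy_urls:
        # same mapping shape that Example 1 passes to requests.get
        proxies = {"http": url, "https": url}
        try:
            return fetch(proxies)
        except errors:
            continue  # this proxy failed, move on to the next one
    return None
```

With requests, the call could look like `fetch_with_rotation((p.url for p in workings), lambda px: requests.get("https://httpbin.org/ip", proxies=px, timeout=5), errors=(requests.RequestException,))`. Passing the request as a callable keeps the helper independent of any particular HTTP client.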

Example 2:

import requests
from bs4 import BeautifulSoup

from proxy_random import RandomProxy
from proxy_random.provider import Provider
from proxy_random.query import ProxyQuery
from proxy_random.proxy import Proxy

# you can also use RandomProxy(use_defaults=False) to disable default providers
rp = RandomProxy()
# add a custom provider

url = "https://free-proxy-list.net" # the url of the proxy list

# the function used to extract proxies from the url response
def extract_proxies(response: str) -> ProxyQuery:
    soup = BeautifulSoup(response, "html.parser")

    headings = [i.text.lower() for i in soup.find("thead").find_all("th")]

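    # read only the <td> cells so stray whitespace nodes between cells are skipped
    rows = [[cell.text for cell in row.find_all("td")] for row in soup.find("tbody").find_all("tr")]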

    proxies = []
    for row in rows:
        proxy = Proxy()
        for i, name in enumerate(headings):
            if name == "ip address":
                proxy.ip = row[i]

            elif name == "port":
                proxy.port = int(row[i])

            elif name == "code":
                proxy.country_code = row[i]

            elif name == "last checked":
                proxy.last_checked = row[i]

            elif name in ("google", "https"):
                setattr(proxy, name, row[i] == "yes")

            elif name in ("country", "anonymity"):
                setattr(proxy, name, row[i])

        proxies.append(proxy)

    return ProxyQuery(proxies)

# then create a new instance of the Provider class
provider = Provider(url=url, extractor=extract_proxies)
# then add the provider to the RandomProxy instance
rp.add_provider(provider)

# then extract the proxies like example 1
rp.extract_proxies()
...
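Some proxy list sites serve plain text with one `ip:port` pair per line instead of an HTML table. The parsing half of an extractor for such a source could be sketched as follows. `parse_ip_port_lines` is a hypothetical helper, not part of proxy-random, and the line format is an assumption about the provider.

```python
import re
from typing import List, Tuple

# Matches lines such as "203.0.113.7:8080"; anything else is ignored.
_LINE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\s*$")


def parse_ip_port_lines(response: str) -> List[Tuple[str, int]]:
    """Return (ip, port) pairs from a plain-text list, skipping malformed lines."""
    pairs = []
    for line in response.splitlines():
        match = _LINE.match(line)
        if not match:
            continue  # comments, blank lines, headers
        ip, port = match.group(1), int(match.group(2))
        # reject out-of-range octets and ports that the regex alone lets through
        if all(int(octet) <= 255 for octet in ip.split(".")) and 0 < port <= 65535:
            pairs.append((ip, port))
    return pairs
```

Inside an extractor, each pair could then be assigned to `proxy.ip` and `proxy.port` on a `Proxy()` and the resulting list wrapped in a `ProxyQuery`, as in Example 2.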

My own usage of this package:

import requests

from proxy_random import RandomProxy

rp = RandomProxy(proxy="my proxy")
proxies = rp.extract_proxies()

workings = proxies.filter(custom_filters=[lambda x: x.country_code != "ir",]) \
            .limit(50).check_health(timeout=5).filter(working=True)


proxy = workings.random()

# use the proxy in some way
...
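The lambda passed to `custom_filters` above can be replaced by named, reusable predicates. This is a minimal sketch, assuming that each custom filter is called with one proxy and that a truthy return value keeps it, which is what the lambda above suggests. `exclude_countries` and `on_ports` are hypothetical helpers, and the `SimpleNamespace` objects only stand in for real proxies so the predicates can be tried without network access.

```python
from types import SimpleNamespace


def exclude_countries(*codes):
    """Build a predicate that rejects proxies whose country_code is in `codes`."""
    blocked = {code.lower() for code in codes}

    def predicate(proxy):
        code = getattr(proxy, "country_code", None) or ""
        return code.lower() not in blocked

    return predicate


def on_ports(*ports):
    """Build a predicate that keeps only proxies listening on one of `ports`."""
    allowed = set(ports)

    def predicate(proxy):
        return getattr(proxy, "port", None) in allowed

    return predicate


# Stand-in objects carrying the attribute names used in the examples above.
sample = [
    SimpleNamespace(country_code="IR", port=80),
    SimpleNamespace(country_code="us", port=8080),
    SimpleNamespace(country_code="de", port=443),
]

filters = [exclude_countries("ir"), on_ports(80, 443)]
kept = [p for p in sample if all(f(p) for f in filters)]
print([(p.country_code, p.port) for p in kept])
```

Under the same assumption, the predicates could be passed as `proxies.filter(custom_filters=[exclude_countries("ir")])`.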

Refer to the documentation for more information about these classes.
