poa00 / proxy_web_crawler

This project forked from rootviii/proxy_web_crawler

Automates the process of repeatedly searching for a website via scraped proxy IP and search keywords

License: MIT License

Python 100.00%

proxy_web_crawler's Introduction

Search for a website with a different proxy each time

This script automates searching for a website by keyword on the DuckDuckGo search engine, page after page.

Pass a complete URL and at least one keyword as command-line arguments to run the program:
python proxy_crawler.py -u <url> -k <keyword(s)>
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip"

Add the -x option to run headless (no GUI):
python proxy_crawler.py -u "https://www.whatsmyip.org" -k "my ip" -x
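
The flags above could be parsed along the following lines. This is a minimal sketch, not the code in proxy_crawler.py: only the short flags -u, -k and -x come from this README, while the long option names, the function names, and the URL scheme check are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser mirroring the -u, -k and -x flags described above."""
    parser = argparse.ArgumentParser(
        description="Search for a website through a different proxy each time")
    # The long names (--url, --keyword, --headless) are hypothetical additions.
    parser.add_argument("-u", "--url", required=True,
                        help="complete URL of the website to find")
    parser.add_argument("-k", "--keyword", required=True,
                        help="one or more search keywords, quoted as one string")
    parser.add_argument("-x", "--headless", action="store_true",
                        help="run without a browser GUI")
    return parser


def parse_cli(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and check the URL has a scheme."""
    args = build_parser().parse_args(argv)
    # Assumed interpretation of "complete URL": it must start with a scheme.
    if not args.url.startswith(("http://", "https://")):
        raise SystemExit("URL must be complete, e.g. https://www.whatsmyip.org")
    return args
```

Calling parse_cli(["-u", "https://www.whatsmyip.org", "-k", "my ip", "-x"]) yields an object whose url, keyword and headless attributes hold the three values.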

  • A list of proxies is first scraped from sslproxies.org
  • Then, using a new proxy socket for each iteration, the specified keyword(s) are searched for until the desired website is found
  • The website is then visited, and one random link within it is clicked
  • The bot is slowed down on purpose, and will also run fairly slowly because of the proxy connection
  • Browser windows may open and close repeatedly during runtime (due to connection errors) until a healthy/valid proxy is encountered
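
The proxy handling described in the list above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the script's actual implementation: it assumes the proxy page renders each proxy as adjacent table cells (IP, then port), and the names extract_proxies, firefox_proxy_prefs and search_with_rotation are hypothetical. The network.proxy.* keys are standard Firefox preference names.

```python
import re

# Assumption: each proxy appears as <td>IP</td><td>port</td> in the page HTML.
PROXY_ROW = re.compile(r"<td>(\d{1,3}(?:\.\d{1,3}){3})</td>\s*<td>(\d{2,5})</td>")


def extract_proxies(html):
    """Return (host, port) tuples found in the HTML, in page order."""
    return [(host, int(port)) for host, port in PROXY_ROW.findall(html)]


def firefox_proxy_prefs(host, port):
    """Map Firefox preference names to values for a manual HTTP/HTTPS proxy."""
    return {
        "network.proxy.type": 1,  # 1 selects manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": port,
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": port,
    }


def search_with_rotation(proxies, attempt):
    """Call attempt(host, port) with each proxy until one returns True.

    attempt is a caller-supplied function (hypothetical here) that opens a
    browser through the proxy and reports whether the target site was found.
    """
    for host, port in proxies:
        try:
            if attempt(host, port):
                return host, port
        except OSError:
            # Connection-level failure: discard this proxy and try the next.
            continue
    return None
```

With Selenium, each preference could be applied to a Firefox Options object through its set_preference(name, value) method before the driver is started. Trying proxies one at a time and discarding those that fail matches the behaviour noted above, where browser windows open and close until a healthy proxy is found.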

  • Requirements:
    • python3
    • selenium
    • Firefox browser
    • geckodriver
  • Download the latest geckodriver release from Mozilla
  • Unzip the file and place the geckodriver executable in a directory on your PATH
  • Ensure selenium is installed: pip install -r requirements.txt
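
Before the first run, the setup steps above can be verified with a short check like the one below. It is a sketch that is not part of the project: missing_requirements is a hypothetical helper, and it relies only on the standard library (shutil.which and importlib.util.find_spec).

```python
import importlib.util
import shutil


def missing_requirements(driver="geckodriver", module="selenium"):
    """Return the names of requirements that cannot be located."""
    missing = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which(driver) is None:
        missing.append(driver)
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module) is None:
        missing.append(module)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    print("All set" if not problems else "Missing: " + ", ".join(problems))
```

An empty list means both geckodriver and selenium were found. The check does not confirm that Firefox itself is installed.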

Author: rootVIII 2018-2023
