Giter Site home page Giter Site logo

torbot's Introduction

    

                              ████████╗ ██████╗ ██████╗     ██████╗  ██████╗ ████████╗ 
                              ╚══██╔══╝██╔═══██╗██╔══██╗    ██╔══██╗██╔═████╗╚══██╔══╝ 
                                 ██║   ██║   ██║██████╔╝    ██████╔╝██║██╔██║   ██║ 
                                 ██║   ██║   ██║██╔══██╗    ██╔══██╗████╔╝██║   ██║
                                 ██║   ╚██████╔╝██║  ██║    ██████╔╝╚██████╔╝   ██║ 
                                 ╚═╝    ╚═════╝ ╚═╝  ╚═╝    ╚═════╝  ╚═════╝    ╚═╝ 
                                                            
                            
                                                                              
                                                           `.` `     
                                                       ``.:.--.`     
                                                      .-+++/-`       
                                                     `+sso:`         
                                                  `` /yy+.           
                                                  -+.oho.            
                                                   o../+y            
                                                  -s.-/:y:`          
                                               .:o+-`--::oo/-`       
                                            `/o+:.```---///oss+-     
                                          .+o:.``...`-::-+++++sys-   
                                         :y/```....``--::-yooooosh+  
                                        -h-``--.```..-:-::ssssssssd+ 
                                        h:``:.``....`--:-++hsssyyyym.
                                       .d.`/.``--.```:--//odyyyyyyym/
                                       `d.`+``:.```.--/-+/smyyhhhhhm:
                                        os`./`/````/`-/:+oydhhhhhhdh`
                                        `so.-/-:``./`.//osmddddddmd. 
                                          /s/-/:/.`/..+/ydmdddddmo`
                                           `:oosso/:+/syNmddmdy/. 
                                               `-/++oosyso+/.` 
                            
                            
          ██████╗ ███████╗██████╗ ███████╗██████╗  ██████╗    ██╗███╗   ██╗███████╗██╗██████╗ ███████╗
          ██╔══██╗██╔════╝██╔══██╗██╔════╝╚════██╗██╔════╝    ██║████╗  ██║██╔════╝██║██╔══██╗██╔════╝
          ██║  ██║█████╗  ██║  ██║███████╗ █████╔╝██║         ██║██╔██╗ ██║███████╗██║██║  ██║█████╗ 
          ██║  ██║██╔══╝  ██║  ██║╚════██║ ╚═══██╗██║         ██║██║╚██╗██║╚════██║██║██║  ██║██╔══╝ 
          ██████╔╝███████╗██████╔╝███████║██████╔╝╚██████╗    ██║██║ ╚████║███████║██║██████╔╝███████╗
          ╚═════╝ ╚══════╝╚═════╝ ╚══════╝╚═════╝  ╚═════╝    ╚═╝╚═╝  ╚═══╝╚══════╝╚═╝╚═════╝ ╚══════╝
                                                                                            


OSINT tool for Deep and Dark Web.

Build Status

Working Procedure/Basic Plan

The basic procedure executed by the web crawling algorithm takes a list of seed URLs as its input and repeatedly executes the following steps:

  1. Remove a URL from the URL list.
  2. Check existence of the page.
  3. Download the corresponding page.
  4. Check the Relevancy of the page.
  5. Extract any links contained in it.
  6. Check the cache if the links are already in it.
  7. Add the unique links back to the URL list.
  8. After all URLs are processed, return the most relevant page.

Features

  1. Onion Crawler (.onion).(Completed)
  2. Returns Page title and address with a short description about the site.(Partially Completed)
  3. Save links to database.(Not Started)
  4. Get emails from site.(Completed)
  5. Save crawl info to JSON file.(Completed)
  6. Crawl custom domains.(Completed)
  7. Check if the link is live.(Completed)
  8. Built-in Updater.(Completed) ...(will be updated)

Contribute

Contributions to this project are always welcome. To add a new feature fork the dev branch and give a pull request when your new feature is tested and complete. If its a new module, it should be put inside the modules directory and imported to the main file. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. For example, Feature_FasterCrawl_1.0. Contributor name will be updated to the below list. :D

Dependencies

  1. Tor
  2. Python 3.x (Make sure pip3 is installed)
  3. requests
  4. Beautiful Soup 4
  5. Socket
  6. Sock
  7. Argparse
  8. Git
  9. termcolor
  10. tldextract

Basic setup

Before you run the torBot make sure the following things are done properly:

  • Run tor service sudo service tor start

  • Make sure that your torrc is configured to SOCKS_PORT localhost:9050

python3 torBot.py or use the -h/--help argument

`usage: torBot.py [-h] [-v] [--update] [-q] [-u URL] [-s] [-m] [-e EXTENSION]
                 [-l] [-i]

optional arguments:
  -h, --help            Show this help message and exit
  -v, --version         Show current version of TorBot.
  --update              Update TorBot to the latest stable version
  -q, --quiet           Prevent header from displaying
  -u URL, --url URL     Specifiy a website link to crawl, currently returns links on that page
  -s, --save            Save results to a file in json format
  -m, --mail            Get e-mail addresses from the crawled sites
  -e EXTENSION, --extension EXTENSION
                        Specifiy additional website extensions to the
                        list(.com or .org etc)
  -l, --live            Check if websites are live or not (slow)
  -i, --info            Info displays basic info of the scanned site (very
                        slow)` 
  • NOTE: All flags under -u URL, --url URL must also be passed a -u flag.

Read more about torrc here : Torrc

TO-DO

  • Implement A* Search for webcrawler
  • Multithreading
  • Optimization
  • Randomize Tor Connection (Random Header and Identity)

Have ideas?

If you have new ideas which is worth implementing, mention those by starting a new issue with the title [FEATURE_REQUEST]. If the idea is worth implementing, congratz you are now a contributor.

License

GNU Public License

CREDITS

torbot's People

Contributors

psnappz avatar kingakeem avatar agrepravin avatar mamarto avatar y-mehta avatar mossbanay avatar

Watchers

kzwkt avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.