od-crawler

Crawler for open directories.

Open directories are unprotected directories of pics, vids, music, software and otherwise interesting files.

Installation

There are no binaries available for the moment.

The easiest way is to clone this repo and build the project locally with Stack.

Install Stack, then run the following in the project directory:

stack install

This builds the project and copies the od-crawler binary to Stack's local bin directory (typically ~/.local/bin).

Usage

./od-crawler-exe -h
Usage: od-crawler-exe TARGET [-p|--profile PROFILE] [-v|--verbose]
                      [-d|--directory DIRECTORY]
  Crawls open directories for interesting links

Available options:
  TARGET                   The target URL or the path to the file containing the
                           target URLs (one per line)
  -p,--profile PROFILE     Profile for allowed extensions (Videos, Pictures,
                           Music, Docs, SubTitles)
  -v,--verbose             Enable verbose mode for debugging purpose
  -d,--directory DIRECTORY The folder where to persist results - only new
                           entries will be shown
  --parallel               Crawl target URLS in parallel
  --monitoring             The monitoring port where metrics are exposed - http://localhost:$port
  -h,--help                Show this help text

Examples

./od-crawler-exe http://that.cool.od.site.com

Filter by file extension

./od-crawler-exe http://that.cool.od.site.com -p Videos
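
To illustrate what filtering by profile amounts to, here is a minimal Python sketch. It is not the crawler's actual implementation (od-crawler is built with Stack), and the extension lists in PROFILES are hypothetical; the real profiles may allow different extensions.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical extension lists, for illustration only.
PROFILES = {
    "Videos": {".mkv", ".mp4", ".avi"},
    "Music": {".mp3", ".flac", ".ogg"},
}

def matches_profile(url, profile):
    """Return True if the URL path ends with an extension allowed by the profile."""
    path = urlparse(url).path  # drops any query string or fragment
    extension = posixpath.splitext(path)[1].lower()
    return extension in PROFILES[profile]

def filter_links(links, profile):
    """Keep only the links whose extension belongs to the given profile."""
    return [link for link in links if matches_profile(link, profile)]
```

The comparison is case-insensitive and ignores query strings, so a link such as http://host/a.MKV?x=1 would still count as a video in this sketch.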

The output can be redirected to a file so it can be searched later.

./od-crawler-exe http://that.cool.od.site.com -p Videos > results.txt

In order to crawl several URLs at once, a file containing one URL per line can be used as input.

./od-crawler-exe ~/links.txt -p Videos
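
The TARGET argument accepts either a URL or a path to a file of URLs. A rough Python sketch of how such an argument could be resolved follows. The function name resolve_targets and the rule "anything starting with http:// or https:// is a URL" are assumptions made for this example, not the crawler's documented behaviour.

```python
from pathlib import Path

def resolve_targets(target):
    """Return the list of URLs designated by a TARGET argument.

    Assumption: a target starting with http:// or https:// is a single URL;
    anything else is treated as a path to a file with one URL per line.
    """
    if target.startswith(("http://", "https://")):
        return [target]
    lines = Path(target).expanduser().read_text().splitlines()
    # Skip blank lines and surrounding whitespace.
    return [line.strip() for line in lines if line.strip()]
```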

The results of a run can be saved to a folder. You can grep those files later, and on the next run the console displays only new items, which saves you from diffing the results yourself.

./od-crawler-exe ~/links.txt -d ~/od-crawler/ -p Videos
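
The "only new entries" behaviour can be pictured with the Python sketch below. It assumes results are stored as one link per line in a plain text file; the on-disk format od-crawler really uses is not described here, and report_new_entries is a hypothetical name.

```python
from pathlib import Path

def report_new_entries(results_file, found_links):
    """Return the links not seen in earlier runs and append them to results_file.

    Assumption: results_file holds one previously seen link per line.
    """
    path = Path(results_file)
    known = set(path.read_text().splitlines()) if path.exists() else set()
    # dict.fromkeys removes duplicates while keeping the original order.
    new_links = [link for link in dict.fromkeys(found_links) if link not in known]
    if new_links:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a") as handle:
            for link in new_links:
                handle.write(link + "\n")
    return new_links
```

Calling the function twice with overlapping link lists returns everything the first time and only the unseen links the second time.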

For more power, let's crawl URLs in parallel! (max 10 URLs for the time being)

./od-crawler-exe ~/links.txt -d ~/od-crawler/ -p Videos --parallel
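
As a rough picture of parallel crawling, the Python sketch below runs a crawl function over several URLs with a bounded worker pool. It assumes the "max 10" limit means at most 10 URLs are processed concurrently, which this README does not spell out. Both crawl_all and the crawl_one callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_PARALLEL = 10  # assumption: the limit is on concurrent URLs

def crawl_all(urls, crawl_one):
    """Run crawl_one on every URL, at most MAX_PARALLEL at a time.

    crawl_one is any function that takes a URL and returns the links found.
    Returns a dict mapping each URL to its result.
    """
    with ThreadPoolExecutor(max_workers=MAX_PARALLEL) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(crawl_one, urls)))
```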

A web monitoring interface can be enabled to follow the overall progress and the internal metrics of the crawler.

./od-crawler-exe ~/links.txt -d ~/od-crawler/ -p Videos --parallel --monitoring 8000

Then open http://localhost:8000 while the crawler is running to access the EKG console.
