undupify's Introduction

TL;DR

Undupify removes most irrelevant and behaviorally identical URLs from a file. It fits well into a hacking workflow where you want to apply an additional layer of filtering to your URLs before sending them to a deep, time-consuming vulnerability scan.



Description

When searching for vulnerabilities at scale, it is very common to retrieve all URLs associated with a company, using tools such as waybackurls or gau, and then filter them by query parameters, looking for XSS, SQLi, SSRF, etc.

In this context, even after this first layer of filtering, many URLs still remain, and a lot of them are irrelevant because they are only subtle variations of others. Even though they have different path names or different parameter values, they are processed by the exact same back-end function. When this happens, there is no point in dealing with them multiple times, as they behave the same way under fuzzing.

This is where Undupify becomes useful: based on heuristics, it attempts to efficiently identify which URLs are duplicates of others and removes them.

To decide whether an analyzed URL is a duplicate or unique, the tool currently relies on the following heuristics:

  • Heuristic 1 - If the analyzed URL has a hostname and port that have never been seen in previous URLs, then it should NOT be considered a duplicate; it is unique.
  • Heuristic 2 - If the analyzed URL has exactly the same path and parameter names as a previously seen URL, regardless of the parameter values, then it should be considered a duplicate.
  • Heuristic 3 - If the analyzed URL has exactly the same content between its first two path delimiters (/), and the same parameters, as a previously seen URL, then it should be considered a duplicate.
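
The three heuristics above can be illustrated with a minimal sketch. This is not Undupify's actual code: the names `url_keys` and `undupify_sketch` are hypothetical, the sketch uses `urllib.parse` rather than the `regex` package the tool installs, and it rests on two assumptions. First, Heuristic 3 is read as "same first path segment". Second, Heuristics 2 and 3 only compare URLs that share the same hostname and port.

```python
from urllib.parse import urlsplit, parse_qsl


def url_keys(url):
    """Return (host, exact_key, prefix_key) for one URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()  # hostname and port together
    # Keep parameter names only; values are deliberately ignored.
    params = frozenset(
        name for name, _ in parse_qsl(parts.query, keep_blank_values=True)
    )
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: "content between the first two /" means the first segment.
    first_segment = segments[0] if segments else ""
    exact_key = (host, parts.path, params)      # used for Heuristic 2
    prefix_key = (host, first_segment, params)  # used for Heuristic 3
    return host, exact_key, prefix_key


def undupify_sketch(urls):
    """Return the URLs that the three heuristics treat as unique."""
    seen_hosts, seen_exact, seen_prefix = set(), set(), set()
    unique = []
    for url in urls:
        host, exact_key, prefix_key = url_keys(url)
        if host not in seen_hosts:
            duplicate = False  # Heuristic 1: new hostname and port
        else:
            # Heuristic 2 (same path and parameter names) or
            # Heuristic 3 (same first segment and parameter names).
            duplicate = exact_key in seen_exact or prefix_key in seen_prefix
        seen_hosts.add(host)
        seen_exact.add(exact_key)
        seen_prefix.add(prefix_key)
        if not duplicate:
            unique.append(url)
    return unique


sample = [
    "https://example.com/item/1?id=3",
    "https://example.com/item/1?id=7",          # duplicate by Heuristic 2
    "https://example.com/item/2?id=9",          # duplicate by Heuristic 3
    "https://example.com/item/2?id=9&debug=1",  # unique: new parameter set
    "https://example.com:8443/item/1?id=3",     # unique by Heuristic 1
]
for kept in undupify_sketch(sample):
    print(kept)
```

Under these assumptions, any URL caught by Heuristic 2 is also caught by Heuristic 3, since an identical path implies an identical first segment. The sketch keeps both checks separate only to mirror the list above.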

Usage

python3 undupify.py -h

This displays help for the tool.

usage: undupify.py [-h] [--file FILE] [--output]

options:
  -h, --help            show this help message and exit
  --file FILE, -f FILE  file containing all URLs to clean
  --output, -o          output file path

Basic use:

python3 undupify.py -f URLs_to_filter.txt

Installation

1 - Clone

git clone https://github.com/Th0h0/undupify.git

2 - Install requirements

cd undupify
pip install regex

License

Undupify is distributed under the MIT License.
