This project forked from nil0x42/duplicut

Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)

License: GNU General Public License v3.0

Duplicut ✂️

Quickly dedupe massive wordlists, without changing the order


Created by nil0x42 and contributors


📖 Overview

Modern password wordlist creation usually implies concatenating multiple data sources.

Ideally, the most probable passwords should stand at the start of the wordlist, so that the most common passwords are cracked instantly.

With existing dedupe tools, you are forced to choose between preserving the order and handling massive wordlists.

Unfortunately, wordlist creation requires both.

So I wrote duplicut in highly optimized C to address this very specific need 🤓 💻


💡 Quick start

git clone https://github.com/nil0x42/duplicut
cd duplicut/ && make
./duplicut wordlist.txt -o clean-wordlist.txt

🔧 Options

  • Features:

    • Handle massive wordlists, even those whose size exceeds available RAM
    • Filter lines by max length (-l option)
    • Can remove lines containing non-printable ASCII chars (-p option)
    • Press any key to show program status at runtime.
  • Implementation:

    • Written in pure C, designed to be fast
    • Compressed hashmap items on 64-bit platforms
    • Multithreading support
    • [TODO]: Use huge memory pages to increase performance
  • Limitations:

    • Any line longer than 255 chars is ignored
    • Heavily tested on Linux x64, mostly untested on other platforms.
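
The length and printable-character filters listed under Features can be pictured with a small sketch. This is not duplicut's actual code: `keep_line` and its parameters are hypothetical names, and the assumption that "printable" means the ASCII range 0x20 to 0x7E is ours.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not taken from duplicut's source.
 * Returns true when a line should be kept, given:
 *   max_len        - longest accepted line (the idea behind -l)
 *   printable_only - reject bytes outside printable ASCII (the idea behind -p)
 */
bool keep_line(const char *line, size_t len,
               size_t max_len, bool printable_only)
{
    assert(line != NULL);

    /* Length filter: drop anything longer than the limit. */
    if (len > max_len)
        return false;

    if (printable_only) {
        for (size_t i = 0; i < len; i++) {
            unsigned char c = (unsigned char)line[i];
            /* Assumed definition: printable ASCII is 0x20 (space) to 0x7E (~). */
            if (c < 0x20 || c > 0x7E)
                return false;
        }
    }
    return true;
}
```

With this sketch, `keep_line("pass\tword", 9, 255, true)` would reject the line because of the tab character, while the same call with `printable_only` set to `false` would keep it.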

📖 Technical Details

🔸 1- Memory optimized:

A uint64 is enough to index lines in the hashmap, by packing size info within the pointer's extra bits.
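
A minimal sketch of this kind of packing follows. The bit layout (length in the top 8 bits, address in the low 56) and the names `packed_line`, `pack_line`, `line_ptr` and `line_len` are assumptions for illustration, not duplicut's actual definitions. An 8-bit length field would be consistent with the 255-character limit listed above, but the real layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout (illustrative only, not duplicut's actual one):
 *   bits 63..56 : line length (0..255)
 *   bits 55..0  : address of the line's first byte
 */
typedef uint64_t packed_line;

#define ADDR_BITS 56
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Store a pointer and an 8-bit length in a single 64-bit word. */
packed_line pack_line(const char *ptr, uint8_t len)
{
    uint64_t addr = (uint64_t)(uintptr_t)ptr;
    /* Only valid while the top byte of the address is unused. */
    assert((addr & ~ADDR_MASK) == 0);
    return ((uint64_t)len << ADDR_BITS) | addr;
}

/* Recover the pointer by masking off the length byte. */
const char *line_ptr(packed_line l)
{
    return (const char *)(uintptr_t)(l & ADDR_MASK);
}

/* Recover the length from the top byte. */
uint8_t line_len(packed_line l)
{
    return (uint8_t)(l >> ADDR_BITS);
}
```

The trick relies on user-space addresses leaving their upper bits at zero, which holds on typical x86-64 Linux systems but is not guaranteed everywhere (for example, on platforms that tag pointers in the top byte).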

🔸 2- Massive file handling:

If the whole file can't fit in memory, it is split into virtual chunks, and each chunk is then tested against the chunks that follow it.

So the complexity is equal to the Nth triangular number, where N is the number of chunks: N(N+1)/2 chunk passes.
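
The chunk-against-later-chunks scheme can be counted with a short sketch. It assumes each chunk is first deduplicated against itself and then against every later chunk; `chunk_passes` and `triangular` are hypothetical names, and the loop only counts passes rather than touching any data.

```c
#include <assert.h>
#include <stdint.h>

/* Count chunk-level passes when chunk i is processed against itself
 * and then against every later chunk: N + (N-1) + ... + 1. */
uint64_t chunk_passes(uint64_t n_chunks)
{
    uint64_t passes = 0;
    for (uint64_t i = 0; i < n_chunks; i++) {
        for (uint64_t j = i; j < n_chunks; j++) {
            /* j == i: dedupe chunk i internally.
             * j >  i: drop from chunk j any line already seen in chunk i. */
            passes++;
        }
    }
    return passes;
}

/* Closed form: the n-th triangular number, n * (n + 1) / 2. */
uint64_t triangular(uint64_t n)
{
    /* Keeps n * (n + 1) from overflowing 64 bits. */
    assert(n <= UINT32_MAX);
    return n * (n + 1) / 2;
}
```

For a file split into 4 chunks, both functions give 10 passes, so the cost grows quadratically with the number of chunks rather than with the number of lines.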

💡 Troubleshooting

If you find a bug, or something doesn't work as expected, please compile duplicut in debug mode and open an issue with the output attached:

# debug level can be from 1 to 4
make debug level=1
./duplicut [OPTIONS] 2>&1 | tee /tmp/duplicut-debug.log
