License: MIT License

File Scraper

Scrape files for sensitive information, and generate an interactive HTML report. Based on Rabin2.

Customize the tool to your liking!

Tested on Kali Linux v2023.4 (64-bit).

Made for educational purposes. I hope it will help!

How to Install

Install Radare2

On Kali Linux, run:

apt-get -y install radare2

On Windows, download and unpack radareorg/radare2, then add the bin directory to the Windows PATH environment variable.

On macOS, run:

brew install radare2
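
Since the tool is based on Rabin2, it can help to confirm that the `rabin2` executable shipped with Radare2 is reachable before running a scan. The following is a minimal sketch using only the Python standard library; the helper name `find_rabin2` is hypothetical and is not part of File Scraper.

```python
import shutil


def find_rabin2(name="rabin2"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_rabin2()
if path:
    print("rabin2 found at:", path)
else:
    print("rabin2 not found on PATH; install Radare2 first")
```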

Standard Install

pip3 install --upgrade file-scraper

Build and Install From the Source

git clone https://github.com/ivan-sincek/file-scraper && cd file-scraper

python3 -m pip install --upgrade build

python3 -m build

python3 -m pip install dist/file_scraper-2.9-py3-none-any.whl

Build the Template & Run

Prepare a template:

{
   "authorization":{
      "query":"[^\\w\\d\\n]+(?:basic|bearer)\\ .+",
      "ignorecase":true,
      "search":true
   },
   "variable":{
      "query":"(?:access|account|admin|basic|bearer|card|conf|cred|customer|email|history|id|info|jwt|key|kyc|log|otp|pass|pin|priv|refresh|salt|secret|seed|setting|sign|token|transaction|transfer|user)[\\w\\d]*(?:\\\"\\ *\\:|\\ *\\=).+",
      "ignorecase":true,
      "search":true
   },
   "comment":{
      "query":"[^\\w\\d\\n]+(?:bug|comment|fix|issue|note|problem|to(?:\\_|\\ |)do|work)[^\\w\\d\\n]+.+",
      "ignorecase":true,
      "search":true
   },
   "url":{
      "query":"\\w+\\:\\/\\/[\\w\\-\\.\\@\\:\\/\\?\\=\\%\\&\\#]+",
      "unique":true,
      "collect":true
   },
   "ip":{
      "query":"(?:\\b25[0-5]|\\b2[0-4][0-9]|\\b[01]?[0-9][0-9]?)(?:\\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}",
      "unique":true,
      "collect":true
   },
   "base64":{
      "query":"(?:[a-zA-Z0-9\\+\\/]{4})*(?:[a-zA-Z0-9\\+\\/]{4}|[a-zA-Z0-9\\+\\/]{3}\\=|[a-zA-Z0-9\\+\\/]{2}\\=\\=)",
      "minimum":8,
      "decode":"base64",
      "unique":true,
      "collect":true
   },
   "hex":{
      "query":"(?:(?:0x|(?:\\\\)+x)[a-fA-F0-9]{2})+|[a-fA-F0-9]+",
      "minimum":12,
      "decode":"hex",
      "unique":true,
      "collect":true
   },
   "cert":{
      "query":"-----BEGIN (?:CERTIFICATE|PRIVATE KEY)-----[\\s\\S]+?-----END (?:CERTIFICATE|PRIVATE KEY)-----",
      "decode":"cert",
      "unique":true,
      "collect":true
   }
}
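
Before running a scan with a custom template, it may be worth checking that the file is valid JSON and that every query compiles. The sketch below is one possible pre-flight check, not the tool's own validation; the function name `validate_template` and the sample entries are hypothetical.

```python
import json
import re


def validate_template(text):
    """Return a list of problems found in a JSON template given as a string."""
    problems = []
    template = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    for name, entry in template.items():
        query = entry.get("query")
        if not isinstance(query, str):
            problems.append(f"{name}: missing 'query'")
            continue
        try:
            re.compile(query)
        except re.error as ex:
            problems.append(f"{name}: invalid regular expression ({ex})")
    return problems


# Hypothetical two-entry template; the second query is deliberately broken.
sample = r'''{
   "url":{"query":"\\w+\\:\\/\\/[\\w\\-\\.]+","unique":true},
   "broken":{"query":"(unclosed"}
}'''

for problem in validate_template(sample):
    print(problem)
```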

Make sure your regular expressions use no more than one capturing group, so that matches are returned as a flat list, e.g. [1, 2, 3, 4], and not as a list of tuples, e.g. [(1, 2), (3, 4)].
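
The difference can be reproduced with Python's `re.findall`. This assumes the tool collects matches in a similar way, which the README does not state explicitly; the sample text is made up.

```python
import re

text = "user=alice pass=secret"

# Only a non-capturing group (?:...): a flat list of full matches.
flat = re.findall(r"(?:user|pass)=\w+", text)
print(flat)  # ['user=alice', 'pass=secret']

# Two capturing groups: a list of tuples, which is the shape to avoid.
tuples = re.findall(r"(user|pass)=(\w+)", text)
print(tuples)  # [('user', 'alice'), ('pass', 'secret')]
```

Turning extra groups into non-capturing ones with `(?:...)`, as the template above does, keeps the result flat.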

Make sure to properly escape regular expression special characters in your template file, e.g. escape a dot . as \\. and a forward slash / as \\/.
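
One way to get the double backslashes right is to let a JSON encoder produce them. The snippet below is a small illustration with Python's standard `json` module; the sample pattern and URL are placeholders.

```python
import json
import re

# Regular expression as you would write it in a Python raw string.
raw_pattern = r"\w+\:\/\/[\w\-\.]+"

# json.dumps doubles every backslash, giving the form to paste into the template.
encoded = json.dumps(raw_pattern)
print(encoded)  # "\\w+\\:\\/\\/[\\w\\-\\.]+"

# Loading the JSON restores the original single-backslash pattern.
decoded = json.loads(encoded)
print(re.findall(decoded, "see https://example.com/login"))  # ['https://example.com']

# Pitfall: a single-backslash \b inside JSON is a backspace character (0x08),
# not a regex word boundary, so it has to be written as \\b in the template.
print(json.loads(r'"\b"') == "\x08")  # True
print(json.loads(r'"\\b"') == r"\b")  # True
```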

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| query | text | yes | Regular expression query. |
| search | boolean | no | Highlight matches within output; otherwise, extract matches. |
| ignorecase | boolean | no | Case-insensitive search. |
| minimum | integer | no | Show only matches longer than the given number of characters. |
| maximum | integer | no | Show only matches shorter than the given number of characters. |
| decode | text | no | Decode matches. Available decodings: url, base64, hex, cert. |
| unique | boolean | no | Filter out duplicates. |
| collect | boolean | no | Collect all matches in one place. |
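
To make the options more concrete, here is a rough sketch of how a single template entry could be applied to a piece of text. It is an illustration under assumptions, not the tool's actual implementation: the function names `apply_entry` and `decode_match` are hypothetical, the length bounds are treated as strict, `hex` handles only plain hex digits without `0x` or `\x` prefixes, and `cert`, `search`, and `collect` are left out.

```python
import base64
import re
import urllib.parse


def decode_match(match, decoding):
    """Decode one match; unknown decodings return the match unchanged."""
    if decoding == "base64":
        return base64.b64decode(match).decode("utf-8", errors="replace")
    if decoding == "hex":
        return bytes.fromhex(match).decode("utf-8", errors="replace")
    if decoding == "url":
        return urllib.parse.unquote(match)
    return match


def apply_entry(entry, text):
    """Apply query, ignorecase, minimum, maximum, unique, and decode."""
    flags = re.IGNORECASE if entry.get("ignorecase") else 0
    matches = re.findall(entry["query"], text, flags)
    if "minimum" in entry:  # assumption: strictly longer than the bound
        matches = [m for m in matches if len(m) > entry["minimum"]]
    if "maximum" in entry:  # assumption: strictly shorter than the bound
        matches = [m for m in matches if len(m) < entry["maximum"]]
    if entry.get("unique"):
        matches = list(dict.fromkeys(matches))  # drop duplicates, keep order
    if "decode" in entry:
        matches = [decode_match(m, entry["decode"]) for m in matches]
    return matches


# Hypothetical entry and input, simplified from the base64 entry in the template.
entry = {"query": r"[a-zA-Z0-9+/]{4,}={0,2}", "minimum": 8, "decode": "base64", "unique": True}
text = "a=c2VjcmV0MTIz b=c2VjcmV0MTIz c=YWJj"
print(apply_entry(entry, text))
```

In this run the short value `YWJj` is dropped by `minimum`, the repeated value is collapsed by `unique`, and the remaining match is base64-decoded.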

How I run the tool most of the time:

file-scraper -dir directory -o results.html -e default

The default (built-in) excluded file types are as follows:

car, css, gif, jpeg, jpg, mp3, mp4, nib, ogg, otf, png, storyboard, strings, svg, ttf, webp, woff, woff2, xib

Usage

File Scraper v2.9 ( github.com/ivan-sincek/file-scraper )

Usage:   file-scraper -dir directory -o out          [-t template     ] [-e excludes    ] [-th threads]
Example: file-scraper -dir decoded   -o results.html [-t template.json] [-e jpeg,jpg,png] [-th 10     ]

DESCRIPTION
    Scrape files for sensitive information
DIRECTORY
    Directory containing files, or a single file to scrape
    -dir, --directory = decoded | files | test.exe | etc.
TEMPLATE
    JSON template file with extraction information, or a single RegEx to use
    Default: built-in JSON template file
    -t, --template = template.json | "secret\: [\w\d]+" | etc.
EXCLUDES
    Exclude all files that end with
    Use comma-separated values
    Specify 'default' to load the built-in list
    -e, --excludes = mp3 | default,jpeg,jpg,png | etc.
INCLUDES
    Include all files that end with
    Use comma-separated values
    Overrides excludes
    -i, --includes = java | json,xml,yaml | etc.
BEAUTIFY
    Beautify [minified] JavaScript (.js) files
    -b, --beautify
THREADS
    Number of parallel threads to run
    Default: 30
    -th, --threads = 10 | etc.
OUT
    Output HTML file
    -o, --out = results.html | etc.
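
The interplay between the excludes and includes options described above can be sketched as follows. This is a guess at the behavior based only on the usage text, not the tool's code: the names `parse_excludes` and `should_scrape` are hypothetical, values are assumed to be matched as dot-prefixed, case-insensitive file extensions, and "overrides excludes" is read as "when includes are given, excludes are ignored".

```python
# Built-in exclude list, copied from the README.
DEFAULT_EXCLUDES = [
    "car", "css", "gif", "jpeg", "jpg", "mp3", "mp4", "nib", "ogg", "otf",
    "png", "storyboard", "strings", "svg", "ttf", "webp", "woff", "woff2", "xib",
]


def parse_excludes(value):
    """Split a comma-separated value and expand 'default' to the built-in list."""
    result = []
    for item in value.split(","):
        item = item.strip().lower()
        if item == "default":
            result.extend(DEFAULT_EXCLUDES)
        elif item:
            result.append(item)
    return list(dict.fromkeys(result))  # drop duplicates, keep order


def should_scrape(filename, excludes=(), includes=()):
    """Decide whether a file is scraped; includes take precedence over excludes."""
    name = filename.lower()
    if includes:
        return name.endswith(tuple("." + ext for ext in includes))
    return not name.endswith(tuple("." + ext for ext in excludes))


excludes = parse_excludes("default,json")
for filename in ["Main.java", "logo.PNG", "config.json"]:
    print(filename, should_scrape(filename, excludes))
```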

Images

Figure 1 - Interactive Report

Figure 2 - Certificates
