d-rickyy-b / pastepwn

Python framework to scrape Pastebin pastes and analyze them

License: MIT License

Python 99.74% Dockerfile 0.18% Shell 0.08%
pastebin scraping osint analyzing python framework hacktoberfest

pastepwn's Introduction

pastepwn - Paste-Scraping Python Framework

Pastebin is a very helpful tool to store, or rather share, ASCII-encoded data online. In the world of OSINT, Pastebin is used by researchers all around the world to retrieve e.g. leaked account data, in order to find indicators of security breaches.

Pastepwn is a framework to scrape pastes and scan them for certain indicators. There are several analyzers and actions to be used out-of-the-box, but it is also easily extensible - you can create your own analyzers and actions on the fly.

Please note: This framework is not to be used for illegal actions. It can be used for querying public Pastebin pastes for e.g. your username or email address in order to increase your own security.

⚠️ Important note

In April 2020 Pastebin disabled access to their scraping API for a short period of time. At first the scraping API could not be accessed in any way; later, access to the API setup page was re-enabled. Since then, however, it has not been possible to scrape "text" pastes, only pastes with some kind of syntax set. That reduces the number of available pastes to a minimum, which reduces the usefulness of this tool.

Setting up pastepwn

To use the pastepwn framework you need to follow these simple steps:

  1. Make sure to have a Pastebin premium account!
  2. Install pastepwn via pip (pip3 install pastepwn)¹
  3. Create a file (e.g. main.py) in your project root, where you put your code²
  4. Fill that file with content - add analyzers and actions. Check the example implementation.

¹ Note that pastepwn only works with Python 3.6 or above
² (If you want to store all pastes, make sure to set up a MongoDB, MySQL or SQLite instance)

Behind a proxy

There are 2 ways to use this tool behind a proxy:

  • Define the following environment variables: HTTP_PROXY, HTTPS_PROXY, NO_PROXY.
  • When initializing the PastePwn object, use the proxies argument. proxies is a dict as defined in requests' documentation.
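
As a rough sketch of the second option, the helper below builds a dict in the shape that requests expects (URL scheme mapped to proxy URL) from the environment variables named above. The helper name proxies_from_env and the sample proxy address are made up for illustration and are not part of pastepwn.

```python
import os


def proxies_from_env(environ=None):
    """Build a requests-style proxies dict from HTTP_PROXY / HTTPS_PROXY.

    Hypothetical helper, not part of pastepwn.
    """
    environ = os.environ if environ is None else environ
    proxies = {}
    for scheme in ("http", "https"):
        value = environ.get(f"{scheme.upper()}_PROXY")
        if value:
            proxies[scheme] = value
    return proxies


# Sample address for illustration only.
example = proxies_from_env({"HTTP_PROXY": "http://10.10.1.10:3128"})
print(example)
```

The resulting dict is what would be handed to the proxies argument when the PastePwn object is initialized.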

Troubleshooting

If you are having trouble, check out the wiki pages first. If your question/issue is not resolved there, feel free to create an issue or contact me on Telegram.

Roadmap and ToDos

Check the bug tracker on GitHub to get an up-to-date status about features and ToDos.

  • REST API for querying paste data (will be another project)
  • Add a helpful wiki with instructions and examples

pastepwn's People

Contributors

acefire6, bajubullet, brandonlbarrow, d-rickyy-b, daruudii, dependabot-preview[bot], dependabot[bot], double-a-92, gmassacc, grimmjow8, ideneal, jmmille, martin-bucinskas, michalpirchala, mike-k0, mxrk, plodocus, psidex, qkniep, r4h33m, raymonshansen, robotboyfriend, s3cpat, samyak2, stephensorriaux, synackray, wescran, yellowfoxh4xor, zeroji, zrocket

pastepwn's Issues

Twitter Action

Similar to the good old dumpmon on Twitter, it would be lovely to have a Twitter action which sends found pastes to Twitter.

To add a bit more context:

To solve this issue there needs to be a new action added in the actions directory. The action must follow the example of the other actions in this directory. Don't forget to add the action to the action package file.

The most simple example for such an action is the GenericAction which executes a passed function.

Tests are not necessary but highly appreciated. If there are questions, don't hesitate to contact me.

Refactor templating for messages

Currently each action itself needs to implement the templating. This should be moved to a paste class. Best would be to put it into some kind of abstract base class.

I don't think that this issue is suitable for hacktoberfest. If you feel differently, you can create pull requests for it. But I can't promise to merge them, because I have very specific ideas about how this should work.

Email and password pair analyzer

There should be a new analyzer to find pastes which contain email/password pairs.

To solve this issue there needs to be a new analyzer added in the analyzers directory. The analyzer must follow the example of the other analyzers in this directory. Don't forget to add the analyzer to the analyzer package file.

The most simple example for such an analyzer is the AlwaysTrueAnalyzer which always returns True.

This analyzer should match email/password pairs. Check out this example paste. Best would be to subclass the regex analyzer.

Tests are not necessary but highly appreciated. If there are questions, don't hesitate to contact me.
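
To make the idea a bit more concrete, here is a minimal standalone sketch of the matching logic. It does not subclass the real regex analyzer; the pattern, the accepted separators (":", ";" and "|") and the function names are assumptions chosen for illustration.

```python
import re

# One "email<separator>password" pair per line. The separators and the
# simplified email pattern are assumptions, not what pastepwn ships.
PAIR_RE = re.compile(
    r"^[ \t]*([\w.+-]+@[\w-]+(?:\.[\w-]+)+)[ \t]*[:;|][ \t]*(\S+)[ \t]*$",
    re.MULTILINE,
)


def find_pairs(body):
    """Return a list of (email, password) tuples found in the paste body."""
    return PAIR_RE.findall(body)


def contains_pairs(body):
    """Return True if at least one email/password pair is present."""
    return PAIR_RE.search(body) is not None


sample = "header\nalice@example.com:hunter2\nbob@example.org : s3cret\nnot a pair\n"
print(find_pairs(sample))
```

A real analyzer would return the boolean from its match method; the list of tuples is only there to show what the pattern captures.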

API for accessing data from the server

Since the Pastebin API only allows one IP to be entered, there should be a pastepwn API which allows for searching through pastes or getting statistics. It would also be great if the application could be controlled via the API (e.g. adding new analyzers on the fly).

"Request" class singleton

The request class should be made a singleton in order to be able to set proxy settings only once.

This must be done before merging #33
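
One possible way to do this in Python is sketched below. The class here is a stand-in that only shares its name with the real request class; the proxies attribute and the rule that a later non-None value overrides an earlier one are assumptions.

```python
import threading


class Request:
    """Stand-in singleton: every call to Request() returns the same object."""

    _instance = None
    _lock = threading.Lock()

    def __new__(cls, proxies=None):
        # The lock guards against two threads creating two instances.
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                cls._instance.proxies = None
        return cls._instance

    def __init__(self, proxies=None):
        # __init__ runs on every Request() call, so only overwrite the
        # shared settings when a value is actually passed.
        if proxies is not None:
            self.proxies = proxies


first = Request(proxies={"http": "http://proxy.example:3128"})  # sample value
second = Request()
print(first is second, second.proxies)
```

With this shape the proxy settings are configured in one place and every later Request() sees them.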

URL Analyzer

In addition, a subclass "PastebinURLAnalyzer" would be great to match Pastebin URLs inside pastes.
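
A minimal sketch of such a pattern follows. It assumes that Pastebin paste keys are eight alphanumeric characters and that an optional raw/ path segment should also be matched; both are assumptions and the names are illustrative.

```python
import re

# Assumption: paste keys are 8 alphanumeric characters.
PASTEBIN_URL_RE = re.compile(
    r"https?://(?:www\.)?pastebin\.com/(?:raw/)?[A-Za-z0-9]{8}\b"
)


def find_pastebin_urls(body):
    """Return all Pastebin URLs found in the given text."""
    return PASTEBIN_URL_RE.findall(body)


text = (
    "see https://pastebin.com/AbCd1234 and http://pastebin.com/raw/Zz99yY00 "
    "but not https://example.com/AbCd1234"
)
print(find_pastebin_urls(text))
```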

Add PastebinScraper by default, since it's currently the only scraper

When the scraping_handler.scrapers list is empty and the start method (see below) is being called, a new instance of PastebinScraper should be initialized and added via self.add_scraper(PastebinScraper) in line 97.

def start(self):
    """Starts the pastepwn instance"""
    if self.__exception_event.is_set():
        self.logger.error("An exception occured. Aborting the start of PastePwn!")
        exit(1)
    self.scraping_handler.start()
    self.paste_dispatcher.start()
    self.action_handler.start()
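
The requested behaviour could look roughly like the sketch below. All three classes are simplified stand-ins written for this example (only the names PastebinScraper, scraping_handler, scrapers, add_scraper and start come from the description above), so this shows the control flow, not the actual implementation.

```python
class PastebinScraper:
    """Stand-in for the real scraper class."""


class ScrapingHandler:
    """Stand-in that only tracks registered scrapers."""

    def __init__(self):
        self.scrapers = []
        self.running = False

    def start(self):
        self.running = True


class PastePwnSketch:
    def __init__(self):
        self.scraping_handler = ScrapingHandler()

    def add_scraper(self, scraper):
        self.scraping_handler.scrapers.append(scraper)

    def start(self):
        # Fall back to the default scraper when none was registered.
        if not self.scraping_handler.scrapers:
            self.add_scraper(PastebinScraper())
        self.scraping_handler.start()


app = PastePwnSketch()
app.start()
print(app.scraping_handler.scrapers)
```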

Improve analyzer identifier

There needs to be some way to store an identifier for a certain analyzer to know why a paste was matched.

Do not save paste if it was deleted already

If the paste's body is "Error, we cannot find this paste.", it means that the paste was already deleted before we could download it.

In that case we should not store, nor process it!
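
A check along these lines would be enough to skip such pastes. The function name is hypothetical and the comparison assumes the error text makes up the whole body.

```python
DELETED_BODY = "Error, we cannot find this paste."


def is_deleted_paste(body):
    """Return True if the downloaded body is Pastebin's 'deleted' marker."""
    return body is not None and body.strip() == DELETED_BODY


print(is_deleted_paste("Error, we cannot find this paste."))
print(is_deleted_paste("some real paste content"))
```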

Create unit tests

There are no tests yet. The code could have several bugs which are not found due to missing tests.

MISP Action

Would be nice to be able to send alerts to MISP.

Fix Action imports

As was just done for the analyzers, the actions should import the local files (from .basicAction import BasicAction) to prevent issues regarding import loops/conflicts.

Error / Exception handler

Users might want to execute an action when the application crashes. Add proper support for an exception handler.

Database Dump Analyzer

There should be a new analyzer to find pastes which contain a database dump.

To solve this issue there needs to be a new analyzer added in the analyzers directory. The analyzer must follow the example of the other analyzers in this directory. Don't forget to add the analyzer to the analyzer package file.

The most simple example for such an analyzer is the AlwaysTrueAnalyzer which always returns True.

This analyzer should match different kinds of database dumps. There are different formats for database dumps. It's up to you how many of those you want to implement. On haveibeenpwned.com there is a nice explanation on database dumps in pastes which should serve as a source for ideas.

Tests are not necessary but highly appreciated. If there are questions, don't hesitate to contact me.

Wrapper around various hash analyzers

Users might not want to calculate their password hashes beforehand and thus might want to be able to use a wrapper (analyzer) around various hash analyzers. They initialize that wrapper with their password in the clear. The wrapper generates the hashes on the fly and checks each paste against those hashes.

I am not sure if that's a good idea but I'll leave this here for now.
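
For discussion, this is roughly what such a wrapper could do. It is a standalone sketch: the class name, the chosen algorithms and the plain substring comparison are assumptions, and it only covers unsalted hex digests.

```python
import hashlib


class PasswordHashSketch:
    """Hashes a clear-text password once and searches paste bodies for it."""

    ALGORITHMS = ("md5", "sha1", "sha256", "sha512")

    def __init__(self, password):
        encoded = password.encode("utf-8")
        self.hashes = {
            name: hashlib.new(name, encoded).hexdigest()
            for name in self.ALGORITHMS
        }

    def match(self, body):
        """Return the names of all algorithms whose digest appears in body."""
        lowered = body.lower()
        return [name for name, digest in self.hashes.items() if digest in lowered]


wrapper = PasswordHashSketch("password")
print(wrapper.match("leak: 5F4DCC3B5AA765D61D8327DEB882CF99"))
```

Salted schemes such as bcrypt cannot be handled this way, because the digest cannot be precomputed without knowing the salt.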

IBAN Analyzer

There should be a way to analyze for IBANs.

The analyzer should inherit from the BasicAnalyzer or RegexAnalyzer class and should implement a method match which returns True if the paste.body contains an IBAN.

The constructor of the IBANAnalyzer should contain a parameter validate=False which can be set in order to validate the found IBAN.
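
The matching and the optional validation could be sketched as follows. This is a standalone example that does not inherit from the real analyzer classes; the regular expression is deliberately loose and the function names are illustrative. The validation is the standard mod-97 check for IBANs.

```python
import re

# Two letters, two check digits, then 11 to 30 alphanumerics, optionally
# separated by single spaces. Loose on purpose; validation narrows it down.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def is_valid_iban(candidate):
    """Check an IBAN with the mod-97 rule (letters map to 10..35)."""
    iban = candidate.replace(" ", "").upper()
    if not 15 <= len(iban) <= 34 or not re.fullmatch(r"[A-Z0-9]+", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(body, validate=False):
    """Return IBAN candidates in body, optionally only the valid ones."""
    candidates = IBAN_RE.findall(body)
    if validate:
        return [c for c in candidates if is_valid_iban(c)]
    return candidates


print(find_ibans("Send to DE89 3704 0044 0532 0130 00, thanks", validate=True))
```

An analyzer's match method would then simply return whether find_ibans yields anything for paste.body.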

Pastes missing content

Description
Some pastes fetched from Pastebin lack the paste's body. Instead, the body only contains the following text: "File is not ready for scraping yet. Try again in 1 minute."

If this happens, the paste must be added back to the scraping queue and be scraped again.
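
A sketch of that re-queueing step is shown below, using a plain queue.Queue as a stand-in for the scraping queue. The function name and the use of paste keys as queue items are assumptions.

```python
import queue

NOT_READY_BODY = "File is not ready for scraping yet. Try again in 1 minute."


def handle_download(paste_key, body, scrape_queue):
    """Re-queue the paste if its body is not ready; return True if usable."""
    if body.strip() == NOT_READY_BODY:
        scrape_queue.put(paste_key)
        return False
    return True


pending = queue.Queue()
ready = handle_download("AbCd1234", NOT_READY_BODY, pending)  # sample key
print(ready, pending.qsize())
```

A real implementation would probably also delay the retry and cap the number of attempts.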

E-Mail action

There should be a possibility to send emails on certain pastes.

Add Travis

Automatic tests and deployment are very nice

  • Testing
  • PyPI module build
  • New GitHub Releases

ActionHandler

There needs to be another thread handling the actions triggered by the analyzers.
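
A minimal sketch of such a worker thread is shown below, using a queue that analyzers fill and a single thread that drains it. The class and method names are made up for this example, and actions are modelled as plain callables taking the paste.

```python
import queue
import threading


class ActionHandlerSketch:
    """Runs queued actions on a dedicated thread."""

    def __init__(self):
        self.action_queue = queue.Queue()
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()
        self._thread.join()

    def submit(self, action, paste):
        """Called from the analyzer side when a paste matched."""
        self.action_queue.put((action, paste))

    def _run(self):
        # Keep going until stop was requested and the queue is drained.
        while not self._stop_event.is_set() or not self.action_queue.empty():
            try:
                action, paste = self.action_queue.get(timeout=0.1)
            except queue.Empty:
                continue
            try:
                action(paste)
            finally:
                self.action_queue.task_done()


results = []
handler = ActionHandlerSketch()
handler.start()
handler.submit(results.append, "paste-1")
handler.action_queue.join()  # wait until the action has run
handler.stop()
print(results)
```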

On start method

There are users who would like to perform an action when pastepwn is fully initialized and running. There should be a way to register a handler for that.

Create docker-compose.yml file

Currently there is no (working) docker image available. The goal is to have a docker image + a docker-compose file which automatically starts a mongodb & pastepwn or a mysql & pastepwn or sqlite & pastepwn.

For that we need to read environment variables.

Syslog Action

Implement an action which sends matched pastes to syslog.

Hash Analyzer

Add a Hash Analyzer which matches certain password hashes. It can inherit from RegexAnalyzer.

RegexAnalyzers should return what they find, instead of just indicating whether or not they found it.

I'm not sure whether this idea is compatible with your current idea of how this tool should be used, but it strikes me as odd that the RegexAnalyzers simply report whether or not a match was found, rather than returning all the data they were able to match.

For example, for the PastebinURLAnalyzer I just opened a pull request for, I imagine it might be useful to feed it a number of pastes and have it build some sort of dictionary that maps each matching paste to a list of all URLs found in it.

That way, I could say "check out these 200 pastes and show me all the emails, pastebin urls, and bcrypt password hashes you find."

Just an idea, of course. Would probably require a bit of a redesign of the way the analyzers are used, but would require minimal changes to the actual analyzers.
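
To illustrate the kind of result I have in mind, here is a small standalone sketch. The patterns are simplified examples, pastes are modelled as a plain dict from paste key to body, and none of the names come from the existing analyzers.

```python
import re

# Simplified example patterns; real analyzers would bring their own.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "bcrypt": re.compile(r"\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}"),
}


def collect_matches(pastes):
    """Map each paste key to {pattern name: [matches]} for non-empty hits."""
    results = {}
    for key, body in pastes.items():
        found = {}
        for name, pattern in PATTERNS.items():
            matches = pattern.findall(body)
            if matches:
                found[name] = matches
        if found:
            results[key] = found
    return results


pastes = {
    "paste-a": "contact alice@example.com or bob@example.org.",
    "paste-b": "nothing interesting",
}
print(collect_matches(pastes))
```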

Phone number analyzer

There should be a new analyzer to find pastes which contain phone numbers.

To solve this issue there needs to be a new analyzer added in the analyzers directory. The analyzer must follow the example of the other analyzers in this directory. Don't forget to add the analyzer to the analyzer package file.

The most simple example for such an analyzer is the AlwaysTrueAnalyzer which always returns True.

This analyzer should match phone numbers in the international format. Best would be to subclass the regex analyzer and create a regex to detect phone numbers. Make sure to also detect whitespace separated country codes with \s*. Also please check if the country codes and lengths are valid for a phone number.

Tests are not necessary but highly appreciated. If there are questions, don't hesitate to contact me.

Database imports

  1. There are missing imports in the requirements.txt

  2. Currently when using pastepwn a user needs to download ALL the db connector packages (mongo & mysql currently). Depending on the amount of db connectors which at some point will be supported by pastepwn, it might be stupid to have them all in the same package.


Possible solution(s):

  1. try-except the error and return an error message. This is not elegant but allows everything to be bundled in one package.
  2. move the database connectors to a different package. That way it's way cleaner but users need to install multiple packages.
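
The first option could be sketched like this: the connector module is imported only when it is needed, and a missing module is turned into a readable message. The helper name load_connector and its package_hint parameter are hypothetical.

```python
import importlib


def load_connector(module_name, package_hint):
    """Import a database driver lazily and explain how to install it if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"The module '{module_name}' is required for this database backend. "
            f"Install it with: pip3 install {package_hint}"
        ) from exc


# A stdlib module stands in for a real driver here so the example runs anywhere.
driver = load_connector("json", "n/a")
print(driver.__name__)
```

A database class would call something like load_connector("pymongo", "pymongo") in its constructor, so that users only need the drivers for the backends they actually use.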
