
scrapegoat's Introduction

Tool name - scrapegoat

Team Members - Ryan, Matticus

Member links:

Matticus:

Ryan:

Tool Description:

scrapegoat.py takes a JSON-formatted file from the snscrape twitter-user API, splits it into individual JSON objects, and presents the links embedded in each object's 'content' section for review by the researcher.
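The description above can be illustrated with a minimal sketch. This is not the code in scrapegoat.py; it only assumes that each line of the input is one JSON object with a 'content' field, and that a 'url' field identifies the tweet. The function name `extract_links` and the URL pattern are hypothetical, chosen for illustration.

```python
import json
import re

# Hypothetical pattern: an http(s) URL running up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def extract_links(lines):
    """Yield (tweet_url, link) pairs for each link found in a tweet's 'content'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(tweet, dict):
            continue  # only JSON objects carry a 'content' field
        content = tweet.get("content") or ""
        for link in URL_PATTERN.findall(content):
            # 'url' is assumed to be the tweet's own address; it may be None.
            yield tweet.get("url"), link
```

To run it against a file produced with the --jsonl flag, pass an open file handle (or `sys.stdin`) as `lines` and print each pair that comes back.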

Installation (tested on Pop!_OS Linux):

  • [] Ensure that git is installed
  • [] Ensure that python3.8 or higher is installed
  • [] Ensure that pip is installed
        sudo apt install python3-pip
  • [] Ensure snscrape is installed
        sudo pip3 install snscrape
  • [] Create a directory for snscrape-output
        mkdir ~/snscrape-output
        cd ~/snscrape-output
  • [] Query the twitter-user api with the --jsonl flag and redirect the output to a file
        snscrape --jsonl twitter-user foo > ~/snscrape-output/foo.out
  • [] Clone the repository for scrapegoat
        git clone https://github.com/SecurityPlz/scrapegoat.git
  • [] Change directory to the tool's new home
        cd ~/scrapegoat/scrapegoat
  • [] Feed our query from step 6 into scrapegoat
        python3 ./scrapegoat.py < ~/snscrape-output/foo.out
  • [] Repeat steps 6-9 as prescribed by your hacker's intuition

Usage

The MVP version of this tool (v1) can only be used, as configured, to do exactly what the installation example above describes. Stay tuned for more and better usage capabilities.

Additional Information

Next steps: Finish parsing links from the output and implement tldextract for domain matching and frequency analysis.
At this stage of development, the tool only presents links from the dataset. We would like it to tell the researcher about those links, and to examine other pieces of data and report on those as well.
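As a rough illustration of the planned frequency analysis, the sketch below counts how often each hostname appears in a list of links. It is an assumption about how such a step could look, not the project's implementation: it uses only the standard library (`urllib.parse` and `collections.Counter`) in place of tldextract, so it counts full hostnames such as `www.example.com` rather than registered domains. The function name `host_frequencies` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def host_frequencies(links):
    """Count how often each hostname appears in an iterable of URLs."""
    counts = Counter()
    for link in links:
        try:
            # hostname is lowercased by urlparse, and None when there is no host
            host = urlparse(link).hostname
        except ValueError:
            continue  # malformed URL, e.g. an unbalanced IPv6 bracket
        if host:
            counts[host] += 1
    return counts
```

Calling `host_frequencies(links).most_common(10)` would then list the ten most frequently linked hosts, which is one way to show the researcher where an account's links tend to point.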
Our purpose for this project was to make a simple, easy-to-use data processing tool for the output of an API request. We chose the scope according to our skill level and what we thought would be useful, with an MVP achievable in a weekend.

Acknowledgments

Thank you to Bellingcat for hosting the hackathon that gave birth to this and other useful and fun tools. Thank you to my teammate Ryan for stepping outside of his comfort zone to assist with the creation of this tool, and thank you to the discerning user who provides feedback, issues, and pull requests on our humble tool.
