This project forked from micahflee/blueleaks-explorer

License: GNU Affero General Public License v3.0

BlueLeaks Explorer

BlueLeaks Explorer is open source software for journalists to investigate all of the data in the BlueLeaks dataset. You must have a copy of the BlueLeaks dataset (250GB to download, 271GB once extracted) in order to use it.

For in-depth instructions, read Chapter 10 of Hacks, Leaks, and Revelations.

Screenshot of BlueLeaks Explorer

How it Works

The BlueLeaks dataset contains hundreds of folders with names like ncric (Northern California Regional Intelligence Center), arictexas (Austin Regional Intelligence Center), and memiac (Maine Information Analysis Center). Each of these folders includes data from a different hacked law enforcement website.

These websites use Microsoft Access databases, and the data from these databases is available in hundreds of CSV files with names like EmailBuilder.csv (all of the bulk emails sent by the website), Registrations.csv (details about everyone who has an account on the website), and SARs.csv ("suspicious activity reports").
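To get a feel for that layout before loading anything, you could list which CSV files each site folder contains. The following is a minimal sketch, not part of BlueLeaks Explorer: the function name `list_site_csvs` is hypothetical, and it searches each site folder recursively because the exact depth of the CSV files inside a site folder is an assumption here.

```python
from pathlib import Path


def list_site_csvs(blueleaks_root):
    """Map each site folder name to the sorted CSV file names found inside it."""
    root = Path(blueleaks_root)
    sites = {}
    for site_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob searches recursively, so no assumption is made about
        # how deep the CSV files sit inside a site folder.
        csv_names = sorted(p.name for p in site_dir.rglob("*.csv"))
        if csv_names:
            sites[site_dir.name] = csv_names
    return sites
```

Calling `list_site_csvs` with the path to your extracted BlueLeaks dataset returns a dictionary keyed by folder name (for example `ncric`), which is a quick way to see which sites have a `SARs.csv` or `Registrations.csv`.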

BlueLeaks Explorer is a tool that lets you visualize and search the data in these CSV files. Each BlueLeaks site is different: it has its own tables, which are related to each other in their own ways. The description of how a BlueLeaks site's data should be laid out is called its structure, and you can use BlueLeaks Explorer to define it.

For example, for each table you choose which fields are interesting and should be displayed, and you can hide the rest. You can define the type of each field. If a field represents a path to a document in BlueLeaks, you can make it link directly to that document. If it's a path to an image in the BlueLeaks data, you can make it display the image. If it includes HTML, you can render the HTML. You can also define relationships between tables: you can make it so that when you view a row in DocumentCategory, it displays all of the Document rows in that category. And when you view a Document, it displays the actual DocumentCategory that it's associated with instead of just a DocumentCategoryID.
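The DocumentCategory and Document relationship can be illustrated with two SQLite lookups. This is only a sketch under stated assumptions: it does not show BlueLeaks Explorer's structure format or its real queries, and apart from `DocumentCategoryID` every column name (`DocumentID`, `Title`, `CategoryName`) and every row is made up for the demonstration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (invented demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(
        """
        CREATE TABLE DocumentCategory (
            DocumentCategoryID INTEGER PRIMARY KEY,
            CategoryName TEXT
        );
        CREATE TABLE Document (
            DocumentID INTEGER PRIMARY KEY,
            Title TEXT,
            DocumentCategoryID INTEGER
        );
        INSERT INTO DocumentCategory VALUES (1, 'Bulletins'), (2, 'Training');
        INSERT INTO Document VALUES
            (1, 'Bulletin A', 1),
            (2, 'Bulletin B', 1),
            (3, 'Course Notes', 2);
        """
    )
    return conn


def documents_in_category(conn, category_id):
    """Viewing a DocumentCategory row: list the Document rows that point at it."""
    rows = conn.execute(
        "SELECT Title FROM Document WHERE DocumentCategoryID = ? ORDER BY DocumentID",
        (category_id,),
    )
    return [row["Title"] for row in rows]


def category_of_document(conn, document_id):
    """Viewing a Document row: resolve its DocumentCategoryID to the category name."""
    row = conn.execute(
        """
        SELECT c.CategoryName
        FROM Document AS d
        JOIN DocumentCategory AS c
          ON c.DocumentCategoryID = d.DocumentCategoryID
        WHERE d.DocumentID = ?
        """,
        (document_id,),
    ).fetchone()
    return row["CategoryName"] if row else None
```

The first function corresponds to showing all documents in a category, and the second to showing a readable category in place of a bare DocumentCategoryID.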

Getting Started

To run BlueLeaks Explorer on your computer, you need Docker installed.

Create Your docker-compose.yaml

Create a new folder for your BlueLeaks Explorer data, which will take up about 5GB of disk space. Create a file in that folder called docker-compose.yaml, and copy and paste this into it:

version: "3.9"
    
services:
  app:
    image: micahflee/blueleaks-explorer:latest
    ports:
      - "8000:80"
    volumes:
      - /Volumes/datasets/BlueLeaks-extracted:/data/blueleaks
      - ./databases:/data/databases
      - ./structures:/data/structures

Under volumes, replace /Volumes/datasets/BlueLeaks-extracted with the path to the extracted BlueLeaks dataset on your computer.

Open a terminal, change to your BlueLeaks Explorer folder, and run:

docker-compose up

Wait for the blueleaks-explorer container to start.

Initialize BlueLeaks Explorer

The first time you use BlueLeaks Explorer you must run the initialize script. This will import data from all of the CSV files into SQLite databases.

Open a new terminal, change to your BlueLeaks Explorer folder, and run this command:

docker-compose exec app poetry run python ./initialize.py

It will take several minutes to run, and it will create 4.7GB of SQLite3 databases in your databases folder. You only need to do this step once.
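For intuition about what this step involves, here is a rough sketch of loading one CSV file into a SQLite table with Python's standard library. It is not the code in initialize.py, which may work quite differently. The helper names `quote_ident` and `import_csv` are hypothetical, every column is stored as TEXT, the table is named after the file, and the header is assumed to contain no duplicate column names.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote a table or column name so spaces and punctuation are safe in SQL."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn, csv_path):
    """Load one CSV file into a table named after the file; return the row count."""
    csv_path = Path(csv_path)
    table = quote_ident(csv_path.stem)
    with csv_path.open(newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            return 0  # empty file: nothing to import
        columns = ", ".join(f"{quote_ident(col)} TEXT" for col in header)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        placeholders = ", ".join("?" for _ in header)
        width = len(header)
        # Skip blank lines, then pad or truncate each row to the header width.
        rows = [(row + [""] * width)[:width] for row in reader if row]
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

With a connection from `sqlite3.connect("example.sqlite3")`, calling `import_csv(conn, "Document.csv")` would create a `Document` table that you can then query with ordinary SQL.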

Using BlueLeaks Explorer

Load http://localhost:8000/ in a web browser. Welcome to BlueLeaks Explorer!

Contributors

akilism, micahflee
