Giter Site home page Giter Site logo

stopa's Introduction

THe SToPA Research Lab

This is the official repository for the Small Town Police Accountability (SToPA) Research Lab. This research lab is co-organized by:

and the Institute for the Quantitative Study of Inclusion, Diversity, and Equity (QSIDE).

Getting Started with the SToPA library

Forking the Repository

To get a copy of the SToPA repository you'll need to fork this repository. You can do this from the Fork button on the upper right of the repository landing page. Once you've forked, you will see that the repository name on the upper left is now <your_username>/SToPA. You can now clone the respository to your local machine with

$ git clone https://github.com/<your_username>/SToPA.git 

for example if your username is annahaensch, it would be

$ git clone https://github.com/annahaensch/SToPA.git 

From your terminal you can change into the SToPA directory with

$ cd SToPA

Setting Up your Environment

Before you get started, you'll need to create a new environment using conda (in case you need it, installation guide here). If you use conda you can create a new environment (we'll call it stopa_env) with

$ conda create --name stopa_env

and activate your new environment, with

$ conda activate stopa_env

Now you can install the necessary requirements for accessing the base functionality of the SToPA library using the following command.

$ conda install pip
$ pip install -U -r requirements.txt

Installing the parsing engine

If you wish to use the Optical Character Recognition (OCR) and parsing tools to build the csv files, you'll need to carry out the following additional installation steps.

For Mac Users

If you are using a mac and you plan to run the OCR engine, you will also need to make sure that tesseract is added to your path my installing it with homebrew.

$ brew install tesseract

For Everyone

The following steps should install all remaining necessary requirements and dependencies.

$ conda install -c conda-forge poppler
$ pip install -U -r requirements_ocr.txt

Using the SToPA Library

How to OCR the Williamstown Policing Data

The 2019 and 2020 police logs were originally received as printed pdf pages which were scanned and saved as Logs2019.pdf and Logs2020.pdf. These can be found in data/primary_datasets/. During the 2021 QSIDE Datathon4Justice, we began the preliminary construction of an OCR pipeline to convert these documents to a readable text format. You can do this too, from the STOPA directory in a terminal, run

$ cd scripts 
$ python pdf_to_parquet.py 2019

or replace 2019 with whichever year you're interested in. This will take awhile to run, so be patient. Pages xxxx through yyyy of the pdf will be printed as parquet files to

SToPA/data/2019_parquet_logs/pages_xxxx_yyyy.pq

and the analogous for 2020. If you want to skip this step, you can find the pre-OCR'd parquet files in google drive here.

How to Parse the Text Files

After performing the OCR steop, 2019 and 2020 police log parquet files can be parsed as a nice easy-to-read csv. To parse the logs for any year, from the STOPA directory in your terminal, run

>cd scripts
>python parquet_to_csv.py 2019

or replace 2019 with whichever year you're interested in. If this raises errors it might be the case that you are missing some dependencies, these can be installed using pip (or whatever package management software you prefer). This will take a couple of minutes to run. The output files will be printed as csv files to

SToPA/data/<today's date>_parsed_2019.csv

or the analogous for 2020. If you want to skip this step, you can fine the pre-parsed csv files in google drive here.

Getting Started with the Data

Once the data is parsed, you can interact with it in a Jupyter notebook. If you installed all of the necessary dependencies when setting up your environment, you should be able to launch a Jupyter server from the top level directory of this repository with

$ jupyter notebook

This will load a Jupyter environment in a web browser. If you navigate to notebooks, you will find Getting_Started_with_the_Data.ipynb. Just open that, and off you go!

Mapping Tools

If you're interested in making maps, then navigate to analysis/interactive_zoomable_map/ and start there with the associated README.

Contributing to SToPA

If you have code in your fork which you'd like to add you can do this using the contribute in the Github interface. Alternatively, you can add the qsideinstitute repository as your upstream remote with

git remote add upstream https://github.com/qsideinstitute/SToPA.git

and if you run git remote -v you shoudl see something like

origin	https://github.com/<username>/SToPA.git (fetch)
origin	https://github.com/<username>/SToPA.git (push)
upstream	https://github.com/qsideinstitute/SToPA.git (fetch)
upstream	https://github.com/qsideinstitute/SToPA.git (push)

Now you can push your changes, open a pull request, and get your cool new work merged into the SToPA project.

Contact

If you'd like to be added to this repository, please email [email protected] with your Github username and the reason you'd like to be added.

stopa's People

Contributors

annahaensch avatar spencer-brooks avatar maminian avatar mendible avatar becausemath avatar mhenson2 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.