
dsci623_final's Introduction

DSCI 623: Final Project

In June 2018, ProPublica published an interactive graphic titled Paying the President, detailing how the Trump campaign and administration spent millions at his properties. They subsequently released their dataset, titled Spending at Trump Properties, to the public and encouraged others to investigate the details themselves.

Overview

To practice exploratory data analysis, students were given this dataset and asked to walk through a series of questions that might be of interest to an auditor.

This is a non-exhaustive list of the types of questions we were asked:

  • How many records are there in the dataset?
  • How many actual unique purposes_scrubbed are there for this spending?
  • How many property_scrubbed actually contain the word "Trump"?
  • What is the total being spent on these properties?
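As a rough illustration only, questions like these could be approached in pandas along the following lines. This is a sketch, not the analysis carried out in this project. The column names purposes_scrubbed and property_scrubbed are taken from the list above; the amount column name, the summarize_spending helper, and the placeholder rows are assumptions made for the example, and "contain the word Trump" is read here as counting distinct property names.

```python
import pandas as pd


def summarize_spending(
    df,
    purpose_col="purposes_scrubbed",   # name as written in the question list
    property_col="property_scrubbed",  # name as written in the question list
    amount_col="amount",               # ASSUMPTION: the real column may differ
):
    """Answer the four audit-style questions for a spending table."""
    # Amounts may be stored as text such as "$1,000.50"; strip "$" and ","
    # before converting. Unparseable values become NaN and are skipped by sum().
    amounts = pd.to_numeric(
        df[amount_col].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce",
    )

    # Distinct property names, then keep those mentioning "Trump".
    properties = df[property_col].dropna().astype(str).drop_duplicates()
    trump_named = properties.str.contains("Trump", case=False)

    return {
        "records": len(df),
        "unique_purposes": int(df[purpose_col].nunique()),
        "trump_properties": int(trump_named.sum()),
        "total_spent": float(amounts.sum()),
    }


# Placeholder rows for illustration only; these are NOT from the dataset.
toy = pd.DataFrame(
    {
        "purposes_scrubbed": ["Lodging", "Food", "Lodging", "Rent"],
        "property_scrubbed": [
            "Trump Example Hotel",
            "Example Diner",
            "Trump Example Hotel",
            "Trump Example Tower",
        ],
        "amount": ["$1,000.50", "20", "300", "79.50"],
    }
)
print(summarize_spending(toy))
```

With the real file, the frame would instead come from something like pd.read_csv on the downloaded dataset, with the column names adjusted to match its header.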

I completed the initial analysis using command line tools.

Additional work was completed using the Jupyter notebook.

Citations

2018, ProPublica. Spending at Trump Properties [dataset]. ProPublica.org. Accessed on Apr. 24, 2021.

License

The code used to explore this dataset has been made publicly available under the Mozilla Public License Version 2.0. You can find the license file here: licensed.

Dependencies

The initial analysis took place on an Amazon Lightsail server running Ubuntu 20.04, and utilized the following utilities:

eBay's TSV Utilities - A set of command line tools designed for TSV data files, released under the highly permissive Boost Software License 1.0.

Prof. Sonstein's tsv2json Utility - A command line utility for converting TSV files into JSON, written and compiled with the D programming language. Shared under the Creative Commons Attribution-ShareAlike 4.0 International License.

clarkgrubb's Data Tools - Found via the other toolkits page of eBay's TSV Utilities, this is a set of data tools that offer small quality-of-life helpers and format converters, in the form of Python and Bash shell scripts. Licensed under the MIT License. The reservoir-sample command is one that's useful for discovery.
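For readers unfamiliar with the idea behind a command like reservoir-sample, the general technique (reservoir sampling, often called Algorithm R) picks a uniform random sample of k lines from a stream in a single pass, without knowing the stream's length in advance. The sketch below illustrates that technique in Python; it is not the implementation used by Data Tools, and the function name and parameters are hypothetical.

```python
import random


def reservoir_sample(lines, k, seed=None):
    """Return up to k items chosen uniformly at random from an iterable."""
    rng = random.Random(seed)
    sample = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(line)
        else:
            # Pick a slot in [0, i]; the new item replaces a reservoir
            # entry only if the slot falls inside the reservoir, which
            # happens with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = line
    return sample


# Example: peek at 3 random values from a stream of 1000.
print(reservoir_sample(range(1000), 3, seed=42))
```

Because it accepts any iterable, the same function could be pointed at an open file handle to pull a handful of rows from a large TSV for a first look.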

Some of the Python packages that are useful for exploratory data analysis within the context of the Jupyter notebook are:

  • pandas
  • numpy
  • matplotlib

dsci623_final's People

Contributors

effendiian
