This repository contains several projects completed with PySpark for the Spark course in IE HST's MS in Business Analytics and Big Data program.


Spark

This repo contains work completed during IE University's Spark course using PySpark.

Chicago Crimes Analysis: 2014-2016

The main purpose of this assignment was to choose a large open-source dataset, develop a persona and business case, and answer business questions using the Spark queries taught in class.

I chose the Chicago Crimes Dataset. This dataset reflects reported incidents of crime (with the exception of murders, where data exists for each victim) that occurred in the City of Chicago from 2001 to present, minus the most recent seven days. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. To protect the privacy of crime victims, addresses are shown at the block level only and specific locations are not identified. The dataset is available for download from the Chicago City Data Portal.

The code is in ChicagoCrimesIndividual.ipynb, and the business case and the answers to the business queries are summarized in MarangSparkAssignment.pdf.

USA Flights Analysis

According to a 2010 report by the US Federal Aviation Administration, domestic flight delays cost passengers, airlines and other parts of the economy USD 32.9 billion per year. More than half of that amount comes from the pockets of passengers, who not only waste time waiting for their planes to leave but also miss connecting flights and spend money on food and hotel rooms while they are stranded.

The report, focusing on data from the year 2007, estimated that air transportation delays put a dent of USD 4 billion in the country's gross domestic product in that year. The full report can be found in the following link: report.

But what are the causes of these delays?

To answer this question, we analyzed the provided dataset, which contains 1,936,758 domestic US flights from 2008 along with the causes of their delays, diversions and cancellations, if any.

The data comes from the U.S. Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS). The business questions, queries and insights are outlined in FlightsGroupAssignment.ipynb.

This project was completed with Diego Cuartas and Nisrine Ferahi.
