
Hired-Ride Trips Analysis in New York City

Overview

[Figure: static heatmap of hired-ride trips]

This project is the final submission for the "Tools for Analytics" course, taught by Erin Root at Columbia University. The goal is to investigate and illustrate trends and patterns in hired-ride trips in New York City from January 2009 through June 2015, using data from Uber and NYC Yellow Taxi services together with local historical weather data. The analysis seeks to understand the factors affecting the popularity of Uber and Yellow Taxis, user preferences under various weather conditions, and the dynamics of trips across the city.

The project is divided into four sections: Data Preprocessing, Data Storage, Data Understanding, and Data Visualization. Python, Jupyter Notebook, and SQL are used throughout these sections. Key Python libraries include pandas, geopandas, matplotlib, and SQLAlchemy.

[Animation: yearly heatmap of hired-ride trips]

For a detailed description of each section, see the Project Structure section below.

Getting Started

To run the analysis, install the dependencies listed below and then run the Final Project.ipynb notebook.

Dependencies

The following libraries are required to run the project:

  • folium==0.14.0
  • geopandas==0.10.2
  • numpy==1.21.6
  • pandas==1.3.5
  • pyarrow==11.0.0
  • requests==2.29.0
  • seaborn==0.12.2
  • sqlite3 (included in Python's standard library)
  • BeautifulSoup4==4.12.0 (imported as bs4)
  • matplotlib==3.2.0
  • scipy==1.4.1
  • sqlalchemy==1.3.4

For a complete list of dependencies, please refer to the requirements.txt file.

Authors

Project Structure

The project is organized into four main sections, each covering one stage of the analysis of NYC taxi trip data:

1. Data Preprocessing

In this section, we download, clean, and preprocess datasets for Uber rides, Yellow Taxi trips, and historical weather data. The preprocessing steps involve filling in missing information, generating samples, and calculating distances between pickup and drop-off locations.
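The README does not name the formula used to compute distances between pickup and drop-off locations. A common choice for latitude/longitude pairs is the haversine (great-circle) distance, so the following is a minimal sketch under that assumption. The function name haversine_km and the 6,371 km mean Earth radius are illustrative assumptions, not taken from the project.

```python
import math

# Mean Earth radius in kilometres (a common approximation; assumed here)
EARTH_RADIUS_KM = 6371.0


def haversine_km(pickup_lat: float, pickup_lon: float,
                 dropoff_lat: float, dropoff_lon: float) -> float:
    """Great-circle distance in km between a pickup and a drop-off point.

    Hypothetical helper for illustration; not the project's own function.
    """
    # math's trig functions expect radians, so convert from degrees first
    lat1, lon1, lat2, lon2 = map(
        math.radians, (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the squared half-chord length between the points
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Times Square to JFK Airport (approximate coordinates)
distance_km = haversine_km(40.7580, -73.9855, 40.6413, -73.7781)
print(f"{distance_km:.1f} km")
```

Great-circle distance ignores the street grid, so it underestimates the actual driving distance; it is mainly useful for comparing trips with one another.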

For weather data, we pay special attention to interpolation. We sort the data chronologically and use bi-directional linear interpolation to estimate missing values for variables such as 'HourlyWindSpeed', 'DailySustainedWindSpeed', 'DailyPeakWindSpeed', 'DailyAverageWindSpeed', 'SunriseTime', and 'SunsetTime'. We then round the interpolated 'SunriseTime' and 'SunsetTime' values to the nearest integer (that is, the nearest second) and convert them back to time values. We also generate random values for 'HourlyPrecipitation' based on the existing values and calculate 'DailyPrecipitation' accordingly.
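The sort, interpolate, and round steps described above can be sketched in pandas as follows. This assumes the weather data sits in a DataFrame with a hypothetical 'DATE' column and that 'SunriseTime' has already been converted to seconds after midnight; the toy values are made up. Calling interpolate with method="linear" and limit_direction="both" is one way to get bi-directional linear interpolation, but we cannot confirm it is the exact call the notebook uses.

```python
import pandas as pd

# Hypothetical weather rows, deliberately out of order and with gaps.
# The 'DATE' column and all values are illustrative assumptions;
# 'SunriseTime' is assumed to be in seconds after midnight already.
weather = pd.DataFrame({
    "DATE": pd.to_datetime(["2015-01-01 03:00", "2015-01-01 00:00",
                            "2015-01-01 02:00", "2015-01-01 01:00"]),
    "HourlyWindSpeed": [10.0, None, None, 6.0],
    "SunriseTime": [25261.0, 25200.0, None, None],
})

# 1. Sort chronologically so interpolation runs along the time axis.
weather = weather.sort_values("DATE").reset_index(drop=True)

# 2. Linear interpolation in both directions: interior gaps are filled
#    linearly, and leading/trailing gaps take the nearest valid value.
cols = ["HourlyWindSpeed", "SunriseTime"]
weather[cols] = weather[cols].interpolate(method="linear", limit_direction="both")

# 3. Round sunrise to the nearest whole second, then convert to clock times.
seconds = weather["SunriseTime"].round().astype("int64")
weather["SunriseTime"] = pd.to_datetime(seconds, unit="s").dt.time

print(weather)
```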

For taxi and Uber data, our primary focus after downloading and merging the data files is filtering out trips outside the specified region, normalizing column names, and extracting the data needed for the analysis.
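A minimal pandas sketch of the region filter and column-name normalization. The bounding-box coordinates, the snake_case naming rule, and the column names are hypothetical placeholders; the README does not state the exact region or naming convention the project uses.

```python
import pandas as pd

# Hypothetical bounding box; placeholder values, not the project's region.
LAT_MIN, LAT_MAX = 40.56, 40.91
LON_MIN, LON_MAX = -74.24, -73.72


def normalize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Lower-case column names and replace spaces with underscores (assumed rule)."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def filter_to_region(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only trips whose pickup AND drop-off fall inside the bounding box."""
    inside = (
        df["pickup_latitude"].between(LAT_MIN, LAT_MAX)
        & df["pickup_longitude"].between(LON_MIN, LON_MAX)
        & df["dropoff_latitude"].between(LAT_MIN, LAT_MAX)
        & df["dropoff_longitude"].between(LON_MIN, LON_MAX)
    )
    return df[inside]


# Toy trips: row 0 is inside, row 1 has a (0, 0) pickup,
# and row 2 has a drop-off north of the box.
trips = pd.DataFrame({
    "Pickup Latitude": [40.75, 0.0, 40.64],
    "Pickup Longitude": [-73.99, 0.0, -73.78],
    "Dropoff Latitude": [40.64, 40.75, 41.50],
    "Dropoff Longitude": [-73.78, -73.99, -73.78],
})

clean = filter_to_region(normalize_columns(trips))
print(clean)
```

Requiring both endpoints to fall inside the box removes trips with placeholder (0, 0) coordinates as well as trips that leave the area.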

2. Data Storage

In this section, we create a SQLite database and populate it with four tables: Yellow Taxi trips, Uber trips, hourly weather information, and daily weather information. We also write a schema.sql file that defines the schema for each table.
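A minimal sketch of creating the database from a schema script and populating one table. The project lists SQLAlchemy among its libraries; this sketch uses Python's built-in sqlite3 module instead, and the table and column names in SCHEMA_SQL are hypothetical stand-ins for the real schema.sql.

```python
import sqlite3

# Hypothetical contents of schema.sql; names and columns are illustrative only.
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS taxi_trips (
    id INTEGER PRIMARY KEY,
    pickup_datetime TEXT,
    distance_km REAL,
    tip_amount REAL
);
CREATE TABLE IF NOT EXISTS uber_trips (
    id INTEGER PRIMARY KEY,
    pickup_datetime TEXT,
    distance_km REAL
);
CREATE TABLE IF NOT EXISTS hourly_weather (
    id INTEGER PRIMARY KEY,
    date TEXT,
    hourly_precipitation REAL,
    hourly_wind_speed REAL
);
CREATE TABLE IF NOT EXISTS daily_weather (
    id INTEGER PRIMARY KEY,
    date TEXT,
    daily_precipitation REAL,
    daily_average_wind_speed REAL
);
"""

# In-memory database for the sketch; a real run would likely use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA_SQL)  # runs every CREATE TABLE statement in the script

# Populate one table; executemany binds each tuple to the ? placeholders.
conn.executemany(
    "INSERT INTO uber_trips (pickup_datetime, distance_km) VALUES (?, ?)",
    [("2015-01-01 08:15:00", 3.2), ("2015-01-01 18:40:00", 7.9)],
)
conn.commit()

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```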

3. Data Understanding

In this section, we design SQL queries to answer specific questions related to hired-ride trip patterns, popular hours and days, distances traveled, and weather conditions. These queries help us gain a more comprehensive understanding of the data and its features.
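As an example of the kind of query described above, the following sketch counts trips per hour of day to find the most popular hours. The taxi_trips table, its pickup_datetime column, and the sample rows are hypothetical. SQLite's strftime('%H', ...) extracts the two-digit hour from a timestamp stored as 'YYYY-MM-DD HH:MM:SS' text.

```python
import sqlite3

# Small in-memory table with hypothetical pickup timestamps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxi_trips (pickup_datetime TEXT)")
conn.executemany(
    "INSERT INTO taxi_trips VALUES (?)",
    [("2015-01-01 08:05:00",), ("2015-01-02 08:45:00",),
     ("2015-01-03 18:10:00",), ("2015-01-04 08:30:00",),
     ("2015-01-05 18:20:00",), ("2015-01-06 23:59:00",)],
)

# Count trips per hour of day, most popular first (ties broken by hour).
POPULAR_HOURS_SQL = """
SELECT strftime('%H', pickup_datetime) AS hour, COUNT(*) AS num_trips
FROM taxi_trips
GROUP BY hour
ORDER BY num_trips DESC, hour
"""
popular_hours = conn.execute(POPULAR_HOURS_SQL).fetchall()
print(popular_hours)  # [('08', 3), ('18', 2), ('23', 1)]
```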

4. Data Visualization

Finally, we generate ten visualizations embedded in the Jupyter Notebook using matplotlib and other visualization libraries. We also create an animation showing how the heatmap of all hired trips, drawn over a map of the area, changes from year to year. These visualizations provide insights into the popularity of hired rides at different hours and on different days, average distances traveled, drop-offs at major airports, the spatial distribution of trips, and the relationship between tip amount and distance or precipitation.
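A minimal matplotlib sketch of one such chart, a bar plot of trips per pickup hour. The input counts are hypothetical (shaped like the result of an hourly SQL query), and the figure is written to an in-memory PNG rather than displayed inline in a notebook.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical (hour, trip count) rows, e.g. from a per-hour SQL query.
popular_hours = [("08", 3), ("18", 2), ("23", 1)]
hours = [int(h) for h, _ in popular_hours]
counts = [n for _, n in popular_hours]

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(hours, counts)       # one bar per hour that has trips
ax.set_xticks(range(24))    # label every hour of the day on the x-axis
ax.set_xlabel("Hour of day")
ax.set_ylabel("Number of trips")
ax.set_title("Hired-ride trips by pickup hour")
fig.tight_layout()

# Save to an in-memory buffer instead of showing the figure.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```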

Conclusion

This project shows how data analysis and visualization can reveal patterns and preferences in hired-ride trips in a large urban area like New York City. It illustrates how factors such as time, weather, and location influence the demand for and usage of hired rides, information that can help service providers better understand their customers and optimize their operations.


Contributors

powerzbt, kellypiao
