taxi_trip_records's Introduction

Trip Length Analysis for Taxis in New York City

Introduction

This project analyzes the average trip length for yellow taxis in New York City, focusing on both trip duration and distance. It automates the tracking process through three primary components: data ingestion, transformation, and visualization.

  • Data Ingestion: Downloads data in Parquet format from the NYC Taxi & Limousine Commission, ensuring continuous data flow.
  • Data Transformation: Prunes the downloaded data and applies feature engineering to prepare it for analysis.
  • Data Visualization: Visualizes both monthly and rolling averages of trip lengths over time, providing insights into temporal trends.
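As an illustration of the ingestion step, the sketch below builds one download URL per month. It is not the project's actual code: the function name `monthly_urls` is hypothetical, and the base URL and file-name pattern are assumptions based on how the TLC currently publishes its yellow-taxi Parquet files, so check them against the TLC trip record page before relying on them.

```python
from datetime import date

# Assumed location of the monthly yellow-taxi Parquet files (verify before use).
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def monthly_urls(start: date, end: date) -> list[str]:
    """Return one Parquet URL per month from start to end, inclusive."""
    urls = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        urls.append(f"{BASE_URL}/yellow_tripdata_{year}-{month:02d}.parquet")
        # Advance to the next month, rolling over the year after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return urls


if __name__ == "__main__":
    for url in monthly_urls(date(2009, 1, 1), date(2009, 3, 1)):
        print(url)
```

Each URL can then be passed to whatever downloader the ingestion component uses. Generating the list from a date range makes it easy to pick up newly published months on later runs.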

Project Structure

The project is organized as follows:

  • params.yaml: Defines parameters for data processing and analysis.
  • config/config.yaml: Contains configuration values for the project.
  • requirements.txt: Lists all the required Python packages.
  • research/: Jupyter notebooks for component research.
  • src/utils/: Utility functions used across different components.
  • src/entity/__init__.py: Defines class attributes for each project component.
  • src/config/config.py: Manages application configuration.
  • src/components/: Core code for each project component.
  • src/pipeline/: Execution scripts for each component, managing the workflow.
  • main.py: Entry point for running the pipeline scripts.
  • app.py: Application entry point, orchestrating the overall process.
  • Dockerfile: Provides instructions for Dockerizing the application.
  • tests/: Unit tests for each component located in src/components.

How to Run Tests

To run the tests for a specific component, use the command 'pytest tests/<component_name>.py'. Replace <component_name> with the name of the test module for that component: test_data_ingestion, test_data_transformation, or test_data_visualization.

How to Run Project

This project can be run either directly through Python or within a Docker container. Below are the instructions for both methods:

Running Directly with Python

Before running the application, make sure Python is installed and a virtual environment is active, then install the dependencies:

'pip install -r requirements.txt'

Once the environment has been correctly set up, you can run the application:

  • Direct Application Run: Use the command 'python main.py' in your terminal to start the main application process.
  • Access via FastAPI: Use the command 'python app.py' to start the application with FastAPI. This will allow you to access the application's API endpoints.

Running with Docker

To run the application using Docker, follow these steps:

  1. Build the Docker Image: Run 'docker build -t your-image-name .' in the terminal. Replace your-image-name with a name of your choice for the Docker image.

  2. Run the Docker Container: After the image is built, you can start the container using: 'docker run -p 80:80 your-image-name'. This command will map port 80 of the container to port 80 on your host machine.

Analysis

The project includes an in-depth analysis of the rolling 45-day average of trip distance and duration for yellow taxis in NYC, from January 2009 to the present. The analysis aims to uncover trends and patterns in taxi usage over the years.
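The rolling 45-day average can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's implementation (which may well use pandas): the function names, the tuple layout of a trip, the units, and the pruning rule are all hypothetical. It averages the daily means, so each day counts equally regardless of how many trips it had.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(trips):
    """Average distance and duration (minutes) per pickup day.

    `trips` is an iterable of (pickup, dropoff, distance) tuples, where
    pickup and dropoff are datetime objects.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # distance sum, minutes sum, count
    for pickup, dropoff, distance in trips:
        minutes = (dropoff - pickup).total_seconds() / 60
        # Prune obviously invalid rows (hypothetical rule for this sketch).
        if minutes <= 0 or distance <= 0:
            continue
        day = totals[pickup.date()]
        day[0] += distance
        day[1] += minutes
        day[2] += 1
    return {d: (s[0] / s[2], s[1] / s[2]) for d, s in sorted(totals.items())}


def rolling_mean(daily, window_days=45):
    """Trailing mean of the daily averages over the last `window_days` days."""
    result = {}
    for day in daily:
        start = day - timedelta(days=window_days - 1)
        window = [v for d, v in daily.items() if start <= d <= day]
        result[day] = (
            sum(v[0] for v in window) / len(window),
            sum(v[1] for v in window) / len(window),
        )
    return result


if __name__ == "__main__":
    # Two made-up trips, used only to show the call pattern.
    trips = [
        (datetime(2009, 1, 1, 8, 0), datetime(2009, 1, 1, 8, 10), 2.0),
        (datetime(2009, 1, 2, 9, 0), datetime(2009, 1, 2, 9, 20), 4.0),
    ]
    for day, (dist, mins) in rolling_mean(daily_means(trips)).items():
        print(day, round(dist, 2), round(mins, 2))
```

The window is trailing and includes the current day, so the first 44 days are averaged over fewer observations. The inner scan is quadratic in the number of days, which is acceptable for a sketch but worth replacing with a sliding sum on the full history.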

[Figure: overview]

Overall Trend: Taxi trip data from 2009 to the present shows a gradual shift in the average distance and duration of trips in New York City. The shift could be attributed to urban development, evolving traffic patterns, or changes in commuter behavior. Trips tend to become longer over time, and distance and duration generally move together. External factors and specific events within the city periodically disrupt this trend. The pattern reflects the dynamic nature of urban mobility and the city's ongoing infrastructural and socio-economic changes.

[Figures: seasonality-1, seasonality-2]

Seasonality: There is a clear seasonal pattern observed with peaks and troughs occurring around the same time each year. Trip distances peak during the summer months, which could be due to increased tourism and outdoor activities when the weather is favorable. Conversely, trip durations appear to peak slightly later in the year, which may be influenced by increased traffic congestion and slower travel times as the weather deteriorates and the holiday season approaches. The shortest trip distances and durations tend to occur in the winter months, particularly around February, which may correspond with the coldest time of the year when there's likely to be a reduction in tourism and possibly more residents choosing to stay indoors or use alternative modes of transportation.
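A seasonal pattern like this can be checked by averaging a daily series by calendar month across all years. The sketch below assumes a `{date: value}` mapping of daily averages; the function name and the sample values are hypothetical and are not results from the project.

```python
from collections import defaultdict
from datetime import date


def month_of_year_profile(daily_values):
    """Average a {date: value} series by calendar month (1-12) across all years."""
    buckets = defaultdict(list)
    for day, value in daily_values.items():
        buckets[day.month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(buckets.items())}


if __name__ == "__main__":
    # Made-up daily average distances, used only to show the call pattern.
    sample = {
        date(2009, 2, 1): 2.0,
        date(2010, 2, 1): 3.0,
        date(2009, 7, 1): 4.0,
    }
    profile = month_of_year_profile(sample)
    peak_month = max(profile, key=profile.get)
    print(profile, "peak month:", peak_month)
```

Running the same function on the distance series and on the duration series gives two profiles whose peak months can be compared directly.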

[Figure: covid]

COVID-19 Impact: In early 2020, the usual correlation between trip distance and trip duration broke down. Average trip distance increased while average duration decreased, probably because lockdowns left fewer cars on the road. This divergence shows how strongly the pandemic changed everyday travel in the city.
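One way to quantify such a change is to compare the Pearson correlation between daily average distance and daily average duration before and after a cutoff date. The sketch below is an assumption about how this could be done, not the project's method: the function names, the `{date: (distance, duration)}` layout, the cutoff, and the sample values are all hypothetical.

```python
from datetime import date
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def correlation_by_period(daily, cutoff):
    """Split {date: (distance, duration)} at `cutoff` and correlate each part."""
    before = [v for d, v in sorted(daily.items()) if d < cutoff]
    after = [v for d, v in sorted(daily.items()) if d >= cutoff]
    return tuple(
        pearson([v[0] for v in part], [v[1] for v in part])
        for part in (before, after)
    )


if __name__ == "__main__":
    # Made-up values, used only to show the call pattern.
    daily = {
        date(2020, 1, 1): (1.0, 10.0),
        date(2020, 1, 2): (2.0, 20.0),
        date(2020, 1, 3): (3.0, 30.0),
        date(2020, 4, 1): (3.0, 10.0),
        date(2020, 4, 2): (2.0, 20.0),
        date(2020, 4, 3): (1.0, 30.0),
    }
    print(correlation_by_period(daily, cutoff=date(2020, 3, 1)))
```

A coefficient near 1 means distance and duration rise and fall together. A drop toward zero or below after the cutoff would be consistent with the decoupling described above.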

[Figure: after covid]

Recovery After COVID-19: As the city advanced beyond the peak of the pandemic, the relationship between the distance and duration of taxi trips started to realign with historical patterns. However, a 'new normal' seems to have taken shape, with distinct average trip lengths diverging from pre-pandemic figures. This shift could be a reflection of lasting changes in commuting habits or a rise in remote working practices.

Contributors

  • jjjjjooooo
