oiannace / etl-pipeline


ETL pipeline that extracts and transforms student athlete academic performance data, then populates a data warehouse using a star schema dimensional model.

Python 100.00%
etl-pipeline postgresql python star-schema


ETL Data Pipeline

Project Context

The main dataset contains academic scores for student athletes on NCAA Division I teams. The data is at the granularity of school, sport, and gender. The goal of this project is to determine whether a team's academic scores are correlated with the physicality of its sport. In other words, is playing a contact sport associated with poorer academic performance?
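One way to frame this question numerically is a Pearson correlation between a contact-sport indicator (1 for contact, 0 for non-contact) and a team's academic score. The sketch below is only an illustration: the `pearson` helper, the 0/1 coding, and the sample numbers are assumptions made up for this example and are not taken from the dataset or the repository.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustration values, NOT real NCAA data.
is_contact_sport = [1, 1, 1, 0, 0, 0]
academic_score = [950, 960, 955, 985, 990, 980]

r = pearson(is_contact_sport, academic_score)
# A negative r means contact sports pair with lower scores in this toy sample.
print(f"r = {r:.3f}")
```

A coefficient near -1 or +1 indicates a strong linear association, while a value near 0 indicates little or none. Correlation alone does not establish that the sport causes the difference in scores.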

The data was extracted from different sources (CSV files and web scraping), cleaned and transformed into a uniform format, and then loaded into a PostgreSQL database according to the star schema below.
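The extract and transform steps might look roughly like the following sketch. It is not the repository's code: the column names, the `GENDER_LABELS` mapping, the `extract` and `transform` helpers, and the sample rows are all hypothetical, chosen only to show what "transformed into a uniform format" can mean in practice.

```python
import csv
import io

# Hypothetical raw extract; the column names and values are illustrative only.
RAW_CSV = """School Name,Sport,Gender,Score
 Example University ,Football,M,950
Example University,Women's Soccer,female,985
"""

# Hypothetical mapping that unifies inconsistent gender labels.
GENDER_LABELS = {"m": "Men", "male": "Men", "f": "Women", "female": "Women"}


def extract(text):
    """Yield raw records from CSV text as dictionaries keyed by header."""
    yield from csv.DictReader(io.StringIO(text))


def transform(row):
    """Normalize one raw record: trim whitespace, unify labels, cast the score."""
    gender = row["Gender"].strip().lower()
    return {
        "school": row["School Name"].strip(),
        "sport": row["Sport"].strip(),
        "gender": GENDER_LABELS.get(gender, "Unknown"),
        "score": int(row["Score"]),
    }


records = [transform(row) for row in extract(RAW_CSV)]
for record in records:
    print(record)
```

Keeping extraction and transformation as separate functions makes it easier to test the cleaning rules without touching the data sources.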

Dimensional Model

Dimension Tables: date_dim, location_dim, school_dim, sport_dim
Fact Table: academic_score_snapshot_fact

The dimensional model is implemented using a star schema.

[Image: star schema diagram]
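As a rough illustration of this layout, the sketch below builds the four dimension tables and the fact table and runs a typical join. Only the table names come from this README. Every column, the sample rows, and the use of an in-memory SQLite database in place of PostgreSQL are assumptions made so the example runs without a database server.

```python
import sqlite3

# Table names come from the README; every column below is a hypothetical example.
DDL = """
CREATE TABLE date_dim     (date_key INTEGER PRIMARY KEY, academic_year INTEGER);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE school_dim   (school_key INTEGER PRIMARY KEY, school_name TEXT);
CREATE TABLE sport_dim    (sport_key INTEGER PRIMARY KEY, sport_name TEXT, gender TEXT);
CREATE TABLE academic_score_snapshot_fact (
    date_key       INTEGER REFERENCES date_dim(date_key),
    location_key   INTEGER REFERENCES location_dim(location_key),
    school_key     INTEGER REFERENCES school_dim(school_key),
    sport_key      INTEGER REFERENCES sport_dim(sport_key),
    academic_score INTEGER
);
"""

conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for PostgreSQL
conn.executescript(DDL)

# Illustrative rows only, not real data.
conn.execute("INSERT INTO date_dim VALUES (1, 2019)")
conn.execute("INSERT INTO location_dim VALUES (1, 'NY')")
conn.execute("INSERT INTO school_dim VALUES (1, 'Example University')")
conn.executemany(
    "INSERT INTO sport_dim VALUES (?, ?, ?)",
    [(1, "Football", "Men"), (2, "Tennis", "Women")],
)
conn.executemany(
    "INSERT INTO academic_score_snapshot_fact VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 950), (1, 1, 1, 2, 985)],
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
rows = conn.execute(
    """
    SELECT s.sport_name, AVG(f.academic_score)
    FROM academic_score_snapshot_fact AS f
    JOIN sport_dim AS s ON s.sport_key = f.sport_key
    GROUP BY s.sport_name
    ORDER BY s.sport_name
    """
).fetchall()
print(rows)
```

The fact table holds only foreign keys and the measure, so each analytical question becomes a join from the fact table to whichever dimensions it needs.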

Using the code

Create and activate a virtual environment, then install the dependencies. All example commands below use PowerShell.
Note: venv_name is the name of your virtual environment.

PS C:\> python -m venv venv_name
PS C:\> venv_name\Scripts\Activate.ps1
PS C:\> pip install -r packages.txt

To create the PostgreSQL database and the dimension and fact tables according to the above star schema, run the create_star_schema.py file.

PS C:\> python create_star_schema.py

Finally, to execute the ETL (Extract, Transform, Load) pipeline and populate the data warehouse according to the above star schema, run the loader.py file.

PS C:\> python loader.py

Testing the code

To test the code using pytest, run the following command in PowerShell:

PS C:\> pytest -q tests.py

Note: the "-q" (quiet) flag condenses the output of the above command.
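For readers new to pytest, a test is a function whose name starts with `test_` and that uses plain `assert` statements. The sketch below shows that shape only. It does not reproduce the contents of tests.py, and `normalize_sport_name` is a hypothetical helper invented for the example.

```python
def normalize_sport_name(raw):
    """Hypothetical transform helper: collapse whitespace and title-case a name."""
    return " ".join(raw.split()).title()


def test_normalize_sport_name_trims_and_title_cases():
    # Extra spaces are collapsed and each word is capitalized.
    assert normalize_sport_name("  ice   hockey ") == "Ice Hockey"


def test_normalize_sport_name_keeps_clean_input():
    # Already-clean input passes through unchanged.
    assert normalize_sport_name("Football") == "Football"
```

When pytest runs such a file, each `test_` function is reported as passed or failed depending on whether its assertions hold.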
