Giter Site home page Giter Site logo

etl-challenge's Introduction

ETL-Project

U.S. Quality of life by State

ETL

Team Members

  • Harry Feldman
  • Jessica Pardo
  • Andrey Tokarev
  • Raven Washington

Introduction

This project is designed to conduct an ETL process by extracting, transforming, and loading data that pertains to the quality of life in different States of the USA. The purpose of extracting the data will be to create a database for potential future country-wide analysis of housing, healthcare, and other measure of quality of life.

Data Extraction

In this project, CSV datasets were extrated from the following sources:

To complete the process of data extraction and transformation, it was prepared a python file ETL Notebook.

Data Enginering

After extracting the data, we made an Entity-Relationship Diagram (ERD) by using an open-source toolkit called Quick Database Diagrams. The model looks as follows:

ERD

Data Transformation

The transformation of the data include the following workflow:

  • Pandas functions in Jupyter Notebook to transform all CSV files responses.
  • CSV files transformed into a dataframes.
  • Python transformation functions for data cleaning, joining, filtering, and null values removed.
  • Several columns removed
  • Duplicate rows was removed, and successfully managed.

More detail of the transformation of the data can be seen in ETL Notebook.

Data Loading

After the process of extracting and transforming the data, we created a SQL database to load the dababase. First, we made a table schema SQL Table Schema for each of the CSV files saved in the Resources directory.

Schema

Using Python and SQLAlchemy, we loaded our data into the tables into PostgreSQL for population.

Loading

etl-challenge's People

Contributors

hjfeldy avatar jessicapardo avatar rwashington1159 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.