Giter Site home page Giter Site logo

crowdfunding_etl's Introduction

Crowdfunding_ETL

Project 2

Project Goals:

This is a course project assignment from BCS to be completed as a team where the team is tasked to perform the Extract, Transform, and Load (ETL) process using Jupyter Notebook. The team members are Ms. Abdalla, Mr. Krenn, Mr. Ngo, and Mr. Samwel.

Data Sources Files:

The key files for this project are in three locations within this repository. The first location is in the main folder of the repository. The file "ETL_Mini_Project_RAbdalla_DKrenn_KNgo_YSamwel.ipynb" contains all of the syntax used to extract, clean, manage, and analyze the data for this project. The second key location is the postgreSQL folder, which contains most of the documents for the final part of the assignment. This folder contains a document for setting up (crowdfunding_db_schema.sql) and checking (crowdfunding_select_queries.sql) the SQL database. There's also "crowdfunding_ERD.png" in this folder which shows the entity relationship diagram (ERD) for the project. The final key location is in the "Resources" which contains 5 key excel files for the project.

Data Extraction & Cleaning:

The data were imported from crowdfunding.xlsx. From there, the category & subcategory column was separated into a category column and a subcategory column. Then the unique values from category and subcategory were each exported into their own dataframe, and each item was assigned its own variable ID. The dataframes were then exported to separate excel files. In the next portion of the assignment, the main excel file, crowdfunding.xlsx, was cleaned and updated. A few variables were renamed with cleaner column headers. For a few of the columns, data types were updated, including the reformatting some columns as dates. The category and subcategory IDs were then imported in from the prevously created excel files. Columns that weren't needed were dropped, and the resulting dataframe was reexported to crowdfunding.xlsx. In the next portion of the assignment, a new dataframe was created to hold data on contacts. The original contacts spreadsheet had the data formatted as json. So the json data was imported and converted into a multicolumn dataframe. The dataframe was then cleaned and expored back into an excel file. Even though we were only required to do this processes using one of two approaches, we ultimately completed this portion using both techniques. The final portion of the project ties all of the previously created spreadsheets together. In order to make sense of all the data collectively, an ERD was created to show the cross-over constructs amongst all of the excel files. A database schema was created for SQL, and the data were loaded into a database named crowdfunding_db.

Database Creation & Running Instructions:

Creating employees_db and tables:

Create a new database named crowdfunding_db then open and run the crowdfunding_db_schema.sql to create the following tables:

  • contacts
  • category
  • subcategory
  • campaign

Importing:

Start by importing the csv files that do not have any dependencies in the following order:

  1. contacts.csv to contacts table
  2. category.csv to category table
  3. subcategory.csv to subcategory table

Once the above files have been imported, the following csv file can be imported into their tables in any order:

  • campaign.csv to campaign table

Querying:

Open and run each query for each question from the crowdfunding_select_queries.sql file.

crowdfunding_etl's People

Contributors

123noob1 avatar drkrenn avatar yoerisamwel avatar rrrrasha avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.