etl-project-1's Introduction

ETL Project

Team Members

Dianne Jardinez, Aastha Arora, Swarna Latha

Project Summary

The objective of this project was to extract data from websites and available APIs. The extracted datasets were then transformed into nine tables by cleaning, joining, and filtering. The tables were loaded into PostgreSQL, an object-relational database, and managed through pgAdmin.

Finding Data

The following data sources were used:

  • IMDb Website

    • Method: Webscraping
    • Used for: Collecting the list of the Top 250 IMDb-rated movies
  • OMDb API

    • Method: API Extraction
    • Used for: Collecting the IMDb ID and other movie-related details such as actors and director
  • Utelly API

    • Method: API Extraction
    • Used for: Collecting streaming options for the Top 250 IMDb movies
  • uNoGS API

    • Method: API Extraction
    • Used for: Collecting Netflix movies released in the United States that have an IMDb rating between 7 and 10
  • Google Search Engine

    • Method: Webscraping
    • Used for: Collecting streaming service availability and price
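
As an illustration of the API extraction method listed above, here is a minimal sketch of a title lookup against the OMDb API. It is not taken from the project notebooks. The helper names (`build_omdb_url`, `to_record`, `fetch_movie`), the output field names, and the sample payload are assumptions made for this example; the `t` and `apikey` query parameters and the `imdbID`, `Title`, `Director`, and `Actors` response keys follow the public OMDb API.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OMDB_URL = "http://www.omdbapi.com/"


def build_omdb_url(title, api_key):
    """Build an OMDb lookup URL for one movie title."""
    return OMDB_URL + "?" + urlencode({"t": title, "apikey": api_key})


def to_record(payload):
    """Keep only the fields of interest from one OMDb JSON response."""
    return {
        "imdb_id": payload.get("imdbID"),
        "title": payload.get("Title"),
        "director": payload.get("Director"),
        # OMDb returns the actors as one comma-separated string
        "actors": [a.strip() for a in payload.get("Actors", "").split(",") if a.strip()],
    }


def fetch_movie(title, api_key):
    """Call the API; requires network access and a valid API key."""
    with urlopen(build_omdb_url(title, api_key)) as resp:
        return to_record(json.load(resp))


# Hand-written sample payload (an assumption), used instead of a live request
sample = {
    "Title": "The Godfather",
    "imdbID": "tt0068646",
    "Director": "Francis Ford Coppola",
    "Actors": "Marlon Brando, Al Pacino, James Caan",
}
record = to_record(sample)
```

Keeping the parsing step (`to_record`) separate from the network call makes it possible to check the field mapping against saved JSON files without spending API quota.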

Data Cleanup & Analysis

  • Extracted data were saved as CSV and JSON files
  • The extracted datasets were then transformed into nine tables by cleaning, joining, and filtering
  • The nine tables were loaded into PostgreSQL, an object-relational database, using pgAdmin. A relational database was selected because the data was in a structured format
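
The cleaning, filtering, and joining steps described above can be sketched as follows. This is a simplified illustration using plain Python data structures, not the code in Transform.ipynb. The column names (`imdb_id`, `title`, `imdb_rating`, `service`), the function names, and all sample rows are hypothetical.

```python
# Made-up rows standing in for the extracted CSV and JSON data
movies = [
    {"imdb_id": "tt0000001", "title": " Movie A ", "imdb_rating": "8.5"},
    {"imdb_id": "tt0000002", "title": "Movie B", "imdb_rating": "6.1"},
    {"imdb_id": "tt0000003", "title": "Movie C", "imdb_rating": "N/A"},
]
streaming = [
    {"imdb_id": "tt0000001", "service": "Service X"},
    {"imdb_id": "tt0000002", "service": "Service Y"},
    {"imdb_id": "tt0000001", "service": "Service Y"},
]


def clean(rows):
    """Trim titles, convert ratings to float, drop rows without a numeric rating."""
    cleaned = []
    for row in rows:
        try:
            rating = float(row["imdb_rating"])
        except ValueError:
            continue  # e.g. "N/A"
        cleaned.append({
            "imdb_id": row["imdb_id"],
            "title": row["title"].strip(),
            "imdb_rating": rating,
        })
    return cleaned


def filter_by_rating(rows, low=7.0, high=10.0):
    """Keep movies whose rating lies in the inclusive range [low, high]."""
    return [r for r in rows if low <= r["imdb_rating"] <= high]


def join_on_imdb_id(movie_rows, option_rows):
    """Inner join: one output row per streaming option with a matching movie."""
    by_id = {m["imdb_id"]: m for m in movie_rows}
    return [
        {**by_id[o["imdb_id"]], "service": o["service"]}
        for o in option_rows
        if o["imdb_id"] in by_id
    ]


result = join_on_imdb_id(filter_by_rating(clean(movies)), streaming)
```

Joining on the IMDb ID rather than on the title avoids mismatches caused by differences in title spelling between sources.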

Project Report

  • Extract:

    • Google scraping.ipynb:
      • contains web scraping of the IMDb website and the Google Search Engine
    • netflix_high_imdb_rated(uNoGS api).ipynb:
      • contains IMDb website web scraping, plus OMDb API and uNoGS API extraction
    • streaming_options(utelly api).ipynb:
      • contains Utelly API extraction
  • Transform:

    • Transform.ipynb:
      • contains the transformation of all datasets into nine tables
  • Load:

    • SQL folder:
      • contains ERD and schema
    • SQL_Table folder:
      • contains the table creation scripts and all nine tables created in PostgreSQL using pgAdmin
    • Project Report document:
      • contains detailed project description and sample PostgreSQL queries in pgAdmin
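
To illustrate the load step and the kind of query a relational schema enables, here is a minimal sketch. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in so that it runs without a server; the project itself used PostgreSQL. The table names, columns, and rows are hypothetical and do not reproduce the project's ERD. With PostgreSQL, a driver such as psycopg2 would be needed, and its parameter placeholder is `%s` rather than `?`.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption)
conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema linked by the IMDb ID
conn.execute(
    "CREATE TABLE movies ("
    "imdb_id TEXT PRIMARY KEY, title TEXT NOT NULL, imdb_rating REAL)"
)
conn.execute(
    "CREATE TABLE streaming_options ("
    "imdb_id TEXT REFERENCES movies (imdb_id), service TEXT NOT NULL)"
)

# Made-up rows; parameterized inserts avoid building SQL strings by hand
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [("tt0000001", "Movie A", 8.5), ("tt0000002", "Movie B", 7.9)],
)
conn.executemany(
    "INSERT INTO streaming_options VALUES (?, ?)",
    [
        ("tt0000001", "Service X"),
        ("tt0000001", "Service Y"),
        ("tt0000002", "Service X"),
    ],
)
conn.commit()

# Sample query: where can each movie be streamed?
rows = conn.execute(
    "SELECT m.title, s.service "
    "FROM movies AS m "
    "JOIN streaming_options AS s ON s.imdb_id = m.imdb_id "
    "ORDER BY m.title, s.service"
).fetchall()
conn.close()
```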
