piterbrito / etl--mongodb-youtube

Collect a large amount of data in different formats (CSV, XML, JSON, raw); the goal is to get 25 files with around 1 million rows. Perform ETL on the data and store it in a SQL database.

License: MIT License

Jupyter Notebook 100.00%
sql postgresql sqlite-database python3 jupyter-notebook command-line pandas numpy-arrays pymongo sqlalchemy beautifulsoup splinter apis web-scraping json-parser query etl-pipeline

etl--mongodb-youtube's Introduction

ETL--SQL-database-YouTube.

Project 2

Background

ETL (Extract, Transform, Load) refers to three database functions combined into one tool that pulls data out of one database and places it into another.

Extract is the process of reading data from a database. In this stage, the data is collected, often from multiple and different types of sources.
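As a minimal sketch of extraction from more than one kind of source, the snippet below reads the same sort of record from CSV text and from JSON text and collects them into one list. The sample data, the field names (`video_id`, `title`, `views`), and the helper names are all hypothetical; in the real project the text would come from files or API responses.

```python
import csv
import io
import json

# Hypothetical sample inputs standing in for real files or API responses.
CSV_TEXT = "video_id,title,views\na1,Intro to ETL,1200\nb2,Pandas basics,850\n"
JSON_TEXT = '[{"video_id": "c3", "title": "SQL joins", "views": 430}]'


def extract_csv(text):
    """Read CSV text into a list of dicts, one per row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Read a JSON array of objects into a list of dicts."""
    return json.loads(text)


def extract_all():
    """Collect rows from both sources, tagging each with where it came from."""
    rows = []
    sources = (("csv", extract_csv(CSV_TEXT)), ("json", extract_json(JSON_TEXT)))
    for source, records in sources:
        for record in records:
            rows.append({**record, "source": source})
    return rows


rows = extract_all()
print(len(rows), "rows extracted")
```

Note that the CSV rows carry `views` as text while the JSON row carries it as a number. Extraction deliberately leaves that mismatch alone; reconciling types belongs to the transform stage.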

Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data.
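A transformation driven by rules and a lookup table might look like the sketch below. The category codes, the `CATEGORY_NAMES` table, and the field names are made up for illustration and are not taken from the project's notebook.

```python
# Hypothetical lookup table mapping category codes to readable names.
CATEGORY_NAMES = {"10": "Music", "27": "Education"}


def transform(row):
    """Apply conversion rules to one extracted row and return a new dict."""
    return {
        # Rule: identifiers should not carry stray whitespace.
        "video_id": row["video_id"].strip(),
        # Rule: view counts arrive as text and must become integers.
        "views": int(row["views"]),
        # Lookup: replace the category code with its name, or "Unknown".
        "category": CATEGORY_NAMES.get(row["category_id"], "Unknown"),
    }


raw_rows = [
    {"video_id": " a1 ", "views": "1200", "category_id": "27"},
    {"video_id": "b2", "views": "850", "category_id": "99"},
]
clean_rows = [transform(r) for r in raw_rows]
print(clean_rows)
```

Returning a new dict rather than mutating the input keeps the extracted rows available for inspection if a rule turns out to be wrong.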

Load is the process of writing the data into the target database.
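The load step can be sketched with Python's built-in `sqlite3` module. An in-memory SQLite database is used here only so the example runs without a server; the `videos` table and its columns are assumptions, and `INSERT OR REPLACE` is SQLite-specific syntax.

```python
import sqlite3

# Hypothetical transformed rows: (video_id, views).
rows = [("a1", 1200), ("b2", 850)]


def load(conn, rows):
    """Create the target table if needed and write the rows into it."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS videos (video_id TEXT PRIMARY KEY, views INTEGER)"
    )
    # OR REPLACE overwrites a row with the same primary key instead of failing.
    conn.executemany(
        "INSERT OR REPLACE INTO videos (video_id, views) VALUES (?, ?)", rows
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, rows)
load(conn, rows)  # running the load twice does not duplicate rows
count = conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
print(count, "rows in target table")
```

Keying the table on `video_id` makes the load repeatable, which helps when the ETL process has to be re-run from the start.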

Goal

Get YouTube data from a variety of sources, such as APIs, web scraping, and Google Scholar datasets.

Get a large amount of data in different formats (CSV, XML, JSON, raw); the goal is to get 25 files with around 1 million rows.
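Of the formats listed, XML usually needs the most explicit flattening before it looks like rows. The sketch below uses the standard library's `xml.etree.ElementTree`; the `<videos>`/`<video>` layout is a hypothetical example, not the structure of any dataset used in the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document; real sources will have their own element names.
XML_TEXT = """
<videos>
  <video id="a1"><title>Intro to ETL</title><views>1200</views></video>
  <video id="b2"><title>Pandas basics</title><views>850</views></video>
</videos>
"""


def xml_to_rows(text):
    """Flatten each <video> element into a dict suitable for a table row."""
    root = ET.fromstring(text.strip())
    rows = []
    for video in root.findall("video"):
        rows.append(
            {
                "video_id": video.get("id"),  # value of the id attribute
                "title": video.findtext("title"),  # text of the child element
                "views": int(video.findtext("views")),
            }
        )
    return rows


xml_rows = xml_to_rows(XML_TEXT)
print(xml_rows)
```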

Once I have identified the datasets, I will perform ETL on the data and document the following in the Jupyter notebook (.ipynb).

The type of transformation needed for this data (cleaning, joining, filtering, aggregating, etc).

The type of final production database to load the data into (relational or non-relational).

The final tables or collections that will be used in the production database.
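Three of the transformation types named above (joining, filtering, and aggregating) can be sketched in pandas as follows. The two DataFrames, their column names, and the 100-view threshold are invented for illustration.

```python
import pandas as pd

# Hypothetical inputs: one row per video, one row per channel.
videos = pd.DataFrame(
    {
        "video_id": ["a1", "b2", "c3"],
        "channel_id": ["ch1", "ch1", "ch2"],
        "views": [1200, 850, 40],
    }
)
channels = pd.DataFrame(
    {
        "channel_id": ["ch1", "ch2"],
        "channel_name": ["Data Talks", "Code Clips"],
    }
)

# Joining: attach the channel name to every video.
joined = videos.merge(channels, on="channel_id", how="left")

# Filtering: keep only videos with at least 100 views.
popular = joined[joined["views"] >= 100]

# Aggregating: total views per channel among the remaining videos.
summary = popular.groupby("channel_name", as_index=False)["views"].sum()
print(summary)
```

A left join keeps every video even if its channel is missing from the lookup, so unmatched rows show up as nulls to clean rather than silently disappearing.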

Submit a final technical report with the above information and steps required to reproduce your ETL process.

Thinking Process

Extract the data from reliable sources such as Kaggle, web scraping, and APIs. Bring it into the Python environment as CSV and structure it into a pandas DataFrame. Begin the transformation phase by cleaning the data, fixing null and missing values, and grouping by relevant variables to create visualizations and identify trends. After the data is cleaned and fixed, I will load the pandas DataFrame into a local database such as PostgreSQL and check the resulting tables with SQL commands.
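The flow described above can be sketched end to end on a tiny sample. To keep the example runnable without a database server, it loads into an in-memory SQLite database instead of PostgreSQL; the CSV content, the column names, and the choice to fill missing categories with "Unknown" are all assumptions.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV with one missing category and one missing view count.
CSV_TEXT = "video_id,category,views\na1,Education,1200\nb2,,850\nc3,Music,\n"

df = (
    pd.read_csv(io.StringIO(CSV_TEXT))
    # Fill missing categories with a placeholder value.
    .assign(category=lambda d: d["category"].fillna("Unknown"))
    # Drop rows that have no view count at all.
    .dropna(subset=["views"])
    # With the nulls gone, views can be stored as integers.
    .astype({"views": "int64"})
)

# Load the cleaned DataFrame into the database (SQLite stands in for PostgreSQL).
conn = sqlite3.connect(":memory:")
df.to_sql("videos", conn, index=False, if_exists="replace")

# Check the result with plain SQL.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")]
row_count = conn.execute("SELECT COUNT(*) FROM videos").fetchone()[0]
print(tables, row_count)
```

For PostgreSQL, `to_sql` would instead be given a SQLAlchemy engine, and the table check would query PostgreSQL's own catalog rather than `sqlite_master`.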

Report

I will include a detailed data dictionary along with the code and the output of each cell, step by step, with an explanation of why each approach was taken to solve the problem.

Contributors

piterbrito
