
udacity-project-cassandra's Introduction

Project: Data Modeling with Apache Cassandra 🚀

Note: You don't need Docker to run this project. It is only there to make the project easier to clone and run. 😊

This Udacity Data Engineering Nanodegree project creates a database for Sparkify, a music streaming app. We use Apache Cassandra as the database, and the tables are modeled to answer the queries below:

  1. The artist, song (title) and song length (duration) for sessionId=338 and itemInSession=4.
  2. songinfo_by_user_by_session: the artist, song and user for a given userId and sessionId.
  3. userinfo_by_song: the user names for a given song.

All the preprocessing, the ETL pipeline and the modeling live in a single Jupyter notebook.

Data Processing, ETL Pipeline and Modeling 🪛

  1. Processing

    The raw data is partitioned by date in the event_data folder. The processing step reads all the partitioned CSV files and merges them into a single file, event_datafile_new.csv, in the output_data folder.
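    The merge step can be sketched with Python's standard library. This is a minimal sketch, not the notebook's actual code: it assumes every partition file shares the same header row and that the files may sit anywhere under event_data. The function name merge_event_files is hypothetical.

```python
import csv
import glob
import os


def merge_event_files(input_dir: str, output_path: str) -> int:
    """Merge every CSV found under input_dir (recursively) into one CSV file.

    Assumption: all partition files share the same header row. The header is
    written once, then the data rows of each file are appended.
    Returns the number of data rows written.
    """
    # Sort the paths so the output order is deterministic
    # (date-named partition files end up in date order).
    pattern = os.path.join(input_dir, "**", "*.csv")
    paths = sorted(glob.glob(pattern, recursive=True))

    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    header = None
    rows_written = 0
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                for row in reader:
                    if not any(row):
                        continue  # skip blank lines
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

    Usage, under the same assumptions: merge_event_files("event_data", "output_data/event_datafile_new.csv").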

  2. ETL Pipeline

    In the Jupyter notebook, the cells right after each table's creation insert rows from event_datafile_new.csv into that table, selecting the columns needed by the corresponding query from the previous section.
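    A load step for query 1 might look like the sketch below. It is an illustration under stated assumptions, not the notebook's code: the table name song_info_by_session and its column names are hypothetical (the README does not name this table), the CSV is assumed to have sessionId, itemInSession, artist, song and length columns, and rows with an empty artist are assumed to be non-song events. The session argument is anything with an execute(query, parameters) method, such as a cassandra-driver Session, which accepts %s placeholders for simple statements.

```python
import csv
from typing import Iterator, Protocol, Tuple

# Hypothetical table and column names for query 1.
INSERT_SONG_BY_SESSION = (
    "INSERT INTO song_info_by_session "
    "(session_id, item_in_session, artist, song, length) "
    "VALUES (%s, %s, %s, %s, %s)"
)


class SessionLike(Protocol):
    """Anything with execute(query, parameters), e.g. a cassandra-driver Session."""

    def execute(self, query: str, parameters: tuple): ...


def song_by_session_rows(csv_path: str) -> Iterator[Tuple[int, int, str, str, float]]:
    """Yield typed values for query 1 from the merged event file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if not row["artist"]:
                continue  # assumption: rows without an artist are not song plays
            yield (
                int(row["sessionId"]),
                int(row["itemInSession"]),
                row["artist"],
                row["song"],
                float(row["length"]),
            )


def load_song_by_session(session: SessionLike, csv_path: str) -> int:
    """Run one INSERT per event row; return how many INSERTs were executed."""
    count = 0
    for values in song_by_session_rows(csv_path):
        session.execute(INSERT_SONG_BY_SESSION, values)
        count += 1
    return count
```

    Converting the CSV strings to int and float before inserting keeps the values consistent with numeric CQL column types, assuming the table declares them that way.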

  3. Modeling

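    Modeling happens as part of the ETL pipeline; this section only explains the design choices. Each table is built to answer exactly one of the queries above, which is the query-first approach Cassandra favors: since Cassandra has no joins, each query reads from the table designed for it. This gains performance and keeps the model easy to understand, since a table's name alone tells you its purpose and which question it answers.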
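    The one-table-per-query idea can be illustrated without a database. In the sketch below, each Python dict stands in for one Cassandra table and its key plays the role of the primary key the query filters on, so every query becomes a single key lookup. This is an in-memory illustration only: the field names (including firstName and lastName), the ordering by itemInSession, and the name song_info_by_session are assumptions, not the notebook's schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_tables(events: Iterable[dict]) -> Dict[str, dict]:
    """Denormalize events into one lookup structure per query (illustration only)."""
    # Query 1: key (sessionId, itemInSession) -> (artist, song, length).
    song_info_by_session: Dict[Tuple[int, int], Tuple[str, str, float]] = {}
    # Query 2: key (userId, sessionId) -> rows ordered by itemInSession,
    # the way a clustering column orders rows inside a Cassandra partition.
    songinfo_by_user_by_session: Dict[Tuple[int, int], List[Tuple[int, str, str, str]]] = defaultdict(list)
    # Query 3: key song -> {userId: user name}; keying by userId keeps two
    # listeners who share a name from overwriting each other.
    userinfo_by_song: Dict[str, Dict[int, str]] = defaultdict(dict)

    for e in events:
        user_name = f"{e['firstName']} {e['lastName']}"
        song_info_by_session[(e["sessionId"], e["itemInSession"])] = (
            e["artist"], e["song"], e["length"]
        )
        songinfo_by_user_by_session[(e["userId"], e["sessionId"])].append(
            (e["itemInSession"], e["artist"], e["song"], user_name)
        )
        userinfo_by_song[e["song"]][e["userId"]] = user_name

    for rows in songinfo_by_user_by_session.values():
        rows.sort()  # tuples sort by their first field, itemInSession

    return {
        "song_info_by_session": song_info_by_session,
        "songinfo_by_user_by_session": dict(songinfo_by_user_by_session),
        "userinfo_by_song": dict(userinfo_by_song),
    }
```

    The same data is stored three times, once per query. That duplication is the trade-off query-first modeling accepts in exchange for reads that go straight to the right key.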

Run the project 💿

Using make, run the command below:

# Build the Docker containers and run the project
make run

Connect to the Jupyter notebook: http://localhost:8888/?token=jupyter_token

When you are finished, stop the containers:

# Stop the running containers
make stop

Note: To delete the Docker images generated by this project, run:

make drop

That's all folks! 🫡

Contributors

deividmarreiro
