This project loads data from two types of files:
- files containing song information
- log files containing user activity
The loaded data is then inserted into a PostgreSQL database so that it can be used to analyse which songs users are listening to.
- create_tables.py creates all the relational tables and constraints needed to store the song-related data. (Tables that already exist are dropped first.)
- sql_queries.py contains the SQL queries for creating tables and for inserting and fetching data.
- etl.py contains the logic for reading the data from the files and populating the database with it.
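The drop-then-create behaviour described for create_tables.py can be sketched as follows. This is only an illustration, not the project's actual code: it uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs without a server, and the table names, columns and the helper names reset_tables and list_tables are hypothetical.

```python
import sqlite3

# Hypothetical queries; the project's real ones live in sql_queries.py.
# The child table is dropped before the table it references.
DROP_QUERIES = [
    "DROP TABLE IF EXISTS songplays",
    "DROP TABLE IF EXISTS songs",
]
CREATE_QUERIES = [
    "CREATE TABLE songs (song_id TEXT PRIMARY KEY, title TEXT NOT NULL)",
    "CREATE TABLE songplays ("
    " songplay_id INTEGER PRIMARY KEY,"
    " song_id TEXT REFERENCES songs (song_id))",
]


def reset_tables(conn):
    """Drop each table if it exists, then create it again (data is lost)."""
    cur = conn.cursor()
    for query in DROP_QUERIES + CREATE_QUERIES:
        cur.execute(query)
    conn.commit()


def list_tables(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Because every DROP uses IF EXISTS, reset_tables can be called on an empty database as well as on one that already holds the tables, which is why the script can be rerun safely at the cost of losing any loaded data.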
Run pip install -r requirements.txt to install the libraries needed for the project.
Ensure that a local instance of PostgreSQL is running with the following in place:
- A database named sparkifydb
- A user named student with password student who can access the database
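One way to check these prerequisites before running the scripts is sketched below. It assumes that the psycopg2 driver is installed (this README does not name the driver) and that the server listens on 127.0.0.1; build_dsn and can_connect are hypothetical helper names, not part of the project.

```python
def build_dsn(dbname="sparkifydb", user="student", password="student",
              host="127.0.0.1"):
    """Build a libpq-style connection string from the settings listed above.

    The host default is an assumption; adjust it to your local setup.
    """
    return f"host={host} dbname={dbname} user={user} password={password}"


def can_connect(dsn):
    """Return True if a connection can be opened with the given DSN."""
    import psycopg2  # assumed driver; imported lazily so build_dsn works without it

    try:
        conn = psycopg2.connect(dsn)
    except psycopg2.OperationalError:
        return False
    conn.close()
    return True
```

Calling can_connect(build_dsn()) should return True once the database and user exist; a False result usually points to a server that is not running or to wrong credentials.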
Run python create_tables.py to create the needed tables.
Run python etl.py to load data from the data folder into the database.
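The first step of such a load is typically to find every file under the data folder. A minimal sketch follows, assuming the folder may contain nested subdirectories; list_data_files is a hypothetical helper, and the actual traversal and file format handling in etl.py may differ.

```python
import os


def list_data_files(root):
    """Return the absolute paths of all files below root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.abspath(os.path.join(dirpath, name)))
    return sorted(paths)
```

Each returned path could then be opened, parsed and inserted into the matching table, one file at a time.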