This repository contains the "Logs Analysis" Udacity project, the third project required to complete the Udacity Full Stack Web Developer Nanodegree program.
The project teaches the fundamentals of SQL by building an internal reporting tool. The tool queries a PostgreSQL™ database of over a million rows, containing a newspaper's articles, the article authors, and the web server log for the newspaper's site, to answer three questions:
- What are the three most popular articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
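Each of the three questions is a grouped SQL query. The sketch below is not the project's actual code. It runs the queries against a tiny in-memory SQLite database, using Python's built-in sqlite3 as a stand-in for PostgreSQL so the example runs anywhere. The column names (`articles.slug`, `articles.author`, `log.path`, `log.status`, `log.time`), the `/article/<slug>` path format, and the `'200 OK'` status string are assumptions; database-table-schema.txt describes the real schema. The SQL sticks to constructs common to SQLite and PostgreSQL (`||`, `CASE`, `date(...)`), but treat it as a starting point rather than tested PostgreSQL code.

```python
import sqlite3

# Hypothetical miniature schema modelled on the three tables; column names are assumptions.
SCHEMA = """
CREATE TABLE authors  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (author INTEGER REFERENCES authors (id), title TEXT, slug TEXT);
CREATE TABLE log      (path TEXT, status TEXT, time TEXT);
"""

# 1. Three most-viewed articles: successful requests whose path is /article/<slug>.
TOP_ARTICLES = """
SELECT articles.title, COUNT(*) AS views
FROM articles
JOIN log ON log.path = '/article/' || articles.slug
WHERE log.status = '200 OK'
GROUP BY articles.title
ORDER BY views DESC
LIMIT 3
"""

# 2. Authors ranked by the total views of all their articles.
TOP_AUTHORS = """
SELECT authors.name, COUNT(*) AS views
FROM authors
JOIN articles ON articles.author = authors.id
JOIN log ON log.path = '/article/' || articles.slug
WHERE log.status = '200 OK'
GROUP BY authors.name
ORDER BY views DESC
"""

# 3. Days on which more than 1% of all requests did not return '200 OK'.
ERROR_DAYS = """
SELECT day, error_pct
FROM (
    SELECT date(time) AS day,
           100.0 * SUM(CASE WHEN status <> '200 OK' THEN 1 ELSE 0 END) / COUNT(*)
               AS error_pct
    FROM log
    GROUP BY date(time)
) AS daily
WHERE error_pct > 1
ORDER BY day
"""


def demo():
    """Load made-up sample rows into an in-memory database and run all three queries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)", [(1, "Author One"), (2, "Author Two")]
    )
    conn.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [
            (1, "Article A", "article-a"),
            (1, "Article B", "article-b"),
            (2, "Article C", "article-c"),
        ],
    )
    day1 = "2016-07-01 12:00:00"
    day2 = "2016-07-02 12:00:00"
    log_rows = (
        [("/article/article-a", "200 OK", day1)] * 3
        + [("/article/article-b", "200 OK", day1)] * 2
        + [("/article/article-c", "200 OK", day1)]
        + [
            ("/article/article-a", "200 OK", day2),
            ("/article/missing", "404 NOT FOUND", day2),  # 1 error out of 2 requests
        ]
    )
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", log_rows)
    try:
        return {
            "articles": conn.execute(TOP_ARTICLES).fetchall(),
            "authors": conn.execute(TOP_AUTHORS).fetchall(),
            "error_days": conn.execute(ERROR_DAYS).fetchall(),
        }
    finally:
        conn.close()
```

For example, `demo()["error_days"]` returns `[('2016-07-02', 50.0)]`, because the second sample day has one failed request out of two. The first sample day has no errors, so it is filtered out by the `error_pct > 1` condition.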
The database includes three tables:
- authors table
- articles table
- log table
The repository contains the following files:
- newsdata.zip - contains the compressed SQL data for the PostgreSQL database.
- log_analysis.py - contains the code for the reporting tool.
- database-table-schema.txt - contains the schema of the PostgreSQL database.
- output_screenshot.PNG - an image of the expected output of running the reporting tool.
- Vagrantfile - contains the virtual machine configuration for the project, taken from Udacity's repository.
Install the following:
- Python3
- psycopg2 Python package
- PostgreSQL
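Before following either set of steps, you can check the prerequisites from Python. The helper below is hypothetical (`missing_prerequisites` is not part of the project). It treats "installed" as "psycopg2 is importable and the PostgreSQL client `psql` is on the PATH", which is an assumption, and it must be run with Python 3 because `importlib.util.find_spec` and f-strings are not available in Python 2.

```python
import importlib.util
import shutil


def missing_prerequisites(modules=("psycopg2",), commands=("psql",)):
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("missing Python package: " + name)
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(command) is None:
            problems.append("command not found on PATH: " + command)
    return problems


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("\n".join(problems) if problems else "All checked prerequisites were found.")
```

An empty result only means these two checks passed. It does not confirm that the PostgreSQL server is running or that the `news` database exists.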
To run the reporting tool inside the Vagrant virtual machine:
- Unzip the newsdata.zip file. The uncompressed file, newsdata.sql, is a 120MB SQL file; keep it in the project directory, next to the Vagrantfile.
- Open a terminal, `cd` to the project directory, and run `vagrant up`. This builds the virtual machine and may take some time.
- After the virtual machine is set up, log into it with `vagrant ssh`. If prompted for a password, enter the default password, `vagrant`.
- Inside the virtual machine, change to the shared project directory with `cd /vagrant/`, then run `ls` to confirm that the files from the project directory on your host machine are visible there.
- Load the data into PostgreSQL by running `psql -d news -f newsdata.sql`.
- `psql -f` exits on its own once the file has been loaded. If you are in an interactive psql session, leave it with `\q` or Ctrl + D.
- Run `python log_analysis.py`. The output should match the expected output shown in output_screenshot.PNG.
- Leave the virtual machine with the `exit` command.
- Run `vagrant halt` to stop the virtual machine.
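After the `psql -d news -f newsdata.sql` step, a quick way to confirm that the load worked is to count the rows in each of the three tables. The helpers below are hypothetical (`table_row_counts`, `check_news_database` and `demo_counts` are illustrative names, not part of log_analysis.py). `table_row_counts` accepts any DB-API 2.0 connection, so `demo_counts` can exercise it against an in-memory sqlite3 database instead of PostgreSQL. `check_news_database` assumes the local user can connect to `news` with `psycopg2.connect(dbname="news")` and no extra credentials.

```python
import sqlite3

# The three tables listed in this README; the load step should create all of them.
EXPECTED_TABLES = ("authors", "articles", "log")


def table_row_counts(conn):
    """Return {table_name: row_count} using any DB-API 2.0 connection."""
    counts = {}
    cur = conn.cursor()
    try:
        for table in EXPECTED_TABLES:
            # Table names come from the fixed tuple above (never user input),
            # so building the statement with string concatenation is safe here.
            cur.execute("SELECT COUNT(*) FROM " + table)
            counts[table] = cur.fetchone()[0]
    finally:
        cur.close()
    return counts


def check_news_database(dbname="news"):
    """Print row counts for the loaded PostgreSQL database (hypothetical helper)."""
    import psycopg2  # imported lazily so the other helpers work without psycopg2

    conn = psycopg2.connect(dbname=dbname)
    try:
        for table, count in table_row_counts(conn).items():
            print(f"{table}: {count} rows")
    finally:
        conn.close()


def demo_counts():
    """Exercise table_row_counts on a tiny in-memory sqlite3 stand-in."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER);
        CREATE TABLE articles (slug TEXT);
        CREATE TABLE log (path TEXT);
        INSERT INTO log VALUES ('/');
        INSERT INTO log VALUES ('/article/example');
        """
    )
    try:
        return table_row_counts(conn)
    finally:
        conn.close()
```

Inside the virtual machine, calling `check_news_database()` after the load step should report a non-zero count for every table. An error saying that a relation does not exist suggests the load did not complete.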
To run the reporting tool directly on your machine, without the virtual machine:
- Unzip the newsdata.zip file. The uncompressed file, newsdata.sql, is a 120MB SQL file; keep it in the project directory.
- Open a terminal and `cd` to the project directory.
- If the `news` database does not exist yet, create it with `createdb news`.
- Load the data into PostgreSQL by running `psql -d news -f newsdata.sql`.
- Run `python log_analysis.py`. The output should match the expected output shown in output_screenshot.PNG.
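This README does not reproduce the contents of log_analysis.py. As a rough sketch only, a reporting tool of this kind could be organised as follows. `run_query` connects to the `news` database with psycopg2 and returns all rows, and `format_report` turns `(label, value)` rows into printable text. Both names, the `print_reports` driver, the report layout, and the assumption that the local user can connect to `news` without a password are illustrative and are not taken from the project.

```python
from decimal import Decimal


def run_query(sql, dbname="news"):
    """Execute one SELECT against PostgreSQL and return all rows."""
    import psycopg2  # imported lazily so format_report can be used without it

    conn = psycopg2.connect(dbname=dbname)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


def format_report(title, rows, unit):
    """Render (label, value) rows under a title, one row per line."""
    lines = [title]
    for label, value in rows:
        # psycopg2 returns NUMERIC results (e.g. percentages) as Decimal.
        if isinstance(value, (float, Decimal)):
            value = f"{value:.2f}"
        lines.append(f"  {label} - {value} {unit}")
    return "\n".join(lines)


def print_reports(reports, dbname="news"):
    """reports: iterable of (title, sql, unit) tuples, one per question."""
    for title, sql, unit in reports:
        print(format_report(title, run_query(sql, dbname), unit))
        print()
```

For instance, `format_report("Most popular articles", [("Article A", 4)], "views")` returns the two-line string `"Most popular articles\n  Article A - 4 views"`. Keeping the formatting separate from the database access means you can check the output layout without a running PostgreSQL server.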