Log Analysis Project

The Log Analysis project is part of the Udacity Full Stack Nanodegree program. The goal of this project is to answer the following questions using queries to a PostgreSQL database. The questions being answered in this project are:

What are the most popular three articles of all time?
Who are the most popular article authors of all time?
On which days did more than 1% of requests lead to errors?

Throughout the course of this Nanodegree program, a vagrant virtual machine configuration is required on the host system. This virtual machine is a Linux based VM that provides a PostgreSQL database and support software needed to complete this project. The VM can be downloaded and installed from the Virtual Box website. Udacity provided us with a Vagrantfile for configuration of the virtual machine. That Vagrantfile is inclueded in this repository.

Once the vagrant VM is installed, the user will need to navigate to the directory containing the Vagrantfile and use the command vagrant up to cause Vagrant to download and install the Linux virtual machine. Then the user can log into the VM using the vagrant ssh command.

Udacity provided the file newsdata.sql in a zipped format at the start of the project. The file newsdata.zip can be found here. This file will be copied to the vagrant directory. Files that are in the directory /vagrant on the virtual machine are shared with the vagrant folder on the host computer.

We need to change to the vagrant directory and run the command psql -d news -f newsdata.sql to load the data.

Source Code and Setup

The source code for this project can be found on my github page.

This project encouraged us to use our new knowledge of SQL and possibly create a view to assist with the queries.

The view popular was created and used in the loganalysis.py file to help answer the first two questions, as they contained related information.

Once the user has connected to the VM and loaded the newsdata into a SQL database, she/he will need to connect to the database using the command psql -d news and run the following command:

create view popular as
	select articles.title, articles.author, authors.name, count(log.path) as num
	from articles left join log on log.path
	like concat('%',articles.slug)
	join authors on articles.author = authors.id
	group by articles.title, articles.author, authors.name
	order by num desc;

Running the code

The user will need to download and install Python, if it is not already installed.

This project was implemented usign Python 3 and I recommend using Python 3 for all future projects.

Executing the code is as simple as typing the command

python3 loganalysis.py

into the terminal.

The output of this file can also be seen in the text file log_results.txt.

Resources and References

The following links were helpful in creating the SQL queries and properly outputting the information using Python.

SQL Multiple Joins question on Stack Overflow.
Using LIKE in a join question on Stack Overflow.
The PostgreSQL date function.
Formatting for the datetime object in Python.
Properly displaying the % symbol in a Python string, from Stack Overflow.
Limiting the number of digits after the decimal point in a Python float, from Stack Overflow.

Python Code Quality

The source code was run through the command-line tool pep8 to check that the code conforms to the PEP8 standards. The pep8 tool recommened to install and use pycodestyle tool due to GitHub Issue #466. The source code was also checked using this tool, which did not report any errors.

This tool is run on the command line of the terminal as follows:

pycodestyle loganalysis.py

License

The contents of this repository are covered under the MIT License.

sjcorreia / log-analysis Goto Github PK

log-analysis's Introduction

Log Analysis Project

Source Code and Setup

Running the code

Resources and References

Python Code Quality

License

log-analysis's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent