A Udacity Data Scientist Nanodegree Project
No libraries are needed to run the code here beyond the Anaconda distribution of Python. The code should run without issues on any Python 3.* version.
For this project, I was interested in using the Black Friday dataset from Kaggle to better understand:
Question 1: Which users spent the most during Black Friday? List the top 20 spending users.
Question 2: How are users distributed by age group, and how does that distribution differ by gender?
Question 3: Which products were most popular during Black Friday? List the top 20.
Question 4: Looking at the users again, how are they grouped by Occupation across the different cities?
Question 5: How do Gender, Age, Occupation, City_Category, Stay_In_Current_City_Years, Marital_Status, and Product_Category_x correlate with Purchase?
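As a rough illustration of how these questions map onto pandas operations, here is a minimal sketch. It is not the notebook's actual code. The column names `User_ID` and `Product_ID` are assumptions (they are not named in this README), the sample rows are made up, and one-hot encoding is just one possible way to relate categorical columns to Purchase.

```python
import pandas as pd


def top_spenders(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Question 1: total Purchase per user, largest first."""
    # "User_ID" is an assumed column name.
    return df.groupby("User_ID")["Purchase"].sum().nlargest(n)


def users_by_age_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Question 2: number of distinct users per Age group and Gender."""
    # Count each user once, not once per purchase row.
    users = df.drop_duplicates("User_ID")
    return users.groupby(["Age", "Gender"]).size().unstack(fill_value=0)


def top_products(df: pd.DataFrame, n: int = 20) -> pd.Series:
    """Question 3: number of purchase rows per product, largest first."""
    # "Product_ID" is an assumed column name.
    return df["Product_ID"].value_counts().head(n)


def purchase_correlations(df: pd.DataFrame, columns: list) -> pd.Series:
    """Question 5: correlation of one-hot encoded columns with Purchase."""
    # Treat every listed column as categorical and one-hot encode it.
    encoded = pd.get_dummies(df[columns].astype(str), dtype=float)
    return encoded.corrwith(df["Purchase"]).sort_values(ascending=False)


# Tiny made-up sample standing in for the real data.
sample = pd.DataFrame({
    "User_ID": [1, 1, 2, 3, 3, 3],
    "Product_ID": ["P1", "P2", "P1", "P1", "P3", "P2"],
    "Gender": ["F", "F", "M", "M", "M", "M"],
    "Age": ["26-35", "26-35", "18-25", "26-35", "26-35", "26-35"],
    "Purchase": [100, 200, 50, 400, 10, 20],
})

print(top_spenders(sample, n=2))
print(users_by_age_gender(sample))
print(top_products(sample, n=2))
print(purchase_correlations(sample, ["Gender"]))
```

With the real data, `sample` would be replaced by the DataFrame loaded from BlackFriday.csv, and `n` left at its default of 20.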
There is 1 notebook available here to showcase work related to the above questions. The notebook is exploratory, searching through the data pertaining to the questions showcased by the notebook title. Markdown cells and comments are used to walk through the thought process behind individual steps.
BlackFriday.csv - This file contains 550,000 observations of Black Friday purchases at a retail store. It includes both numerical and categorical variables, and it contains missing values.
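Since the file contains missing values, a quick per-column summary is a useful first check. The sketch below is an assumption about how one might do this, not the notebook's own code. It reads a few made-up rows from an in-memory string so that it runs without the real file, and the column names `Product_Category_1` and `Product_Category_2` are illustrative.

```python
import io

import pandas as pd


def missing_value_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and share of missing values for each column that has any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({"missing": counts, "share": counts / len(df)})
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up rows; with the real data use: df = pd.read_csv("BlackFriday.csv")
csv_text = """Gender,Product_Category_1,Product_Category_2,Purchase
F,3,,8370
M,1,6.0,15200
M,12,,1422
F,8,14.0,7969
"""
df = pd.read_csv(io.StringIO(csv_text))
summary = missing_value_summary(df)
print(summary)
```

Columns with no missing values are left out of the summary, so an empty result means the data is complete.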
The main findings of the code can be found at the post available here.
Credit goes to Analytics Vidhya and Kaggle for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!