This repo contains a simple data analysis of video game titles, platforms, and their sales in various parts of the world. The analysis uses Python's data analysis libraries to explore the data.
Some areas worth exploring:
- Titles which are available for more than one platform
- Top contending platform
- Which type of platform is popular?
- Top selling genres
- Top publishers by Global Sales
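The questions above map onto a few pandas group-by operations. The following is a minimal sketch rather than the notebook's actual code: the rows are made-up toy values used only to keep the example runnable, the helper `top_by` is a hypothetical name, and the column names are taken from the field list in this README.

```python
import pandas as pd

# Toy rows with made-up values, only to make the sketch runnable.
# Column names follow the field list in this README.
games = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B", "Game C", "Game D"],
    "Platform": ["PS4", "XOne", "PS4", "Wii", "PS4"],
    "Genre": ["Action", "Action", "Sports", "Sports", "Racing"],
    "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Y", "Pub Z"],
    "Global_Sales": [2.0, 1.5, 3.0, 4.0, 0.5],
})

# Titles available on more than one platform:
# count distinct platforms per title and keep those with more than one.
platforms_per_title = games.groupby("Name")["Platform"].nunique()
multi_platform = platforms_per_title[platforms_per_title > 1].index.tolist()


def top_by(frame, column, n=3):
    """Sum Global_Sales per value of `column` and return the n largest totals."""
    totals = frame.groupby(column)["Global_Sales"].sum()
    return totals.sort_values(ascending=False).head(n)


top_platforms = top_by(games, "Platform")
top_genres = top_by(games, "Genre")
top_publishers = top_by(games, "Publisher")

print(multi_platform)
print(top_platforms)
```

With the real dataset, `games` would instead come from reading the downloaded CSV file, and the same group-by calls would apply unchanged.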
Dataset can be downloaded from here
Motivated by Gregory Smith's web scrape of VGChartz Video Games Sales, this dataset extends the number of variables with another web scrape from Metacritic. Unfortunately, there are missing observations because Metacritic covers only a subset of the platforms. Also, a game may not have all the observations of the additional variables discussed below. There are about 6,900 complete cases.
Alongside the fields Name, Platform, Year_of_Release, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales, and Global_Sales, the dataset includes:
- Critic_score - Aggregate score compiled by Metacritic staff
- Critic_count - Number of critics whose reviews went into the Critic_score
- User_score - Score by Metacritic's subscribers
- User_count - Number of users who gave the User_score
- Developer - Party responsible for creating the game
- Rating - The ESRB ratings
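Because the Metacritic fields are not filled in for every game, it helps to count missing values and complete cases before analysing scores. The sketch below uses made-up toy rows, not the real data. It assumes the column names exactly as written in the list above (the capitalization in the CSV header may differ) and assumes that user scores may be stored as text with non-numeric placeholders, which is why they are coerced to numbers first.

```python
import numpy as np
import pandas as pd

# Toy rows with made-up values, for illustration only.
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Global_Sales": [2.0, 3.0, 0.5],
    "Critic_score": [80.0, np.nan, 65.0],
    "User_score": ["8.1", "7.5", "tbd"],  # assumption: scores may arrive as text
})

# Turn anything that is not a number into NaN so it counts as missing.
games["User_score"] = pd.to_numeric(games["User_score"], errors="coerce")

# Number of missing values in each column.
missing_per_column = games.isna().sum()

# Complete cases: rows with no missing value in any column.
complete_cases = games.dropna()

print(missing_per_column)
print(len(complete_cases), "complete case(s)")
```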
- Python 3
- Jupyter Notebook
- Pandas (for data analysis)
- Numpy
- Matplotlib (for visualization)
- Seaborn (for visualization)
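A quick way to confirm that the libraries listed above are installed is to ask Python whether each one can be found. This is a small sketch, not part of the repository. The helper name `missing_packages` is hypothetical, and `notebook` is assumed to be the import name of the Jupyter Notebook package in your environment.

```python
import importlib.util

# Import names of the required libraries.
# Assumption: Jupyter Notebook is importable as "notebook".
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print("Missing packages:", missing or "none")
```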
View the Notebook
- Download and install Anaconda. It contains all the relevant packages mentioned in the Prerequisites section
- Download the dataset and this repository
- Open a terminal/command prompt, navigate to the downloaded repository, and run the "jupyter notebook" command
- The Jupyter file explorer will open in the browser. Click on the notebook, then choose Cell > Run All from the menu.
- Use predictive modelling to estimate future sales
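As one possible starting point for this, a straight-line trend can be fitted to yearly sales totals with NumPy. This is only a sketch under stated assumptions: the yearly totals are made-up numbers, the helper `predict_sales` is a hypothetical name, and a linear trend is just the simplest model to try, not a method this repo has adopted.

```python
import numpy as np

# Made-up yearly totals (millions of units), for illustration only.
years = np.array([2010, 2011, 2012, 2013, 2014])
sales = np.array([10.0, 12.0, 14.0, 16.0, 18.0])

# Measure time from the first year to keep the fit numerically well behaved.
first_year = years[0]
slope, intercept = np.polyfit(years - first_year, sales, deg=1)


def predict_sales(year):
    """Extrapolate the fitted line to the given year."""
    return slope * (year - first_year) + intercept


forecast_2015 = predict_sales(2015)
print(f"Forecast for 2015: {forecast_2015:.1f}")
```

With real data, the yearly totals could be obtained by summing Global_Sales per Year_of_Release, and a held-out set of years would be needed to judge whether the forecast is any good.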