Microsoft wants to create a new movie studio but has limited knowledge of movie production. This analysis used genres, total gross earnings over the years, and ratings. It led to the following recommendations:
- Create a streaming platform for their movies
- Produce movies that have a combination of adventure, action, comedy and drama
- Benchmark with top earning movie studios such as BV studios
Due to Microsoft's lack of knowledge in the creation of movies, the following business questions were asked to solve this problem.
Questions to consider:
- What are the business's pain points related to this project? Microsoft has very little knowledge about movie production and is unsure which movies to produce in its new movie studio.
- How did you pick the data analysis question(s) that you did? The business questions were chosen based on genre, rating and gross earnings.
- Why are these questions important from a business perspective? They allow Microsoft to make informed decisions about which movie genres to produce.
This project used three datasets: two from IMDb and one from Box Office Mojo (BOM).
Questions to consider:
- Where did the data come from, and how do they relate to the data analysis questions? The BOM movie_gross dataset has important columns such as studio, domestic_gross and foreign_gross. The IMDb title basics dataset has useful columns such as start_year, runtime_minutes and genres. The IMDb ratings dataset has important columns such as average rating and number of votes.
- What is the target variable? The target variables were genres, start_year (the movie release year) and total_gross (the combined domestic and foreign gross).
Data cleaning, analysis and visualization
The three datasets were imported, cleaned and merged, and irrelevant columns were dropped. Data cleaning involved the following steps:
Identifying duplicates:
Duplicates were identified using the title and start year columns. They were dropped while keeping the entry with the fewest missing values in each duplicate group.
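The deduplication rule above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper name `drop_duplicates_keep_most_complete`, the column name `title` and the sample rows are assumptions made for the example.

```python
import pandas as pd


def drop_duplicates_keep_most_complete(df, subset):
    """Drop duplicates on `subset`, keeping the row with the fewest missing values."""
    # Count missing values per row and sort so the most complete rows come first.
    missing = df.isna().sum(axis=1)
    ordered = df.assign(_missing=missing).sort_values("_missing", kind="stable")
    # keep="first" now retains the most complete row of each duplicate group.
    deduped = ordered.drop_duplicates(subset=subset, keep="first")
    return deduped.drop(columns="_missing").sort_index()


# Hypothetical sample rows, not taken from the real datasets.
titles = pd.DataFrame({
    "title": ["Alpha", "Alpha", "Beta"],
    "start_year": [2015, 2015, 2016],
    "runtime_minutes": [None, 100.0, 90.0],
    "genres": [None, "Drama", "Comedy"],
})
clean_titles = drop_duplicates_keep_most_complete(titles, ["title", "start_year"])
print(clean_titles)
```

Sorting by the per-row count of missing values before calling `drop_duplicates` is one simple way to make "keep the most complete entry" explicit; other approaches would work as well.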
Identifying missing values:
A function was created to print out the columns that had missing values in each dataset. The genre column of the IMDb title basics dataset had missing values in 3% of its rows, which is a small percentage, so those rows were dropped.
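A helper of the kind described here might look like the sketch below. The function name `report_missing` and the sample data are hypothetical; the notebook's own function may be structured differently.

```python
import pandas as pd


def report_missing(df, name):
    """Print and return the percentage of missing values for columns that have any."""
    pct = df.isna().mean().mul(100).round(2)
    pct = pct[pct > 0]  # keep only columns with at least one missing value
    print(f"{name}: {pct.to_dict()}")
    return pct


# Hypothetical sample standing in for the title basics dataset.
basics = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Drama", None, "Comedy", "Action"],
})
missing = report_missing(basics, "title_basics")

# Drop rows whose genre is missing, as was done when the share was small.
basics = basics.dropna(subset=["genres"])
```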
Changing data types
Data types for all three datasets were checked using the .dtypes attribute. Columns with incorrect data types, such as the foreign gross column, were converted to the appropriate type.
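As an illustration of this kind of conversion, the sketch below assumes that foreign_gross was read in as text containing thousands separators. That is an assumption about the raw file, and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample: foreign_gross arrives as text, some values with commas.
gross = pd.DataFrame({
    "title": ["A", "B", "C"],
    "domestic_gross": [1000000.0, 2500000.0, None],
    "foreign_gross": ["2,000,000", "350000", None],
})
print(gross.dtypes)  # foreign_gross shows up as a text (object/string) column

# Strip the separators, then convert; unparseable entries become NaN.
cleaned = gross["foreign_gross"].str.replace(",", "", regex=False)
gross["foreign_gross"] = pd.to_numeric(cleaned, errors="coerce")
print(gross.dtypes)  # foreign_gross is now float64
```

Using `errors="coerce"` turns anything that cannot be parsed into a missing value instead of raising, which keeps the conversion from failing on a single bad entry.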
Flattening the genre column
The value in each row of the genre column was split into a list of genres. Each element of that list was then expanded into its own row, with the original index value repeated for every genre.
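In pandas, this split-then-expand step is commonly done with `str.split` followed by `DataFrame.explode`. The sketch below assumes the genres are comma-separated and uses hypothetical sample rows.

```python
import pandas as pd

# Hypothetical sample; genres are assumed to be comma-separated strings.
basics = pd.DataFrame({
    "title": ["A", "B"],
    "genres": ["Action,Adventure,Comedy", "Drama"],
})

# Split each string into a list of genres.
basics["genres"] = basics["genres"].str.split(",")

# Turn each list element into its own row; the original index is repeated.
flat = basics.explode("genres")
print(flat)
```

After this step a movie with three genres occupies three rows, so any per-movie total should be computed before exploding or on de-duplicated rows.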
Checking for placeholders or outliers
A function was defined to print out the contents of each column. The data had no placeholders. There were outliers in the start year column: some years were in the future. These rows were identified and removed.
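Removing the future years can be sketched as a simple filter. The helper name `drop_future_years` and the cutoff year passed to it are assumptions chosen for the example; the notebook may use a different cutoff.

```python
import pandas as pd


def drop_future_years(df, last_valid_year, column="start_year"):
    """Keep only rows whose year is not later than `last_valid_year`."""
    return df[df[column] <= last_valid_year].copy()


# Hypothetical sample with one release year in the future.
basics = pd.DataFrame({
    "title": ["A", "B", "C"],
    "start_year": [2015, 2019, 2027],
})
print(basics["start_year"].describe())  # the max value exposes the outlier

basics = drop_future_years(basics, last_valid_year=2020)
```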
Merging of data sets
The datasets were then merged and irrelevant columns dropped.
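A merge along these lines could look like the following sketch. The join keys (`tconst` for the two IMDb tables, title plus year for BOM), the column names `averagerating`, `numvotes` and `year`, the treatment of missing gross values as zero, and the sample rows are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical samples standing in for the three cleaned datasets.
basics = pd.DataFrame({
    "tconst": ["tt1", "tt2"],
    "title": ["A", "B"],
    "start_year": [2015, 2016],
    "genres": ["Drama", "Action"],
})
ratings = pd.DataFrame({
    "tconst": ["tt1", "tt2"],
    "averagerating": [7.1, 6.0],
    "numvotes": [100, 250],
})
gross = pd.DataFrame({
    "title": ["A", "B"],
    "year": [2015, 2016],
    "studio": ["BV", "Fox"],
    "domestic_gross": [10.0, 20.0],
    "foreign_gross": [5.0, None],
})

# Join the two IMDb tables on their shared identifier (assumed key).
imdb = basics.merge(ratings, on="tconst", how="inner")

# Align the BOM year column, then join on title and release year (assumed keys).
gross = gross.rename(columns={"year": "start_year"})
movies = imdb.merge(gross, on=["title", "start_year"], how="inner")

# total_gross combines domestic and foreign gross; missing values count as 0 here.
movies["total_gross"] = (
    movies["domestic_gross"].fillna(0) + movies["foreign_gross"].fillna(0)
)

# Drop columns that are no longer needed for the analysis.
movies = movies.drop(columns=["tconst"])
```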
For data analysis, the following was done:
- Total box office revenue over the years was visualized using a line plot.
- The top 10 most preferred movie genres were compared with the top 10 most profitable movie genres using bar plots in the same figure on separate axes.
- The top 5 highest rated genres were visualized using a bar plot.
- The correlation between average rating and total gross was visualized using a scatter plot.
- The top 5 studios by total revenue earned were visualized using a bar plot.
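The aggregations behind the first two plots can be sketched as below. How "preferred" and "profitable" are measured here (number of titles per genre and summed total gross per genre) is an assumption; the notebook may use different measures, such as number of votes. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical merged data after the genre column has been flattened,
# so a movie with two genres appears on two rows.
movies = pd.DataFrame({
    "title": ["A", "A", "B", "C", "C"],
    "start_year": [2017, 2017, 2018, 2018, 2018],
    "genres": ["Action", "Adventure", "Drama", "Drama", "Comedy"],
    "total_gross": [300.0, 300.0, 50.0, 120.0, 120.0],
})

# Yearly revenue: use one row per movie so exploded genres are not double counted.
per_movie = movies.drop_duplicates(subset=["title", "start_year"])
revenue_by_year = per_movie.groupby("start_year")["total_gross"].sum()

# "Preferred" genres, measured here by number of titles (assumption).
genre_counts = movies["genres"].value_counts().head(10)

# "Profitable" genres, measured here by summed total gross (assumption).
genre_gross = movies.groupby("genres")["total_gross"].sum().nlargest(10)

print(revenue_by_year, genre_counts, genre_gross, sep="\n\n")
```

Each resulting Series can be passed directly to a plotting library to produce the line and bar plots described above.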
Interpreting the results
Observations for Total box office revenue over the years:
- Movies made a lot of money in 2018.
- There has been a sharp decrease in total gross earned by movies since 2018, with the lowest year being 2020.
This makes sense since streaming platforms started gaining a lot of popularity in 2018, leading many people to prefer streaming movies on platforms such as Netflix, hence the decline in box office revenues.
- Movies made zero box office revenue in 2020.
This is because the COVID-19 pandemic started in 2020, which led to the closure of movie theaters in many countries, hence there was no box office revenue for movies that year.
Observations for the preferred movie genres Vs The most profitable movie genres:
- Drama is the most preferred movie genre.
- Romance, Thriller and Horror, which are in the top 10 most preferred genres, are not in the top 10 most profitable genres.
- Adventure is the most profitable movie genre.
- Animation, Fantasy and Family are in the top 10 most profitable genres but are not in the top 10 most preferred genres.
Observations for the top 5 highest rated genres:
- Documentary is the highest rated movie genre.
- The first four genres, that is Documentary, Drama, Comedy and Biography, are also in the top ten most preferred/profitable genres.
- History is not in the top ten most preferred/profitable genres.
Observations for the correlation between Average rating and total gross:
- There is no correlation between average rating and the total revenue earned
- The majority of movies make a total gross of less than 400 million dollars
- The majority of movies have an average rating higher than 4
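A scatter plot gives a visual impression of the relationship; a correlation coefficient can back it up with a number. The sketch below computes a Pearson coefficient with `Series.corr` on hypothetical sample values. The column name `averagerating` is an assumption, and the resulting value says nothing about the real data.

```python
import pandas as pd

# Hypothetical sample values, not taken from the real datasets.
movies = pd.DataFrame({
    "averagerating": [5.0, 6.0, 7.0, 8.0],
    "total_gross": [10.0, 40.0, 20.0, 30.0],
})

# Pearson correlation (the default method); values near 0 suggest
# little or no linear relationship.
r = movies["averagerating"].corr(movies["total_gross"])
print(f"Pearson r = {r:.2f}")
```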
Observations for the Top 5 studios based on total revenue earned:
- BV studio is the highest earning studio
- Fox and Uni.Studio have approximately the same gross earnings
- All 5 studios have a gross total greater than 400 million dollars
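A ranking like this one can be obtained with a group-by and `nlargest`. In the sketch below the studio labels other than BV, Fox and Uni. are placeholders and all figures are hypothetical sample values.

```python
import pandas as pd

# Hypothetical sample; "StudioX", "StudioY" and "StudioZ" are placeholder names.
gross = pd.DataFrame({
    "studio": ["BV", "BV", "Fox", "Uni.", "StudioX", "StudioY", "StudioZ"],
    "total_gross": [500.0, 450.0, 300.0, 310.0, 280.0, 200.0, 100.0],
})

# Sum revenue per studio and keep the five largest totals.
top_studios = gross.groupby("studio")["total_gross"].sum().nlargest(5)
print(top_studios)
```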
The generalizations are well grounded in the data and can therefore help Microsoft make informed decisions about the movies it should make. The data analyzed covers the last 10 years, so it gives reasonably accurate insights. Missing values were carefully replaced to avoid skewed results. Data was aggregated where necessary to give a clearer picture.
Total box office revenue over the years
Preferred movie genres _Vs_ The most profitable movie genres
Top 5 highest rated genres
The correlation between Average rating and total gross
Top 5 studios based on total revenue earned
- There has been a sharp decrease in box office gross earnings since 2018. This can be attributed to the increased preference for streaming movies on platforms such as Netflix.
- Adventure, action, comedy and drama are genres that are highly preferred by audiences and also have high profitability.
- BV Studio has the highest total gross earnings.
Based on the results of my analysis, I recommend the following:
- Create a streaming platform for their movies
- Produce movies that have a combination of adventure, action, comedy and drama
- Benchmark with top earning movie studios such as BV studios
- Determine profit based on the movie's budget
- Compare streaming services gross earnings and box office gross earnings per movie
- Assess the relationship between genre and number of votes or what influences the number of votes per movie
See the full analysis in the Jupyter Notebook or review this presentation.
For additional info, contact Fridah Kimathi at [email protected] or via my LinkedIn profile.
├── images
├── zippedData
├── .canvas
├── .gitignore
├── Microsoft-Movie-Analyisis.ipynb
├── Microsoft-Movie-Analysis-Presentation ppt.pdf
└── README.md