fridahkimathi / microsoft-movie-analysis

This project forked from learn-co-curriculum/dsc-phase-1-project

The project used Python to perform exploratory analysis of movie revenue data from IMDB and BOM and derived recommendations on the movie genres Microsoft’s new movie studio should produce.

License: Other

Jupyter Notebook 100.00%
data-analysis data-visualization python

microsoft-movie-analysis's Introduction

Microsoft Movie Analysis - Phase 1 Project

Author: Fridah Kimathi

Overview


Microsoft wishes to create a new movie studio but has limited knowledge of movie production. The analysis used genres, total gross earnings over the years and ratings. From this analysis, the following recommendations were made:

  1. Create a streaming platform for their movies
  2. Produce movies that have a combination of adventure, action, comedy and drama
  3. Benchmark with top earning movie studios such as BV studios

Business Problem


Because Microsoft lacks experience in movie production, the following business questions were asked to address this problem.


Questions considered:

  • What are the business's pain points related to this project? Microsoft has very little knowledge about movie production and is unsure of what movies it should produce in its new movie studio.

  • How did you pick the data analysis question(s) that you did? Genre, rating and gross earnings were used to pick the business questions.

  • Why are these questions important from a business perspective? They allow Microsoft to make informed decisions on the movie genres to produce.


The Data


This project used three datasets: two from IMDB and one from BOM.


Questions considered:

  • Where did the data come from, and how do they relate to the data analysis questions? The BOM movie_gross dataset has important columns such as studio, domestic_gross and foreign_gross. The IMDB title basics dataset has important columns such as start_year, runtime_minutes and genres. The IMDB ratings dataset has important columns such as average rating and number of votes.

  • What is the target variable? The target variables were genres, start_year (the movie release year) and total_gross (the sum of domestic and foreign gross).


Methods


Data cleaning, analysis and visualization


The three datasets were imported, cleaned, merged and irrelevant columns dropped. Data cleaning was done by:

  Identifying duplicates:

Duplicates were identified using the title and start year columns. Duplicates were dropped while keeping the entries with the fewest missing values.
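
The step above can be sketched roughly as follows. This is a minimal illustration with pandas, not the notebook's actual code: the toy rows, the helper name `drop_duplicates_keep_most_complete` and the column names `title` and `start_year` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-in for the IMDB title basics table (values are invented).
basics = pd.DataFrame({
    "title": ["Alpha", "Alpha", "Beta"],
    "start_year": [2015, 2015, 2016],
    "runtime_minutes": [None, 100.0, 90.0],
    "genres": [None, "Drama", "Comedy"],
})

def drop_duplicates_keep_most_complete(df, keys):
    """Hypothetical helper: per key combination, keep the row with the fewest missing values."""
    missing = df.isna().sum(axis=1)  # number of missing cells in each row
    ordered = df.assign(_missing=missing).sort_values("_missing", kind="stable")
    deduped = ordered.drop_duplicates(subset=keys, keep="first")
    return deduped.drop(columns="_missing").sort_index()

deduped = drop_duplicates_keep_most_complete(basics, ["title", "start_year"])
print(deduped)
```

Sorting by the per-row count of missing cells before calling `drop_duplicates` means that `keep="first"` retains the most complete copy of each title and year pair.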

 Identifying missing values.

A function was created to print out the columns that had missing values in each dataset. The genre column of the IMDB title basics dataset had missing values in about 3% of its rows. Because this is a small percentage, those rows were dropped.
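
A helper of this kind might look like the sketch below. The function name `missing_report` and the sample data are hypothetical. It returns the share of missing values per column instead of only printing them, which makes the result easier to reuse.

```python
import pandas as pd

def missing_report(df):
    """Hypothetical helper: percentage of missing values for columns that have any."""
    pct = df.isna().mean() * 100
    return pct[pct > 0].round(2)

# Invented sample with one missing genre out of four rows.
basics = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Drama", None, "Comedy", "Action"],
})

report = missing_report(basics)
print(report)

# Drop the rows whose genre is missing, as described above.
cleaned = basics.dropna(subset=["genres"])
```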

 Changing data types 

Data types for all three datasets were checked using the .dtypes attribute. Incorrect data types, such as that of the foreign gross column, were converted to the appropriate type.
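
As an illustration, the conversion could be done along these lines. The sketch assumes that the foreign gross column is stored as text and that some entries contain thousands separators. The README does not state the exact cause of the wrong type, so treat both as assumptions.

```python
import pandas as pd

# Invented sample: gross figures stored as text, one with a thousands separator.
gross = pd.DataFrame({"foreign_gross": ["652000000", "1,131.6", None]})
print(gross.dtypes)  # the column is not numeric yet

# Strip the separators, then convert; unparseable entries become NaN.
gross["foreign_gross"] = pd.to_numeric(
    gross["foreign_gross"].str.replace(",", "", regex=False),
    errors="coerce",
)
print(gross.dtypes)
```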

 Flattening the genre column

The value in each row of the genre column was split into a list of genres. Each element of that list was then expanded into its own row, with the original index value repeated for every new row.
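
In pandas, this split-then-expand pattern is usually written with `str.split` followed by `DataFrame.explode`. The sketch below assumes the genres are comma-separated strings; the sample rows are invented.

```python
import pandas as pd

basics = pd.DataFrame({
    "title": ["Alpha", "Beta"],
    "genres": ["Action,Adventure", "Drama"],  # assumed comma-separated
})

# Split each genre string into a list, then give every list element its own row.
flat = basics.assign(genres=basics["genres"].str.split(",")).explode("genres")
print(flat)
```

After the explode step, a movie with several genres appears once per genre, so any later sum of revenue per genre counts that movie's gross in each of its genres.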

 Checking for placeholders or outliers

A function was defined to print out the contents of each column. The data had no placeholders. There were outliers in the start year column: some years were in the future. These rows were identified and removed.
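
A filter for the future years could look like the sketch below. The cutoff `MAX_YEAR` is a hypothetical value chosen for the example; the README does not state which year was used.

```python
import pandas as pd

basics = pd.DataFrame({
    "title": ["A", "B", "C"],
    "start_year": [2012, 2019, 2027],  # invented values; 2027 plays the outlier
})

MAX_YEAR = 2020  # assumed cutoff, not taken from the notebook

# Keep only the rows whose release year is not in the future.
in_range = basics[basics["start_year"] <= MAX_YEAR]
print(in_range)
```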

 Merging of data sets

The datasets were then merged and irrelevant columns dropped.
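
One plausible way to do the merge is sketched below. The join keys are assumptions: the two IMDB tables are joined on an identifier column called `tconst` here, and the BOM table is matched on title and release year. The column names and rows are illustrative only.

```python
import pandas as pd

basics = pd.DataFrame({
    "tconst": ["tt1", "tt2"],
    "title": ["Alpha", "Beta"],
    "start_year": [2015, 2016],
    "genres": ["Action", "Drama"],
})
ratings = pd.DataFrame({
    "tconst": ["tt1", "tt2"],
    "averagerating": [7.1, 6.4],
    "numvotes": [1200, 300],
})
gross = pd.DataFrame({
    "title": ["Alpha"],
    "year": [2015],
    "studio": ["BV"],
    "domestic_gross": [100.0],
    "foreign_gross": [250.0],
})

# Join the two IMDB tables on their shared identifier (assumed key).
imdb = basics.merge(ratings, on="tconst", how="inner")

# Match BOM rows on title and release year (assumed keys).
movies = imdb.merge(
    gross,
    left_on=["title", "start_year"],
    right_on=["title", "year"],
    how="inner",
)

# total_gross is the sum of domestic and foreign gross.
movies["total_gross"] = movies["domestic_gross"] + movies["foreign_gross"]

# Drop columns that are no longer needed.
movies = movies.drop(columns=["tconst", "year"])
print(movies)
```

An inner join keeps only movies present in all three tables, which reduces the row count but guarantees that every remaining row has genre, rating and revenue information.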

For data analysis, the following was done:

  • The total box office revenue over the years was visualized using a line plot.
  • The top 10 most preferred movie genres were compared with the top 10 most profitable movie genres using bar plots in the same figure with different axes.
  • The top 5 highest rated genres were visualized using a bar plot.
  • The correlation between average rating and total gross was visualized using a scatter plot.
  • The top 5 studios based on total revenue earned were visualized using a bar plot.
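
The aggregations behind these plots can be sketched as follows. The sample rows are invented, and measuring "most preferred" by the number of titles per genre is an assumption: the README does not say how preference was measured. Plotting is left out; each resulting Series could be passed to a line or bar plot.

```python
import pandas as pd

# Invented, already-merged sample with one row per movie and genre.
movies = pd.DataFrame({
    "genres": ["Drama", "Drama", "Adventure", "Comedy"],
    "studio": ["BV", "Fox", "BV", "Uni."],
    "start_year": [2017, 2018, 2018, 2018],
    "total_gross": [50.0, 30.0, 400.0, 120.0],
})

# Total box office revenue per year (input for the line plot).
revenue_by_year = movies.groupby("start_year")["total_gross"].sum()

# Most preferred genres, assumed here to mean the most frequent ones.
most_common = movies["genres"].value_counts().head(10)

# Most profitable genres by summed total gross.
most_profitable = movies.groupby("genres")["total_gross"].sum().nlargest(10)

# Top 5 studios by summed total gross.
top_studios = movies.groupby("studio")["total_gross"].sum().nlargest(5)

print(revenue_by_year, most_common, most_profitable, top_studios, sep="\n\n")
```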


Results

Interpreting the results


Observations for Total box office revenue over the years:

  • Movies made a lot of money in 2018.
  • There has been a sharp decrease in total gross earned by movies since 2018, with the lowest year being 2020.
    This makes sense since streaming platforms started gaining a lot of attention in 2018, leading many people to prefer streaming movies on platforms such as Netflix, hence the decline in box office revenues.
  • Movies made zero box office revenue in 2020.
    This is because the COVID pandemic started in 2020, which led to the closure of movie theaters in many countries, hence there was no box office revenue for movies that year.

Observations for the preferred movie genres Vs The most profitable movie genres:

  • Drama is the most preferred movie genre.
  • Romance, Thriller and Horror, which are in the top 10 most preferred genres, are not in the top 10 most profitable genres.
  • Adventure is the most profitable movie genre.
  • Animation, Fantasy and Family are in the top 10 most profitable genres but are not in the top 10 most preferred genres.

Observations for the top 5 highest rated genres:

  • Documentary is the highest rated movie genre.
  • The first four genres, that is Documentary, Drama, Comedy and Biography, are also in the top ten most preferred/profitable genres.
  • History is not in the top ten most preferred/profitable genres.

Observations for the correlation between Average rating and total gross:

  • There is no correlation between average rating and the total revenue earned
  • The majority of movies make a total gross of less than 400 million dollars
  • The majority of movies have an average rating higher than 4
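
A scatter plot can be backed up with a numeric check. The sketch below computes the Pearson correlation coefficient with pandas on invented numbers; the column names `averagerating` and `total_gross` are assumed. A value close to 0 indicates no linear relationship, which is the pattern described above.

```python
import pandas as pd

# Invented sample in which rating and revenue are essentially unrelated.
movies = pd.DataFrame({
    "averagerating": [4.0, 6.0, 8.0, 5.0],
    "total_gross": [10.0, 300.0, 20.0, 150.0],
})

# Pearson correlation coefficient between the two columns.
r = movies["averagerating"].corr(movies["total_gross"])
print(round(r, 3))
```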

Observations for the Top 5 studios based on total revenue earned:

  • BV studio is the highest earning studio
  • Fox and Uni.Studio have approximately the same gross earnings
  • All 5 studios have a gross total greater than 400 million dollars

The generalizations are well grounded in the data provided and can therefore help Microsoft make informed decisions about the movies it should make. The data analyzed covers the last 10 years, so it gives reasonably accurate insights. Missing values were carefully replaced to avoid skewed results. Data was aggregated where necessary to give a clearer picture of the data.

Visualizations

       Total box office revenue over the years

       Preferred movie genres _Vs_ The most profitable movie genres

      Top 5 highest rated genres

      The correlation between Average rating and total gross

      Top 5 studios based on total revenue earned

Conclusions


  • There has been a sharp decrease in box office gross earnings since 2018. This can be attributed to the increased preference for streaming movies on platforms such as Netflix.
  • Adventure, action, comedy and drama are genres that are highly preferred by audiences and also have high profitability.
  • BV Studio has the highest total gross earnings.

Recommendation

Based on the results of my analysis, I recommend the following:

  1. Create a streaming platform for their movies
  2. Produce movies that have a combination of adventure, action, comedy and drama
  3. Benchmark with top earning movie studios such as BV studios

Future Plans:

  1. Determine profit based on the movie's budget
  2. Compare streaming services gross earnings and box office gross earnings per movie
  3. Assess the relationship between genre and number of votes or what influences the number of votes per movie

For More Information

See the full analysis in the Jupyter Notebook or review this presentation.

For additional info, contact Fridah Kimathi at [email protected] or via my LinkedIn profile.

Repository Structure

├── images
├── zippedData
├── .canvas
├── .gitignore
├── Microsoft-Movie-Analyisis.ipynb
├── Microsoft-Movie-Analysis-Presentation ppt.pdf
└── README.md

microsoft-movie-analysis's People

Contributors

fridahkimathi, davidbraslow, cheffrey2000
