somjit101 / netflix-movie-recommendation

Netflix-Movie-Recommendation

A case study of the Netflix Prize solution. Given anonymous data on users and the ratings they gave to movies, the objective is to recommend movies that each user is likely to enjoy, based on their past activity and taste.

Business Problem

Problem Description

"Netflix is all about connecting people to the movies they love. To help customers find those movies, they developed a world-class movie recommendation system: Cinematch℠. Its job is to predict whether someone will enjoy a movie based on how much they liked or disliked other movies. Netflix uses those predictions to make personal movie recommendations based on each customer’s unique tastes. And while Cinematch is doing pretty well, it can always be made better.

Now there are a lot of interesting alternative approaches to how Cinematch works that Netflix hasn’t tried. Some are described in the literature, some aren’t. We’re curious whether any of these can beat Cinematch by making better predictions. Because, frankly, if there is a much better approach it could make a big difference to our customers and our business."

(Source : Netflix Prize Rules)

Problem Statement

Netflix provided a lot of anonymous rating data, and a prediction accuracy bar that is 10% better than what Cinematch can do on the same training data set. (Accuracy is a measurement of how closely predicted ratings of movies match subsequent actual ratings.)

Real world Objectives and Constraints

Objectives

  1. Predict the rating that a user would give to a movie that they have not yet rated.
  2. Minimize the difference between the predicted and the actual rating (measured by RMSE and MAPE).
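
The two error measures named in the objectives can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the notebook; the function names `rmse` and `mape` and the toy ratings are assumptions made for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent.

    Ratings are integers from 1 to 5, so no actual value is zero and the
    division is always defined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Toy values invented for illustration only.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

RMSE penalizes large misses more heavily than small ones, while MAPE weights each miss relative to the true rating, so an error of one star on a 2-star movie counts more than the same error on a 5-star movie.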

Constraint:

  1. Some form of interpretability.

Machine Learning Problem

Dataset Overview

Dataset Link

Data files :

  • combined_data_1.txt
  • combined_data_2.txt
  • combined_data_3.txt
  • combined_data_4.txt
  • movie_titles.csv

In each of the files combined_data_1.txt, combined_data_2.txt, combined_data_3.txt and combined_data_4.txt, ratings are grouped by movie. Each group begins with a line containing the movie ID followed by a colon. Each subsequent line, up to the next movie ID line, corresponds to a rating from a customer and its date, as three comma-separated fields:

CustomerID,Rating,Date

MovieIDs range from 1 to 17770 sequentially. CustomerIDs range from 1 to 2649429, with gaps. There are 480189 users. Ratings are on a five star (integral) scale from 1 to 5. Dates have the format YYYY-MM-DD.

Example Datapoint

1:
1488844,3,2005-09-06
822109,5,2005-05-13
885013,4,2005-10-19
30878,4,2005-12-26
823519,3,2004-05-03
893988,3,2005-11-17
124105,4,2004-08-05
1248029,3,2004-04-22
1842128,4,2004-05-09
2238063,3,2005-05-11
1503895,4,2005-05-19
2207774,5,2005-06-06
2590061,3,2004-08-12
2442,3,2004-04-14
543865,4,2004-05-28
1209119,4,2004-03-23
804919,4,2004-06-10
1086807,3,2004-12-28
1711859,4,2005-05-08
372233,5,2005-11-23
1080361,3,2005-03-28
1245640,3,2005-12-19
558634,4,2004-12-14
2165002,4,2004-04-06
1181550,3,2004-02-01
1227322,4,2004-02-06
427928,4,2004-02-26
814701,5,2005-09-29
808731,4,2005-10-31
662870,5,2005-08-24
337541,5,2005-03-23
786312,3,2004-11-16
1133214,4,2004-03-07
1537427,4,2004-03-29
1209954,5,2005-05-09
2381599,3,2005-09-12
525356,2,2004-07-11
1910569,4,2004-04-12
2263586,4,2004-08-20
2421815,2,2004-02-26
1009622,1,2005-01-19
1481961,2,2005-05-24
401047,4,2005-06-03
2179073,3,2004-08-29
1434636,3,2004-05-01
93986,5,2005-10-06
1308744,5,2005-10-29
2647871,4,2005-12-30
1905581,5,2005-08-16
2508819,3,2004-05-18
1578279,1,2005-05-19
1159695,4,2005-02-15
2588432,3,2005-03-31
2423091,3,2005-09-12
470232,4,2004-04-08
2148699,2,2004-06-05
1342007,3,2004-07-16
466135,4,2004-07-13
2472440,3,2005-08-13
1283744,3,2004-04-17
1927580,4,2004-11-08
716874,5,2005-05-06
4326,4,2005-10-29
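
A minimal sketch of how this layout could be read into flat rows, assuming the format described above (a `MovieID:` line followed by `CustomerID,Rating,Date` lines). The function name `parse_ratings` is hypothetical and is not taken from the notebook; the sample reuses the first three ratings from the datapoint above.

```python
from datetime import date


def parse_ratings(lines):
    """Yield (movie_id, customer_id, rating, date) tuples from combined_data-style lines."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(":"):
            # A line such as "1:" starts the block of ratings for a new movie.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line found before any movie ID line")
        customer, rating, day = line.split(",")
        yield movie_id, int(customer), int(rating), date.fromisoformat(day)


sample = [
    "1:",
    "1488844,3,2005-09-06",
    "822109,5,2005-05-13",
    "885013,4,2005-10-19",
]
rows = list(parse_ratings(sample))
for row in rows:
    print(row)
```

Because the function is a generator, a real file can be streamed with `parse_ratings(open(path))` without loading it into memory, which matters for files of this size.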

Mapping the Real world problem to a Machine Learning Problem

For a given user and movie, we need to predict the rating that the user would give to the movie.
The given problem is a recommendation problem.
It can also be seen as a regression problem.
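
To make the regression framing concrete, each known (user, movie) pair can be turned into a feature row whose target is the rating. The sketch below is illustrative only: the three features (global mean, user mean, movie mean), the function name `build_regression_rows` and the toy data are assumptions, not necessarily the features used in the notebook.

```python
from collections import defaultdict


def build_regression_rows(ratings):
    """Turn (user, movie, rating) triples into (features, target) pairs.

    Features are [global mean, user's mean rating, movie's mean rating].
    These are hypothetical, hand-picked features for illustration.
    """
    global_mean = sum(r for _, _, r in ratings) / len(ratings)
    by_user, by_movie = defaultdict(list), defaultdict(list)
    for user, movie, rating in ratings:
        by_user[user].append(rating)
        by_movie[movie].append(rating)

    rows = []
    for user, movie, rating in ratings:
        user_mean = sum(by_user[user]) / len(by_user[user])
        movie_mean = sum(by_movie[movie]) / len(by_movie[movie])
        rows.append(([global_mean, user_mean, movie_mean], rating))
    return rows


# Toy data invented for this example: (user, movie, rating).
toy = [(1, 10, 4), (1, 20, 2), (2, 10, 5), (2, 20, 3)]
rows = build_regression_rows(toy)
for features, target in rows:
    print(features, "->", target)
```

Any regressor can then be fitted on these rows. In a real experiment the averages would be computed on the training split only, since here each mean includes the very rating it is used to predict.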

Performance Metrics

Machine Learning Objectives/Constraints

  1. Minimize RMSE.
  2. Try to provide some interpretability.

ML Models Used :

  • XGBoost Gradient-boosted Regression
  • SVD Matrix Factorization (Regularized)
  • SVD++ Matrix Factorization (with Implicit Feedback)
  • XGBoost Regression with SVD features

The results of each model, and a comparison between them on a small sample of the total dataset, can be found in this file.
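
For readers unfamiliar with regularized matrix factorization, the idea behind the SVD-style models listed above can be sketched from scratch. This is not the notebook's code and not the implementation of any library: it is a small stochastic gradient descent loop for the common model r̂(u, i) = μ + b_u + b_i + p_u · q_i. The function name `train_mf`, the hyperparameter values and the toy ratings are all assumptions chosen for illustration.

```python
import math
import random


def train_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD with L2 regularization."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    bu = {u: 0.0 for u in users}  # user biases
    bi = {i: 0.0 for i in items}  # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        # Unknown users or items fall back to the parts of the model that are known.
        dot = sum(pf * qf for pf, qf in zip(p.get(u, ()), q.get(i, ())))
        return mu + bu.get(u, 0.0) + bi.get(i, 0.0) + dot

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                pf, qf = p[u][f], q[i][f]
                p[u][f] += lr * (err * qf - reg * pf)
                q[i][f] += lr * (err * pf - reg * qf)
    return predict


def rmse(pairs):
    """RMSE over (actual, predicted) pairs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs) / len(pairs))


# Toy ratings invented for this example: (user, movie, rating).
toy = [(1, 10, 5), (1, 20, 3), (1, 30, 1), (2, 10, 4), (2, 20, 2),
       (3, 10, 5), (3, 30, 2), (4, 20, 3), (4, 30, 1)]
predict = train_mf(toy)
global_mean = sum(r for _, _, r in toy) / len(toy)
baseline_rmse = rmse([(r, global_mean) for _, _, r in toy])
train_rmse = rmse([(r, predict(u, i)) for u, i, r in toy])
print(baseline_rmse, train_rmse)
```

The regularization term `reg` shrinks biases and factors toward zero, which limits overfitting for users or movies with few ratings. SVD++ extends the same model with an extra term built from the set of movies a user has rated, regardless of the rating values (implicit feedback).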
