Giter Site home page Giter Site logo

cs547-assignment3's Introduction

Purdue CS54701 - Spring 2015

CS54701 Assignment 3: Collaborative Recommendation Algorithm

Movie Recommendation System

In this assignment, you will develop different algorithms to make recommendations for movies.

###The Training Data

The training data: a set of movie ratings by 200 users (userid: 1-200) on 1000 movies (movieid: 1-1000). The data is stored in a 200 row x 1000 column table. Each row represents one user. Each column represents one movie. A rating is a value in the range of 1 to 5, where 1 is "least favored" and 5 is "most favored". Please NOTE that a value of 0 means that the user has not explicitly rated the movie.

Please download the training data here: train.txt.

The Test Data

There are three test files: test5.txt, test10.txt and test20.txt.

[test5.txt] A pool of movie ratings by 100 users (userid: 201-300). Each user has already rated 5 movies. The format of the data is as follows: the file contains 100 blocks of lines. Each block contains several triples : (U, M, R), which means that user U gives R points to movie M. Please note that in the test file, if R=0, then you are expected to predict the best possible rating which user U will give movie M. The following is a block for user 262. (line 5588-5595 of test5.txt)

262 259 3      // user 262 gives movie 259 3 points.
262 310 5      // user 262 gives movie 310 5 points.
262 321 5      // user 262 gives movie 321 5 points.
262 323 2      // user 262 gives movie 323 2 point.
262 358 1      // user 262 gives movie 358 1 point.
262 11 0       // need to predict user 262's rating for movie 11
262 22 0       // need to predict user 262's rating for movie 22
262 100 0       ...

ATTENTION: Please make the prediction block by block: every time when you are making predictions for user U, please assume that you ONLY know the knowledge of the training data (train.txt) and the existing 5 ratings for this user. In another words, please DO NOT use the knowledge of any other blocks in the test file when making predictions.

The format of test10.txt and test20.txt is nearly the same as test5.txt, the only difference is that: in test10.txt, 10 ratings are given for a specific user; in test20.txt, 20 ratings are given for a specific user.

Please make sure your predicted ratings are from 1 to 5. Otherwise, the grading system will return error.

Tasks

Your task is to design and develop collaborative filtering algorithms that predict the unknown ratings in the test data by learning users' preference from the training data.

Please complete the following experiments:

1. Implement the Memory-Based Collaborative Filtering Algorithm (40 pts)

Please implement two versions of the memory-based collaborative filtering algorithm as the Pearson Coefficient method and the vector similarity method

For more detailed information you can refer to

Breese J. S., Heckerman D., Kadie C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. (Pdf)

2. Implement your algorithm 1 (15 pts)

3. Implement your algorithm 2 (15 pts)

You can try different extensions of the memory-based algorithm (e.g., algorithms in the above paper).

You can also try different model-based methods. Some references can be found:

Hofmann, T., & Puzicha, J. (1999). Latent Class Models for Collaborative Filtering. In the Proceedings of International Joint Conference on Artificial Intelligence. (pdf)

Pennock, D. M., Horvitz, E., Lawrence, S., & Giles, C. L. (2000). Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach. In the Proceeding of the Sixteenth Conference on Uncertainty in Artificial Intelligence. (pdf)

Si, L. & Jin. R. (2003). Flexible mixture model for collaborative filtering.   In the Proceeding of the International Conference of Machine Learning. (pdf)

4. Results Discussion (25 pts)

Please provide the following information

  1. The accuracy of the algorithms;  Do you think the values are reasonable? How can you justify the results by analyzing the advantages and disadvantages of the algorithms

  2. How long each algorithm takes to complete the prediction? Discuss the efficiency of the algorithms.

5. Results Competition (5 pts)

To make the homework a little more interesting; 5 points will be assigned according to the performance of your recommendation system. The best performance of the three algorithms from each student will be recorded and compared. 5 full points will be assigned to the top 3 players, 4 points for players of ranks 4-6, 3 points for ranks 7-9, 2 points for ranks 10-12, and lastly 1 point for ranks 13-15.

cs547-assignment3's People

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

Forkers

cxuereb warutered

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.