recommend's Introduction

Simple recommendation system implementation in Python

Current models:

  • Probabilistic Matrix Factorization
  • Bayesian Matrix Factorization
  • Alternating Least Squares with Weighted Lambda Regularization (ALS-WR)

References:

  • "Probabilistic Matrix Factorization", R. Salakhutdinov and A. Mnih, NIPS 2008
  • "Bayesian Probabilistic Matrix Factorization using MCMC", R. Salakhutdinov and A. Mnih, ICML 2008
  • Matlab code: http://www.cs.toronto.edu/~rsalakhu/BPMF.html
  • "Large-scale Parallel Collaborative Filtering for the Netflix Prize", Y. Zhou, D. Wilkinson, R. Schreiber and R. Pan, 2008

Install:

# clone repository
git clone git@github.com:chyikwei/recommend.git
cd recommend

# install numpy & scipy
pip install -r requirements.txt
pip install .

Getting started:

  • A Jupyter notebook that compares the PMF and BPMF models can be found here.

  • To run BPMF with the MovieLens 1M dataset: first, download the MovieLens 1M dataset and unzip it (the data will be in the ml-1m folder). Then run:

>>> import numpy as np
>>> from recommend.bpmf import BPMF
>>> from recommend.utils.evaluation import RMSE
>>> from recommend.utils.datasets import load_movielens_1m_ratings

# load user ratings
>>> ratings = load_movielens_1m_ratings('ml-1m/ratings.dat')
>>> n_user = max(ratings[:, 0])
>>> n_item = max(ratings[:, 1])
>>> ratings[:, (0, 1)] -= 1 # shift ids by 1 to let user_id & movie_id start from 0

# fit model
>>> bpmf = BPMF(n_user=n_user, n_item=n_item, n_feature=10,
                max_rating=5., min_rating=1., seed=0).fit(ratings, n_iters=20)
>>> RMSE(bpmf.predict(ratings[:, :2]), ratings[:,2]) # training RMSE
0.79784331768263683

# predict ratings for user 0 and item 0 to 9:
>>> bpmf.predict(np.array([[0, i] for i in range(10)]))
array([ 4.35574067,  3.60580936,  3.77778456,  3.4479072 ,  3.60901065,
        4.29750917,  3.66302187,  4.43915423,  3.85788772,  4.02423073])
  • Complete examples can be found in the examples/ folder. The scripts download the MovieLens 1M dataset automatically, run the PMF (or BPMF) model, and show the training/validation RMSE.

Running tests:

python setup.py test

or run the tests with coverage:

coverage run --source=recommend setup.py test
coverage report -m

Uninstall:

pip uninstall recommend

Notes:

  • The old version of the code can be found in v0.0.1. It contains a Probabilistic Matrix Factorization model implemented with Theano.

  • The previous version (0.2.1) did not implement MCMC sampling in the BPMF algorithm correctly. At every time step it computed predictions based on the current value of the feature matrices and used them to estimate the RMSE. This is meaningless from the MCMC point of view, whose purpose is to sample the feature matrices from the correct distributions in order to estimate the integral that defines the rating distribution. The correct approach (see Eq. 10 in reference [2]) averages the predictions over all time steps to obtain the final prediction and computes the RMSE from that average. In other words, the predicted value depends on the whole chain, not only on the last sampled feature matrices. With this fix, the BPMF RMSE improves on both the training and test sets (you can see it in this notebook). (Thanks to LoryPack for the contribution!)
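
The averaging described in the note above can be illustrated with a minimal sketch. It is not the library's code: the function name and its inputs (one pair of sampled feature matrices per MCMC step) are assumptions made for illustration, and it leaves out details such as adding back a mean rating or clipping to the rating range.

```python
import numpy as np

def averaged_predictions(user_samples, item_samples, pairs):
    """Average predictions over all retained MCMC samples.

    user_samples / item_samples: lists of (n_user, n_feature) and
    (n_item, n_feature) arrays, one per step (hypothetical inputs).
    pairs: (n_pairs, 2) integer array of (user_id, item_id).
    """
    total = np.zeros(len(pairs))
    for U, V in zip(user_samples, item_samples):
        # prediction from this single sample of the feature matrices
        total += np.sum(U[pairs[:, 0]] * V[pairs[:, 1]], axis=1)
    return total / len(user_samples)

# toy chain of 5 random "samples", only to exercise the function
rng = np.random.RandomState(0)
user_samples = [rng.randn(3, 2) for _ in range(5)]
item_samples = [rng.randn(4, 2) for _ in range(5)]
pairs = np.array([[0, 1], [2, 3]])

avg = averaged_predictions(user_samples, item_samples, pairs)
# what a "last sample only" prediction would look like, for comparison
last_only = np.sum(user_samples[-1][pairs[:, 0]] * item_samples[-1][pairs[:, 1]], axis=1)
```

The RMSE would then be computed from avg, not from last_only.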

recommend's Issues

Multiplication between item_features and user_features

Hi,
I've a question:
Why don't I obtain the initial matrix that I gave as input to the PMF (or BPMF) model when I multiply item_features and user_features? It is a matrix factorization technique, isn't it? If so, the product of those factors should give back the initial matrix.
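
A possible way to look at this question: a factorization with only a few latent features is a low-rank approximation, so the product of the factors is generally close to the input, not equal to it. The sketch below uses a truncated SVD on a dense toy matrix as a stand-in. It is not PMF (which fits only the observed ratings and adds regularization), and the names user_features and item_features here are local variables, not the model's attributes.

```python
import numpy as np

rng = np.random.RandomState(0)
R = rng.randint(1, 6, size=(6, 5)).astype(float)  # toy dense ratings matrix

# full SVD, then keep only k latent features
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_features = U[:, :k] * s[:k]   # shape (6, k)
item_features = Vt[:k].T           # shape (5, k)

# product of the rank-k factors: an approximation of R
R_approx = user_features.dot(item_features.T)
print(np.abs(R - R_approx).max())  # reconstruction error of the rank-k product

# keeping every singular value reproduces R up to rounding
R_full = (U * s).dot(Vt)
```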

Problems with PMF

Hi,
My professor and I have investigated this library in depth in order to apply PMF to a ratings matrix, and we have encountered a problem with the fit() method of PMF.
Below, 'ratings_matrix' denotes the ratings matrix that we give as input to this library.

ratings_matrix = [[ 9.00000000e+00 2.71600000e+03 4.00000000e+00]
[ 3.84100000e+03 3.50300000e+03 5.00000000e+00]
[ 2.89900000e+03 1.24600000e+03 5.00000000e+00]
...,
[ 2.63700000e+03 1.26700000e+03 3.00000000e+00]
[ 2.83000000e+02 2.42000000e+03 1.00000000e+00]
[ 4.55100000e+03 1.64000000e+02 3.00000000e+00]]

When we run this code - pmf = PMF(n_user = number_of_users, n_item = number_of_items, n_feature = number_of_features, min_rating = min_rating_value, max_rating = max) - with NumPy version 1.13.3, we encounter the following problem:
Traceback (most recent call last):
  File "/Users/edoardo/PycharmProjects/MasterThesisProject/Thesis.py", line 529, in <module>
    final_ratings_matrix_pmf = probabilistic_matrix_factorization_technique(ratings_matrix)
  File "/Users/edoardo/PycharmProjects/MasterThesisProject/Thesis.py", line 250, in probabilistic_matrix_factorization_technique
    pmf = PMF(n_user = number_of_users, n_item = number_of_items, n_feature = number_of_features, min_rating = min_rating_value, max_rating = max_rating_value, seed = initial_seed)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py", line 56, in __init__
    self.user_features_ = 0.1 * self.random_state.rand(n_user, n_feature)
  File "mtrand.pyx", line 1358, in mtrand.RandomState.rand
  File "mtrand.pyx", line 856, in mtrand.RandomState.random_sample
  File "mtrand.pyx", line 167, in mtrand.cont0_array
TypeError: 'numpy.float64' object cannot be interpreted as an index

When we call the fit() method with NumPy version 1.11.0, we encounter a different problem:
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:56: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
self.user_features_ = 0.1 * self.random_state.rand(n_user, n_feature)
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:57: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
self.item_features_ = 0.1 * self.random_state.rand(n_item, n_feature)
PMF training and testing phases
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:70: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
u_feature_mom = np.zeros((self.n_user, self.n_feature))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:71: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
i_feature_mom = np.zeros((self.n_item, self.n_feature))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:73: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
u_feature_grads = np.zeros((self.n_user, self.n_feature))
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py:74: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
i_feature_grads = np.zeros((self.n_item, self.n_feature))
Traceback (most recent call last):
  File "/Users/edoardo/PycharmProjects/MasterThesisProject/Thesis.py", line 529, in <module>
    final_ratings_matrix_pmf = probabilistic_matrix_factorization_technique(ratings_matrix)
  File "/Users/edoardo/PycharmProjects/MasterThesisProject/Thesis.py", line 254, in probabilistic_matrix_factorization_technique
    pmf.fit(training_set, n_iters = evaluation_iterations)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py", line 87, in fit
    data.take(0, axis=1), axis=0)
TypeError: Cannot cast array data from dtype('float64') to dtype('int64') according to the rule 'safe'

Have you ever encountered this type of problem? We don't know how to solve this problem!

Any help can be useful! Thanks a lot!
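
Both errors above complain about float64 values being used where integers are expected, so one thing worth trying is to convert the ratings array and the user/item counts to integers before building the model. The sketch below is only an illustration under assumptions: the toy values are taken from the printed array, ids are assumed to start at 0, and ratings are assumed to be integer-valued (casting would truncate fractional ratings).

```python
import numpy as np

# ratings stored as floats, like the array printed above (three toy rows)
ratings_matrix = np.array([[9., 2716., 4.],
                           [3841., 3503., 5.],
                           [2899., 1246., 5.]])

# cast the whole array to integers so the id columns can be used as indices
ratings = ratings_matrix.astype(np.int64)

# the counts should be plain Python ints, not numpy.float64
# (assumes ids start at 0, so the count is max id + 1)
number_of_users = int(ratings[:, 0].max()) + 1
number_of_items = int(ratings[:, 1].max()) + 1
```

These integer values would then be passed as n_user and n_item, and the integer array to fit().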

A small mistake in the algorithm

Though the program runs without errors, there is a bug in the algorithm. The corrected lines are given below. Line 123 in https://github.com/chyikwei/recommend/blob/master/mf/bayesian_matrix_factorization.py

 WI_post = inv(inv(self.WI_item) + N * S_bar + \
            np.dot(norm_X_bar, norm_X_bar.T) * \
            (N * self.beta_item) / (self.beta_item + N))

Line 157 in https://github.com/chyikwei/recommend/blob/master/mf/bayesian_matrix_factorization.py

 WI_post = inv(inv(self.WI_user) + N * S_bar + \
            np.dot(norm_X_bar, norm_X_bar.T) * \
            (N * self.beta_user) / (self.beta_user + N))
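
For comparison, the corrected lines appear to follow the Wishart posterior update of the BPMF paper (the second reference in the README). The mapping of code names to symbols is an assumption: self.WI_item / self.WI_user is read as W_0, self.beta_item / self.beta_user as β_0, S_bar as the sample covariance, and norm_X_bar as the difference between the prior mean μ_0 and the sample mean of the feature vectors.

```latex
\left[\mathbf{W}_0^{*}\right]^{-1}
  = \mathbf{W}_0^{-1}
  + N\,\bar{S}
  + \frac{\beta_0 N}{\beta_0 + N}\,
    (\mu_0 - \bar{X})(\mu_0 - \bar{X})^{\top}
```

Under that reading, WI_post in the code is the inverse of the right-hand side, i.e. W_0^*.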

Running bmf with my data

I have a 31x9 matrix and I want to run BMF on it with your code. First, I read the matrix in sparse format (180x3), as in your example. Then I calculate the max of the first and second columns and try to run your code:

print n_user   # 31
print n_item   # 9
print n_feat   # 15
print ratings  # numpy np.array

[[ 1  1 11]
 [ 1  5  7]
 [ 1  6 12]
...
 [31  5  7]
 [31  6  9]
 [31  8  9]]

#fit model
bpmf = BPMF(n_user=n_user, n_item=n_item, n_feature=n_feat,
                max_rating=15., min_rating=0., seed=0).fit(ratings, n_iters=20)
print RMSE(bpmf.predict(ratings[:, :2]), ratings[:,2]) # training RMSE

And I am receiving the following message: raise ValueError("max user_id >= %d", n_user)
ValueError: ('max user_id >= %d', 31)
What am I doing wrong? It does work if I set n_user = 32 and n_item = 10, but does that make any sense? Furthermore, the results of bpmf.predict(ratings) are just approximations of the values in my initial results. What about the rest of the values?
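
The ids printed above start from 1, while the README example shifts ids so that they start from 0 before fitting, and the error message indicates that every user id must be smaller than n_user. A minimal sketch of that shift, on a toy array whose rows only mimic the layout above (the values are illustrative):

```python
import numpy as np

# toy (user_id, item_id, rating) rows with ids starting from 1
ratings = np.array([[1, 1, 11],
                    [1, 5, 7],
                    [15, 9, 12],
                    [31, 8, 9]])

# with 1-based ids, the largest id equals the number of users/items
n_user = int(ratings[:, 0].max())   # 31
n_item = int(ratings[:, 1].max())   # 9

# shift both id columns so that ids run from 0 to n - 1
ratings[:, :2] -= 1
```

After the shift the largest user id is 30, which is below n_user = 31, so padding n_user to 32 should no longer be needed.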

Parameter 'Lambda' in the ALS-WR Model

Hi,
I've seen the implementation of the ALS-WR model provided in this library.
This model has a parameter called 'lambda'. My question: can I set the 'lambda' parameter of the model, and if so, how? Is the 'lambda' parameter in the reference paper the 'reg' parameter in this implementation? If not, how can I set the 'lambda' parameter?

Thanks for the answer!
Best regards.

RuntimeWarning: overflow encountered in multiply

Hi

I set eval_iters = 50 and encountered a problem (RuntimeWarning: overflow encountered in multiply). As eval_iters increases, I only want to obtain the minimum RMSE, but I don't know how to solve the problem.
Here is the output:
recommend-0.1.0-py2.7.egg/recommend/pmf.py:86: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:88: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:89: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:90: RuntimeWarning: overflow encountered in multiply
recommend-0.1.0-py2.7.egg/recommend/pmf.py:97: RuntimeWarning: invalid value encountered in add
recommend-0.1.0-py2.7.egg/recommend/pmf.py:104: RuntimeWarning: invalid value encountered in add
recommend-0.1.0-py2.7.egg/recommend/pmf.py:133: RuntimeWarning: invalid value encountered in greater
site-packages/recommend-0.1.0-py2.7.egg/recommend/pmf.py:136: RuntimeWarning: invalid value encountered in less
INFO: iter: 24, train RMSE: nan
INFO: iter: 25, train RMSE: nan
INFO: iter: 26, train RMSE: nan
INFO: iter: 27, train RMSE: nan ...
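
A NaN training RMSE after overflow warnings usually suggests that the optimization has diverged, which is often (though not necessarily here) a sign that the step size is too large for the data. Independently of the cause, the lowest RMSE seen so far can be tracked while ignoring non-finite values. The helper below is hypothetical and the history values are made up for illustration:

```python
import math

def best_finite_rmse(history):
    """Return (iteration, rmse) for the lowest finite RMSE, or None."""
    finite = [(i, r) for i, r in enumerate(history) if math.isfinite(r)]
    if not finite:
        return None
    return min(finite, key=lambda pair: pair[1])

# made-up per-iteration RMSE values that end in NaN, as in the log above
history = [1.20, 0.95, 0.91, 0.93, float("nan"), float("nan")]
best = best_finite_rmse(history)
print(best)
```

Training could also be stopped as soon as the RMSE stops being finite, since later iterations will not recover.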

BMF and PMF parameters

Hi there, I am trying to understand the parameters of the BMF and PMF algorithms that are used to tune the optimization objective. Could you please elaborate a bit on which parameters need tuning?

Problem with PMF [fit() method]

Hi,
I'm a computer science student in Milan. My goal is to use this library in order to perform PMF on my ratings matrix.
The problem I encountered is the following:
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/pmf.py", line 62, in fit
    self.max_rating, self.min_rating)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/recommend/utils/validation.py", line 22, in check_ratings
    raise ValueError("max user_id >= %d", n_user)
ValueError: ('max user_id >= %d', 4)

My ratings matrix is like the ratings matrix in your examples (each user and item id starts from 0, not from 1, so I don't have to subtract 1 from the ids). An example of my ratings matrix is the following:

"[[0 0 1]
[0 1 2]
[0 2 1]
[0 3 1]
[0 4 0]
[1 0 1]
[1 1 0]
[1 2 4]
[1 3 4]
[1 4 4]
[2 0 1]
[2 1 1]
[2 2 0]
[2 3 1]
[2 4 1]
[3 0 5]
[3 1 4]
[3 2 5]
[3 3 5]
[3 4 0]
[4 0 3]
[4 1 1]
[4 2 0]
[4 3 3]
[4 4 3]]"

As you can see, max user id = 4 and also max item id = 4 (and both start from 0, not from 1).

When I call the fit() method, my program stops with that error.
I don't know how to solve my problem.
Can you help me?

Thanks.
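
The error message reports n_user = 4 while the largest user id in the matrix is also 4, and the check in the traceback rejects any id that is not smaller than n_user. With ids starting from 0, the number of users is the largest id plus one. A minimal sketch with placeholder ratings on the same 5x5 id grid:

```python
import numpy as np

# every (user, item) pair for ids 0..4; the rating value 1 is a placeholder
ratings = np.array([[u, i, 1] for u in range(5) for i in range(5)])

# with 0-based ids, the count is the largest id plus one
n_user = int(ratings[:, 0].max()) + 1   # 5, not 4
n_item = int(ratings[:, 1].max()) + 1   # 5, not 4
```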

Predict Users and Items Latent Features with PMF or BPMF Models

Hi,
The pipeline I’ve implemented is the following:

  1. train a PMF model with a ratings matrix as input, obtaining latent features for both items and users;
  2. predict the rating values for a specific user whose identifier is in the input ratings matrix, by using the latent features extracted with the previous step.

Now I would like to use the trained PMF model to extract latent features (and compute predicted rating values) for new users whose identifiers are not in the input ratings matrix, i.e. users that were not present during PMF training. Is this possible? If so, how? Please provide a code example, if possible.

Is it possible to do the same with the BPMF model?

Thanks for the answers,
Best regards.
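
One general technique for this situation (not something the text above shows the library providing) is "fold-in": keep the trained item features fixed and solve a small regularized least-squares problem for the new user's feature vector from the few ratings that user has given. Everything in the sketch is an assumption made for illustration, including the function name, the reg value, and the idea that predictions are a dot product plus a global mean rating.

```python
import numpy as np

def fold_in_user(item_features, item_ids, ratings, mean_rating, reg=0.1):
    """Estimate latent features for a new user by ridge regression.

    item_features: (n_item, n_feature) array from a trained model.
    item_ids, ratings: items the new user rated and the given ratings.
    """
    V = item_features[item_ids]
    y = np.asarray(ratings, dtype=float) - mean_rating
    A = V.T.dot(V) + reg * np.eye(V.shape[1])
    return np.linalg.solve(A, V.T.dot(y))

# toy check: build ratings from a known user vector, then recover it
rng = np.random.RandomState(0)
item_features = rng.randn(6, 3)
true_user = np.array([0.5, -1.0, 2.0])
item_ids = np.array([0, 2, 3, 5])
mean_rating = 3.0
ratings = item_features[item_ids].dot(true_user) + mean_rating

new_user = fold_in_user(item_features, item_ids, ratings, mean_rating, reg=1e-8)
predicted = item_features.dot(new_user) + mean_rating  # scores for all items
```

For a sampling-based model such as BPMF, the same idea could in principle be applied per sample and the resulting predictions averaged, but that is a design choice left to the reader.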
