
evfro / polara


Recommender system and evaluation framework for top-n recommendation tasks that respects the polarity of feedback. Fast, flexible and easy to use. Written in Python, powered by the scientific Python stack.

License: MIT License

Python 100.00%
recommender-system evaluation matrix-factorization tensor-factorization top-n-recommendations collaborative-filtering

polara's Introduction

POLARA

Polara is the first recommendation framework that allows a deeper analysis of recommender system performance, based on the idea of feedback polarity (by analogy with sentiment polarity in NLP).

In addition to the standard question of "how good a recommender system is at recommending relevant items", it allows assessing the ability of a recommender system to avoid irrelevant recommendations (and thus be less likely to disappoint a user). You can read more about this idea in the research paper Fifty Shades of Ratings: How to Benefit from a Negative Feedback in Top-N Recommendations Tasks. The research results can be easily reproduced with this framework; visit a "fixed state" version of the code at https://github.com/Evfro/fifty-shades (there are also many usage examples there). The framework also features an efficient tensor-based implementation of the algorithm proposed in the paper, which takes full advantage of the polarity-based formulation.

Prerequisites

The current version of Polara supports both Python 2 and Python 3 environments. Future versions are likely to drop Python 2 support to make better use of Python 3 features.

The framework heavily depends on the Pandas, Numpy, Scipy and Numba packages. Better performance can be achieved with mkl (optional). It is also recommended to use Jupyter Notebook for experimentation. Visualization of results can be done with the help of matplotlib. The easiest way to get all of these at once is to install the latest Anaconda distribution.

If you use a separate conda environment for testing, the following command can be issued to ensure that all required dependencies are in place (see this for more info):

conda install --file conda_req.txt

Alternatively, a new conda environment with all required packages can be created by:

conda create -n <your_environment_name> python=3.7 --file conda_req.txt

Installation

The easiest way is to install directly from source. Activate your conda environment and run:
pip install --no-cache-dir --upgrade git+https://github.com/evfro/polara.git#egg=polara
This will install the current release version. For the most recent development version, insert @develop between polara.git and #egg=polara in the line above.

Alternatively, you can manually clone this repository to a local machine (git clone git://github.com/evfro/polara.git). Once in the root of the newly created local repository, run
python setup.py install
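
To quickly verify that the installation succeeded, a minimal sanity check is to import the modules used in the examples below (no extra setup is assumed):

# installation sanity check: these imports should succeed without errors
from polara.recommender.data import RecommenderData
from polara.recommender.models import SVDModel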

Usage example

A special effort was made to make a "recsys for humans", stressing the ease of use of the framework. For example, here is how you build a PureSVD recommender on top of the Movielens-1M dataset:

from polara.recommender.data import RecommenderData
from polara.recommender.models import SVDModel
from polara.datasets.movielens import get_movielens_data
# get data and convert it into appropriate format
ml_data = get_movielens_data(get_genres=False)
data_model = RecommenderData(ml_data, 'userid', 'movieid', 'rating')
# build PureSVD model and evaluate it
svd = SVDModel(data_model)
svd.build()
svd.evaluate()
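
Besides the evaluation scores, the trained model can also return the generated recommendations themselves via its get_recommendations method (one of the two core model methods described below); a minimal sketch, assuming the default test data split:

# generate top-k recommendations for the test users
top_recs = svd.get_recommendations()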

Several different scenarios and use cases, which cover many practical aspects, can also be found in the examples directory.

Creating new recommender models

Basic models can be extended by subclassing the RecommenderModel class and defining two required methods: self.build() and self.get_recommendations(). Here's an example of a simple item-to-item recommender model:

from polara.recommender.models import RecommenderModel

class CooccurrenceModel(RecommenderModel):
    def __init__(self, *args, **kwargs):
        super(CooccurrenceModel, self).__init__(*args, **kwargs)
        self.method = 'item-to-item' # pick some meaningful name

    def build(self):
        # build model - calculate item-to-item matrix
        user_item_matrix = self.get_training_matrix()
        # rating matrix product  R^T R  gives cooccurrences count
        i2i_matrix = user_item_matrix.T.dot(user_item_matrix) # gives CSC format
        # exclude "self-links" and ensure only non-zero elements are stored
        i2i_matrix.setdiag(0)
        i2i_matrix.eliminate_zeros()
        # store matrix for generating recommendations
        self.i2i_matrix = i2i_matrix

    def get_recommendations(self):
        # get test users information and generate top-k recommendations
        test_matrix, test_data = self.get_test_matrix()
        # calculate predicted scores
        i2i_scores = test_matrix.dot(self.i2i_matrix)
        # prevent seen items from appearing in recommendations
        if self.filter_seen:
            self.downvote_seen_items(i2i_scores, test_data)
        # generate top-k recommendations for every test user
        top_recs = self.get_topk_elements(i2i_scores)
        return top_recs

And the model is ready for evaluation:

i2i = CooccurrenceModel(data_model)
i2i.build()
i2i.evaluate()

Bulk experiments

Here's an example of how to perform top-k recommendation experiments with 5-fold cross-validation for several models at once:

from polara.evaluation import evaluation_engine as ee
from polara.recommender.models import PopularityModel, RandomModel

# define models
i2i = CooccurrenceModel(data_model)
svd = SVDModel(data_model)
popular = PopularityModel(data_model)
random = RandomModel(data_model)
models = [i2i, svd, popular, random]

metrics = ['ranking', 'relevance'] # metrics for evaluation: NDCG, Precision, Recall, etc.
folds = [1, 2, 3, 4, 5] # use all 5 folds for cross-validation (default)
topk_values = [1, 5, 10, 20, 50] # values of k to experiment with

# run 5-fold CV experiment
result = ee.run_cv_experiment(models, folds, metrics,
                              fold_experiment=ee.topk_test,
                              topk_list=topk_values)

# calculate average values across all folds for e.g. relevance metrics
scores = result.mean(axis=0, level=['top-n', 'model']) # use .std instead of .mean for standard deviation
scores.xs('recall', level='metric', axis=1).unstack('model')

which results in something like:

model    MP        PureSVD   RND       item-to-item
top-n
1        0.017828  0.079428  0.000055  0.024673
5        0.086604  0.219408  0.001104  0.126013
10       0.138546  0.300658  0.001987  0.202134
...      ...       ...       ...       ...
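
Such a table is convenient to visualize with matplotlib (see Prerequisites) through pandas' plotting interface; a minimal sketch, assuming the scores dataframe from the snippet above:

import matplotlib.pyplot as plt

# recall@k curves for every model, averaged over the folds
ax = scores.xs('recall', level='metric', axis=1).unstack('model').plot(marker='o')
ax.set_xlabel('top-n')
ax.set_ylabel('recall')
plt.show()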

Custom pipelines

By default, Polara takes care of raw data and helps to organize the full evaluation pipeline, which includes splitting data into training, test and evaluation datasets, performing cross-validation and gathering results. However, if you need more control over that workflow, you can easily implement a custom usage scenario for your own needs.

Build models without evaluation

If you simply want to build a model on the provided data, then you only need to define a training set. This can be easily achieved with the help of the prepare_training_only method (assuming you have a pandas dataframe named train_data with corresponding "user", "item" and "rating" columns):

data_model = RecommenderData(train_data, 'user', 'item', 'rating')
data_model.prepare_training_only()

Now you are ready to build your models (as in the examples above) and export them to whatever workflow you currently have.
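
For instance, the PureSVD model from the usage example above can be trained directly on such a data model (a minimal sketch; no test data is involved, so only the build step is called):

from polara.recommender.models import SVDModel

svd = SVDModel(data_model)
svd.build()  # trains on the data prepared by prepare_training_only()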

Warm-start and known-user scenarios

By default, Polara makes the test set and the training set disjoint by users, which allows evaluating models in the user warm-start regime. However, in some situations (for example, when Polara is used within a larger pipeline), you might want to implement a strictly known-user scenario to assess the quality of your recommender system on unseen (held-out) items for known users. Switching between these two scenarios is controlled by setting the data_model.warm_start attribute to True or False. See the Warm-start and standard scenarios Jupyter notebook for an example.
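
A minimal sketch of switching to the known-user (standard) scenario (it assumes, as in the notebooks, that data splitting is triggered by the prepare() call):

data_model = RecommenderData(ml_data, 'userid', 'movieid', 'rating')
data_model.warm_start = False  # evaluate on held-out items of known users
data_model.prepare()           # split the data according to the selected scenario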

Externally provided test data

If you don't want Polara to perform data splitting (for example, when your test data is already provided), you can use the set_test_data method of a RecommenderData instance. It has a number of input arguments that cover all major cases of externally provided data. For example, assuming that you have new users' preferences encoded in the unseen_data dataframe and the corresponding held-out preferences in the holdout dataframe, the following command allows including them in the data model:

data_model.set_test_data(testset=unseen_data, holdout=holdout, warm_start=True)

Polara will automatically perform all required transformations to ensure the correct functioning of the evaluation pipeline. To evaluate models, you simply call the standard methods without any modifications:

svd.build()
svd.evaluate()

In this case the recommendations are generated based on the testset and evaluated against the holdout. See more usage examples in the Custom evaluation notebook.

Reproducing others' work

Polara offers even more options to customize the experimentation pipeline and tailor it to specific needs. See, for example, the Reproducing EIGENREC results notebook to learn how Polara can be used to reproduce experiments from the "EIGENREC: generalizing PureSVD for effective and efficient top-N recommendations" paper.

How to cite

If you find this framework useful for your research, please cite the following paper:

"HybridSVD: when collaborative information is not enough"; Evgeny Frolov and Ivan Oseledets, 2019. In Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). ACM, New York, NY, USA, 331-339.

polara's People

Contributors

evfro


polara's Issues

polara.preprocessing module not found

Hi! I cloned the git repository with polara and ran setup.py in the appropriate folder. But then, trying to import it, I got a ModuleNotFoundError: No module named 'polara.preprocessing'.
Are there any hints on how to use it, or is it a bug?

Get recommendation from PureSVD model given user's preference

Hi @evfro ,

I want to get recommendations based on a user's preferences, not a user_id. For example, there are 10 items in the training data, and a (new) user's preference vector is [0,1,0,0,0,0,1,1,0,0]. Theoretically, we just multiply the vector by the item factor matrix (p · V · Vᵀ). How can I achieve that?
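
For reference, a minimal NumPy/SciPy sketch of the p · V · Vᵀ fold-in mentioned above (the toy data and variable names are hypothetical; this is not Polara's API):

import numpy as np
from scipy.sparse.linalg import svds

R = np.random.rand(100, 10)   # toy user-item matrix: 100 known users, 10 items
_, _, vt = svds(R, k=5)       # PureSVD item factors; vt has shape (k, n_items)
V = vt.T                      # item factor matrix V, shape (n_items, k)

p = np.array([0, 1, 0, 0, 0, 0, 1, 1, 0, 0], dtype=float)  # new user's preferences
scores = p @ V @ V.T          # fold-in: p V V^T gives predicted scores for all items
top_items = np.argsort(-scores)[:5]  # indices of the top-5 recommended items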

TypeError: 'int' object is not iterable

Hi,
While running data_model.prepare(), there is an error on the following line.
features_melted = self.item_features.agg(lambda x: [f for l in x for f in l], axis=1)

Error is TypeError: 'int' object is not iterable

However, I tried converting every element in the item_tag dataframe to str and then into an np array, but it doesn't seem to work.

Conda and Pip package

I just found that there is no package in the Conda and PyPI repositories. It would be great to put it into both registries :)

ModuleNotFoundError: No module named 'polara'


ModuleNotFoundError Traceback (most recent call last)
Input In [50], in
----> 1 from polara.recommender.coldstart.models import LightFMItemColdStart

ModuleNotFoundError: No module named 'polara'

Even after doing pip install polara, I am still facing the issue.

NotImplementedError: Data has duplicate values

data_model = ItemColdStartData(
    training_data,
    *training_data.columns,  # userid, itemid
    item_features=content_feature_df,
    seed=seed)

print(data_model)

Here I'm getting the error: NotImplementedError: Data has duplicate values

My dataframe has multiple entries per user and I can't drop them. Any help here?


Precision@K

I am currently working with an implicit feedback dataset and trying to evaluate a model based on the precision@K measure. I am having difficulties understanding the implemented precision calculation in the case when the list of recommendations consists only of the top-K recommendations.

@evfro Can you please clarify if the calculated precision is a precision@K or a regular precision measure?

ModuleNotFoundError: No module named 'polara.recommender.hybrid'

Hi @evfro , I love this great Polara framework!

However, when I run the ScaledSVD notebook (https://github.com/evfro/recsys19_hybridsvd/blob/master/ScaledSVD.ipynb), I get this issue:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-3b2c68df4112> in <module>
      8                                          set_config)
      9 
---> 10 from scaledsvd import ScaledSVD, ScaledSVDItemColdStart
     11 from data_preprocessing import (get_movielens_data,
     12                                 get_bookcrossing_data,

~/bukalapak/recsys19_hybridsvd/scaledsvd.py in <module>
      3 from scipy.sparse.linalg import norm as spnorm
      4 from polara import SVDModel
----> 5 from polara.recommender.coldstart.models import SVDModelItemColdStart
      6 
      7 

~/miniconda3/envs/erwin-svd/lib/python3.6/site-packages/polara-0.6.5.dev0-py3.6.egg/polara/recommender/coldstart/models.py in <module>
      5 from polara import SVDModel
      6 from polara.recommender.models import RecommenderModel, ScaledMatrixMixin
----> 7 from polara.recommender.hybrid.models import LCEModel, HybridSVD
      8 from polara.recommender.external.lightfm.lightfmwrapper import LightFMWrapper
      9 from polara.lib.similarity import stack_features

ModuleNotFoundError: No module named 'polara.recommender.hybrid'

How can I solve this issue? Thanks!
