disigma / recommenders

This project forked from recommenders-team/recommenders

Best Practices on Recommendation Systems

Home Page: https://microsoft-recommenders.readthedocs.io/en/latest/

License: MIT License

Languages: Jupyter Notebook 73.87%, Python 26.01%, Dockerfile 0.12%

Recommenders

This repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The examples detail our learnings on five key tasks:

  • Prepare Data: Preparing and loading data for each recommender algorithm
  • Model: Building models using various classical and deep learning recommender algorithms such as Alternating Least Squares (ALS) or eXtreme Deep Factorization Machines (xDeepFM).
  • Evaluate: Evaluating algorithms with offline metrics
  • Model Select and Optimize: Tuning and optimizing hyperparameters for recommender models
  • Operationalize: Operationalizing models in a production environment on Azure

Several utilities are provided in reco_utils to support common tasks such as loading datasets in the format expected by different algorithms, evaluating model outputs, and splitting training/test data. Implementations of several state-of-the-art algorithms are included for self-study and customization in your own applications. See the reco_utils documentation.
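
As a rough illustration of the train/test splitting task mentioned above, here is a minimal standard-library sketch of a per-user stratified split. It does not use reco_utils; the function name `stratified_split`, the tuple layout, and the toy data are assumptions made for this example only.

```python
import random
from collections import defaultdict


def stratified_split(interactions, ratio=0.75, seed=42):
    """Split (user, item, rating) tuples so that roughly `ratio` of each
    user's rows land in the training set (hypothetical helper)."""
    rng = random.Random(seed)

    # Group rows by user so the split ratio is applied per user.
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)
        cut = round(len(rows) * ratio)
        train.extend(rows[:cut])
        test.extend(rows[cut:])
    return train, test


# Toy data (illustrative only): two users with four interactions each.
data = [(u, i, 1.0) for u in ("u1", "u2") for i in ("a", "b", "c", "d")]
train, test = stratified_split(data, ratio=0.75)
print(len(train), len(test))
```

Splitting per user, rather than over the whole dataset at once, keeps every user represented in both sets, which matters when ranking metrics are later computed per user.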

For a more detailed overview of the repository, please see the documents at the wiki page.

Getting Started

Please see the setup guide for more details on setting up your machine locally, on Spark, or on Azure Databricks.

To set up on your local machine:

  1. Install Anaconda with Python >= 3.6. Miniconda is a quick way to get started.
  2. Clone the repository
    git clone https://github.com/Microsoft/Recommenders
    
  3. Run the conda file generation script to create a conda environment. This creates a basic Python environment; see SETUP.md for PySpark and GPU environment setup.
    cd Recommenders
    python scripts/generate_conda_file.py
    conda env create -f reco_base.yaml  
    
  4. Activate the conda environment and register it with Jupyter:
    conda activate reco_base
    python -m ipykernel install --user --name reco_base --display-name "Python (reco)"
    
  5. Start the Jupyter notebook server
    cd notebooks
    jupyter notebook
    
  6. Run the SAR Python CPU MovieLens notebook under the 00_quick_start folder. Make sure to change the kernel to "Python (reco)".

NOTE - The Alternating Least Squares (ALS) notebooks require a PySpark environment to run. Please follow the steps in the setup guide to run these notebooks in a PySpark environment.

Install this repository via PIP

A setup.py file is provided to simplify installation of the utilities in this repo from the main directory. This still requires the conda environment to be installed as described above. Once the necessary dependencies are installed, you can use the following command to install reco_utils as its own Python package.

pip install -e reco_utils

It is also possible to install directly from GitHub, either from the default branch or from a specific branch:

pip install -e git+https://github.com/microsoft/recommenders/#egg=pkg\&subdirectory=reco_utils
pip install -e git+https://github.com/microsoft/recommenders/@staging#egg=pkg\&subdirectory=reco_utils

NOTE - The pip installation does not install any of the necessary package dependencies; conda is expected to be used as shown above to set up the environment for the utilities being used.

Algorithms

The table below lists the recommender algorithms currently available in the repository. Notebooks are linked under the Environment column when different implementations are available.

Algorithm | Environment | Type | Description
--- | --- | --- | ---
Alternating Least Squares (ALS) | PySpark | Collaborative Filtering | Matrix factorization algorithm for explicit or implicit feedback in large datasets, optimized by Spark MLLib for scalability and distributed computing capability
Deep Knowledge-Aware Network (DKN)* | Python CPU / Python GPU | Content-Based Filtering | Deep learning algorithm incorporating a knowledge graph and article embeddings to provide powerful news or article recommendations
Extreme Deep Factorization Machine (xDeepFM)* | Python CPU / Python GPU | Hybrid | Deep learning based algorithm for implicit and explicit feedback with user/item features
FastAI Embedding Dot Bias (FAST) | Python CPU / Python GPU | Collaborative Filtering | General purpose algorithm with embeddings and biases for users and items
LightGBM/Gradient Boosting Tree* | Python CPU / PySpark | Content-Based Filtering | Gradient Boosting Tree algorithm for fast training and low memory usage in content-based problems
Neural Collaborative Filtering (NCF) | Python CPU / Python GPU | Collaborative Filtering | Deep learning algorithm with enhanced performance for implicit feedback
Restricted Boltzmann Machines (RBM) | Python CPU / Python GPU | Collaborative Filtering | Neural network based algorithm for learning the underlying probability distribution for explicit or implicit feedback
Riemannian Low-rank Matrix Completion (RLRMC)* | Python CPU | Collaborative Filtering | Matrix factorization algorithm using Riemannian conjugate gradients optimization with small memory consumption
Simple Algorithm for Recommendation (SAR)* | Python CPU | Collaborative Filtering | Similarity-based algorithm for implicit feedback dataset
Surprise/Singular Value Decomposition (SVD) | Python CPU | Collaborative Filtering | Matrix factorization algorithm for predicting explicit rating feedback in datasets that are not very large
Vowpal Wabbit Family (VW)* | Python CPU (online training) | Content-Based Filtering | Fast online learning algorithms, great for scenarios where user features / context are constantly changing
Wide and Deep | Python CPU / Python GPU | Hybrid | Deep learning algorithm that can memorize feature interactions and generalize user features

NOTE: * indicates algorithms invented/contributed by Microsoft.
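
To give a feel for what "similarity-based" collaborative filtering on implicit feedback can look like, the sketch below scores unseen items by how often they co-occur with items a user has already interacted with. This is a generic illustration, not the repository's SAR implementation; the function names and the toy `history` data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(user_items):
    """Count, for each ordered item pair, how many users interacted with both."""
    counts = defaultdict(int)
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(user, user_items, counts, top_k=10):
    """Rank unseen items by their summed co-occurrence with the user's items."""
    seen = set(user_items[user])
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in seen and b not in seen:
            scores[b] += c
    # Sort by score descending, then by item id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


# Toy implicit-feedback history (illustrative only).
history = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["b", "c", "d"]}
counts = cooccurrence(history)
recs = recommend("u1", history, counts, top_k=2)
print(recs)  # ['c', 'd']
```

Raw co-occurrence counts favor popular items; real similarity-based recommenders typically normalize them in some way, which this sketch leaves out for brevity.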

Preliminary Comparison

We provide a benchmark notebook to illustrate how different algorithms can be evaluated and compared. In this notebook, the MovieLens dataset is split into training/test sets at a 75/25 ratio using a stratified split. A recommendation model is trained using each of the collaborative filtering algorithms below. We use empirical parameter values reported in the literature. For ranking metrics we use k=10 (top 10 recommended items). We run the comparison on a Standard NC6s_v2 Azure DSVM (6 vCPUs, 112 GB memory and 1 P100 GPU). Spark ALS is run in local standalone mode. The table below shows the results on MovieLens 100k, running the algorithms for 15 epochs.

Algo | MAP | nDCG@k | Precision@k | Recall@k | RMSE | MAE | R2 | Explained Variance
--- | --- | --- | --- | --- | --- | --- | --- | ---
ALS | 0.004732 | 0.044239 | 0.048462 | 0.017796 | 0.965038 | 0.753001 | 0.255647 | 0.251648
SVD | 0.012873 | 0.095930 | 0.091198 | 0.032783 | 0.938681 | 0.742690 | 0.291967 | 0.291971
SAR | 0.113028 | 0.388321 | 0.333828 | 0.183179 | N/A | N/A | N/A | N/A
NCF | 0.107720 | 0.396118 | 0.347296 | 0.180775 | N/A | N/A | N/A | N/A
FastAI | 0.025503 | 0.147866 | 0.130329 | 0.053824 | 0.943084 | 0.744337 | 0.285308 | 0.287671
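
For readers unfamiliar with the column names, the following sketch shows one common way to compute several of these metrics for a single user using only the standard library. Metric definitions vary between libraries, so these functions may not match the benchmark notebook's implementation exactly; all names and the toy inputs are assumptions for illustration.

```python
import math


def precision_at_k(recommended, relevant, k=10):
    """Fraction of the k recommendation slots filled with relevant items."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def recall_at_k(recommended, relevant, k=10):
    """Fraction of the relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def rmse(pairs):
    """Root mean squared error over (actual, predicted) rating pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (actual, predicted) rating pairs."""
    return sum(abs(a - p) for a, p in pairs) / len(pairs)


# Toy inputs (illustrative only).
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
ratings = [(4, 3), (2, 2), (5, 3)]

print(precision_at_k(recommended, relevant, k=4))  # 0.5
print(recall_at_k(recommended, relevant, k=4))
print(ndcg_at_k(recommended, relevant, k=4))
print(rmse(ratings), mae(ratings))
```

The ranking metrics need only a ranked list and a set of relevant items, while RMSE and MAE need predicted ratings. That is consistent with the N/A entries for SAR and NCF in the table, where no rating metrics are reported.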

Contributing

This project welcomes contributions and suggestions. Before contributing, please see our contribution guidelines.

Build Status

Build Type | Branch | Status | Branch | Status
--- | --- | --- | --- | ---
Linux CPU | master | Status | staging | Status
Linux GPU | master | Status | staging | Status
Linux Spark | master | Status | staging | Status
Windows CPU | master | Status | staging | Status
Windows GPU | master | Status | staging | Status
Windows Spark | master | Status | staging | Status

AzureML Build Status

These DevOps pipelines run the existing tests on AzureML.

Build Type | Branch | Status | Branch | Status
--- | --- | --- | --- | ---
nightly_cpu_tests | master | Build Status | Staging | Build Status
nightly_gpu_tests | master | Build Status | Staging | Build Status

NOTE - These tests are the nightly builds, which run the smoke and integration tests. Master is our main branch and staging is our development branch. We use pytest for testing Python utilities in reco_utils and papermill for the notebooks. For more information about the testing pipelines, please see the test documentation.

Contributors

abhirame, almudenasanz, anargyri, az0, bethz, chenhuims, danielsc, datashinobi, dciborow, eisber, gcampanella, gramhagen, heatherbshapiro, jingyanwangms, jreynolds01, leavingseason, loomlike, maxkazmsft, microsoftopensource, miguelgfierro, motefly, nicolashug, nikhilrj, pratikjawanpuria, roalexan, tandav, wesszumino, wutaomsft, yueguoguo, zegerius
