MatchBox

Industrial recommender systems generally have two main stages: matching and ranking. In the first stage, candidate item matching (also known as candidate retrieval) aims for efficient, high-recall retrieval from a large item corpus. MatchBox is an open-source library for candidate item matching that emphasizes configurability, tunability, and reproducibility.
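
To make the matching stage concrete, here is a minimal sketch of high-recall retrieval: score every item against a user vector by dot product, keep the top k, and measure recall against a set of known relevant items. The function names and toy vectors are illustrative assumptions and are not part of the MatchBox API.

```python
import heapq


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(user_vec, item_vecs, k):
    """Return the ids of the k items with the highest dot-product score."""
    scored = ((dot(user_vec, vec), item_id) for item_id, vec in item_vecs.items())
    return [item_id for _, item_id in heapq.nlargest(k, scored)]


def recall_at_k(retrieved, relevant):
    """Fraction of relevant items that appear in the retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(relevant)


# Toy embeddings (hypothetical values, chosen only for illustration).
item_vecs = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
    "d": [-1.0, 0.0],
}
user_vec = [0.9, 0.1]

top2 = retrieve_top_k(user_vec, item_vecs, k=2)
print(top2, recall_at_k(top2, {"a", "b"}))
```

A real system would swap the exhaustive scan for an approximate nearest-neighbor index once the item corpus grows large. The recall computation stays the same.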

Model Zoo

Publication | Model | Paper | Benchmark
UAI'09 | MF-BPR | BPR: Bayesian Personalized Ranking from Implicit Feedback | ↗️
RecSys'16 | YoutubeNet | Deep Neural Networks for YouTube Recommendations | ↗️
CIKM'21 | MF-CCL / SimpleX | SimpleX: A Simple and Strong Baseline for Collaborative Filtering | ↗️
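
As a rough illustration of what the MF-BPR entry optimizes, the sketch below computes the pairwise BPR loss for one (user, positive item, negative item) triple, -ln σ(x_ui - x_uj), with matrix-factorization scores taken as dot products. It omits the regularization term, uses hypothetical toy vectors, and is not the MatchBox implementation.

```python
import math


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def bpr_loss(pos_score, neg_score):
    """Pairwise BPR loss: -ln(sigmoid(pos_score - neg_score))."""
    return softplus(-(pos_score - neg_score))


def mf_score(user_vec, item_vec):
    """Matrix-factorization score: dot product of user and item embeddings."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


# Hypothetical embeddings for one user, one observed item, one sampled negative.
user = [0.5, 1.0]
pos_item = [1.0, 1.0]   # interacted item, should score higher
neg_item = [1.0, -1.0]  # sampled non-interacted item

loss = bpr_loss(mf_score(user, pos_item), mf_score(user, neg_item))
print(loss)
```

The loss shrinks as the positive item's score pulls ahead of the negative item's score, and it equals ln 2 when the two scores are tied.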

Dependency

We recommend the following environment, which is the only one MatchBox has been tested on.

  • CUDA 10.0
  • python 3.6
  • pytorch 1.0
  • PyYAML
  • pandas
  • scikit-learn
  • numpy
  • h5py
  • tqdm

Get Started

The code workflow is structured as follows:

# Set the dataset config and model config
feature_cols = [{...}] # define feature columns
label_col = {...} # define label column
params = {...} # set data params and model params

# Set the feature encoding specs
feature_encoder = FeatureEncoder(feature_cols, label_col, ...) # define the feature encoder
datasets.build_dataset(feature_encoder, ...) # fit feature_encoder and build dataset 

# Load data generators
train_gen, valid_gen, test_gen = datasets.h5_generator(feature_encoder, ...)

# Define a model
model = SimpleX(...)

# Train the model
model.fit(train_gen, valid_gen, ...)

# Evaluation
model.evaluate(test_gen)

Run the code

To reproduce the experimental results, run the benchmarking scripts with the corresponding configs as follows.

  • --config: The directory containing the dataset config and the model config.
  • --expid: The experiment ID, defined in a model config file, that selects a group of hyper-parameters.
  • --gpu: The index of the GPU to use for the experiment; use -1 to run on CPU.

cd data/Yelp18/yelp18_m1
python matchbox_convert_data.py

cd model_zoo/SimpleX
python run_expid.py --config ./config/SimpleX_yelp18_m1 --expid SimpleX_yelp18_m1 --gpu 0

...

python run_expid.py --config ./config/SimpleX_amazonbooks_m1 --expid SimpleX_amazonbooks_m1 --gpu 0
python run_expid.py --config ./config/SimpleX_gowalla_m1 --expid SimpleX_gowalla_m1 --gpu 0

The running logs are also available in each config directory.
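
For readers who want to see how the three flags fit together, the sketch below parses them with argparse and maps the GPU index to a PyTorch-style device string. It is a hypothetical launcher written for illustration, not the actual contents of run_expid.py. Which flags are required and the default of -1 for --gpu are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the documented flags (hypothetical re-creation, not run_expid.py)."""
    parser = argparse.ArgumentParser(description="Illustrative experiment launcher")
    parser.add_argument("--config", type=str, required=True,
                        help="directory holding the dataset and model configs")
    parser.add_argument("--expid", type=str, required=True,
                        help="experiment id selecting a group of hyper-parameters")
    parser.add_argument("--gpu", type=int, default=-1,
                        help="GPU index, or -1 for CPU (default is an assumption)")
    return parser.parse_args(argv)


def device_name(gpu_index):
    """Map a GPU index to a PyTorch-style device string; negative means CPU."""
    return "cpu" if gpu_index < 0 else f"cuda:{gpu_index}"


args = parse_args([
    "--config", "./config/SimpleX_yelp18_m1",
    "--expid", "SimpleX_yelp18_m1",
    "--gpu", "0",
])
print(args.expid, device_name(args.gpu))
```

Passing --gpu -1 to this sketch yields the device string "cpu", matching the convention described in the option list.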

Contributors

acnowa, xpai, zhujiem
