
amazingdd / daisyrec

This is the repository of our article published in RecSys 2020 "Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison"

License: MIT License

Python 99.21% Perl 0.42% Shell 0.38%
recommender-system matrix-factorization factorization-machines item2vec k-nearest-neighbors pytorch slim neural-collaborative-filtering svdpp biasmf

daisyrec's Introduction


Overview

daisyRec is a Python toolkit developed for benchmarking top-N recommendation tasks. The name DAISY stands for multi-Dimension fAir comparIson for recommender SYstem.

The figure below shows the overall framework of DaisyRec-v2.0.

This repository is used for publishing. If you are interested in the details of our experiment ranking results, please refer to this repo file.

We really appreciate the following repositories for helping us improve the code efficiency:

How to Run

Make sure you have a CUDA environment for acceleration, since the deep-learning models can run on it.

1. Install from pip

pip install daisyRec

2. Clone from github

git clone https://github.com/AmazingDD/daisyRec.git && cd daisyRec
  • Example code is listed in run_examples; refer to it to find out how to use daisy. You can also run these examples by moving them into daisyRec/.

  • The GUI Command Generator for test.py and tune.py, which can help you quickly assemble arguments and run the fair-comparison experiments, is now available here.

    The generated command will look like this:

    python tune.py --param1=20 --param2=30 ....
    python test.py --param1=20 --param2=30 ....
    

    We highly recommend generating your commands with the GUI first!

Documentation

The documentation of DaisyRec is available here, which provides detailed explanations for all arguments.

Implemented Algorithms

Models in daisyRec only take triples <user, item, rating> into account, so FM-related models will be specialized accordingly. Below are the algorithms implemented in daisyRec. More baselines will be added later.
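To illustrate the triple-only input format, here is a minimal sketch of an interaction table (the column names `user`, `item`, `rating` follow the default UID_NAME/IID_NAME/INTER_NAME configuration; the exact schema daisyRec expects should be checked against the documentation):

```python
import pandas as pd

# A minimal interaction table of <user, item, rating> triples,
# the only kind of input daisyRec's models consume.
interactions = pd.DataFrame({
    'user':   [0, 0, 1, 2],
    'item':   [10, 42, 10, 7],
    'rating': [5.0, 3.0, 4.0, 1.0],
})
print(interactions.shape)  # (4, 3)
```

Side features such as user demographics or item attributes are not part of this format, which is why FM-family models are specialized to work with triples only.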

Model Publication
MostPop A re-visit of the popularity baseline in recommender systems
ItemKNN Item-based top-N recommendation algorithms
EASE Embarrassingly Shallow Autoencoders for Sparse Data
PureSVD Top-n recommender system via matrix completion
SLIM SLIM: Sparse Linear Methods for Top-N Recommender Systems
MF Matrix factorization techniques for recommender systems
FM Factorization Machines
NeuMF Neural Collaborative Filtering
NFM Neural Factorization Machines for Sparse Predictive Analytics
NGCF Neural Graph Collaborative Filtering
Multi-VAE Variational Autoencoders for Collaborative Filtering
Item2Vec Item2vec: neural item embedding for collaborative filtering
LightGCN LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation

Datasets

You can download the experiment data and put it into the data folder. All data are available at the links below:

Cite

Please cite both of the following papers if you use DaisyRec in a research paper in any way (e.g., code and ranking results):

@inproceedings{sun2020are,
  title={Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison},
  author={Sun, Zhu and Yu, Di and Fang, Hui and Yang, Jie and Qu, Xinghua and Zhang, Jie and Geng, Cong},
  booktitle={Proceedings of the 14th ACM Conference on Recommender Systems},
  year={2020}
}

@article{sun2022daisyrec,
  title={DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation},
  author={Sun, Zhu and Fang, Hui and Yang, Jie and Qu, Xinghua and Liu, Hongyang and Yu, Di and Ong, Yew-Soon and Zhang, Jie},
  journal={arXiv preprint arXiv:2206.10848},
  year={2022}
}

daisyrec's People

Contributors

amazingdd, gcong9, hyllll, mike-fzy, sunzhuntu, yoheikikuta


daisyrec's Issues

the parameter test_method='ufo' in daisy.utils.splitter.split_test()

In your function split_test(), the parameter test_method='ufo' means: split by ratio at the user level.
I think 'ufo' should put 20% of each user's interaction records into the test set. In your function, 20% of the users have all their interactions placed in the test set, so those users' embeddings are never trained...
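For reference, the per-user ratio split the reporter expects can be sketched as follows. This is an illustrative implementation only, not daisyRec's actual splitter, and the `user` column name is an assumption:

```python
import pandas as pd

def split_test_per_user(df, test_ratio=0.2):
    """Hold out the last `test_ratio` of EACH user's interactions,
    so every user still has interactions left in the training set."""
    test_parts = []
    for _, group in df.groupby('user'):
        n_test = max(1, int(len(group) * test_ratio))
        test_parts.append(group.tail(n_test))
    test = pd.concat(test_parts)
    train = df.drop(test.index)
    return train, test

df = pd.DataFrame({
    'user': [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    'item': range(10),
})
train, test = split_test_per_user(df)
# Every user appears in both splits, so no test user is unseen at training time.
assert set(train['user']) == set(test['user'])
```

By contrast, sampling 20% of *users* (rather than 20% of each user's interactions) produces test users whose embeddings are never updated during training, which is the concern raised above.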

about ctr prediction metric

This is a good project!
But could you add some CTR prediction metrics such as AUC or F1?
I think adding them would be easy, and it would make the toolkit more helpful.

Why can MRR be bigger than 1?

Hello, I have read your paper and found it a fantastic work. However, I find that the MRR metric is greater than 1 in your results, such as on ML1M. In my view, it cannot be greater than 1. Can you clear up my confusion? Thanks!
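For context: classic MRR averages one reciprocal rank per user (that of the first relevant item) and is therefore bounded by 1. If an implementation instead sums the reciprocal ranks of *all* relevant items in a list, the per-user value can exceed 1. A minimal illustration (this is a hypothesis about the discrepancy, not a statement of what daisyRec computes):

```python
def mrr_first_hit(rankings):
    """Classic MRR: reciprocal rank of the FIRST relevant item per list.
    `rankings` is a list of lists of 1-based ranks of relevant items."""
    total = 0.0
    for ranks in rankings:
        total += 1.0 / min(ranks) if ranks else 0.0
    return total / len(rankings)

def mrr_sum_hits(rankings):
    """Variant that sums reciprocal ranks of ALL relevant items per list."""
    total = 0.0
    for ranks in rankings:
        total += sum(1.0 / r for r in ranks)
    return total / len(rankings)

hits = [[1, 2, 4]]           # three relevant items at ranks 1, 2, 4
print(mrr_first_hit(hits))   # 1.0  (bounded by 1)
print(mrr_sum_hits(hits))    # 1.75 (can exceed 1)
```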

Qidong Liu
Best wishes!

Paper availability?

Hi,

I'm interested in reading the associated paper "Are We Evaluating Rigorously?" but I can't seem to find it anywhere. Is it available openly?

Thanks

Yelp dataset statistics

Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has different numbers of interactions/users/items from the numbers you report in the paper.

I downloaded the dataset from the site and from Kaggle, and both sources have 160,585 items, 2,189,457 users and 8,635,403 interactions (as on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for the Yelp dataset, and your code corresponds to this (no filtering by rating).

Could you please tell me where to get, or how to prepare, the data so that I can obtain an identical dataset and compare my model with your baselines?

Best regards!

JSONDecode Error

I'm on the dev branch, so that may be the issue, but I'm just trying to get a decent baseline for the EASE algorithm. Using the recommended command generator, I tried to run the command python tune.py --optimization_metric=ndcg --hyperopt_trail=20 --algo_name=ease --dataset=ml-100k --prepro=origin --topk=50 --epochs=50 --test_size=0.2 --val_size=0.1 --cand_num=1000 --test_method=tsbr --val_method=tsbr --tune_pack='{}' but then the output threw an error:

11 Sep 20:03 INFO - {'gpu': '0', 'seed': 2022, 'reproducibility': True, 'state': None, 'optimization_metric': 'ndcg', 'hyperopt_trail': 20, 'tune_testset': False, 'tune_pack': "'{}'", 'algo_name': 'ease', 'val_method': 'tsbr', 'test_method': 'tsbr', 'fold_num': 1, 'val_size': 0.1, 'test_size': 0.2, 'topk': 50, 'cand_num': 1000, 'sample_method': 'uniform', 'sample_ratio': 0, 'num_ng': 4, 'batch_size': 256, 'loss_type': 'BPR', 'init_method': 'default', 'optimizer': 'default', 'early_stop': False, 'data_path': 'data/', 'res_path': None, 'dataset': 'ml-100k', 'prepro': 'origin', 'level': 'ui', 'UID_NAME': 'user', 'IID_NAME': 'item', 'INTER_NAME': 'rating', 'TID_NAME': 'timestamp', 'binary_inter': True, 'positive_threshold': None, 'metrics': ['recall', 'mrr', 'ndcg', 'hit', 'precision'], 'reg': 200.0}
Traceback (most recent call last):
  File "daisyRec\tune.py", line 106, in <module>
    param_dict = json.loads(config['tune_pack'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
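The config dump above shows `'tune_pack': "'{}'"`, i.e. the value reached json.loads with literal single quotes around it, which is not valid JSON. This is a known hazard on Windows, where cmd.exe does not strip single quotes the way POSIX shells do. A defensive workaround (a sketch, not daisyRec's actual code) is to strip a surrounding quote pair before parsing:

```python
import json

raw = "'{}'"  # what tune.py actually received on Windows

# Parsing the raw value fails exactly as in the traceback above.
try:
    json.loads(raw)
except json.JSONDecodeError:
    pass  # "Expecting value: line 1 column 1 (char 0)"

# Strip stray surrounding quotes, then parse.
cleaned = raw.strip("'\"")
param_dict = json.loads(cleaned)
print(param_dict)  # {}
```

Alternatively, passing `--tune_pack={}` without any quotes, or using double quotes (`--tune_pack="{}"`), may avoid the problem on Windows shells.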

ModuleNotFoundError: No module named '...'

I followed the instructions to run the project, but immediately ran into ModuleNotFoundError: No module named 'optuna'. Did I miss an install somewhere?

If I didn't miss anything, which version of optuna should I use, and could it be included in requirements.txt?
