
amazingdd / daisyrec

545 stars · 13 watchers · 86 forks · 13.16 MB

Official code for "DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation" (TPAMI 2022) and "Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison" (RecSys 2020)

License: MIT License

Python 99.21% Perl 0.42% Shell 0.38%
recommender-system matrix-factorization factorization-machines item2vec k-nearest-neighbors pytorch slim neural-collaborative-filtering svdpp biasmf collaborative-filtering neumf deepfm nfm cdae vae afm ease

daisyrec's Issues

JSONDecode Error

I'm on the dev branch, so that may be the issue, but I'm just trying to get a decent baseline for the EASE algorithm. Using the recommended command generator, I tried to run the command python tune.py --optimization_metric=ndcg --hyperopt_trail=20 --algo_name=ease --dataset=ml-100k --prepro=origin --topk=50 --epochs=50 --test_size=0.2 --val_size=0.1 --cand_num=1000 --test_method=tsbr --val_method=tsbr --tune_pack='{}' but it threw the following error:

11 Sep 20:03 INFO - {'gpu': '0', 'seed': 2022, 'reproducibility': True, 'state': None, 'optimization_metric': 'ndcg', 'hyperopt_trail': 20, 'tune_testset': False, 'tune_pack': "'{}'", 'algo_name': 'ease', 'val_method': 'tsbr', 'test_method': 'tsbr', 'fold_num': 1, 'val_size': 0.1, 'test_size': 0.2, 'topk': 50, 'cand_num': 1000, 'sample_method': 'uniform', 'sample_ratio': 0, 'num_ng': 4, 'batch_size': 256, 'loss_type': 'BPR', 'init_method': 'default', 'optimizer': 'default', 'early_stop': False, 'data_path': 'data/', 'res_path': None, 'dataset': 'ml-100k', 'prepro': 'origin', 'level': 'ui', 'UID_NAME': 'user', 'IID_NAME': 'item', 'INTER_NAME': 'rating', 'TID_NAME': 'timestamp', 'binary_inter': True, 'positive_threshold': None, 'metrics': ['recall', 'mrr', 'ndcg', 'hit', 'precision'], 'reg': 200.0}
Traceback (most recent call last):
  File "daisyRec\tune.py", line 106, in <module>
    param_dict = json.loads(config['tune_pack'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
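For what it's worth, the logged config shows 'tune_pack' arriving as the literal string "'{}'" (single quotes included), which json.loads cannot parse; cmd.exe, unlike POSIX shells, does not strip single quotes from arguments. A minimal sketch of the failure and one possible workaround (the quote-stripping step is my own suggestion, not the repository's code):

```python
import json

# On Windows cmd.exe the single quotes are passed through literally,
# so the option value is the string "'{}'" rather than "{}".
raw = "'{}'"

parse_failed = False
try:
    json.loads(raw)  # fails: '{}' (with quotes) is not valid JSON
except json.JSONDecodeError:
    parse_failed = True

# One workaround: strip surrounding quote characters before parsing,
# or invoke the script with --tune_pack={} (no quotes) on Windows.
params = json.loads(raw.strip("'\""))
```

Running the same command with --tune_pack={} (unquoted) on Windows, or keeping the quotes on a POSIX shell, should avoid the error.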

Paper availability?

Hi,

I'm interested in reading the associated paper "Are We Evaluating Rigorously?" but I can't seem to find it anywhere. Is it available openly?

Thanks

Yelp dataset statistics

Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has a different number of users, items, and interactions than the numbers you report in the paper.

I downloaded the dataset from the official site and from Kaggle, and both sources have 160,585 items, 2,189,457 users, and 8,635,403 interactions (as stated on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for Yelp, and your code corresponds to this (no filtering by rating).

Could you please tell me where to get the data, or how to prepare it, so that I end up with the identical dataset and can compare my model with your baselines?

Best regards!

the parameter test_method='ufo' in daisy.utils.splitter.split_test()

In your function split_test(), the parameter test_method='ufo' means: split by ratio at the user level.
I think 'ufo' should put 20% of each user's interaction records into the test set. Instead, your function takes all interactions of 20% of the users as the test set, so the embeddings of the users in the test set are never trained...
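A sketch of the split the issue author expects: a per-user ratio split that holds out the tail of each user's time-ordered history, so every user still appears in training. This is a plain-Python illustration under my own assumptions (tuples of user, item, timestamp), not the repository's splitter:

```python
from collections import defaultdict

def split_by_user_ratio(interactions, test_size=0.2):
    """Per-user ratio split: hold out the last `test_size` fraction of
    each user's time-ordered interactions, so every user appears in
    both train and test. `interactions` is a list of (user, item, ts)."""
    by_user = defaultdict(list)
    for rec in sorted(interactions, key=lambda r: (r[0], r[2])):
        by_user[rec[0]].append(rec)
    train, test = [], []
    for user, recs in by_user.items():
        n_test = max(1, int(len(recs) * test_size))  # at least one test item
        train.extend(recs[:-n_test])
        test.extend(recs[-n_test:])
    return train, test
```

Under this scheme every test user has training interactions, so its embedding is learned; splitting whole users into the test set instead produces cold-start users by construction.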

ModuleNotFoundError: No module named '...'

I followed the instructions to run the project, but immediately ran into ModuleNotFoundError: No module named 'optuna'. Did I miss an install somewhere?

If I didn't miss anything, which version of optuna should I use, and could it be added to requirements.txt?

about ctr prediction metric

This is a good project!
Could you add some CTR prediction metrics such as AUC or F1?
I think adding them would be easy, and it would make the project even more helpful.
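For reference, both metrics can be computed from binary labels and scores without extra dependencies. A hedged sketch in plain Python (illustrative only, not the project's metrics API; AUC uses the Mann-Whitney pairwise formulation):

```python
def auc_score(labels, scores):
    """Probability that a random positive is scored above a random
    negative (ties count as half), i.e. the Mann-Whitney statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

def f1_score(labels, preds):
    """Harmonic mean of precision and recall over binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In practice, scikit-learn's roc_auc_score and f1_score provide the same quantities with more edge-case handling.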

Why can MRR be bigger than 1?

Hello, I have read your paper and think it is fantastic work. However, I noticed that the MRR metric is greater than 1 in some of your results, for example on ML-1M. In my view it cannot be greater than 1. Could you clear up my confusion? Thanks!

Qidong Liu
Best wishes!
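As context for the question above: under the standard definition, MRR averages the reciprocal rank of the first relevant item per user, so each per-user term, and hence the mean, is at most 1; values above 1 typically arise when an implementation sums reciprocal ranks over all hits instead of only the first. A sketch of the standard definition (illustrative, not the repository's code):

```python
def mrr_at_k(ranked_lists, ground_truth, k=10):
    """Standard MRR@k: for each user, take the reciprocal rank of the
    FIRST relevant item in the top-k list (0 if none), then average.
    Each per-user term is at most 1, so MRR@k <= 1 by construction."""
    total = 0.0
    for user, ranked in ranked_lists.items():
        relevant = ground_truth.get(user, set())
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first hit contributes
    return total / len(ranked_lists)
```

Summing 1/rank over every relevant item in the list (rather than breaking at the first hit) is one plausible way a reported MRR can exceed 1.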
