
amazingdd / daisyrec

545 stars · 13 watchers · 86 forks · 13.16 MB

Official code for "DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation" (TPAMI 2022) and "Are We Evaluating Rigorously? Benchmarking Recommendation for Reproducible Evaluation and Fair Comparison" (RecSys 2020)

License: MIT License

Python 99.21% Perl 0.42% Shell 0.38%
recommender-system matrix-factorization factorization-machines item2vec k-nearest-neighbors pytorch slim neural-collaborative-filtering svdpp biasmf collaborative-filtering neumf deepfm nfm cdae vae afm ease

daisyrec's Issues

JSONDecode Error

I'm on the dev branch, so that may be the issue, but I'm just trying to get a decent baseline for the EASE algorithm. Using the recommended command generator, I tried to run the command python tune.py --optimization_metric=ndcg --hyperopt_trail=20 --algo_name=ease --dataset=ml-100k --prepro=origin --topk=50 --epochs=50 --test_size=0.2 --val_size=0.1 --cand_num=1000 --test_method=tsbr --val_method=tsbr --tune_pack='{}' but it threw the following error:

11 Sep 20:03 INFO - {'gpu': '0', 'seed': 2022, 'reproducibility': True, 'state': None, 'optimization_metric': 'ndcg', 'hyperopt_trail': 20, 'tune_testset': False, 'tune_pack': "'{}'", 'algo_name': 'ease', 'val_method': 'tsbr', 'test_method': 'tsbr', 'fold_num': 1, 'val_size': 0.1, 'test_size': 0.2, 'topk': 50, 'cand_num': 1000, 'sample_method': 'uniform', 'sample_ratio': 0, 'num_ng': 4, 'batch_size': 256, 'loss_type': 'BPR', 'init_method': 'default', 'optimizer': 'default', 'early_stop': False, 'data_path': 'data/', 'res_path': None, 'dataset': 'ml-100k', 'prepro': 'origin', 'level': 'ui', 'UID_NAME': 'user', 'IID_NAME': 'item', 'INTER_NAME': 'rating', 'TID_NAME': 'timestamp', 'binary_inter': True, 'positive_threshold': None, 'metrics': ['recall', 'mrr', 'ndcg', 'hit', 'precision'], 'reg': 200.0}
Traceback (most recent call last):
  File "daisyRec\tune.py", line 106, in <module>
    param_dict = json.loads(config['tune_pack'])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1520.0_x64__qbz5n2kfra8p0\Lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
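For what it's worth, the logged config shows 'tune_pack' arriving as the literal string "'{}'" (single quotes included), which json.loads cannot parse; cmd.exe, unlike POSIX shells, does not strip single quotes from arguments. A minimal sketch of the failure and one possible workaround (the quote-stripping step is my own suggestion, not the repository's code):

```python
import json

# On Windows cmd.exe the single quotes are passed through literally,
# so the option value is the string "'{}'" rather than "{}".
raw = "'{}'"

parse_failed = False
try:
    json.loads(raw)  # fails: '{}' (with quotes) is not valid JSON
except json.JSONDecodeError:
    parse_failed = True

# One workaround: strip surrounding quote characters before parsing,
# or invoke the script with --tune_pack={} (no quotes) on Windows.
params = json.loads(raw.strip("'\""))
```

Running the same command with --tune_pack={} (unquoted) on Windows, or keeping the quotes on a POSIX shell, should avoid the error.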

Paper availability?

Hi,

I'm interested in reading the associated paper "Are We Evaluating Rigorously?" but I can't seem to find it anywhere. Is it available openly?

Thanks

Yelp dataset statistics

Hi! I want to compare my model's results with your benchmark on the Yelp dataset, but the dataset I found has a different number of users, items, and interactions than the numbers you report in the paper.

I downloaded the dataset from the official site and from Kaggle, and both sources have 160,585 items, 2,189,457 users, and 8,635,403 interactions (as stated on the site). But you report 174,567 items, 1,326,101 users, and 5,261,669 interactions for the original Yelp dataset in the paper (Table 1). The paper says you considered all interactions with rating >= 1 for Yelp, and your code corresponds to this (no filtering by rating).

Could you please tell me where to get the data, or how to prepare it, so that I end up with the identical dataset and can compare my model with your baselines?

Best regards!

the parameter test_method='ufo' in daisy.utils.splitter.split_test()

In your function split_test(), the parameter test_method='ufo' means: split by ratio at the user level.
I think 'ufo' should put 20% of each user's interaction records into the test set. Instead, your function takes all interactions of 20% of the users as the test set, so the embeddings of the users in the test set are never trained...
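A sketch of the split the issue author expects: a per-user ratio split that holds out the tail of each user's time-ordered history, so every user still appears in training. This is a plain-Python illustration under my own assumptions (tuples of user, item, timestamp), not the repository's splitter:

```python
from collections import defaultdict

def split_by_user_ratio(interactions, test_size=0.2):
    """Per-user ratio split: hold out the last `test_size` fraction of
    each user's time-ordered interactions, so every user appears in
    both train and test. `interactions` is a list of (user, item, ts)."""
    by_user = defaultdict(list)
    for rec in sorted(interactions, key=lambda r: (r[0], r[2])):
        by_user[rec[0]].append(rec)
    train, test = [], []
    for user, recs in by_user.items():
        n_test = max(1, int(len(recs) * test_size))  # at least one test item
        train.extend(recs[:-n_test])
        test.extend(recs[-n_test:])
    return train, test
```

Under this scheme every test user has training interactions, so its embedding is learned; splitting whole users into the test set instead produces cold-start users by construction.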

ModuleNotFoundError: No module named '...'

I followed the instructions to run the project, but immediately ran into ModuleNotFoundError: No module named 'optuna'. Did I miss an install somewhere?

If I didn't miss anything, which version of optuna should I use, and could it be added to requirements.txt?

about ctr prediction metric

This is a good project!
Could you add some CTR prediction metrics such as AUC or F1?
I think adding them would be easy, and it would make the project even more helpful.
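For reference, both metrics can be computed from binary labels and scores without extra dependencies. A hedged sketch in plain Python (illustrative only, not the project's metrics API; AUC uses the Mann-Whitney pairwise formulation):

```python
def auc_score(labels, scores):
    """Probability that a random positive is scored above a random
    negative (ties count as half), i.e. the Mann-Whitney statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    hits = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return hits / (len(pos) * len(neg))

def f1_score(labels, preds):
    """Harmonic mean of precision and recall over binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In practice, scikit-learn's roc_auc_score and f1_score provide the same quantities with more edge-case handling.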

Why can MRR be bigger than 1?

Hello, I have read your paper and think it is fantastic work. However, I noticed that the MRR metric is greater than 1 in some of your results, for example on ML-1M. In my view it cannot be greater than 1. Could you clear up my confusion? Thanks!

Qidong Liu
Best wishes!
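As context for the question above: under the standard definition, MRR averages the reciprocal rank of the first relevant item per user, so each per-user term, and hence the mean, is at most 1; values above 1 typically arise when an implementation sums reciprocal ranks over all hits instead of only the first. A sketch of the standard definition (illustrative, not the repository's code):

```python
def mrr_at_k(ranked_lists, ground_truth, k=10):
    """Standard MRR@k: for each user, take the reciprocal rank of the
    FIRST relevant item in the top-k list (0 if none), then average.
    Each per-user term is at most 1, so MRR@k <= 1 by construction."""
    total = 0.0
    for user, ranked in ranked_lists.items():
        relevant = ground_truth.get(user, set())
        for rank, item in enumerate(ranked[:k], start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first hit contributes
    return total / len(ranked_lists)
```

Summing 1/rank over every relevant item in the list (rather than breaking at the first hit) is one plausible way a reported MRR can exceed 1.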
