ustcml / recstudio

A highly-modularized and recommendation-efficient recommendation library based on PyTorch.

License: MIT License

Python 92.80% Jupyter Notebook 7.20%

collaborative-filtering ctr-prediction deep-learning factorization-machines knowledge-graph matrix-factorization pytorch recommender-system sequential-recommendation graph-neural-networks

recstudio's Issues

Wrong entry names in some dataset config files

Rows of inter_feat whose rating is lower than low_rating_thres are filtered out by the following code.

RecStudio/recstudio/data/dataset.py

Lines 486 to 487 in 2bd40a8

    
    def _filter(self, min_user_inter, min_item_inter):
        self._filter_ratings(self.config.get('low_rating_thres', None))

However, the corresponding entries are misspelled as low_rating_threshold in the following dataset config files:

amazon-beauty
amazon-books
amazon-electronics
gowalla
ml-10m
ml-20m
tmall
yelp
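
The effect of the mismatch can be sketched with a plain dictionary standing in for a dataset config. The dictionary contents and the `normalize_config` helper are hypothetical and not part of RecStudio; the point is only that a lookup with `low_rating_thres` silently falls back to `None` when the file defines `low_rating_threshold` instead, so no rating filter would be applied.

```python
# Hypothetical stand-in for a dataset config loaded from one of the files above.
config = {"low_rating_threshold": 3.0}

# Same lookup as in _filter: the misspelled entry is never read.
print(config.get("low_rating_thres", None))  # prints None


def normalize_config(cfg):
    """Return a copy of cfg with the misspelled key renamed (illustrative helper)."""
    fixed = dict(cfg)
    if "low_rating_threshold" in fixed and "low_rating_thres" not in fixed:
        fixed["low_rating_thres"] = fixed.pop("low_rating_threshold")
    return fixed


fixed_config = normalize_config(config)
```

Renaming the entry in the config files themselves is the more direct fix; the helper only shows what the corrected lookup would see.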

Recover the thread number limit for PyTorch

In SeqDataset, training speed is limited by data loading because of the padding operation. Maybe we could use more threads in the padding procedure.

Failed to open http://recstudio.org.cn/

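The official website cannot be opened, so I cannot get the information in the official documentation.
I am using this library and some parts of it are unclear to me. I would appreciate some help, or access to the documentation.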

AE models output invalid results on part of datasets

I find that MultiVAE and MultiDAE both output nan recall on ml-1m, while they perform correctly on ml-100k and gowalla. BPR (an MF model) and LightGCN (a graph model) behave normally on all three datasets, so I guess the problem may lie in the AE models.

Feature names in different tables are not allowed to be the same.

When the same feature name appears in two different tables (e.g. once in the user information table and once in the item information table), one of the two features is silently lost.

For example, suppose there is a column named category in both user.csv and item.csv. When I want to get the values of both columns, the value of one category column is overwritten by the other.
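
The overwrite can be reproduced with two plain dictionaries. This is a minimal sketch, not RecStudio's actual merging code: the rows, the values, and the `merge_with_prefix` helper are all hypothetical, and prefixing by table name is just one possible way to keep both columns.

```python
# Hypothetical rows from user.csv and item.csv that share the column name "category".
user_row = {"user_id": 1, "category": "premium"}
item_row = {"item_id": 42, "category": "books"}

# Merging by plain field name keeps only the value written last.
merged = {**user_row, **item_row}


def merge_with_prefix(tables):
    """Merge rows from several tables, prefixing each field with its table name."""
    out = {}
    for table_name, row in tables.items():
        for field, value in row.items():
            out[f"{table_name}.{field}"] = value
    return out


safe = merge_with_prefix({"user": user_row, "item": item_row})
```

Here `merged["category"]` holds only the item value, while `safe` keeps both under `user.category` and `item.category`.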

Support float seq (vector) features in dataset

Sometimes we use datasets that contain vector-type features. For example, if we want to use embeddings generated by a language model, the feature would be an embedding. Therefore, we may need support for a float sequence type.
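
As a rough illustration of what a float sequence field could involve, the sketch below parses a delimited string cell into a list of floats and pads it to a fixed length. The function names, the space separator, and the zero padding value are assumptions made for this example; they do not describe an existing RecStudio API.

```python
def parse_float_seq(cell, sep=" "):
    """Parse a delimited string such as "0.1 0.2 0.3" into a list of floats."""
    return [float(tok) for tok in cell.split(sep) if tok]


def pad_float_seq(seq, length, pad_value=0.0):
    """Truncate or right-pad a float sequence to a fixed length."""
    return seq[:length] + [pad_value] * max(0, length - len(seq))


embedding = parse_float_seq("0.1 0.2 0.3")
fixed = pad_float_seq(embedding, 5)
```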

Datasets without a time field are not supported.

Why are user_hist and user_count of val_data not contained in val_data itself?

The uh and uc of trn_data are added to trn_data, val_data and tst_data.
The uh and uc of val_data are added to tst_data, but why are they not added to val_data itself?

The code is at the end of _build().

Where to find LSH Sampling

In your paper (see "Table 5: Samplers in RecStudio"), you mention that LSH-based samplers have been implemented, but I cannot find them in your code.

InfoNCE loss

The temperature hyperparameter seems to be missing from InfoNCE loss function in RecStudio.
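
For reference, a temperature-scaled InfoNCE for a single positive and a list of negatives can be sketched in plain Python as below. This follows the common textbook form, in which every score is divided by the temperature before the softmax and the positive is included in the denominator; it is not taken from RecStudio's loss code, and the function name and arguments are illustrative.

```python
import math


def info_nce(pos_score, neg_scores, temperature=1.0):
    """InfoNCE for one positive and a list of negatives, with scores divided by temperature."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    # -log softmax of the positive = logsumexp(all logits) - positive logit
    return log_sum_exp - logits[0]


loss_default = info_nce(2.0, [1.0, 0.5], temperature=1.0)
loss_sharp = info_nce(2.0, [1.0, 0.5], temperature=0.1)
```

With these scores, where the positive already has the highest score, the lower temperature sharpens the softmax and yields a smaller loss.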

Out-Of-Memory Error in Validation Phase

Why is there a sudden increase in CUDA memory usage in the validation phase?
The validation batch size set in the config file is smaller than the training batch size, yet CUDA memory usage jumps during validation and causes the OOM error.
Specifically, when the model runs the code in run.py, model.evaluate uses more CUDA memory than model.fit. Could you please help me solve this problem? Thanks for your attention.

No feature scaling.

It seems like there is no feature scaling in dataset.py (when the field type is float).
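
One simple form such scaling could take is min-max normalization of a float column. The sketch below is a standalone illustration: the function name and the choice to map a constant column to zeros are assumptions, and nothing here reflects existing code in dataset.py.

```python
def min_max_scale(values):
    """Scale a list of floats to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = min_max_scale([10.0, 20.0, 40.0])
```

In practice the minimum and maximum would be computed on the training split only and reused for validation and test data.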

HyperParam plugin in tensorboard

For better visualization in TensorBoard, the hyper-parameter plugin could be used. There is a guide here.

