The 2022-nips-tenrec from yuangh-x

About the dataset

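I would like to ask what the QK platform and the QB platform are. The paper does not seem to explain this in detail. Are they QQ Kandian and QQ Browser?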

Reproducing the paper's results

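Hello, I directly reused the ctr1M data from the downloaded dataset, sampled it at a positive-to-negative ratio of 1:2, and split it 8:1:1. My own implementation of the xDeepFM model reaches an AUC of 0.899, far higher than the result in the paper. Were the reference results in the paper trained to the optimum?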
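For reference, the setup described above (1:2 positive-to-negative sampling followed by an 8:1:1 split) could be sketched as follows. This is only a minimal sketch on toy data: the function name `sample_and_split`, the row layout, and the seed are assumptions for illustration, not code from the Tenrec repository.

```python
import random

def sample_and_split(rows, neg_per_pos=2, seed=0):
    """Downsample negatives to `neg_per_pos` per positive, then split 8:1:1.

    `rows` is a list of (features, label) pairs with label 1 = click.
    """
    rng = random.Random(seed)
    pos = [r for r in rows if r[1] == 1]
    neg = [r for r in rows if r[1] == 0]
    # Keep at most neg_per_pos negatives for every positive.
    n_neg = min(len(neg), neg_per_pos * len(pos))
    data = pos + rng.sample(neg, n_neg)
    rng.shuffle(data)
    n_train = int(0.8 * len(data))
    n_val = int(0.1 * len(data))
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

# Toy data: 100 positives and 900 negatives (hypothetical, not Tenrec).
rows = [((i,), 1) for i in range(100)] + [((i,), 0) for i in range(100, 1000)]
train, val, test = sample_and_split(rows)
print(len(train), len(val), len(test))  # 240 30 30
```

Posting the seed, sampling ratio, and split alongside the AUC makes it easier for others to compare runs.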

Will you provide video or content links so that users can extract multimodal features?

It would be great for me to have richer multimodal data.

QK-article.csv not found in the TenRec.zip file

Hi, thank you for sharing this large dataset. I downloaded TenRec.zip via the link https://drive.google.com/file/d/1R1JhdT9CHzT3qBJODz09pVpHMzShcQ7a/view?usp=sharing. However, when I decompressed the zip file, I only got three files: QB-article.csv, QB-video.csv, and QK-video.csv. QK-article.csv was not found. I am not sure whether I made a mistake or the file is missing from the zip package. Could you help me check? Much appreciated.
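One way to rule out an extraction problem is to list the archive members directly instead of unpacking them. The sketch below uses Python's standard `zipfile` module; the helper name `missing_members` and the `TenRec/` folder prefix in the demo are assumptions, and the demo archive is built in memory to mimic the report rather than being the real download.

```python
import io
import zipfile

EXPECTED = ["QB-article.csv", "QB-video.csv", "QK-article.csv", "QK-video.csv"]

def missing_members(zip_file, expected=EXPECTED):
    """Return the expected file names that are absent from the archive."""
    with zipfile.ZipFile(zip_file) as zf:
        # Compare base names only, so files inside a folder still match.
        present = {name.rsplit("/", 1)[-1] for name in zf.namelist()}
    return [name for name in expected if name not in present]

# Demo on an in-memory archive that mimics the report (three of four files).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in ["QB-article.csv", "QB-video.csv", "QK-video.csv"]:
        zf.writestr("TenRec/" + name, "")
buf.seek(0)
print(missing_members(buf))  # ['QK-article.csv']
```

On the real download, `missing_members("TenRec.zip")` would report whether the file is absent from the archive itself.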

What does video_category mean in the raw QK-video & QB-video data?

As far as I can see, the video_category column only takes the values 0, 1, and empty. What exactly does it represent?

What do the gender values represent?

Hello, in QK-video.csv and the other files, the gender attribute seems to contain only the integer values 0-2. What does each of these values represent?

Why are there Read-Percentage values above 100?

The number of interactions in ctr_data_1M.csv is different from that reported in the paper.

Hi, author! Thank you for your contribution!
I have a question: why does the number of interactions in ctr_data_1M.csv differ from the number reported in the paper? I loaded ctr_data_1M.csv and found 120342306 interactions, but the paper reports 86642580. Do I need to filter some interactions? If so, what strategy did you use to filter them? I look forward to your reply!

What does hist_x mean in ctr_data_1M.csv?

Is it the IDs of the 10 items this user clicked most recently?

Question about the train/test data split

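Normally, the training and test sets are split by the time at which the behavior occurred, for example using the data from day 1 for training and the data from day 2 as the test set.
In the code, however, the CTR task uses a random split, and the sequential recommendation task takes each user's second-to-last behavior as the validation set and the last behavior as the test set. This does not strictly follow the chronological order of the behaviors.
Could splitting the data this way be a problem?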
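The difference between the two strategies can be illustrated on a toy log. This is a minimal sketch under assumptions: the `(user_id, item_id, timestamp)` layout and the function names are hypothetical, and it is not the splitting code from the repository.

```python
from collections import defaultdict

# Hypothetical interaction log: (user_id, item_id, timestamp).
log = [
    ("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 5),
    ("u2", "d", 3), ("u2", "e", 4), ("u2", "f", 6),
]

def global_time_split(log, cutoff):
    """Train on everything before `cutoff`, test on everything at or after it."""
    train = [r for r in log if r[2] < cutoff]
    test = [r for r in log if r[2] >= cutoff]
    return train, test

def leave_one_out_split(log):
    """Per user: last item is test, second to last is validation, rest is train."""
    by_user = defaultdict(list)
    for user, item, ts in sorted(log, key=lambda r: r[2]):
        by_user[user].append((user, item, ts))
    train, val, test = [], [], []
    for seq in by_user.values():
        train += seq[:-2]
        val += seq[-2:-1]
        test += seq[-1:]
    return train, val, test

train_t, test_t = global_time_split(log, cutoff=5)
train_l, val_l, test_l = leave_one_out_split(log)

# Under leave-one-out, u1's validation item (ts=2) predates u2's training
# item (ts=3), so the split is chronological per user but not globally.
print(max(r[2] for r in train_l), min(r[2] for r in val_l))  # 3 2
```

Leave-one-out keeps every user in the evaluation set, while a global cutoff avoids training on interactions that happened after some evaluation items.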

Question about data processing in the BERT4Rec part

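In the main function, train_val_test_split splits the original dataset into train_data, val_data, and test_data: train_data takes [:-2] of the sequence, val_data takes [-2:-1], and test_data takes [-1:]. Validation therefore uses [:-2] as the features and [-2:-1] as the ground truth. In that case, in the Build_full_EvalDataset function (utils.py, line 1158), should seq = self.u2seq[user][:-1] instead be seq = self.u2seq[user][:]?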

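Likewise, at test time, shouldn't [:-1] of the sequence be the features and [-1:] the ground truth? That is, when test_dataset is built in main.py (line 90), the first argument of Build_full_EvalDataset should be [:-1] of the original sequence rather than only [:-2].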
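The indexing proposed in this question can be written out on a toy sequence. The sketch below illustrates the questioner's suggestion only; the helper names `split_sequence` and `eval_pairs` are hypothetical, and it makes no claim about what Build_full_EvalDataset actually does.

```python
def split_sequence(seq):
    """Leave-one-out pieces of one user's interaction sequence."""
    return seq[:-2], seq[-2:-1], seq[-1:]

def eval_pairs(seq):
    """(input, ground truth) pairs for validation and test, as proposed above."""
    train, val, test = split_sequence(seq)
    val_pair = (train, val)            # features [:-2], truth [-2:-1]
    test_pair = (train + val, test)    # features [:-1], truth [-1:]
    return val_pair, test_pair

seq = [10, 11, 12, 13, 14]   # hypothetical item IDs in interaction order
val_pair, test_pair = eval_pairs(seq)
print(val_pair)   # ([10, 11, 12], [13])
print(test_pair)  # ([10, 11, 12, 13], [14])
```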

Questions about bert4rec

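The dev negative samples and the evaluation in bert4rec differ somewhat from the PyTorch version I have looked at.
1. Are the negative samples the full item set, or are they not used in evaluation at all?
I can see some candidate-set code, but it is not used.
2. Model prediction
Is there any model prediction code for this part?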

What does item_score mean in the cold dataset?

Why does the Session-based Recommendation test take the mean of scores?

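In Session-based Recommendation,
suppose the input sequence is (v1, v2, v3, ..., vT) and the model output tensor has shape (B, T, V), where V is the total number of items.
During training, the output at each time step can be interpreted as the predicted probability of the next item, and CE loss is used as the loss.
At test time, however, the outputs of all time steps are averaged (the corresponding code is scores = scores.mean(1) at line 502 of trainer.py) and used as the probabilities, instead of taking the last time step (score[:,-1]) as the output. Isn't this inconsistent with training?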
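To make the difference concrete, the two readouts can be compared on a toy example. This is a sketch with hypothetical scores for a single session (the batch dimension B is dropped), written in plain Python rather than with the tensors used in trainer.py.

```python
# Toy scores for one session: T = 3 time steps, V = 4 items (hypothetical values).
# scores[t][v] is the model's score for item v as the item following step t.
scores = [
    [0.7, 0.1, 0.1, 0.1],   # after v1
    [0.6, 0.2, 0.1, 0.1],   # after v2
    [0.1, 0.1, 0.2, 0.6],   # after v3 (the last step)
]

def argmax(xs):
    """Index of the largest value in a list."""
    return max(range(len(xs)), key=lambda i: xs[i])

def mean_over_time(scores):
    """Average the per-step score vectors, like scores.mean(1) for one sample."""
    steps = len(scores)
    return [sum(col) / steps for col in zip(*scores)]

def last_step(scores):
    """Use only the final time step, like scores[:, -1] for one sample."""
    return scores[-1]

print(argmax(mean_over_time(scores)))  # 0
print(argmax(last_step(scores)))       # 3
```

In this toy case the averaged scores favor item 0, which the earlier steps preferred, while the last step alone favors item 3, so the two readouts can rank items differently.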

Could you provide a dataset link that is accessible within China?

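Demand within China is probably quite high. Hosting a downloadable link for a 5 GB dataset within China should not be much trouble, so could you put one up for the time being? It would not conflict with the website you are building.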

Reproducing the paper's results

I used gen_ctr.py to generate data for 100K users and ran the ESMM task.
python main.py --task_name=mtl --seed=100 --model_name=esmm --dataset_path="ctr_data_100000.csv" --train_batch_size=4096 --val_batch_size=4096 --test_batch_size=4096 --epochs=20 --lr=0.0001 --embedding_size=32 --mtl_task_num=2
The best click/like AUC I obtained was only 0.605 and 0.710:
Epoch 18 train loss is 0.736, click auc is 0.605 and like auc is 0.710

This is far below the ESMM results in the paper, 0.7940 and 0.9110.

Could you tell me what might need to be adjusted?
Attached is the data I generated.

ctr_data_100000.zip

What are the minimum hardware requirements to reproduce the results of the paper?

I cannot see where pctcvr is computed in the esmm multi-task method

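Hello, I have recently been studying multi-task recommendation. As I understand it, the esmm method computes pctcvr as pcvr*pctr and trains the network with losses on pctr and pctcvr. In the esmm model here, however, I can only see pctr and pcvr being computed; pctcvr does not seem to be computed?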
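For comparison, the formulation described in this question can be sketched for a single impression. This is a minimal sketch of the ESMM loss as described above (a loss on pctr plus a loss on pctcvr = pctr * pcvr), not the repository's implementation; the function names are hypothetical, and `conversion` stands for whichever second label is used (the training log above reports click and like).

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and label y."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def esmm_loss(pctr, pcvr, click, conversion):
    """ESMM-style loss for one impression.

    pCTCVR is the product of the two tower outputs; the CVR tower gets no
    direct loss and is trained only through the pCTCVR term.
    """
    pctcvr = pctr * pcvr
    ctr_loss = bce(pctr, click)
    # The CTCVR label is 1 only when both click and conversion happened.
    ctcvr_loss = bce(pctcvr, click * conversion)
    return pctcvr, ctr_loss + ctcvr_loss

pctcvr, loss = esmm_loss(pctr=0.5, pcvr=0.4, click=1, conversion=0)
print(round(pctcvr, 2))  # 0.2
```

If a model applies a separate loss directly to pcvr instead, it is closer to a plain shared-bottom multi-task setup than to the formulation sketched here.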

User gender in the dataset contains errors!

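For QK-article and QK-video: in QK-article, some users have more than one gender, and for users who appear in both QK-article and QK-video, the genders do not match.
QB-article and QB-video have the same problem.
The following code verifies the problem I described.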
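The verification code from the original post does not appear here. A minimal sketch of such a check might look like the following, assuming that (user_id, gender) pairs have already been read from the two files; the function names and the toy data are hypothetical and are not taken from the post.

```python
from collections import defaultdict

def gender_sets(rows):
    """Map each user_id to the set of gender values seen for that user."""
    seen = defaultdict(set)
    for user_id, gender in rows:
        seen[user_id].add(gender)
    return seen

def multi_gender_users(rows):
    """Users that appear with more than one gender value within one file."""
    return sorted(u for u, g in gender_sets(rows).items() if len(g) > 1)

def mismatched_users(rows_a, rows_b):
    """Users present in both files whose gender sets differ between them."""
    a, b = gender_sets(rows_a), gender_sets(rows_b)
    return sorted(u for u in a.keys() & b.keys() if a[u] != b[u])

# Hypothetical (user_id, gender) pairs standing in for the two CSV files.
article = [(1, 1), (1, 2), (2, 0), (3, 1)]
video = [(2, 0), (3, 2), (4, 1)]
print(multi_gender_users(article))        # [1]
print(mismatched_users(article, video))   # [3]
```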

Thank you for your answer!

Data preprocessing error in gen_ctr.py

This line raises an error:

2022-NIPS-Tenrec/Data Processing/gen_ctr.py

Line 44 in e6dbad4

    
           source_data.columns = ['user_id', 'item_id', 'click', 'exp', 'follow', 'like', 'share', 'short_v', 'play_times', 'gender', 'age']

ValueError: Length mismatch: Expected axis has 10 elements, new values have 11 elements

Perhaps the "exp" column should not be there?
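A mismatch like this can be caught before the column names are assigned by comparing the width of the file with the length of the name list. The sketch below uses only the standard `csv` module on a toy row; the helper names are hypothetical, and it does not determine which of the 11 names is the extra one.

```python
import csv
import io

NAMES = ['user_id', 'item_id', 'click', 'exp', 'follow', 'like', 'share',
         'short_v', 'play_times', 'gender', 'age']

def column_count(csv_file):
    """Number of fields in the first row of an open CSV file."""
    return len(next(csv.reader(csv_file)))

def check_names(csv_file, names=NAMES):
    """Raise a descriptive error when the file width and name list disagree."""
    n = column_count(csv_file)
    if n != len(names):
        raise ValueError(f"file has {n} columns but {len(names)} names were given")
    return n

# A 10-field toy row triggers the same kind of mismatch as in the report.
toy = io.StringIO("1,2,1,0,0,0,0,3,1,2\n")
try:
    check_names(toy)
except ValueError as err:
    print(err)  # file has 10 columns but 11 names were given
```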
