2022-nips-tenrec's People
Forkers
kamalsky jiangyiheng1 xiaoqingwang htw2012 michaelhuazhang kk19990709 duduruhappy ninicoder njuelephant2021 shadowkun linlifan chia-chiao placeboooo yutian012432 dalian-ai timedcy joycewang071 zhshch fajieyuan heimmer rphilipzhang zongwuwang jhaaar westlake-repl veritatis lilyprit
2022-nips-tenrec's Issues
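About the dataset
What are the QK platform and the QB platform, respectively? The paper does not seem to explain this in detail. Are they QQ Kandian (QQ看点) and QQ Browser (QQ浏览器)?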
QK-article.csv not found in TenRec.zip
Hi, thank you for sharing this large dataset. I downloaded TenRec.zip via the link https://drive.google.com/file/d/1R1JhdT9CHzT3qBJODz09pVpHMzShcQ7a/view?usp=sharing. However, after decompressing the zip file I only got three files: QB-article.csv, QB-video.csv, and QK-video.csv. QK-article.csv was not found. I am not sure whether I made a mistake or the file is missing from the zip package. Could you help me check? Much appreciated.
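Questions about BERT4Rec
The dev negative samples and the evaluation for BERT4Rec differ somewhat from the PyTorch version I have looked at.
1. Number of negative samples: is the full item set used, or do negatives not take part in evaluation?
I can see some candidate-set code, but it is not used.
2. Model prediction
Is there any code for model prediction?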
can't download dataset
[image: screenshot of the error]
On the dataset download page https://static.qblv.qq.com/qblv/h5/algo-frontend/tenrec_dataset.html,
after entering the required information and ticking the agreement, the error above is shown. Changing the email address still produces the error. What could be the cause? 🙏
What does video_category mean in the raw QK-video and QB-video data?
The video_category column only takes the values 0, 1, and empty. What exactly does it represent?
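A quick way to check which raw values a column such as video_category takes is to count them. The following is a minimal sketch using only the standard library; the sample rows are made up for illustration, and the other column names in the sample are assumptions rather than a description of the real files.

```python
import csv
import io
from collections import Counter

def column_value_counts(csv_file, column):
    """Count how often each raw value (including the empty string) occurs in one column."""
    reader = csv.DictReader(csv_file)
    return Counter(row[column] for row in reader)

# Tiny made-up sample standing in for QK-video.csv; real rows will differ.
sample = io.StringIO(
    "user_id,item_id,video_category\n"
    "1,10,0\n"
    "1,11,1\n"
    "2,10,\n"
    "3,12,1\n"
)
counts = column_value_counts(sample, "video_category")
print(counts)
```

For a real file, the same function can be called on a handle opened with open("QK-video.csv", newline=""). The same check applies to other coded columns such as gender or age.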
What are the minimum hardware requirements to reproduce the results of the paper?
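Question about the train/test split
Normally, the training and test sets are split by the time at which the behavior occurred, for example using day 1 data for training and day 2 data for testing.
In the code, however, the CTR task uses a random split, and the sequential recommendation task uses each user's second-to-last behavior as the validation set and the last behavior as the test set. This does not strictly follow the chronological order of behaviors.
Could this kind of split be problematic?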
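The difference between the two strategies discussed here can be sketched as follows. This is an illustrative example, not the repository's code: the (user, item, timestamp) tuples and the cutoff value are hypothetical, and both function names are made up for this sketch.

```python
from collections import defaultdict

def leave_one_out_split(interactions):
    """Per user: last item is test, second-to-last is validation, the rest is training."""
    by_user = defaultdict(list)
    for user, item, ts in interactions:
        by_user[user].append((ts, item))
    train, valid, test = {}, {}, {}
    for user, events in by_user.items():
        items = [item for _, item in sorted(events)]  # order each user's history by time
        if len(items) < 3:
            continue  # too short to hold out two items
        train[user], valid[user], test[user] = items[:-2], items[-2], items[-1]
    return train, valid, test

def global_time_split(interactions, cutoff):
    """Everything before the cutoff timestamp is training, the rest is test."""
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test

# Hypothetical log of (user, item, timestamp) tuples.
logs = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 5),
        ("u2", "a", 3), ("u2", "d", 4), ("u2", "e", 6)]

train, valid, test = leave_one_out_split(logs)
g_train, g_test = global_time_split(logs, cutoff=5)
print(train, valid, test)
print(g_train, g_test)
```

In this toy log, leave-one-out holds out u1's item b (timestamp 2) for validation while u2's training item a has timestamp 3, so some training interactions are later than some held-out ones. The global split never mixes the two sides of the cutoff, at the cost of leaving some users with no test interaction.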
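What do the gender values represent?
Hello, in files such as QK-video.csv the gender attribute seems to contain only the integer values 0-2. What does each value represent?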
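BERT4Rec data-processing question
In the main function, train_val_test_split divides the original dataset into train_data, val_data, and test_data: train_data takes [: -2] of each sequence, val_data takes [-2: -1], and test_data takes [-1: ]. During validation, [: -2] is therefore the features and [-2: -1] is the ground truth. In that case, should seq = self.u2seq[user][:-1] in Build_full_EvalDataset (utils.py, line 1158) be seq = self.u2seq[user][:] instead?
Likewise, during testing, shouldn't [: -1] of the sequence be the features and [-1: ] the ground truth? That is, when test_dataset is built in main.py (line 90), shouldn't the first argument of Build_full_EvalDataset be [: -1] of the original sequence rather than only [: -2]?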
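The slicing discussed in this issue can be illustrated with a short sketch. It shows the input/target pairs one would expect under a leave-one-out protocol and what an extra [:-1] does to an already truncated sequence. The function name eval_pairs and the item IDs are hypothetical, and the sketch reflects the question's reading of the code, not a confirmed description of the repository's behavior.

```python
def eval_pairs(seq):
    """Return (input prefix, target) for validation and for test under leave-one-out."""
    if len(seq) < 3:
        raise ValueError("need at least 3 interactions")
    val_input, val_target = seq[:-2], seq[-2]    # predict the second-to-last item
    test_input, test_target = seq[:-1], seq[-1]  # predict the last item
    return (val_input, val_target), (test_input, test_target)

seq = [11, 12, 13, 14, 15]  # made-up item IDs in chronological order
(val_in, val_y), (test_in, test_y) = eval_pairs(seq)
print(val_in, val_y)    # validation input and target
print(test_in, test_y)  # test input and target

# If the stored training sequence is already seq[:-2], slicing it again
# with [:-1] drops one more item from the validation input.
stored = seq[:-2]
truncated_again = stored[:-1]
print(truncated_again)
```

Here the expected validation input is [11, 12, 13], while the doubly truncated input is [11, 12], which is the discrepancy the question asks about.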
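What does item_score represent in the cold dataset?
I can't find where pCTCVR is computed in the ESMM multi-task method
Hello, I have recently been studying multi-task recommendation. As I understand it, ESMM computes pCTCVR as pCVR * pCTR and trains the network with losses on pCTR and pCTCVR. In the ESMM model here, however, I can only see pCTR and pCVR being computed; pCTCVR does not seem to be computed?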
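For reference, the ESMM formulation described in the question can be sketched in plain Python for a single example. This is a minimal illustration of the published idea (pCTCVR = pCTR * pCVR, with losses on pCTR and pCTCVR), not the repository's implementation; the function names and the input values are made up.

```python
import math

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one prediction p and one label y in {0, 1}."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def esmm_loss(pctr, pcvr, click, conversion):
    """ESMM-style loss: supervise pCTR and pCTCVR; pCVR is only trained through the product."""
    pctcvr = pctr * pcvr
    ctcvr_label = click * conversion  # 1 only if the item was clicked and converted
    return bce(pctr, click) + bce(pctcvr, ctcvr_label), pctcvr

# Made-up tower outputs for one clicked but not converted impression.
loss, pctcvr = esmm_loss(pctr=0.5, pcvr=0.2, click=1, conversion=0)
print(pctcvr, loss)
```

The point of the construction is that no loss is applied to pCVR directly, so the CVR tower is trained over all impressions rather than over clicked ones only. In a click/like setting, the like label would play the role of conversion here, which is an assumption about how the tasks map.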
The number of interactions in ctr_data_1M.csv differs from that reported in the paper
Hi, authors! Thank you for your contribution!
I have a question: why is the number of interactions in ctr_data_1M.csv different from that reported in the paper? I loaded ctr_data_1M.csv and found 120342306 interactions, but the paper reports 86642580. Do I need to filter some interactions? If so, what filtering strategy did you use? Looking forward to your reply!
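timestamp
Why can't I find the Tenrec timestamp described in the paper in the CSV files?
Why does Session-based Recommendation testing take the mean of scores?
In Session-based Recommendation,
suppose the input sequence is (v1, v2, v3, ..., vT) and the model output tensor has shape (B, T, V), where V is the total number of items.
During training, the output at each time step can be interpreted as the predicted probability of the next item, and CE loss is used as the loss.
During testing, however, the outputs of all time steps are averaged (the corresponding code is scores = scores.mean(1) at line 502 of trainer.py) and used as the probability, instead of taking the output at the last time step (score[:,-1]). Isn't this inconsistent with training?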
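The two readout choices contrasted in this question can be compared on a toy example. The sketch below uses plain nested lists for a single sequence (a T x V score matrix) in place of the (B, T, V) tensor; the numbers and helper names are made up for illustration.

```python
def mean_over_time(scores):
    """Average each item's score over all time steps (analogous to scores.mean(1))."""
    steps = len(scores)
    return [sum(step[v] for step in scores) / steps for v in range(len(scores[0]))]

def last_step(scores):
    """Use only the final time step (analogous to scores[:, -1])."""
    return scores[-1]

def argmax(xs):
    """Index of the largest value."""
    return max(range(len(xs)), key=xs.__getitem__)

# Made-up scores: T = 3 time steps, V = 3 items.
scores = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
]
top_mean = argmax(mean_over_time(scores))
top_last = argmax(last_step(scores))
print(top_mean, top_last)
```

In this example the averaged scores favor item 0 while the last step favors item 2, so the two readouts can rank items differently whenever earlier steps disagree with the final one.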
Will you provide video or content links so that users can extract multimodal features?
It would be great to have richer multimodal data.
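Meaning of the age attribute values
Hello, in files such as QK-video.csv and sbr_data_1M.csv the age attribute seems to contain only the integer values 0-5. Which age range does each value represent?
User gender in the dataset contains errors!
Data preprocessing error in gen_ctr.py
This line raises an error:
ValueError: Length mismatch: Expected axis has 10 elements, new values have 11 elements
Perhaps there should be no "exp" column?
What does hist_x mean in ctr_data_1M.csv?
Is it the IDs of the 10 items the user clicked most recently?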
Reproducing the paper's results
I used gen_ctr.py to generate data for 100K users and ran the ESMM task.
python main.py --task_name=mtl --seed=100 --model_name=esmm --dataset_path="ctr_data_100000.csv" --train_batch_size=4096 --val_batch_size=4096 --test_batch_size=4096 --epochs=20 --lr=0.0001 --embedding_size=32 --mtl_task_num=2
The best click/like AUC I obtained was only 0.605 and 0.710:
Epoch 18 train loss is 0.736, click auc is 0.605 and like auc is 0.710
This is far below the ESMM results of 0.7940 and 0.9110 reported in the paper.
Could you tell me what might need to be adjusted?
The data I generated is attached.
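Cannot download the dataset
The Google Drive download keeps getting interrupted. Could you share another way to download it?
Why do some Read-Percentage values exceed 100?
Can the dataset be published after processing?
Thank you very much for releasing such a high-quality dataset! I have a question: if I use this dataset in a paper, would providing a download of the processed dataset (processing here means remapping user IDs and item IDs, and filtering items) in the paper's GitHub repository violate the usage agreement?
Thanks again, and I look forward to your reply!
Scope of use of the dataset?
What is the permitted scope of use of this dataset?
From the terms, I am not entirely clear about the download and usage scope. Can the dataset be used for model training and testing in a paper?
Or can it not be used for published papers, only for personal study?
Thank you for your answer!
Could you provide a download link hosted in mainland China?
Demand in China is quite large, isn't it? Hosting a downloadable link for a 5 GB dataset in China should not be much trouble, so could you put one up temporarily for now? It would not conflict with the website you are building.
Reproducing the paper's results
Hello, I directly reused the ctr1M data from the downloaded dataset, sampled positives and negatives at a 1:2 ratio, and split the data 8:1:1. My own xDeepFM implementation reached an AUC of 0.899, far higher than the result in the paper. Were the reference results in the paper trained to the optimum?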
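The setup described in this issue (1:2 positive-to-negative sampling followed by a random 8:1:1 split) can be sketched as below. This is one possible reading of that description, not the paper's pipeline; the function name, its parameters, and the toy rows are all hypothetical.

```python
import random

def sample_and_split(positives, negatives, neg_per_pos=2, parts=(8, 1, 1), seed=0):
    """Downsample negatives to neg_per_pos per positive, shuffle, then split by parts."""
    rng = random.Random(seed)
    n_neg = min(len(negatives), neg_per_pos * len(positives))
    rows = list(positives) + rng.sample(list(negatives), n_neg)
    rng.shuffle(rows)
    total = sum(parts)
    n_train = len(rows) * parts[0] // total  # integer arithmetic avoids float rounding
    n_val = len(rows) * parts[1] // total
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]

# Toy rows of (label, row_id); real rows would carry user and item features.
positives = [(1, i) for i in range(10)]
negatives = [(0, i) for i in range(100)]
train_rows, val_rows, test_rows = sample_and_split(positives, negatives)
print(len(train_rows), len(val_rows), len(test_rows))
```

Details such as the sampling ratio, whether the split is random per row or per user, and the seed all change the class balance and the difficulty of the test set, so AUC values obtained under different setups are not directly comparable.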