kywen1119 / dsran

Code for journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

License: Apache License 2.0

Python 97.53% Shell 2.47%

pytorch image-text-matching tcsvt cross-modal computer-vision

dsran's Issues

About Flickr30k data/f30k/images

Hello, does the "images" folder under "data" hold the ".jpg" files? I downloaded the data from the path you gave and put it in "images". Is this correct? If not, where can the data for the "images" folder be downloaded?

vocab.py build_vocab(..., threshold=300)?

DSRAN/vocab.py

Line 121 in 630d9dc

vocab = build_vocab(data_path, data_name, jsons=annotations, threshold=300)

Why is the threshold 300 here? Most earlier work sets it to 4. Surely very few words occur more than 300 times?
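
To see how the threshold changes the vocabulary size, here is a minimal sketch of frequency-based filtering. It assumes whitespace tokenization and that a word is kept when its count is at least `threshold`; `build_vocab_sketch` is a hypothetical stand-in, not the repository's `build_vocab`.

```python
from collections import Counter


def build_vocab_sketch(captions, threshold):
    """Return the sorted words that occur at least `threshold` times.

    Hypothetical helper for illustration only; the tokenization and the
    ">=" comparison are assumptions, not taken from the repository.
    """
    counter = Counter()
    for caption in captions:
        counter.update(caption.lower().split())
    return sorted(word for word, count in counter.items() if count >= threshold)


captions = ["a dog runs", "a dog sits", "a cat sits"]
for threshold in (1, 2, 3):
    # A higher threshold keeps fewer words.
    print(threshold, build_vocab_sketch(captions, threshold))
```

On a real caption set the same loop, run with thresholds 4 and 300, would show how many words survive each setting.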

About the regional/global/textual feature graphs

Where in the code are the regional, global, and textual feature graphs built? Thanks a lot.

About model.py

Hello, what do these two lines in forward() of ImageEncoder in model.py mean?
features_top = features_orig[-1]
features = features_top.view(features_top.size(0), features_top.size(1), -1).transpose(2, 1) # b, 49, 2048
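
The shape change in these two lines can be checked on a dummy tensor. This is a sketch under assumptions: the last entry of `features_orig` is taken to be a ResNet feature map of shape (batch, 2048, 7, 7), which matches the `# b, 49, 2048` comment, and the tensor below is made-up data.

```python
import torch

# Stand-in for features_orig[-1]: batch of 2, 2048 channels, 7x7 spatial grid (assumed shape).
features_top = torch.arange(2 * 2048 * 7 * 7, dtype=torch.float32).view(2, 2048, 7, 7)

# Flatten the 7x7 grid into 49 positions, then swap the last two axes
# so that every grid position becomes one 2048-dimensional feature vector.
features = features_top.view(features_top.size(0), features_top.size(1), -1).transpose(2, 1)

print(features.shape)  # torch.Size([2, 49, 2048])
```

Row 8 of `features[0]` is the channel vector at grid cell (1, 1), since 1 * 7 + 1 = 8. In other words, each image becomes a sequence of 49 local feature vectors.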

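About text embedding

In the text embedding, sentences differ in length, so the number of words per sentence varies. Your code seems to pad with zeros. Do these padding tokens affect later networks such as the GAT? Have you considered this issue?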
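
One common way to keep padding out of attention is to mask padded positions before the softmax. The sketch below illustrates that general technique in plain Python; it is not taken from the repository, and `masked_softmax` is a hypothetical helper.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores`, giving zero weight where `mask` is False.

    `mask[i]` is True for a real token and False for padding.
    Hypothetical helper for illustration only.
    """
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("mask removes every position")
    top = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - top) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# The third position is padding: it gets no weight even though its score is the largest.
weights = masked_softmax([1.0, 1.0, 5.0], [True, True, False])
print(weights)
```

Without the mask, the padded position would receive most of the attention weight in this example.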

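About Equations (9) and (10)

Thank you very much for open-sourcing the code for the paper.

I have a few questions:
(1) Which lines in model.py or GAT.py compute Equations (9) and (10) from the paper?
(2) I see that you use batch_size=300 on the MSCOCO dataset. What hardware did you use (number of GPUs, GPU model, memory per GPU)?
(3) Adding the GAT network on top of ResNet152 introduces many more parameters. Are there any tricks for training on MSCOCO?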

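Hello, how are the global image features extracted? Is it enough to use the ResNet code and model directly? Could you share the extracted global features?

Hello, how are the global image features extracted? Is it enough to use the ResNet code and model directly? Could you share the extracted global features? Thanks in advance.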

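About the Two-Models Ensemble

Hello, I have a question about the Two-Models Ensemble experiment in Table 1 of the paper, which I do not fully understand. Which two models are ensembled, and how is the ensemble done differently for BERT and GRU?

Thank you very much!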

About DataParallel training

Have you used nn.DataParallel with the GRU model?
I tried, but it failed. The error showed that the input data was still placed on a single GPU. The input was not split into one chunk per GPU.

About rerank

Hello, I have a question about "sims_f.npy". Did you produce it with your trained model, so that it only works with your model, or can I use this file to rerank the results of other models? And how was it generated?

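About memory

Could the author share how much memory training and testing require? Also, how do the large region features from VLP differ from the pre-comp features used in SCAN? Is the only difference that the number of boxes is increased to 100? The network diagram shows Faster R-CNN detecting the regions, so I assumed you used the BUTD pre-comp features like previous work, but you actually use the 100 boxes from VLP, right? Does this have a large effect on the results? Can I switch back to the earlier pre-comp features? My environment has limited memory. 🤦‍♂️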

ZeroDivisionError: float division by zero

Computing results...
Test: [0/40] Le 62.6173 (62.6172) Time 21.456 (0.000)
Test: [10/40] Le 60.1161 (61.4845) Time 0.555 (0.000)
Test: [20/40] Le 60.0993 (61.2988) Time 0.562 (0.000)
Test: [30/40] Le 63.2631 (61.8778) Time 0.469 (0.000)
encode_time:43.724445
5k---------------
torch.Size([1000, 1024])
Images: 1000, Captions: 5000
imgs: 1000, caps: 5000
i2t:r1: 76.1, r5: 93.2, r10: 97.0
t2i:r1: 57.5, r5: 84.4, r10: 91.0
rsum=499.2
sims_time:1.139145
1k---------------
0
Images: 200, Captions: 1000
imgs: 200, caps: 1000
i2t:r1: 90.5, r5: 97.5, r10: 98.5
t2i:r1: 76.2, r5: 95.7, r10: 98.2
rsum=556.6

1
Images: 0, Captions: 0
imgs: 0, caps: 0
Traceback (most recent call last):
File "evaluation_bert.py", line 351, in <module>
main()
File "evaluation_bert.py", line 348, in main
evalrank(opt.model + '/' + opt.name + ".pth.tar", data_path = opt.data_path, split="test", fold5=opt.fold, region_bbox_file=opt.region_bbox_file, feature_path=opt.feature_path)
File "evaluation_bert.py", line 201, in evalrank
r = simrank(sims)
File "evaluation_bert.py", line 304, in simrank
r1 = 100.0 * len(numpy.where(ranks < 1)[0]) / len(ranks)
ZeroDivisionError: float division by zero

————————————————————————————————
I noticed these lines in the output:
Images: 0, Captions: 0
imgs: 0, caps: 0

What could cause this?
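
A division by zero at this point means `ranks` is empty, which is consistent with the `Images: 0, Captions: 0` lines. One way to surface the cause earlier is to check for an empty slice before computing recall. A minimal sketch, assuming 0-based ranks as in the `ranks < 1` test from the traceback; `recall_at_k` is a hypothetical helper, not a function from the repository.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose correct match is ranked in the top k.

    `ranks` holds 0-based ranks, so rank 0 is a top-1 hit.
    Hypothetical helper for illustration only.
    """
    if len(ranks) == 0:
        # Fail with a clear message instead of a ZeroDivisionError.
        raise ValueError("no ranks to score: the evaluation slice is empty")
    hits = sum(1 for r in ranks if r < k)
    return 100.0 * hits / len(ranks)


print(recall_at_k([0, 1, 9, 20], 1))   # one of four queries is a top-1 hit
print(recall_at_k([0, 1, 9, 20], 10))  # three of four are within the top 10
```

With this check, an empty fold fails with a message that points at the data slice and not at the arithmetic.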

About pre-computed image features

Hello, what do the pre-computed image features mentioned in the code contain? How do they differ from the bottom-up features provided by SCAN?

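About feature_path and region_bbox_file

Hello, what do the paths for the feature_path and region_bbox_file parameters refer to? If convenient, could you upload the files, or explain how to obtain them? Thank you very much!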

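About the MSCOCO test set

Each image corresponds to five captions. What dataset size does the 1K test with 5-fold cross-validation use, and what size does the 5K test use?
The evalrank() function contains:
for i in range(5):
    print(i)
    img_emb_new = img_embs[i * 5000 : int(i * 5000 + img_embs.size(0)/5):5]
    cap_emb_new = cap_embs[i * 5000 : int(i * 5000 + cap_embs.size(0)/5)]
Clearly, if the dataset is built from 1K images expanded five times, the indices will go out of range.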
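
The slicing above can be traced with plain Python lists. This is a sketch, not the repository's code: `fold_slices` is a hypothetical helper, the rows are stand-in indices in place of embeddings, and image and caption embeddings are assumed to have the same number of rows (one image row per caption).

```python
def fold_slices(n_rows, n_folds=5, fold_stride=5000):
    """Mirror the quoted slicing and return (image_rows, caption_rows) per fold.

    Hypothetical helper for illustration only.
    """
    rows = list(range(n_rows))
    folds = []
    for i in range(n_folds):
        start = i * fold_stride
        stop = int(i * fold_stride + n_rows / 5)
        # Every 5th row is taken for images because each image is repeated per caption.
        folds.append((rows[start:stop:5], rows[start:stop]))
    return folds


for n_rows in (25000, 5000):
    sizes = [(len(imgs), len(caps)) for imgs, caps in fold_slices(n_rows)]
    print(n_rows, sizes)
```

Under these assumptions, 25,000 rows (5,000 images with five captions each) give five folds of 1,000 images and 5,000 captions. With 5,000 rows, the first fold has 200 images and 1,000 captions and the remaining folds are empty, because Python slicing past the end returns an empty result and does not raise an error.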