kywen1119 / dsran

Code for journal paper "Learning Dual Semantic Relations with Graph Attention for Image-Text Matching", TCSVT, 2020.

License: Apache License 2.0

Python 97.53% Shell 2.47%
pytorch image-text-matching tcsvt cross-modal computer-vision

dsran's Issues

About Flickr30k data/f30k/images

Hello, I would like to ask whether the "images" folder under "data" is meant to store the ".jpg" files. I downloaded the data from the path you gave and put it in "images". Is this correct? If not, where can the data for the "images" folder be downloaded?

About model.py

Hello, in the ImageEncoder in model.py, what do these two lines of code inside forward() mean?
features_top = features_orig[-1]
features = features_top.view(features_top.size(0), features_top.size(1), -1).transpose(2, 1) # b, 49, 2048
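For readers with the same question, the two lines can be read as follows: `features_orig[-1]` selects the last (top) feature map, and `view` followed by `transpose(2, 1)` flattens its spatial grid so that each grid location becomes one vector of channel values. Judging only from the `# b, 49, 2048` comment, the map is presumably of shape (b, 2048, 7, 7). The dependency-free sketch below mimics the reshaping for a single image on nested lists; `flatten_spatial` is a hypothetical helper, not a function in the repository.

```python
def flatten_spatial(feature_map):
    """Turn a [C][H][W] nested list into H*W location vectors of length C.

    Mirrors view(b, C, -1).transpose(2, 1) for one image: the spatial grid is
    flattened row by row, then the channel axis becomes the last axis.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


# Toy map: 2 channels on a 2x2 grid (the comment in the issue implies 2048 channels on 7x7).
toy = [
    [[1, 2], [3, 4]],      # channel 0
    [[10, 20], [30, 40]],  # channel 1
]
regions = flatten_spatial(toy)
print(len(regions), len(regions[0]))  # 4 2
print(regions[1])                     # [2, 20]
```

Each of the 4 output vectors (49 in the real model) can then be treated as one image "region" by later layers.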

about text embedding

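In the text embedding, sentences differ in length, so different sentences contain different numbers of words. Your code appears to pad them with zeros. Do these padded positions affect the later networks such as the GAT, and have you considered this issue?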
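One common way to keep padded positions from influencing attention is to mask them out before the softmax, so they receive exactly zero weight. Whether the repository does this is not stated in this thread; the sketch below is a generic illustration in plain Python, and `masked_softmax` is a hypothetical helper.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over the positions where mask is True; masked positions get weight 0."""
    if not any(mask):
        raise ValueError("at least one position must be unmasked")
    # Subtract the largest kept score for numerical stability.
    peak = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - peak) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Two sentences with 3 and 5 real words, both padded to length 5.
lengths = [3, 5]
max_len = max(lengths)
masks = [[pos < n for pos in range(max_len)] for n in lengths]

# Attention scores for the first sentence; the last two entries belong to padding.
weights = masked_softmax([0.2, 1.0, -0.5, 0.0, 0.0], masks[0])
print(weights[3], weights[4])  # 0.0 0.0
```

Without the mask, the two padded positions would each take a share of the attention weight even though they carry no word.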

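About Equation (9) and Equation (10)

Thank you very much for open-sourcing the code for the paper.

I would like to ask:
(1) Which lines in model.py or GAT.py of the released code compute Equations (9) and (10) of the paper?
(2) I saw that you use batch_size=300 when running on the MSCOCO dataset. What hardware did you use for the experiments (number of GPUs, GPU model, and memory per GPU)?
(3) Adding the GAT network on top of ResNet152 introduces a large number of parameters. Are there any tricks for training on MSCOCO?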

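About the Two-Models Ensemble

Hello, I have a question about the Two-Models Ensemble experiment in Table 1 of the paper, which I do not fully understand. Which two models are ensembled, and how is the ensemble done differently for BERT and GRU?

Thank you very much!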

about Dataparallel training

I wonder whether you have used nn.DataParallel with the GRU model before.
I tried it but failed. The error showed that the input data was still placed on a single GPU; it was not split into chunks, one for each GPU.

About Rerank

Hello, I have a question about "sims_f.npy". Was it produced by your trained model, so that it can only be used with your model, or can I use this file to rerank with other models? And how was it obtained?

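About memory

Could the author share how much memory is needed for training and testing in this experiment? Also, how do the huge region features from VLP differ from the pre-comp features used earlier in SCAN? Is it simply that the number of boxes is increased to 100? The network diagram shows Faster R-CNN detecting the regions, so I assumed you used the BUTD pre-comp features like prior work, but you actually use VLP's 100 boxes, is that right? Does this have a large effect on the results? Can I switch back to the earlier pre-comp features? I ask because my experimental environment has limited memory. 🤦‍♂️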

ZeroDivisionError: float division by zero

Computing results...
Test: [0/40] Le 62.6173 (62.6172) Time 21.456 (0.000)
Test: [10/40] Le 60.1161 (61.4845) Time 0.555 (0.000)
Test: [20/40] Le 60.0993 (61.2988) Time 0.562 (0.000)
Test: [30/40] Le 63.2631 (61.8778) Time 0.469 (0.000)
encode_time:43.724445
5k---------------
torch.Size([1000, 1024])
Images: 1000, Captions: 5000
imgs: 1000, caps: 5000
i2t:r1: 76.1, r5: 93.2, r10: 97.0
t2i:r1: 57.5, r5: 84.4, r10: 91.0
rsum=499.2
sims_time:1.139145
1k---------------
0
Images: 200, Captions: 1000
imgs: 200, caps: 1000
i2t:r1: 90.5, r5: 97.5, r10: 98.5
t2i:r1: 76.2, r5: 95.7, r10: 98.2
rsum=556.6

1
Images: 0, Captions: 0
imgs: 0, caps: 0
Traceback (most recent call last):
  File "evaluation_bert.py", line 351, in <module>
    main()
  File "evaluation_bert.py", line 348, in main
    evalrank(opt.model + '/' + opt.name + ".pth.tar", data_path = opt.data_path, split="test", fold5=opt.fold, region_bbox_file=opt.region_bbox_file, feature_path=opt.feature_path)
  File "evaluation_bert.py", line 201, in evalrank
    r = simrank(sims)
  File "evaluation_bert.py", line 304, in simrank
    r1 = 100.0 * len(numpy.where(ranks < 1)[0]) / len(ranks)
ZeroDivisionError: float division by zero

————————————————————————————————
I noticed these lines in the output:
Images: 0, Captions: 0
imgs: 0, caps: 0

What could cause this?
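The zero counts mean that `ranks` is empty when `simrank` runs for the second fold, so `len(ranks)` is 0 and the division fails. A defensive check along the following lines would surface the empty split with a clearer message. This is only a sketch: `recall_at_k` is a hypothetical helper, not part of `evaluation_bert.py`, and it assumes `ranks` holds the zero-based rank of the first correct match for each query, as the `ranks < 1` test in the traceback suggests.

```python
def recall_at_k(ranks, k):
    """Percentage of queries whose first correct match is ranked within the top k."""
    if len(ranks) == 0:
        # An empty split would otherwise end in ZeroDivisionError.
        raise ValueError("no queries to evaluate; check the test split size and fold setting")
    hits = sum(1 for rank in ranks if rank < k)
    return 100.0 * hits / len(ranks)


example_ranks = [0, 3, 7, 12]          # zero-based ranks for four queries
print(recall_at_k(example_ranks, 1))   # 25.0
print(recall_at_k(example_ranks, 5))   # 50.0
print(recall_at_k(example_ranks, 10))  # 75.0
```

The check does not fix the empty fold itself; it only makes the cause visible earlier.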

About pre-computed image features

Hello, what do the pre-computed image features mentioned in the code contain? How do they differ from the bottom-up features provided by SCAN?

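About feature_path and region_bbox_file

Hello, what do the two path arguments feature_path and region_bbox_file refer to? If convenient, could you upload the files, or explain how they were obtained? Thank you very much!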

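About the MSCOCO test set

Since each image corresponds to five captions, what size of dataset is used for the 1K test with 5-fold cross-validation, and what size is used for the 5K test?
In the evalrank() function there is:
for i in range(5):
    print(i)
    img_emb_new = img_embs[i * 5000 : int(i * 5000 + img_embs.size(0)/5):5]
    cap_emb_new = cap_embs[i * 5000 : int(i * 5000 + cap_embs.size(0)/5)]
Clearly, if the dataset here is built from 1K images expanded five times, the indices will definitely go out of bounds.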
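The slice arithmetic quoted above can be checked without the model. In this sketch the embedding matrix is replaced by a plain list of row indices, with one row per caption as the `:5` image stride suggests; `fold_sizes` is a hypothetical helper written for illustration. Note that Python slicing past the end of a sequence returns an empty result rather than raising an error.

```python
def fold_sizes(num_rows, num_folds=5, fold_stride=5000):
    """Return (images, captions) per fold for the slicing used in evalrank()."""
    rows = list(range(num_rows))  # stand-in for the rows of img_embs / cap_embs
    sizes = []
    for i in range(num_folds):
        start = i * fold_stride
        stop = int(i * fold_stride + num_rows / 5)
        images = rows[start:stop:5]  # every fifth row, one per image
        captions = rows[start:stop]  # every row, one per caption
        sizes.append((len(images), len(captions)))
    return sizes


# 5K test set: 5000 images x 5 captions = 25000 rows.
print(fold_sizes(25000))  # [(1000, 5000), (1000, 5000), (1000, 5000), (1000, 5000), (1000, 5000)]

# 1K test set: 1000 images x 5 captions = 5000 rows.
print(fold_sizes(5000))   # [(200, 1000), (0, 0), (0, 0), (0, 0), (0, 0)]
```

With 5000 rows, the first fold covers only 200 images and the remaining folds are empty, which matches the "Images: 200, Captions: 1000" and "Images: 0, Captions: 0" lines reported in the ZeroDivisionError issue.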
