jnhwkim / ban-vqa: Bilinear attention networks for visual question answering
License: MIT License
Hello, thanks for your work! The links to the questions and annotations in download.sh are inaccessible to me, so I used the questions and annotations from VQA (https://visualqa.org), such as this one (https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip). However, I got a huge train_loss while running python main.py --use_both True --use_vg True --batch_size 32.
I was wondering if I used the wrong data. If so, could anyone please tell me or provide another valid link?
While running adaptive_detection_features_converter.py for the TSV files, I am getting this error and can't resolve it. Any leads here would be helpful. This error occurs when trying to decode the features/boxes from the tsv file.
File "tools/adaptive_detection_features_converter.py", line 156, in extract
bboxes = np.frombuffer(base64.decodestring(item['boxes']), dtype=np.float32).reshape((item['num_boxes'], -1))
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 554, in decodestring
return decodebytes(s)
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 546, in decodebytes
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
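A note on the `Incorrect padding` error above: `binascii` raises it when the base64 text it receives is incomplete or malformed, which can happen when a TSV field was truncated while reading. Separately, `base64.decodestring` is a deprecated alias that was removed in Python 3.9, so `base64.b64decode` is the safer call. The sketch below is a hypothetical helper (`decode_b64_field` is not part of the repository) that decodes one field and reports its length when decoding fails, which may help tell a truncated row from a genuinely bad file.

```python
import base64
import binascii


def decode_b64_field(field):
    """Decode one base64 TSV field into raw bytes.

    Hypothetical helper for debugging; not part of ban-vqa.
    """
    # Accept both str and bytes, since csv readers return str in Python 3.
    if isinstance(field, str):
        field = field.encode("ascii")
    field = field.strip()
    try:
        return base64.b64decode(field)
    except binascii.Error as err:
        # A length that is not a multiple of 4 usually means the field
        # was cut short before it reached the decoder.
        raise ValueError(
            "invalid base64 field of length %d (possibly truncated)" % len(field)
        ) from err
```

If this raises for a row, it is worth checking whether the TSV download is complete and whether the reader is cutting long fields short.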
Hi,
I am wondering if you can share the code for BAN on Flickr30k.
Thanks very much!
Dear Authors,
the link to download the pre-trained model for Flickr30k no longer works. Could you please update it again?
Link not working: https://drive.google.com/uc?export=download&id=1xiVVRPsbabipyHes25iE0uj2YkdKWv3K
Davide
Hi, thanks for the library.
Is it possible to share details of your ensemble method?
Hello,
I'd like to use the model. I expected to pass a question as a string and an image path, but test.py takes the saved model as input. Where do I supply the question and image to test the model?
line 39 in bc.py:
self.h_net = weight_norm(nn.Linear(h_dim, h_out), dim=None)
Should this instead be:
self.h_net = weight_norm(nn.Linear(h_dim*self.k, h_out), dim=None)
Hi Kim:
Thanks for sharing your great work and elegant codes.
I have questions about your test-dev results. As your README.md indicates, training includes the data-augmentation trick with Visual Genome. However, the compared models (Counter, Bottom-up) in your paper did not use Visual Genome for training, which seems an unfair comparison.
Have you trained the BAN model without Visual Genome? I think that would better demonstrate your model's high efficiency.
Hi, thank you for sharing your code. I was wondering what exactly data/train36_imgid2idx.pkl contains.
Hello,
This is my first time using a VQA network, and I wonder how I can use the pretrained model to ask a question about an image and get a response. Thank you.
I followed all the instructions and used the default hyperparameters, which should give the best results. However, with the default random seed (1204), I only get 69.84 on the test-dev split, which is 0.2 lower than the reported result. I also notice that the standard deviation reported on the val split is around 0.11.
Can you give me some advice on how to fix the gap?
Thx!
Hello, thanks for your great code! I have some trouble while running
python3 main.py --use_both True --use_vg True
I have 4 TITAN Xp GPUs, each with 12.2G of memory, and set the batch size to 256. Then I get the following error:
nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 97, in <module>
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/Project/ban-vqa/train.py", line 74, in train
loss.backward()
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58
If I set the batch size to 128, it occupies ~12G of GPU memory during the early stage and then goes down to ~6G per GPU. Is there something wrong with how I am running it?
Thx!
Hello guys,
Very nice piece of work.
I was wondering why you didn't use an einsum implementation of the bilinear attention to speed up training.
This equation is perfect for it. You should see a significant gain, and it would be nice for once to have highly optimized code available on GitHub.
Best,
T.C
When I run main.py, I get an error:
main.py: error: unrecognized arguments: True True
Then, I fixed the command
$ python3 main.py --use_both True --use_vg True
into
$ python3 main.py --use_both --use_vg
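The behaviour reported above is what argparse does when an option is declared as a boolean switch. The sketch below assumes the two options are defined with `action="store_true"` (an assumption inferred from the error message, not checked against main.py); such a switch takes no value, so a trailing `True` is left over and rejected as an unrecognized argument.

```python
import argparse


def build_parser():
    """Build a parser with two boolean switches.

    Assumption: this mirrors how --use_both and --use_vg might be
    declared; the real main.py may differ.
    """
    parser = argparse.ArgumentParser()
    # store_true switches default to False and become True when present.
    parser.add_argument("--use_both", action="store_true")
    parser.add_argument("--use_vg", action="store_true")
    return parser


# Passing the switches without values is accepted.
args = build_parser().parse_args(["--use_both", "--use_vg"])
print(args.use_both, args.use_vg)
```

With this kind of declaration, omitting a switch leaves it at `False`, so no explicit `False` value is needed either.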
When I run python test.py --label mytest, I get this error:
Traceback (most recent call last):
File "test.py", line 91, in <module>
eval_dset = VQAFeatureDataset(args.split, dictionary, adaptive=True)
File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 244, in __init__
self.entries = _load_dataset(dataroot, name, self.img_id2idx, self.label2ans)
File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 142, in _load_dataset
entries.append(_create_entry(img_id2val[img_id], question, None))
KeyError: 1
I found that data/test2015_imgid2idx.pkl is {}. The file was generated with python3 tools/adaptive_detection_features_converter.py.
Can you help me? @jnhwkim Thanks in advance for any suggestions.
I have downloaded everything listed in tools/download.sh.
Could you provide the missing data as well?
Thank you.
Traceback (most recent call last):
File "tools/adaptive_detection_features_converter.py", line 199, in <module>
extract('train', infiles, args.task)
File "tools/adaptive_detection_features_converter.py", line 94, in extract
imgids = utils.load_imageid(path_imgs[split])
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 47, in load_imageid
images = load_folder(folder, 'jpg')
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 40, in load_folder
for f in sorted(os.listdir(folder)):
FileNotFoundError: [Errno 2] No such file or directory: 'data/train2014'
Hi, I am trying to run your repository, but I keep getting the following error:
Namespace(batch_size=128, epochs=13, gamma=8, input=None, model='ban', num_hid=1280, op='c', output='saved_models/ban', seed=1204, tfidf=True, use_both=False, use_vg=False)
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
File "main.py", line 50, in <module>
train_dset = VQAFeatureDataset('train', dictionary, adaptive=True)
File "/home/michas/Desktop/codes/ban-vqa/dataset.py", line 234, in __init__
self.features = np.array(hf.get('image_features'))
MemoryError
I suppose it is happening because the whole dataset is loaded into RAM as a numpy array (I have 32GB). Can you suggest a solution?
Thanks
I only want to run inference with this model.
Is it possible to have only the pre-trained model file for inference?
If not, should I run both download.sh and download_data.sh for inference only?
When I run python3 test.py --label mytest, I get the warning 'RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().' The code still completes, but the result, when evaluated on the VQA challenge, scores only 1% overall. I used your pretrained model and features.
Are the hdf5 files in the downloaded flickr30k_features.zip used to reproduce the results? I don't see tsv files in flickr30k_features.zip but I do need the features and bounding boxes for flickr 30k validation/testing sets. The files in flickr30k_features.zip are confusing, for example, in val.hdf5 file, there are (30722, 2048) features, but in adaptive_detection_features_converter.py, known_num_boxes for validation set is 29906, so what are these 30722 features?
Hi JinHwa:
I wonder why the learning rate is different with or without evalLoader? Thanks in advance!
Best
Jiasen
Hi,
Love your work and repository
I just want to know how I can get the attention visualization (like Figures 3 and 4 in the paper).
Hello,
I am trying to evaluate the pretrained model on the VQA dataset. If possible, I would like to ask you the following questions:
Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
File "evaluate.py", line 47, in <module>
model.load_state_dict(model_data.get('model_state', model_data))
File "/home/claudio.greco/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
.format(name))
KeyError: 'unexpected key "module.w_emb.emb_.weight" in state_dict'
Probably, this happens because the default parameters of the script do not match the ones of the pretrained model. Am I right?
In order to solve problem (1), I executed the command "python3.6 evaluate.py --num_hid=1280 --op='c' --gamma=8". In this case it works, but the script returns "eval score: 82.23 (92.66)", which seems a bit too high to me. Which row and table in the paper should I compare this result to?
I tried to evaluate the pretrained model on the test split of the VQA dataset by changing "eval_dset = VQAFeatureDataset('dev', dictionary, adaptive=True)" to "eval_dset = VQAFeatureDataset('test2015', dictionary, adaptive=True)" in the evaluate.py script. However, in that case, the script returns the following error:
Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
/mnt/8tera/claudio.greco/ban-vqa/language_model.py:95: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
output, hidden = self.rnn(x, hidden)
Traceback (most recent call last):
File "evaluate_new.py", line 51, in <module>
eval_score, bound, entropy = evaluate(model, eval_loader)
File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 121, in evaluate
batch_score = compute_score_with_logits(pred, a.cuda()).sum()
File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 26, in compute_score_with_logits
one_hots.scatter_(1, logits.view(-1, 1), 1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)
Do you know why this is happening?
Thank you very much!
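On the `unexpected key "module.w_emb.emb_.weight"` error above: keys that start with `module.` are what PyTorch produces when the state dict of a model wrapped in `nn.DataParallel` is saved, while an unwrapped model expects the same keys without that prefix. One possible workaround, sketched below under that assumption, is to strip the prefix before calling `load_state_dict`. The helper `strip_module_prefix` is hypothetical and not part of the repository; wrapping the model in `nn.DataParallel` before loading would be the alternative.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from keys.

    Hypothetical helper; works on any mapping of parameter names to values.
    """
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip when the key really starts with the prefix.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned
```

Usage would look like `model.load_state_dict(strip_module_prefix(model_data.get('model_state', model_data)))`, assuming the checkpoint layout shown in the traceback.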
While reproducing this result, I encountered the following error.
Traceback (most recent call last):
File "main.py", line 96, in <module>
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/tingting/Documents/tingting/ban-vqa/train.py", line 72, in train
pred, att = model(v, b, q, a)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
return replicate(module, device_ids)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
RuntimeError: slice() cannot be applied to a 0-dim tensor
After tracing the code, I found that if I delete "nn.DataParallel(model).cuda()", it works well.
I use 4 GTX 1080 Ti GPUs. Have you encountered the same thing before?
Never mind, solved.
Dear authors,
I saw your previous answer, but I didn't have time to answer before the issue was closed.
I have tried two different Linux systems and have also tried on Windows. I have tried Chrome and Firefox. I can download the package but cannot unzip it, because it gives me an error for the train.hdf5 file saying that the file is corrupted. I also tried two different internet connections and downloaded the file several times, but the result is always the same: I can't unzip it without errors.
Could you please check the train.hdf5 file?
Davide
Originally posted by @drigoni in #46 (comment)
Hi Kim:
Thanks for your excellent work and code.
Did you use the Visual Genome dataset for training to get the validation-set results listed in Table 1 of your paper? Since you compared against the bottom-up and top-down results, which used the VG dataset, I assume you also used VG + VQA 2.0 train to get the final results on the validation set. Am I right?
It seems like the Flickr30K grounding task in the report is not included in the repo.
Am I missing something?
I don't have 'data/question_answers.json' and 'image_data/json'. How can I get or generate them?
In attention.py, the forward_all method returns only one value.
Line 53 in cf0c8e1
My machine has 3 Titan 1080 Ti GPUs and 12 Intel i7 CPUs, with 65GB of total memory. However, the program takes more than 5800s to run an epoch.
My command is python3 main.py --use_both True --use_vg True --batch_size 128, because batch size 256 runs out of memory.
epoch 1, time: 5844.42
train_loss: 3.32, norm: 4.2468, score: 51.21
gradual warmup lr: 0.0010
epoch 2, time: 5844.72
train_loss: 3.05, norm: 2.5201, score: 55.44
gradual warmup lr: 0.0014
epoch 3, time: 5839.73
train_loss: 2.90, norm: 1.7370, score: 58.02
lr: 0.0014
epoch 4, time: 5835.09
train_loss: 2.75, norm: 1.3749, score: 60.45
lr: 0.0014
epoch 5, time: 5837.11
train_loss: 2.64, norm: 1.2232, score: 62.33
lr: 0.0014
epoch 6, time: 5829.90
train_loss: 2.54, norm: 1.1545, score: 63.88
lr: 0.0014
epoch 7, time: 5832.88
train_loss: 2.46, norm: 1.1238, score: 65.32
lr: 0.0014
epoch 8, time: 5834.77
train_loss: 2.39, norm: 1.1157, score: 66.59
Hi @jnhwkim, I have a question about the Visual Genome version.
README.md says the Visual Genome version is 1.2, but the key called id that dataset.py expects is missing from the 1.2 version of image_data.json. That key is present in version 1.0.
So which version should I use?
Thank you in advance!
Hello :)
First of all, thank you for sharing your repo!
I am having trouble creating these files:
indices_file = {
'train': 'data/train_imgid2idx.pkl',
'val': 'data/val_imgid2idx.pkl',
'test': 'data/test2015_imgid2idx.pkl'}
ids_file = {
'train': 'data/train_ids.pkl',
'val': 'data/val_ids.pkl',
'test': 'data/test2015_ids.pkl'}
The problem is that utils.py requires the .jpeg images to build the indexes, and those are not available at this point. Could you be so kind as to share the id .pkl files?
Thank you and best regards,
Max
When you run evaluate.py for the pretrained model, is there a way to run evaluate without needing a GPU/Cuda?
Dear authors:
The links to the image metadata and question answers of VQA no longer work. Could you restore them?
Hi,
I downloaded flickr30k_features.zip from https://drive.google.com/file/d/1BmcxeY1kXzMZv54d4wMtl7HGc8Cs9zgO/view?usp=sharing but none of the TSV files are in this zip file. Where can I get them? Also, why are there train/val/test.hdf5 files in this zip file? I thought these files were supposed to be generated by adaptive_detection_features_converter.py.
Thanks a lot for sharing code!
After downloading cache.pkl.tgz and entering the following command:
tar xvf data/cache/cache.pkl.tgz -C data/cache/
I got:
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
Is there something wrong with the cache file on google drive?
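A quick check that may help with the `not in gzip format` error above: a real gzip file starts with the two magic bytes `1f 8b`. One common cause of this error with Google Drive links is that the saved file is actually an HTML page (for example a confirmation or quota notice) rather than the archive. The sketch below uses a hypothetical helper, `looks_like_gzip`, to test the first two bytes of the download.

```python
def looks_like_gzip(path):
    """Return True if the file at path starts with the gzip magic bytes.

    Hypothetical helper for diagnosing a bad download.
    """
    with open(path, "rb") as handle:
        # Every gzip stream begins with the bytes 0x1f 0x8b.
        return handle.read(2) == b"\x1f\x8b"
```

If `looks_like_gzip('data/cache/cache.pkl.tgz')` returns `False`, opening the file in a text editor should show what was actually downloaded.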
I cannot download the image features. Could you provide another way to download them?
Same as above
I very much enjoyed reading your paper.
However, there is a small problem with code execution.
The http://visualq.org/ links in download.sh do not work.
Does the link in https://visualqa.org/download.html work as expected?
Thank you for your research.
What does this line in the code mean: logits = torch.einsum('xhyk,bvk,bqk->bhvq', (self.h_mat, v_, q_)) + self.h_bias
In particular, what does 'xhyk,bvk,bqk->bhvq' mean?
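For readers with the same question, the subscript string follows the general einsum convention: each comma-separated group labels the axes of one operand, the group after `->` labels the output axes, and any label missing from the output is summed over. Here `x`, `y` and `k` are summed, while `b`, `h`, `v` and `q` are kept. The sketch below spells this out with plain Python loops over nested lists. The axis meanings (batch, glimpse or output head, visual object, question token, feature) are an assumption based on the variable names, and `bilinear_logits` is a hypothetical name, not a function from the repository.

```python
def bilinear_logits(h_mat, v_, q_):
    """Loop-based equivalent of einsum('xhyk,bvk,bqk->bhvq', h_mat, v_, q_).

    h_mat has axes (x, h, y, k); v_ has (b, v, k); q_ has (b, q, k).
    The result has axes (b, h, v, q).
    """
    X, H = len(h_mat), len(h_mat[0])
    Y, K = len(h_mat[0][0]), len(h_mat[0][0][0])
    B, V, Q = len(v_), len(v_[0]), len(q_[0])
    out = [[[[0.0] * Q for _ in range(V)] for _ in range(H)] for _ in range(B)]
    for b in range(B):
        for h in range(H):
            for v in range(V):
                for q in range(Q):
                    total = 0.0
                    # x, y and k do not appear after '->', so they are summed.
                    for x in range(X):
                        for y in range(Y):
                            for k in range(K):
                                total += h_mat[x][h][y][k] * v_[b][v][k] * q_[b][q][k]
                    out[b][h][v][q] = total
    return out
```

In other words, each output entry is a weighted inner product between one visual feature vector and one question feature vector, with one weight vector per value of `h`.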
Hello,
I used Bottom-up Attention to get boxes for the Flickr30k data. Unfortunately, I could not reproduce the upper bound you reported in the paper: I get 0.6507, whereas you reported 0.8745. Would you mind providing details on how you used the Bottom-up model to generate the boxes? My settings are listed below:
model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.2
min_boxes=10
max_boxes=100
UPDATE:
When I increase the number of boxes I get a better upper bound, but it is still not as good as yours. The setup below gives me an upper bound of 0.8530:
model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.01
min_boxes=200
max_boxes=200
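When comparing upper-bound numbers like the ones above, it helps to pin down how the upper bound is computed, since small differences in the definition can shift the result. The sketch below assumes one plausible definition: the fraction of ground-truth boxes for which at least one detected box reaches an IoU of 0.5 or more. This is an assumption for illustration, not the paper's confirmed protocol, and the function names are hypothetical. Boxes are `(x1, y1, x2, y2)` tuples with no extra pixel offset added to widths and heights.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def upper_bound(gt_boxes, proposals, thresh=0.5):
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not gt_boxes:
        return 0.0
    # A ground-truth box counts as covered if any proposal reaches thresh.
    hits = sum(
        1 for gt in gt_boxes if any(iou(gt, p) >= thresh for p in proposals)
    )
    return hits / len(gt_boxes)
```

Under this definition, adding more proposals can only keep the upper bound the same or raise it, which is consistent with the improvement reported after raising the number of boxes.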
Dear authors,
I am following the instructions in the file ./tools/download_flickr.sh.
I have succeeded in downloading the Flickr30k Image Features: (Link: https://drive.google.com/file/d/1BmcxeY1kXzMZv54d4wMtl7HGc8Cs9zgO/view?usp=sharing), but I am not able to unzip the file due to an error. I tried different ways to unzip it but I am not able to do it.
Could you please check the file?
Best regards,
Davide
Hi, I am very interested in your BAN model on Flickr30k. Do you provide labels for the detected objects together with the bounding boxes and features, as Faster R-CNN or bottom-up attention would? Since I am not sure how you prepared your dataset, I'm afraid that if I use pre-trained models to predict the labels myself, the dataloader pipeline would run into problems. Thanks!