
jnhwkim / ban-vqa

536.0 536.0 101.0 1.24 MB

Bilinear attention networks for visual question answering

License: MIT License

Python 95.48% Shell 4.52%
attention bilinear-pooling pytorch-implmention visual-question-answering

ban-vqa's People

Contributors

jaesuny, jnhwkim


ban-vqa's Issues

Inaccessible questions and annotations

Hello, thanks for your work! The links for the questions and annotations in download.sh are inaccessible to me, so I used the questions and annotations from VQA (https://visualqa.org), e.g. https://s3.amazonaws.com/cvmlp/vqa/mscoco/vqa/v2_Annotations_Train_mscoco.zip. However, I got a huge train_loss while running python main.py --use_both True --use_vg True --batch_size 32.
I was wondering if I used the wrong data. If so, could anyone please tell me or provide another valid link?

error when using adaptive_detection_features_converter.py

While running adaptive_detection_features_converter.py on the TSV files, I get the error below and can't resolve it. Any leads would be helpful. The error occurs when decoding the features/boxes from the TSV file.

File "tools/adaptive_detection_features_converter.py", line 156, in extract
bboxes = np.frombuffer(base64.decodestring(item['boxes']), dtype=np.float32).reshape((item['num_boxes'], -1))
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 554, in decodestring
return decodebytes(s)
File "/home/reddy/myvenv/lib/python3.6/base64.py", line 546, in decodebytes
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
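
For reference, a minimal sketch (not the repository's code) of decoding one TSV row's base64-encoded boxes. The usual causes of the "Incorrect padding" error are passing a Python 3 str where bytes are expected, or a field that needs re-padding:

# Hedged sketch: decode one row read via csv.DictReader from the bottom-up-attention TSV.
import base64
import numpy as np

def decode_boxes(item):
    raw = item['boxes']
    if isinstance(raw, str):          # csv gives str in Python 3; base64 wants bytes
        raw = raw.encode('ascii')
    raw += b'=' * (-len(raw) % 4)     # re-pad to a multiple of 4 characters
    buf = base64.decodebytes(raw)     # decodestring() is a deprecated alias of decodebytes()
    return np.frombuffer(buf, dtype=np.float32).reshape((int(item['num_boxes']), -1))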

Ensemble details

Hi, thanks for the library.
Is it possible to share details of your ensemble method?

test.py

Hello,

I'd like to use the model: I expect to provide a question as a string and an image path, but in test.py the input is the saved model. Where do I input the question and image to test the model?

bug in bc.py

line 39 in bc.py:
self.h_net = weight_norm(nn.Linear(h_dim, h_out), dim=None)
Should this be
self.h_net = weight_norm(nn.Linear(h_dim*self.k, h_out), dim=None)

Compared models without using Visual Genome

Hi Kim:

Thanks for sharing your great work and elegant codes.

I have a question about your test-dev results. As your README.md indicates, training includes the data-augmentation trick with Visual Genome. However, the compared models (Counter, Bottom-Up) in your paper did not use Visual Genome for training, so that seems like an unfair comparison.

Have you trained the BAN model without Visual Genome? I think that would better demonstrate your model's efficiency.

train36_imgid2idx.pkl file

Hi, thank you for sharing your code. I was wondering, what exactly does data/train36_imgid2idx.pkl contain?
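
For anyone who wants to check, a small hypothetical snippet (not from the repo) to inspect the pickle directly; the filename suggests a mapping from image id to row index in the HDF5 feature file, but inspecting it is the authoritative answer:

# Hypothetical inspection snippet, not part of the repository.
import pickle

with open('data/train36_imgid2idx.pkl', 'rb') as f:
    imgid2idx = pickle.load(f)

print(type(imgid2idx), len(imgid2idx))   # expected: a dict with one entry per image
print(list(imgid2idx.items())[:5])       # presumably COCO image id -> row index in the .hdf5 features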

How to use the pretrained model

Hello,
This is my first time trying to use a VQA network, and I wonder how I can use the pretrained model to ask a question about an image and get a response? Thank you.

cannot reproduce the best result of single model

I followed all the instructions and used the default hyperparameters, which should give the best results. However, with the default random seed=1204, I only get 69.84 on the test-dev split, which is 0.2 lower than the reported result. I also notice that the standard deviation reported on the val split is around 0.11.
Can you give me some advice on how to fix the gap?
Thx!

Out of memory while executing loss.backward()

Hello, thanks for your great code! I have some trouble while running

python3 main.py --use_both True --use_vg True

I have 4 TITAN Xps with 12.2 GB of memory each and set the batch size to 256. Then I get the following error:

nParams= 90618566
optim: adamax lr=0.0007, decay_step=2, decay_rate=0.25, grad_clip=0.25
gradual warmup lr: 0.0003
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
File "main.py", line 97, in
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/Project/ban-vqa/train.py", line 74, in train
loss.backward()
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/anaconda3/envs/pytorch/lib/python3.5/site-packages/torch/autograd/init.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/torch/lib/THC/generic/THCStorage.cu:58

And if I set the batch size to 128, it occupies ~12 GB of GPU memory during the early stage and then goes down to ~6 GB per GPU. Is there something wrong with my execution?
Thx!
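
One common workaround (a hedged sketch, not the repository's training loop) is gradient accumulation: run micro-batches that fit in memory and step the optimizer every few batches so the effective batch size stays at 256. The model signature below is assumed from the tracebacks elsewhere in these issues:

# Hedged sketch of gradient accumulation as an out-of-memory workaround.
def train_epoch(model, loader, criterion, optim, accum_steps=4):
    model.train()
    optim.zero_grad()
    for step, (v, b, q, a) in enumerate(loader):
        pred, _ = model(v, b, q, a)               # signature assumed from the tracebacks
        loss = criterion(pred, a) / accum_steps   # scale so the summed gradient matches a large batch
        loss.backward()
        if (step + 1) % accum_steps == 0:
            optim.step()
            optim.zero_grad()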

Question

Hello guys,

Very nice piece of work.
I was wondering why you didn't use an einsum implementation of the bilinear attention in order to speed up training.
[equation screenshot]
This equation is perfect for it. You should see a significant gain, and it would be nice, for once, to have highly optimized code available on GitHub.

Best,
T.C
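
For what it's worth, a hedged sketch of how a low-rank bilinear attention map could be written with torch.einsum. This is not the repository's implementation, and whether the flattened softmax matches BAN's normalization exactly is an assumption:

# Hedged sketch: low-rank bilinear attention logits via a single einsum.
import torch
import torch.nn as nn

class EinsumBilinearAttention(nn.Module):
    def __init__(self, v_dim, q_dim, h_dim):
        super().__init__()
        self.U = nn.Linear(v_dim, h_dim)
        self.V = nn.Linear(q_dim, h_dim)
        self.p = nn.Parameter(torch.randn(h_dim) / h_dim ** 0.5)

    def forward(self, X, Y):
        # logits[b, i, j] = sum_k p[k] * relu(U x_i)[k] * relu(V y_j)[k]
        logits = torch.einsum('bik,bjk,k->bij',
                              torch.relu(self.U(X)), torch.relu(self.V(Y)), self.p)
        return torch.softmax(logits.flatten(1), dim=1).view_as(logits)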

I got an error with arguments

When I run main.py, I get this error:

main.py: error: unrecognized arguments: True True

Then, I fixed the command

$ python3 main.py --use_both True --use_vg True

into

$ python3 main.py --use_both --use_vg
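
A minimal illustration (hypothetical; not necessarily the repo's exact parser) of why the original command fails: flags defined with action='store_true' take no value, so the trailing True tokens are left over as unrecognized arguments:

# Hypothetical argparse setup reproducing the error message.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--use_both', action='store_true')
parser.add_argument('--use_vg', action='store_true')

print(parser.parse_args(['--use_both', '--use_vg']))
# parser.parse_args(['--use_both', 'True']) exits with: error: unrecognized arguments: True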

Running test.py gives KeyError: 1

When I run python test.py --label mytest, I get this error:

Traceback (most recent call last):
  File "test.py", line 91, in <module>
    eval_dset = VQAFeatureDataset(args.split, dictionary, adaptive=True)
  File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 244, in __init__
    self.entries = _load_dataset(dataroot, name, self.img_id2idx, self.label2ans)
  File "/home/gwh/Downloads/ban-vqa-master/dataset.py", line 142, in _load_dataset
    entries.append(_create_entry(img_id2val[img_id], question, None))
KeyError: 1

I find that data/test2015_imgid2idx.pkl is {}; the file was generated with python3 tools/adaptive_detection_features_converter.py.

Can you help me? @jnhwkim Thanks in advance for any suggestions.

error from tools/process.sh

I have downloaded everything listed in tools/download.sh.
Could you provide the missing data as well?
Thank you.

Traceback (most recent call last):
File "tools/adaptive_detection_features_converter.py", line 199, in
extract('train', infiles, args.task)
File "tools/adaptive_detection_features_converter.py", line 94, in extract
imgids = utils.load_imageid(path_imgs[split])
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 47, in load_imageid
images = load_folder(folder, 'jpg')
File "/home/sizhangyu/Documents/pytorch_code/ban-vqa/utils.py", line 40, in load_folder
for f in sorted(os.listdir(folder)):
FileNotFoundError: [Errno 2] No such file or directory: 'data/train2014'

Memory error

Hi, I am trying to run your repository, but I keep getting the following error:

Namespace(batch_size=128, epochs=13, gamma=8, input=None, model='ban', num_hid=1280, op='c', output='saved_models/ban', seed=1204, tfidf=True, use_both=False, use_vg=False)
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
  File "main.py", line 50, in <module>
    train_dset = VQAFeatureDataset('train', dictionary, adaptive=True)
  File "/home/michas/Desktop/codes/ban-vqa/dataset.py", line 234, in __init__
    self.features = np.array(hf.get('image_features'))
MemoryError

I suppose this happens because the whole dataset is being loaded as a numpy array into RAM (I have 32 GB). Can you suggest a solution?
Thanks
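
A hedged workaround sketch (not the repository's code): keep the h5py dataset handle open and slice rows on demand instead of materializing everything with np.array(); the path below is hypothetical:

# Hedged sketch of lazy HDF5 access.
import h5py

hf = h5py.File('data/train36.hdf5', 'r')   # hypothetical feature file path
features = hf['image_features']            # stays on disk, not loaded into RAM
row = features[42]                         # only this row is read into memory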

Which files are needed for inference only?

I want to run only inference with this model.

Is it possible to have only the pre-trained model file for inference?
If not, should I run both download.sh and download_data.sh for inference only?

Evaluating accuracy on test?

When I run python3 test.py --label mytest, I get the warning 'RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().' The code still completes, but the result evaluated on the VQA challenge is only 1% overall. I used your pretrained model and features.

flickr 30k features download

Are the HDF5 files in the downloaded flickr30k_features.zip used to reproduce the results? I don't see TSV files in flickr30k_features.zip, but I do need the features and bounding boxes for the Flickr30k validation/test sets. The files in flickr30k_features.zip are confusing; for example, val.hdf5 contains (30722, 2048) features, but in adaptive_detection_features_converter.py, known_num_boxes for the validation set is 29906, so what are these 30722 features?
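
A small, hypothetical inspection snippet to list what each HDF5 file in the zip actually contains, which may help pin down where the 30722 rows come from:

# Hypothetical snippet: print every top-level dataset's name, shape, and dtype.
import h5py

with h5py.File('val.hdf5', 'r') as hf:   # path inside the unzipped flickr30k_features.zip
    for name, dset in hf.items():
        print(name, getattr(dset, 'shape', None), getattr(dset, 'dtype', None))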

Attention Visualization

Hi,
Love your work and repository.

I just want to know how I can get the attention visualizations (like Figures 3 and 4 in the paper).

Evaluating pretrained model

Hello,

I am trying to evaluate the pretrained model on the VQA dataset. If possible, I would like to ask you the following questions:

  1. I executed the command "python3.6 evaluate.py". However, in that case, the script returns the following error:
Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
Traceback (most recent call last):
  File "evaluate.py", line 47, in <module>
    model.load_state_dict(model_data.get('model_state', model_data))
  File "/home/claudio.greco/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 522, in load_state_dict
    .format(name))
KeyError: 'unexpected key "module.w_emb.emb_.weight" in state_dict'

Probably, this happens because the default parameters of the script do not match the ones of the pretrained model. Am I right?

  2. In order to solve problem (1), I executed the command "python3.6 evaluate.py --num_hid=1280 --op='c' --gamma=8". In this case, it works, but the script returns the result "eval score: 82.23 (92.66)", which seems a bit too high to me. With which row and table in the paper should I compare this result?

  3. I tried to evaluate the pretrained model on the test split of the VQA dataset by changing "eval_dset = VQAFeatureDataset('dev', dictionary, adaptive=True)" to "eval_dset = VQAFeatureDataset('test2015', dictionary, adaptive=True)" in the evaluate.py script. However, in that case, the script returns the following error:

Evaluate a given model optimized by training split using validation split.
loading dictionary from data/dictionary.pkl
loading features from h5 file
/mnt/8tera/claudio.greco/ban-vqa/language_model.py:95: UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
  output, hidden = self.rnn(x, hidden)
Traceback (most recent call last):
  File "evaluate_new.py", line 51, in <module>
    eval_score, bound, entropy = evaluate(model, eval_loader)
  File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 121, in evaluate
    batch_score = compute_score_with_logits(pred, a.cuda()).sum()
  File "/mnt/8tera/claudio.greco/ban-vqa/train.py", line 26, in compute_score_with_logits
    one_hots.scatter_(1, logits.view(-1, 1), 1)
RuntimeError: dimension out of range (expected to be in range of [-1, 0], but got 1)

Do you know why this is happening?

Thank you very much!

Reproducing error

Upon reproducing this result, I encounter the following error.
Traceback (most recent call last):
File "main.py", line 96, in
train(model, train_loader, eval_loader, args.epochs, args.output, optim, epoch)
File "/home/tingting/Documents/tingting/ban-vqa/train.py", line 72, in train
pred, att = model(v, b, q, a)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 113, in forward
replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 118, in replicate
return replicate(module, device_ids)
File "/home/tingting/tingting/lib/python3.5/site-packages/torch/nn/parallel/replicate.py", line 12, in replicate
param_copies = Broadcast.apply(devices, *params)
RuntimeError: slice() cannot be applied to a 0-dim tensor

After tracing the code, I found that if I delete "nn.DataParallel(model).cuda()", it works well.

I use 4 GTX 1080 Ti GPUs. Have you encountered this before?

Error in Flickr30k features

Dear authors,

I saw your previous answer, but I didn't have time to reply before the issue was closed.
I have tried two different Linux systems and also Windows, with both Chrome and Firefox. I can download the package but not unzip it, because it gives me an error with the train.hdf5 file: it says the file is corrupted. I also tried two different internet connections and downloaded the file several times, but the result is always the same. I can't unzip it without errors.

Could you please check the train.hdf5 file?
Davide

Originally posted by @drigoni in #46 (comment)

Do you use the VG dataset to get the results on the validation set?

Hi Kim:
Thanks for your excellent work and code.

Did you use the Visual Genome dataset for training to get the validation-set results listed in Table 1 of your paper? Since you compare with the Bottom-Up and Top-Down results, which used the VG dataset, I assume you also used VG + VQA 2.0 train to get the final results on the validation set. Am I right?

Flickr30K evaluation?

It seems like the Flickr30K grounding task in the report is not included in the repo.
Am I missing something?

how to get the files

I don't have 'data/question_answers.json' or 'image_data.json'. How can I get or generate them?

Training too slow

My machine has 3 Titan 1080 Ti GPUs, 12 Intel i7 CPUs, and 65 GB of total memory. However, the program takes more than 5800 s to run an epoch.
My command is python3 main.py --use_both True --use_vg True --batch_size 128, because batch size 256 runs out of memory.

epoch 1, time: 5844.42
        train_loss: 3.32, norm: 4.2468, score: 51.21
gradual warmup lr: 0.0010
epoch 2, time: 5844.72
        train_loss: 3.05, norm: 2.5201, score: 55.44
gradual warmup lr: 0.0014
epoch 3, time: 5839.73
        train_loss: 2.90, norm: 1.7370, score: 58.02
lr: 0.0014
epoch 4, time: 5835.09
        train_loss: 2.75, norm: 1.3749, score: 60.45
lr: 0.0014
epoch 5, time: 5837.11
        train_loss: 2.64, norm: 1.2232, score: 62.33
lr: 0.0014
epoch 6, time: 5829.90
        train_loss: 2.54, norm: 1.1545, score: 63.88
lr: 0.0014
epoch 7, time: 5832.88
        train_loss: 2.46, norm: 1.1238, score: 65.32
lr: 0.0014
epoch 8, time: 5834.77
        train_loss: 2.39, norm: 1.1157, score: 66.59

Question about Visual Genome version

Hi @jnhwkim, I have a question about the Visual Genome version.

The README.md says Visual Genome version 1.2.
[screenshot]

But in dataset.py, the version 1.2 image_data.json does not have a key called id; that key exists in version 1.0.
[screenshot]

A version 1.2 example:
[screenshot]

So which version should I use?

Thank you!

Trouble creating ID.pkls

Hello :)

first of all thank you for sharing your repo!

I am having trouble creating these files:

indices_file = {
    'train': 'data/train_imgid2idx.pkl',
    'val': 'data/val_imgid2idx.pkl',
    'test': 'data/test2015_imgid2idx.pkl'}
ids_file = {
    'train': 'data/train_ids.pkl',
    'val': 'data/val_ids.pkl',
    'test': 'data/test2015_ids.pkl'}

because utils.py requires the .jpg images to build the indices, and they are not available at that point. Could you be so kind as to share the id .pkls?

thank you and best regards
Max

Evaluate.py

When running evaluate.py with the pretrained model, is there a way to run the evaluation without needing a GPU/CUDA?
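
A hedged sketch of the usual CPU-only loading pattern (assuming the rest of the script avoids explicit .cuda() calls; the path is hypothetical, while the .get('model_state', ...) pattern follows evaluate.py as quoted in the issue above):

# Hedged sketch: load a checkpoint onto the CPU.
import torch

model_data = torch.load('saved_models/ban/model.pth', map_location='cpu')
state_dict = model_data.get('model_state', model_data)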

link no longer works

Dear authors:
The links for the image metadata and question answers of VQA no longer work. Could you provide them again?

tar error for cache.pkl.tgz when downloading the pickle caches for the pretrained model

Thanks a lot for sharing code!
After downloading cache.pkl.tgz and entering the following command:

tar xvf data/cache/cache.pkl.tgz -C data/cache/

I got:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now

Is there something wrong with the cache file on google drive?
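
One quick hypothetical check on whether the download is really a gzip archive or a truncated/HTML file saved under the .tgz name:

# Hypothetical check: gzip files start with the two magic bytes 0x1f 0x8b.
with open('data/cache/cache.pkl.tgz', 'rb') as f:
    head = f.read(2)
print(head == b'\x1f\x8b')   # False suggests a broken or partial download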

flickr30k upperbound

Hello,

I used Bottom-Up Attention to get boxes for the Flickr30k data. Unfortunately, I could not reach the upper bound you reported in the paper: I get 0.6507, while you reported 0.8745. Would you mind providing the details of how you used the Bottom-Up model to induce boxes? Mine are listed below:

model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.2
min_boxes=10
max_boxes=100

UPDATE:

When I increase the number of boxes I get a better upper bound, but it is still not as good as yours; the setup below gives me an upper bound of 0.8530:

model_name: resnet101_faster_rcnn_final.caffemodel
conf_thresh=0.01
min_boxes=200
max_boxes=200

How to get labels for objects?

Hi, I am very interested in your BAN model on Flickr30k. I am wondering whether you provide labels for the detected objects together with the bounding boxes and features, as Faster R-CNN or Bottom-Up Attention would. Since I am not sure how you prepared your dataset, I'm afraid that if I use pretrained models to predict the labels myself, the dataloader pipeline may have problems. Thanks!
