jialinwu17 / self_critical_vqa

Code for NeurIPS 2019 paper ``Self-Critical Reasoning for Robust Visual Question Answering''

vqa interpretable-deep-learning interpretable-ai explainable-ai visual-question-answering

self_critical_vqa's Introduction

Code for Self-Critical Reasoning for Robust Visual Question Answering (NeurIPS 2019 Spotlight)

This repo contains code for ''Self-Critical Reasoning for Robust Visual Question Answering'', trained with VQA-X human textual explanations. The code is modified from here; many thanks!

Prerequisites

Python 3.7.1
PyTorch 1.1.0
spaCy (we use en_core_web_lg spaCy model)
h5py, pickle, json, cv2

Preprocessing

Please download the detection features from this google drive and put them in the 'data' folder
Please run bash tools/download.sh to download other useful data files, including the VQA QA pairs and GloVe embeddings
Please run bash tools/preprocess.sh to preprocess the data
mkdir saved_models

Training

The training process is split into three stages or two stages:

Three-stage version (pretrain on VQA-CP, fine-tune using the influential strengthening loss, then fine-tune with both.)

(1) Pretrain on the VQA-CP train dataset by running
CUDA_VISIBLE_DEVICES=0 python main.py --load_hint -1 --use_all 1 --learning_rate 0.001 --split v2cp_train --split_test v2cp_test --max_epochs 40
After pretraining, you will have a saved model in saved_models, named by the start training time.
Alternatively, you can directly download a model from here.
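Since checkpoints are named by the start training time, a minimal sketch of how such a path might be constructed (the function name and timestamp format here are assumptions for illustration, not the repo's actual code):

```python
import os
import time

def make_save_dir(root="saved_models", start_time=None):
    """Return a checkpoint directory named by the training start time."""
    start_time = start_time or time.localtime()
    run_name = time.strftime("%Y-%m-%d_%H-%M-%S", start_time)
    return os.path.join(root, run_name)
```

Knowing the naming convention helps locate the checkpoint you need to reference in the later fine-tuning stages.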

(2) Fine-tune using the influential strengthening loss
Here, please replace line 86 of train.py with the path to your VQA-CP pretrained model.
Then, please run the following line to strengthen the most influential object.
CUDA_VISIBLE_DEVICES=0 python main.py --load_hint 0 --use_all 0 --learning_rate 0.00001 --split v2cp_train_vqx --split_test v2cp_test --max_epochs 12 --hint_loss_weight 20
After this stage, you will have another saved model in saved_models, named by the start training time.
Alternatively, you can directly download a model from here.
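As a loose illustration only (not the paper's exact formulation), an influential strengthening objective can be sketched as a hinge penalty that pushes the sensitivity of the human-annotated most influential object above that of the other proposals; every name below is hypothetical:

```python
def strengthen_loss(sensitivities, influential_idx, margin=0.0):
    """Hinge penalty incurred whenever another object's sensitivity
    exceeds the most influential object's sensitivity (plus a margin)."""
    s_star = sensitivities[influential_idx]
    return sum(max(0.0, s - s_star + margin)
               for i, s in enumerate(sensitivities)
               if i != influential_idx)
```

The loss is zero when the influential object already dominates, which matches the intent of "strengthening" rather than re-ranking everything.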

(3) Training with the self-critical objectives.
Here, please replace line 82 of train.py with the path to your influence-strengthened pretrained model.
Then, please run the following line for training.
CUDA_VISIBLE_DEVICES=0 python main.py --load_hint 1 --use_all 0 --learning_rate 0.00001 --split v2cp_train_vqx --split_test v2cp_test --max_epochs 5 --hint_loss_weight 20 --compare_loss_weight 1500
Alternatively, you can directly download a model from here.
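The --hint_loss_weight and --compare_loss_weight flags suggest the final objective is a weighted sum of the base VQA loss and the two auxiliary terms; a hedged sketch of that combination (variable names are assumptions, not the repo's actual code):

```python
def total_loss(vqa_loss, hint_loss, compare_loss,
               hint_loss_weight=20.0, compare_loss_weight=1500.0):
    """Weighted combination of the base VQA loss with the influential
    strengthening (hint) term and the self-critical (compare) term,
    using the default weights from the stage-3 command above."""
    return (vqa_loss
            + hint_loss_weight * hint_loss
            + compare_loss_weight * compare_loss)
```

The large compare weight implies the raw compare term is small in magnitude relative to the VQA loss.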

self_critical_vqa's People

Contributors: jialinwu17, jialinwu1717

self_critical_vqa's Issues

Could you provide answer to label maps?

Would it be possible to provide the trainval_ans2label.pkl and trainval_label2ans.pkl files? The files generated by compute_softscore.py do not seem to be consistent with the labels in the VQA_caption_{name}dataset.pkl files.

For instance, for qid 166207000, the GT answers should be robot/beep*, but the label returned by the dataset is 142, which corresponds to the answer: 'real' in label2ans.
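For anyone debugging a mismatch like this, a small consistency check can be sketched, assuming the common convention in bottom-up-attention VQA codebases where ans2label is a dict and label2ans the inverse list (the data below is a toy stand-in, not the real files):

```python
import pickle

# Toy stand-ins for trainval_ans2label.pkl / trainval_label2ans.pkl.
label2ans = ["net", "real", "robot"]
ans2label = {ans: i for i, ans in enumerate(label2ans)}

# Round-trip through pickle as the real files would be loaded.
ans2label = pickle.loads(pickle.dumps(ans2label))
label2ans = pickle.loads(pickle.dumps(label2ans))

def maps_consistent(ans2label, label2ans):
    """True iff the two maps are mutual inverses."""
    return all(label2ans[i] == ans for ans, i in ans2label.items())
```

Running the same check against the generated files and the dataset pickles would pinpoint whether the two were built from different answer vocabularies.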

Reproduction Results

Following the instructions in the readme file and using the first two pre-trained networks does not reproduce the 49.5% accuracy reported in the paper.
Following the experimental settings in the paper also doesn't reproduce the results.
I tried 20 different seeds, and the best score is 49.

Can you please share the exact settings you use to get 49.5?

Thanks!

Is there a bug in create_vqx_hint.py?

Hi, Jialin!
Recently, I have been trying to introduce 'the most influential objects' into my model. However, when I checked 'create_vqx_hint.py', I didn't understand the code at lines 187-189.

if cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1]) > 0.3:
    if hint_score_attr[j] <= cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1]):
        hint_score[j] = cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1])

I also found that 'hint_score_attr' is never used afterwards, yet this code overwrites 'hint_score[j]', which was already assigned at line 179. Wouldn't it be more reasonable to use 'hint_score_attr[j]' in place of 'hint_score[j]' at line 189, like this:

if cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1]) > 0.3:
    if hint_score_attr[j] <= cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1]):
        hint_score_attr[j] = cosine_similarity(exp_emb[attr_token:attr_token+1], atts[j:j+1])
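To make the proposed fix concrete, here is a self-contained sketch of the update with a minimal cosine_similarity stand-in (the real code uses a version operating on 2-D slices; the embeddings below are toy data):

```python
import math

def cosine_similarity(u, v):
    """Minimal stand-in for the similarity used in create_vqx_hint.py."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

exp_emb = [1.0, 0.0]                  # toy embedding of one explanation token
atts = [[1.0, 0.1], [0.0, 1.0]]       # toy attribute embeddings of 2 objects
hint_score_attr = [0.0, 0.0]

for j in range(len(atts)):
    sim = cosine_similarity(exp_emb, atts[j])
    # Proposed fix: write to hint_score_attr[j], leaving hint_score intact.
    if sim > 0.3 and hint_score_attr[j] <= sim:
        hint_score_attr[j] = sim
```

With this change, only the attribute-specific scores are updated, so the scores assigned to hint_score earlier are preserved.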
