han's Introduction

Exploring Human-like Attention Supervision in Visual Question Answering

Here we provide the HLAT Dataset proposed in the paper Exploring Human-like Attention Supervision in Visual Question Answering, which was accepted at AAAI-2018.

In this work, we propose a Human Attention Network (HAN) to predict the attention map for a given image-question pair.

The framework of HAN

Below are some examples generated by the HAN.

Examples of HAN

We improve the performance of attention-based VQA models by adding human-like attention supervision.

The structure of attention supervision

Our method improves the accuracy of VQA models, especially on counting questions, e.g. "How many candles are on the table?" For more details, please refer to our paper.

HLAT Dataset

Here we provide attention maps generated by the HAN for both the VQA1.0 and the VQA2.0 dataset.

They are saved in .h5 files.

The .h5 files of attention maps can be downloaded from here

The .h5 files have the following data structure: { "pre_attmap" : attention maps for all question ids }

That is, each .h5 file contains only one entry, whose key is "pre_attmap". The attention maps are stored in the same order as the question ids, so the attention map for a question-image pair is found by the position of its question id. The order of the question ids follows the official VQA 1.0 and VQA 2.0 datasets.
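The positional lookup described above can be sketched as follows. This is a minimal sketch, not the authors' loading code. It assumes the third-party h5py package, assumes that the first axis of "pre_attmap" indexes questions, and assumes that the question ids come from a questions JSON with a top-level "questions" list whose items carry a "question_id" field. The function names are hypothetical.

```python
import json


def load_question_ids(questions_json_path):
    """Read question ids, in file order, from a VQA questions JSON file."""
    with open(questions_json_path) as f:
        data = json.load(f)
    return [q["question_id"] for q in data["questions"]]


def index_by_question_id(question_ids, num_maps):
    """Map each question id to a row index, assuming row i belongs to question_ids[i]."""
    if len(question_ids) != num_maps:
        raise ValueError(
            f"{len(question_ids)} question ids but {num_maps} attention maps"
        )
    return {qid: i for i, qid in enumerate(question_ids)}


def open_attention_maps(h5_path):
    """Open an attention-map file and return (file handle, 'pre_attmap' dataset)."""
    import h5py  # third-party; imported lazily so the helpers above work without it

    h5_file = h5py.File(h5_path, "r")
    return h5_file, h5_file["pre_attmap"]
```

With these helpers, `lookup = index_by_question_id(ids, att.shape[0])` followed by `att[lookup[qid]]` would return the map for one question, provided the assumed ordering holds. Close the file handle when done.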

For the VQA1.0 dataset, there are:

  • 369,861 attention maps for question-image pairs in the trainval set
  • 244,302 attention maps for question-image pairs in the testing set
  • 60,864 attention maps for question-image pairs in the test-dev set

For the VQA2.0 dataset, there are:

  • 658,111 attention maps for question-image pairs in the trainval set
  • 447,793 attention maps for question-image pairs in the testing set
  • 107,394 attention maps for question-image pairs in the test-dev set
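The counts listed above can serve as a quick sanity check after downloading a file. The sketch below only encodes the numbers from this README. The dictionary keys and the function name are hypothetical, and the number of maps is assumed to be the length of the first axis of "pre_attmap".

```python
# Number of attention maps per (dataset, split), copied from the lists above.
EXPECTED_COUNTS = {
    ("VQA1.0", "trainval"): 369861,
    ("VQA1.0", "test"): 244302,
    ("VQA1.0", "test-dev"): 60864,
    ("VQA2.0", "trainval"): 658111,
    ("VQA2.0", "test"): 447793,
    ("VQA2.0", "test-dev"): 107394,
}


def check_count(dataset, split, num_maps):
    """Raise ValueError if num_maps differs from the count listed for this split."""
    expected = EXPECTED_COUNTS[(dataset, split)]
    if num_maps != expected:
        raise ValueError(
            f"{dataset} {split}: expected {expected} maps, found {num_maps}"
        )
    return expected
```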

VQA-HAT Dataset

We also provide a link to the VQA-HAT Dataset, which comes from the paper Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?

Cite

We would appreciate it if you cite our work:
@inproceedings{qiao2018exploring,
title={Exploring human-like attention supervision in visual question answering},
author={Qiao, Tingting and Dong, Jianfeng and Xu, Duanqing},
booktitle={Thirty-Second AAAI Conference on Artificial Intelligence},
year={2018}
}

han's Issues

Order of VQA-2 HAN Train-Val compared to Original VQA-2 Questions JSON?

This is pretty great work, and we're hoping to use it as part of our research!

Question: The HAN Attention Maps h5 file for VQA-v2 combines the training and validation set into a single file (vqa2_trainval2014_attention_maps.h5).

However, on the VQA-2 download page, there are two JSON files with the Training and Validation questions specifically: https://visualqa.org/download.html

How do I map questions from the JSON file to indices in the h5 file 'pre_attmap' array? Did you combine all the training and validation questions and sort by question_id? Or did you just concatenate the two lists of questions (train + val) then use that order?
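The two orderings asked about can be written down explicitly, which makes it easier to test each one against the h5 file. The sketch below does not say which ordering the authors used, since this page does not answer that. It only builds both candidates from the train and val question id lists. The function name and dictionary keys are hypothetical.

```python
def candidate_orders(train_ids, val_ids):
    """Build the two candidate orderings of the combined trainval question ids."""
    concatenated = list(train_ids) + list(val_ids)
    return {
        # Candidate 1: train questions followed by val questions, in file order.
        "train_then_val": concatenated,
        # Candidate 2: all questions sorted by question_id.
        "sorted_by_question_id": sorted(concatenated),
    }
```

Both candidates have the same length, so comparing against the number of maps cannot tell them apart. A spot check of a few attention maps against their images and questions would be needed.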
