
EmbodiedQA

Code for the paper

Embodied Question Answering
Abhishek Das, Samyak Datta, Georgia Gkioxari, Stefan Lee, Devi Parikh, Dhruv Batra
arxiv.org/abs/1711.11543
CVPR 2018 (Oral)

In Embodied Question Answering (EmbodiedQA), an agent is spawned at a random location in a 3D environment and asked a question (e.g., "What color is the car?"). To answer, the agent must intelligently navigate to explore the environment, gather the necessary visual information through first-person vision, and then answer the question ("orange").

This repository provides code for programmatic question generation, a pretrained CNN feature extractor, and training code for the VQA, navigation, and full EQA (REINFORCE) models.

If you find this code useful, consider citing our work:

@inproceedings{embodiedqa,
  title={{E}mbodied {Q}uestion {A}nswering},
  author={Abhishek Das and Samyak Datta and Georgia Gkioxari and Stefan Lee and Devi Parikh and Dhruv Batra},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}

Setup

virtualenv -p python3 .env
source .env/bin/activate
pip install -r requirements.txt

Download the SUNCG dataset and install House3D.

Question generation

Questions for EmbodiedQA are generated programmatically, in a manner similar to CLEVR (Johnson et al., 2017).

NOTE: Pre-generated EQA v1 questions are available for download here.

Generating questions for all templates in EQA v1, v1-extended

cd data/question-gen
./run_me.sh MM_DD
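
In the spirit of CLEVR, each template pairs a text pattern with a functional program executed against the parsed house annotations. A toy illustration of that template-filling idea, with a hypothetical template and house dictionary (not the actual `engine.Engine` API):

```python
# Toy sketch of template-based question generation. The template format and
# house annotations here are hypothetical, not the repository's data format.

def generate(template, house_objects):
    """Instantiate a question template against parsed house annotations."""
    questions = []
    for obj, props in house_objects.items():
        if template["slot"] in props:  # skip objects missing the needed annotation
            questions.append({
                "question": template["text"].format(obj=obj),
                "answer": props[template["slot"]],
            })
    return questions

location_template = {"text": "what room is the {obj} located in?",
                     "slot": "room"}
house = {"clock": {"room": "bedroom", "color": "black"},
         "car": {"color": "orange"}}  # no room annotation -> skipped

qns = generate(location_template, house)
print(qns[0]["question"], qns[0]["answer"])
# what room is the clock located in? bedroom
```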

List defined question templates

from engine import Engine

E = Engine()
for i in E.template_defs:
    print(i, E.template_defs[i])

Generate questions for a particular template (say location)

from house_parse import HouseParse
from engine import Engine

Hp = HouseParse(dataDir='/path/to/suncg')
Hp.parse('0aa5e04f06a805881285402096eac723')

E = Engine()
E.cacheHouse(Hp)
qns = E.executeFn(E.template_defs['location'])

print(qns[0]['question'], qns[0]['answer'])
# what room is the clock located in? bedroom

Pretrained CNN

We trained a shallow encoder-decoder CNN from scratch in the House3D environment, for RGB reconstruction, semantic segmentation and depth estimation. Once trained, we throw away the decoders, and use the encoder as a frozen feature extractor for navigation and question answering. The CNN is available for download here:

wget https://www.dropbox.com/s/ju1zw4iipxlj966/03_13_h3d_hybrid_cnn.pt

The training code expects the checkpoint to be present in training/models/.

Supervised Learning

Download and preprocess the dataset

Download EQA v1 and shortest path navigations:

wget https://www.dropbox.com/s/6zu1b1jzl0qt7t1/eqa_v1.json
wget https://www.dropbox.com/s/vgp2ygh1bht1jyb/shortest-paths.zip
unzip shortest-paths.zip

Preprocess the dataset for training

cd training
python utils/preprocess_questions.py \
    -input_json /path/to/eqa_v1.json \
    -shortest_path_dir /path/to/shortest/paths/v3 \
    -output_train_h5 data/train.h5 \
    -output_val_h5 data/val.h5 \
    -output_test_h5 data/test.h5 \
    -output_data_json data/data.json \
    -output_vocab data/vocab.json

Visual question answering

Update pretrained CNN path in models.py.

python train_vqa.py -to_log 1 -input_type ques,image -identifier ques-image

This model computes question-conditioned attention over the last 5 frames of oracle navigation (shortest paths) and predicts an answer. Assuming shortest paths are optimal for answering the question, which holds for most questions in EQA v1 (location, color, place preposition), with the exception of a few location questions that may need more visual context than walking right up to the object, this can be thought of as an upper bound on expected accuracy. Performance will degrade when navigation trajectories are instead sampled from trained policies.
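
The attention step can be sketched as follows: the question embedding scores each of the last five frame features, and the answer decoder consumes their softmax-weighted sum. A plain-Python schematic with made-up dimensions, not the actual training code:

```python
import math

def attend(question_vec, frame_feats):
    """Question-conditioned attention: dot-product scores -> softmax -> weighted sum."""
    scores = [sum(q * f for q, f in zip(question_vec, feat)) for feat in frame_feats]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]          # numerically stable softmax
    weights = [e / sum(exps) for e in exps]
    dim = len(frame_feats[0])
    pooled = [sum(w * feat[i] for w, feat in zip(weights, frame_feats))
              for i in range(dim)]                    # attention-weighted frame feature
    return weights, pooled

# 5 frames with 3-d features; the question vector scores frame 2 highest
question = [1.0, 0.0, 0.0]
frames = [[0.1, 1, 0], [0.2, 0, 1], [3.0, 0, 0], [0.1, 1, 1], [0.0, 0, 0]]
weights, pooled = attend(question, frames)
assert abs(sum(weights) - 1.0) < 1e-9
assert max(weights) == weights[2]
```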

Navigation

Download potential maps for evaluating navigation and training with REINFORCE.

wget https://www.dropbox.com/s/53edqtr04jts4q0/target-obj-conn-maps-500.zip

Planner-controller policy

python train_nav.py -to_log 1 -model_type pacman -identifier pacman
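
The planner-controller (PACMAN) model factors navigation across timescales: a planner picks a macro-action (e.g. a direction), and a controller keeps executing it at a finer timescale until it decides to return control. A toy sketch of that control flow, with scripted stand-ins for the learned planner and controller policies:

```python
# Toy sketch of the planner-controller decomposition. The scripted lambda
# policies below are hypothetical stand-ins for the learned RNN policies.

def navigate(planner_policy, controller_policy, max_steps=20):
    trajectory = []
    while len(trajectory) < max_steps:
        action = planner_policy(trajectory)            # macro decision
        if action == "stop":
            break
        trajectory.append(action)
        # controller repeats the same action until it returns control
        while len(trajectory) < max_steps and controller_policy(trajectory):
            trajectory.append(action)
    return trajectory

# Scripted stand-ins: go forward, let the controller repeat until 3 steps, then stop.
planner = lambda traj: "forward" if len(traj) < 3 else "stop"
controller = lambda traj: len(traj) % 3 != 0

path = navigate(planner, controller)
print(path)  # ['forward', 'forward', 'forward']
```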

REINFORCE

python train_eqa.py -to_log 1 \
    -nav_checkpoint_path /path/to/nav/ques-image-pacman/checkpoint.pt \
    -ans_checkpoint_path /path/to/vqa/ques-image/checkpoint.pt \
    -identifier ques-image-eqa
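
The fine-tuning stage trains the navigator with REINFORCE: the gradient of the log-probability of each sampled action is scaled by the received reward. A minimal bandit-style sketch of that update rule (hypothetical two-action setup and rewards, not the EQA navigation reward):

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# REINFORCE on a 2-action bandit: theta are action logits, and each update
# scales grad log pi(a) by the reward of the sampled action.
theta = [0.0, 0.0]
rewards = [0.0, 1.0]          # action 1 is the rewarded one
lr = 0.5

for _ in range(200):
    probs = softmax(theta)
    a = random.choices([0, 1], weights=probs)[0]       # sample an action
    r = rewards[a]
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]     # d log pi(a) / d theta_i
        theta[i] += lr * r * grad

assert softmax(theta)[1] > 0.9   # policy learned to prefer the rewarded action
```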

Changelog

06/13

This code release contains the following changes over the CVPR version:

  • Larger dataset of questions + shortest paths
  • Color names as answers to color questions (earlier they were hex strings)

Acknowledgements

License

BSD
