The webqa from stephen0808

This repo hosts the source code for the baseline models described in WebQA: Multihop and Multimodal QA.

All models were initialized from the released VLP checkpoints.

We release checkpoints fine-tuned on WebQA here.

News

Update (29 Sep, 2021):

We clarify here what are arguments --gold_feature_folder, --distractor_feature_folder, and --x_distractor_feature_folder. Basically, during implementation we divide the images into 3 buckets: positive images for image-based queries (gold), negative images for image-based queries (distractors) and negative images for text-based queries (x_distractors), where the 'x' stands for 'cross-modality'. Image- and text-based queries can be disinguished via the "Qcate" field in the dataset file. Text-based queries all have Qcate == 'text', while the rest are image-based ones.

Environment

cd VLP
conda env create -f misc/vlp.yml --prefix /home/<username>/miniconda3/envs/vlp
conda activate vlp

Clone repo https://github.com/NVIDIA/apex

cd apex
git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8
python setup.py install --cuda_ext --cpp_ext

pip install datasets==1.7.0
pip install opencv-python==3.4.2.17

Visual Features Extraction

X101fpn

The detectron2-based feature extraction code is available under this repo. Part of the code is based on LuoweiZhou/detectron-vlp and facebookresearch/detectron2

Download checkpoint

VinVL

Please refer to pzzhang/VinVL and microsoft/scene_graph_benchmark

Download checkpoint

Commands

cd vlp

Retrieval training

python run_webqa.py --new_segment_ids --train_batch_size 128 --split train --answer_provided_by 'img|txt' --task_to_learn 'filter' --num_workers 4 --max_pred 10 --mask_prob 1.0 --learning_rate 3e-5 --gradient_accumulation_steps 128 --save_loss_curve --output_dir light_output/filter_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/filter_debug --use_x_distractors --do_train --num_train_epochs 6

Retrieval inference

python run_webqa.py --new_segment_ids --train_batch_size 16 --split val --answer_provided_by 'img|txt' --task_to_learn 'filter' --num_workers 4 --max_pred 10 --mask_prob 1.0 --learning_rate 3e-5 --gradient_accumulation_steps 8 --save_loss_curve --output_dir light_output/filter_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/filter_debug --recover_step 3 --use_x_distractors

QA training

python run_webqa.py --new_segment_ids --do_train --train_batch_size 128 --split train --answer_provided_by 'img|txt' --task_to_learn 'qa' --num_workers 4 --max_pred 50 --mask_prob 0.5 --learning_rate 1e-4 --gradient_accumulation_steps 64 --save_loss_curve --num_train_epochs 16 --output_dir light_output/qa_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/qa_debug

QA decode

python decode_webqa.py --new_segment_ids --batch_size 32 --answer_provided_by "img|txt" --beam_size 5 --split "test" --num_workers 4 --output_dir light_output/qa_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/qa_debug --no_eval --recover_step 11

With VinVL features, run run_webqa_vinvl.py or decode_webqa_vinvl.py instead.

Reference

Please acknowledge the following paper if you use the code:

@inproceedings{WebQA21,
 title ={{WebQA: Multihop and Multimodal QA}},
 author={Yinghsan Chang and Mridu Narang and
         Hisami Suzuki and Guihong Cao and
         Jianfeng Gao and Yonatan Bisk},
 journal = {ArXiv},
 year = {2021},
 url  = {https://arxiv.org/abs/2109.00590}
}

Related Projects/Codebase

VinVL Visual Representations: https://github.com/pzzhang/VinVL
Scene Graph Benchmark: https://github.com/microsoft/scene_graph_benchmark
Detectron2: https://github.com/facebookresearch/detectron2

Acknowledgement

Our code is mainly based on Zhou et al.'s VLP repo. We thank the authors for their valuable work.

stephen0808 / webqa Goto Github PK

webqa's Introduction

News

Environment

Visual Features Extraction

Commands

Reference

Related Projects/Codebase

Acknowledgement

webqa's People

Contributors

Watchers

Forkers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent