Giter Site home page Giter Site logo

webqa's Introduction

This repo hosts the source code for the baseline models described in WebQA: Multihop and Multimodal QA.

All models were initialized from the released VLP checkpoints.

We release checkpoints fine-tuned on WebQA here.

News

Update (29 Sep, 2021):

We clarify here what are arguments --gold_feature_folder, --distractor_feature_folder, and --x_distractor_feature_folder. Basically, during implementation we divide the images into 3 buckets: positive images for image-based queries (gold), negative images for image-based queries (distractors) and negative images for text-based queries (x_distractors), where the 'x' stands for 'cross-modality'. Image- and text-based queries can be disinguished via the "Qcate" field in the dataset file. Text-based queries all have Qcate == 'text', while the rest are image-based ones.

Environment

cd VLP
conda env create -f misc/vlp.yml --prefix /home/<username>/miniconda3/envs/vlp
conda activate vlp

Clone repo https://github.com/NVIDIA/apex

cd apex
git reset --hard 1603407bf49c7fc3da74fceb6a6c7b47fece2ef8
python setup.py install --cuda_ext --cpp_ext
pip install datasets==1.7.0
pip install opencv-python==3.4.2.17 

Visual Features Extraction

  • X101fpn

The detectron2-based feature extraction code is available under this repo. Part of the code is based on LuoweiZhou/detectron-vlp and facebookresearch/detectron2

Download checkpoint

  • VinVL

Please refer to pzzhang/VinVL and microsoft/scene_graph_benchmark

Download checkpoint

Commands

cd vlp

Retrieval training

python run_webqa.py --new_segment_ids --train_batch_size 128 --split train --answer_provided_by 'img|txt' --task_to_learn 'filter' --num_workers 4 --max_pred 10 --mask_prob 1.0 --learning_rate 3e-5 --gradient_accumulation_steps 128 --save_loss_curve --output_dir light_output/filter_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/filter_debug --use_x_distractors --do_train --num_train_epochs 6

Retrieval inference

python run_webqa.py --new_segment_ids --train_batch_size 16 --split val --answer_provided_by 'img|txt' --task_to_learn 'filter' --num_workers 4 --max_pred 10 --mask_prob 1.0 --learning_rate 3e-5 --gradient_accumulation_steps 8 --save_loss_curve --output_dir light_output/filter_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/filter_debug --recover_step 3 --use_x_distractors

QA training

python run_webqa.py --new_segment_ids --do_train --train_batch_size 128 --split train --answer_provided_by 'img|txt' --task_to_learn 'qa' --num_workers 4 --max_pred 50 --mask_prob 0.5 --learning_rate 1e-4 --gradient_accumulation_steps 64 --save_loss_curve --num_train_epochs 16 --output_dir light_output/qa_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/qa_debug

QA decode

python decode_webqa.py --new_segment_ids --batch_size 32 --answer_provided_by "img|txt" --beam_size 5 --split "test" --num_workers 4 --output_dir light_output/qa_debug --ckpts_dir /data/yingshac/MMMHQA/ckpts/qa_debug --no_eval --recover_step 11

With VinVL features, run run_webqa_vinvl.py or decode_webqa_vinvl.py instead.

Reference

Please acknowledge the following paper if you use the code:

@inproceedings{WebQA21,
 title ={{WebQA: Multihop and Multimodal QA}},
 author={Yinghsan Chang and Mridu Narang and
         Hisami Suzuki and Guihong Cao and
         Jianfeng Gao and Yonatan Bisk},
 journal = {ArXiv},
 year = {2021},
 url  = {https://arxiv.org/abs/2109.00590}
}

Related Projects/Codebase

Acknowledgement

Our code is mainly based on Zhou et al.'s VLP repo. We thank the authors for their valuable work.

webqa's People

Contributors

darkmatter08 avatar erjanmx avatar luoweizhou avatar webqna avatar zdxdsw avatar

Watchers

 avatar

Forkers

trellixvulnteam

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.