
[CVPR 2022] A large-scale public benchmark dataset for video question answering, focusing on evidence and commonsense reasoning. This repository contains the code used in our paper "From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering", CVPR 2022.

License: MIT License


Causal-VidQA

The Causal-VidQA dataset contains 107,600 QA pairs over 26,900 video clips sampled from Kinetics-700. The dataset aims to facilitate deeper video understanding towards video reasoning. Specifically, the task of Causal-VidQA includes four types of questions, ranging from scene description (description) and evidence reasoning (explanation) to commonsense reasoning (prediction and counterfactual). For commonsense reasoning, we set up a two-step solution: answering the question and providing a proper reason.

Here is an example from our dataset and the comparison between our dataset and other VisualQA datasets.

Example from our Causal-VidQA dataset

| Dataset | Visual Type | Visual Source | Annotation | Description | Explanation | Prediction | Counterfactual | #Video/Image | #QA | Video Length (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Motivation | Image | MS COCO | Man | ✓ | ✓ | ✓ | ✗ | 10,191 | - | - |
| VCR | Image | Movie Clip | Man | ✓ | ✓ | ✓ | ✗ | 110,000 | 290,000 | - |
| MovieQA | Video | Movie Stories | Auto | ✓ | ✓ | ✗ | ✗ | 548 | 21,406 | 200 |
| TVQA | Video | TV Show | Man | ✓ | ✓ | ✗ | ✗ | 21,793 | 152,545 | 76 |
| TGIF-QA | Video | TGIF | Auto | ✓ | ✗ | ✗ | ✗ | 71,741 | 165,165 | 3 |
| ActivityNet-QA | Video | ActivityNet | Man | ✓ | ✓ | ✗ | ✗ | 5,800 | 58,000 | 180 |
| Social-IQ | Video | YouTube | Man | ✓ | ✓ | ✗ | ✗ | 1,250 | 7,500 | 60 |
| CLEVRER | Video | Game Engine | Man | ✓ | ✓ | ✓ | ✓ | 20,000 | 305,280 | 5 |
| V2C | Video | MSR-VTT | Man | ✓ | ✓ | ✗ | ✗ | 10,000 | 115,312 | 30 |
| NExT-QA | Video | YFCC-100M | Man | ✓ | ✓ | ✗ | ✗ | 5,440 | 52,044 | 44 |
| Causal-VidQA | Video | Kinetics-700 | Man | ✓ | ✓ | ✓ | ✓ | 26,900 | 107,600 | 9 |

Comparison between our dataset and other VisualQA datasets

On this page, you can find the code of several SOTA VideoQA methods and the dataset for our CVPR 2022 paper:

  • Jiangtong Li, Li Niu and Liqing Zhang. From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering. CVPR, 2022. [paper link]

Download

  1. Visual Feature
  2. Text Feature
  3. Dataset Split
  4. Text Annotation
  5. Original Data

Install

Please create an environment for this project using Miniconda (install Miniconda first if you have not):

>conda create -n causal-vidqa python==3.6.12
>conda activate causal-vidqa
>git clone https://github.com/bcmi/Causal-VidQA
>cd Causal-VidQA
>pip install -r requirement.txt

Data Preparation

Please download the pre-computed features and QA annotations from Download 1-4 and place them in ['data/visual_feature'], ['data/text_feature'], ['data/split'] and ['data/QA']. Note that the text annotation is packaged as QA.tar; you need to unpack it before placing its contents in ['data/QA'].
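
For reference, a minimal sketch of preparing the directories, assuming QA.tar sits in the current directory and unpacks into a QA/ folder (adjust to the actual archive layout):

>mkdir -p data/visual_feature data/text_feature data/split data/QA
>tar -xf QA.tar
>mv QA/* data/QA/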

If you want to extract different video features and text features from our Causal-VidQA dataset, you can download the original data from Download 5 and extract whatever features you want.
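
If you go this route for visual features, one common starting point is to dump frames with ffmpeg; the clip name, frame rate, and output layout below are illustrative, not the pipeline used in the paper:

>mkdir -p frames/video_0001
>ffmpeg -i video_0001.mp4 -r 4 frames/video_0001/%05d.jpg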

Usage

Once the data is ready, you can easily run the code. First, to run these models with the GloVe feature, you can directly train B2A by:

>sh bash/train_glove.sh

Note that if you want to train the model with the BERT feature, we suggest you first load the BERT feature into a shared array by:

>python dataset/load.py

and then train B2A with the BERT feature by:

>sh bash/train_bert.sh
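
Loading the BERT features into shared memory can require substantial RAM (one user reports around 60 GB of shared memory in use; see the Issues section below). On Linux, you can check how much shared memory is available with:

>df -h /dev/shm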

After the training script finishes, you can find the prediction file under ['results/model_name/model_prefix.json'] and evaluate the prediction results by:

>python eval_mc.py

You can also obtain the prediction by running:

>sh bash/eval.sh

The command above will load the model from ['experiment/model_name/model_prefix/model/best.pkl'] and generate the prediction file.

Hint: we have released a trained model for the B2A method. Please place the trained weight at ['experiment/B2A/B2A/model/best.pkl'] and then make the prediction by running:

>sh bash/eval.sh
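
For reference, a minimal sketch of placing the weight, assuming the downloaded checkpoint is saved as best.pkl in the current directory (the release filename may differ):

>mkdir -p experiment/B2A/B2A/model
>mv best.pkl experiment/B2A/B2A/model/best.pkl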

(The results may differ slightly depending on the environment and random seeds.)

(For comparison, please refer to the results in our paper.)

Test Set Evaluation

In the released dataset, we hide the correct answer IDs for the questions in our test set. If you want to evaluate results on the test set, please participate in the competition on CodaLab.

Hint:

  1. Each participant can submit inference results at most 10 times in total and at most 2 times per day.
  2. Please zip the inference JSON file before submitting it to CodaLab (see the example after this list).
  3. If your registration request is not approved within one day, please email me.
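
For hint 2, a minimal sketch of packaging a submission, where results/model_name/model_prefix.json stands for the prediction file produced by your run (the -j flag stores the file without its directory path):

>zip -j submission.zip results/model_name/model_prefix.json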

Citation

@InProceedings{li2022from,
    author    = {Li, Jiangtong and Niu, Li and Zhang, Liqing},
    title     = {From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022}
}

Acknowledgement

Our reproduction of the methods is mainly based on NExT-QA and the respective official repositories; we thank the authors for releasing their code. If you use the related parts, please cite the corresponding papers referenced in the code comments.


Issues

test set evaluation

Thanks for this great work. There seem to be some issues when submitting results to CodaLab for test set evaluation: the status always shows "Submitting" without any output. Could you provide any help?

Labels of test data.

I trained the whole model and got the expected result on the validation set, but the result on the test set was 0. I found that the labels of the test set are always -1. How can I get the test set labels?

How much RAM is needed for using BERT features?

Thanks for your innovative work.
I am trying to run the code with the BERT features, but python dataset/load.py fails with a bus error. To my knowledge, this is probably because I don't have enough RAM to create the shared array; after the error, I saw that the used shared memory was around 60 GB. Could you let me know how much RAM is needed for the BERT features? If the usage is large, are there any alternative approaches?
Thank you!

Data unavailable

Hi!
I have been trying to reproduce your results but have faced some issues when downloading the data.
When I click on the links you provided, I am asked to use a school VPN:

"抱歉,如果您是在校外访问本网站。

建议您先拨通学校的VPN后再访问。

关于VPN,请参考:交大VPN"

Therefore it seems like I cannot access the data...
Could you perhaps help me with that problem?

Thank you
