PMR

Introduction

Premise-based Multi-modal Reasoning (PMR) is a task that tests a model's ability to reason over both textual clues (from the premise) and visual clues (from images).

Through manual annotation and adversarial generation, we created the PMR dataset with 30,720 samples. The statistics for PMR are listed below, and you can explore the dataset on our website.

	Original Train	Original Val	Original Test	Adversarial Train	Adversarial Val	Adversarial Test	Total
#samples	12,080	1,538	1,742	12,080	1,538	1,742	30,720
#unique 1-gram	9,882	3,819	4,101	8,046	3,071	3,359	11,041
#unique 2-gram	72,048	17,678	19,292	50,526	12,236	13,453	84,365
Avg premise length	9.48	9.47	9.54	9.48	9.47	9.54	9.49
Avg action text length	14.38	14.41	14.45	14.20	14.42	14.31	14.31
Avg #objects mentioned	1.92	1.91	1.94	2.42	2.43	2.38	2.17
#images	9,536	1,213	1,370	9,536	1,213	1,370	12,119
#movies covered	1,353	209	170	1,353	209	170	1,732

Dataset Access

The dataset can be downloaded from Google Drive.

PMR has been selected as one of the evaluation tasks of CCL2022, and we provide the full train and validation sets (both original and adversarial samples) for training models. For model evaluation, you can submit your model's predictions on the test set (test-ori-without-label.jsonl) by email to [email protected], and we will send feedback promptly.

Data Format

Here is a brief introduction to the data format.

{
	"total_id": 98,

	# Name of the movie the image is from.
	"movie": "3051_NANNY_MCPHEE_RETURNS",

	# Object tags from Fast RCNN
	"objects": ["person", "person", "handbag", "spoon"],

	# Path of the image
	"img_fn": "lsmdc_3051_NANNY_MCPHEE_RETURNS/[email protected]",

	# Id of the image
	"img_id": "train-5244",

	# Path of the file storing the bounding box information
	"metadata_fn": "lsmdc_3051_NANNY_MCPHEE_RETURNS/[email protected]",

	# Tokenized premise; the integers in lists are indices into the objects list above.
	"premise": [[1], "and", [0], "are", "in", "good", "relationship", "."],

	# Category of the premise.
	"category": "character",

	# Tokenized actions; the integers in lists are indices into the objects list.
	"answer_choices": [
		[[1], "with", "a", "handbag", "will", "hug", [0], "tightly", "."],
		[[1], "with", "a", "green", "handbag", "will", "shout", "at", [0], "in", "the", "kitchen", "."],
		[[1], "with", "a", "handbag", "will", "shout", "at", [0], "in", "the", "kitchen", "."],
		[[1], "with", "a", "green", "handbag", "will", "hug", [0], "tightly", "."]
	],

	# The types of the answers, in the same order as answer_choices.
	"answer_types": ["Action-True", "Distractor2", "Action-False", "Distractor1"],

	# The index of the correct answer in answer_choices.
	"answer_label": 0,

	# For the original set, the total_id of the sample that shares the same image as this sample, if one exists (-1 by default).
	"pal_id": -1,

	# For the adversarial set, the list of total_ids of the samples that the four choices come from.
	"answer_ori_ids": [14097, 12681, 387, 13170]
}
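
As a rough illustration of how these fields fit together, here is a minimal Python sketch that turns a tokenized premise or action back into readable text and tallies which answer types a model picked. It is not part of the released code. The helper names (render_tokens, load_samples, score_predictions) are hypothetical, the "person1"-style rendering of object references is only one possible choice, and the loader assumes each data file holds one JSON object per line, as the .jsonl extension of the test file suggests.

```python
import json
from collections import Counter


def render_tokens(tokens, objects):
    """Join a tokenized premise or action into a readable string.

    A list token such as [1] holds indices into `objects`. It is rendered
    here as the object tag followed by its index (e.g. "person1"); this
    rendering is an illustrative assumption, not a dataset convention.
    """
    words = []
    for tok in tokens:
        if isinstance(tok, list):
            words.append(" and ".join(f"{objects[i]}{i}" for i in tok))
        else:
            words.append(tok)
    return " ".join(words)


def load_samples(path):
    """Read samples from a file, assuming one JSON object per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def score_predictions(samples, predictions):
    """Return (accuracy, Counter of the answer types that were chosen).

    `predictions` is a list of predicted indices into answer_choices,
    one per sample, in the same order as `samples`.
    """
    chosen = Counter()
    correct = 0
    for sample, pred in zip(samples, predictions):
        chosen[sample["answer_types"][pred]] += 1
        correct += int(pred == sample["answer_label"])
    return correct / len(samples), chosen


# The example record from above, reduced to the fields used here.
sample = {
    "objects": ["person", "person", "handbag", "spoon"],
    "premise": [[1], "and", [0], "are", "in", "good", "relationship", "."],
    "answer_choices": [
        [[1], "with", "a", "handbag", "will", "hug", [0], "tightly", "."],
        [[1], "with", "a", "green", "handbag", "will", "shout", "at", [0],
         "in", "the", "kitchen", "."],
        [[1], "with", "a", "handbag", "will", "shout", "at", [0],
         "in", "the", "kitchen", "."],
        [[1], "with", "a", "green", "handbag", "will", "hug", [0], "tightly", "."],
    ],
    "answer_types": ["Action-True", "Distractor2", "Action-False", "Distractor1"],
    "answer_label": 0,
}

print(render_tokens(sample["premise"], sample["objects"]))
# person1 and person0 are in good relationship .

accuracy, chosen = score_predictions([sample], [0])
print(accuracy, dict(chosen))
# 1.0 {'Action-True': 1}
```

Counting the chosen answer types alongside accuracy shows whether wrong predictions fall mostly on Action-False choices or on the two distractor types.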

Baseline Models

We provide baseline models here. They are adapted from three vision-language pretrained models that perform well on multimodal understanding tasks.

Citation

Please consider citing this paper if you find this repository useful:

@article{PMR2022,
	title	= {Premise-based Multimodal Reasoning: {A} Human-like Cognitive Process},
	author  = {Qingxiu Dong and
               Ziwei Qin and
               Heming Xia and
               Tian Feng and
               Shoujie Tong and
               Haoran Meng and
               Lin Xu and
               Tianyu Liu and
               Zuifang Sui and
               Weidong Zhan and
               Sujian Li and
               Zhongyu Wei},
	journal = {CoRR},
	volume  = {abs/2105.07122},
	year    = {2021},
}
