PMR

Introduction

Premise-based Multi-modal Reasoning (PMR) is a task that explores the ability of models to reason with both textual clues (from the premise) and visual clues (from images).

Through manual annotation and adversarial generation, we created the PMR dataset with 30,720 samples. The statistics for PMR are listed here, and you can explore the dataset on our website.

                                  Ori.                    Adv.             Total
                           Train     Val    Test   Train     Val    Test
#samples                  12,080   1,538   1,742  12,080   1,538   1,742  30,720
#unique 1-gram             9,882   3,819   4,101   8,046   3,071   3,359  11,041
#unique 2-gram            72,048  17,678  19,292  50,526  12,236  13,453  84,365
Avg premise length          9.48    9.47    9.54    9.48    9.47    9.54    9.49
Avg action text length     14.38   14.41   14.45   14.20   14.42   14.31   14.31
Avg #objects mentioned      1.92    1.91    1.94    2.42    2.43    2.38    2.17
#images                    9,536   1,213   1,370   9,536   1,213   1,370  12,119
#movies covered            1,353     209     170   1,353     209     170   1,732

Dataset Access

The dataset can be downloaded from Google Drive.

PMR has been selected as one of the evaluation tasks at CCL2022, and we provide the full train and validation sets (both original and adversarial samples) for training models. For model evaluation, you can submit your model's predictions on the test set (test-ori-without-label.jsonl) by email to [email protected], and we will respond promptly.

Data Format

Here is a brief introduction to the data format.

{
    "total_id": 98,

    # Name of the movie the image is from.
    "movie": "3051_NANNY_MCPHEE_RETURNS",

    # Object tags from Fast RCNN
    "objects": ["person", "person", "handbag", "spoon"],

    # Path of the image
    "img_fn": "lsmdc_3051_NANNY_MCPHEE_RETURNS/[email protected]",

    # Id of the image
    "img_id": "train-5244",

    # Path of the file storing the bounding box information
    "metadata_fn": "lsmdc_3051_NANNY_MCPHEE_RETURNS/[email protected]",

    # Tokenized premise; the integers in lists are indices into the objects list above.
    "premise": [[1], "and", [0], "are", "in", "good", "relationship", "."],

    # Category of the premise.
    "category": "character",

    # Tokenized actions; the integers in lists are indices into the objects list.
    "answer_choices": [
        [[1], "with", "a", "handbag", "will", "hug", [0], "tightly", "."],
        [[1], "with", "a", "green", "handbag", "will", "shout", "at", [0], "in", "the", "kitchen", "."],
        [[1], "with", "a", "handbag", "will", "shout", "at", [0], "in", "the", "kitchen", "."],
        [[1], "with", "a", "green", "handbag", "will", "hug", [0], "tightly", "."]
    ],

    # The types of the answers, in the same order as answer_choices.
    "answer_types": ["Action-True", "Distractor2", "Action-False", "Distractor1"],

    # The index of the correct answer in answer_choices.
    "answer_label": 0,

    # For the original set, the total_id of the sample that shares the same image as the current sample, if one exists (-1 is the default).
    "pal_id": -1,

    # For the adversarial set, the list of total_id values that the four choices come from.
    "answer_ori_ids": [14097, 12681, 387, 13170]
}
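
To make the object references concrete, here is a minimal Python sketch of how a sample could be read and turned back into plain text. It is an illustration rather than part of the released code: the helper names (`load_jsonl`, `render`, `correct_answer`) are hypothetical, the `person1`-style rendering of object indices is just one possible convention, and the assumption that every data file stores one JSON object per line is inferred from the `.jsonl` extension of the test file.

```python
import json


def load_jsonl(path):
    """Read a file with one JSON object per line (assumed layout of the PMR files)."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def render(tokens, objects):
    """Turn a tokenized premise or answer into a readable string.

    Plain string tokens are kept as they are. A list of integers such as [1]
    is an object reference and is rendered as tag + index, e.g. "person1"
    (a hypothetical naming convention chosen for this sketch).
    """
    words = []
    for tok in tokens:
        if isinstance(tok, list):
            words.append(" and ".join(f"{objects[i]}{i}" for i in tok))
        else:
            words.append(tok)
    return " ".join(words)


def correct_answer(sample):
    """Render the answer choice selected by answer_label."""
    choice = sample["answer_choices"][sample["answer_label"]]
    return render(choice, sample["objects"])


# A trimmed copy of the sample above (comments and unused fields removed).
sample = {
    "objects": ["person", "person", "handbag", "spoon"],
    "premise": [[1], "and", [0], "are", "in", "good", "relationship", "."],
    "answer_choices": [
        [[1], "with", "a", "handbag", "will", "hug", [0], "tightly", "."],
        [[1], "with", "a", "green", "handbag", "will", "shout", "at", [0],
         "in", "the", "kitchen", "."],
    ],
    "answer_types": ["Action-True", "Distractor2"],
    "answer_label": 0,
}

print(render(sample["premise"], sample["objects"]))
# person1 and person0 are in good relationship .
print(correct_answer(sample))
# person1 with a handbag will hug person0 tightly .
```

Appending the index to the tag keeps the two `person` objects distinguishable in the rendered text, which matters because the premise and the actions refer to both of them.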

Baseline Models

We provide baseline models here. They are adapted from three vision-language pretrained models that perform well on multimodal understanding tasks.

  1. PMR-baseline-VL-BERT (source repo)
  2. UNITER
  3. ERNIE

Citation

Please consider citing this paper if you find this repository useful:

@article{PMR2022,
	title	= {Premise-based Multimodal Reasoning: {A} Human-like Cognitive Process},
	author  = {Qingxiu Dong and
               Ziwei Qin and
               Heming Xia and
               Tian Feng and
               Shoujie Tong and
               Haoran Meng and
               Lin Xu and
               Tianyu Liu and
               Zuifang Sui and
               Weidong Zhan and
               Sujian Li and
               Zhongyu Wei},
	journal = {CoRR},
	volume  = {abs/2105.07122},
	year    = {2021},
}
