docmsu's Introduction

DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding

This repo is the official dataset and PyTorch implementation of DocMSU: A Comprehensive Benchmark for Document-level Multimodal Sarcasm Understanding [AAAI2024].

Introduction

In document-level news, sarcasm clues are sparse or small and are often concealed in long text. Moreover, compared to sentence-level comments like tweets, which mainly focus on only a few trends or hot topics (e.g., sports events), content in the news is considerably diverse.
Models created for sentence-level MSU may fail to capture sarcasm clues in document-level news. To fill this gap, we present a comprehensive benchmark for Document-level Multimodal Sarcasm Understanding (DocMSU).

DocMSU Dataset

A new benchmark that contains high-quality annotations of 102,588 pieces of news with text-image pairs across 9 hot topics.

Method

We use the pre-trained BERT to generate contextualized token-level representations of the document and then form a document matrix of size L × L with a padding mechanism. We rely on a simplified ResNet to output image representations and a projection layer to split the representations of an image window into L × L patches. We add the patches of each image window to the document matrix to fuse the two modalities. The fused representations are fed to a Swin Transformer, which computes patch attention with a sliding window.
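
The padding and fusion steps described above can be illustrated with a small, framework-free sketch. This is not the repository's code: the function names `to_document_matrix` and `fuse`, the rule for choosing L (the smallest side length whose square grid holds every token), and the use of plain Python lists in place of BERT and ResNet tensors are all assumptions made for illustration.

```python
import math


def to_document_matrix(token_vecs, pad_vec):
    """Pad a token sequence and reshape it into a square L x L grid.

    Assumption: L is the smallest integer with L * L >= len(token_vecs).
    The source only states that a padding mechanism is used.
    """
    n = len(token_vecs)
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    # Fill the remaining grid cells with the padding vector.
    padded = list(token_vecs) + [pad_vec] * (side * side - n)
    # Cut the flat sequence into `side` rows of `side` cells each.
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def fuse(doc_matrix, image_patches):
    """Add each image patch vector to the token vector in the same grid cell.

    Both inputs are L x L grids of equal-length vectors (hypothetical
    stand-ins for the projected image patches and the document matrix).
    """
    return [
        [[t + p for t, p in zip(tok, patch)] for tok, patch in zip(doc_row, img_row)]
        for doc_row, img_row in zip(doc_matrix, image_patches)
    ]
```

With five 2-dimensional token vectors, `to_document_matrix` produces a 3 × 3 grid in which the last four cells hold the padding vector; `fuse` then returns a grid of the same shape that a windowed attention model could consume.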

Experiments

To evaluate our model, we perform two MSU tasks, i.e., sarcasm detection and sarcasm localization.

Get Started

git clone https://github.com/fesvhtr/DocMSU.git
cd DocMSU
conda create -n docmsu python=3.8
conda activate docmsu
pip install -r requirements.txt

Run ./run.sh to train and evaluate the model.

Dataset Download

Please download the dataset from here. There are two files: img.zip and anno.zip (images and annotation files).
Put them into ./DocMSU/data/release/ and unzip both.
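
For readers who prefer to script the extraction step, here is a minimal sketch using only the Python standard library. The helper name `unzip_all` is hypothetical, and extracting each archive directly into the release directory is an assumption; the internal layout of img.zip and anno.zip is not described here.

```python
import zipfile
from pathlib import Path


def unzip_all(release_dir):
    """Extract every .zip archive found in release_dir into release_dir.

    Returns the names of the archives that were extracted, sorted by name.
    """
    release_dir = Path(release_dir)
    extracted = []
    for archive in sorted(release_dir.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Assumption: archives are unpacked in place, next to the .zip files.
            zf.extractall(release_dir)
        extracted.append(archive.name)
    return extracted


# Example call (path taken from the instructions above):
# unzip_all("./DocMSU/data/release/")
```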

Checkpoints

Download the Swin Transformer checkpoints swin_base_patch4_window7_224.pth, swin_small_patch4_window7_224.pth, and swin_tiny_patch4_window7_224.pth here.
Download the recommended DocMSU checkpoints textmodel_8.pth and visualmodel_8.pth here.

Acknowledgments

This work was partially supported by the joint funds for Regional Innovation and Development of the National Natural Science Foundation of China (No. U21A20449), the Beijing Natural Science Foundation under Grant M21037, and the Fundamental Research Funds for the Central Universities under Grant 2242022k60006.

Please cite using this BibTeX:

@inproceedings{du2024docmsu,
  title={DocMSU: A Comprehensive Benchmark for Document-Level Multimodal Sarcasm Understanding},
  author={Du, Hang and Nan, Guoshun and Zhang, Sicheng and Xie, Binzhu and Xu, Junrui and Fan, Hehe and Cui, Qimei and Tao, Xiaofeng and Jiang, Xudong},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={16},
  pages={17933--17941},
  year={2024}
}

License

DocMSU is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

docmsu's Issues

Sarcasm sample proportion

In docmsu_all.json, does the 'is_sar' key denote whether a sample (news, image) is sarcastic?
After loading the JSON file, I found only about 7,000 samples with is_sar = 1, while the data statistics in the paper report about 30k sarcastic samples. I wonder what is going wrong. Thanks.
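
A count like the one described in this issue can be reproduced with a short script. The sketch below makes assumptions about docmsu_all.json that are not confirmed anywhere on this page: that the file is either a JSON list of records or a JSON object mapping ids to records, and that each record is an object carrying the 'is_sar' key. The helper name `count_labels` is hypothetical.

```python
import json
from collections import Counter


def count_labels(path, key="is_sar"):
    """Count how often each value of `key` occurs across the records in a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the top level is a list of records or a dict of id -> record.
    records = data.values() if isinstance(data, dict) else data
    # Records without the key are counted under None.
    return Counter(rec.get(key) for rec in records)


# Example call (file name taken from the issue text):
# print(count_labels("docmsu_all.json"))
```

Comparing the returned counts with the statistics table in the paper shows whether the discrepancy comes from the file itself or from how it was loaded.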
