umar1997 / propaganda-codeswitched-text

[EMNLP 2023] Official repository of paper titled "Detecting Propaganda Techniques in Code-Switched Social Media Text"

Home Page: https://arxiv.org/abs/2305.14534

License: MIT License

Detecting Propaganda Techniques in Code-Switched Social Media Text (EMNLP'23)

Muhammad Umar Salman, Asif Hanif, Shady Shehata and Preslav Nakov


Abstract:

Propaganda is a form of communication intended to influence the opinions and the mindset of the public to promote a particular agenda. With the rise of social media, propaganda has spread rapidly, leading to the need for automatic propaganda detection systems. Most work on propaganda detection has focused on high-resource languages, such as English, and little effort has been made to detect propaganda for low-resource languages. Yet, it is common to find a mix of multiple languages in social media communication, a phenomenon known as code-switching. Code-switching combines different languages within the same text, which poses a challenge for automatic systems. Considering this premise, we propose a novel task of detecting propaganda techniques in code-switched text. To support this task, we create a corpus of 1,030 texts code-switching between English and Roman Urdu, annotated with 20 propaganda techniques at fragment-level. We perform a number of experiments contrasting different experimental setups, and we find that it is important to model the multilinguality directly rather than using translation as well as to use the right fine-tuning strategy.


Contributions

  1. Formulation of a Novel NLP Task: We formulate the novel NLP task of detecting propaganda techniques in code-switched text, here text that switches between English and Roman Urdu.
  2. Creation of an Annotated Corpus: We construct and annotate a new corpus specifically for this task, comprising 1,030 code-switched texts in English and Roman Urdu. These texts are annotated at the fragment level with 20 propaganda techniques.
  3. Evaluation of Different NLP Models: We experiment with various model classes for this task and dataset, including monolingual, multilingual, and cross-lingual models as well as Large Language Models (LLMs), and we provide a comparative performance analysis.
  4. Development of a Web-Based Platform: We design and build a new web platform with a user interface for annotating spans of text and labeling them with propaganda techniques.
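
To make "fragment-level" annotation concrete, here is a minimal Python sketch of how one annotated text could be represented as labeled character spans. It is not the repository's actual data format: the `Span` class, the helper functions, the example sentence, and the technique label strings are all hypothetical and chosen only for illustration. The sketch allows spans to overlap, which is an assumption rather than a documented property of the corpus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical annotation: character offsets [start, end) plus a label."""
    start: int
    end: int
    technique: str


def span_for(text: str, fragment: str, technique: str) -> Span:
    """Build a Span for the first occurrence of `fragment` in `text`."""
    start = text.index(fragment)  # raises ValueError if the fragment is absent
    return Span(start, start + len(fragment), technique)


def fragments(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Return (fragment text, technique) pairs for each span."""
    return [(text[s.start:s.end], s.technique) for s in spans]


def char_labels(text: str, spans: List[Span]) -> List[List[str]]:
    """For every character, list the techniques whose span covers it."""
    labels: List[List[str]] = [[] for _ in text]
    for s in spans:
        if not (0 <= s.start < s.end <= len(text)):
            raise ValueError(f"span out of range: {s}")
        for i in range(s.start, s.end):
            labels[i].append(s.technique)
    return labels


# Invented English / Roman Urdu example; the labels are illustrative only.
text = "Yeh log sab chor hain, vote for change!"
spans = [
    span_for(text, "chor", "Name Calling"),
    span_for(text, "vote for change!", "Slogans"),
]

for fragment, technique in fragments(text, spans):
    print(f"{technique}: {fragment!r}")
```

Per-character label lists like the ones returned by `char_labels` are one common starting point for deriving token-level targets for a sequence-labeling model. The experiments in the paper may use a different scheme.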

Contact

Should you have any questions, please create an issue on this repository or contact us at [email protected]

