TathyaCov : Detecting Fake Tweets in the times of COVID 19

DEMO VIDEO: https://youtu.be/pdWoBxBu9-k

This repository contains the implementation of the paper "No Rumours Please! A Multi-Indic-Lingual Approach for Covid Fake-Tweet Detection". The system classifies, in real time, whether a tweet contains a verifiable claim, and has been trained specifically to detect COVID-19 related fake news. We use AI-based techniques to process the tweet text and combine it with user features to classify each tweet as either REAL or FAKE. We handle tweets in three languages: English, Hindi and Bengali.

Structure :

Each folder has a detailed README explaining how to run its scripts.

For the dataset, refer to the data folder.
To scrape and annotate more data, refer to the scraping_tools folder. (We encourage extending the dataset to accommodate more annotations, both in the languages explored in this work and in those not yet explored.)
For the transformer-based classifiers, refer to the transformer_classifiers folder.
For the ML-based models and the GUI implementation, refer to the GUI_MLModels folder.

The following sections give a brief overview of the dataset and the methods used in our work.

Dataset:

We create the Indic-covidemic tweet dataset and use it for training and testing. We take the English tweets from the Infodemic dataset and scrape COVID-19 related Bengali and Hindi tweets from Twitter. Fresh annotations were added to create the larger Indic dataset for this task. The scraping and parsing tools built for this purpose may be helpful for mining further Indic data. We have published our annotated dataset for research purposes, which can be found here.

Method:

We experimented with two models for tweet classification. In the first setting, we use a mono-lingual model that handles English tweets. We then extend the idea by replacing the classifier with a multi-lingual one, which currently handles tweets in English, Hindi and Bengali. The essence of our approach lies in the features used for classification, the different classifiers, and the way each is adapted to identify fake tweets.

The architecture of the classifier is as shown below.

We have used various textual and user related features for the classification task as follows:

BERT-based sentence encoding of the tweets (TxtEmbd)
tweet features (twttxt)
user features (twtusr)
link score (FactVer) - Ratio of similarity calculated between a given tweet and the titles of verified URLs returned when the tweet is queried on the Google search engine (algorithm given below). We maintain a list of 50 URLs designated as verified sources.
bias score (Bias) - The probability of a tweet containing offensive language.
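
The link score can be illustrated with a rough sketch. The exact algorithm is not reproduced in this README, so everything below is an assumption made for illustration: the similarity measure (difflib's SequenceMatcher), the form of the ratio (similarity mass on verified results divided by similarity mass on all results), the placeholder domains in VERIFIED_DOMAINS, and the helper names. Search results are passed in as (url, title) pairs rather than fetched from Google.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

# Hypothetical placeholders; the project's real list has 50 verified sources.
VERIFIED_DOMAINS = {"who.int", "cdc.gov", "mohfw.gov.in"}


def is_verified(url, verified=VERIFIED_DOMAINS):
    """True if the URL's host is a verified domain or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return any(host == d or host.endswith("." + d) for d in verified)


def title_similarity(tweet_text, title):
    """Character-level similarity in [0, 1] (an assumed measure)."""
    return SequenceMatcher(None, tweet_text.lower(), title.lower()).ratio()


def link_score(tweet_text, search_results, verified=VERIFIED_DOMAINS):
    """Assumed ratio: similarity on verified results / similarity on all results.

    search_results is a list of (url, title) pairs for the tweet query.
    Returns 0.0 when there are no results or nothing is similar at all.
    """
    total = 0.0
    verified_total = 0.0
    for url, title in search_results:
        sim = title_similarity(tweet_text, title)
        total += sim
        if is_verified(url, verified):
            verified_total += sim
    return verified_total / total if total > 0 else 0.0
```

Under these assumptions, a tweet whose closest-matching search results come mostly from verified sources scores near 1, while a tweet echoed only by unverified pages scores near 0.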

It is evident from the correlation plot that a subset of the user features and tweet features can be helpful. We experimented with different classifiers; the results are given below.

Graphical User Interface (GUI):

We design a simple static HTML page that takes a tweet id or URL as user input and detects whether the tweet is real or fake. Although our mono-lingual English classifier gave the best performance, even beating the SOTA, we chose the multi-lingual classifier for its wider applicability. Some snapshots of our demo are shown below:

FLASK API:

The GUI is hosted on an IBM server (http://pca.sl.cloud9.ibm.com:1999/), which is accessible from within the IBM domain.
process.py is working code that hosts the GUI on localhost. It can easily be modified to host the demo on any other server.
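
Since the GUI accepts either a tweet id or a tweet URL, the server has to normalise that input to an id before fetching the tweet. The sketch below shows one way this could be done; it is not taken from process.py, and the function name extract_tweet_id and the URL pattern are assumptions.

```python
import re
from typing import Optional

# Assumed URL shape: twitter.com/<user>/status/<digits>, optional query string.
_TWEET_URL = re.compile(r"twitter\.com/[^/\s]+/status(?:es)?/([0-9]+)")


def extract_tweet_id(user_input: str) -> Optional[str]:
    """Return the numeric tweet id from a raw id or a tweet URL, else None."""
    text = user_input.strip()
    # Case 1: the user pasted the bare numeric id.
    if re.fullmatch(r"[0-9]+", text):
        return text
    # Case 2: the user pasted a tweet URL; pull the id out of the path.
    match = _TWEET_URL.search(text)
    return match.group(1) if match else None
```

Returning None for unrecognised input lets the caller show an error on the page instead of sending a malformed id to the classifier.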

Citation :

If you find our work useful, please cite it as:

@misc{kar2020rumours,
      title={No Rumours Please! A Multi-Indic-Lingual Approach for COVID Fake-Tweet Detection}, 
      author={Debanjana Kar and Mohit Bhardwaj and Suranjana Samanta and Amar Prakash Azad},
      year={2020},
      eprint={2010.06906},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
