In this project, I quantify the stance of various prototypical authors (e.g. politicians and doctors) on COVID-19 by analysing polarisation in tweets. The tweets were obtained from random samples of COVID-related tweet IDs from https://github.com/echen102/COVID-19-TweetIDs, spanning six months. In total, around 3 million tweets were collected, many of which were removed during the cleaning and refining process.
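One way to draw a fixed-size uniform sample from a large set of ID files, without loading them all into memory, is reservoir sampling. The sketch below is a minimal illustration that assumes the files hold one tweet ID per line; `sample_tweet_ids`, `iter_id_lines` and any file paths passed to them are hypothetical, and this is not necessarily how the sample for this project was drawn. The sampled IDs would still need to be hydrated into full tweets through the Twitter API.

```python
import random
from typing import Iterable, Iterator, List, Optional


def iter_id_lines(paths: Iterable[str]) -> Iterator[str]:
    """Yield lines from each ID file in turn (hypothetical paths), closing each file after use."""
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            yield from handle


def sample_tweet_ids(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Uniformly sample up to k tweet IDs from a stream of lines (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[str] = []
    n = 0  # number of non-blank IDs seen so far
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue  # skip blank lines
        n += 1
        if len(reservoir) < k:
            reservoir.append(tweet_id)  # fill the reservoir first
        else:
            # Keep the n-th ID with probability k / n, replacing a random slot.
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = tweet_id
    return reservoir
```

For example, `sample_tweet_ids(iter_id_lines(paths), k=10_000, seed=0)` would read the files once, in order, and return at most 10,000 IDs, each ID in the stream having had an equal chance of selection.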
An 'ideal points' model (the text-based ideal point model, TBIP), an unsupervised probabilistic topic model, is used both to detect topics and to quantify the authors' stances on an interpretable scale. This model, located in the tbip directory, was taken from https://github.com/keyonvafa/tbip and has been adapted slightly for this project.
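To make the 'ideal points' idea concrete, the NumPy sketch below simulates the TBIP generative process as described by its authors: the count of word v in document d is Poisson with rate Σ_k θ_dk β_kv exp(x_{a_d} η_kv), where θ are document topic intensities, β neutral topics, η ideological topic adjustments, and x the ideal point of the document's author a_d. The Gamma hyperparameter `alpha=0.3` and the function name `simulate_tbip` are illustrative assumptions; this is not the adapted code in the tbip directory, which fits these quantities to observed tweets rather than sampling them.

```python
import numpy as np


def simulate_tbip(author_of_doc, num_authors, num_topics, vocab_size,
                  alpha=0.3, seed=0):
    """Draw synthetic word counts from the TBIP generative process.

    author_of_doc[d] is the author index a_d of document d. In this
    project an "author" would be one of the prototypical profiles.
    alpha is an illustrative Gamma hyperparameter (assumption).
    """
    rng = np.random.default_rng(seed)
    author_of_doc = np.asarray(author_of_doc)
    num_docs = author_of_doc.shape[0]

    # Document intensities theta[d, k] and neutral topics beta[k, v]:
    # Gamma(shape=alpha, rate=alpha); NumPy takes scale = 1 / rate.
    theta = rng.gamma(alpha, 1.0 / alpha, size=(num_docs, num_topics))
    beta = rng.gamma(alpha, 1.0 / alpha, size=(num_topics, vocab_size))
    # Ideological topics eta[k, v] and ideal points x[s]: standard normal.
    eta = rng.normal(0.0, 1.0, size=(num_topics, vocab_size))
    x = rng.normal(0.0, 1.0, size=num_authors)

    # Tilt each topic by the author's ideal point:
    # tilted[d, k, v] = beta[k, v] * exp(x[a_d] * eta[k, v]).
    ideal = x[author_of_doc]
    tilted = beta[None, :, :] * np.exp(ideal[:, None, None] * eta[None, :, :])
    # Poisson rate: sum over topics k of theta[d, k] * tilted[d, k, v].
    rate = np.einsum("dk,dkv->dv", theta, tilted)
    counts = rng.poisson(rate)
    return counts, x
```

With four authors, `simulate_tbip([0, 1, 2, 3] * 5, num_authors=4, num_topics=3, vocab_size=50)` returns a 20 × 50 count matrix and four ideal points. Authors with ideal points of opposite sign shift the same topic's vocabulary in opposite directions, which is what places them at opposite ends of the interpretable scale. The original TBIP recovers these quantities from real counts with variational inference.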
Using the 'ideal points' model in the traditional way (i.e. with textual information from specific, individual authors) would be impractical here, because the dataset contains a vast number of authors (Twitter users). I therefore generalised the model: rather than treating each user as an author, I built prototypical author profiles by applying extensive NLP techniques to the bios (the short self-descriptions) of the accounts that posted the tweets. These were used to define four broad author profiles: academics, doctors, journalists and politicians.
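The core of the generalisation is a mapping from each account's bio to one of the four profiles, so that every kept tweet receives a profile index as its 'author'. The sketch below shows that mapping with simple keyword matching. It is a deliberately simplified stand-in for the NLP analysis used in this project: the keyword lists, `classify_bio` and `assign_authors` are hypothetical, and bios that match no profile, or tie between profiles, are dropped.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical keyword lists, for illustration only; a simplified
# stand-in for the project's actual bio analysis.
PROFILE_KEYWORDS = {
    "academic": {"professor", "lecturer", "phd", "researcher", "postdoc"},
    "doctor": {"doctor", "physician", "surgeon", "md", "gp"},
    "journalist": {"journalist", "reporter", "editor", "correspondent"},
    "politician": {"mp", "senator", "congressman", "councillor", "mayor"},
}
# Each profile becomes one "author" index for the ideal points model.
AUTHOR_INDEX = {profile: i for i, profile in enumerate(PROFILE_KEYWORDS)}


def classify_bio(bio: str) -> Optional[str]:
    """Return the best-matching profile, or None if nothing matches or there is a tie."""
    tokens = re.findall(r"[a-z]+", bio.lower())
    scores = Counter(
        {p: sum(t in kws for t in tokens) for p, kws in PROFILE_KEYWORDS.items()}
    )
    (best, top), (_, runner_up) = scores.most_common(2)
    if top == 0 or top == runner_up:
        return None  # unmatched or ambiguous bio
    return best


def assign_authors(tweets: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Map (tweet_text, bio) pairs to (tweet_text, author_index), dropping unmatched bios."""
    kept = []
    for text, bio in tweets:
        profile = classify_bio(bio)
        if profile is not None:
            kept.append((text, AUTHOR_INDEX[profile]))
    return kept
```

Because every tweet now carries one of only four author indices, the model estimates four ideal points instead of one per Twitter user, which keeps the problem tractable.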