Giter Site home page Giter Site logo

jieyuzhao / framing-twitter Goto Github PK

View Code? Open in Web Editor NEW

This project forked from ddemszky/framing-twitter

0.0 2.0 0.0 104.05 MB

Code for the paper "Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings"

Jupyter Notebook 97.01% Python 2.47% Shell 0.05% R 0.45% TeX 0.02%

framing-twitter's Introduction

framing-twitter

This repository contains code for the paper:

Demszky, D., Garg, N., Voigt, R., Zou, J., Gentzkow, M., Shapiro, J. & Jurafsky, D. (2019). Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings. In 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).

@inproceedings{demszky2019analyzing,
 author = {Demszky, Dorottya and Garg, Nikhil and Voigt, Rob and Zou, James and Gentzkow, Matthew and Shapiro, Jesse and Jurafsky, Dan},
 booktitle = {17th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)},
 title = {{Analyzing Polarization in Social Media: Method and Application to Tweets on 21 Mass Shootings}},
 url = {https://nlp.stanford.edu/pubs/demszky2019analyzing.pdf},
 year = {2019}
}

All the results as well as the plots in the paper were generated using the scripts in this repository. Due to Twitter's privacy policy, we are not able to share the original tweets, but we are sharing the tweet IDs under data/tweet_ids.

Requirements

Make sure to use Python3 when running the scripts. The package requirements can be obtained by running pip install -r requirements.txt.

Folders

paper

The paper as well as plots and tables.

data

  • tweet_ids: ids of the tweets we used (unfiltered)
  • input: inputs for the scripts; this folder also includes all the handles of Democrat and Republican politicians we used to determine partisanship (all_dems.txt and all_reps.txt)
  • output: outputs of the scripts
  • topic_eval: the results of our topic model evaluation

verify_partisanship_assignment

Code and data for verifying our method for partisanship assignment (Section 2 in the paper).

1_process_data

Code for preprocessing the data -- i.e. removing retweets, building vocabularies and cleaning up tweets.

2_topic_clustering

Scripts for performing the topic clustering, step by step. The scripts need to be used in ordered sequence:

  • 1_compute_cooccurrence.py: compute word co-occurrence matrix to use as an input for GloVe
  • 2_glove_train.py: train GloVe using the Mittens package
  • 3_tweet_embeddings.py: construct tweet embeddings using Arora et al.'s (2017) method
  • 4_compute_cluster_means.py: compute cluster means using k-means with cosine distance, based on a sample of the data
  • 5_get_topic_proximities.py: compute the proximities of all tweets to each topic
  • For running BTM:
    • download the scripts from here
    • set the BTM directory path within config.json and within 2_topic_clustering/myBTMexample.sh
    • follow the steps in pre-processing for BTM.ipynb
  • For MALLET:
    • download the MALLET binary
    • set the directory path within config.json
    • follows the steps in MALLET.ipynb

3_leave_out_polarization

Scripts for measuring the polarization of tweets, based on Gentzkow et al. (2018). Note that this builds on topic assignments, therefore these scripts can be executed only once topics have been assigned.

  • overall_polarization.py: compute the overall leave-out estimate for each event
  • between_topic_polarization.py: compute the between-topic polarization for each event
  • topic_polarization.py: compute the within-topic polarization for each event
  • topic_polarization_over_time.py: compute within-topic polarization over time for particular events

4_word_partisanship

Code for measuring and plotting the partisanship (log odds ratio) of words, phrases and semantic categories.

  • word_partisanship.py: calculate the partisanship of all words for each event
  • plot word partisanship.ipynb: compare partisanship of individual words / phrases across events
  • event grounding.ipynb: measure and plot the partisanship of event grounding
  • modals.ipynb: measure and plot the partisanship of modals
  • pronouns.ipynb: measure and plot the partisanship of pronouns

5_affect

Measure the partisanship of affect categories.

  • affect_parisanship.ipynb:
    • construct affect lexicon based on the NRC Emotion Lexicon, by filtering it for our own domain via GloVe embeddings
    • plot the partisanship of affect categories
  • get_affect_features.py: collect affect features for each event based on our affect lexicon

data_exploration

Notebooks for looking at the data and the embeddings.

framing-twitter's People

Contributors

ddemszky avatar

Watchers

James Cloos avatar Jieyu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.