CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Code for the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums (COLING 2018, Santa Fe, New Mexico).

Description

In this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach of both content and context-driven modeling for sarcasm detection in online social media discussions (Reddit).

Requirements

  1. Clone this repo.
  2. Install Python (2.7 or 3.3-3.6).
  3. Install your preferred build of TensorFlow 1.4.0 (CPU or GPU; from PyPI, compiled, etc.).
  4. Install the rest of the requirements: pip install -r requirements.txt
  5. Download the FastText pre-trained embeddings and extract them somewhere.
  6. Download the comments.json dataset file [1] and place it in data/.
  7. To run the optional Preprocessing steps, install YAJL 2, download the train-balanced.csv file, save it under data/, and continue with the Preprocessing instructions. Otherwise, download user_gcca_embeddings.npz, place it in users/user_embeddings/, and go directly to the Running CASCADE section.
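
Before moving on, it can help to confirm that the downloaded files are where the later steps expect them. The following is a minimal sketch and not part of the repository: the helper name missing_files and the three list names are hypothetical, and the paths are simply those named in the steps above, taken relative to the repository root.

```python
from pathlib import Path

# Paths as named in the Requirements steps, relative to the repository root.
ALWAYS = ["data/comments.json"]
IF_PREPROCESSING = ["data/train-balanced.csv"]
IF_SKIPPING_PREPROCESSING = ["users/user_embeddings/user_gcca_embeddings.npz"]


def missing_files(root, relative_paths):
    """Return the entries of relative_paths that are not files under root."""
    root = Path(root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    # Example: check the files needed when the Preprocessing steps are skipped.
    for path in missing_files(".", ALWAYS + IF_SKIPPING_PREPROCESSING):
        print("missing:", path)
```

Run it from the repository root. An empty output means every listed file was found.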

Preprocessing

User Embeddings

  1. User Embeddings: Stylometric features

    The file data/comments.json contains Reddit users and their comments. A user may have several comments, so we concatenate all comments by the same user, separated by the <END> tag:

    cd users
    python create_per_user_paragraph.py

    The ParagraphVector algorithm is used to generate the stylometric features. First, train the model:

    python train_stylometric.py

    Generate user_stylometric.csv (user stylometric features) using the trained model:

    python generate_stylometric.py
  2. User Embeddings: Personality features

    Pre-train a CNN-based model to detect personality features from text. The code trains on two datasets. The second dataset [2] can be obtained by requesting it from the original authors.

    python process_data.py [path/to/FastText_embedding]
    python train_personality.py

    Generate user_personality.csv (user personality features) using this model:

    python generate_user_personality.py

    To use the pre-trained model from our experiments, download the model weights and unzip them inside the folder users/.

  3. User Embeddings: Multi-view fusion

    Merge the user_stylometric.csv and user_personality.csv files into a single merged user_view_vectors.csv file:

    python merge_user_views.py

    Multi-view fusion of the user views (stylometric and personality) is performed using GCCA (approximately CCA in the two-view case). Generate the fused user embeddings user_gcca_embeddings.npz using the following command:

    python user_wgcca.py --input user_embeddings/user_view_vectors.csv --output user_embeddings/user_gcca_embeddings.npz --k 100 --no_of_views 2

    This implementation of GCCA has been adapted from the wgcca repo.

    Finally:

    cd ..
  4. Discourse Embeddings

    As with the user stylometric features, create the discourse features for each discussion forum (subreddit):

    cd discourse
    python create_per_discourse_paragraph.py

    The ParagraphVector algorithm is used to generate the discourse features. First, train the model:

    python train_discourse.py

    Generate discourse.csv (discourse features) using the trained model:

    python generate_discourse.py

    Finally:

    cd ..
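
The two paragraph-building steps above (per user and per subreddit) share one idea: group the comments by a key and join each group with the <END> tag. A minimal sketch of that idea follows. It is not the code in create_per_user_paragraph.py or create_per_discourse_paragraph.py; the function name build_paragraphs, the record fields author, subreddit and text, and the spacing around <END> are all assumptions made for illustration.

```python
from collections import defaultdict

# Separator named in the README; the surrounding spaces are an assumption.
END_TAG = " <END> "


def build_paragraphs(comments, key):
    """Group comment texts by `key` and join each group with the <END> tag."""
    grouped = defaultdict(list)
    for comment in comments:
        grouped[comment[key]].append(comment["text"])
    return {group: END_TAG.join(texts) for group, texts in grouped.items()}


# Toy records; the field names are hypothetical, not the real comments.json schema.
comments = [
    {"author": "alice", "subreddit": "politics", "text": "Great idea."},
    {"author": "bob", "subreddit": "politics", "text": "Sure, that will work."},
    {"author": "alice", "subreddit": "movies", "text": "Loved it."},
]

per_user = build_paragraphs(comments, "author")
per_subreddit = build_paragraphs(comments, "subreddit")
```

Each resulting paragraph is the kind of single document that a ParagraphVector model would then be trained on, one per user or one per subreddit.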

Running CASCADE

Hybrid CNN

The hybrid CNN combines user embeddings and discourse features with textual modeling.

cd src
python process_data.py [path/to/FastText_embedding]
python train_cascade.py

The CNN codebase has been adapted from Denny Britz's cnn-text-classification-tf repo.
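
As a rough illustration of what "hybrid" means here, the sketch below joins a text feature vector with a user embedding and a discourse vector, then scores the result with a single logistic unit. It assumes the views are fused by plain concatenation, and it stands in for the TensorFlow model in train_cascade.py without reproducing it. The function names, vector sizes, and weights are hypothetical.

```python
import math


def hybrid_features(text_features, user_embedding, discourse_features):
    """Concatenate the three views into one feature vector (assumed fusion)."""
    return list(text_features) + list(user_embedding) + list(discourse_features)


def sarcasm_probability(features, weights, bias):
    """Score a feature vector with one logistic unit (toy classifier)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Toy vectors; real text features would come from the CNN over the comment.
text_vec = [0.2, -0.1]
user_vec = [0.5]
discourse_vec = [0.3, 0.0]

fused = hybrid_features(text_vec, user_vec, discourse_vec)
probability = sarcasm_probability(fused, [0.0] * len(fused), 0.0)
```

With all weights set to zero the unit is uninformative, so the toy call returns a probability of one half. A trained model would learn the weights from labelled comments.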

Citation

If you use this code in your work, please cite the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums:

@InProceedings{C18-1156,
  author = 	"Hazarika, Devamanyu
		and Poria, Soujanya
		and Gorantla, Sruthi
		and Cambria, Erik
		and Zimmermann, Roger
		and Mihalcea, Rada",
  title = 	"CASCADE: Contextual Sarcasm Detection in Online Discussion Forums",
  booktitle = 	"Proceedings of the 27th International Conference on Computational Linguistics",
  year = 	"2018",
  publisher = 	"Association for Computational Linguistics",
  pages = 	"1837--1848",
  location = 	"Santa Fe, New Mexico, USA",
  url = 	"http://aclweb.org/anthology/C18-1156"
}

References

[1]. Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. "A large self-annotated corpus for sarcasm." Proceedings of the Eleventh International Conference on Language Resources and Evaluation. 2018.

[2]. Celli, Fabio, et al. "Workshop on computational personality recognition (shared task)." Proceedings of the Workshop on Computational Personality Recognition. 2013.
