sentiment-detection

We investigate the use of document embeddings (doc2vec) for the sentiment classification problem, or more specifically, categorising movie reviews. Since doc2vec models have a number of hyperparameters, we evaluate each of them and then select the optimal hyperparameter setting. After tuning the hyperparameters of a Support Vector Machine (SVM) via 10-fold cross-validation, we examine the performance of the SVM classifier using doc2vec features in comparison with the baseline system, a Naive Bayes (NB) classifier using Bag-Of-Words (BOW) features. We finally analyse the embedding space learnt by the doc2vec models by visualising similar subgroups of documents and inferring words similar to a given target word.

We found that an SVM classifier using document embeddings (89.35%) significantly outperforms the Naive Bayes classifier using traditional Bag-Of-Words representations (82.25%).

Naive Bayes

The Naive Bayes (NB) classifier is based on the Bayes rule:

$P(c|d) = \frac{P(c) P(d|c)}{P(d)}$

Since NB assumes that every feature $f_i$ is conditionally independent given class $c$, the term $P(c|d)$ can be expressed as:

$P_{NB}(c|d) := \frac{P(c)\prod_{i=1}^{m} P(f_i|c)^{n_i(d)}}{P(d)}$

Here $n_i(d)$ denotes the number of times feature $f_i$ occurs in document $d$, and $m$ is the total number of features. This is the foundation of NB classifiers, and we use unigrams and bigrams as features, both of which are essentially bag-of-$n$-words models with different values of $n$. Pang et al. (2002) noted that the use of bigrams might undermine the conditional-independence assumption; bigrams caused classification accuracy to decline by 5.8% in their experiment, and by 1.4% in ours.
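A minimal sketch of this baseline with scikit-learn, assuming `train_texts`, `train_labels` and `test_texts` are hypothetical names for the raw reviews and their sentiment labels:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Bag-of-words counts over unigrams and bigrams (ngram_range=(1, 2)).
vectoriser = CountVectorizer(ngram_range=(1, 2))
X_train = vectoriser.fit_transform(train_texts)    # train_texts: list of raw reviews (assumed)

# Multinomial NB corresponds to the count-based formulation above.
nb = MultinomialNB()
nb.fit(X_train, train_labels)                      # train_labels: sentiment labels (assumed)
predicted = nb.predict(vectoriser.transform(test_texts))
```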

Support Vector Machines

Support Vector Machines (SVMs) are regarded as a universal learner for a wide range of Machine Learning problems. Combined with Bag-Of-Words features and their robustness to high feature dimensionality, SVMs have proved effective on text categorisation tasks. More information can be found in this practical.
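As a rough sketch, the same sparse n-gram counts used for the NB baseline can feed a linear SVM, which copes well with the high-dimensional feature space (the data names are again hypothetical):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# Same unigram/bigram counts as the NB baseline (train_texts/test_texts assumed).
vectoriser = CountVectorizer(ngram_range=(1, 2))
X_train = vectoriser.fit_transform(train_texts)

# A linear-kernel SVM handles the sparse, high-dimensional BOW space well.
svm = LinearSVC(C=1.0)
svm.fit(X_train, train_labels)
print(svm.score(vectoriser.transform(test_texts), test_labels))
```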

Bag-Of-Words Model

Early state-of-the-art document representations were based on the bag-of-words model, which represents each input document as a fixed-length vector of word counts. Bag-of-words models are surprisingly effective but lose all information about word order. Bag-of-n-grams models represent documents as fixed-length vectors over word phrases of length n, capturing local word order, but they suffer from data sparsity and high dimensionality.

word2vec

learning neural word embeddings (Mikolov et al., 2013)

word2vec is a more recent model that embeds words in a lower-dimensional vector space using a shallow neural network. The result is a set of word-vectors in which vectors that lie close together in the vector space have similar meanings based on context, while distant word-vectors have differing meanings.

Two models in word2vec (a minimal training sketch follows the list):

  • skip-gram model
  • continuous-bag-of-words model
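A minimal training sketch with gensim (4.x API assumed); `tokenised_reviews` is a hypothetical list of token lists, and the query word is only illustrative:

```python
from gensim.models import Word2Vec

# sg=1 selects the skip-gram model, sg=0 the continuous-bag-of-words model.
skipgram = Word2Vec(tokenised_reviews, vector_size=100, window=5, sg=1, min_count=2)
cbow = Word2Vec(tokenised_reviews, vector_size=100, window=5, sg=0, min_count=2)

# Nearby vectors correspond to words used in similar contexts
# (assumes 'excellent' occurs in the training reviews).
print(skipgram.wv.most_similar('excellent', topn=5))
```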

doc2vec

embeddings for sequences of words (Le and Mikolov, 2014)

doc2vec acts as if each document has an additional floating word-like vector, which contributes to all training predictions and is updated like the other word-vectors; we will call it a doc-vector.

Two models in doc2vec (a minimal training sketch follows the list):

  • distributed memory (PV-DM)
  • distributed bag of words (PV-DBOW)
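A minimal training sketch with gensim (4.x API assumed), again with a hypothetical `tokenised_reviews` list of token lists:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each review becomes a TaggedDocument whose tag identifies its doc-vector.
corpus = [TaggedDocument(words, tags=[i]) for i, words in enumerate(tokenised_reviews)]

# dm=1 is distributed memory (PV-DM): the doc-vector joins the context words
# to predict the target word. dm=0 is distributed bag of words (PV-DBOW):
# the doc-vector alone predicts words sampled from the document.
dm_model = Doc2Vec(corpus, dm=1, vector_size=100, window=10, negative=5, hs=0, epochs=20)
dbow_model = Doc2Vec(corpus, dm=0, vector_size=100, negative=5, hs=0, epochs=20)

doc_vector = dm_model.dv[0]   # bulk-trained doc-vector for the first review
```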

Full table of doc2vec models trained in this project (each model was trained for 10 and 20 epochs; both accuracies are reported):

| index | algorithm | vector size | window size | negative samples | hierarchical softmax | epochs | accuracy |
|-------|-----------|-------------|-------------|------------------|----------------------|--------|----------|
| 1 | DM | 100 | 10 | NA | Y | 10/20 | 78.1% / 79.5% |
| 2 | DM | 100 | 10 | 5 | NA | 10/20 | 81.9% / 82.9% |
| 3 | DM | 100 | 20 | 5 | NA | 10/20 | 82.0% / 82.8% |
| 4 | DM | 150 | 10 | NA | Y | 10/20 | 78.6% / 79.9% |
| 5 | DM | 150 | 10 | 5 | NA | 10/20 | 82.4% / 83.9% |
| 6 | DM | 150 | 20 | 5 | NA | 10/20 | 81.8% / 82.7% |
| 7 | DBOW | 100 | NA | NA | Y | 10/20 | 86.7% / 87.9% |
| 8 | DBOW | 100 | NA | 5 | NA | 10/20 | 87.8% / 88.2% |
| 9 | DBOW | 150 | NA | NA | Y | 10/20 | 86.3% / 86.3% |
| 10 | DBOW | 150 | NA | 5 | NA | 10/20 | 87.3% / 88.3% |

Negative sampling was introduced as an alternative to the more complex hierarchical softmax step at the output layer of the doc2vec model; the authors (Mikolov et al., 2013) found that it is not only more efficient but actually produces better word vectors on average. The advantage of negative sampling can be seen by comparing rows (1) and (2), (4) and (5), and (7) and (8) in the table above.
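In gensim these two output-layer options correspond to the `hs` and `negative` parameters; a hedged configuration sketch (vocabulary building and training calls omitted):

```python
from gensim.models.doc2vec import Doc2Vec

# Hierarchical softmax, as in rows 1, 4, 7 and 9 of the table above.
hs_model = Doc2Vec(vector_size=100, window=10, dm=1, hs=1, negative=0, epochs=20)

# Negative sampling with 5 noise words, as in the remaining rows.
neg_model = Doc2Vec(vector_size=100, window=10, dm=1, hs=0, negative=5, epochs=20)

# Both models would then need build_vocab(corpus) followed by
# train(corpus, total_examples=..., epochs=...) before use.
```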

Concatenated Doc2Vec models

| index | DM model id | DBOW model id | accuracy |
|-------|-------------|---------------|----------|
| 1 | 15 | 18 | 84.1% |
| 2 | 15 | 20 | 84.4% |
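A concatenated representation simply stacks the PV-DM and PV-DBOW vectors for each document before classification; a hedged sketch using the hypothetical `dm_model` and `dbow_model` from the earlier sketch:

```python
import numpy as np
from sklearn.svm import SVC

def concat_features(dm_model, dbow_model, tokenised_docs):
    # One row per document: [DM vector | DBOW vector].
    return np.vstack([
        np.concatenate([dm_model.infer_vector(doc), dbow_model.infer_vector(doc)])
        for doc in tokenised_docs
    ])

# X_train = concat_features(dm_model, dbow_model, train_tokens)   # names assumed
# clf = SVC().fit(X_train, train_labels)
```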

Comparison between NB classifier and SVM classifier

| index | classifier | features | accuracy (%) | variance (×1e-3) |
|-------|------------|----------|--------------|------------------|
| (1) | NB | unigrams | 81.15 | .635 |
| (2) | NB | bigrams | 80.85 | .675 |
| (3) | NB | uni/bigrams | 82.25 | .771 |
| (4) | SVM | unigrams | 83.40 | .544 |
| (5) | SVM | bigrams | 80.90 | .509 |
| (6) | SVM | uni/bigrams | 84.10 | .709 |
| (7) | SVM | DM | 83.25 | .476 |
| (8) | SVM | DBOW | 89.35 | .345 |
| (9) | SVM | DM/DBOW | 84.15 | .394 |

Rows (1) to (6) are the baseline systems in our experiment. The SVM using unigrams and bigrams achieves the best baseline performance, and the SVM using DBOW document embeddings achieves the best overall performance.
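As noted in the introduction, the SVM hyperparameters were tuned via 10-fold cross-validation; a minimal sketch of such a search, where `X` and `y` stand for the document features and sentiment labels and the grid values are purely illustrative:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid; the values actually searched in the project may differ.
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear', 'rbf']}
search = GridSearchCV(SVC(), param_grid, cv=10, scoring='accuracy')

# search.fit(X, y)                                  # X, y: features and labels (assumed)
# print(search.best_params_, search.best_score_)
```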

Monte Carlo Permutation Test

Pairwise p-values between systems (1) to (9) from the comparison table above:

|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | - | 1.0 | 0.7780 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
| 2 | 1.0 | - | 0.6839 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
| 3 | 0.7703 | 0.6709 | - | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 | 0.0002 |
| 4 | 0.0002 | 0.0002 | 0.0002 | - | 0.4757 | 0.6863 | 0.8430 | 0.6810 | 0.9198 |
| 5 | 0.0002 | 0.0002 | 0.0002 | 0.4831 | - | 0.8412 | 0.6845 | 0.8352 | 0.6247 |
| 6 | 0.0002 | 0.0002 | 0.0002 | 0.7040 | 0.8446 | - | 0.9210 | 1.0 | 0.8486 |
| 7 | 0.0002 | 0.0002 | 0.0002 | 0.8384 | 0.6903 | 0.9162 | - | 0.9284 | 1.0 |
| 8 | 0.0002 | 0.0002 | 0.0002 | 0.6895 | 0.8454 | 1.0 | 0.9240 | - | 0.8340 |
| 9 | 0.0002 | 0.0002 | 0.0002 | 0.9256 | 0.6449 | 0.8418 | 1.0 | 0.8447 | - |

We can see that the SVM classifiers all significantly outperform the NB classifiers (p = 0.0002 for every NB versus SVM comparison), whereas the pairwise differences within each group are not significant.
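A sketch of a paired Monte Carlo permutation test for such comparisons, assuming `correct_a` and `correct_b` are 0/1 arrays marking whether each system classified each test review correctly (the number of samples shown is illustrative):

```python
import numpy as np

def permutation_test(correct_a, correct_b, n_samples=4999, seed=0):
    """Two-sided Monte Carlo paired permutation test on per-document accuracy."""
    rng = np.random.default_rng(seed)
    a = np.asarray(correct_a, dtype=float)
    b = np.asarray(correct_b, dtype=float)
    observed = abs(a.mean() - b.mean())
    exceed = 0
    for _ in range(n_samples):
        # Under the null hypothesis the two systems are exchangeable,
        # so randomly swap their outcomes on each document.
        flip = rng.integers(0, 2, size=a.shape[0]).astype(bool)
        perm_a = np.where(flip, b, a)
        perm_b = np.where(flip, a, b)
        if abs(perm_a.mean() - perm_b.mean()) >= observed:
            exceed += 1
    # (exceed + 1) / (n_samples + 1) is the standard Monte Carlo p-value,
    # with a floor of 1 / (n_samples + 1) = 0.0002 for n_samples = 4999.
    return (exceed + 1) / (n_samples + 1)
```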

note 1

According to Le & Mikolov (2014), PV-DM alone usually works well for most tasks (with state-of-the-art performance), but its combination with PV-DBOW was more consistent across the tasks they tried and is therefore strongly recommended.

In this project, however, contrary to the results of the paper, PV-DBOW alone performs as well as anything else. Concatenating vectors from different models only occasionally offers a tiny predictive improvement, and generally stays close to the best-performing single model.

note 2

re-inferring doc-vectors: Because the bulk-trained vectors receive much of their training early, while the model itself is still settling, vectors re-inferred from the final state of the model sometimes serve better than the bulk-trained vectors as features for downstream tasks.
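A small sketch of re-inference, assuming a trained gensim `Doc2Vec` model and the hypothetical `tokenised_reviews` list from earlier:

```python
import numpy as np

def reinfer_vectors(model, tokenised_docs, epochs=20):
    # Infer a fresh vector for each document from the final model state,
    # instead of reusing the vectors learnt during bulk training.
    return np.vstack([model.infer_vector(doc, epochs=epochs) for doc in tokenised_docs])

# X_reinferred = reinfer_vectors(dbow_model, tokenised_reviews)
```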
