
VADER-Sentiment-Analysis

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media. It is fully open-sourced under the [MIT License] (we sincerely appreciate all attributions and readily accept most contributions, but please don't hold us liable).

Features and Updates

Many thanks to George Berry, Ewan Klein, and Pierpaolo Pantone for key contributions that make VADER better. The new updates include:

  1. Refactoring for Python 3 compatibility, improved modularity, and incorporation into [NLTK] ...many thanks to Ewan & Pierpaolo.

  2. Restructuring for much improved speed/performance, reducing the time complexity from something like O(N^4) to O(N)...many thanks to George.

  3. Simplified pip install and better support for vaderSentiment module and component import. (Dependency on vader_lexicon.txt file now uses automated file location discovery so you don't need to manually designate its location in the code, or copy the file into your executing code's directory.)

  4. More complete demo in the __main__ for vaderSentiment.py. The demo has:

    • examples of typical use cases for sentiment analysis, including proper handling of sentences with:

      • typical negations (e.g., "not good")
      • use of contractions as negations (e.g., "wasn't very good")
      • conventional use of punctuation to signal increased sentiment intensity (e.g., "Good!!!")
      • conventional use of word-shape to signal emphasis (e.g., using ALL CAPS for words/phrases)
      • using degree modifiers to alter sentiment intensity (e.g., intensity boosters such as "very" and intensity dampeners such as "kind of")
      • understanding many sentiment-laden slang words (e.g., 'sux')
      • understanding many sentiment-laden slang words as modifiers such as 'uber' or 'friggin' or 'kinda'
      • understanding many sentiment-laden emoticons such as :) and :D
      • translating utf-8 encoded emojis such as 💘 and 💋 and 😁
      • understanding sentiment-laden initialisms and acronyms (for example: 'lol')
    • more examples of tricky sentences that confuse other sentiment analysis tools

    • example for how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts...i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses

    • examples of a concept for assessing the sentiment of images, video, or other tagged multimedia content

    • If you have access to the Internet, the demo includes an example of how VADER can analyze the sentiment of texts in other languages (non-English text sentences).
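The sentence-level decomposition of longer texts mentioned above can be sketched as follows. This is a minimal illustration rather than the demo's actual code: it uses a naive regex splitter as a dependency-free stand-in for NLTK's sentence tokenizer, and the per-sentence scoring call is shown only in a comment because it assumes vaderSentiment is installed.

```python
import re

def split_into_sentences(text):
    """Naively split text on sentence-ending punctuation followed by
    whitespace. (The demo uses NLTK's tokenizer for this; a regex is
    used here only to keep the sketch self-contained.)"""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

paragraph = ("It was one of the worst movies I've seen, despite good reviews. "
             "Unbelievably bad acting!! Poor direction. VERY poor production.")

for sentence in split_into_sentences(paragraph):
    # Each sentence would then be scored individually, e.g.:
    # analyzer.polarity_scores(sentence)
    print(sentence)
```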

Introduction

This README file describes the dataset of the paper:

VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text
(by C.J. Hutto and Eric Gilbert)
Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
For questions, please contact:
C.J. Hutto
Georgia Institute of Technology, Atlanta, GA 30032
cjhutto [at] gatech [dot] edu

Citation Information

If you use either the dataset or any of the VADER sentiment analysis tools (VADER sentiment lexicon or Python code for rule-based sentiment analysis engine) in your research, please cite the above paper. For example:

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

Installation

There are several ways to install and use VADER sentiment:

  1. The simplest is to use the command line to do an installation from [PyPI] using pip, e.g.,
    > pip install vaderSentiment
  2. Or, you might already have VADER and simply need to upgrade to the latest version, e.g.,
    > pip install --upgrade vaderSentiment
  3. You could also clone this [GitHub repository]
  4. You could download and unzip the [full master branch zip file]

In addition to the VADER sentiment analysis Python module, options 3 or 4 will also download all the additional resources and datasets (described below).

Resources and Dataset Descriptions

The package here includes PRIMARY RESOURCES (items 1-3) as well as additional DATASETS AND TESTING RESOURCES (items 4-12):

  1. vader_icwsm2014_final.pdf

    The original paper for the data set, see citation information (above).

  2. vader_lexicon.txt
    FORMAT: the file is tab delimited with TOKEN, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-HUMAN-SENTIMENT-RATINGS

    NOTE: The current algorithm makes immediate use of the first two elements (token and mean valence). The final two elements (SD and raw ratings) are provided for rigor. For example, if you want to follow the same rigorous process that we used for the study, you should find 10 independent humans to evaluate/rate each new token you want to add to the lexicon, make sure the standard deviation doesn't exceed 2.5, and take the average rating for the valence. This will keep the file consistent.

    DESCRIPTION: Empirically validated by multiple independent human judges, VADER incorporates a "gold-standard" sentiment lexicon that is especially attuned to microblog-like contexts.

    The VADER sentiment lexicon is sensitive to both the polarity and the intensity of sentiments expressed in social media contexts, and is also generally applicable to sentiment analysis in other domains.

    Sentiment ratings were gathered from 10 independent human raters (all pre-screened, trained, and quality-checked for optimal inter-rater reliability). Over 9,000 token features were rated on a scale from "[–4] Extremely Negative" to "[4] Extremely Positive", with allowance for "[0] Neutral (or Neither, N/A)". We kept every lexical feature that had a non-zero mean rating and whose standard deviation was less than 2.5, as determined by the aggregate of those ten independent raters. This left us with just over 7,500 lexical features with validated valence scores indicating both the sentiment polarity (positive/negative) and the sentiment intensity on a scale from –4 to +4. For example, the word "okay" has a positive valence of 0.9, "good" is 1.9, and "great" is 3.1, whereas "horrible" is –2.5, the frowning emoticon :( is –2.2, and "sucks" and its slang derivative "sux" are both –1.5.

    Manually creating (much less validating) a comprehensive sentiment lexicon is a labor-intensive and sometimes error-prone process, so it is no wonder that many opinion mining researchers and practitioners rely so heavily on existing lexicons as primary resources. We are pleased to offer ours as a new resource. We began by constructing a list inspired by examining existing, well-established sentiment word-banks (LIWC, ANEW, and GI). To this, we next incorporated numerous lexical features common to sentiment expression in microblogs, including:

    • a full list of Western-style emoticons, for example, :-) denotes a smiley face and generally indicates positive sentiment
    • sentiment-related acronyms and initialisms (e.g., LOL and WTF are both examples of sentiment-laden initialisms)
    • commonly used slang with sentiment value (e.g., nah, meh and giggly).

    We empirically confirmed the general applicability of each feature candidate to sentiment expressions using a wisdom-of-the-crowd (WotC) approach (Surowiecki, 2004) to acquire a valid point estimate for the sentiment valence (polarity & intensity) of each context-free candidate feature.
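The inclusion criterion described above (non-zero mean rating, standard deviation below 2.5 across the independent raters) can be sketched like this; the ratings shown are made up for illustration, not taken from the actual data:

```python
from statistics import mean, stdev

def accept_token(ratings, sd_cutoff=2.5):
    """Apply the lexicon's inclusion rule: keep a candidate token only if
    its mean rating is non-zero and rater disagreement (sample standard
    deviation) is below the cutoff. Returns the mean valence if accepted,
    else None."""
    m = mean(ratings)
    return m if m != 0 and stdev(ratings) < sd_cutoff else None

# Hypothetical ratings from 10 independent raters on the -4..+4 scale:
print(accept_token([2, 3, 2, 2, 3, 2, 1, 2, 3, 2]))       # broadly agreed positive: accepted
print(accept_token([4, -4, 3, -3, 4, -4, 3, -3, 4, -4]))  # raters disagree: rejected (None)
```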

  3. vaderSentiment.py

    The Python code for the rule-based sentiment analysis engine. Implements the grammatical and syntactical rules described in the paper, incorporating empirically derived quantifications for the impact of each rule on the perceived intensity of sentiment in sentence-level text. Importantly, these heuristics go beyond what would normally be captured in a typical bag-of-words model. They incorporate word-order sensitive relationships between terms. For example, degree modifiers (also called intensifiers, booster words, or degree adverbs) impact sentiment intensity by either increasing or decreasing the intensity. Consider these examples:

    1. "The service here is extremely good"
    2. "The service here is good"
    3. "The service here is marginally good"

    From Table 3 in the paper, we see that for 95% of the data, using a degree modifier increases the positive sentiment intensity of example (1) by 0.227 to 0.36, with a mean difference of 0.293 on a rating scale from 1 to 4. Likewise, example (3) reduces the perceived sentiment intensity by 0.293, on average.
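That empirically derived increment can be sketched as a simple valence adjustment. The 0.293 value comes from the paper as quoted above; the booster/dampener sets here are a tiny illustrative subset, not VADER's actual lists, and real VADER also flips the adjustment's sign for negative valences, which this sketch omits.

```python
# Mean intensity difference for degree modifiers, per Table 3 of the paper.
MODIFIER_INCREMENT = 0.293

# Illustrative subset of degree modifiers (VADER's actual lists are much larger).
BOOSTERS = {"extremely", "very", "really"}
DAMPENERS = {"marginally", "slightly", "kind of"}

def adjust_valence(base_valence, modifier=None):
    """Boost or dampen a positive base valence by the empirically
    derived increment."""
    if modifier in BOOSTERS:
        return base_valence + MODIFIER_INCREMENT
    if modifier in DAMPENERS:
        return base_valence - MODIFIER_INCREMENT
    return base_valence

good = 1.9  # lexicon valence for "good", per the lexicon description above
print(adjust_valence(good, "extremely"))   # boosted
print(adjust_valence(good))                # unchanged
print(adjust_valence(good, "marginally"))  # dampened
```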

  4. tweets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TWEET-TEXT

    DESCRIPTION: includes "tweet-like" text as inspired by 4,000 tweets pulled from Twitter’s public timeline, plus 200 completely contrived tweet-like texts intended to specifically test syntactical and grammatical conventions of conveying differences in sentiment intensity. The "tweet-like" texts incorporate a fictitious username (@anonymous) in places where a username might typically appear, along with a fake URL (http://url_removed) in places where a URL might typically appear, as inspired by the original tweets. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'tweets_anonDataRatings.txt' (described below).
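Reading any of the tab-delimited ground-truth files follows the same pattern; a minimal parsing sketch (the sample line below is made up for illustration, not taken from the actual file):

```python
import csv
import io

# A made-up line in the ID <tab> MEAN-SENTIMENT-RATING <tab> TEXT format.
sample = "1\t2.3\tJust got a call from my tech - the server is back up :)\n"

def parse_ground_truth(fileobj):
    """Yield (id, mean_rating, text) tuples from a tab-delimited file."""
    for row in csv.reader(fileobj, delimiter="\t"):
        if len(row) >= 3:
            yield int(row[0]), float(row[1]), row[2]

rows = list(parse_ground_truth(io.StringIO(sample)))
print(rows)
```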

  5. tweets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  6. nytEditorialSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 5,190 sentence-level snippets from 500 New York Times opinion news editorials/articles; we used the NLTK tokenizer to segment the articles into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'nytEditorialSnippets_anonDataRatings.txt' (described below).

  7. nytEditorialSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  8. movieReviewSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 10,605 sentence-level snippets from rotten.tomatoes.com. The snippets were derived from an original set of 2000 movie reviews (1000 positive and 1000 negative) in Pang & Lee (2004); we used the NLTK tokenizer to segment the reviews into sentence phrases, and added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'movieReviewSnippets_anonDataRatings.txt' (described below).

  9. movieReviewSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  10. amazonReviewSnippets_GroundTruth.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, and TEXT-SNIPPET

    DESCRIPTION: includes 3,708 sentence-level snippets from 309 customer reviews on 5 different products. The reviews were originally used in Hu & Liu (2004); we added sentiment intensity ratings. The ID and MEAN-SENTIMENT-RATING correspond to the raw sentiment rating data provided in 'amazonReviewSnippets_anonDataRatings.txt' (described below).

  11. amazonReviewSnippets_anonDataRatings.txt

    FORMAT: the file is tab delimited with ID, MEAN-SENTIMENT-RATING, STANDARD DEVIATION, and RAW-SENTIMENT-RATINGS

    DESCRIPTION: Sentiment ratings from a minimum of 20 independent human raters (all pre-screened, trained, and quality checked for optimal inter-rater reliability).

  12. Comp.Social website with more papers/research:

    [Comp.Social](http://comp.social.gatech.edu/papers/)

Python Demo and Code Examples

Demo, including example of non-English text translations

For a more complete demo, point your terminal to VADER's install directory (e.g., if you installed using pip, it might be \Python3x\lib\site-packages\vaderSentiment), and then run python vaderSentiment.py. (Be sure your terminal or IDE is set to handle UTF-8 encoding. There are also additional library/package requirements, such as NLTK and requests, which help demonstrate some common real-world uses.)

The demo has more examples of tricky sentences that confuse other sentiment analysis tools. It also demonstrates how VADER can work in conjunction with NLTK to do sentiment analysis on longer texts, i.e., decomposing paragraphs, articles/reports/publications, or novels into sentence-level analyses. It further demonstrates a concept for assessing the sentiment of images, video, or other tagged multimedia content.

If you have access to the Internet, the demo will also show how VADER can analyze the sentiment of non-English text sentences. Please be aware that VADER does not inherently provide its own translation. The use of "My Memory Translation Service" from MY MEMORY NET (see: http://mymemory.translated.net) is part of the demonstration showing one way to use VADER on non-English text. (Please note the usage limits for number of requests: http://mymemory.translated.net/doc/usagelimits.php)

Code Examples

    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    # note: depending on how you installed (e.g., using source code download versus pip install), you may need to import like this:
    # from vaderSentiment import SentimentIntensityAnalyzer

    # --- examples -------
    sentences = ["VADER is smart, handsome, and funny.",  # positive sentence example
                 "VADER is smart, handsome, and funny!",  # punctuation emphasis handled correctly (sentiment intensity adjusted)
                 "VADER is very smart, handsome, and funny.",  # booster words handled correctly (sentiment intensity adjusted)
                 "VADER is VERY SMART, handsome, and FUNNY.",  # emphasis for ALLCAPS handled
                 "VADER is VERY SMART, handsome, and FUNNY!!!",  # combination of signals - VADER appropriately adjusts intensity
                 "VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!",  # booster words & punctuation make this close to ceiling for score
                 "VADER is not smart, handsome, nor funny.",  # negation sentence example
                 "The book was good.",  # positive sentence
                 "At least it isn't a horrible book.",  # negated negative sentence with contraction
                 "The book was only kind of good.",  # qualified positive sentence is handled correctly (intensity adjusted)
                 "The plot was good, but the characters are uncompelling and the dialog is not great.",  # mixed negation sentence
                 "Today SUX!",  # negative slang with capitalization emphasis
                 "Today only kinda sux! But I'll get by, lol",  # mixed sentiment example with slang and contrastive conjunction "but"
                 "Make sure you :) or :D today!",  # emoticons handled
                 "Catch utf-8 emoji such as 💘 and 💋 and 😁",  # emojis handled
                 "Not bad at all",  # capitalized negation
                 ]

    analyzer = SentimentIntensityAnalyzer()
    for sentence in sentences:
        vs = analyzer.polarity_scores(sentence)
        print("{:-<65} {}".format(sentence, str(vs)))

Again, for a more complete demo, go to the install directory and run python vaderSentiment.py. (Be sure you are set to handle UTF-8 encoding in your terminal or IDE.)

Output for the above example code

VADER is smart, handsome, and funny.----------------------------- {'pos': 0.746, 'compound': 0.8316, 'neu': 0.254, 'neg': 0.0}
VADER is smart, handsome, and funny!----------------------------- {'pos': 0.752, 'compound': 0.8439, 'neu': 0.248, 'neg': 0.0}
VADER is very smart, handsome, and funny.------------------------ {'pos': 0.701, 'compound': 0.8545, 'neu': 0.299, 'neg': 0.0}
VADER is VERY SMART, handsome, and FUNNY.------------------------ {'pos': 0.754, 'compound': 0.9227, 'neu': 0.246, 'neg': 0.0}
VADER is VERY SMART, handsome, and FUNNY!!!---------------------- {'pos': 0.767, 'compound': 0.9342, 'neu': 0.233, 'neg': 0.0}
VADER is VERY SMART, uber handsome, and FRIGGIN FUNNY!!!--------- {'pos': 0.706, 'compound': 0.9469, 'neu': 0.294, 'neg': 0.0}
VADER is not smart, handsome, nor funny.------------------------- {'pos': 0.0, 'compound': -0.7424, 'neu': 0.354, 'neg': 0.646}
The book was good.----------------------------------------------- {'pos': 0.492, 'compound': 0.4404, 'neu': 0.508, 'neg': 0.0}
At least it isn't a horrible book.------------------------------- {'pos': 0.363, 'compound': 0.431, 'neu': 0.637, 'neg': 0.0}
The book was only kind of good.---------------------------------- {'pos': 0.303, 'compound': 0.3832, 'neu': 0.697, 'neg': 0.0}
The plot was good, but the characters are uncompelling and the dialog is not great. {'pos': 0.094, 'compound': -0.7042, 'neu': 0.579, 'neg': 0.327}
Today SUX!------------------------------------------------------- {'pos': 0.0, 'compound': -0.5461, 'neu': 0.221, 'neg': 0.779}
Today only kinda sux! But I'll get by, lol----------------------- {'pos': 0.317, 'compound': 0.5249, 'neu': 0.556, 'neg': 0.127}
Make sure you :) or :D today!------------------------------------ {'pos': 0.706, 'compound': 0.8633, 'neu': 0.294, 'neg': 0.0}
Catch utf-8 emoji such as 💘 and 💋 and 😁-------------------- {'pos': 0.279, 'compound': 0.7003, 'neu': 0.721, 'neg': 0.0}
Not bad at all--------------------------------------------------- {'pos': 0.487, 'compound': 0.431, 'neu': 0.513, 'neg': 0.0}

About the Scoring

  • The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.

    It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative. Typical threshold values (used in the literature cited on this page) are:

  1. positive sentiment: compound score >= 0.05
  2. neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
  3. negative sentiment: compound score <= -0.05

NOTE: The compound score is the one most commonly used for sentiment analysis by most researchers, including the authors.
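The thresholds above can be applied mechanically; the classify function below implements them exactly as listed. The normalize function is a hedged sketch of the normalization step: in the reference vaderSentiment.py the summed valence x is mapped to x / sqrt(x^2 + alpha) with alpha = 15, but treat that as an implementation detail that could change.

```python
import math

def normalize(score, alpha=15):
    """Map a summed valence score onto (-1, 1). Sketch of the reference
    implementation's normalization; alpha approximates the maximum
    expected summed value."""
    return score / math.sqrt(score * score + alpha)

def classify(compound):
    """Apply the standard literature thresholds to a compound score."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify(0.8316))   # "VADER is smart, handsome, and funny." -> positive
print(classify(-0.7424))  # the negated example above -> negative
print(classify(0.0))      # -> neutral
```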

  • The pos, neu, and neg scores are ratios for proportions of text that fall in each category (so these should all add up to 1... or close to it with floating-point arithmetic). These are the most useful metrics if you want to analyze the context & presentation of how sentiment is conveyed or embedded in rhetoric for a given sentence. For example, different writing styles may embed strongly positive or negative sentiment within varying proportions of neutral text -- i.e., some writing styles may reflect a penchant for strongly flavored rhetoric, whereas other styles may use a great deal of neutral text while still conveying a similar overall (compound) sentiment. As another example: researchers analyzing information presentation in journalistic or editorial news might desire to establish whether the proportions of text (associated with a topic or named entity, for example) are balanced with similar amounts of positively and negatively framed text versus being "biased" towards one polarity or the other for the topic/entity.
    • IMPORTANTLY: these proportions represent the "raw categorization" of each lexical item (e.g., words, emoticons/emojis, or initialisms) into positive, negative, or neutral classes; they do not account for the VADER rule-based enhancements such as word-order sensitivity for sentiment-laden multi-word phrases, degree modifiers, word-shape amplifiers, punctuation amplifiers, negation polarity switches, or contrastive conjunction sensitivity.

Ports to Other Programming Languages

Feel free to let me know about ports of VADER Sentiment to other programming languages. So far, I know about these helpful ports:

  1. Java
    VaderSentimentJava by apanimesh061
  2. JavaScript
    vaderSentiment-js by nimaeskandary
  3. PHP
    php-vadersentiment by abusby
  4. Scala
    Sentiment by ziyasal
  5. C#
    vadersharp by codingupastorm (Jordan Andrews)
  6. Rust
    vader-sentiment-rust by ckw017
  7. Go
    GoVader by jonreiter (Jon Reiter)
  8. R
    R Vader by Katie Roehrick

vadersentiment's People

Contributors

0wlyw00d, cjhutto, diva-lab, flekschas, janik6882, kennyjoseph, kootenpv, max-frai, p208p2002


vadersentiment's Issues

pkgdata to prevent not finding lexicon

I tried:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sa = SentimentIntensityAnalyzer()

it results in:

FileNotFoundError: [Errno 2] No such file or directory: 'lexicon/vader_lexicon.txt'

When looking at setup.py, it looks like you haven't used package data. Is there a reason for not doing so, or would you welcome a PR? You could see the following repo/file for the use of pkg data https://github.com/kootenpv/natura/blob/master/setup.py
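For reference, the package-data approach the reporter suggests would look roughly like this in setup.py. This is a hypothetical fragment, not the project's actual setup.py, and is shown only to illustrate the setuptools mechanism:

```python
# Hypothetical setup.py fragment using setuptools package_data,
# so the lexicon file ships inside the installed package and can be
# located relative to the module at runtime.
from setuptools import setup, find_packages

setup(
    name="vaderSentiment",
    packages=find_packages(),
    package_data={"vaderSentiment": ["vader_lexicon.txt"]},
    include_package_data=True,
)
```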

Codec Issue

Hi @cjhutto

When I run the code from the NLTK tutorial - http://www.nltk.org/howto/sentiment.html - about using Vader I get the error below. I worked out that I had to move the vader_lexicon.txt file into my NLTK sentiment folder, but that didn't solve this Codec problem.

Have run the code with both python 2 and 3.

Any ideas what I can do?

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-76d3725b79f2> in <module>()
     57 sentences.extend(tricky_sentences)
     58 
---> 59 sid = SentimentIntensityAnalyzer()
     60 
     61 for sentence in sentences:

//anaconda/lib/python3.5/site-packages/nltk/sentiment/vader.py in __init__(self, lexicon_file)
    200     def __init__(self, lexicon_file="vader_lexicon.txt"):
    201         self.lexicon_file = os.path.join(os.path.dirname(__file__), lexicon_file)
--> 202         self.lexicon = self.make_lex_dict()
    203 
    204     def make_lex_dict(self):

//anaconda/lib/python3.5/site-packages/nltk/sentiment/vader.py in make_lex_dict(self)
    208         lex_dict = {}
    209         with codecs.open(self.lexicon_file, encoding='utf8') as infile:
--> 210             for line in infile:
    211                 (word, measure) = line.strip().split('\t')[0:2]
    212                 lex_dict[word] = float(measure)

//anaconda/lib/python3.5/codecs.py in __next__(self)
    709 
    710         """ Return the next decoded line from the input stream."""
--> 711         return next(self.reader)
    712 
    713     def __iter__(self):

//anaconda/lib/python3.5/codecs.py in __next__(self)
    640 
    641         """ Return the next decoded line from the input stream."""
--> 642         line = self.readline()
    643         if line:
    644             return line

//anaconda/lib/python3.5/codecs.py in readline(self, size, keepends)
    553         # If size is given, we call read() only once
    554         while True:
--> 555             data = self.read(readsize, firstline=True)
    556             if data:
    557                 # If we're at a "\r" read one extra character (which might

//anaconda/lib/python3.5/codecs.py in read(self, size, chars, firstline)
    499                 break
    500             try:
--> 501                 newchars, decodedbytes = self.decode(data, self.errors)
    502             except UnicodeDecodeError as exc:
    503                 if firstline:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xde in position 0: invalid continuation byte

Missing dependencies (requests)

Hello!
When attempting to run vaderSentiment after installing it in my venv with the suggested pip install vaderSentiment, I got the following error:

Traceback (most recent call last):
  File "rate.py", line 1, in <module>
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
  File "/Users/martin/Documents/codingnomads/nlpython/big_projects/rate-mds/env/lib/python3.6/site-packages/vaderSentiment/vaderSentiment.py", line 17, in <module>
    import requests
ModuleNotFoundError: No module named 'requests'

Seems that requests is an implicit(?) dependency.
I looked for a requirements.txt, but I'm not sure that's how it works with packages on PyPI.

My pip freeze suggests that a requirements.txt for vaderSentiment could look like this:

certifi==2018.4.16
chardet==3.0.4
idna==2.7
requests==2.19.1
urllib3==1.23

If this is helpful, I can create one and make a PR? But maybe there are other reasons for this not to be included that I am not aware of.

Let me know if this is something I can help with - would gladly do so :)

Emoticons UTF-8

First of all, I really like your lib. It's powerful.
I used UTF-8 Twitter emojis like ❤️😂😫😊, and all of them have neutral sentiment.
Maybe it is just my issue, but I tried using UTF-8 encoding in my code and it didn't help.
I think it's a good idea to add them to vader_lexicon.txt and support UTF-8 emojis.

Nowadays, they are more popular in social media than standard emoticons (':)', ':(', ':D', etc.)
Please check this website:
http://unicode.org/emoji/charts/full-emoji-list.html
I think it might increase VADER's accuracy significantly.

Do you plan to rewrite this code to Java to make it more popular?
I can help you with that.

incorrect sentiment due to "!"

I tried the following examples:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

analyser.polarity_scores("This is so bad")
{'compound': -0.6696, 'neg': 0.6, 'neu': 0.4, 'pos': 0.0} -- Correct sentiment

But when I add four or more exclamation marks ("!!!!!"), the sentence comes out as neutral.

analyser.polarity_scores("This is so bad!!!!!")
{'compound': 0.0, 'neg': 0.0, 'neu': 1.0, 'pos': 0.0}

Adding multiple exclamation marks has created problems in this case. I tested up to 6 exclamation marks, and the breaking point seems to be 4. The sentiment works well up to 3 exclamation marks in the sentence (at least for this particular example).

Can someone help me with this?

ImportError: can't import name sentiment

There is a closed issue that was very similar to the issue I'm having, but the fix there doesn't seem to be working here. To be sure, I've uninstalled and reinstalled the software, upgraded pip, and checked the file permissions of the files.

WHAT'S HAPPENING:

  1. Installed vaderSentiment with an updated version of pip using:
    python -m pip install vaderSentiment
  2. In my .py file I import like so:
    from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
  3. But, when I run I get:
    Traceback (most recent call last):
    File "file.py", line 6, in
    from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
    ImportError: cannot import name sentiment

WHAT ELSE I'VE TRIED:

  1. importing with the following:
    from vaderSentiment import sentiment as vaderSentiment
    Yet I still get the same error...

Any help would be greatly appreciated.

ImportError: cannot import name sentiment

I installed vaderSentiment with pip and have ensured it is in the correct file, un- and re-installed it, attempted to upgrade pip, attempted to change the permissions for the files and am still having difficulty using this library. Error below:

Traceback (most recent call last):
File "search_twitter.py", line 1, in
from vaderSentiment import sentiment as vaderSentiment
ImportError: cannot import name sentiment

Any help as soon as possible would be greatly appreciated as my project is due on Tuesday. Thank you very much,
Jon

Confirming Threshold for Compound.

The README file states that for a neutral comment, VADER needs the compound score to be above -0.05 and below 0.05, but the JS port mentions the thresholds as 0.5 and -0.5. Can someone clarify which one it is?

syntax error

successful import
platform: Windows 7 (x64)
Python version: 3.5.1

Traceback (most recent call last):
File "C:\Users\user\Desktop\sentiment\sentiment2.py", line 2, in
from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
File "", line 969, in _find_and_load
File "", line 954, in _find_and_load_unlocked
File "", line 896, in _find_spec
File "", line 1136, in find_spec
File "", line 1112, in _get_spec
File "", line 1093, in _legacy_get_spec
File "", line 444, in spec_from_loader
File "", line 530, in spec_from_file_location
File "C:\Python\Python35\lib\site-packages\vadersentiment-0.5-py3.5.egg\vaderSentiment\vaderSentiment.py", line 23
return dict(map(lambda (w, m): (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
^
SyntaxError: invalid syntax

subjectivity scores

Most sentiment analysers have subjectivity scores, e.g., TextBlob.
Is there a way to incorporate this into VADER sentiment?
What about training new data?

Emoticon decoding

Hello together,

does somebody know if there is a simple way to turn utf8 smileys to the ones that the vader_sentiment_lexicon uses?

When i scrape instagram data, python displays the emoteicons the same way as the instgram website, but when i export those posts to csv, i need to decode them.

Thanks for help
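One simple approach (a sketch with a made-up, non-exhaustive mapping; newer vaderSentiment releases also bundle their own emoji handling) is to replace Unicode emoji with the ASCII emoticons the lexicon knows before scoring:

```python
# Illustrative mapping from Unicode emoji to lexicon-style ASCII emoticons;
# extend it with whatever emoji appear in your scraped data.
EMOJI_TO_EMOTICON = {
    "\U0001F642": ":)",  # slightly smiling face
    "\U0001F61E": ":(",  # disappointed face
    "\U0001F600": ":D",  # grinning face
}

def replace_emoji(text):
    # Pad with spaces so the emoticon is tokenized as its own word.
    for emoji, emoticon in EMOJI_TO_EMOTICON.items():
        text = text.replace(emoji, " " + emoticon + " ")
    return " ".join(text.split())

print(replace_emoji("love this \U0001F642"))  # love this :)
```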

Weightage given to smileys (when negated)

So, VADER gives 1.0 positive for " :) " and 1.0 negative for " :( ", and from that I know the smileys are being detected correctly. However, it fails to identify the polarity correctly in this particular case:

sentence = "nothing for redheads :("
polarity got: {'neg': 0.0, 'neu': 0.555, 'pos': 0.445, 'compound': 0.3412}

It is surprising that this sentence is tipping towards the positive polarity while the negative remains at 0.0.
Now if I remove the smiley and find the polarity, this is what I get:

sentence = "nothing for redheads"
polarity got: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

And this result is absolutely correct: it is a neutral statement. So why is a negative lexicon entry tipping the sentence toward a positive outcome? I wanted to know if I can adjust the weight of smileys to reduce such errors. Since VADER is capable of handling many tricky sentences, this should not have been an issue, right? Or is it just an outlier?

Add words of different language to vaderSentiment

I am trying to add multiple Hindi words to vaderSentiment using this:

    analyzer = SentimentIntensityAnalyzer()
    new_words = {
        'ग़लतापना': -2.0,
        'एकता_का_अभाव': 3.4,
    }
    analyzer.lexicon.update(new_words)

But it does not correctly score the new words.

Unable to add own words to vader_lexicon.txt

I want to add my own words to vader_lexicon.txt, but it throws an error:

    (word, measure) = line.strip().split('\t')[0:2]
    ValueError: not enough values to unpack (expected 2, got 1)

I modified the make_lex_dict method by adding an if statement:

    def make_lex_dict(self):
        lex_dict = {}
        for line in self.lexicon_full_filepath.split('\n'):
            if "," in line:
                (word, measure) = line.strip().split('\t')[0:2]
                lex_dict[word] = float(measure)
        return lex_dict

This resolved the error, but the new scores are not reflected when running the program: it still gives a neutral score, even after I assign values to the words.

AttributeError: 'list' object has no attribute 'encode'

Running on Python 2.7
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    vs = analyzer.polarity_scores(sentence)
  File "/Users/markjrobby/Desktop/feedback/venv/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.py", line 218, in polarity_scores
    sentitext = SentiText(text)
  File "/Users/markjrobby/Desktop/feedback/venv/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.py", line 150, in __init__
    text = str(text.encode('utf-8'))
AttributeError: 'list' object has no attribute 'encode'

Fix that solves the error:
text = str(text).encode('utf-8')

vader redirects all print output to console in jupyter notebook

I just spent a few hours debugging this and finally narrowed it down to the cause: after importing VADER using

"from vaderSentiment.vaderSentiment import sentiment as vaderSentiment",

any print function no longer prints to the IPython notebook; the print output is redirected to the terminal the IPython notebook was launched from.

Using IPython Notebook 4.1.1 and Python 2.7.11.

Vader gives wrong result for some sentences

Hello Everyone,

So far VADER has helped me a lot in polarizing the data I am analyzing. I got all the analysis right and really like its sentiment intensity scoring.

I am getting just one problem, with sentences like:

"Without a doubt excellent idea",
"neg": 0.422, "neu": 0.281, "pos": 0.297

"No payment and returning problems"
"neg": 0.62, "neu": 0.38, "pos": 0

It gives the wrong results.

Add support for Facebook Emojis

Hey there!
I was using vaderSentiment to analyze Facebook comments on my ads, and I realized that comments that used Facebook emojis were parsed as neutral: {'pos': 0.0, 'neg': 0.0, 'compound': 0.0, 'neu': 1.0}
Example: Plus de 50 euros de frais de réservation, je trouve ça très exagéré !!😡 ("More than 50 euros in booking fees, I find that very excessive!!😡")
However, if I removed the Facebook emoji from a sentence that used those emojis, I would get actual scores, for instance: {'pos': 0.0, 'neu': 0.834, 'compound': -0.2465, 'neg': 0.166}

Do you know why Facebook Emojis make the whole sentence neutral? Do you plan to add support for these emojis?

Thanks!

UnicodeDecodeError when calling SentimentIntensityAnalyzer

Hi all

I've just been trying to learn how to use the SentimentIntensityAnalyzer() and I've come up with the problem where:

analyzer = SentimentIntensityAnalyzer()
 ---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-31-6c626c4ef428> in <module>()
----> 1 analyzer = SentimentIntensityAnalyzer()
      2 analyzer.polarity_score(line_first)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in __init__(self, lexicon_file)
    200     def __init__(self, lexicon_file="sentiment/vader_lexicon.zip/vader_lexicon/vader_lexicon.txt"):
    201         self.lexicon_file = nltk.data.load(lexicon_file)
--> 202         self.lexicon = self.make_lex_dict()
    203 
    204     def make_lex_dict(self):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/sentiment/vader.pyc in make_lex_dict(self)
    208         lex_dict = {}
    209         for line in self.lexicon_file.split('\n'):
--> 210             (word, measure) = line.strip().split('\t')[0:2]
    211             lex_dict[word] = float(measure)
    212         return lex_dict

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
    697 
    698         """ Return the next decoded line from the input stream."""
--> 699         return self.reader.next()
    700 
    701     def __iter__(self):

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in next(self)
    628 
    629         """ Return the next decoded line from the input stream."""
--> 630         line = self.readline()
    631         if line:
    632             return line

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in readline(self, size, keepends)
    543         # If size is given, we call read() only once
    544         while True:
--> 545             data = self.read(readsize, firstline=True)
    546             if data:
    547                 # If we're at a "\r" read one extra character (which might

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.pyc in read(self, size, chars, firstline)
    490             data = self.bytebuffer + newdata
    491             try:
--> 492                 newchars, decodedbytes = self.decode(data, self.errors)
    493             except UnicodeDecodeError, exc:
    494                 if firstline:

UnicodeDecodeError: 'utf8' codec can't decode byte 0xde in position 0: invalid continuation byte

I've read the thread with a similar issue; however, I don't quite understand where to add the 'u' to make the string Unicode. All I did was: analyzer = SentimentIntensityAnalyzer()

Can someone help me?

In Python 3.5

Traceback (most recent call last):
  File "/Users/pablofernandez/Desktop/StockMarket/stocktwits.py", line 37, in <module>
    from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
  File "/Users/pablofernandez/anaconda/lib/python3.5/site-packages/vaderSentiment/vaderSentiment.py", line 20, in <module>
    reload(sys)
NameError: name 'reload' is not defined

Unable to add data to vader_lexicon.txt

Hi,

I tried to add a missing word to the lexicon file, but without success.
When I test the script with a word that already exists, it works,
but when I add a word of my own, it doesn't.
I kept the tab spacing, and I also tried editing the .py file with the line:
for line in self.lexicon_full_filepath.rstrip('\n').split('\n'):
Thanks

where vader ports to other languages should live

Hi cj, I sent an email recently, but I'm not sure if you monitor the email set for your GitHub account. I noticed there are a few ports of this project to other languages, such as https://github.com/abusby/php-vadersentiment and https://github.com/apanimesh061/VaderSentimentJava. After recently implementing VADER in JavaScript, I was wondering what your thoughts are (or those of anyone else with an interest in VADER) on a VADER org on GitHub to organize the different implementations and experiments, or whether they should just continue living under the account of whoever implemented them.

Using own lexicon

Hello,

Is there a way we can use our own lexicon with VADER sentiment? If so, how can one achieve this?

We are focusing on a sports-based lexicon.
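A sketch of one way to do this (the tab-separated format below matches what make_lex_dict in vaderSentiment.py parses; the file name and sample valences are invented for illustration):

```python
import os
import tempfile

# Write a small custom lexicon: one term per line, token and valence
# separated by a TAB; valences run roughly from -4 to +4.
sports_terms = {"hat-trick": 3.0, "own-goal": -2.5, "offside": -0.8}

path = os.path.join(tempfile.mkdtemp(), "sports_lexicon.txt")
with open(path, "w", encoding="utf-8") as f:
    for word, valence in sports_terms.items():
        f.write("%s\t%.1f\n" % (word, valence))

# In a recent release the file path can then (assumption) be handed to the
# analyzer:  analyzer = SentimentIntensityAnalyzer(lexicon_file=path)

# Sanity-check the format by parsing it back the way make_lex_dict does:
lex = {}
with open(path, encoding="utf-8") as f:
    for line in f:
        (word, measure) = line.strip().split('\t')[0:2]
        lex[word] = float(measure)
print(lex)
```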

How to solve this problem?

D:\python\Lib\site-packages\vaderSentiment>python vaderSentiment.py
Traceback (most recent call last):
  File "vaderSentiment.py", line 14, in <module>
    import math, re, string, requests, json
ImportError: No module named requests

Use of "’" as apostrophe in negators

Use of "’" (the right single quotation mark) as an apostrophe in negators like "won’t" or "don’t" is common, and these variations should be added to the negators list in the code.
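Until that happens, a one-line pre-processing workaround (a caller-side assumption, not library behavior) is to normalize U+2019 to an ASCII apostrophe before scoring:

```python
def normalize_apostrophes(text):
    # Map the right single quotation mark (U+2019) to a plain apostrophe
    # so "don\u2019t" matches the "don't" entry in the negators list.
    return text.replace("\u2019", "'")

print(normalize_apostrophes("I won\u2019t and I don\u2019t"))  # I won't and I don't
```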

Negation case-sensitivity

Hello,

Today I randomly tried to evaluate the following sentence: 'Not bad at all'. To my surprise, the compound score was negative, so I tried to find out why.

After a quick read of the source code, I understood that VADER only handles lowercase negation.
Is there any solution to this?

A first instinct is to .lower() the whole text, but that would make things worse: I'd lose some key features of VADER, like GOOD vs. good.

I'm thinking (as a last resort) of splitting the words, checking whether NEGATE contains their lowercase form, and rewriting the sentence with the negations in lowercase, but I'm hoping for a better solution.

Thank you.
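The "last resort" idea above can be sketched as follows (NEGATORS here is a small illustrative subset of VADER's NEGATE list, not the full list):

```python
NEGATORS = {"not", "no", "never", "neither", "nor", "without", "don't", "isn't"}

def lowercase_negators(text):
    # Lowercase only the negator tokens, preserving emphasis (e.g. ALL CAPS)
    # everywhere else so VADER's word-shape cues still work.
    return " ".join(
        tok.lower() if tok.lower() in NEGATORS else tok
        for tok in text.split()
    )

print(lowercase_negators("Not bad at all"))  # not bad at all
```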

Error message..

When I try the sample code, I get the following error message. How should I fix it? Python 3.5.0 on Mac.

Traceback (most recent call last):
  File "vader_sentiment.py", line 3, in <module>
    from vaderSentiment.vaderSentiment import sentiment as vaderSentiment
  File "/Users/sungmoon/.pyenv/versions/3.5.0a4/lib/python3.5/site-packages/vaderSentiment/vaderSentiment.py", line 23
    return dict(map(lambda (w, m): (w, float(m)), [wmsr.strip().split('\t')[0:2] for wmsr in open(f) ]))
                           ^
SyntaxError: invalid syntax

Issue reading lexicon as UTF-8 does not seem to be reflected when downloading the package via pip

Please let me know if I am doing something wrong. I tried installing the package (on Ubuntu 16.04) by entering both:

pip install vaderSentiment

pip install --upgrade vaderSentiment

However, there is still an issue with the lexicon not being read explicitly as UTF-8. I ended up finding the change in vaderSentiment.py here in the repo and applying it to the source code on my machine manually. Is there a reason the repo here is more up to date than what I get when I install via pip? Please let me know whether this is my misunderstanding or a legitimate issue.

Cleaning the sentences

Hello all,

I am new to sentiment analysis and vader.
I have the following tweets:

tweet1=["@ComfortablySmug: "I'm just saying millennials dont count as people & should be jailed"\n"Wrong the Constitution says I do"\n"3rd amendmen…"]

tweet2= ["❦※ Girls Black dress Best Prices https://t.co/d9G5IzPX9E"]

When I want to calculate the score for the above tweets, do I need to clean the sentences in any way, or can I use them as-is?

Can I also keep symbols like ?, !, ;, &, and $, or should they be substituted with anything?

Thanks in advance,
Vishnu
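A light-touch cleaner like the sketch below (illustrative, not official guidance) is usually enough: strip URLs and @handles, which carry no sentiment, but keep punctuation like "!" and word shape such as ALL CAPS, since VADER uses both as intensity cues.

```python
import re

def clean_tweet(text):
    text = re.sub(r"https?://\S+", "", text)  # drop URLs
    text = re.sub(r"@\w+", "", text)          # drop @mentions
    return " ".join(text.split())             # collapse leftover whitespace

print(clean_tweet("Girls Black dress Best Prices https://t.co/d9G5IzPX9E"))
```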

Wrong score for sentence like this...

Hi,

I am currently working on a project which needs to analyze emails.
I have found a case that may help enrich this library:

Pls  note  we still  have no sight   fro your inv and pls kindly make your positive support to the customer

The score for this is {'pos': 0.428, 'neu': 0.495, 'neg': 0.078, 'compound': 0.836}.
Semantically, this should be a negative statement requesting more positive support. Is it possible to get this right with vaderSentiment?

Thanks,
Peter

newline and dropdown aren't working after pulling vadersentiment

I followed the last issue posted here and recompiled vaderSentiment after disabling line 20, and after that I'm able to print out the object; however, some additional problems persist.

1. I am using Anaconda Spyder as my editor, and the dropdown that appears after pressing dot (.) no longer appears.

2. Also, in the command prompt, after entering a Python command and pressing Enter, it no longer goes to a new line and executes; instead it behaves as if I had pressed Shift+Enter.

Is there any workaround for these?

Python 3

The following change to the lambda syntax in vaderSentiment.py was required to run the tool with Python 3.5:

    def make_lex_dict(f):
        return dict(map(lambda wm: (wm[0], float(wm[1])),
                        [wmsr.strip().split('\t')[0:2] for wmsr in open(f)]))

Not predicting sentiment of emoticons correctly

It is giving inconsistent results on emoticons. For instance, when I pass '🙂' as an argument, it correctly predicts the outcome, but when the same emoticon is used multiple times ('🙂🙂'),
it gives neutral results. Similarly, the same issue arises in different cases with other emoji, and sometimes it does not even detect a single emoji.

Source code for Vader Gold Lexicon creation

I read the paper on which VADER is based, and I wanted to know whether the code that converts the various datasets into the gold-standard lexicon list is openly available. I would like to modify the existing datasets that VADER uses to suit one of my use cases. @cjhutto, any input on this?

TypeError: 'encoding' is an invalid keyword argument for this function

I'm getting this error while running SentimentIntensityAnalyzer():

Error log:
venv/local/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.py", line 212, in __init__
    with open(lexicon_full_filepath, encoding='utf-8') as f:
TypeError: 'encoding' is an invalid keyword argument for this function
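The likely cause is that Python 2's built-in open() has no encoding keyword. A portable sketch of a fix is io.open(), which accepts encoding on both Python 2 and 3 (demonstrated here on a throwaway file):

```python
import io
import os
import tempfile

# io.open accepts encoding='utf-8' on Python 2 and 3 alike, so replacing
# the failing open(...) call with io.open(...) is one possible fix.
path = os.path.join(tempfile.mkdtemp(), "mini_lexicon.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"good\t1.9\n")

with io.open(path, encoding="utf-8") as f:
    first_line = f.read().strip()
print(first_line)  # the line we wrote, minus the newline
```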

Hashtags and Excessive Punctuation Fail

scary has a negative compound score, but #scary has a compound score of 0. This doesn't seem like the right behavior for a tool geared towards analysis of texts from social media.

awesome! has a positive compound score, awesome!! is even more positive, awesome!!! is still more positive, and then suddenly awesome!!!! has a compound score of 0. Again, this doesn't seem appropriate.
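Until hashtags are handled natively, one hypothetical workaround (caller-side, not library behavior) is to strip the leading '#' so the tag's word form reaches the lexicon:

```python
import re

def expand_hashtags(text):
    # "#scary" -> "scary"; multi-word tags like "#notscary" remain one
    # token, which the lexicon may still miss.
    return re.sub(r"#(\w+)", r"\1", text)

print(expand_hashtags("that was #scary!!"))  # that was scary!!
```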

vader_sentiment_lexicon.txt encoding

There are two characters in vader_sentiment_lexicon.txt that are problematic when opening the file on Ubuntu: the thorn symbols currently on lines 127 and 159. If the file is opened in Notepad++ on Windows, the detected encoding is ANSI, and switching it to UTF-8 breaks these two symbols. Meanwhile, on Ubuntu the default file encoding in open(f) is presumably UTF-8, so open(f) fails on the invalid characters. If these two characters are replaced with their correct UTF-8 values after switching the file to UTF-8 encoding, the problem is solved. (The file works fine as-is on Windows.)

Thank you for the tool.

Modification of the corpus

Hi guys,

I am curious whether it is possible to modify the corpus. I wanted to ask before I go spelunking around the source code with no clear direction.

Thanks!

Braden.
