peterxiaoguo / sentiment-analysis-nlp

This project forked from snehabangar/sentiment-analysis-nlp

Sentiment analysis of Yelp restaurant reviews to identify the tone of user reviews for a restaurant as positive, negative, or neutral.

sentiment-analysis-nlp's Introduction

Scope

This is a project for the Natural Language Processing course taught in the graduate program of the Computer Science department at the University of Texas at Dallas during the fall 2016 semester.

Goal

The objective of this project is to apply various sentiment analysis (NLP) techniques to restaurant reviews and assess whether they can correctly identify the tone of each review as positive, negative, or neutral.

Data

Yelp has publicly released a sample of its data (including over 2.7 million reviews) as part of its Dataset Challenge. This project uses the data sets provided in the Yelp Dataset Challenge because they are quick and easy to acquire. The data is available online from the Yelp dataset page [1].

The dataset includes:

  • 2.7M reviews and 649K tips by 687K users for 86K businesses
  • 566K business attributes, e.g., hours, parking availability, ambience.
  • Social network of 687K users for a total of 4.2M social edges.
  • Aggregated check-ins over time for each of the 86K businesses
  • 200,000 pictures from the included businesses

The data for the project was taken from the following two files:

  • yelp_academic_dataset_review.json
  • yelp_academic_dataset_business.json

Tools and Technologies used

Python, NLTK

Proposed Solution

Defining Sentiment

For the purposes of this project, we define sentiment to be "a personal positive or negative feeling." Here are some examples:

Sentiment   Review
Positive    The food here is very good.
Neutral     The ambience is okay and the food was usual.
Negative    I am never coming to this restaurant again. The food was tasteless.

High Level Steps

The high-level sequence involved in processing is as follows:

  1. Raw data collection from Yelp Dataset

  2. Sentiment labeling

  3. Transform into train/test sets for classifier

  4. Bag of Words

  5. Transform train/test sets for final classification by classifier

  6. Adjust classifier and repeat until best model
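Steps 2 to 4 above can be sketched in a few lines of standard-library Python. This is only an illustration, not the code in this repository: the star-rating thresholds in `label_from_stars`, the helper names, and the star values attached to the sample reviews are all assumptions made for the example.

```python
import random
import re
from collections import Counter

def label_from_stars(stars):
    """Map a star rating to a sentiment label (thresholds are an assumption)."""
    if stars >= 4:
        return "pos"
    if stars <= 2:
        return "neg"
    return "neutral"

def bag_of_words(text):
    """Lowercase the text, keep runs of letters/apostrophes, count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Made-up (stars, text) pairs; the star values are illustrative only.
reviews = [
    (5, "The food here is very good."),
    (3, "The ambience is okay and the food was usual."),
    (1, "I am never coming to this restaurant again. The food was tasteless."),
    (4, "Good service and good food."),
]

# Step 2 (labeling) and step 4 (bag of words), then step 3 (split).
labeled = [(bag_of_words(text), label_from_stars(stars)) for stars, text in reviews]
train, test = train_test_split(labeled)
```

Each item in `train` and `test` is a `(word_counts, label)` pair that a classifier can consume. A fixed `seed` keeps the split reproducible while the classifier is being tuned in step 6.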

Dependencies

Make sure you have the following libraries installed before running the code.

You must also have the NLTK stopwords corpus installed. Run the following in a Python console to open the NLTK downloader (calling nltk.download('stopwords') instead fetches only that corpus):

import nltk

nltk.download()

Extracting reviews

This step must be done before running any of the classifiers below.

Run the DataCreator script (data_Creator.py) to extract the reviews for businesses in the restaurant category and generate samples for each review class (pos/neg/neutral). The script creates three JSON files, one per class, plus a file that contains all the restaurant IDs and names.

python data_Creator.py

(Make sure the input data files are in the yelp_dataset_challenge_academic_dataset folder. The generated sample files are written to the same folder.)

Input data files:

yelp_academic_dataset_business.json

yelp_academic_dataset_review.json

ngram_words.txt

YELP_Restaurant_Categories.txt
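As a rough sketch of the extraction step, the snippet below filters reviews down to restaurant businesses. It assumes the Yelp files hold one JSON object per line with `business_id`, `categories`, `stars`, and `text` fields, and it matches on the single category name "Restaurants" for simplicity. The actual script reads its category list from YELP_Restaurant_Categories.txt, so treat the function names and the matching rule here as hypothetical.

```python
import io
import json

def restaurant_ids(business_lines):
    """Collect business_id values whose categories mention 'Restaurants'."""
    ids = set()
    for line in business_lines:
        if not line.strip():
            continue  # skip blank lines
        business = json.loads(line)
        categories = business.get("categories") or []
        if "Restaurants" in categories:
            ids.add(business["business_id"])
    return ids

def restaurant_reviews(review_lines, ids):
    """Yield (business_id, stars, text) for reviews of the given businesses."""
    for line in review_lines:
        if not line.strip():
            continue
        review = json.loads(line)
        if review["business_id"] in ids:
            yield review["business_id"], review["stars"], review["text"]

# Tiny in-memory stand-ins for the two Yelp files (made-up records).
business_file = io.StringIO(
    '{"business_id": "b1", "name": "Cafe A", "categories": ["Restaurants", "Cafes"]}\n'
    '{"business_id": "b2", "name": "Salon B", "categories": ["Beauty & Spas"]}\n'
)
review_file = io.StringIO(
    '{"business_id": "b1", "stars": 5, "text": "The food here is very good."}\n'
    '{"business_id": "b2", "stars": 2, "text": "Long wait."}\n'
)

ids = restaurant_ids(business_file)
rows = list(restaurant_reviews(review_file, ids))
```

With the real data, the `io.StringIO` objects would be replaced by files opened with `open(path, encoding="utf-8")`. Reading line by line keeps memory use low, which matters for a review file with millions of records.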

Naive Bayes

This script trains one classifier per feature extraction filter (single words, stopword removal, stemming, n-grams) and prints the predicted and actual rating for each restaurant along with the overall accuracy.

python main.py
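For readers unfamiliar with the technique, here is a from-scratch sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is not the code in main.py: the `NaiveBayes` class, its method names, and the training sentences are made up for illustration. NLTK also ships its own `NaiveBayesClassifier`.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents):
        # documents: iterable of (list_of_tokens, label) pairs
        self.label_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for tokens, label in documents:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        """Return {label: log prior + sum of log word likelihoods}."""
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.label_counts.items():
            score = math.log(doc_count / total_docs)  # log prior
            label_total = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (label_total + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)

# Made-up training sentences, already tokenized by whitespace.
train = [
    ("the food here is very good".split(), "pos"),
    ("good service and tasty food".split(), "pos"),
    ("the food was tasteless".split(), "neg"),
    ("never coming here again".split(), "neg"),
]
model = NaiveBayes().fit(train)
```

Calling `model.predict("very good food".split())` picks the label with the highest log score. Working in log space avoids numeric underflow when a review contains many words.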

Maximum Entropy

This script trains a maximum entropy classifier with the feature extraction filters (single words, stopword removal, stemming, n-grams) and prints the predicted and actual rating for each restaurant along with the overall accuracy.

python max_entropy.py
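A maximum entropy classifier is equivalent to multinomial logistic regression. The sketch below trains one with plain stochastic gradient ascent over binary word-presence features. It is an illustrative stand-in, not the code in max_entropy.py: the `MaxEnt` class, the learning rate, the epoch count, and the training sentences are all assumptions. NLTK provides its own `MaxentClassifier`.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Turn a {label: score} dict into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numeric stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

class MaxEnt:
    """Multinomial logistic regression over binary word-presence features."""

    def __init__(self, learning_rate=0.5, epochs=200):
        self.learning_rate = learning_rate
        self.epochs = epochs

    def probabilities(self, tokens):
        features = set(tokens)
        scores = {
            label: sum(self.weights[label].get(f, 0.0) for f in features)
            for label in self.labels
        }
        return softmax(scores)

    def fit(self, documents):
        # documents: iterable of (list_of_tokens, label) pairs
        documents = [(set(tokens), label) for tokens, label in documents]
        self.labels = sorted({label for _, label in documents})
        self.weights = {label: defaultdict(float) for label in self.labels}
        for _ in range(self.epochs):
            for features, gold in documents:
                probs = self.probabilities(features)
                for label in self.labels:
                    # Log-likelihood gradient: observed minus expected count.
                    error = (1.0 if label == gold else 0.0) - probs[label]
                    for feature in features:
                        self.weights[label][feature] += self.learning_rate * error
        return self

    def predict(self, tokens):
        probs = self.probabilities(tokens)
        return max(probs, key=probs.get)

# Made-up training sentences, one per sentiment class.
train = [
    ("the food here is very good".split(), "pos"),
    ("the ambience is okay".split(), "neutral"),
    ("the food was tasteless".split(), "neg"),
]
model = MaxEnt().fit(train)
```

Unlike Naive Bayes, this model does not assume that words are independent given the label. It learns a weight per (label, word) pair directly, so shared words such as "the" end up with little influence.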

Results

For a detailed analysis and the results, see Project_report.docx.

Contributors

snehabangar
