azeezsanya / nlp-topic-modeling-and-sentiment-analysis-of-luxury-car--reviews

Using NLP and LDA for Topic Modeling and Sentiment Analysis

License: GNU General Public License v3.0

Jupyter Notebook 53.11% HTML 46.89%
nlp-machine-learning lda topic-modeling sentiment-analysis gensim

Introduction

Natural Language Processing (NLP) is a field of machine learning concerned with teaching computers to understand natural human language. Teaching these languages to a computer is not easy, but NLP techniques make it possible for a computer to read, decipher, and make sense of human language in a way that is useful.

In machine learning and natural language processing, a topic model is a statistical model used for discovering the abstract "topics" that occur in a collection of documents or in a corpus. A topic is a collection of dominant keywords that express the general meaning of the text.

NLP can be used to extract topics from reviews, social media feeds, comments, articles, emails and user feedback. Understanding what customers are saying about a particular product is vital to companies, especially in the e-commerce industry. However, online reviews are often overwhelming in both number and information, so an intelligent system capable of finding key insights (topics) in them is of great help to customers and companies alike. The goal of my project is to help customers decide when buying luxury cars, and to help these car brands understand what their customers are saying so they can improve their products. I got a lot of inspiration from this article and this article by Prateek Joshi.

I also learned from Alice Zhao's project on topic modeling and sentiment analysis.

One of the most effective ways of doing topic modeling is with Gensim's LDA model. In this project we compare the following 2 Gensim LDA implementations to see which one is computationally fastest, produces meaningful topics and has the best coherence score:

  • LDAMulticore
  • LDAMallet

Project objective

Abstract

Sales of luxury cars have grown rapidly over the last decade, especially in North America. However, according to an article on Driving.ca, sales generated by premium brands were taking a hit in 2019. I therefore wanted to use customer reviews to understand which qualities matter most to buyers.

My project uses NLP and Latent Dirichlet Allocation (LDA) for topic modeling and sentiment analysis of the reviews of 5 luxury car brands.

  • Using the customer reviews, I want to apply NLP and LDA to understand which qualities are important to buyers.
  • I will also investigate the sentiment expressed in the reviews.

You can find the code for these two questions in the Python code folder.

Tools and Libraries used

  • NLTK (word tokenizer, stopwords and WordNetLemmatizer)
  • spaCy
  • Seaborn and Matplotlib
  • Gensim libraries for topic modeling
  • TextBlob (sentiment analysis library)
  • Pickle
  • WordCloud
  • Plotly
  • NumPy and Pandas
  • pyLDAvis

Dataset

Source:

The data was downloaded from Kaggle. It was originally scraped from Edmunds, an American auto review website.

Data content:

  • 5 luxury car brands (Audi, Mercedes, BMW, Infiniti and Lexus)
  • 41,520 rows
  • 7 columns
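
Since the data comes as one dataset per brand, the tables have to be combined before analysis. The sketch below shows one way to do that with pandas, tagging each row with its brand and dropping rows that have no review text. The `merge_brand_frames` helper, the `review` and `rating` column names and the toy frames are hypothetical; the real files would be loaded with `pd.read_csv` and have their own column names.

```python
import pandas as pd


def merge_brand_frames(frames_by_brand, review_col="review"):
    """Stack per-brand frames into one and drop rows without review text."""
    # Add a `brand` column so the origin of each row survives the merge.
    tagged = [df.assign(brand=brand) for brand, df in frames_by_brand.items()]
    merged = pd.concat(tagged, ignore_index=True)
    # Rows with a missing review carry nothing to model.
    return merged.dropna(subset=[review_col]).reset_index(drop=True)


# Toy stand-ins for two of the five brand files (made-up rows).
audi = pd.DataFrame({"review": ["Smooth ride", None], "rating": [5, 3]})
lexus = pd.DataFrame({"review": ["Quiet cabin"], "rating": [4]})

reviews = merge_brand_frames({"Audi": audi, "Lexus": lexus})
print(reviews.shape)
print(reviews.isnull().sum())
```

`reviews.info()` and `reviews.dtypes` give the remaining checks on data types and null counts before pre-processing starts.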

Project Process

  • Data collection: I got the datasets from this link. You can find the merged dataset under CSV FILES on my GitHub page
    • Load and merge the 5 datasets into one dataframe
    • Check the data info, types, shape and null values
  • Pre-processing
    • Remove null values, i.e. rows with no reviews
    • Drop some unwanted columns
    • Do some feature engineering
    • Change the data type of some of the columns, such as the date column
    • Remove stop-words with NLTK
    • Remove numbers from the text with a regular expression
    • Lowercase the text and remove words shorter than 3 letters
    • Reduce words to their base form via lemmatization with spaCy
  • EDA
    • Do some visualization, e.g. a word cloud, to understand the common words in the reviews
  • LDA Model fitting for topic modeling
    • Create a dictionary and a corpus from the review text (both are needed for the LDA models)
    • Try out the 2 LDA models (LDAMulticore and LDAMallet) to see which one has the highest coherence score and produces meaningful topics from the text
  • Sentiment Analysis
    • Do some feature engineering to get the sentiment in the reviews
    • Use TextBlob to derive the polarity and subjectivity of the reviews
  • Visualization
    • Plot the topics with pyLDAvis
    • Use both Plotly and Tableau to visualize the sentiment results
  • Communicate insights
    • Conclusion
    • Future work to improve the project
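
The text-cleaning part of the pre-processing step can be sketched with the standard library alone. This is a simplified illustration, not the notebook code: the stop-word set is a tiny stand-in for NLTK's English list, the `clean_review` helper is hypothetical, lemmatization with spaCy is left out, and the text is lowercased first so that stop-word matching is case-insensitive.

```python
import re

# Tiny stand-in stop-word list (assumption); the project uses NLTK's list.
STOP_WORDS = {"the", "and", "was", "this", "with", "for", "has", "have", "are"}


def clean_review(text, stop_words=STOP_WORDS, min_len=3):
    """Lowercase, strip numbers, drop short words and stop-words."""
    # Lowercase, then replace every run of digits with a space.
    text = re.sub(r"\d+", " ", text.lower())
    # Keep alphabetic tokens only, which also discards punctuation.
    tokens = re.findall(r"[a-z]+", text)
    # Drop words shorter than `min_len` letters and any stop-words.
    return [t for t in tokens if len(t) >= min_len and t not in stop_words]


print(clean_review("The 2015 Lexus IS has 2 great seats and a smooth ride!"))
```

The token lists produced this way are the input that the dictionary and corpus for the LDA models are built from.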

nlp-topic-modeling-and-sentiment-analysis-of-luxury-car--reviews's Issues

Not able to find "brand_with_part_of_year.pkl"

In your CAPSTONE(Sentiment analysis of review).ipynb, this cell

"sentiment_df = pd.read_pickle('brand_with_part_of_year.pkl')
sentiment_df.head()"

reads the data from brand_with_part_of_year.pkl, but this file is not available. Could you please help me find it?

Thank you
