tripadviser_reviews_analysis

Analysis of TripAdvisor reviews: training a BERT model for rating prediction, followed by topic modeling.

1. Dataset Explanation:

a. The TripAdvisor dataset has 20491 reviews with ratings from 1 to 5. My goal is to analyse this data, train a model for rating prediction, and then do topic modeling to find the important topics that show where the hotel is doing well or badly.

b. A quick EDA shows that a small number of reviews are very long (outliers). I removed around 262 such reviews. The ratings are also skewed: ratings 1 and 2 have fewer reviews than ratings 4 and 5.

c. ratings distribution histogram

ratings_dist

d. text length distribution

text length
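
The two EDA steps above (dropping unusually long reviews and checking the rating skew) can be sketched as follows. This is a minimal illustration, not the notebook's code. The quantile-based cutoff, the function names, and the toy reviews are all assumptions, since the README does not state the length threshold that was used.

```python
from collections import Counter


def word_count(text):
    """Number of whitespace-separated tokens in a review."""
    return len(text.split())


def drop_long_reviews(reviews, quantile=0.99):
    """Keep reviews whose word count is at or below a length quantile.

    `reviews` is a list of (text, rating) pairs. The 0.99 default is an
    assumed value for illustration, not the threshold from the notebook.
    """
    lengths = sorted(word_count(text) for text, _ in reviews)
    # Simple rank-based cutoff, clamped to the last valid index.
    cutoff = lengths[min(len(lengths) - 1, int(quantile * len(lengths)))]
    return [(text, rating) for text, rating in reviews if word_count(text) <= cutoff]


def rating_distribution(reviews):
    """Count reviews per rating from 1 to 5, including ratings with no reviews."""
    counts = Counter(rating for _, rating in reviews)
    return {rating: counts.get(rating, 0) for rating in range(1, 6)}


# Toy data (hypothetical): four short reviews and one very long outlier.
reviews = [
    ("great hotel " * 3, 5),
    ("nice stay", 4),
    ("ok", 3),
    ("bad room", 1),
    ("word " * 500, 5),
]
kept = drop_long_reviews(reviews, quantile=0.75)
distribution = rating_distribution(kept)
```

On the real data, plotting `rating_distribution` as a bar chart would give the kind of histogram shown in item c.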

2. Rating Prediction and Evaluation:

a. I trained a model for multi-class classification using Hugging Face Transformers (bert-base-uncased).
b. The model achieved an F1 score of 0.65 on the test data.
c. Checking the wrong predictions, I found that some failure cases are hard even for a human to predict. Most of the wrong predictions are off by 1: for example, a review with an actual rating of 5 is predicted as 4.
d. I found about 10 examples in the test data where the model fails badly, with the prediction off by 3 or more: for example, the true label is 5 and the prediction is 1 or 2.
e. Improvement steps are listed in the evaluation notebook.
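
The error analysis described above can be sketched as below: score the predictions, count how far each wrong prediction is from the true rating, and list the badly wrong cases. This is an illustrative sketch, not the evaluation notebook's code. The README does not say which F1 averaging was used, so macro averaging is an assumption here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=(1, 2, 3, 4, 5)):
    """Unweighted mean of per-class F1 (macro averaging is an assumption)."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def error_gaps(y_true, y_pred):
    """Histogram of |true - predicted| over the wrong predictions only."""
    return Counter(abs(t - p) for t, p in zip(y_true, y_pred) if t != p)


def severe_errors(y_true, y_pred, min_gap=3):
    """Indices of examples where the prediction is off by at least `min_gap`."""
    return [
        i for i, (t, p) in enumerate(zip(y_true, y_pred)) if abs(t - p) >= min_gap
    ]


# Toy labels (hypothetical), not results from the actual model.
y_true = [5, 5, 4, 1, 3, 5, 2]
y_pred = [4, 5, 4, 1, 3, 1, 3]

score = macro_f1(y_true, y_pred)
gaps = error_gaps(y_true, y_pred)
bad_cases = severe_errors(y_true, y_pred)
```

Because ratings are ordinal, the gap histogram is more informative than accuracy alone: a model whose errors are mostly off by 1 is far more usable than one with the same F1 whose errors are spread across the whole scale.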

3. Topic Modeling:

a. I tried LDA topic modeling on the entire review dataset to find the important topics.
b. The standard preprocessing steps are described in the topic modelling notebook.
c. Topic modeling on the entire set of reviews did not give much insight; the topics were hard to interpret (see the section **Result Analysis on Entire data** of the topic modelling notebook).
d. Here are some results from the topic modelling on the **entire data**:

topic_all_data

e. Here are some results from the topic modelling on the **high rated data**:

topic_neg

f. Here are some results from the topic modelling on the **low rated data**:

topic_pos
