ml_project_cis419's Introduction

CIS 4190 Applied Machine Learning Project: Group 045

For this project, we were interested in studying sentiment prediction in NLP. Sentiment analysis is an important tool for organizations and businesses, as they seek to understand large amounts of text data.

We were primarily interested in how sentiment toward food is reflected in review data. We used the Amazon Fine Foods Dataset, taking the review text as the raw input to our model and predicting whether each review was positive or negative overall.
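As a rough sketch of the labeling step described above: the Kaggle Reviews.csv file has a Score column (1 to 5 stars) and a Text column. The thresholds used here (4 or 5 stars positive, 1 or 2 stars negative, 3 stars dropped) and the function names are assumptions for illustration, not necessarily what the notebooks do.

```python
import csv


def label_from_score(score):
    """Map a 1-5 star rating to a binary sentiment label (assumed rule)."""
    if score >= 4:
        return 1      # positive
    if score <= 2:
        return 0      # negative
    return None       # 3 stars: treated as neutral and skipped


def load_labeled_reviews(csv_lines):
    """Yield (text, label) pairs from Reviews.csv lines or an open file."""
    for row in csv.DictReader(csv_lines):
        label = label_from_score(int(row["Score"]))
        if label is not None:
            yield row["Text"], label


# Usage (assumes Reviews.csv is in the working directory):
# with open("Reviews.csv", newline="", encoding="utf-8") as f:
#     pairs = list(load_labeled_reviews(f))
```

Passing an open file rather than a path keeps the function easy to try on a few in-memory rows before running it on the full dataset.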

A complete description of the project can be found here.

Instructions on how to run each file:

lstm.ipynb

The following files should be available in the same directory:

  • Reviews.csv: This file can be downloaded from the Kaggle link above.
  • glove.840B.300d.txt: This file can be downloaded here. It provides pretrained GloVe word vectors trained on Common Crawl data, a large-scale crawl of the web.
  • movie_train.tsv: Needed only for the dataset shift portion of the code.
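For reference, here is a minimal sketch of reading the GloVe text format, where each line holds a token followed by its vector components separated by spaces. The function name `load_glove` and the optional `vocab` filter are hypothetical and not taken from lstm.ipynb; NumPy is assumed to be installed.

```python
import numpy as np


def load_glove(lines, vocab=None, dim=300):
    """Parse GloVe text lines into a {word: vector} dict.

    lines: iterable of strings, e.g. an open file.
    vocab: optional set of words to keep; None keeps every word.
    dim:   vector length (300 for glove.840B.300d.txt).
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # Take the last `dim` fields as the vector: a few tokens in the
        # 840B file contain spaces themselves.
        word = " ".join(parts[:-dim])
        if vocab is not None and word not in vocab:
            continue
        vectors[word] = np.asarray(parts[-dim:], dtype=np.float32)
    return vectors


# Usage (assumes the file is in the working directory):
# with open("glove.840B.300d.txt", encoding="utf-8") as f:
#     embeddings = load_glove(f, vocab={"good", "bad"})
```

Filtering by vocabulary while reading keeps memory use manageable, since the full file holds vectors for a very large number of tokens.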

The training for this notebook was done on an AWS EC2 instance. Apart from training, the other code cells should run in a few minutes on most laptops.

The best performance achieved on the validation set (which contained an equal number of samples from each class) was close to 90%.

Some examples of sentences and their classification:

image

Snapshot of the EC2 Training:

image

xgboost.ipynb

The following file should be available in the same directory:

The training for this notebook was done both locally and on an AWS SageMaker instance. The remaining cells should run in under five minutes on most laptops, and the comments should list the best hyperparameters we found (saving the time a grid search would take).

The best performance achieved on a balanced testing set (which contained equal samples from each class) was about 84%.

Example of sentiment analysis on a short user-generated sentence, using our best-performing XGBoost model:

image

Snapshot of boosting rounds:

image

bert.ipynb

The following file should be available in the same directory:

  • Reviews.csv: This file can be downloaded from the Kaggle link above.

The training for this notebook was done on an AWS EC2 instance. Apart from training, the cells that preprocess the data into BERT tokens should take no more than 10 minutes.

The model was trained on a balanced data set sampled from an equal number of positive and negative reviews. The hyperparameters need to be further adjusted to provide better accuracy.
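One way to build such a balanced set is to downsample the larger class, sketched below. This is an illustration under assumptions: the helper name `balanced_sample`, the fixed seed, and the 0/1 label encoding are hypothetical, and the notebooks may balance the data differently.

```python
import random


def balanced_sample(pairs, seed=0):
    """Downsample (text, label) pairs so labels 0 and 1 appear equally often."""
    by_label = {0: [], 1: []}
    for text, label in pairs:
        by_label[label].append((text, label))
    # Keep as many examples per class as the smaller class provides.
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)
    sample = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(sample)  # avoid all negatives preceding all positives
    return sample
```

With equal class counts, a classifier that always predicts one class scores 50% accuracy, which makes the reported accuracies easier to interpret.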

Example of training loop loss and accuracy:

image

Contributors: sevdari, mycow2, helenazzzzz
