These are the assignment and project files for USC-2️⃣2️⃣Fall-CSCI5️⃣4️⃣4️⃣ Applied Natural Language Processing
-
Amazon Review Sentiment Analysis Based on Machine Learning - a five-class classification problem [HW1] This project covers text representations and text classification for sentiment analysis on the Amazon reviews dataset, which contains real reviews of jewelry products sold on Amazon. It is written in Python in a Jupyter Notebook.
- Basic data cleaning and preprocessing (cleaned the raw data, removed the stop words, and performed lemmatization) using NLTK and Beautiful Soup.
- TF-IDF feature extraction with scikit-learn.
- Trained a Perceptron, an SVM, a Logistic Regression, and a Multinomial Naive Bayes model on the training split using scikit-learn's built-in implementations.
- Reported precision, recall, and F1-score per class, along with their averages, on the testing split of the dataset.
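
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy reviews and ratings are made up, `LinearSVC` is assumed as the SVM variant, and all hyperparameters are placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for the cleaned Amazon jewelry reviews (ratings 1-5).
train_texts = [
    "terrible ring broke immediately", "awful necklace fell apart",
    "poor quality clasp disappointed", "bad finish scratched easily",
    "okay bracelet average quality", "fine earrings nothing special",
    "good pendant nice shine", "pretty chain good value",
    "excellent ring love it", "beautiful necklace perfect gift",
]
train_labels = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
test_texts = [
    "awful ring broke", "poor clasp", "average earrings",
    "good shine nice value", "perfect beautiful gift",
]
test_labels = [1, 2, 3, 4, 5]

# The four classifier families named in the list above (LinearSVC is an assumption).
models = {
    "Perceptron": Perceptron(random_state=0),
    "SVM": LinearSVC(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Multinomial Naive Bayes": MultinomialNB(),
}

reports = {}
for name, classifier in models.items():
    # TF-IDF features feed directly into each classifier.
    pipeline = make_pipeline(TfidfVectorizer(), classifier)
    pipeline.fit(train_texts, train_labels)
    predictions = pipeline.predict(test_texts)
    # Per-class precision, recall, and F1-score plus their averages.
    reports[name] = classification_report(test_labels, predictions, zero_division=0)
    print(name)
    print(reports[name])
```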
-
Amazon Review Sentiment Analysis Based on Machine Learning - a five-class classification problem - extension [HW2]
- Word2Vec feature extraction with Gensim.
- Used PyTorch to implement neural network models such as an FNN (feedforward neural network) and an RNN (recurrent neural network), and also considered a GRU (gated recurrent unit) cell.
- Reported accuracy on the testing split of the dataset.
- Used CUDA for GPU acceleration on Google Compute Engine (GCP).
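
As a rough sketch of the recurrent model and the CUDA setup, the snippet below runs one training step of a GRU classifier in PyTorch. The class name `GRUClassifier`, the 300-dimensional embeddings, the hidden size, and the random tensors standing in for Word2Vec-encoded reviews are all assumptions for illustration, not the assignment's actual configuration.

```python
import torch
import torch.nn as nn

# Use the GPU when CUDA is available; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class GRUClassifier(nn.Module):
    """GRU over a sequence of word vectors, then a linear layer over five classes."""

    def __init__(self, embedding_dim=300, hidden_dim=64, num_classes=5):
        super().__init__()
        self.gru = nn.GRU(embedding_dim, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, num_classes)

    def forward(self, sequences):
        # sequences: (batch, seq_len, embedding_dim)
        _, last_hidden = self.gru(sequences)  # last_hidden: (1, batch, hidden_dim)
        return self.output(last_hidden.squeeze(0))


model = GRUClassifier().to(device)

# Random tensors stand in for a batch of 8 reviews, 20 tokens each (hypothetical sizes).
batch = torch.randn(8, 20, 300, device=device)
labels = torch.randint(0, 5, (8,), device=device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step.
optimizer.zero_grad()
logits = model(batch)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

# Accuracy is the fraction of predictions that match the labels.
accuracy = (logits.argmax(dim=1) == labels).float().mean().item()
print(f"device={device}, loss={loss.item():.4f}, accuracy={accuracy:.2f}")
```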
-
Implemented a Hidden Markov Model part-of-speech tagger from scratch for Italian, Japanese, and a surprise language. [HW3]
- The training data are provided tokenized and tagged; the test data are provided tokenized only, and the tagger adds the tags.
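
To make the idea concrete, here is a minimal sketch of an HMM tagger in plain Python. It assumes Viterbi decoding and add-one smoothing, neither of which is stated above, and it trains on a tiny made-up English corpus rather than the Italian or Japanese data; the function names are hypothetical.

```python
import math
from collections import defaultdict


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        previous = "<s>"  # start-of-sentence state
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            previous = tag
    return transitions, emissions


def log_prob(counts, key, num_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + num_outcomes))


def viterbi(words, transitions, emissions):
    """Return the most likely tag sequence for a tokenized sentence."""
    tags = list(emissions)
    # +1 reserves probability mass for words never seen in training.
    vocab_size = len({w for counts in emissions.values() for w in counts}) + 1
    # best[i][tag] = (log probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions["<s>"], tag, len(tags))
                 + log_prob(emissions[tag], words[0], vocab_size))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], vocab_size)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, len(tags)) + emit, prev)
                for prev in tags
            )
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy corpus: tokenized and tagged, like the training data.
training = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions = train_hmm(training)
predicted_tags = viterbi(["a", "cat", "barks"], transitions, emissions)
print(predicted_tags)  # ['DET', 'NOUN', 'VERB']
```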
-
Implemented perceptron classifiers (vanilla and averaged) to identify hotel reviews as either truthful or deceptive, and either positive or negative. [HW4]
- Used the word tokens as features, or any other features that can be devised from the text.
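
The vanilla and averaged variants can be sketched in plain Python as below. This is an illustrative sketch under assumptions: the whitespace tokenizer, the epoch count, the function names, and the four toy reviews are hypothetical, and labels are encoded as +1/-1 for a single binary decision.

```python
from collections import defaultdict


def featurize(text):
    """Bag-of-words counts over lowercase whitespace tokens."""
    features = defaultdict(int)
    for token in text.lower().split():
        features[token] += 1
    return features


def train_perceptron(examples, epochs=10, averaged=False):
    """Train on (text, label) pairs with label in {+1, -1}; return (weights, bias)."""
    weights = defaultdict(float)
    bias = 0.0
    # Running totals weighted by the step counter, used only for averaging.
    cached_weights = defaultdict(float)
    cached_bias = 0.0
    counter = 1
    for _ in range(epochs):
        for text, label in examples:
            features = featurize(text)
            activation = sum(weights.get(f, 0.0) * v for f, v in features.items()) + bias
            if label * activation <= 0:  # misclassified (or on the boundary): update
                for f, v in features.items():
                    weights[f] += label * v
                    cached_weights[f] += label * counter * v
                bias += label
                cached_bias += label * counter
            counter += 1
    if averaged:
        averaged_weights = {f: w - cached_weights[f] / counter for f, w in weights.items()}
        return averaged_weights, bias - cached_bias / counter
    return dict(weights), bias


def predict(text, weights, bias):
    """Return +1 or -1 depending on the sign of the linear score."""
    score = sum(weights.get(f, 0.0) * v for f, v in featurize(text).items()) + bias
    return 1 if score > 0 else -1


# Hypothetical toy reviews: +1 = positive, -1 = negative.
examples = [
    ("great clean room friendly staff", 1),
    ("dirty room rude staff", -1),
    ("great location friendly service", 1),
    ("rude service dirty bathroom", -1),
]

vanilla_weights, vanilla_bias = train_perceptron(examples)
averaged_weights, averaged_bias = train_perceptron(examples, averaged=True)

print(predict("friendly staff great location", vanilla_weights, vanilla_bias))
print(predict("dirty rude bathroom", averaged_weights, averaged_bias))
```

Under this setup, the two labeling tasks (truthful vs. deceptive, positive vs. negative) would be handled by training one such binary classifier per task on the same features.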