These are the projects I completed during my NLP course.
File Name: Token word count.py
The aim of this project is to count the word tokens and word types in each genre/category of the Brown corpus. The vocabulary size of the whole corpus is also calculated. This project contains the following segments:
- removing special characters and converting to lower case
- counting tokens
- removing stopwords
- applying lemmatization
- applying stemming
- counting word types
- main
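
The counting steps above can be sketched as follows. This is a minimal illustration on a toy string using only the standard library, not the script itself: the names `normalize`, `count_tokens_and_types`, and the tiny `STOPWORDS` set are assumptions made for the example, and lemmatization and stemming are left out.

```python
import re

# Hypothetical toy stop-word list, for illustration only.
STOPWORDS = {"the", "a", "of", "and", "on"}

def normalize(text):
    """Lower-case the text and replace every character that is not a
    lower-case letter, digit, or whitespace with a space."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())

def count_tokens_and_types(text, remove_stopwords=False):
    """Return (number of word tokens, number of distinct word types)."""
    tokens = normalize(text).split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return len(tokens), len(set(tokens))

sample = "The cat sat on the mat. The dog sat, too!"
print(count_tokens_and_types(sample))                         # (10, 7)
print(count_tokens_and_types(sample, remove_stopwords=True))  # (6, 5)
```

Tokens are counted with repetition, while types are the distinct tokens, so the vocabulary size of a corpus is the number of types over all of its categories combined.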
File Name: Sent gen Bi Tri gram.py
The goal of this project is to generate random sentences using bi-gram and tri-gram models. This project contains the following segments:
- removing special characters and converting to lower case
- removing stopwords
- tokenizing the entire Brown corpus, adding start and end tokens to each sentence
- building a uni-gram table
- building a bi-gram table
- building a tri-gram table
- generating and printing a random sentence using bi-grams
- generating and printing a random sentence using tri-grams
- main
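
The bi-gram steps above could look roughly like this. It is a sketch under stated assumptions: the three-sentence corpus, the `<s>` and `</s>` marker strings, and the function names are all hypothetical, and the table stores successor lists rather than explicit probabilities.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"  # assumed sentence-boundary tokens

def build_bigram_table(sentences):
    """Map each token to the list of tokens observed right after it.
    Repeated successors stay in the list, so a uniform random choice
    from the list samples in proportion to the bi-gram counts."""
    table = defaultdict(list)
    for sentence in sentences:
        tokens = [START] + sentence.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev].append(nxt)
    return table

def generate_sentence(table, rng, max_len=20):
    """Walk the table from START until END is drawn or max_len is reached."""
    word, words = START, []
    for _ in range(max_len):
        word = rng.choice(table[word])
        if word == END:
            break
        words.append(word)
    return " ".join(words)

corpus = ["the cat sat", "the dog sat", "a cat ran"]  # toy stand-in corpus
table = build_bigram_table(corpus)
sentence = generate_sentence(table, random.Random(0))
print(sentence)
```

A tri-gram version follows the same pattern, except that the table is keyed on the previous two tokens instead of one.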
File Name: Text classification LogRef Multi Layer.py
This project aims to classify text from the Brown corpus using logistic regression and a multi-layer neural network. This project contains the following segments:
- removing special characters and converting to lower case
- removing stopwords
- applying lemmatization
- generating the feature matrix and labels
- main
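
As an illustration of the feature-matrix step, here is a minimal bag-of-words sketch. It assumes each document is already a cleaned token list paired with its category label. The `build_features` name and the toy documents are hypothetical, and the actual script may build its features differently.

```python
from collections import Counter

def build_features(documents):
    """documents: list of (tokens, label) pairs.
    Returns the sorted vocabulary, a count matrix X with one row per
    document and one column per vocabulary word, and the label list y."""
    vocab = sorted({tok for tokens, _ in documents for tok in tokens})
    index = {tok: i for i, tok in enumerate(vocab)}
    X, y = [], []
    for tokens, label in documents:
        row = [0] * len(vocab)
        for tok, count in Counter(tokens).items():
            row[index[tok]] = count
        X.append(row)
        y.append(label)
    return vocab, X, y

docs = [
    (["stock", "market", "rise"], "news"),
    (["love", "story", "love"], "romance"),
]
vocab, X, y = build_features(docs)
print(vocab)  # ['love', 'market', 'rise', 'stock', 'story']
print(X)      # [[0, 1, 1, 1, 0], [2, 0, 0, 0, 1]]
print(y)      # ['news', 'romance']
```

The rows of `X` and the entries of `y` are what a logistic regression or multi-layer network would then be trained on.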
File Name: Text classification Naive Bayes.py
This project aims to classify text from the Brown corpus using a Naive Bayes classifier. This project contains the following segments:
- removing special characters and converting to lower case
- removing stopwords
- applying lemmatization
- generating the feature matrix and labels
- main
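
To show the idea behind the classifier, here is a small from-scratch multinomial Naive Bayes with add-one smoothing. It is only a sketch: the training documents, the function names, and the choice to ignore words unseen in training are assumptions, and the script itself may rely on a library implementation instead.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents: list of (tokens, label). Collect per-class document
    counts, per-class word counts, and the overall vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in documents:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok not in vocab:
                continue  # assumption: skip words never seen in training
            # add-one (Laplace) smoothed log likelihood
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train = [
    (["stock", "market", "rise"], "news"),
    (["market", "fall"], "news"),
    (["love", "story"], "romance"),
    (["love", "letter"], "romance"),
]
model = train_naive_bayes(train)
print(predict(model, ["market", "stock"]))  # news
print(predict(model, ["love"]))             # romance
```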
File Name: Cosine similarity.py
This program measures cosine similarity within and between clusters. This project contains the following segments:
- fetching the similarity between two sentences from the similarity matrix (using TF-IDF and word2vec)
- calculating the similarity between two documents (using TF-IDF and word2vec)
- main
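
The TF-IDF side of these steps can be sketched as below. This is a standard-library illustration, not the script's code: the toy documents, the function names, and the smoothed IDF formula are assumptions, and the word2vec variant is not shown.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """documents: list of token lists. Returns one sparse {token: weight}
    dict per document, using a smoothed IDF (an assumed variant)."""
    n_docs = len(documents)
    doc_freq = Counter(tok for tokens in documents for tok in set(tokens))
    vectors = []
    for tokens in documents:
        tf = Counter(tokens)
        vectors.append({
            tok: (count / len(tokens))
                 * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[tok] for tok, weight in u.items() if tok in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = [["cat", "sat", "mat"], ["cat", "sat", "mat"], ["dog", "ran", "park"]]
vectors = tfidf_vectors(docs)

# Pairwise similarity matrix; entry [i][j] compares document i with j.
matrix = [[cosine(u, v) for v in vectors] for u in vectors]
print(round(matrix[0][1], 6))  # 1.0 (identical documents)
print(matrix[0][2])            # 0.0 (no shared words)
```

With such a matrix, within-cluster similarity is an average over pairs from the same cluster, and between-cluster similarity is an average over pairs drawn from different clusters.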