Natural-Language-Processing-Projects

These are the projects I completed during my NLP course.

Project-1

File Name: Token word count.py

The aim of this project is to count the word tokens and word types in each genre (category) of the Brown corpus. The vocabulary size of the whole corpus is also calculated. This project contains the following segments:

removing special characters and converting to lowercase
counting tokens
removing stopwords
applying lemmatization
applying stemming
counting word types
main
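The cleaning and counting segments can be sketched roughly as follows. This is a minimal illustration, not the project's code: the stopword set is a small hypothetical subset, the regex-based cleaning is an assumption, lemmatization and stemming are left out, and two short toy strings stand in for Brown corpus categories. A token is every remaining word occurrence; a type is a distinct word.

```python
import re

# Hypothetical stopword subset; the project presumably uses a fuller list.
STOPWORDS = {"the", "a", "an", "and", "is", "of", "to", "in"}

def clean(text):
    """Lowercase, replace every non-alphanumeric character with a space, then split."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()

def content_words(text):
    """Cleaned tokens with stopwords removed (lemmatization/stemming omitted here)."""
    return [w for w in clean(text) if w not in STOPWORDS]

# Toy stand-in for two Brown corpus categories (hypothetical text).
corpus = {
    "news": "The jury said the election was conducted. The jury praised it!",
    "humor": "A joke is a joke, and a pun is a pun.",
}

vocabulary = set()
for category, text in corpus.items():
    words = content_words(text)
    vocabulary.update(words)  # vocabulary = union of types across categories
    print(f"{category}: {len(words)} tokens, {len(set(words))} types")
print(f"vocabulary size: {len(vocabulary)}")
```

With the toy text this prints 8 tokens and 7 types for `news`, 4 tokens and 2 types for `humor`, and a vocabulary size of 9.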

Project-2

File Name: Sent gen Bi Tri gram.py

The goal of this project is to generate random sentences using bi-gram and tri-gram models. This project contains the following segments:

removing special characters and converting to lowercase
removing stopwords
tokenizing the entire Brown corpus, adding tokens that mark the start and end of each sentence
making a uni-gram table
making a bi-gram table
making a tri-gram table
generating and printing a random sentence using the bi-gram model
generating and printing a random sentence using the tri-gram model
main
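The n-gram table and generation steps can be sketched as below, assuming sentences are already cleaned and tokenized. It is not the project's code: the names `pad`, `build_table` and `generate` are hypothetical, `<s>` and `</s>` are assumed start/end markers, and a tiny toy corpus stands in for the Brown corpus. Each table maps an (n-1)-word history to counts of the word that follows it, and generation samples the next word in proportion to those counts until `</s>` is drawn.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # assumed sentence-boundary tokens

def pad(sentence, n):
    """Add n-1 start tokens and one end token around a tokenized sentence."""
    return [START] * (n - 1) + sentence + [END]

def build_table(sentences, n):
    """Map each (n-1)-word history tuple to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sent in sentences:
        tokens = pad(sent, n)
        for i in range(len(tokens) - n + 1):
            history = tuple(tokens[i:i + n - 1])
            table[history][tokens[i + n - 1]] += 1
    return table

def generate(table, n, rng, max_len=20):
    """Sample words in proportion to n-gram counts until </s> or max_len words."""
    history = (START,) * (n - 1)
    words = []
    while len(words) < max_len:
        followers = table[history]
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        if nxt == END:
            break
        words.append(nxt)
        history = (history + (nxt,))[1:]  # slide the window; stays () when n == 1
    return words

# Toy stand-in for the tokenized Brown corpus (hypothetical data).
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]
bi = build_table(sentences, 2)
tri = build_table(sentences, 3)
rng = random.Random(0)
print(" ".join(generate(bi, 2, rng)))
print(" ".join(generate(tri, 3, rng)))
```

Passing `n=1` to `build_table` yields the uni-gram table (a single empty history mapped to overall word counts), so the same two functions cover all three tables.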

Project-3

File Name: Text classification LogRef Multi Layer.py

This project aims to classify the text of the Brown corpus using logistic regression and a multi-layer neural network. This project contains the following segments:

removing special characters and converting to lowercase
removing stopwords
applying lemmatization
generating the feature matrix and labels
main
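The feature-matrix step and a logistic regression classifier can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the project's implementation: features are assumed to be raw bag-of-words counts, the classifier is binary (two toy categories) and trained with batch gradient descent on the log loss, the helper names are hypothetical, and the multi-layer neural network is not shown.

```python
import math
from collections import Counter

def vectorize(doc, vocab):
    """Count vector for one whitespace-tokenized document; unknown words are ignored."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

def build_features(docs, labels):
    """Bag-of-words matrix X (one row per document), label list y, and sorted vocabulary."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    X = [vectorize(doc, vocab) for doc in docs]
    return X, list(labels), vocab

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=200):
    """Binary logistic regression fitted with batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0

# Toy, already-preprocessed documents (hypothetical): 1 = news, 0 = humor.
docs = ["election jury vote", "jury court verdict", "joke laugh pun", "pun laugh comedy"]
labels = [1, 1, 0, 0]
X, y, vocab = build_features(docs, labels)
w, b = train_logreg(X, y)
print(predict(w, b, vectorize("jury vote", vocab)))
```

Extending this to all Brown categories would need a multi-class model (for example softmax or one-vs-rest), which this sketch does not attempt.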

Project-4

File Name: Text classification Naive Bayes.py

This project aims to classify text using a Naive Bayes classifier on the Brown corpus. This project contains the following segments:

removing special characters and converting to lowercase
removing stopwords
applying lemmatization
generating the feature matrix and labels
main
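A from-scratch multinomial Naive Bayes sketch with Laplace (add-alpha) smoothing is shown below to illustrate how the classifier scores a document. The function names, the smoothing value `alpha=1.0`, and the toy documents are assumptions rather than the project's code. Scores are sums of log probabilities to avoid floating-point underflow on long documents.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes: log class priors and add-alpha smoothed log likelihoods."""
    vocab = {w for doc in docs for w in doc.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik

def classify_nb(doc, log_prior, log_lik):
    """Return the class with the highest log posterior; out-of-vocabulary words are skipped."""
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc.split() if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Toy, already-preprocessed documents (hypothetical data).
docs = ["jury vote election", "jury court verdict", "joke laugh pun", "pun laugh comedy"]
labels = ["news", "news", "humor", "humor"]
log_prior, log_lik = train_nb(docs, labels)
print(classify_nb("jury verdict", log_prior, log_lik))  # news
```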

Project-5

File Name: Cosine similarity.py

This project aims to measure cosine similarity within and between clusters. This project contains the following segments:

fetching the similarity between two sentences from the similarity matrix (using tf-idf and word2vec)
calculating the similarity between two documents (using tf-idf and word2vec)
main

sara-zinnat / natural-language-processing-projects