Natural-Language-Processing

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with the interaction between humans and computers through natural language. It enables a computer to analyze human language, derive meaning from it, and model how human communication takes place. NLP works with the hierarchical structure of language: characters, words, sentences, and texts. It supports real-world applications such as Text Summarization, Named Entity Recognition, Text Translation, Relationship Between Words, Sentiment Analysis, Topic Segmentation, and Speech Recognition.

Platform

Jupyter Notebook is an open-source, web-based environment for creating documents that combine code, output, and text. You can run your Python code in this environment. Install Jupyter Notebook with the following command: pip install notebook

Tools used in Natural Language Processing

You can use open source NLP libraries listed below:

• NLTK : The Natural Language Toolkit (NLTK) is a collection of programs and libraries for statistical NLP for the English language.

• spaCy : spaCy is an open-source library for NLP that provides advanced features. spaCy supports 16 different languages.

• iNLTK : iNLTK is a library built specifically for Indian languages. It aims to provide out-of-the-box support for various NLP tasks.

Contents : Natural Language Processing Hands-On

Part 1 : Introduction to NLTK

•	1 Download NLTK
•	2 Import Brown Corpus and Access Data
•	3 Import Inaugural Corpus and Access Data
•	4 Import Webtext Corpus and Access Data
•	5 Frequency Distribution of Words in a Text
•	6 Conditional Frequency Distribution of Words in a Text
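
The last two topics in this list can be sketched without any corpus download. The example below is a minimal, library-free illustration that uses collections.Counter as a stand-in for the frequency distribution classes in NLTK; the sample sentence and the choice of word length as the condition are assumptions made for illustration, not taken from the notebooks.

```python
from collections import Counter, defaultdict

# Hypothetical sample text standing in for a corpus such as Brown or Inaugural.
text = "the cat sat on the mat and the dog sat on the log"
words = text.split()

# Frequency distribution: how often each word occurs in the text.
freq = Counter(words)
print(freq.most_common(3))

# Conditional frequency distribution: one counter per condition.
# The condition used here (an assumption) is the length of the word.
cond_freq = defaultdict(Counter)
for word in words:
    cond_freq[len(word)][word] += 1

print(dict(cond_freq[2]))
print(dict(cond_freq[3]))
```

With an NLTK corpus, the list of words would come from the corpus reader instead of str.split(), and the condition would typically be a category or file identifier.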

Practice

•	1 Import Inaugural Corpus and Access Data
•	2 Read Content of the Text File
•	3 Read Words of the Text File
•	4 Frequency Distribution of Words
•	5 Conditional Frequency Distribution
•	6 Conditional Frequency Distribution for 4-, 5- and 6-Letter Words
•	7 Words in Ascending Order
•	8 Words in Descending Order

Additional

•	1  Various Predefined Text Access
•	2  Import Reuters Corpus and Accessing Data
•	3  Read the Content of Text
•	4  Words That Occur Together
•	5  Find A Specific Word
•	6  Words in Specific Fileid
•	7  Total Sentences and Words in Text
•	8  Frequency of Words Matches With List
•	9  Sentence startswith() Specified Word
•	10 Sort the Words
•	11 Reverse Sentence
•	12 Frequency of Each Word
•	13 Length of Longest Sentence
•	14 Length of Smallest Sentence
•	15 Part of speech

Part 2 : Stemming of Words

•	1  PorterStemmer
•	2  SnowballStemmer
•	3  Lemmatizer
•	4  RegexpStemmer
•	5  LancasterStemmer
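
To show the idea behind the simplest stemmer in this list, here is a minimal sketch of regular-expression suffix stripping, similar in spirit to a regexp-based stemmer. The function name regexp_stem, the suffix pattern, and the minimum length are hypothetical choices for illustration; the notebooks use the NLTK stemmer classes instead.

```python
import re

def regexp_stem(word, pattern=r"(ing|ed|s)$", min_length=4):
    """Strip a suffix matching `pattern` from words of at least `min_length` characters."""
    if len(word) < min_length:
        # Short words are returned unchanged to avoid over-stemming.
        return word
    return re.sub(pattern, "", word)

for w in ["running", "jumped", "cats", "is"]:
    print(w, "->", regexp_stem(w))
```

Stems produced this way need not be dictionary words ("running" becomes "runn"), which is the main difference from a lemmatizer.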

Part 3 : WordNet, CMU Pronouncing Dictionary and Stopwords

•	1 WordNet
•	2 CMU Pronouncing Dictionary
•	3 StopWords

Part 4 : Text Classification using Naive Bayes Classifier

•	1  Import and Access names corpus
•	2  Import random Library
•	3  Create Feature set
•	4  Split the Data into Training and Testing Set
•	5  Apply Naive Bayes Classifier
•	6  Classify names using classifier
•	7  Accuracy of Test set
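
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook's implementation: the toy name list, the last-letter feature, the male/female labels, and the add-one smoothing are all hypothetical choices, and the shuffle and train/test split are replaced by a fixed held-out list so the result is reproducible.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the notebook would draw labelled names from the names corpus.
train = [
    ("Anna", "female"), ("Maria", "female"), ("Emma", "female"), ("Julia", "female"),
    ("John", "male"), ("Peter", "male"), ("Mark", "male"), ("Joshua", "male"),
]
held_out = [("Olivia", "female"), ("Brian", "male")]

def features(name):
    # Assumed feature: the last letter of the name.
    return name[-1].lower()

# Count labels and, per label, how often each feature value occurs.
label_counts = Counter(label for _, label in train)
feature_counts = defaultdict(Counter)
for name, label in train:
    feature_counts[label][features(name)] += 1
feature_values = {features(name) for name, _ in train}

def classify(name):
    """Return the label with the highest log posterior under Naive Bayes."""
    f = features(name)
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        prior = n / len(train)
        # Add-one smoothing (an assumption) keeps unseen letters from zeroing the score.
        likelihood = (feature_counts[label][f] + 1) / (n + len(feature_values))
        score = math.log(prior) + math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

accuracy = sum(classify(n) == label for n, label in held_out) / len(held_out)
print(classify("Olivia"), classify("Brian"), accuracy)
```

With a single feature, the classifier reduces to comparing smoothed per-label letter frequencies; additional features would contribute further log-likelihood terms to the score.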

Part 5 : Vectorisers & Cosine Similarity

•	1  Import CountVectorizer
•	2  Define Corpus
•	3  Create Vocabulary
•	4  Transform into Vector
•	5  Cosine Similarity
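
The five steps above can be sketched without scikit-learn to show what each one computes. The two-sentence corpus and the helper names to_vector and cosine_similarity are assumptions for illustration, and tokens are split on whitespace, which is simpler than the tokenization CountVectorizer applies.

```python
import math
from collections import Counter

# Step 2: define a (hypothetical) corpus.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
tokenized = [doc.split() for doc in corpus]

# Step 3: the vocabulary is the sorted set of distinct tokens.
vocabulary = sorted({word for doc in tokenized for word in doc})

def to_vector(tokens):
    """Step 4: count how often each vocabulary word occurs in the document."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Step 5: dot product of the vectors divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = [to_vector(tokens) for tokens in tokenized]
similarity = cosine_similarity(vectors[0], vectors[1])
print(vocabulary)
print(vectors)
print(similarity)
```

A similarity of 1 means the two count vectors point in the same direction, and 0 means the documents share no vocabulary.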

Part 6 : Tasks for Marathi Language

•	1  Import and Access Marathi Language
•	2  Words From Specified File
•	3  Print Content of the File
•	4  Sentence startswith() Specified Word
•	5  Tokenization
•	6  Read Words
•	7  Total Tokens
•	8  Frequency of All Words
•	9  Frequency of Most Common Words
•	10 Part-of-Speech tagging
•	11 Stemmer
		o	1 RegexpStemmer
•	12 Word Embedding
•	13 Whether a Character Is a Vowel or Consonant
•	14 Cosine Similarity

Part 7 : Text Pipeline Processing

•	1 Import and Access Corpus
•	2 Corpus - product_reviews_2, Access fileids
•	3 Print part of text
•	4 Tokenization
		o	1 Sentence Tokenizer
		o	2 Word Tokenizer
		o	3 Total tokens in selected text
		o	4 All Tokens in Sorted Order
		o	5 Frequency Distribution in the Text
•	5 Stemmer
		o	1 PorterStemmer
		o	2 SnowballStemmer
		o	3 Lemmatizer
•	6 Part of Speech Tagging

Part 8 : Functionality using NLP Tool: spaCy

•	1 Import and Load Model for English Language
•	2 Preprocessing Step : Tokenization and stopwords
•	3 Part-of-Speech tags
•	4 Dependency Parsing
•	5 Named Entity Recognition
•	6 Conclusion
•	7 References

Part 9 : Web Scraping for News Article

•	1 Extracting text from url for news article
•	2 Preprocessing and cleaning the text
•	3 Tokenization
•	4 POS Tagging
•	5 Named Entity Recognition

Part 10 : Word Embedding and Chunking

•	1 One-hot encoding (CountVectorizing)
•	2 TF-IDF transforming
•	3 Chunking
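
As a rough illustration of the TF-IDF step, the sketch below uses the textbook formula: term frequency within a document multiplied by the logarithm of the total number of documents divided by the number of documents containing the term. The three tiny documents and the helper name tf_idf are assumptions for illustration. scikit-learn applies a smoothed variant with normalization by default, so its numbers would differ from these.

```python
import math
from collections import Counter

# Hypothetical tokenized documents.
docs = [
    ["good", "movie"],
    ["bad", "movie"],
    ["good", "good", "plot"],
]

n_docs = len(docs)
# Document frequency: the number of documents that contain each term.
doc_freq = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    """Return a term -> weight mapping using tf * log(N / df)."""
    term_counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
        for term, count in term_counts.items()
    }

weights = [tf_idf(doc) for doc in docs]
for doc, w in zip(docs, weights):
    print(doc, {term: round(value, 3) for term, value in w.items()})
```

Terms that appear in fewer documents, such as "plot" and "bad" here, receive a higher weight than terms shared across documents.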

Part 11 : Sentiment Analysis using Logistic Regression

•	1 Loading the dataset	
•	2 Transforming Documents into Feature Vectors
•	3 Term Frequency & Inverse Document Frequency
•	4 Document Classification using Logistic Regression
•	5 Model Evaluation

Part 12 : Bigram Model

•	1 Tokenization
•	2 Remove Stopwords
•	3 bigram_collocation
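
The three steps above can be sketched in a few lines of plain Python. The sample sentence and the tiny stopword set are hypothetical stand-ins; the notebook would use a tokenizer, a full stopword list, and collocation tools from an NLP library instead.

```python
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"the", "a", "of", "is", "in", "and"}

text = "New York is a city and New York is in the state of New York"

# Steps 1 and 2: tokenize on whitespace, lowercase, and drop stopwords.
tokens = [t for t in text.lower().split() if t not in STOPWORDS]

# Step 3: a bigram is a pair of adjacent tokens.
bigrams = list(zip(tokens, tokens[1:]))
bigram_counts = Counter(bigrams)

print(tokens)
print(bigram_counts.most_common(1))
```

Because stopwords are removed before pairing, words that were separated only by stopwords become adjacent. This follows the order of the outline above, but it changes which pairs count as bigrams.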

cmungun / natural-language-processing