
Natural-Language-Processing

Natural Language Processing (NLP) is a subfield of Artificial Intelligence that deals with the interaction between humans and computers through everyday natural language. With NLP, computers can understand and analyze human language, derive meaning from it, and model how human communication takes place. NLP works with the hierarchical structure of language: characters, words, sentences, texts, and so on. It powers real-world applications such as Text Summarization, Named Entity Recognition, Text Translation, finding Relationships Between Words, Sentiment Analysis, Topic Segmentation, and Speech Recognition.

Platform

Jupyter Notebook is an open-source, web-based environment for creating documents that combine code and text. You can run your Python scripts in this environment. Install Jupyter Notebook with the following command: pip install notebook

Tools used in Natural Language Processing

You can use open source NLP libraries listed below:

NLTK : NLTK (the Natural Language Toolkit) is a collection of programs and libraries for statistical NLP of the English language.

spaCy : spaCy is an open-source library for NLP that provides advanced features. spaCy's tools support 16 different languages.

iNLTK : iNLTK is a library built specifically to support Indian languages. iNLTK aims to provide out-of-the-box support for various NLP tasks.

Contents : Natural Language Processing Hands-On

Part 1 : Introduction to NLTK

•	1 Download NLTK
•	2 Import Brown Corpus and Access Data
•	3 Import Inaugural Corpus and Access Data
•	4 Import Webtext Corpus and Access Data
•	5 Frequency Distribution of Words in a Text
•	6 Conditional Frequency Distribution of Words in a Text
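NLTK's `FreqDist` and `ConditionalFreqDist` are, at heart, word counters (`FreqDist` subclasses Python's `collections.Counter`). The sketch below reproduces both ideas with the standard library only, using a made-up two-category toy corpus in place of the Brown or Inaugural corpora, so it runs without downloading any NLTK data.

```python
from collections import Counter, defaultdict

# Toy corpus with two categories (an assumption standing in for NLTK corpus categories).
corpus = {
    "news": "the market rose today and the market fell yesterday",
    "fiction": "the dragon slept and the knight waited",
}

# Frequency distribution: how often each word occurs across all texts.
all_words = [w for text in corpus.values() for w in text.split()]
freq = Counter(all_words)

# Conditional frequency distribution: one Counter per condition (here, the category).
cfd = defaultdict(Counter)
for category, text in corpus.items():
    for word in text.split():
        cfd[category][word] += 1

print(freq.most_common(3))
print(cfd["news"]["market"], cfd["fiction"]["market"])
```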

Practice

•	1 Import Inaugural Corpus and Access Data
•	2 Read Content of the Text File
•	3 Read Words of the Text File
•	4 Frequency Distribution of Words
•	5 Conditional Frequency Distribution
•	6 Conditional Frequency Distribution for 4-, 5- and 6-Letter Words
•	7 Words in Ascending Order
•	8 Words in Descending Order
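To illustrate the length-conditioned distribution and the two sorting steps, here is a standard-library sketch. The sample sentence is an assumption standing in for the Inaugural corpus text, and the word length plays the role of the condition.

```python
from collections import Counter, defaultdict

# Toy text; in the notebook the words would come from a corpus file.
text = "we hold these truths to be self evident that all people are created equal"
words = text.split()

# Condition = word length; only 4-, 5- and 6-letter words are kept.
by_length = defaultdict(Counter)
for word in words:
    if len(word) in (4, 5, 6):
        by_length[len(word)][word] += 1

# Distinct words in ascending and descending alphabetical order.
ascending = sorted(set(words))
descending = sorted(set(words), reverse=True)

for length in sorted(by_length):
    print(length, sorted(by_length[length]))
print(ascending[:3], descending[:3])
```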

Additional

•	1  Various Predefined Text Access
•	2  Import Reuters Corpus and Accessing Data
•	3  Read the Content of Text
•	4  Words That Occur Together
•	5  Find A Specific Word
•	6  Words in Specific Fileid
•	7  Total Sentences and Words in Text
•	8  Frequency of Words Matching a List
•	9  Sentences Starting with a Specified Word (startswith())
•	10 Sort the Words
•	11 Reverse Sentence
•	12 Frequency of Each Word
•	13 Length of Longest Sentence
•	14 Length of Shortest Sentence
•	15 Part of Speech
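Several of these additional tasks reduce to plain list operations on tokenized sentences. A minimal sketch, assuming a hand-written list of tokenized sentences in place of a corpus's `sents()` output (the sentences themselves are invented):

```python
# Toy tokenized sentences (stand-ins for the output of a corpus reader's sents()).
sents = [
    ["The", "bank", "raised", "rates"],
    ["Oil", "prices", "fell"],
    ["The", "bank", "cut", "jobs", "last", "week"],
]

# Sentences that start with a specified word; the trailing space avoids matching "There".
starts_with_the = [s for s in sents if " ".join(s).startswith("The ")]

# Reverse the word order of a sentence.
reversed_first = list(reversed(sents[0]))

# Length (in words) of the longest and shortest sentence.
longest = max(len(s) for s in sents)
shortest = min(len(s) for s in sents)

# Frequency of words that match a given list (case-insensitive).
targets = ["bank", "oil"]
lowered = [w.lower() for s in sents for w in s]
target_counts = {t: lowered.count(t) for t in targets}

print(len(starts_with_the), reversed_first, longest, shortest, target_counts)
```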

Part 2 : Stemming of Words

•	1  PorterStemmer
•	2  SnowballStemmer
•	3  Lemmatizer
•	4  RegexpStemmer
•	5  LancasterStemmer
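NLTK's `RegexpStemmer` removes suffixes that match a regular expression and can leave words shorter than a minimum length untouched. The hypothetical `regexp_stem` helper below mimics that behaviour with Python's `re` module; the suffix pattern is only an example. The output shows how crude this can be ("running" becomes "runn"), which is why the Porter, Snowball and Lancaster stemmers rely on more elaborate rule sets.

```python
import re

# Example suffix pattern (an assumption): strip "ing", "s", "e" or "able" at the end of a word.
SUFFIXES = re.compile(r"ing$|s$|e$|able$")

def regexp_stem(word: str, min_len: int = 4) -> str:
    """Hypothetical helper: leave short words alone, otherwise strip a matching suffix."""
    if len(word) < min_len:
        return word
    return SUFFIXES.sub("", word)

for w in ["running", "cars", "was", "comfortable"]:
    print(w, "->", regexp_stem(w))
```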

Part 3 : Wordnet, CMU Pronouncing Dictionary and Stopwords

•	1 WordNet
•	2 CMU Pronouncing Dictionary
•	3 Stopwords
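Stopword removal is a simple filter over tokens. The sketch below uses a tiny hand-picked stopword set as an assumption; with NLTK, the full English list comes from `nltk.corpus.stopwords.words("english")` after the `stopwords` data has been downloaded.

```python
# Tiny illustrative stopword set (an assumption, far shorter than NLTK's English list).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def remove_stopwords(text: str) -> list[str]:
    # Lower-case, split on whitespace, then drop every token found in STOPWORDS.
    return [w for w in text.lower().split() if w not in STOPWORDS]

print(remove_stopwords("The cat is in the garden and the dog is asleep"))
```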

Part 4 : Text Classification using Naive Bayes Classifier

•	1  Import and Access names corpus
•	2  Import random Library
•	3  Create Feature set
•	4  Split the Data into Training and Testing Set
•	5  Apply Naive Bayes Classifier
•	6  Classify Names Using the Classifier
•	7  Accuracy on the Test Set
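These steps follow the classic NLTK name-gender example, where the feature is a name's last letter. To show what the classifier computes, here is a from-scratch Naive Bayes sketch with add-one (Laplace) smoothing. The twelve names are made-up toy data standing in for the `names` corpus, and `features`, `train` and `classify` are hypothetical helpers, not NLTK's `NaiveBayesClassifier` API.

```python
import math
import random
from collections import Counter, defaultdict

# Toy labelled names (assumption: the notebook uses NLTK's names corpus instead).
labeled = [("Anna", "female"), ("Maria", "female"), ("Julia", "female"),
           ("Emma", "female"), ("Sophie", "female"), ("Grace", "female"),
           ("John", "male"), ("Peter", "male"), ("David", "male"),
           ("Mark", "male"), ("Robert", "male"), ("Tom", "male")]

def features(name: str) -> str:
    # Single feature: the lower-cased last letter of the name.
    return name[-1].lower()

def train(data):
    label_counts = Counter(label for _, label in data)
    feat_counts = defaultdict(Counter)  # label -> Counter of last letters
    for name, label in data:
        feat_counts[label][features(name)] += 1
    return label_counts, feat_counts

def classify(name, label_counts, feat_counts, alphabet_size=26):
    total = sum(label_counts.values())
    f = features(name)
    best, best_score = None, -math.inf
    for label, count in label_counts.items():
        # log P(label) + log P(last letter | label), with add-one smoothing.
        score = math.log(count / total) + math.log(
            (feat_counts[label][f] + 1) / (count + alphabet_size))
        if score > best_score:
            best, best_score = label, score
    return best

# Shuffle, split into training and test sets, train, and measure accuracy.
random.seed(0)
random.shuffle(labeled)
train_set, test_set = labeled[:9], labeled[9:]
model = train(train_set)
accuracy = sum(classify(n, *model) == g for n, g in test_set) / len(test_set)
print(classify("Olivia", *model), accuracy)
```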

Part 5 : Vectorisers & Cosine Similarity

•	1  Import CountVectorizer
•	2  Define Corpus
•	3  Create Vocabulary
•	4  Transform into Vector
•	5  Cosine Similarity
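A count vectorizer builds a vocabulary and turns each document into a vector of term counts; cosine similarity then compares two vectors by the angle between them, cos(u, v) = u·v / (|u| |v|). The pure-Python sketch below follows those definitions on an invented three-document corpus; scikit-learn's `CountVectorizer` and `cosine_similarity` do the same job with more options (for example, its own token pattern).

```python
import math

corpus = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]

# Vocabulary: sorted set of all whitespace-separated tokens.
vocab = sorted({w for doc in corpus for w in doc.split()})

def to_vector(doc: str) -> list[int]:
    # Count of each vocabulary word in the document.
    tokens = doc.split()
    return [tokens.count(w) for w in vocab]

def cosine(u, v) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vectors = [to_vector(d) for d in corpus]
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Documents 0 and 1 share "the", "sat" and "on", so their similarity is high, while document 2 shares no token with document 0 and scores 0.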

Part 6 : Tasks for Marathi Language

•	1  Import and Access Marathi Language
•	2  Words from a Specified File
•	3  Print Content of the File
•	4  Sentences Starting with a Specified Word (startswith())
•	5  Tokenization
•	6  Read Words
•	7  Total Tokens
•	8  Frequency of All Words
•	9  Frequency of Most Common Words
•	10 Part-of-Speech Tagging
•	11 Stemmer
		o	1 RegexpStemmer
•	12 Word Embedding
•	13 Whether a Character Is a Vowel or Consonant
•	14 Cosine Similarity
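For the vowel-or-consonant task, one simple approach is to test the character against the Devanagari script used for Marathi. This sketch assumes that only independent vowel letters count as vowels (dependent vowel signs, or matras, are not handled) and treats the Unicode range U+0915 (क) to U+0939 (ह) as consonants; `vowel_or_consonant` is a hypothetical helper.

```python
# Independent Devanagari vowel letters (assumption: matras and rarer letters are not covered).
VOWELS = set("अआइईउऊऋएऐओऔ")

def vowel_or_consonant(ch: str) -> str:
    """Hypothetical helper classifying a single Devanagari character."""
    if ch in VOWELS:
        return "vowel"
    # Devanagari consonant letters run from U+0915 (क) to U+0939 (ह); ळ (U+0933) is inside.
    if "\u0915" <= ch <= "\u0939":
        return "consonant"
    return "other"

for ch in "अकमईळ":
    print(ch, vowel_or_consonant(ch))
```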

Part 7 : Text Pipeline Processing

•	1 Import and Access Corpus
•	2 Corpus - product_reviews_2, Access fileids
•	3 Print part of text
•	4 Tokenization
		o	1 Sentence Tokenizer
		o	2 Word Tokenizer
		o	3 Total tokens in selected text
		o	4 All Tokens in Sorted Order
		o	5 Frequency Distribution of the Text
•	5 Stemmer
		o	1 Porter Stemmer
		o	2 Snowball Stemmer
		o	3 Lemmatizer
•	6 Part of Speech Tagging
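The tokenization steps of this pipeline can be sketched with regular expressions. This is a deliberately naive stand-in: NLTK's `sent_tokenize` relies on the pre-trained Punkt model, and `word_tokenize` treats punctuation and contractions more carefully. The review text is invented, not taken from `product_reviews_2`.

```python
import re
from collections import Counter

review = "The battery lasts long. The screen is bright! Would I buy it again? Yes."

# Naive sentence tokenizer: split after ".", "!" or "?" when followed by whitespace.
sentences = re.split(r"(?<=[.!?])\s+", review)

# Naive word tokenizer: runs of word characters, lower-cased (punctuation is dropped).
tokens = re.findall(r"\w+", review.lower())

print(len(sentences), len(tokens))   # sentence count and total tokens
print(sorted(set(tokens)))           # all distinct tokens in sorted order
print(Counter(tokens).most_common(1))  # most frequent token
```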

Part 8 : Functionality Using the NLP Tool spaCy

•	1 Import and Load the Model for the English Language
•	2 Preprocessing Step: Tokenization and Stopwords
•	3 Part-of-Speech tags
•	4 Dependency Parsing
•	5 Named Entity Recognition
•	6 Conclusion
•	7 References

Part 9 : Web Scraping for News Articles

•	1 Extracting Text from a News Article URL
•	2 Preprocessing and cleaning the text
•	3 Tokenization
•	4 POS Tagging
•	5 Named Entity Recognition
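Text extraction and cleaning can be sketched with the standard library's `html.parser`. To keep the example offline, an inline HTML string stands in for the page downloaded from a news URL, and the assumption that article text sits inside `<p>` tags is a simplification. The hypothetical `ParagraphText` class and the cleaning regexes are illustrative, not the notebook's code.

```python
import re
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Hypothetical parser that collects text found inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.chunks.append(data)

# Inline HTML (an assumption) standing in for a page fetched from a news URL.
html = ("<html><body><h1>Title</h1><p>Rates rose  3% today.</p>"
        "<p>Markets [1] reacted.</p></body></html>")
parser = ParagraphText()
parser.feed(html)
raw = " ".join(parser.chunks)

# Cleaning: drop bracketed reference markers, keep letters/digits/whitespace, collapse spaces.
clean = re.sub(r"\[\d+\]", "", raw)
clean = re.sub(r"[^A-Za-z0-9\s]", "", clean)
clean = re.sub(r"\s+", " ", clean).strip()
print(clean)
```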

Part 10 : Word Embedding and Chunking

•	1 One-Hot Encoding (CountVectorizing)
•	2 TF-IDF transforming
•	3 Chunking
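One-hot (binary) encoding marks whether each vocabulary word occurs in a document, while TF-IDF weights a term's frequency by how rare it is across documents. The sketch below uses the plain formulas tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)) on an invented three-document corpus. This is one common variant, not the exact weighting of scikit-learn's `TfidfVectorizer`, which by default smooths the idf as ln((1 + N) / (1 + df(t))) + 1 and L2-normalizes each row.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
n_docs = len(tokenized)

# One-hot (binary) encoding: 1 if the vocabulary word occurs in the document, else 0.
one_hot = [[1 if w in doc else 0 for w in vocab] for doc in tokenized]

# Document frequency: number of documents containing each term.
df = Counter(w for doc in tokenized for w in set(doc))

# Plain idf: ln(N / df). A term found in every document gets weight 0.
idf = {w: math.log(n_docs / df[w]) for w in vocab}

def tfidf(doc):
    counts = Counter(doc)
    # tf = relative frequency of the term within this document.
    return [counts[w] / len(doc) * idf[w] for w in vocab]

matrix = [tfidf(doc) for doc in tokenized]
print(vocab)
print(one_hot[0])
print([round(x, 3) for x in matrix[0]])
```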

Part 11 : Sentiment Analysis using Logistic Regression

•	1 Loading the Dataset
•	2 Transforming Documents into Feature Vectors
•	3 Term Frequency and Inverse Document Frequency (TF-IDF)
•	4 Document Classification Using Logistic Regression
•	5 Model Evaluation
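To show what the classifier learns, here is logistic regression trained by batch gradient descent on bag-of-words counts, in pure Python. The six reviews are toy data, and the learning rate and epoch count are arbitrary choices; in practice a library implementation such as scikit-learn's `LogisticRegression` (optionally fed TF-IDF features) would be used instead.

```python
import math

# Toy labelled reviews (1 = positive, 0 = negative); an assumption, not the notebook's dataset.
docs = ["good great film", "great acting good plot", "loved it great",
        "bad boring film", "terrible plot bad acting", "boring awful"]
labels = [1, 1, 1, 0, 0, 0]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorize(doc):
    # Bag-of-words counts; words outside the vocabulary are ignored.
    x = [0.0] * len(vocab)
    for w in doc.split():
        if w in index:
            x[index[w]] += 1.0
    return x

X = [vectorize(d) for d in docs]
weights = [0.0] * len(vocab)
bias = 0.0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x):
    # Probability of the positive class under the current weights and bias.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Batch gradient descent on the average log-loss.
lr = 0.5
for _ in range(200):
    grad_w = [0.0] * len(vocab)
    grad_b = 0.0
    for x, y in zip(X, labels):
        err = predict_proba(x) - y
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err
    weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
    bias -= lr * grad_b / len(X)

# Model evaluation on the training data (a real evaluation would use a held-out set).
preds = [int(predict_proba(x) >= 0.5) for x in X]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(accuracy, predict_proba(vectorize("great good")), predict_proba(vectorize("bad boring")))
```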

Part 12 : Bigram Model

•	1 Tokenization
•	2 Remove Stopwords
•	3 bigram_collocation		
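A bigram model starts from counts of adjacent token pairs. The sketch removes stopwords with a tiny assumed stopword set and counts bigrams with `zip` on an invented sentence; NLTK's `BigramCollocationFinder` goes further by ranking pairs with association measures such as PMI.

```python
from collections import Counter

# Tiny illustrative stopword set (an assumption).
STOPWORDS = {"the", "of", "a", "in", "is"}

text = "the prime minister of india met the prime minister of japan in tokyo"

# Tokenize on whitespace and remove stopwords.
tokens = [w for w in text.lower().split() if w not in STOPWORDS]

# Bigrams: pairs of adjacent tokens after stopword removal.
bigrams = list(zip(tokens, tokens[1:]))
bigram_counts = Counter(bigrams)

print(bigram_counts.most_common(1))
```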

Contributors

priyankayawalkar
