This short lesson summarizes the key takeaways from section 37.
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
The key takeaways from this section include:
- NLP has become increasingly popular over the past few years, and NLP researchers have produced many insightful analyses of text data
- The Natural Language Toolkit (NLTK) is one of the most popular Python libraries for NLP
- Regular expressions are an important part of NLP and can be used for pattern matching and filtering
- Regular Expressions can become confusing, so make sure to use our provided cheat sheet the first few times you work with regex
- It is strongly recommended that you spend some time with a regex tester website to make sure you understand how changing your regex pattern affects your results as you work towards a correct answer
- Feature engineering is essential when working with text data, both for building useful model inputs and for understanding the dynamics of your text
- Common feature engineering techniques include removing stop words, stemming, lemmatization, and creating bigrams
- When diving deeper into grammar and linguistics, Context-Free Grammars and Part-Of-Speech tagging are important
- In this context, parse trees can help a computer decide between possible interpretations when dealing with ambiguous words
- How you clean and preprocess your data will have a major effect on the conclusions you'll be able to draw in your NLP classification problem
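
As a quick refresher on the preprocessing ideas above, here is a minimal sketch that uses only Python's built-in `re` module to tokenize a sentence with a regular expression, remove stop words, and build bigrams. The `STOP_WORDS` set, the helper function names, and the sample sentence are all assumptions made for illustration; in practice you would more likely rely on NLTK's tokenizers and stop word lists. Stemming and lemmatization are left out because they need a library such as NLTK.

```python
import re

# Hypothetical, deliberately tiny stop word list used only for this sketch.
# A real project would likely use a fuller list, such as the one NLTK provides.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    # Matches runs of letters, optionally followed by an apostrophe and more letters
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def remove_stop_words(tokens):
    """Keep only the tokens that are not in the stop word set."""
    return [token for token in tokens if token not in STOP_WORDS]


def make_bigrams(tokens):
    """Pair each token with the token that follows it."""
    return list(zip(tokens, tokens[1:]))


sample = "The cat is sitting on the mat, and the dog is watching."
tokens = tokenize(sample)
filtered = remove_stop_words(tokens)
pairs = make_bigrams(filtered)

print(filtered)
print(pairs)
```

Try changing the regex pattern or the stop word set and rerunning the code. Small preprocessing choices like these change which tokens and bigrams your model eventually sees.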