dsc-4-37-01-introduction-data-science-demo's Introduction

Introduction

This lesson summarizes the topics we'll be covering in section 37 and why they'll be important to you as a data scientist.

Objectives

You will be able to:

  • Understand and explain what is covered in this section
  • Understand and explain why the section will help you to become a data scientist

Foundations of Natural Language Processing (NLP)

In this section we will be covering Natural Language Processing (NLP), which refers to analytics tasks that deal with natural human language, in the form of text or speech.

Natural Language Toolkit (NLTK)

We start by providing more context on the Natural Language Toolkit, or NLTK for short, one of the most common Python libraries used for NLP tasks. This library was developed by researchers at the University of Pennsylvania, and has become one of the most widely used and complete libraries of NLP tools available.

Regular Expressions

Data preprocessing is an essential part of NLP, which is why being very familiar with regular expressions, or "regex" for short, is so important. We can use regex to quickly match patterns in text documents and filter their contents.
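As a minimal sketch of pattern matching and filtering with regex, the example below uses Python's built-in `re` module. The sample sentence and both patterns are assumptions chosen for illustration; real preprocessing usually needs patterns tuned to the corpus at hand.

```python
import re

# Hypothetical sample text, made up for this illustration.
raw = "NLP is fun!!! Isn't it? Visit https://example.com for 3 more tips."

# Filter: drop anything that looks like a URL.
no_urls = re.sub(r"https?://\S+", "", raw)

# Match: keep runs of letters, allowing one internal apostrophe (e.g. "Isn't").
# Digits and punctuation are skipped because the pattern never matches them.
tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", no_urls)

print(tokens)
```

The same two steps (substitute away what you don't want, then find what you do) cover a large share of everyday text cleanup.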

Feature Engineering for Text Data

Working with text data comes with a lot of ambiguity. Feature engineering for NLP is pretty specific, and in this section you'll learn some feature engineering techniques that are essential when working with text data. You'll learn how to remove stop words from your text, as well as how to create frequency distributions, which are histograms that show how many times each word occurs in a given text corpus.
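To make stop word removal and frequency distributions concrete, here is a small sketch using only the Python standard library. The stop word set and the sample corpus are hypothetical and deliberately tiny; `collections.Counter` stands in for a dedicated frequency distribution class such as the one NLTK provides.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "and", "of", "to", "is", "in", "on"}

# Hypothetical sample corpus.
corpus = "the cat sat on the mat and the dog sat on the cat"

# Naive whitespace tokenization is enough for this clean sample.
tokens = corpus.split()

# Remove stop words so that only content-bearing words remain.
content_tokens = [t for t in tokens if t not in STOP_WORDS]

# Frequency distribution: maps each remaining word to its count.
freq = Counter(content_tokens)

for word, count in freq.items():
    print(f"{word:5} {'#' * count}")  # crude text histogram
```

Without the stop word filter, "the" would dominate the histogram while telling us almost nothing about the content.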

Additionally, you'll learn about stemming and lemmatization, two techniques for reducing words to a common base form: stemming strips suffixes, while lemmatization maps each word to its dictionary form. Building frequency histograms after stemming or lemmatizing can sharpen your insight into a text, because variants of the same word are counted together. You'll also learn how to create bigrams, which show how often two words occur next to each other!
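The sketch below illustrates both ideas with the standard library only. Note that `crude_stem` is a hypothetical toy suffix stripper written for this example, not the Porter stemmer or any other algorithm from NLTK, and it will mangle many real words; it is here only to show why counts change after stemming.

```python
from collections import Counter

def crude_stem(word):
    """Toy stemmer (assumption for illustration): strip one common suffix."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical sample sentence.
tokens = "the dog walks while the dogs walked and kept walking".split()

# After stemming, "walks", "walked", and "walking" are all counted as "walk".
stems = [crude_stem(t) for t in tokens]
stem_counts = Counter(stems)

# Bigrams: every pair of adjacent tokens, counted.
bigram_counts = Counter(zip(stems, stems[1:]))

print(stem_counts.most_common(3))
print(bigram_counts.most_common(2))
```

A lemmatizer would go further than this, for example mapping "kept" to "keep", which no suffix rule can do.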

Context-Free Grammars and Part-Of-Speech (POS) Tagging

In NLP, it is important to understand Context-Free Grammars (CFGs) and Part-Of-Speech (POS) tagging. A CFG is a set of rules that defines how valid sentences can be constructed. Because those rules describe structure only, a sentence can be grammatically correct and still feel like complete nonsense on the semantic level. POS tagging labels each word in a sentence with its part of speech (noun, verb, adjective, and so on), which helps a computer interpret the sentence. You'll see multiple examples of how to use both CFGs and POS tagging, and why they are important!
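As a rough illustration of how a grammar's rules define which sentences can exist, the sketch below enumerates every sentence that a tiny, hypothetical grammar can produce. The grammar, the `expand` helper, and the vocabulary are all assumptions made for this example; the symbols `Det`, `Adj`, `N`, `V`, and `Adv` play the role of part-of-speech tags.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to its possible productions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Adj", "N"]],
    "VP": [["V", "Adv"]],
    "Det": [["the"]],
    "Adj": [["colorless"], ["green"]],
    "N": [["ideas"], ["dogs"]],
    "V": [["sleep"]],
    "Adv": [["furiously"]],
}

def expand(symbol):
    """Return every word sequence `symbol` can derive.

    Only works for non-recursive grammars like the one above.
    """
    if symbol not in GRAMMAR:  # terminal: an actual word
        return [[symbol]]
    results = []
    for production in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine all choices.
        parts = [expand(s) for s in production]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

Every sentence printed follows the rules, yet "the colorless ideas sleep furiously" makes no sense semantically, which is exactly the gap between grammar and meaning.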

Text Classification

We will finish off this section by explaining the general process of preparing text datasets for classification problems.
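One common way to prepare text for a classifier, shown here as a sketch and not necessarily the exact process this section teaches, is to turn each document into a vector of word counts (a "bag of words"). The documents, labels, and the `vectorize` helper below are hypothetical.

```python
# Hypothetical labeled documents.
docs = ["free prize inside", "meeting at noon", "claim your free prize"]
labels = ["spam", "ham", "spam"]

# Vocabulary: every distinct word across the documents, in a fixed order.
vocabulary = sorted({word for doc in docs for word in doc.split()})

def vectorize(doc):
    """Count how often each vocabulary term appears in `doc`."""
    words = doc.split()
    return [words.count(term) for term in vocabulary]

# Feature matrix: one row per document, one column per vocabulary term.
X = [vectorize(doc) for doc in docs]

print(vocabulary)
for row, label in zip(X, labels):
    print(row, label)
```

The rows of `X` paired with `labels` are in the shape that most classification algorithms expect as input.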

Summary

In this section, you'll learn the foundations of NLP and different techniques to make a computer understand text!
