Web Scraping and NLP

Event page - https://www.meetup.com/central_london_data_science/events/247384261/

Natural language processing (NLP) is a popular field of data science. It focuses on the analysis of unstructured text, i.e. free-form blocks of text. At a previous event, we used NLP with word frequency analysis to predict whether a comment scraped from YouTube came from a troll. A more common use case is sentiment analysis, which evaluates how negative or positive a piece of text is. This is a useful feature in determining the objectivity of texts such as news articles.
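To make the idea of sentiment analysis concrete, here is a minimal sketch that scores text by counting positive and negative words. The word lists and the sentiment_score function are hypothetical and chosen only for illustration; they are not taken from the notebook, and real sentiment tools use far larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def sentiment_score(text):
    """Return a score in [-1, 1]: below zero is negative, above zero is positive."""
    # Lowercase the text and split it into simple word tokens.
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    # Text with no lexicon words is treated as neutral.
    return 0.0 if total == 0 else (pos - neg) / total


print(sentiment_score("I love this great video"))         # 1.0
print(sentiment_score("What a terrible, awful comment"))  # -1.0
```

A score near zero means the positive and negative words roughly balance out, or that none of the words appear in the lexicon at all.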

In this meetup we will show you how to scrape text from websites using Python (with a Python library called 'BeautifulSoup') and then how to perform NLP on the scraped text. By the end of the event, we aim to have everyone automatically analysing the text of different websites using a scraping-to-NLP pipeline.
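The shape of such a pipeline can be sketched as follows. This is not the notebook's code: so that it runs without installing anything, it uses Python's standard-library html.parser instead of BeautifulSoup, and it parses a small inline HTML string instead of fetching a live page. The names TextExtractor, extract_text, word_frequencies and SAMPLE_HTML are assumptions made up for this example.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Scraping step: turn an HTML string into plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def word_frequencies(text):
    """NLP step: count how often each lowercase word occurs."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


# Hypothetical sample page standing in for a downloaded website.
SAMPLE_HTML = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Data science</h1><p>Data is fun. Science is fun.</p>
<script>console.log("ignored");</script></body></html>
"""

text = extract_text(SAMPLE_HTML)
freqs = word_frequencies(text)
print(text)
print(freqs.most_common(3))
```

With BeautifulSoup installed, BeautifulSoup(html, "html.parser").get_text() could play the role of extract_text, and the HTML would come from a real page download rather than a string literal.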

Getting Started

Work through the Web Scraping and NLP.ipynb notebook. If you get stuck, you can look at the [COMPLETED] Web Scraping and NLP.ipynb notebook.

As a backup if you can't get it running on your computer

The notebook is also published as a Kaggle kernel:

  1. Create a Kaggle account if you haven't already

  2. Go to https://www.kaggle.com/zackakil/web-scraping-nlp-cldspn/notebook

  3. Fork the kernel

Have fun,

Zack.
