Quantitative Text Analysis Using R

GESIS Computational Social Science Winter Symposium, Köln (Cologne), 1 December 2015

Kenneth Benoit, Department of Methodology, LSE
Paul Nulty, Department of Methodology, LSE

Version: 1 December 2015

This repository contains the materials for the short workshop Quantitative Text Analysis Using R, taught on December 1, 2015 by Kenneth Benoit and Paul Nulty. This workshop is part of the pre-conference training sessions of the GESIS Computational Social Science Winter Symposium, 1-3 December 2015. Ken Benoit and Paul Nulty's involvement is supported through European Research Council grant ERC-2011-StG 283794-QUANTESS.

Instructions for using this resource

You have three options for downloading the course material found on this page:

You can download the materials individually by clicking on each link.
You can "clone" the repository, using the button labelled "Clone in Desktop" on the right side of your browser window as you view this repository. If you do not have a git client installed on your system, you will need to install one, and also make sure that git itself is installed. This is the preferred option, since you can refresh your clone as new content gets pushed to the course repository. (New material will be pushed to the course repository at least once per day while the course takes place.)
You can download a static copy by choosing the button on the right marked "Download zip", which downloads the entire repository as a zip file.

If you have a GitHub account, you can also subscribe to the repository to receive updates each time new changes are pushed to it.

Objectives

This workshop covers how to perform common text analysis and natural language processing tasks using R. When used properly, R is a fast and powerful tool for managing even very large text analysis tasks.

The course consists of instructor presentations in three sets, followed by exercises that students are meant to do in class. Computers should be available, but we suggest you bring your own.

We will cover how to format and input source texts, how to structure their metadata, and how to prepare them for analysis. This includes common tasks such as tokenisation (including constructing ngrams and "skip-grams"), removing stopwords, stemming words, and other forms of feature selection. We show how to get summary statistics from text, search for and analyse keywords and phrases, analyse text for lexical diversity and readability, detect collocations, apply dictionaries, and measure term and document associations using distance measures. Our analysis covers basic text-related data processing in the R base language, but most of it relies on the “quanteda” (https://github.com/kbenoit/quanteda) package for the quantitative analysis of textual data. We also cover how to pass the structured objects from quanteda into other text analytic packages for topic modelling, latent semantic analysis, regression models, and other forms of machine learning.
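As a rough illustration of the preparation steps named above (tokenisation, stopword removal, ngrams, and counting features per document), here is a minimal sketch in base R only. It is not the workshop's own code and does not use quanteda, which offers more complete tools for each step. The two example texts, the tiny stopword list, and the helper names `tokenise`, `remove_stopwords`, and `ngrams` are all invented for this example.

```r
# Two toy documents (invented for illustration, not workshop data).
texts <- c(doc1 = "The quick brown fox jumps over the lazy dog.",
           doc2 = "The dog barks; the fox runs away.")

# Tokenise: lower-case, split on any run of non-letters, drop empty strings.
# This simple rule only suits plain ASCII English text.
tokenise <- function(x) {
  toks <- strsplit(tolower(x), "[^a-z]+")
  lapply(toks, function(w) w[nzchar(w)])
}

# Remove stopwords supplied as a plain character vector.
remove_stopwords <- function(toks, stopwords) {
  lapply(toks, function(w) w[!w %in% stopwords])
}

# Build contiguous n-grams from one token vector, joined by "_".
ngrams <- function(w, n = 2) {
  if (length(w) < n) return(character(0))
  starts <- seq_len(length(w) - n + 1)
  vapply(starts,
         function(i) paste(w[i:(i + n - 1)], collapse = "_"),
         character(1))
}

toks <- tokenise(texts)
toks_nostop <- remove_stopwords(toks, c("the", "over"))
bigrams <- lapply(toks_nostop, ngrams, n = 2)

# Count each feature in each document: rows are documents, columns are features.
features <- sort(unique(unlist(toks_nostop)))
counts <- vapply(toks_nostop,
                 function(w) as.integer(table(factor(w, levels = features))),
                 integer(length(features)))
rownames(counts) <- features
doc_feature <- t(counts)

print(bigrams)
print(doc_feature)
```

The resulting `doc_feature` matrix plays the role of a simple document-feature matrix. Stemming, skip-grams, and Unicode-aware tokenisation are left out here because base R has no direct equivalent for them.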

Prerequisites

While the workshop is designed for those who have used R in some form previously, expertise in R is not required, and even those with no previous knowledge of R are welcome.

Part 1: Getting Started and Basic Text Analysis

Setting up RStudio and quanteda:

CRAN for downloading and installing R
GitHub page for the quanteda package
Additional packages to install: stm, topicmodels, glmnet
Configuration test: Try running this RMarkdown file: test_setup.Rmd. If it builds without error and looks like this, then you have successfully configured your system.
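One way to check which of the packages listed above still need installing is sketched below. It assumes the CRAN releases are sufficient (the workshop may have expected the GitHub version of quanteda) and uses the lowercase CRAN name `stm` for the STM package. The helper `missing_packages` is a hypothetical name introduced for this example.

```r
# Packages named in the setup list (the CRAN name of STM is lowercase "stm").
pkgs <- c("quanteda", "stm", "topicmodels", "glmnet")

# Hypothetical helper: return the packages that are not yet installed.
# requireNamespace() returns FALSE, without an error, when a package is absent.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)
print(to_install)

# Uncomment to install whatever is missing from CRAN (needs a network connection):
# if (length(to_install) > 0) install.packages(to_install)
```

The install line is left commented out so that running the snippet only reports what is missing and changes nothing on your system.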

Basic Text Analysis:

Getting started, text import, and basic analysis
Study this recommended workflow document.
Exercise: Step through execution of the .Rmd file
Sample data files: SOTU_metadata.csv, inaugTexts.csv, tweetSample.RData

Part 2: Descriptive text analysis using R

Descriptive analysis of texts
Exercise: Step through execution of the 2_descriptive.Rmd file.

You also might want to look at the following:

More manipulation of texts

Part 3: Advanced analysis and working with other text packages

Advanced analysis and working with other packages
Exercise: Step through execution of the .Rmd file
Twitter analysis example, and the instructions for setting up your own Twitter app, in Twitter.Rmd.

Additional Resources

These resources are designed to be used before or after the course, to augment what is presented during it. They are just suggestions: no reading for the course is required.

[Sanchez, G. (2013). Handling and Processing Strings in R. Trowchez Editions, Berkeley.](http://www.gastonsanchez.com/Handling and Processing Strings in R.pdf)
stringi package page, which also includes a good discussion of the ICU library
Some guides to regular expressions: Zytrax.com's User Guide or the comprehensive resources from http://www.regular-expressions.info
See the quanteda tag on Stack Overflow, where you can pose questions and see some brilliant answers by our development team.
