Syllabus
Autumn 2018
Introduction to Data Processing and Visualization Design in R
Graham Bearden
[email protected]
Time: 8:30AM - 2:20PM
Class Dates: 10/6, 10/13
Location: SMI 304
This workshop teaches the fundamental skills required to prepare, analyze, and report data in R. This is an introductory course that presents R functionality in a way that is accessible to new R programmers.
Course materials offer a survey of selected R functionality. Students interested in more sophisticated R programming than the workshop covers are encouraged to seek additional opportunities to apply their R programming skills.
This is an applied course that focuses on successful execution of code. While students should consider theory, style, and efficiency of R programming, getting the job done right will be the first priority of the workshop.
- Attend both classes. Class dates are October 6 and 13. A full class is 5 hours and 50 minutes, breaks included.
- Submit a 1-page report generated in R Markdown that includes bullet points, at least one table, and at least one chart produced with the ggplot library.
- Due by October 22
- Do not print code in output (echo = FALSE)
Please let me know if you need any specific accommodations. I will help provide accommodations in accordance with Evans School policy.
Part 1 will focus on the basics of R. Students will learn about key R concepts like data types and the anatomy of a function, how to read (import) and write (export) data, and the essentials of writing successful code.
We will also begin to explore the tidyverse as a data processing tool and troubleshooting tactics students can use to solve coding errors.
At the end of Part 1, students should be familiar with the R interface, know how to perform basic operations, and be able to walk through a series of steps to troubleshoot code.
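The Part 1 concepts can be sketched in a few lines of base R. This is a minimal illustration, not course material: the variable names, the `percent_change` helper, and the small `scores` data frame are all hypothetical and invented for this example.

```r
# Data types: every value in R has a type that determines what you can do with it.
n_students <- 12L            # integer
avg_score  <- 87.5           # double (reported as "numeric")
course     <- "Intro to R"   # character
enrolled   <- TRUE           # logical

types <- c(class(n_students), class(avg_score), class(course), class(enrolled))

# Anatomy of a function: a name, arguments (one with a default), a body,
# and a return value (the last expression evaluated).
# `percent_change` is a hypothetical helper written for this sketch.
percent_change <- function(old, new, digits = 1) {
  change <- (new - old) / old * 100
  round(change, digits)
}

result <- percent_change(old = 80, new = 100)

# Write (export) and read (import) data, using a temporary CSV file.
scores <- data.frame(
  student = c("A", "B", "C"),
  score   = c(90, 85, 70),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)
scores_in <- read.csv(path, stringsAsFactors = FALSE)
```

When code like this fails, a reasonable first troubleshooting step is to run it one line at a time and inspect each object with `class()` or `str()` before moving on.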
Course materials: Base R cheat sheet, Data transformation cheat sheet
Part 2 will focus on data processing. We will use a variety of methods to manipulate data and prepare it for analysis. The tidyverse will be emphasized. Students will learn how to manipulate data at the dataset level, the variable level, and the value level.
At the end of Part 2, students should be able to transform datasets of varying complexity into data suitable for analysis.
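The three levels of manipulation described in Part 2 might look like the following with dplyr. This is a sketch under stated assumptions: the `contributions` data frame and its columns are invented for illustration and are not taken from the workshop datasets.

```r
library(dplyr)

# Hypothetical contribution records, invented for this example.
contributions <- data.frame(
  donor  = c("Ann", "Bob", "Cy", "Dee"),
  party  = c("d", "R", "D", "R"),
  amount = c(100, 250, 50, 400),
  stringsAsFactors = FALSE
)

processed <- contributions %>%
  # Dataset level: keep a subset of rows and reorder them.
  filter(amount >= 100) %>%
  arrange(desc(amount)) %>%
  # Variable level: create a new column from an existing one.
  mutate(amount_thousands = amount / 1000) %>%
  # Value level: standardize individual values inside a column.
  mutate(party = toupper(party))
```

Each step takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe (`%>%`).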
Class reference materials: Data transformation cheat sheet
Part 3 will focus on data visualization. We will use ggplot, a visualization library that is part of the tidyverse, to build a variety of charts. The modularity of ggplot charting functions will be emphasized as will the distinction between visualizing aggregated and non-aggregated data.
At the end of Part 3, students should be able to produce a standard set of charts and manipulate chart aesthetics.
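The distinction between non-aggregated and aggregated data, and the modular way charts are built, can be sketched with the ggplot2 package (the library this syllabus calls ggplot). The `incidents` and `incident_counts` data frames below are hypothetical and are not the workshop's 911 dataset.

```r
library(ggplot2)

# Non-aggregated data: one row per incident (hypothetical records).
incidents <- data.frame(
  type = c("Theft", "Theft", "Noise", "Traffic", "Traffic", "Traffic"),
  stringsAsFactors = FALSE
)

# geom_bar() counts the rows in each category for you.
p_raw <- ggplot(incidents, aes(x = type)) +
  geom_bar()

# Aggregated data: the counts already exist as a column.
incident_counts <- data.frame(
  type = c("Noise", "Theft", "Traffic"),
  n    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# geom_col() plots the supplied values as bar heights without counting.
p_agg <- ggplot(incident_counts, aes(x = type, y = n)) +
  geom_col()

# Modularity: a plot is an object, and further layers are added with `+`.
p_final <- p_agg +
  labs(title = "Incidents by type", x = NULL, y = "Count") +
  theme_minimal()
```

Both approaches should draw bars of the same height here; which one to use depends on whether the data arrive as raw records or as a summary table.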
Class reference materials: Data visualization cheat sheet, Coolors
Part 4 will focus on reporting analyses in markdown with knitr. Markdown is a simple language used to streamline documentation and reporting, and it integrates nicely with R. Markdown integration allows you to 'pipe in' analyses calculated in R in the form of text, tables, and charts. knitr is a built-in RStudio tool (as well as an independent R package) that helps R programmers produce documents derived from code at the press of a button. Mastery of knitr and markdown sets you on the path to automating the production of reports.
At the end of Part 4, students should be able to write documents in markdown that include R-generated analyses.
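As a rough sketch of how R-generated results end up in a markdown document, the code below writes a tiny R Markdown file from R and knits it to plain markdown. The report content is invented for illustration, it uses R's built-in `cars` dataset, and the temporary file paths are assumptions of this example. The chunk fence is built with `strrep()` only so that the example is easy to embed; in a real `.Rmd` file you would type the three backticks directly.

```r
library(knitr)

fence <- strrep("`", 3)  # three backticks, the delimiter for a code chunk

# Lines of a minimal R Markdown document (hypothetical content).
rmd_lines <- c(
  "---",
  "title: \"Example report\"",
  "---",
  "",
  "- Bullet points are plain markdown.",
  "- Mean speed in the built-in cars data: `r mean(cars$speed)` mph.",
  "",
  # echo = FALSE hides the chunk's code and shows only its output.
  paste0(fence, "{r speed-table, echo = FALSE}"),
  "kable(head(cars, 3))",
  fence
)

rmd_path <- tempfile(fileext = ".Rmd")
md_path  <- tempfile(fileext = ".md")
writeLines(rmd_lines, rmd_path)

# knit() runs the R code and writes the results into a markdown file.
knit(rmd_path, output = md_path, quiet = TRUE)
report <- readLines(md_path)
```

The inline expression is replaced by its computed value and the chunk is replaced by a table, while `echo = FALSE` keeps the chunk's code out of the output.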
Class reference materials: R Markdown cheat sheet, Daring Fireball, Pandoc Markdown
We will use the datasets below in our workshop. These datasets are also used in the Computational Thinking for Governance Analytics course.
- Contributions to Candidates and Political Committees
- Seattle Police Department 911 Incident Response
R community members produce valuable content to help you learn how to program in R. Here are some of my favorite resources that you might find useful as a new R programmer:
- RStudio. The R software we will use in the course.
- tidyverse. The launch page for tidyverse.
- Cheat sheets
- Base R. This is useful to get started.
- Data transformation. I use this all the time when working with dplyr and tidyr.
- Data visualization. A great quick-reference for the ggplot library.
- R Markdown. A useful reference for using R Markdown.
- Lubridate. Dates in R are a pain in the butt. This cheat sheet and the lubridate library will make your life easier.
- Karl Broman on data organization. Prepare your data according to Karl's recommendations and R will become a lot easier. (The tip is relevant to all other KB recommendations.)
- Jose Manuel Magallanes on data collection and organization. Jose teaches the Computational Thinking for Governance Analytics course at the Evans School. He recently published this book!
- R Bloggers. All things R. Folks from various industries - many of whom are super smart with impressive R experience - share tips, tricks, and anecdotes here.
- stackoverflow. This is where you come when you get stuck. Google the problem you are having and you'll see a possible solution on the stackoverflow site. Guaranteed.
- Google R Style Guide. Your code can get really messy really quickly. This guide can help.
- Daring Fireball. A page maintained by the markdown creator, John Gruber.
- Pandoc Markdown. Among other things, Pandoc is great for formatting tables in R Markdown. I use it regularly.
- Coolors. I love this site. It is great inspiration when building charts.
- Top 50 ggplot2 Visualizations. Check this out when you want to produce new chart types. It is full of good ideas.
- GitHub. If you're serious about any sort of programming, you should create a GitHub account.
- R Notebook Template
- PDF Document Example
Click here for the Winter 2018 syllabus.