datacarpentry / r-ecology-lesson

Data Analysis and Visualization in R for Ecologists - the version at https://github.com/datacarpentry/R-ecology-lesson-alternative will be merged on 8th July 2024

Home Page: https://datacarpentry.org/R-ecology-lesson/

License: Other

r-ecology-lesson's Introduction

Data carpentry: R for data analysis and visualization of Ecological Data

This lesson will be replaced with a redesigned version on 8th July 2024. If you plan to teach this lesson before that date, we recommend that you consider using the redesigned version instead. Feedback on the new version of the lesson and your experience teaching it is very welcome as issues on the source repository.


This is an introduction to R designed for participants with no programming experience. It can be taught in 3/4 of a day (approximately 6 hours). The lesson starts with basic information about the syntax of the R programming language and the RStudio interface, then moves through specific programming tasks: importing CSV files, exploring the structure of data frame objects in R, dealing with categorical variables (i.e. factors), and basic data manipulation (adding/removing rows and columns). It finishes with calculating summary statistics and a brief introduction to plotting. There is also a lesson on how to use databases from R that is intended to be taught after the SQL lesson, and ideally at the end of a Data Carpentry workshop.

Prerequisites

The lesson assumes no prior knowledge of R or RStudio. Learners should have R and RStudio installed on their computers. They will also need to be able to install R packages from CRAN, create directories, and download files. See the lesson website for instructions on installing R, RStudio, and the required R packages.

Topics

Code handout

There is a code handout that is intended to be distributed to the participants. This file includes some of the examples used during teaching and the titles of the sections. It provides a guide that the participants can fill in as the lesson progresses. Participants can also source code from this file to avoid typos in more complex examples.

Contributing

Contributions to the content and development of this lesson are very welcome! If you would like to contribute, we encourage you to review our contributing guide.

Questions

If you have any questions or feedback, please open an issue, contact the maintainers, or come chat with us on the Slack Channel for this lesson. If you don't already have a Slack account with the Carpentries, you can create one.

  • Tobias Busch
  • Ana Costa Conrado
  • François Michonneau
  • Maneesha Sane
  • Brian Seok
  • Ashwin Srinath

Citation

Please cite as

François Michonneau, Tracy Teal, Auriel Fournier, Brian Seok, Adam Obeng, Aleksandra Natalia Pawlik, … Ye Li. (2019, July 1). datacarpentry/R-ecology-lesson: Data Carpentry: Data Analysis and Visualization in R for Ecologists, June 2019 (Version v2019.06.1). Zenodo. http://doi.org/10.5281/zenodo.3264888

r-ecology-lesson's People

Contributors

adamobeng, anacost, apawlik, aurielfournier, bbolker, benmarwick, chriseshleman, dklinges9, emhart, erinbecker, ethanwhite, fmichonneau, hdashnow, k8hertweck, karawoo, karthik, kathy0305, katrinleinweber, maneesha, mkuzak, mondorescue, njlyon0, rekyt, semacu, steltenpower, tavareshugo, teebusch, tobyhodges, tracykteal, zkamvar

r-ecology-lesson's Issues

tidyverse instead of dplyr, tidyr and ggplot2?

This might be more philosophical, but now that Wickham et al. have released their "tidyverse" package (see RStudio Blog and CRAN), this would present a more convenient way of obtaining the dplyr, tidyr and ggplot2 packages needed for the lesson.
Since the lesson material is already tightly knit to the tidyverse ideas, this would simplify teaching the lesson.

Again, this may be more about personal preferences; but in my opinion it could make it easier for students (or whoever is being taught), instead of installing the individual packages manually.

Looking forward to your thoughts!

having 'extras' lessons

In the core lessons we can't teach everything. Often there are places where there could be more detail on a particular topic, or something that is an extension of a given topic. In those cases there are opportunities to have lessons that people can go through after the workshop on their own ('extras' like in SWC or On Your Owns (OYOs)). We have one recent PR that could fit that category. #62

It would be nice to be able to point learners at these things, but there is then the added challenge in maintaining them. Are there thoughts on whether 'extras' should be included? Or other strategies on how they might be included or linked to?

@fmichonneau @ethanwhite @karthik @kcranston @hlapp @karawoo @mkuzak @naupaka & others

Repo and html needs a lot of clean up

So I've been poking around this repo and have been trying to make sense of all the clutter. There is lots of legacy stuff ported over from SWC without any clean plan.

What works

The Makefile is well written and does everything as far as updating lessons.

What doesn't work

  • The titles from all rendered html pages are "Learning objectives"
  • There is no template called "topic"
  • The font is too small (13px could easily be 15px)
  • We could use a different syntax highlighter (more static one) to make it easier to read and copy text. See a ropensci tutorial for what I mean.
  • The knitr options do not differentiate between code comments and output. This can be fixed with something like:
    knitr::opts_chunk$set(
      comment = "#>"
    )
  • We might also consider adding a more readable web font (perhaps a free Google web font).

Would it be worthwhile to gut this repo and delete everything but the Rmd + Makefile and make an actual topic template? Thoughts?

Use () when calling functions

I suggest using parentheses after function names when they are cited in the text. This is mainly to differentiate them from arguments.

Possible Typo-lesson 3 Dataframes

colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)
new_dataframe <- as.data.frame(cbind(colors, weights))
class(new_dataframe)

shouldn't it be counts instead of weights in the third line?

No mention of downloading "portal_mammals.sqlite" file in lesson 06 R and SQL

It is my impression that one should be able to teach the R-ecology lessons more or less directly. This is not a problem for lessons 00 - 05. However, it is a bit more difficult for lesson 06 "R and SQL", because it requires specific files:

In lesson 06 (SQL databases and R), under the section "Connecting R to sqlite database", there is the following R script:

## Set dbname and driver out of convenience
myDB <- "data/portal_mammals.sqlite"
conn <- dbConnect(drv = SQLite(), dbname= myDB)

There is however no earlier mention for downloading the file "portal_mammals.sqlite".

I therefore suggest that lesson 06 be updated, by taking into use the download.file() function:

## download mammals database to data folder.
download.file("https://ndownloader.figshare.com/files/2292171", "data/portal_mammals.sqlite")

PS: All the files can be downloaded from http://dx.doi.org/10.6084/m9.figshare.1314459.
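
The download step suggested above could be made safe to re-run by skipping it when the file is already present. This is a minimal sketch, not the lesson's code: `download_if_missing()` is a hypothetical helper name, and the URL and destination path in the usage comment are the ones quoted in this issue.

```r
# Hypothetical helper (not part of the lesson): download a file only if it
# is not already present on disk.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(FALSE)  # file already there, nothing to download
  }
  # Create the target directory (e.g. "data/") if it does not exist yet
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps binary files such as SQLite databases intact on Windows
  download.file(url, destfile, mode = "wb")
  TRUE
}

# Usage (requires network access):
# download_if_missing("https://ndownloader.figshare.com/files/2292171",
#                     "data/portal_mammals.sqlite")
```

The function returns TRUE when a download was attempted and FALSE when the existing file was kept, which makes it easy to see what happened when the script is sourced repeatedly.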

A short intro to tidyr is needed

We need material that introduces the basics of tidyr. It's a really useful package, and I think it needs to be introduced in the context of a Data Carpentry workshop.

Figshare has changed download links - need to update links to data in lesson

It seems that Figshare has changed the download links. So the links to the data in this lesson are no longer correct.

This affects

02-starting-with-data.Rmd
05-visualization-ggplot2.Rmd
CONTRIBUTING.md
index.md

combined.csv is now https://ndownloader.figshare.com/files/2292169
surveys.csv is https://ndownloader.figshare.com/files/2292172
plots.csv is https://ndownloader.figshare.com/files/3299474
species.csv is https://ndownloader.figshare.com/files/3299483

Broken links to R ecology lesson materials

The links on the DC lessons page to the ecology introduction to R and dplyr lessons give 404 errors. This appears to be because those modules only exist as .Rmd files (no corresponding html files in the lesson repo). Cross-posting to website and lesson repos to try to get a fast fix.

Thanks!

possible typo in data frame lesson

From Lindsay Brin:

colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)
new_dataframe <- as.data.frame(cbind(colors, weights))
class(new_dataframe)

In the new data frame, it should be colors and counts, not colors and weights.

Also, it says to check it with class(), but then talks about automatic conversion of data types, which you would learn by checking it with str() rather than class().
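
The difference between the two checks can be seen in a short sketch. It uses the corrected vector name `counts`, as proposed in this issue; how the columns end up typed (character or factor) depends on the R version, so the only claim made here is that `counts` is no longer numeric.

```r
colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)
new_dataframe <- as.data.frame(cbind(colors, counts))

# class() only reports the type of the container
class(new_dataframe)              # "data.frame"

# str() lists the type of every column, which exposes the conversion
str(new_dataframe)

# counts was numeric before cbind(), but it is not numeric any more
is.numeric(new_dataframe$counts)  # FALSE
```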

Solutions to challenges

Some solutions to challenges are missing (for example "What was the heaviest animal measured in each year? Return the columns year, species, and weight."). It would be good to have a file which includes them all and maybe named "solutions.R"? Currently some stuff is in "handout" (?) but that is not obvious to the instructors.

Set up continuous integration?

While #151 wasn't caused by build errors, there were some Jekyll complaints that could have been caught by CI. I'd be happy to put together a travis build for this repo if there is interest.

Small edits to dplyr

I’d recommend choosing the RStudio mirror.

We should maintain a consistent tone in the lessons. Perhaps change it to "We recommend the RStudio mirror."

The package dplyr is a fairly new (2014) package

The package is no longer new and is now widely used. We should update this.

Objectives in "Writing Functions"

The objectives in the "Writing Functions" episode are:

  • Explain and identify the difference between function definition and function call.
  • Write a function that takes a small, fixed number of arguments and produces a single result.
  • Correctly identify local and global variable use in a function.
  • Correctly identify portions of source code that will be displayed as online help, and in particular distinguish docstrings from comments.
  • Write short docstrings for functions.

However, the latter three are not covered in this episode. Variable scope has its own episode, and docstrings/online help are part of the "Programming Style" episode. Therefore these three points should be moved to their respective episodes.

Possible confusion about the term `object`

When working through the lesson material I noticed that what I thought of as variables were consistently called objects. Originally coming from a non-R programming language, I was slightly confused. So I did some research, and it indeed seems to be more precise to use object instead of variable, as the R language specification states:

In every computer language variables provide a means of accessing the data stored in memory. R does not provide direct access to the computer’s memory but rather provides a number of specialized data structures we will refer to as objects. These objects are referred to through symbols or variables. In R, however, the symbols are themselves objects and can be manipulated in the same way as any other object. This is different from many other languages and has wide ranging effects.

https://cran.r-project.org/doc/manuals/r-release/R-lang.html#Objects
So as the wording is already correct and consistent throughout the whole lesson this is not really an issue. But there might be others like me that get confused so it might be a good idea to point out this special meaning of object or that they are sometimes also referred to as variable in the lesson material or in the instructor notes. This might be a possible pitfall for instructors as well as learners.
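
The distinction in the quoted passage can be illustrated with a few lines of base R. This is only a sketch for instructor notes, with `x` and `sym` as arbitrary example names: it shows that the symbol used to refer to a value is itself an object that can be stored and inspected.

```r
x <- c(2, 4, 6)   # the symbol `x` now refers to a numeric vector object

sym <- quote(x)   # capture the symbol itself, without evaluating it
class(sym)        # "name"
is.symbol(sym)    # TRUE: the symbol is an object in its own right

eval(sym)         # evaluating the symbol looks up the vector it refers to
```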

Explain better what goes into `group_by()`

Every time I teach dplyr, I see learners try to put the continuous variable they want to summarize in the group_by() call (they confuse the concept of select() and group_by()). I have tried to explain this in several different ways but I haven't found a good way to make it stick. Does anyone have suggestions to clarify this better?
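
One possible way to show the split of roles is a tiny example where the grouping column and the summarized column are visibly different kinds of data. This is a sketch with made-up values in a toy table (`surveys_toy` is a hypothetical stand-in for the lesson's survey data), assuming dplyr is installed.

```r
library(dplyr)

# Toy data standing in for the lesson's surveys table (values are made up)
surveys_toy <- data.frame(
  species_id = c("DM", "DM", "PF", "PF"),
  weight     = c(40, 44, 8, 7)
)

mean_weights <- surveys_toy %>%
  group_by(species_id) %>%               # categorical column: defines the groups
  summarize(mean_weight = mean(weight))  # continuous column: what gets summarized

mean_weights  # one row per species_id
```

Reading it aloud as "for each species, compute the mean weight" may help: whatever follows "for each" goes into group_by(), and whatever is computed goes into summarize().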

Cleanup after splitting out repo

The README, CONTRIBUTORS, CONTRIBUTING, and perhaps other areas of this repo will need to be updated to reflect the fact that it is now a standalone repository.

index page title not correct

The front page for the lesson (http://www.datacarpentry.org/R-ecology-lesson/) has "Data Carpentry {{ page.topic }} for {{ page.domain }}" on it, and also "Data Carpentry R for data analysis for Ecology"

Since there's an index.html and index.md in the repo, it's not quite clear where the rendering is coming from. It would be good to get rid of the "Data Carpentry {{ page.topic }} for {{ page.domain }}" text though.

code issues with R chunks in ggplot2 episode

From Lindsay Brin:

There are some code issues with R chunks, right after "Now we would like to split line in each plot by sex of each individual measured." I don't actually see any problems with the Rmd file on github, but the html file is missing the <div class="sourceCode">… part there, so maybe it just wasn't re-knit?

Improve the narrative in the ggplot2 lesson

We present 3 types of plots in the ggplot lesson:

  • scatterplots
  • boxplots
  • time series

However, it feels that this lesson could be made more interactive and the 3 types of plot presented don't seem well justified and/or included in a narrative.

Editing R & SQL lessons

hi @emhart and @fmichonneau . I am going through the SQL lesson now. I have some suggestions to clean things up a bit and will test things this week. For example i think we need to close the connection each time we open it. I have some other ideas as well - all quite minor.
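
As one illustration of closing the connection each time it is opened, the connect/query/disconnect steps can be wrapped so that the disconnect always runs. This is a sketch rather than the lesson's code: `query_sqlite()` is a hypothetical helper name, and it assumes the DBI and RSQLite packages are installed.

```r
library(DBI)

# Hypothetical helper: open a connection, run one query, always disconnect.
query_sqlite <- function(dbname, sql) {
  conn <- dbConnect(RSQLite::SQLite(), dbname = dbname)
  on.exit(dbDisconnect(conn), add = TRUE)  # runs even if the query fails
  dbGetQuery(conn, sql)
}

# Usage with a throwaway in-memory database, so no file is required
result <- query_sqlite(":memory:", "SELECT 1 AS one")
result
```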

How would you prefer we move forward with editing this lesson and the SQL one? Should I PR on this repo, and suggest changes that way?

Thanks Ted for all of your work on this lesson! it's looking great.
Whatever you guys prefer, i'm happy to do.
leah :)

referring to a warning as an error, should we change?

At the very bottom of 01-intro-to-R.html, there's a challenge where learners are asked to take the mean of a string vector that "appears" to contain numbers. As desired (by us) the command returns a warning, but the text refers to it as an "error". There is mention earlier in the lesson that learners should "Try to use the correct words to describe your problem" (and of course there are real differences) so I think it would be consistent to change the text to "warning". If people agree, I'll submit a PR.
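
The difference can be checked directly in R. The sketch below uses a small made-up character vector (`values`) rather than the lesson's exact challenge data, and uses tryCatch() to report which kind of condition mean() signals.

```r
values <- c("1", "2", "3")  # looks numeric, but is a character vector

# Record which kind of condition, if any, the call signals
outcome <- tryCatch(
  {
    mean(values)
    "no condition"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome  # "warning"

# A warning does not stop execution: the call still returns a value (NA)
result <- suppressWarnings(mean(values))
result
```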

Remove this block from README

image

This is a production repo. It should not contain miscellaneous notes and other material. If need be, we could have R-ecology-staging or something.

Shiny lesson

At the Federal Reserve Board workshop, where we used the gapminder dataset, @msmorul developed and taught a Shiny lesson:

https://github.com/datacarpentry/2015-06-30-FederalReserveBoard/wiki/Shiny-Lesson

It would need a little more detail to hand to an instructor and to use the ecology dataset, but there were good comments on the lesson in the survey. It might be a module to develop as something more advanced or a demo at the end of R. It definitely has the 'wow' factor and is also useful.

New RStudio notebook

There's a new RStudio notebook out: http://rmarkdown.rstudio.com/r_notebooks.html

"An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input."

It's still only available in Preview Release, but once it's in the standard version, it seems like it could be good for teaching. It would be interesting to hear comments if anyone gets a chance to try it out.

typo in dataframes page

Just above the "Inspecting data.frame Objects" header, there's a variable called "datarame" (twice) - should probably be changed to "dataframe" before a beginner tears out their hair debugging

[Could someone please tag this as beginner friendly?]

cbind to create data frame in challenge in 03-data-frame

Couple of issues with this example.

As written example is:

colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)
new_dataframe <- as.data.frame(cbind(colors, weights))
class(new_dataframe)

First, there is a typo in line 3: it should be counts and not weights.
Second, cbind() returns a matrix and thus does more conversion here than I think we want to introduce.

I'd suggest re-doing this as

colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)
new_dataframe <- data.frame(colors, counts)
class(new_dataframe)

Maybe a mention of cbind and rbind here as well, but pointing out the matrix as return...
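
A short comparison could accompany such a mention. This is a sketch using the same two vectors as above; it shows that cbind() and rbind() return matrices, which hold a single type, while data.frame() keeps each column's own type.

```r
colors <- c("red", "green", "blue", "yellow")
counts <- c(50, 60, 65, 82)

m <- cbind(colors, counts)   # a matrix: all values are coerced to one type
is.matrix(m)                 # TRUE
typeof(m)                    # "character"

dim(rbind(colors, counts))   # rbind() also returns a matrix, here 2 rows by 4 columns

df <- data.frame(colors, counts)  # each column keeps its own type
is.numeric(df$counts)             # TRUE
```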

Thoughts?

I can do a PR, but won't have time for a bit.

Oh and Kudos to the July 26-27 2016 BU workshop. They found these!!!

Confusing instructions in 02-starting-with-data.Rmd

There is first:

We are going to use the R function download.file() to download the CSV file that contains the survey data from figshare, and we will use read.csv() to load into memory (as a data.frame) the content of the CSV file:

Then

if (!require("curl")) {
  install.packages("curl")
}
library("curl")
curl_download("https://dl.dropboxusercontent.com/u/22808457/portal_data_joined.csv",
              "data/portal_data_joined.csv")

I don't know if most people can easily install the curl package on their Windows machines. I'm also not sure why we need to deal with https at this point (one more layer of overload). Can't we post the surveys file somewhere non-https and use base R's download.file(), which everyone has?

# e.g.
download.file("http://inundata.org/surveys.csv", destfile = "surveys.csv")

An example how to explain the factors

Factor labels are hard to understand. Instead, levels could be explained using a simple example of how to reorder them; see the code below:

days <- c("day1", "day5", "day10", "day1", "day5", "day10", "day10", "day1")

# By default levels are sorted alphabetically, so day10 comes second
fDay <- factor(days)
fDay
levels(fDay)

# With explicit levels, day10 comes last
fDayOrdered <- factor(days, levels = c("day1", "day5", "day10"))
fDayOrdered
levels(fDayOrdered)
