datacarpentry / semester-biology

Forkable teaching materials for course on working with data in R

Home Page: http://datacarpentry.org/semester-biology

License: Other

HTML 75.53% Ruby 0.01% Python 0.79% CSS 1.52% R 0.54% Jupyter Notebook 21.40% JavaScript 0.03% TeX 0.19%

teaching-materials biology data-science data-carpentry sql r spatial-data

semester-biology's Introduction

Data Carpentry for Biologists - Semester Course

Forkable teaching materials for a course on working with data in R.

This repository contains the complete teaching materials (excluding exams and answers to assignments) and website for a university-style and self-guided course teaching computational data skills to biologists. The course is designed to work primarily as a flipped classroom, with students reading and viewing videos before coming to class and then spending the bulk of class time working on exercises while the teacher answers questions and demos the concepts.

Helpful information is available regarding the structure and function of the course and website materials for customized development and delivery of the course.

We encourage collaborative development. This repository was used by @ethanwhite to teach a version of this course (Fall 2016) at the University of Florida. The course remains under active development. We welcome contributions to all aspects of the course/site and are especially seeking exercises and assignments for a range of disciplines. Key site and course materials are available as templates for contributions of new materials, and other materials that are specific to the course (e.g., the syllabus) are developed in a way that facilitates easy customization.

Here are some examples of courses using the infrastructure and material from this course:

Data Science for Biologists at Virginia Commonwealth University
Data Science for Agriculture at Oklahoma State University
Data Visualization for Plant Pathologists at the University of Florida
Data Science for SAFS at the University of Washington
Data Carpentry for Pharmacists at the University of Health Sciences and Pharmacy in St. Louis
R Programming for Biologists at Stonehill College
Data Carpentry for Ecologists at the University of Georgia
Introduction to Data Analysis for Aquatic Sciences at the University of Washington
Data Science in Omics Introduction at Oklahoma State University
Ecoinformatics at Kenyon College
Data Management for Biologists at the University of Minnesota
Introducing Agroecology: The Basics of Agroecology for Practitioners at the University of Florida
Data Science with R

Where is everything

Core teaching materials are stored in exercises/, lectures/, and materials/.

Class-specific materials are stored in the syllabus, schedule, and assignments/.

Most of the other folders and files support creating the course website using Jekyll.

How to contribute

We use standard GitHub flow, so fork the repository, add or change material, and submit a pull request.

The goal of making this course forkable is to facilitate collaboration on developing this kind of material for university courses. The central component of a flipped computing course is the exercises, so one of the primary forms of contribution will be adding exercises to the pool of exercises. Individual instructors can then select, from a rich pool, the exercises that best fit the topics, languages, and scientific domains they want to cover in the course.

There are lots of great resources for being introduced to the individual concepts being taught in courses like this. Our philosophy is to use and improve these external resources when available instead of creating new versions of the same content. In particular, we actively use Data Carpentry and Software Carpentry workshop materials. However, in cases where the necessary material doesn't exist elsewhere, it can certainly be added here.

Accessibility

New pull requests to this site are scanned using pa11y and pa11y-ci to ensure that additions to the site follow best practices for accessibility. If you discover any accessibility issues with the site please open an issue and we'll get them fixed.

Using Jekyll to build your own course website

Simple setup

The website is set up to be easy to run automatically through GitHub:

1. Fork or import the repository to https://github.com/yourusername/semester-biology.
2. Update # Setup information in _config.yml in the main directory for proper site rendering.
   - You must push this change to your repository to build and browse your forked version.
   - In a few minutes you should be able to see the site at: https://yourusername.github.io/semester-biology/
3. Edit any of the markdown (.md) files
4. Commit and push the changes
   - The changes should now be reflected on the website
5. If you want to use a custom domain name instead of github.io, follow GitHub's instructions for setting up a custom domain.

If you have any problems please let us know and we'll be happy to help.

Previewing changes locally

If you want to view your changes locally before pushing them to the live website, you'll need to set up Jekyll locally. GitHub provides a good introduction on how to do this.

If you have Jekyll properly installed, you can then run

bundle exec jekyll serve --baseurl ''

from the command line and navigate to http://localhost:4000/ in your browser to preview the current state of the website.

Creating new pages

If you want to add new exercises, lecture notes, etc. you do this by creating a markdown file in the appropriate directory. Each markdown file needs to start with some information that tells Jekyll what the page is. This is done using something called YAML, and the standard YAML for a new exercise would look like this:

---
layout: exercise
topic: Topic group of exercise
title: Name of exercise
language: [R, Python, SQL]
---

This is placed at the very beginning of the markdown file and provides information on what kind of content it is (e.g., exercise, page, etc.), the title of the page, and what language it applies to.
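As a rough illustration of what that header contains, the sketch below reads the front matter from the start of a markdown file and reports any missing keys. It is not how Jekyll itself parses YAML: parse_front_matter and missing_keys are hypothetical helpers, only flat "key: value" lines and inline lists are handled, and treating all four keys from the example as required is an assumption.

```python
def parse_front_matter(text):
    """Parse a simple front matter block into a dict.

    Only flat `key: value` lines and `[a, b]` inline lists are handled,
    which is enough for the exercise header shown above. This is a
    sketch, not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("file must start with a '---' line")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # closing delimiter reached
        if not line.strip():
            continue
        key, _, value = line.partition(":")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [R, Python, SQL]
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        meta[key.strip()] = value
    raise ValueError("front matter is missing its closing '---' line")


# Assumption: every key in the example header is required.
REQUIRED_KEYS = ("layout", "topic", "title", "language")


def missing_keys(meta):
    """Return the required keys that are absent from the parsed header."""
    return [key for key in REQUIRED_KEYS if key not in meta]


example = """---
layout: exercise
topic: Topic group of exercise
title: Name of exercise
language: [R, Python, SQL]
---

Exercise text goes here.
"""

meta = parse_front_matter(example)
print(meta)
print(missing_keys(meta))
```

A check like this could be run over new files before opening a pull request, so that a page with an incomplete header is caught before the site is built.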

The page should then be available at a url based on where the file is located and what the file name is. So if you created a new exercise in the exercises/ folder called my_awesome_exercise.md it would be located at:

Locally: http://localhost:4000/exercises/my_awesome_exercise

After pushing to GitHub: https://yourusername.github.io/semester-biology/exercises/my_awesome_exercise
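The mapping from file location to URL can be sketched as a small function. This assumes the pattern shown in the two example URLs above (the file path without its .md extension) and a local server started with an empty base URL; page_urls is a hypothetical helper, not part of the site.

```python
from pathlib import PurePosixPath


def page_urls(md_path, username, repo="semester-biology"):
    """Return the local and GitHub Pages URLs for a markdown file.

    md_path is relative to the repository root, e.g.
    'exercises/my_awesome_exercise.md'.
    """
    # Drop the .md extension; the rest of the path becomes the URL path.
    page = PurePosixPath(md_path).with_suffix("").as_posix()
    return {
        "local": f"http://localhost:4000/{page}",
        "github": f"https://{username}.github.io/{repo}/{page}",
    }


urls = page_urls("exercises/my_awesome_exercise.md", "yourusername")
print(urls["local"])
print(urls["github"])
```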

Dependencies

Building the site locally requires a local Ruby installation with 3 packages (gems):

jekyll
github-pages
jekyll-sitemap

For help with installation see:

Once you have installed Ruby and the jekyll gem, go to the root of the site repository and run:

bundle install

to install the rest of the dependencies.

Acknowledgements

Development of this material is funded by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through Grant GBMF4563 to Ethan White and the National Science Foundation as part of a CAREER award to Ethan White.

semester-biology's People

Contributors

Stargazers

Watchers

Forkers

brymz ethanwhite dmcglinn juefish palderman burkesquires atredennick oxpeter davharris renytysonmoore mvevans89 potterzot dylancraven grace89 dcsemester libyarlaylab aphanotus humberto-ortiz idi-bd2k ryanpeek josephaandreoli dlavrov aaronsheldon malynda mmesbahu dlizcano kristinariemer tvanlaar mikoontz bhargava-morampalli civcm cactusolo ory-data-science geauxdojang biol390 sr320 ericlind globalecologybiogeography trec-agroecology bluegenes kbjornen catherinehulshof rfurrow refurrow mchiapello rsh249 htnani deletunde zakher21 haithamsghaier amrcode1 jldimond ravipurama fei0810 oceanbird l-dao waughsh heathervanh fdbesanto2 barejaa aewhite100 kachieng gpsykes garezana r-tutorials shucez amitkumardeol timothylwarren atyre2 norberello albertl standardgalactic dncgst carzamora pyoelii smwindecker andrewmarx rahm0054 abdulrhmankamel punama paulesantos mtoqeerpk philarevalo quantmarineecolab eastonwhite zuber-bioinfo konradhoeffner gejielin jpomz ciksuriyati owensgl danieleweeks sathish-t beastyblacksmith cyrillemidingoyi bleds22e asntech

semester-biology's Issues

R curriculum discussion

Re: Basic-Python2.md exercise comments.
I include a mention of built-in functions in this exercise. I'm not sure that custom functions are required here, so maybe it would be best to introduce the idea in a later lesson.
We should introduce the various data classes (character, factor, numeric) and organizational structures (list, matrix, array, data frame) in an early lesson. Not sure where is the best place.

Create R versions of all Python code solutions

Outside of the repository for the moment

Split out code solutions for Python exercises into individual solutions files

Outside of the repository for the moment

Update website structure to reflect a single one-semester class

Create R versions of all Python lecture material

Update answer to SQL 5 to match change in dates in exercise

Create database code solutions

Outside of the repository for the moment

Add Tidy Data problems based on Data Carpentry messy data

Rather than have students use a database with some problems for several weeks and potentially internalize poor structure, we've switched over to using the Portal Project Teaching Database for the main SQL exercises. This means we need messy data for database structure/tidy data problems.

Data Carpentry has messy data designed for looking at this problem. See, e.g.,
http://datacarpentry.github.io/spreadsheet-ecology-lesson/01-format-data.html

We should convert the current database structure problems to something based on this data, or add new "Tidy Data" problems based on this data and tweak the existing database structure problems to have the students download the original database file that has all of the structural issues in it.

Add schedule to nav bar

Fix all_assignments.md links

Add an exercise that loops over multiple files

A nice example of looping that applies to a lot of folks' research is to loop over a bunch of different files, doing the same thing to each file. We should develop a problem around this. If there's something in the Genomics lesson space (see: http://datacarpentry.github.io/lessons/) that works it would be nice to add a genomics exercise or two.
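One possible shape for such an exercise, sketched in Python: apply the same function to every file in a directory and collect the results. The file names and contents here are made up for the demonstration, and count_rows stands in for whatever per-file analysis the exercise would actually ask for.

```python
import csv
import tempfile
from pathlib import Path


def count_rows(csv_path):
    """Return the number of data rows (excluding the header) in a CSV file."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)


def summarize_directory(directory, pattern="*.csv"):
    """Apply count_rows to every matching file, keyed by file name."""
    return {
        path.name: count_rows(path)
        for path in sorted(Path(directory).glob(pattern))
    }


# Demo with made-up data files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_rows in [("site_a.csv", 3), ("site_b.csv", 5)]:
        with open(Path(tmp) / name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["species", "count"])
            writer.writerows([["sp", i] for i in range(n_rows)])
    summary = summarize_directory(tmp)

print(summary)
```

Keeping the per-file work in its own function means students can test it on one file first and only then wrap it in the loop.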

Add text/graphic solutions to Graphing and Statistics exercises

(Related PRs - #54, #56, #57, #58)

How do we organize 'complex' solutions that include multiple image (.jpg) files? text and image files?

How do we organize output .csv files? name them by the exercise name? name them by the name given to the output file in the exercise?

Translate Lists-2 to Lists-9

(Subissue #1 )

There are nine Lists exercises, but only Lists-1 is used in an assignment. Are we going to use them?
Lists-2 and Lists-3 follow-up on Lists-1.
Lists-5 can introduce matrices.
Also, we should come up with an exercise to introduce lists. I use them when I have tables or lists of multiple data types that are entered from the script. Data with multiple data types in .csv get entered as data frames.

***I've been meaning to have this 'what to do with unused exercises' chat more broadly.

Remove 'commit' language from 'Sql-updating-records'

There is some language about committing that isn't appropriate when using this as a stand alone SQL problem.

Add links to output solutions

Add these links to:

The bottom of each problem in an assignment
Parenthetically after each exercise on the Exercises page

Code blocks

Reduce length of code blocks to match web translation.
Code chunks that take up a whole line should be placed in a code block.

Remove 'smart quotes' from .md

(Subissue #1)
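A minimal sketch of one way to do this replacement, assuming "smart quotes" means the four curly quotation marks listed below; straighten_quotes is a hypothetical helper and the issue does not say which tool was actually used.

```python
# Map each curly quotation mark to its plain ASCII equivalent.
SMART_QUOTES = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}
QUOTE_TABLE = str.maketrans(SMART_QUOTES)


def straighten_quotes(text):
    """Replace curly quotes with straight quotes, leaving other text alone."""
    return text.translate(QUOTE_TABLE)


print(straighten_quotes("\u201cprint\u201d it\u2019s"))
```

This matters most inside code, where a curly quote pasted from a word processor makes a string literal invalid.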

Consider revising schedule.md

While looking through the schedule.md, it struck me that we could organize the order of videos/readings to follow the order of the exercises. In my mind, the structure would look like:

Topic 1 [reading link] | [video link]
Topic 2 [reading link]
Topic 3 [video link]
etc.

Create R versions of all Python exercises

Create outcome solutions for all exercises

For both R and Python exercises we need a way to help both self-directed learners and university students check their work, but without giving them answers in code that they could just cut and paste for assignments. By showing them what the outcome of successfully running the code should look like, we both clarify the intent of the question and help students check their work. This also begins to introduce the benefits of testing.

The result here would be a new folder containing the "solutions" (i.e., what the output should look like) for each exercise, using the same naming structure as the associated exercise. Separate solutions will be necessary for R and Python since the details of the output won't be the same.
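The checking idea can be sketched as follows. The exercise-name-plus-language file name mirrors solution files mentioned elsewhere in this issue list (for example Expressions-variables-1-R.txt), but solution_name and matches_solution are hypothetical helpers, and ignoring trailing whitespace is a design assumption rather than a stated requirement.

```python
def solution_name(exercise, language):
    """Build a solution file name such as 'Expressions-variables-1-R.txt'."""
    return f"{exercise}-{language}.txt"


def matches_solution(actual_output, expected_output):
    """Compare two outputs line by line.

    Trailing whitespace on each line and blank lines at the start or end
    are ignored, so small formatting differences do not count as errors.
    """
    def normalize(text):
        return [line.rstrip() for line in text.strip().splitlines()]

    return normalize(actual_output) == normalize(expected_output)


print(solution_name("Expressions-variables-1", "R"))
print(matches_solution("4\n7 \n", "4\n7"))
print(matches_solution("4\n8", "4\n7"))
```

Because only the output is compared, students see whether their result is right without being handed code they could paste into an assignment.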

Output Pages ERROR '404: Page not found'

On both ethanwhite.github.io and my local host, the Output links return a 404: Page not found error.

I made sure that the links match the repo directory, so I'm not sure what's going on:
http://localhost:4000/solutions/Combining-the-basics-2-Python.txt
http://ethanwhite.github.io/solutions/Expressions-variables-1-R.txt

Convert database exercises to SQLite

For database exercises that don't involve Reports and Forms, convert these exercises to SQLite.

Develop start-up guide for new self-guided students

As per #203.

Update SQL answers to use Portal Teaching Database

I called a last minute audible and switched to using:
http://figshare.com/articles/Portal_Project_Teaching_Database/1314459

for the database. All of the solutions are still based on the full dataset on Ecological Archives, so we'll need to update the solutions.

Update urls to gh-pages links

(Subtask of #1 )

OLD: [Functions 5]({{ site.baseurl }}/exercises/Functions-5/)

NEW: [Functions 5]({{ site.baseurl }}/exercises/Functions-5-R/)
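A rewrite like the OLD to NEW change above could be scripted with a regular expression. This is a sketch under the assumption that every exercise link has exactly the form shown; add_language_suffix is a hypothetical helper, and it only recognizes links that already end in the requested suffix, so a link ending in a different language suffix would get a second suffix appended.

```python
import re

# Matches '{{ site.baseurl }}/exercises/<name>/)' and captures the three parts.
EXERCISE_LINK = re.compile(r"(\{\{ site\.baseurl \}\}/exercises/)([^/)\s]+)(/\))")


def add_language_suffix(markdown, language="R"):
    """Append '-<language>' to exercise names in links that lack it."""
    suffix = f"-{language}"

    def fix(match):
        prefix, name, closing = match.groups()
        if name.endswith(suffix):
            return match.group(0)  # already updated; leave unchanged
        return f"{prefix}{name}{suffix}{closing}"

    return EXERCISE_LINK.sub(fix, markdown)


old = "[Functions 5]({{ site.baseurl }}/exercises/Functions-5/)"
print(add_language_suffix(old))
```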

Translate Making-choices-4

(subissue #1)

I will skip this exercise for now because it is not in the assignments list, but I'd like to revisit it as it looks like a strong exercise.

Create database output solutions

As in #2 it is useful to show the students what they should be getting as output.

Move relevant pieces of Python and R lecture material into main Data Carpentry repositories

Add Advanced Python Exercises' Code and Output Solutions

Lists of problem tasks and indentation

Determine a standard use of bullet and numbered list. Edit to standard indentation.

Translate Loops-4 & Loops-5

(Subissue #1)

Loops-4 seems like a useful extension of Loops-2 (old name: 'Loops-3')
Not sure what Loops-5 is about.

list : data.frame :: vector : matrix

(Subissue of #12)

Ran into subtle differences for output from Functions-5 and Functions-6

Now that I have an explanation, I wonder if/where it fits in the curriculum.
https://twitter.com/ZackBrym/status/595651701945794561

It might fit along with our introduction of dplyr [Documentation Link].

Update file names

(mentioned in discussion for #1)
(related to #35)

Make sure file names / titles that were changed get updated in all files and urls.

Jekyll Formatting

(For down the road.)

I saw the newest Software Carpentry lesson (http://swcarpentry.github.io/web-data-python/) and it made me think about the way we format our exercises and how that will look when rendered by Jekyll.

I'd like to look through a couple examples to gather some thoughts and chat with you sometime. We can also look through some of the Data Carpentry lessons, though they look like they are still mostly 'generic' github wiki pages.

Change database reading to Data Carpentry material

Once datacarpentry/sql-ecology-lesson#34 or datacarpentry/sql-ecology-lesson#31 goes in we should update this material to use the Data Carpentry material for pre-class reading.

^b --> <sup>b</sup>

Strings-4-R.md vs Functions-3-R.md

Develop new syllabus as combination of ProgBio and Stat545

ProgBio: http://www.programmingforbiologists.org/
Stat545: https://stat545-ubc.github.io/

Advanced Topics

(sub issue #1; related to #46)

I have gone through the advanced course exercises and chosen a small(ish) set of topics and exercises I think would be worth considering for inclusion in the project. My idea is that these would provide an opportunity for classroom students to continue on from the course and have a direction for what is next if they are to continue pursuit of scientific programming and for at-home students to learn a handful of important, but a bit more complicated, skills.

After completing this list, #1 will be complete. We can also decide any or all are not worth it, and can be done with #1 now.

The list of exercises breaks into two categories.

New 'advanced' skills:
-'Higher Order Functions 2'
-'Regular Expressions 1'
-'Debugging'
-'Tests 1'
Challenging review:
-'Basic 1'
-'Basic 2'
-'Making Choices 4'
-'Scientific Python 3'

Add subtitles to assignments

I will complete this after all of the PRs with assignment changes are closed to avoid conflict.

Develop R-SQL Assignment and Exercises

@ethanwhite I'll need some direction before I can make progress on this. Is there a package to work from? What kind of exercises do you have in mind?

Create Unified Weecology style guide.

We could include commentary on code style in markdown, also.

Organizing page links by title always chooses Python exercises

(Subissue #85, Related PR #91)

assignments/index.md directs Jekyll to find a list of exercises and arrange an assignments page using the exercise titles. Python and R assignments share titles, which means that currently the R assignments list is populated by Python exercises. Will have to code in the language from yaml here or change the titles throughout.
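The collision can be illustrated outside of Jekyll with a small Python model. The exercise records and URLs below are hypothetical, and the real site does this lookup in its templates rather than in Python; the point is only that a lookup by title returns the first match, while a lookup by title and language does not collide.

```python
# Hypothetical exercise records; only the shared title matters here.
exercises = [
    {"title": "Functions 5", "language": "Python", "url": "/exercises/Functions-5-Python/"},
    {"title": "Functions 5", "language": "R", "url": "/exercises/Functions-5-R/"},
]


def find_by_title(items, title):
    """Return the first exercise with this title (language is ignored)."""
    return next(item for item in items if item["title"] == title)


def find_by_title_and_language(items, title, language):
    """Return the first exercise matching both the title and the language."""
    return next(
        item
        for item in items
        if item["title"] == title and item["language"] == language
    )


# The title-only lookup picks the Python exercise even for an R assignment.
print(find_by_title(exercises, "Functions 5")["language"])
print(find_by_title_and_language(exercises, "Functions 5", "R")["url"])
```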

Issues with formatting in assignments

Something got mixed up a little in the formatting of assignments. Compare:

I think this is happening because the assignment name is capitalized. See the commit message for my solution to this:
ethanwhite/progbio@79c5bbc

This means that the assignments need to start with lower case letters and the exercises start with capital letters. Yes, it is awful.

Descriptive Titles

(subissue #1)

'Graphing 3' used to be called 'Graphing adult size vs newborn size'. Simplifying the descriptive title to a number made me wonder if all of the problems should have a descriptive title. The descriptive title would identify the new problem/solution presented in the exercise. One of the strings exercises might get a descriptive title of 'Basic stringr functions'. A making choices exercise might get a descriptive title of 'Using mathematical operators' or 'if else statements'.

I think it makes sense to organize the directory using the current Name-X 'titles' and add a 'descriptive title' or 'subtitle' to the exercise yaml.

print x

print(x)

`dplyr` module

(Reference PRs #49, #60)

We need a module set that introduces dplyr. I like the Dr. Granger - shrub carbon problem set for this. The order would be 'Scientific 0', 'Combining Basics', 'Statistics 2'.
