tidyverse / datascience-box

Data Science Course in a Box

Home Page: https://datasciencebox.org

License: Other

CSS 10.31% R 1.65% TeX 0.10% HTML 35.69% JavaScript 51.65% SCSS 0.59%
rstats r education teaching data-science

datascience-box's Introduction

Data Science Course in a Box

Data Science in a Box contains the materials required to teach (or learn from) an introductory data science course using R, all of which are freely available and open source. They include course materials such as slide decks, homework assignments, guided labs, sample exams, and a final project assignment, as well as materials for instructors such as pedagogical tips and information on computing infrastructure, technology stack, and course logistics.

See datasciencebox.org for everything you need to know about the project!

Note that all materials are released under the Creative Commons Attribution-ShareAlike 4.0 International license.

Questions, bugs, feature requests

You can file an issue to get help, report a bug, or make a feature request.

Before opening a new issue, be sure to search issues and pull requests to make sure the bug hasn't been reported and/or already fixed in the development version. By default, the search will be pre-populated with is:issue is:open. You can edit the qualifiers (e.g. is:pr, is:closed) as needed. For example, you'd simply remove is:open to search all issues in the repo, open or closed.

If your issue involves R code, please make a minimal reproducible example using the reprex package. If you haven't heard of or used reprex before, you're in for a treat! Seriously, reprex will make all of your R-question-asking endeavors easier (which is a pretty insane ROI for the five to ten minutes it'll take you to learn what it's all about). For additional reprex pointers, check out the Get help! section of the tidyverse site.

Code of Conduct

Please note that the datascience-box project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

datascience-box's People

Contributors

actions-user, bboti86, davidkane9, debbieyuster, gl-eb, jananiravi, jarvisc1, jonthegeek, kcarnold, lucymcgowan, magic-lantern, mine-cetinkaya-rundel, naclomi, pat-s, spcanelon, staceyhancock, stats-tgeorge, vcannataro

datascience-box's Issues

Lab 04: Wrangling spatial data

Finish Denny's / La Quinta analysis, learning goals:

  1. joins
  2. using a custom function (not writing one)
  3. use dplyr single table verbs with less scaffolding than before
  4. actually address the mitch hedberg joke quantitatively

Course website?

Should this repo include a sample course website as well? If so, should that website be made in blogdown? One advantage of using a blogdown website is that Rmd files for slides, assignments, etc. could all be rendered with build_site(). However, placing those files into a blogdown structure seems like it would nest them too deep in the directory tree.

Simpson's paradox images

u1_d05-confounding-simpsons-paradox

  • Better images for cereal study and its three explanations
  • Better visual for Simpson's paradox

Better styling for slides

The styling of slides could be better. This issue should be revisited after a batch of slides have been completed, and if any improvement is needed, changes should be made to slides.css.

LICENSE

Currently using the same license as tidyverse, which seems appropriate. Confirm that this is the right choice.

Create Exam 01

Focusing on data wrangling and viz, using the NYC flights data

issues knitting Rmd of labs

I am working with your labs and I just have a question.

I can only knit the Rmd of a lab if I first run install.packages("tufte") and library("tufte"). Is there a way to knit without having to install the package first?

BTW - I am new to all of this - just trying to modernize my stats course - and I think your material is excellent.

(if this is not the proper way to ask these questions - please let me know a better method)
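
A package has to be installed once on a machine before a document that uses it can knit, so one option is to check for missing packages up front and tell the user what to install. The following is a minimal sketch in base R, assuming the labs need the tufte package as described above. `missing_packages()` is a hypothetical helper and is not part of the course materials.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages the lab is assumed to need (tufte is named in the question above).
needed <- c("tufte")
to_install <- missing_packages(needed)

# Print a one-time installation hint instead of failing halfway through knitting.
if (length(to_install) > 0) {
  message(
    "Run once before knitting: install.packages(c(",
    paste0('"', to_install, '"', collapse = ", "),
    "))"
  )
}
```

The check only reports what is missing; it does not install anything on its own, which leaves that decision to the person running the lab.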

Make it easy to browse the lecture slides

Currently, if I want to browse the slides for Lecture 2 (or whatever), I need to generate them myself. It is cool that you make this possible (although, to be honest, I have not tried) but it would also be nice if there were a PDF (or other version) in the directory for each lecture that one could browse.

HW 01: Data wrangling and visualization

Dataset: Pokemon Go

Learning goals:

  • Load data
  • Recreate a plot based on an image (require both filtering with dplyr and plotting with ggplot2)
  • Use dplyr to calculate conditional probabilities (group_by -> summarise)
  • Make conclusions in context of the data
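
The group_by -> summarise pattern for conditional probabilities could look roughly like the sketch below. The data frame is a made-up stand-in, not the actual Pokemon Go dataset, and the column names `type` and `evolved` are hypothetical.

```r
library(dplyr)

# Made-up stand-in data; the real homework dataset and its columns may differ.
pokemon <- tibble(
  type    = c("water", "water", "water", "fire", "fire", "grass"),
  evolved = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# P(evolved | type): within each type, the proportion of rows that evolved.
cond_probs <- pokemon %>%
  group_by(type) %>%
  summarise(p_evolved = mean(evolved))

print(cond_probs)
```

Taking the mean of a logical column within each group gives the conditional proportion directly, which keeps the calculation to a single summarise step.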

Website glitch: footer

When you reload the homepage or navigate to any page, the "Built with..." text in the bottom left corner jumps up and down.

Lab 08: Bootstrapping

Learning goals:

  • Build intuition for estimation via bootstrapping
  • Implement bootstrapping with infer
  • Stretch goal: Build the bootstrap distribution "manually" with for loop or purrr?
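
For the stretch goal, a "manual" bootstrap with a for loop might look like the sketch below. It uses base R only and a simulated sample, since the lab's dataset is not specified here; the sample, the number of replicates, and the choice of the mean as the statistic are all assumptions made for illustration.

```r
set.seed(123)

# Simulated sample standing in for the lab's data (assumption).
x <- rnorm(50, mean = 10, sd = 2)

n_reps <- 1000
boot_means <- numeric(n_reps)

for (i in seq_len(n_reps)) {
  # Resample the observed data with replacement, keeping the same sample size.
  resample <- sample(x, size = length(x), replace = TRUE)
  # Record the statistic of interest for this bootstrap sample.
  boot_means[i] <- mean(resample)
}

# Percentile interval taken from the bootstrap distribution.
ci <- quantile(boot_means, probs = c(0.025, 0.975))
print(ci)
```

Writing the loop out once makes the resampling step visible before a package such as infer takes care of it.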

Pedagogy write up

One or two pages on structuring teams and team exercises + projects.

#35 is also relevant to reference here.

New lab: Shiny

Build a Shiny app or flexdashboard. Not sure which yet.

Write Lab 01 - Hello R!

Should include intro to RStudio Cloud, working with git within Cloud, tidyverse. Use datasauRus for fun.

Data Science Box Onboarding Review

From @vmellison

Hello Section

Comments:

Some grammatical errors/typos

Question/Comments:

  • For instance, your philosophy for the course begins with "Assuming you like chocolate and strawberries...". Let's suppose I don't like strawberries, but I know how to bake cakes (as the instructor). Can you comment on what these "strawberries" might be? (Like what is your philosophy for why you chose these particular cakes? Or, why choose these topics in data science, but not others?)
  • Can you comment on... "How can the teacher best prepare for this class?"
    • How much of this course requires "backend" teacher knowledge?
    • I saw that there is a book, but it looks like it is for the computing section. If there is not a book with topic content, what resources/books/articles should the instructor read up on in order to have a rooted knowledgebase on what is going to be taught in this course?
  • Also... "How does one go about assessing students in the initial phases of "cake baking"?"
    • I think once you get down to the "underlying schematics" part of a course (at the end), more instructors would be more likely to agree on what the learning objectives should be and what students should be assessed on. However, I'm wondering if the initial part of the course where you're showing students a "specific strawberry cake" it may be a little less clear about what to assess the students on and what the learning outcomes should be.

Content Section

I'd also be interested in the following:

  • What parts of the course structure can you alter? Which should you not alter?
  • What are the learning objectives that students should understand from the slides? How do these learning objectives differ/relate/connect to/from what's in the other course materials?

Infrastructure

  • This part seems pretty straightforward and helpful.

Move data files into one high level folder

There should be a /data folder in the project root directory so that copies of data files are not resaved if using the same dataset for multiple lectures, assignments, etc.

Generic advice on how to handle Day 1 with RStudio Cloud

The section on RStudio is excellent! But I am confused about exactly how you handle Day 1, and how you would recommend that others (without access to Duke's infrastructure) handle Day 1. In particular, how do you get to a plot within 10 minutes? What are the precise steps?

I imagine that someone just using public RStudio cloud services would need to have each student sign up for RStudio cloud. Is that correct? Do you assume (require?) that each student come with a laptop to Day 1? Or can they sign up from their phones easily?

Or maybe, on Day 1, they don't need to have their own account? You can just send them the magic link, they click it and, even without an account they can start to play with things?

Add sample syllabus

A sample syllabus (with policies, grading details, etc.) can be helpful especially for new teachers. It should be clearly noted that this is just meant to be inspirational/exemplary, not a requirement for teaching this content.

Update usage of FiveThirtyEight data

The fivethirtyeight package has been updated on CRAN.

  • check that majors data is corrected
  • if so, update the slides to use the data directly from the package

What's the best place for coding style?

Also, how long is it? It's < one "lecture", what best could it be combined with? Or would adding exercises to bulk up be better?

Rethink hands-on content (or lack thereof) and placement of p1_d08-coding-style

Start a data package

  • Start a data package for all datasets used in this project
  • Consider any read_csv call in the slides to load the data from the package as long as the point isn't to show how to load a CSV file

(This issue touches multiple projects)

Lab 12: Web scraping

Learning goals:

  • Use rvest to scrape from a semi-structured HTML table
  • Use purrr to automate scraping from many sites structured similarly.
  • Use EDA to analyze the data
  • Stretch goal: Use modeling to further analyze the data

Write Lab 05: Simpson's paradox

Use tidyverse tools learned so far to explore an important statistical concept - Simpson's paradox

  • bivariate visualization to motivate developing a hypothesis about relationship between two variables
  • multivariate visualization (via faceting) to motivate debunking the hypothesis from earlier
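
The two steps above can be illustrated numerically with a small made-up dataset. This sketch uses base R and fitted slopes rather than the ggplot2 faceting the lab has in mind, and the data are invented purely to show the reversal.

```r
# Made-up data: two groups, each with a perfectly negative trend.
toy <- data.frame(
  group = rep(c("A", "B"), each = 5),
  x = 1:10,
  y = c(10 - 1:5, 20 - 6:10)
)

# Slope ignoring group (the bivariate view).
overall_slope <- coef(lm(y ~ x, data = toy))[["x"]]

# Slope within each group (the view that faceting by group would reveal).
group_slopes <- sapply(split(toy, toy$group), function(d) {
  coef(lm(y ~ x, data = d))[["x"]]
})

print(overall_slope)
print(group_slopes)
```

The overall slope is positive while the slope within each group is negative, which is the reversal that the second visualization is meant to expose.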

Articulate learning objectives

I'm keen to adapt these materials to fit in a quarter-long (10 week) class, and learning objectives at two levels would be useful:

  1. Course level objectives: to help understand the scope of the class, to help get through our local course proposal system, and to guide any adaptation so that objectives are still met (or explicitly altered or dropped).

  2. Module level (or maybe even lesson level?) objectives: to help understand how the pieces build on each other, as a guide on where lessons might be skipped, and as a guide for any swapping out of data or examples to ensure the original objectives are still met.

Are these articulated somewhere that I just haven't found yet?

Peer Evaluation for Final Projects

To help make sure the final projects are on track and that students have experience providing feedback to each other, I wonder if it would be useful to include a component to the final project where another group goes through your project and does some preliminary evaluation, say, a week before presentations.

This could help mitigate procrastination, give feedback to groups before presentation (and review the grading requirements for them), and also get students used to doing peer evaluations and giving feedback.

Lab 09: Hypothesis testing

Learning goals:

  • Build intuition for hypothesis testing
  • Application: A/B testing?
  • Use infer for implementation

Streamline package loading in slides

Currently some make it explicit, some do it in the background (e.g. library(tidyverse)).

Consider streamlining this, maybe all with logos (if the packages have one)?

This will touch many sets of slides.

Where to host csvs?

Some of the materials use data stored in the dsbox package. Some use csvs or other files not in that package, specifically for students to learn to get a file off the web and use it in their analysis. Where should these files be stored?

multivariate vs multivariable 😬

This is an awesome resource, thank you! I was looking through the data + vis slides and noticed a misuse of "multivariate". This is sort of semantics, but I believe the line is actually referring to a multivariable data analysis. Statistics has annoying terminology here, because this obviously doesn't fit nicely with univariate / bivariate, but "multivariate" refers to multiple outcomes, not multiple variables. I'm happy to submit a PR, but wanted to check if there is a different wording that would be preferred.

Multivariate data analysis - relationship between many variables at once, usually focusing on the relationship between two while conditioning for others

https://github.com/rstudio-education/datascience-box/blob/61fafe5d47132dedd9cee56fa07c116279545e18/slides/u1_d02-data-and-viz/u1_d02-data-and-viz.Rmd#L418

Create Exam 02

Focusing on modeling and inference

Dataset -- not sure as of now

Resources for data science education

I've posted some on the README, and I would love to list more directly relevant resources: for example, if an instructor wants to teach data science, what should be on their reading list? This is an open call to reply to this issue with more resources, and I'll move them to the README list.
