zdelrosario / data-science-curriculum

Home Page: https://zdelrosario.github.io/data-science-curriculum/index.html

License: Creative Commons Attribution Share Alike 4.0 International

Makefile 0.99% HTML 93.34% Shell 0.40% Python 0.46% R 1.98% TeX 2.83%

data-science-curriculum's Issues

Overall Feedback

Here are my overall comments with some suggestions. I've put it all in one issue, because some are overlapping. Perhaps you could convert the ones you're creating into issues using the new task list feature if that's better for you!

Overall

  • I really appreciate that all your material is open under an MIT license! Thank you for doing this!

  • Though I did not go through every exercise, the quality of your material is quite high overall.

  • Having things in a GitHub repo is nice, but it might make your content more accessible and discoverable if it were set up as a Bookdown (https://bookdown.org/yihui/bookdown/) or Jupyter Book (https://jupyterbook.org/intro.html).

  • I really think this work would benefit from a better description of what it is you're offering to the community. Frankly, paper.md sells the content in your repository well short!

    • This is related to the "Does the paper describe the learning materials and sequence?" point in the checklist

    • "Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?" --> this is also missing from the paper

    • "Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?" --> Also missing from the paper

  • You could provide a schematic or a sketch of the key "modules" (communication, data, model, reproducibility, setup, statistics, visualizations) and a suggested order for going through them. You could also bring in some of the pedagogical reasons (mentioned in your repo) for your suggestions, such as blocked practice.

  • If you have the list of learning outcomes already prepared and ready, could you consolidate them into a single file so instructors could get a sense of what your material is all about without opening every file?

Add test for duplicate chunk names

We don't presently have a test (i.e. under ./tests/test_exercises.R) to check for duplicate chunk names. Presently I do this locally by (programmatically) running all exercises with NCmisc::list.functions.in.file() (see line), but that's a heavyweight solution.

Adding a test for duplicate chunk names (and 'runs without errors') would be a useful enhancement!
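
A lightweight version of this check could scan the chunk headers directly instead of running every exercise. The sketch below is only an illustration in Python (the build already calls ./prepend.py), not the R test under ./tests/test_exercises.R. The header pattern, the function name `find_duplicate_chunk_names`, and the sample text are all assumptions, and it treats the first unnamed token after the engine as the chunk label.

```python
import re
from collections import Counter

# Assumed header shape: three or more backticks, then {r chunk-name, option=value}.
# Unnamed chunks such as {r} or {r, echo=FALSE} do not match and are skipped.
CHUNK_HEADER = re.compile(
    r"^\s*`{3,}\s*\{\s*[A-Za-z0-9_]+[ ,]\s*([^,}=\s]+)\s*[,}]"
)

def find_duplicate_chunk_names(rmd_text):
    """Return the sorted chunk names that appear more than once in rmd_text."""
    names = []
    for line in rmd_text.splitlines():
        match = CHUNK_HEADER.match(line)
        if match:
            names.append(match.group(1))
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)

# Hypothetical sample document; the fence is built in code to keep this block readable.
fence = "`" * 3
sample = "\n".join([
    fence + "{r setup, include=FALSE}",
    "library(tidyverse)",
    fence,
    fence + "{r q1-task}",
    fence,
    fence + "{r q1-task, eval=FALSE}",
    fence,
])
print(find_duplicate_chunk_names(sample))  # ['q1-task']
```

Because this only reads text, it would not cover the 'runs without errors' part of the request; that still needs the exercises to be executed.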

Make Command not Found

Hi @zdelrosario!

First of all, thanks for this course! It's really cool and I'm excited to really dive into it. That said, I'm hitting a snag that I'm hoping you can help with.

I can't get the makefile to run because it says the command make is not found (I tried this both in Git Bash and the RStudio terminal). I've been working through some of the exercises unmade but I'd rather have them like they're supposed to be!

I'm running on Windows, and I'm attaching screenshots of the errors so you can see them.

make command not found
exercises command not found

License

For the content, the JOSE review checklist indicates that a Creative Commons license should be used. MIT is more appropriate for code.

Usage

I think learners could use more support to get going. I wasn't sure whether I'm supposed to be working within the Rmd files or if make should have generated html files that I worked with. When I didn't find html files, I worried that something was wrong with the Makefile.

The links in the README and the files in the paper refer to the raw versions of the files in exercises/ rather than exercises_sequenced/. I think it would be better, from the learner's perspective, to refer to things in exercises_sequenced/, since that's the directory they are supposed to be working in.

Perhaps the repository could include an Rproj file so that learners can open RStudio in the correct working directory?

Improve scaffolding for c07

The connection between probability and expectation is not very clear; could scaffold better. Consider adding notes to e-stat02.

Sequence

Sequence.md

  • It's nice that you have the sequence, but it would be better for learners if the list of files were linked. (This comment should be ignored if you choose to create a bookdown)

Add documentation on use cases

Should document how to use this repo for different use cases; for instance, when to use the template (setting up with GitHub action infrastructure) and when to fork (to make a PR).

Missing solution for e-stat03-descriptive

Currently, there are two task blocks under the Observations section - guessing the second should be a solution block?

<!-- task-begin -->
- For what values of `slope` is the correlation positive?
- For what values of `slope` is the correlation negative?
<!-- task-end -->
<!-- task-begin -->
- For what values of `slope` is the correlation positive?
- For what values of `slope` is the correlation negative?
<!-- task-end -->
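
Slips like this one could be flagged automatically. The sketch below assumes the `<!-- task-begin -->` / `<!-- task-end -->` markers shown above delimit task blocks. The function name `find_repeated_task_blocks` is hypothetical and not part of the repository.

```python
import re

# Capture everything between a task-begin marker and the next task-end marker.
TASK_BLOCK = re.compile(
    r"<!-- task-begin -->\n(.*?)<!-- task-end -->", re.DOTALL
)

def find_repeated_task_blocks(text):
    """Return the body of each task block that is identical to the next task block."""
    bodies = [m.group(1).strip() for m in TASK_BLOCK.finditer(text)]
    return [a for a, b in zip(bodies, bodies[1:]) if a == b]

sample = """<!-- task-begin -->
- For what values of `slope` is the correlation positive?
- For what values of `slope` is the correlation negative?
<!-- task-end -->
<!-- task-begin -->
- For what values of `slope` is the correlation positive?
- For what values of `slope` is the correlation negative?
<!-- task-end -->
"""
print(len(find_repeated_task_blocks(sample)))  # 1
```

A repeated body only suggests that a solution block was intended; someone still has to confirm it and write the solution.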

Statement of Need

The statement of need could be clearer in the README and in the paper. There are active learning materials out there, e.g., The Carpentries and Codecademy. What is the particular niche that your materials are filling?

I'm not sure that the "Notably, if teachers held ..." sentence is constructive. Surely the published books, blog posts, and videos have instructional value even if they aren't examples of active learning materials?

Dead link in setup page

Hi! Running along at home with a friend, found a dead link. On this page: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.md

The "download the source" link is dead for me: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.Rmd

Screenshot from 2020-07-08 19-10-52

Let me know if you don't want issues raised, figured it could be helpful. No pressure to address them, for example this can be worked around easily with cloning.

Day 29 Deprecated Syntax

Day 29 (data 09) q4-setup-count has deprecated code:

rowAny <- function(x) rowSums(x) > 0
countna <- function(df, vars_lagged) {
  df %>%
    filter(rowAny(across(vars_lagged, is.na))) %>%
    dim %>%
    .[[1]]
}

countna(df_q3, c("region"))

This works now, but updated syntax would probably be ideal

Math not rendering

I'm on RStudio for Ubuntu, which may make a difference. I noticed this first in e-stat02-probability, although that may not be where it first pops up. When knitting the R documents to view the pretty version, math looks like this:
Screenshot from 2020-09-09 20-34-36

It does render in the edit view, oddly enough:
Screenshot from 2020-09-09 20-35-18

But that doesn't apply for values in the middle of a paragraph, so those can be hard to match up with the equations.

Installation instructions

I would remove the $ prompt from before make in the README.md. A learner may make the mistake of entering "$ make" after the prompt. Actually, I wonder if I'm supposed to run make main to build the materials and then make challenges to build the challenge problems. The latter gave me an error...

$ make challenges
cd ../data-science-challenges/challenges; make
/bin/sh: line 0: cd: ../data-science-challenges/challenges: No such file or directory
cd exercises; make
make[2]: Nothing to be done for `all'.
./prepend.py
cp -rf exercises/images exercises_sequenced/.
cp -rf exercises/data/tiny.csv exercises_sequenced/data/.
cp -f ../data-science-challenges/challenges/*-assignment.Rmd challenges/.
cp: ../data-science-challenges/challenges/*-assignment.Rmd: No such file or directory

Is requiring learners to build the sequenced exercises and challenges even necessary? I can't imagine the files are so big that you couldn't include them in the repository. Requiring learners to run make assumes a level of sophistication that many may not have, and it could exclude Windows users who do not have the Linux subsystem installed and possibly Mac users who don't have the Xcode command line tools installed. It also requires learners to have Python installed. Instructions are provided for these steps, but I feel that they are unnecessary for someone who is trying to get started with RStudio.
