The data-science-curriculum from zdelrosario

data-science-curriculum's Issues

Overall Feedback

Here are my overall comments with some suggestions. I've put it all in one issue, because some are overlapping. Perhaps you could convert the ones you're creating into issues using the new task list feature if that's better for you!

Overall

I really appreciate that all your material is open under an MIT license! Thank you for doing this!
Though I did not go through every exercise, the quality of your material is quite high overall.
Having things in a GitHub repo is nice, but your content might be more accessible and discoverable if it were set up as a Bookdown (https://bookdown.org/yihui/bookdown/) or Jupyter Book (https://jupyterbook.org/intro.html).
I really think this work would benefit from a better description of what it is you're offering to the community. Frankly, paper.md sells the content in your repository well short!
- This is related to the "Does the paper describe the learning materials and sequence?" point in the checklist
- "Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?" --> this is also missing from the paper
- "Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?" --> Also missing from the paper
You could provide a schematic or a sketch of the key "modules" (communication, data, model, reproducibility, setup, statistics, visualizations) and a suggested ordering of going through them. You could also bring in some pedagogical reasons (mentioned in your repo) for your suggestions such as blocked practice.
If you have the list of learning outcomes already prepared and ready, could you consolidate them into a single file so instructors could get a sense of what your material is all about without opening every file?

Add test for duplicate chunk names

We don't presently have a test (i.e. under ./tests/test_exercises.R) to check for duplicate chunk names. Presently I do this locally by (programmatically) running all exercises with NCmisc::list.functions.in.file() (see line), but that's a heavyweight solution.

Adding a test for duplicate chunk names (and 'runs without errors') would be a useful enhancement!
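A lighter-weight check could scan the chunk headers directly instead of running every exercise. The following is only a minimal sketch: `duplicate_chunk_labels()` is a hypothetical helper, not something in the repository, and it assumes chunk headers of the form `{r label, option = value}` with a comma between the label and any options. Unlabeled chunks are skipped.

```r
# Hypothetical helper (not part of the repository): return chunk labels that
# appear more than once in the lines of an Rmd file.
duplicate_chunk_labels <- function(lines) {
  # Keep only R chunk headers, e.g. a fence followed by {r label, opt = value}
  headers <- grep("^\\s*```+\\s*\\{r[ ,}]", lines, value = TRUE)
  # Text between "{r" and the closing brace
  inner <- sub("^\\s*```+\\s*\\{r\\s*,?\\s*([^}]*)\\}.*$", "\\1", headers)
  # The first comma-separated token is the label, unless it is an option ("=")
  first <- trimws(sub(",.*$", "", inner))
  labels <- first[nzchar(first) & !grepl("=", first, fixed = TRUE)]
  unique(labels[duplicated(labels)])
}

# Small in-memory example; the fence is built with strrep() for readability
fence <- strrep("`", 3)
example_rmd <- c(
  paste0(fence, "{r setup, include=FALSE}"),
  "library(tidyverse)",
  fence,
  paste0(fence, "{r q1-task}"),
  fence,
  paste0(fence, "{r q1-task, eval=FALSE}"),
  fence,
  paste0(fence, "{r, echo=FALSE}"),
  fence
)
duplicate_chunk_labels(example_rmd)  # returns "q1-task"
```

For a real file, the lines could come from `readLines()` on each Rmd under exercises/, and a test could assert that the returned vector has length zero.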

Make Command not Found

Hi @zdelrosario!

First of all, thanks for this course! It's really cool and I'm excited to really dive into it. That said, I'm hitting a snag that I'm hoping you can help with.

I can't get the makefile to run because it says the command make is not found (I tried this both in Git Bash and the RStudio terminal). I've been working through some of the exercises unmade but I'd rather have them like they're supposed to be!

I'm running on Windows, and I'm attaching screenshots of the errors so you can see them.

License

For the content, the JOSE review checklist indicates that a Creative Commons license should be used. MIT is more appropriate for code.

Usage

I think learners could use more support to get going. I wasn't sure whether I'm supposed to be working within the Rmd files or if make should have generated html files that I worked with. When I didn't find html files, I worried that something was wrong with the Makefile.

The links in the README and the files in the paper refer to the raw versions of the files in exercises/ rather than exercises_sequenced/. I think it would be better, from the learner's perspective, to refer to things in exercises_sequenced/, since that's the directory they are supposed to be working in.

Perhaps the repository could include an .Rproj file so that learners can open RStudio in the correct working directory?

Improve scaffolding for c07

Connection between probability and expectation is not very clear; could scaffold better. Consider adding notes to e-stat02

Sequence

Sequence.md

It's nice that you have the sequence, but it would be better for learners if the list of files were linked. (This comment should be ignored if you choose to create a bookdown)

Pedagogical Design

From the paper,

"The full set of desired learning outcomes is documented in the project repository"

Where are the learning outcomes in the repository? Consider a direct link into the paper?
Add a citation for flipped classroom
- Perhaps this one? https://peer.asee.org/the-flipped-classroom-a-survey-of-the-research; any one will do.

Add documentation on use cases

Should document how to use this repo for different use cases; for instance, when to use the template (setting up with GitHub action infrastructure) and when to fork (to make a PR).

Missing solution for e-stat03-descriptive

Currently, there are two task blocks under the Observations section - guessing the second should be a solution block?

data-science-curriculum/exercises/e-stat03-descriptive-master.Rmd

Lines 405 to 412 in e47f4a3

    <!-- task-begin -->
    - For what values of `slope` is the correlation positive?
    - For what values of `slope` is the correlation negative?
    <!-- task-end -->
    <!-- task-begin -->
    - For what values of `slope` is the correlation positive?
    - For what values of `slope` is the correlation negative?
    <!-- task-end -->

Statement of Need

The statement of need could be clearer in the README and in the paper. There are active learning materials out there, e.g., The Carpentries and Codecademy. What is the particular niche that your materials are filling?

I'm not sure that the "Notably, if teachers held ..." sentence is constructive. Surely the published books, blog posts, and videos have instructional value even if they aren't examples of active learning materials?

Dead link in setup page

Hi! Running along at home with a friend, found a dead link. On this page: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.md

The "download the source" link is dead for me: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.Rmd

Let me know if you don't want issues raised; I figured it could be helpful. No pressure to address them. This one, for example, can be worked around easily by cloning.

Cut new release after feedback is implemented?

Perhaps when all the feedback is implemented, it might be worth creating a new release?

There have been some commits since Oct 21, 2020, so it might be worth cutting a new release.

Day 29 Deprecated Syntax

Day 29 (data 09) q4-setup-count has deprecated code:

rowAny <- function(x) rowSums(x) > 0
countna <- function(df, vars_lagged) {
  df %>%
    filter(rowAny(across(vars_lagged, is.na))) %>%
    dim %>%
    .[[1]]
}

countna(df_q3, c("region"))

This works now, but updated syntax would probably be ideal.
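One possible update is sketched below. The report does not say which deprecation warning appears, so this assumes it is the tidyselect warning about passing an external character vector to a selection, which `all_of()` addresses. It also swaps the `rowAny(across(...))` pattern for `if_any()`, which needs dplyr 1.0.4 or later. The data frame `df_example` is a made-up stand-in for `df_q3`.

```r
library(dplyr)

# Count the rows that have an NA in any of the named columns.
# all_of() makes the external character vector explicit;
# if_any() replaces the rowAny(across(...)) helper inside filter().
countna <- function(df, vars_lagged) {
  df %>%
    filter(if_any(all_of(vars_lagged), is.na)) %>%
    nrow()
}

# Made-up stand-in for df_q3 (assumption, not the exercise data)
df_example <- data.frame(
  region = c("a", NA, "b", NA),
  value = c(1, 2, NA, 4)
)

countna(df_example, c("region"))           # returns 2
countna(df_example, c("region", "value"))  # returns 3
```

Using `nrow()` instead of `dim %>% .[[1]]` gives the same count and reads more directly.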

Math not rendering

I'm on RStudio on Ubuntu, which may make a difference. I noticed this first in e-stat02-probability, although that may not be where it first pops up. When knitting the R documents to view the pretty version, math looks like this:

It does render in the edit view, oddly enough:

But that doesn't apply for values in the middle of a paragraph, so those can be hard to match up with the equations.

Transition away from Travis CI to GitHub Actions

Given that Travis CI is going extremely commercial, transition CI tools to GitHub Actions.

Installation instructions

I would remove the $ prompt from before make in the README.md. A learner may make the mistake of entering "$ make" after the prompt. Actually, I wonder if I'm supposed to run make main to build the materials and then make challenges to build the challenge problems. The latter gave me an error...

$ make challenges
cd ../data-science-challenges/challenges; make
/bin/sh: line 0: cd: ../data-science-challenges/challenges: No such file or directory
cd exercises; make
make[2]: Nothing to be done for `all'.
./prepend.py
cp -rf exercises/images exercises_sequenced/.
cp -rf exercises/data/tiny.csv exercises_sequenced/data/.
cp -f ../data-science-challenges/challenges/*-assignment.Rmd challenges/.
cp: ../data-science-challenges/challenges/*-assignment.Rmd: No such file or directory

Is requiring learners to build the sequenced exercises and challenges even necessary? I can't imagine the files are so big that you couldn't include them in the repository. Requiring learners to run make assumes a level of sophistication that many might not have, and it could exclude Windows users who do not have the Linux subsystem installed and possibly Mac users who don't have the Xcode command line tools installed. It also requires learners to have Python installed. Instructions are provided for these steps, but I feel that it's unnecessary for someone who is trying to get started with RStudio.
