zdelrosario / data-science-curriculum
Home Page: https://zdelrosario.github.io/data-science-curriculum/index.html
License: Creative Commons Attribution Share Alike 4.0 International
Here are my overall comments with some suggestions. I've put it all in one issue, because some are overlapping. Perhaps you could convert the ones you're creating into issues using the new task list feature if that's better for you!
I really appreciate that all your material is open under an MIT license! Thank you for doing this!
Though I did not go through every exercise, the quality of your material is quite high overall.
Having things in a GitHub repo is nice, but it might make your content more accessible and discoverable if it were set up as a Bookdown (https://bookdown.org/yihui/bookdown/) or Jupyter Book (https://jupyterbook.org/intro.html) site.
I really think this work would benefit from a better description of what it is you're offering to the community. Frankly, paper.md sells the content in your repository well, well short!
This is related to the "Does the paper describe the learning materials and sequence?" point in the checklist
"Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?" --> this is also missing from the paper
"Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?" --> Also missing from the paper
You could provide a schematic or a sketch of the key "modules" (communication, data, model, reproducibility, setup, statistics, visualizations) and a suggested ordering of going through them. You could also bring in some pedagogical reasons (mentioned in your repo) for your suggestions such as blocked practice.
If you have the list of learning outcomes already prepared and ready, could you consolidate them into a single file so instructors could get a sense of what your material is all about without opening every file?
We don't presently have a test (i.e. under ./tests/test_exercises.R) to check for duplicate chunk names. Presently I do this locally by (programmatically) running all exercises with NCmisc::list.functions.in.file() (see line), but that's a heavyweight solution.
Adding a test for duplicate chunk names (and 'runs without errors') would be a useful enhancement!
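A lightweight check along these lines might look like the sketch below. It is not the repository's actual test: the helper names `chunk_labels()` and `duplicated_labels()` are hypothetical, and it assumes labelled chunk headers take the form ```` ```{r label} ```` or ```` ```{r label, option = value} ````. It only inspects the text of a file, so it does not need to run any exercise.

```r
# Extract chunk labels from the lines of one R Markdown file.
# Assumption: a labelled header looks like ```{r label} or ```{r label, opt = value}.
chunk_labels <- function(lines) {
  headers <- grep("^[[:space:]]*```\\{r[ ,]", lines, value = TRUE)
  # Keep the text between "{r" and the first comma or closing brace
  labels <- sub("^[[:space:]]*```\\{r[ ,]+([^,}]*).*$", "\\1", headers)
  labels <- trimws(labels)
  # Drop unlabelled chunks whose first entry is an option (e.g. echo=FALSE)
  labels[nzchar(labels) & !grepl("=", labels, fixed = TRUE)]
}

# Return each label that appears more than once in the file.
duplicated_labels <- function(lines) {
  labels <- chunk_labels(lines)
  unique(labels[duplicated(labels)])
}

# Made-up example: "q1-task" is used twice
example <- c(
  "```{r setup}", "library(tidyverse)", "```",
  "```{r q1-task, echo=FALSE}", "1 + 1", "```",
  "```{r q1-task}", "2 + 2", "```"
)
duplicated_labels(example)
```

In a test file, one could presumably loop over the exercise files, pass `readLines(file)` to `duplicated_labels()`, and assert that the result has length zero.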
Hi @zdelrosario!
First of all, thanks for this course! It's really cool and I'm excited to really dive into it. That said, I'm hitting a snag that I'm hoping you can help with.
I can't get the makefile to run because it says the command make is not found (I tried this both in Git Bash and the RStudio terminal). I've been working through some of the exercises unmade but I'd rather have them like they're supposed to be!
I'm running on Windows, and I'm attaching screenshots of the errors so you can see them.
For the content, the JOSE review checklist indicates that a Creative Commons license should be used. MIT is more appropriate for code.
I think learners could use more support to get going. I wasn't sure whether I'm supposed to be working within the Rmd files or if make should have generated html files that I worked with. When I didn't find html files, I worried that something was wrong with the Makefile.
The links in the README and the files in the paper refer to the raw versions of the files in exercises/ rather than exercises_sequenced/. I think it would be better, from the learner's perspective, to refer to things in exercises_sequenced/ since that's the directory they are supposed to be working in.
Perhaps the repository could include a Rproj file so that learners can open Rstudio in the correct working directory?
Connection between probability and expectation is not very clear; could scaffold better. Consider adding notes to e-stat02
From the paper,
"The full set of desired learning outcomes is documented in the project repository"
Where are the learning outcomes in the repository? Consider adding a direct link in the paper.
Add a citation for flipped classroom
Should document how to use this repo for different use cases; for instance, when to use the template (setting up with GitHub action infrastructure) and when to fork (to make a PR).
Currently, there are two task blocks under the Observations section - guessing the second should be a solution block?
data-science-curriculum/exercises/e-stat03-descriptive-master.Rmd
Lines 405 to 412 in e47f4a3
The statement of need could be clearer in the README and in the paper. There are active learning materials out there - e.g., The Carpentries, Codecademy. What is the particular niche that your materials are filling?
I'm not sure that the "Notably, if teachers held ..." sentence is constructive. Surely the published books, blog posts, and videos have instructional value even if they aren't examples of active learning materials?
Hi! Running along at home with a friend, found a dead link. On this page: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.md
The "download the source" link is dead for me: https://github.com/zdelrosario/data-science-curriculum/blob/master/exercises/e-setup00-install.Rmd
Let me know if you don't want issues raised, figured it could be helpful. No pressure to address them, for example this can be worked around easily with cloning.
Perhaps when all the feedback is implemented, it might be worth creating a new release?
Day 29 (data 09) q4-setup-count has deprecated code:
    rowAny <- function(x) rowSums(x) > 0
    countna <- function(df, vars_lagged) {
      df %>%
        filter(rowAny(across(vars_lagged, is.na))) %>%
        dim %>%
        .[[1]]
    }
    countna(df_q3, c("region"))
This works now, but updated syntax would probably be ideal
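One possible update is sketched below, assuming the deprecation in question concerns `across()` inside `filter()` and the bare external vector `vars_lagged` used as a column selection. It uses `if_any()` (available in dplyr 1.0.4 and later) with `all_of()`, which also makes the `rowAny()` helper unnecessary. The tibble `df_demo` is a made-up stand-in for `df_q3`, which is not shown in the issue.

```r
library(dplyr)

# Count the rows that have an NA in any of the named columns.
countna <- function(df, vars_lagged) {
  df %>%
    # if_any() is TRUE for a row when is.na() holds in at least one selected column;
    # all_of() marks vars_lagged as an external character vector of column names
    filter(if_any(all_of(vars_lagged), is.na)) %>%
    nrow()
}

# Hypothetical stand-in for df_q3
df_demo <- tibble(
  region = c("a", NA, "b", NA),
  value = c(1, 2, NA, 4)
)
countna(df_demo, c("region"))
```

With this demo data, rows 2 and 4 have a missing `region`, so the call returns 2.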
I'm on the ubuntu RStudio, which may make a difference. Noticed this first in e-stat02-probability, although that may not be where it first pops up. When knitting the R documents to view the pretty version, math looks like this:
It does render in the edit view, oddly enough:
But that doesn't apply for values in the middle of a paragraph, so those can be hard to match up with the equations.
Given that Travis CI is going extremely commercial, transition CI tools to GitHub Actions.
I would remove the $ prompt from before make in the README.md. A learner may make the mistake of entering "$ make" after the prompt. Actually, I wonder if I'm supposed to run make main to build the materials and then make challenges to build the challenge problems. The latter gave me an error...
$ make challenges
cd ../data-science-challenges/challenges; make
/bin/sh: line 0: cd: ../data-science-challenges/challenges: No such file or directory
cd exercises; make
make[2]: Nothing to be done for `all'.
./prepend.py
cp -rf exercises/images exercises_sequenced/.
cp -rf exercises/data/tiny.csv exercises_sequenced/data/.
cp -f ../data-science-challenges/challenges/*-assignment.Rmd challenges/.
cp: ../data-science-challenges/challenges/*-assignment.Rmd: No such file or directory
Is requiring learners to build the sequenced exercises and challenges even necessary? I can't imagine the files are so big that you couldn't include them in the repository. Requiring learners to run make assumes a level of sophistication that many might not have, and it could limit Windows users who do not have the Linux subsystem installed and possibly Mac users who don't have the Xcode command line tools installed. It also requires learners to have Python installed. Instructions are provided for these steps, but I feel that it's unnecessary for someone who is trying to get started with RStudio.