datacarpentry / r-socialsci
R for Social Scientists
Home Page: https://datacarpentry.org/r-socialsci/
License: Other
Similarly to the R ecology lesson, this lesson should have a code handout with the skeleton of code chunks, links to data files, etc.
In the starting-with-data episode, one of the exercises has the learners get the middle row by using nrow() / 2. There are 131 rows in the data frame, so the result is 65.5. Using this value for subsetting, however, gives the 65th row for most learners; some get an error.
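For what it's worth, the silent off-by-half behaviour comes from base R truncating fractional indices, while tibbles refuse the lossy cast. A small sketch (illustrative values, not the lesson data):

```r
x <- 1:131
x[65.5]      # base R truncates the fractional index, so this is x[65]

df <- data.frame(id = 1:131)
df[65.5, ]   # likewise returns row 65

# Tibbles are stricter: recent versions of tibble raise an error instead of
# truncating, which would explain why some learners see an error here.
# tibble::tibble(id = 1:131)[65.5, ]
```

A more robust exercise answer might use a whole-number index, e.g. df[ceiling(nrow(df) / 2), ].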
The prerequisites episode leads to a 404 page.
R Markdown is not rendering correctly somewhere around line 153:

```{r, eval = FALSE}
SN7577_tab[, -1]           # The whole data.frame, except the first column
SN7577_tab[-c(7:34786), ]  # Equivalent to head(surveys)
```
http://www.datacarpentry.org/r-socialsci/02-Reading-text-files/
Code chunks should have the tag .language-r, as described here. This will make the code show up with standard R color coding and other standard aesthetics.
See also: datacarpentry/python-socialsci#12
@langtonhugh and I have just completed our two-day Data Carpentry instructor training workshop, and in looking for something to contribute to for our checkout tasks, we came across this rewriting of the R material for the social sciences.
We thought the best use of our skills would be to help proofread the lesson and amend the episodes with social-science-related examples. Is there any direction on where to focus, or do you have any particular tasks or requirements in mind? I had a look at the "reading in data" section, have some ideas, and have spotted some typos, so I'm happy to make some changes there and submit a pull request, but we are also open to being steered towards other elements if that would be more helpful. Let us know!
Thanks for contributing! If this contribution is for instructor training, please send an email to [email protected] with a link to this contribution so we can record your progress. You’ve completed your contribution step for instructor checkout just by submitting this contribution.
Please keep in mind that lesson maintainers are volunteers and it may be some time before they can respond to your contribution. Although not all contributions can be incorporated into the lesson materials, we appreciate your time and effort to improve the curriculum. If you have any questions about the lesson maintenance process or would like to volunteer your time as a contribution reviewer, please contact Kate Hertweck ([email protected]).
In episode 02, there's supposed to be a link to a description of the example dataset:
Full details of the SN7577 dataset are available [here]
but the URL is missing. I'm not sure what this is supposed to be, so I'm leaving it blank for my current batch of reformatting edits.
It's possible that this information should also be added to the https://github.com/datacarpentry/python-socialsci lesson, too.
This issue is meant to address some of the basic items in the Lesson Release Checklist for this repository.
Some of the exercises (e.g. in this episode) are not formatted as exercises. Exercises and their solutions should be marked with specific syntax detailed here.
In the Barplot section of the ggplot lesson, the narrative text says that we want to create a dataframe with "the percent of each house type in each village." However, the code calculates percentages across the entire dataset, not by village. Note that the totals for each village are clearly well below 100%.
The code should be
percent_wall_type <- interviews_plotting %>%
    filter(respondent_wall_type != "cement") %>%
    count(village, respondent_wall_type) %>%
    group_by(village) %>%
    mutate(percent = n / sum(n)) %>%
    ungroup()
Episode 2 has a good exercise on fixing formatting issues in a badly formatted spreadsheet. However, this may lead learners to infer that manually reformatting a spreadsheet is good practice (despite the fact that it's not reproducible). It would be good to note the following here (with a little exposition of each):
Episode 2 has this text:
The problems that we can see are as follows:
White space to the top and to the left of the data.
There are two header line types, with different data items in each.
One of the header lines has two separate data items.
but it doesn't explain why these are problematic. It would be good to have more exposition about why these are problems (i.e. what will happen to the user downstream if they have these formatting issues in their data sheets).
The date/time section is quite extensive, and some details could be simplified. Specific learning objectives for working with date/time data should be articulated so that this section can be made more precise.
This always leads to a long conversation because learners have heard of one or the other of these and want to talk about the differences. Think about how this can be incorporated into the curriculum as an example of reading helpfiles and understanding default options.
See #23 for an example
All the episodes need to be transferred to the _episodes_rmd folder and renamed to end in an .Rmd extension.
The same dataset should be used across lessons and episodes in Data Carpentry workshops.
Is it necessary to cover how to read SPSS files in this lesson? What is the motivation behind it?
Addresses these issues associated with the lesson release checklist
For each input chunk of code, show the output that will be produced. This will help the learners and the instructors to know that they are getting the expected output.
Output code should be marked with .output, as described here.
See also datacarpentry/python-socialsci#11
In the 02-starting-with-data lesson the language is about data frames, but the output and examples are displayed as tibbles. This seems a little confusing. Would it be clearer to emphasize the tibble portion a little earlier?
[This is submitted as part of the instructor training closeout.]
Rename the episode from "reading text files" to "reading CSV files": "reading text files" would suggest we are reading full-text data.
The ! operator is mentioned on line 294, but it may be useful to introduce it earlier, in a full section rather than a code comment. I propose adding it to Episode 01, R Basics, around line 416, where other operators like <, >, ==, and != are introduced.
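For context, a minimal sketch of ! alongside the comparison operators already covered in R Basics (example values are made up, not from the lesson data):

```r
ages <- c(12, 25, 40, 7)   # hypothetical vector for illustration
adult <- ages >= 18        # FALSE TRUE TRUE FALSE
ages[adult]                # 25 40
ages[!adult]               # '!' negates the logical vector: 12 7
```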
http://www.datacarpentry.org/r-socialsci/03-Introducing-dplyr-and-tidyr/
http://www.datacarpentry.org/r-socialsci/01-R-basics/
First plot (unnamed-chunk-3) is blank.
http://www.datacarpentry.org/r-socialsci/04-Data-visualisation-with-ggplot2/
For each lesson release, copy this checklist to an issue and check items off during preparation for the release.
Scheduled Freeze Date: 2018-04-27
Scheduled Release Date: 2018-04-30
Checklist of tasks to complete before release:
See #23 for example
There are a few places that could use a better description of the code being run:
line 79
results <- dbSendQuery(mydb, "SELECT * FROM Question1")
lines 156-165
dbfile_new = "a_newdb.sqlite"
mydb_new = dbConnect(dbDriver("SQLite"), dbfile_new)
dbWriteTable(conn = mydb_new , name = "SN7577", value = "SN7577.csv",
row.names = FALSE, header = TRUE)
dbWriteTable(conn = mydb_new , name = "Q1", value = Q1,
row.names = FALSE)
dbListTables(mydb_new)
Add details re: line 104 - why should the connection be closed? Does it need to be reopened to run code in following chunks?
"Once you have retrieved the data you should close the connection."
Clarify lines 177 and 202 - unsure what they are trying to say
"as is the mthod for running queries. However using the 'tbl' function we still need to provide a valid SQL string."
"Notice that on the nrow command we get NA rather than a count of rows. This is because dplyr doesn't hold the full table even after the 'Select * ...'"
The lesson infrastructure committee unanimously approved the proposal of using the same set of labels across all our repositories during its last meeting on May 23rd, 2018.
This repository has now been converted to use the standard set of labels.
If this repository used the previous set of recommended labels by Software Carpentry, they have been converted to the new one using the following rules:
| SWC legacy labels | New 'The Carpentries' labels |
| --- | --- |
| bug | type:bug |
| discussion | type:discussion |
| enhancement | type:enhancement |
| help-wanted | help wanted |
| newcomer-friendly | good first issue |
| template-and-tools | type:template and tools |
| work-in-progress | status:in progress |
The label instructor-training was removed, as it is no longer used in the workflow for certifying new instructors. The label question was left as is where it was in use, and removed otherwise. If your repository used custom labels (and issues were flagged with these labels), they were left as is.
The lesson infrastructure committee hopes the standard set of labels will make it easier for you to manage the issues you receive on the repositories you manage.
The lesson infrastructure committee will evaluate how the labels are being used over the next few months, and we will solicit your feedback at that point. In the meantime, if you have any questions or concerns, please leave a comment on this issue.
-- The Lesson Infrastructure subcommittee
PS: we will close this issue in 30 days if there is no activity.
See #23 for example
surveys_complete data is missing
http://www.datacarpentry.org/r-socialsci/04-Data-visualisation-with-ggplot2/
posessions should be possessions
possessions %in% c("car", "bicycle", "motorcycle", "truck", "boat")
vs c("car", "bicycle", "motorcycle", "truck", "boat") %in% possessions
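To make the distinction concrete (possessions here is a made-up stand-in for a learner's answer): %in% tests each element of its left-hand side for membership in the right-hand side, so swapping the operands changes both the length and the meaning of the result.

```r
possessions <- c("car", "television", "mobile phone")
items <- c("car", "bicycle", "motorcycle", "truck", "boat")

possessions %in% items   # one value per possession: TRUE FALSE FALSE
items %in% possessions   # one value per item: TRUE FALSE FALSE FALSE FALSE
```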
In the "Extracting subsets from vectors" section of 01-R-basics there is an error message because of a mis-spelled variable name. This PR fixes:
#43
The first exercise in episode 02 asks the student to load "data/SN7577_spss.sav" using the RStudio import wizard.
Would it be worthwhile to show how to load such datasets without using RStudio's import wizard? For example, foreign::read.spss or haven::read_spss; the latter is included as part of the tidyverse.
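A sketch of both routes, assuming the file path from the exercise (argument defaults may vary between package versions; the guard just keeps the snippet runnable without the data file):

```r
path <- "data/SN7577_spss.sav"   # path taken from the exercise

if (file.exists(path)) {
  # tidyverse route: haven is installed with the tidyverse, though not
  # attached by library(tidyverse), so it is called here via '::'.
  sn7577 <- haven::read_spss(path)                             # returns a tibble
  # base-R-era route, from the recommended 'foreign' package:
  sn7577_df <- foreign::read.spss(path, to.data.frame = TRUE)  # returns a data.frame
}
```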
Code chunks like this
~~~
library(ggplot2)
library(readr)
SAFI_results <- read_csv("SAFI_results.csv")
~~~
need to be converted to:
```{r}
library(ggplot2)
library(readr)
SAFI_results <- read_csv("SAFI_results.csv")
```
The headers in episode 3 seem inconsistent. Pipes are a second-level header (## Pipes), while the mutate function is third-level (### Mutate), and the summarize function is fourth-level (#### The summarize() function). Is there a style guide for when to use which header level?
Data files should be moved into a data/ folder, to follow the good practices of working-directory organization mentioned in the first episode.
Currently, many datasets are shown in their entirety and it's distracting; we need to use head(), or adjust options(), to limit the size of the outputs.
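For example, any of the following would keep printed output short (interviews is a stand-in name, and the tibble.* options assume tibble printing is in effect):

```r
interviews <- data.frame(x = 1:100)   # stand-in for the lesson dataset

head(interviews)                      # show only the first 6 rows

# For tibbles, the printed row count can be set per call...
# print(interviews, n = 10)
# ...or session-wide:
options(tibble.print_min = 5, tibble.print_max = 5)
```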
The discussion of data types and data structures in "Vectors and data types" could be clarified; perhaps even defining these terms before using them would help. Also note that the first sentence of the section reads "A vector is the most common and basic data type in R, and is pretty much the workhorse of R." Perhaps this should be changed to "basic data structure".
In the solution section for the renaming-factors exercise, we're missing the solution for part 1 (where you rename the factors before plotting them in a specific order). The method is explained in the lesson (memb_assoc[is.na(memb_assoc)] <- "undetermined"), but it would be helpful to have it in the solutions for helpers who haven't been following along with every part of the lesson.
Page is blank, please add content!
The Spanish Mac keyboard does not have a | key. This character can be created using alt + 1.
Add to instructor notes?
One of the exercises in episode 02 refers to loading the data from an SPSS output file.
Use the import dataset wizard to import the SN7577_spss.sav dataset.
I didn't find this file in the data subfolder or in the Python version of the lesson. Does this need to be added to the repo?
See below.
Something like this: http://www.datacarpentry.org/R-ecology-lesson/00-before-we-start.html#getting_set_up
Here is the instance that might warrant clarification:
"You can change it from the menu items for the tab, or more likely it will change when you create a project with its own folder as we will be doing later."
I think the first "it" refers to the current working directory, but that may not be crystal clear for new users. Also, "menu items for the tab" is a little unclear: in the Session menu there is a Set Working Directory option, but I am not sure if this is what is being referenced. I think the whole working-directory idea is an important point for new users.