Comments (53)
[Neale][Cholera - read non-DHIS data] In the commented instructions line 245, update description of data dictionary column names from "Code" to "option_code" and "Name" to "option_name"
from sitrep.
[Maria] [sitrep - read data] Let's consider putting the code for generating the dummy dataset before the code for reading in your own dataset, so that if the user forgets to comment/delete the gen_data line the script will still use the data they import.
Yeah, agree - maybe we should put it in a separate chunk altogether
from sitrep.
[Kate] [Sitrep - load packages] Put commented lines with commands for installing (and loading) optional packages like askpass, excel.link, write.xl, etc.
from sitrep.
[Kate] Renaming variables without any order is exquisitely painful
maybe worth reintroducing that function which chucks all the variable names out from data dictionary... which allows you to paste back in to the script ... (alex explanation but you know what i mean)
kate : "i mean trying to determine what the DHIS2 name is versus what i have
especially because eg age_days is next to age_years but then age_months is like 4 lines down"
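A rough sketch of the "chuck all the variable names out" idea, in base R only. The `linelist_raw` data frame and its columns are made up for illustration, and this is not the helper function mentioned above; it only prints the current names in alphabetical order so related columns (age_days, age_months, age_years) sit together and can be pasted into a rename call.

```r
# Hypothetical linelist with columns in an arbitrary order
linelist_raw <- data.frame(
  age_days   = 3,
  sex        = "F",
  age_months = 2,
  age_years  = 30
)

# Sort the current names alphabetically so related columns sit together
current_names <- sort(names(linelist_raw))

# Build one "new_name = old_name" line per column, ready to paste and edit
rename_lines <- sprintf("  %s = %s,", current_names, current_names)
cat(rename_lines, sep = "\n")
```

The left-hand side of each printed line would then be edited by hand to the dictionary name.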
from sitrep.
[kate] [general] People don't seem to be reading the comments and are then confused about what's happening and where they are.
Hopefully the training material and the wiki (outlining the structure and content of the scripts) will help...
[Annick] [general] I think we should have start lines with stuff like:
###INSTRUCTIONS
###CODE OPTION
(i.e. label different things more clearly!)
from sitrep.
[kate] [data cleaning] Doesn't like that you filter and then don't assign the result to a new data set, i.e. have linelist_raw and linelist_clean, and then when filtering assign to e.g. linelist_analysis.
All came about because of mistake when passing arguments to filter command which left 0 cases in dataset. New people would justifiably freak out!
Maybe another thing for training??
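A minimal sketch of that pattern in base R, with a made-up `linelist_cleaned` and an arbitrary age filter: the filtered rows go into a new object, and a check stops the script early if the filter left zero cases.

```r
# Hypothetical cleaned linelist
linelist_cleaned <- data.frame(
  case_id   = 1:4,
  age_years = c(2, 15, 40, 67)
)

# Filter into a NEW object so linelist_cleaned stays intact
linelist_analysis <- subset(linelist_cleaned, age_years >= 15)

# Fail loudly if the filter conditions dropped every case
if (nrow(linelist_analysis) == 0) {
  stop("The filter removed every row; check the filter conditions.")
}
```

If the filter turns out to be wrong, linelist_cleaned is still available and only the filter line needs re-running.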
from sitrep.
[Isidro] [installing] Add a note to restart R before installing new packages
from sitrep.
[Kate] [cholera CFR] - add an option to turn off CIs in cfr calculation - for eg if it is a closed population and are certain that all deaths are being captured (i.e. inpatients). If making an assumption that those deaths are representative of community-wide deaths then need to have CIs.
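To illustrate what such an option could look like, here is a sketch with a hypothetical helper `cfr_sketch()` and a hypothetical `conf_int` argument. It is not the sitrep function, and the exact binomial interval from `stats::binom.test()` is only a stand-in for whatever interval the package uses.

```r
# Hypothetical helper: CFR in percent, with an optional confidence interval
cfr_sketch <- function(deaths, cases, conf_int = TRUE) {
  out <- data.frame(deaths = deaths, cases = cases, cfr = 100 * deaths / cases)
  if (conf_int) {
    # Exact binomial interval, used here only as a stand-in
    ci <- stats::binom.test(deaths, cases)$conf.int
    out$lower <- 100 * ci[1]
    out$upper <- 100 * ci[2]
  }
  out
}

# Closed population (e.g. inpatients): report the CFR without an interval
inpatients <- cfr_sketch(deaths = 5, cases = 200, conf_int = FALSE)

# Sample assumed to represent the community: keep the interval
community <- cfr_sketch(deaths = 5, cases = 200)
```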
from sitrep.
[Kate][Sitrep - xl.read.file] There is an issue with protected sheets:
linelist_raw <- xl.read.file("hack_data/bituvis-cholera.xlsx",
                             xl.sheet = "xxxxx",
                             password = askpass::askpass("xxxxx"))
Error in top_left_corner["CurrentRegion"] : You cannot use this command on a protected sheet. To use this command, you must first unprotect the sheet (Review tab, Changes group, Unprotect Sheet button). You may be prompted for a password. (Microsoft Excel) 80020009
Also note, the askpass field is not particularly intuitive.
Also need to specify that these packages need to be installed in order to be used.
from sitrep.
[Kate] [Training] walk people through commenting / uncommenting lines
from sitrep.
[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"
This is solved by updating epitrix and including protect = "#"
from sitrep.
[Neale] [Sitrep - read data] Pasting filepaths to rio::import might give an error which required the user to change the slashes from \ to / . (this happened to me)
linelist_raw <- rio::import("C:\Users\Neale\OneDrive - Neale Batra\Documents\Jobs\MSF\Moz_Cholera_LineList_DeID_forTesting.xlsx", which = "Sheet1")
Error: '\U' used without hex digits in character string starting ""C:\U"
edit: @nsbatra, I think this issue may be solved if you use single quotes instead of double quotes? Would you mind checking? -Zhian
Alex: No, I can confirm that doesn't work. We should probably add a comment to the templates.
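A sketch of the workarounds for a comment in the templates, using a made-up path. R treats a backslash inside a quoted string as the start of an escape sequence, which is why "\U" fails with either quote style. The raw-string form assumes R 4.0.0 or later.

```r
# Option 1: double every backslash
path_escaped <- "C:\\Users\\example\\linelist.xlsx"

# Option 2: use forward slashes, which R accepts on Windows
path_forward <- "C:/Users/example/linelist.xlsx"

# Option 3 (assumes R >= 4.0.0): a raw string keeps backslashes as typed
path_raw <- r"(C:\Users\example\linelist.xlsx)"

# Converting backslashes to forward slashes programmatically
path_fixed <- gsub("\\", "/", path_escaped, fixed = TRUE)
```

Any of the three strings can then be passed to rio::import() as the file argument.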
from sitrep.
[Kate][cholera cleaning] There does not appear to be an ID field in the Cholera dictionary
from sitrep.
General: add instructions for setting up RStudio project to link data and project
from sitrep.
[Maria][ajs_outbreak save_cleaned_data]: writexl needs to be installed first
from sitrep.
[Elburg][AJS 313] this is missing a parens: # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex")
IMO, this should be changed to highlight the fact that they can use this method to create temporary data sets.
from sitrep.
[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y
Add origin to as.dates and say that for excel windows that it starts as 1899-12-30
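A small example of the origin argument, with made-up serial numbers. It assumes the dates arrived as Excel (Windows) serial numbers rather than text.

```r
# Hypothetical Excel serial numbers read in as numeric
excel_serial <- c(43831, 43832)

# Excel for Windows counts days from 1899-12-30
onset_dates <- as.Date(excel_serial, origin = "1899-12-30")
print(onset_dates)
```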
from sitrep.
[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y
Note: this needed to be in %d-%b-%y
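For reference, a sketch of how the format string has to match the text exactly, using made-up values. Month names (%b, %B) are matched in the current locale, so this assumes an English or C locale.

```r
# Dates written as day-abbreviated month-two digit year
raw_dates <- c("05-Jul-19", "17-Nov-18")
parsed_short <- as.Date(raw_dates, format = "%d-%b-%y")

# Dates written as day, full month name, four digit year
parsed_long <- as.Date("05 July 2019", format = "%d %B %Y")

# Without a format, as.Date() only tries the year-first ISO-style layouts and errors
no_format <- tryCatch(as.Date(raw_dates), error = function(e) conditionMessage(e))
```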
from sitrep.
[Kate][Cholera]: in cholera for obs_days, should it use date_of_admission instead of date_of_consultation_admission?
Alex: No, date_of_consultation_admission is correct. We actually should delete date_of_admission from the dictionary.
from sitrep.
[Annick] [training] Note for teaching: the population data section isn't really clear - we need to make something that people understand, and also an instruction on how to format the population data in the best way.
from sitrep.
[Elburg] [variable_naming] For non-DHIS2 data, we need to clarify how many of the variables need to match the dictionary in order for the script to run.
"I think I missed something, do I need to rename all the variables so that they match the dictionary?"
from sitrep.
[kate] [clean variables - factors] case_when creates a character variable, make sure to say as much in the script comments. Currently says will recode to a factor. Then again we need to re-do all the factor stuff anyway.
from sitrep.
[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.
This is included in the FAQ: https://github.com/R4EPI/sitrep/wiki/4)-FAQ#i-have-a-date-column-and-a-time-column-how-do-i-combine-the-two
Alex: worth adding to training material
from sitrep.
[Kate] [cholera - dict] msf dict category options show up wrong e.g. sex shows options for pregnancy trimester
ZNK: this turned out to be irreproducible
Alex: i definitely had the same thing at the time when i checked - but it seems to have resolved itself now.
from sitrep.
[Kate] Move reporting week to preamble
[Annick]: use knitr params
from sitrep.
[Kate]: Not clear that sex/gender needs to be a factor
from sitrep.
[Alanah] [recode_factors] "" = NA_character_ when actually the missings were " ". The error returns a zero-length variable... not super intuitive to understand what's wrong...
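One way to sidestep the "" versus " " mismatch is to trim whitespace before recoding. A base R sketch with a made-up `sex` column (this is not the recode_factors chunk itself):

```r
# Hypothetical column where missing values are a mix of "", " " and "  "
sex <- c("Male", " ", "Female", "", "  ")

# Strip leading and trailing whitespace so every blank becomes ""
sex_clean <- trimws(sex)

# Then set the empty strings to missing
sex_clean[sex_clean == ""] <- NA_character_
```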
from sitrep.
[Elburg] [general]: One of my parting thoughts: when you give examples, keep them in line with the rest of the script. I followed the renaming example of sex -> gender and now have to change sex to gender everywhere in the scripts.
from sitrep.
[Kate][General]: rio::import()
converts files to UTF-8, but readr::read_csv()
does not, which causes problems for cleaning functions that expect utf8
from sitrep.
[Ettiene] [Install] Admin rights issue: can't install packages, and also can't change .Rprofile to change where packages are written to.
This had the error from sf that it couldn't find the object group_map (but in French, so ¯\_(ツ)_/¯)
Eventually resolved itself by installing each of packages again.
from sitrep.
[Anna] [renaming variables] easier way to list out all the variable names in datasets rather than clicking back and forth.
from sitrep.
[Ettiene] [reading_data] When importing Excel data from an OCG linelist, the data do not start in the first row and first column; they are in a specific range.
Add a walkthrough of how to read in specific cell range.
linelist_raw <- rio::import(chemin, which = "Data", range="E12:AD3941")
from sitrep.
[Pat] [reading_dhis_excel_data] The wall of text is dense. Consider breaking it up into smaller chunks.
Consider also not having code commented out for the alternative options, because it's hard to find in between the actual commented text.
Instead, consider breaking it into small chunks, with the alternative options in separate chunks that can be switched with eval = TRUE / FALSE, and provide instructions on that.
from sitrep.
[Anna] [renaming variables] Measles, reading non-DHIS2 data: the renaming examples don't match the dictionary, which is confusing.
from sitrep.
[Kate/Isidro] [population data] Consider adding the option to also type in population counts (not props), e.g. for a village you wouldn't necessarily have proportions. But it is still useful to have proportions for age group breakdowns (and to show how to read in counts from an Excel file).
Also for age group break downs fix for all the templates! (alex fucked up all the proportion counting)
All ages     100.00%
0 - 4 y       15.89%
5 - 14 y      26.78%
15 - 29 y     27.72%
30 - 44 y     16.28%
>= 45 y       13.33%

Total U5      15.89%
0 - 11 m       3.29%
12 - 23 m      3.29%
24 - 35 m      3.10%
36 - 47 m      3.10%
48 - 59 m      3.10%
see annick email titled population denominator tool. (double check neale also gets this)
Also for all disease templates make more realistic age group examples - e.g. for measles vaccination is up to 59 months, then 10 years and 14 years. [DISCUSS WITH ANNICK!]
from sitrep.
[Kate] fmt_count in cholera treatment plans: if there are zero counts, it returns character(0). Unsure whether fmt_count will fix this automatically and output 0 (0%) in the Word doc.
May need to change fmt_count so that it returns 0 if character(0) comes back.
from sitrep.
[kate] [cholera obs_days] Cholera -- Median, min, and max days_obs need an na.rm = T added
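A quick illustration of why the argument is needed, with made-up observation days:

```r
# Hypothetical days under observation, with one missing value
obs_days <- c(2, 5, NA, 1, 7)

# Without na.rm, a single NA makes the summary NA
median_with_na <- median(obs_days)

# With na.rm = TRUE the missing value is dropped first
median_days <- median(obs_days, na.rm = TRUE)
min_days    <- min(obs_days, na.rm = TRUE)
max_days    <- max(obs_days, na.rm = TRUE)
```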
from sitrep.
[Pat] [fixing dates] set unrealistic dates to NA, based on having browsed dates in the previous chunk.
Need to set this ~NA to as.Date(NA)
## set unrealistic dates to NA, based on having browsed dates in the previous chunk
linelist_cleaned <- mutate(linelist_cleaned,
  date_of_onset < as.Date("2017-11-01") ~ NA,
  date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"))
otherwise get
Error: Column date_of_onset < as.Date("2017-11-01") ~ NA
is of unsupported type quoted call
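A sketch of how the corrected chunk might look, assuming the conditions are meant to sit inside case_when() and be assigned back to date_of_onset (the snippet above does not show that part). The example data are made up, and the final TRUE line, which keeps all other dates unchanged, is an addition for illustration.

```r
# Hypothetical linelist with one too-early date and one typo year
linelist_cleaned <- data.frame(
  date_of_onset = as.Date(c("2017-01-15", "2081-01-01", "2018-03-02"))
)

linelist_cleaned <- dplyr::mutate(
  linelist_cleaned,
  date_of_onset = dplyr::case_when(
    # unrealistic dates become a missing value of class Date
    date_of_onset < as.Date("2017-11-01")  ~ as.Date(NA),
    # a known typo is corrected
    date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"),
    # everything else is left as it was
    TRUE                                   ~ date_of_onset
  )
)
```

Using as.Date(NA) keeps every branch of case_when() the same type, so the column stays a Date.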
from sitrep.
[kate] [cholera attack rate] Overall and by age group use different multipliers. Overall is per 10,000 and age group is per 100,000.
Change all to be 10,000 - double check in other templates to see if same.
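To make the point concrete, a toy calculation with a hypothetical helper `attack_rate_sketch()` (not the sitrep function) and made-up counts. With one shared multiplier the overall and age-group rates are on the same scale and can be compared directly.

```r
# Hypothetical helper: cases per `multiplier` population
attack_rate_sketch <- function(cases, population, multiplier = 10000) {
  cases * multiplier / population
}

# Overall: 250 cases in a population of 50,000
overall <- attack_rate_sketch(250, 50000)

# By age group, same multiplier as the overall rate
by_age <- attack_rate_sketch(cases = c(100, 150), population = c(20000, 30000))

# The same overall figure per 100,000 is ten times larger
overall_100k <- attack_rate_sketch(250, 50000, multiplier = 100000)
```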
from sitrep.
[Elburg][symptoms]. There was an issue with symptoms in which Elburg had data that had various calls for yes and no. This needed to be cleaned. I've templated a small example
library("sitrep")
library("dplyr")
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dat <- data.frame(
  tidy_symptoms = sample(c("Yes", "No", "Unknown"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  num_symptoms = sample(c(0, 1, NA), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  mixed_symptoms = sample(c("Yes", "No", "Unknown", 1, 0), 100, replace = TRUE, prob = c(0.3, 0.6, 0.05, 0.025, 0.025))
)
NAMES <- colnames(dat)
sitrep::multi_descriptive(dat, NAMES)
#> converting numeric variable to factor
#> # A tibble: 3 x 25
#> symptom No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidy_s… 50 50 100 100 11 11 39
#> 2 num_sy… NA NA 100 100 NA NA NA
#> 3 mixed_… 57 57.0 100 100 6 6 35
#> # … with 17 more variables: Yes_prop <dbl>, `(0,0.2]_n` <dbl>,
#> # `(0,0.2]_prop` <dbl>, `(0.2,0.4]_n` <dbl>, `(0.2,0.4]_prop` <dbl>,
#> # `(0.4,0.6]_n` <dbl>, `(0.4,0.6]_prop` <dbl>, `(0.6,0.8]_n` <dbl>,
#> # `(0.6,0.8]_prop` <dbl>, `(0.8,1]_n` <dbl>, `(0.8,1]_prop` <dbl>,
#> # Missing_n <dbl>, Missing_prop <dbl>, `0_n` <dbl>, `0_prop` <dbl>,
#> # `1_n` <dbl>, `1_prop` <dbl>
dat2 <- dat %>%
  mutate_at(vars(NAMES), ~case_when(
    . == "Yes" ~ "Yes",
    . == "y" ~ "Yes",
    . == "Y" ~ "Yes",
    . == "No" ~ "No",
    . == "N" ~ "No",
    . == "n" ~ "No",
    . == 1 ~ "Yes",
    . == 0 ~ "No",
    TRUE ~ "Unknown"
  ))
sitrep::multi_descriptive(dat2, NAMES)
#> # A tibble: 3 x 9
#> symptom No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 tidy_s… 50 50 100 100 11 11 39
#> 2 num_sy… 22 22 100 100 6 6 72
#> 3 mixed_… 58 58.0 100 100 6 6 36
#> # … with 1 more variable: Yes_prop <dbl>
Created on 2019-07-05 by the reprex package (v0.3.0)
from sitrep.
[kate] [cholera mortality rate] - with zero deaths then the table comes out with deaths - and population - and CIs of NA-NA .... considering adding to function to have 0 come out rather than - and NA
Deaths   Population   Mortality (per 10,000)   95%CI
-        -            10000                    (NA–NA)
SAME for the mortality_rate_region section in cholera
mortality_rate(deaths$deaths, deaths$population, multiplier = 10000) %>%
  # add the region column to table
  bind_cols(select(deaths, zone_sante), .) %>%
  merge_ci_df(e = 4) %>% # merge the lower and upper CI into one column
  rename("Region" = zone_sante,
         "Deaths" = deaths,
         "Population" = population,
         "Mortality (per 10,000)" = `mortality per 10 000`,
         "95%CI" = ci) %>%
  kable(digits = 2)
from sitrep.
[Elburg] [AJS lab tests] Did not have all the lab variables listed in LABS - add an explanation to comment out the ones you don't have. Also, if you comment out the last one in the list then you have an extra comma, which throws an error, and you need to add a NULL (or just delete the comma).
Alternatively, Zhian, consider changing the multi_descriptive function to drop non-existent variables and return a warning message that those variables have been ignored.
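Until something like that exists, a user-side sketch in base R could look like the following. The variable names and the `linelist_cleaned` data frame are made up, and this is not part of multi_descriptive().

```r
# Hypothetical list of expected lab variables
LABS <- c("hep_e_rdt", "malaria_rdt", "dengue_rdt")

# Hypothetical linelist that is missing one of them
linelist_cleaned <- data.frame(
  hep_e_rdt  = c("Positive", "Negative"),
  dengue_rdt = c("Negative", "Negative")
)

# Report which expected variables are absent
missing_labs <- setdiff(LABS, names(linelist_cleaned))
if (length(missing_labs) > 0) {
  message("Ignoring variables not in the data: ", paste(missing_labs, collapse = ", "))
}

# Keep only the variables that actually exist
LABS <- intersect(LABS, names(linelist_cleaned))
```

This avoids commenting lines out of the vector by hand, so the trailing-comma error does not come up.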
from sitrep.
[Elburg][AJS lab tests] Same as #136 (comment), she didn't have the expected variables and so select and rename were failing.
from sitrep.
[Kate] [loading packages] here::here package is not loaded ... but used in the spatial data section.
Also need to add explanation of how here works.
from sitrep.
[Anna] [standardise_clean filtering] a lot of datasets will fill case_id down automatically even if nothing else is filled in yet. Show example code at the filter bit to show how to drop
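A possible example for that filter step, in base R with made-up data: rows are kept only if at least one column other than case_id has a value.

```r
# Hypothetical linelist where case_id was filled down ahead of data entry
linelist_raw <- data.frame(
  case_id   = 1:4,
  sex       = c("F", NA, "M", NA),
  age_years = c(23, NA, 41, NA)
)

# All columns except the pre-filled identifier
other_cols <- setdiff(names(linelist_raw), "case_id")

# TRUE for rows with at least one non-missing value outside case_id
has_data <- rowSums(!is.na(linelist_raw[, other_cols, drop = FALSE])) > 0

# Drop the rows that only contain a case_id
linelist_cleaned <- linelist_raw[has_data, , drop = FALSE]
```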
from sitrep.
[Kate] [reading shapefiles] actually read_sf does need to be told .shp at the end! i thought it recognises automatically from name
from sitrep.
[kate/Annick] [maps] Change colour palettes - dark for highest AR
from sitrep.
[Elburg] [epicurves] show an example of how to change the x axis labels in epicurves. Because for example with loads of data is then uuuugly.
from sitrep.
from kate:
A weird thing -- I originally had a line list with data from 2016--now. I filtered to only include 2019. So when I am running attack rate by week, there is a piece of code that keeps all the factor levels, which mucks up my AR by week.
# counts and cumulative counts by week
cases <- linelist_cleaned %>%
  arrange(date_of_admission) %>% # arrange by date of onset
  count(epiweek, .drop = FALSE) %>% # count all epiweeks and include zero counts
  mutate(cumulative = cumsum(n)) # add a cumulative sum
Z: I guess you need to add a new filter statement in there for epiweek
K: Fair
K: Similar issue with describe_admissions_by_epiweek and describe_exits_by_epiweek -- not sure how to filter there
Z: I forget what disease template you're on?
K: Cholera. But it's basically that i have unused factor levels in epiweek -- any easy way to drop them?
Z: Convert it to character. We added that in there so that any missing weeks would be highlighted.
K: Can I do that in the initial creation, or needs to be in that statement only?
Z: You can do it in the initial creation if you'd like, so you can select factor = FALSE in the epiweek creation. If you want the ability to count missing weeks, then you can use factor_aweek() after you filter your data and it will create levels that span the range of your data.
K: follow-up, changed it to this so i could keep unused levels but only in 2019
descriptive(linelist_cleaned, "epiweek", coltotals = TRUE, rowtotals = TRUE) %>%
  filter(grepl("2019", epiweek)) %>%
  augment_redundant("_n$" = " (n)") %>% # modify _n to (n)
  rename_redundant("prop" = "%") %>% # rename proportions to %
  kable(digits = 2)
Z: is epiweek a character in your linelist or a factor? Oh, duh, you can just drop those levels with forcats::fct_drop(): https://forcats.tidyverse.org/reference/fct_drop.html
But yeah, it shouldn't matter too much. I think what you have above is good.
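For anyone hitting the same thing, a small sketch of dropping unused levels after filtering. It uses base R's droplevels() rather than forcats::fct_drop(), and the epiweek values are made up.

```r
# Hypothetical epiweek factor after filtering to 2019: the 2018 levels remain
epiweek <- factor(
  c("2019-W01", "2019-W01", "2019-W03"),
  levels = c("2018-W51", "2018-W52", "2019-W01", "2019-W02", "2019-W03")
)

# Counting by level still shows the unused weeks with zero counts
counts_all <- table(epiweek)

# Drop every level that has no observations
epiweek_dropped <- droplevels(epiweek)
counts_dropped <- table(epiweek_dropped)
```

Note that this also removes 2019-W02, a week inside the filtered range that simply had no cases, which is the trade-off discussed above against keeping levels so that missing weeks are highlighted.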
from sitrep.
Making a flow chart to guide users through the templates would be helpful
from sitrep.
closing as have the cleaned hackathon to-do-list in place
from sitrep.