
Hackathon Issues about r4epi/sitrep

nsbatra commented on June 3, 2024

[Neale][Cholera - read non-DHIS data] In the commented instructions on line 245, update the description of the data dictionary column names from "Code" to "option_code" and from "Name" to "option_name".

from sitrep.

nsbatra commented on June 3, 2024

[Maria] [sitrep - read data] Let's consider putting the code for generating the dummy dataset before the code for reading in your own dataset, so that if the user forgets to comment out or delete the gen_data line, the script will still use the data they import.

Yeah, agree. Maybe we should put it in a separate chunk altogether.

nsbatra commented on June 3, 2024

[Kate] [Sitrep - load packages] Add commented lines with commands for installing (and loading) optional packages like askpass, excel.link, writexl, etc.
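A minimal sketch of what such a section could look like, assuming the template should only report which optional packages are missing and print the install command rather than install anything itself. The package names are the ones listed above; the check uses base R `requireNamespace()`, and `optional_pkgs` / `missing_pkgs` are illustrative names.

```r
# Optional packages used only in some sections of the template
optional_pkgs <- c("askpass", "excel.link", "writexl")

# TRUE for each package that is installed (requireNamespace() loads it
# quietly without attaching it to the search path)
installed <- vapply(optional_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- optional_pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Print, rather than run, the install command so the user decides
  message("Optional packages not installed: ", paste(missing_pkgs, collapse = ", "))
  message('To install: install.packages(c("',
          paste(missing_pkgs, collapse = '", "'), '"))')
}

# Uncomment to attach an optional package once it is installed:
# library("writexl")
```

Printing the command instead of calling `install.packages()` directly avoids surprising users who lack admin rights or who are offline.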

zkamvar commented on June 3, 2024

[Kate] Renaming variables without any order is exquisitely painful.

Maybe worth reintroducing the function that outputs all the variable names from the data dictionary, so you can paste them back into the script (Alex's explanation, but you know what I mean).

Kate: "I mean trying to determine what the DHIS2 name is versus what I have,
especially because e.g. age_days is next to age_years but then age_months is like 4 lines down."
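One way to approximate the function described above, sketched in base R under the assumption that a pasteable `rename()` skeleton is what is wanted. `print_rename_template()` is a hypothetical helper, not a sitrep function: it prints every column of a data set on its own line, so old and new names sit side by side and nothing is lost in the ordering.

```r
# Hypothetical helper (not part of sitrep): print a rename() skeleton that
# lists every column of a data set, ready to paste into the script.
print_rename_template <- function(dat, data_name = "linelist_raw") {
  old <- names(dat)
  # backticks keep names with spaces or symbols valid in R code
  body <- sprintf("  new_name_%d = `%s`", seq_along(old), old)
  cat(data_name, " <- rename(", data_name, ",\n",
      paste(body, collapse = ",\n"), "\n)\n", sep = "")
  invisible(body)
}

# Example with made-up column names
linelist_raw <- data.frame(`Age (years)` = 30, sex = "f", check.names = FALSE)
template <- print_rename_template(linelist_raw)
```

The user would then replace each `new_name_N` with the matching dictionary name and delete lines for columns they do not need.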

aspina7 commented on June 3, 2024

[Kate] [general] People don't seem to be reading comments and are then confused about what's happening/where they are.
Hopefully training material and the wiki (outlining the structure and content of the scripts) will help.

[Annick] [general] I think we should start sections with labels like:
###INSTRUCTIONS
###CODE OPTION

(i.e. label different things more clearly!)

aspina7 commented on June 3, 2024

[Kate] [data cleaning] Doesn't like that you are filtering without assigning to a new data set, i.e. have linelist_raw and linelist_clean, and then when filtering assign to e.g. linelist_analysis.
This all came about because of a mistake when passing arguments to the filter command, which left 0 cases in the dataset. New people would justifiably freak out!
Maybe another thing for training??
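A small sketch of the pattern Kate suggests, with made-up data. It uses base R `subset()` so it runs without extra packages; `dplyr::filter()` would be the template's equivalent. The zero-row check is an added assumption, not something the templates currently do.

```r
# Keep each stage in its own object so a bad filter never overwrites the
# cleaned data (object names follow Kate's suggestion; data are made up).
linelist_clean <- data.frame(
  case_id   = 1:4,
  age_years = c(3, 25, 40, NA)
)

# base R subset() shown here; dplyr::filter() works the same way
linelist_analysis <- subset(linelist_clean, age_years >= 15)

# Warn loudly if the filter removed every row
if (nrow(linelist_analysis) == 0) {
  warning("Filtering left 0 rows: check the filter conditions.")
}

nrow(linelist_clean)     # still 4
nrow(linelist_analysis)  # 2 (rows with NA age are dropped by subset())
```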

aspina7 commented on June 3, 2024

[Isidro] [installing] Add a note to restart R before installing new packages

aspina7 commented on June 3, 2024

[Kate] [cholera CFR] Add an option to turn off CIs in the CFR calculation, e.g. if it is a closed population and you are certain that all deaths are being captured (i.e. inpatients). If you assume that those deaths are representative of community-wide deaths, then CIs are needed.
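A sketch of how such a switch might behave, assuming a single CFR row. `cfr_table()` and its `ci` argument are hypothetical, not the sitrep API; the interval is the Wilson score interval from base R `prop.test(correct = FALSE)`, which may not match the method sitrep uses.

```r
# Hypothetical helper, not the sitrep API: CFR in percent for one row, with
# an option to skip the confidence interval for closed populations.
cfr_table <- function(deaths, cases, ci = TRUE, conf_level = 0.95) {
  out <- data.frame(deaths = deaths, cases = cases, cfr = 100 * deaths / cases)
  if (ci) {
    # prop.test() without continuity correction gives the Wilson score
    # interval; warnings about small expected counts are suppressed
    wilson <- suppressWarnings(
      prop.test(deaths, cases, conf.level = conf_level, correct = FALSE)$conf.int
    )
    out$lower <- 100 * wilson[1]
    out$upper <- 100 * wilson[2]
  }
  out
}

cfr_table(deaths = 2, cases = 50)              # CFR 4% with a 95% CI
cfr_table(deaths = 2, cases = 50, ci = FALSE)  # CFR 4% only
```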

zkamvar commented on June 3, 2024

[Kate][Sitrep - xl.read.file] There is an issue with protected sheets:

linelist_raw <- xl.read.file("hack_data/bituvis-cholera.xlsx",
                             xl.sheet = "xxxxx",
                             password = askpass::askpass("xxxxx"))
Error in top_left_corner["CurrentRegion"] : You cannot use this command on a protected sheet. To use this command, you must first unprotect the sheet (Review tab, Changes group, Unprotect Sheet button). You may be prompted for a password. (Microsoft Excel) 80020009

Also note, the askpass field is not particularly intuitive.
We also need to specify that these packages must be installed before they can be used.

aspina7 commented on June 3, 2024

[Kate] [Training] walk people through commenting / uncommenting lines

aspina7 commented on June 3, 2024

[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"

zkamvar commented on June 3, 2024

[Kate] [cleaning] clean_colnames from epitrix will remove a column called "#"

This is solved by updating epitrix and including protect = "#"

nsbatra commented on June 3, 2024

[Neale] [Sitrep - read data] Pasting file paths into rio::import might give an error that requires the user to change the slashes from \ to /. (This happened to me.)

linelist_raw <- rio::import("C:\Users\Neale\OneDrive - Neale Batra\Documents\Jobs\MSF\Moz_Cholera_LineList_DeID_forTesting.xlsx", which = "Sheet1")

Error: '\U' used without hex digits in character string starting ""C:\U"

edit: @nsbatra, I think this issue may be solved if you use single quotes instead of double quotes? Would you mind checking? -Zhian

Alex: No, confirmed it doesn't work; we should probably add a comment to the templates.
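A sketch of the kind of note the templates could carry. In R a single backslash starts an escape sequence (so `"\U..."` expects hex digits), and single quotes behave the same way, which matches Alex's finding. The path below is made up, and the raw-string form assumes R 4.0.0 or later.

```r
# A Windows path (made-up example) written three ways that R accepts.
# A single backslash starts an escape sequence ("\U" expects hex digits),
# and single quotes do not change that.
p_forward <- "C:/Users/me/Documents/linelist.xlsx"        # forward slashes
p_escaped <- "C:\\Users\\me\\Documents\\linelist.xlsx"    # doubled backslashes
p_raw     <- r"(C:\Users\me\Documents\linelist.xlsx)"     # raw string, R >= 4.0.0

identical(p_escaped, p_raw)  # TRUE: both contain single backslashes

# Any of them can then be passed to the import call, e.g.
# linelist_raw <- rio::import(p_forward, which = "Sheet1")
```

Forward slashes are usually the simplest advice, since Windows accepts them and they work unchanged on other systems.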

zkamvar commented on June 3, 2024

[Kate][cholera cleaning] There does not appear to be an ID field in the Cholera dictionary

zkamvar commented on June 3, 2024

General: add instructions for setting up an RStudio project to link the data and the project.

zkamvar commented on June 3, 2024

[Maria][ajs_outbreak save_cleaned_data]: writexl needs to be installed first

zkamvar commented on June 3, 2024

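[Elburg][AJS 313] This is missing a closing parenthesis: # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex")
It should read: # linelist_cleaned <- select(linelist_cleaned, c(1:3, "age_years", "sex"))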

IMO, this should be changed to highlight the fact that they can use this method to create temporary data sets.

aspina7 commented on June 3, 2024

[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y

Add origin to as.Date() and note that for Excel on Windows the origin is 1899-12-30.

zkamvar commented on June 3, 2024

[Kate] [Cholera clean_dates] Error in charToDate(x) : character string is not in a standard unambiguous format - despite all in format %d %B %Y

Note: this needed to be in %d-%b-%y
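A base R sketch of both fixes discussed above: giving as.Date() an explicit format, and converting Excel serial numbers with the 1899-12-30 origin. The example values are made up. Month names are locale-dependent, so the sketch switches LC_TIME to the "C" locale, which changes that setting for the rest of the session.

```r
# Month names are locale-dependent; force English names for this example
invisible(Sys.setlocale("LC_TIME", "C"))

# as.Date() without a format only tries "%Y-%m-%d" and "%Y/%m/%d"; anything
# else raises "character string is not in a standard unambiguous format"
as.Date("03-Jun-19", format = "%d-%b-%y")      # e.g. 03-Jun-19
as.Date("03 June 2019", format = "%d %B %Y")   # e.g. 03 June 2019

# Dates stored as Excel serial numbers (Excel on Windows) count days
# from 1899-12-30
as.Date(43466, origin = "1899-12-30")          # "2019-01-01"
```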

zkamvar commented on June 3, 2024

[Kate][Cholera]: in cholera for obs_days, should it use date_of_admission instead of date_of_consultation_admission?

Alex: no date_of_consultation_admission is correct. We actually should delete date_of_admission from the dictionary.

aspina7 commented on June 3, 2024

[Annick] [training] Note for teaching: the population data section isn't really clear. We need to make something people understand, and also add instructions on how best to format the population data.

aspina7 commented on June 3, 2024

[Elburg] [variable_naming] For non-DHIS2 data, we need to clarify how many variables need to match the dictionary in order for the script to run.
"I think I missed something, do I need to rename all the variables so that they match the dictionary?"

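Whatever the answer turns out to be, users can at least see the gap between their data and the dictionary. A base R sketch with made-up names: in the templates `dictionary_names` would come from the data dictionary rather than being typed by hand, and this does not say which variables are strictly required.

```r
# Hypothetical names: in the templates the expected names would come from
# the data dictionary rather than being typed in by hand.
dictionary_names <- c("case_number", "date_of_onset", "sex", "age_years")
linelist_raw <- data.frame(case_number = 1:2, sex = c("f", "m"), age = c(30, 41))

# dictionary variables that the data set does not (yet) have
setdiff(dictionary_names, names(linelist_raw))  # "date_of_onset" "age_years"
# columns in the data set that are not in the dictionary
setdiff(names(linelist_raw), dictionary_names)  # "age"
```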
aspina7 commented on June 3, 2024

[Kate] [clean variables - factors] case_when creates a character variable; make sure the script comments say as much. Currently they say it will recode to a factor. Then again, we need to redo all the factor stuff anyway.
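A short illustration for the script comments, using made-up raw values and assuming dplyr is installed: `case_when()` returns a character vector, and a factor only appears once the result is wrapped in `factor()` with explicit levels.

```r
suppressPackageStartupMessages(library(dplyr))

sex_raw <- c("M", "f", "F", "m", NA)  # made-up raw values

# case_when() returns a character vector...
sex_chr <- case_when(
  sex_raw %in% c("M", "m") ~ "Male",
  sex_raw %in% c("F", "f") ~ "Female",
  TRUE                     ~ NA_character_
)
class(sex_chr)  # "character"

# ...so wrap it in factor() with explicit levels if a factor is needed
sex_fct <- factor(sex_chr, levels = c("Female", "Male"))
levels(sex_fct)  # "Female" "Male"
```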

nsbatra commented on June 3, 2024

[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.

zkamvar commented on June 3, 2024

[Neale] [Sitrep general] Just noting that my non-DHIS 2 dataset has separate columns for Day (numeric) and for Month (text). A user would need to clean/paste them for use in these templates. This may not be something worth addressing structurally in linelist or sitrep, but let's chat about how likely this is and whether it's worth mentioning somewhere in the commented instructions.

This is included in the FAQ: https://github.com/R4EPI/sitrep/wiki/4)-FAQ#i-have-a-date-column-and-a-time-column-how-do-i-combine-the-two

Alex: worth adding to training material
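For training material, a base R sketch of combining a numeric day column and a text month column into a Date. The data are made up, the year is an assumption (it is not in Neale's columns), and month names are assumed to be full English names; use `month.abb` instead of `month.name` for abbreviations like "Jun".

```r
# Made-up example: day is numeric, month is English text, and the year is
# not in the data, so it is assumed here (2019).
day   <- c(3, 28)
month <- c("June", "February")
year  <- 2019  # assumption: supply the correct year for your data

# match() against R's built-in English month names avoids locale issues
month_num <- match(month, month.name)
date_of_onset <- as.Date(sprintf("%d-%02d-%02d", year, month_num, day))
date_of_onset  # "2019-06-03" "2019-02-28"
```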

aspina7 commented on June 3, 2024

[Kate] [cholera - dict] msf dict category options show up wrong e.g. sex shows options for pregnancy trimester

ZNK: this turned out to be irreproducible

Alex: I definitely had the same thing when I checked at the time, but it seems to have resolved itself now.

zkamvar commented on June 3, 2024

[Kate] Move reporting week to preamble

[Annick]: use knitr params

zkamvar commented on June 3, 2024

[Kate]: Not clear that sex/gender needs to be a factor

aspina7 commented on June 3, 2024

[Alanah] [recode_factors] The template uses "" = NA_character_, but the missing values were actually " ". The error returns a zero-length variable... not super intuitive to understand what's wrong.
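A base R sketch of one way to make the recode robust, using made-up values: trim whitespace first, so that " ", "  " and "" all collapse to the empty string before it is set to NA.

```r
# Made-up raw values: blanks that are "" in some rows and " " in others
outcome <- c("Cured", " ", "", "Died", "  ")

# "" = NA_character_ only catches the empty string, so trim whitespace first
outcome <- trimws(outcome)
outcome[outcome == ""] <- NA
outcome  # "Cured" NA NA "Died" NA
```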

zkamvar commented on June 3, 2024

[Elburg] [general]: One of my parting thoughts: when you give examples, keep them in line with the rest of the script. I followed the renaming example of sex -> gender and now have to change sex to gender everywhere in the scripts.

zkamvar commented on June 3, 2024

[Kate][General]: rio::import() converts files to UTF-8, but readr::read_csv() does not, which causes problems for cleaning functions that expect UTF-8.
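A base R sketch of detecting and fixing text that is not valid UTF-8, assuming the source file was Latin-1 encoded (the string is made up). When reading with readr, the encoding can instead be declared up front with `readr::read_csv(path, locale = readr::locale(encoding = "latin1"))`.

```r
# A made-up string containing the Latin-1 byte for "é" (0xE9), as it might
# arrive from a file that was not read as UTF-8
x <- "caf\xe9"
validUTF8(x)  # FALSE: the bytes are not valid UTF-8

# Convert explicitly when the source encoding is known (assumed latin1 here)
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(x_utf8)  # TRUE
```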

aspina7 commented on June 3, 2024

[Ettiene] [Install] Admin rights issue: can't install packages, and also can't change .Rprofile to change where packages are written to.
This gave the error from sf that it couldn't find the object group_map (but in French, so ¯\_(ツ)_/¯).
Eventually resolved itself by installing each of the packages again.

aspina7 commented on June 3, 2024

[Anna] [renaming variables] easier way to list out all the variable names in datasets rather than clicking back and forth.

aspina7 commented on June 3, 2024

[Ettiene] [reading_data] When importing Excel data from an OCG linelist, the data do not start in the first row and first column; they are in a specific cell range.
Add a walkthrough of how to read in specific cell range.
linelist_raw <- rio::import(chemin, which = "Data", range="E12:AD3941")

aspina7 commented on June 3, 2024

[Pat] [reading_dhis_excel_data] Dense wall of text. Consider breaking it up into smaller chunks.
Consider also not having the code for the alternative options commented out, because it's hard to find in between the actual comment text.
Instead, consider breaking it into small chunks with the alternative options in separate chunks that can be turned on or off with eval = TRUE / FALSE, and provide instructions on that.

aspina7 commented on June 3, 2024

[Anna] [renaming variables] In measles reading non-DHIS2 data, the renaming examples don't match the dictionary, so it is confusing.

aspina7 commented on June 3, 2024

[Kate/Isidro] [population data] Consider adding the option to also type in population counts (not proportions), e.g. for a village you wouldn't necessarily have proportions. But it is still useful to have proportions for age group breakdowns. (And show how to read in counts from an Excel file.)

Also fix the age group breakdowns in all the templates! (Alex messed up all the proportion counting)
All ages     100.00%
  0 - 4 y     15.89%
  5 - 14 y    26.78%
  15 - 29 y   27.72%
  30 - 44 y   16.28%
  >= 45 y     13.33%
Total U5      15.89%
  0 - 11 m     3.29%
  12 - 23 m    3.29%
  24 - 35 m    3.10%
  36 - 47 m    3.10%
  48 - 59 m    3.10%
See Annick's email titled "population denominator tool" (double-check that Neale also gets this).

Also, for all disease templates, make more realistic age group examples, e.g. for measles, vaccination is up to 59 months, then 10 years and 14 years. [DISCUSS WITH ANNICK!]
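A base R sketch of the arithmetic linking the two input styles, using the all-ages proportions quoted above and a hypothetical total population of 50,000. Going the other way (typing in counts for a village and deriving proportions) is `counts / sum(counts)`.

```r
# Age-group proportions for all ages, as quoted above (in %)
age_props <- c("0-4 y" = 15.89, "5-14 y" = 26.78, "15-29 y" = 27.72,
               "30-44 y" = 16.28, ">=45 y" = 13.33)
sum(age_props)  # 100

# Hypothetical total population for one area
total_population <- 50000

# Counts per age group = total x proportion
age_counts <- round(total_population * age_props / 100)
age_counts  # 7945 13390 13860 8140 6665
```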

aspina7 commented on June 3, 2024

[Kate] fmt_count in the cholera treatment plans: if there are zero counts, it returns character(0). Unsure whether fmt_count will fix this automatically and output 0 (0%) in the Word doc.
May need to add a check to fmt_count that returns 0 if character(0) comes back.
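A sketch of the behaviour being asked for, written as a standalone base R helper. `fmt_count_safe()` is hypothetical and is not sitrep's `fmt_count()`; the point is that `sum()` of a logical vector is 0, never empty, so the formatted string always exists.

```r
# Hypothetical helper, not sitrep's fmt_count(): format "n (p%)" and
# return "0 (0.0%)" instead of character(0) when nothing matches.
fmt_count_safe <- function(condition) {
  n <- sum(condition, na.rm = TRUE)   # 0 when no rows match, never empty
  total <- length(condition)
  pct <- if (total > 0) 100 * n / total else 0
  sprintf("%d (%.1f%%)", n, pct)
}

treatment_plan <- c("A", "B", "B", "A")   # made-up data with no plan C
fmt_count_safe(treatment_plan == "C")     # "0 (0.0%)"
fmt_count_safe(treatment_plan == "B")     # "2 (50.0%)"
```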

aspina7 commented on June 3, 2024

[Kate] [cholera obs_days] Cholera: the median, min, and max of days_obs need na.rm = TRUE added.
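Why the argument matters, shown with made-up observation days in base R: a single missing value makes each summary return NA unless `na.rm = TRUE` is set.

```r
obs_days <- c(2, 5, NA, 1, 3)   # made-up observation days with one missing

median(obs_days)                 # NA: a single missing value propagates
median(obs_days, na.rm = TRUE)   # 2.5
min(obs_days, na.rm = TRUE)      # 1
max(obs_days, na.rm = TRUE)      # 5
```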

aspina7 commented on June 3, 2024

[Pat] [fixing dates] Set unrealistic dates to NA, based on having browsed the dates in the previous chunk.
Need to change ~ NA to ~ as.Date(NA).

## set unrealistic dates to NA, based on having browsed dates in the previous chunk
  linelist_cleaned <- mutate(linelist_cleaned,
                             date_of_onset < as.Date("2017-11-01") ~ NA, 
                             date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"))

otherwise get
Error: Column date_of_onset < as.Date("2017-11-01") ~ NA is of unsupported type quoted call
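A corrected sketch of the snippet above, assuming dplyr is installed and using made-up dates. The snippet passes the conditions straight to mutate() rather than inside case_when(), which appears to be what triggers the "unsupported type quoted call" error; once inside case_when(), every right-hand side should be a Date, hence `as.Date(NA)`. The final `TRUE ~ date_of_onset` line (keep all other dates unchanged) is an addition not present in the original.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up onset dates, including one too early and one typo (2081)
linelist_cleaned <- data.frame(
  date_of_onset = as.Date(c("2017-10-15", "2018-02-01", "2081-01-01", NA))
)

## set unrealistic dates to NA, based on having browsed dates in the previous chunk
linelist_cleaned <- mutate(linelist_cleaned,
  date_of_onset = case_when(
    # every right-hand side must be a Date, so use as.Date(NA), not NA
    date_of_onset <  as.Date("2017-11-01") ~ as.Date(NA),
    date_of_onset == as.Date("2081-01-01") ~ as.Date("2018-01-01"),
    TRUE                                   ~ date_of_onset  # keep the rest
  )
)
```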

aspina7 commented on June 3, 2024

[Kate] [cholera attack rate] Overall and by age group use different multipliers: overall is per 10,000 and by age group per 100,000.
Change all to 10,000, and double-check the other templates to see if they do the same.
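The arithmetic behind the mismatch, with made-up numbers: the same attack rate looks ten times larger per 100,000 than per 10,000, which is why mixing multipliers in one report is confusing. `rate_per()` is a hypothetical base R helper, not sitrep's `attack_rate()`.

```r
# Made-up numbers: 45 cases in a population of 30,000
cases <- 45
population <- 30000

# Hypothetical helper: cases per `multiplier` population
rate_per <- function(cases, population, multiplier = 10000) {
  cases / population * multiplier
}

rate_per(cases, population)                        # 15 per 10,000
rate_per(cases, population, multiplier = 100000)   # 150 per 100,000
```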

zkamvar commented on June 3, 2024

[Elburg][symptoms] There was an issue with symptoms: Elburg had data that coded yes and no in various ways, which needed to be cleaned. I've templated a small example:

  library("sitrep")
  library("dplyr")
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dat <- data.frame(
  tidy_symptoms = sample(c("Yes", "No", "Unknown"), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  num_symptoms = sample(c(0, 1, NA), 100, replace = TRUE, prob = c(0.3, 0.6, 0.1)),
  mixed_symptoms = sample(c("Yes", "No", "Unknown", 1, 0), 100, replace = TRUE, prob = c(0.3, 0.6, 0.05, 0.025, 0.025))
)


NAMES <- colnames(dat)
sitrep::multi_descriptive(dat, NAMES)
#> converting numeric variable to factor
#> # A tibble: 3 x 25
#>   symptom  No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#>   <chr>   <dbl>   <dbl>   <dbl>      <dbl>     <dbl>        <dbl> <dbl>
#> 1 tidy_s…    50    50       100        100        11           11    39
#> 2 num_sy…    NA    NA       100        100        NA           NA    NA
#> 3 mixed_…    57    57.0     100        100         6            6    35
#> # … with 17 more variables: Yes_prop <dbl>, `(0,0.2]_n` <dbl>,
#> #   `(0,0.2]_prop` <dbl>, `(0.2,0.4]_n` <dbl>, `(0.2,0.4]_prop` <dbl>,
#> #   `(0.4,0.6]_n` <dbl>, `(0.4,0.6]_prop` <dbl>, `(0.6,0.8]_n` <dbl>,
#> #   `(0.6,0.8]_prop` <dbl>, `(0.8,1]_n` <dbl>, `(0.8,1]_prop` <dbl>,
#> #   Missing_n <dbl>, Missing_prop <dbl>, `0_n` <dbl>, `0_prop` <dbl>,
#> #   `1_n` <dbl>, `1_prop` <dbl>
dat2 <- dat %>%
  mutate_at(vars(NAMES), ~case_when(
    . == "Yes" ~ "Yes",
    . == "y"   ~ "Yes",
    . == "Y"   ~ "Yes",
    . == "No"  ~ "No",
    . == "N"   ~ "No",
    . == "n"   ~ "No",
    . == 1     ~ "Yes",
    . == 0     ~ "No",
    TRUE       ~ "Unknown"
    ))
sitrep::multi_descriptive(dat2, NAMES)
#> # A tibble: 3 x 9
#>   symptom  No_n No_prop Total_n Total_prop Unknown_n Unknown_prop Yes_n
#>   <chr>   <dbl>   <dbl>   <dbl>      <dbl>     <dbl>        <dbl> <dbl>
#> 1 tidy_s…    50    50       100        100        11           11    39
#> 2 num_sy…    22    22       100        100         6            6    72
#> 3 mixed_…    58    58.0     100        100         6            6    36
#> # … with 1 more variable: Yes_prop <dbl>

Created on 2019-07-05 by the reprex package (v0.3.0)

aspina7 commented on June 3, 2024

[Kate] [cholera mortality rate] With zero deaths, the table comes out with "-" for deaths and population, and CIs of NA–NA. Consider changing the function so that 0 comes out rather than "-" and NA.

Deaths | Population | Mortality (per 10,000) | 95%CI
-      | -          | 10000                  | (NA–NA)

The SAME happens in the mortality_rate_region section in cholera:

mortality_rate(deaths$deaths, deaths$population, multiplier = 10000) %>%
  # add the region column to table
  bind_cols(select(deaths, zone_sante), .) %>% 
  merge_ci_df(e = 4) %>% # merge the lower and upper CI into one column
  rename("Region" = zone_sante, 
         "Deaths" = deaths, 
         "Population" = population, 
         "Mortality (per 10,000)" = `mortality per 10 000`, 
         "95%CI" = ci) %>% 
  kable(digits = 2)
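What a zero-death row could show instead, sketched in base R with made-up numbers: the rate is simply 0 and a binomial interval still exists. This uses the exact (Clopper-Pearson) interval from `binom.test()`, which may differ from the interval method inside sitrep's `mortality_rate()`.

```r
# Made-up example: no deaths in a population of 10,000
deaths <- 0
population <- 10000
multiplier <- 10000

rate <- deaths / population * multiplier   # 0, not "-"

# Exact (Clopper-Pearson) binomial interval from base R; this may differ from
# the interval method used by sitrep's mortality_rate()
ci <- binom.test(deaths, population)$conf.int * multiplier
rate  # 0
ci    # lower bound 0, upper bound about 3.7 per 10,000
```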

aspina7 commented on June 3, 2024

[Elburg] [AJS lab tests] She did not have all the lab variables listed in LABS. Add an explanation to comment out the ones you don't have. Also, if you comment out the last one in the list, you are left with a trailing comma, which throws an error, so you need to add a NULL (or just delete the comma).
Alternatively, Zhian, consider making the multi_descriptive function drop non-existent variables and return a warning message that those variables have been ignored.
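A base R sketch of the suggested behaviour, kept outside multi_descriptive so it can run on its own. `existing_vars()` is hypothetical, and the lab variable names are made up. Filtering the name vector this way also avoids the trailing-comma problem, because nothing has to be commented out.

```r
# Hypothetical helper, not part of sitrep: keep only the requested variables
# that exist in the data, and warn about the rest.
existing_vars <- function(dat, vars) {
  missing <- setdiff(vars, names(dat))
  if (length(missing) > 0) {
    warning("Ignoring variables not found in the data: ",
            paste(missing, collapse = ", "), call. = FALSE)
  }
  intersect(vars, names(dat))
}

# Made-up lab data missing one of the expected columns
lab <- data.frame(malaria_rdt = c("pos", "neg"), hep_e_rdt = c("neg", "neg"))
LABS <- c("malaria_rdt", "hep_e_rdt", "hep_b_rdt")

LABS <- existing_vars(lab, LABS)  # warns about hep_b_rdt
LABS                              # "malaria_rdt" "hep_e_rdt"
```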

zkamvar commented on June 3, 2024

[Elburg][AJS lab tests] Same as #136 (comment), she didn't have the expected variables and so select and rename were failing.

aspina7 commented on June 3, 2024

[Kate] [loading packages] The here package is not loaded, but here::here() is used in the spatial data section.
We also need to add an explanation of how here works.

aspina7 commented on June 3, 2024

[Anna] [standardise_clean filtering] A lot of datasets fill case_id down automatically even if nothing else is filled in yet. Show example code at the filter step showing how to drop these rows.
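A base R sketch of the filter step Anna describes, with made-up data: keep a row only if at least one column other than case_id has a value. It assumes empty cells were read in as NA; cells containing "" would need converting to NA first. Recent dplyr versions (1.0.4 or later) can express the same condition with `filter(if_any(-case_id, ~ !is.na(.x)))`.

```r
# Made-up line list where case_id was filled down ahead of data entry
linelist_raw <- data.frame(
  case_id = 1:4,
  age     = c(25, NA, 40, NA),
  sex     = c("f", NA, "m", NA)
)

# Keep rows where at least one column other than case_id has a value
other_cols <- setdiff(names(linelist_raw), "case_id")
has_data <- rowSums(!is.na(linelist_raw[other_cols])) > 0
linelist_raw <- linelist_raw[has_data, ]
linelist_raw$case_id  # 1 3
```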

aspina7 commented on June 3, 2024

[Kate] [reading shapefiles] Actually, read_sf does need to be given the .shp extension at the end! I thought it recognised it automatically from the name.

aspina7 commented on June 3, 2024

[Kate/Annick] [maps] Change colour palettes: dark for the highest AR.

aspina7 commented on June 3, 2024

[Elburg] [epicurves] Show an example of how to change the x-axis labels in epicurves, because with loads of data, for example, they are uuuugly.

zkamvar commented on June 3, 2024

From Kate:
A weird thing: I originally had a line list with data from 2016 to now. I filtered to only include 2019. So when I run attack rate by week, there is a piece of code that keeps all the factor levels, which mucks up my AR by week.

    # counts and cumulative counts by week
    cases <- linelist_cleaned %>%
      arrange(date_of_admission) %>%        # arrange by date of admission
      count(epiweek, .drop = FALSE) %>% # count all epiweeks and include zero counts
      mutate(cumulative = cumsum(n))    # add a cumulative sum

Z: I guess you need to add a new filter statement in there for epiweek
K: Fair
K: Similar issue with describe_admissions_by_epiweek and describe_exits_by_epiweek -- not sure how to filter there
Z: I forget what disease template you're on?
K: Cholera. But it's basically that i have unused factor levels in epiweek -- any easy way to drop them?
Z: Convert it to character. We added that in there so that any missing weeks would be highlighted.
K: Can I do that in the initial creation, or needs to be in that statement only?
Z: You can do it in the initial creation if you'd like, so you can select factor = FALSE in the epiweek creation. If you want the ability to count missing weeks, then you can use factor_aweek() after you filter your data and it will create levels that span the range of your data.

K: Follow-up: changed it to this so I could keep unused levels but only in 2019

    descriptive(linelist_cleaned, "epiweek", coltotals = TRUE, rowtotals = TRUE) %>% 
      filter(grepl("2019", epiweek)) %>%
      augment_redundant("_n$" = " (n)") %>% # modify _n to (n)
      rename_redundant("prop" = "%")    %>% # rename proportions to %
      kable(digits = 2)

Z: Is epiweek a character in your linelist or a factor? Oh, duh, you can just drop those levels with forcats::fct_drop(): https://forcats.tidyverse.org/reference/fct_drop.html
But yeah, it shouldn't matter too much. I think what you have above is good.
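A base R sketch of dropping the unused levels after filtering, with made-up week labels; `droplevels()` does the same job as `forcats::fct_drop()`. Note the trade-off Zhian mentions: dropping levels also removes any empty weeks inside 2019, so missing weeks are no longer highlighted.

```r
# Made-up week labels: the factor keeps levels for 2018 and 2019
epiweek <- factor(c("2018-W52", "2019-W01", "2019-W02"))
linelist <- data.frame(epiweek = epiweek)

linelist_2019 <- linelist[grepl("2019", linelist$epiweek), , drop = FALSE]
levels(linelist_2019$epiweek)  # still includes "2018-W52"

# Base R droplevels() (or forcats::fct_drop()) removes the unused level
linelist_2019$epiweek <- droplevels(linelist_2019$epiweek)
levels(linelist_2019$epiweek)  # "2019-W01" "2019-W02"
table(linelist_2019$epiweek)
```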

pbkeating commented on June 3, 2024

Making a flow chart to guide users through the templates would be helpful


aspina7 commented on June 3, 2024

Closing, as we have the cleaned hackathon to-do list in place.

Status: closed (53 comments)