Report templates and helper functions for applied epidemiology

Home Page: https://r4epi.github.io/sitrep/

License: GNU General Public License v3.0

R 100.00%
r-package report-generator epidemiology outbreaks msf

sitrep's Introduction

Sitrep

Lifecycle: maturing CRAN status Codecov test coverage R build status

The goal of {sitrep} is to provide report templates for common epidemiological surveys and outbreak reports. The package also contains helper functions that standardize certain analyses.

While the templates are primarily for MSF analyses, they have been set up to be as generic as possible for use by the general applied epidemiology community.

Detailed information about the project and the templates can be found at https://r4epis.netlify.com.
A reference website for the functions in {sitrep} can be found at https://r4epi.github.io/sitrep.

{sitrep} includes a number of other R packages which facilitate specific analysis:
{epitabulate}: Tables for epidemiological analysis
{epidict}: Epidemiology data dictionaries and random data generators
{epikit}: Miscellaneous helper tools for epidemiologists
{apyramid}: Age pyramid construction and plotting

Installation

The {sitrep} package is currently stored in a GitHub repository, so installing it requires one extra step.

To install sitrep from GitHub you must first install the remotes package.

# install.packages("remotes")
remotes::install_github("r4epi/sitrep")

If you are getting errors, check the frequently asked questions.

Available templates

Sitrep has four outbreak templates and four survey templates available. These templates will generate the following:

  1. A Word document with the situation report
  2. A plain text markdown document (for conversion to other formats such as HTML or PDF)
  3. A directory with all of the figures produced

You can access the list of templates in RStudio by clicking (see example below): File > New File > R Markdown… > From Template

Example of how to open and save the cholera template

You can generate an example template by using the check_sitrep_templates() function:

library("sitrep")
output_dir <- file.path(tempdir(), "sitrep_example")
dir.create(output_dir)

# view the available templates, categorized by type
available_sitrep_templates(categorise = TRUE)
#> $outbreak
#> [1] "ajs_outbreak"        "cholera_outbreak"    "measles_outbreak"   
#> [4] "meningitis_outbreak"
#> 
#> $survey
#> [1] "mortality"         "nutrition"         "vaccination_long" 
#> [4] "vaccination_short"

# generate the measles outbreak template in the output directory
check_sitrep_templates("measles_outbreak", path = output_dir)
#> [1] "C:\\Users\\alexf\\AppData\\Local\\Temp\\Rtmpcv1H8d/sitrep_example"

# view the contents
list.files(output_dir, recursive = TRUE)
#> [1] "measles_outbreak.Rmd"

Please note that the ‘sitrep’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

sitrep's People

Contributors

aspina7, dirkschumacher, jarvisc1, kdoyle514, lukric, nsbatra, zkamvar

sitrep's Issues

Validate confidence intervals for mortality surveys

I created three helper functions to build up mortality surveys. Currently the default confidence intervals are used (which can be negative for proportions). This needs to be reviewed, or someone can point to the preferred method. cc @aspina7
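
To illustrate how an interval for a proportion can go negative, here is a minimal base R sketch. It assumes the "default" interval is a normal-approximation (Wald) interval, which the issue does not state explicitly, and the counts (1 death among 50 people) are made up. `prop.test()` with `correct = FALSE` returns the Wilson score interval, one common alternative that stays within [0, 1]; it is shown for comparison, not as the agreed method.

```r
# Hypothetical counts, for illustration only
deaths <- 1
n      <- 50
p      <- deaths / n

# Wald (normal approximation) interval: p +/- z * sqrt(p * (1 - p) / n)
z    <- qnorm(0.975)
wald <- p + c(-1, 1) * z * sqrt(p * (1 - p) / n)

# Wilson score interval, as returned by prop.test() without continuity correction
wilson <- as.numeric(prop.test(deaths, n, correct = FALSE)$conf.int)

wald[1] < 0     # the Wald lower bound falls below zero
wilson[1] >= 0  # the Wilson lower bound does not
```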

First draft nutrition survey

We need a first draft for a nutrition survey:

The package anthro provides most of the necessary z-score calculations and is a good contender to be used here. However it seems that prevalence calculation has slightly different rules compared to the ones implemented in the package.

Updating package versions?

Eventually need to work out what package versions need to be the minimum.
Had error with dplyr on first install (old version and had to update)

Helper for statistic + CI

It is tedious to always write "x% CI (y-z)" within text. A helper function should automate that:

str_rate(o = 5, n = 10, alpha = 0.05, method = "wilson")
str_odds(de = 5, dn = 5, he = 10, hn = 5, alpha = 0.05)
str_rr(de = 5, dn = 5, he = 10, hn = 5, alpha = 0.05)
# where de = disease exposed, dn = disease not exposed, 
# he = healthy exposed, hn = healthy not exposed
# variables names as on wikipedia

Used within markdown like this:

The CFR in Bamako is r str_rate(o = 10, n = 50)
=>
The CFR in Bamako is 20% (CI 11.24-33.04)
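
A minimal sketch of how the proposed `str_rate()` could look in base R. This is an assumption about the eventual implementation, not the package's code: only the Wilson method is handled, and the `digits` argument is added here for illustration.

```r
# Sketch of the proposed str_rate(); only method = "wilson" is implemented
str_rate <- function(o, n, alpha = 0.05, method = "wilson", digits = 2) {
  method <- match.arg(method, choices = "wilson")
  p <- o / n
  z <- qnorm(1 - alpha / 2)

  # Wilson score interval
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom
  ci     <- 100 * (centre + c(-1, 1) * half)

  sprintf("%s%% (CI %s-%s)",
          format(round(100 * p, digits)),
          formatC(ci[1], format = "f", digits = digits),
          formatC(ci[2], format = "f", digits = digits))
}

str_rate(o = 10, n = 50)
#> [1] "20% (CI 11.24-33.04)"
```

Returning a single string keeps the helper usable directly in inline R Markdown code.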

Hackathon Issues

Morning @zkamvar - I think the easiest thing to do is to start a bullet list, then set out each bullet so that it contains the name of the person who raised the issue in brackets, then the disease template (cholera, ajs, measles, meningitis) they were using, followed by a dash and the code chunk (or general), and then the error or issue, followed by the solution in double brackets once fixed. E.g.:

  • [Annick] [Cholera - general] Didnt know how to turn computer on [[Solution: pushed the button]]
  • [Annick] [Cholera - read_dhis_data] Didnt know what an excel file was [[Solution: googled it]]
  • [Elburg] [Sitrep - general] Error when updating packages based on wiki - Error: failed to install 'sitrep' from github: (converted from warning) cannot remove prior installation of package 'rlang' [[Solution: Turn on and off]]
  • [Zhian] [Outbreaks - gen_data] simulated data: sometimes it will generate an explicit missing variable for gender and other times not
  • [Kate] [Sitrep - general] we deleted the general outbreak template from sitrep. Kate gets an error that cannot read yaml because doesnt exist. Cannot open file 'templates/outbreak/template.yaml': No such file or directory

Split sitrep into smaller unit packages

Currently sitrep does a few disparate things:

  • provides inline styling of numbers with the fmt_count() family of functions
  • tabulates linelist data with descriptive()
  • tabulates survey data with tabulate_survey()
  • plots age pyramids with plot_age_pyramid()
  • generates fake data sets and surveys
  • provides a parser for the MSF dictionaries
  • creates age categories and groups them into a single variable
  • provides helpers to calculate proportions (CFR, etc...)

It's safe to say that some of these can be broken into different packages because there are currently 34 packages that we import (which is not the number of packages that get installed).

When we know these templates are working, it would be a good idea to split some of these up into smaller packages that are viable stand-alone CRAN packages that sitrep can depend on (a la a kind of 'tidyverse' for epi packages).

Setup CI

Set up the CI system using travis and appveyor. Ideally test all R versions >= 3.1
(There aren't any tests yet, but still needs to be done)

make functions pipe-ready?

A lot of the flow for the outbreaks template (at least) uses pipes, but introduces an awkward do() component. Having a version that passes a data frame and respects groups, would help make things less awkward.

Spatial analysis issues

I have added a basic section to spatial analysis using the "tmap" package (with "tmaptools" too).
I chose this because it allows both sf and sp files and has similar syntax to ggplot2 but simpler for manipulating spatial data.
The reason I decided against ggmap is that it really didn't work well in the field on a poor internet connection (it would often just get kicked out and wouldn't download at all), and the style options aren't great either.
In general tmap is quite a nice package, allowing for static and interactive maps.
That comes with issues though as it has a lot of dependencies!

The biggest issue is that to get static tiles it uses the "openstreetmap" package, which is rJava dependent and produces quite poor resolution rasters.
For interactive maps it uses the leaflet package, which has great tile backgrounds but only interactive ones. Leaflet also has more server options, and particularly important is that it supports openstreetmap black&white as well as humanitarian openstreetmap (no other packages seem to provide these and they are the most important ones!!).

It would be really awesome if we could somehow manage to pull leaflet backgrounds into static tiles. Alternatively, the "openstreetmap" package allows you to specify server destinations to pull tiles from, so maybe there's a possible workaround in there?? (just need to find a stable server with an API?)

The second thing I couldn't get to work is creating kernel density maps (or heatmaps) from point data using the smooth_map function from the "tmaptools" package.

If we manage a workaround for getting tiles in (and don't want all the dependencies of tmap) then using the sf package with ggplot2 is equally simple and functional!

Generate complete directories for analyses

Just an idea, but in addition to a single markdown template we could also provide functionality that helps scaffolding a complete set of directories according to some best practice.

E.g. scaffold_weekly_sitrep(path = ".") creates a set of directories and places a readme on how to organize data, do analyses and archive the reports.
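
A rough base R sketch of what such a scaffolding function might do. The folder names and the readme text are assumptions made up for illustration; the issue does not specify a layout.

```r
# Hypothetical sketch of scaffold_weekly_sitrep(); the layout is an assumption
scaffold_weekly_sitrep <- function(path = ".") {
  dirs <- file.path(path, c("data/raw", "data/clean", "analysis", "reports/archive"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Write a short readme unless one already exists
  readme <- file.path(path, "README.md")
  if (!file.exists(readme)) {
    writeLines(c(
      "data/raw: untouched source files",
      "data/clean: cleaned data sets written by the analysis",
      "analysis: R Markdown templates and scripts",
      "reports/archive: one dated copy of each report sent out"
    ), readme)
  }
  invisible(dirs)
}

# Try it in a temporary directory
project <- file.path(tempdir(), "weekly_sitrep")
scaffold_weekly_sitrep(project)
list.files(project, recursive = TRUE, include.dirs = TRUE)
```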

To Do List

Everywhere:

  • [Zhian] text size in plots doesn't respond to theme_set base_size from the setup chunk; only when setting element_text in the plot itself. Need to adapt outbreak and survey templates accordingly.

  • [Discussion] would be good to also output a document (beside the word) which summarises what datasets were used (and their file paths), starting cases, dropped cases where the output is. And maybe warnings/errors? [moved to nice-to-haves pr]

  • [Zhian] in tab_* functions, change the warning when dropping NAs to be "call. = FALSE" to the warning function - as descriptive is no longer a user-facing function.

  • [Zhian] add a license to the repo? see #44

  • [Kate/Neale / Alex / Zhian] decide if also want to show code for making implicit NAs out of explicit e.g. "Missing" chars to NA. So if you want to make "No answer" an implicit NA, you would use fct_recode(NULL = "No answer")

  • [Kate] update all sitrep::descriptive tables with groupers to use sitrep::rename_redundant and augment_redundant for renaming similar columns

  • [Neale] Add chi-squared test and t.test examples to website (there is already an example of this in the vaccination survey) [moved to https://github.com/R4EPI/R4EPIs-website/issues/19#issue-494502584 on website]

  • [Zhian/Alex] consider dropping sitrep::descriptive in favour of arsenal::tableby (dplyr compatible? simple enough syntax for beginners?) - decided against, syntax too dense

  • [Neale] Under "## Installing and loading required packages " add that you can check where your packages are using .libPaths() and give a link to the wiki/training material. Currently it only says: "Program Files/R/R-[version]/library", which won't always be the case (particularly on MSF computers).

  • [Neale?] double check that case_when is doing what it should when NAs are involved. case_when doesn't leave NAs as NAs; need to add an extra argument. See below:

x <- c(NA, "good", "bad")
dplyr::case_when(
  is.na(x)    ~ NA_character_,
  x == "good" ~ "YAY",
  x == "bad"  ~ "BOO!",
  TRUE        ~ "WAT"
)
#> [1] NA     "YAY"  "BOO!"
  • [Neale?] switch all factor stuff to forcats (swap recode_factor out) - cross reference with point under "surveys" - all of the factor cleaning needs a bit of a fix
  • [Neale] add to intro to templates that feedback is always welcome via github. a la #33 (comment)
  • [Alex] check the ceramics package as a map tile solution. See spatial analysis issue. Package is not quite there yet, but may be in the future.
  • [Alex] Clean up sitrep::univariate_analysis - lot of unnecessary repeating. Also need to add stratified.

Surveys:

  • [Alex] all surveys - fix factor cleaning according to neale examples from outbreak templates (neale may have already done this)
  • [Alex] mortality and nutrition: finalise reason_no_consent chunk once descriptive function is able to deal with multiple choice variables split over several columns (requires new tab_linelist function)
  • [Alex] nutrition: add a weighted table by household if received soap or not (requires fixing weighting below).
  • [Alex/Zhian] Add option for cluster design to add_weights. Input the number of clusters (i.e. villages), then the number of households within those clusters and the number of kids for each house (i.e.) ... TO BE DISCUSSED once hear back from statistician
  • [Alex] add examples of stratified design and analysis by region.
  • [Alex] nutrition survey - confirm that weighted proportions are the same as generated by the anthro package see #140
  • [Alex] Nutrition: updated dictionary to switch measles for programme penetrance variable (soap)
  • [Alex/Zhian] Fix surveys cleaning section factors (cause of death and no consent somehow in same cleaning step) - may just be for vaccination.
  • [Alex/Zhian] Add replace NA explicit in factor cleaning for vaccination (from mortality)
  • [Alex] Change mortality and vaccination to use surv_weight variable after change to the add_weight function output change!
  • [Zhian - discuss with alex] the chunk "descriptive_sampling_bias" throws an error if the age group variable has missings (because descriptive now returns a row with "missings") - and the population data frame obviously doesnt have that extra row .... so row numbers differ. Do we just add a comment??
  • [Alex] fix the cluster_hh_size chunk - counting number of houses definitely wrong.
  • [Alex] add cluster to mortality survey (need to define in dict)
  • [Alex] add descriptive of non-consent reasons at the beginning of results where we describe the sample
  • [Alex] add a weighted table for reason not vaccinated (to vacc survey template)
  • [Alex] double check in all templates that using correct order for counter and stratifier variables (e.g. in vaccination - can't just swap to have age_group first in order to flip the table.... use the new transpose argument!
  • [Zhian] need to add design effect option to tabulate_survey (see example from epiet RAS case study)
  • [Zhian] tabulate_survey function when pretty=TRUE returns the % symbol as well as CI in each cell. It would be enough to just have the column heading showing % (95%CI), and then in the cell have e.g. 35 (21-90). This is relevant to all pretty merging functions...
  • [Zhian] tabulate_survey when stratified - possible to give option for row or col props, as well as of total?? (same as we do for the descriptive function)

Extras:

  • [NICE TO HAVE] if have time then add in the options to add sample size calculations see #5 [moved to nice to have pr]
  • ~~[NICE TO HAVE] Consider implementing a variation of the wordr package which allows us to put pagebreaks in. ~~ [moved to nice to have pr]
  • [NICE TO HAVE] on all plots - make y axis numbering go to top of axis, this happens because of expand(c(0,0) [moved to nice to have pr]
  • [see: https://github.com/reconhub/incidence/issues/105] on epicurves - possible to add a 2 weekly moving average? - issue posted on recon. Long term issue case.[moved to nice to have pr]
  • [NICE TO HAVE] Consider adding example of creating age_categories grouping in months for under fives to generic template (added in cholera template - copy paste if acceptable)
  • [NICE TO HAVE] add Kate's suggestion of removing previously infected cases from the denominator in consecutive epiweeks (So in theory this is methodologically correct if we can actually confirm that those who were counted as cases were in fact confirmed to be from the disease in question. In practice – likely have many suspected cases, which therefore would not necessarily be removed from the at-risk category. Therefore questionable) - decided against - rare use case.
  • [NICE TO HAVE] Confirm if there is an existing way to have discrete categories in choropleth maps
    https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/ Mense, just stick with what we have.
  • [NICE TO HAVE] adjust fmt_count to have an option of removing proportions, and an option for specifying a different denom (from measles - but works just as well with count...) - count is fine stick with that.
  • [NICE TO HAVE] switch from cowplot to patchwork when released on cran (https://github.com/thomasp85/patchwork) - patchwork not being released any time soon. (hasnt been worked on in a while)
  • [NICE TO HAVE] Add non binary gender option to plot_age_pyramid, see #102 (comment)

Outbreaks:

  • [ZHIAN] Fix cowplot alignment of epicurves and ar/cfr, so that ticks are in the middle of bars

From measles:

  • No lab data in dictionary - add fake lab data with example of how to merge and create case def. (see generic outbreak template). Do all above commented out,- comment out other analyses stratified by case def (just leave in as an example)
  • in gen_data specify that only those who received vaccine have dose entered
  • add an example to data cleaning section for setting dose to NA where vaccine not given. (commented out)
  • add a thing about showing how many NA in each variable (summary has that) - add a bit to drop rows with xzy missing or based on bla... just use dplyr filter
  • add an example of writing cleaned dataset to excel (or double check that we have it there)

  • [ZHIAN] when creating the epiweek variable - define as a factor and then add all weeks between min and max as levels - so that dont have to fuck around in tables with zerocount weeks.
  • [ZHIAN] consider option of add_totals for proportions function,- so it just sums the counts of res, then runs proportions function and bind_rows. If you look at what I did in the CFR section of the cholera template, having to bind_rows of an overall and a group specific CFR calculation is a bit long winded....
  • [ZHIAN] Consider adding counts(proportions%) to inline_fun. See cholera template inline code before #### Demographics
  • [ZHIAN] fmt_ci_df(ar) adds a % sign at the end ... but if its per 10,000 population we dont want a % sign as seen after attack_rate code chunk
  • [ZHIAN] Get rid of do(..) and change functions to NSE? goes back to issue#48

  • [ZHIAN/ALEX] Try and make tables that are too big fit nicely in worddoc output (maybe shorten col names or merge categories....

  • [ALEX] add option to add a ceiling to age_group - e.g. to have the highest group in months end at 24months... (not be 24+)
  • [ALEX] update descriptive function with option to have percentage of total, rather than column specific.
  • [ALEX] Mapping section: consider changing the plotting of choropleths as categories rather than continuous... also make the points stuff better
  • [ALEX] make sure all the 95%CIs are merged in the document tables... (think is just mortality section left over)
  • [ALEX] add Kate's admissions/exits table in separate tables

  • [DONE? ZHIAN?] Consider adding an option to age_pyramid which returns proportions rather than counts; and option to remove NAs; Horizontal_lines does not seem to work either....
  • [DONE?] When library(excel.link), message about someone called daniela - suppress messages...
  • [DONE] on epicurves, when you use scale_x_date(date_breaks = "1 week") - the axis labels change to full dates, is it possible to keep it with the default 2013-W01 for example?
  • [DONE?] using the fmt_ci_df function doesn't work if you use the mergeCI function from the props functions (e.g. attack_rate)
  • [DONE] Find a better way to reference lines/chunks (is there some kind of hyperlink function?), for "Introduction to this template" section - no solution really, just reference code chunk names
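
One item in the list above asks for the epiweek variable to be a factor whose levels cover every week between the first and last case, so that weeks with zero cases still appear in tables. A base R sketch of that idea follows. It assumes Monday-start ISO weeks and uses `format()` rather than the week functions the templates use; the onset dates are made up.

```r
# Hypothetical onset dates; no cases fall in ISO week 2019-W03
onset <- as.Date(c("2019-01-02", "2019-01-04", "2019-01-10", "2019-01-22"))

# Monday of the week containing the first case
first_monday <- min(onset) - (as.integer(format(min(onset), "%u")) - 1)

# One label per week from the first to the last case
all_weeks <- format(seq(first_monday, max(onset), by = "week"), "%G-W%V")

# Factor with every week as a level, whether or not it has cases
epiweek <- factor(format(onset, "%G-W%V"), levels = all_weeks)

# 2019-W03 is listed with a count of zero instead of being dropped
table(epiweek)
```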

After Hackathon: set rlang version to >= 0.4.0

rlang was just released with version 0.4.0, which introduces the double moustache operator. This changes expressions like !! enquo(var) to {{ var }}, which is much easier to read, imo.

It would also change functions like this:

    # Calculate the survey proportion for both the stratifier and counter
    # @param xx a tbl_svy object
    # @param .x a single character value matching those found in the cod column
    # @param .y a single character value matching those found in the st column
    # @param cod a symbol specifying the column for the counter
    # @param st a symbol specifying the column for the stratifier
    # @return a data frame with five columns, the stratifier, the counter, 
    # proportion, lower, and upper.
    s_prop_strat <- function(xx, .x, .y, cod, st) {
      st  <- rlang::enquo(st)
      cod <- rlang::enquo(cod)
      res <- srvyr::summarise(xx, 
                              proportion = srvyr::survey_mean(!! cod == .x & !! st == .y,
                                                              proportion = TRUE,
                                                              vartype = "ci"))
      res <- dplyr::bind_cols(!! cod := .x, res)
      dplyr::bind_cols(!! st := .y, res)
    }

To this:

    # Calculate the survey proportion for both the stratifier and counter
    # @param xx a tbl_svy object
    # @param .x a single character value matching those found in the cod column
    # @param .y a single character value matching those found in the st column
    # @param cod a symbol specifying the column for the counter
    # @param st a symbol specifying the column for the stratifier
    # @return a data frame with five columns, the stratifier, the counter, 
    # proportion, lower, and upper.
    s_prop_strat <- function(xx, .x, .y, cod, st) {
      res <- srvyr::summarise(xx, 
                              proportion = srvyr::survey_mean({{ cod }} == .x & {{ st }} == .y,
                                                              proportion = TRUE,
                                                              vartype = "ci"))
      res <- dplyr::bind_cols({{ cod }} := .x, res)
      dplyr::bind_cols({{ st }} := .y, res)
    }

Using the survey package for survey analysis

A quick example to test some functions

library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
linelist <- outbreaks::fluH7N9_china_2013 %>% 
  group_by(province) %>% 
  filter(n() > 5) %>% 
  ungroup() %>% 
  filter(!is.na(outcome))

design <- svydesign(ids = ~1,  #no cluster within strata
                    strata = ~province, # strata, replace by ~1 if no strata
                    weights = ~ 1, # sampling weights
                    data = linelist)

# get totals
svytotal(~outcome, design)
#>                total     SE
#> outcomeDeath      27 3.9362
#> outcomeRecover    35 3.9362

# compute something by another group. E.g. the mean
svyby(~outcome, ~gender, design, svymean)
#>   gender outcomeDeath outcomeRecover se.outcomeDeath se.outcomeRecover
#> f      f    0.3529412      0.6470588      0.11713818        0.11713818
#> m      m    0.4651163      0.5348837      0.07686946        0.07686946

# you can also compute confidence intervals
confint(svyby(~outcome, ~gender, design, svymean))
#>                      2.5 %    97.5 %
#> f:outcomeDeath   0.1233546 0.5825278
#> m:outcomeDeath   0.3144549 0.6157776
#> f:outcomeRecover 0.4174722 0.8766454
#> m:outcomeRecover 0.3842224 0.6855451
confint(svytotal(~outcome, design))
#>                   2.5 %   97.5 %
#> outcomeDeath   19.28526 34.71474
#> outcomeRecover 27.28526 42.71474

Created on 2018-11-24 by the reprex package (v0.2.1)

Fix R CMD check issues

Currently a lot of prototype functions produce check errors. We should fix those and add documentation despite the fact that they are not stable yet. This can be worked on after #18 is merged.

Add Description field to template.yml

Related to #35, I'm getting errors invoking the template from the command line:

rmarkdown::draft("outbreak.Rmd", template = "outbreak", package = "epireports")
# Error in rmarkdown::draft("outbreak.Rmd", template = "outbreak", package = "epireports") :
#   template.yaml must contain name and description fields

This is from 145f6a8.

According to the official documentation, template files must have name and description fields.

For each template, an included dataset would be great

Ideally a template could be run creating all outputs for a sample dataset. For the outbreak example I used a random dataset, there might be better. In particular we might want to create messy versions of these sample datasets

aweek start date is not propagated forward in outbreak templates

Currently, there are places in the cholera template that will not work if the epiweek does not start on Monday:

linelist_cleaned <- linelist_cleaned %>%
  filter(date_of_onset <= week2date(sprintf("%s-7", reporting_week)))
# define the first week of outbreak (date of first case)
first_week <- levels(linelist_cleaned$epiweek)[1]
# outbreak start
# return the first day in the week of first case
obs_start <- week2date(sprintf("%s-1", first_week))
# return last day of reporting week
obs_end <- week2date(sprintf("%s-7", reporting_week))

This will lead to fencepost errors, and I've proposed a fix in aweek that will help with this situation: reconhub/aweek#17
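
To make the fencepost problem concrete, here is a base R sketch of computing the first and last day of a week for an arbitrary week start. It is neither the aweek implementation nor the fix proposed in reconhub/aweek#17; `week_bounds()` and its `week_start` argument (1 = Monday ... 7 = Sunday) are hypothetical names.

```r
# Hypothetical helper: first and last day of the week containing `date`
week_bounds <- function(date, week_start = 1) {
  weekday <- as.integer(format(date, "%u"))  # ISO weekday: 1 = Monday, 7 = Sunday
  offset  <- (weekday - week_start) %% 7     # days elapsed since the week began
  start   <- date - offset
  c(start, start + 6)
}

d <- as.Date("2019-01-09")      # a Wednesday
week_bounds(d, week_start = 1)  # Monday 2019-01-07 to Sunday 2019-01-13
week_bounds(d, week_start = 7)  # Sunday 2019-01-06 to Saturday 2019-01-12
```

The same date belongs to weeks with different boundaries depending on the start day, so the start day has to travel with every conversion between weeks and dates.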

Use rio as default data import package

@epiamsterdam has pointed out that the rio package has a simple interface of import() and export() and has sensible defaults for guessing the file format. This would be extremely useful for epis who just need to get their data into R. The only downside is that it will introduce a lot of hidden layers of dependencies.

Improve age pyramid

Not sure if ggpyramid of reconhub should be used or we improve the code within this package.

First draft vaccination survey

This is yet to be created, but is, according to the one sample I have, "just" descriptive tables of counts and proportions and a point map. So essentially just a couple of dplyr statements plus some sample code for data preparation and cleaning. But that is probably similar to all other surveys.

Issues with installing from github

Walking through installing from github and he got a whole bunch of errors - starting with not being able to update packages. See below.

Probs good to address next week before sending out to epis.

He needed to reinstall the package colorspace and restart the R session and then it worked for colorspace. Then got same error for rlang, repeated above steps and then epireports loaded correctly but templates are still not there....
@dirkschumacher @zkamvar
[Screenshot: epireports_loading]

Question: is rounding necessary for proportions?

I noticed that all of the proportion related functions (in cfr.R) have the digits argument, which defaults to 1. This causes rounding to a single decimal place for the results. Would a better solution just be to modify the "digits" option?

(x <- tibble::enframe(runif(5)))
#> # A tibble: 5 x 2
#>    name value
#>   <int> <dbl>
#> 1     1 0.981
#> 2     2 0.250
#> 3     3 0.279
#> 4     4 0.303
#> 5     5 0.276
knitr::kable(x)
name value
1 0.9806477
2 0.2499407
3 0.2786514
4 0.3034308
5 0.2764433
options(digits = 3)
knitr::kable(x)
name value
1 0.981
2 0.250
3 0.279
4 0.303
5 0.276
options(digits = 7)
knitr::kable(x, digits = 3)
name value
1 0.981
2 0.250
3 0.279
4 0.303
5 0.276

Created on 2019-01-22 by the reprex package (v0.2.1)

Nutrition zscore plots

For nutrition surveys, we need plots that compare the observed z-scores to the WHO Standard.

Expected output from an actual survey:
[Screenshot: z-score plot from an actual survey]

We should use ggplot2. Here is some quick sample code that uses the anthro package to generate the z-scores. But the function should really just take a vector of z-scores and necessary labels.

Quick illustration:

library(ggplot2)
library(anthro)

n <- 1000
res <- anthro_zscores(sex = sample(c(1, 2), n, TRUE),
                  age = as.integer(rnorm(n, 1000, 3)),
                  lenhei = rnorm(n, 94, 3))


ggplot(res[res$flen == 0, ]) + 
  geom_density(aes(x = zlen), color = "darkred") + 
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "darkgreen") +
  scale_x_continuous(limits = c(-6, 6))

[Screenshot: output of the sample code above]

Please use colorblind and printer friendly colors

Remove generic outbreak template

A while ago, we realized that we had tackled the issue of a generic outbreak template too early and decided to focus effort on creating disease-specific templates instead with the idea that we would re-visit the outbreak template and combine the common factors of the disease-specific templates into the generic.

At the moment, the generic outbreak template exists in a bit of a derelict state with old problems laid bare (e.g. handling functions that output vectors in a tidy framework):

```{r cfr_by_age_group}
# group by known outcome and agegroup
linelist_cleaned %>%
  filter(!is.na(outcome)) %>% # remove rows with missing outcome
  group_by(age_group) %>% # group by age_group
  summarise(deaths = sum(outcome == "Death"), # tally deaths
            population = n()) %>% # tally population
  do(bind_cols(age_group = .$age_group, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
  arrange(desc(lower)) %>% # sort by lower confidence interval
  tidyr::complete(age_group) %>% # Ensure all levels are represented
  rename("Age group (years)" = age_group,
         "Deaths" = deaths,
         "Population" = population,
         "CFR (%)" = cfr,
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>%
  knitr::kable(digits = 2)
```

I propose that we scrap the template altogether and highlight the commonalities of the four disease-specific templates in the wiki

add_weights need warning if age_group not factor

@pbkeating pointed out that add_weights will wrongly add extra rows to a data set if age_group from the population data set is not a factor.
This came up using the example code from the vaccination template below: it starts with 1000 cases, and after add_weights an extra 200 or so show up.
This issue doesn't happen if you use the gen_pop function as in the mortality template, because the age_group variable resulting from that function is a factor.

vaccination_raw <- sitrep::gen_data(dictionary = "Vaccination", varnames = "column_name",
                         numcases = 1000)

# create fake population by age and sex 
population_data_age <- tibble(age_group = rep.int(c("0-0", "1-1", "2-2", "3-3", "4-4", "5+"), 2), 
                              sex = rep.int(c("Male", "Female"), 6))
population_data_age$population <- as.integer(runif(nrow(population_data_age), 
                                          min = 500, max = 2000))

# clean up the column names
colnames(vaccination_raw) <- clean_labels(colnames(vaccination_raw))


# Additional variable name cleaning and creation of vaccination status binary variable
vaccination_clean <- vaccination_raw %>%
                      mutate(age_in_years = as.integer(q10_age_yr),
                             age_in_months = as.integer(q55_age_mth),
                             age_group = sitrep::age_categories(age_in_years, breakers = c(0,1,2,3,4,5)),
                             sex = q5_sex,
                             vaccination_routine = q2_vaccine_9months,
                             vaccination_sia = q32_vaccine_sia,
                             vaccine_mass = q17_vaccine_mass,
                             disease_diagnosis = q47_disease_diagnosis,
                             area = q4_settlement,
                             cluster = q77_what_is_the_cluster_number)

######## WEIGHTING 
vaccination_clean <- add_weights(vaccination_clean, population_data_age, age_group, sex)
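
A sketch, in base R, of the kind of guard the issue title asks for. `warn_if_not_factor()` is a hypothetical helper, not part of {sitrep}; it only warns when a column is not a factor and does not address why the extra rows appear. The population figures are made up.

```r
# Hypothetical guard: warn when a grouping column is not a factor
warn_if_not_factor <- function(data, column) {
  is_fac <- is.factor(data[[column]])
  if (!is_fac) {
    warning(sprintf("Column '%s' is %s, not a factor; convert it with factor() first.",
                    column, class(data[[column]])[1]),
            call. = FALSE)
  }
  invisible(is_fac)
}

# Made-up population table with age_group stored as character
population <- data.frame(
  age_group  = c("0-0", "1-1", "2-2", "3-3", "4-4", "5+"),
  population = c(500, 600, 700, 800, 900, 1000),
  stringsAsFactors = FALSE
)

warn_if_not_factor(population, "age_group")  # warns: character, not a factor

# Convert, keeping the age groups in their original order
population$age_group <- factor(population$age_group, levels = population$age_group)
warn_if_not_factor(population, "age_group")  # silent
```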

Field-ready mortality survey

The current mortality survey template has three sample analysis chunks. This needs to be extended, probably, in order to make it field ready. In particular more code comments need to be included.

In a similar fashion to what @aspina7 is doing for the outbreak templates in #18.

Create example code for maps

  • Pointmap shape file / geojson
  • Pointmap dynamic base map
  • Choropleth map shape file / geojson
  • Choropleth map dynamic base map

For the two types of maps I would like to have examples where the shapefile/geojson already exists (plus some code how to download a shapefile beforehand e.g. from HDX) and code that downloads the basemap from the internet using a tileprovider (e.g. ggmap).

Add more authors to the package

@zkamvar @aspina7 please add yourself as authors to the packages.

Also I think it might make sense to add all other project folks as ctb. Or shall we leave the DESCRIPTION file for code contributions only?
