r4epi / sitrep

Report templates and helper functions for applied epidemiology

Home Page: https://r4epi.github.io/sitrep/

License: GNU General Public License v3.0

R 100.00%

r-package report-generator epidemiology outbreaks msf

sitrep's Introduction

Sitrep

The goal of {sitrep} is to provide report templates for common epidemiological surveys and outbreak reports. The package also contains helper functions that standardize certain analyses.

While the templates are primarily for MSF analyses, they have been set up to be as generic as possible for use by the general applied epidemiology community.

Detailed information about the project and the templates can be found at https://r4epis.netlify.com.
A reference website for the functions in {sitrep} can be found at https://r4epi.github.io/sitrep.

{sitrep} includes a number of other R packages which facilitate specific analyses:
{epitabulate}: Tables for epidemiological analysis
{epidict}: Epidemiology data dictionaries and random data generators
{epikit}: Miscellaneous helper tools for epidemiologists
{apyramid}: Age pyramid construction and plotting

Installation

The {sitrep} package is currently stored in a GitHub repository, so installing it requires one extra step.

To install sitrep from GitHub you must first install the remotes package.

# install.packages("remotes")
remotes::install_github("r4epi/sitrep")

If you are getting errors, check the frequently asked questions.

Available templates

Sitrep has four outbreak templates and four survey templates available. These templates will generate the following:

A Word document with the situation report
A plain text markdown document (for conversion to other formats such as HTML or PDF)
A directory with all of the figures produced

You can access the list of templates in RStudio by clicking: File > New File > R Markdown… > From Template

You can generate an example template by using the check_sitrep_templates() function:

library("sitrep")
output_dir <- file.path(tempdir(), "sitrep_example")
dir.create(output_dir)

# view the available templates, categorized by type
available_sitrep_templates(categorise = TRUE)
#> $outbreak
#> [1] "ajs_outbreak"        "cholera_outbreak"    "measles_outbreak"   
#> [4] "meningitis_outbreak"
#> 
#> $survey
#> [1] "mortality"         "nutrition"         "vaccination_long" 
#> [4] "vaccination_short"

# generate the measles outbreak template in the output directory
check_sitrep_templates("measles_outbreak", path = output_dir)
#> [1] "C:\\Users\\alexf\\AppData\\Local\\Temp\\Rtmpcv1H8d/sitrep_example"

# view the contents
list.files(output_dir, recursive = TRUE)
#> [1] "measles_outbreak.Rmd"

Please note that the ‘sitrep’ project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

sitrep's People

dirkschumacher lukric pbkeating kdoyle514 chrisleboa idrissait laasousa zawmtun karthikbalajee phgis ishuk77 amataste ramsas88

sitrep's Issues

Validate confidence intervals for mortality surveys

I created three helper functions to build up mortality surveys. Currently the default confidence intervals are used (which can be negative for proportions). Either this needs to be reviewed or someone can point to the preferred method. cc @aspina7
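
To make the concern concrete, here is a minimal base-R sketch (not the package's helper functions) comparing the normal-approximation (Wald) interval, whose lower bound can fall below zero for rare events, with the Wilson score interval returned by prop.test(correct = FALSE). The counts are made up for illustration, and the sketch ignores the survey design entirely.

```r
# Hypothetical counts: 1 death among 50 people surveyed
deaths <- 1
n      <- 50
p_hat  <- deaths / n
z      <- qnorm(0.975)

# Wald (normal approximation) interval: can fall below 0 for small proportions
se   <- sqrt(p_hat * (1 - p_hat) / n)
wald <- c(lower = p_hat - z * se, upper = p_hat + z * se)

# Wilson score interval (no continuity correction): stays within 0 and 1
wilson <- as.numeric(prop.test(deaths, n, correct = FALSE)$conf.int)
names(wilson) <- c("lower", "upper")

round(rbind(wald, wilson), 4)
```

Whether a Wilson-type interval is the preferred method for these surveys is exactly the open question of this issue; the sketch only shows why the default can misbehave.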

First draft nutrition survey

We need a first draft for a nutrition survey:

The package anthro provides most of the necessary z-score calculations and is a good contender to be used here. However it seems that prevalence calculation has slightly different rules compared to the ones implemented in the package.

univariate analyses

@aspina7 looked at what Dan did and coded a quick prototype for RR based on his output format. Does this go in the right direction?
https://github.com/R4EPI/outbreaks/blob/master/oubreak-report.md

Updating package versions?

Eventually need to work out what package versions need to be the minimum.
Had error with dplyr on first install (old version and had to update)

Rename package to "sitrep"

As discussed in #11

Remove sample size calculation from surveys

We should exclude this from the templates. Maybe include this later in a separate component, but for now remove the code from the package and templates.

Make plot_age_distribution to include non-binary gender values

The age pyramid has a flaw in that it only considers binary gender variables. This will not work in situation reports that would include non-binary or transgender individuals.

One solution for this would be to additionally include a function that would plot the age distribution where the gender categories are dodged bars (see https://ggplot2.tidyverse.org/reference/position_dodge.html)
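
A rough sketch of what the dodged-bar alternative could look like with ggplot2 follows. The data frame, its column names and the gender categories are made up for illustration; this is not the existing plot_age_distribution code.

```r
library(ggplot2)

# Made-up counts; column names and categories are assumptions
age_counts <- data.frame(
  age_group = factor(rep(c("0-4", "5-14", "15-44", "45+"), times = 3),
                     levels = c("0-4", "5-14", "15-44", "45+")),
  gender    = rep(c("Female", "Male", "Non-binary"), each = 4),
  n         = c(12, 20, 35, 9, 14, 18, 30, 11, 1, 2, 4, 1)
)

# One bar per gender category within each age group, placed side by side
p <- ggplot(age_counts, aes(x = age_group, y = n, fill = gender)) +
  geom_col(position = position_dodge()) +
  labs(x = "Age group (years)", y = "Cases", fill = "Gender") +
  coord_flip()
p
```

Because the bars are dodged rather than mirrored, any number of gender categories can be shown without changing the code.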

Helper for statistic + CI

It is tedious to always write "x% CI (y-z)" within text. A helper function shall automate that

str_rate(o = 5, n = 10, alpha = 0.05, method = "wilson")
str_odds(de = 5, dn = 5, he = 10, hn = 5, alpha = 0.05)
str_rr(de = 5, dn = 5, he = 10, hn = 5, alpha = 0.05)
# where de = disease exposed, dn = disease not exposed,
# he = healthy exposed, hn = healthy not exposed
# variable names as on Wikipedia

Used within markdown like this:

The CFR in Bamako is `r str_rate(o = 10, n = 50)`
=>
The CFR in Bamako is 20% (CI 11.24-33.04)
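
As a starting point, the proposed str_rate() could be sketched in base R as below. This is only an illustration of the idea, not the package's implementation: it hard-codes the Wilson score interval (the proposed method argument is left out), and the digits argument is an assumption.

```r
# Sketch of the proposed helper; not the package's implementation.
# o = observed events, n = denominator, alpha = 1 - confidence level
str_rate <- function(o, n, alpha = 0.05, digits = 2) {
  p <- o / n
  z <- qnorm(1 - alpha / 2)

  # Wilson score interval
  denom  <- 1 + z^2 / n
  centre <- (p + z^2 / (2 * n)) / denom
  half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / denom

  # format the bounds as percentages with a fixed number of decimals
  fmt <- function(x) formatC(100 * x, format = "f", digits = digits)
  paste0(format(round(100 * p, digits)), "% (CI ",
         fmt(centre - half), "-", fmt(centre + half), ")")
}

str_rate(o = 10, n = 50)
#> [1] "20% (CI 11.24-33.04)"
```

str_odds() and str_rr() could follow the same pattern, returning a single character string that is safe to drop into inline R Markdown code.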

Test version R >= 3.3 on appveyor

I assume most of the users use Windows and we should test all supported versions of R as well.

R >= 3.3 is required because of rgdal

We should make this explicit

Hackathon Issues

Morning @zkamvar - I think probably the easiest thing to do is to start a bullet list, then set out each bullet so that it contains the name of the person who raised the issue in brackets, then the disease template (cholera, ajs, measles, meningitis) they were using, followed by a dash and the code chunk (or general), and then the error or issue, followed by the solution in double brackets at the end once it is fixed, e.g.:

[Annick] [Cholera - general] Didn't know how to turn computer on [[Solution: pushed the button]]
[Annick] [Cholera - read_dhis_data] Didn't know what an excel file was [[Solution: googled it]]
[Elburg] [Sitrep - general] Error when updating packages based on wiki - Error: failed to install 'sitrep' from github: (converted from warning) cannot remove prior installation of package 'rlang' [[Solution: Turn on and off]]
[Zhian] [Outbreaks - gen_data] simulated data: sometimes it will generate an explicit missing variable for gender and other times not
[Kate] [Sitrep - general] we deleted the general outbreak template from sitrep. Kate gets an error that it cannot read the yaml because it doesn't exist: Cannot open file 'templates/outbreak/template.yaml': No such file or directory

Split sitrep into smaller unit packages

Currently sitrep does a few disparate things:

provides inline styling of numbers with the fmt_count() family of functions
tabulates linelist data with descriptive()
tabulates survey data with tabulate_survey()
plots age pyramids with plot_age_pyramid()
generates fake data sets and surveys
provides a parser for the MSF dictionaries
creates age categories and groups them into a single variable
provides helpers to calculate proportions (CFR, etc...)

It's safe to say that some of these can be broken into different packages because there are currently 34 packages that we import (which is not the number of packages that get installed).

When we know these templates are working, it would be a good idea to split some of these up into smaller packages that are viable stand-alone CRAN packages that sitrep can depend on (a la a kind of 'tidyverse' for epi packages).

Include rmapshaper and lwgeom packages in Suggests

This stems from the fact that the function tmaptools::simplify_shape() uses suggested packages, so they are missed on installation as R doesn't install suggests of dependent packages.

The only issue is that rmapshaper needs rJava to work, which will be problematic on Linux systems (hi @thibautjombart 👋) unless the correct incantations are used to configure it correctly.

Setup CI

Set up the CI system using travis and appveyor. Ideally test all R versions >= 3.1
(There aren't any tests yet, but still needs to be done)

make functions pipe-ready?

A lot of the flow for the outbreaks template (at least) uses pipes, but introduces an awkward do() component. Having a version that takes a data frame and respects groups would help make things less awkward.

writexl not available for r v.3.5.1

Cholera template: @epiamsterdam was trying to save the cleaned data set, but writexl appears not to be available for the newest R release?

Create an R4EPI drat repository

This is related to the multiple issues seen in #55, @dirkschumacher has suggested creating a drat repository so users don't have to rely on building the binary packages themselves (#55 (comment)).

Spatial analysis issues

I have added a basic section to spatial analysis using the "tmap" package (with "tmaptools" too).
I chose this because it allows both sf and sp files and has similar syntax to ggplot2 but simpler for manipulating spatial data.
The reason I decided against ggmap is that it really didn't work well in the field on a poor internet connection (it would often just get kicked out and wouldn't download at all), and the style options aren't great either.
In general tmap is quite a nice package, allowing for static and interactive maps.
That comes with issues though as it has a lot of dependencies!

The biggest issue is that to get static tiles it uses the "openstreetmap" package, which is rJava dependent and produces quite poor resolution rasters.
For interactive maps it uses the leaflet package, which has great tile backgrounds but only interactive ones. Leaflet also has more server options, and particularly important is that it supports openstreetmap black & white as well as humanitarian openstreetmap (no other packages seem to provide these and they are the most important ones!!).

It would be really awesome if we could somehow manage to pull leaflet backgrounds in to static tiles. Alternatively, the "openstreetmap" package allows you to specify server destinations to pull tiles from, so maybe there's a possible workaround in there?? (just need to find a stable server with an API?)

The second thing I couldn't seem to get to work is creating kernel density maps (or heatmaps) from points data using the smooth_map function from the "tmaptools" package.

If we manage a workaround for getting tiles in (and don't want all the dependencies of tmap) then using the sf package with ggplot2 is equally simple and functional!

Add list of potential data errors

Amrish please :)

Generate complete directories for analyses

Just an idea, but in addition to a single markdown template we could also provide functionality that helps scaffolding a complete set of directories according to some best practice.

E.g. scaffold_weekly_sitrep(path = ".") creates a set of directories and places a readme on how to organize data, do analyses and archive the reports.
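
A minimal base-R sketch of such a scaffolding helper is shown below. Only the function name comes from the idea above; the folder layout and the README text are assumptions made purely for illustration.

```r
# Hypothetical sketch: folder names and README text are assumptions,
# not an existing sitrep function.
scaffold_weekly_sitrep <- function(path = ".") {
  dirs <- file.path(path, c("data/raw", "data/clean", "analysis", "reports/archive"))
  for (d in dirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  readme <- c(
    "Weekly situation report",
    "",
    "data/raw        : untouched exports, never edited by hand",
    "data/clean      : cleaned data sets written by the analysis scripts",
    "analysis        : R Markdown templates and scripts",
    "reports/archive : one dated copy of every report that was sent out"
  )
  writeLines(readme, file.path(path, "README.txt"))

  invisible(dirs)
}

# Try it in a temporary directory
project <- file.path(tempdir(), "sitrep_scaffold_example")
scaffold_weekly_sitrep(project)
list.files(project, recursive = TRUE, include.dirs = TRUE)
```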

To Do List

Everywhere:

x <- c(NA, "good", "bad")
dplyr::case_when(
  is.na(x)    ~ NA_character_,
  x == "good" ~ "YAY",
  x == "bad"  ~ "BOO!",
  TRUE        ~ "WAT"
)
#> [1] NA     "YAY"  "BOO!"

[Neale?] switch all factor stuff to forcats (swap recode_factor out) - cross reference with point under "surveys" - all of the factor cleaning needs a bit of a fix
[Neale] add to intro to templates that feedback is always welcome via github. a la #33 (comment)
[Alex] check the ceramics package as a map tile solution. See spatial analysis issue. Package is not quite there yet, but may be in the future.
[Alex] Clean up sitrep::univariate_analysis - lots of unnecessary repetition. Also need to add stratified.

Surveys:

Extras:

Outbreaks:

[ZHIAN] Fix cowplot alignment of epicurves and ar/cfr, so that ticks are in the middle of bars

From measles:

No lab data in dictionary - add fake lab data with an example of how to merge and create the case definition (see generic outbreak template). Do all of the above commented out; comment out other analyses stratified by case definition (just leave in as an example)
in gen_data specify that only those who received vaccine have dose entered
add an example to data cleaning section for setting dose to NA where vaccine not given (commented out)
add a thing about showing how many NA in each variable (summary has that) - add a bit to drop rows with xyz missing or based on some condition... just use dplyr filter
add an example of writing cleaned dataset to excel (or double check that we have it there)

[ZHIAN] when creating the epiweek variable - define as a factor and then add all weeks between min and max as levels - so that we don't have to mess around in tables with zero-count weeks.
[ZHIAN] consider option of add_totals for proportions function, so it just sums the counts of res, then runs proportions function and bind_rows. If you look at what I did in the CFR section of the cholera template, having to bind_rows of an overall and a group specific CFR calculation is a bit long winded....
[ZHIAN] Consider adding counts(proportions%) to inline_fun. See cholera template inline code before #### Demographics
[ZHIAN] fmt_ci_df(ar) adds a % sign at the end ... but if it's per 10,000 population we don't want a % sign as seen after attack_rate code chunk
[ZHIAN] Get rid of do(..) and change functions to NSE? goes back to issue #48

[ZHIAN/ALEX] Try and make tables that are too big fit nicely in Word doc output (maybe shorten col names or merge categories...)

[ALEX] add option to add a ceiling to age_group - e.g. to have the highest group in months end at 24months... (not be 24+)
[ALEX] update descriptive function with option to have percentage of total, rather than column specific.
[ALEX] Mapping section: consider changing the plotting of choropleths as categories rather than continuous... also make the points stuff better
[ALEX] make sure all the 95%CIs are merged in the document tables... (think is just mortality section left over)
[ALEX] add Kate's admissions/exits table in separate tables

[DONE? ZHIAN?] Consider adding an option to age_pyramid which returns proportions rather than counts; and option to remove NAs; Horizontal_lines does not seem to work either....
[DONE?] When library(excel.link), message about someone called daniela - suppress messages...
[DONE] on epicurves, when you use scale_x_date(date_breaks = "1 week") - the axis labels change to full dates, is it possible to keep it with the default 2013-W01 for example?
[DONE?] using fmt_ci_df function doesn't work if use the mergeCI function from props functions (e.g. attack_rate)
[DONE] Find a better way to reference lines/chunks (is there some kind of hyperlink function?), for "Introduction to this template" section - no solution really, just reference code chunk names
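
The epiweek item in the list above (define epiweek as a factor whose levels cover every week between the first and last case) can be sketched in base R as follows. This swaps in ISO week labels from format() with "%G-W%V" instead of the aweek package, so it assumes weeks start on Monday and that the platform's date formatting supports those codes; the onset dates are made up.

```r
# Made-up onset dates with a two-week gap in the middle
onset <- as.Date(c("2019-01-01", "2019-01-02", "2019-01-22"))

# ISO year-week label, e.g. "2019-W01" (assumes weeks start on Monday)
week_label <- function(d) format(d, "%G-W%V")

# Every week between the first and last onset date becomes a level
all_days  <- seq(min(onset), max(onset), by = "day")
all_weeks <- unique(week_label(all_days))

epiweek <- factor(week_label(onset), levels = all_weeks)

# Weeks without cases now show up with a count of zero
table(epiweek)
```

Because the empty weeks are levels of the factor, tables and epicurves built from it keep the zero-count weeks without any extra joining or completing.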

After Hackathon: set rlang version to >= 0.4.0

rlang was just released with version 0.4.0, which introduces the double moustache operator. This changes expressions like !! enquo(var) to {{ var }}, which is much easier to read, imo.

It would also change functions like this:

    # Calculate the survey proportion for both the stratifier and counter
    # @param xx a tbl_svy object
    # @param .x a single character value matching those found in the cod column
    # @param .y a single character value matching those found in the st column
    # @param cod a symbol specifying the column for the counter
    # @param st a symbol specifying the column for the stratifier
    # @return a data frame with five columns, the stratifier, the counter, 
    # proportion, lower, and upper.
    s_prop_strat <- function(xx, .x, .y, cod, st) {
      st  <- rlang::enquo(st)
      cod <- rlang::enquo(cod)
      res <- srvyr::summarise(xx, 
                              proportion = srvyr::survey_mean(!! cod == .x & !! st == .y,
                                                              proportion = TRUE,
                                                              vartype = "ci"))
      res <- dplyr::bind_cols(!! cod := .x, res)
      dplyr::bind_cols(!! st := .y, res)
    }

To this:

    # Calculate the survey proportion for both the stratifier and counter
    # @param xx a tbl_svy object
    # @param .x a single character value matching those found in the cod column
    # @param .y a single character value matching those found in the st column
    # @param cod a symbol specifying the column for the counter
    # @param st a symbol specifying the column for the stratifier
    # @return a data frame with five columns, the stratifier, the counter, 
    # proportion, lower, and upper.
    s_prop_strat <- function(xx, .x, .y, cod, st) {
      res <- srvyr::summarise(xx, 
                              proportion = srvyr::survey_mean({{ cod }} == .x & {{ st }} == .y,
                                                              proportion = TRUE,
                                                              vartype = "ci"))
      res <- dplyr::bind_cols({{ cod }} := .x, res)
      dplyr::bind_cols({{ st }} := .y, res)
    }

Add feedback section to templates and other documentation

It should be clear from the documentation that feedback and ideas for improvements are always welcome.

Find a better name for the package

There is the EpiReport already published by ECDC. Ideas welcome :)

change default output for templates to html_document?

Currently it is github_document. I think html_document is a sensible default. Change my mind.

Using the survey package for survey analysis

A quick example to test some functions

library(survey)
#> Loading required package: grid
#> Loading required package: Matrix
#> Loading required package: survival
#> 
#> Attaching package: 'survey'
#> The following object is masked from 'package:graphics':
#> 
#>     dotchart
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
linelist <- outbreaks::fluH7N9_china_2013 %>% 
  group_by(province) %>% 
  filter(n() > 5) %>% 
  ungroup() %>% 
  filter(!is.na(outcome))

design <- svydesign(ids = ~1,  #no cluster within strata
                    strata = ~province, # strata, replace by ~1 if no strata
                    weights = ~ 1, # sampling weights
                    data = linelist)

# get totals
svytotal(~outcome, design)
#>                total     SE
#> outcomeDeath      27 3.9362
#> outcomeRecover    35 3.9362

# compute something by another group. E.g. the mean
svyby(~outcome, ~gender, design, svymean)
#>   gender outcomeDeath outcomeRecover se.outcomeDeath se.outcomeRecover
#> f      f    0.3529412      0.6470588      0.11713818        0.11713818
#> m      m    0.4651163      0.5348837      0.07686946        0.07686946

# you can also compute confidence intervals
confint(svyby(~outcome, ~gender, design, svymean))
#>                      2.5 %    97.5 %
#> f:outcomeDeath   0.1233546 0.5825278
#> m:outcomeDeath   0.3144549 0.6157776
#> f:outcomeRecover 0.4174722 0.8766454
#> m:outcomeRecover 0.3842224 0.6855451
confint(svytotal(~outcome, design))
#>                   2.5 %   97.5 %
#> outcomeDeath   19.28526 34.71474
#> outcomeRecover 27.28526 42.71474

Created on 2018-11-24 by the reprex package (v0.2.1)

Include fake population data

Provide common to cbind, rbind and join data.

Support sample size calculations for all three surveys

For now, the method of openEpi is implemented, but more code might be needed:

Finish, test and document the current functions
Decide if more methods are needed
Include code in all three report types
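
For orientation, the finite-population formula that OpenEpi documents for estimating a proportion can be sketched as below. The function name and argument names are hypothetical and this is not the code currently in the package; it only covers the single-proportion case.

```r
# Sketch only: hypothetical function, not the package's implementation.
# Formula as documented by OpenEpi for a proportion:
#   n = deff * N * p * (1 - p) / (d^2 / z^2 * (N - 1) + p * (1 - p))
# N    = population size
# p    = expected proportion
# d    = absolute precision (half-width of the confidence interval)
# deff = design effect (1 for simple random sampling)
sample_size_proportion <- function(N, p, d, deff = 1, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  n <- deff * N * p * (1 - p) / ((d^2 / z^2) * (N - 1) + p * (1 - p))
  ceiling(n)
}

# Cluster survey with an assumed design effect of 2
sample_size_proportion(N = 10000, p = 0.5, d = 0.05, deff = 2)
```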

Fix R CMD check issues

Currently a lot of prototype functions produce check errors. We should fix those and add documentation despite the fact that they are not stable yet. This can be worked on after #18 is merged.

Travis build error on oldrel with gdal

Add Description field to template.yml

Related to #35, I'm getting errors invoking the template from the command line:

rmarkdown::draft("outbreak.Rmd", template = "outbreak", package = "epireports")
# Error in rmarkdown::draft("outbreak.Rmd", template = "outbreak", package = "epireports") :
#   template.yaml must contain name and description fields

This is from 145f6a8.

According to the official documentation, template files must have name and description fields.
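
For illustration, the sketch below writes a minimal template.yaml containing the two fields named in the error message. The directory and the field values are made up; only the field names name and description come from the error above.

```r
# Hypothetical location and values; only the two field names are taken
# from the error message
template_dir <- file.path(tempdir(), "templates", "outbreak")
dir.create(template_dir, recursive = TRUE, showWarnings = FALSE)

writeLines(
  c("name: Outbreak report",
    "description: Example situation report template for an outbreak"),
  file.path(template_dir, "template.yaml")
)

# Inspect what was written
yaml_lines <- readLines(file.path(template_dir, "template.yaml"))
yaml_lines
```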

For each template, an included dataset would be great

Ideally a template could be run creating all outputs for a sample dataset. For the outbreak example I used a random dataset, there might be better. In particular we might want to create messy versions of these sample datasets

aweek start date is not propagated forward in outbreak templates

Currently, there are places in the cholera template that will not work if the epiweek does not start on Monday:

sitrep/inst/rmarkdown/templates/cholera_outbreak/skeleton/skeleton.Rmd

Lines 653 to 664 in 780dad8

    
linelist_cleaned <- linelist_cleaned %>%
  filter(date_of_onset <= week2date(sprintf("%s-7", reporting_week)))

# define the first week of outbreak (date of first case)
first_week <- levels(linelist_cleaned$epiweek)[1]

# outbreak start
# return the first day in the week of first case
obs_start <- week2date(sprintf("%s-1", first_week))

# return last day of reporting week
obs_end   <- week2date(sprintf("%s-7", reporting_week))

This will lead to fencepost errors, and I've proposed a fix in aweek that will help with this situation: reconhub/aweek#17

Use rio as default data import package

@epiamsterdam has pointed out that the rio package has a simple interface of import() and export() and has sensible defaults for guessing the file format. This would be extremely useful for epis who just need to get their data into R. The only downside is that it will introduce a lot of hidden layers of dependencies.

Improve age pyramid

Not sure if ggpyramid of reconhub should be used or we improve the code within this package.

Question: who is the copyright holder?

First draft vaccination survey

This is yet to be created, but is, according to the one sample I have, "just" descriptive tables of counts and proportions and a point map. So essentially just a couple of dplyr statements plus some sample code for data preparation and cleaning. But that is probably similar to all other surveys.

Issues with installing from github

Walking through installing from github and he got a whole bunch of errors - starting with not being able to update packages. See below.

Probs good to address next week before sending out to epis.

He needed to reinstall the package colorspace and restart the R session and then it worked for colorspace. Then got same error for rlang, repeated above steps and then epireports loaded correctly but templates are still not there....
@dirkschumacher @zkamvar

Write a walkthrough for the outbreak template

It would be good to have a walkthrough and an introduction for the outbreak report template in the wiki. We can then later take the text and put it on a dedicated website.

Nice output for 2x2 tables

Currently we use epitools, but we might want to add functions that improve the output of these tests.

See previous discussions here:
R4EPI/outbreaks#6
R4EPI/projmgmt#1

Question: is rounding necessary for proportions?

I noticed that all of the proportion related functions (in cfr.R) have the digits argument, which defaults to 1. This causes rounding to a single decimal place for the results. Would a better solution just be to modify the "digits" option?

(x <- tibble::enframe(runif(5)))
#> # A tibble: 5 x 2
#>    name value
#>   <int> <dbl>
#> 1     1 0.981
#> 2     2 0.250
#> 3     3 0.279
#> 4     4 0.303
#> 5     5 0.276
knitr::kable(x)

name	value
1	0.9806477
2	0.2499407
3	0.2786514
4	0.3034308
5	0.2764433

options(digits = 3)
knitr::kable(x)

name	value
1	0.981
2	0.250
3	0.279
4	0.303
5	0.276

options(digits = 7)
knitr::kable(x, digits = 3)

name	value
1	0.981
2	0.250
3	0.279
4	0.303
5	0.276

Created on 2019-01-22 by the reprex package (v0.2.1)

Add margin totals option to descriptive func

Just reminding myself to add the option for totals.
dplyr has a weird workaround, but the janitor package makes it easy.
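
As a dependency-free illustration of what margin totals would look like, here is a small sketch using base R's addmargins() rather than dplyr or janitor. The line list and its column names are made up, and this is not the descriptive function itself.

```r
# Made-up line list; column names are assumptions
linelist <- data.frame(
  sex     = c("Female", "Male", "Female", "Male", "Male", "Female"),
  outcome = c("Death", "Recover", "Recover", "Recover", "Death", "Recover")
)

# Cross-tabulate, then add a "Sum" row and a "Sum" column
counts      <- table(linelist$sex, linelist$outcome, dnn = c("sex", "outcome"))
with_totals <- addmargins(counts)
with_totals
```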

Nutrition zscore plots

For nutrition surveys, we need plots that compare the observed z-scores to the WHO Standard.

Expected output from an actual survey:

We should use ggplot2. Here is some quick sample code that uses the anthro package to generate the z-scores. But the function should really just take a vector of z-scores and necessary labels.

Quick illustration:

library(ggplot2)
library(anthro)

n <- 1000
res <- anthro_zscores(sex = sample(c(1, 2), n, TRUE),
                  age = as.integer(rnorm(n, 1000, 3)),
                  lenhei = rnorm(n, 94, 3))


ggplot(res[res$flen == 0, ]) + 
  geom_density(aes(x = zlen), color = "darkred") + 
  stat_function(fun = dnorm, args = list(mean = 0, sd = 1), color = "darkgreen") +
  scale_x_continuous(limits = c(-6, 6))

Please use colorblind and printer friendly colors

Remove generic outbreak template

A while ago, we realized that we had tackled the issue of a generic outbreak template too early and decided to focus effort on creating disease-specific templates instead with the idea that we would re-visit the outbreak template and combine the common factors of the disease-specific templates into the generic.

At the moment, the generic outbreak template exists in a bit of a derelict state with old problems laid bare (e.g. handling functions that output vectors in a tidy framework):

sitrep/inst/rmarkdown/templates/outbreak/skeleton/skeleton.Rmd

Lines 396 to 413 in c116b9e

    
```{r cfr_by_age_group}
# group by known outcome and agegroup
linelist_cleaned %>%
  filter(!is.na(outcome)) %>%                     # remove rows with missing outcome
  group_by(age_group) %>%                        # group by age_group
  summarise(deaths = sum(outcome == "Death"),    # tally deaths
            population = n()) %>%                # tally population
  do(bind_cols(age_group = .$age_group, case_fatality_rate(.$deaths, .$population))) %>% # calculate case fatality rate
  arrange(desc(lower)) %>%                       # sort by lower confidence interval
  tidyr::complete(age_group) %>%                 # Ensure all levels are represented
  rename("Age group (years)" = age_group,
         "Deaths" = deaths,
         "Population" = population,
         "CFR (%)" = cfr,
         "Lower 95%CI" = lower,
         "Upper 95%CI" = upper) %>%
  knitr::kable(digits = 2)
```

I propose that we scrap the template altogether and highlight the commonalities of the four disease-specific templates in the wiki

add_weights need warning if age_group not factor

@pbkeating pointed out that add_weights will wrongly add extra rows to a data set if age_group from the population data set is not a factor.
This came up using the example code from the vaccination template below - it starts with 1000 cases and after add_weights an extra 200 or so show up...
This issue doesn't happen if you use the gen_pop function as in the mortality template, because the age_group variable resulting from that function is a factor.

vaccination_raw <- sitrep::gen_data(dictionary = "Vaccination", varnames = "column_name",
                         numcases = 1000)

# create fake population by age and sex 
population_data_age <- tibble(age_group = rep.int(c("0-0", "1-1", "2-2", "3-3", "4-4", "5+"), 2), 
                              sex = rep.int(c("Male", "Female"), 6))
population_data_age$population <- as.integer(runif(nrow(population_data_age), 
                                          min = 500, max = 2000))

# clean up the column names
colnames(vaccination_raw) <- clean_labels(colnames(vaccination_raw))


# Additional variable name cleaning and creation of vaccination status binary variable
vaccination_clean <- vaccination_raw %>%
                      mutate(age_in_years = as.integer(q10_age_yr),
                             age_in_months = as.integer(q55_age_mth),
                             age_group = sitrep::age_categories(age_in_years, breakers = c(0,1,2,3,4,5)),
                             sex = q5_sex,
                             vaccination_routine = q2_vaccine_9months,
                             vaccination_sia = q32_vaccine_sia,
                             vaccine_mass = q17_vaccine_mass,
                             disease_diagnosis = q47_disease_diagnosis,
                             area = q4_settlement,
                             cluster = q77_what_is_the_cluster_number)

######## WEIGHTING 
vaccination_clean <- add_weights(vaccination_clean, population_data_age, age_group, sex)

Put all suggested packages from templates to Import

In the hope this makes running the templates easier. Otherwise folks run the template and run into problems with missing packages

Field-ready mortality survey

The current mortality survey template has three sample analysis chunks. This needs to be extended, probably, in order to make it field ready. In particular more code comments need to be included.

This should be done in a similar fashion to what @aspina7 is doing for the outbreak templates in #18.

Add introduction (DELETEME text) to the outbreak template

How to use the template, what the different sections are, etc.

Create example code for maps

Pointmap shape file / geojson
Pointmap dynamic base map
Choropleth map shape file / geojson
Choropleth map dynamic base map

For the two types of maps I would like to have examples where the shapefile/geojson already exists (plus some code how to download a shapefile beforehand e.g. from HDX) and code that downloads the basemap from the internet using a tileprovider (e.g. ggmap).

Add more authors to the package

@zkamvar @aspina7 please add yourself as authors to the packages.

Also I think it might make sense to add all other project folks as ctb. Or shall we leave the DESCRIPTION file for code contributions only?
