epiverse-trace / cfr

R package to estimate disease severity and under-reporting in real-time, accounting for reporting delays in epidemic time-series

Home Page: https://epiverse-trace.github.io/cfr/

License: Other

r r-package case-fatality-rate epidemiology health-outcomes epidemic-modelling outbreak-analysis epiverse sdg-3

cfr's People

Contributors

actions-user, adamkucharski, avallecam, bisaloo, joshwlambert, pratikunterwegs, thimotei


cfr's Issues

Estimation of under-ascertained cases from deaths

Add functionality that can estimate cases from reported deaths and a known CFR. A starting point could be the estimator in the scale_cfr function (as used in early versions of the CMMID COVID reports). Follow-up functionality could account for reporting that varies over time, as implemented by Russell et al., 2020. It might be useful to compare with similar functionality in packages like EpiNow2 and coarseDataTools, if such methods are available and the data formats are the same (e.g. cases and deaths over time).
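As a rough illustration of the starting point described above, a scale_cfr-style back-calculation might look like the following sketch. Everything here is hypothetical (function names, gamma parameters, the crude inversion), not the package's API:

```r
# Sketch: back-calculate expected cases from reported deaths, a known CFR,
# and an onset-to-death delay distribution. Gamma parameters are placeholders.
delay_density <- function(x) stats::dgamma(x, shape = 2.4, rate = 0.3)

estimate_cases_from_deaths <- function(deaths, cfr, delay_density,
                                       max_delay = 30) {
  pmf <- delay_density(seq_len(max_delay))
  pmf <- pmf / sum(pmf) # discretise and normalise
  n <- length(deaths)
  cases <- numeric(n)
  # Deaths on day t + j trace back to cases on day t with probability pmf[j];
  # scaling by 1 / CFR converts expected deaths into expected cases.
  for (t in seq_len(n)) {
    for (j in seq_len(min(max_delay, n - t))) {
      cases[t] <- cases[t] + deaths[t + j] * pmf[j] / cfr
    }
  }
  cases
}
```

A time-varying extension (as in Russell et al., 2020) would replace the fixed cfr with a reporting fraction estimated per time window.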

Comparison with other methods

In a vignette, it would be useful to show comparisons with estimation in EpiNow2 and coarseDataTools, noting any differences in required data (e.g. cases and deaths vs recoveries and deaths). Comparisons could also be built into the episoap template for CFR estimation, along with calls to epiparameter and other relevant packages.

Fix dependencies on {epiparameter}

This issue is to request that {datadelay} be fixed to take into account recent changes to {epiparameter}, which are causing all workflows on main to fail. The last check on the PR now merged into main was before these changes, which meant that the issues were not picked up at the time.

The following functions depend on {epiparameter} indirectly, as they expect an argument delay_pmf, which was intended to be a PMF extracted from an epidist object:

  1. known_outcomes(),
  2. static_cfr(),
  3. rolling_cfr()

The expected solution would be for these functions to instead accept an epidist object as an argument; the argument would also ideally be renamed to epi_dist or similar ("epi_dist" is preferred as it also indicates the class of the input).

Furthermore, tests for these functions also depend on the availability of an Ebola virus disease onset-to-death delay distribution in {epiparameter}. The eventual solution would be the inclusion of this distribution in {epiparameter}, but an adequate interim solution would be to use a manually defined epidist in the tests; distribution parameters could be taken from doi.org/10.1016/S0140-6736(18)31387-4.

Rename package

This issue is to suggest a change of name for the package to match the functionality included in it.

Options include:

  1. {cfr}: The name {cfr} is available, and this package could add more CFR estimation methods such as that in https://pubmed.ncbi.nlm.nih.gov/16076827/
  2. {casefatality}: Also available; perhaps a more descriptive name for people in the field.
  3. Please suggest others.

Add data for examples to package

This issue is to request that the Covid data used in the examples be included in the package itself, rather than downloaded via an API or taken from a data package. The data should also be used in the package vignettes and function examples.

This stems from the issue that these packages are either not on CRAN, not actively developed, or both. This is the case for {covidregionaldata} and {owidR}: {owidR} was on CRAN but was removed as of 8th August 2023. This reopens the issue raised in #61.

Refactor estimate_severity()

This issue is to report a possible issue with estimate_severity().

Issue: The function description says that the severity is calculated using the cases with known outcomes, which should usually be more than the reported deaths (as some cases will eventually result in deaths). But the function uses total deaths, not total known outcomes, in most cases - am I missing something here?

There are also instances of multiplying u_t * total_cases, but u_t = total_outcomes/total_cases - does this make sense?

Odd results in ascertainment plots in `estimate_ascertainment.Rmd` vignette

Looking through the processed version of the updated vignette, something looks off with the ascertainment plots – they're showing 100% for many countries, which isn't plausible (and I don't think matches the previous version of this plot?)

[figure: ascertainment plots from the vignette]

In the Rmd code, the ascertainment estimation seems to work OK if "United Kingdom" is replaced with another country in the single-country example at the top, so it seems like an issue in the nesting step?

Also, I noticed estimate_ascertainment() is returning an error about a missing get_default_burn_in() function (although estimate_ascertainment() still outputs a value); get_default_burn_in() is in man/ but not in R/, so it was possibly removed in an earlier commit?

Error message for missing dates (`estimate_static`)

I was getting the following error message when using datadelay's estimate_static function:

Error in data.frame(severity_me = severity_me, severity_lo = severity_lims[[1]], : arguments imply differing number of rows: 0, 1

Looking into it, @adamkucharski helped me figure out that this was because the cases-and-deaths dataset I was using was accidentally missing some death dates (as a result of a previous data-cleaning step), and the function requires data on each day, without skipping any dates.

From a user perspective, I believe it would be useful to state this requirement more clearly in the documentation, so that users make sure their dataset is in the correct format, and, most importantly, to provide a more informative error message so that users can easily fix the mistake.
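Until such an error message exists, a pre-processing check along these lines could catch the problem. This is a base-R sketch, and the column names (date, cases, deaths) are an assumption:

```r
# Detect skipped dates in a cases/deaths time series and fill them with
# zero counts, warning the user so silent gaps do not go unnoticed.
complete_dates <- function(data) {
  all_dates <- seq(min(data$date), max(data$date), by = "day")
  gaps <- all_dates[!all_dates %in% data$date]
  if (length(gaps) > 0) {
    warning("Filling ", length(gaps), " missing dates with zero counts")
    data <- rbind(data, data.frame(date = gaps, cases = 0, deaths = 0))
  }
  data[order(data$date), ]
}
```

Whether gaps should be zero-filled or rejected outright is a design decision for the package; the warning at least surfaces the mismatch that produced the cryptic data.frame error above.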

Standardise data formatting

If implementing and comparing different estimation methods, we need to ensure input and output data formats are consistent, e.g. vectors of dates, cases, and deaths. Tests are also needed to check that the format is correct (e.g. date and numeric vectors).
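For example, a shared input check could look like the following (purely illustrative; the actual interface is yet to be decided):

```r
# Validate the common date / cases / deaths input format shared across
# estimation methods: correct types and equal lengths.
assert_cfr_input <- function(date, cases, deaths) {
  stopifnot(
    inherits(date, "Date"),
    is.numeric(cases), is.numeric(deaths),
    length(date) == length(cases),
    length(cases) == length(deaths)
  )
  invisible(TRUE)
}
```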

Generalise CFR plotting

The proposed plotting command assumes two CFRs being plotted together. We might want to make the functionality more general in future (e.g. what if we want to compare three CFR estimation methods?).

Originally posted by @adamkucharski in #11 (comment)

Smoothing option for CFR

Daily case data often exhibit cyclic variation (e.g. day-of-week effects), so it is worth adding an option for smoothing, either in the delay functions or as a pre-processing step.
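As a sketch of the pre-processing option, a centred rolling mean damps day-of-week effects; the 7-day window and the base-R implementation are assumptions, not a design decision:

```r
# Centred rolling-mean smoother for daily counts; returns NA at the
# series edges where the full window is unavailable.
smooth_counts <- function(x, window = 7) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}
```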

Add package logo

This issue is to request that the package logo be added as an SVG file. The logo and name are still liable to change based on PR #47.

Simulation recovery

It would be useful to have some simple simulation-recovery functionality, for both a small stochastic outbreak and a large epidemic, tested at different points (e.g. early rising stage, post-peak, etc.).

Update Readme

This issue is to request that the Readme should be updated to reflect new package functionality, as the current Readme is out of date. This links to issue #34 and can be combined into the same PR.

Test estimate_time_varying()

This issue is to request that the estimate_time_varying() function should be tested, with reference to the potentially anomalous behaviour reported here:

Worth double-checking this function (maybe add a test?) as it's currently returning CFR = 100% early on. The estimated CFR may well have been very high, given that most cases detected were severe, but it may instead be an issue with the burn-in period used rather than with reporting.

Originally posted by @adamkucharski in #23 (comment)

Use CSL JSON references

This issue is to request that the references stored as a BibTeX file be stored instead as a CSL JSON file, so as not to affect the package language statistics.

Add vignette describing user options

There are a few user options that are likely to be useful for estimation/comparison:

  • Fixed CFR (i.e. 'total deaths'/'total expected cases with known outcome') vs time-varying CFR (i.e. 'deaths on day X'/'expected cases with known outcome on day X')

  • Whole timeseries (i.e. 'total deaths'/'expected cases with known outcome') vs expanding window (i.e. 'total deaths up to day X'/'total expected cases with known outcome up to day X')

  • Small numbers of events (e.g. Ebola 1976) vs large numbers of events (e.g. COVID)

  • Raw vs smoothed timeseries (i.e. effectively implementing an observation model if reporting is cyclical or noisy). Possibly something to implement in a separate package, as it would be useful across packages?

  • Efficiency (e.g. parallelisation) across countries/time periods
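The first two options in the list above can be sketched with the usual delay-adjusted denominator. This is illustrative code, not the package's internals; it assumes the PMF starts at a delay of zero days and is at least as long as the case series:

```r
# Expected number of cases with known outcome on day t, convolving the
# case series with the onset-to-outcome delay PMF.
known_outcomes_expected <- function(cases, pmf) {
  n <- length(cases)
  sapply(seq_len(n), function(t) sum(cases[seq_len(t)] * rev(pmf[seq_len(t)])))
}

# Fixed (whole-timeseries) CFR: total deaths over total expected known outcomes.
fixed_cfr <- function(cases, deaths, pmf) {
  sum(deaths) / sum(known_outcomes_expected(cases, pmf))
}

# Time-varying CFR: deaths on day t over expected known outcomes on day t.
varying_cfr <- function(cases, deaths, pmf) {
  deaths / known_outcomes_expected(cases, pmf)
}
```

The expanding-window variant would apply fixed_cfr to the series truncated at each day X in turn.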

Allow tracking pkgdown/

This issue is to flag that we may want to add the pkgdown/ directory to tracking in the future.

This folder can contain important files we want to commit. Do you have any issues that lead you to add it here?

Originally posted by @Bisaloo in #54 (comment)

Correct grouping variables in estimate_severity

This issue is to request correction of the "group_by" argument in estimate_severity() and upstream wrapper functions. This argument is ambiguous and doesn't behave as users might expect from the better-known dplyr::group_by().

{epiparameter} integration

I wonder if user-facing integration with {epiparameter} should be achieved in a similar way to the work in superspreading. There, I believe, @joshwlambert has moved towards the epidist object being an optional argument. I think this is a nice approach, as it does not force the use of epidist objects on the user.

Note this is not a question of dependencies ({cfr} would still need to import {epiparameter} for internal use) but of API design and consistency across the various packages.

This likely warrants a wider discussion but raising here initially.

Add lifecycle badge

The package is currently highly unstable, which should be communicated with a lifecycle badge.

Adding forecasting functionality

Once we have an estimate of the CFR, which with current estimation methods will typically run up to the most recent death, it would be possible to generate a forecast forward in time based on the estimated CFR, the onset-to-outcome delay, and recent case numbers.
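A minimal sketch of such a projection follows. Everything here (function name, arguments, the convention that pmf[1] is the probability of a zero-day delay) is illustrative, and pmf must cover delays up to the series length plus the horizon:

```r
# Project expected deaths h days ahead from past cases, an estimated CFR,
# and an onset-to-death delay PMF.
forecast_deaths <- function(cases, cfr, pmf, horizon = 7) {
  n <- length(cases)
  sapply(seq_len(horizon), function(h) {
    s <- seq_len(n) # onset days contributing to deaths on day n + h
    cfr * sum(cases[s] * pmf[n + h - s + 1])
  })
}
```

A fuller implementation would also need a forecast of cases beyond day n, since those contribute to deaths later in the horizon.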

Remove plotting functions

This issue is to request removal of the plotting functions in {datadelay}. The Epiverse-TRACE philosophy has moved to not including plotting functions in packages, as they add substantial development overhead, can be dependency heavy, and because good advice for plotting epidemiological data exists in resources such as the Epi R Handbook.

Avoiding use of time varying estimation for small datasets

For datasets with relatively small numbers of cases/deaths (like the Ebola 1976 example in the README), the estimate_time_varying function can be unstable, as it is distributing expected timings based on a small number of discrete events, rather than estimating the trend over events occurring daily (like the COVID timeseries).

Therefore, we should limit usage to estimate_static for these smaller datasets, e.g. in the README, as an earlier iteration of this package did: https://github.com/adamkucharski/ebola-cfr/blob/main/scripts/main_script.R

If we want to show a figure, we could show how the static CFR calculation changes as more and more data are included. This was in the above script (CFR_figure.pdf) and in an earlier version of this package as a plot, but may have been deprecated to prevent confusion between using an expanding time window of data to fit a static CFR (i.e. this PDF) and fitting a time-varying CFR to a fixed time window of data (i.e. estimate_time_varying). However, given that it nicely illustrates the difference between the naive and time-adjusted static methods, maybe we should include it in the README again.

Input data from `incidence2`

datadelay's functions to estimate CFR require a specific data frame format with date, onset, and death columns. When using incidence2 to go from a linelist to daily case/death counts, these counts can be obtained in one step, but the result is a long-format data frame (i.e. the variables "onset_date" and "death" appear as rows rather than columns), and the table then has to be pivoted before it can be used as input for datadelay.
Alternatively, users could extract the case and death counts separately using incidence2 and merge them into a new dataset, or attempt to do this manually without using the package.
In any of these cases, I think it will be quite tedious for users to add this many lines of code to their scripts for such a predictable task.
I believe it would be very useful to add a function that produces a data frame in the right format in one step. I'm unsure which package this should belong to, whether incidence2 or datadelay, but after a conversation with @Bisaloo I'm raising it here for wider discussion.
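For reference, the manual pivot currently looks something like this in base R. The column names below (count_variable, count) are illustrative, not incidence2's exact output:

```r
# Long-format counts as produced by an incidence2-style workflow:
long <- data.frame(
  date = rep(as.Date("2023-01-01") + 0:2, each = 2),
  count_variable = rep(c("onset_date", "death"), times = 3),
  count = c(5, 1, 7, 0, 4, 2)
)

# Pivot to one row per date with separate cases and deaths columns:
wide <- reshape(long, idvar = "date", timevar = "count_variable",
                direction = "wide")
names(wide) <- c("date", "cases", "deaths")
```

A helper function wrapping this (wherever it ends up living) would save users from repeating the reshape-and-rename dance in every script.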

Grouping for <incidence2> objects

This issue is about how prepare_data() treats grouped <incidence2> objects.

The current behaviour is to error when grouped <incidence2> objects are passed, and to advise users to call incidence2::regroup() on the object before passing it.

Alternatives include respecting the grouping structure, but this could require taking on a data science dependency such as {data.table} or {dplyr} + {tidyr}, although base-R-only options could also be implemented.

Can we open another issue for the incidence2 grouping discussion please to make sure this stays on our radar? I think it's an important point.

Originally posted by @Bisaloo in #39 (comment)

Re-allow passing delay function as alternative to epidist

This issue is based on a suggestion by @TimTaylor in issue #59 to allow users to pass a custom delay function as an alternative to providing an <epidist> to the epi_dist argument. This was the implementation from PR #11 until PR #22.

The proposed solution is to allow passing a delay_function argument (renamed from delay_pmf) to which users would pass a function that wraps the PMF/PDF function for a distribution; e.g., function(x) stats::dgamma(x, shape, rate), or function(x) stats::density(distribution, at = x) for <distributional> or similar objects.
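Concretely, the proposal would let a user write something like the following. The delay_function name is from this issue; how an estimation function would discretise it is an assumption for illustration:

```r
# User-supplied wrapper around any density/mass function, per the proposal:
delay_function <- function(x) stats::dgamma(x, shape = 2.4, rate = 0.3)

# Internally, an estimation function could then discretise onto days 0..30
# and normalise to obtain a PMF:
days <- 0:30
pmf <- delay_function(days)
pmf <- pmf / sum(pmf)
```

This keeps {epiparameter} optional: an <epidist> input would simply be converted to such a function internally.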

Pass epidist object instead of delay_pmf

Functions proposed in #11 have arguments that accept a probability mass function of the onset-to-death delay. This function is typically taken from epiparameter::epidist objects.

It is not currently possible to check that the user has correctly passed an onset-to-death distribution rather than some other distribution associated with that pathogen. This is because the distribution-type metadata present in epiparameter::epidist objects is lost when subsetting for $pmf.

This could be solved by passing the full epiparameter::epidist object instead, allowing better input checking.

Remove basic plotting sections in vignettes

I'm wondering about these [plotting] sections that plot the data prior to applying the functions. It's nice, but maybe the vignettes should go more directly to the point of demonstrating the package's functionality? Plotting that doesn't showcase the outputs of the functions might fit better in a tutorial or on a learning platform like Applied Epi.

Originally posted by @CarmenTamayo in #55 (comment)

Use covid data from OWID rather than {covidregionaldata}

This issue is to request that the dependency on {covidregionaldata} data be swapped out in favour of Covid-19 data from Our World in Data, via the {owidR} package. {owidR} is on CRAN, removing one blocking dependency (see the relevant discussion linked below).

Yep, could import from any alternative COVID data source that has cases and deaths over a sufficiently long period to expect a change from accumulation of immunity (e.g. timeseries from OWID? https://ourworldindata.org/coronavirus). {covidregionaldata} was used as an illustrative example in early versions of the package, so there's no reason it has to be the go-to dependency.

Originally posted by @adamkucharski in epiverse-trace/epiverse-trace.github.io#85 (reply in thread)

Fix vector length mismatch in estimate_severity()

In estimate_severity(), the vector u_t is multiplied with the vector pprange. The warnings on CI checks in #23 result from these being of unequal length: u_t for the Ebola data (ebola1976) has 37 values, while pprange has 1000. Should they be the same length? Should u_t be interpolated to a length of 1000, maybe?

Either way, this vector length mismatch throws warnings that should be fixed.
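If equal lengths are indeed required, linear interpolation onto the longer grid is one possible fix (a sketch of the idea, not a design decision):

```r
# Interpolate a 37-value u_t onto a 1000-point grid matching pprange's
# length, preserving the endpoint values exactly.
set.seed(1)
u_t <- runif(37) # stand-in for the real u_t values
u_interp <- stats::approx(x = seq_along(u_t), y = u_t,
                          xout = seq(1, 37, length.out = 1000))$y
```

Whether interpolation is statistically appropriate here depends on what u_t represents, so this should be settled before silencing the warnings.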

Deconvolution vs backwards sampling

Nice to see this project kicking off. I see that the MVP currently uses backwards sampling, which has a range of bias issues.

Here is some code for a matrix approach to convolution which should generally be more efficient as well as a deterministic deconvolution method that works by solving for the matrix inverse. Note this example is tuned for ONS prevalence as we were aiming to recreate the method they use to get incidence (which they confirmed we had done via email).

https://gist.github.com/seabbs/fb1bc9c79c3dd7117f9314cb97e71615

(Note this code comes with no licence permitting free reuse etc. without attribution.)
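The core of the matrix approach mentioned above can be sketched as follows. This is written independently of the linked gist, under the assumption of a lower-triangular delay matrix with a non-zero diagonal (i.e. some probability mass at a zero-day delay, so the system is invertible):

```r
set.seed(42)
n <- 20
# Discretised delay PMF with non-zero mass at delay 0; the exponential
# shape (shape = 1) is chosen purely so the matrix is invertible.
pmf <- dgamma(0:(n - 1), shape = 1, rate = 0.5)
pmf <- pmf / sum(pmf)

# Lower-triangular convolution matrix D, so that deaths = D %*% cases:
D <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(i)) D[i, j] <- pmf[i - j + 1]
}

cases <- rpois(n, lambda = 30)
deaths_expected <- as.numeric(D %*% cases)

# Deterministic deconvolution: solve the triangular linear system
# instead of sampling backwards in time.
cases_recovered <- solve(D, deaths_expected)
```

Unlike backwards sampling, the solve step recovers the case series exactly in this noise-free setting; with noisy data, regularisation would be needed.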

Release cfr 0.1.0

First release:

Prepare for release:

  • git pull
  • Check if any deprecation processes should be advanced, as described in Gradual deprecation
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE) : OK except 1. New package, 2. epiparameter dependency, 3. LaTeX error on R 4.2.1
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • git push
  • Draft blog post

Submit to CRAN:

  • usethis::use_version('minor')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • usethis::use_news_md()
  • git push
  • Finish blog post
  • Tweet
  • Add link to blog post in pkgdown news menu

Reduce or remove data download

This issue is to request that the package examples and vignettes reduce the amount of data they download, as downloads cause delays in local testing and in rendering the package documentation. This could also be a barrier for users working through the vignettes without reliable internet (currently the case in London).

Most data is downloaded via {covidregionaldata}; alternatives to downloads, such as making the data available from the package itself (as with the 1976 Ebola data), should be considered.

Rethink `format_output()`

Is format_output() really necessary? Could we not return to the earlier implementation of the severity estimate as a named vector with three values? That would handle the pretty-printing issue to some extent.

Originally posted by @pratikunterwegs in #23 (comment)

Rethink plot_epiparameter_distribution()

I think {epiparameter} has a plot() method for epidist objects, so I wonder whether plot_epiparameter_distribution() can be removed. If it is a distinct method that achieves something quite different, it might be worth formally making it an S3 method for epidist objects.

Originally posted by @pratikunterwegs in #23 (comment)
