dsrobertson / onlinefdr

Clone of the Bioconductor repository for the onlineFDR package. See https://bioconductor.org/packages/devel/bioc/html/onlineFDR.html for the official development version, and https://dsrobertson.github.io/onlineFDR/ for easy access to documentation.

Home Page: https://dsrobertson.github.io/onlineFDR/

onlinefdr's Introduction

onlineFDR

onlineFDR allows users to control the false discovery rate (FDR) or familywise error rate (FWER) for online hypothesis testing, where hypotheses arrive in a stream. In this framework, a null hypothesis is rejected based on the evidence against it and on the previous rejection decisions.
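
As a rough illustration of how a testing level can depend on earlier rejections, here is a minimal base-R sketch of a LOND-style rule (Javanmard and Montanari, 2015). It is not the package's implementation: the function name `lond_sketch` and the spending sequence proportional to 1/i^2 are assumptions made for this example.

```r
# Illustrative only: a LOND-style online rule written in base R.
# `lond_sketch` is a hypothetical name, not a function from onlineFDR.
lond_sketch <- function(pval, alpha = 0.05) {
  n <- length(pval)
  # Assumed spending sequence: gamma_i = (6 / pi^2) / i^2, which sums to 1 over all i.
  gamma <- (6 / pi^2) / seq_len(n)^2
  alphai <- numeric(n)
  R <- logical(n)
  discoveries <- 0
  for (i in seq_len(n)) {
    # The level for test i grows with the number of rejections made so far.
    alphai[i] <- alpha * gamma[i] * (discoveries + 1)
    R[i] <- pval[i] <= alphai[i]
    discoveries <- discoveries + R[i]
  }
  data.frame(pval = pval, alphai = alphai, R = R)
}

out <- lond_sketch(c(1e-6, 0.5, 0.004, 0.9))
print(out)
```

Each p-value is compared with a level fixed before it is seen, and every rejection raises the levels for later tests.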

Installation

To install the latest (development) version of the onlineFDR package from Bioconductor, please run the following code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc
BiocManager::install()

BiocManager::install("onlineFDR")

Alternatively, you can install the package directly from GitHub:

# install.packages("devtools") # If devtools not installed

devtools::install_github("dsrobertson/onlineFDR")

Documentation

Documentation is hosted at https://dsrobertson.github.io/onlineFDR/

To view the vignette for the version of this package installed in your system, start R and enter:

browseVignettes("onlineFDR")

References

Aharoni, E. and Rosset, S. (2014). Generalized alpha-investing: definitions, optimality results and applications to public databases. Journal of the Royal Statistical Society (Series B), 76(4):771–794.

Foster, D. and Stine, R. (2008). alpha-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society (Series B), 70(2):429-444.

Javanmard, A., and Montanari, A. (2015). On Online Control of False Discovery Rate. arXiv preprint, https://arxiv.org/abs/1502.06197.

Javanmard, A., and Montanari, A. (2018). Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.

Ramdas, A., Yang, F., Wainwright, M.J. and Jordan, M.I. (2017). Online control of the false discovery rate with decaying memory. Advances in Neural Information Processing Systems 30, 5650-5659.

Ramdas, A., Zrnic, T., Wainwright, M.J. and Jordan, M.I. (2018). SAFFRON: an adaptive algorithm for online control of the false discovery rate. Proceedings of the 35th International Conference on Machine Learning, 80:4286-4294.

Robertson, D.S. and Wason, J.M.S. (2018). Online control of the false discovery rate in biomedical research. arXiv preprint, https://arxiv.org/abs/1809.07292.

Robertson, D.S., Wason, J.M.S. and Ramdas, A. (2022). Online multiple hypothesis testing for reproducible research. arXiv preprint, https://arxiv.org/abs/2208.11418.

Robertson, D.S., Wildenhain, J., Javanmard, A. and Karp, N.A. (2019). onlineFDR: an R package to control the false discovery rate for growing data repositories. Bioinformatics, 35:4196-4199, https://doi.org/10.1093/bioinformatics/btz191.

Tian, J. and Ramdas, A. (2019). ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. Advances in Neural Information Processing Systems, 9388-9396.

Tian, J. and Ramdas, A. (2021). Online control of the familywise error rate. Statistical Methods for Medical Research, 30(4):976–993.

Zrnic, T., Jiang, D., Ramdas, A. and Jordan, M. (2020). The Power of Batching in Multiple Hypothesis Testing. International Conference on Artificial Intelligence and Statistics, PMLR, 108:3806-3815.

Zrnic, T., Ramdas, A. and Jordan, M.I. (2021). Asynchronous Online Testing of Multiple Hypotheses. Journal of Machine Learning Research, 22:1-33.

onlinefdr's Issues

LORDstar batch setting leads to error: caught segfault address 0x7f9b307d3500, cause 'memory not mapped'

Thanks so much for this package!

I have been having trouble with the batch settings of LORDstar and SAFFRONstar. Here is a quick example that throws an error on my local computer. I also tried it on an Amazon EC2 instance and got the same error.

Is there any other info I can/should provide to help reproduce it?

library(onlineFDR)
set.seed(1)

tmax <- 1000

null_hyp <- (1:tmax) %in% sample(tmax, 
	ceiling((0.8)*tmax),
	replace = FALSE)

mu <- rep(3, tmax)
mu[null_hyp] <- 0

n_per_block <- 50
n_blocks <- tmax / n_per_block
block_ids <- rep(1:n_blocks, each = n_per_block)
batch_sizes <- rep(n_per_block, n_blocks)
stopifnot(sum(batch_sizes) == tmax)

z_base <- rnorm(tmax, 0, 1)
z_corr <- rnorm(n_blocks, 0, 1)
rho_s <- 0.3
z_s <- z_base * sqrt(1-rho_s) + 
	 z_corr[block_ids] * sqrt(rho_s) +
	 mu

p <- pnorm(-z_s)

sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Catalina 10.15.7

# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] onlineFDR_2.1.0

# loaded via a namespace (and not attached):
# [1] compiler_4.0.3 Rcpp_1.0.6  

fdr_df <- LORDstar(d=p,
	alpha=0.05,
	batch.sizes = batch_sizes,
	version='batch')

#  *** caught segfault ***
# address 0x7f9b307d3500, cause 'memory not mapped'

# Traceback:
#  1: lordstar_batch_faster(pval, batch, batchsum, gammai, w0 = w0,     alpha = alpha, display_progress = display_progress)
#  2: LORDstar(d = p, alpha = 0.05, batch.sizes = batch_sizes, version = "batch")

# Possible actions:
# 1: abort (with core dump, if enabled)
# 2: normal R exit
# 3: exit R without saving workspace
# 4: exit R saving workspace
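
For readers following the simulation above, the construction of `z_s` is meant to give unit-variance test statistics that share correlation `rho_s` within a block and are independent across blocks. The base-R sketch below checks this analytically on a small case; the matrices `B`, `A` and `Sigma` and the reduced sizes are introduced here purely for illustration and are not part of the original report.

```r
# Small analytic check of the block-correlation construction (illustrative sizes).
n_per_block <- 3
n_blocks <- 2
rho_s <- 0.3
block_ids <- rep(seq_len(n_blocks), each = n_per_block)
n <- length(block_ids)

# B[i, k] is 1 when observation i belongs to block k, else 0.
B <- outer(block_ids, seq_len(n_blocks), "==") * 1

# z_s - mu equals A %*% c(z_base, z_corr), so its covariance is A %*% t(A).
A <- cbind(sqrt(1 - rho_s) * diag(n), sqrt(rho_s) * B)
Sigma <- A %*% t(A)
print(round(Sigma, 2))
```

The diagonal entries are 1, entries within a block equal `rho_s`, and entries across blocks are 0, so the p-values fed to LORDstar are marginally valid and only the dependence structure differs from the independent case.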

inconsistencies in LORDstar and SAFFRONstar for batch setting

A previous comment raised an issue about inconsistencies between async and dep settings on LORDstar and SAFFRONstar. It seems like these settings now agree, but they don't appear to agree with the batch setting.

Is the mapping below the correct way to map between these 3 settings? If not, would it be possible to extend the examples under ?LORDstar to include an equivalent implementation with the batch setting?

rm(list=ls())
library(onlineFDR)
set.seed(0)

##### Batch structure dependence
batch_size <- 5
n_batches <- 30
n_total <- batch_size * n_batches
n_alt <- floor(n_total/5)

pval <- runif(n_total)
which_alt <- sort(sample(n_total, n_alt))
pval[which_alt] <- runif(n_alt)^8

batch_sizes <- rep(batch_size, n_batches)
(lags <- rep( 0:(batch_size-1), times = n_batches))
    #Lag = number of previously observed test statistics from this batch.

df_base <- data.frame(id = 1:n_total, pval = pval)
df_dep <- df_base
df_dep$lags <- lags

out_dep <- LORDstar(df_dep, version='dep')
out_batch <- LORDstar(df_base, batch.sizes = batch_sizes, version='batch')

range(out_dep$alphai - out_batch$alphai)
## [1] -0.002390632  0.000000000

##### Comparison in a simpler setting with independence.
# Here, things are OK
df_0dep <- df_dep
df_0dep$lags[] <- 0
out_0dep <- LORDstar(df_0dep, version = 'dep')
out_0batch <- LORDstar(df_base, batch.sizes=rep(1, n_total), version = 'batch')
out_lord <- LORD(df_base)
all(out_0dep$alphai == out_lord$alphai)
## TRUE
all(out_0batch$alphai == out_lord$alphai)
## TRUE
all(out_0batch$alphai == out_0dep$alphai)
## TRUE

Similar issues appear to happen for SAFFRONstar also.

Thanks very much!
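
The mapping proposed in this issue (lag = number of earlier tests in the same batch) can be written for unequal batch sizes as well. The sketch below uses only base R; `lags_from_batches` is a hypothetical helper name, and the code encodes the issue author's proposed mapping only. Whether 'dep' with these lags should reproduce the 'batch' setting is exactly the open question.

```r
# Hypothetical helper: within-batch lags for arbitrary batch sizes.
lags_from_batches <- function(batch_sizes) {
  # sequence() concatenates 1:n for each n, so subtracting 1
  # restarts the count at 0 at the start of every batch.
  sequence(batch_sizes) - 1L
}

lags_equal <- lags_from_batches(rep(5, 30))    # same layout as in the issue
lags_unequal <- lags_from_batches(c(4, 6, 5))  # batch sizes from the ?SAFFRONstar example
print(lags_unequal)
```

The resulting vector can be attached as the `lags` column of the data frame passed with `version='dep'`, as done for `df_dep` above.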

Error when using batch version of SAFFRONstar

The example in the documentation for SAFFRONstar with version == 'batch' works fine, but if I replace it with a p-value vector of random uniforms, with equal length, I get an error.

## Example from documentation works fine
pval_predefined = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757)
SAFFRONstar(data.frame(pval= pval_predefined), 
    version='batch', batch.sizes = c(4,6,5))


## Using random p-values generates an error
set.seed(0)
L <- length(pval_predefined)
pval_random <- runif(L)
SAFFRONstar(data.frame(pval= pval_random), 
    version='batch', batch.sizes = c(4,6,5))
    
   ## Error in data.frame(pval, batch = batch.no, alphai, R) : 
   ## arguments imply differing number of rows: 15, 8

As far as I can tell, there shouldn't be any meaningful difference between these two vectors that would have caused this. Is there?

length(pval_predefined) == length(pval_random)
class(pval_predefined)  == class(pval_random)

inconsistent results from LORDstar and SAFFRONstar

I was trying the different versions of LORDstar & SAFFRONstar and it looks like they give different results when they should be equivalent. I'm not sure if I'm misunderstanding something in the documentation or not.

For example, the code example in ?LORDstar appears to show two equivalent ways of making each p-value "in conflict" with the previous p-value. In other words, the threshold for p_i has to be fixed using only what is known at time i-2. In the first case, this is done by saying that the decision (reject or not) is not seen until the end of the next stage, so the threshold for p_i can only be based on observed rejections from tests 1 through i-2. As far as I understand, having a lag of 1 should produce the same conflict set, but the two settings give different results.

Is this a bug? Or am I misunderstanding the documentation?

sample.df <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
    'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
    'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
pval = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757),
decision.times = seq_len(15) + 1)

out1 <- LORDstar(sample.df, version='async')

sample.df2 <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
    'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
    'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
pval = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757),
lags = rep(1,15))

out2 <- LORDstar(sample.df2, version='dep')

all(sample.df$pval == sample.df2$pval) 
## TRUE
range(out1$alphai - out2$alphai) / mean(abs(out1$alphai))
## [1] 0.000000 1.294928

SAFFRONstar produces a similar apparent issue

out3 <- SAFFRONstar(sample.df, version='async')
out4 <- SAFFRONstar(sample.df2, version='dep')
range(out3$alphai - out4$alphai) / mean(abs(out3$alphai))
## [1] 0.000000 1.663927
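
To make the reasoning in this issue concrete, the base-R sketch below computes which earlier decisions are available at each step under the reading described above. Both definitions are assumptions taken from the issue text rather than from the package source, and the function names `visible_async` and `visible_dep` are hypothetical.

```r
# Assumed reading: with decision times, test j is visible at step i
# once its decision time is at most i - 1.
visible_async <- function(i, decision.times) {
  j <- seq_len(i - 1)
  j[decision.times[j] <= i - 1]
}

# Assumed reading: with lags, step i may only use tests 1, ..., i - lags[i] - 1.
visible_dep <- function(i, lags) {
  seq_len(max(i - lags[i] - 1, 0))
}

n <- 15
decision.times <- seq_len(n) + 1  # as in sample.df
lags <- rep(1, n)                 # as in sample.df2

same <- vapply(
  seq_len(n),
  function(i) identical(visible_async(i, decision.times), visible_dep(i, lags)),
  logical(1)
)
print(all(same))
```

Under these assumed definitions the two index sets coincide at every step (tests 1 through i-2), which is the equivalence the issue expects. The sketch does not explain why the reported `alphai` values differ.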
