dsrobertson / onlineFDR

Clone of the Bioconductor repository for the onlineFDR package. See https://bioconductor.org/packages/devel/bioc/html/onlineFDR.html for the official development version, and https://dsrobertson.github.io/onlineFDR/ for easy access to documentation.

Home Page: https://dsrobertson.github.io/onlineFDR/

onlinefdr's Introduction

onlineFDR

onlineFDR allows users to control the false discovery rate (FDR) or familywise error rate (FWER) for online hypothesis testing, where hypotheses arrive in a stream. In this framework, a null hypothesis is rejected based on the evidence against it and on the previous rejection decisions.
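
As a rough illustration of how a test level can depend on earlier rejection decisions, here is a minimal base-R sketch of a LOND-style rule (Javanmard and Montanari, 2015). It is not the package's implementation: the function name `lond_sketch` and the weight sequence `beta` are assumptions made for this example only.

```r
# Illustrative LOND-style rule; NOT the onlineFDR implementation.
# lond_sketch and the beta sequence below are hypothetical choices.
lond_sketch <- function(pval, alpha = 0.05) {
  n <- length(pval)
  # Assumed weights: beta_i = alpha * 6 / (pi^2 * i^2), which sum to alpha
  # over i = 1, 2, ...
  beta <- alpha * 6 / (pi^2 * seq_len(n)^2)
  alphai <- numeric(n)      # test level used for each hypothesis
  R <- logical(n)           # rejection indicator
  discoveries <- 0          # number of rejections made so far
  for (i in seq_len(n)) {
    # The level for hypothesis i grows with the number of earlier rejections
    alphai[i] <- beta[i] * (discoveries + 1)
    R[i] <- pval[i] <= alphai[i]
    discoveries <- discoveries + R[i]
  }
  data.frame(pval = pval, alphai = alphai, R = R)
}

res <- lond_sketch(c(1e-8, 0.5, 1e-4, 0.9))
res$R
# [1]  TRUE FALSE  TRUE FALSE
```

Each p-value is compared with a level that was fixed before it was observed, so decisions can be made one at a time as the stream arrives.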

Installation

To install the latest (development) version of the onlineFDR package from Bioconductor, please run the following code:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# The following initializes usage of Bioc
BiocManager::install()

BiocManager::install("onlineFDR")

Alternatively, you can install the package directly from GitHub:

# install.packages("devtools") # If devtools not installed

devtools::install_github("dsrobertson/onlineFDR")

Documentation

Documentation is hosted at https://dsrobertson.github.io/onlineFDR/

To view the vignette for the version of this package installed on your system, start R and enter:

browseVignettes("onlineFDR")

References

Aharoni, E. and Rosset, S. (2014). Generalized alpha-investing: definitions, optimality results and applications to public databases. Journal of the Royal Statistical Society (Series B), 76(4):771–794.

Foster, D. and Stine, R. (2008). alpha-investing: a procedure for sequential control of expected false discoveries. Journal of the Royal Statistical Society (Series B), 70(2):429-444.

Javanmard, A., and Montanari, A. (2015). On Online Control of False Discovery Rate. arXiv preprint, https://arxiv.org/abs/1502.06197.

Javanmard, A., and Montanari, A. (2018). Online Rules for Control of False Discovery Rate and False Discovery Exceedance. Annals of Statistics, 46(2):526-554.

Ramdas, A., Yang, F., Wainwright, M.J. and Jordan, M.I. (2017). Online control of the false discovery rate with decaying memory. Advances in Neural Information Processing Systems 30, 5650-5659.

Ramdas, A., Zrnic, T., Wainwright, M.J. and Jordan, M.I. (2018). SAFFRON: an adaptive algorithm for online control of the false discovery rate. Proceedings of the 35th International Conference on Machine Learning, 80:4286-4294.

Robertson, D.S. and Wason, J.M.S. (2018). Online control of the false discovery rate in biomedical research. arXiv preprint, https://arxiv.org/abs/1809.07292.

Robertson, D.S., Wason, J.M.S. and Ramdas, A. (2022). Online multiple hypothesis testing for reproducible research. arXiv preprint, https://arxiv.org/abs/2208.11418.

Robertson, D.S., Wildenhain, J., Javanmard, A. and Karp, N.A. (2019). onlineFDR: an R package to control the false discovery rate for growing data repositories. Bioinformatics, 35:4196-4199, https://doi.org/10.1093/bioinformatics/btz191.

Tian, J. and Ramdas, A. (2019). ADDIS: an adaptive discarding algorithm for online FDR control with conservative nulls. Advances in Neural Information Processing Systems, 9388-9396.

Tian, J. and Ramdas, A. (2021). Online control of the familywise error rate. Statistical Methods in Medical Research, 30(4):976–993.

Zrnic, T., Jiang, D., Ramdas, A. and Jordan, M. (2020). The Power of Batching in Multiple Hypothesis Testing. International Conference on Artificial Intelligence and Statistics, PMLR, 108:3806-3815.

Zrnic, T., Ramdas, A. and Jordan, M.I. (2021). Asynchronous Online Testing of Multiple Hypotheses. Journal of Machine Learning Research, 22:1-33.

onlinefdr's Issues

LORDstar batch setting leads to error: caught segfault address 0x7f9b307d3500, cause 'memory not mapped'

Thanks so much for this package!

I have been having trouble with the batch settings of LORDstar and SAFFRONstar. Here is a quick example that throws an error on my local computer. I also tried it on an Amazon EC2 instance and got the same error.

Is there any other info I can/should provide to help reproduce it?

library(onlineFDR)
set.seed(1)

tmax <- 1000

null_hyp <- (1:tmax) %in% sample(tmax, 
	ceiling((0.8)*tmax),
	replace = FALSE)

mu <- rep(3, tmax)
mu[null_hyp] <- 0

n_per_block <- 50
n_blocks <- tmax / n_per_block
block_ids <- rep(1:n_blocks, each = n_per_block)
batch_sizes <- rep(n_per_block, n_blocks)
stopifnot(sum(batch_sizes) == tmax)

z_base <- rnorm(tmax, 0, 1)
z_corr <- rnorm(n_blocks, 0, 1)
rho_s <- 0.3
z_s <- z_base * sqrt(1-rho_s) + 
	 z_corr[block_ids] * sqrt(rho_s) +
	 mu

p <- pnorm(-z_s)

sessionInfo()
# R version 4.0.3 (2020-10-10)
# Platform: x86_64-apple-darwin17.0 (64-bit)
# Running under: macOS Catalina 10.15.7

# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] onlineFDR_2.1.0

# loaded via a namespace (and not attached):
# [1] compiler_4.0.3 Rcpp_1.0.6  

fdr_df <- LORDstar(d=p,
	alpha=0.05,
	batch.sizes = batch_sizes,
	version='batch')

#  *** caught segfault ***
# address 0x7f9b307d3500, cause 'memory not mapped'

# Traceback:
#  1: lordstar_batch_faster(pval, batch, batchsum, gammai, w0 = w0,     alpha = alpha, display_progress = display_progress)
#  2: LORDstar(d = p, alpha = 0.05, batch.sizes = batch_sizes, version = "batch")

# Possible actions:
# 1: abort (with core dump, if enabled)
# 2: normal R exit
# 3: exit R without saving workspace
# 4: exit R saving workspace

Automating the building of the website

Hi,

I'm hoping to automate the building of the website, following the routine described in: https://www.rostrum.blog/2020/08/09/ghactions-pkgs/

Is there anything weird going on with the way the website currently works that I should be aware of?

Best wishes,
Phillip Crout

inconsistencies in LORDstar and SAFFRONstar for batch setting

A previous comment raised an issue about inconsistencies between async and dep settings on LORDstar and SAFFRONstar. It seems like these settings now agree, but they don't appear to agree with the batch setting.

Is the mapping below the correct way to map between these 3 settings? If not, would it be possible to extend the examples under ?LORDstar to include an equivalent implementation with the batch setting?

rm(list=ls())
library(onlineFDR)
set.seed(0)

##### Batch structure dependence
batch_size <- 5
n_batches <- 30
n_total <- sum(batch_size * n_batches)
n_alt <- floor(n_total/5)

pval <- runif(n_total)
which_alt <- sort(sample(n_total, n_alt))
pval[which_alt] <- runif(n_alt)^8

batch_sizes <- rep(batch_size, n_batches)
(lags <- rep( 0:(batch_size-1), times = n_batches))
    #Lag = number of previously observed test statistics from this batch.

df_base <- 
df_dep <- data.frame(id = 1:n_total, pval = pval)
df_dep$lags <- lags

out_dep <- LORDstar(df_dep, version='dep')
out_batch <- LORDstar(df_base, batch.sizes = batch_sizes, version='batch')

range(out_dep$alphai - out_batch$alphai)
## [1] -0.002390632  0.000000000

##### Comparison in a simpler setting with independence.
# Here, things are OK
df_0dep <- df_dep
df_0dep$lags[] <- 0
out_0dep <- LORDstar(df_0dep, version = 'dep')
out_0batch <- LORDstar(df_base, batch.sizes=rep(1, n_total), version = 'batch')
out_lord <- LORD(df_base)
all(out_0dep$alphai == out_lord$alphai)
## TRUE
all(out_0batch$alphai == out_lord$alphai)
## TRUE
all(out_0batch$alphai == out_0dep$alphai)
## TRUE

Similar issues appear to happen for SAFFRONstar also.

Thanks very much!
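
The lag construction used in the issue above (lag = number of earlier tests in the same batch) can be written for unequal batch sizes as well. The helper below is a minimal base-R sketch; `lags_from_batches` is a hypothetical name, not an onlineFDR function, and whether this mapping should make the 'dep' and 'batch' versions agree is exactly the open question of the issue.

```r
# Hypothetical helper (not part of onlineFDR): for each test, count the
# earlier tests in the same batch, as proposed in the issue above.
lags_from_batches <- function(batch_sizes) {
  # Batch sizes must be positive whole numbers
  stopifnot(all(batch_sizes >= 1), all(batch_sizes == round(batch_sizes)))
  # sequence() restarts the count 1, 2, ... within each batch
  sequence(batch_sizes) - 1
}

lags_from_batches(c(4, 6, 5))
# [1] 0 1 2 3 0 1 2 3 4 5 0 1 2 3 4
```

For equal batch sizes this reproduces rep(0:(batch_size-1), times = n_batches) from the example above.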

Error when using batch version of SAFFRONstar

The example in the documentation for SAFFRONstar with version = 'batch' works fine, but if I replace it with a p-value vector of random uniforms of the same length, I get an error.

## Example from documentation works fine
pval_predefined = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757)
SAFFRONstar(data.frame(pval= pval_predefined), 
    version='batch', batch.sizes = c(4,6,5))


## Using random p-values generates an error
set.seed(0)
L <- length(pval_predefined)
pval_random <- runif(L)
SAFFRONstar(data.frame(pval= pval_random), 
    version='batch', batch.sizes = c(4,6,5))
    
   ## Error in data.frame(pval, batch = batch.no, alphai, R) : 
   ## arguments imply differing number of rows: 15, 8

As far as I can tell, there shouldn't be any meaningful difference between these two vectors that would have caused this. Is there?

length(pval_predefined) == length(pval_random)
class(pval_predefined)  == class(pval_random)

inconsistent results from LORDstar and SAFFRONstar

I was trying the different versions of LORDstar & SAFFRONstar and it looks like they give different results when they should be equivalent. I'm not sure if I'm misunderstanding something in the documentation or not.

For example, the code example in ?LORDstar appears to show two equivalent representations of having each p-value be "in conflict" with the previous p-value. In other words, the test level for p_i has to be fixed at time i-2. In the first case, this is done by saying that the decision (reject or not) is not seen until the end of the next stage, so the threshold for p_i can only be based on observed rejections from tests 1 through i-2. As far as I understand, a lag of 1 should produce the same conflict set, but the two settings give different results.

Is this a bug? Or am I misunderstanding the documentation?
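
To make the claimed equivalence concrete, the sketch below builds the conflict set of each test under both representations in base R. The definitions are assumptions taken from Zrnic, Ramdas and Jordan (2021), not from the package source: under 'async', test j conflicts with test t if j < t and its decision time is at least t; under 'dep', test t conflicts with the lags[t] tests immediately before it.

```r
# Conflict sets under the two representations. The definitions below are
# assumed from Zrnic, Ramdas and Jordan (2021); they are not taken from
# the onlineFDR source code.
n <- 15
decision_times <- seq_len(n) + 1   # async: decision for test j seen at time j + 1
lags <- rep(1, n)                  # dep: each test has a lag of 1

# async: earlier tests whose decision time is t or later
conflict_async <- lapply(seq_len(n), function(t) {
  j <- seq_len(t - 1)
  j[decision_times[j] >= t]
})

# dep: the lags[t] tests immediately before t (fewer at the start of the stream)
conflict_dep <- lapply(seq_len(n), function(t) {
  j <- seq_len(t - 1)
  j[j >= t - lags[t]]
})

identical(conflict_async, conflict_dep)
# [1] TRUE
```

Under these assumed definitions both settings give the conflict set {t-1} for every t > 1, which is the equivalence the question relies on.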

sample.df <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
    'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
    'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
pval = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757),
decision.times = seq_len(15) + 1)

out1 <- LORDstar(sample.df, version='async')

sample.df2 <- data.frame(
id = c('A15432', 'B90969', 'C18705', 'B49731', 'E99902',
    'C38292', 'A30619', 'D46627', 'E29198', 'A41418',
    'D51456', 'C88669', 'E03673', 'A63155', 'B66033'),
pval = c(2.90e-08, 0.06743, 0.01514, 0.08174, 0.00171,
        3.60e-05, 0.79149, 0.27201, 0.28295, 7.59e-08,
        0.69274, 0.30443, 0.00136, 0.72342, 0.54757),
lags = rep(1,15))

out2 <- LORDstar(sample.df2, version='dep')

all(sample.df$pval == sample.df2$pval) 
## TRUE
range(out1$alphai - out2$alphai) / mean(abs(out1$alphai))
## [1] 0.000000 1.294928

SAFFRONstar shows a similar apparent issue:

out3 <- SAFFRONstar(sample.df, version='async')
out4 <- SAFFRONstar(sample.df2, version='dep')
range(out3$alphai - out4$alphai) / mean(abs(out3$alphai))
## [1] 0.000000 1.663927
