
pcsstools's People

Contributors: jackmwolf

pcsstools's Issues

model_combo() does not label coefficients correctly

When fitting a model with only one predictor, calculate_lm() does not label the "(Intercept)" row in the output coefficients.
The following example shows output from model_combo(), which calls calculate_lm().

library(grass)
ex_data <- cont_data

means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)

phi <- c(1, 1)

model_combo(y1 + y2 ~ x, n = n, phi = phi, means = means, covs = covs)
#> Model approximated using Pre-Computed Summary Statistics.
#> 
#> Call:
#> model_combo(formula = y1 + y2 ~ x, phi = phi, n = n, means = means, 
#>     covs = covs)
#> 
#> Coefficients:
#>   Estimate Std. Error t value Pr(>|t|)    
#>   -0.50990    0.03349  -15.22   <2e-16 ***
#> x -0.89016    0.03287  -27.08   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.059 on 998 degrees of freedom
#> Multiple R-squared:  0.4236, Adjusted R-squared:  0.423 
#> F-statistic: 733.3 on 1 and 998 DF,  p-value: < 2.2e-16

summary(lm(y1 + y2 ~ x, data = cont_data))
#> 
#> Call:
#> lm(formula = y1 + y2 ~ x, data = cont_data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.2948 -0.7576 -0.0582  0.7384  3.4200 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -0.50990    0.03349  -15.22   <2e-16 ***
#> x           -0.89016    0.03287  -27.08   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.059 on 998 degrees of freedom
#> Multiple R-squared:  0.4236, Adjusted R-squared:  0.423 
#> F-statistic: 733.3 on 1 and 998 DF,  p-value: < 2.2e-16
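One possible direction for a fix is sketched below. label_coefficients() is a hypothetical helper, not part of pcsstools, and it assumes calculate_lm() assembles the coefficient matrix itself:

```r
# Hypothetical helper: always assign row names to the coefficient table,
# mirroring lm(), so the intercept row is labeled even with one predictor
label_coefficients <- function(coef_mat, predictor_names) {
  rownames(coef_mat) <- c("(Intercept)", predictor_names)
  coef_mat
}
```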

Release pcsstools 0.1.2

Prepare for release:

  • git pull
  • Check current CRAN check results
  • Polish NEWS
  • devtools::build_readme()
  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • revdepcheck::revdep_check(num_workers = 4)
  • Update cran-comments.md
  • git push

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • git push
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • git push

new_predictor_* functions do not check input

The new_predictor_* functions do not validate user input. Each of the following calls should raise an error, or at least a warning, but they currently run without complaint.

library(pcsstools)

x <- new_predictor_snp(maf = NA)
z <- new_predictor_binary(p = 100)
y <- new_predictor_normal(mean = NA, sd = -1)
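One way the constructors could validate their inputs is sketched here; check_probability() is a hypothetical helper, and the specific checks shown are assumptions about what the package should reject:

```r
# Hypothetical input check for probability-type arguments (maf, p)
check_probability <- function(x, name) {
  if (!is.numeric(x) || is.na(x) || x < 0 || x > 1) {
    stop(sprintf("`%s` must be a number in [0, 1].", name), call. = FALSE)
  }
}

# e.g. inside new_predictor_snp():
#   check_probability(maf, "maf")
# and inside new_predictor_normal():
#   if (!is.numeric(sd) || is.na(sd) || sd <= 0) stop("`sd` must be positive.")
```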

approx_mult_prod is sensitive to variable order

The output of approx_mult_prod() changes with the order of the response means/covariances.
The lists returned by the two approx_mult_prod() calls in the reprex below should be equal, but they are not.

library(grass)
ex_data <- bin_data[c("g", "x", "y1", "y2", "y3")]
head(ex_data)
#>   g          x y1 y2 y3
#> 1 0 -0.9161478  1  0  1
#> 2 0  1.2496985  0  1  0
#> 3 1 -1.2708514  0  0  0
#> 4 2  0.0832760  0  1  0
#> 5 0  0.4686342  0  1  1
#> 6 2  0.4620154  0  1  0

means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)

predictors <- list(
  new_predictor_snp(maf = mean(ex_data$g) / 2),
  new_predictor_normal(mean = mean(ex_data$x), sd = sd(ex_data$x))
)
responses <- lapply(means[3:length(means)], new_predictor_binary)

approx_mult_prod(means, covs, n, response = "binary",
  predictors = predictors, responses = responses, verbose = TRUE)
#> Approximating with responses ordered as:  y1 * y2 * y3 
#> Approximating with responses ordered as:  y1 * y3 * y2 
#> Approximating with responses ordered as:  y2 * y3 * y1
#> $means
#>           g           x      y1y2y3 
#>  0.56800000 -0.02927950  0.05444547 
#> 
#> $covs
#>                  g           x      y1y2y3
#> g       0.40978579 -0.04510754 -0.02670105
#> x      -0.04510754  0.99460726  0.04906614
#> y1y2y3 -0.02670105  0.04906614  0.05153269


# Reorder response means/covariances
means <- means[c(1, 2, 5, 4, 3)]
covs  <- covs[c(1, 2, 5, 4, 3), c(1, 2, 5, 4, 3)]

responses <- lapply(means[3:length(means)], new_predictor_binary)

approx_mult_prod(means, covs, n, response = "binary",
                 predictors = predictors, responses = responses, verbose = TRUE)
#> Approximating with responses ordered as:  y3 * y2 * y1 
#> Approximating with responses ordered as:  y3 * y1 * y2 
#> Approximating with responses ordered as:  y2 * y1 * y3
#> $means
#>           g           x      y3y2y1 
#>  0.56800000 -0.02927950  0.08101557 
#> 
#> $covs
#>                  g           x      y3y2y1
#> g       0.40978579 -0.04510754 -0.03324090
#> x      -0.04510754  0.99460726  0.05203570
#> y3y2y1 -0.03324090  0.05203570  0.07452658

Created on 2020-08-05 by the reprex package (v0.3.0)
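Until the underlying order sensitivity is fixed, one possible workaround is to put the response means/covariances into a canonical order before calling approx_mult_prod(). This sketch assumes the first two columns are predictors and sorts the remaining response columns alphabetically by name:

```r
# Hypothetical workaround: reorder response means/covariances alphabetically
# so that approx_mult_prod() always sees the same ordering
resp_idx <- 3:length(means)
ord <- c(1, 2, resp_idx[order(names(means)[resp_idx])])
means_canon <- means[ord]
covs_canon  <- covs[ord, ord]
```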

approx_conditional() can be simplified

approx_conditional() estimates the conditional variance of a phenotype with an expression that can be simplified considerably.

p_s2 <- (n * means[2]^2 + (n - 1) * covs[2, 2] - a * n * means[2] -
  b * (n * means[1] * means[2] + (n - 1) * covs[1, 2])) / (n - 2)

Substituting the OLS intercept, a = means[2] - b * means[1], and simplifying, this reduces to:

p_s2 <- (n-1) * (covs[2, 2] - b * covs[1, 2]) / (n - 2)
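The identity can be checked numerically. The values below are synthetic, and taking a and b to be the usual OLS intercept and slope is an assumption about approx_conditional()'s internals:

```r
# Numerical check of the simplification with synthetic summary statistics
n <- 100
means <- c(0.3, 1.2)
covs <- matrix(c(1.0, 0.4, 0.4, 2.0), 2, 2)
b <- covs[1, 2] / covs[1, 1]        # assumed OLS slope
a <- means[2] - b * means[1]        # assumed OLS intercept

full <- (n * means[2]^2 + (n - 1) * covs[2, 2] - a * n * means[2] -
  b * (n * means[1] * means[2] + (n - 1) * covs[1, 2])) / (n - 2)
reduced <- (n - 1) * (covs[2, 2] - b * covs[1, 2]) / (n - 2)

all.equal(full, reduced)
#> [1] TRUE
```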

model_or() and model_and() do not label coefficients correctly with one predictor

A similar but distinct issue affects model_or() and model_and(): in models with only one predictor, that predictor's coefficient row is labeled NA.

library(grass)

ex_data <- bin_data

means <- colMeans(ex_data)
covs <- cov(ex_data)
n <- nrow(ex_data)
predictors <- list(
 g = new_predictor_snp(maf = mean(ex_data$g) / 2),
 x = new_predictor_normal(mean = mean(ex_data$x), sd = sd(ex_data$x))
)

model_and(
 y1 & y2 ~ g,
 means = means, covs = covs, n = n, predictors = predictors
)
#> Model approximated using Pre-Computed Summary Statistics.
#> 
#> Call:
#> model_and(formula = y1 & y2 ~ g, n = n, means = means, covs = covs, 
#>     predictors = predictors)
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  0.19601    0.01382  14.179  < 2e-16 ***
#> NA          -0.11797    0.01616  -7.301 5.82e-13 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3269 on 998 degrees of freedom
#> Multiple R-squared:  0.05071,    Adjusted R-squared:  0.04976 
#> F-statistic: 53.31 on 1 and 998 DF,  p-value: 5.819e-13

Originally posted by @jackmwolf in #3 (comment)
