Giter Site home page Giter Site logo

eqtlmapt's Introduction

eQTLMAPT

eQTLMAPT:fast and accurate eQTL mediation analysis with efficient permutation testing approaches.

Introduction

eQTLMAPT implements fast and accurate eQTL mediation analysis with efficient permutation procedures to control for multiple testing. eQTLMAPT provides adaptive permutation scheme which prunes the permutation process opportunely, and it models the null distribution using generalized Pareto distribution (GPD) trained from a few permutations. The adjusted P-values can be estimated at any significance level based on the null distribution with much less computational cost compared with traditional fixed number permutation strategies. In addition, eQTLMAPT provides flexible options for users to choose proper permutation schemes combining with various confounding factors adjustment methods based on their practical needs. Experiments on real eQTL dataset demonstrate that eQTLMAPT provides higher resolution of estimated significance and is order of magnitude faster than compared methods with similar accuracy.

Installation

install.packages("devtools", dependencies = T)  
devtools::install_github("QidiPeng/eQTLMAPT")

Input

`snp.dat`          The genotype matrix.  Each row is an eQTL, each column is a sample.  
`fea.dat`          The gene expression profile matrix. Each row is a gene\'s expression profile, each column is a sample.  
`known.conf/conf`  The known confounder matrix which is adjusted in all mediation tests. Each row is a confounder, each column is a sample. 
`cov.pool`         The pool of candidate confounding variables from which potential confounders are adaptively selected to adjust for each mediation test. Each row is a covariate, each column is a sample.  Only need to sepcify in adaptive confounder selection mode, if not given in this mode, principal components will be calculated instead.
`trios.idx`        The trios matrix of 3 columns. Each row represents a trio (eQTL, cis-gene, trans-gene). The first element represents the index of the eQTL in `snp.dat`; The second element represents the index of cis-gene in `fea.dat`,  and the third element represents the index of the trans-gene in `fea.dat`.  
`cl`               If parallel computing is required, cluster information needs to be provided.  

For other parameter information, refer specifically to help function.

Output

`nperm`            The executed permutation times, for adaptive permutation scheme only.  
`nominal.p`        The nominal P-value by testing the significance of `beta2` in the regression formula `trans_gene ~ beta1 * SNP + beta2 * cis_gene + err`, using t-test.  
`empirical.p`      The permutation P-value by testing the significance of `beta2` using permutation test.  
`empirical.p.gpd`  The permutation P-value estimated by GPD approximation.  
`std.error`        The standard error of `beta2`.  
`t_stat`           The `t-statistics` in testing the significance of `beta2`.  
`beta`             The `beta1` in the regression formula `trans_gene ~ beta1 * SNP + beta2 * cis_gene + err`.  
`beta.total`       The `beta.total` in the regression formula `trans_gene ~ beta.total * SNP + err`.  
`beta.change`      The proportions mediated, calculated by `(beta.total-beta)/beta.total`.  
`pc.matrix`        The principal components (PCs) matrix of expression profiles. This will be returned if the PCs are used as the pool of potential confounders. Each column is a PC. Returned only in adaptive confounder selection mode.
`sel.conf.ind`     The indicator matrix indicating which confounders are selected during mediation analysis. Returned only in adaptive confounder selection mode. Dimension of `sel.conf.ind` is the number of trios by the number of covariates in `cov.pool` or `pc.matrix`.  

Demo data

# To generate a cluster with 4 nodes for parallel computing  
cl = makeCluster(4)    
    
# Using the first `pc.num` = 10 PCs as candidate confunding variable pools.  
# The maximum number of permutation is `Maxperm` = 10000 in the adaptive permutation scheme. And when permutation number better than original statistics upon `Minperm` = 100 stop.  
# When the empirical P-value is less than `gpd.perm` = 0.01, a more accurate empirical P-value is estimated using the GPD fit.  
 output <- gmap.ac.gpd(snp.dat = dat$snp.dat, fea.dat = dat$fea.dat, known.conf = dat$known.conf, trios.idx = dat$trios.idx[1:10,], cl = cl, cov.pool = NULL, pc.num = 10, Minperm = 100, Maxperm = 10000, gpd.perm = 0.01)  

The dat object used here is packaged in example.rda.

Contact

If you need help, please contact ydwang at hit dot edu dot cn, jiajiepeng at nwpu at edu at cn or 1571608336 at qq at com.

Full paper

Full paper has been submitted to Frontiers in Genetics

eqtlmapt's People

Contributors

qidipeng avatar stormlovetao avatar

Stargazers

Stefan avatar

Watchers

James Cloos avatar  avatar  avatar Xenophong avatar

eqtlmapt's Issues

example.rda

Hello, It seems like example.rda is empty. To be fair, I haven't used rda objects very much, so maybe I missed something. Any help would be appreciated, thanks!

gmap.gpd and gmap.ac.gpd do not work

Hello,

I've had success running the gmap() function without issues.

However, I have been unable to get the above functions to work properly.

When I run gmap.gpd() using the same genotype, expression, etc. inputs as the gmap() function, I get this error:

[1] "call Pgpd"
Error in if (abs(x) < REALMIN) return(2^(-1074)) :
the condition has length > 1

Similarly, when I try to run the gmap.ac.gpd function, I get this error:

Error in svd(x, nu = 0, nv = k) : infinite or missing values in 'x'

I've double checked, and there are no NAs in my expression matrix.

For reference, I am running R 4.2.1

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.