lrtq's Introduction

LRT-q

LRT-q identifies regulatory effects of rare variants on genes using a likelihood ratio test.

To install the R package:

library(devtools)
install_github("avallonking/LRTq")

To see the manual:

library(LRTq)
?LRTq

Note that LRT-q performs the rare variant association test for one quantitative phenotype/trait at a time (for example, the expression levels of one gene). To test several phenotypes or genes, call the LRTq function once for each of them.
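
Running the test once per gene can be wrapped in a small loop. The following is a minimal sketch, not part of the package: the helper name run_per_gene and the three named lists are hypothetical, and the argument names (expr, geno, causal_ratio, perm) are taken from the example call in this README. A stand-in function is used so the sketch runs without LRTq installed; in practice one would pass test_fun = LRTq.

```r
# Hypothetical wrapper (not part of LRTq): applies a single-trait test to each gene.
# `test_fun` must accept expr, geno, causal_ratio and perm.
run_per_gene <- function(expr_by_gene, geno_by_gene, weights_by_gene,
                         test_fun, perm = 1000) {
  genes <- names(expr_by_gene)
  stopifnot(
    !is.null(genes),
    all(genes %in% names(geno_by_gene)),
    all(genes %in% names(weights_by_gene))
  )
  # One call per gene; the result is a named numeric vector.
  vapply(genes, function(g) {
    test_fun(expr = expr_by_gene[[g]], geno = geno_by_gene[[g]],
             causal_ratio = weights_by_gene[[g]], perm = perm)
  }, numeric(1))
}

# Stand-in for LRTq so the sketch runs without the package;
# it returns a placeholder number, not a real p-value.
stub_test <- function(expr, geno, causal_ratio, perm) ncol(geno) / 10

# Toy inputs: two genes, three individuals each.
expr_by_gene <- list(geneA = c(0.1, -0.3, 0.2), geneB = c(1.2, 0.4, -1.6))
geno_by_gene <- list(geneA = matrix(0L, 3, 2), geneB = matrix(0L, 3, 4))
weights_by_gene <- list(geneA = rep(0.3, 2), geneB = rep(0.3, 4))

res <- run_per_gene(expr_by_gene, geno_by_gene, weights_by_gene, stub_test)
print(res)
```

Keeping the test function as an argument makes it easy to swap in another single-trait method with the same argument names.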

Input:

  • Phenotypes E: a vector of quantitative traits of N individuals, such as gene expression levels. It should be standardized and normalized.
  • Genotypes G: an N by k matrix of individual genotypes, where N represents the population size and k stands for the number of rare variants. Rows are individuals, and columns are variants.
  • Weights W: a vector of the weights of k rare variants.
  • Permutations perm: the number of permutations to perform to calculate the p-value.
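
The input requirements above can be checked before the test is run. The following is a minimal sketch in base R, under stated assumptions: the helper name prepare_lrtq_inputs is hypothetical and not part of the package, "standardized" is taken to mean z-scoring with scale(), and genotypes are assumed to be coded as 0, 1, 2. Any further normalization of E is not covered here.

```r
# Hypothetical helper (not part of LRTq): checks the shapes described above
# and z-scores the phenotype vector.
prepare_lrtq_inputs <- function(E, G, W) {
  G <- as.matrix(G)
  stopifnot(
    is.numeric(E), is.numeric(G), is.numeric(W),
    !anyNA(E), !anyNA(G), !anyNA(W),
    length(E) == nrow(G),      # one phenotype value per individual (N)
    length(W) == ncol(G),      # one weight per rare variant (k)
    all(G %in% c(0, 1, 2))     # assumption: genotypes coded as 0, 1, 2
  )
  # Assumption: "standardized" means mean 0 and standard deviation 1.
  E_std <- as.numeric(scale(E))
  list(E = E_std, G = G, W = W)
}

# Toy data: 6 individuals (rows) and 3 rare variants (columns).
set.seed(1)
E <- rnorm(6, mean = 5, sd = 2)
G <- matrix(c(0, 1, 0, 0, 0, 0,
              0, 0, 0, 2, 0, 0,
              1, 0, 0, 0, 0, 1), nrow = 6)
W <- rep(0.30, ncol(G))

inputs <- prepare_lrtq_inputs(E, G, W)
str(inputs)
```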

Output:

  • The p-value for this rare variant association test. When testing the association between gene expression and rare variants, a significant p-value (for example, smaller than 0.05) is evidence that rare variants have a regulatory effect on the gene's expression. A non-significant p-value means the test found no evidence of such an effect; it does not prove that there is none.

Example:

library(LRTq)

## use sample data provided by SKAT
library(SKAT)
data("SKAT.example")

## use the quantitative trait (y.c), and extract the genotypes of rare variants (Z)
E = SKAT.example$y.c # Phenotypes
Z = SKAT.example$Z
maf = colMeans(Z) / 2 # minor allele frequency of each variant
G = Z[, maf > 0 & maf < 0.05] # Genotypes (rare variants only)
## weight all variants equally with 0.30
W = rep(0.30, ncol(G)) # Weights
## and use 1000 permutations
perm = 1000 # Permutations

## run LRT-q with the inputs prepared above
LRTq(expr = E, geno = G, causal_ratio = W, perm = perm)
## the results could be 0.000999001, 0.001998002, or 0.002997003,
## due to the randomness in the permutation test
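
The three values quoted above are consistent with the common permutation p-value estimate (b + 1) / (perm + 1), where b is the number of permuted statistics at least as extreme as the observed one. That this is the exact formula used inside LRTq is an inference from the quoted numbers, not something confirmed by the source. A short arithmetic sketch, with hypothetical counts b = 0, 1, 2:

```r
perm <- 1000

# Hypothetical counts of permuted statistics at least as extreme as the observed one.
exceed <- 0:2

# Common permutation p-value estimate (assumed here, see the note above).
p_values <- (exceed + 1) / (perm + 1)
print(signif(p_values, 7))
```

Under this estimate the smallest attainable p-value is 1 / (perm + 1), so more permutations are needed to resolve smaller p-values.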

Organization of this repository

  • Source code for the LRTq R package is in LRTq/src/LRTq.cpp
  • Scripts for generating all figures in the paper are in LRTq/analysis_scripts/figures/
  • Parameter settings for simulating genotypes are in LRTq/analysis_scripts/cosi
  • R scripts for running LRT-q and other methods on the GTEx dataset are in LRTq/analysis_scripts/gtex/association_tests
  • R scripts for analyzing the results of the GTEx dataset are in LRTq/analysis_scripts/gtex/statistical_analysis
  • R scripts for running the simulation experiments are in LRTq/analysis_scripts/simulation

To reproduce the figures in the paper, download the analysis_scripts folder and the "LRTq-Data" folder (https://drive.google.com/drive/folders/13HVdPpyOxCQCHxjfsGk3_gf4nrZfrTc1?usp=sharing). Extract the "LRTq-Data" folder under analysis_scripts/figures and rename it to data/. Then extract pvals.tar.gz inside the data/ folder and run the scripts in the analysis_scripts/figures directory. The folder must be named data/; otherwise the scripts will not work. For example, on a Linux or macOS machine:

# download the scripts
git clone https://github.com/avallonking/LRTq
# go to the directory with the scripts to generate figures
cd LRTq/analysis_scripts/figures
# download the required data from Google Drive 
# https://drive.google.com/drive/folders/13HVdPpyOxCQCHxjfsGk3_gf4nrZfrTc1?usp=sharing
# rename the downloaded folder as "data"
mv LRTq-Data data
# extract pvals.tar.gz in the data/ folder
cd data
tar xvzf pvals.tar.gz
cd ..
# make the directory for storing the generated plots
mkdir ../materials
# run the scripts to generate figures and tables

To re-run the simulation study, users can use the R scripts in LRTq/analysis_scripts/simulation/:

  • power.simulate.new.R: power simulation. It runs LRT-q and other methods on simulated data, assuming that at least one rare variant regulates gene expression. Usage: Rscript power.simulate.new.R [simulated haplotypes] [repeats] [causal ratio] [a (constant)] [output file name]. The simulated haplotypes could be data/simulation/haplotype/len5k_110var/processed.sim.hap1.100var.tsv
  • typeIerror.simulation.new.R: type I error simulation. It runs LRT-q and other methods on simulated data, assuming that no rare variants affect gene expression. Usage: Rscript typeIerror.simulation.new.R [simulated haplotypes] [permutations] [repeats] [output file name]. The simulated haplotypes could be data/simulation/haplotype/len5k_110var/processed.sim.hap1.100var.tsv

To re-run the analysis of GTEx, users can use the R scripts in LRTq/analysis_scripts/gtex/association_tests/:

  • acat.R: run ACAT on the GTEx dataset
  • acat.regress_out_common_eqtls.R: run ACAT on the GTEx dataset and regress out the effects of common eQTLs
  • faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.R: run LRT-q, SKAT-O, and VT on the GTEx dataset
  • faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.maf01.R: run LRT-q, SKAT-O, and VT on the GTEx dataset, only considering rare variants with MAF < 0.01
  • faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.regress_out_common_eqtls.R: run LRT-q, SKAT-O, and VT on the GTEx dataset and regress out the effects of common eQTLs

Usage (output: p-values of each gene for different weights):

# The general analysis of GTEx
Rscript $rscript $tissue_gene_expression $genotype_matrix $gene_snp_set $covariates $gene_list $start $end $result_file_prefix $weight_file_prefix
# Regress out the effects of common eQTLs
Rscript $rscript $tissue_gene_expression $genotype_matrix $gene_snp_set $covariates $gene_list $start $end $result_file_prefix $weight_file_prefix $common_eqtl_file $common_geno_matrix_file
  • $rscript: R scripts in LRTq/analysis_scripts/gtex/association_tests/
  • $tissue_gene_expression: gene expression matrix. In our study, it was obtained from the GTEx portal
  • $genotype_matrix: a SNP by individual matrix of rare variant genotypes, encoded as 0, 1, 2
  • $gene_snp_set: gene-SNP set file listing the SNPs within 20kb of the TSS of each gene. It has two columns: the first column is genes and the second column is SNPs. Users can use the gene-SNP set files in data/gene_snp_set, which are sorted by chromosome
  • $covariates: covariates acquired from GTEx portal. Note that the original covariates matrices should be transposed before using them as input
  • $gene_list: lists of genes expressed. Users can use the list files in data/gene_list_v8, which are sorted by tissues and chromosomes
  • $start: index in the gene list of the first gene to analyze
  • $end: index in the gene list of the last gene to analyze
  • $result_file_prefix: prefix of the result files. The result files are $result_file_prefix.lrt.csv, $result_file_prefix.skat.csv, $result_file_prefix.acat.csv, and $result_file_prefix.vt.csv, containing the outputs of LRT-q, SKAT-O, ACAT, and VT, respectively
  • $weight_file_prefix: prefix of the weight files. We recommend using data/weights/chr.$chromosome.rare.weight.summary
  • $common_eqtl_file: summary statistics of common eQTLs within 20kb, 50kb, or 100kb of the TSS, extracted from the GTEx eQTL summary statistics. Users can use the files in data/common_eqtls
  • $common_geno_matrix_file: a SNP by individual genotype matrix of the common eQTLs to be regressed out

lrtq's Issues

LRTq.so: invalid ELF header

Hi,
I am installing LRTq on a Linux server as described, but I always get an "invalid ELF header" error for LRTq.so. Details below:

library(devtools)
install_github("avallonking/LRTq")
Downloading GitHub repo avallonking/LRTq@master
from URL https://api.github.com/repos/avallonking/LRTq/zipball/master
Installing LRTq
'/DATA1/user/software/conda/lib/R/bin/R' --no-site-file --no-environ
--no-save --no-restore --quiet CMD INSTALL
'/tmp/RtmpoYmlSa/devtools822f4add2c73/avallonking-LRTq-d9f8424'
--library='/DATA1/user/software/conda/lib/R/library' --install-tests

  • installing source package ‘LRTq’ ...
    ** libs
    make: Nothing to be done for 'all'.
    installing to /DATA1/user/software/conda/lib/R/library/LRTq/libs
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded
    Error: package or namespace load failed for ‘LRTq’ in dyn.load(file, DLLpath = DLLpath, ...):
    unable to load shared object '/DATA1/user/software/conda/lib/R/library/LRTq/libs/LRTq.so':
    /DATA1/user/software/conda/lib/R/library/LRTq/libs/LRTq.so: invalid ELF header
    Error: loading failed
    Execution halted
    ERROR: loading failed
  • removing ‘/DATA1/user/software/conda/lib/R/library/LRTq’
    Installation failed: Command failed (1)

The R version on my Linux server is 3.5.1.
Do you have any idea what I need to do to get it installed?

Installation error

Hi there,

I am trying to install the LRTq package but get the following error:

install_github("avallonking/LRTq")
Loading required package: usethis
Downloading GitHub repo avallonking/LRTq@HEAD
✔ checking for file ‘/tmp/RtmpRcTxrm/remotesb4e522376a4/avallonking-LRTq-d9f8424/DESCRIPTION’ (415ms)
─ preparing ‘LRTq’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘LRTq_1.0.tar.gz’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/association_tests/faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.R’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/association_tests/faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.maf01.R’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/association_tests/faster.gtex_power_test.modified.fixed.more_perm.speed_up.other_tissues.regress_out_common_eqtls.R’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/statistical_analysis/all_tissues.egene.outliers.log2.standardized.corrected_tpm.only.R’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/statistical_analysis/all_tissues.non_egene.outliers.log2.standardized.corrected_tpm.only.R’
Warning in utils::tar(filepath, pkgname, compression = compression, compression_level = 9L, :
storing paths of more than 100 bytes is not portable:
‘LRTq/analysis_scripts/gtex/statistical_analysis/all_tissues.tissue_sharing.prop.num_shared_egenes.all.cv.rv.egenes.R’

  • installing source package ‘LRTq’ ...
    ** using staged installation
    ** libs
    x86_64-conda-linux-gnu-c++ -std=gnu++14 -I"/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib/R/include" -DNDEBUG -I'/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib/R/library/Rcpp/include' -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/include -I/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/include -Wl,-rpath-link,/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib -fpic -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1639563404388/work=/usr/local/src/conda/r-base-4.1.2 -fdebug-prefix-map=/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook=/usr/local/src/conda-prefix -c LRTq.cpp -o LRTq.o
    LRTq.cpp: In function 'double LRTq(Rcpp::NumericVector, Rcpp::IntegerMatrix, Rcpp::NumericVector, int)':
    LRTq.cpp:55:50: error: 'PI' was not declared in this scope
    55 | ln_a = log(1 - causal_ratio) - N / 2 * log(2 * PI * sigma) - N / 2;
    | ^~
    make: *** [/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib/R/etc/Makeconf:177: LRTq.o] Error 1
    ERROR: compilation failed for package ‘LRTq’
  • removing ‘/share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib/R/library/LRTq’
    Warning message:
    In i.p(...) :
    installation of package ‘/tmp/RtmpRcTxrm/fileb4e5183bf283/LRTq_1.0.tar.gz’ had non-zero exit status

see my sessionInfo() output below:

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /share/ScratchGeneral/anncuo/jupyter/conda_notebooks/envs/r_notebook/lib/libopenblasp-r0.3.17.so

locale:
[1] LC_CTYPE=en_AU.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_AU.UTF-8 LC_COLLATE=en_AU.UTF-8
[5] LC_MONETARY=en_AU.UTF-8 LC_MESSAGES=en_AU.UTF-8
[7] LC_PAPER=en_AU.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] devtools_2.4.3 usethis_2.1.5

loaded via a namespace (and not attached):
[1] magrittr_2.0.2 pkgload_1.2.4 R6_2.5.1 rlang_1.0.2
[5] fastmap_1.1.0 tools_4.1.2 pkgbuild_1.3.1 sessioninfo_1.2.2
[9] cli_3.2.0 withr_2.5.0 ellipsis_0.3.2 remotes_2.4.2
[13] rprojroot_2.0.2 lifecycle_1.0.1 crayon_1.5.0 brio_1.1.3
[17] processx_3.5.2 purrr_0.3.4 callr_3.7.0 fs_1.5.2
[21] ps_1.6.0 curl_4.3.2 testthat_3.1.2 memoise_2.0.1
[25] glue_1.6.2 cachem_1.0.6 compiler_4.1.2 desc_1.4.0
[29] prettyunits_1.1.1

Could you help me figure out what to do to get this to install properly?

Thanks so much!
