The pumice from chaleeluo

PUMICE

PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics) is a tool to create gene expression prediction models for transcriptome-wide association studies. Specifically, PUMICE leverages tissue-specific 3D genomic and epigenomic data to define regions that harbor cis-regulatory variants and prioritize them accordingly.

Bugs

04/21/2023: Update PUMICE+ code and README.

01/05/2023: Fix issues that occur when the genotype input contains rare variants in both PUMICE.nested_cv.R and PUMICE.compute_weights.R. Also, fix an issue with processing the constant-window mapping file in PUMICE.compute_weights.R.

09/13/2022: For the precomputed models uploaded to GitHub so far, the "spearman_cor" field of "modelattribute" reports the square of Spearman's correlation. We are in the process of fixing this.

09/18/2022: We have fixed this problem and uploaded the updated version of the GTEx V7 precomputed models ("models_GTEx_v7" folder).

Getting Started

PUMICE requires R 4.0, several R packages, and bedtools.

Prerequisites

A list of R packages required for PUMICE includes optparse, data.table, tidyr, tidyverse, dplyr, IRanges, GenomicRanges, genefilter, glmnet, caret, rareGWAMA, BEDMatrix, RSQLite.
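
Before running PUMICE, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of PUMICE itself; the helper names `missing_tools` and `missing_r_packages` are hypothetical. The first helper lists command-line tools that are not on the PATH, and the second asks R which of the given packages cannot be loaded.

```shell
#!/bin/sh
# Hypothetical helpers for checking prerequisites; not shipped with PUMICE.

# Print every command-line tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Print every R package from the argument list that R cannot load.
# Requires Rscript, so call it only after missing_tools reports Rscript as present.
missing_r_packages() {
    Rscript -e 'pkgs <- commandArgs(trailingOnly = TRUE); ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE); cat(pkgs[!ok], sep = "\n")' "$@"
}
```

For example, `missing_tools Rscript bedtools` prints nothing when both tools are installed, and `missing_r_packages optparse data.table glmnet` prints the packages that still need to be installed.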

Tool overview

To run PUMICE, two steps are required.

First, we need to run nested cross-validation to determine which window type and penalty factor are optimal (i.e., give the lowest mean cross-validated error) for each gene. This step is computationally intensive; therefore, we require users to run it in parallel, with one job per autosome (chromosomes 1 to 22) and window type. Users can further split each job into multiple jobs using the options total_file_num and file_num. The PUMICE.nested_cv.R script can be found here.

   Rscript PUMICE.nested_cv.R
      --geno [Path to genotype data]
      --chr [Chromosome number]
      --exp [Path to expression data]
      --out [Path to output directory]
      --method [Window type to be used for creating models]
      --type [Specific 3D genome windows being used/Specific constant window size being used (in kb)]
      --window_path [Path to 3D genome window file]
      --bedtools_path [Path to bedtools software]
      --epi_path [Path to epigenomic data]
      --fold [Number of folds to be performed for nested cross-validation]
      --total_file_num [Total number of jobs to split the run into]
      --file_num [Job number]
      --noclean [Do not delete any temporary files]
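
Because step 1 runs once per chromosome, window setting, and job chunk, the job list is easiest to generate with a loop. The sketch below only prints the commands (a dry run) so they can be reviewed or piped to a scheduler. The function name `nested_cv_commands` is hypothetical, the `method:type` pairs are placeholders for whatever values your run uses, and numbering `--file_num` from 1 is an assumption. The path options (`--geno`, `--exp`, `--out`, and so on) are omitted and would need to be appended.

```shell
#!/bin/sh
# Print one PUMICE.nested_cv.R command per (chromosome, window setting, chunk).
# $1: number of chunks per job (passed to --total_file_num)
# remaining arguments: "method:type" pairs; the values are placeholders
nested_cv_commands() {
    total=$1
    shift
    chr=1
    while [ "$chr" -le 22 ]; do
        for pair in "$@"; do
            method=${pair%%:*}   # text before the first colon
            wtype=${pair#*:}     # text after the first colon
            file=1               # assumption: chunks are numbered from 1
            while [ "$file" -le "$total" ]; do
                printf 'Rscript PUMICE.nested_cv.R --chr %s --method %s --type %s --total_file_num %s --file_num %s\n' \
                    "$chr" "$method" "$wtype" "$total" "$file"
                file=$((file + 1))
            done
        done
        chr=$((chr + 1))
    done
}
```

Calling `nested_cv_commands 2 METHOD_A:TYPE_A METHOD_B:TYPE_B` prints 88 commands (22 chromosomes, 2 window settings, 2 chunks each).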

Second, we need to run cross-validation to create gene expression prediction models using the window type and penalty factor derived from the first step. The PUMICE.compute_weights.R script can be found here.

   Rscript PUMICE.compute_weights.R
      --geno [Path to genotype data]
      --chr [Chromosome number]
      --exp [Path to expression data]
      --out [Path to output directory]
      --pchic_path [Path to pchic window file]
      --loop_path [Path to loop window file]
      --tad_path [Path to tad window file]
      --domain_path [Path to domain window file]
      --bedtool_path [Path to bedtools software]
      --epi_path [Path to epigenomic data]
      --fold [Number of folds to be performed for cross-validation]
      --noclean [Do not delete any temporary files]

For TWAS association testing, we can run PUMICE+. Of note, PUMICE+ first performs TWAS association analyses [Gusev et al, 2016] for PUMICE and UTMOST separately. It then combines the PUMICE and UTMOST results with a Cauchy combination test; the combined results are the PUMICE+ results. It is important to make sure that the effect allele in the GWAS summary statistics and Allele 1 in the PLINK reference panel are the same as the effect allele in the db files. The uploaded prediction models use the reference allele as the effect allele, and the list of variants used to train the models can be found here. Variant ID is formatted as chr_pos_ref_alt_b37.
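
The allele requirement above can be checked while preparing the GWAS summary statistics. The following is a minimal sketch with hypothetical helper names (`variant_id`, `align_effect`); it is not how PUMICE+ itself handles alleles. It builds an ID in the chr_pos_ref_alt_b37 layout and reports whether a GWAS row already uses the reference allele as the effect allele, needs its effect sign flipped, or does not match the variant. Strand flips are not handled.

```shell
#!/bin/sh
# Build a variant ID in the chr_pos_ref_alt_b37 layout.
# $1: chromosome  $2: position  $3: ref allele  $4: alt allele
variant_id() {
    printf '%s_%s_%s_%s_b37\n' "$1" "$2" "$3" "$4"
}

# Compare the alleles of a GWAS row with the ref/alt alleles of a variant.
# Prints "keep" when the GWAS effect allele is already the reference allele,
# "flip" when the two alleles are swapped (negate the GWAS effect or Z score),
# and "drop" when the allele pair does not match the variant.
# $1: GWAS effect allele  $2: GWAS other allele  $3: ref  $4: alt
align_effect() {
    if [ "$1" = "$3" ] && [ "$2" = "$4" ]; then
        echo keep
    elif [ "$1" = "$4" ] && [ "$2" = "$3" ]; then
        echo flip
    else
        echo drop
    fi
}
```

For example, `variant_id 1 12345 A G` prints `1_12345_A_G_b37`, and `align_effect G A A G` prints `flip`.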

PUMICE+.association_test.R script can be found here.

   Rscript PUMICE+.association_test.R
      --geno [Path to genotype data in PLINK format]
      --chr [Chromosome number]
      --gwas [Path to GWAS summary statistic]
      --out [Path to output directory]
      --pumice_weight [Path to PUMICE db file]
      --utmost_weight [Path to UTMOST db file]
      --out [Path to output file directory and name]

Output: "twas.z.u" and "pval.u" refer to TWAS Z score and associated P value for UTMOST. "twas.z.p" and "pval.p" refer to TWAS Z score and associated P value for PUMICE. "twas.z.cauchy" and "pval.cauchy" refer to TWAS Z score and associated P value for PUMICE+.
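
To illustrate how a combined P value such as "pval.cauchy" relates to "pval.p" and "pval.u", the sketch below implements the standard Cauchy combination test: each P value is mapped to tan((0.5 - p) * pi), the transformed values are averaged, and the average is converted back with the Cauchy distribution's upper tail. Equal weights are an assumption here, and the function name `cauchy_combine` is hypothetical; the weights and numerical safeguards used by PUMICE+.association_test.R may differ.

```shell
#!/bin/sh
# Combine P values with the equal-weight Cauchy combination test.
# Usage: cauchy_combine P1 P2 [...]; each P value must lie strictly between 0 and 1.
cauchy_combine() {
    printf '%s\n' "$@" | awk '
        BEGIN { pi = atan2(0, -1) }
        {
            x = (0.5 - $1) * pi
            sum += sin(x) / cos(x)   # tan((0.5 - p) * pi)
            n++
        }
        END {
            t = sum / n                              # equal weights 1/n
            printf "%.6g\n", 0.5 - atan2(t, 1) / pi  # upper tail of the standard Cauchy
        }'
}
```

For example, `cauchy_combine 0.05 0.05` prints `0.05`: combining two identical P values returns the same value.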

Usage

We provided example input data here.

Data were subset from the 1000 Genomes Project Phase 3 and GEUVADIS datasets to serve as an example for running the scripts.

An example shell script for running step 1 can be found here.

Outputs from step 1 using the example input data are provided here.

An example shell script for running step 2 can be found here.

Outputs from step 2 using the example input data are provided here.

An example shell script for running PUMICE+ can be found here.

Output from PUMICE+ using the example input data is provided here.

Precomputed PUMICE and UTMOST models trained in 48 tissues from GTEx V7 (hg19) can be found in the "models_GTEx_v7" folder.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Chachrit (Poom) Khunsriraksakul - @ChachritK - [email protected]

Acknowledgements

Dajiang J. Liu
