PUMICE

PUMICE (Prediction Using Models Informed by Chromatin conformations and Epigenomics) is a tool for building gene expression prediction models for transcriptome-wide association studies (TWAS). Specifically, PUMICE leverages tissue-specific 3D genomic and epigenomic data to define regions that harbor cis-regulatory variants and to prioritize those variants accordingly.

Bugs

04/21/2023: Update PUMICE+ code and README.

01/05/2023: Fix issues in both PUMICE.nested_cv.R and PUMICE.compute_weights.R when the genotype input contains rare variants. Also fix an issue with processing the constant-window mapping file in PUMICE.compute_weights.R.

09/13/2022: In the precomputed models we have uploaded to GitHub so far, the "spearman_cor" field of "modelattribute" reports the square of Spearman's correlation rather than the correlation itself. We are in the process of fixing this.

09/18/2022: We have fixed this problem and uploaded the updated GTEx V7 precomputed models (the "models_GTEx_v7" folder).

Getting Started

PUMICE requires R 4.0, several R packages, and bedtools.

Prerequisites

PUMICE requires the following R packages: optparse, data.table, tidyr, tidyverse, dplyr, IRanges, GenomicRanges, genefilter, glmnet, caret, rareGWAMA, BEDMatrix, and RSQLite.
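Before running the scripts, it can help to confirm that the tools and packages above are available. The shell sketch below is one way to do that and is not part of PUMICE: `check_tools` is a hypothetical helper, and the package check simply calls base R's `requireNamespace()` on each package name listed above.

```shell
#!/usr/bin/env bash
# Sketch only: check_tools is a hypothetical helper, not part of PUMICE.

# Return non-zero (and report on stderr) if any named command is not on PATH.
check_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The R packages listed in this README, space-separated.
R_PKGS="optparse data.table tidyr tidyverse dplyr IRanges GenomicRanges genefilter glmnet caret rareGWAMA BEDMatrix RSQLite"

# Rscript runs the PUMICE scripts; bedtools is passed to them via --bedtools_path.
if check_tools Rscript bedtools; then
  # Ask R which packages cannot be loaded, and warn if R is older than 4.0.
  Rscript -e "p <- strsplit('$R_PKGS', ' ')[[1]]; m <- p[!vapply(p, requireNamespace, logical(1), quietly = TRUE)]; if (length(m) > 0) cat('missing R packages:', m, '\n') else cat('all R packages found\n'); if (getRversion() < '4.0.0') cat('PUMICE expects R 4.0 or later\n')"
fi
```

Missing tools are reported on stderr, so the check can also be dropped into the top of a job script.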

Tool overview

To run PUMICE, two steps are required.

  1. First, run nested cross-validation to determine the optimal window type and penalty factor for each gene (i.e., the combination with the lowest mean cross-validated error). Because this step is computationally intensive, we require users to run it in parallel, with a separate job for each of the 22 autosomes and each window type. Users can further split each job into multiple jobs using the --total_file_num and --file_num options. The PUMICE.nested_cv.R script can be found here.
   Rscript PUMICE.nested_cv.R
      --geno [Path to genotype data]
      --chr [Chromosome number]
      --exp [Path to expression data]
      --out [Path to output directory]
      --method [Window type to be used for creating models]
      --type [Specific 3D genome window type being used, or constant window size being used (in kb)]
      --window_path [Path to 3D genome window file]
      --bedtools_path [Path to bedtools software]
      --epi_path [Path to epigenomic data]
      --fold [Number of folds to be performed for nested cross-validation]
      --total_file_num [Total number of jobs to split this run into]
      --file_num [Job number]
      --noclean [Do not delete any temporary files]
  2. Second, run cross-validation to create the gene expression prediction models, using the window type and penalty factor selected for each gene in the first step. The PUMICE.compute_weights.R script can be found here.
   Rscript PUMICE.compute_weights.R
      --geno [Path to genotype data]
      --chr [Chromosome number]
      --exp [Path to expression data]
      --out [Path to output directory]
      --pchic_path [Path to pchic window file]
      --loop_path [Path to loop window file]
      --tad_path [Path to tad window file]
      --domain_path [Path to domain window file]
      --bedtool_path [Path to bedtools software]
      --epi_path [Path to epigenomic data]
      --fold [Number of folds to be performed for cross-validation]
      --noclean [Do not delete any temporary files]
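Because step 1 runs once per autosome, window type, and job chunk, the number of jobs grows quickly. The shell sketch below only prints one command line per job (it runs nothing itself), so the list can be handed to a cluster scheduler or to a tool such as GNU parallel. The flags come from the two usage synopses above, but every path, the fold count, and the --method/--type values in WINDOWS are hypothetical placeholders, since this README does not list the accepted values. It also assumes, as an illustration, that a constant window needs no --window_path.

```shell
#!/usr/bin/env bash
# Sketch only: every path and value below is a hypothetical placeholder.
GENO=/path/to/genotype
EXP=/path/to/expression
OUT=/path/to/output
BEDTOOLS=/path/to/bedtools
EPI=/path/to/epigenomic_data
FOLD=5              # hypothetical number of folds
TOTAL_FILE_NUM=4    # split each (chromosome, window type) job into 4 chunks

# One "METHOD:TYPE:WINDOW_PATH" entry per window type (hypothetical values).
# WINDOW_PATH is left empty for the constant window (assumed to need no file).
WINDOWS=("METHOD_A:pchic:/path/to/pchic_windows" "METHOD_B:250:")

# Step 1: print one PUMICE.nested_cv.R command per chromosome x window x chunk.
make_step1_jobs() {
  local chr entry method type wpath n win_opt
  for chr in $(seq 1 22); do
    for entry in "${WINDOWS[@]}"; do
      IFS=: read -r method type wpath <<<"$entry"
      win_opt=""
      if [ -n "$wpath" ]; then win_opt=" --window_path $wpath"; fi
      for n in $(seq 1 "$TOTAL_FILE_NUM"); do
        echo "Rscript PUMICE.nested_cv.R --geno $GENO --chr $chr --exp $EXP" \
          "--out $OUT --method $method --type $type$win_opt" \
          "--bedtools_path $BEDTOOLS --epi_path $EPI --fold $FOLD" \
          "--total_file_num $TOTAL_FILE_NUM --file_num $n"
      done
    done
  done
}

# Step 2: print one PUMICE.compute_weights.R command per chromosome.
make_step2_jobs() {
  local chr
  for chr in $(seq 1 22); do
    echo "Rscript PUMICE.compute_weights.R --geno $GENO --chr $chr --exp $EXP" \
      "--out $OUT --pchic_path /path/to/pchic --loop_path /path/to/loop" \
      "--tad_path /path/to/tad --domain_path /path/to/domain" \
      "--bedtool_path $BEDTOOLS --epi_path $EPI --fold $FOLD"
  done
}
```

For example, `make_step1_jobs > step1_jobs.txt` writes 22 × 2 × 4 = 176 command lines, which GNU parallel can run with `parallel --jobs 8 < step1_jobs.txt`. Since step 2 uses the window type and penalty factor chosen in step 1, its jobs should start only after the corresponding step 1 jobs have finished.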

For TWAS association testing, run PUMICE+. PUMICE+ first performs TWAS association analyses [Gusev et al., 2016] for PUMICE and UTMOST separately, and then combines the PUMICE and UTMOST results with the Cauchy combination test; the combined results are the PUMICE+ results. Make sure that the effect allele in the GWAS summary statistics and Allele 1 in the PLINK reference panel are the same as the effect allele in the db files. The uploaded prediction models use the reference allele as the effect allele, and the list of variants used to train the models can be found here. Variant IDs are formatted as chr_pos_ref_alt_b37.
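Because the uploaded models use the reference allele as the effect allele, one quick sanity check is to parse the ref and alt alleles out of each chr_pos_ref_alt_b37 ID and compare them with the GWAS effect allele. The sketch below assumes a hypothetical whitespace-delimited GWAS file with a header row, the variant ID in column 1, and the effect allele in column 2; `check_effect_allele` is an illustrative helper, not part of PUMICE+, and a real summary-statistics file will likely need different column indices.

```shell
#!/usr/bin/env bash
# Sketch only: check_effect_allele is a hypothetical helper, not part of PUMICE+.
# Input (assumed layout): whitespace-delimited file with a header row,
#   column 1 = variant ID formatted chr_pos_ref_alt_b37,
#   column 2 = GWAS effect allele.
# Output: one line per variant that is not already aligned to the models,
#   whose effect allele is the reference allele.
check_effect_allele() {
  awk 'NR > 1 {
    n = split($1, f, "_")          # f[1]=chr f[2]=pos f[3]=ref f[4]=alt f[5]=b37
    if (n != 5) { print $1 "\tmalformed_id"; next }
    if ($2 == f[3]) next           # effect allele is ref: already aligned
    if ($2 == f[4]) print $1 "\tflip"       # effect allele is alt: negate Z to align
    else            print $1 "\tmismatch"   # neither allele matches (e.g. strand issue)
  }' "$1"
}
```

For example, `check_effect_allele gwas_sumstats.txt` (a hypothetical file name) lists the variants whose Z scores would need their sign flipped, or that would need to be checked by hand, before running PUMICE+.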

The PUMICE+.association_test.R script can be found here.

   Rscript PUMICE+.association_test.R
      --geno [Path to genotype data in PLINK format]
      --chr [Chromosome number]
      --gwas [Path to GWAS summary statistics]
      --pumice_weight [Path to PUMICE db file]
      --utmost_weight [Path to UTMOST db file]
      --out [Path to output file directory and name]

Output:

  • "twas.z.u" and "pval.u": TWAS Z score and associated P value for UTMOST.
  • "twas.z.p" and "pval.p": TWAS Z score and associated P value for PUMICE.
  • "twas.z.cauchy" and "pval.cauchy": TWAS Z score and associated P value for PUMICE+.
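The Cauchy combination step can be illustrated with a small sketch. In the standard Cauchy combination test, p-values p_1, ..., p_k are turned into the statistic T = sum_i w_i * tan((0.5 - p_i) * pi), with non-negative weights summing to 1, and the combined p-value is 0.5 - arctan(T) / pi. The awk-based shell function below assumes equal weights (1/2 each for PUMICE and UTMOST); the weights actually used in PUMICE+.association_test.R are not stated in this README, `cauchy_p` is a hypothetical helper, and it computes only the combined P value, not "twas.z.cauchy".

```shell
#!/usr/bin/env bash
# Sketch only: cauchy_p is a hypothetical helper, not part of PUMICE+.
# Combine p-values (each strictly between 0 and 1) with an equal-weight
# Cauchy combination test:
#   T = sum_i w_i * tan((0.5 - p_i) * pi),  w_i = 1/k
#   p = 0.5 - atan(T) / pi
cauchy_p() {
  awk -v ps="$*" 'BEGIN {
    pi = atan2(0, -1)
    k = split(ps, p, " ")
    t = 0
    for (i = 1; i <= k; i++) {
      x = p[i] + 0
      if (x < 1e-15)
        t += (1 / k) / (x * pi)   # tan((0.5 - x) * pi) is about 1 / (x * pi) for tiny x
      else
        t += (1 / k) * sin((0.5 - x) * pi) / cos((0.5 - x) * pi)
    }
    # atan2(1, t) / pi equals 0.5 - atan(t) / pi but keeps precision for large t
    printf "%.6g\n", atan2(1, t) / pi
  }'
}

# Example: combine a PUMICE p-value and a UTMOST p-value for one gene.
cauchy_p 0.01 0.5
```

A useful property of this combination is that equal inputs return the same p-value (for example, `cauchy_p 0.05 0.05` prints 0.05), while one small p-value dominates a large one, as in the example above.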

Usage

We provide example input data here.

These data were subset from the 1000 Genomes Project Phase 3 and GEUVADIS datasets to serve as an example for running the scripts.

An example shell script for running step 1 can be found here.

  • Outputs from step 1 on the example input data are provided here.

An example shell script for running step 2 can be found here.

  • Outputs from step 2 on the example input data are provided here.

An example shell script for running PUMICE+ can be found here.

  • Output from PUMICE+ on the example input data is provided here.

Precomputed PUMICE and UTMOST models trained in 48 tissues from GTEx V7 (hg19) can be found in the "models_GTEx_v7" folder.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Chachrit (Poom) Khunsriraksakul - @ChachritK - [email protected]

Acknowledgements
