GIGSEA

Genotype Imputed Gene Set Enrichment Analysis using GWAS Summary Level Data

Description

Various methods of gene set analysis for trait-associated SNPs have been proposed; however, many challenges and limitations remain:

  1. Gene boundaries: different criteria have been proposed to assign a SNP to a gene but no consensus was reached;
  2. Long-range regulation: assigning a causal link to the gene nearest the associated variant falls short of elucidating long-range functional connection;
  3. Gene size: longer genes are more likely to have significant P-values, possibly inflating the association test for gene sets that have many long genes;
  4. Multiple-marker regulation: there is no established best strategy for choosing the number of SNPs per gene or for aggregating SNPs with different effect sizes;
  5. Linkage disequilibrium (LD): local LD may reduce the power to detect associations that depend on multiple markers;
  6. Redundancy among gene sets: a gene may function in multiple ways and thus appear multiple times in functional gene sets. In spite of reflecting the crosstalk between gene sets, the overlap in gene sets may make the results of gene set enrichment analysis more difficult to interpret;
  7. Permutation efficiency: the computational burden of permutation can be substantial;
  8. Threshold selection: a threshold-dependent procedure may make the results unstable.

Here, we present GIGSEA (Genotype Imputed Gene Set Enrichment Analysis), a novel method that uses GWAS summary statistics and eQTLs to infer differential gene expression and interrogate gene set enrichment for trait-associated SNPs. By incorporating empirical eQTLs from disease-relevant tissue, GIGSEA naturally accounts for factors such as gene size, gene boundaries, distal SNP regulation, and multiple-marker regulation. A weighted linear regression model is used to perform the enrichment test, adjusting for imputation accuracy, model incompleteness, and redundancy among gene sets. The significance of enrichment is assessed by permutation, where matrix operations are employed to substantially improve computation speed. We have shown that GIGSEA has an appropriate type I error rate, and have demonstrated its computational efficiency on real data, where it yielded biologically plausible findings.
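
The idea of a weighted regression enrichment test with a permutation p-value can be illustrated with a minimal sketch. This is not GIGSEA's actual code or API: the function names, the toy data, the one-sided test, and the choice to shuffle gene set membership labels are all assumptions made for illustration. It uses plain Python loops rather than the matrix operations mentioned above, and it handles a single gene set without any adjustment for redundancy among gene sets.

```python
import random


def weighted_slope(x, y, w):
    """Slope b of the weighted least-squares fit y ~ a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx  # assumes x is not constant


def permutation_pvalue(membership, expr, weights, n_perm=1000, seed=0):
    """One-sided empirical p-value for a positive enrichment slope.

    Assumption: the null distribution is built by shuffling the gene set
    membership labels across genes.
    """
    rng = random.Random(seed)
    observed = weighted_slope(membership, expr, weights)
    labels = list(membership)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if weighted_slope(labels, expr, weights) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data for eight genes (not real results).
membership = [1, 1, 1, 0, 0, 0, 0, 0]                 # 1 = gene is in the set
expr = [2.1, 1.8, 2.5, -0.2, 0.1, 0.3, -0.4, 0.0]     # imputed expression z-scores
weights = [0.5, 0.4, 0.6, 0.3, 0.5, 0.2, 0.4, 0.3]    # e.g. prediction accuracy

slope, p_value = permutation_pvalue(membership, expr, weights)
print(f"slope={slope:.3f}, empirical p={p_value:.4f}")
```

With a binary membership indicator, the weighted slope equals the difference between the weighted mean expression of genes inside and outside the set, which is why a large positive slope indicates enrichment.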

Dependencies on R packages

Installation:

  1. Install the GIGSEA R package from GitHub using the devtools package in R
> install.packages("devtools")
> library(devtools)
> install_github("zhushijia/GIGSEA")
  2. Install the MetaXcan package in Python

Example:

See Tutorial

Citation:

S Zhu, T Qian, Y Hoshida, Y Shen, J Yu, and K Hao. GIGSEA: genotype imputed gene set enrichment analysis using GWAS summary level data. Bioinformatics 35 (1), 160-163.

gigsea's Issues

Package failing on Bioconductor

Hello Shijia,

The next Bioconductor release 3.15 is scheduled for April 27th. Commits to the
current 3.14 release should be made by April 8th and the last day to commit to
the devel branch is April 22nd. We like to make an effort to have all
Bioconductor packages building and checking without ERROR or TIMEOUT in both
release and devel versions. The email listed in the DESCRIPTION file for this package
was undeliverable which is why I'm opening an issue on GitHub.

Currently your package is producing an ERROR on the Windows builder in the
release version of Bioconductor. Could you please investigate the error as soon
as possible?

Release version - https://bioconductor.org/checkResults/3.14/bioc-LATEST/GIGSEA/

If you need any assistance please feel free to ask questions at
[email protected].

Thank you,
Kayla

Error in permutationSimpleLm

When running the permutationSimpleLm function, I sometimes run into the following error:

[1] TRUE
0%.....10%.....20%.....30%.....40%.....50%.....60%.....70%.....80%.....90%.....100%.Error in shuffledPval[i, ] < observedPval[i] :
comparison of these types is not implemented
Calls: permutationSimpleLm -> mean
Execution halted

It seems that this happens because, when the shuffled p-values are regressed against the net dataset, some results are empty and are returned as 'numeric(0)'. The do.call function seems to convert these to "Numeric,0", which is then compared with observedPval, and the function exits because the two types are incompatible.

Can you fix the issue so that the 'Numeric,0' is changed to a null value compatible with the observedPval and does not cause the function to exit?

MetaXcan output does not have pred_perf_r2

Hi,

I want to use GIGSEA to perform gene set enrichment analysis based on MetaXcan results. I read that you used the prediction R^2 to build the weighted linear regression model. However, I ran MetaXcan using the MASHR model, whose output does not have pred_perf_r2 values. Is it fine to use only the fraction of imputation-used SNPs as weights? Also, is it necessary to adjust the empirical p-values when interpreting results?

Thanks,
Quynh
