gigsea's Introduction

GIGSEA

Genotype Imputed Gene Set Enrichment Analysis using GWAS Summary Level Data

Description

Various methods of gene set analysis for trait-associated SNPs have been proposed; however, many challenges and limitations remain:

Gene boundaries: different criteria have been proposed to assign a SNP to a gene, but no consensus has been reached;
Long-range regulation: assigning a causal link to the gene nearest the associated variant fails to capture long-range functional connections;
Gene size: longer genes are more likely to have significant P-values, possibly inflating the association test for gene sets that contain many long genes;
Multiple-marker regulation: there is no established strategy for choosing the number of SNPs per gene or for aggregating SNPs with different effect sizes;
Linkage disequilibrium (LD): local LD may reduce the power to detect associations that depend on multiple markers;
Redundancy among gene sets: a gene may have multiple functions and thus appear in multiple functional gene sets. Although this overlap reflects crosstalk between gene sets, it can make the results of gene set enrichment analysis more difficult to interpret;
Permutation efficiency: the computational burden of permutation can be substantial;
Threshold selection: threshold-dependent procedures can produce unstable results.

Here, we present GIGSEA (Genotype Imputed Gene Set Enrichment Analysis), a novel method that uses GWAS summary statistics and eQTL to infer differential gene expression and interrogate gene set enrichment for the trait-associated SNPs. By incorporating empirical eQTL of disease-relevant tissue, GIGSEA naturally accounts for factors such as gene size, gene boundary, distal SNP regulation, and multiple-marker regulation. A weighted linear regression model is used to perform the enrichment test, properly adjusting for imputation accuracy, model incompleteness, and redundancy among gene sets. The significance of enrichment is assessed by permutation, using matrix operations to dramatically improve computational speed and efficiency. We have shown that GIGSEA has appropriate type I error, demonstrated its high computational efficiency on a real data set, and used it to discover plausible biological findings.
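The weighted regression test and matrix-based permutation described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not GIGSEA's implementation (the package's own functions, such as permutationSimpleLm, may differ in detail): the helper names weightedSlopeT and permutationEnrichment are hypothetical, the gene-level statistics stand in for imputed differential-expression z-scores, the weights stand in for imputation accuracy, and significance is assessed by shuffling gene-set membership.

```r
# Hypothetical sketch: weighted linear regression enrichment test for one
# gene set, with all permutations evaluated as matrix operations (base R only).
weightedSlopeT <- function(y, X, w) {
  # y: gene-level statistics (length n)
  # X: n x B matrix of 0/1 gene-set indicators (one column per labelling)
  # w: per-gene regression weights (length n)
  X <- as.matrix(X)
  n <- length(y)
  sw <- sum(w)
  yc <- y - sum(w * y) / sw                 # weighted centring of y
  Xc <- sweep(X, 2, colSums(w * X) / sw)    # weighted centring of each column
  Sxx <- colSums(w * Xc^2)
  Sxy <- colSums(w * Xc * yc)
  Syy <- sum(w * yc^2)
  beta <- Sxy / Sxx                         # weighted least-squares slope
  rss <- Syy - Sxy^2 / Sxx                  # weighted residual sum of squares
  beta / sqrt(rss / (n - 2) / Sxx)          # slope t-statistic for each column
}

permutationEnrichment <- function(y, inSet, w, nPerm = 1000) {
  tObs <- weightedSlopeT(y, inSet, w)
  # Each column is one random relabelling of which genes belong to the set
  shuffled <- replicate(nPerm, sample(inSet))
  tPerm <- weightedSlopeT(y, shuffled, w)
  # Two-sided empirical p-value with the usual +1 correction
  pEmp <- (1 + sum(abs(tPerm) >= abs(tObs))) / (nPerm + 1)
  list(t = unname(tObs), empiricalP = pEmp)
}

# Simulated example: 20 of 200 genes are in the set and shifted upwards
set.seed(1)
n <- 200
inSet <- rep(c(1, 0), c(20, 180))
w <- runif(n, 0.2, 1)                       # stand-in for imputation accuracy
y <- rnorm(n) + 1.5 * inSet
res <- permutationEnrichment(y, inSet, w, nPerm = 2000)
```

The observed t-statistic equals the one reported by `summary(lm(y ~ inSet, weights = w))`. Stacking every shuffled labelling as a column of one matrix lets a few `colSums()` calls produce all permuted statistics at once, instead of refitting `lm()` once per permutation.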

Dependencies on R packages

Matrix
locfdr
GIGSEAdata

Installation:

Install the GIGSEA R package from GitHub using devtools in R:

install.packages("devtools")
library(devtools)
install_github("zhushijia/GIGSEA")

Install the MetaXcan package in Python (GIGSEA takes MetaXcan results as input).

Example:

See Tutorial

Citation:

S Zhu, T Qian, Y Hoshida, Y Shen, J Yu, and K Hao. GIGSEA: genotype imputed gene set enrichment analysis using GWAS summary level data. Bioinformatics 35 (1), 160-163.

gigsea's Issues

Package failing on Bioconductor

Hello Shijia,

The next Bioconductor release 3.15 is scheduled for April 27th. Commits to the
current 3.14 release should be made by April 8th and the last day to commit to
the devel branch is April 22nd. We like to make an effort to have all
Bioconductor packages building and checking without ERROR or TIMEOUT in both
release and devel versions. The email listed in the DESCRIPTION file for this package
was undeliverable, which is why I'm opening an issue on GitHub.

Currently your package is producing an ERROR on the Windows builder in the
release version of Bioconductor. Could you please investigate the error as soon
as possible?

Release version - https://bioconductor.org/checkResults/3.14/bioc-LATEST/GIGSEA/

If you need any assistance please feel free to ask questions at
[email protected].

Thank you,
Kayla

Error in permutationSimpleLm

When running the permutationSimpleLm function, I sometimes run into the following error:

[1] TRUE
0%.....10%.....20%.....30%.....40%.....50%.....60%.....70%.....80%.....90%.....100%.Error in shuffledPval[i, ] < observedPval[i] :
comparison of these types is not implemented
Calls: permutationSimpleLm -> mean
Execution halted

It seems this happens because, when the shuffled p-values are regressed against the net dataset, some results are empty and are returned as 'numeric(0)'. The do.call function seems to convert these to "Numeric,0", which are then compared with observedPval, and the function exits because the two types are incompatible.

Can you fix the issue so that the 'Numeric,0' is changed to a null value compatible with the observedPval and does not cause the function to exit?
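One way to avoid the list-matrix comparison described in this issue is to force every permutation's output into a fixed-length numeric vector before building the matrix. The sketch below is hypothetical and is not the maintainers' fix: collectShuffledPvals and empiricalPvals are illustrative names, and it assumes each permutation returns either one p-value per gene set or numeric(0).

```r
# Hypothetical helpers. Each element of `results` stands for the p-values
# (one per gene set) returned by a single permutation, possibly numeric(0).
collectShuffledPvals <- function(results, nSets) {
  # vapply() always yields numeric output (or fails loudly if a result has an
  # unexpected length), so no list-matrix can reach the comparison below.
  out <- vapply(results, function(p) {
    if (length(p) == 0) rep(NA_real_, nSets) else as.numeric(p)
  }, FUN.VALUE = numeric(nSets))
  # Keep gene sets in rows and permutations in columns, even when nSets == 1
  matrix(out, nrow = nSets)
}

empiricalPvals <- function(shuffledPval, observedPval) {
  # Same comparison as in the error message, but skipping failed permutations
  vapply(seq_along(observedPval), function(i) {
    mean(shuffledPval[i, ] < observedPval[i], na.rm = TRUE)
  }, FUN.VALUE = numeric(1))
}

# Toy input: the second permutation produced no p-values
results <- list(c(0.2, 0.01), numeric(0), c(0.5, 0.03))
shuffled <- collectShuffledPvals(results, nSets = 2)
observed <- c(0.3, 0.02)
empiricalPvals(shuffled, observed)   # returns c(0.5, 0.5)
```

Marking failed permutations as NA and dropping them with na.rm = TRUE shrinks the effective number of permutations for the affected gene sets, so it is worth reporting how many permutations failed.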

MetaXcan output does not have pred_perf_r2

Hi,

I want to use GIGSEA to perform Gene Set Enrichment Analysis based on MetaXcan results. I read that you used the prediction R^2 to build the weighted linear regression model. However, I ran MetaXcan using the MASHR model, which does not provide pred_perf_r2 values. Is it fine to use only the fraction of SNPs used in imputation as the weights? Also, is it necessary to adjust the empirical p-values when interpreting the results?

Thanks,
Quynh
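The weighting question above can be illustrated with a short sketch. It is not a maintainer answer: metaxcanWeights is a hypothetical helper, the column names follow MetaXcan's results table, the toy values are made up, and defining the used fraction as n_snps_used / n_snps_in_model is an assumption. When pred_perf_r2 is missing, as with MASHR-based models, the helper falls back to the used fraction alone; the last line shows Benjamini-Hochberg adjustment with base R's p.adjust(), one common way to correct empirical p-values across many gene sets.

```r
# Hypothetical helper; column names follow MetaXcan's results table.
metaxcanWeights <- function(mx) {
  # Assumption: "fraction of imputation-used SNPs" = n_snps_used / n_snps_in_model
  usedFrac <- mx$n_snps_used / mx$n_snps_in_model
  r2 <- mx$pred_perf_r2                 # NULL if the column is absent
  if (is.null(r2) || all(is.na(r2))) {
    usedFrac                            # e.g. MASHR models without prediction R^2
  } else {
    usedFrac * r2                       # also down-weight poorly predicted genes
  }
}

# Toy MetaXcan-like table (made-up values, no pred_perf_r2 column)
mx <- data.frame(gene_name = c("G1", "G2", "G3"),
                 zscore = c(2.1, -0.4, 1.3),
                 n_snps_used = c(8, 5, 10),
                 n_snps_in_model = c(10, 10, 10))
w <- metaxcanWeights(mx)                # 0.8 0.5 1.0

# Benjamini-Hochberg adjustment of illustrative empirical p-values
p.adjust(c(0.001, 0.02, 0.04, 0.5), method = "BH")
```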
