rajlabmssm / qtl-mapping-pipeline

A Snakemake pipeline for QTL mapping, based on the GTEx pipeline.

Python 70.82% R 26.25% Shell 2.93%

eqtl sqtl pipeline snakemake

qtl-mapping-pipeline's Issues

put back qvalue calculation after permutation step

Users currently have to append the qval column manually.

An R script, interaction_qvalue.R, exists for this, but it needs fixing: at present it only handles interaction QTL results. Make this work @bzmull!
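
For reference, a minimal sketch of what appending the column involves, written in Python rather than as a fix to the R script. It uses the Benjamini-Hochberg adjustment as a stand-in for Storey's q-value (the two coincide when pi0 is fixed at 1, so this is the conservative case). The names `bh_adjust` and `append_qval` are hypothetical, and treating the last column as the p-value to adjust is an assumption about the permutation output layout.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the result monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def append_qval(lines, pval_column=-1, sep=" "):
    """Append an adjusted p-value to each delimited result line.

    Assumption: `pval_column` (default: last column) holds the p-value
    that should be adjusted across all phenotypes.
    """
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip()]
    qvals = bh_adjust([float(row[pval_column]) for row in rows])
    return [sep.join(row + [f"{q:.6g}"]) for row, q in zip(rows, qvals)]
```

The adjustment has to be run once over the merged results of all chunks, not per chunk, because it depends on the total number of tests.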

add TensorQTL and interaction QTLs

chunking error - testing same genes twice

My permutation results contain each gene twice:

$ ls *permutations.txt | xargs grep "^ENSG00000015592.16"

LumbarSpinalCord_expression_peer30_chunk152.permutations.txt:ENSG00000015592.16 chr8 27258420 27258420 - 6436 12913 . chr8 27245507 27245507 195 173.024 1.05009 656.504 2.17524e-47 -1.06769 9.999e-05 2.29836e-41
LumbarSpinalCord_expression_peer30_chunk609.permutations.txt:ENSG00000015592.16 chr8 27258420 27258420 - 6436 12913 . chr8 27245507 27245507 195 175.538 1.0437 715.465 2.17524e-47 -1.06769 9.999e-05 1.08018e-41

What's going on? I'm systematically over-reporting the number of significant eGenes found by the pipeline by a factor of 2.

Hopefully switching over to TensorQTL will remove the need for chunking and all the headaches that come with it.
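
To see how widespread the duplication is, something like the following could replace the one-gene grep. This is only a sketch: it assumes the phenotype ID is the first whitespace-delimited field of every line, as in the output shown above, and `find_duplicated_phenotypes` is a hypothetical helper, not part of the pipeline.

```python
import glob
from collections import defaultdict


def find_duplicated_phenotypes(paths):
    """Map each phenotype ID that appears on more than one line to its files."""
    seen = defaultdict(list)
    for path in paths:
        with open(path) as handle:
            for line in handle:
                fields = line.split()
                if fields:
                    # First field is assumed to be the phenotype (gene) ID.
                    seen[fields[0]].append(path)
    return {pid: files for pid, files in seen.items() if len(files) > 1}


if __name__ == "__main__":
    chunk_files = sorted(glob.glob("*permutations.txt"))
    duplicates = find_duplicated_phenotypes(chunk_files)
    for pid, files in sorted(duplicates.items()):
        print(pid, *files, sep="\t")
```

If every gene shows up in exactly two chunks, the chunk boundaries most likely overlap. If only some do, the problem is more likely in how the chunk files are collected.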

add group ID and strand columns to expression.bed.gz and splicing.bed.gz

The GTEx pipeline produces BED files formatted for FastQTL, not QTLtools. The only difference is that QTLtools BED files have 6 columns before the per-sample columns:

Chromosome ID [string]
Start genomic position of the phenotype (here the TSS of gene1) [integer, 0-based]
End genomic position of the phenotype (here the TSS of gene1) [integer, 1-based]
Phenotype ID (here the exon IDs) [string].
Phenotype group ID (here the gene IDs, multiple exons belong to the same gene) [string]
Strand orientation [+/-]

These final two columns must be added in eqtl_prepare_expression.py and sqtl_prepare_expression.py.
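
A rough sketch of the row conversion, assuming the existing files are tab-delimited with four leading columns followed by per-sample values. `add_group_and_strand`, the `gene_strand` lookup and the `group_of` callback are all hypothetical names, and the header labels `gid` and `strand` are an assumption about what QTLtools expects.

```python
def add_group_and_strand(bed_line, gene_strand, group_of=lambda pid: pid):
    """Turn a 4-column BED row into a 6-column one.

    bed_line    -- tab-delimited row: chrom, start, end, phenotype ID, samples...
    gene_strand -- dict mapping group (gene) ID to "+" or "-"
    group_of    -- maps a phenotype ID to its group ID; the identity default
                   suits expression data, where phenotype and gene coincide
    """
    fields = bed_line.rstrip("\n").split("\t")
    if fields[0].startswith("#"):
        # Header row: insert labels for the two new columns.
        return "\t".join(fields[:4] + ["gid", "strand"] + fields[4:])
    gid = group_of(fields[3])
    # A missing gene raises KeyError rather than silently guessing a strand.
    strand = gene_strand[gid]
    return "\t".join(fields[:4] + [gid, strand] + fields[4:])
```

For splicing data, `group_of` would need to extract the gene from the phenotype ID, and the strand lookup would come from the same annotation used to build the BED file.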

peer R package

Currently requires users to have their own version installed on Minerva, from https://github.com/PMBio/peer

error while loading shared libraries: libgsl.so.0

Does this happen randomly or only on certain nodes? Add a step to the QTLtools rules that records the name of the node in the log.
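
One way the node name could be captured, sketched in Python. `node_stamp` and `log_node` are hypothetical helpers, and how they would be called from the QTLtools rules is left open; in a shell-based rule, running `hostname` before the QTLtools command would achieve the same thing.

```python
import socket
from datetime import datetime


def node_stamp():
    """Return a one-line record of the host this process is running on."""
    timestamp = datetime.now().isoformat(timespec="seconds")
    return f"[{timestamp}] host={socket.gethostname()}"


def log_node(log_path):
    """Append the host record to a log file without truncating it."""
    with open(log_path, "a") as log:
        log.write(node_stamp() + "\n")
```

With the host recorded per job, failed jobs can be tabulated by node to see whether the libgsl.so.0 error clusters on particular machines.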

Run with zero PEER factors

What happens? Do I have to add some kind of exception?

group sQTLs by cluster OR gene

Complex genes will have multiple splicing clusters within them. In theory each cluster could have independent cis genetic regulation.

Therefore add an option in sQTL mapping to either group QTLs by gene (current approach) or by leafcutter cluster.

This may increase the number of sQTLs found.
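
A sketch of what the option could look like at the point where the group ID is chosen. It assumes phenotype IDs of the form `chrom:start:end:cluster:gene`, which may not match what the pipeline actually writes; `group_id` and the `group_by` values are hypothetical.

```python
def group_id(phenotype_id, group_by="gene"):
    """Pick the grouping key for one intron phenotype.

    Assumes IDs shaped like 'chr1:14829:14970:clu_1:ENSG00000227232.5'.
    """
    chrom, start, end, cluster, gene = phenotype_id.split(":")
    if group_by == "gene":
        return gene
    if group_by == "cluster":
        return cluster
    raise ValueError(f"group_by must be 'gene' or 'cluster', got {group_by!r}")
```

Grouping by cluster produces more, smaller groups per gene, so the number of groups tested (and the multiple-testing burden) changes along with the number of sQTLs reported.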

convert TensorQTL per-chrom Parquet files into giant tabixed files

Maybe brute force in R is best here: read each file in, concatenate them, write out, then bgzip and tabix?

map_junctions_to_genes

does not take strand into account - this is obvious.
should be run downstream of junction filtering to avoid mis-classifying a cluster
if multiple genes overlap junctions in a cluster, which gene is picked as the name?
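
To make the first and last points concrete, here is one possible strand-aware assignment, sketched under stated assumptions. It is not what map_junctions_to_genes currently does. Coordinates are treated as closed intervals, and the rule for choosing among several genes (most overlapping junctions, ties broken alphabetically) is just one candidate answer to the open question. Both function names are hypothetical.

```python
from collections import Counter


def overlapping_genes(junction, genes):
    """Return IDs of genes on the same chromosome and strand that overlap.

    junction -- (chrom, start, end, strand)
    genes    -- iterable of (gene_id, chrom, start, end, strand)
    """
    chrom, start, end, strand = junction
    return [
        gene_id
        for gene_id, g_chrom, g_start, g_end, g_strand in genes
        if g_chrom == chrom and g_strand == strand
        and g_start <= end and start <= g_end
    ]


def name_cluster(junctions, genes):
    """Name a cluster after the gene overlapping the most of its junctions."""
    votes = Counter(g for j in junctions for g in overlapping_genes(j, genes))
    if not votes:
        return None
    best = max(votes.values())
    # Deterministic tie-break: alphabetical order of gene ID.
    return min(g for g, n in votes.items() if n == best)
```

Running this only on junctions that survive filtering, as suggested above, stops junctions that are later dropped from influencing the vote.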

add splicing QTL functions

Conditionally independent (secondary) QTLs

tensorQTL has this option:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --cis_results ${cis_results_file} \
    --mode cis_independent

Plug it in; it could be useful for COLOC analysis.
