Hi, thanks for developing and maintaining such an excellent tool. I’m currently ex

Memory Allocation Issue during trans-QTL Mapping (tensorqtl, open issue)

wangyf1125 commented on July 22, 2024

Memory Allocation Issue during trans-QTL Mapping

Comments (3)

gaow commented on July 22, 2024

Although in principle it is possible to split the genotype data by chromosome to ease memory usage, permutation testing could then not be done properly (and it is a tremendous task anyway when many variants are involved). Currently we stick to per-chromosome analysis without relying on multiple-testing results.

francois-a commented on July 22, 2024

It looks like you're generating dense output — are you changing the default pval_threshold? If you want that you should use --return_dense instead. For permutations/FDR, you don't need the dense output though. Can you specify what you're trying to do?
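To make the memory concern concrete, here is a rough back-of-the-envelope sketch of how much space the retained summary statistics could take. It is not part of tensorQTL: the function name, the assumed three 4-byte values per pair (for example p-value, effect size, standard error) and the example dimensions are all hypothetical, and the storage needed for variant and phenotype IDs in a sparse table is ignored. It assumes p-values are roughly uniform under the null, so the expected fraction of pairs kept is about pval_threshold.

```python
def estimate_trans_output_gib(n_variants, n_phenotypes, pval_threshold=1.0,
                              bytes_per_value=4, values_per_pair=3):
    """Rough size (GiB) of the trans-QTL summary statistics that are kept.

    Assumption: p-values are uniform under the null, so the expected fraction
    of variant-phenotype pairs with p <= pval_threshold is pval_threshold.
    """
    n_pairs = n_variants * n_phenotypes
    # Clamp the threshold to [0, 1] before using it as a fraction.
    kept_fraction = min(max(pval_threshold, 0.0), 1.0)
    return n_pairs * kept_fraction * values_per_pair * bytes_per_value / 2**30


# Hypothetical study size: 5 million variants x 20,000 phenotypes.
all_pairs_gib = estimate_trans_output_gib(5_000_000, 20_000, pval_threshold=1.0)
thresholded_gib = estimate_trans_output_gib(5_000_000, 20_000, pval_threshold=1e-5)
print(f"pval_threshold=1.0:  ~{all_pairs_gib:,.0f} GiB")
print(f"pval_threshold=1e-5: ~{thresholded_gib:.3f} GiB")
```

Under these assumptions, keeping every pair is five orders of magnitude larger than keeping only pairs below a small cutoff such as 1e-5, which is consistent with an allocation failure when the threshold is raised to 1.0.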

gaow commented on July 22, 2024

Thank you @francois-a. We set pval_threshold as high as 1.0 because we would like to keep some association results under the null, which are an input to a downstream trans-QTL integration step. The core tensorQTL calls in our trans analysis pipeline look like this, where pval_threshold = 1.0 by default in our pipeline:

    ## Trans analysis
    trans_df = trans.map_trans(genotype_df,
                               phenotype_df,
                               covariates_df,
                               batch_size=$[batch_size],
                               return_sparse=True,
                               return_r2=True,
                               pval_threshold=$[pval_threshold],
                               maf_threshold=$[maf_threshold])

    ## Filter out cis signals. If customized cis windows are used, the window is
    ## [start - win, end + win] with win = 0; otherwise it is [start - win, start + win].
    trans_df = trans.filter_cis(trans_df, phenotype_pos_df, variant_df, window=window)

    ## Permutation
    if $['True' if permutation else 'False']:
        perm_df = trans.map_permutations(genotype_df, covariates_df,
                                         batch_size=$[batch_size],
                                         maf_threshold=$[maf_threshold])
        perm_output = trans.apply_permutations(perm_df, trans_df)
        perm_output.to_csv("$[_output:nn].transqtl_permutation.gz", sep='\t', index=None,
                           compression={'method': 'gzip', 'compresslevel': 9})

That is why we put return_dense = True.

From your suggestion, I guess we can do two rounds of analysis:

1. Per-chromosome analysis, or breaking the genotypes into even smaller chunks than chromosomes, where we report all results (regardless of p-value) and skip permutation testing.
2. A run focused on permutation testing, where we set the p-value cutoff to a small number and use the sparse output to save marginal association results. We could even skip the trans analysis here and obtain the permutation results separately in a dedicated run.
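The chunking in the first round could be sketched roughly as follows. This is only an illustration under stated assumptions: chunk_variants_by_chrom is a hypothetical helper, not a tensorQTL function, and the variant IDs are made up. It covers only the "report everything, no permutations" round; as noted earlier in the thread, permutation testing would still need the dedicated second run.

```python
from collections import defaultdict


def chunk_variants_by_chrom(variant_chrom, max_chunk_size=None):
    """Group variant IDs by chromosome, optionally splitting large groups.

    variant_chrom: mapping of variant ID -> chromosome label (hypothetical
    input; in practice it might be built from a variant table).
    Returns a list of (chrom, [variant IDs]) tuples in first-seen order.
    """
    by_chrom = defaultdict(list)
    for variant_id, chrom in variant_chrom.items():
        by_chrom[chrom].append(variant_id)

    chunks = []
    for chrom, ids in by_chrom.items():
        # Without a size limit, each chromosome is a single chunk.
        step = max_chunk_size or len(ids)
        for start in range(0, len(ids), step):
            chunks.append((chrom, ids[start:start + step]))
    return chunks


# Toy example with made-up variant IDs.
toy = {"chr1_100_A_G": "chr1", "chr1_200_C_T": "chr1",
       "chr1_300_G_A": "chr1", "chr2_150_T_C": "chr2"}
chunks = chunk_variants_by_chrom(toy, max_chunk_size=2)

# Each chunk would then be mapped on its own and written to disk before the
# next one starts, for example (not run here, assuming genotype_df is indexed
# by variant ID):
#   trans_df = trans.map_trans(genotype_df.loc[ids], phenotype_df, covariates_df,
#                              return_sparse=True, pval_threshold=1.0)
```

Writing each chunk's results out before moving on keeps peak memory bounded by the largest chunk rather than by the whole genome.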

Do you think it is the correct approach to take?

On the other hand, let me look into the downstream trans-QTL association analysis methods to see whether we can do without the summary statistics corresponding to large p-values. If so, we may be able to use a smaller p-value cutoff for the output here and run both marginal association and permutation testing in a single pass rather than in two rounds.
