
Comments (3)

gaow commented on July 22, 2024

Although in principle it is possible to break the genotypes down by chromosome to ease memory usage, it would then not be feasible to do permutation testing properly (which is a tremendous task anyway if many variants are involved). Currently we stick to per-chromosome analysis without relying on multiple-testing results.

from tensorqtl.

francois-a commented on July 22, 2024

It looks like you're generating dense output; are you changing the default pval_threshold? If you want that, you should use --return_dense instead. For permutations/FDR, you don't need the dense output, though. Can you specify what you're trying to do?


gaow commented on July 22, 2024

Thank you @francois-a. We set pval_threshold as high as 1.0 because we would like to keep some association results under the null, which are an input to a downstream trans-QTL integration step. The core tensorQTL calls in our trans analysis pipeline look like this, where pval_threshold = 1.0 by default in our pipeline:

    ## Trans analysis
    trans_df = trans.map_trans(genotype_df, 
                            phenotype_df,
                            covariates_df, 
                            batch_size=$[batch_size],
                            return_sparse=True, 
                            return_r2 = True, 
                            pval_threshold=$[pval_threshold], 
                            maf_threshold=$[maf_threshold])

    ## Filter out cis signal. If customized cis windows are used, the window is [start - win, end + win] with win = 0; otherwise it is [start - win, start + win]
    trans_df = trans.filter_cis(trans_df, phenotype_pos_df, variant_df, window=window)   

    ## Permutation

    if $['True' if permutation else 'False']:
        perm_df = trans.map_permutations(genotype_df, covariates_df, batch_size=$[batch_size],
                             maf_threshold=$[maf_threshold])
        perm_output = trans.apply_permutations(perm_df,trans_df)
        perm_output.to_csv("$[_output:nn].transqtl_permutation.gz", sep='\t',index = None, compression={'method': 'gzip', 'compresslevel': 9})

That is why we put return_dense = True.
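
To give a rough sense of why keeping every pair is memory-hungry, here is a back-of-the-envelope sketch. The variant and phenotype counts are made-up placeholders, not numbers from our data, and `dense_result_bytes` is a hypothetical helper, not a tensorQTL function. It only counts one float32 value per variant-phenotype pair and ignores any index or overhead.

```python
def dense_result_bytes(n_variants, n_phenotypes, n_stats=1, bytes_per_value=4):
    """Bytes needed to hold n_stats values for every variant-phenotype pair."""
    return n_variants * n_phenotypes * n_stats * bytes_per_value


# Hypothetical sizes, for illustration only
n_variants = 5_000_000
n_phenotypes = 20_000

total = dense_result_bytes(n_variants, n_phenotypes)
print(f"{total / 1e9:.0f} GB for one float32 statistic per pair")
```

With a threshold of 1.0 nothing is filtered out, so the result grows with the full product of variants and phenotypes rather than with the number of significant hits.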

I guess from your suggestion, we can do two rounds of analysis:

  1. Per-chrom analysis, or breaking genotypes into even smaller chunks than chroms, where we report all results (regardless of p-value) and skip permutation testing
  2. We then focus on permutation testing, setting the p-value cutoff to some small number and using a sparse matrix to save the marginal association results. We can even skip the trans analysis and get only the permutation results separately in a dedicated run.
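
A minimal sketch of the chunking in step 1, assuming we only need to split variant IDs into per-chromosome pieces of bounded size. `chunk_by_chrom` and the toy variant IDs are hypothetical and not part of tensorQTL; the actual tensorQTL calls per chunk are left out.

```python
from collections import defaultdict


def chunk_by_chrom(variant_chroms, max_chunk_size):
    """Yield (chrom, [variant_ids]) chunks that never mix chromosomes.

    variant_chroms: dict mapping variant ID -> chromosome label
    max_chunk_size: maximum number of variants per chunk
    """
    by_chrom = defaultdict(list)
    for variant_id, chrom in variant_chroms.items():
        by_chrom[chrom].append(variant_id)
    for chrom, ids in by_chrom.items():
        for start in range(0, len(ids), max_chunk_size):
            yield chrom, ids[start:start + max_chunk_size]


# Toy example with hypothetical variant IDs
variants = {
    "chr1_100_A_G": "chr1",
    "chr1_200_C_T": "chr1",
    "chr1_300_G_A": "chr1",
    "chr2_150_T_C": "chr2",
}
chunks = list(chunk_by_chrom(variants, max_chunk_size=2))
for chrom, ids in chunks:
    print(chrom, ids)
```

Assuming genotype_df is indexed by variant ID, each list of IDs could be used to subset it (for example with genotype_df.loc[ids]) before the round-1 association call, with results written out per chunk.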

Do you think it is the correct approach to take?

On the other hand, let me look into the downstream trans-QTL association analysis methods to see if we can get away without the summary statistics corresponding to large p-values. Then we may be able to use a smaller p-value cutoff for the output here, and do both marginal association and permutation testing in the same pass rather than in two rounds.

