Giter Site home page Giter Site logo

tensorqtl's Introduction

tensorQTL

tensorQTL is a GPU-enabled QTL mapper, achieving ~200-300 fold faster cis- and trans-QTL mapping compared to CPU-based implementations.

If you use tensorQTL in your research, please cite the following paper: Taylor-Weiner, Aguet, et al., Genome Biol., 2019.

Empirical beta-approximated p-values are computed as described in Ongen et al., Bioinformatics, 2016.

Install

You can install tensorQTL using pip:

pip3 install tensorqtl

or directly from this repository:

$ git clone [email protected]:broadinstitute/tensorqtl.git
$ cd tensorqtl
# set up virtual environment and install
$ virtualenv venv
$ source venv/bin/activate
(venv)$ pip install -r install/requirements.txt .

To use PLINK 2 binary files (pgen/pvar/psam), pgenlib must be installed:

git clone [email protected]:chrchang/plink-ng.git
cd plink-ng/2.0/Python/
python3 setup.py build_ext
python3 setup.py install

Requirements

tensorQTL requires an environment configured with a GPU for optimal performance, but can also be run on a CPU. Instructions for setting up a virtual machine on Google Cloud Platform are provided here.

Input formats

Three inputs are required for QTL analyses with tensorQTL: genotypes, phenotypes, and covariates.

  • Phenotypes must be provided in BED format, with a single header line starting with # and the first four columns corresponding to: chr, start, end, phenotype_id, with the remaining columns corresponding to samples (the identifiers must match those in the genotype input). The BED file should specify the center of the cis-window (usually the TSS), with start == end-1. A function for generating a BED template from a gene annotation in GTF format is available in pyqtl (io.gtf_to_tss_bed).

  • Covariates can be provided as a tab-delimited text file (covariates x samples) or dataframe (samples x covariates), with row and column headers.

  • Genotypes must be in PLINK format, which can be generated from a VCF as follows:

    plink2 --make-bed \
        --output-chr chrM \
        --vcf ${plink_prefix_path}.vcf.gz \
        --out ${plink_prefix_path}
    

    If using PLINK 1.9 or earlier, add the --keep-allele-order flag.

    Alternatively, the genotypes can be provided as a dataframe (genotypes x samples).

The examples notebook below contains examples of all input files. The input formats for phenotypes and covariates are identical to those used by FastQTL.

Examples

For examples illustrating cis- and trans-QTL mapping, please see tensorqtl_examples.ipynb.

Running tensorQTL

This section describes how to run the different modes of tensorQTL, both from the command line and within Python. For a full list of options, run

python3 -m tensorqtl --help

Loading input files

This section is only relevant when running tensorQTL in Python. The following imports are required:

import pandas as pd
import tensorqtl
from tensorqtl import genotypeio, cis, trans

Phenotypes and covariates can be loaded as follows:

phenotype_df, phenotype_pos_df = tensorqtl.read_phenotype_bed(phenotype_bed_file)
covariates_df = pd.read_csv(covariates_file, sep='\t', index_col=0).T  # samples x covariates

Genotypes can be loaded as follows, where plink_prefix_path is the path to the VCF in PLINK format (excluding .bed/.bim/.fam extensions):

pr = genotypeio.PlinkReader(plink_prefix_path)
# load genotypes and variants into data frames
genotype_df = pr.load_genotypes()
variant_df = pr.bim.set_index('snp')[['chrom', 'pos']]

To save memory when using genotypes for a subset of samples, a subset of samples can be loaded (this is not strictly necessary, since tensorQTL will select the relevant samples from genotype_df otherwise):

pr = genotypeio.PlinkReader(plink_prefix_path, select_samples=phenotype_df.columns)

cis-QTL mapping: permutations

This is the main mode for cis-QTL mapping. It generates phenotype-level summary statistics with empirical p-values, enabling calculation of genome-wide FDR. In Python:

cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df)
tensorqtl.calculate_qvalues(cis_df, qvalue_lambda=0.85)

Shell command:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode cis

${prefix} specifies the output file name.

cis-QTL mapping: summary statistics for all variant-phenotype pairs

In Python:

cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df,
                prefix, covariates_df, output_dir='.')

Shell command:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode cis_nominal

The results are written to a parquet file for each chromosome. These files can be read using pandas:

df = pd.read_parquet(file_name)

cis-QTL mapping: conditionally independent QTLs

This mode maps conditionally independent cis-QTLs using the stepwise regression procedure described in GTEx Consortium, 2017. The output from the permutation step (see map_cis above) is required. In Python:

indep_df = cis.map_independent(genotype_df, variant_df, cis_df,
                               phenotype_df, phenotype_pos_df, covariates_df)

Shell command:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --cis_output ${prefix}.cis_qtl.txt.gz \
    --mode cis_independent

cis-QTL mapping: interactions

Instead of mapping the standard linear model (p ~ g), this mode includes an interaction term (p ~ g + i + gi) and returns full summary statistics for the model. The interaction term is a tab-delimited text file or dataframe mapping sample ID to interaction value(s) (if multiple interactions are used, the file must include a header with variable names). With the run_eigenmt=True option, eigenMT-adjusted p-values are computed. In Python:

cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, prefix,
                covariates_df=covariates_df,
                interaction_df=interaction_df, maf_threshold_interaction=0.05,
                run_eigenmt=True, output_dir='.', write_top=True, write_stats=True)

The input options write_top and write_stats control whether the top association per phenotype and full summary statistics, respectively, are written to file.

Shell command:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --interaction ${interactions_file} \
    --best_only \
    --mode cis_nominal

The option --best_only disables output of full summary statistics.

Full summary statistics are saved as parquet files for each chromosome, in ${output_dir}/${prefix}.cis_qtl_pairs.${chr}.parquet, and the top association for each phenotype is saved to ${output_dir}/${prefix}.cis_qtl_top_assoc.txt.gz. In these files, the columns b_g, b_g_se, pval_g are the effect size, standard error, and p-value of g in the model, with matching columns for i and gi. In the *.cis_qtl_top_assoc.txt.gz file, tests_emt is the effective number of independent variants in the cis-window estimated with eigenMT, i.e., based on the eigenvalue decomposition of the regularized genotype correlation matrix (Davis et al., AJHG, 2016). pval_emt = pval_gi * tests_emt, and pval_adj_bh are the Benjamini-Hochberg adjusted p-values corresponding to pval_emt.

trans-QTL mapping

This mode computes nominal associations between all phenotypes and genotypes. tensorQTL generates sparse output by default (associations with p-value < 1e-5). cis-associations are filtered out. The output is in parquet format, with four columns: phenotype_id, variant_id, pval, maf. In Python:

trans_df = trans.map_trans(genotype_df, phenotype_df, covariates_df,
                           return_sparse=True, pval_threshold=1e-5, maf_threshold=0.05,
                           batch_size=20000)
# remove cis-associations
trans_df = trans.filter_cis(trans_df, phenotype_pos_df.T.to_dict(), variant_df, window=5000000)

Shell command:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix} \
    --covariates ${covariates_file} \
    --mode trans

tensorqtl's People

Contributors

francois-a avatar susie-song avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

tensorqtl's Issues

Cis analysis fails to find R package qvalue

Hi there, I hope this is an easy fix but I've run out of ideas. I've successfully run TensorQTL in 'cis_nominal' mode, but when I try to run in 'cis' to get empirical p-values for each phenotype, it runs the analysis but fails to write out results. This appears to be due to a failure to load the R package 'qvalue' as per the error below:

R[write to console]: Error in loadNamespace(name) : there is no package called ‘qvalue’
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

Traceback (most recent call last):
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/tensorqtl.py", line 111, in main
    calculate_qvalues(res_df, fdr=args.fdr, qvalue_lambda=args.qvalue_lambda, logger=logger)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/post.py", line 40, in calculate_qvalues
    qval, pi0 = rfunc.qvalue(res_df['pval_beta'])
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/rfunc.py", line 17, in qvalue
    qvalue = importr("qvalue")
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/rpy2/robjects/packages.py", line 477, in importr
    env = _get_namespace(rname)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/rpy2/rinterface_lib/conversion.py", line 40, in _
    cdata = function(*args, **kwargs)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/rpy2/rinterface.py", line 789, in __call__
    raise embedded.RRuntimeError(_rinterface._geterrmessage())
rpy2.rinterface_lib.embedded.RRuntimeError: Error in loadNamespace(name) : there is no package called ‘qvalue’
Calls: <Anonymous> ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>

  Time elapsed: 45.64 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 17666
  * Correlation between Beta-approximated and empirical p-values: : 1.0000

I'm baffled by this, as qvalue is definitely in my default R library. When I run python interactively, I'm able to run the rpy2 command qvalue = importr("qvalue") from rfunc.py and it appears to find the package just fine.

Do you have any suggestions what might be causing this? Thanks.

adding random effects for shared donors

Dear Francois,

I have a dataset with multiple similar tissues from more or less the same set of donors (very much analogous to GTEx).

What I'm hoping to do is to increase my QTL discovery power for subsequent colocalisation with disease GWAS, which requires meta-analysed beta and standard errors for each SNP-Gene pair. I ran METASOFT with an RE2 model on the tensorQTL summary stats but that doesn't account for donor sharing. MASHR is really nice but doesn't produce estimates per SNP, only per gene. The published software for performing these kinds of shared-donor meta-analyses are either horribly convoluted to run (Meta-Tissue), or focus on solely producing P-values rather than betas and standard errors (RECOV, RE2C).

Do you have any plans to be able to include random effects into the tensorQTL model so that one could account for shared donors in a single joint analysis?
Alternatively, do you have any thoughts or tips on what the current state of the art is for this problem?

best wishes,

Jack

paramter best_only

Hello:

Can you let me know if '--best-only' working in cis or trans mode? did this paramter work after '--pval_threshold'? Thanks.

Understand about the output of interactionQTL

Hi,
Can any one help me understand, what is the meaning of each column of interaction QTL output,
b_g b_g_se pval_i b_i b_i_se pval_gi b_gi b_gi_se

b_g is beta of SNP?
b_g_se is SE of SNP?
pval_i is the pvalue of the interaction trait? so no value of SNP (genotype)?
b_i is beta of interaction trait?
p_gi is the Pvlaue of the interaction item?
And how did you define cis-eQTL in this software, is the default 1Mb up- and down-stream of the TSS position?
The tutorial really need to be updated, since this software looks very useful and very fast.
Xin

Utilizing dosage input to encode a non-genetic 'genotype'?

Hello,

I posted this issue in @francois-a's package fastqtl, but since tensorqtl is similar, I'm writing the same question here:

I'm attempting to replicate the gtex-pipeline using scripts from the gtex-pipeline repository on GitHub.

However, my use case is a bit different than eQTL mapping. I would like to run a phenotype-QTL mapping, where: instead of many single-variant association tests, I want to test a general phenotype and its association to expression of many genes amongst tissues.

Most importantly, I would like to achieve fast performance on permutation testing for computing p-values of such a phenotype and its association with gene expression (across genes).

My intuition is to encode the phenotype as a dosage for a single genetic variant, across individuals, but I am unsure if this is supported by FastQTL and/or TensorQTL.

So, I'm wondering if this could be possible? If so, could you help me encode this strategy for FastQTL or TensorQTL? I would like to utilize the programs' fast performance in terms of permutation testing.

Thank you

Choosing the top variants in the case of ties

For cis-qtl permutations, the top association is selected per gene, I believe using argmax, so in the case of ties it chooses the first one (lowest coordinate). For datasets such as outbred animal populations, there are often stretches of many SNPs (dozens or even hundreds) with identical genotypes across all individuals. Choosing the first top SNP results in a bias toward lower coordinates, as well as a small enrichment of top SNPs right at the cis-window boundary if the stretch of identical top SNPs extends that far.

Would it make sense for the software to instead choose a random tiebreaker in the case of multiple tied top associations to avoid that bias? Currently, a user could refer back to the full nominal results and/or genotype files and choose random representative SNPs themselves afterwards, but in my experience that can be tricky.

Warning

Hi,
I am running tensorqtl cis mapping and getting the following warning. Would I need to be worried regarding this?

WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)

Thanks,
Matiss

KeyError -1 writing permutation output

Hello,
With an sQTL dataset of ~260k phenotypes and ~8M variants we are running into convergence issues during calculations of cis-permutations. The permutations appear to run and yield a count of QTL phenotypes @ FDR 0.05 in the log files, but no final output is written. The logs contain ~5k messages of "WARNING: excluding # monomorphic variants" and about 14k warnings of "WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)". The job dies with the following error message. Any ideas how to resolve this issue?

/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/genotypeio.py:145: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]

To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  self.bed = self.bed[:,ix]
/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/core.py:234: RuntimeWarning: invalid value encountered in sqrt
  return 2*stats.t.cdf(-np.abs(np.sqrt(tstat2)), dof)
  Time elapsed: 457.84 min
done.
  * writing output
Computing q-values
  * Number of phenotypes tested: 269872
  * Correlation between Beta-approximated and empirical p-values: : 0.9999
  * Proportion of significant phenotypes (1-pi0): 0.25
  * QTL phenotypes @ FDR 0.05: 10306
Traceback (most recent call last):
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 2131, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 2140, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/bin/tensorqtl", line 11, in <module>
    sys.exit(runpy.run_module('tensorqtl', {}, "__main__"))
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/runpy.py", line 210, in run_module
    return _run_code(code, {}, init_globals, run_name, mod_spec)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/tensorqtl.py", line 114, in main
    calculate_qvalues(res_df, fdr=args.fdr, qvalue_lambda=args.qvalue_lambda, logger=logger)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/tensorqtl/post.py", line 53, in calculate_qvalues
    lb = lb[-1]
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/series.py", line 942, in __getitem__
    return self._get_value(key)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/series.py", line 1051, in _get_value
    loc = self.index.get_loc(label)
  File "/hpc/packages/minerva-centos7/anaconda3/2020.11/envs/pytorchGPU11.1/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: -1

Running tensorqtl with interaction term on command line

Hello,

First off, congratulations and thank you for this great tool. I have a few questions regarding the usage of tensorqtl when an interaction term is present. If running in command line, I'm assuming the interaction term would be a file. What format should this file follow? I'm having trouble understanding how to create this file.

Once I know the file format and I have all the files ready, would the appropriate way to run tensorqtl on command line be as follow:

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix}
--covariates ${covariates_file}
--interaction ${interaction_term}
--mode cis_nominal

Additionally, if I wanted to run this analysis in permutation mode, sort of like FastQTL, how would I go about it?

Thank you for your help.

Nelson Barrientos

Interaction mode

Hello! I'm running into issues with the interaction model through both Python and CLI. I've successfully run both the cis and cis_nominal versions of tensorQTL with the same exact files I'm attempting to use for interaction.

My interaction file looks like this, where there are two columns (tab-delim) with individuals in the first column and a binary assignment in a second column. The individuals are in the same order as in the phenotype file.

HG01507 0
HG01518 1
HG01519 0
HG01530 1
HG01531 0
NA19379 0
NA19391 0
NA19399 0
NA19430 1

When I run the example command in Python, I get the following error:

Traceback (most recent call last):
File "", line 1, in
File "/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/cis.py", line 188, in map_nominal
mask_s[interaction_s.sort_values(kind='mergesort').index[:interaction_s.shape[0]//2]] = False
File "/tensorqtl/venv/lib/python3.8/site-packages/pandas/util/_decorators.py", line 311, in wrapper
return func(*args, **kwargs)
TypeError: sort_values() missing 1 required positional argument: 'by'

When I run the example command in CLI, I get this error instead:

/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/eigenmt.py:101: UserWarning: torch.symeig is deprecated in favor of torch.linalg.eigh and will be removed in a future PyTorch release.
The default behavior has changed from using the upper triangular portion of the matrix by default to using the lower triangular portion.
L, _ = torch.symeig(A, upper=upper)
should be replaced with
L = torch.linalg.eigvalsh(A, UPLO='U' if upper else 'L')
and
L, V = torch.symeig(A, eigenvectors=True)
should be replaced with
L, V = torch.linalg.eigh(A, UPLO='U' if upper else 'L') (Triggered internally at ../aten/src/ATen/native/BatchLinearAlgebra.cpp:2499.)
eigenvalues_t,_ = torch.symeig(shrunk_cor_t, eigenvectors=False)
Traceback (most recent call last):
File "/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/core.py", line 164, in calculate_interaction_nominal
Xinv = torch.matmul(torch.transpose(X_t, 1, 2), X_t).inverse() # ng x 3 x 3
RuntimeError: inverse_cpu: (Batch element 227): The diagonal element 3 is zero, the inversion could not be completed because the input matrix is singular.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/apps/rhel8/Anaconda3-2021.05/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/apps/rhel8/Anaconda3-2021.05/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/main.py", line 2, in
tensorqtl.main()
File "/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/tensorqtl.py", line 119, in main
cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, args.prefix, covariates_df=covariates_df,
File "/tensorqtl/venv/lib/python3.8/site-packages/tensorqtl/cis.py", line 264, in map_nominal
res = calculate_interaction_nominal(genotypes_t, phenotype_t.unsqueeze(0), interaction_t,
File "/venv/lib/python3.8/site-packages/tensorqtl/core.py", line 167, in calculate_interaction_nominal
i = int(re.findall('For batch (\d+)', str(e))[0])
IndexError: list index out of range

If anyone has any ideas, that would be amazing! I'm very stuck.

MemoryError

Hi,

I am trying to reproduce the geuvadis study but tensorQTL fails. I used a multi-sample VCF (all chr together), which was preprocessed as described in the FastQTL manual. Moreover, FastQTL finishes successfully with the same data. Here am attaching a log file.
tensorqtl_error.log

This was run on an amazon GPU instance using your docker image but it didn't recognize GPUs (warning: using cpu).

Thanks,

One variant per gene

Hi,

Here is a code that returns an error (too many indices for tensor of dimension 0) when there is only one cis-variant. The error appears when you uncomment gt_pos line.

import pandas as pd
from tensorqtl import cis
from numpy.random import randint

samples = 10
snps = 2
genes = 2

gt = pd.DataFrame(randint(-1, 3, size=(snps,samples)), 
                  columns=['sample_{}'.format(i) for i in range(1,samples+1)], 
                  index=['snp_{}'.format(i) for i in range(1,snps+1)])

gt_pos = pd.DataFrame({'pos' : list(range(1,snps+1)),
                      'chrom': '1'},
                      index = gt.index)

#gt_pos.loc['snp_2','chrom'] = '2'

ph = pd.DataFrame(randint(5, 15, size=(genes, samples)), 
                  columns = gt.columns, 
                  index = ['gene_{}'.format(i) for i in range(1,genes+1)])

ph_pos = pd.DataFrame({'chr' : ['1' ,'1'], 'tss' : [100,1000]})
ph_pos.index = ph.index

covariates_df = pd.DataFrame(1, index=['C'], columns=ph.columns).T

cis_df = cis.map_cis(gt, gt_pos, ph, ph_pos, covariates_df)

Thanks

IndexError: tensors used as indices must be long, byte or bool tensors

I have downloaded your example data GEUVADIS and used this command

python -m tensorqtl .\GEUVADIS.445_samples.GRCh38.20170504.maf01.filtered.nodup .\GEUVADIS.445_samples.expression.bed.gz GEUVADIS.445_samples --covariates .\GEUVADIS.445_samples.covariates.txt --mode cis

got this error
image
Can you give me some advices ? thanks much

warning using cpu

We are using tensorqtl using singularity. Is the Dockerfile up-to-date on the repo?

It was working fine back in April. We did try to use latest version of tensorqtl and other older versions. None of them are using GPU now.

python3 -m tensorqtl ${plink_prefix_path} ${expression_bed} ${prefix}     --covariates ${covariates_file}     --mode cis_nominal
[Sep 01 13:34:48] Running TensorQTL: cis-QTL mapping
  * WARNING: using CPU!
  * reading phenotypes (GEUVADIS.445_samples.expression.bed.gz)
  * reading covariates (GEUVADIS.445_samples.covariates.txt)
Mapping files: 100%|############################################################| 3/3 [00:20<00:00,  6.69s/it]
Singularity> python3 -c "import torch; print(torch.__version__)"
1.5.0
Singularity> nvidia-smi
Tue Sep  1 14:55:45 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.01    Driver Version: 418.87.01    CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 00000000:3B:00.0 Off |                    0 |
| N/A   32C    P0    26W / 250W |      0MiB / 16130MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Beta approximation fails to converge regularly

Everytime I have ran permutations to determine cis or trans-eQTL significance I get the following Warning:

WARNING: scipy.optimize.newton failed to converge (running scipy.optimize.minimize)
/usr/local/lib/python3.6/dist-packages/tensorqtl/tensorqtl.py:210: RuntimeWarning: invalid value encountered in sqrt
  return 2*stats.t.cdf(-np.abs(np.sqrt(tstat2)), dof)
/usr/local/lib/python3.6/dist-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/usr/local/lib/python3.6/dist-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/usr/local/lib/python3.6/dist-packages/scipy/stats/_distn_infrastructure.py:1738: RuntimeWarning: invalid value encountered in greater_equal
  cond2 = (x >= self.b) & cond0

Is it possible to increase or set as an argument the number of iterations to pass to scipy.optimize.minimize?

permutations IndexError

Hi,

At the permutations step, tensorqtl gives the error:

  File "/lustre/scratch118/humgen/resources/conda_envs/tensorqtl/lib/python3.5/site-packages/tensorqtl/cis.py", l$
    return r_nominal_t[ix], std_ratio_t[ix], ix, r2_perm_t, genotypes_t[ix]                                       
IndexError: too many indices for tensor of dimension 0

see full log:
log.txt

The command executed was
python3 -m tensorqtl plink_franke.filtered expression_data.filtered.bed.gz franke_out --covariates franke_covariates.txt --mode cis
Using tensorqtl version 1.0.2 installed via pip.
I've unsuccessfully tried a couple of number of permutations and seed.

The input data is about 170MB, which I can provide if you would like to reproduce the issue.

Any idea what might have caused the error?
Thanks!

Guillaume

A check on variant formatting and effect allele

Apologies if I've missed this explicitly somewhere else but I'd like to triple check:

  • Where the variant id is formatted as chr_pos_a1_a2, is this always in the order/convention of chr_pos_ref_alt?

  • Is the QTL effect allele the ALT allele? I.e. the slope values etc are reported relative to that allele.

map_trans fails with int error

I have been running tensorQTL on single phenotypes but encounter the following error:

>>> gw_trans_df = trans.map_trans(gw_genotype_df, phenotype_df, covariates_df, return_sparse=True, maf_threshold = 0.03)
trans-QTL mapping
  * 4038 samples
  * 1 phenotypes
  * 25 covariates
  * 162168 variants
    processing batch 3/9Traceback (most recent call last):
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/numpy/lib/index_tricks.py", line 376, in __getitem__
    axis = int(item)
ValueError: invalid literal for int() with base 10: 'rs202244392'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/trans.py", line 115, in map_trans
    res.append(np.c_[variant_ids[ix[:,0]], phenotype_df.index[ix[:,1]], tstat.cpu(), maf[ix[:,0]]])
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/numpy/lib/index_tricks.py", line 379, in __getitem__
    raise ValueError("unknown special directive")
ValueError: unknown special directive
    processing batch 8/9

Looking at the genotype_df for this variant, it seems to be to be the right kind of value:

 gw_genotype_df.loc["rs202244392"]
iid
110000315494    2
110000366469    2
110000399461   -1
110000399523    2
110000399542    2
               ..
119999967049    2
119999967065    2
119999984952    2
119999984992    2
119999995956    2
Name: rs202244392, Length: 4144, dtype: int8


>>> gw_genotype_df.loc["rs202244392"].value_counts()
 2    3048
-1     636
 1     448
 0      12
Name: rs202244392, dtype: int64

Do you have any idea what might be going wrong here? I know the primary purpose of tensorQTL isn't single-phenotype analyses but I've been able to run this with different genotype files before.

interaction QTL

please provide detailed information about "interaction_s" in function below

cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df, prefix,
interaction_s=interaction_s, maf_threshold_interaction=0.05,
group_s=None, run_eigenmt=True, output_dir='.')

thanks.

unrecognized argument:--cis-_results

Hi
I want to use the mode of cis_independent for my own data ( wheat). But there are some fundamental problems. The mode of cis and cis_nominal can work properly. I don't know if there is a problem with my input.

code:python3 -m tensorqtl /data2/junxu/genotypeMaf005_87_bed/1.87.maf005 /data1/home/junxu/eQTL/zscore/parseBed/1.sorted.bed.gz /data2/junxu/1 --covariates /data1/home/junxu/eQTL/zscore/7_5_covariance.txt.gz --cis_results tensorResult/1.cis_qtl.txt.gz --mode cis_independent

error:main.py: error: unrecognized arguments: --cis_results tensorResult/1.cis_qtl.txt.gz
Thanks

Unable to run cis.map_nomial with the example files

With the tensor_example Jupiter notebook, I find that I am not able to run the command [cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df,
covariates_df, prefix, output_dir='.')]. It reports an assertion error [assert np.all(phenotype_df.columns==covariates_df.index)] even after I changed the covariates_df to None.

Interaction mode questions

Hi there,

Could you verify that my understanding of the interaction mode is correct?

In my dataset, we have some disease and some control samples. Interaction mode can help identify QTLs that are condition-specific (via the pval_i and pval_gi columns)?

Also, should condition still be provided as a covariate if I'm using it for the interaction parameter?

Thank you!

tensorQTL using CPU on a GPU host

Hello, I am writing on behalf of one of my users at our facility.

We are trying to run tensorQTL on a test dataset, and have tensorflow working with cuda and gpu resources but tensorQTL presents this warning message:
`[Nov 04 14:38:47] Running TensorQTL: cis-QTL mapping

  • WARNING: using CPU!`

This setup is using the following modules/resources
[root@pe2dg2-01 local]# python --version Python 3.7.1rc1 [root@pe2dg2-01 local]# pip freeze | grep tensor tensorboard==2.0.0 tensorflow-estimator==2.0.1 tensorflow-gpu==2.0.0 tensorqtl==1.0.1 NVIDIA-SMI 410.48 Driver Version: 410.48 0 Tesla V100-PCIE 1 Tesla V100-PCIE

Can you recommend any changes to our setup or modules that are necessary to use GPU compute with tensorqtl?

interaction term

Hello,

To identify potential disease risk eQTLs, we planned to test for an interaction (eQTL x diabetes) term between genotype and diabetes status using the “modelLINEAR_CROSS” argument in MatrixEQTL. But we're wondering if the same type of analysis can be done with tensorQTL. Our goal is really to identify disease specific QTLs. Can you please advise if that is possible with your software and how?

Thanks
Ana

phenotype_group assertion error

Hello,
I've run into some issues using the phenotype_group option of tensorqtl when trying to analyze an intron dataset generated by leafcutter. After preparing the inputs tensorqtl runs without apparent issues without the phenotype_group option. When adding the phenotype_group option (mapping introns to associated gene names), tensorqtl fails with the message "Groups defined in input do not match phenotype file (check sort order)."

Based on the sort order verification check it appears that tensorqtl expects groupings to be contiguous, but in the leafcutter output it is possible that introns mappings are sometimes interspersed with other genes (e.g. intron1 will map to geneA, intron2 will map to geneB, and intron3 will map to geneA again). As a result the sort order check fails.

As an alternative I tried resorting the bed and phenotype group files by group (i.e. gene) and then by start coordinate. This, however, appears to trigger the following assertionError:

assert np.all([self.cis_ranges[g.index[0]][0] == self.cis_ranges[i][0] and self.cis_ranges[g.index[0]][1] == self.cis_ranges[i][1] for i in g.index[1:]])

I'm wondering what the correct way is to prepare the phenotype_group file for tensorqtl analysis and what (if any) expectations it has for the way the phenotypes are sorted in the input files.

I can provide test datasets to help troubleshoot if needed.

RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

Hi,
Trying to run tensorQTL and keep getting this error.

aduarte@leia:~/files_map$ python3 -m tensorqtl ~/files_map/genotypes22.vcf.gz ~/files_map/phenotypes_22.bed.gz results --covariates ~/files_map/covariates.txt --mode cis --permutations 10000 --window 250000 --maf_threshold 0.01 --seed 123456789
[Mar 09 11:54:16] Running TensorQTL: cis-QTL mapping
  * WARNING: using CPU!
  * using seed 123456789
  * reading phenotypes (/home/aduarte/files_map/phenotypes_22.bed.gz)
  * reading covariates (/home/aduarte/files_map/covariates.txt)
Mapping files: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:12<00:00,  4.00s/it]
cis-QTL mapping: empirical p-values for phenotypes
  * 83 samples
  * 72424 phenotypes
  * 22 covariates
  * 11548697 variants
  * using seed 123456789
  * checking phenotypes: 72424/72424
    ** dropping 5 phenotypes without variants in cis-window
  * computing permutations
    processing phenotype 8/72419Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/aduarte/.local/lib/python3.6/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/home/aduarte/.local/lib/python3.6/site-packages/tensorqtl/tensorqtl.py", line 109, in main
    logger=logger, seed=args.seed, verbose=True)
  File "/home/aduarte/.local/lib/python3.6/site-packages/tensorqtl/cis.py", line 460, in map_cis
    res = calculate_cis_permutations(genotypes_t, phenotype_t, residualizer, permutation_ix_t)
  File "/home/aduarte/.local/lib/python3.6/site-packages/tensorqtl/cis.py", line 59, in calculate_cis_permutations
    r2_perm_t,_ = corr_t[~torch.isnan(corr_t).any(1),:].max(0)
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

Any sugestions?

Documentation for interaction & conditional analyses

Hi there,

Back again with more questions. I'm trying to run an analysis with an interaction term but I'm not entirely clear on how this is meant to be run.

First I tried following the --interaction flag the column name of the variable to be used as an interaction term from my covariate file. This obviously failed.

I then tried passing the name of a new text file that contained columns for sample ID and the interaction term. When run in the command line, this failed with an AsserionError:

Feb 05 14:33:24] Running TensorQTL: cis-QTL mapping
  * using GPU (Tesla P100-PCIE-16GB)
  * reading phenotypes (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/phenotypes/INTERVAL_RNAseq_phase1_filteredSamplesGenes_TMMNormalised_FPKM_Counts_foranalysis.bed.gz)
  * reading covariates (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/covariates/INTERVAL_RNAseq_phase1_age_sex_rin_batch_PC10_PEER20.txt)
  * reading interaction term (/rds/project/jmmh2/rds-jmmh2-projects/interval_rna_seq/analysis/03_tensorqtl/covariates/INTERVAL_RNAseq_phase1_GxE_neutPCT.txt)
Traceback (most recent call last):
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/tensorqtl.py", line 70, in main
    assert covariates_df.index.isin(interaction_s.index).all()
AssertionError

As far as I can tell, this is checking that the indices for the covariate and interaction dataframes match, and this comes up with an error. The confusing thing is that when I read both of these files into python interactively and perform the same assert test, it passes, so the ids are definitely consistent between the files.

I finally tried running the whole command interactively in python with the following code:

cisnom_df = cis.map_nominal(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df, prefix="Test_gxe", interaction_s=interaction_df)

Which gave another different error:

cis-QTL mapping: nominal associations for all variant-phenotype pairs
  * 2745 samples
  * 17674 phenotypes
  * 22 covariates
  * 6811432 variants
  * including interaction term
    * using 0.05 MAF threshold
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jm2294/.conda/envs/tensorQTL/lib/python3.7/site-packages/tensorqtl/cis.py", line 108, in map_nominal
    interaction_mask_t = torch.BoolTensor(interaction_s >= interaction_s.median()).to(device)
ValueError: could not determine the shape of object type 'DataFrame'

Do you have any idea what I might be doing wrong here?

interaction QTL mapping gives MKL error

Hi Francois,

Thanks for adding eigenMT in to the command line version of tensorQTL for interaction QTL mapping!
I've updated and reinstalled, but I'm now getting this error at the end of my run, after all phenotype groups have been processed:

Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.

I'm running tensorQTL on my HPC which runs CENTOS Linux. I followed your installation instructions but used conda instead of venv to create an environment. I've run tensorQTL on cis and nominal passes multiple times (on GPUs) without error.

I did some digging and seems to be an issue with the MKL library. I've tried installing mkl, nomkl (depending on which stack overflow thread you look at, both are apparently the culprit), updating numpy and scipy but to no avail.

Do you have any ideas how we could fix this?

Setting a seed

Is there any ability to set a seed for the tensorflow ops, given it's a non-deterministic optimisation process?

cis mode gives wrong top SNP when monomorphic SNP is present

Hello, when running cis mode I'm getting two different top SNPs for a gene depending on whether a third SNP, which is monomorphic, is present. In that case it seems that pval_nominal, ma_samples, ma_count, etc. are correct, but not the variant_id, tss_distance, etc. For example (files: test.zip):

python -m tensorqtl --mode cis geno1 expr.bed.gz out1 --covariates covar.txt

That outputs:

phenotype_id    num_var beta_shape1     beta_shape2     true_df pval_true_df    variant_id      tss_distance    ma_samples      ma_count        maf     ref_factor      pval_nominal
    slope   slope_se        pval_perm       pval_beta
ENSRNOG00000014290      30      0.963347        3.1823  68.246  0.0478687       chr1:4435151    -730709 53      65      0.411392        1       0.0482753       -0.0647425      0.0321917       0.160684        0.156524

When chr1:4403830, which is all homozygous alt, is excluded, the SNP following chr1:4435151 becomes the top one:

python -m tensorqtl --mode cis geno2 expr.bed.gz out2 --covariates covar.txt

That outputs:

phenotype_id    num_var beta_shape1     beta_shape2     true_df pval_true_df    variant_id      tss_distance    ma_samples      ma_count        maf     ref_factor      pval_nominal
    slope   slope_se        pval_perm       pval_beta
ENSRNOG00000014290      29      0.944845        3.14276 68.2232 0.0479062       chr1:4435247    -730613 53      65      0.411392        1       0.0482753       -0.0647425      0.0321917       0.158984        0.161257

cis_nominal mode produces the correct values for these two SNPs in both cases, with ma_samples and ma_count matching what I see in the original VCF:

phenotype_id    variant_id  tss_distance       maf  ma_samples  ma_count  pval_nominal     slope  slope_se
...
ENSRNOG00000014290  chr1:4435151       -730709  0.082278          13        13      0.387017 -0.049743  0.057135
ENSRNOG00000014290  chr1:4435247       -730613  0.411392          53        65      0.048275 -0.064742  0.032192

I'm using the latest version from GitHub.

Set number of permutations fails

Hi,

I managed to run tensorQTL successfully from a docker container (using your dockerfile) but soon as I added permutations (e.g. --permutations 100) it failed with the following error:

permutation_ix_t = torch.LongTensor(np.array([np.random.permutation(ix) for i in range(nperm)])).to(device)
TypeError: 'str' object cannot be interpreted as an integer

It also failed to build docker image from your docker file as Bioconductor changed the way to install libraries. When I updated that line (47 I think) I managed to create a docker image.

Hope this helps,
Thanks,

[QST] Binary phenotypes

I have a question - Does tensorQTL support binary phenotypes or just quantitative? It looks like FastQTL only allows float phenotype values thus my question.

TSS distance higher than the specified window for cis-eQTL analysis

Hi,
I was interested in TSS distances in cis eQTL results using tensorqtl.
Some values of tss_distance were <-1Mb or >1Mb, for example 23,163,540. The computation of the TSS distance is correct from looking at the position of the variant in the .bim file and the TSS. I am wondering what went wrong in my analysis. Could you please help me in understanding this ?
Thank you
Elodie

mixQTL

Hello!

I'm a research scientist at the Ontario Institute for Cancer Research. We would be very interested to use mixQTL. In the paper (Liang et al. 2021 Nat Comm) it is mentioned that it is embedded within tensorQTL, but I cannot see any reference to it over here, and how to use it. Could you help us?

Thanks,

Marie-Julie


Marie-Julie Favé, Ph.D.
Awadalla Research Group
Ontario Institute for Cancer Research
Montreal Heart Institute
Vector Institute for Artificial Intelligence PG Affiliate

independent eQTL process Missing results

Hi ~
I'm run conditionally independent QTLs model to find multiple independent signals of a eGene.

it ran successfully and got the result. But when I checked the number of eGene, I found that one eGene was missing from the output. Can I directly treat this missing eGene as having only one independent signal. will this affect the final result?

And the following is the missing eGene item and log file I used

[Apr 23 23:17:12] Running TensorQTL: cis-QTL mapping
  * WARNING: using CPU!
  * reading phenotypes (/data/cotton/zhenpingliu/LZP_RNAseq/03expres
  * reading covariates (/data/cotton/zhenpingliu/LZP_RNAseq/03expres
cis-QTL mapping: conditionally independent variants
  * 370 samples
  * 399/399 significant phenotypes
  * 23 covariates
  * 2658921 variants
  * computing independent QTLs
  Time elapsed: 13.59 min
done.
  * writing output
[Apr 23 23:31:42] Finished mapping

The information of missing eGene

phenotype_id num_var beta_shape1 beta_shape2 true_df pval_true_df variant_id tss_distance ma_samples ma_count maf ref_factor pval_nominal slope slope_se pval_perm pval_beta qval pval_nominal_threshold
Ghir_A12G018680 3836 1.00796 86.6273 311.949 0.000156202 SNP1417548 809012 167 195 0.282609 -1 6.99197e-05 -0.206414 0.0512763 0.0141986 0.012944999999999998 0.0480818862398944 0.035407167156267794

ValueError: Names should be an ordered collection.

Thank you for the hard work.

I've been using tensorqtl for the past 3-4 months and recently it started erroring when reading the genotype file, even with the demonstration files:

pr = genotypeio.PlinkReader(plink_prefix_path)
Mapping files: 0%| | 0/3 [00:00<?, ?it/s]Traceback (most recent call last):
File "", line 1, in
File "/home/aduarte/tensorqtl/tensorqtl/genotypeio.py", line 135, in init
self.bim, self.fam, self.bed = read_plink(plink_prefix_path, verbose=verbose)
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas_plink/_read.py", line 115, in read_plink
bim = _read_file(fn, lambda fn: _read_bim(fn["bim"]), pbar)
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas_plink/_read.py", line 326, in _read_file
data.append(read_func(f))
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas_plink/_read.py", line 115, in
bim = _read_file(fn, lambda fn: _read_bim(fn["bim"]), pbar)
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas_plink/_read.py", line 358, in _read_bim
df = _read_csv(fn, header)
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas_plink/_read.py", line 341, in _read_csv
engine="c",
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 686, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 449, in _read
_validate_names(kwds.get("names", None))
File "/home/aduarte/tensorqtl/example/venv/lib/python3.6/site-packages/pandas/io/parsers.py", line 417, in _validate_names
raise ValueError("Names should be an ordered collection.")
ValueError: Names should be an ordered collection.

I believe it is due to the most recent pandas update.
Thank you for your time

Incompatible with pandas plink 2.2

Hi @francois-a, thanks for developing this extremely useful tool. Recently we encountered the following error

python3 -m [genotype_prefix] [bed_gz] [out_prefix] --mode trans
[Nov 17 09:14:45] Running TensorQTL: trans-QTL mapping
  * WARNING: using CPU!
  * reading phenotypes (/gpfs/data/im-lab/nas40t2/natasha/pheno-tensorqtl.bed.gz)
Mapping files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [02:01<00:00, 40.61s/it]
/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/genotypeio.py:144: PerformanceWarning: Slicing is producing a large chunk. To accept the large
chunk and silence this warning, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': False}):
    ...     array[indexer]
To avoid creating the large chunks, set the option
    >>> with dask.config.set(**{'array.slicing.split_large_chunks': True}):
    ...     array[indexer]
  self.bed = self.bed[:,ix]
Traceback (most recent call last):
  File "/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/runpy.py", line 192, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/__main__.py", line 2, in <module>
    tensorqtl.main()
  File "/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/tensorqtl.py", line 100, in main
    pr = genotypeio.PlinkReader(args.genotype_path, select_samples=phenotype_df.columns, dtype=np.int8)
  File "/gpfs/data/im-lab/nas40t2/bin/envs/tensorqtl/lib/python3.8/site-packages/tensorqtl/genotypeio.py", line 159, in __init__
    self.chrs = self.bim['chrom'].unique().tolist()
AttributeError: 'StringArray' object has no attribute 'tolist'

And here is the environment we used.

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                      1_llvm    conda-forge
_pytorch_select           0.2                       gpu_0
_r-mutex                  1.0.0               anacondar_1
attrs                     20.3.0                   pypi_0    pypi
binutils_impl_linux-64    2.31.1               h6176602_1
binutils_linux-64         2.31.1               h6176602_9
blas                      1.0                         mkl
bwidget                   1.9.11                        1
bzip2                     1.0.8                h7b6447c_0
ca-certificates           2020.10.14                    0
cairo                     1.14.12              h8948797_3
certifi                   2020.6.20          pyhd3eb1b0_3
cffi                      1.14.3                   pypi_0    pypi
cudatoolkit               10.1.243             h6bb024c_0
cudnn                     7.6.5                cuda10.1_0
curl                      7.71.1               he644dc0_3    conda-forge
dask                      2.30.0                   pypi_0    pypi
deprecated                1.2.10                   pypi_0    pypi
fontconfig                2.13.0               h9420a91_0
freetype                  2.10.4               h5ab3b9f_0
fribidi                   1.0.10               h7b6447c_0
fsspec                    0.8.4                    pypi_0    pypi
gcc_impl_linux-64         7.3.0                habb00fd_1
gcc_linux-64              7.3.0                h553295d_9
gettext                   0.19.8.1             hd7bead4_3
gfortran_impl_linux-64    7.3.0                hdf63c60_1
gfortran_linux-64         7.3.0                h553295d_9
glib                      2.66.2               h58526e2_0    conda-forge
graphite2                 1.3.14               h23475e2_0
gsl                       2.4                  h14c3975_4
gxx_impl_linux-64         7.3.0                hdf63c60_1
gxx_linux-64              7.3.0                h553295d_9
harfbuzz                  2.4.0                hca77d97_1
icu                       58.2                 he6710b0_3
iniconfig                 1.1.1                    pypi_0    pypi
jinja2                    2.11.2                     py_0
jpeg                      9b                   h024ee3a_2
krb5                      1.17.1               h173b8e3_0
ld_impl_linux-64          2.34                 h53a641e_0    conda-forge
libcurl                   7.71.1               hcdd3856_3    conda-forge
libedit                   3.1.20191231         h46ee950_2    conda-forge
libffi                    3.2.1             he1b5a44_1007    conda-forge
libgcc-ng                 9.2.0                h24d8f2e_2    conda-forge
libgfortran-ng            7.3.0                hdf63c60_5    conda-forge
libglib                   2.66.2               hbe7bbb4_0    conda-forge
libiconv                  1.16                 h516909a_0    conda-forge
libpng                    1.6.37               hbc83047_0
libssh2                   1.9.0                h1ba5d50_1
libstdcxx-ng              9.2.0                hdf63c60_2    conda-forge
libtiff                   4.1.0                h2733197_1
libuuid                   1.0.3                h1bed415_2
libxcb                    1.14                 h7b6447c_0
libxml2                   2.9.10               hb55368b_3
llvm-openmp               10.0.0               hc9558a2_0    conda-forge
locket                    0.2.0                    pypi_0    pypi
lz4-c                     1.9.2                heb0550a_3
make                      4.2.1                h1bed415_1
markupsafe                1.1.1            py38h7b6447c_0
mkl                       2019.5                      281    conda-forge
mkl-service               2.3.0            py38he904b0f_0
mkl_fft                   1.2.0            py38h23d657b_0
mkl_random                1.1.0            py38h962f231_0
ncurses                   6.1               hf484d3e_1002    conda-forge
ninja                     1.10.0               hc9558a2_0    conda-forge
numpy                     1.19.2           py38h54aff64_0
numpy-base                1.19.2           py38hfa32c7d_0
openssl                   1.1.1h               h7b6447c_0
packaging                 20.4                     pypi_0    pypi
pandas                    1.1.4                    pypi_0    pypi
pandas-plink              2.2.2                    pypi_0    pypi
pango                     1.45.3               hd140c19_0
partd                     1.1.0                    pypi_0    pypi
pcre                      8.44                 he6710b0_0
pip                       20.0.2                     py_2    conda-forge
pixman                    0.40.0               h7b6447c_0
pluggy                    0.13.1                   pypi_0    pypi
py                        1.9.0                    pypi_0    pypi
pyarrow                   2.0.0                    pypi_0    pypi
pycparser                 2.20                       py_0    conda-forge
pyparsing                 2.4.7                    pypi_0    pypi
pytest                    6.1.2                    pypi_0    pypi
python                    3.8.0                h357f687_5    conda-forge
python-dateutil           2.8.1                    pypi_0    pypi
python_abi                3.8                      1_cp38    conda-forge
pytorch                   1.4.0           cuda101py38h02f0884_0
pytz                      2020.4                   pypi_0    pypi
pyyaml                    5.3.1                    pypi_0    pypi
r-base                    3.6.1                haffb61f_2
readline                  8.0                  hf8c457e_0    conda-forge
rpy2                      3.3.6           py38r36hab2c0dc_0    conda-forge
scipy                     1.5.4                    pypi_0    pypi
setuptools                50.3.1           py38h06a4308_1
simplegeneric             0.8.1                    py38_2
six                       1.14.0                     py_1    conda-forge
sqlite                    3.30.1               hcee41ef_0    conda-forge
tensorqtl                 1.0.4                    pypi_0    pypi
tk                        8.6.10               hed695b0_0    conda-forge
tktable                   2.10                 h14c3975_0
toml                      0.10.2                   pypi_0    pypi
toolz                     0.11.1                   pypi_0    pypi
tqdm                      4.51.0                   pypi_0    pypi
tzlocal                   2.1                      py38_0
wheel                     0.34.2                     py_1    conda-forge
wrapt                     1.12.1                   pypi_0    pypi
xarray                    0.16.1                   pypi_0    pypi
xz                        5.2.5                h516909a_0    conda-forge
zlib                      1.2.11            h516909a_1006    conda-forge
zstandard                 0.14.0                   pypi_0    pypi
zstd                      1.4.5                h9ceee32_0

With some look into the code, it seems that the error is due to the incompatibility with pandas-plink=2.2 (about loading BIM file). We were wondering if this issue will be fixed in the future or we should stick with an older version of pandas-plink in order to run tensorqtl.

Thanks!

Beta-value for trans-eQTL

Hello, Quick question, for trans-eQTL, how to output the Beta value? The out put only output p-value and maf. thanks.

invalid value encountered in sqrt error in cis-QTL mapping

Hi,

I've done the cis-QTL mapping with cis.map_nominal function and got the results, but when I tried to run cis_df = cis.map_cis(genotype_df, variant_df, phenotype_df, phenotype_pos_df, covariates_df, nperm=100, seed = 1234) it kept giving me the same error below even when I used the example GEUVADIS.445 data.

processing phenotype 19795/19798
processing phenotype 19796/19798
processing phenotype 19797/19798
processing phenotype 19798/19798

/home/il/faststorage/miniconda3/envs/tensorQTL_new/lib/python3.6/site-packages/tensorqtl/core.py:217: RuntimeWarning: invalid value encountered in sqrt
return 2*stats.t.cdf(-np.abs(np.sqrt(tstat2)), dof)
/home/il/faststorage/miniconda3/envs/tensorQTL_new/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:903: RuntimeWarning: invalid value encountered in greater
 return (a < x) & (x < b)
/home/il/faststorage/miniconda3/envs/tensorQTL_new/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:903: RuntimeWarning: invalid value encountered in less
return (a < x) & (x < b)
/home/il/faststorage/miniconda3/envs/tensorQTL_new/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1827: RuntimeWarning: invalid value encountered in greater_equal
cond2 = (x >= np.asarray(_b)) & cond0

I'm not sure which problem caused this error?

Many thanks.

Incorrect calculation of TSS

It appears that the "TES" is being assigned as the TSS:

phenotype_pos_df = phenotype_df[['chr', 'end']].rename(columns={'end':'tss'})

though even changing end to start wouldn't quite fix the issue since genes may be on the reverse strand. Perhaps there should be an option to include strand information in the phenotype files.

Conditional analysis run time

Hi Francois,

Looking at implementing a conditional analysis step in my pipeline but I'm finding the time required to run is much longer. As an example, a map_cis call for ~1100 phenotypes on one chromosome takes 1 minute, but map_independent only makes it through 9 or 10 of those phenotypes in 12 hours.

I assume this is an inevitable consequence of the forward-backward method, but I was wondering if there's anything I can do to speed it up. I tried reducing the the nperm parameter from 10,000 to 1000 but all that did was stop some phenotypes from converging.

Failing that, do you have a rough ballpark estimate for how much longer the independent step is likely to take than the corresponding cis step? I can chunk things up if necessary but your guidance could save a lot of trial and error!

Thanks.

const gene values

Hi,

Here is a code that returns an error (cannot perform reduction function max on tensor with no elements because the operation does not have an identity) when a gene contains the same value for all samples. If you uncomment ph.loc['gene_1', 'sample_1'] = 6 it finishes successfully.

# Const gene
import pandas as pd
from tensorqtl import cis
from numpy.random import randint

samples = 10
snps = 2
genes = 2

print('Start: Const gene')

gt = pd.DataFrame(randint(-1, 3, size=(snps,samples)), 
                  columns=['sample_{}'.format(i) for i in range(1,samples+1)], 
                  index=['snp_{}'.format(i) for i in range(1,snps+1)])

gt_pos = pd.DataFrame({'pos' : list(range(1,snps+1)),
                      'chrom': '1'},
                      index = gt.index)

ph = pd.DataFrame(randint(5, 15, size=(genes, samples)), 
                  columns = gt.columns, 
                  index = ['gene_{}'.format(i) for i in range(1,genes+1)])

ph.loc['gene_1'] = 5
#ph.loc['gene_1', 'sample_1'] = 6
print(ph.loc['gene_1'])

covariates_df = pd.DataFrame(1, index=['C'], columns=ph.columns).T

#genes_pos chr tss
ph_pos = pd.DataFrame({'chr' : ['1' ,'1'], 'tss' : [100,1000]})
ph_pos.index = ph.index

cis_df = cis.map_cis(gt, gt_pos, ph, ph_pos, covariates_df)

Thanks!

Missing values

I am wondering if tensorQTL can handle missing values (e.g. NA) for phenotype?

Insignificant Interaction QTLs

Hi,
I know some cell types have no ieQTLs, but I am still want the nominal p value.
I found if there is no "cis_qtl_top_assoc.txt.gz" output file, the ".cis_qtl_pairs.chr*.parquet" file will be empty. Can this file be not empty?
Thank you.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.