jonjala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)

License: GNU General Public License v3.0

Perl 2.28% R 0.78% Python 96.94% Roff 0.01%
statistical-genetics gwas multi-trait-analysis mtag

mtag's People

Contributors

aaronwolen, baris-insitro, carbocation, huilisabrina, jonjala, omeed-maghzian


mtag's Issues

case-control GWAS

Hi

Very interesting work. One of my traits comes from a case-control GWAS; I am wondering whether I can use log(OR) as the beta?

Thank you
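A note for readers with the same question: converting an odds ratio to a log-odds beta is a simple log transform. A minimal pandas sketch (the column names here are hypothetical, not MTAG's required names):

```python
import numpy as np
import pandas as pd

# Hypothetical case-control summary statistics: OR and SE of log(OR)
df = pd.DataFrame({"OR": [1.05, 0.92], "se": [0.02, 0.03]})

df["beta"] = np.log(df["OR"])      # effect on the log-odds scale
df["z"] = df["beta"] / df["se"]    # z-score from beta and SE
```

Whether log(OR) is an appropriate input for a given MTAG run still depends on the tool's assumptions about effect scales across traits.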

Some question about n

In the two sample files 1_oa2016_hm3samp_neur.txt and 1_oa2016_hm3samp_swb.txt you provided, I don't know exactly how to get the N value, and I still don't fully understand it after reading the relevant literature. Can you give me an example of where the N values in these two files come from? Is this N the overall study sample size or the per-SNP sample size?

MTAG for chrX

Hi,
I'm not sure if MTAG can be used with chrX; currently, when I pass SNPs on chrX I get the following error:

Read regression weight LD Scores for 1290028 SNPs.
After merging with reference panel LD, 0 SNPs remain.
Traceback (most recent call last):
  File "/humgen/atgu1/fs03/wip/aganna/mtag/mtag.py", line 1385, in <module>
    mtag(args)
  File "/humgen/atgu1/fs03/wip/aganna/mtag/mtag.py", line 1245, in mtag
    args.sigma_hat = estimate_sigma(DATA[not_SA], args)
  File "/humgen/atgu1/fs03/wip/aganna/mtag/mtag.py", line 394, in estimate_sigma
    rg_results =  sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging())
  File "/humgen/atgu1/fs03/wip/aganna/mtag/ldsc_mod/ldscore/sumstats.py", line 422, in estimate_rg
    M_annot, w_ld_cname, ref_ld_cnames, sumstats, _ = _read_ld_sumstats(args, log, None, alleles=True, dropna=True,sumstats=p1)
  File "/humgen/atgu1/fs03/wip/aganna/mtag/ldsc_mod/ldscore/sumstats.py", line 250, in _read_ld_sumstats
    sumstats = _merge_and_log(ref_ld, sumstats, 'reference panel LD', log)
  File "/humgen/atgu1/fs03/wip/aganna/mtag/ldsc_mod/ldscore/sumstats.py", line 235, in _merge_and_log
    raise ValueError(msg.format(N=len(sumstats), F=noun))
ValueError: After merging with reference panel LD, 0 SNPs remain.
Analysis terminated from error at Sun Dec 10 20:06:19 2017
Total time elapsed: 18.38s

thanks

Outputting FDR

Dear Authors,

I was able to run your Python script without errors. However, the output doesn't include information about the false discovery rate (FDR), although this seems to be calculated in the script ('return FDR_val'). Are the MTAG_results trait files already filtered by FDR? Thank you.

chr and pos instead of rsid

Hi all,

I want to run MTAG, but most of my SNPs do not have rsIDs. Instead, they have chromosome and position. Can I run MTAG using this information?

Thank you in advance
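One possible workaround (an assumption on my part, not official MTAG guidance): build a consistent chr:pos identifier in every input file and pass that column via --snp_name. Note that the LD-score reference data are keyed on rsIDs, so matching against the reference panel may still be a problem.

```python
import pandas as pd

# Hypothetical sumstats with chromosome/position but no rsID
df = pd.DataFrame({"chr": [1, 2], "bpos": [12345, 67890]})

# Build a chr:pos identifier to use as the SNP ID column
df["snpid"] = df["chr"].astype(str) + ":" + df["bpos"].astype(str)
```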

ValueError: You are trying to merge on object and float64 columns

When trying to run the MTAG analysis on my data, I get the following error message.
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

I have no idea how to fix this problem. The input files are both in the format specified in tutorial 1.

The log file:

2018/08/13/10:15:19 AM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py
--stream-stdout
--n-min 0.0
--sumstats OW_Rate-2year.txt,OWrate2016MGATformat.txt
--out ./output

2018/08/13/10:15:19 AM Beginning MTAG analysis...
2018/08/13/10:15:19 AM Read in Trait 1 summary statistics (201694 SNPs) from OW_Rate-2year.txt ...
2018/08/13/10:15:19 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:19 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/08/13/10:15:19 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:19 AM Interpreting column names as follows:
2018/08/13/10:15:19 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/08/13/10:15:19 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/08/13/10:15:20 AM Read 201694 SNPs from --sumstats file.
Removed 299 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
201395 SNPs remain.
2018/08/13/10:15:20 AM Removed 0 SNPs with duplicated rs numbers (201395 SNPs remain).
2018/08/13/10:15:20 AM Removed 0 SNPs with N < 172.0 (201395 SNPs remain).
2018/08/13/10:15:21 AM Median value of SIGNED_SUMSTATS was 0.063935831, which seems sensible.
2018/08/13/10:15:21 AM Dropping snps with null values
2018/08/13/10:15:21 AM
Metadata:
2018/08/13/10:15:21 AM Mean chi^2 = 1.057
2018/08/13/10:15:21 AM Lambda GC = 0.986
2018/08/13/10:15:21 AM Max chi^2 = 21.611
2018/08/13/10:15:21 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/08/13/10:15:21 AM
Conversion finished at Mon Aug 13 10:15:21 2018
2018/08/13/10:15:21 AM Total time elapsed: 2.18s
2018/08/13/10:15:21 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:21 AM Munging of Trait 1 complete. SNPs remaining: 201395
2018/08/13/10:15:21 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/08/13/10:15:22 AM Read in Trait 2 summary statistics (201694 SNPs) from OWrate2016MGATformat.txt ...
2018/08/13/10:15:22 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:22 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/08/13/10:15:22 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:22 AM Interpreting column names as follows:
2018/08/13/10:15:22 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/08/13/10:15:22 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/08/13/10:15:22 AM Read 201694 SNPs from --sumstats file.
Removed 242 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
201452 SNPs remain.
2018/08/13/10:15:23 AM Removed 0 SNPs with duplicated rs numbers (201452 SNPs remain).
2018/08/13/10:15:23 AM Removed 0 SNPs with N < 172.0 (201452 SNPs remain).
2018/08/13/10:15:24 AM Median value of SIGNED_SUMSTATS was 0.0337629845, which seems sensible.
2018/08/13/10:15:24 AM Dropping snps with null values
2018/08/13/10:15:24 AM
Metadata:
2018/08/13/10:15:24 AM Mean chi^2 = 1.076
2018/08/13/10:15:24 AM Lambda GC = 0.998
2018/08/13/10:15:24 AM Max chi^2 = 22.221
2018/08/13/10:15:24 AM 0 Genome-wide significant SNPs (some may have been removed by filtering).
2018/08/13/10:15:24 AM
Conversion finished at Mon Aug 13 10:15:24 2018
2018/08/13/10:15:24 AM Total time elapsed: 1.88s
2018/08/13/10:15:24 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/08/13/10:15:24 AM Munging of Trait 2 complete. SNPs remaining: 201452
2018/08/13/10:15:24 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/08/13/10:15:25 AM Dropped 0 SNPs due to strand ambiguity, 201395 SNPs remain in intersection after merging trait1
2018/08/13/10:15:25 AM Dropped 0 SNPs due to strand ambiguity, 201395 SNPs remain in intersection after merging trait2
2018/08/13/10:15:25 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 201395
2018/08/13/10:15:26 AM Using 201395 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2018/08/13/10:15:26 AM Estimating sigma..
2018/08/13/10:15:34 AM You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
Traceback (most recent call last):
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/mtag.py", line 1514, in <module>
    mtag(args)
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/mtag.py", line 1287, in mtag
    args.sigma_hat = estimate_sigma(DATA[not_SA], args)
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/mtag.py", line 417, in estimate_sigma
    rg_results = sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging())
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/ldsc_mod/ldscore/sumstats.py", line 423, in estimate_rg
    M_annot, w_ld_cname, ref_ld_cnames, sumstats, _ = _read_ld_sumstats(args, log, None, alleles=True, dropna=True,sumstats=p1)
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/ldsc_mod/ldscore/sumstats.py", line 251, in _read_ld_sumstats
    sumstats = _merge_and_log(ref_ld, sumstats, 'reference panel LD', log)
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/ldsc_mod/ldscore/sumstats.py", line 233, in _merge_and_log
    sumstats = smart_merge(ld, sumstats)
  File "/Users/CathrineKiel/Desktop/MTAG/mtag/ldsc_mod/ldscore/sumstats.py", line 77, in smart_merge
    out = pd.merge(x, y, how='inner', on='SNP')
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 61, in merge
    validate=validate)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/reshape/merge.py", line 986, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
2018/08/13/10:15:35 AM Analysis terminated from error at Mon Aug 13 10:15:35 2018
2018/08/13/10:15:35 AM Total time elapsed: 15.85s
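This ValueError comes from pandas: the merge key (the SNP column) was read as a string ('object') in one table and as a number ('float64') in the other, which can happen when the ID column is purely numeric in one file. A sketch of the underlying issue and a possible pre-processing fix (illustrative only, not MTAG's own code):

```python
import pandas as pd

ld = pd.DataFrame({"SNP": ["1", "2"], "L2": [1.0, 2.0]})  # IDs parsed as strings
ss = pd.DataFrame({"SNP": [1, 2], "Z": [0.5, -0.3]})      # IDs parsed as numbers

# Merging directly on mismatched key dtypes triggers the error;
# casting the key to a common dtype first avoids it
ss["SNP"] = ss["SNP"].astype(str)
merged = pd.merge(ld, ss, how="inner", on="SNP")
```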

About Beta and OR

Hi everone,

I have results from two GWASs; one reports beta and the other reports OR. Could I use the MTAG program to analyze them together? Thanks!

Bo

Likely wrong results from MTAG

Hi,

I'm running MTAG similarly to what is indicated in the bug report "cannot multiply sequence by non-int of type float" (and I get the same error). Moreover, the results look wrong. For example, the top SNP in the MTAG file is:

SNP         CHR  BP        a1  a2  Z         N       freq      mtag_beta   mtag_se      mtag_z    mtag_pval
rs11229352  11   58075672  G   A   4.967528  188669  0.681504  0.01696269  0.002922755  5.803663  6.488153e-09

However, the same SNP in the two summary statistics files read by MTAG has P-values of 0.58 and 0.12, so I don't see how it can get a P-value of 6.488153e-09 when MTAGed.

thanks

Using maf_min to preserve low-MAF SNPs without affecting the model

Looking at @andgan's recent thread in which maf_min is set to 0, I am wondering to what extent maf_min affects the MTAG omega estimation. If MTAG uses MAF 0.01 as a hard minimum for estimation, then I'd like to set maf_min to 0 to observe the re-weighted output. But if dropping maf_min to 0 changes how omega and the other matrices are computed, then I'd rather leave it untouched, even if that means the low-frequency SNPs cannot be output.

ValueError: cannot reindex from a duplicate axis

Not sure if this is on my end or if this (perhaps?) has to do with the new n_value or p_name options. This is my first time not adding an N column or renaming the P column from the BOLT-LMM output. After the 3 trait files get munged, emitting mean chi^2, GC estimates, etc, the script panics due to ValueError: cannot reindex from a duplicate axis.

Does any clear issue stand out? If not, I can paste the full log and start manipulating columns to try to narrow down whether this is actually related to the new n_value or p_name settings.

Error:

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 3 complete. SNPs remaining:	 8647669
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 3: Dropped 9225 SNPs for duplicate values in the "snp_name" column
Dropped 1351905 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait1
Dropped 0 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait2
Dropped 0 SNPs due to strand ambiguity, 7286539 SNPs remain in intersection after merging trait3
... Merge of GWAS summary statistics complete. Number of SNPs:	 7286539
cannot reindex from a duplicate axis
Traceback (most recent call last):
  File "mtag.py", line 1557, in <module>
    mtag(args)
  File "mtag.py", line 1330, in mtag
    Zs , Ns ,Fs, res_temp, DATA, N_raw = extract_gwas_sumstats(DATA,args,list(np.arange(args.P)))
  File "mtag.py", line 526, in extract_gwas_sumstats
    Ns = DATA.filter(items=n_cols).as_matrix()
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3900, in filter
    **{name: [r for r in items if r in labels]})
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/util/_decorators.py", line 187, in wrapper
    return func(*args, **kwargs)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3566, in reindex
    return super(DataFrame, self).reindex(**kwargs)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3689, in reindex
    fill_value, copy).__finalize__(self)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3496, in _reindex_axes
    fill_value, limit, tolerance)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 3521, in _reindex_columns
    allow_dups=False)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 3810, in _reindex_with_indexers
    copy=copy)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/internals.py", line 4414, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/Users/jamesp/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3576, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
Analysis terminated from error at Thu Nov 29 20:44:11 2018
Total time elapsed: 12.0m:16.14s

Command:

python mtag.py \
  --sumstats trait1,trait2, trait3 \
  --out mtag.out \
  --n_value 100,100,100 \
  --p_name P_BOLT_LMM \
  --snp_name SNP \
  --chr_name CHR \
  --bpos_name BP \
  --beta_name BETA \
  --se_name SE \
  --a1_name ALLELE1 \
  --a2_name ALLELE0 \
  --eaf_name A1FREQ \
  --n_min 0.0 \
  --info_min 0.3 \
  --cores 1 \
  --use_beta_se \
  --n_approx \
  --stream_stdout \
  --fdr
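One thing worth ruling out before narrowing down flags (a guess based on the "Dropped ... for duplicate values" line in the log, not a confirmed diagnosis): duplicate SNP identifiers surviving into the merged data. A quick pandas check on each input file:

```python
import pandas as pd

# Hypothetical sumstats with a duplicated SNP ID
df = pd.DataFrame({"SNP": ["rs1", "rs2", "rs2"], "BETA": [0.1, 0.2, 0.3]})

dup_mask = df["SNP"].duplicated(keep=False)
print(df.loc[dup_mask, "SNP"].unique())        # IDs that appear more than once

df = df[~df["SNP"].duplicated(keep="first")]   # keep one row per SNP ID
```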

Installation error

I’m trying to install the latest version of MTAG at
https://github.com/omeed-maghzian/mtag

I’ve run
git clone https://github.com/omeed-maghzian/mtag.git
cd mtag

and this seems to work fine. However, when I use
./mtag.py -h

Rather than the list of commands, I get a message reading "no module named joblib". I see that joblib is one of the packages that MTAG depends on and that it can be found, along with the other dependencies, in the Anaconda Python distribution. However, the link on your GitHub (in the Getting Started section) doesn’t seem to be working. Could you say if there is any other way to get this package? Many thanks!
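joblib can also be installed directly with pip (`pip install joblib`) or conda (`conda install joblib`) without going through the Anaconda download page. A quick way to check which of MTAG's dependencies resolve in the active environment before rerunning (the list of package names is my assumption about what MTAG imports):

```python
import importlib.util

# Package names MTAG imports (joblib is the one reported missing here)
deps = ["numpy", "scipy", "pandas", "joblib"]
missing = [m for m in deps if importlib.util.find_spec(m) is None]
print(missing)  # install anything listed, e.g. `pip install joblib`
```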

Questions concerning input files

I am attempting to use the MTAG software to jointly analyze two sets of GWAS summary statistics.
I have updated all the required packages and have it running successfully on the neuroticism and subjective well-being example data sets from the tutorial.

However, when I try to run it on my own set of GWASs (two of them), I am getting an error that leaves me unsure about how to proceed.

Considering that the software worked on the example data sets, I suspect that the issue arises from one of two sources:

  1. I am not building/formatting my input data sets correctly.
  2. A certain assumption about the data does not hold in one or both of my GWASs.

To elaborate:

  1. I use the following columns in my input file:
    "snpid chr bpos a1 a2 freq z pval n".
    I have no doubts about the snpid, chr, bpos, a1, a2 columns but I suspect that there may be an issue with one or both of the remaining two columns.
    The Z-score for each SNP is z = beta / standard error, and the SNP sample size n is the number of individuals in the GWAS (341,450 in each GWAS).
    I am not sure that "SNP sample size" means the number of individuals in the cohort, but this was my best guess after doing some research.

  2. If there are no issues with the input columns, then my next guess would be that there are statistical differences between our data sets that were perhaps not accounted for in the software.

I have attached the log file. Hopefully it can help you direct me toward a solution.
Log file:
mtag.log

Thanks in advance
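For what it's worth, the z and n columns described above can be derived mechanically; a sketch with hypothetical values (a constant n is reasonable when every SNP was tested in the full cohort):

```python
import pandas as pd

# Hypothetical per-SNP effect sizes and standard errors
df = pd.DataFrame({"beta": [0.01, -0.02], "se": [0.005, 0.004]})

df["z"] = df["beta"] / df["se"]  # signed z-score
df["n"] = 341450                 # per-SNP sample size; constant if missingness does not vary
```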

cannot multiply sequence by non-int of type float

Running MTAG with this command line:

Calling ./mtag.py \
--z-name Z \
--bpos-name BP \
--verbose  \
--stream-stdout  \
--n-name N \
--a2-name ALLELE0 \
--n-min 0 \
--a1-name ALLELE1 \
--snp-name SNP \
--chr-name CHR \
--sumstats temp,temp2 \
--out out

But I get the error below; the output files do, however, get generated. The log is also attached.

Writing Phenotype 1 to file ...
Writing Phenotype 2 to file ...
can't multiply sequence by non-int of type 'float'
Traceback (most recent call last):
  File "/humgen/atgu1/fs03/wip/aganna/mtag-master/mtag.py", line 1382, in <module>
    mtag(args)
  File "/humgen/atgu1/fs03/wip/aganna/mtag-master/mtag.py", line 1279, in mtag
    save_mtag_results(args, res_temp,Zs,Ns, Fs,mtag_betas,mtag_se)
  File "/humgen/atgu1/fs03/wip/aganna/mtag-master/mtag.py", line 840, in save_mtag_results
    final_summary += str(summary_df.round(3))+'\n'
  File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/pandas/core/frame.py", line 4396, in round
    new_cols = [np.round(v, decimals) for _, v in self.iteritems()]
  File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2793, in round_
    return round(decimals, out)
  File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/pandas/core/series.py", line 1240, in round
    result = _values_from_object(self).round(decimals, out=out)
TypeError: can't multiply sequence by non-int of type 'float'
Analysis terminated from error at Sat Nov 18 13:41:26 2017

mixmodel_mainpheno_klein_sexual_partners_qt_M_mtag.log

Allow MTAG to accept beta / se columns (instead of z, n)

@paturley I am posting your comment on allowing MTAG to accept beta / se columns from #10

it would be good if we could add an option to MTAG so it
accepts betas and SEs rather than Zs and Ns. It would be a lot more robust
to these sorts of problems.

Would we then back out the approximate sample size from z (= beta/se) and use that in the summary table and output of results?
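One common approximation for this (my assumption here, not necessarily what MTAG adopted): for a standardized phenotype, Var(beta_hat) ≈ 1 / (2 p (1 - p) N), so an effective N can be backed out from the SE and the effect-allele frequency p.

```python
def approx_n(se, p):
    """Effective sample size from SE and allele frequency (standardized-trait approximation)."""
    return 1.0 / (2.0 * p * (1.0 - p) * se ** 2)

# se = 0.01 at p = 0.5 implies an effective N of about 20,000
n_eff = approx_n(se=0.01, p=0.5)
```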

Problem with --ld_ref_panel flag

I am currently trying to run a meta analysis of GWAS for rice.

Because the LD scores for rice differ from those of the 1000 Genomes reference used in MTAG, I used ldsc to calculate the LD scores myself.

I was finally able to get it running, but there is a very simple error that I am not sure how to solve, and there is no information in the wiki, tutorials, Google Groups, or anywhere else.

My organism has only 12 chromosomes, but MTAG looks for chr13. I have all the LD scores up to chr12, but when I run MTAG the following error appears:

2018/05/01/10:47:56 AM Trait 2 summary statistics: 693501 SNPs remaining merging with previous traits.
2018/05/01/10:47:57 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 693501
2018/05/01/10:47:57 AM Using 610674 SNPs to estimate Omega (82827 SNPs excluded due to strand ambiguity)
2018/05/01/10:47:57 AM Estimating sigma..
2018/05/01/10:47:59 AM [Errno 2] No such file or directory: '/work/lmc86/META/LD_data/LDscores/13.l2.ldscore.gz'

CODE:
python mtag.py --sumstats EXP1MTAG.txt,EXP2MTAG.txt --ld_ref_panel /LDscores/ --out /EXP1EXP2MTAG/test6 --n_min 0 &

Do you have any suggestions as to what should I do to overcome this issue?

KeyError: 'A10' after munging of file 1 is complete

Hi

When running MTAG on two traits, munging of the first GWAS completes, but then I get the 'A10' error below. It does not depend on the order in which the traits are entered. What could I do here?
Thanks

Uku

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py
--se-name se
--bpos-name pos
--stream-stdout
--n-name n_complete_samples
--a2-name alt
--n-min 0.0
--a1-name ref
--snp-name rsid
--eaf-name minor_AF
--sumstats /dagher/dagherX/uvainik/gwas_base/cbmi_2015_felix/EGG_BMI_HapMap_DISCOVERY_mtagbmi.txt,/dagher/dagherX/uvainik/gwas_base/bmi_2018_ukb/bmi_rs_hiconf.tsv
--beta-name beta
--cores 12
--out /dagher/dagherX/uvainik/mtag_res/mtag_cbmi_bmi2018ukb.1NS

Beginning MTAG analysis...
MTAG will use the provided BETA/SE columns for analyses
Read in Trait 1 summary statistics (2499691 SNPs) from /dagher/dagherX/uvainik/gwas_base/cbmi_2015_felix/EGG_BMI_HapMap_DISCOVERY_mtagbmi.txt ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
rsid: Variant ID (e.g., rs number)
n_complete_samples: Sample size
pval: p-Value
beta: Directional summary statistic as specified by --signed-sumstats.
alt: Allele 2, interpreted as non-ref allele for signed sumstat.
ref: Allele 1, interpreted as ref allele for signed sumstat.
se: Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 2499691 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2499691 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (2499691 SNPs remain).
Removed 0 SNPs with N < 0.0 (2499691 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.0, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.133
Lambda GC = 1.104
Max chi^2 = 96.585
525 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Fri Oct 5 13:37:25 2018
Total time elapsed: 10.66s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 2499691
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

'A10'
Traceback (most recent call last):
  File "./mtag.py", line 1526, in <module>
    mtag(args)
  File "./mtag.py", line 1298, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "./mtag.py", line 271, in load_and_merge_data
    GWAS_d[p][col] = GWAS_d[p][col].str.upper()
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "/export02/data/uku/anaconda2/envs/mtag/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'A10'
Analysis terminated from error at Fri Oct 5 13:37:28 2018
Total time elapsed: 25.46s

Beta, SE and sample size flags

I note the new --beta_name and --se_name flags in the most recent version of the mtag script. I also read somewhere in the forum that betas and standard errors should be standardized beforehand. My phenotype is a quantitative variable that has been inverse-normal rank-transformed. Do I still need standardized GWAS summary betas and SEs for mtag?

Secondly, the GWAS summary statistics table from BOLT-LMM does not include a sample size column. Of course, the overall sample size for the phenotype is known. Is it OK to create a new column with a single overall sample size in the summary stats for the mtag script? The wiki page on ldsc summary stats (https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format) mentions that the sample size may vary from SNP to SNP. What is the best way to obtain SNP-specific sample sizes?
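On the second question, adding a constant overall-N column is mechanically trivial; a sketch with hypothetical values (whether a constant N is statistically acceptable is exactly the question for the authors):

```python
import pandas as pd

# Hypothetical BOLT-LMM output lacking a sample-size column
ss = pd.DataFrame({"SNP": ["rs1", "rs2"], "BETA": [0.01, -0.02], "SE": [0.005, 0.004]})

ss["N"] = 350000  # single overall sample size applied to every SNP (per-SNP N would be better if missingness varies)
ss.to_csv("trait1_with_n.txt", sep="\t", index=False)
```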

p-value threshold in maxFDR calculation

I’m having a bit of a problem getting the FDR step to run. It works fine with the standard p-value threshold, but it chokes if I provide an alternative threshold using --p_sig (which is relevant to do in our case). Specifying 5.0e-8, 1.667e-8, or 0.0000000167 results in a TypeError: unsupported operand type(s) for /: 'str' and 'float'. Any idea why that is? I can send you the traceback if you need it.

ValueError : Could not determine N with --use-beta-se parameter

Hi,

I saw that MTAG recently accepts BETA and SE instead of Z-score and N.
So I used --use_beta_se, --beta_name, and --se_name, but I get an error:

MTAG could not determine N

Should I provide the N column along with BETA and SE anyway?
Also, when I used --use_beta_se alone (with my columns named beta and se), I get another error:

Could not find SIGNED_SUMSTAT column

Thank you !

Morgane

There is my log file for the first question :

2018/10/25/10:44:02 AM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py
--se-name SE
--n-min 0.0
--ld-ref-panel ldscore_calc/filtered-set_test2_mis84maf3_bgl_ref_ldscores/filtered-set_test2_mis84ma
f3_bgl_withNames_newCHRNames_
--use-beta-se
--sumstats 1DmeansTS_06182.assoc.4MVP,1DmeansTS_06186.assoc.4MVP,1DmeansTS_061811.assoc.4MVP
--beta-name BETA
--out ./CompoProt_T_MTAG

2018/10/25/10:44:02 AM Beginning MTAG analysis...
2018/10/25/10:44:02 AM MTAG will use the provided BETA/SE columns for analyses.
2018/10/25/10:44:20 AM Read in Trait 1 summary statistics (5591340 SNPs) from 1DmeansTS_06182.assoc.4MVP ...
2018/10/25/10:44:20 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/10/25/10:44:20 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/10/25/10:44:20 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/10/25/10:44:20 AM
ERROR converting summary statistics:

2018/10/25/10:44:20 AM Traceback (most recent call last):
  File "MTAG/mtag-master_Octobre18/mtag_munge.py", line 812, in munge_sumstats
    raise ValueError('Could not determine N.')
ValueError: Could not determine N.

2018/10/25/10:44:20 AM
Conversion finished at Thu Oct 25 10:44:20 2018
2018/10/25/10:44:20 AM Total time elapsed: 0.0s
2018/10/25/10:44:20 AM Could not determine N.
Traceback (most recent call last):
  File "../mtag-master_Octobre18/mtag.py", line 1520, in <module>
    mtag(args)
  File "../mtag-master_Octobre18/mtag.py", line 1291, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "../mtag-master_Octobre18/mtag.py", line 245, in load_and_merge_data
    GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
  File "../mtag-master_Octobre18/mtag.py", line 159, in _perform_munge
    munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
  File "MTAG/mtag-master_Octobre18/mtag_munge.py", line 812, in munge_sumstats
    raise ValueError('Could not determine N.')
ValueError: Could not determine N.
2018/10/25/10:44:20 AM Analysis terminated from error at Thu Oct 25 10:44:20 2018
2018/10/25/10:44:20 AM Total time elapsed: 18.19s

There is my log file for the second question :

2018/10/25/10:19:29 AM
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.8
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py
--se-name s
--ld-ref-panel MTAG/Medicago_PMG/ldscore_calc/filtered-set_test2_mis84maf3_bgl_ref_ldscores/filtered-set_test2_mis84maf3_bgl_withNames_newCHRNames_
--use-beta-se
--sumstats 1DmeansTS_06182.assoc.4MVP,1DmeansTS_06186.assoc.4MVP,1DmeansTS_061811.assoc.4MVP
--beta-name b
--out ./CompoProt_T_MTAG

2018/10/25/10:19:29 AM Beginning MTAG analysis...
2018/10/25/10:19:29 AM MTAG will use the provided BETA/SE columns for analyses.
2018/10/25/10:19:47 AM Read in Trait 1 summary statistics (5591340 SNPs) from 1DmeansTS_06182.assoc.4MVP ...
2018/10/25/10:19:47 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/10/25/10:19:47 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/10/25/10:19:47 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/10/25/10:19:47 AM
ERROR converting summary statistics:

2018/10/25/10:19:47 AM Traceback (most recent call last):
File "MTAG/mtag-master_Octobre18/mtag_munge.py", line 796, in munge_sumstats
raise ValueError('Could not find {C} column.'.format(C=c))
ValueError: Could not find SIGNED_SUMSTAT column.

2018/10/25/10:19:47 AM
Conversion finished at Thu Oct 25 10:19:47 2018
2018/10/25/10:19:47 AM Total time elapsed: 0.0s
2018/10/25/10:19:47 AM Could not find SIGNED_SUMSTAT column.
Traceback (most recent call last):
File "../mtag-master_Octobre18/mtag.py", line 1520, in
mtag(args)
File "../mtag-master_Octobre18/mtag.py", line 1291, in mtag
DATA_U, DATA, args = load_and_merge_data(args)
File "../mtag-master_Octobre18/mtag.py", line 245, in load_and_merge_data
GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
File "../mtag-master_Octobre18/mtag.py", line 159, in _perform_munge
munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
File "MTAG/mtag-master_Octobre18/mtag_munge.py", line 796, in munge_sumstats
raise ValueError('Could not find {C} column.'.format(C=c))
ValueError: Could not find SIGNED_SUMSTAT column.
2018/10/25/10:19:47 AM Analysis terminated from error at Thu Oct 25 10:19:47 2018
2018/10/25/10:19:47 AM Total time elapsed: 18.25s
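If the `--use-beta-se` path cannot locate a signed summary statistic, a fallback (a sketch, not the tool's documented behavior) is to derive a Z column from the beta and SE columns and run MTAG in its default Z-based mode via `--z_name z`:

```python
import pandas as pd

# Toy rows using the column names from this issue: "b" for beta, "s" for its SE
df = pd.DataFrame({"snpid": ["rs1", "rs2"], "b": [0.10, -0.05], "s": [0.02, 0.05]})
df["z"] = df["b"] / df["s"]  # signed Z statistic derived from beta and SE
```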

error "divide by zero"

Hi there,
I keep getting this error and I'm not sure why it's occurring. Any help would be appreciated!

Conversion finished at Mon May 14 08:19:21 2018
Total time elapsed: 4.11s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 143010
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 2 summary statistics: 143010 SNPs after merging with previous traits.
... Merge of GWAS summary statistics complete. Number of SNPs: 143010
Using 121231 SNPs to estimate Omega (21779 SNPs excluded due to strand ambiguity)
Estimating sigma..
/Users/X/mtag/ldsc_mod/ldscore/irwls.py:161: FutureWarning: rcond parameter will change to the default of machine precision times max(M, N) where M and N are the input matrix dimensions.
To use the future default and silence this warning we advise to pass rcond=None, to keep using the old, explicitly pass rcond=-1.
coef = np.linalg.lstsq(x, y)
Checking for positive definiteness ..
Sigma hat:
[[1.074 0.056]
[0.056 1.067]]
Mean chi^2 of SNPs used to estimate Omega is low for some SNPsMTAG may not perform well in this situation.
Beginning estimation of Omega ...
Using GMM estimator of Omega ..
Checking for positive definiteness ..
matrix is not positive definite, performing adjustment..
Warning: max number of iterations reached in adjustment procedure. Sigma matrix used is still non-positive-definite.
Completed estimation of Omega ...
Beginning MTAG calculations...
... Completed MTAG calculations.
Writing Phenotype 1 to file ...
divide by zero encountered in true_divide
Traceback (most recent call last):
File "mtag.py", line 1359, in
mtag(args)
File "mtag.py", line 1249, in mtag
save_mtag_results(args, res_temp,Zs,Ns, Fs,mtag_betas,mtag_se)
File "mtag.py", line 778, in save_mtag_results
out_df['mtag_beta'] = mtag_betas[:,p] / weights
FloatingPointError: divide by zero encountered in true_divide
Analysis terminated from error at Mon May 14 08:19:40 2018
Total time elapsed: 33.1s
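The division that fails here uses per-SNP weights derived from the inputs. One plausible (unconfirmed) cause is monomorphic SNPs whose allele frequency is exactly 0 or 1, which can zero a weight; a hedged pre-filter sketch:

```python
import pandas as pd

# Hypothetical pre-filter: drop rows whose allele frequency is exactly 0 or 1
# before running MTAG, since such SNPs can produce zero per-SNP weights
df = pd.DataFrame({"snpid": ["rs1", "rs2", "rs3"], "freq": [0.25, 1.0, 0.0]})
df = df[(df["freq"] > 0) & (df["freq"] < 1)].copy()
```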

Missing SNPs in MTAG summary results

Dear Omeed,

I have two phenotypes that are moderately correlated (rho = 0.6). Therefore, I used MTAG to boost the power of my single-GWASs. However, when inspecting the results, I see that some of the original SNPs available in the single-GWAS summary statistics are missing from the MTAG results for both phenotypes.

I have checked for allele inconsistency between both single-GWASs and this is not the case.

I was wondering if you could help me with this issue?

Many thanks,
Julia

Array must not contain infs or NaNs error

I am getting a message that I interpret to mean that you cannot have missing p-values for SNPs that overlap the two trait summary files.
Is this a correct assumption?
The input file example is below, as is the error message.

snpid chr bpos a1 a2 freq z pval n
rs531646671 1 14599 A T 0.02298 -0.8175 0.414 544
rs541940975 1 14604 G A 0.02298 -0.8175 0.414 544
1 14610 C T 0.02206 -0.9074 0.3646 544
rs548251696 1 15776 T A 0.00814 . . 553
rs2691315 1 15820 T G 0.47232 -0.0544 0.9567 542
1 15834 T C 0.00091 . . 552
rs556025965 1 15849 T C 0.00271 . . 553
1 15884 C G 0.00096 . . 522

2018/04/03/09:55:57 AM Beginning estimation of Omega ...
2018/04/03/09:55:57 AM Using GMM estimator of Omega ..
2018/04/03/09:55:59 AM Checking for positive definiteness ..
2018/04/03/09:55:59 AM
Traceback (most recent call last):
File "mtag-master/mtag.py", line 1348, in
mtag(args)
File "mtag-master/mtag.py", line 1233, in mtag
args.omega_hat = estimate_omega(args, Zs[not_SA], Ns[not_SA], args.sigma_hat)
File "mtag-master/mtag.py", line 690, in estimate_omega
return _posDef_adjustment(gmm_omega(Zs,Ns,sigma_LD))
File "mtag-master/mtag.py", line 417, in _posDef_adjustment
if is_pos_semidef(mat):
File "mtag-master/mtag.py", line 416, in
is_pos_semidef = lambda m: np.all(np.linalg.eigvals(m) >= 0)
File "/hpcf/apps/python/install/2.7.12/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 904, in eigvals
_assertFinite(a)
File "/hpcf/apps/python/install/2.7.12/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 216, in _assertFinite
raise LinAlgError("Array must not contain infs or NaNs")
LinAlgError: Array must not contain infs or NaNs
2018/04/03/09:55:59 AM Analysis terminated from error at Tue Apr 3 09:55:59 2018
2018/04/03/09:55:59 AM Total time elapsed: 15.0m:38.64s
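The interpretation above looks right: "." entries survive as NaNs and then reach the eigenvalue check. A hedged pre-filter (a sketch, using a toy version of the file above) that drops them before running MTAG:

```python
import pandas as pd
from io import StringIO

# Toy file mirroring the example above: missing z/pval encoded as "."
raw = StringIO(
    "snpid z pval n\n"
    "rs531646671 -0.8175 0.414 544\n"
    "rs548251696 . . 553\n"
)
df = pd.read_csv(raw, sep=" ", na_values=".")
df = df.dropna(subset=["z", "pval"])  # keep only SNPs with usable statistics
```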

Allele0, Allele1, Allele2(?)

The Basics in the Wiki state: "a1 is considered to be the effect allele, which should also be reflected in the signs of the Z-scores"

In BOLT-LMM output, ALLELE1 is the effect allele and ALLELE0 is the other (As of 2.3.3: "ALLELE1: first allele in bim file (usually the minor allele), used as the effect allele"), so I want ALLELE1 to be my effect allele (a1 according to the Wiki).

But if I pass the following arguments:

  --a1_name ALLELE1 \
  --a2_name ALLELE0 \

I then get the following message in the log:

ALLELE0:	Allele 2, interpreted as non-ref allele for signed sumstat.
ALLELE1:	Allele 1, interpreted as ref allele for signed sumstat.

This is confusing in a couple ways:

  1. What is Allele 2?
  2. The wiki says that a1 is the effect allele, but if I pass ALLELE1 to a1, the log tells me that is the ref allele.

I don't think this ultimately matters (if everything is reversed, it's just a sign flip at the end), but it would be nice for the language to be uniform here. Thanks for considering.

Intersecting SNP not reported

I have noticed that some SNPs that are present in both summary statistics files are not reported by MTAG. They have high AF and INFO score, so I don't understand why they are filtered. For example this SNP:

file 1:

SNP         CHR  BP        a1  a2  freq      Variant          INFO      BETA        SE          Z       P        N
rs11855646  15   57178528  T   A   0.828166  15:57178528:T:A  0.989748  0.00485497  0.00086159  5.6349  1.8e-08  187949

file 2:

P          BETA       SE         N      SNP         CHR  BP        a1  a2  INFO     freq     Z
0.0652447  0.0446304  0.0242773  29813  rs11855646  15   57178528  T   A   0.99836  0.81012  1.838359

and run MTAG with the following options:

./mtag.py --sumstats file1,file2 --snp_name SNP --n_name N --maf_min 0 --chr_name CHR --fdr --cores 8 --n_approx --bpos_name BP --z_name Z --out outfile --verbose --stream_stdout --n_min 0 

In the log it says: Dropped 1161271 SNPs due to strand ambiguity, but I didn't specify --drop_ambig_snps. Might this be a bug?

Attached the relevant log:

Beginning MTAG analysis...
MTAG will use the Z column for analyses.
Read in Trait 1 summary statistics (7676511 SNPs) from /humgen/atgu1/fs03/wip/aganna/uk_bio/SSI/GWAS/final_st/temp ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1  <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
N:	Sample size
a1:	Allele 1, interpreted as ref allele for signed sumstat.
P:	p-Value
a2:	Allele 2, interpreted as non-ref allele for signed sumstat.
SNP:	Variant ID (e.g., rs number)
Z:	Directional summary statistic as specified by --signed-sumstats.
SE:	Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 7676511 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.0.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7676511 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (7676511 SNPs remain).
Removed 0 SNPs with N < 0.0 (7676511 SNPs remain).
Median value of SIGNED_SUMSTAT was 0.00805245, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.056
Lambda GC = 1.048
Max chi^2 = 33.09
46 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Sat Nov 24 08:23:55 2018
Total time elapsed: 1.0m:5.71s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining:	 7676511
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

[...]


Read in Trait 2 summary statistics (7929183 SNPs) from /humgen/atgu1/fs03/wip/aganna/uk_bio/SSI/GWAS/final_st/temp3 ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2  <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
N:	Sample size
a1:	Allele 1, interpreted as ref allele for signed sumstat.
P:	p-Value
a2:	Allele 2, interpreted as non-ref allele for signed sumstat.
SNP:	Variant ID (e.g., rs number)
Z:	Directional summary statistic as specified by --signed-sumstats.
SE:	Standard errors of BETA coefficients

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
Read 7929183 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.0.
Removed 0 SNPs with SE <0 or NaN values.
Removed 0 SNPs with out-of-bounds p-values.
Removed 660921 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
7268262 SNPs remain.
Removed 0 SNPs with duplicated rs numbers (7268262 SNPs remain).
Removed 0 SNPs with N < 0.0 (7268262 SNPs remain).
Median value of SIGNED_SUMSTAT was -7.28457228869e-05, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.048
Lambda GC = 1.046
Max chi^2 = 27.108
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Sat Nov 24 08:25:59 2018
Total time elapsed: 1.0m:6.05s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining:	 7268262
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Dropped 1161271 SNPs due to strand ambiguity, 6515240 SNPs remain in intersection after merging trait1
Dropped 1384 SNPs due to inconsistent allele pairs from phenotype 2. 5713249 SNPs remain.
Flipped the signs of of 2855037 SNPs to make them consistent with the effect allele orderings of the first trait.
Dropped 0 SNPs due to strand ambiguity, 5713249 SNPs remain in intersection after merging trait2
... Merge of GWAS summary statistics complete. Number of SNPs:	 5713249
Using 5713249 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
Estimating sigma..
Preparing phenotype 0 to estimate sigma
Preparing phenotype 1 to estimate sigma
created Logger instance to pass through ldsc.
Reading reference panel LD Score from /humgen/atgu1/fs03/wip/aganna/mtag/ld_ref_panel/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1290028 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /humgen/atgu1/fs03/wip/aganna/mtag/ld_ref_panel/eur_w_ld_chr/[1-22] ...

[...]

reference panel

The reference panel implemented in MTAG is 1000G CEU. Is it possible to use another reference panel, such as 1000G ASW, or a larger reference panel like the Haplotype Reference Consortium (HRC)? How should I generate the reference panel information needed for the MTAG computation? Thank you very much!
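MTAG's `--ld_ref_panel` flag (visible in the log earlier in this thread) accepts a path prefix to per-chromosome LD score files, so one approach is to compute LD scores for a custom panel with LDSC and point MTAG at them. A command-line sketch, assuming PLINK-format genotypes for the panel; every file name below is hypothetical:

```shell
# Compute LD scores for a custom panel with LDSC, one chromosome at a time
for chr in $(seq 1 22); do
  python ldsc.py --bfile my_panel_chr${chr} --l2 --ld-wind-cm 1 --out my_ldscores/${chr}
done

# Point MTAG at the resulting directory prefix (mirroring the eur_w_ld_chr/ layout)
python mtag.py --sumstats trait1.txt,trait2.txt \
  --ld_ref_panel my_ldscores/ --out ./mtag_custom_panel
```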

assertion error

I get this assertion error and it is not clear what's the problem with my data:

Traceback (most recent call last):
  File "/home/mtag/mtag-master/mtag.py", line 1348, in <module>
    mtag(args)
  File "/home/mtag/mtag-master/mtag.py", line 1197, in mtag
    Zs , Ns ,Fs, res_temp, DATA = extract_gwas_sumstats(DATA,args)
  File "/home/mtag/mtag-master/mtag.py", line 520, in extract_gwas_sumstats
    assert Zs.shape[1] == Ns.shape[1] == Fs.shape[1]
AssertionError

ValueError: cannot convert float NaN to integer

After MTAG completed its calculations for 5 traits, I get a ValueError: cannot convert float NaN to integer.
The consequence is that I lack mtag_estimates for one of my traits.
I have attached the log-file.

Prior to the analysis I removed all SNPs with NAs and the SNPs with frequencies of 0 or 1.
I have no idea why this error occurs, and what to do about it.

NaK.log

MTAG N (max) and other sample size estimates differ from expectation

I have 3 GWASes on related traits in tens of thousands of people. After MTAG completes, I see either tiny or huge values for N (max):

Summary of MTAG results:
------------------------
  Trait                 N (max)  N (mean)         ...           GWAS mean chi^2  MTAG mean chi^2  GWAS equiv. (max) N
1  trait1.gz-mtag.gz       29       25          ...           1.063            1.063                 29            
2  trait2.gz-mtag.gz       83       71          ...           1.066            1.068                 86            
3  trait3.gz-mtag.gz  5725973  4883237          ...           1.062            1.063            5774601            

The mean chi^2 for the 3 traits is around 1.06 in each case, so I do get the warning "Mean chi^2 of SNPs used to estimate Omega is low for some SNPs; MTAG may not perform well in this situation." But I'm still surprised that:

  1. The N (max) for two of the traits is listed as a value that is ~100-1000x smaller than the real N, and
  2. The N (max) for the third trait is over 100x larger than the real N

Any pointers for troubleshooting?

Log is attached

Any issues with composite traits (e.g., weight, height^2, BMI)?

I know that in the supplement to the paper, you conducted an analysis on anthropometric traits, including BMI, weight, and waist-to-hip ratio. I wonder whether there would be any potential issue if you instead had done this with BMI, weight, and (say) height^2.

I ask this since BMI is a synthetic trait that is entirely derived from weight and height^2 -- would analyzing all 3 together violate any assumptions of the algorithm?

ValueError: DLASCL parameter

Hi,
I'm trying to run MTAG, but having some issues. I use the following command to run it. python2.7 ../software/mtag/mtag.py
--sumstats Medium_Small_LDL_all.txt_head_clean,Medium_Large_VLDL_all.txt_head_clean --out ./out_test
--stream_stdout &

I'm getting the following error:

<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>
<> MTAG: Multi-trait Analysis of GWAS
<> Version: 1.0.7
<> (C) 2017 Omeed Maghzian, Raymond Walters, and Patrick Turley
<> Harvard University Department of Economics / Broad Institute of MIT and Harvard
<> GNU General Public License v3
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<> Note: It is recommended to run your own QC on the input before using this program.
<> Software-related correspondence: [email protected]
<> All other correspondence: [email protected]
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Calling ./mtag.py
--stream-stdout
--sumstats Medium_Small_LDL_all.txt_head_clean,Medium_Large_VLDL_all.txt_head_clean
--out ./out_test

Beginning MTAG analysis...
Read in Trait 1 summary statistics (520409 SNPs) from Medium_Small_LDL_all.txt_head_clean ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
. done
Read 520409 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
520409 SNPs remain.
Removed 49 SNPs with duplicated rs numbers (520360 SNPs remain).
Removed 0 SNPs with N < 548.0 (520360 SNPs remain).
Median value of SIGNED_SUMSTATS was -0.000250662830088, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 0.994
WARNING: mean chi^2 may be too small.
Lambda GC = 1.001
Max chi^2 = 27.354
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Tue Apr 24 13:41:30 2018
Total time elapsed: 16.57s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 520409
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 1: Dropped 49 SNPs for duplicate values in the "snp_name" column
Read in Trait 2 summary statistics (507214 SNPs) from Medium_Large_VLDL_all.txt_head_clean ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
. done
Read 507214 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
507214 SNPs remain.
Removed 41 SNPs with duplicated rs numbers (507173 SNPs remain).
Removed 0 SNPs with N < 582.0 (507173 SNPs remain).
Median value of SIGNED_SUMSTATS was 0.0160431091306, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 0.982
WARNING: mean chi^2 may be too small.
Lambda GC = 0.971
Max chi^2 = 17.973
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Tue Apr 24 13:41:46 2018
Total time elapsed: 14.1s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 2 complete. SNPs remaining: 507214
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 2: Dropped 41 SNPs for duplicate values in the "snp_name" column
Trait 2 summary statistics: 505198 SNPs remaining merging with previous traits.
Dropped 4 SNPs due to inconsistent allele pairs from phenotype 2. 505194 SNPs remain.
... Merge of GWAS summary statistics complete. Number of SNPs: 505194
Using 430221 SNPs to estimate Omega (74973 SNPs excluded due to strand ambiguity)
Estimating sigma..
(array([], dtype=int64), array([], dtype=int64))
(array([14046]), array([0]))
(array([], dtype=int64), array([], dtype=int64))
(array([ 0, 0, 1, ..., 71699, 71700, 71700]), array([0, 1, 0, ..., 1, 0, 1]))
(array([ 0, 1, 2, ..., 71698, 71699, 71700]), array([0, 0, 0, ..., 0, 0, 0]))
(array([ 0, 1, 2, ..., 71698, 71699, 71700]), array([0, 0, 0, ..., 0, 0, 0]))
On entry to DLASCL parameter number 4 had an illegal value
Traceback (most recent call last):
File "../software/mtag/mtag.py", line 1348, in
mtag(args)
File "../software/mtag/mtag.py", line 1211, in mtag
args.sigma_hat = estimate_sigma(DATA[not_SA], args)
File "../software/mtag/mtag.py", line 393, in estimate_sigma
rg_results = sumstats_sig.estimate_rg(args_ldsc_rg, Logger_to_Logging())
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/sumstats.py", line 444, in estimate_rg
rghat = _rg(loop, args, log, M_annot, ref_ld_cnames, w_ld_cname, k, i)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/sumstats.py", line 592, in _rg
intercept_gencov=intercepts[2], n_blocks=n_blocks, twostep=args.two_step)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/regressions.py", line 688, in init
slow=slow, twostep=twostep)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/regressions.py", line 346, in init
slow=slow, step1_ii=step1_ii, old_weights=old_weights)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/regressions.py", line 213, in init
x, yp, update_func, n_blocks, slow=slow, w=initial_w)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/irwls.py", line 66, in init
x, y, update_func, n_blocks, w, slow=slow, separators=separators)
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/irwls.py", line 113, in irwls
new_w = np.sqrt(update_func(cls.wls(x, y, w)))
File "/media/rafet/5e29cbfd-eff8-4007-8e15-c06b94f09007/software/mtag/ldsc_mod/ldscore/irwls.py", line 164, in wls
coef = np.linalg.lstsq(x, y)
File "/usr/lib/python2.7/dist-packages/numpy/linalg/linalg.py", line 1919, in lstsq
0, work, lwork, iwork, 0)
ValueError: On entry to DLASCL parameter number 4 had an illegal value

What do you think could cause this?

When I run the example data you provide on GitHub, it works fine!

Thank you for your help.

error running the example

Hello,

I am using Anaconda: Python 2.7.13 |Anaconda 2.3.0 (64-bit)| (default, Dec 20 2016, 23:09:15) .
I tried to run the MTAG example:
python ../mtag.py --sumstats 1_OA2016_hm3samp_NEUR.txt,1_OA2016_hm3samp_SWB.txt --out ./tutorial_results_1.1NS --n_min 0.0 --stream_stdout &

But, I got an error message below:
...
2018/05/21/01:40:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/21/01:40:58 PM Munging of Trait 2 complete. SNPs remaining: 65374
2018/05/21/01:40:58 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/05/21/01:40:58 PM Trait 2 summary statistics: 20321 SNPs after merging with previous traits.
2018/05/21/01:40:58 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 20321
2018/05/21/01:40:58 PM Using 20321 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2018/05/21/01:40:58 PM Estimating sigma..
2018/05/21/01:41:18 PM Checking for positive definiteness ..
2018/05/21/01:41:18 PM Sigma hat:
[[ 1.043 -0.129]
[-0.129 0.966]]
2018/05/21/01:41:18 PM Beginning estimation of Omega ...
2018/05/21/01:41:18 PM Using GMM estimator of Omega ..
2018/05/21/01:41:18 PM Checking for positive definiteness ..
2018/05/21/01:41:18 PM Completed estimation of Omega ...
2018/05/21/01:41:18 PM Beginning MTAG calculations...
2018/05/21/01:41:18 PM ... Completed MTAG calculations.
2018/05/21/01:41:18 PM Writing Phenotype 1 to file ...
2018/05/21/01:41:18 PM Writing Phenotype 2 to file ...
2018/05/21/01:41:18 PM can't multiply sequence by non-int of type 'float'
Traceback (most recent call last):
File "../mtag.py", line 1359, in
mtag(args)
File "../mtag.py", line 1249, in mtag
save_mtag_results(args, res_temp,Zs,Ns, Fs,mtag_betas,mtag_se)
File "../mtag.py", line 828, in save_mtag_results
final_summary += str(summary_df.round(3))+'\n'
File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/pandas/core/frame.py", line 4396, in round
new_cols = [np.round(v, decimals) for , v in self.iteritems()]
File "/broad/software/free/Linux/redhat_6_x86_64/pkgs/anaconda_2.3.0-jupyter/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2793, in round

return round(decimals, out)

Permit redefinition of the P value column, +/- fixed N

It's great that the end-user can specify n_name, a1_name, a2_name, eaf_name, beta_name, se_name, chr_name, bpos_name, and snp_name.

The one additional field that would be great to be able to specify is p_name!

Since BOLT-LMM doesn't output the N, it would also be nice if we could specify what the value of N is.

Those two changes would allow BOLT-LMM users to use mtag without any data munging up-front.
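Until such flags exist, that munging can be done up-front. A sketch that adds both a p-value column and a fixed sample size (the `STUDY_N` constant is hypothetical, and a Z column is assumed to be available):

```python
import pandas as pd
from scipy.stats import norm

# Toy BOLT-LMM-style rows; STUDY_N is a hypothetical constant you would supply
df = pd.DataFrame({"SNP": ["rs1", "rs2"], "Z": [1.96, -0.50]})
STUDY_N = 450000
df["pval"] = 2 * norm.sf(df["Z"].abs())  # two-sided p-value from the Z score
df["n"] = STUDY_N
```

MTAG can then be pointed at the new columns with `--z_name Z --n_name n`.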

The error

Dear Authors,

When I run the program, there have some errors:

Calling ./mtag.py
--stream-stdout
--sumstats EXCESSIVE_DAYTIME_SLEEPINESS,SLEEP_DURATION
--out ./test_sleep

Beginning MTAG analysis...
mtag.py:1194: DtypeWarning: Columns (5,6,7) have mixed types. Specify dtype option on import or set low_memory=False.
DATA, args = load_and_merge_data(args)
Read in Trait 1 summary statistics (9889300 SNPs) from EXCESSIVE_DAYTIME_SLEEPINESS ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
WARNING: 1 SNPs had P outside of (0,1]. The P column may be mislabeled.
. done
Read 9889300 SNPs from --sumstats file.
Removed 3021 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 1 SNPs with out-of-bounds p-values.
Removed 1089297 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
8796981 SNPs remain.
Removed 15 SNPs with duplicated rs numbers (8796966 SNPs remain).
Removed 0 SNPs with N < 74431.6866667 (8796966 SNPs remain).
Median value of SIGNED_SUMSTATS was 7.2146e-05, which seems sensible.
Dropping snps with null values

Metadata:
Mean chi^2 = 1.097
Lambda GC = 1.092
Max chi^2 = 27.48
0 Genome-wide significant SNPs (some may have been removed by filtering).

Conversion finished at Wed Jan 17 22:19:11 2018
Total time elapsed: 3.0m:3.9s
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging of Trait 1 complete. SNPs remaining: 8796981
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Trait 1: Dropped 15 SNPs for duplicate values in the "snp_name" column
mtag.py:1194: DtypeWarning: Columns (1,2,5,6,7,8,9) have mixed types. Specify dtype option on import or set low_memory=False.
DATA, args = load_and_merge_data(args)
Read in Trait 2 summary statistics (10818790 SNPs) from SLEEP_DURATION ...
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
<><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
Interpreting column names as follows:
snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
WARNING: 32358 SNPs had P outside of (0,1]. The P column may be mislabeled.
.. done
Read 10818790 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 32358 SNPs with out-of-bounds p-values.
Removed 1160932 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
9625500 SNPs remain.
Removed 836492 SNPs with duplicated rs numbers (8789008 SNPs remain).
Removed 0 SNPs with N < 74651.7266667 (8789008 SNPs remain).

ERROR converting summary statistics:

Traceback (most recent call last):
File "/Users/xiangbo/mtag/mtag/ldsc_mod/munge_sumstats.py", line 714, in munge_sumstats
dat.P = p_to_z(dat.P, dat.N)
File "/Users/xiangbo/mtag/mtag/ldsc_mod/munge_sumstats.py", line 364, in p_to_z
return np.sqrt(chi2.isf(P, 1))
File "/Users/xiangbo/anaconda2/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1960, in isf
place(output, cond, self._isf(*goodargs) * scale + loc)
File "/Users/xiangbo/anaconda2/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py", line 932, in _isf
return sc.chdtri(df, p)
TypeError: ufunc 'chdtri' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Conversion finished at Wed Jan 17 22:21:11 2018
Total time elapsed: 49.65s
ufunc 'chdtri' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Traceback (most recent call last):
File "mtag.py", line 1348, in
mtag(args)
File "mtag.py", line 1194, in mtag
DATA, args = load_and_merge_data(args)
File "mtag.py", line 229, in load_and_merge_data
GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
File "mtag.py", line 149, in _perform_munge
munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
File "/Users/xiangbo/mtag/mtag/ldsc_mod/munge_sumstats.py", line 714, in munge_sumstats
dat.P = p_to_z(dat.P, dat.N)
File "/Users/xiangbo/mtag/mtag/ldsc_mod/munge_sumstats.py", line 364, in p_to_z
return np.sqrt(chi2.isf(P, 1))
File "/Users/xiangbo/anaconda2/lib/python2.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1960, in isf
place(output, cond, self._isf(*goodargs) * scale + loc)
File "/Users/xiangbo/anaconda2/lib/python2.7/site-packages/scipy/stats/_continuous_distns.py", line 932, in _isf
return sc.chdtri(df, p)
TypeError: ufunc 'chdtri' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Analysis terminated from error at Wed Jan 17 22:21:11 2018
Total time elapsed: 5.0m:27.65s

Could you please give me some advice? Thank you so much!

Best wishes!

Bo
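This `chdtri` TypeError typically appears when scipy receives an object-dtype array, e.g. a P or N column that was parsed as strings. A minimal sketch (the column names and values are hypothetical) of coercing the columns to numeric before handing the file to MTAG:

```python
import pandas as pd

# Demo frame mimicking a sumstats file where P was parsed as strings
# (hypothetical values; a real file would be read with pd.read_csv).
df = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"],
                   "P": ["0.01", "1e-8", "NA"],
                   "N": ["1000", "1000", "1000"]})

# Coerce to numeric: unparseable entries become NaN and can be dropped,
# so scipy's chi2.isf later receives a float array, not object dtype.
df["P"] = pd.to_numeric(df["P"], errors="coerce")
df["N"] = pd.to_numeric(df["N"], errors="coerce")
df = df.dropna(subset=["P", "N"])
print(df.dtypes["P"])  # float64
```

Writing the cleaned frame back out with `df.to_csv(..., sep="\t", index=False)` before running mtag.py should avoid the unsafe-cast failure in `chi2.isf`.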

Error: list index out of range

Running this command:

python mtag.py \
	--sumstats temp.txt,temp2.txt \
	--snp_name SNP \
	--n_name N \
	--chr_name CHR \
	--bpos_name BP \
	--a1_name ALLELE1 \
	--a2_name ALLELE0 \
	--z_name Z \
	--out outfile \
	--verbose \
	--stream_stdout \
	--n_min 0

and get this error:

list index out of range
Traceback (most recent call last):
  File "mtag.py", line 1382, in <module>
    mtag(args)
  File "mtag.py", line 1206, in mtag
    mtag_path = re.findall(".*/",__file__)[0]
IndexError: list index out of range

I attached the heads of the two files.

thanks

temp.txt
temp2.txt
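The IndexError comes from mtag.py locating its own directory via `re.findall(".*/",__file__)[0]` (line 1206 in the traceback), which returns an empty list when the script is invoked with no path prefix, e.g. `python mtag.py` from inside the mtag directory. A minimal demonstration of the regex behaviour; invoking the script with an explicit path such as `python ./mtag.py` or a full path sidesteps it:

```python
import re

# __file__ has no "/" when mtag.py is run without a path prefix,
# so findall returns an empty list and indexing [0] raises IndexError.
assert re.findall(".*/", "mtag.py") == []

# With any directory component, the greedy pattern matches everything
# up to the last slash, which is what mtag.py expects to extract.
assert re.findall(".*/", "/home/user/mtag/mtag.py") == ["/home/user/mtag/"]
```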

P-values from MTAG

My current project aims at finding genetic variants significantly associated with two traits that are highly correlated phenotypically and genetically (0.86 and 0.75, respectively).

I have run two individual GWAS, one for each trait. I have now run MTAG using the summary statistics from both GWAS to see if I identify more significant SNPs than when running the individual GWASs. The samples are 99% the same (there are some missing samples for the first trait), with an approximate sample size of 65k.

In my original GWAS results, I found 37 and 800 genome-wide significant SNPs for each trait, respectively. According to the .log document from MTAG, I now have 41 and 845, respectively. I was expecting values like these because of the sample overlap.

However, when I open the results tables with the new summary statistics for both traits and check how many SNPs are genome-wide significant (according to the "mtag_pval" column), I find far more (1,580 and 7,630, respectively). Again, considering the sample overlap, I would think these results are not correct. Why do you think I am getting these discrepancies between the .log file and the output tables? Which figure is correct?
You can find the .log file here:
MTAG2.log
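For reference, one way to count genome-wide significant SNPs directly from an MTAG output table (the table contents here are made up; `mtag_pval` is the column mentioned above):

```python
import pandas as pd

# Demo table standing in for an MTAG *_trait_1.txt output (values made up;
# a real table would be read with pd.read_csv(path, sep="\t")).
res = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3", "rs4"],
                    "mtag_pval": [4e-9, 5e-8, 1e-3, 2e-10]})

# Conventional genome-wide significance threshold, strictly less-than.
n_sig = (res["mtag_pval"] < 5e-8).sum()
print(n_sig)  # 2
```

Note that counting raw SNPs this way will generally exceed the number of independent loci, since nearby SNPs in LD are all counted; clumping (e.g. in plink) before counting may account for part of such discrepancies.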

results quite different from plink results

I am analyzing several related sub-phenotypes in one GWAS cohort, with substantial sample overlap between them. I first ran GWAS analyses in plink and then ran MTAG on the plink summary statistics for these sub-phenotypes. At least for the top significant loci, the MTAG results were quite different from the plink results. For example, SNPs that reached genome-wide significance in the MTAG results for one sub-phenotype were only at the p = 1e-3 or 1e-2 level in the plink results for each phenotype, and SNPs close to genome-wide significance in the plink results were likewise only at the p = 1e-3 level in MTAG. What might have gone wrong in my analyses? Thank you very much!

Reporting weights and correlation matrix in MTAG log file

It would be helpful for diagnosing problems with the software if the log file reported two additional things.

  1. The log file already reports Omega, it would be great if it also reported the correlation matrix implied by Omega.

  2. It also would be helpful if it reported the weights assigned to each trait. This could be the weight associated with the max N or the mean N.
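For point 1, the implied correlation matrix is a standard covariance-to-correlation conversion; a sketch with a made-up 2-trait Omega:

```python
import numpy as np

# Hypothetical 2x2 Omega (per-SNP genetic covariance matrix).
omega = np.array([[0.8, 0.3],
                  [0.3, 0.5]])

# Covariance-to-correlation: R = D^-1 Omega D^-1, where D is the
# diagonal matrix of standard deviations (square roots of the diagonal).
d = np.sqrt(np.diag(omega))
corr = omega / np.outer(d, d)
print(np.round(corr, 3))
```

The same transformation could be applied to the Sigma hat the log already prints.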

Key error if running FDR step separately but have custom name for 'n' column

Hi,

I downloaded and ran MTAG today.

First I calculated test statistics
python mtag-master/mtag.py --sumstats eur_dec28_2017_maf01_info6.results_nefff_pos2.mtag,data2_sumstats --snp_name SNP --z_name z --beta_name BETA --se_name SE --n_name N --eaf_name MAF --a1_name A1 --a2_name A2 --chr_name CHR --bpos_name BP --drop_ambig_snps --fdr --skip_mtag --out ./pt_md

This step ran fine. However when I tried doing FDR

python mtag-master/mtag.py --fdr --skip_mtag --n_approx --out ./pt_md

I got an error:

"Traceback (most recent call last):
File "mtag-master/mtag.py", line 1503, in
N_mat[:,t] = df_d[t]['n']
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/pandas/core/frame.py", line 2059, in getitem
return self._getitem_column(key)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/pandas/core/frame.py", line 2066, in _getitem_column
return self._get_item_cache(key)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/pandas/core/generic.py", line 1386, in _get_item_cache
values = self._data.get(item)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/pandas/core/internals.py", line 3543, in get
loc = self.items.get_loc(item)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 132, in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)
File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)
File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)
KeyError: 'n'
"

Notice that in the first command my input was --n_name N. This error occurred because mtag was looking through the trait_1/trait_2 files for a column called 'n', but obviously none would be there because I was using a column called 'N'.

I worked around the problem by just going to line 1503 in mtag.py and changing "N_mat[:,t] = df_d[t]['n']" to "N_mat[:,t] = df_d[t]['N']" , after which the program ran successfully.

Therefore I guess I could have avoided this if I just ran the FDR step with the initial mtag command, or if I just modified the input files. Anyway I just thought I would document this.
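An alternative workaround that avoids patching mtag.py would be to lowercase the sample-size column in the trait files the FDR step reads; a sketch (the frame here stands in for a real trait file, which would be read and written with pandas):

```python
import pandas as pd

# Demo frame standing in for a trait_1/trait_2 file with an upper-case
# sample-size column, as produced by the first command above.
df = pd.DataFrame({"SNP": ["rs1", "rs2"], "N": [50000, 50000]})

# Rename N -> n so the hard-coded df_d[t]['n'] lookup at line 1503 succeeds.
df = df.rename(columns={"N": "n"})
print(list(df.columns))  # ['SNP', 'n']
```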

Thanks!
Adam

error in log file

I applied MTAG (https://github.com/omeed-maghzian/mtag) to some public data: http://csg.sph.umich.edu//abecasis/public/lipids2013/. I chose the Total Cholesterol and Triglycerides data to test MTAG.

When I run MTAG, the log file shows an error; please see the log file below. I only added a column z = Beta/SE to the MTAG input file from the original data.
(1) Is the z-value calculation correct?
(2) Is the N value correct?
(3) The error in the log is "ERROR converting summary statistics". Could you explain why converting the summary statistics fails?

The original GWAS data columns are list below:
SNP_hg19 | Marker name in build hg19.
rsid | Marker name in rsid format.
A1 | Effect allele.
A2 | Other allele.
Beta | Effect size.
SE | Standard Error for Beta.
N | The number of individuals analyzed for this marker.
P-value | P-value after doing genomic control.
Freq.A1.1000G.EUR | Frequency of allele A1 from 1000G EUR sample.

Log file:

2018/04/06/12:23:11 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/04/06/12:23:11 PM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/04/06/12:23:11 PM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/04/06/12:23:11 PM Interpreting column names as follows:
2018/04/06/12:23:11 PM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/04/06/12:23:11 PM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/04/06/12:23:16 PM Read 2446981 SNPs from --sumstats file.
Removed 805 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2446176 SNPs remain.
2018/04/06/12:23:17 PM Removed 0 SNPs with duplicated rs numbers (2446176 SNPs remain).
2018/04/06/12:23:18 PM Removed 33274 SNPs with N < 63063.3333333 (2412902 SNPs remain).
2018/04/06/12:24:37 PM
ERROR converting summary statistics:

2018/04/06/12:24:37 PM Traceback (most recent call last):
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 718, in munge_sumstats
check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 372, in check_median
raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.

2018/04/06/12:24:37 PM
Conversion finished at Fri Apr 6 12:24:37 2018
2018/04/06/12:24:37 PM Total time elapsed: 1.0m:26.4s
2018/04/06/12:24:37 PM WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.
Traceback (most recent call last):
File "mtag.py", line 1348, in
mtag(args)
File "mtag.py", line 1194, in mtag
DATA, args = load_and_merge_data(args)
File "mtag.py", line 229, in load_and_merge_data
GWAS_d[p], sumstats_format[p] = _perform_munge(args, GWAS_d[p], gwas_dat_gen, p)
File "mtag.py", line 149, in _perform_munge
munged_results = munge_sumstats.munge_sumstats(argnames, write_out=False, new_log=False)
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 718, in munge_sumstats
check_median(dat.SIGNED_SUMSTAT, signed_sumstat_null, 0.1, sign_cname))
File "/mnt/speliotes-lab/Software/MTAG/mtag-master/ldsc_mod/munge_sumstats.py", line 372, in check_median
raise ValueError(msg.format(F=name, M=expected_median, V=round(m, 2)))
ValueError: WARNING: median value of SIGNED_SUMSTATS is 0.71 (should be close to 0.0). This column may be mislabeled.
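On question (1): z = Beta/SE is the usual construction, and its median across SNPs should sit near 0, so the 0.71 reported above suggests the signed column is mislabeled or otherwise off (e.g. built from odds ratios rather than betas). A toy version of the check the munging step performs (values made up):

```python
import numpy as np

# Hypothetical effect sizes and standard errors for a handful of SNPs.
beta = np.array([0.02, -0.01, 0.005, -0.03])
se = np.array([0.01, 0.01, 0.01, 0.01])

# z = beta / se; across many SNPs, effects in both directions should
# roughly cancel, leaving a median near 0.
z = beta / se
print(np.median(z))  # -0.25
```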

Memory Error with many traits

Hi, I'm trying to run MTAG on 93 traits simultaneously...

After munging and merging, the log was the following:

.....
2018/10/02/02:28:26 PM Dropped 0 SNPs due to strand ambiguity, 1915103 SNPs remain in intersection after merging trait93
2018/10/02/02:28:26 PM ... Merge of GWAS summary statistics complete. Number of SNPs: 2258212
2018/10/02/02:30:42 PM Using 1915103 SNPs to estimate Omega (0 SNPs excluded due to strand ambiguity)
2018/10/02/02:30:42 PM Estimating sigma..
2018/10/03/12:43:54 AM Checking for positive definiteness ..
2018/10/03/12:43:55 AM Sigma hat:
[[ 1.052 0.41 0.022 ... 0.039 -0.024 -0.135]
[ 0.41 1.048 0.034 ... 0.102 -0.064 -0.048]
[ 0.022 0.034 1.043 ... -0.025 0.801 0.059]
...
[ 0.039 0.102 -0.025 ... 1.018 -0.006 0.055]
[-0.024 -0.064 0.801 ... -0.006 1.045 0.031]
[-0.135 -0.048 0.059 ... 0.055 0.031 1.003]]
2018/10/03/12:43:55 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPs. MTAG may not perform well in this situation.
2018/10/03/12:43:56 AM Beginning estimation of Omega ...
2018/10/03/12:50:34 AM Using GMM estimator of Omega ..
2018/10/03/12:51:30 AM
Traceback (most recent call last):
File "/home/project/mtag/mtag.py", line 1514, in
mtag(args)
File "/home/project/mtag/mtag.py", line 1307, in mtag
args.omega_hat = estimate_omega(args, Zs[not_SA], Ns[not_SA], args.sigma_hat)
File "/home/project/mtag/mtag.py", line 715, in estimate_omega
return _posDef_adjustment(gmm_omega(Zs,Ns,sigma_LD))
File "/home/project/mtag/mtag.py", line 607, in gmm_omega
N_mats = np.sqrt(np.einsum('mp,mq->mpq', Ns,Ns))
MemoryError
2018/10/03/12:51:30 AM Analysis terminated from error at Wed Oct 3 00:51:30 2018
2018/10/03/12:51:30 AM Total time elapsed: 12.0h:51.0m:0.25s

So I suppose some of the matrices used in the GMM calculation of Omega become incredibly big. This also happens quite fast (~1 minute between "Using GMM estimator of Omega .." and the error message).
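A back-of-the-envelope check supports this: the `np.einsum('mp,mq->mpq', Ns, Ns)` call at the point of failure materializes an m x p x p float64 array, which for the run above is far beyond ordinary RAM:

```python
# Memory required by the m x p x p array built in gmm_omega
# (figures taken from the log above: 1,915,103 SNPs, 93 traits).
m, p = 1_915_103, 93
bytes_needed = m * p * p * 8           # float64 = 8 bytes per element
print(f"{bytes_needed / 1e9:.0f} GB")  # 133 GB
```

Chunking that einsum over blocks of SNPs, or running MTAG on smaller groups of traits, would keep the allocation manageable.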

Do you have any suggestions on how to proceed from this?

Would it make sense to first estimate the genetic correlations, divide all traits into correlation blocks, and then run MTAG including only the traits within each block? If yes, is there any tool you would suggest for that?

Thank you so much for your help!

Cheers,
Robbie

Beta value from mtag results.

When running MTAG I have noticed that I'm getting much lower beta values (mtag_beta) for each trait.
For example, for one trait a SNP has an effect of -0.09, but the mtag_beta for that same trait is -0.01.
This might of course depend on many things, e.g. the traits used.
After reading the MTAG paper (and supplementary information) and the tutorial on GitHub, I'm no closer to understanding how to interpret mtag_beta.

When running a single trait in MTAG (this is actually possible without getting warning or error messages), the mtag_beta is also much lower than the original beta value for that single trait. Would you not expect mtag_beta to be identical to the trait's beta value when running only one trait?

Here is the log file from the single trait MTAG run:
ST.txt

Divide by zero error

Hi All,

Really love this project. I'm coming back to some analyses that I previously ran with v1.0.3, but now the same analysis is throwing an error in v1.0.6 and v1.0.7. Another member of my lab independently ran into the same thing (fresh install of python & MTAG)- log file attached: MTAG_test2.log

Briefly, when it gets to the point where it's time to write out Trait 2, it throws an error:

File "/scratch/dbaranger/mtag/mtag.py", line 762, in save_mtag_results
out_df['mtag_beta'] = mtag_betas[:,p] / weights
FloatingPointError: divide by zero encountered in true_divide

So far, I can reproduce the error when the traits have fairly different mean chi^2, but not if they're similar (within ~0.2 of each other). Of course, if the traits are very different they don't belong in MTAG together, but maybe the software could throw a warning or error message? I was also under the impression that it would be valid to use differently powered GWAS if the goal was a meta-analysis of them (i.e., --perfect_gencov --equal_h2), though perhaps that's incorrect?

Do you have any advice or insight?
Thanks in advance!

David

Singular matrix

I'm not very familiar with matrix calculations. The LDSC part of the script seems to run fine, but there are negatives in my sigma hat matrix and I get the error 'Singular matrix'. Would you know what might be the problem? This is my log file:

`
Calling ./mtag.py
--n-min 0
--sumstats /home/jpasman/LDpred/MTG/sumstatsalc_clean,/home/jpasman/LDpred/MTG/sumstatscpd_clean,/home/jpasman/LDpred/MTG/sumstatscpd_clean
--out ./cpdalccan

2018/05/03/11:21:15 AM Beginning MTAG analysis...
2018/05/03/11:21:29 AM Read in Trait 1 summary statistics (9647583 SNPs) from /home/jpasman/LDpred/MTG/sumstatsalc_clean ...
2018/05/03/11:21:29 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:21:29 AM Munging Trait 1 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/05/03/11:21:29 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:21:29 AM Interpreting column names as follows:
2018/05/03/11:21:29 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/05/03/11:21:30 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/05/03/11:21:44 AM Read 9647583 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
9647583 SNPs remain.
2018/05/03/11:21:51 AM Removed 0 SNPs with duplicated rs numbers (9647583 SNPs remain).
2018/05/03/11:21:52 AM Removed 0 SNPs with N < 74754.0 (9647583 SNPs remain).
2018/05/03/11:22:21 AM Median value of SIGNED_SUMSTATS was -0.0038335, which seems sensible.
2018/05/03/11:22:22 AM Dropping snps with null values
2018/05/03/11:22:23 AM
Metadata:
2018/05/03/11:22:23 AM Mean chi^2 = 1.122
2018/05/03/11:22:23 AM Lambda GC = 1.101
2018/05/03/11:22:23 AM Max chi^2 = 132.526
2018/05/03/11:22:23 AM 414 Genome-wide significant SNPs (some may have been removed by filtering).
2018/05/03/11:22:23 AM
Conversion finished at Thu May 3 11:22:23 2018
2018/05/03/11:22:23 AM Total time elapsed: 53.96s
2018/05/03/11:22:36 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:22:36 AM Munging of Trait 1 complete. SNPs remaining: 9647583
2018/05/03/11:22:36 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/05/03/11:22:55 AM Read in Trait 2 summary statistics (2415427 SNPs) from /home/jpasman/LDpred/MTG/sumstatscpd_clean ...
2018/05/03/11:22:55 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:22:55 AM Munging Trait 2 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/05/03/11:22:55 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:22:55 AM Interpreting column names as follows:
2018/05/03/11:22:55 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/05/03/11:22:55 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/05/03/11:22:59 AM Read 2415427 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2415427 SNPs remain.
2018/05/03/11:23:00 AM Removed 0 SNPs with duplicated rs numbers (2415427 SNPs remain).
2018/05/03/11:23:00 AM Removed 0 SNPs with N < 49368.6666667 (2415427 SNPs remain).
2018/05/03/11:23:08 AM Median value of SIGNED_SUMSTATS was 0.0244989, which seems sensible.
2018/05/03/11:23:08 AM Dropping snps with null values
2018/05/03/11:23:08 AM
Metadata:
2018/05/03/11:23:08 AM Mean chi^2 = 1.051
2018/05/03/11:23:08 AM Lambda GC = 1.055
2018/05/03/11:23:08 AM Max chi^2 = 152.803
2018/05/03/11:23:08 AM 132 Genome-wide significant SNPs (some may have been removed by filtering).
2018/05/03/11:23:08 AM
Conversion finished at Thu May 3 11:23:08 2018
2018/05/03/11:23:08 AM Total time elapsed: 12.8s
2018/05/03/11:23:11 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:23:11 AM Munging of Trait 2 complete. SNPs remaining: 2415427
2018/05/03/11:23:11 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/05/03/11:23:18 AM Read in Trait 3 summary statistics (2415427 SNPs) from /home/jpasman/LDpred/MTG/sumstatscpd_clean ...
2018/05/03/11:23:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:23:18 AM Munging Trait 3 <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><
2018/05/03/11:23:18 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:23:18 AM Interpreting column names as follows:
2018/05/03/11:23:18 AM snpid: Variant ID (e.g., rs number)
n: Sample size
a1: Allele 1, interpreted as ref allele for signed sumstat.
pval: p-Value
a2: Allele 2, interpreted as non-ref allele for signed sumstat.
z: Directional summary statistic as specified by --signed-sumstats.

2018/05/03/11:23:18 AM Reading sumstats from provided DataFrame into memory 10000000 SNPs at a time.
2018/05/03/11:23:21 AM Read 2415427 SNPs from --sumstats file.
Removed 0 SNPs with missing values.
Removed 0 SNPs with INFO <= None.
Removed 0 SNPs with MAF <= 0.01.
Removed 0 SNPs with out-of-bounds p-values.
Removed 0 variants that were not SNPs. Note: strand ambiguous SNPs were not dropped.
2415427 SNPs remain.
2018/05/03/11:23:22 AM Removed 0 SNPs with duplicated rs numbers (2415427 SNPs remain).
2018/05/03/11:23:23 AM Removed 0 SNPs with N < 49368.6666667 (2415427 SNPs remain).
2018/05/03/11:23:30 AM Median value of SIGNED_SUMSTATS was 0.0244989, which seems sensible.
2018/05/03/11:23:30 AM Dropping snps with null values
2018/05/03/11:23:30 AM
Metadata:
2018/05/03/11:23:30 AM Mean chi^2 = 1.051
2018/05/03/11:23:30 AM Lambda GC = 1.055
2018/05/03/11:23:30 AM Max chi^2 = 152.803
2018/05/03/11:23:30 AM 132 Genome-wide significant SNPs (some may have been removed by filtering).
2018/05/03/11:23:30 AM
Conversion finished at Thu May 3 11:23:30 2018
2018/05/03/11:23:30 AM Total time elapsed: 12.75s
2018/05/03/11:23:33 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
2018/05/03/11:23:33 AM Munging of Trait 3 complete. SNPs remaining: 2415427
2018/05/03/11:23:33 AM <><><<>><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

2018/05/03/11:23:48 AM Trait 2 summary statistics: 2200165 SNPs remaining merging with previous traits.
2018/05/03/11:23:54 AM Trait 3 summary statistics: 2200165 SNPs remaining merging with previous traits.
2018/05/03/11:23:58 AM ... Merge of GWAS summary statistics complete. Number of SNPs: 2200165
2018/05/03/11:24:01 AM Using 1863953 SNPs to estimate Omega (336212 SNPs excluded due to strand ambiguity)
2018/05/03/11:24:01 AM Estimating sigma..
2018/05/03/11:25:32 AM Checking for positive definiteness ..
2018/05/03/11:25:32 AM Sigma hat:
[[ 1.013 -0.003 -0.003]
[-0.003 1.008 1.008]
[-0.003 1.008 1.008]]
2018/05/03/11:25:32 AM Mean chi^2 of SNPs used to estimate Omega is low for some SNPs. MTAG may not perform well in this situation.
2018/05/03/11:25:32 AM Beginning estimation of Omega ...
2018/05/03/11:25:32 AM Using GMM estimator of Omega ..
2018/05/03/11:25:32 AM Checking for positive definiteness ..
2018/05/03/11:25:32 AM Completed estimation of Omega ...
2018/05/03/11:25:32 AM Beginning MTAG calculations...
2018/05/03/11:25:41 AM Singular matrix
Traceback (most recent call last):
File "/home/jpasman/LDpred/MTG/mtag/mtag.py", line 1348, in
mtag(args)
File "/home/jpasman/LDpred/MTG/mtag/mtag.py", line 1242, in mtag
mtag_betas, mtag_se = mtag_analysis(Zs, Ns, args.omega_hat, args.sigma_hat)
File "/home/jpasman/LDpred/MTG/mtag/mtag.py", line 714, in mtag_analysis
inv_xx = np.linalg.inv(xx)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 513, in inv
ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
File "/sara/sw/python-2.7.9/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 90, in _raise_linalgerror_singular
raise LinAlgError("Singular matrix")
LinAlgError: Singular matrix`
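Worth noting: the --sumstats argument above lists sumstatscpd_clean twice, so traits 2 and 3 are identical, and the estimated matrices contain duplicated rows/columns (visible in the Sigma hat printed in the log). A matrix with two identical rows is exactly singular, which alone is enough to reproduce this error:

```python
import numpy as np

# Sigma-hat-like matrix with duplicated rows/columns, as produced when
# the same sumstats file is supplied twice (values echo the log above).
sigma = np.array([[1.013, -0.003, -0.003],
                  [-0.003, 1.008, 1.008],
                  [-0.003, 1.008, 1.008]])

print(np.linalg.matrix_rank(sigma))  # 2, not 3: the matrix is singular
try:
    np.linalg.inv(sigma)
except np.linalg.LinAlgError as e:
    print(e)  # Singular matrix
```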

P-values outside float range

Error converting summary statistics

TypeError: ufunc 'chdtri' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Input: Two GWASs with 18,593,659 and 18,593,553 SNPs.
P-value ranges = [1.4e-1019, 1.0], [2.2e-1316, 1.0]
Running MTAG with default settings fails. See log below.
allpvalues.log

I suspect that this is due to p-values being smaller than the smallest possible float value in python (p<2.23e-308)
See https://stackoverflow.com/questions/1835787/what-is-the-range-of-values-a-float-can-have-in-python/1839009

Solution:
Omit all SNPs with p < 2.23e-308 from the input files. MTAG then runs successfully. Omitted SNPs can be logged in a separate file with unmodified summary statistics.

New P-value ranges = [1.8e-302, 1.0], [2.8e-305, 1.0]
New log file included below.
minPval_andAbove.log
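The omission step described above can be scripted; `np.finfo(float).tiny` is the ~2.23e-308 bound mentioned (column name `P` and the values are assumptions):

```python
import numpy as np
import pandas as pd

# Demo sumstats with one underflow-range p-value (values made up;
# a real file would be read with pd.read_csv).
df = pd.DataFrame({"SNP": ["rs1", "rs2", "rs3"],
                   "P": [1.4e-320, 1e-8, 0.5]})

tiny = np.finfo(float).tiny   # smallest normal float, ~2.225e-308
dropped = df[df["P"] < tiny]  # log these to a separate file if desired
kept = df[df["P"] >= tiny]
print(len(kept), len(dropped))  # 2 1
```

`dropped` can be written out separately with the unmodified summary statistics, as suggested above.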

Script won't run when snpid=variant id

Hi,

I successfully ran your script with snpid=rs number. However, in my data, rsid is non-unique, while variant (chr:pos:a1:a2) is unique. Unfortunately, the script would not run when I used this as the input snpid. First, I left rsid in the data (with snpid=variant) but the script said 2 SNP columns were detected, and the program stopped. Next, I removed rsid, but the program returned this error: "After merging with reference panel LD, 0 SNPs remain." Any advice? Thanks.

INFO filtering without an INFO field

Just bumped into this, but it seems that if someone attempts to filter on INFO, but they have not provided MTAG with an INFO column, this should error out rather than "successfully" removing 0 SNPs.

If freq is not available,

Hello,

I wonder if I can use MTAG with a GWAS summary file that does not have allele-frequency information. I downloaded several GWAS summary data sets, and one of them does not have frequency information.
Also, do you expect the frequency of a SNP to be computed from only controls, only cases, or both?

Best,
Jaeyoon
