carlaml / ldpred-funct

Here we develop the software for the method LDpredfunct described in https://www.biorxiv.org/content/early/2018/07/24/375337

License: GNU General Public License v3.0

Python 100.00%

ldpred-funct's Issues

Unable to download 1000G_Phase3_baselineLD_ldscores.tgz, 1000G_Phase3_baselineLD_v1.1_ldscores.tgz, 1000G_Phase3_baselineLD_v2.0_ldscores.tgz, 1000G_Phase3_baselineLD_v2.1_ldscores.tgz

Hi @carlaml ,

I cannot download any of the files below:
1000G_Phase3_baselineLD_ldscores.tgz
1000G_Phase3_baselineLD_v1.1_ldscores.tgz
1000G_Phase3_baselineLD_v2.0_ldscores.tgz
1000G_Phase3_baselineLD_v2.1_ldscores.tgz

Could you share a link where these archives are available?

Thanks

Question about input file

Dear,

I am trying to apply LDpred-funct to generate PRSs from summary statistics. I have already successfully run S-LDSC with 1000G_Phase1_baseline_ldscores.tgz, weights_hm3_no_hla.tgz, 1000G_Phase1_frq.tgz, and 1000G_Phase1_cell_type_groups.tgz to generate the FUNCTFILE used as input to LDpred-funct.

In the LDpred-funct package, you used TSI_simulated_trait.txt as the phenotype file for calculating PRSs. What kind of phenotype file should I use in this scenario, or how should I generate a proper phenotype file as input?

Thanks in advance!

Python3

Hi Carla, I'm trying to compute a polygenic risk score using LDpred-funct and I'm having some trouble adapting the scripts to Python 3. Sorry, I'm really new to all of this. Do you have a version of these scripts that is compatible with Python 3?

Thanks in advance!
Maria

Separate genotyping files used for LD-reference and validation(PRS)

Hi,

When using the flag --gf [PLINK FILES], the PLINK files are used as both the LD reference panel and the validation set, right?

Are there any options (or flags) that separate these two roles?
Let's say I want to use the 1000 Genomes European PLINK files as the LD reference panel, but the UK Biobank PLINK files as my validation set.

(Sorry, maybe you've mentioned this flag somewhere in the GitHub pages, but I cannot find it. Please point me to it.)

Thank you very much,

Restu

LDSC Per Snp Heritability

In the documentation and the paper you suggest using LDSC to get per-SNP heritability values. However, I am only able to obtain the total SNP heritability of a trait, not one value per SNP. Could you point me to the exact LDSC documentation that produces these per-SNP heritabilities? Also, should I be generating partitioned heritability based on functional annotations? Thank you.

Can't run through the test data provided

I can't run the example data in the test folder; the code fails with the error below. I have installed the software correctly.
It looks like the parse_sum_stats_standard_ldscore function does not accept the FUNCT_same_h2gi argument.

Could you check whether the test data runs for you?

Traceback (most recent call last):
  File "ldpredfunct.py", line 234, in <module>
    main()
  File "ldpredfunct.py", line 174, in main
    outfile=p_dict['coord'] + "_snps_NaN.txt", FUNCT_FILE=p_dict["FUNCT_FILE"],CHISQ=p_dict['chisq'],FUNCT_same_h2gi=p_dict['FUNCT_same_h2gi'],h2g=p_dict['H2'])
TypeError: parse_sum_stats_standard_ldscore() got an unexpected keyword argument 'FUNCT_same_h2gi'
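
A TypeError of this kind means the caller passes a keyword that the function definition does not declare, which usually happens when the two files come from different versions of the code. The following is a minimal sketch (Python 3) of how one might check this before calling. `accepts_keyword` and `parse_sum_stats_old` are hypothetical names introduced for illustration; the stand-in function is not the real parse_sum_stats_standard_ldscore.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        # Positional-only parameters cannot be passed by keyword.
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-in for an older definition that lacks FUNCT_same_h2gi.
def parse_sum_stats_old(filename=None, outfile=None, FUNCT_FILE=None, CHISQ=None):
    return filename


print(accepts_keyword(parse_sum_stats_old, "FUNCT_FILE"))       # True
print(accepts_keyword(parse_sum_stats_old, "FUNCT_same_h2gi"))  # False
```

Running the same check against the imported function in a local checkout would show whether the definition and the call site in ldpredfunct.py are out of sync.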

Questions about LD files to use

I am trying to use LDpred-funct to generate a polygenic risk score from summary statistics. I am following the tutorial reported here, but I have some questions.

First, I used S-LDSC to calculate the regression coefficients. The S-LDSC tutorial page on GitHub suggests using the baseline model LD scores in the file 1000G_Phase1_baseline_ldscores.tgz, downloaded from https://data.broadinstitute.org/alkesgroup/LDSCORE/

I managed to run S-LDSC and obtain the result files. However, for the step that calculates the per-SNP heritability, the present tutorial says to use the baselineLD.*.annot.gz files in order to extract the so-called X matrices.

From which compressed tgz file should the baselineLD.*.annot.gz files be taken? I see several possibilities on the download page:
1000G_Phase3_baselineLD_ldscores.tgz
1000G_Phase3_baselineLD_v1.1_ldscores.tgz
1000G_Phase3_baselineLD_v2.0_ldscores.tgz
1000G_Phase3_baselineLD_v2.1_ldscores.tgz
In the S-LDSC step, when calculating the regression coefficients, I used a different baseline model LD file (baselineLD.*.annot.gz) from those listed above. Is this correct, or should I have used one of the baselineLD files above (and if so, which one)?
The tutorial says to divide the regression coefficients by the parameter h2g, taken from the *.log file generated by S-LDSC. In the *.log file I don't see an h2g parameter, only an h2 parameter. Is that the correct one to use?

Thank you in advance for the answers to these questions
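
For readers following the same steps, the arithmetic described in this question (annotation values per SNP, S-LDSC regression coefficients, division by h2g) can be sketched as below. This is only an illustration under stated assumptions: the function name `per_snp_h2`, the toy numbers, and the way the pieces are combined (sum over annotations of annotation value times coefficient, then divided by h2g) are my reading of the question, not code or values from LDpred-funct or its tutorial.

```python
def per_snp_h2(annot_rows, tau, h2g):
    """Combine annotations and coefficients into one value per SNP.

    annot_rows: one list of annotation values per SNP (rows of the X matrix).
    tau:        one regression coefficient per annotation.
    h2g:        total heritability estimate used as the divisor.
    """
    if h2g <= 0:
        raise ValueError("h2g must be positive")
    values = []
    for row in annot_rows:
        if len(row) != len(tau):
            raise ValueError("each SNP needs one annotation value per coefficient")
        values.append(sum(a * t for a, t in zip(row, tau)) / h2g)
    return values


# Toy data (hypothetical): 3 SNPs, 2 annotations (a base annotation and one functional one).
annot_rows = [[1, 0], [1, 1], [1, 0]]
tau = [0.02, 0.04]
h2g = 0.2

print(per_snp_h2(annot_rows, tau, h2g))
```

The annotation columns must be in the same order as the coefficients, which is one reason the annotation files and the LD scores used in the regression need to come from the same baseline model.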

Column 5 in the output file Polygenic risk score for each individual in the validation.

Hi, I can't find column 5 in the output file "Polygenic risk score for each individual in the validation". Why is it not shown?

Reported R2 accuracy metric

Which of the R2 metrics written to the logs is the value reported in the publication?

"UnboundLocalError" during Step 2 when running LDpred-funct

Thanks for the work you've done developing and maintaining LDpred-funct! I'm running into an issue that occurs during "Step 2: Compute posterior mean effect sizes", after the coordination step. In particular, I get UnboundLocalError: local variable 'ldpred_effect_sizes' referenced before assignment when calculating posterior means for chromosome 1. The error seems to come from line 298, ldpred_effect_sizes.extend(updated_betas), in LDpredfunct_bayes_shrink.py. I'm not sure what is causing this error or how to resolve it, and I'd appreciate any help in fixing it.

I've attached the log file in case it would be helpful for you to figure out what the issue is. Thank you so much!
LDpred_funct.log
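
In general, Python raises UnboundLocalError when a local variable is used on a code path where it was never assigned, for example when the assignment sits inside a branch that was skipped. The sketch below reproduces that pattern in Python 3 with hypothetical functions (`collect_buggy`, `collect_fixed`); it is not the code from LDpredfunct_bayes_shrink.py, and which branch is skipped there is not known from this report.

```python
def collect_buggy(chunks, initialise):
    # The list is only created when `initialise` is true.
    if initialise:
        effect_sizes = []
    for chunk in chunks:
        # Fails here if the branch above was skipped.
        effect_sizes.extend(chunk)
    return effect_sizes


def collect_fixed(chunks):
    # Creating the list unconditionally makes every path safe.
    effect_sizes = []
    for chunk in chunks:
        effect_sizes.extend(chunk)
    return effect_sizes


try:
    collect_buggy([[0.1], [0.2]], initialise=False)
except UnboundLocalError as exc:
    print("buggy version failed:", exc)

print(collect_fixed([[0.1], [0.2]]))  # [0.1, 0.2]
```

Initialising the variable before the branch removes the crash, but if the skipped branch also did real work, the underlying condition still needs to be understood.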

Prediction R2 exceeding h2g

I am testing on the BMI trait, with h2g = 0.2011.
However, LDpred-funct produces a cross-validation average PRS R2 of 0.3144.
How is this possible?

Index out of bounds

Hello,
For some reason, if the last SNP in the summary statistics (SS) file is included in the score, there is an index out of bounds error.

Traceback (most recent call last):
  File "LDpred-funct/ldpredfunct.py", line 234, in <module>
    main()
  File "LDpred-funct/ldpredfunct.py", line 181, in main
    method="STANDARD_FUNCT", skip_ambiguous=p_dict['skip_ambiguous'])
  File "LDpred-funct/coord_genotypes_ldpredfunct_v1_2.py", line 595, in coordinate_genot_ss
    ss_flips = ss_flips[ok_indices['ss']] ### record flips
IndexError: index 174814 is out of bounds for axis 0 with size 174814

I have a quick workaround, but it only hides the symptom and does not fix the underlying problem:


--- a/coord_genotypes_ldpredfunct_v1_2.py
+++ b/coord_genotypes_ldpredfunct_v1_2.py
@@ -564,9 +564,12 @@ def coordinate_genot_ss(genotype_filename=None,
                         continue
 
             # everything seems ok.
-            ok_indices['g'].append(g_i)
-            ok_indices['ss'].append(ss_i)
-            ok_nts.append(g_nt)
+            if ss_i < len(ss_indices):
+                ok_indices['ss'].append(ss_i)
+                ok_indices['g'].append(g_i)
+                ok_nts.append(g_nt)
+            else:
+                print "Skipping SNP because of index error"
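
The message "index 174814 is out of bounds for axis 0 with size 174814" says the index equals the array length, one past the last valid position (size minus 1). As a standalone illustration of the same guard, here is a minimal Python 3 sketch that separates valid from invalid indices before selecting. The names `split_in_bounds`, `ss_flips`, and `ok_ss` and the toy values are hypothetical, and like the patch above this masks the off-by-one rather than explaining where the extra index comes from.

```python
def split_in_bounds(indices, size):
    """Split `indices` into those valid for a sequence of length `size` and the rest."""
    kept = [i for i in indices if 0 <= i < size]
    dropped = [i for i in indices if not 0 <= i < size]
    return kept, dropped


# Toy data: three entries, but the index list points one past the end.
ss_flips = [False, True, False]
ok_ss = [0, 2, 3]

kept, dropped = split_in_bounds(ok_ss, len(ss_flips))
selected = [ss_flips[i] for i in kept]

print(kept)      # [0, 2]
print(dropped)   # [3]
print(selected)  # [False, False]
```

Reporting the dropped indices, rather than silently skipping them, makes it easier to trace which SNP produced the invalid position.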

Can't correctly install LDpred on a server

Hi,

Thanks for sharing this tool! I have been struggling to install LDpred-funct and can't get it to work. When I call the basic help function, it prints a 'None' message.
Is it compatible with both Python 2 and Python 3? I have tested both, but so far without success.

Many thanks for your help !

Best,

Salim.