carlaml / ldpred-funct

Here we develop the software for the method LDpredfunct described in https://www.biorxiv.org/content/early/2018/07/24/375337

License: GNU General Public License v3.0

Python 100.00%

ldpred-funct's Issues

Question about input file

Dear,

I am trying to apply LDpred-funct to generate PRSs from summary statistics. I have already successfully run S-LDSC on 1000G_Phase1_baseline_ldscores.tgz, weights_hm3_no_hla.tgz, 1000G_Phase1_frq.tgz, and 1000G_Phase1_cell_type_groups.tgz to generate the FUNCTFILE used as input to LDpred-funct.

In the LDpred-funct package, you used TSI_simulated_trait.txt as the phenotype file when calculating PRSs. What kind of phenotype file should I use in this scenario, and how should I generate a proper one?

Thanks in advance!

Python3

Hi Carla, I'm trying to compute a polygenic risk score using LDpred-funct and I'm having some trouble adapting the scripts to Python 3. Sorry, I'm really new to all of this. Do you have a version of these scripts that is compatible with Python 3?

Thanks in advance!
Maria

Separate genotyping files used for LD-reference and validation (PRS)

Hi,

When using the flag --gf [PLINK FILES], the PLINK files are used both as the LD-reference panel and for validation, right?

Are there any options (or flags) that separate these two functions?
Let's say I want to use the 1000 Genomes European PLINK files as the LD-reference panel, but the UK Biobank PLINK files for validation.

(Sorry, maybe this flag is mentioned somewhere in the GitHub pages, but I cannot find it. Please point me to it.)

Thank you very much,

Restu

LDSC Per Snp Heritability

In the documentation and the paper you suggest using LDSC to get per-SNP heritability values. However, I am only able to obtain the total SNP heritability of a trait, not a separate value for each SNP. Could you point me to the exact LDSC documentation that produces these per-SNP heritabilities? Also, should I be generating partitioned heritability based on functional annotations? Thank you.

Can't run through the test data provided

I can't run the example data in the test folder; the code fails with the error below. I have installed the software correctly.
It looks like the parse_sum_stats_standard_ldscore function does not accept the FUNCT_same_h2gi argument.

Could you check whether the test data runs on your side?

Traceback (most recent call last):
  File "ldpredfunct.py", line 234, in <module>
    main()
  File "ldpredfunct.py", line 174, in main
    outfile=p_dict['coord'] + "_snps_NaN.txt", FUNCT_FILE=p_dict["FUNCT_FILE"],CHISQ=p_dict['chisq'],FUNCT_same_h2gi=p_dict['FUNCT_same_h2gi'],h2g=p_dict['H2'])
TypeError: parse_sum_stats_standard_ldscore() got an unexpected keyword argument 'FUNCT_same_h2gi'
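
A TypeError of this kind means the caller passes a keyword that the function definition does not declare, which can happen when the calling script and the module defining the function come from different versions. The following is a minimal sketch (Python 3, standard library only) of how one might confirm which keywords a function accepts before calling it. The function `parse_sum_stats_stub` and its parameter list are hypothetical stand-ins for illustration, not the real `parse_sum_stats_standard_ldscore`.

```python
import inspect


# Hypothetical stand-in for an older function definition that lacks the
# newer FUNCT_same_h2gi keyword. This is NOT the real LDpred-funct function.
def parse_sum_stats_stub(filename=None, outfile=None, FUNCT_FILE=None,
                         CHISQ=None, h2g=None):
    return filename


def missing_keywords(func, kwargs):
    """Return the keyword names in kwargs that func does not accept."""
    params = inspect.signature(func).parameters
    # A **kwargs parameter accepts any keyword, so nothing can be missing.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return sorted(k for k in kwargs if k not in params)


# Keywords the caller intends to pass (values are placeholders).
call_kwargs = {"outfile": "coord_snps_NaN.txt", "FUNCT_same_h2gi": True, "h2g": 0.2}
unexpected = missing_keywords(parse_sum_stats_stub, call_kwargs)
print(unexpected)
```

Running the same check against the function actually imported by ldpredfunct.py would show whether the installed module matches the script that calls it.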

Questions about LD files to use

I am trying to use LDpred-funct to generate a polygenic risk score from summary statistics. I am following the tutorial reported here, but I have some questions.

First, I used S-LDSC to calculate the regression coefficients. The GitHub tutorial page of S-LDSC suggests using, as baseline model LD scores, those in the file 1000G_Phase1_baseline_ldscores.tgz downloaded from https://data.broadinstitute.org/alkesgroup/LDSCORE/

I managed to run S-LDSC and obtain the result files. However, for the step that calculates the per-SNP heritability, the present tutorial says to use the baselineLD.*.annot.gz files in order to extract the so-called X matrices.

  1. From which compressed tgz file should the baselineLD.*.annot.gz files be taken? I see several candidates on the download page:
    1000G_Phase3_baselineLD_ldscores.tgz
    1000G_Phase3_baselineLD_v1.1_ldscores.tgz, 1000G_Phase3_baselineLD_v2.0_ldscores.tgz, 1000G_Phase3_baselineLD_v2.1_ldscores.tgz

  2. In the S-LDSC step, when calculating the regression coefficients, I used a different baseline model LD file from the baselineLD.*.annot.gz files listed above. Is this correct, or should I have used one of the baselineLD files above (and if so, which one)?

  3. The tutorial says to divide the regression coefficients by the parameter h2g, which is taken from the *.log file generated by S-LDSC. In the *.log file I don't see any h2g parameter, only an h2 parameter. Is that the correct one to use?
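
The step described in question 3 can be sketched as follows, under one reading of the tutorial: the per-SNP heritability of each SNP is its row of the annotation matrix X multiplied by the S-LDSC regression coefficients, with the coefficients divided by h2g. All numbers below are made-up toy values, and the variable names are illustrative; in practice X would be read from the .annot.gz files and the coefficients from the S-LDSC results file. Whether the h2 value in the log is the right divisor is exactly the open question above, so this sketch does not settle it.

```python
import numpy as np

# Toy annotation matrix X: 4 SNPs (rows) x 3 annotations (columns).
# The first column plays the role of a base annotation that contains every SNP.
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])

# Toy regression coefficients, one per annotation (assumed values).
tau = np.array([1e-7, 5e-7, 2e-7])

# Toy total heritability estimate (assumed value).
h2g = 0.2

# Normalised per-SNP heritability: each SNP sums the scaled coefficients
# of the annotations it belongs to.
per_snp_h2 = X @ (tau / h2g)
print(per_snp_h2)
```

Each row of X and each coefficient must refer to the same annotation in the same order, which is why question 2 (using the same baseline model in both steps) matters for this computation.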

Thank you in advance for the answers to these questions

"UnboundLocalError" during Step 2 when running LDpred-funct

Thanks for the work you've done developing and maintaining LDpred-funct! I'm running into an issue that occurs during "Step 2: Compute posterior mean effect sizes", after the coordination step. Specifically, I get UnboundLocalError: local variable 'ldpred_effect_sizes' referenced before assignment when trying to calculate posterior means for chromosome 1. The error seems to come from line 298, ldpred_effect_sizes.extend(updated_betas), in LDpredfunct_bayes_shrink.py. I'm not sure what is causing this error or how to resolve it, and I'd appreciate any help in fixing it.

I've attached the log file in case it would be helpful for you to figure out what the issue is. Thank you so much!
LDpred_funct.log
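
In general, Python raises UnboundLocalError when a function uses a local variable on a code path where it was never assigned, for example when the assignment sits inside a branch that did not run. The sketch below reproduces that pattern with hypothetical function names; it illustrates the error class only and is not a diagnosis of LDpredfunct_bayes_shrink.py, whose actual control flow is not shown in this report.

```python
def collect_effects(chunks, initialise):
    # The list is only created when `initialise` is true, so the
    # .extend() call below fails on the path where it is false.
    if initialise:
        effect_sizes = []
    for updated_betas in chunks:
        effect_sizes.extend(updated_betas)
    return effect_sizes


def collect_effects_safe(chunks):
    # Creating the list unconditionally before the loop avoids the error.
    effect_sizes = []
    for updated_betas in chunks:
        effect_sizes.extend(updated_betas)
    return effect_sizes


try:
    collect_effects([[0.1]], initialise=False)
    error_name = None
except UnboundLocalError as err:
    error_name = type(err).__name__

print(error_name)
print(collect_effects_safe([[0.1, 0.2], [0.3]]))
```

Checking which condition guards the first assignment of the variable named in the error message is usually the quickest way to find the path that was skipped.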

Prediction R2 exceeding h2g

I am testing on a BMI trait with h2g = 0.2011.
However, LDpred-funct produces a cross-validation average PRS R2 of 0.3144, which exceeds h2g.
How is this possible?

Index out of bounds

Hello,
For some reason, if the last SNP in the summary statistics (SS) file is included in the score, there is an index out of bounds error.

Traceback (most recent call last):
  File "LDpred-funct/ldpredfunct.py", line 234, in <module>
    main()
  File "LDpred-funct/ldpredfunct.py", line 181, in main
    method="STANDARD_FUNCT", skip_ambiguous=p_dict['skip_ambiguous'])
  File "LDpred-funct/coord_genotypes_ldpredfunct_v1_2.py", line 595, in coordinate_genot_ss
    ss_flips = ss_flips[ok_indices['ss']] ### record flips
IndexError: index 174814 is out of bounds for axis 0 with size 174814

I have a quick workaround, but this doesn't solve the problem:


--- a/coord_genotypes_ldpredfunct_v1_2.py
+++ b/coord_genotypes_ldpredfunct_v1_2.py
@@ -564,9 +564,12 @@ def coordinate_genot_ss(genotype_filename=None,
                         continue
 
             # everything seems ok.
-            ok_indices['g'].append(g_i)
-            ok_indices['ss'].append(ss_i)
-            ok_nts.append(g_nt)
+            if ss_i < len(ss_indices):
+                ok_indices['ss'].append(ss_i)
+                ok_indices['g'].append(g_i)
+                ok_nts.append(g_nt)
+            else:
+                print "Skipping SNP because of index error"
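
The IndexError reports index 174814 against an array of size 174814, that is, an index exactly one past the last valid position. The sketch below uses a toy array and hypothetical variable names to reproduce that situation with NumPy and to filter the indices in the same spirit as the patch. Like the patch, it only hides the symptom and does not explain why the out-of-range index was produced.

```python
import numpy as np

# Toy stand-in for the flip flags: 4 entries, so valid indices are 0..3.
ss_flips = np.array([False, True, False, True])

# Toy index list in which the last entry equals the array size.
ok_ss_indices = [0, 2, 4]

# Indexing with the unfiltered list fails the same way as in the traceback.
try:
    ss_flips[ok_ss_indices]
    raised = False
except IndexError:
    raised = True

# Keep only indices that fall inside the array before indexing.
valid = [i for i in ok_ss_indices if 0 <= i < len(ss_flips)]
kept_flips = ss_flips[valid]

print(raised, valid, kept_flips)
```

Note that dropping an index from one list without dropping the matching entries from any parallel lists would misalign them, which is why the patch appends to all three lists inside the same condition.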
 


Can't correctly install LDpred on a server

Hi,

Thanks for sharing this tool! I have been struggling to install LDpred-funct and can't get it to work. When I try to call the basic help function, it prints a 'None' message.
Is it compatible with Python 2 and 3? I have tested both but so far have not been successful...

Many thanks for your help!

Best,

Salim.
