choishingwan / prsice Goto Github PK


A software package for calculating, applying, evaluating and plotting the results of polygenic risk scores

Home Page: http://prsice.info

License: GNU General Public License v3.0

R 1.27% CMake 0.09% Makefile 0.01% C++ 36.28% C 62.24% Shell 0.11%

prs gwas

prsice's Introduction

PRSice

PRSice (pronounced 'precise') is a software package for calculating, applying, evaluating and plotting the results of polygenic risk scores (PRS). PRSice can run at high resolution to provide the best-fit PRS, or provide results calculated at broad P-value thresholds, and can plot the results of either approach. It can also thin SNPs according to linkage disequilibrium and P-value ("clumping"), and can be applied across multiple traits in a single run.
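
The clumping step mentioned above can be pictured as a greedy pass over SNPs sorted by P-value. The Python sketch below is illustrative only and is not PRSice's implementation: the function name `clump`, the `r2` lookup table and the toy values are all assumptions, and the physical distance window that clumping normally also applies is left out for brevity.

```python
def clump(pvalues, r2, r2_threshold=0.1):
    """Greedy clumping sketch (hypothetical helper, not PRSice code).

    pvalues: dict mapping SNP id -> P-value
    r2: dict mapping frozenset({snp_a, snp_b}) -> squared correlation
    Returns the index SNPs that survive, most significant first.
    """
    kept = []
    removed = set()
    # Visit SNPs from most to least significant.
    for snp in sorted(pvalues, key=pvalues.get):
        if snp in removed:
            continue
        kept.append(snp)
        # Drop every remaining SNP that is in high LD with this index SNP.
        for other in pvalues:
            if other == snp or other in removed or other in kept:
                continue
            if r2.get(frozenset((snp, other)), 0.0) >= r2_threshold:
                removed.add(other)
    return kept


# Toy data (made up): rs2 is in strong LD with rs1, rs3 is independent.
toy_p = {"rs1": 1e-8, "rs2": 1e-4, "rs3": 0.02}
toy_r2 = {frozenset(("rs1", "rs2")): 0.8, frozenset(("rs2", "rs3")): 0.05}
print(clump(toy_p, toy_r2))  # ['rs1', 'rs3']
```

Here rs2 is dropped because it is correlated with the more significant rs1, while rs3 is retained as its own index SNP.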

Based on a permutation study we estimate a significance threshold of P = 0.001 for high-resolution PRS analyses - the work on this is included in our Bioinformatics paper on PRSice.

PRSice is a software package written in R and C++. PRSice runs as a command-line program with a variety of user options and is freely available for download below. It is compatible with Unix/Linux/Mac OS.

NOTE

Please refer to our website for more up-to-date instructions.

Prerequisite

GCC version 4.8.1 or higher (for C++11)
R version 3.2.3 or higher (for plotting)

Installation

You can directly download the binary files here. If you want to build PRSice from source instead, run the following (the binary file will be located in the PRSice directory):

git clone https://github.com/choishingwan/PRSice.git
cd PRSice
g++ --std=c++11 -I inc/ -isystem lib/ -DNDEBUG -O3 -march=native src/*.cpp -lz -lpthread -o PRSice

Or, if you have CMake version 3.1 or higher, you can run the following (the binary file will be located in PRSice/bin):

git clone https://github.com/choishingwan/PRSice.git
cd PRSice
mkdir build
cd build
cmake ../
make

Rosalind users

You can compile a static version using the following command

git clone https://github.com/choishingwan/PRSice.git
cd PRSice
make

Citation

If you use PRSice in any published work, please cite the following manuscript:

Choi SW, and O’Reilly PF. "PRSice-2: Polygenic Risk Score Software for Biobank-Scale Data." GigaScience 8, no. 7 (July 1, 2019). https://doi.org/10.1093/gigascience/giz082.

Note to Self

The PLINK PRS range is inclusive, e.g. 0 - 0.5 also includes SNPs with a p-value of exactly 0 or 0.5.
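
As a small illustration of what an inclusive range means, the Python sketch below selects SNPs whose P-value lies in a closed interval. The helper name `snps_in_range` and the toy P-values are assumptions made for this example.

```python
def snps_in_range(pvalues, lower, upper):
    """Return SNP ids whose P-value lies in the closed interval [lower, upper]."""
    return [snp for snp, p in pvalues.items() if lower <= p <= upper]


# Toy P-values (made up) that sit exactly on both boundaries.
toy_p = {"rs1": 0.0, "rs2": 0.25, "rs3": 0.5, "rs4": 0.75}
print(snps_in_range(toy_p, 0.0, 0.5))  # ['rs1', 'rs2', 'rs3']
```

Both boundary SNPs (rs1 at 0 and rs3 at 0.5) are kept, which is the behaviour the note describes.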


prsice's Issues

Error Message

Hi,

I'm getting the following error message:

Read in Command Line Arguments & interpret

#################################
Error in assign(args[i], value, inherits = TRUE) :
cannot change value of locked binding for 'T'
Calls: parseCommandArgs -> assign
Execution halted

Please help?

Pathway PRSice (region selection)

Have not implemented the region selection algorithm

Should we use proxy SNPs or do we only use SNPs within the target region (e.g. Pathway?)

Plotting

One of the biggest pains is the difficulty of plotting graphs in C++.

There doesn't seem to be any simple library. Might want to try out pngwriter.

Dosage File

Does not support dosage file at the moment

PRS for controls/unknown phenotypes

I have been trying to use this software to compute PRS for a group of individuals where it is unknown whether they have the phenotype. However when everyone in the .fam file of the target is a control I get an error:

There are no cases

And the program terminates. Is there a way to use this tool to give PRS for unknown phenotype/control individuals at a few p-value cutoffs, rather than an exhaustive search to the best model fit?

LD clumping and risk-increasing alleles

Hi, I have a couple of general questions to the methods I hope you can help me with.

In the Detailed Guide, it says "When your target sample is small (e.g. < 500 samples), you might want to use an external reference panel to improve the LD estimation for clumping.". I have around 500 samples in my target sample, but if I were to have a suitable reference panel for my work it would probably be the 1000 Genomes Phase 1 version 3 EUR samples, which have only 379 samples. Would you suggest I use a reference panel which is smaller than my target sample?
I'm new to PRS, and the research unit I work in hasn't used PRSice before. Their older method always counted risk-increasing alleles, i.e. exchange A1 and A2 and invert the effect in the discovery sample in order to only sum risk-increasing alleles (output scores are >= 0). It would help my understanding if I knew the reasoning behind PRSice not only counting risk-increasing alleles (the toy data output has PRS both above and below 0). Or is there a command option for this that I have overlooked?

Thanks in advance!
Joseph
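
To make the comparison in the question above concrete, the Python sketch below scores the same toy individuals in two ways: with signed effect sizes, and after flipping protective alleles so that only risk-increasing alleles are counted. It is not a statement of the PRSice authors' reasoning. The function names, effect sizes and dosages are assumptions, and it assumes sum-based scoring with no missing genotypes.

```python
def signed_score(betas, dosages):
    """Sum of effect size times effect-allele count; can be negative."""
    return sum(b * g for b, g in zip(betas, dosages))


def risk_allele_score(betas, dosages):
    """Flip protective alleles so every weight is non-negative."""
    total = 0.0
    for b, g in zip(betas, dosages):
        if b < 0:
            b, g = -b, 2 - g  # count the other allele instead
        total += b * g
    return total


betas = [0.2, -0.1, 0.05]  # made-up effect sizes
# The two scores differ by this constant for every individual.
offset = -2 * sum(b for b in betas if b < 0)
for dosages in ([0, 2, 1], [2, 0, 1], [1, 1, 1]):
    diff = risk_allele_score(betas, dosages) - signed_score(betas, dosages)
    print(dosages, round(diff, 10), round(offset, 10))
```

Under these assumptions the two schemes differ only by a constant shift, so they rank individuals identically; a negative score in the signed scheme just reflects where zero is placed.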

Covariate calculation

Covariates (e.g. PCA/MDS) cannot currently be calculated.

This should be of fairly low priority for now.

Issues when calculating a PRS for a specific p-value (i.e. how to turn off full model PRS calculation and graphing errors)

Hi,

I am looking to calculate a PRS for a specific p-value cutoff: 0.00035. I am doing this for several cohorts. I am having two issues in doing this:

In one cohort, the full-model PRS (for p-value=1.0) outperforms the PRS at p-value 0.00035. I would like to turn off the PRS calculation for the full model so that participants get the PRS corresponding to SNPs with p-value <= 0.00035 instead of 1 (i.e. to have this be the PRS in the best file).

I tried adding
" --full F \ " AND " --lower 0.00035 and --upper 0.00035" AND "--fastscore and --bar-levels 0.00035" to my code with no success. Is there something else I should be doing?

Second, two cohorts are having issues with the graphic outputs.

Error #1: Plotting the quantile plot
Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels
Calls: run_plot ... model.matrix -> model.matrix.default -> contrasts<-
Execution halted

---could be because all participants in this cohort are female? (although I did get graphs for another all female cohort)

Error #2: Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, :
NA/NaN/Inf in 'y'
Calls: run_plot ... quantile_plot -> rstandard -> glm -> eval -> eval -> glm.fit
In addition: Warning messages:
1: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
2: In Ops.factor(eta, offset) : ‘-’ not meaningful for factors
3: In Ops.factor(y, mu) : ‘-’ not meaningful for factors
Execution halted

----could be because sparse genotyping in this cohort?

Any input would be greatly appreciated! Thank you!!!

clumping-r2 algorithm

Hi,

I have a general question about how the clumping algorithm works, with background on why below:

The reason I am curious is because of differences I got in results with the same input files. The two models I ran were:

Target files, base file coded as reference and minor allele, where either allele could be the risk allele
Same target files and same base file, but this time coded as reference and minor allele, where the reference allele was always the risk allele

Mathematically, I think these two base file types should give the same results, however they didn't. In contemplating how they gave different results, I noticed that the number of common SNPs (in base and target files) post-clumping differed in each instance---I presume this is what drove the differences in results (i.e. a different p-value was selected as best (i.e. highest R2)).

So my question, more specifically is what could cause differences in clumping between these two different runs? Or is there something else that could be causing these differences? If you would like any more specifics about my datasets or command-line options, I can send that along.

Thanks!
Kathryn

UPDATE***********

After further investigation, it seems like the differences stem from different number of mismatched variants being excluded (when finding overlap between base and target file). What kinds of SNPs are excluded at this point and why might this differ in these two different coding schemes?

Thanks!

gen file script

Can you please post an example script using a gen genotype file and impute sample file?

INFO score filtering on BGEN file directly

We can calculate the INFO score using the following equation (similar to PLINK)

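import numpy as np

def info_score(expected_genotype):
    # expected_genotype: per-sample expected dosage (0 to 2) for one variant
    m = np.mean(expected_genotype)  # mean of expected genotype
    v = np.var(expected_genotype)   # variance of expected genotype
    p = m / 2                       # estimated allele frequency
    p_a = 2 * p * (1 - p)           # expected genotype variance given p
    return v / p_a                  # INFO; undefined when p is 0 or 1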

Support of different regression model

Should be able to add a "formula" parameter such that we can also natively support interactions (which seems to be one reason why people use --no-regress)

--keep-ambig not implemented?

Hi,
is --keep-ambig implemented ? if so it was not working for me, can you please provide an example usage code?

many thanks!
Daphna

Automatically use position information

Maybe add a flag to allow PRSice using the CHR:Coordinate mapping instead of using the rs id?
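
The idea can be sketched in Python as follows. This is only an illustration of matching by position rather than by rs ID: the helper `position_key` and the toy records are assumptions, and the sketch presumes both files are on the same genome build.

```python
def position_key(chrom, bp):
    """Build a 'CHR:BP' identifier, dropping an optional 'chr' prefix."""
    chrom = str(chrom)
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}:{int(bp)}"


# Toy records (made up): (SNP id, chromosome, base-pair position).
base = [("rs123", "1", 1000), ("rs456", "chr2", 2000)]
target = [("1:1000:A:G", "chr1", 1000), ("snp_x", "2", 2500)]

base_by_pos = {position_key(c, bp): snp for snp, c, bp in base}
matches = [(base_by_pos[position_key(c, bp)], snp)
           for snp, c, bp in target
           if position_key(c, bp) in base_by_pos]
print(matches)  # [('rs123', '1:1000:A:G')]
```

Matching on position alone does not distinguish multiple variants at the same coordinate, so an allele check would still be needed afterwards.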

mismatched variants?

Hello,

This seems like a great software and I am looking forward to making use of it in my own research. I am trying to conduct a polygenic risk score analysis using a base file from published data to see its effect on a PLINK format target file. The target and base files both use rsIDs as SNP IDs, which matches (I also double checked that they match using the merge command on stata).

However, I am getting the following error:

Base file: ../basefile.txt 
207076 SNP(s) observed in base file, with: 
96112 variant(s) excluded due to p-value threshold 
1 variant(s) not found in target file 
207075 mismatched variant(s) excluded 
0 total SNPs included from base file 
 
Error: No valid SNPs remaining

As you can see, there are 207,076 variants in the base file, 1 of which is not found in the target file (I believe this is actually the header). Therefore there should technically be 207075 matches. However, "207075 mismatched variant(s)" were then excluded. Do you know why these variants could have 'mismatched' even though their rsIDs did match?

Thank you!
Melis
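
The report above does not say why the variants were flagged. One possible explanation, offered here only as an assumption and not confirmed by the source, is that variants with matching rs IDs can still disagree on their alleles. The Python sketch below shows how such an allele comparison is commonly done; the function `classify` and its labels are hypothetical and are not taken from PRSice.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify(base_a1, base_a2, target_a1, target_a2):
    """Compare the allele pair of one variant in base and target files."""
    base = (base_a1.upper(), base_a2.upper())
    target = (target_a1.upper(), target_a2.upper())
    if any(a not in COMPLEMENT for a in base + target):
        return "unsupported"  # e.g. indels or missing allele codes
    if COMPLEMENT[base[0]] == base[1]:
        return "ambiguous"  # A/T or C/G: strand cannot be resolved
    flipped = (COMPLEMENT[target[0]], COMPLEMENT[target[1]])
    if target == base:
        return "match"
    if target == base[::-1]:
        return "match_swapped"
    if flipped == base:
        return "match_strand_flip"
    if flipped == base[::-1]:
        return "match_strand_flip_swapped"
    return "mismatch"


print(classify("A", "G", "A", "G"))  # match
print(classify("A", "G", "T", "C"))  # match_strand_flip
print(classify("A", "G", "A", "C"))  # mismatch
```

If nearly every variant lands in the mismatch category, it is worth checking that the allele columns of the base file are the ones the program was told to read.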

File check up-front

Might want to check all the file inputs at the very beginning (especially the covariate file). It is rather annoying that the program errors out after clumping and other procedures.

Performance of Clumping

The current PRSice script isn't very well optimized for clumping. Better handling of multi-threading in the clumping code will help to improve the speed drastically.

Single program for PRSice

Rinside seems like a good choice, the problem is the dependency

What is the meaning of "change in residualized phenotype"?

In relation to the quantile plots, does this mean the change in the phenotype for those in a given quantile compared to the reference, while taking into account difference in covariates?

Regress out Covariates from PRS when --no-regress is used

Requested by Jessye

Standard deviation for the regression coefficient

It might be nice to also provide the SD for users

plink files

Currently only a single input file is allowed; input split across multiple chromosomes is not supported.

Not recognize non-numeric phenotypes / covariates

Might want to automatically change them to numbers (though that might not be the best way to handle this?)
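
As an illustration of the alternative hinted at above, the Python sketch below turns a categorical covariate into 0/1 indicator columns instead of arbitrary integer codes. The helper `dummy_code` and the column naming are assumptions for this example, not existing PRSice behaviour.

```python
def dummy_code(values):
    """Dummy-code a categorical column against its first (sorted) level.

    Returns the reference level and a dict of 0/1 indicator columns,
    one per non-reference level.
    """
    levels = sorted(set(values))
    reference, others = levels[0], levels[1:]
    columns = {
        f"is_{level}": [1 if v == level else 0 for v in values]
        for level in others
    }
    return reference, columns


reference, columns = dummy_code(["M", "F", "F"])
print(reference, columns)  # F {'is_M': [1, 0, 0]}
```

Indicator columns avoid implying an ordering between categories, which plain integer codes would do for covariates with more than two levels.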

Allow different PRS calculation

Might also be nice to allow users to calculate different forms of PRS instead of only the average
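
For illustration, three common forms of score are sketched in Python below: a plain sum, an average over the number of alleles, and a standardized score. The exact definitions are assumptions for this example (in particular, dividing by ploidy times the number of SNPs, and using the population standard deviation), as are the function names and toy data.

```python
from statistics import mean, pstdev


def sum_scores(betas, genotypes):
    """One summed score per sample: sum of beta times allele count."""
    return [sum(b * g for b, g in zip(betas, row)) for row in genotypes]


def average_scores(betas, genotypes, ploidy=2):
    """Summed score divided by the number of alleles considered."""
    n_alleles = ploidy * len(betas)
    return [s / n_alleles for s in sum_scores(betas, genotypes)]


def standardized_scores(betas, genotypes):
    """Summed score centred and scaled across samples (needs sd > 0)."""
    sums = sum_scores(betas, genotypes)
    mu, sd = mean(sums), pstdev(sums)
    return [(s - mu) / sd for s in sums]


betas = [0.5, -0.25]                  # made-up effect sizes
genotypes = [[0, 0], [1, 0], [2, 0]]  # made-up dosages, one row per sample
print(sum_scores(betas, genotypes))      # [0.0, 0.5, 1.0]
print(average_scores(betas, genotypes))  # [0.0, 0.125, 0.25]
print(standardized_scores(betas, genotypes))
```

With complete genotype data the three forms are linear transformations of one another, so they differ in scale rather than in how they rank samples.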

Error in reg$coefficients[1:num_quant, 1] : subscript out of bounds

Hi,

I am getting the error below, any suggestions? My code runs error free when I remove the print quantile plot option.

Plotting the quantile plot
Error in reg$coefficients[1:num_quant, 1] : subscript out of bounds
Calls: run_plot -> quantile_plot
Execution halted

Thanks!
Kathryn

Installation

Hi,

I'm trying to install the software. I'm on Mac OS X High Sierra, is it correct that I have to build it from source? Below is the message I got from the terminal.

Thanks,

Sander

swvanderlaan@Sanders-MacBook-Pro ~/git/PRSice
$ make
g++ -std=c++11 -g -I inc/ -isystem lib/  -pthread src/*.c* -o PRSice_debug
clang: warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]
src/plink_common.cpp:5772:25: error: constant expression evaluates to 18446744073709551615 which cannot be narrowed to type
      'long long' [-Wc++11-narrowing]
  const __m128i all1 = {0xffffffffffffffffLLU, 0xffffffffffffffffLLU};
                        ^~~~~~~~~~~~~~~~~~~~~
src/plink_common.cpp:5772:25: note: insert an explicit cast to silence this issue
  const __m128i all1 = {0xffffffffffffffffLLU, 0xffffffffffffffffLLU};
                        ^~~~~~~~~~~~~~~~~~~~~
                        static_cast<long long>( )
src/plink_common.cpp:5772:48: error: constant expression evaluates to 18446744073709551615 which cannot be narrowed to type
      'long long' [-Wc++11-narrowing]
  const __m128i all1 = {0xffffffffffffffffLLU, 0xffffffffffffffffLLU};
                                               ^~~~~~~~~~~~~~~~~~~~~
src/plink_common.cpp:5772:48: note: insert an explicit cast to silence this issue
  const __m128i all1 = {0xffffffffffffffffLLU, 0xffffffffffffffffLLU};
                                               ^~~~~~~~~~~~~~~~~~~~~
                                               static_cast<long long>( )
2 errors generated.
make: *** [PRSice_debug] Error 1

Base Data Set Format

This is quite a simple question, but please can you tell me how I can save my Excel spreadsheet of summary stats (the base file) as a whitespace tab delimited file as specified for the input?
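
One possible route, sketched below in Python, is to export the spreadsheet as CSV first and then rewrite it with tabs as the delimiter. The function name `csv_to_tsv` and any file names passed to it are placeholders for this example.

```python
import csv


def csv_to_tsv(csv_path, tsv_path):
    """Rewrite a comma-separated export as a tab-delimited text file."""
    with open(csv_path, newline="") as src, \
            open(tsv_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        for row in csv.reader(src):
            writer.writerow(row)
```

A call such as `csv_to_tsv("summary_stats.csv", "summary_stats.txt")` (both names hypothetical) would then produce a file with one tab between columns.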

Warning: No significant --clump results.

Hi,

I am getting the warning below (and soon after the execution is halted):

Warning: No significant --clump results. Skipping.
tail: cannot open `cleaned_base.clumped' for reading: No such file or directory
#################################

Deal with strand flips if target is in genotype format and produce input files for polygenic scoring

#################################
Error in read.table("Complete_Allele_List.txt", head = F) :
no lines available in input
Execution halted

Any idea what "No significant --clump results" means?

Thanks!

Problem with LD-files, get "WARNING: SNPs with chromosome number larger than 26"

Good morning,

I have a problem when I try to run the script below, including an external LD file. It seems the problem is that there are SNPs with chromosome number larger than 26, but I don't think there is anything wrong with those files (official UKBB files at my University, so lots of people are using them...). The job also seems to die at different chromosomes. I would greatly appreciate any help!

Thank you!

Kind regards,
Jenny

PRSice job with Locke BMI Comb on BMI, using different thresholds
SGE job ID: 8387676
SGE task ID: undefined
Run on host: compH001.cluster
Operating system: Linux
Working directory: /gpfs1/well/lindgren/jc/PRSice.scripts
Username: linc4222
Started date: Wed Mar 14 17:57:35 GMT 2018
##########################################################

PRSice 2.1.0.beta (14 Feb 2018)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Shing Wan (Sam) Choi, Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly
GNU General Public License v3

If you use PRSice in any published work, please cite:
Jack Euesden Cathryn M. Lewis Paul F. O'Reilly (2015)
PRSice: Polygenic Risk Score software.
Bioinformatics 31 (9): 1466-1468

2018-03-14 17:57:41
/well/lindgren/software/PRSice/PRSice_linux \
    --A1 A1 \
    --A2 A2 \
    --bar-levels 0.001000,0.050000,0.100000,0.200000,0.300000,0.400000,0.500000 \
    --base /well/lindgren/jc/PRSice.bmi/bmi.giant.2015.eur.comb.grch37.hrc.txt \
    --beta  \
    --binary-target F \
    --bp BP \
    --chr CHR \
    --clump-kb 500 \
    --clump-p 1.000000 \
    --clump-r2 0.100000 \
    --cov-col sex,age_at_assess,ArrayType,@PC[1-10],AgeSq \
    --cov-file /well/lindgren/UKBIOBANK_Info/Anthropometric_FatTraits/UKBiobank_Anthropometric_bodyFatPercent_FFMI_inv_residuals_COMBINED_October2017update.txt \
    --extract /well/lindgren/jc/ukbb/ukbb.snplist.non.dup.maf.info.hwe.txt \
    --ignore-fid  \
    --info-base INFO,0.9 \
    --interval 0.000050 \
    --ld /well/ukbb-wtchg/v2/imputation/ukb_imp_chr#_v2,/well/lindgren/UKBIOBANK_DATA_LINDGREN/Phenotype_data/July2017/ukb1186_imp_chr1_v2_s487398.sample \
    --ld-hard-thres 0.900000 \
    --ld-keep /well/lindgren/sara/giant-fineMapping/meta-analysis/index-snps/ld-panel/ukbb.imputed.snps.hrc.info_0.3.maf_0.0001.v2.chr1.fam \
    --ld-type bgen \
    --lower 0.000100 \
    --model add \
    --no-full  \
    --out /well/lindgren/jc/PRSice.bmi/locke.bmi.eur.comb.on.ukbb.bmi.diff.thresholds.180307/locke.bmi.eur.comb.on.ukbb.bmi.diff.thresholds.180307 \
    --pheno-col BMI \
    --pheno-file /well/lindgren/UKBIOBANK_Info/Anthropometric_FatTraits/UKBiobank_Anthropometric_bodyFatPercent_FFMI_inv_residuals_COMBINED_October2017update.txt \
    --print-snp  \
    --pvalue P \
    --score std \
    --se SE \
    --seed 3450830109 \
    --snp SNP \
    --stat BETA \
    --target /well/ukbb-wtchg/v2/imputation/ukb_imp_chr#_v2,/well/lindgren/UKBIOBANK_DATA_LINDGREN/Phenotype_data/July2017/ukb1186_imp_chr1_v2_s487398.sample \
    --thread 10 \
    --type bgen \
    --upper 0.500000


Loading Genotype file:
/well/ukbb-wtchg/v2/imputation/ukb_imp_chr#_v2 (bgen)
With sample file:
/well/lindgren/UKBIOBANK_DATA_LINDGREN/Phenotype_data/July2017/ukb1186_imp_chr1_v2_s487398.sample


Detected bgen sample file format
487409 people (0 male(s), 0 female(s)) observed
487409 founder(s) included


7402K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr1_v2.bgen
8129K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr2_v2.bgen
6696K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr3_v2.bgen
6555K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr4_v2.bgen
6070K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr5_v2.bgen
5349K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr6_v2.bgen
5405K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr7_v2.bgen
5282K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr8_v2.bgen
4066K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr9_v2.bgen
4562K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr10_v2.bgen
4628K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr11_v2.bgen
4431K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr12_v2.bgen
3270K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr13_v2.bgen
3037K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr14_v2.bgen
2767K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr15_v2.bgen
3089K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr16_v2.bgen
2660K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr17_v2.bgen
2599K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr18_v2.bgen
2087K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr19_v2.bgen
2082K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr20_v2.bgen
1261K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr21_v2.bgen
1255K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr22_v2.bgen
13902649 ambiguous variant(s) excluded
31505390 variant(s) included


1 region included

Start processing bmi.giant.2015.eur.comb.grch37.hrc
==============================


Reading 100.00%
Base file:
/well/lindgren/jc/PRSice.bmi/bmi.giant.2015.eur.comb.grch37.hrc.txt
2541036 variant(s) observed in base file, with:
6 duplicated variant(s)
629109 variant(s) excluded due to p-value threshold
20 ambiguous variant(s) excluded
1247728 variant(s) not found in target file
47 mismatched variant(s) excluded
664170 total variant(s) included from base file

WARNING: Mismatched SNPs detected between base and
target!You should check the files are based on the same
genome build
Or that can just be InDels


Loading reference panel


Loading Genotype file:
/well/ukbb-wtchg/v2/imputation/ukb_imp_chr#_v2 (bgen)
With sample file:
/well/lindgren/UKBIOBANK_DATA_LINDGREN/Phenotype_data/July2017/ukb1186_imp_chr1_v2_s487398.sample


Detected bgen sample file format
487409 people (0 male(s), 0 female(s)) observed
487409 founder(s) included


7402K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr1_v2.bgen
8129K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr2_v2.bgen
6696K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr3_v2.bgen
6555K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr4_v2.bgen
6070K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr5_v2.bgen
5349K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr6_v2.bgen
5405K SNPs processed in /well/ukbb-wtchg/v2/imputation/ukb_imp_chr7_v2.bgen
WARNING: SNPs with chromosome number larger than 26uon/ukb_imp_chr8_v2.bgen
         They will be ignored!
Error:
Execution halted
###########################################################
Finished at: Wed Mar 14 19:20:53 GMT 2018
###########################################################

Support for GZ base file

Sometimes it is good to have a compressed base file to reduce the storage requirement
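
The request can be illustrated with a short Python sketch that reads a base file whether or not it is gzip-compressed. The helper names `open_base_file` and `read_header` are assumptions for this example and do not describe how PRSice itself reads files.

```python
import gzip


def open_base_file(path):
    """Open a base file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_header(path):
    """Return the whitespace-separated column names from the first line."""
    with open_base_file(path) as handle:
        return handle.readline().split()
```

Choosing the reader by file extension keeps the rest of the parsing code identical for compressed and uncompressed input.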

mend.score

We have not implemented this in the new version

mend.score: If mend.score T, the first n SNPs will be added from the base data set, sorted by P-value, one by one in order to verify that there are no individual loci of large effect influencing target phenotype. If mend.score F, these will not be added. Default value is F

Score options

requested by Joni. Allow for different scoring options

Better log output

There are a number of problems with the current log:

Not server friendly (might want to add a --verbose option where we can turn off the % information)
Needs manual capturing (should automatically log the input and output statistics in a log file)
Some information is ambiguous (number of SNPs excluded, but why are they excluded?)

Clarification on clumping

I have 2 quick questions on the clumping algorithm you use:

Do you use all the target dataset or only the controls of this target dataset (when there is a binary phenotype, of course)?
How do you estimate LD: with the pearson correlation or some haplotypic estimate (like PLINK)?
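
To make the second question concrete, the Python sketch below computes the genotype-based estimate, i.e. the squared Pearson correlation between the allele counts of two SNPs. It does not answer which estimator PRSice uses. The function name `genotype_r2` and the toy vectors are assumptions, and missing genotypes are not handled.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP: correlation undefined
    return sxy * sxy / (sxx * syy)


# Toy allele counts (made up) for four samples.
print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))  # 1.0 (identical SNPs)
print(genotype_r2([0, 2, 0, 2], [0, 0, 2, 2]))  # 0.0 (uncorrelated SNPs)
```

A haplotype-based estimate would instead infer haplotype frequencies first, so the two approaches can give different values for the same pair of SNPs.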

Error happens in small sample size

Hi,

When I try to utilize PRSice in our data which only contain 105 samples, errors happened as below:
ERROR: GLM model did not converge!
Please send me the DEBUG files

So the question is whether PRSice is usable for analyses with a small sample size.

Thanks!

Eugene

Number of SNPs for each threshold

I am using the latest version of PRSice.

By default, the score for the p-value threshold 1.0 is produced including the score for the thresholds set in the --bar-levels parameter.

Also the number of SNPs used for each threshold calculation is not available even if I set the --print-snp option.

With the same set of parameters used in the previous version and the current version, the score are not exactly the same, although the correlation is very high (0.986). Is there a reason for this change?

Thanks
Anbu.

Tutorial problem

I'm trying to learn how to use this tool, so I started the tutorial. But I get this error and I don't have at the moment other TARGET dataset to use. I tried to redownload the tutorial files but I get the same error. Could you help me?

R -q --file=/usr/local/bin/PRSice_v1.25/PRSice_v1.25.R --args \
    base TOY_BASE_GWAS.assoc \
    target TOY_TARGET_DATA \
    slower 0 \
    supper 0.5 \
    sinc 0.01 \
    covary F \
    clump.snps F \
    plink /usr/local/bin/PRSice_v1.25/plink_1.9_linux_160914 \
    figname EXAMPLE_1

library(batch)

start.time <- proc.time()[3]

options(echo = FALSE)

#################################

PRSice: Polygenic Risk Score software

Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly 2014

If you use PRSice in published work, please cite:

"PRSice: Polygenic Risk Score software"

Euesden, Lewis, O'Reilly, Bioinformatics (2015) 31 (9):1466-1468

#################################

Read in Command Line Arguments & interpret

#################################

$ base

[1] "TOY_BASE_GWAS.assoc"

$ target

[1] "TOY_TARGET_DATA"

$ slower

[1] 0

$ supper

[1] 0.5

$ sinc

[1] 0.01

$ covary

[1] "F"

$ clump.snps

[1] "F"

$ plink

[1] "/usr/local/bin/PRSice_v1.25/plink_1.9_linux_160914"

$ figname

[1] "EXAMPLE_1"

#################################

Check options match

#################################

[1] "ERROR: Please Supply a TARGET DATA SET"

Installation Error

I am trying to install the mac version v2.1.1.beta of PRSice using the following:
Rscript PRSice.R

I keep receiving the following error:
Error: Cannot run PRSice without the PRSice binary file
Execution halted

Can you please advise?

Problem with --ld

A user reported that using --ld causes a memory malloc error, which is likely caused by memory problems.

Also, --keep and --remove seem to be applied to the --ld file too. (So it is likely that --ld-keep and --ld-remove don't work.)

LD for 32 bit

On 32-bit systems, the LD calculation is probably buggy. Need to correct it when I have time. However, we don't have a 32-bit system to test the code on...

No variants remain after clumping

When running PRSice using a reference LD panel, I get the following output with the default clumping parameters:

Loading Genotype file:
reference
(bed)

503 people (0 male(s), 0 female(s)) observed
503 founder(s) included

38 ambiguous variant(s) excluded
552458 variant(s) excluded based on MAF threshold
6566803 variant(s) included

Start performing clumping
Clumping Progress: 100.00%
Number of variant(s) after clumping : 0

No SNPs left for PRSice processing

And the program terminates. Increasing the r2 cutoff even to 1.0 doesn't help and I get the same error.

--beta T does not specify BETA column

Hi Sam,

When I specified --beta T below, PRSice failed with WARNING: OR detected but user suggest the input is beta! .

The log file shows PRSice is still looking for a column entitled "OR":

User Defined Column Headers
==============================
Chr : CHR
SNP : SNP
BP : BP
Ref Allele : A1
Alt Allele : A2
Statistic : OR
Standard Error : SE
P-value : P

This appears to be resolved by specifying --stat BETA

Can

--stat BETA be automatically set when --beta T is set, or
can the user be prompted to add --stat BETA when using --beta T ?

Cheers

Confidence interval for R2

Will be useful to have the confidence interval for R2

PRSet --perm

User reported Segmentation fault when using PRSet + --perm

Error in fix.by(by.y, y) : 'by' must specify a uniquely valid column

I get this error at the Regression Model stage in running PRSice. What could be the cause of this error? Which files are being merged? Does it mean that I need a variable by the same name in each file?

Thanks in advance for any help!

Clumped SNPs

Hello,

Would it be possible to have an option to get the list of clumped SNPs for each SNPs used to construct the best PRS (when using the --print-snp command)?

Thanks a lot,
Thomas

multiple bgen files

PRSice allows for bgen files split across chromosomes as bgenfilename#, but my data is in many smaller chunks (e.g. bgenfilename_chr1_00001-40000, bgenfilename_chr2_40001-80000, ... etc). The input doesn't seem to allow boolean. How can I use PRSice for data in this format?

For example, when I try to run:
Rscript PRSice.R
--dir .
--prsice PRSice_linux
--base basedata.assoc
--target bgenfilename_chr#*,samplefile.sample
--thread 1
--stat OR
--binary-target T
--type bgen
--keep ids.txt
--ignore-fid

I get this error message:
ERROR: Cannot open bgen file bgenfilename_chr1*.bgen

Any assistance would be greatly appreciated.

Thank you very much!

Target Data Set

I have my target data set in separate files by chromosome, named chr1-geno.qc.bed, chr1-geno.qc.bim, chr1-geno.qc.fam, for each chromosome. I've used geno as list T in the arguments in the script, but I'm not sure how to point it to the target data set.

I'm also not sure how to successfully run the alternative option of merging the PLINK files to make a single target data file.

Thank you

Specify .bed, .bim, .fam separately

Particularly in view of UKBiobank work (where separate .fam files are passed to shared .bed and .bim files), it would be valuable for PRSice to have the option to specify the components of the target binary file separately.

ERROR: Cannot open log file

Hi,

I am trying to reproduce the wiki examples but I obtain the following error:

#####################

Rscript PRSice.R --prsice PRSice_linux --base TOY_BASE_GWAS.assoc --no-clump --target TOY_TARGET_DATA --stat OR --binary-target T --out outputs/out --full

PRSice 2.0.13.beta (14 October 2017)
https://github.com/choishingwan/PRSice
(C) 2016-2017 Jack Euesden, Cathryn M. Lewis, Paul F. O'Reilly, Sam Choi
GNU General Public License v3

Wed Oct 25 17:12:58 2017

./PRSice_linux
--base TOY_BASE_GWAS.assoc
--out outputs/out
--target TOY_TARGET_DATA
--bar-levels 0.001,0.05,0.1,0.2,0.3,0.4,0.5
--binary-target T
--stat OR
--chr CHR
--A1 A1
--A2 A2
--snp SNP
--bp BP
--pvalue P
--thread 1
--interval 0.000050
--lower 0.000100
--upper 0.500000
--no-clump
--full

Loading Genotype file: TOY_TARGET_DATA (bed)
2000 people (1024 males, 976 females) observed
2000 founder(s) included
91062 variants included

1 region included

Start processing: TOY_BASE_GWAS

Reading 100.00%
91063 SNP(s) observed in base file, with:
2226 variant(s) located on haploid chromosome
1 variant(s) not found in target file
88836 total SNPs included from base file

Seed: 2634118250
ERROR: Cannot open log file:
Error:
Execution halted

#########

What am I doing wrong?

Thank you,

Ivan
