suchestoncampbelllab / gwasurvivr Goto Github PK

GWAS Survival Package in R


gwasurvivr's Introduction

Introduction

gwasurvivr can be used to perform survival analyses of imputed genotypes from the Sanger and Michigan imputation servers and from IMPUTE2 software. This vignette is a tutorial on how to perform these analyses. The package can be run locally on Linux, Mac OS X, or Windows, or batched conveniently on a high-performance computing cluster. gwasurvivr processes the data iteratively in chunks, so it does not require large amounts of memory.
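The chunking idea can be illustrated with a small sketch. This is not gwasurvivr's own code: `chunk_bounds()` is a hypothetical helper that splits variant indices into consecutive blocks of at most `chunk_size` variants, so only one block of genotypes needs to be held in memory at a time.

```r
# Hypothetical helper (not part of gwasurvivr): split n_snps variant indices
# into consecutive chunks of at most chunk_size variants each.
chunk_bounds <- function(n_snps, chunk_size) {
  starts <- seq(1, n_snps, by = chunk_size)
  ends <- pmin(starts + chunk_size - 1, n_snps)
  data.frame(start = starts, end = ends)
}

chunks <- chunk_bounds(n_snps = 250, chunk_size = 100)
for (i in seq_len(nrow(chunks))) {
  idx <- chunks$start[i]:chunks$end[i]
  # In a real run: load only the genotypes in `idx`, fit the models,
  # append the results to the output file, then discard the chunk.
  message(sprintf("chunk %d of %d (variants %d-%d)",
                  i, nrow(chunks), chunks$start[i], chunks$end[i]))
}
```

Under this scheme, 8,290,000 variants with a chunk size of 10,000 give `nrow(chunk_bounds(8290000, 10000))` = 829 chunks, the same part count that appears in the chunk.size question further down.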
The gwasurvivr package comes with five main functions that perform survival analyses using Cox proportional hazards (Cox PH) models, depending on the imputation method or file format used for the genotype data:

michiganCoxSurv: Performs survival analysis on imputed genetic data stored in compressed VCF files generated via Michigan imputation server.
sangerCoxSurv: Performs survival analysis on imputed genetic data stored in compressed VCF files generated via Sanger imputation server.
impute2CoxSurv: Performs survival analysis on imputed genetic data from IMPUTE2 output.
gdsCoxSurv: For files that are already in GDS format (originally in IMPUTE2 format), users can provide the path to their GDS file and perform survival analysis without having to recompress their files on each run.
plinkCoxSurv: Performs survival analysis on directly typed data (or imputed data thresholded in PLINK) stored in PLINK format (.bed, .bim, .fam files).

All functions fit a Cox PH model to each SNP, including any user-defined covariates, and save the survival analysis results directly to disk as a text file. gwasurvivr functions can also test for interactions between SNPs and a given covariate. See the examples for further details.
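As an illustration of the per-SNP model, here is a minimal sketch using `survival::coxph()` on simulated data. It is not gwasurvivr's implementation (one of the issues below notes that the package calls `coxph.fit` internally), and `fit_per_snp()`, the simulated data and the output column names are all hypothetical. With `inter_term` set, the sketch adds a SNP-by-covariate product term and reports that term instead of the SNP main effect.

```r
library(survival)

set.seed(1)
n <- 200
# Simulated phenotype/covariate data (hypothetical, for illustration only)
pheno <- data.frame(
  time      = rexp(n, rate = 0.1),
  event     = rbinom(n, 1, 0.7),
  age       = rnorm(n, 60, 10),
  SexFemale = rbinom(n, 1, 0.5)
)
# Simulated dosages: rows = SNPs, columns = samples, values in [0, 2]
dosages <- matrix(runif(5 * n, 0, 2), nrow = 5,
                  dimnames = list(paste0("snp", 1:5), NULL))

# Fit one Cox PH model per SNP with the same covariates each time; if
# inter_term is given, also fit and report the SNP:inter_term interaction.
fit_per_snp <- function(dosages, pheno, covariates, inter_term = NULL) {
  rows <- lapply(rownames(dosages), function(snp) {
    dat <- cbind(pheno, SNP = dosages[snp, ])
    rhs <- paste(c("SNP", covariates), collapse = " + ")
    term <- "SNP"
    if (!is.null(inter_term)) {
      rhs <- paste0(rhs, " + SNP:", inter_term)
      term <- paste0("SNP:", inter_term)
    }
    fit <- coxph(as.formula(paste("Surv(time, event) ~", rhs)), data = dat)
    s <- summary(fit)$coefficients
    data.frame(RSID = snp, COEF = s[term, "coef"], SE = s[term, "se(coef)"],
               HR = s[term, "exp(coef)"], PVALUE = s[term, "Pr(>|z|)"])
  })
  do.call(rbind, rows)
}

main  <- fit_per_snp(dosages, pheno, c("age", "SexFemale"))
inter <- fit_per_snp(dosages, pheno, c("age", "SexFemale"),
                     inter_term = "SexFemale")
```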

Installation

This package is currently available on the Bioconductor devel branch, or it can be installed with the devtools library (R >= 3.4) from the Sucheston Campbell Lab GitHub repository (this page). If using R >= 3.5, use BiocManager to install the package; if using R 3.4, BiocInstaller (biocLite) can be used.

For R >= 3.5:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("gwasurvivr", version = "devel")

Alternatively:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("suchestoncampbelllab/gwasurvivr")

For R >= 3.4 and R < 3.5:

source("https://bioconductor.org/biocLite.R")
biocLite("gwasurvivr")

How to use package

Please refer to the vignette for a detailed description of how to use gwasurvivr functions for survival analysis (Cox proportional hazards model).

gwasurvivr's People

Forkers

nemochina2008 geneticresources fabbondanza genomicsiter xxu1 wangro55 aarizvi markplatts alyssacl lucavd emiuga alisajid

gwasurvivr's Issues

empty output file for SNP*covariate interaction

Hey!

When I run the michiganCoxSurv function with the inter.term argument, the output file is empty. However, I still get the file with removed SNPs. There are no errors or warnings. The same issue occurs when I run the example (https://bioconductor.org/packages/devel/bioc/vignettes/gwasurvivr/inst/doc/gwasurvivr_Introduction.html#312_SNP_with_covariate_interaction)

I'd appreciate your help resolving this issue, thank you!

sangerCoxSurv fails in tutorial

I'm trying to go through the tutorial for gwasurvivr in the vignette: https://bioconductor.org/packages/devel/bioc/vignettes/gwasurvivr/inst/doc/gwasurvivr_Introduction.html

but as I'm going through, I'm executing the code that I see in the R session examples:


library(gwasurvivr)
print("finished loading gwasurvivr")
vcf.file <- system.file(package="gwasurvivr","extdata", "michigan.chr14.dose.vcf.gz")
pheno.fl <- system.file(package="gwasurvivr", "extdata", "simulated_pheno.txt")
pheno.file <- read.table(pheno.fl, sep=" ", header=TRUE, stringsAsFactors = FALSE)
pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L)
sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2

print("I am running michiganCoxSurv...");
michiganCoxSurv(vcf.file=vcf.file,
                covariate.file=pheno.file,
                id.column="ID_2",
                sample.ids=sample.ids,
                time.to.event="time",
                event="event",
                covariates=c("age", "SexFemale", "DrugTxYes"),
                inter.term=NULL,
                print.covs="only",
                out.file="michigan_only",
                r2.filter=0.3,
                maf.filter=0.005,
                chunk.size=100,
                verbose=TRUE,
                clusterObj=NULL)
print("I finished michiganCoxSurv");
# recode sex column and remove first column 
pheno.file$SexFemale <- ifelse(pheno.file$sex=="female", 1L, 0L)
# select only experimental group sample.ids
sample.ids <- pheno.file[pheno.file$group=="experimental",]$ID_2
head(sample.ids)
print("running sangerCoxSurv...");
sangerCoxSurv(vcf.file=vcf.file,
              covariate.file=pheno.file,
              id.column="ID_2",
              sample.ids=sample.ids,
              time.to.event="time",
              event="event",
              covariates=c("age", "SexFemale", "DrugTxYes"),
              inter.term=NULL,
              print.covs="only",
              out.file="sanger_only",
              info.filter=0.3,
              maf.filter=0.005,
              chunk.size=100,
              verbose=TRUE,
              clusterObj=NULL)
print("I finished sangerCoxSurv");

However, this fails and gives a strange error with sangerCoxSurv:

[1] "running sangerCoxSurv..."
Analysis started on 2020-04-14 at 17:07:36
Covariates included in the models are: age, DrugTxYes, SexFemale
52 samples are included in the analysis
Analyzing chunk 0-100
Error in `$<-.data.frame`(`*tmp*`, "RefPanelAF", value = list()) : 
  replacement has 0 rows, data has 3
Calls: sangerCoxSurv -> coxVcfSanger -> $<- -> $<-.data.frame
In addition: Warning message:
In .vcf_usertag(map, tag, nm, verbose) :
  ScanVcfParam ‘info’ fields not found in  header: ‘RefPanelAF’ ‘TYPED’ ‘INFO’
Execution halted

why am I getting this error? did I do something wrong?

Session information:


> library(gwasurvivr)
> sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gwasurvivr_1.4.0

loaded via a namespace (and not attached):
 [1] Biobase_2.46.0              httr_1.4.1                 
 [3] tidyr_1.0.2                 bit64_0.9-7                
 [5] splines_3.6.3               assertthat_0.2.1           
 [7] askpass_1.1                 BiocFileCache_1.10.2       
 [9] stats4_3.6.3                GWASTools_1.32.0           
[11] blob_1.2.1                  BSgenome_1.54.0            
[13] GenomeInfoDbData_1.2.2      GWASExactHW_1.01           
[15] Rsamtools_2.2.3             progress_1.2.2             
[17] pillar_1.4.3                RSQLite_2.2.0              
[19] backports_1.1.5             lattice_0.20-41            
[21] quantreg_5.54               glue_1.3.2                 
[23] digest_0.6.25               GenomicRanges_1.38.0       
[25] XVector_0.26.0              sandwich_2.5-1             
[27] Matrix_1.2-18               XML_3.99-0.3               
[29] pkgconfig_2.0.3             broom_0.5.5                
[31] biomaRt_2.42.0              SparseM_1.78               
[33] zlibbioc_1.32.0             purrr_0.3.3                
[35] BiocParallel_1.20.1         MatrixModels_0.4-1         
[37] tibble_2.1.3                openssl_1.4.1              
[39] mgcv_1.8-31                 generics_0.0.2             
[41] IRanges_2.20.2              SummarizedExperiment_1.16.1
[43] GenomicFeatures_1.38.2      BiocGenerics_0.32.0        
[45] SNPRelate_1.20.1            survival_3.1-11            
[47] magrittr_1.5                crayon_1.3.4               
[49] memoise_1.1.0               mice_3.8.0                 
[51] nlme_3.1-144                tools_3.6.3                
[53] prettyunits_1.1.1           hms_0.5.3                  
[55] lifecycle_0.2.0             matrixStats_0.56.0         
[57] stringr_1.4.0               S4Vectors_0.24.3           
[59] DelayedArray_0.12.2         gdsfmt_1.22.0              
[61] AnnotationDbi_1.48.0        Biostrings_2.54.0          
[63] compiler_3.6.3              GenomeInfoDb_1.22.0        
[65] logistf_1.23                rlang_0.4.5                
[67] grid_3.6.3                  RCurl_1.98-1.1             
[69] rappdirs_0.3.1              VariantAnnotation_1.32.0   
[71] bitops_1.0-6                DNAcopy_1.60.0             
[73] curl_4.3                    DBI_1.1.0                  
[75] R6_2.4.1                    GenomicAlignments_1.22.1   
[77] zoo_1.8-7                   rtracklayer_1.46.0         
[79] dplyr_0.8.5                 bit_1.1-15.2               
[81] stringi_1.4.6               parallel_3.6.3             
[83] Rcpp_1.0.4                  quantsmooth_1.52.0         
[85] vctrs_0.2.4                 dbplyr_1.4.2               
[87] tidyselect_1.0.0            lmtest_0.9-37

Flip dosage

Hello!
I am currently running a genome-wide survival analysis using plinkCoxSurv, adjusted for gender and age. For some SNPs, the results have the opposite direction of effect compared to previously published data, which used minor alleles for their HR calculation. I assume the HRs are calculated for the A1 allele, which is sometimes the major allele. What does the option flip.dosage do?

Thank you for your help!
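Whatever flip.dosage does internally (not settled here), the effect of counting the other allele on a Cox model can be shown directly: recoding a dosage d as 2 - d flips the sign of the coefficient, so the HR becomes its reciprocal while the p-value is unchanged. The sketch below uses `survival::coxph()` on simulated, hypothetical data.

```r
library(survival)

set.seed(42)
n <- 300
d <- runif(n, 0, 2)                      # hypothetical dosage of one allele (0-2 copies)
time <- rexp(n, rate = 0.1 * exp(0.4 * d))
event <- rbinom(n, 1, 0.8)

fit_orig <- coxph(Surv(time, event) ~ d)
flipped <- 2 - d                         # same genotypes, counting the other allele
fit_flip <- coxph(Surv(time, event) ~ flipped)

exp(coef(fit_orig))                      # HR per copy of the counted allele
exp(coef(fit_flip))                      # the reciprocal of the HR above
```

So an HR below 1 in one study and above 1 in another can simply reflect which allele each analysis counts.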

Error in if (nrow(genotypes) > 0) { : argument is of length zero

Hello. Thank you very much for creating and maintaining this great tool!

I tried to use plinkCoxSurv but encountered several error messages. When I tried chr1, I got "Error in snpgdsBED2GDS(bed.file, fam.file, bim.file, gdsfile, cvt.chr = "int", : Stream write error", so I tried chr22 as below, and now I am getting this error message at the same step.

Analyzing part 251/2585...
Analyzing part 252/2585...
Analyzing part 253/2585...
Error in if (nrow(genotypes) > 0) { : argument is of length zero

plinkCoxSurv(bed.file="/data/chr22.bed",
covariate.file=covariate.file,
id.column="IID",
sample.ids=sample.ids,
time.to.event="age",
event="event",
covariates=c("SexFemale", "GenoArray", "pc1", "pc2", "pc3", "pc4", "pc5", "pc6", "pc7", "pc8", "pc9", "pc10"),
inter.term=NULL,
print.covs="only",
out.file="chr22",
chunk.size=50,
maf.filter=0.05,
flip.dosage=TRUE,
verbose=TRUE,
clusterObj=NULL)

I would like to ask how to solve these two errors.
Thank you for your help in advance!

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

No covariates and effect allele documentation

"We recently conducted a GWAS using your gwasurvivr package (thank you!) and were wondering if it is possible to run the analysis without covariate adjustment (the function insists on receiving a nonempty character vector of covariates). If so, can you please advise us on how to do so?"

"On a related note, can you please clarify which allele is being modeled as the effect allele in the output file and which column contains the frequency of the effect allele in the sample? We can’t seem to find this in the documentation or supplementary documentation.

For example, AF is generally equal to MAF, and SAMP_FREQ_ALT is generally equal to SAMP_MAF. But MAF and SAMP_MAF are not equal to each other."
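The relationship between an ALT-allele frequency and a minor allele frequency can be sketched as follows, assuming (not confirmed by gwasurvivr's documentation) that SAMP_FREQ_ALT is the ALT allele frequency among the analyzed samples and SAMP_MAF is the smaller of that value and its complement. `sample_freqs()` is a hypothetical helper; it does not address the AF and MAF columns, whose source is not stated here.

```r
# Hypothetical helper: dosages for one variant, assumed to count copies of
# the ALT allele (0-2) in each analyzed sample.
sample_freqs <- function(alt_dosage) {
  freq_alt <- mean(alt_dosage, na.rm = TRUE) / 2
  c(SAMP_FREQ_ALT = freq_alt, SAMP_MAF = min(freq_alt, 1 - freq_alt))
}

sample_freqs(c(0, 1, 2, 2))   # ALT is the major allele: 0.625 vs 0.375
sample_freqs(c(0, 0, 1, 0))   # ALT is the minor allele: both 0.125
```

Under these definitions the two columns agree whenever ALT is the minor allele in the sample and differ otherwise.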

input format for gwasurvivr

Thanks for a great package. My data are in .bgen format. I guess that can't be used as direct input for gwasurvivr? Do you recommend I use PLINK to export the .bgen files to .vcf format (version 4.3)? There are various export options, but do you agree that this is the best one?

interaction doesn't work

Getting error: Error in is.data.frame(x) : object 'cox.out' not found
Probably survFitInt() isn't working properly and isn't returning a data frame.

question about chunk.size

I set chunk.size to 10000, but it stops running after analyzing only one chunk:

Analyzing part 1/829...
Analysis completed on 2021-01-04 at 12:55:39

The function I use is plinkCoxSurv.

How should I use it to analyze all 8,290,000 SNPs? Should I set chunk.size to the number of SNPs?

Error in plinkCoxSurv - could not find function "coxph.fit"

Hi, I am just getting started and test-ran the example for plinkCoxSurv, but it produced the output below and generated blank files ("impute_example.coxph" and "impute_example.snps_removed"). I've tried uninstalling and reinstalling the "survival" package, but that doesn't seem to help. Any suggestions and help will be appreciated! Thanks in advance!

Analyzing part 1/1...
Error in checkForRemoteErrors(val) :
3 nodes produced errors; first error: could not find function "coxph.fit"

Including 'cluster' in the coxph call

Hi, I'm using your package for a GWAS on patients collected in different centers.

Since the workhorse behind the computation is the survival package, I was wondering if it is possible (and if so, how) to include a "cluster" term in the coxph call.

I've seen that you used the coxph.fit function to fit the models, but I cannot see how to incorporate the cluster parameter in that function. Usually, I would call coxph(Surv(time, event) ~ covariates + cluster(center)), but I'm not sure how to modify your code to make this happen.

I'd appreciate your help, and congratulations on the wonderful package!
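For reference, this is how the survival package itself handles a cluster term for a single SNP; whether or how gwasurvivr could expose it is not addressed here. The sketch uses simulated, hypothetical data. Adding `cluster(center)` leaves the coefficients unchanged and adds a robust (sandwich) standard error that accounts for correlation within centers, which could be used, for example, to re-check top hits outside gwasurvivr.

```r
library(survival)

set.seed(7)
n <- 400
dat <- data.frame(
  time   = rexp(n, 0.1),
  event  = rbinom(n, 1, 0.7),
  SNP    = runif(n, 0, 2),                         # hypothetical dosage
  age    = rnorm(n, 60, 10),
  center = sample(paste0("C", 1:8), n, replace = TRUE)
)

# Robust variance accounting for correlation within centers
fit_cl <- coxph(Surv(time, event) ~ SNP + age + cluster(center), data = dat)
# Same model with the usual model-based variance only
fit_naive <- coxph(Surv(time, event) ~ SNP + age, data = dat)

summary(fit_cl)$coefficients   # includes a "robust se" column
```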

Error in rnorm(nrow(cox.params$pheno.file)) : invalid arguments

Hello
I am trying to use gwasurvivr to run a survival analysis on plink data files.
However, when I try to run it, I get the following error message:
Error in rnorm(nrow(cox.params$pheno.file)) : invalid arguments
My variables ‘event’ and ‘time to event’ are both numeric variables.
Could you help me with this?
Thanks!
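One way this exact message can arise in base R: `nrow()` returns NULL for anything without a row dimension (for example NULL or a character file path), and `rnorm(NULL)` fails with "invalid arguments". Which object ends up in `cox.params$pheno.file` is not confirmed here, so the sketch below only reproduces the message; `pheno_path` and `pheno_df` are hypothetical. Checking that the covariate data passed in is a data frame is a cheap first step.

```r
# nrow() returns NULL for objects without a row dimension, e.g. a file path
pheno_path <- "simulated_pheno.txt"   # hypothetical: a path, not a data frame
nrow(pheno_path)                      # NULL
msg <- tryCatch(rnorm(nrow(pheno_path)), error = conditionMessage)
msg                                   # "invalid arguments"

# With a data frame, nrow() returns a count and rnorm() works
pheno_df <- data.frame(time = c(5, 8), event = c(1L, 0L))
length(rnorm(nrow(pheno_df)))         # 2
```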

MAF snps_removed is wrong

Thank you to the developer for creating this software.
When a MAF threshold is set, SNPs are removed correctly. While this does not affect the results in the main file, the generated snps_removed file is incorrect.
Using plinkCoxSurv, Figure 1 shows 3 SNPs as an example, and Figures 2 and 3 are for MAF 0.4. It is evident that the snps_removed file is incorrect.
The same happens when I analyze using my own files.

Error in snpgdsBED2GDS(bed.file, fam.file, bim.file, gdsfile, cvt.chr = "int",

Hi, when using gwasurvivr I am encountering the following error message, which I can't figure out how to get around.

Error in snpgdsBED2GDS(bed.file, fam.file, bim.file, gdsfile, cvt.chr = "int",

Of note, when I run the analysis on PLINK files containing only a small number of variants, the program works perfectly and I don't get this error message. The problem arises when I try to scale this up to larger files.

I have copied and pasted the output below. Here I'm trying to run an analysis of all chromosome 1 variants available in my genetic dataset.

plinkCoxSurv(bed.file="/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/gwasready_chr1.bed", covariate.file=df, covariates=c("age", "gender"), id.column="eid_string", time.to.event="X_t", event="fail", out.file="fib4_agesex", inter.term…, verbose = TRUE, clusterObj = NULL )
Covariates included in the models are: gender, age
4211 samples are included in the analysis
Start snpgdsBED2GDS ...
BED file: "/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/gwasready_chr1.bed" in the SNP-major mode (Sample X SNP)
FAM file: "/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/gwasready_chr1.fam", DONE.
BIM file: "/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/gwasready_chr1.bim", DONE.
Wed Sep 11 14:28:23 2019 store sample id, snp id, position, and chromosome.
start writing: 487409 samples, 477109 SNPs ...
Wed Sep 11 14:28:23 2019 0%
Wed Sep 11 14:28:56 2019 4%
Wed Sep 11 14:29:33 2019 8%
Wed Sep 11 14:30:10 2019 12%
Wed Sep 11 14:30:41 2019 16%
Wed Sep 11 14:31:17 2019 21%
Wed Sep 11 14:31:54 2019 25%
Wed Sep 11 14:32:30 2019 29%
Wed Sep 11 14:33:06 2019 33%
Wed Sep 11 14:33:42 2019 38%
Wed Sep 11 14:34:19 2019 42%
Wed Sep 11 14:34:55 2019 46%
Wed Sep 11 14:35:31 2019 50%
Wed Sep 11 14:36:07 2019 55%
Wed Sep 11 14:36:43 2019 59%
Wed Sep 11 14:37:20 2019 63%
Wed Sep 11 14:37:51 2019 67%
Error in snpgdsBED2GDS(bed.file, fam.file, bim.file, gdsfile, cvt.chr = "int", :
Stream write error
Timing stopped at: 390 113.8 598.2

number of SNPs analyzed in total is blank at the end of analysis

Hello,

Thank you for developing gwasurvivr!
I ran several chromosomes with gwasurvivr, but I noticed that at the end of the analysis, where a summary is provided, the number of SNPs analyzed in total is blank. See the chr22 example below (second-to-last line).

Is this a known missing value or should I do something to obtain this value? Of course I can count the output lines of the .coxph file, but it would be convenient to check the number here. I used gwasurvivr 1.14.0.

Analyzing chunk 4141000-4142000
Analysis completed on 2022-11-02 at 14:35:14
3995983 SNPs were removed from the analysis for not meeting the threshold criteria.
List of removed SNPs can be found in gwas_bcg_urolife_set1_recur_chr22.snps_removed
SNPs were analyzed in total
The survival output can be found at gwas_bcg_urolife_set1_recur_chr22.coxph

Add check for NA

Add an NA drop to the survFit function

The reference of the HR

Hi!!
Recently I applied the plinkCoxSurv function to PLINK .bed format data for survival analysis, and I have two questions:

1. In the output there are "A0", "A1", "HR" and other columns. Were the HRs calculated using A0 or A1 as the reference?
2. In the example output impute_example.coxph, A0 seems to be REF and A1 ALT. But when I applied plinkCoxSurv, it seems to be the other way around. So is A0 or A1 the REF?

Thank you very much.

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

I see this is the same issue as #24, but I'll give you some more context:

plinkCoxSurv(bed.file = bed.file,
             covariate.file = covariate.file_2,
             id.column = "IID",
             sample.ids = samples.id,
             time.to.event = "Time_event",
             event = "status_LoA",
             covariates = c("Frame"),
             inter.term = NULL,
             print.covs = "only",
             out.file = "COX_completo_pca_proned_only_frame_0_05",
             maf.filter = 0.05,
             flip.dosage = FALSE,
             verbose = TRUE)

I checked issue #4, where this error was first reported, but

> all.equal(samples.id, covariate.file_2$IID)
[1] TRUE

This code was working a year ago so I:

reverted to R 3.6.3 (and respective pkgs versions)
tried on Windows and Linux
tried on Rstudio server (linux) in a clean environment
tried R 4.2.2
tried pkg version 1.14.0 and 1.16.0
used chunk.size=5000000
converted Plink files to vcf and used michiganCoxSurv

All gave the same error.

I also tried simulated bed-bim-fam files with the following code:

library(genio)
# write your genotype matrix stored in an R native matrix

# (here we create a small example with random data)
# create 10 random genotypes
X <- rbinom(10, 2, 0.5)
# replace 3 random genotypes with missing values
X[sample(10, 3)] <- NA
# turn into 5x2 matrix
X <- matrix(X, nrow = 5, ncol = 2)

# also create a simulated phenotype vector
pheno <- rnorm(2) # two individuals as above

# write simulated data to all BED/BIM/FAM files in one handy command
# missing BIM and FAM columns are automatically generated
# data dimensions are validated for provided data
write_plink('random', X, pheno = pheno)

Same error

michiganCoxSurv example works as intended

Provided example works as intended.

Full error output:

Covariates included in the models are: Frame
557 samples are included in the analysis
Start file conversion from PLINK BED to SNP GDS ...
    BED file: "C:/Users/LucaVedovelli/Documents/GitHub/2022.GWAS_duchenne/data raw/plink_DMD_clean_pca_proned.bed"
        SNP-major mode (Sample X SNP), 390.7M
    FAM file: "C:/Users/LucaVedovelli/Documents/GitHub/2022.GWAS_duchenne/data raw/plink_DMD_clean_pca_proned.fam"
    BIM file: "C:/Users/LucaVedovelli/Documents/GitHub/2022.GWAS_duchenne/data raw/plink_DMD_clean_pca_proned.bim"
Thu Nov  3 22:54:11 2022     (store sample id, snp id, position, and chromosome)
    start writing: 637 samples, 2560573 SNPs ...
[==================================================] 100%, completed, 14s 
Thu Nov  3 22:54:25 2022 	Done.
Optimize the access efficiency ...
Clean up the fragments of GDS file:
    open the file 'C:\Users\LUCAVE~1\AppData\Local\Temp\RtmpchWOlj\4bc44a76303c.gds' (404.1M)
    # of fragments: 43
    save to 'C:\Users\LUCAVE~1\AppData\Local\Temp\RtmpchWOlj\4bc44a76303c.gds.tmp'
    rename 'C:\Users\LUCAVE~1\AppData\Local\Temp\RtmpchWOlj\4bc44a76303c.gds.tmp' (404.1M, reduced: 276B)
    # of fragments: 20
***** Compression time ******
User:54.91
System: 2.02
Elapsed: 57.5
*****************************
Analyzing part 1/257...
Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Pre processing code:

library(dplyr)   # mutate(), filter()

bed.file <- here::here('data raw/plink_DMD_clean_pca_proned.bed')

covariate.file_2 <- read.table(here::here('data raw/dataset_corretto_completo.txt'), sep="\t", header=TRUE) |> 
  
  mutate(IID = as.character(IID),
         Time_event = as.numeric(Time_event),
         status_LoA = as.integer(status_LoA)) |> 
  
  filter(!is.na(Time_event)) |> 
  
  filter(!is.na(status_LoA)) |> 
  
  mutate(status_LoA = ifelse(status_LoA == 1, 0L, 1L))

samples.id <- covariate.file_2$IID |> as.character()

Covariate File:

Regarding a function for testing the PH assumption, and interpreting the resulting files

Hi,

I have two questions about using gwasurvivr.

Is there a way to test the proportional hazards assumption for each covariate included in a Cox regression model fit using gwasurvivr? (e.g. cox.zph() function in survival package)
From the output files, I found SNPs with SAMP_MAF > 0.005 in both the .snps_removed file and the .coxph file (I applied the option maf.filter=0.005). I don't understand why there are SNPs with SAMP_MAF > 0.005 in the .snps_removed file, and why SNPs in the .snps_removed file are also included in the .coxph file.

Here are printed messages from the analysis:

11191 SNPs were removed from the analysis for not meeting
the given threshold criteria or for having MAF = 0
List of removed SNPs are saved to
/data1/ishim/Project/chr22.snps_removed
In total 118041 SNPs were included in the analysis
The Cox model results output was saved to
/data1/ishim/Project/chr22.coxph

Parts of chr22.snps_removed:

Parts of chr22.coxph:

I found only 4 SNPs with SAMP_MAF < 0.005 in the .snps_removed files across all chromosomes, as below.

Strangely, the four SNPs have different values of SAMP_MAF in .coxph files.

Can I include SNPs in .snps_removed files for my further analysis?

Thank you so much for your help!

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Hi, I'm trying to run gwasurvivr to do a survival analysis.

the code I'm using is as follows:

library(gwasurvivr)
library(readr)   # read_csv()

bed.file <- "/home/ziyung/GWASurvivr/GWASI+II_hg38_db153_cleaned_0329.bed"

survival_449 <- (read_csv("GWASurvivr/449_sample.csv"))

sample.ids <- as.character(survival_449$Subject_ID)
    

plinkCoxSurv(bed.file=bed.file,
             covariate.file=survival_449,
             id.column="Subject_ID",
             sample.ids=sample.ids,
             time.to.event="time_day",
             event="Status_33y",
             covariates=c("age", "sex"),
             inter.term=NULL,
             print.covs="only",
             out.file=paste("out/Survuvr_result/454_sample/survival_449_0426"),
             chunk.size=50,
             maf.filter=0.005,
             flip.dosage=TRUE,
             verbose=TRUE,
             clusterObj=NULL)

I have already verified my covariate file and bed file to make sure that my sample IDs match,
and I used 'sample.ids <- as.character(survival_449$Subject_ID)' to convert the sample IDs to class character,

but it still shows
Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Do you have any ideas as to what may be causing this?

question about "low variance"

Hi!

The description says ".snps_removed extension containing SNPs that were removed due to low variance or user-defined thresholds."
What is the definition of "low variance"?
It seems that some SNPs are removed even though they meet the conditions I defined, and I think "low variance" is the reason.

Thank you for help!

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Hi, I'm trying to use gwasurvivr to run a survival analysis on PLINK data files.

The code I'm using is as follows:

plinkCoxSurv(bed.file="/users/ptb17163/lustre/EgaDemoClient_2.2.2/plink/top20variants.bed", covariate.file=df, covariates=c("age"), id.column="eid", time.to.event="X_t", event="fail", out.file="test",inter.term = NULL, print.covs = "only", chunk.size = 10000, verbose = TRUE, clusterObj = NULL )

However, when I run this, I get the following error message:

Error in genotypes[!blankSNPs, cox.params$ids] : subscript out of bounds

Do you have any ideas as to what may be causing this, and how I can resolve it?
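In base R, "subscript out of bounds" is what you get when a matrix is indexed by a column name it does not have, so one thing worth checking is whether every requested sample ID is present in the genotype data. Note that comparing sample.ids with the covariate file's own ID column (as in one report above) cannot detect a mismatch; the comparison has to be against the genotype file. The sketch assumes the IDs are matched against the IID column (column 2) of the PLINK .fam file, which is not confirmed here; `missing_ids()` is a hypothetical helper.

```r
# Hypothetical helper: requested sample IDs that are absent from the IID
# column (column 2) of a PLINK .fam file.
missing_ids <- function(fam_file, sample_ids) {
  fam <- read.table(fam_file, header = FALSE, colClasses = "character")
  setdiff(as.character(sample_ids), fam[[2]])
}

# Indexing a matrix by a column name it lacks reproduces the message
geno <- matrix(0, nrow = 2, ncol = 2,
               dimnames = list(c("rs1", "rs2"), c("id1", "id2")))
msg <- tryCatch(geno[, c("id1", "id3")], error = conditionMessage)
msg                                       # "subscript out of bounds"

# Toy .fam file: FID IID father mother sex phenotype
fam_file <- tempfile(fileext = ".fam")
writeLines(c("f1 id1 0 0 1 -9", "f2 id2 0 0 2 -9"), fam_file)
missing_ids(fam_file, c("id1", "id3"))    # "id3"
```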

Error in rowMeans2(genotypes, na.rm = TRUE) : Argument 'dim.' must be an integer vector of length two.

I am running into this error:

Analyzing part 10811/13838...
Error in rowMeans2(genotypes, na.rm = TRUE) :
Argument 'dim.' must be an integer vector of length two.

I even modified the pgen file in PLINK 2 using the command --hard-call-threshold 0.4,
adjusting the parameter anywhere from 0.2 to 0.49. Yet I keep running into the same error. Can you please help?

Can we use multiple IDs?

Hi,

Since neither FID nor IID is unique on its own in my data, I wonder if I could use both of them instead of one. For example, in impute2CoxSurv(), can we pass more than one ID column to the id.column argument?

Thanks!

Which allele is the HR calculated for, depending on the flip.dosage option?

Hello.
Could you please clarify what exactly the flip.dosage=TRUE/FALSE does, when using the plinkCoxSurv function?
Apparently flip.dosage is used to flip the allele, but the documentation doesn't specify whether it flips from A1 to A0 or from A0 to A1, or whether it is based on the allele frequencies, independent of what is set as A1/A0 in the bed files.

Thanks

gwasurvivr and R "survival" package give different results

Hi,

I find that, given the same genotype and phenotype data, gwasurvivr and the R survival package's coxph() give different p-values (both significant, but quite different; for example, one might be 1e-6 and the other 1e-7) and different coefficients.

Could you please help? Is there a possible reason?

Thanks!

Error

Dear developers,

I appreciate the development of the program.
I was doing a test run to understand what the flip.dosage option does (I believe it swaps A0 and A1; is A0 the minor allele in the bfile?) and ran into the error below.
Maybe the issue stems from the package configuration?
Below are my code and the error output:

plinkCoxSurv(
bed.file=bed.YAOSP.OE,
covariate.file=phenoCovar.YAOSP.OE,
id.column='MMSID',
time.to.event='YAOSP',
event='Status',
covariates=c('YOB', 'AAO', 'PC1', 'PC2','PC3', 'PC4'),
inter.term=NULL,
print.covs='only',
out.file = '/castor/project/proj/soujin/TimeP/YAOSP_OE_240604',
chunk.size=10000,
maf.filter=NULL,
clusterObj=c1,
flip.dosage=F,
verbose=T
)

Analyzing part 2080/2081...
Analyzing part 2081/2081...
Error in rowVars(x, rows = rows, cols = cols, na.rm = na.rm, refine = refine, :
Argument 'dim.' must be an integer vector of length two.
Calls: plinkCoxSurv ... loadProcessWrite.PlinkGdsImpute2CoxSurv -> runOnChunks -> rowSds -> rowVars

Thanks for your help in advance
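One plausible mechanism, offered as an assumption and not confirmed against gwasurvivr's source: matrixStats functions such as rowMeans2() and rowVars() reject a plain vector with this "dim." message, and R's default subsetting turns a one-row matrix into a plain vector. A chunk can end up with a single variant if it is the last chunk or if filtering leaves only one variant in it. The numbers below are hypothetical.

```r
# Hypothetical chunk arithmetic: 20,001 variants with chunk.size = 10000
# leave exactly one variant in the last chunk.
n_snps <- 20001
chunk_size <- 10000
last_chunk_size <- n_snps - (ceiling(n_snps / chunk_size) - 1) * chunk_size  # 1

geno <- matrix(runif(3 * 4, 0, 2), nrow = 3)  # 3 variants x 4 samples
one_row      <- geno[3, ]                     # default drop = TRUE: a plain vector
one_row_kept <- geno[3, , drop = FALSE]       # stays a 1 x 4 matrix

is.matrix(one_row)        # FALSE
is.matrix(one_row_kept)   # TRUE
rowMeans(one_row_kept)    # works; a row-wise function given one_row would fail
```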

Allowing for interval survival time

Dear developers,

I am trying to use your software for an analysis with interval follow-up time (e.g., delayed study entry). Currently, your package only accepts a single follow-up time (time.to.event). As you might have noticed, I extended your repository and created functions that allow interval time to be entered in the analysis; more specifically, to create the survival object Surv(time, time2, event).
I only implemented this for the PLINK input function. See:
https://github.com/emiuga/gwasurvivr_intime

Do you think this is a correct implementation? The analysis seems to run well, but it would be good to know whether you think this modification could affect the analysis in some way.

I hope you could update your package to analyze this type of data, so I can use it instead of my temporary workaround.

Thanks a lot for creating this package!

Sincerely yours,
Emilio
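For context, the survival package fits delayed-entry (left-truncated) data through the counting-process form `Surv(time, time2, event)` mentioned above: each subject contributes to the risk set only between their entry and exit times. A minimal single-SNP sketch on simulated, hypothetical data, independent of gwasurvivr and of the fork linked above:

```r
library(survival)

set.seed(3)
n <- 300
entry <- runif(n, 0, 5)                   # hypothetical delayed-entry time
exit  <- entry + rexp(n, rate = 0.1)      # end of follow-up (always after entry)
dat <- data.frame(
  entry = entry, exit = exit,
  event = rbinom(n, 1, 0.6),
  SNP   = runif(n, 0, 2),                 # hypothetical dosage
  age   = rnorm(n, 60, 10)
)

# Subjects are at risk only on the interval (entry, exit]
fit <- coxph(Surv(entry, exit, event) ~ SNP + age, data = dat)
summary(fit)$coefficients["SNP", ]
```

Comparing a few SNPs from a modified per-SNP pipeline against a direct `coxph()` fit like this one is a simple way to sanity-check such an extension.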

Error in open.TabixFile(vcf) : 'indexname' must be character(1)

Hi,
I am trying to run this for a project and I get the following error: Error in open.TabixFile(vcf) : 'indexname' must be character(1)

I am not sure what is the reason for this error and was hoping maybe you can shed some light.

I am using vcf file and a .txt phenotype file

michiganCoxSurv(vcf.file=vcf.file,
covariate.file=pheno.file,
id.column="newid",
time.to.event="time_to_event.",
event="event",
covariates=NULL,
inter.term=NULL,
print.covs="only",
out.file=tempfile("michigan_only"),
chunk.size=100,
verbose=TRUE,
clusterObj=NULL)

Thanks!
