Comments (10)
Yes, that's what I would suggest. With thresh=NULL
, the algorithm will consider any kinship value >0 a related pair. Since you have so many relationships, I think that's why it's taking so long. Setting a different threshold will eliminate the pairs with very low kinship values, and give you a block-diagonal matrix where the samples in each block are more closely related. You can experiment with different threshold values.
from genesis.
That seems unusual; it should not be taking that long.
- What version of GENESIS are you using?
- How many rows are in the pcrelate
kinBtwn
data frame? - Have you tried setting the threshold to a higher value, so there are fewer samples in each block? Note that if you are using the default setting of
scaleKin=2
, the threshold applies to the rescaled values, so if you want to make blocks of 3rd-degree relatives and above, your threshold would be2*2^(-9/2) = 0.088
from genesis.
Thank you for your help!
- GENESIS_2.22.2
> nrow(pcrel$kinBtwn)
[1] 1249425066
- No, I haven't. Just to be sure, do you suggest to run
pcrelateToMatrix(pcrel, scaleKin=2, thresh=0.088)
?
from genesis.
It made a progress! In the end, it encountered an error as below. And, it did finish successfully with a higher threshold. But I don't think I understood how to select a good threshold. Would it be 2*2^(-4/2) to make blocks of 2nd-degree relatives and above?
Creating block matrices for clusters...
Error in if (n > 0) c(NA_integer_, -n) else integer() :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In rep.fac * nx : NAs produced by integer overflow
2: In .set_row_names(as.integer(prod(d))) :
NAs introduced by coercion to integer range
In addition, I encountered an error below while trying to build a null model using the GRM created from the high threshed in the step above. Could you help me understand and resolve the issues? Thanks much for your help!
Error in .local(x, ...) :
internal_chm_factor: Cholesky factorization failed
In addition: Warning message:
In .local(x, ...) :
Cholmod warning 'not positive definite' at file ../Cholesky/t_cholmod_rowfac.c, line 430
from genesis.
Hello - going back a couple of messages:
nrow(pcrel$kinBtwn)
[1] 1249425066
Something seems off here. If you have 5K samples, then kinBtwn
should only have 12,497,500 rows. Your number is off by a factor of 100. How many rows are in the kinSelf
object?
from genesis.
Ah, I have 50k samples. Thanks for checking and sorry for the confusion!
from genesis.
After trying with different parameters, I was able to build a null model successfully. I ended up using thresh = 0.3
. That's higher than what you recommended. What would be implications to use a high thresh in building GRM?
from genesis.
I'm a bit concerned that there is an issue with your PC-Relate results -- have you tried making any plots of the estimates? For example, a histogram of estimated pairwise kinship coefficients, or a hexbin plot of kinship coefficients vs K0 (if you estimated the IBD sharing probabilities), or a histogram of the estimated inbreeding coefficients. A threshold of 0.3 is really high, so unless you have a ton of 1st and 2nd degree relatives in your sample, you really shouldn't need the threshold to be that high.
from genesis.
This dataset is a subset of UKB exome and kin vs K0 is plotted below (sampled 10M rows). How would downstream analyses (e.g. fitting null model and association tests) be affected by the high threshold? Random effects from distantly related pairs are ignored? GRM gets orders of magnitude larger even with small changes (e.g. threshold of 2.7) and I had difficulties fitting null model. Thank you.
from genesis.
That kinship vs k0 plot is concerning. The maximum value k0 should take is 1 (it can be a little bit larger than 1 because the estimator is not truncated, but it certainly should not have values as large as you are seeing). I can think of a couple possibilities for why your relatedness estimates would be bad:
- You are adjusting for either too many or not enough ancestry principal components in the PC-Relate analysis. How many PCs did you adjust for and how did you go about selecting them?
- An issue with the genotype data in the GDS file. You might want to look at allele frequency and variant heterozygosity distributions to make sure those seem reasonable and that the genotype data is as expected.
from genesis.
Related Issues (20)
- ERROR:unused arguments (two.stage = TRUE, norm.option = "all", rescale = "residSD") HOT 2
- Error when running the pcrelate function HOT 2
- High Memory Consumption with Large Dataset in PC-Relate HOT 1
- Conditional analysis HOT 1
- HOW to FIX - PCAir excludes all my SNPs because there is no chromosome information HOT 1
- Error using assocTestAggregate HOT 1
- Issue with PCAir - Segmentation fault (core dumped) HOT 1
- Issue: Running PCAir on 140K individuals HOT 1
- pcrelate: error with rbindlist() HOT 1
- SPA association tests are incorrect for non-mixed models HOT 1
- `fitted.values` in non-mixed null models are not what we expect
- Error in assocTestSingle() results when covariate is confounded with SNP genotype. HOT 6
- fitNullModel doesn't converge when provided a list of matrices with more than 1 matrix HOT 3
- Error in seqVCF2GDS(vcf.files, gds.file, verbose = FALSE) HOT 1
- correctK2() drops samples under specific conditions HOT 1
- Estimate PVE HOT 1
- add check in varCompCI that family is gaussian
- pcrelate REDUCE error while calculating individual AF betas HOT 8
- analysis pipeline for chromosome X HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from genesis.