illumina / akt Goto Github PK
View Code? Open in Web Editor NEWAncestry and Kinship Tools
License: GNU General Public License v3.0
Ancestry and Kinship Tools
License: GNU General Public License v3.0
Hello,
I am having an error with akt mendel when running the following command:
./akt mendel -p ~/akt/exomes_menonites/out.fam ~/akt/exomes_menonites/all_merged_exomes_mennonites_hg38_GRCh38.bcf > ~/akt/exomes_menonites/mendel.txt
ERROR: mendel command is deprecated. See the bcftools +mendelian plugin for equivalent functionality.
Exiting...
Is there any solution?
Hi!
In truth this isn't a bug report but it is instead a feature request (it's a small lift however).
It'd be very nice if we could estimate the kinship coefficient on a stream-- eg, we're using some whole genome simulators and it's much more convenient to keep a summary statistic than a bazillion whole genome simulations that need to be created, indexed, summarized and then deleted.
Setting line 420 in kin.cpp to:
sr->require_index = 0;
Appears to do the trick.
Much thanks.
-August
Hi,
When I used akt pca
, I got such message
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
984 samples
Bad genotypes at 1:758351
Ca you explain to me what does "Bad genotype" mean in the message? How should I handle with this issue?
Thank you!
Hi. Any ideas here ?
Thanks.
make
echo '#define VERSION "c966c25"' > version.h
echo '#define BCFTOOLS_VERSION "1.3"' >> version.h
g++ -O3 -fopenmp -mpopcnt -c tdt.cpp -Ihtslib-1.3 -I./
g++ -O3 -fopenmp -mpopcnt -c admix.cpp -Ihtslib-1.3 -I./
In file included from htslib-1.3/htslib/synced_bcf_reader.h:56,
from akt.hpp:44,
from admix.cpp:3:
htslib-1.3/htslib/vcf.h: In function ‘void bcf_format_gt(bcf_fmt_t*, int, kstring_t*)’:
htslib-1.3/htslib/vcf.h:817: error: ‘INT8_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:818: error: ‘INT16_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:819: error: ‘INT32_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h: In function ‘int bcf_enc_inttype(long int)’:
htslib-1.3/htslib/vcf.h:848: error: ‘INT8_MAX’ was not declared in this scope
htslib-1.3/htslib/vcf.h:848: error: ‘INT8_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:849: error: ‘INT16_MAX’ was not declared in this scope
htslib-1.3/htslib/vcf.h:849: error: ‘INT16_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h: In function ‘void bcf_enc_int1(kstring_t*, int32_t)’:
htslib-1.3/htslib/vcf.h:855: error: ‘INT32_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:857: error: ‘INT8_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:860: error: ‘INT8_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:861: error: ‘INT8_MAX’ was not declared in this scope
htslib-1.3/htslib/vcf.h:861: error: ‘INT8_MIN’ was not declared in this scope
htslib-1.3/htslib/vcf.h:864: error: ‘INT16_MAX’ was not declared in this scope
htslib-1.3/htslib/vcf.h:864: error: ‘INT16_MIN’ was not declared in this scope
make: *** [admix.o] Error 1
Hi there,
I'm having an issue with akt using the pca option.
It gives me the following error when it looks to open my input bcf file.
command string : $ /g/data/a32/Software/akt/akt pca /short/a32/exi569/GB_CHD_NDD/CHD_NDD_all_samples_SNPS_PCA_ID.bcf -R /g/data/a32/Software/akt/data/wes.hg19.vcf.gz -O b -o /short/a32/exi569/GB_CHD_NDD/pca.bcf > /short/a32/exi569/GB_CHD_NDD/pca.txt
Console reply :
Input: /short/a32/exi569/GB_CHD_NDD/CHD_NDD_all_samples_SNPS_PCA_ID.bcf
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
Problem opening /short/a32/exi569/GB_CHD_NDD/CHD_NDD_all_samples_SNPS_PCA_ID.bcf
Input: /short/a32/exi569/GB_CHD_NDD/CHD_NDD_all_samples_SNPS_PCA_ID.bcf
I have no problem opening and viewing the "CHD_NDD_all_samples_SNPS_PCA_ID.bcf" input file using bcftools view.
[exi569@raijin3 GB_CHD_NDD]$ bcftools view CHD_NDD_all_samples_SNPS_PCA_ID.bcf | more ##fileformat=VCFv4.1
##FILTER=<ID=PASS,Description="All filters passed">
##source=VarScan2
##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 20">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 20">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test">
##FORMAT=<ID=RBQ,Number=1,Type=Integer,Description="Average quality of reference-supporting bases (qual1)">
##FORMAT=<ID=ABQ,Number=1,Type=Integer,Description="Average quality of variant-supporting bases (qual2)">
##FORMAT=<ID=RDF,Number=1,Type=Integer,Description="Depth of reference-supporting bases on forward strand (reads1plus)">
##FORMAT=<ID=RDR,Number=1,Type=Integer,Description="Depth of reference-supporting bases on reverse strand (reads1minus)">
##FORMAT=<ID=ADF,Number=1,Type=Integer,Description="Depth of variant-supporting bases on forward strand (reads2plus)">
##FORMAT=<ID=ADR,Number=1,Type=Integer,Description="Depth of variant-supporting bases on reverse strand (reads2minus)">
##contig=<ID=chr1>
##contig=<ID=chr2>
##contig=<ID=chr3>
##contig=<ID=chr4>
##contig=<ID=chr5>
##contig=<ID=chr6>
##contig=<ID=chr7>
##contig=<ID=chr8>
##contig=<ID=chr9>
##contig=<ID=chr10>
##contig=<ID=chr11>
##contig=<ID=chr12>
##contig=<ID=chr13>
##contig=<ID=chr14>
##contig=<ID=chr15>
##contig=<ID=chr16>
##contig=<ID=chr17>
##contig=<ID=chr18>
##contig=<ID=chr19>
##contig=<ID=chr20>
##contig=<ID=chr21>
##contig=<ID=chr22>
##contig=<ID=chrX>
##bcftools_mergeVersion=1.3.1+htslib-1.3.1
##bcftools_mergeCommand=merge -O z -o CHD_NDD_samples_SNPS.vcf.gz 02766-recalBAMS_SNPS.vcf.gz 00618-recalBAMS_SNPS.vcf.gz 00714-recalBAMS_SNPS.vcf.
gz 01152-recalBAMS_SNPS.vcf.gz 01299-recalBAMS_SNPS.vcf.gz 00773-recalBAMS_SNPS.vcf.gz 00928-recalBAMS_SNPS.vcf.gz 01023-recalBAMS_SNPS.vcf.gz 0185
0-recalBAMS_SNPS.vcf.gz 00626-recalBAMS_SNPS.vcf.gz 00734-recalBAMS_SNPS.vcf.gz 02503-recalBAMS_SNPS.vcf.gz 00720-recalBAMS_SNPS.vcf.gz 01330-recal
BAMS_SNPS.vcf.gz 00748-recalBAMS_SNPS.vcf.gz
##bcftools_mergeCommand=merge -O z -o CHD_NDD_all_samples_SNPS.vcf.gz CHD_NDD_samples_SNPS.vcf.gz CHD_samples_SNPS.vcf.gz CONTROL_samples_SNPS.vcf.
gz
##bcftools_viewVersion=1.3+htslib-1.3
##bcftools_viewCommand=view -o CHD_NDD_all_samples_SNPS.vcf CHD_NDD_all_samples_SNPS.vcf.gz
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##bcftools_viewCommand=view -o CHD_NDD_all_samples_SNPS_chr_sort_PCA.vcf -S samples_for_CHD_PCA.csv CHD_NDD_all_samples_SNPS_chr_sort.vcf
##bcftools_viewCommand=view CHD_NDD_all_samples_SNPS_PCA_ID.bcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 02766 00618 00714 01152 01299 00773 00928 01023 01850 006
26 00734 02503 00720 01330 00748 00646 01506 01537 02261 01365 01388 01832 03062 01925 01162 00918 02237 018
33 00786 00768 01581 01584 01589 01697 01941 01953 01955 01957 01962 01963 01967 01970 01972 01974 01979
chr1 107683715 . C T . PASS ADP=542;WT=0;HET=1;HOM=0;NC=0;AC=2;AN=4 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RD
R:ADF:ADR ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/1:255:551:542:269:273:50.37%:3.6098E-103:36:39:52
:217:73:200 ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. 0/1:255:719:707:367:340:48.09%:8.9707E-127:36:40:92:275:76:264 ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:.
chr1 107690692 . G A . PASS ADP=285;WT=0;HET=1;HOM=0;NC=0;AC=1;AN=2 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RD
R:ADF:ADR ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. 0/1:255:293:285:139:146:51.23%:9.9479E-56:40:34:128:11:135:11 ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.
:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:.:.:.:.:.:.:.:.
:.:.:.:.:. ./.:.:.:.:.:.:.:.:.:.:.:.:.:.
I have also attached the input file and hope you can help me in resolving this file opening error.
Thanks.
Eddie
CHD_NDD_all_samples_SNPS_PCA_ID.zip
Hello Jaredo:
Is there a discuss available of resource requirements (memory and threads)?
I see the threads are referred to as compression associated (presumably IO).
But how might one think about the chromosome and the need to chunk it?
I wonder if the constraints are discuss somewhere?
Cheers,
Chuck
I'm having an issue supplying allele frequencies as answered in #28. Unfortunately I can't provide example data, but I'll try to provide as much as I can without supplying sensitive data.
I have a study VCF (bgzip+tabix indexed). I'm using your set of "reliable" SNPs that happen to be on my platform. Some stats on my study VCF:
$ bcftools stats study.vcf.gz | grep ^SN | head -n2
SN 0 number of samples: 65
SN 0 number of records: 17508
Some stats on my frequencies VCF (also bgzip'd and tabix indexed):
$ bcftools stats frequencies.vcf.gz | grep "^SN" | head -n2
SN 0 number of samples: 0
SN 0 number of records: 17160
Just to be sure, I'm using bcftools isec to check the number of SNPs intersecting between these files.
bcftools isec study.vcf.gz frequencies.vcf.gz --prefix isec
According to the readme produced with isec, I'm primarily interested in how many SNPs are in 0002 and 0003 - the SNPs shared between the files.
$ cat isec/README.txt
This file was produced by vcfisec.
The command line was: bcftools isec --prefix isec study.vcf.gz frequencies.vcf.gz
Using the following file names:
isec/0000.vcf for records private to study.vcf.gz
isec/0001.vcf for records private to frequencies.vcf.gz
isec/0002.vcf for records from study.vcf.gz shared by both study.vcf.gz frequencies.vcf.gz
isec/0003.vcf for records from frequencies.vcf.gz shared by both study.vcf.gz frequencies.vcf.gz
I have 16,137 overlapping.
$ bcftools stats isec/0002.vcf | grep "^SN" | head -n2
SN 0 number of samples: 65
SN 0 number of records: 16137
$ bcftools stats isec/0003.vcf | grep "^SN" | head -n2
SN 0 number of samples: 0
SN 0 number of records: 16137
However, when I run akt kin
supplying this VCF with frequency data, its telling me that it's only using 9 SNPs in total.
$ akt kin -M 1 --freq-file frequencies.vcf.gz study.vcf.gz > /dev/null
Taking allele frequencies from frequencies.vcf.gz using INFO/AF
WARNING: threading is disabled
WARNING: threading is disabled
WARNING: threading is disabled
Using 0 threads
WARNING: threading is disabled
65 samples
Reading genotypes...done.
Kept 9 markers out of 9 in panel.
9/9 of study markers were in the sites file
Calculating kinship values...done.
I'm using akt 0.3.2.
$ akt
Program: akt (Ancestry and Kinship Tools)
Version: 0.3.2
I also tried converting everything to BCF:
bcftools view study.vcf.gz -Ob -o study.bcf
bcftools index study.bcf
bcftools view frequencies.vcf.gz -Ob -o frequencies.bcf
bcftools index frequencies.bcf
And running akt kin
again, but this time, I run into a different error. Not sure what this is about.
$ akt kin -M 1 --freq-file frequencies.bcf study.bcf > /dev/null
Taking allele frequencies from frequencies.bcf using INFO/AF
WARNING: threading is disabled
WARNING: threading is disabled
WARNING: threading is disabled
Using 0 threads
WARNING: threading is disabled
[E::tbx_index_load2] Invalid index header for frequencies.bcf
[E::bcf_sr_regions_init] Could not parse the file frequencies.bcf, using the columns 1,2[,-1]
ERROR: Failed to read the regions: frequencies.bcf
Exiting...
I did ensure that the frequencies VCF and the study VCF are both sorted, but that didn't appear to be the issue. Any idea what could be happening here?
clang does not have omp
clang: warning: argument unused during compilation: '-fopenmp'
In file included from admix.cpp:3:
./akt.hpp:20:10: fatal error: 'omp.h' file not found
#include <omp.h>
I converted a vcf to bcf and tried running your tool with the following command:
./akt pca -W data/wgs.grch37.vcf.gz input.bcf
The only logs I get are:
Input: input.bcf
Using file data/wgs.grch37.vcf.gz for PCA weights
Problem opening input.bcf
Is see no debug option to make this message more verbose and figure out what the issue is, does a flag for verbose output exist ?
I don't believe the problem is permission related, here's the stats
output for input.bcf
:
File: input.bcf
Size: 557747163 Blocks: 1089360 IO Block: 4096 regular file
Device: fd01h/64769d Inode: 10640983 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1000/ george) Gid: ( 1000/ george)
Access: 2020-02-10 15:23:13.371180294 +0200
Modify: 2020-02-10 15:22:44.071013086 +0200
Change: 2020-02-10 15:22:44.071013086 +0200
Birth:
We should add options to akt kin
to calculate the:
I'm looking at the history of the "reliable" sites (on many arrays, in linkage equilibrium, reasonable MAF, etc): https://github.com/Illumina/akt/commits/master/data/wgs.grch37.vcf.gz
At 87906b0 this site vcf had 17535 SNPs. At 81be1c5 it has 17491. But I couldn't find any information about what/why this changed. Can you offer a brief update on the remapping, what got dropped? Thank you!
Thanks for akt
. I'm currently working on building an akt
package for Bioconda based on the latest available release tarball from github (bioconda/bioconda-recipes#8661), which is v0.2.0.
I'd like to also eventually build an updated version, since I see there have been quite a few changes since the 2016 release of v0.2.0.
Would you be able to create a github release of akt
for a more recent version so we can build a bioconda recipe for it? Thanks.
I am running PCA for 1kGP merged SV vcf files and I am getting the seg fault error when I run the PBS script with 150G memory. The run command is in the last line. It worked fine when I extracted a single chromosome (chr21) from the VCF and run the same script.
Input: bcftool_merge.sv_chr1_22_maf_ge0.05.annotated.vcf.gz
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
1793 samples
Kept 2028657 markers out of 2028657
/var/spool/torque/mom_priv/jobs/3377732.sug-moab.SC: line 12: 11059 Segmentation fault /hgsc_software/akt/akt-0.3.3/bin/akt pca ${vcf}.vcf.gz --force -Oz -o ${vcf}.pca.cf.gz > ${vcf}.txt
Hi @jaredo -- I realize you haven't committed to this repo in a few years, but just hoping you could point me in the right direction for fixing a problem I think I've discovered with the kinship calculation here.
I have a VCF with two individuals, id1 and id2, (simulated parent-child). Each sample has a call rate of about 80%, so there are many spots where one sample is genotyped but the other isn't. When I run akt kin
using the KING estimator with missing data present the kinship coefficient is underestimated compared to using akt kin
when the missing SNPs are removed. In both examples akt correctly reports the actual number of genotypes compared (6238 out of the total 17,530).
The two-sample VCF with ~20% missing SNPs in each sample: sim20mis_all.vcf.gz
When I run akt kin
using the KING estimator, I underestimate the kinship coefficient at .19:
akt kin -M1 --force sim20mis_all.vcf.gz
id1 id2 0.00000 1.00000 0.00000 0.19066 6238
I can remove all the missing genotypes from this VCF (bcftools view -g ^miss
): sim20mis_nomiss.vcf.gz
And re-run:
akt kin -M1 --force sim20mis_nomiss.vcf.gz
Which gives me the correct kinship coefficient, .24, for parent-offspring:
id1 id2 0.00000 1.00000 0.00000 0.24072 6238
I'm trying to go through the code in kin.cpp and elsewhere but I'm unsure where to start. Thanks for any help you might be able to provide!
I'm not sure of the rationale for preventing the use of both the freq-file option and limitation to a region or set of regions. It seems like one might want to use analysis based on a file of population-reference frequencies, but also limit to certain regions (perhaps even just for performance reasons).
Furthermore, the fact that the current code defaults to using the frq_file as also being the regions file contributes to #29 (at least with some of my files). the bcf_sr code should be reasonably efficient even without specifying regions.
Hi,
I have run 'akt pca' with the following command line:
~/sources/akt/akt pca --force chr1_all_sample.bcf > pca.txt
But I get the following error:
Input: chr1_all_sample.bcf
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
12 samples
Bad genotypes at chr1:10004
chr1_all_sample.bcf has been indexed. The genotypes are the result of a 'bcftool call' analysis, restricted here to chr1.
Can you suggest a cause for the error?
Thank you,
Ian
In earlier versions of akt, the "prune" subcommand provided a nice way to prune/thin variants on the basis of linkage disequilibrium. However, the command has been deprecated and is no longer available in akt.
My understanding is that you favour using a pre-pruned set of human variants that you provide in a VCF. However, this has several limitations: most profoundly, it's of course completely inapplicable to non-human organisms - people do work in plants, yeast, dogs, cattle, flies, basically anything with DNA! But even for human data on the right reference build, the set of provided sites will come with some complex ascertainment bias, might not be optimal when working in more diverse populations with ancestry poorly represented in the 1000 Genomes dataset, and might not overlap well with all present or future genotyping array sites. The "prune" command also provided the additional flexibility of choosing the LD cutoff. There might also be additional use cases for LD pruning other than just obtaining a suitable set of SNPs for genome-wide ancestry analyses.
I would therefore like to request bringing back "prune", which I think would make an already great software even greater. Thanks for your work on akt!
-R data/1000G.snps.nochr.vcf.gz
this file doesn't exist any more and apparently -F is now to be used instead of -R; so the documentation should be updated.
What's the format for the --freq-file FILE
, a file containing population allele frequencies to use in kinship calculation
thanks!
Any ideas on what could be causing this problem? I get this error message when I run on all 17 samples, 1 sample, filtered/unfiltered (e.g. for PASS, AC, AN).
./akt.v0.3.2/akt pca -R ./akt.v0.3.2/data/wgs.grch38_chr.vcf.gz $bcf
Input: filtered_4033.bcf
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
1 samples
6002/17491 of study markers were in the sites file
akt: Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<float, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = float; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Aborted (core dumped)
Hello
This document describes calling the ibd option.
The following error occurs, when I call ibd option:
alexander@statistics-dev:/mnt/fs/akt$ ./akt ibd input.bcf -R panel.bcf -r 2 > sharing
ERROR: ibd is deprecated
Exiting...
So, how can I run searching for IBD segments?
Hi there,
Having an issue using the akt pca when projecting another set of samples into a previously created set of principal components.
I'm trying to follow the procedure given in the akt/doc/usage.md, where an initial sample is used to create a set of precalculated PC values, and then a new set of samples are projected on to it.
So my steps:
$ /g/data/a32/Software/akt/akt pca 1000G_match_CHD_NDD_chr.bcf -R /g/data/a32/Software/akt/data/wes.hg19.vcf.gz -O b -o pca.bcf > pca.txt
Input: 1000G_match_CHD_NDD_chr.bcf
MAF lower bound: 0
Thin: 1
Number principle components: 20
Reading data...
2504 samples
166/32136 of study markers were in the sites file
Printing coefficients to pca.bcf
$ /g/data/a32/Software/akt/akt pca CHD_NDD_all_samples_SNPS_PCA_ID.bcf -W pca.bcf > projections
Input: /short/a32/exi569/GB_CHD_NDD/CHD_NDD_all_samples_SNPS_PCA_ID.bcf
Using file pca.bcf for PCA weights
Failed to read the regions: pca.bcf
Hope I'm not doing something stupid.
Thanks for your assistance in advance.
Eddie
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.