zhanxw / checkvcf
Sanity check Variant Call Format (VCF) files.
Hi, I am using checkVCF and got the error "Line [ 1 ] does not have correct column number, exiting!". I checked the columns by reading the first several lines into Python and splitting them by \t, which gave me the same column number. I did not encounter this when processing other datasets.
How can I fix this?
Many thanks
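One way to narrow this down is to count the columns of each line twice, once split on tabs and once split on any whitespace, since another report in this thread notes that the header line is split by whitespace. The sketch below is only a diagnostic under that assumption; `column_counts` and the sample lines are hypothetical and not part of checkVCF.

```python
def column_counts(lines):
    """Return (tab_split_count, whitespace_split_count) for each
    line that is not a '##' meta-information line."""
    counts = []
    for line in lines:
        if line.startswith("##"):
            continue  # meta lines carry no columns
        line = line.rstrip("\r\n")
        counts.append((len(line.split("\t")), len(line.split())))
    return counts


# Hypothetical input: one sample whose name contains a space.
sample = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample A\n",
    "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\n",
]
print(column_counts(sample))  # [(10, 11), (10, 10)]
```

If the two counts differ on the header line but agree on the data lines, a space inside a sample name is a likely cause.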
Dear all,
I tried to use checkVCF.py with the hg38 FASTA (from UCSC), and the report listed an enormous number of MismatchRefBase (inconsistent reference) sites. Looking at the check.ref file, I noticed that all of those variants are in lower case in the FASTA file, which is why the warning is raised.
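UCSC FASTA files use lower case to mark soft-masked (repeat) sequence, so a case-sensitive comparison would flag every variant in those regions. A minimal sketch of a case-insensitive check follows; `ref_matches` is a hypothetical helper, not the comparison that checkVCF.py actually performs.

```python
def ref_matches(vcf_ref, fasta_bases):
    """Compare a VCF REF allele with the reference bases, ignoring case
    so that soft-masked (lower-case) sequence is not reported as a mismatch."""
    return vcf_ref.upper() == fasta_bases.upper()


print(ref_matches("A", "a"))        # True
print(ref_matches("ACGT", "acgt"))  # True
print(ref_matches("A", "c"))        # False
```

Another option, under the same assumption, is to convert the sequence lines of the FASTA file to upper case before running the check.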
Are you planning any updates for Python 3?
Here's an attempt - but it's best you verify the contents...
checkVCF.python39.py.zip
According to the VCF specification, the "FORMAT" column is not mandatory:
http://samtools.github.io/hts-specs/VCFv4.2.pdf
The script fails at line 219, which always expects a FORMAT column.
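A header parser that tolerates sites-only files could treat the eight fixed columns as mandatory and FORMAT plus sample columns as optional. The sketch below illustrates that idea only; `split_header` and `FIXED_COLUMNS` are hypothetical names and do not reflect how checkVCF.py is written.

```python
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def split_header(header_line):
    """Return (has_format, sample_names) for a '#CHROM' header line.

    The eight fixed columns are required; FORMAT and the sample
    columns after it are optional."""
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[:8] != FIXED_COLUMNS:
        raise ValueError("header does not start with the 8 fixed VCF columns")
    has_format = len(fields) > 8 and fields[8] == "FORMAT"
    samples = fields[9:] if has_format else []
    return has_format, samples


sites_only = "\t".join(FIXED_COLUMNS) + "\n"
with_samples = "\t".join(FIXED_COLUMNS + ["FORMAT", "s1", "s2"]) + "\n"
print(split_header(sites_only))    # (False, [])
print(split_header(with_samples))  # (True, ['s1', 's2'])
```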
Hi,
When running this in a for-loop over each chromosome, the out.* files will be overwritten on every iteration, will they not? Either way, it would be great to have a flag to name the out.* files, or to have them named automatically after the base name of the input file. For example, cohort1.chr1.vcf.gz would lead to cohort1.chr1.out.*
Thanks!
Sander
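The naming scheme requested here could be derived from the input path roughly as follows. This is a sketch of the suggestion only; `output_prefix` is a hypothetical helper and the list of recognised suffixes is an assumption.

```python
import os


def output_prefix(vcf_path):
    """Build an output prefix from the input file's base name,
    e.g. 'cohort1.chr1.vcf.gz' -> 'cohort1.chr1.out'."""
    name = os.path.basename(vcf_path)
    # Assumed set of VCF suffixes; the longest ones are checked first.
    for suffix in (".vcf.gz", ".vcf.bgz", ".vcf"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return name + ".out"


print(output_prefix("data/cohort1.chr1.vcf.gz"))  # cohort1.chr1.out
```

In a per-chromosome loop, each input file would then map to its own set of report files instead of sharing one prefix.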
I get the following error:
Please use the following command to clean your VCF file and then re-run checkVCF.py
(grep ^"#" $your_old_vcf; grep -v ^"#" $your_old_vcf | sed 's:^chr::ig' | sort -k1,1n -k2,2n) | bgzip -c > $your_vcf_file
I am not sure how to fix this; running this command does not change anything.
I am trying to check against the hg38 reference, so the chromosomes are coded as in the hg38.fa file. The same VCF file works when the chromosomes are coded continuously, but that does not serve my purpose. Please suggest how I can solve this problem. My aim is to perform imputation.
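Before changing the chromosome coding, it may help to confirm which contig names in the VCF have no counterpart in the reference FASTA. The following is a diagnostic sketch only; `unmatched_contigs` and the example names are hypothetical, and it says nothing about which naming scheme checkVCF.py itself accepts.

```python
def unmatched_contigs(vcf_contigs, fasta_contigs):
    """Return the sorted VCF contig names that do not appear
    among the FASTA sequence names."""
    fasta = set(fasta_contigs)
    return sorted(c for c in set(vcf_contigs) if c not in fasta)


# Hypothetical names: the VCF mixes 'chr'-prefixed and bare labels.
vcf_names = ["chr1", "chr2", "1"]
fasta_names = ["chr1", "chr2", "chrX"]
print(unmatched_contigs(vcf_names, fasta_names))  # ['1']
```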
A VCF file is tab-delimited according to the specification.
The following line splits the header line by whitespace:
https://github.com/zhanxw/checkVCF/blob/master/checkVCF.py#L194
If a sample name contains a space, an incorrect number of columns will be counted.
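The difference is easy to reproduce in isolation. The sketch below extracts sample names by splitting on tabs only and contrasts the resulting column count with a whitespace split; `sample_names` and the header are hypothetical, and this is not a patch against the linked line.

```python
def sample_names(header_line):
    """Return the sample names from a '#CHROM' header line that has a
    FORMAT column, splitting on tabs only so spaces in names are kept."""
    fields = header_line.rstrip("\r\n").split("\t")
    return fields[9:]  # columns after the 8 fixed columns and FORMAT


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample A\tsample B\n"
print(sample_names(header))                   # ['sample A', 'sample B']
print(len(header.rstrip("\n").split("\t")))   # 11 columns when split on tabs
print(len(header.split()))                    # 13 tokens when split on whitespace
```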