Yeah, ALLELE_A and ALLELE_B are integer rather than string variables for two important reasons:
- The integer approach works better when you left-align indels with the command
bcftools norm -f REF
as this would cause a mismatch between REF/ALT and ALLELE_A/ALLELE_B if the latter were strings.
- The integer approach works seamlessly with BCFtools/liftover, as when the reference and alternate alleles are swapped, the corresponding ALLELE_A/ALLELE_B values are automatically updated.
I tried to follow the Picard GtcToVcf design, but in this case the Picard design was inappropriate.
from gtc2vcf.
Thanks Giulio. Makes sense.
Do you have any suggestions for a straightforward way of converting the ALLELE_A/ALLELE_B INFO tags back into REF/ALT alleles so that the VCF is compatible with the Picard tools? Essentially, I might need to transform the VCF into an adpc.bin file to run contamination checks with VerifyIDintensity. I was looking at the bcftools +fill-tags plugin, but it seems like it might just fill back the 0 and 1 encoding.
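As a rough illustration of the conversion asked about here, the integer indices can be mapped back to allele strings outside the VCF with bcftools query plus awk. This is a minimal sketch, not a gtc2vcf or bcftools feature: it assumes ALLELE_A/ALLELE_B index into the record's alleles with 0 = REF and 1 = first ALT, and the function name allele_index_to_string is hypothetical. It produces a TSV, not a new VCF.

```shell
# Hypothetical helper (not part of gtc2vcf or bcftools): map the integer
# ALLELE_A/ALLELE_B indices back to allele strings.
# Assumes index 0 = REF, 1 = first ALT, 2 = second ALT, ...; "." stays missing.
# Expected input (TSV), for example from:
#   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%ALLELE_A\t%ALLELE_B\n' input.vcf
allele_index_to_string() {
  awk 'BEGIN { FS = OFS = "\t" }
  # Return the allele for a 0-based index, or "." if missing or out of range.
  function pick(idx) {
    if (idx == "." || idx + 0 < 0 || idx + 0 > n) return "."
    return allele[idx + 0]
  }
  {
    n = split($4, alt, ",")          # ALT may list several comma-separated alleles
    allele[0] = $3                   # index 0 is always REF
    for (i = 1; i <= n; i++) allele[i] = alt[i]
    print $1, $2, pick($5), pick($6)
  }'
}
```

For example, a record with REF=A, ALT=G, ALLELE_A=1, ALLELE_B=0 comes out as A allele G and B allele A. The range check against n keeps alleles from a previous multiallelic record from leaking into the next one.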
I see. Maybe I will rewrite VerifyIDintensity to work with VCFs, as this seems a very simple but valuable piece of software. Where do you get the ABF allele frequencies to run VerifyIDintensity?
That would be an ideal solution. I did fork VerifyIDintensity to see if I could edit it to accept text input instead of the binary adpc.bin, but then paused. 😃
- Essentially, it only needs the raw and normalized intensities, the GenTrain score, and the genotype (format here: https://github.com/broadinstitute/picard/blob/c8b2c06b29b22d4bf1bf4270788d3a3e206a8183/src/main/java/picard/arrays/illumina/IlluminaAdpcFileWriter.java). These could easily be pulled with a simple bcftools query command:
bcftools query -f '[%X\t%Y\t%NORMX\t%NORMY\t%GenTrain_Score\t%GT\n]'
- Adding a switch in VerifyIDintensity to accept text input is straightforward.
- However:
  - The VerifyIDintensity code seems to stream the file and pull sample information one sample at a time. The expected order is (sample1-snp1) .. (sample1-snpN) - (sample2-snp1) .. (sample2-snpN). On the other hand, the typical output from bcftools query is (sample1-snp1) - (sample2-snp1) .. (sample1-snpN) - (sample2-snpN). This is fixable by looping over the samples individually and then concatenating the results at the end.
  - Its internal functions pull specific values based on their position in the file. So either the line width has to be constant, meaning all values would have to undergo sprintf preprocessing, or a TSV parser would need to be written, making sure values are passed correctly to all of its internal functions.
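The two preprocessing steps in the list above (sample-major ordering and a constant line width) could be sketched as below. This is a hypothetical illustration for a text-input fork, not code from VerifyIDintensity or bcftools. Instead of looping bcftools query over each sample, the first function reorders a single SNP-major query output in memory; the column width passed to the second function is an assumption and must exceed the longest value.

```shell
# Hypothetical preprocessing helpers; neither is part of VerifyIDintensity or bcftools.

# Reorder SNP-major lines (sample1-snp1, sample2-snp1, ...) into sample-major
# order (sample1-snp1 .. sample1-snpN, sample2-snp1 ..). $1 = number of samples,
# which could be obtained with: bcftools query -l input.vcf | wc -l
# Note: this buffers the whole input in memory.
reorder_to_sample_major() {
  awk -v ns="$1" '
    { s = (NR - 1) % ns; buf[s] = buf[s] $0 ORS }
    END { for (s = 0; s < ns; s++) printf "%s", buf[s] }'
}

# Right-align every tab-separated field to a constant width ($1 characters),
# so each line has the same length. Wider values are not truncated,
# so $1 must exceed the longest value.
pad_fixed_width() {
  awk -v w="$1" 'BEGIN { FS = "\t" }
    {
      line = ""
      for (i = 1; i <= NF; i++) line = line sprintf("%" w "s", $i)
      print line
    }'
}
```

A possible pipeline is bcftools query -f '[%X\t%Y\t%NORMX\t%NORMY\t%GenTrain_Score\t%GT\n]' input.vcf | reorder_to_sample_major "$(bcftools query -l input.vcf | wc -l)" | pad_fixed_width 12. The per-sample loop described above (bcftools query -s "$sample" for each sample) avoids holding everything in memory but reads the VCF once per sample.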
Needless to say, a direct VCF input would be a cleaner solution.
The ABF allele frequencies are typically taken from the 1000 Genomes Project. It is a text file that can be prepared separately.
Thanks.
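One hedged way to prepare such a file is to derive the B allele frequency from an ALT allele frequency and the ALLELE_B index. This sketch assumes the 1000 Genomes AF has already been transferred onto the array VCF (for example with bcftools annotate) and that sites are biallelic with 0 = REF and 1 = ALT. The function name and its three-column output are hypothetical, since the exact ABF file layout VerifyIDintensity expects is not shown in this thread.

```shell
# Hypothetical helper: derive the B allele frequency per site from an
# ALT allele frequency (e.g. AF from a 1000 Genomes sites VCF) and ALLELE_B.
# Input TSV: CHROM POS ALLELE_B AF, where AF is the first-ALT frequency, e.g. from:
#   bcftools query -f '%CHROM\t%POS\t%ALLELE_B\t%AF\n' annotated.vcf
# Only biallelic indices (0 = REF, 1 = ALT) are handled; anything else -> ".".
b_allele_frequency() {
  awk 'BEGIN { FS = OFS = "\t" }
    {
      if ($3 == "." || $4 == ".") baf = "."
      else if ($3 == 1) baf = $4        # B allele is the ALT allele
      else if ($3 == 0) baf = 1 - $4    # B allele is the REF allele
      else baf = "."
      print $1, $2, baf
    }'
}
```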