Python tool to merge cross-species Illumina iScan genotype data with a reference set of genotypes from a pre-existing source.

License: MIT License

Python 100.00%
python python3 illumina iscan microarray genotyping

iScanVCFMerge

iScanVCFMerge is a Python tool to facilitate the cross-species application of Illumina iScan system microarrays. The tool merges VCF genotypes exported from GenomeStudio with a second VCF, comprising genotypes derived from other samples and sources. Merging is based on matches of chromosome, position and certain conditions of major and minor alleles, with matched rows from each VCF concatenated into a single row (comprising all individuals) in the output files. The full algorithm is explained in the accompanying manuscript, where we reported use of the human Infinium Multi-Ethnic Global and Infinium Omni 2.5 arrays (hg19) to genotype great apes, and merged those with the genotypes of conspecifics previously published elsewhere.
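As a rough illustration of position-based merging, the sketch below joins two sets of already-parsed records on chromosome and position, and keeps a site only when REF and ALT are identical in both inputs. This is a deliberate simplification: the allele conditions actually applied by iScanVCFMerge are those described in the manuscript and are not reproduced here. The record layout, the sample order in the output row and the function name `merge_on_position` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A minimal record: (CHROM, POS, REF, ALT, [one genotype per sample]).
Record = Tuple[str, int, str, str, List[str]]


def merge_on_position(iscan: List[Record], reference: List[Record]) -> List[Record]:
    """Concatenate genotypes of records that share CHROM, POS, REF and ALT.

    Simplifying assumption: alleles must match exactly. The real tool applies
    its own allele conditions, which this sketch does not attempt to mirror.
    """
    # Index the reference records by (CHROM, POS) for fast lookup.
    ref_index: Dict[Tuple[str, int], Record] = {(r[0], r[1]): r for r in reference}

    merged: List[Record] = []
    for chrom, pos, ref, alt, genotypes in iscan:
        match = ref_index.get((chrom, pos))
        if match is None:
            continue  # position absent from the reference set
        _, _, m_ref, m_alt, m_genotypes = match
        if (ref, alt) != (m_ref, m_alt):
            continue  # alleles disagree; dropped in this simplified sketch
        # One output row holding the individuals from both inputs.
        merged.append((chrom, pos, ref, alt, genotypes + m_genotypes))
    return merged


iscan_records = [
    ("1", 1000, "A", "G", ["0/1", "1/1"]),
    ("1", 2000, "C", "T", ["0/0", "0/1"]),
    ("2", 500, "G", "A", ["0/0", "0/0"]),
]
reference_records = [
    ("1", 1000, "A", "G", ["0/0"]),
    ("1", 2000, "C", "G", ["0/1"]),  # ALT differs, so this site is dropped
]

merged_records = merge_on_position(iscan_records, reference_records)
for row in merged_records:
    print(row)
```

Only the site at 1:1000 survives in this toy input: 1:2000 has a different ALT allele in the reference set, and 2:500 has no reference record at all.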

What's new since the paper came out?

Version 1.2

  • Fixed a bug that introduced spurious line breaks into the output VCF header on some Python versions.
  • Output VCF contigs are now sorted in the order of the sequence dictionary, as pulled from the <reference_VCF>.
  • The FORMAT column is restored with a GT value; its absence was causing output VCFs to fail GATK ValidateVariants.
  • Upgraded deprecated pandas functions.
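Sorting by sequence dictionary order can be sketched as follows, assuming the dictionary is read from the `##contig=<ID=...>` header lines of the reference VCF. The helper names, the toy header and the choice to place unknown contigs last are assumptions for illustration; the tool's own implementation may differ.

```python
import re
from typing import Dict, List, Tuple


def contig_order(header_lines: List[str]) -> Dict[str, int]:
    """Map each contig ID to its rank of first appearance in the VCF header."""
    order: Dict[str, int] = {}
    for line in header_lines:
        found = re.match(r"##contig=<ID=([^,>]+)", line)
        if found:
            order.setdefault(found.group(1), len(order))
    return order


def sort_records(records: List[Tuple[str, int]], order: Dict[str, int]) -> List[Tuple[str, int]]:
    """Sort (CHROM, POS) pairs by header rank, then by position."""
    # Contigs missing from the header are placed last (an assumption).
    return sorted(records, key=lambda rec: (order.get(rec[0], len(order)), rec[1]))


# Toy header with made-up contig lengths.
header = [
    "##fileformat=VCFv4.2",
    "##contig=<ID=chr2,length=1000>",
    "##contig=<ID=chr10,length=1000>",
    "##contig=<ID=chrX,length=1000>",
]
records = [("chrX", 50), ("chr10", 7), ("chr2", 900), ("chr2", 12)]

sorted_records = sort_records(records, contig_order(header))
print(sorted_records)
```

A plain string sort would put chr10 before chr2; ranking by header position keeps the output consistent with the reference file's own contig order.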

Version 1.1

  • Bugs fixed to properly handle some multi-allelic sites.
  • The reference population VCF file must now be bgzipped and indexed with tabix. This requirement does not apply to the iScan VCF file, which can either be uncompressed or gzip compressed.
  • In the prior version, the complete reference population VCF file was read into memory before the relevant records were pulled. This caused issues for some users handling enormous reference VCF files. In this version, we use the Pysam library's lightweight wrapper of the htslib C-API to pull only the relevant records in the first place. The script should now run near-instantaneously, irrespective of input file size.
  • Console output is now handled by the Python logging module and is written to a .log file in the output directory.
  • Version numbering now follows 1.x vs 0.x format.
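The logging behaviour described above can be approximated with the standard `logging` module. This is a minimal sketch that sends messages to both the console and a file inside the output directory. The logger name, the log file name `run.log` and the message format are assumptions, not the ones iScanVCFMerge uses.

```python
import logging
import os
import sys
import tempfile


def make_logger(output_directory: str, log_name: str = "run.log") -> logging.Logger:
    """Return a logger that writes to the console and to a .log file."""
    os.makedirs(output_directory, exist_ok=True)
    log_path = os.path.join(output_directory, log_name)

    logger = logging.getLogger("merge_example")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as output_directory:
    log = make_logger(output_directory)
    log.info("Reading iScan VCF")
    with open(os.path.join(output_directory, "run.log")) as handle:
        logged_text = handle.read()
    for handler in list(log.handlers):
        handler.close()  # release the log file before the directory is removed
        log.removeHandler(handler)
```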

Installation

iScanVCFMerge 1.2 requires Python 3.9. It has been successfully tested on macOS Monterey 12.3.1 and CentOS Linux 7.9.2009.

Install the required packages first, if needed:

python3 -m pip install pandas pysam

Clone from GitHub and execute:

git clone "https://github.com/baneslab/iScanVCFMerge.git"
cd iScanVCFMerge
python3 iScanVCFMerge.py

Please note that the PyPI package is no longer maintained.

Usage

iScanVCFMerge [-h] -I <iScan_vcf> -R <reference_vcf> -O <output_directory>

Arguments (-I, -R and -O are required):

-h, --help                 Show the help message
-I, --iScanVCF             Path to your iScan VCF file (.vcf or .vcf.gz)
-R, --ReferenceVCF         Path to your reference VCF file, with which the iScan file will be merged. This must be bgzip compressed and be indexed with tabix
-O, --output_directory     Name of the output directory (will be created if it doesn't exist)
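Because the reference VCF must be bgzip-compressed and tabix-indexed, a quick pre-flight check can catch a missing index before the tool is run. The sketch below assumes the index sits next to the reference file as `<reference_vcf>.tbi` (the name tabix writes by default) or `<reference_vcf>.csi`. The function `check_inputs` is hypothetical and is not part of iScanVCFMerge.

```python
import gzip
import os
import tempfile
from typing import List


def check_inputs(iscan_vcf: str, reference_vcf: str) -> List[str]:
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems: List[str] = []

    if not iscan_vcf.endswith((".vcf", ".vcf.gz")):
        problems.append("iScan VCF should end in .vcf or .vcf.gz")
    for path in (iscan_vcf, reference_vcf):
        if not os.path.isfile(path):
            problems.append(f"file not found: {path}")

    if os.path.isfile(reference_vcf):
        # gzip and bgzip files both start with the bytes 1f 8b, so this only
        # rules out uncompressed input; it cannot prove the file is bgzipped.
        with open(reference_vcf, "rb") as handle:
            if handle.read(2) != b"\x1f\x8b":
                problems.append("reference VCF is not compressed")
        # Assumed index location: next to the reference file.
        if not any(os.path.isfile(reference_vcf + ext) for ext in (".tbi", ".csi")):
            problems.append("reference VCF has no .tbi or .csi index")
    return problems


# Demonstration with throwaway files: the reference is compressed but unindexed.
with tempfile.TemporaryDirectory() as tmp:
    iscan = os.path.join(tmp, "iscan.vcf")
    reference = os.path.join(tmp, "reference.vcf.gz")
    with open(iscan, "w") as handle:
        handle.write("##fileformat=VCFv4.2\n")
    with gzip.open(reference, "wt") as handle:
        handle.write("##fileformat=VCFv4.2\n")
    problems_found = check_inputs(iscan, reference)
    print(problems_found)
```

The check is intentionally conservative: it reports what is visibly wrong and leaves the final validation to the tool itself.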

Citation

Please cite the use of this software as follows:

Fountain, E. D., Zhou, L-C., Karklus, A., Liu, Q-X., Meyers, J., Fontanilla, I. K., Rafael, E. F., Yu, J-Y., Zhang, Q., Zhu, X-L., Pei, E-L., Yuan, Y-H. and Banes, G. L. (2021). Cross-species application of Illumina iScan microarrays for cost-effective, high-throughput SNP discovery. Frontiers in Ecology and Evolution, 9:629252, doi: 10.3389/fevo.2021.629252.

The Research Resource Identifier for iScanVCFMerge is RRID:SCR_021193.

iscanvcfmerge's Issues

pip version does not work

Functions need to be re-ordered for pip compatibility. Please run the script directly with Python in the interim.
