This project forked from evotools/hapbin

Efficient program for calculating Extended Haplotype Homozygosity (EHH) and Integrated Haplotype Score (iHS)

License: GNU General Public License v3.0

hapbin

hapbin is a collection of tools for efficiently calculating Extended Haplotype Homozygosity (EHH), the Integrated Haplotype Score (iHS) and the Cross Population Extended Haplotype Homozygosity (XP-EHH) statistic.

Tools

The hapbin suite contains the following tools:

  • ehhbin --hap [.hap/.hapbin file] --map [.map file] --locus [locus] --out [output prefix] - calculate the EHH
  • ihsbin --hap [.hap/.hapbin file] --map [.map file] --out [output prefix] - calculate the iHS of all loci in a .hap/.hapbin file
  • xpehhbin --hapA [Population A .hap/.hapbin] --hapB [Population B .hap/.hapbin] --map [.map file] --out [output prefix] - calculate the XP-EHH of all loci in the .hap/.hapbin files
  • hapbinconv --hap [.hap ASCII file] --out [.hapbin binary file] - convert a .hap file to a more compact binary format

For additional options, see [executable] --help.
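When the same tool has to be run over several chromosomes, it can help to assemble the command lines programmatically. The following is a minimal sketch, not part of hapbin: the helper name `ihsbin_command` and the per-chromosome file-name pattern are hypothetical, and only the flags (`--hap`, `--map`, `--out`, plus the `--minmaf` and `--cutoff` options used in the Examples section) come from this README.

```python
import shlex


def ihsbin_command(chrom, minmaf=None, cutoff=None):
    """Assemble an ihsbin argument list for one chromosome.

    The file-name pattern below is hypothetical; adapt it to your own data.
    """
    args = [
        "ihsbin",
        "--hap", f"phasedHaplotypes_chr{chrom}.hap",
        "--map", f"chr{chrom}.map",
        "--out", f"chr{chrom}_iHS",
    ]
    # Optional flags are only added when a value is supplied,
    # so the tool's own defaults apply otherwise.
    if minmaf is not None:
        args += ["--minmaf", str(minmaf)]
    if cutoff is not None:
        args += ["--cutoff", str(cutoff)]
    return args


if __name__ == "__main__":
    # Print one shell-quoted command per chromosome without running anything.
    for chrom in (21, 22):
        print(shlex.join(ihsbin_command(chrom, minmaf=0.1, cutoff=0.1)))
```

The returned list can be passed directly to `subprocess.run` once the hapbin executables are on the `PATH`.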

Input file formats

The hap files (--hap), containing phased haplotypes, should be in IMPUTE hap format. They can optionally be converted to smaller binary files for use with the hapbin suite of tools using hapbinconv. IMPUTE provides phased haplotypes in this format for several publicly available human cohorts.
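As a rough illustration of the ASCII input, the sketch below parses hap content under the assumption that it follows the usual IMPUTE layout: one row per variant, with one whitespace-separated 0/1 allele per haplotype. The function name `read_hap` is hypothetical, and this is not how hapbin itself reads the file.

```python
def read_hap(lines):
    """Parse ASCII .hap content into a list of rows, one row per variant.

    Assumes each row holds one 0/1 allele per haplotype, separated by whitespace.
    """
    rows = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if any(field not in ("0", "1") for field in fields):
            raise ValueError(f"line {line_no}: expected only 0/1 alleles")
        rows.append([int(field) for field in fields])
    # Every variant must describe the same set of haplotypes.
    if rows and any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows have differing numbers of haplotypes")
    return rows


if __name__ == "__main__":
    example = ["0 1 1 0", "1 1 0 0"]  # two variants, four haplotypes
    print(read_hap(example))
```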

The map files (--map) should be in the same format as used by Selscan, with one row per variant and four space-separated columns specifying chromosome, locus ID, genetic position and physical position.
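The four-column layout can be checked with a few lines of Python before running the tools. This is a minimal sketch: the names `MapEntry` and `read_map` are hypothetical, and it assumes the genetic position is a decimal number and the physical position an integer.

```python
from typing import NamedTuple


class MapEntry(NamedTuple):
    chromosome: str
    locus_id: str
    genetic_pos: float  # assumed to be a decimal number
    physical_pos: int   # assumed to be an integer base-pair position


def read_map(lines):
    """Parse map content with four whitespace-separated columns per variant."""
    entries = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {line_no}: expected 4 columns, got {len(fields)}")
        chrom, locus_id, genetic_pos, physical_pos = fields
        entries.append(MapEntry(chrom, locus_id, float(genetic_pos), int(physical_pos)))
    return entries


if __name__ == "__main__":
    # Illustrative values only.
    print(read_map(["22 9189 0.0123 16050075"]))
```

The number of map entries should match the number of rows in the corresponding hap file, since both contain one row per variant.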

Output file formats

  • ehhbin outputs two columns, the EHH for each allele (0 and 1) at each location.
  • ihsbin outputs two files, the first containing unstandardised iHS for allele 0 and the second (with the .std extension) containing the corresponding standardised iHS (alleles are grouped into 2% frequency bins for standardisation by default). Each of these output files contains two columns: the SNP locus ID (as specified in the map file) and the corresponding iHS value.
  • xpehhbin outputs a file that also contains two columns: the SNP locus ID (as specified in the map file) and the corresponding XP-EHH value.
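The frequency-binned standardisation mentioned for ihsbin can be pictured as follows. This is a simplified sketch, not hapbin's code: the function name is hypothetical, allele frequencies are passed in separately (the output files described above do not contain them), and the population standard deviation is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_by_frequency(freqs, scores, bin_width=0.02):
    """Standardise each score against the other scores in its frequency bin.

    Returns a list aligned with `scores`; entries are None where the bin
    has no spread and a z-score is therefore undefined.
    """
    bins = defaultdict(list)
    for index, freq in enumerate(freqs):
        # The small epsilon guards against floating-point noise at bin edges.
        bins[int(freq / bin_width + 1e-9)].append(index)

    result = [None] * len(scores)
    for members in bins.values():
        values = [scores[i] for i in members]
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population standard deviation
        if sigma == 0:
            continue  # a single-value or constant bin cannot be standardised
        for i in members:
            result[i] = (scores[i] - mu) / sigma
    return result


if __name__ == "__main__":
    # Two variants share the 10-12% bin; the third sits alone in its bin.
    print(standardise_by_frequency([0.11, 0.115, 0.30], [1.0, 3.0, 5.0]))
```

Binning by frequency matters because the expected unstandardised iHS depends on the frequency of the allele, so scores are only comparable between variants of similar frequency.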

Examples

Example command for calculating EHH for a variant with an ID (--locus) of 9189, as specified in the map input file. Output is redirected to a file named 9189_EHH.txt:

 ehhbin --hap phasedHaplotypes_chr22.hap --map chr22.map --locus 9189 > 9189_EHH.txt
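To make the reported values concrete, the sketch below computes EHH for a toy data set using the standard definition: among haplotypes carrying a given core allele, the probability that two randomly chosen haplotypes are identical over the interval from the core to another variant. It is a naive illustration under that definition, not hapbin's optimised implementation; the function name and the variant-major `rows` layout are assumptions.

```python
from collections import Counter
from math import comb


def ehh(rows, core, end, allele):
    """EHH between variant indices `core` and `end` for carriers of `allele`.

    `rows[v][h]` is the 0/1 allele of haplotype h at variant v.
    """
    lo, hi = sorted((core, end))
    carriers = [h for h, a in enumerate(rows[core]) if a == allele]
    n = len(carriers)
    if n < 2:
        return 0.0  # no pairs of haplotypes to compare
    # Group carriers by their allele sequence over the interval.
    counts = Counter(
        tuple(rows[v][h] for v in range(lo, hi + 1)) for h in carriers
    )
    # Fraction of haplotype pairs that are identical over the interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


if __name__ == "__main__":
    toy = [
        [0, 0, 1, 1],  # variant 0
        [1, 1, 1, 1],  # variant 1 (core)
        [0, 1, 0, 1],  # variant 2
    ]
    for end in range(len(toy)):
        print(end, ehh(toy, core=1, end=end, allele=1))
```

EHH equals 1 at the core itself and decays as the interval widens and the carrier haplotypes diverge.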

Example command for calculating the iHS of all variants with a minor allele frequency greater than 10% (--minmaf 0.1) and specifying that the integral of the observed decay of EHH (i.e. iHH, see Voight et al. for more information) should be calculated up to the point at which EHH drops below 0.1 (--cutoff 0.1):

 ihsbin --hap phasedHaplotypes_chr22.hap --map chr22.map --out chr22_iHS --minmaf 0.1 --cutoff 0.1
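The role of the cutoff can be illustrated with a simple trapezoid-rule integration of EHH in one direction away from the core. This is a sketch under stated assumptions rather than hapbin's implementation: the function name is hypothetical, positions are taken to be genetic positions ordered outward from the core, and the segment on which EHH first falls below the cutoff is simply dropped, whereas an actual implementation may treat that boundary differently.

```python
def integrate_ehh(positions, ehh_values, cutoff):
    """Area under the EHH curve, stopping once EHH falls below `cutoff`.

    `positions` and `ehh_values` are ordered outward from the core variant,
    with the core itself as the first element.
    """
    area = 0.0
    for i in range(1, len(positions)):
        if ehh_values[i] < cutoff:
            break  # EHH has decayed below the cutoff; stop integrating
        width = abs(positions[i] - positions[i - 1])
        # Trapezoid between two neighbouring variants.
        area += 0.5 * (ehh_values[i] + ehh_values[i - 1]) * width
    return area


if __name__ == "__main__":
    positions = [0.0, 1.0, 2.0, 3.0]  # illustrative genetic positions
    ehh_values = [1.0, 0.5, 0.2, 0.05]
    print(integrate_ehh(positions, ehh_values, cutoff=0.1))
```

In this picture, the iHH of an allele is the sum of the areas obtained in the two directions away from the core.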

Example command for calculating XP-EHH with default values for minor allele frequency and EHH cutoff:

 xpehhbin --hapA EUR_phasedHaplotypes_chr22.hap --hapB AFR_phasedHaplotypes_chr22.hap --map chr22.map --out chr22_EURvsAFR_XPEHH
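For orientation, the unstandardised XP-EHH at a locus is conventionally defined as the natural logarithm of the ratio of the integrated haplotype homozygosity (iHH) in the two populations. The snippet below is a minimal sketch of that definition with a hypothetical function name; it says nothing about how xpehhbin computes or standardises its output.

```python
import math


def unstandardised_xpehh(ihh_a, ihh_b):
    """Log ratio of integrated haplotype homozygosity in populations A and B."""
    if ihh_a <= 0 or ihh_b <= 0:
        raise ValueError("iHH values must be positive to take a log ratio")
    return math.log(ihh_a / ihh_b)


if __name__ == "__main__":
    # Illustrative values: haplotypes extend further in population A.
    print(unstandardised_xpehh(4.0, 2.0))
```

Under this definition, positive values indicate longer haplotypes in population A and negative values indicate longer haplotypes in population B.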

Copyright and License

This code is licensed under the GPL v3. Copyright is retained by the original authors, Colin Maclean and the University of Edinburgh.

Building from source code

Dependencies

Building the source code

An out-of-source build is suggested in order to keep the source directory clean. To do this, create a build directory, then run cmake [path to source directory] from within the build directory.

For example:

 cd /path/to/hapbin
 cd build
 cmake ../src/

Once CMake has finished generating the necessary files, simply run make.

The test programs are created in a test subdirectory. Run these test programs with -help or see the Qt 5 QTest framework documentation for testing and benchmarking options.

Running ctest or make test will run all test programs.

Installing on Ubuntu

First, ensure that the packages required for obtaining and compiling the code are installed, along with the MPI packages used for parallelisation if required.

 sudo apt-get update
 sudo apt-get install git cmake libcr-dev mpich libmpich-dev

Install MPIRPC in a directory of your choice.

 git clone https://github.com/camaclean/MPIRPC.git
 cd MPIRPC/
 cd build/
 cmake ../src/
 make
 sudo make install

Finally, download and compile hapbin.

 cd ../../
 git clone -b master https://github.com/evotools/hapbin.git
 cd hapbin/build/
 cmake ../src/
 make -j 4

Installing on ARCHER

If you are using hapbin on the ARCHER UK National HPC Service, follow these steps:

  1. Install MPIRPC.

  2. Navigate to the build directory: cd hapbin/build

  3. Run . build.archer.sh or source build.archer.sh to load/switch required environment modules and configure/install hapbin.

  4. Install to /work/[project code]/[group code]/[username]/hapbin/ by typing make install

  5. Copy desired hapbin.[haps].[threads].pbs from hapbin/tools/pbs/hapbin to /work/[project code]/[group code]/[username]/hapbin/bin/

  6. cd /work/[project code]/[group code]/[username]/hapbin/bin, edit the PBS script as desired, and submit it to the batch queue with qsub.
