Giter Site home page Giter Site logo

latools's Introduction

Release: 2013-06-18

This program maps ancestry informative markers through genotype likelihoods and local ancestry deconvolutions across the genomes of admixed individuals.

###########################################################################
## INSTALLATION (UBUNTU/DEBIAN OS)                                       ##
###########################################################################

Local ancestry deconvolution files need to be formatted as UnionBedGraph files with headers that can be generated with bedtools unionbedg from local ancestry deconvolution files in bed format.
$ sudo apt-get install bedtools unzip wget
$ wget -rnd ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/ancestry_deconvolution
$ for pop in ASW CLM MXL PUR; do
$     unzip -o ${pop}_phase1_ancestry_deconvolution.zip
$     files=$(echo $pop/*.bed)
$     for file in $files; do sed -i -e s/\\\t[0-9]*$//g -e s/undet/0/g $file; done
$     names=$(echo $files | sed -e s/$pop\\\///g -e s/.bed//g)
$     bedtools unionbedg -header -i $files -names $names | gzip > $pop.bg.gz
$ done
$ files=$(echo {ASW,CLM,MXL,PUR}/*.bed)
$ names=$(echo $files | sed -e s/\\\(ASW\\\|CLM\\\|MXL\\\|PUR\\\)\\\///g -e s/.bed//g)
$ bedtools unionbedg -header -i $files -names $names | gzip > la.bg.gz
The above commands should generate the local ancestry deconvolution matrix necessary to perform admixture mapping.

To run the program you need NumPy, SciPy, PyVCF and you need to compile the included library:
$ sudo apt-get install python-numpy python-scipy python-pip libc6-dev
$ sudo pip install pyvcf
$ gcc -std=gnu99 -shared -fPIC -O3 -o latools.so latools.c sumlog.c -lm
$ The above commands should complete the installation on a Ubuntu machine.
To properly run the program, make sure latools.py and latools.so are in the same directory.

If you would rather use python3, just follow these instructions:
$ sudo apt-get install python3-numpy python3-scipy python3-pip
$ sudo pip3 install pyvcf
$ sed -i 's/python$/python3/g' latools.py

Additional suggested packages: GATK, samtools, tabix

###########################################################################
## USAGE                                                                 ##
###########################################################################

To see a full list of options, run the following command:
$ python latools.py map

###########################################################################
## EXAMPLES                                                              ##
###########################################################################

prefix="ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/ALL.chr"
suffix=".phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz"

1) Admixture map Duffy SNP rs2814778:
tabix -fh $prefix\1$suffix 1:159,174,683-159,174,683 | latools.py map la.bg.gz -o rs2814778.vcf

2) Admxiture map mismapped SNP rs62167565 from the vestegial centromere:
tabix -fh $prefix\2$suffix 2:133,105,100-133,105,100 | latools.py map la.bg.gz -o rs62167565.vcf

3) Admixture map mismapped SNPs from the 5p14.3/6p11.2 locus:
tabix -fh $prefix\5$suffix 5:21,506,326-21,573,437 | latools.py map la.bg.gz -o prim2.vcf

4) Admixture map mismapped SNPs from the DUSP22 locus:
tabix -fh $prefix\6$suffix 6:256,518-382,461 | latools.py map la.bg.gz -o dusp22.vcf

5) Admixture map mismapped SNPs from the PRIM2 locus:
tabix -fh $prefix\6$suffix 6:57,453,735-57,453,735 | latools.py map la.bg.gz -o rs201079750.vcf

6) Admxiture map mismapped SNP rs28415689:
tabix -fh $prefix\14$suffix 14:19,109,897-19,109,897 | latools.py map la.bg.gz -o rs28415689.vcf

7) Admxiture map mismapped SNP rs216090 from an Alu sequence:
tabix -fh $prefix\16$suffix 16:158,661-158,661 | latools.py map la.bg.gz -o rs216090.vcf

8) Admxiture map SNP rs4818105 highly diverged in Native Americans:
tabix -fh $prefix\21$suffix 21:41,370,397-41,370,397 | latools.py map la.bg.gz -o rs4818105.vcf

###########################################################################
## IMPORTANT LINKS                                                       ##
###########################################################################

1) Local ancestry deconvolution files:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/ancestry_deconvolution/ or
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/phase1/analysis_results/ancestry_deconvolution

2) 1000 Genomes Project genotype calls:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ or
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/

3) Description of hs37d5 reference with included decoy sequences for phase2:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/ or
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/

4) BedGraph Format:
http://genome.ucsc.edu/goldenPath/help/bedgraph.html

5) unionbedg Format:
http://bedtools.readthedocs.org/en/latest/content/tools/unionbedg.html

6) VCF (Variant Call Format):
http://vcftools.sourceforge.net/specs.html

7) PyVCF - A Variant Call Format Parser for Python:
http://pyvcf.rtfd.org/

latools's People

Contributors

freeseek avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Forkers

pqyplzxhgf

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.