
tanyaphung / par_nonpar_diversity_canines


PAR_nonPAR_Diversity_Canines

This repository contains scripts for analyzing genetic diversity in the pseudoautosomal regions (PAR) and non-pseudoautosomal regions (nonPAR) of the X chromosome in canines (Phung et al., in preparation).

GATK Variant Calling

  • Start with BAM files that have already been aligned (the BAM files were obtained from Marsden et al. 2016). The steps below describe how I obtain the VCF files from these BAM files.
Step 1: Single-sample variant calling with GATK
./gatk_single_sample.sh /path/to/output/directory/ whatChr SampleName /path/to/bam/file/
Step 2: Merge the VCFs from all 20 individuals into one VCF
./merge_vcfs.sh
Step 3: Filter variants: apply the GATK hard filter, then remove non-biallelic variants and clustered SNPs
./filter_vcfs.sh
Step 4: Obtain the genotype (GT) for each individual
./obtain_GT_for_1_ind_from_VCF.sh
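
To illustrate what obtaining the GT for one individual involves, here is a minimal Python sketch. It is not the logic of obtain_GT_for_1_ind_from_VCF.sh, which is not shown here. It only assumes the standard VCF layout (a #CHROM header line naming the samples, and a FORMAT column listing the per-sample fields). The function name extract_gt and the sample names in the usage note are hypothetical.

```python
def extract_gt(vcf_lines, sample):
    """Yield (chrom, pos, genotype) for one sample from an iterable of VCF lines.

    Assumes a standard VCF: '##' meta lines, one '#CHROM' header line,
    then tab-delimited records whose 9th column is FORMAT.
    """
    sample_col = None
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Locate the column that holds the requested sample
            sample_col = fields.index(sample)
            continue
        if sample_col is None:
            raise ValueError("no #CHROM header line before the first record")
        # GT is usually the first FORMAT key, but look it up to be safe
        gt_index = fields[8].split(":").index("GT")
        genotype = fields[sample_col].split(":")[gt_index]
        yield fields[0], int(fields[1]), genotype
```

For example, `list(extract_gt(open("filtered.vcf"), "dog1"))` would return one tuple per record, where filtered.vcf and dog1 are placeholder names.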

Obtain callable loci

When computing heterozygosity, one needs to divide by the total number of callable sites in each window. For example, when computing heterozygosity in 50 kb nonoverlapping windows, if every site within a window is callable, heterozygosity per site equals the number of heterozygous sites divided by 50,000. However, empirical data are typically messy, so not every site within a window is callable. These scripts obtain the regions of the genome that are callable.
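
The normalization described above can be sketched in Python as follows. This is an illustration only, not code from this repository. It assumes callable regions are given as 0-based, half-open (start, end) pairs on one chromosome, that heterozygous positions use the same 0-based coordinates, and that every heterozygous position lies inside a callable region. The function names are hypothetical.

```python
def callable_bases_in_window(win_start, win_end, callable_regions):
    """Count callable bases that overlap the window [win_start, win_end)."""
    total = 0
    for start, end in callable_regions:
        overlap = min(end, win_end) - max(start, win_start)
        if overlap > 0:
            total += overlap
    return total


def window_heterozygosity(win_start, win_end, callable_regions, het_positions):
    """Per-site heterozygosity: heterozygous sites / callable sites in the window.

    Returns None when the window has no callable sites, so that such
    windows can be skipped rather than reported as zero.
    """
    callable_bp = callable_bases_in_window(win_start, win_end, callable_regions)
    if callable_bp == 0:
        return None
    het_count = sum(1 for pos in het_positions if win_start <= pos < win_end)
    return het_count / callable_bp
```

With a fully callable 50 kb window the denominator is 50,000; with gaps in coverage it shrinks to the number of callable bases, which is the point of the callable-loci step.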

Use GATK to obtain callable loci from BAM file

Usage:

./obtain_callable_regions.sh /path/to/GenomeAnalysisTK.jar /path/to/ref.fa path/to/bam sampleName

Note that this script is intended for obtaining the callable regions for one individual.

After the GATK CallableLoci script finishes, the next step is to filter the output file. Specifically, I first extract the regions for each chromosome of interest. Then I keep only the callable loci (regions annotated with CALLABLE). Finally, I keep columns 1, 2, and 3, which are the chromosome name, the start coordinate, and the end coordinate.

./format_callable_loci.sh
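
The three filtering steps above can be sketched in Python as shown below. This is not the content of format_callable_loci.sh. It assumes the CallableLoci output is whitespace-delimited with four columns (chromosome, start, end, state), which should be checked against the actual output file. The function name is hypothetical.

```python
def format_callable_loci(lines, chrom):
    """Keep CALLABLE intervals on one chromosome as (chrom, start, end) tuples.

    Assumes each line has at least four whitespace-delimited columns:
    chromosome, start, end, state.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Exact matches on the chromosome name and on the state label
        if fields[0] == chrom and fields[3] == "CALLABLE":
            kept.append((fields[0], int(fields[1]), int(fields[2])))
    return kept
```

One design note: comparing the chromosome column exactly avoids a pitfall of pattern matching, where a search for chr1 would also match chr10.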

Compute pairwise genetic diversity

Obtain callable sites for each species

  • Previously, when I computed per-individual heterozygosity, the callable sites were specific to each individual.
  • Now, to compute genetic diversity within 13 dogs and 6 wolves, I need the sites that are callable in all 13 dogs and, separately, in all 6 wolves.
  • I used bedtools intersect for this. Currently I run bedtools intersect on 2 files, then pipe the output to be intersected with the next file, and so on. This is very inefficient, but I have not found a tool that does this more efficiently.
./intersect_callableRegions_13Dogs.sh
./intersect_callableRegions_6Wolves.sh
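
The chained intersection described above (intersect two files, then intersect the result with the next one) can be sketched in Python as a fold over per-individual interval lists. This is an illustration, not the content of the two scripts. It assumes each individual's callable regions for a single chromosome are already loaded as a sorted, non-overlapping list of (start, end) pairs. The function names are hypothetical.

```python
from functools import reduce


def intersect_two(a, b):
    """Intersect two sorted, non-overlapping lists of (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def intersect_all(interval_lists):
    """Regions callable in every individual: fold intersect_two over the lists."""
    return reduce(intersect_two, interval_lists)
```

Each pairwise step is a single linear pass over the two lists, so the total work grows with the number of individuals times the number of intervals.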

Obtain callable sites that are neutral for each species

./intersect_callableRegionsWithinSpecies_neutralRegions.sh

Obtain variant sites for each species

  • Use VCFtools to subset 13 dog individuals and 6 wolf individuals from the filtered VCF
./obtain_GT_for_13dogs_from_VCF.sh /path/to/input/directory /path/to/output/directory

Compute pairwise diversity

  • All scripts associated with computing pairwise diversity can be found in computePi

  • The main script is compute_pairwiseDiversity.py. For usage:

python compute_pairwiseDiversity.py -h
usage: compute_pairwiseDiversity.py [-h] --windows_bed WINDOWS_BED
                                    --targets_bed TARGETS_BED --variants
                                    VARIANTS --numAllele NUMALLELE --outfile
                                    OUTFILE

This script computes pairwise diversity.

optional arguments:
  -h, --help            show this help message and exit
  --windows_bed WINDOWS_BED
                        REQUIRED. BED file for Xkb window.
  --targets_bed TARGETS_BED
                        REQUIRED. BED file specifying the regions to be
                        partitioned into Xkb window. For example, give the
                        path for the bed file where regions represent neutral
                        region
  --variants VARIANTS   REQUIRED. Variant file. The format should be CHROM POS
                        ind1 ind2 etc. Should be tab delimit. Because of VCF
                        format, it is 1-based
  --numAllele NUMALLELE
                        REQUIRED. Indicate the number of alleles, which is
                        equal to the number of individuals in your sample
                        times 2.
  --outfile OUTFILE     REQUIRED. Name of output file.
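
For readers unfamiliar with the statistic, the sketch below shows one standard way to compute pairwise diversity per site for biallelic variants. It is not taken from compute_pairwiseDiversity.py, whose internals are not shown here, and the function names and arguments are hypothetical. It assumes the count of the alternate allele is already known for each variant site in a window, and that the number of callable sites in that window has been computed separately.

```python
def site_pi(alt_count, num_alleles):
    """Average pairwise differences at one biallelic site.

    With k copies of the alternate allele among n sampled alleles,
    k * (n - k) of the n * (n - 1) / 2 allele pairs differ.
    """
    n, k = num_alleles, alt_count
    return 2.0 * k * (n - k) / (n * (n - 1))


def window_pi(alt_counts, num_alleles, callable_sites):
    """Pairwise diversity per callable site in one window.

    alt_counts holds one alternate-allele count per variant site in the
    window. Returns None when the window has no callable sites.
    """
    if callable_sites == 0:
        return None
    total = sum(site_pi(k, num_alleles) for k in alt_counts)
    return total / callable_sites
```

Here num_alleles plays the role of --numAllele in the help text above: for 13 diploid individuals it would be 26.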
