Giter Site home page Giter Site logo

wisekh6 / ampliconarchitect Goto Github PK

View Code? Open in Web Editor NEW

This project forked from virajbdeshpande/ampliconarchitect

0.0 1.0 0.0 49 KB

AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon. In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GrCH37 reference sequence and one or more regions of interest. This is development version. Please "watch" this repository for production release and methods manuscript soon.

Python 100.00%

ampliconarchitect's Introduction

# AmpliconArchitect
## AmpliconArchitect (AA) is a tool to identify one or more connected genomic regions which have simultaneous copy number amplification and elucidates the architecture of the amplicon.
## In the current version, AA takes as input next generation sequencing reads (paired-end Illumina reads) mapped to the hg19/GrCH37 reference sequence and one or more regions of interest.
## This is development version. Please "watch" this repository for announcement of production release and methods manuscript soon.
## Note: see chromosome name format for bamfile: 'chr1' .. 'chr22', 'chrX', 'chrY' as in "data_repo/hg19full.fa". (Thanks to Sitan Qiao, BGI for noticing this missing information).

#=============================================================================================================================================================================================

# Installation:

# AA download (if you have not already cloned this source code):
git clone https://github.com/virajbdeshpande/AmpliconArchitect.git

# Dependencies:
## 1) Python 2.7

## 2) Ubuntu libraries and tools:
sudo apt-get install build-essential python-dev gfortran python-numpy python-scipy python-matplotlib python-pip zlib1g-dev samtools 

## 3) Pysam verion 0.9.0 (https://github.com/pysam-developers/pysam):
sudo pip install pysam

## 4) Mosek optimization tool (https://www.mosek.com/):
wget http://download.mosek.com/stable/7.1.0.12/mosektoolslinux64x86.tar.bz2
tar xf mosektoolslinux64x86.tar.bz2
echo Please obtain license from https://mosek.com/resources/academic-license or https://mosek.com/resources/trial-license and place in $PWD/mosek/7/licenses
echo export MOSEKPLATFORM=linux64x86 >> ~/.bashrc
export MOSEKPLATFORM=linux64x86
echo export PATH=$PATH:$PWD/mosek/7/tools/platform/$MOSEKPLATFORM/bin >> ~/.bashrc
echo export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PWD/mosek/7/tools/platform/$MOSEKPLATFORM/bin >> ~/.bashrc
echo export MOSEKLM_LICENSE_FILE=$PWD/mosek/7/licenses >> ~/.bashrc
cd $PWD/mosek/7/tools/platform/linux64x86/python/2/
sudo python setup.py install #(--user)
cd -
source ~/.bashrc


# Data repositories:
## Download the data repositories. While we include some annotations, we are unable to host some large files in the git repository.
## These may be downloaded from https://drive.google.com/open?id=0ByYcg0axX7udSjdZeUNqdlNVNHc . Thanks to Peter Ulz for noticing incorrect link earlier.
tar zxf data_repo.tar.gz
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
source ~/.bashrc

#==============================================================================================================================================================================================


#==============================================================================================================================================================================================

# AmpliconArchitect reconstruction:

# Preprocessing:

## Input 1: Coordinate sorted bam file:
## Use bwa mem or any other software to align reads to the hg19full.fa in the data_repo human reference to output a bam file.
## Sort the bam file by genomic coordinates and index the coordinate sorted bam file using "samtools index".

## Input 2: Amplicon Interval(s):
## Identify interval(s) of interest by using a CNV caller e.g. ReadDepth, but may use others or user defined intervals.
## Recommended to use file AmpliconArchitect/src/read_depth_params for calling variants with ReadDepth.
## Recommended only analyze CN gains > 5x copy count and > 100kbp in size.
## Recommended to filter out intervals with significant overlap with conserved regions.
## Input intervals in bed file format
python src/amplified_intervals.py --bed {read_depth_folder}/output/alts.dat > {outName}.bed

# Amplicon reconstruction
python src/AmpliconArchitect.py --bam <input_bam> --bed <bed file> --out <prefix for output files>

# Output description
## The software generates 3 types of output files
## 1) <outName>_summary.txt:
## This file includes a summary for all amplicons detected
## 
## 2) <outName>_amplicon<id>_graph.txt:
## For each amplicon, breakpoint graph describing sequence edges and breakpoint edges in the graph
## Edge each consists of 2 vertices in the format <chromosome>:<position><orientation>
## e.g. A sequence edge chr1:10001-->chr1:20000+ represents the 10000bp genomic sequence. A breakpoint edge chr1:10001-->chr1:20000+ implies that on chr1, the forward strand from position 20000 is continues onto the forward strand at position 10001 in the 5' to 3' direction to create a 10000bp loop. Breakpoint vertices with position -1 are reserved for source vertex at the end of linear contig.
## 
## 3) <outName>_amplicon<id>_cycle.txt:
## For each amplicon, this file represents the list of structures contained in the amplicon.
## Segments represent genomic intervals (with possible overlaps) and each segment is assigned a unique id.
## Segment id 0 is reserved for source vertices. 
## Cycles are represented as ordered lists of connected segments along with orientation of the segments.
## The last segment connects back to the first segment of the cycle.
## Linear contigs have 0+ and 0- as their first and last segments.

#==============================================================================================================================================================================================


Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.