cgap

##What it is:

cgap is a gene extraction pipeline. It handles large fastq files by pre-filtering them with blast, which speeds up the bwa alignment against a reference file. This amounts to substantial time savings.

##How it works:
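To make the pre-filtering idea concrete, here is a minimal sketch, not cgap's actual code. It assumes blastn tabular output (`-outfmt 6`, query ID in column 1 and bit score in column 12) and assumes the score threshold is a bit score. The function names `hit_read_ids` and `filter_fastq` are hypothetical.

```python
MIN_BLAST_SCORE = 50.0  # assumption: threshold is compared against the bit score


def hit_read_ids(blast_tabular, min_score=MIN_BLAST_SCORE):
    """Collect IDs of reads with at least one blast hit at or above min_score.

    blast_tabular is an open text file (or any iterable of lines) in the
    12-column tabular format assumed above.
    """
    ids = set()
    for line in blast_tabular:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        if float(fields[11]) >= min_score:
            ids.add(fields[0])
    return ids


def filter_fastq(fastq, keep_ids):
    """Yield 4-line fastq records whose read ID is in keep_ids."""
    while True:
        record = [fastq.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        # The read ID is the first token of the header line, after the '@'.
        tokens = record[0][1:].split()
        if tokens and tokens[0] in keep_ids:
            yield "".join(record)
```

Only the reads that survive this kind of filter would need to be aligned, which is where the time savings would come from.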

![alt text][cgap workflow]

[cgap workflow]: media/cgap_graph.png "cgap workflow"

##How to use it:

  1. Update or clone the cgap repository.
  • If you don't have cgap:

    git clone https://github.com/TheCulliganMan/cgap.git
    
  • If you want to update cgap:

    cd <cgap_dir>;
    git pull;
    
  1. Use our docker image with the required binaries, or compile your own software.
    docker pull theculliganman/cgap:latest
    docker run -itv <your directory>:/work theculliganman/cgap /bin/bash

Requirements:

  • python [python version: 3.5.1, anaconda: 2.5.0]
  • biopython [version: 1.66]
  • samtools [version: 1.3]
  • bcftools [version: 1.2]
  • htslib [version: 1.3]
  • bwa [version: 0.7.5-r405]
  • novosort [version: 1.03.03]
  • tabix [version: 0.2.5]
  • blastn [version: 2.2.28]
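If you compile your own software instead of using the docker image, a quick check like the following can confirm that the command-line tools from the list above are on your `PATH`. This is a minimal sketch and not part of cgap: `missing_tools` is a hypothetical helper, it checks only that each executable exists, and it does not verify versions.

```python
import shutil

# Executable names taken from the requirements list above.
REQUIRED_TOOLS = ["samtools", "bcftools", "bwa", "novosort", "tabix", "blastn"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```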
  1. Make necessary changes to the cgap config file
from os.path import abspath

CORES = int(6)

# WORK LOCATIONS
##Created in your current working directory
FASTQ_PATH = abspath("fastq_files")
HITS_PATH = abspath("hit_files")
HIT_FASTQ_PATH = abspath("hit_fastq_files")
BLAST_PATH = abspath('blast_files')
BAM_PATH = abspath('bam_files')
VCF_PATH = abspath('vcf_files')
DEPTH_PATH = abspath('depth_files')
CONSENSUS_FILES = abspath('consensus_files')
PHYLIP_FILES = abspath('phylip_files')

# PROGRAM LOCATIONS
MARK_DUPLICATES_JAR_PATH = "/bin/MarkDuplicates.jar"
SAMTOOLS_PATH = 'samtools'
BWA_PATH = 'bwa'
NOVOSORT_PATH = 'novosort'
TABIX_PATH = 'tabix'
BCFTOOLS_PATH = 'bcftools'

# QUALITY SETTINGS
## Minimum quality for bases to remain unmasked.
## Integers only!!!
MASK_MIN_QUALITY = int(20)

## Minimum depth for bases to remain unmasked.
## Integers only!!!
MASK_MIN_DEPTH = int(4)

## Minimum Blast Score #we used 50.0
MIN_BLAST_SCORE = float(50.0)
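Because the quality settings must be integers, it can help to sanity-check your values before starting a long run. The sketch below is an illustration only and is not part of cgap: `check_config` is a hypothetical helper, and the lower bounds it enforces (at least 1 core, non-negative thresholds) are assumptions.

```python
def check_config(cores, mask_min_quality, mask_min_depth, min_blast_score):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    # name -> (value, assumed minimum allowed value)
    integer_settings = [
        ("CORES", cores, 1),
        ("MASK_MIN_QUALITY", mask_min_quality, 0),
        ("MASK_MIN_DEPTH", mask_min_depth, 0),
    ]
    for name, value, minimum in integer_settings:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append("{} must be an integer, got {!r}".format(name, value))
        elif value < minimum:
            problems.append("{} must be at least {}, got {}".format(name, minimum, value))
    if isinstance(min_blast_score, bool) or not isinstance(min_blast_score, (int, float)):
        problems.append("MIN_BLAST_SCORE must be a number, got {!r}".format(min_blast_score))
    elif min_blast_score < 0:
        problems.append("MIN_BLAST_SCORE must not be negative, got {}".format(min_blast_score))
    return problems
```

For example, `check_config(6, 20, 4, 50.0)` returns an empty list for the values shown in the config above.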
  1. Run the cgap command.
python run_cgap.py \
  -refs_path <directory where fastas are located> \
  -forward <fastq_1_fw> <fastq_1_fw> <fastq_1_fw> \
  -reverse <fastq_2_rv> <fastq_2_rv> <fastq_2_rv> \
  -c 5 \
  -format_db #[optional, only if they need formatting.]

Or, in the form of a Slurm script:

#!/bin/sh

#SBATCH --time=10:00:00          # Run time in hh:mm:ss
#SBATCH --mem=50G                # Memory required per node (50 gigabytes)
#SBATCH --job-name=TortCgap
#SBATCH --ntasks=8
#SBATCH --error=/work/hdzoo/shared/cgap_bin/cgap/job.%J.err
#SBATCH --output=/work/hdzoo/shared/cgap_bin/cgap/job.%J.out

module load compiler/gcc/4.8
module load bcftools/1.3
module load blast/2.2
module load bwa/0.7
module load HTSlib/1.3
module load python/3.5
module load tabix/0.2
module load novocraft
module load samtools/1.3
module load blast-legacy
module load java/1.8

export PATH=/home/hdzoo/shared/software/bin:$PATH

python run_cgap.py \
  -refs_path galapagos_ref \
  -forward ab1tr1.fastq  ch1tr1.fastq  datr1.fastq  ep1tr1.fastq  m1tr1.fastq  p2tr1.fastq  va2tr1.fastq \
  -reverse ab1tr2.fastq  ch1tr2.fastq  datr2.fastq  ep1tr2.fastq  m1tr2.fastq  p2tr2.fastq  va2tr2.fastq \
  -c 8 -format_db;
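In the commands above, the forward and reverse files appear to be paired by position (`ab1tr1.fastq` with `ab1tr2.fastq`, and so on). When there are many samples, assembling the command programmatically can help keep the two lists in step. The following is a minimal sketch that uses only the flags shown above; `build_cgap_command` is a hypothetical helper and is not part of cgap.

```python
import shlex


def build_cgap_command(refs_path, forward, reverse, cores, format_db=False):
    """Build the run_cgap.py argument list from paired fastq file lists."""
    if len(forward) != len(reverse):
        raise ValueError(
            "forward and reverse lists differ in length: {} vs {}".format(
                len(forward), len(reverse)
            )
        )
    cmd = ["python", "run_cgap.py", "-refs_path", refs_path]
    cmd += ["-forward"] + list(forward)
    cmd += ["-reverse"] + list(reverse)
    cmd += ["-c", str(cores)]
    if format_db:
        cmd.append("-format_db")
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in cmd)
```

The resulting list can be passed to `subprocess.run`, or printed with `as_shell_line` and pasted into a Slurm script.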

