rhshah / icallsv

A Framework to call Structural Variants from NGS based datasets

Home Page: http://icallsv.readthedocs.io/en/latest/

License: Apache License 2.0

Python 98.77% R 1.23%
genomics next-generation-sequencing bioinformatics structural-variation python

icallsv's Introduction

iCallSV: Structural Aberration Detection from NGS datasets

Author: Ronak H Shah
Contact: [email protected]
Source code: http://github.com/rhshah/iCallSV
Wiki: http://icallsv.readthedocs.io/en/latest/
License: Apache License 2.0

iCallSV is a Python library and command-line software toolkit to call structural aberrations from Next Generation DNA sequencing data. Behind the scenes it uses Delly2 to do structural variant calling. It is designed for use with hybrid capture, including both whole-exome and custom target panels, and short-read sequencing platforms such as Illumina.

Citation

We are in the process of publishing a manuscript describing iCallSV as part of the Structural Variant Detection framework. If you use this software in a publication, for now, please cite our website iCallSV.

Note

For some reason, docstrings are not shown by Read the Docs.

Please use these URLs from GitHub with HTML Preview for information on each module:

Per Module Info

Required Packages

We require that you install:

pandas:v0.16.2
biopython:v1.65
pysam:v0.8.4
pyvcf:0.6.7
Delly:v0.7.5
targetSeqView:master
iAnnotateSV:v1.0.6
coloredlogs:v5.2
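
A quick way to see which of the Python packages above are installed, and at what version, is sketched below. This is only an illustration and not part of iCallSV: it assumes Python 3.8 or later (for importlib.metadata), the PyPI distribution names in REQUIRED are assumptions, and Delly, targetSeqView and iAnnotateSV are left out because they are not checked this way.

```python
from importlib import metadata

# Versions pinned in the list above. The PyPI distribution names used as
# keys are assumptions and may need adjusting for your environment.
REQUIRED = {
    "pandas": "0.16.2",
    "biopython": "1.65",
    "pysam": "0.8.4",
    "PyVCF": "0.6.7",
    "coloredlogs": "5.2",
}


def installed_versions(names):
    """Return {name: installed version, or None if the package is missing}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def report(required):
    """Build one human-readable line per required package."""
    lines = []
    for name, installed in installed_versions(required).items():
        status = installed if installed is not None else "not installed"
        lines.append(f"{name}: required {required[name]}, found {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report(REQUIRED)))
```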

Required Data Files

These files are provided in the data folder inside iCallSV.

BlackListFile:

(blacklist.txt) Tab-delimited file without a header, listing blacklisted regions in this order:

chromosome 1, breakpoint 1, chromosome 2, breakpoint 2

Example: 7 140498077 5 175998094

BlackListGenes:

(blacklistgenes.txt) Genes to be removed, listed one per line without a header

Example:

LINC00486

CNOT4

HotspotFile:

(hotspotgenes.txt) Tab-delimited file without a header, listing hotspot regions in this order:

chromosome, start, end, name

Example: 2 29416089 30143525 ALK

GenesToKeep:

(genesToInclude.txt) Genes to be kept, listed one per line without a header

Example:

ALK

BRAF
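
The file layouts described above can be read with a few lines of standard-library Python. The sketch below is only an illustration of the formats, not iCallSV's own parsing code. The function names are hypothetical, and treating the hotspot start and end as an inclusive interval is an assumption.

```python
import csv
import io


def read_blacklist(handle):
    """Parse rows of: chromosome 1, breakpoint 1, chromosome 2, breakpoint 2."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:  # skip blank lines
            continue
        chr1, pos1, chr2, pos2 = fields[:4]
        rows.append((chr1, int(pos1), chr2, int(pos2)))
    return rows


def read_hotspots(handle):
    """Parse rows of: chromosome, start, end, name."""
    regions = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        chrom, start, end, name = fields[:4]
        regions.append((chrom, int(start), int(end), name))
    return regions


def read_gene_list(handle):
    """Parse a file with one gene symbol per line and no header."""
    return [line.strip() for line in handle if line.strip()]


def in_hotspot(chrom, pos, regions):
    """Return the name of the first region containing pos, else None.

    Start and end are treated as inclusive, which is an assumption.
    """
    for r_chrom, start, end, name in regions:
        if chrom == r_chrom and start <= pos <= end:
            return name
    return None


if __name__ == "__main__":
    hotspots = read_hotspots(io.StringIO("2\t29416089\t30143525\tALK\n"))
    print(in_hotspot("2", 29500000, hotspots))
```

With real files, pass an open file handle (for example open("blacklist.txt", newline="")) instead of the in-memory io.StringIO used here.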

Configuration File Format

#~~~Template configuration file to run iCallSV~~~#
#### Path to python executable ###
[Python]
PYTHON:
#### Path to R executable and R Lib ###
[R]
RHOME:
RLIB:
#### Path to delly, bcftools executables and Version of delly (supports only 0.7.3)###
[SVcaller]
DELLY:
DellyVersion:
BCFTOOLS:
#### Path to hg19 Reference Fasta file ###
[ReferenceFasta]
REFFASTA:
#### Path to file containing regions to exclude please follow Delly documentation for this ###
[ExcludeRegion]
EXREGIONS:
#### Path to file containing regions where lenient thresholds will be used, and file containing genes to keep ###
[HotSpotRegions]
HotspotFile:
GenesToKeep:
#### Path to file containing regions/genes to filter ###
[BlackListRegions]
BlackListFile:
BlackListGenes:
#### Path to samtools executable ###
[SAMTOOLS]
SAMTOOLS:
#### Path to iAnnotateSV.py and all its required files, please follow iAnnotateSV documentation ###
[iAnnotateSV]
ANNOSV:
GENOMEBUILD:
DISTANCE:
CANONICALTRANSCRIPTFILE:
UNIPROTFILE:
CosmicCensus:
CosmicFusionCounts:
RepeatRegionAnnotation:
DGvAnnotations:
#### TargetSeqView Parameters ###
[TargetSeqView]
CalculateConfidenceScore:
GENOMEBUILD:
ReadLength:
#### Parameters to run Delly ###
[ParametersToRunDelly]
MAPQ: 20
NumberOfProcessors: 4
[ParametersToFilterDellyResults]
####Case Allele Fraction Hotspot####
CaseAltFreqHotspot: 0.05
####Total Case Coverage Hotspot#####
CaseCoverageHotspot = 5
####Control Allele Fraction Hotspot####
ControlAltFreqHotspot = 0
####Case Allele Fraction####
CaseAltFreq: 0.10
####Total Case Coverage#####
CaseCoverage = 10
####Control Allele Fraction####
ControlAltFreq = 0
###Overall Supporting Read-pairs ###
OverallSupportingReads: 5
###Overall Supporting Read-pairs Hotspot ###
OverallSupportingReadsHotspot: 3
###Overall Supporting splitreads ###
OverallSupportingSplitReads: 0
###Overall Supporting splitreads Hotspot ###
OverallSupportingSplitReadsHotspot: 0
###Case Supporting Read-pairs ###
CaseSupportingReads: 2
###Case Supporting splitreads ###
CaseSupportingSplitReads: 0
###Case Supporting Read-pairs Hotspot ###
CaseSupportingReadsHotspot: 1
###Case Supporting splitreads Hotspot ###
CaseSupportingSplitReadsHotspot: 0
###Control Supporting Read-pairs ###
ControlSupportingReads: 3
###Control Supporting Read-pairs Hotspot ###
ControlSupportingReadsHotspot: 3
###Control Supporting splitreads ###
ControlSupportingSplitReads: 3
###Control Supporting splitreads Hotspot ###
ControlSupportingSplitReadsHotspot: 3
###Length of Structural Variant###
LengthOfSV: 500
###Overall Mapping Quality Threshold###
OverallMapq: 20
###Overall Mapping Quality Threshold Hotspot###
OverallMapqHotspot: 5
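
The template mixes ":" and "=" as key/value separators. Python's standard configparser accepts both by default, so a file in this layout can be read as sketched below. This is an illustration under stated assumptions, not a description of how iCallSV itself loads the file. SAMPLE is a shortened excerpt of the template, and load_config and case_thresholds are hypothetical helper names.

```python
import configparser

# Shortened excerpt of the template above, for illustration only.
SAMPLE = """\
[ParametersToRunDelly]
MAPQ: 20
NumberOfProcessors: 4
[ParametersToFilterDellyResults]
####Case Allele Fraction Hotspot####
CaseAltFreqHotspot: 0.05
CaseCoverageHotspot = 5
CaseAltFreq: 0.10
CaseCoverage = 10
"""


def load_config(text):
    """Parse configuration text while keeping option names case-sensitive."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # the default would lowercase every option name
    parser.read_string(text)
    return parser


def case_thresholds(parser, hotspot):
    """Pick the lenient (Hotspot) or the regular case thresholds."""
    section = parser["ParametersToFilterDellyResults"]
    suffix = "Hotspot" if hotspot else ""
    return {
        "alt_freq": section.getfloat("CaseAltFreq" + suffix),
        "coverage": section.getint("CaseCoverage" + suffix),
    }


if __name__ == "__main__":
    cfg = load_config(SAMPLE)
    print(case_thresholds(cfg, hotspot=True))
    print(case_thresholds(cfg, hotspot=False))
```

For a file on disk, parser.read("/path/to/template.ini") can replace read_string.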

Quick Usage

python iCallSV.py -sc /path/to/template.ini -abam /path/to/casebamFile -bbam /path/to/controlbamFile -aId caseID -bId controlId -o /path/to/output/directory -op prefix_for_the_output_files
> python iCallSV.py -h

usage: iCallSV.py [-h] [-v] [-V] -sc config.ini -abam caseBAMFile.bam -bbam
                  controlBAMFile.bam -aId caseID -bId controlID -o
                  /somepath/output -op TumorID

iCallSV.iCallSV -- wrapper to run iCallSV package

  Created by Ronak H Shah on 2015-03-30.
  Copyright 2015-2016 Ronak H Shah. All rights reserved.

  Licensed under the Apache License 2.0
  http://www.apache.org/licenses/LICENSE-2.0

  Distributed on an "AS IS" basis without warranties
  or conditions of any kind, either express or implied.

USAGE

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         set verbosity level [default: True]
  -V, --version         show program's version number and exit
  -sc config.ini, --svConfig config.ini
                        Full path to the structural variant configuration
  -abam caseBAMFile.bam, --caseBam caseBAMFile.bam
                        Full path to the case bam file
  -bbam controlBAMFile.bam, --controlBam controlBAMFile.bam
                        Full path to the control bam file
  -aId caseID, --caseId caseID
                        Id of the case to be analyzed, this will be the sub-
                        folder
  -bId controlID, --controlId controlID
                        Id of the control to be used, this will be used for
                        filtering variants
  -o /somepath/output, --outDir /somepath/output
                        Full Path to the output dir.
  -op TumorID, --outPrefix TumorID
                        Id of the Tumor bam file which will be used as the
                        prefix for output files
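
When iCallSV is launched from another script, the command line documented above can be assembled as an argument list. The sketch below only builds and prints the command. The helper name build_icallsv_command and all paths are placeholders, and the flags are the ones listed in the help text.

```python
import shlex


def build_icallsv_command(config, case_bam, control_bam, case_id, control_id,
                          out_dir, out_prefix,
                          python="python", script="iCallSV.py"):
    """Return the iCallSV invocation as a list suitable for subprocess.run."""
    return [
        python, script,
        "-sc", config,
        "-abam", case_bam,
        "-bbam", control_bam,
        "-aId", case_id,
        "-bId", control_id,
        "-o", out_dir,
        "-op", out_prefix,
    ]


def as_shell_string(command):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(part) for part in command)


if __name__ == "__main__":
    cmd = build_icallsv_command(
        "/path/to/template.ini", "/path/to/case.bam", "/path/to/control.bam",
        "caseID", "controlID", "/path/to/output", "prefix",
    )
    print(as_shell_string(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```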

Running on SGE or LSF

Note:

For both SGE and LSF, you need to request a total number of cores based on the number of threads you have assigned to the Delly installation using OMP_NUM_THREADS.

Note:

For example, if you set OMP_NUM_THREADS with export OMP_NUM_THREADS=3, then you need to set the total number of cores to 13 (12 + 1 extra as a buffer), so that each Delly program can use 3 cores. Here I use Python's multiprocessing module to launch Delly, so all four programs are launched as separate processes, each using the number of threads set by OMP_NUM_THREADS.
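
The arithmetic in the note can be written as a small helper. This is a sketch of the rule as stated above (four Delly processes, one extra core as a buffer). The function name total_cores is hypothetical.

```python
def total_cores(omp_num_threads, delly_processes=4, buffer_cores=1):
    """Cores to request: threads per Delly process times processes, plus a buffer."""
    return omp_num_threads * delly_processes + buffer_cores


if __name__ == "__main__":
    # With OMP_NUM_THREADS=3 this gives the 13 cores used in the examples.
    print(total_cores(3))
```

The result is the value to pass to -pe smp on SGE or -n on LSF.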

SGE

qsub -q some.q -N iCallSV_JobName -o iCallSV.stdout -e iCallSV.stderr -V -l h_vmem=6G,virtual_free=6G -pe smp 13 -wd /some/path/to/working/dir -sync y  -b y python iCallSV.py -sc template.ini -bbam control.bam -abam case.bam -aId caseID -bId controlID -op outputPrefix -o  /some/path/to/output/dir -v

LSF

bsub -q some.q -J iCallSV_JobName -o iCallSV.stdout -e iCallSV.stderr -We 24:00 -R "rusage[mem=20]" -M 30 -n 13 -cwd /some/path/to/working/dir "python iCallSV.py -sc template.ini -bbam control.bam -abam case.bam -aId caseID -bId controlID -op outputPrefix -o  /some/path/to/output/dir -v"

Utilities

Running iCallSV on MSK-IMPACT Pools

This is only for MSK-IMPACT internal samples

> python iCallSV_dmp_wrapper.py -h

usage: iCallSV_dmp_wrapper.py [options]

Run iCallSV on selected pools using MSK data

optional arguments:
  -h, --help            show this help message and exit
  -fl folders.fof, --folderList folders.fof
                        Full path folders file of files.
  -qc /some/path/qcLocation, --qcLocation /some/path/qcLocation
                        Full path qc files.
  -b /some/path/bamlocation, --bamLocation /some/path/bamlocation
                        Full path bam files.
  -P /somepath/python, --python /somepath/python
                        Full path Pyhton executables.
  -icsv /somepath/iCallSV.py, --iCallSV /somepath/iCallSV.py
                        Full path iCallSV.py executables.
  -conf /somepath/template.ini, --iCallSVconf /somepath/template.ini
                        Full path configuration file to run iCallSV
  -q all.q or clin.q, --queue all.q or clin.q
                        Name of the SGE queue
  -qsub /somepath/qsub, --qsubPath /somepath/qsub
                        Full Path to the qsub executables of SGE.
  -t 5, --threads 5     Number of Threads to be used to run iCallSV
  -v, --verbose         make lots of noise [default]
  -o /somepath/output, --outDir /somepath/output
                        Full Path to the output dir.
  -of outputfile.txt, --outDir outputfile.txt
                        Name of the final output file.

Checking iCallSV output for processed transcript/cDNA contamination in samples

> python check_cDNA_contamination.py -h
usage: check_cDNA_contamination.py [options]

Calculate cDNA contamination per sample based of the Structural Variants
Pipeline result

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         make lots of noise [default]
  -s SVfile.txt, --svFile SVfile.txt
                        Location of the structural variant file to be used
  -o cDNA_contamination, --outputFileName cDNA_contamination
                        Full path name for the output file

icallsv's Issues

Missed SV

Figure out why it was missed.

sample/23628

Can I use merge.txt instead of final.txt?

Dear Sir or Madam,
I am currently utilizing iCallSV for identifying SVs from WES data. Upon examining the outputs, I observed that the "final.txt" file is derived from the "merged.txt" file by filtering out records where both sites fall within intergenic or intronic regions or are found in genes blacklisted for analysis.
My question is, can I solely rely on the "merged.txt" file for my analysis instead of using the "final.txt" file? In one of my analyses, the "merged.txt" file contained 104 records, whereas the "final.txt" file retained only 35 records after filtering out numerous TRA (Translocation) events.

Indexing SVs based on Chr and pos of the breakpoints

lines 260-270

    indexList = annoDF.loc[
        annoDF['chr1'].isin([chrom1])
        & annoDF['pos1'].isin([int(start1)])
        & annoDF['chr2'].isin([chrom2])
        & annoDF['pos2'].isin([int(start2)])
    ].index.tolist()
    if(len(indexList) > 1):
        if(verbose):
            logger.fatal(
                "iCallSV::MergeFinalFile: More then one sv have same coordinate in same sample for annotated file. Please check and rerun")
        sys.exit(1)
    else:
        annoIndex = indexList[0]

This is an issue for rare reciprocal SVs in which both breakpoint coordinates are identical.
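
The ambiguity can be reproduced without pandas. The sketch below is an illustration only, not the project's fix. The records are made-up values, and the extra "connection" column used to tell the two events apart is a hypothetical field that the annotated file may not contain.

```python
def find_matches(records, chrom1, pos1, chrom2, pos2, **extra):
    """Return indexes of records matching both breakpoints and any extra fields."""
    wanted = {"chr1": chrom1, "pos1": int(pos1),
              "chr2": chrom2, "pos2": int(pos2)}
    wanted.update(extra)
    return [
        i for i, rec in enumerate(records)
        if all(rec.get(key) == value for key, value in wanted.items())
    ]


# Two reciprocal events sharing identical breakpoint coordinates
# (illustrative values; "connection" is a hypothetical column).
records = [
    {"chr1": "7", "pos1": 140498077, "chr2": "5", "pos2": 175998094,
     "connection": "3to5"},
    {"chr1": "7", "pos1": 140498077, "chr2": "5", "pos2": 175998094,
     "connection": "5to3"},
]

if __name__ == "__main__":
    # Coordinates alone match both records, which triggers the fatal exit.
    print(find_matches(records, "7", 140498077, "5", 175998094))
    # Adding one more distinguishing field makes the lookup unique.
    print(find_matches(records, "7", 140498077, "5", 175998094,
                       connection="3to5"))
```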

task 1 failed - "'unlist' is not an exported object from 'namespace:Biostrings'"

Hello @rhshah, sorry to bother you again. When I run iCallSV, I get the error below:

command:
R --slave --vanilla --args 5 /gpfs/users/yanghao/project/shi-jian-zhi-ping/t_n_20171030/illumina /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina-B1701-sm/Illumina-B1701-sm_allSVFiltered_tsvInput.txt hg19 150 /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina-B1701-sm Illumina-B1701-sm_allSVFiltered_cScore.txt < /gpfs/users/yanghao/software/iCallSV/iCallSV/R/Rscripts/calculateConfidenceScore.R

stdout/stderr:
[1] "mismatch limit increased to 136 to capture reads on both references"
[1] "mismatch limit increased to 137 to capture reads on both references"
[1] "Working on event 20 of 24"
[1] "mismatch limit increased to 138 to capture reads on both references"
[1] "mismatch limit increased to 139 to capture reads on both references"
[1] "mismatch limit increased to 140 to capture reads on both references"
[1] "mismatch limit increased to 141 to capture reads on both references"
[1] "mismatch limit increased to 142 to capture reads on both references"
[1] "mismatch limit increased to 143 to capture reads on both references"
[1] "mismatch limit increased to 144 to capture reads on both references"
[1] "mismatch limit increased to 7 to capture reads on both references"
[1] "mismatch limit increased to 145 to capture reads on both references"
[1] "mismatch limit increased to 8 to capture reads on both references"
[1] "mismatch limit increased to 146 to capture reads on both references"
[1] "mismatch limit increased to 9 to capture reads on both references"
[1] "mismatch limit increased to 147 to capture reads on both references"
[1] "mismatch limit increased to 10 to capture reads on both references"
[1] "mismatch limit increased to 148 to capture reads on both references"
[1] "mismatch limit increased to 11 to capture reads on both references"
[1] "mismatch limit increased to 149 to capture reads on both references"
[1] "mismatch limit increased to 150 to capture reads on both references"
[1] "mismatch limit increased to 151 to capture reads on both references"
[1] "Working on event 22 of 24"
[1] "Working on event 16 of 24"
[1] "mismatch limit increased to 7 to capture reads on both references"
[1] "mismatch limit increased to 8 to capture reads on both references"
[1] "mismatch limit increased to 9 to capture reads on both references"
[1] "mismatch limit increased to 10 to capture reads on both references"
[1] "Working on event 24 of 24"
[1] "mismatch limit increased to 7 to capture reads on both references"
[1] "mismatch limit increased to 8 to capture reads on both references"
[1] "mismatch limit increased to 9 to capture reads on both references"
[1] "Working on event 21 of 24"
Error in { :
task 1 failed - "'unlist' is not an exported object from 'namespace:Biostrings'"
Calls: calculateConfidenceScore ... ViewAndScoreFull -> alignViewFull -> %dopar% ->
Execution halted

Can you help me?

requirement update: pyvcf

environment:

  • centos6.7
  • python3.6

When I install iCallSV with pip:

pip install iCallSV

it raises an exception while installing the dependency pyvcf==0.6.7:

Collecting distribute (from pyvcf==0.6.7->iCallSV)
  Downloading http://mirrors.aliyun.com/pypi/packages/5f/ad/1fde06877a8d7d5c9b60eff7de2d452f639916ae1d48f0b8f97bf97e570a/distribute-0.7.3.zip (145kB)
    100% |████████████████████████████████| 153kB 1.4MB/s
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-87v4p3pw/distribute/setuptools/__init__.py", line 2, in <module>
        from setuptools.extension import Extension, Library
      File "/tmp/pip-build-87v4p3pw/distribute/setuptools/extension.py", line 5, in <module>
        from setuptools.dist import _get_unpatched
      File "/tmp/pip-build-87v4p3pw/distribute/setuptools/dist.py", line 7, in <module>
        from setuptools.command.install import install
      File "/tmp/pip-build-87v4p3pw/distribute/setuptools/command/__init__.py", line 8, in <module>
        from setuptools.command import install_scripts
      File "/tmp/pip-build-87v4p3pw/distribute/setuptools/command/install_scripts.py", line 3, in <module>
        from pkg_resources import Distribution, PathMetadata, ensure_directory
      File "/tmp/pip-build-87v4p3pw/distribute/pkg_resources.py", line 1518, in <module>
        register_loader_type(importlib_bootstrap.SourceFileLoader, DefaultProvider)
    AttributeError: module 'importlib._bootstrap' has no attribute 'SourceFileLoader'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-87v4p3pw/distribute/

I can install pyvcf==0.6.8 successfully, so would you please update the pyvcf requirement?

Incorrect breakpoint site description

Re: {cvr}/sample/22203
The site2 description for NSUN4 is incorrect. The breakpoint occurs after the 3'UTR of the gene on the positive strand. However, the description cites the Promoter region, although the "26Kb from tx start" is correct.
