rhshah / iannotatesv

iAnnotateSV is a Python library and command-line software toolkit to annotate and visualize structural variants detected from Next Generation DNA sequencing data.

Home Page: http://iannotatesv.readthedocs.org/en/latest/index.html

License: Apache License 2.0

Python 99.89% Makefile 0.11%

annotation python structural-variation

iannotatesv's Issues

Incorrect Breakpoint Annotation for UTR

Re: {cvr}/sample/22203
The site2 description for NSUN4 is incorrect. The breakpoint occurs after the 3' UTR of the gene on the positive strand. However, the description cites the promoter region, although the "26Kb from tx start" is correct.

Filtering based on iAnnotateSV results

Thanks for this really nice tool! I was wondering if there are any recommendations for filtering tumor variants after they are annotated. A colleague suggested filtering out any variant where DGv_Name-DGv_VarType-site1 and DGv_Name-DGv_VarType-site2 match exactly, but I have seen cases where one is only a partial string match of the other, and I'm not sure what to do with more complex annotations, for example:

DGv_Name-DGv_VarType-site1	DGv_Name-DGv_VarType-site2
nsv870829-Gain<=>nsv7177-Inversion<=>nsv829649-Loss	nsv7177-Inversion<=>nsv829649-Loss

The string nsv7177-Inversion<=>nsv829649-Loss is in both, but the first column contains: nsv870829-Gain<=> in addition. I'm not sure how to interpret the values when they contain <=>.

If there's any other way we could filter using the annotations from this tool, we would definitely love to know!
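One way to make the partial-match case concrete is to compare the two columns as sets rather than as raw strings. The sketch below assumes that `<=>` separates multiple DGV entries within one field and that each entry has the form `name-type`; that reading is inferred from the example above and is not confirmed by the tool's documentation. The helper names `parse_dgv` and `shared_dgv` are hypothetical.

```python
def parse_dgv(field):
    """Split a DGV annotation string into a set of (name, variant_type) pairs.

    Assumption: entries are joined by "<=>" and each entry is "name-type".
    """
    entries = set()
    for item in field.split("<=>"):
        item = item.strip()
        if not item:
            continue
        # Split on the first hyphen only: "nsv7177-Inversion" -> ("nsv7177", "Inversion")
        name, _, var_type = item.partition("-")
        entries.add((name, var_type))
    return entries


def shared_dgv(site1, site2):
    """Return the DGV entries that appear at both breakpoints."""
    return parse_dgv(site1) & parse_dgv(site2)


site1 = "nsv870829-Gain<=>nsv7177-Inversion<=>nsv829649-Loss"
site2 = "nsv7177-Inversion<=>nsv829649-Loss"
common = shared_dgv(site1, site2)
print(sorted(common))
# [('nsv7177', 'Inversion'), ('nsv829649', 'Loss')]
```

Whether a non-empty overlap, or only an exact match of the two sets, should trigger filtering is a judgment call that this sketch leaves open.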

TypeError: int() argument must be a string or a number, not 'Int64Index'

Hello there, I am getting an error message and I don't know how to fix it.

When I run this:
/gpfs/users/yanghao/software/anaconda2/bin/python /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py -r hg19 -i /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina_B1701-sm/Illumina_B1701-sm_allSVFiltered.tab -o /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina_B1701-sm -ofp Illumina_B1701-sm_allSVFiltered -d 3000 -c /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/canonicalInfo/cannonical_transcripts.txt -rr /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/repeat_region/hg19_repeatRegion.tsv -cc /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/cosmic/CosmicConsensus_transFeb2014.tsv -dgv /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/database_of_genomic_variants/hg19_DGv_Annotation.tsv -v -p -u /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/UcscUniprotdomainInfo/hg19.uniprot.spAnnot.table.txt

it raises this error:
Traceback (most recent call last):
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 312, in <module>
    main()
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 203, in main
    plotSV(plotDF, NewRefDF, uniprotPath, args)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 304, in plotSV
    vsv.VisualizeSV(svDF, refDF, upDF, args)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/VisualizeSV.py", line 47, in VisualizeSV
    (domain1Idx, maxLen1, minLen1) = processData(chr1, transcript1, refDF, upDF)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/VisualizeSV.py", line 215, in processData
    transcriptIdx = int(refDF[refDF['#name'] == transcript].index)
TypeError: int() argument must be a string or a number, not 'Int64Index'

Can anybody help me? Thanks a lot!
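The traceback shows `int()` being handed a whole pandas index object rather than a single label. Whether that comes from the installed pandas version or from the lookup matching more than one row cannot be determined from the report. The sketch below is not the project's fix; it only illustrates a lookup that extracts one label explicitly and fails with a readable message otherwise. The helper name `single_row_label` and the toy table are hypothetical; only the `#name` column comes from the traceback.

```python
import pandas as pd


def single_row_label(df, column, value):
    """Return the integer index label of the one row where df[column] == value.

    Raises LookupError when zero or several rows match, instead of letting
    int() fail on an index object.
    """
    labels = df.index[df[column] == value]
    if len(labels) == 0:
        raise LookupError("no row with {} == {!r}".format(column, value))
    if len(labels) > 1:
        raise LookupError(
            "{} rows with {} == {!r}: {}".format(len(labels), column, value, list(labels))
        )
    return int(labels[0])


# Toy reference table (made-up transcript names, for illustration only).
ref = pd.DataFrame({
    "#name": ["NM_A", "NM_B", "NM_B"],
    "chrom": ["chr1", "chr2", "chr2"],
})

print(single_row_label(ref, "#name", "NM_A"))

try:
    single_row_label(ref, "#name", "NM_B")
except LookupError as err:
    print("ambiguous lookup:", err)
```

Raising on an ambiguous match is a deliberate choice here: silently picking one of several rows would hide exactly the kind of reference-table problem that is worth knowing about.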

Report proper requirement in README

Add requirement of to README.rst:

ValueError: too many values to unpack

Issue encountered when running iAnnotateSV. Failed sample C-HAT9MW-L002-d02 in Project 13893_F.

Lines that are failing iAnnotateSV with the error mentioned below:

chr1	pos1	str1	chr2	pos2	str2
17	61138841	0	6	31940159	1
6	31940159	1	17	61138841	0

error message:
  File "/work/bergerm1/bergerlab/charalk/misc/troubleshoot_13893_F/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 131, in processData
    transcriptIdx, = (transcripts[transcripts['chrom'] == chrom].index)
ValueError: too many values to unpack

Look at lines 128-136 in AnnotationForKinaseDomain.py

Location chr6 31940159 refers to gene FNDC1
Location chr17 61138841 refers to gene TP53

Error files and test data can be found at /work/bergerm1/bergerlab/charalk/misc/troubleshoot_13893_F/

@rhshah
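The failing statement unpacks an index into a single name, so it can only succeed when exactly one row of the reference table matches the transcript on that chromosome. A quick diagnostic, sketched below, lists the rows of a reference table that share the same transcript name and chromosome. The function name `ambiguous_transcripts` and the toy data are hypothetical; the column names `#name`, `chrom`, and `txStart` are taken from the tracebacks on this page, and the real reference file may differ.

```python
import pandas as pd


def ambiguous_transcripts(ref_df, name_col="#name", chrom_col="chrom"):
    """Return every row whose (transcript, chromosome) pair occurs more than once."""
    mask = ref_df.duplicated(subset=[name_col, chrom_col], keep=False)
    return ref_df[mask].sort_values([name_col, chrom_col])


# Toy table with made-up names and coordinates, for illustration only.
ref = pd.DataFrame({
    "#name": ["NM_X", "NM_X", "NM_Y"],
    "chrom": ["chr6", "chr6", "chr17"],
    "txStart": [100, 900, 50],
})

dups = ambiguous_transcripts(ref)
print(dups)
```

An empty result would suggest that the multiple matches have some other cause than repeated rows.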

UTR annotation is wrong when the Breakpoint is way outside of the canonical transcript

Look at sample/21647 in cvr.

iAnnotateSV visualization is broken in some cases

Make New Test Files

With the output changes in the newer version, we need to update the test files.

tools for converting breakdancer output into iAnnotateSV

Are there any tools to convert the output of breakdancer into iAnnotateSV input? Columns 3 and 6 of the breakdancer output are strings that record the number of reads mapped to the plus (+) or the minus (-) strand in the anchoring regions. How should I determine the orientation?
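No converter is mentioned in this thread, so the following is only a sketch of one possible rule: take the strand supported by the majority of reads. It assumes the orientation strings look like `12+3-` (plus count, then minus count), and it assumes that the strand columns of the target format use 0 for the plus strand and 1 for the minus strand. Both the majority rule and the 0/1 encoding are assumptions that should be checked against the tool's documentation. The function name `majority_strand` is hypothetical.

```python
import re

# Matches strings such as "12+3-": reads on the plus strand, then on the minus strand.
_ORIENTATION = re.compile(r"^(\d+)\+(\d+)-$")


def majority_strand(orientation):
    """Map an orientation string to 0 (plus) or 1 (minus) by majority of reads.

    Assumption: 0 means plus/forward and 1 means minus/reverse.
    Ties are resolved in favour of the plus strand, which is an arbitrary choice.
    """
    match = _ORIENTATION.match(orientation.strip())
    if match is None:
        raise ValueError("unexpected orientation string: {!r}".format(orientation))
    plus, minus = int(match.group(1)), int(match.group(2))
    return 0 if plus >= minus else 1


print(majority_strand("12+3-"))
print(majority_strand("0+7-"))
```

Breakpoints where the two counts are close may deserve manual review rather than an automatic call.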

keyerror issue

Hi, I am interested in using your code but am running into an issue getting it to run:

python iAnnotateSV.py -i iAnnotateSV/data/test/testData.txt -ofp test -o iAnnotateSV/data/test/ -r hg19 -d 3000 -v

Traceback (most recent call last):
  File "iAnnotateSV.py", line 350, in <module>
    main()
  File "iAnnotateSV.py", line 218, in main
    NewRefDF = hp.ExtendPromoterRegion(refDF, args.distance)
  File "/home/kinnamam/iAnnotateSV/iAnnotateSV/helper.py", line 42, in ExtendPromoterRegion
    df['geneStart'] = df['txStart']
  File "/home/kinnamam/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/kinnamam/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'txStart'

Any ideas how to troubleshoot this? Have tried running with my own data and example data, same result.
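The `KeyError` means that the reference table loaded into `refDF` has no column named `txStart`. A reasonable first step is to inspect the header pandas actually parsed. The sketch below reads only the header of a tab-separated table and reports which expected columns are absent. The `REQUIRED` list contains only column names that appear in tracebacks on this page, so it is likely incomplete, and the helper name `missing_columns` is hypothetical. The cause in this report (wrong delimiter, wrong file, or something else) is not known.

```python
import io

import pandas as pd

# Column names seen in the tracebacks on this page; the real tool may need more.
REQUIRED = ["#name", "chrom", "txStart"]


def missing_columns(table, required=REQUIRED, sep="\t"):
    """Return the required column names that are absent from the table header."""
    header = pd.read_csv(table, sep=sep, nrows=0)  # parse the header line only
    present = [str(col).strip() for col in header.columns]
    return [col for col in required if col not in present]


# Toy inputs: one tab-separated header, one accidentally space-separated header.
good = io.StringIO("#name\tchrom\ttxStart\nNM_X\tchr1\t100\n")
bad = io.StringIO("#name chrom txStart\nNM_X chr1 100\n")

print(missing_columns(good))
print(missing_columns(bad))
```

With the space-separated input, pandas sees one wide column, so every required name is reported as missing; the same symptom would appear if a different file were loaded in place of the reference table.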

Multiple coordinates for a transcript

Hi,
I found multiple coordinates for a single transcript in hg19.sv.table.txt, for example NM_001037501. This causes an error in the processData() function within AnnotationForKinaseDomain.py, like this:

  File "/bin/iAnnotateSV.py", line 352, in <module>
    main()
  File "/bin/iAnnotateSV.py", line 223, in main
    annDF = processSV(svDF, NewRefDF, args)
  File "/bin/iAnnotateSV.py", line 320, in processSV
    args.allCanonicalTranscriptsPath, args.uniprot, args.verbose)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 107, in run
    chr1, pos1, gene1, egene1, egene2, transcript1, refDF, upDF)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 173, in getKinaseInfo
    (domainIdx, maxLen, minLen) = processData(chrom, transcript, refDF, upDF)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 131, in processData
    transcriptIdx, = (transcripts[transcripts['chrom'] == chrom].index)
ValueError: too many values to unpack

Could I keep only one set of coordinates manually?

Best,
Gerde
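If trimming the table by hand is acceptable, the same thing can be done programmatically. The sketch below keeps the first row for each transcript and chromosome pair and discards the rest. It is a workaround under stated assumptions, not an endorsed fix: the column names `#name` and `chrom` are taken from the tracebacks on this page, the helper name `keep_first_coordinates` is hypothetical, and "first row wins" is an arbitrary rule that says nothing about which coordinates are biologically correct.

```python
import pandas as pd


def keep_first_coordinates(ref_df, name_col="#name", chrom_col="chrom"):
    """Keep one row per (transcript, chromosome) pair, preferring the first seen."""
    deduped = ref_df.drop_duplicates(subset=[name_col, chrom_col], keep="first")
    # Renumber rows so index labels stay contiguous after rows are dropped.
    return deduped.reset_index(drop=True)


# Toy table with made-up names and coordinates, for illustration only.
ref = pd.DataFrame({
    "#name": ["NM_X", "NM_X", "NM_Y"],
    "chrom": ["chr6", "chr6", "chr17"],
    "txStart": [100, 900, 50],
})

deduped = keep_first_coordinates(ref)
print(deduped)
```

A transcript that maps to two different chromosomes keeps one row on each, since the key includes the chromosome.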