rhshah / iannotatesv

iAnnotateSV is a Python library and command-line software toolkit to annotate and visualize structural variants detected from Next Generation DNA sequencing data.

Home Page: http://iannotatesv.readthedocs.org/en/latest/index.html

License: Apache License 2.0

Languages: Python 99.89%, Makefile 0.11%

Topics: annotation, python, structural-variation

iannotatesv's Issues

Incorrect Breakpoint Annotation for UTR

Re: {cvr}/sample/22203
The site2 description for NSUN4 is incorrect. The breakpoint occurs after the 3' UTR of the gene on the positive strand. However, the description cites the promoter region, although the "26Kb from tx start" part is correct.

Filtering based on iAnnotateSV results

Thanks for this really nice tool! I was wondering if there are any recommendations for filtering tumor variants after they are annotated. A colleague suggested filtering out any variants where DGv_Name-DGv_VarType-site1 and DGv_Name-DGv_VarType-site2 match exactly. However, I have seen cases where one value is only a partial string match of the other, and I'm not sure what to do with more complex annotations, for example:

DGv_Name-DGv_VarType-site1	DGv_Name-DGv_VarType-site2
nsv870829-Gain<=>nsv7177-Inversion<=>nsv829649-Loss	nsv7177-Inversion<=>nsv829649-Loss

The string nsv7177-Inversion<=>nsv829649-Loss is in both, but the first column contains: nsv870829-Gain<=> in addition. I'm not sure how to interpret the values when they contain <=>.

If there's any other way we could filter using the annotations from this tool, we would definitely love to know!
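One way to make the comparison less brittle than exact string matching is to split each field into individual entries and compare them as sets. The sketch below assumes that `<=>` separates multiple DGv entries reported for the same breakpoint; that reading is not confirmed anywhere in this thread, and the function names are hypothetical.

```python
def split_dgv(field, sep="<=>"):
    """Split a DGv annotation field into a set of 'name-type' entries.

    Assumption: `sep` delimits independent entries for one breakpoint.
    """
    if not field:
        return set()
    return {entry.strip() for entry in field.split(sep) if entry.strip()}


def compare_dgv(site1, site2):
    """Report which entries are shared and which are unique to each site."""
    a, b = split_dgv(site1), split_dgv(site2)
    return {
        "shared": sorted(a & b),
        "site1_only": sorted(a - b),
        "site2_only": sorted(b - a),
    }


# Values taken from the example in the question above.
site1 = "nsv870829-Gain<=>nsv7177-Inversion<=>nsv829649-Loss"
site2 = "nsv7177-Inversion<=>nsv829649-Loss"
result = compare_dgv(site1, site2)
print(result)
```

Whether a variant should be dropped when the two sites share any entry, or only when the sets are identical, is a filtering policy decision that this sketch leaves open.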

TypeError: int() argument must be a string or a number, not 'Int64Index'

Hello there, I am getting an error message and I don't know how to fix it.

When I run this:
/gpfs/users/yanghao/software/anaconda2/bin/python \
  /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py \
  -r hg19 \
  -i /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina_B1701-sm/Illumina_B1701-sm_allSVFiltered.tab \
  -o /gpfs/users/yanghao/test/iCallSV/StructuralVariantAnalysis/DellyDir/Illumina_B1701-sm \
  -ofp Illumina_B1701-sm_allSVFiltered \
  -d 3000 \
  -c /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/canonicalInfo/cannonical_transcripts.txt \
  -rr /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/repeat_region/hg19_repeatRegion.tsv \
  -cc /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/cosmic/CosmicConsensus_transFeb2014.tsv \
  -dgv /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/database_of_genomic_variants/hg19_DGv_Annotation.tsv \
  -v -p \
  -u /gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/data/UcscUniprotdomainInfo/hg19.uniprot.spAnnot.table.txt

it raises an error:
Traceback (most recent call last):
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 312, in <module>
    main()
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 203, in main
    plotSV(plotDF, NewRefDF, uniprotPath, args)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/iAnnotateSV.py", line 304, in plotSV
    vsv.VisualizeSV(svDF, refDF, upDF, args)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/VisualizeSV.py", line 47, in VisualizeSV
    (domain1Idx, maxLen1, minLen1) = processData(chr1, transcript1, refDF, upDF)
  File "/gpfs/users/yanghao/software/iAnnotateSV-1.0.5/iAnnotateSV/VisualizeSV.py", line 215, in processData
    transcriptIdx = int(refDF[refDF['#name'] == transcript].index)
TypeError: int() argument must be a string or a number, not 'Int64Index'

Can anybody help me? Thanks a lot!
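The failing line in the traceback passes a whole pandas index object to `int()`. A minimal sketch of a more defensive lookup follows. It is not the project's fix: the helper name `single_row_index` and the transcript names in the toy table are hypothetical, and only the `#name` and `chrom` column names come from the tracebacks in these issues.

```python
import pandas as pd


def single_row_index(df, column, value):
    """Return the integer index label of the one row where df[column] == value.

    Raises LookupError when there is no match or more than one match,
    instead of failing inside int() or tuple unpacking.
    """
    matches = df[df[column] == value].index
    if len(matches) == 0:
        raise LookupError("no row with %s == %r" % (column, value))
    if len(matches) > 1:
        raise LookupError("%d rows with %s == %r" % (len(matches), column, value))
    # Convert the single label, not the index object itself.
    return int(matches[0])


# Toy reference table with made-up transcript names.
ref = pd.DataFrame({
    "#name": ["NM_000001", "NM_000002", "NM_000002"],
    "chrom": ["chr1", "chr2", "chr2"],
})
idx = single_row_index(ref, "#name", "NM_000001")
print(idx)
```

Taking element 0 of the index avoids converting the index object directly, and the explicit length checks turn the ambiguous case into a readable message.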

ValueError: too many values to unpack

Issue encountered when running iAnnotateSV. Failed sample C-HAT9MW-L002-d02 in Project 13893_F.

Lines that are failing iAnnotateSV with the error mentioned below:

chr1 pos1 str1 chr2 pos2 str2
17 61138841 0 6 31940159 1
6 31940159 1 17 61138841 0

error message:
  File "/work/bergerm1/bergerlab/charalk/misc/troubleshoot_13893_F/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 131, in processData
    transcriptIdx, = (transcripts[transcripts['chrom'] == chrom].index)
ValueError: too many values to unpack

Look at lines 128-136 in AnnotationForKinaseDomain.py

Location chr6 31940159 refers to gene FNDC1
Location chr17 61138841 refers to gene TP53

Error files and test data can be found at /work/bergerm1/bergerlab/charalk/misc/troubleshoot_13893_F/

@rhshah
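The single-element unpacking on line 131 raises this error whenever more than one row matches. To see which transcripts are affected, something like the following sketch could be run against the reference table. It assumes the table has the `#name` and `chrom` columns seen in the tracebacks; the helper name and the toy rows are hypothetical.

```python
import pandas as pd


def find_ambiguous_transcripts(ref_df, name_col="#name", chrom_col="chrom"):
    """Return row counts for (transcript, chromosome) pairs that occur more than once."""
    counts = ref_df.groupby([name_col, chrom_col]).size()
    return counts[counts > 1]


# Toy table: NM_A appears twice on chr6, so `idx, = index` would fail for it.
ref = pd.DataFrame({
    "#name": ["NM_A", "NM_A", "NM_B"],
    "chrom": ["chr6", "chr6", "chr17"],
})
ambiguous = find_ambiguous_transcripts(ref)
print(ambiguous)
```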

Make New Test Files

The output format changed in the newer version, so we need to update the test files.

tools for converting breakdancer output into iAnnotateSV

Are there any tools to convert BreakDancer output into iAnnotateSV input? Columns 3 and 6 of the BreakDancer output are strings that record the number of reads mapped to the plus (+) or the minus (-) strand in the anchoring regions. How should I determine the orientation?
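As a rough sketch only: if the orientation strings look like `12+3-` (reads on plus, then reads on minus), one heuristic is to take the strand supported by more reads. Both the majority-vote rule and the 0 = plus, 1 = minus encoding are assumptions here, not something confirmed by the maintainers in this thread, and `majority_strand` is a hypothetical helper.

```python
import re

# Assumed orientation format: "<plus reads>+<minus reads>-", e.g. "12+3-".
_ORIENTATION = re.compile(r"^(\d+)\+(\d+)-$")


def majority_strand(orientation):
    """Return 0 if plus-strand reads dominate, 1 if minus-strand reads do.

    Returns None on a tie so the caller can decide how to handle it.
    The 0/1 encoding is an assumption about the expected str1/str2 values.
    """
    match = _ORIENTATION.match(orientation.strip())
    if match is None:
        raise ValueError("unexpected orientation string: %r" % orientation)
    plus, minus = int(match.group(1)), int(match.group(2))
    if plus == minus:
        return None
    return 0 if plus > minus else 1


print(majority_strand("12+3-"))
print(majority_strand("0+7-"))
```

Ties and low read counts are probably worth reviewing by hand rather than converting automatically.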

keyerror issue

Hi, I am interested in using your code, but I am running into an issue getting it to run:

python iAnnotateSV.py -i iAnnotateSV/data/test/testData.txt -ofp test -o iAnnotateSV/data/test/ -r hg19 -d 3000 -v

Traceback (most recent call last):
  File "iAnnotateSV.py", line 350, in <module>
    main()
  File "iAnnotateSV.py", line 218, in main
    NewRefDF = hp.ExtendPromoterRegion(refDF, args.distance)
  File "/home/kinnamam/iAnnotateSV/iAnnotateSV/helper.py", line 42, in ExtendPromoterRegion
    df['geneStart'] = df['txStart']
  File "/home/kinnamam/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/kinnamam/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'txStart'

Any ideas how to troubleshoot this? Have tried running with my own data and example data, same result.
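The `KeyError` means the reference table, as loaded, has no column named `txStart`. A quick check of the file header can narrow this down. The sketch below assumes a tab-delimited file with a header row; the list of required columns contains only names that appear in tracebacks in these issues, and `missing_columns` is a hypothetical helper.

```python
import csv
import io

# Column names seen in the tracebacks in these issues (not an exhaustive list).
REQUIRED = ("#name", "chrom", "txStart")


def missing_columns(handle, required=REQUIRED, delimiter="\t"):
    """Return the required column names that are absent from the header line."""
    header = next(csv.reader(handle, delimiter=delimiter), [])
    present = {column.strip() for column in header}
    return [column for column in required if column not in present]


# A header that is space-delimited instead of tab-delimited parses as one column,
# so every required name is reported as missing.
good = io.StringIO("#name\tchrom\tstrand\ttxStart\ttxEnd\n")
bad = io.StringIO("#name chrom strand txStart txEnd\n")
print(missing_columns(good))
print(missing_columns(bad))
```

For a real file, pass `open(path)` for whichever reference table the `-r hg19` option resolves to in your installation.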

Multiple coordinates for a transcript

Hi,
I found multiple coordinates for a single transcript in hg19.sv.table.txt, such as NM_001037501, which causes an error in the processData() function in AnnotationForKinaseDomain.py:

  File "/bin/iAnnotateSV.py", line 352, in <module>
    main()
  File "/bin/iAnnotateSV.py", line 223, in main
    annDF = processSV(svDF, NewRefDF, args)
  File "/bin/iAnnotateSV.py", line 320, in processSV
    args.allCanonicalTranscriptsPath, args.uniprot, args.verbose)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 107, in run
    chr1, pos1, gene1, egene1, egene2, transcript1, refDF, upDF)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 173, in getKinaseInfo
    (domainIdx, maxLen, minLen) = processData(chrom, transcript, refDF, upDF)
  File "/usr/local/soft/iAnnotateSV/iAnnotateSV/AnnotationForKinaseDomain.py", line 131, in processData
    transcriptIdx, = (transcripts[transcripts['chrom'] == chrom].index)
ValueError: too many values to unpack

Could I manually keep only one set of coordinates?

Best,
Gerde
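If keeping one entry per transcript is acceptable, it could be done programmatically rather than by hand. The following is a sketch under stated assumptions: it keeps the first row for each (`#name`, `chrom`) pair, which is an arbitrary choice and not a recommendation from the maintainers. The coordinates and the second transcript name in the toy table are made up.

```python
import pandas as pd


def keep_first_per_transcript(ref_df, name_col="#name", chrom_col="chrom"):
    """Keep only the first row for each (transcript, chromosome) pair.

    Original index labels are preserved for the rows that remain.
    """
    return ref_df.drop_duplicates(subset=[name_col, chrom_col], keep="first")


# Toy table with made-up coordinates; NM_001037501 is duplicated on chr1.
ref = pd.DataFrame({
    "#name": ["NM_001037501", "NM_001037501", "NM_000999"],
    "chrom": ["chr1", "chr1", "chr2"],
    "txStart": [100, 900, 50],
})
deduped = keep_first_per_transcript(ref)
print(deduped)
```

Which of the duplicated coordinates is the biologically appropriate one is a separate question that this does not answer.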
