
Comments (4)

lgmgeo commented on August 17, 2024

changing the square-bracketed SV notation to "<BND>" for these challenging variants only solved the problem, so we have decided to keep them and handle them with care.

Perfect!

For the record, the "complicated" region that seems to have high homology across the same chromosome and even on other chromosomes through the genome is chr2:32916100-32916600 (hg38).

  • This is a Repeat (G)n region
  • No RefSeq curated genes
  • The "Human Gene LINC00486 (ENST00000414054.5) from GENCODE V44" corresponds to a long intergenic non-protein coding RNA (long non-coding RNA from RefSeq NR_027098)

In my opinion, it is quite normal for SV callers to be inaccurate for BNDs in this region (especially with short reads).
The large SVs (>20M, >200M) found in this "complicated" region would appear to be false positive SVs.

from annotsv.

lgmgeo commented on August 17, 2024

Hi @jamigo,

It's nice to see that AnnotSV is useful and applied to thousands of samples.
It’s very motivating for me!

And thank you for your detailed feedback.
Your small 500 bp chr2 region looks very interesting.
I'm curious whether it is located in a segmental duplication region, in a RepeatMasker (SINE, LINE...) region, or near a centromere or telomere, which could explain badly called BNDs.

In your VCF DRAGEN SV input file, I assume your BNDs are annotated with square-bracketed notations in ALT and that reciprocal BNDs (MATEID) are indicated.
In your small 500 bp chr2 region, what types of SV do your BNDs correspond to? Is there a particular type (INV, INS...)? Do they have FILTER=PASS (I guess not if this region is indeed complex)?

Anyway, AnnotSV seems to be struggling with these BND pairs, which correspond to large SVs.
So, for now, my advice would be:

  • To extract the BNDs in your small 500bp region and set the ALT feature to <BND> (instead of the square-bracketed notations in ALT)
  • Do the same for the corresponding MATEIDs

=> By analyzing only the BNDs and not the SV across its entire width, this should avoid triggering the bug in AnnotSV.
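
As an illustration of the workaround above, here is a minimal Python sketch. It is not part of AnnotSV: the function names (`rewrite_bnds`, `mateids`, `is_bracketed`) are hypothetical, the region is the one reported in this thread, and it assumes an uncompressed, tab-separated VCF with a single ALT per BND record and mates linked through INFO/MATEID.

```python
# Region reported in this thread (hg38), treated here as 1-based inclusive.
REGION = ("chr2", 32916100, 32916600)


def mateids(info):
    """Return the MATEID values found in an INFO string (empty list if none)."""
    for field in info.split(";"):
        if field.startswith("MATEID="):
            return field[len("MATEID="):].split(",")
    return []


def is_bracketed(alt):
    """True if ALT uses the square-bracketed breakend notation."""
    return "[" in alt or "]" in alt


def rewrite_bnds(lines, region=REGION):
    """Set ALT to <BND> for bracketed BNDs in `region` and for their mates.

    `lines` are VCF lines; they are returned without trailing newlines.
    """
    chrom, start, end = region
    lines = [line.rstrip("\n") for line in lines]

    # Pass 1: collect the IDs of bracketed BNDs in the region, plus their mates.
    targets = set()
    for line in lines:
        if line.startswith("#"):
            continue
        f = line.split("\t")
        if f[0] == chrom and start <= int(f[1]) <= end and is_bracketed(f[4]):
            targets.add(f[2])
            targets.update(mateids(f[7]))

    # Pass 2: rewrite ALT for every collected record, leave the rest untouched.
    out = []
    for line in lines:
        if not line.startswith("#"):
            f = line.split("\t")
            if f[2] in targets and is_bracketed(f[4]):
                f[4] = "<BND>"
                line = "\t".join(f)
        out.append(line)
    return out
```

Typical use would be to read the VCF lines from a file, pass them to `rewrite_bnds`, and write the result to a new file that is then given to AnnotSV. Records whose ID is missing (".") would need extra care, since this sketch matches mates by ID only.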

In the future, I'm thinking about integrating/using a database in the AnnotSV code (#15). This should fix the bug.
But it's a big job, and I have other implementations to do first.
I hope this will be done by the end of the year at the latest.

I'll keep you posted here.

Best,

Véronique


lgmgeo commented on August 17, 2024

Note for square-bracketed ALT notation:

The interpretation of the square-bracketed notation relies on the homogenization rules from the variantextractor tool (provided by Rodrigo Martin).

  • For duplication, inversion, deletion and insertion, AnnotSV returns one full annotation per SV (one full
    annotation per breakend pair).
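
For readers less familiar with the notation, the sketch below parses a square-bracketed ALT following the breakend syntax of the VCF specification (`t[p[`, `t]p]`, `]p]t`, `[p[t`) and computes the distance between a breakend and its mate when both are on the same chromosome. It does not reproduce the variantextractor homogenization rules or AnnotSV's own logic, and the names `parse_bnd_alt` and `same_chrom_span` are hypothetical.

```python
import re

# One bracketed breakend: optional bases, a bracket, "chrom:pos",
# the same bracket again, optional bases.
BND_ALT = re.compile(
    r"^(?P<pre>[^\[\]]*)(?P<b>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)"
    r"(?P=b)(?P<post>[^\[\]]*)$"
)


def parse_bnd_alt(alt):
    """Parse a bracketed ALT; return None for anything else (e.g. "<BND>")."""
    m = BND_ALT.match(alt)
    if m is None:
        return None
    return {
        "mate_chrom": m["chrom"],
        "mate_pos": int(m["pos"]),
        "bracket": m["b"],
        # True when the local bases come first (forms "t[p[" and "t]p]").
        "joined_after": bool(m["pre"]),
    }


def same_chrom_span(chrom, pos, alt):
    """Distance in bp between a breakend and its mate, or None if the ALT
    is not bracketed or the mate lies on another chromosome."""
    parsed = parse_bnd_alt(alt)
    if parsed is None or parsed["mate_chrom"] != chrom:
        return None
    return abs(parsed["mate_pos"] - pos)
```

A helper like `same_chrom_span` could be used to count how many same-chromosome BND pairs in a VCF span more than a chosen threshold before running the annotation.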


jamigo commented on August 17, 2024

AFAIK, AnnotSV is handling square-bracketed SV notation perfectly fine. The problem we have only happens when AnnotSV deals with several thousands of very-large-same-chromosome-paired BNDs (some >20M, some even >200M), maybe because AnnotSV is not releasing memory when it has to (this one could be checked), or maybe because the underlying bedtools calls demand lots of memory (this one would be difficult to address).

We were in fact thinking about leaving out all these very-large-same-chromosome-paired BNDs found in this "complicated" region, since they are obviously derived from a reference genome feature rather than from each sample's features, but changing the square-bracketed SV notation to "<BND>" for these challenging variants only solved the problem, so we have decided to keep them and handle them with care.

For the record, the "complicated" region that seems to have high homology across the same chromosome and even on other chromosomes through the genome is chr2:32916100-32916600 (hg38).

