
Comments (9)

sigven commented on May 31, 2024

Thanks for the notification, Vlad. Will have a look into it!

from cpsr.

vladsavelyev commented on May 31, 2024

Also, I'm wondering why this is only executed for CPSR:

    if cpsr is True:
        block_idx = annoutils.get_correct_cpg_transcript(vep_csq_records)

I'm seeing a similar issue for PCGR with a coding variant in TNXB reported as a downstream event in AL662884.2:

[Screenshot, 2020-02-13 15:47:50]

I thought the same fix might resolve it, but it doesn't.

Attaching the variant VCF:
TNXB.vcf.gz


sigven commented on May 31, 2024

Hi Vlad,

Regarding the GJB2 case (this is somewhat complicated, but here is an attempt at an explanation):

The first part is due to the fact that CPSR is targeted towards specific genes of interest. This means that we need to capture more than one primary transcript consequence, since a variant may be annotated with multiple genes and we want to report the consequence related to the targeted gene. Basically, we need the primary transcript consequence per gene (the --flag_pick_allele_gene option in VEP). The primary transcript for each gene is then chosen based on the --pick_order option. Since the CPSR report still reports a single consequence per variant as its main output, the internal software needs to resolve the cases where VEP has picked more than one consequence (i.e. get_correct_cpg_transcript, "if cpsr is True"). get_correct_cpg_transcript looks through the variant consequences and reports the one whose associated gene is actually among the targeted genes (i.e. part of a panel or the global set). There was a bug in this code, but I have fixed it now and will commit these changes.

GJB2_13_20189473.cpsr.snvs_indels.tiers.grch38.tsv.gz
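The resolution step described above can be sketched as follows. This is a hypothetical re-implementation of the idea only, not the actual get_correct_cpg_transcript code: it assumes each consequence block is a dict of VEP CSQ fields, that blocks picked by --flag_pick_allele_gene carry PICK = "1", and that the targeted genes are given as a set of gene symbols. The function name pick_targeted_consequence and the placeholder gene OTHER_GENE are illustrative.

```python
from typing import Dict, List, Set


def pick_targeted_consequence(csq_records: List[Dict[str, str]], target_genes: Set[str]) -> int:
    """Return the index of the picked consequence block whose gene is targeted.

    Hypothetical sketch: with --flag_pick_allele_gene, VEP may flag one
    block per gene (PICK=1), so we choose among the flagged blocks the
    one whose SYMBOL belongs to the targeted gene set.
    """
    picked = [i for i, rec in enumerate(csq_records) if rec.get("PICK") == "1"]
    for i in picked:
        if csq_records[i].get("SYMBOL") in target_genes:
            return i
    # No flagged block hits a targeted gene: fall back to the first
    # flagged block, or to block 0 if nothing was flagged at all.
    return picked[0] if picked else 0


# Placeholder records: one picked block per gene, plus an unpicked GJB2 block
records = [
    {"SYMBOL": "OTHER_GENE", "Consequence": "upstream_gene_variant", "PICK": "1"},
    {"SYMBOL": "GJB2", "Consequence": "missense_variant", "PICK": "1"},
    {"SYMBOL": "GJB2", "Consequence": "missense_variant", "PICK": ""},
]
print(pick_targeted_consequence(records, {"GJB2"}))  # 1
```

Restricting the search to flagged blocks keeps VEP's per-gene choice of primary transcript intact; the CPSR-specific step only decides which gene's pick wins.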

For PCGR, I believe --flag_pick_allele will be sufficient (currently it uses the same setting as CPSR, --flag_pick_allele_gene), since there is no prior prioritization of which genes are most important. Accordingly, the VEP pick order will form the basis for how the primary consequence is picked (and will also resolve cases with consequences from different genes).

That being said, like you, I'm struggling to see why the TNXB variant gets annotated the way it is now. It is reported as an inframe TNXB deletion when using the VEP web interface, and that gene has a number of properties that should rank it above the lncRNA gene. But when I run VEP locally (v99), even without specifying --pick_order (in which case I would expect the default behavior), it still chooses the downstream_gene_variant as the main consequence block. I will continue to figure out how this can happen.

Sigve


sigven commented on May 31, 2024

Regarding PCGR, I think we could use this pick order as the default:

vep_pick_order = "appris,biotype,canonical,ccds,rank,tsl,length,mane"
or even
vep_pick_order = "biotype,appris,canonical,ccds,rank,tsl,length,mane"

The reason I think rank should not be at the front is that focusing on the relevant genes/transcripts should be given more importance than the severity of the consequence (i.e. so that a missense_variant in an obscure gene is not prioritized over a non-coding variant in a well-established and important transcript). I hope this ranking will ensure that when consequences from two different genes are encountered, the functional nature of the gene is prioritized first, then the importance of the given transcript(s) within that gene, and then rank (consequence type).
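Why the position of rank matters can be illustrated with a lexicographic comparison: criteria are checked in pick-order sequence and the first one that differs decides. This is a simplified sketch, not VEP's internal scoring; the per-criterion penalties (0 = best) and the placeholder names OBSCURE_GENE and KEY_GENE are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (label, criterion -> penalty); lower penalty is better
Candidate = Tuple[str, Dict[str, int]]


def pick_by_order(candidates: List[Candidate], pick_order: Sequence[str]) -> str:
    """Return the label of the candidate that wins under the given criterion order.

    Tuple comparison is lexicographic, so the first criterion in
    pick_order where two candidates differ decides between them.
    """
    best_label, _ = min(candidates, key=lambda c: tuple(c[1][k] for k in pick_order))
    return best_label


# Hypothetical penalties for two consequence blocks from different genes
candidates = [
    ("missense_variant:OBSCURE_GENE", {"biotype": 0, "canonical": 1, "rank": 0}),
    ("intron_variant:KEY_GENE", {"biotype": 0, "canonical": 0, "rank": 1}),
]

print(pick_by_order(candidates, ["rank", "biotype", "canonical"]))  # missense_variant:OBSCURE_GENE
print(pick_by_order(candidates, ["biotype", "canonical", "rank"]))  # intron_variant:KEY_GENE
```

With rank first, consequence severity decides before any transcript property is consulted; moving it down lets gene and transcript importance decide first, which is the behavior proposed above.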


vladsavelyev commented on May 31, 2024

Thanks so much Sigve for your investigation and for the CPSR fix. This is super helpful.

Regarding moving "rank" down the list, I remember there was an issue that made me try moving it up in the first place (sigven/pcgr#79). I need to go back to that case and see how the order "biotype,appris,canonical,ccds,rank,tsl,length,mane" affects it.

Just to make sure: am I understanding correctly that the TNXB prioritization issue is not related to vep_pick_order, so moving "rank" down does not solve it?

Vlad


vladsavelyev commented on May 31, 2024

Sigve, still not sure about TNXB in PCGR.

The issue is not that a missense variant in an obscure gene is prioritized over a non-coding variant in an important gene. It's that an inframe deletion in a well-established gene is reported as a non-coding variant in an obscure gene:

grep AL662884.2 TNXB_pcgr.pcgr.snvs_indels.tiers.tsv
6       32096536        ACAGTCGCGTGGGCAGGCGCGCGAGCCG    A    
   6:g.32096536ACAGTCGCGTGGGCAGGCGCGCGAGCCG>A      grch38  TNXB_pcgr       deletion        AL662884.2      NA      NA      YES     NA      NA      ENST00000494022  ENSG00000284829 NA      0       FALSE   FALSE   NA      NA      downstream_gene_variant NA      NA      noncoding       nonexonic       NA      NA      NA      NA      NA      NA      NA      FALSE    NA      NA      inframe_deletion:TNXB:Transcript:ENST00000375244:protein_coding, inframe_deletion:TNXB:Transcript:ENST00000479795:protein_coding, downstream_gene_variant:AL662884.2:Transcript:ENST00000494022:lncRNA, inframe_deletion:TNXB:Transcript:ENST00000613214:protein_coding, inframe_deletion:TNXB:Transcript:ENST00000644971:protein_coding, inframe_deletion:TNXB:Transcript:ENST00000647633:protein_codingNA       NA      NA      NA      NA      NA      NA      TRUE    FALSE   0       NA      NA      NA      NA      NA      72      0.181   NA      NA      NONCODING       Noncoding mutation

TNXB.vcf.gz
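To make the row above easier to read, the comma-separated list of all consequences can be grouped by gene. This is a small sketch that assumes each entry has the layout consequence:symbol:feature_type:feature:biotype, inferred from the printed entries rather than from PCGR documentation; the function name consequences_by_gene is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def consequences_by_gene(all_csq: str) -> Dict[str, Set[str]]:
    """Group 'consequence:symbol:feature_type:feature:biotype' entries by gene symbol."""
    by_gene: Dict[str, Set[str]] = defaultdict(set)
    for entry in all_csq.split(","):
        entry = entry.strip()
        if not entry:
            continue
        consequence, symbol, _feature_type, _feature, _biotype = entry.split(":")
        by_gene[symbol].add(consequence)
    return dict(by_gene)


# Consequence list copied from the TNXB row above
all_csq = (
    "inframe_deletion:TNXB:Transcript:ENST00000375244:protein_coding, "
    "inframe_deletion:TNXB:Transcript:ENST00000479795:protein_coding, "
    "downstream_gene_variant:AL662884.2:Transcript:ENST00000494022:lncRNA, "
    "inframe_deletion:TNXB:Transcript:ENST00000613214:protein_coding, "
    "inframe_deletion:TNXB:Transcript:ENST00000644971:protein_coding, "
    "inframe_deletion:TNXB:Transcript:ENST00000647633:protein_coding"
)
print(consequences_by_gene(all_csq))
```

Grouped this way, five of the six blocks are TNXB inframe deletions on protein-coding transcripts, while the single lncRNA downstream_gene_variant block is the one that ended up in the main consequence columns.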


vladsavelyev commented on May 31, 2024

I played a bit with the VEP options. Apparently, using --flag_pick_allele for PCGR helps here. Since PCGR doesn't prioritize genes the way CPSR does, we need to make sure there is exactly one consequence selected per event; otherwise VEP picks multiple ones and PCGR ends up using a less relevant one.
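The difference between the two flagging modes can be sketched as follows: per-allele flagging marks one block per event, while per-allele-gene flagging marks one block per gene and so leaves several flagged blocks when the variant overlaps two genes. This is a simplified model, not VEP's code: it assumes a single allele, and a hypothetical single-number score (here consequence severity only) stands in for the full --pick_order comparison. The function name flag_picks is illustrative.

```python
from typing import Callable, Dict, List

Block = Dict[str, str]


def flag_picks(blocks: List[Block], per_gene: bool, score: Callable[[Block], int]) -> List[bool]:
    """Return a PICK flag per block: one winner per allele, or one per (allele, gene).

    Lower score is better; ties go to the earliest block.
    """
    groups: Dict[str, List[int]] = {}
    for i, block in enumerate(blocks):
        key = block["SYMBOL"] if per_gene else "_all_"
        groups.setdefault(key, []).append(i)
    flags = [False] * len(blocks)
    for idxs in groups.values():
        best = min(idxs, key=lambda i: score(blocks[i]))
        flags[best] = True
    return flags


# Hypothetical stand-in for the pick order: more severe consequence = lower score
severity = {"inframe_deletion": 0, "downstream_gene_variant": 1}


def score(block: Block) -> int:
    return severity[block["Consequence"]]


blocks = [
    {"SYMBOL": "TNXB", "Consequence": "inframe_deletion"},
    {"SYMBOL": "TNXB", "Consequence": "inframe_deletion"},
    {"SYMBOL": "AL662884.2", "Consequence": "downstream_gene_variant"},
]
print(flag_picks(blocks, per_gene=True, score=score))   # [True, False, True]
print(flag_picks(blocks, per_gene=False, score=score))  # [True, False, False]
```

In per-gene mode two blocks remain flagged, so downstream code still has to choose between them; in per-allele mode the choice is already made by the pick order.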


sigven commented on May 31, 2024

This makes sense to me, at least (I get confused at times by how the ordering actually works, and adding two potential consequences on top of that doesn't make it easier). I'll move from --flag_pick_allele_gene to --flag_pick_allele in PCGR to ensure this will not pose a problem.


vladsavelyev commented on May 31, 2024

Fantastic, thanks Sigve :) I guess everything is good now with this issue.
