pLI for SVs about annotsv HOT 11 CLOSED

lgmgeo commented on July 18, 2024

pLI for SVs

from annotsv.

Comments (11)

Jordi-V commented on July 18, 2024

Dear all, I have a question about the pLI and haploinsufficiency (HI) values reported by AnnotSV.

The ExAC pLI parameter was derived from SNVs. When assessing the pathogenicity of SVs, do the pLI and HI values mean the same thing as for SNVs? Can we interpret them in exactly the same way, or are they only informative for the SNV data in these repositories? I ask because of the following sentence from the gnomAD SV project:

"Nearly all existing metrics, including pLI and LOEUF, are derived from SNVs. Although previous studies have attempted to compute similar scores using large CNVs detected by microarray and exome sequencing [29,30], or to correlate deletions with pLI [18], no gene-level metrics comparable to LOEUF exist for SVs at WGS resolution"

My goal is to state that these SVs affect loss-of-function-intolerant genes, using the HI and pLI values as evidence. Is that correct?

Thanks for your time

Jordi

from annotsv.

lgmgeo commented on July 18, 2024

Hi,

The pLI indicates the probability that a gene is intolerant to loss-of-function variation (nonsense, splice acceptor and splice donor variants caused by SNVs). ExAC considers genes with pLI >= 0.9 to be an extremely LoF-intolerant set.

Haploinsufficiency (HI) indicates if a single functional copy of a gene is insufficient to maintain normal function.
As detailed in DECIPHER:

High ranks (e.g. 0-10%) indicate a gene is more likely to exhibit haploinsufficiency
Low ranks (e.g. 90-100%) indicate a gene is more likely to NOT exhibit haploinsufficiency.
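
As a rough illustration of the thresholds quoted above, here is a minimal Python sketch. The function names and the "intermediate" label for ranks between 10% and 90% are assumptions made for this example; only the pLI >= 0.9 cutoff and the 0-10% / 90-100% rank ranges come from the text.

```python
def is_lof_intolerant(pli, threshold=0.9):
    """True if the gene belongs to the extremely LoF-intolerant set (pLI >= 0.9)."""
    return pli >= threshold


def hi_rank_category(hi_percent):
    """Interpret a DECIPHER HI rank given as a percentage (0-100).

    The 'intermediate' label is an assumption of this sketch; DECIPHER only
    gives the two extreme ranges as examples.
    """
    if not 0 <= hi_percent <= 100:
        raise ValueError("HI rank must be between 0 and 100")
    if hi_percent <= 10:
        return "more likely haploinsufficient"
    if hi_percent >= 90:
        return "more likely NOT haploinsufficient"
    return "intermediate"


if __name__ == "__main__":
    print(is_lof_intolerant(0.95), hi_rank_category(4.2))
```

Note that both values are gene-level scores: they describe the gene an SV overlaps, not the SV itself.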

Here is the information given in the gnomAD FAQ:

LOEUF stands for the "loss-of-function observed/expected upper bound fraction." It is a conservative estimate of the observed/expected ratio, based on the upper bound of a Poisson-derived confidence interval around the ratio. Low LOEUF scores indicate strong selection against predicted loss-of-function (pLoF) variation in a given gene, while high LOEUF scores suggest a relatively higher tolerance to inactivation.

Its advantage over pLI is that it can be used as a continuous value rather than a dichotomous scale (e.g. pLI > 0.9) - if such a
single cutoff is still desired, pLI is a perfectly fine metric to use. 

At large sample sizes, the observed/expected ratio will be a more appropriate measure for selection, but at the moment, LOEUF provides 
a good compromise of point estimate and significance measure.
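
To make "upper bound of a Poisson-derived confidence interval around the ratio" more concrete, here is a minimal Python sketch of one way to obtain such a bound. It is an illustration only and not necessarily how gnomAD computes LOEUF. The function names, the grid of candidate ratios (0 to 2 in steps of 0.001) and the choice of a 90% interval (upper bound at 95%) are all assumptions of this example.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k events when lam are expected."""
    if lam == 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def oe_upper_bound(observed, expected, upper=0.95, step=0.001, max_ratio=2.0):
    """Upper bound of an interval around observed/expected.

    Scans candidate ratios on a grid, weights each by the Poisson likelihood
    of the observed count, and returns the first ratio at which the
    normalised cumulative weight reaches `upper`.
    """
    n_steps = round(max_ratio / step)
    ratios = [i * step for i in range(n_steps + 1)]
    weights = [poisson_pmf(observed, r * expected) for r in ratios]
    total = sum(weights)
    cumulative = 0.0
    for ratio, weight in zip(ratios, weights):
        cumulative += weight
        if cumulative / total >= upper:
            return ratio
    return max_ratio


if __name__ == "__main__":
    # A gene with no observed pLoF variants where 10 were expected gets a
    # low upper bound; one with 10 observed out of 10 expected gets a high one.
    print(oe_upper_bound(0, 10), oe_upper_bound(10, 10))
```

The point of the sketch is the behaviour: fewer observed pLoF variants than expected gives a lower upper bound, which corresponds to stronger constraint.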

Hope this could help,
Best,
Véronique

from annotsv.

Jordi-V commented on July 18, 2024

Thanks a lot Véronique for your help.

Yes, I know the relevance of pLI and HI, but since ExAC_pLI was designed to be calculated from SNPs, I just wanted to know whether the ExAC_pLI value is useful for SVs. From your reply, I think that it is :D.

I think I found a bug in AnnotSV: in the annotation file, when the location is intronic, location2 is usually CDS. In my opinion this is not possible, because the CDS is the coding region and an intron is not... Even the CDS_length in those cases is 0, so I think the value of location2 is not correct. Here is an example:

AnnotSV_type  Gene_name  NM            CDS_length  tx_length  location         location2
split         SCNN1D     NM_001130413  0           57         intron5-intron5  CDS

What surprises me is that tx_length has a value, which means that the SV touches some exon, right? Maybe I'm missing something... How would you explain this?

Thanks for your time

Jordi

from annotsv.

lgmgeo commented on July 18, 2024

Hi Jordi,

Actually, no bug.

"location2=CDS" doesn't mean that a coding region is overlapped... but that a region between the CDS start and the CDS end is overlapped. In your example, the intron5 is located between the start and the end of the CDS. That's why location2 is set to "CDS".
To know whether a coding region (exons between the CDS start and the CDS end) is overlapped, you need to look at the CDS_length feature. In your example, CDS_length = 0, which is correct since the overlapped region is in intron5.

Concerning your example, you can check the data of the transcript (NM_001130413 ) with the following command:

grep NM_001130413 $ANNOTSV/share/AnnotSV/Annotations_Human/Genes/GRCh37/genes.NM.sorted.bed
1       1215815 1227405 +       SCNN1D  NM_001130413    1216041 1226990 1215815,1216605,1216790,1217621,1219357,1220950,1221305,1222147,1222488,1222887,1223052,1223318,1225650,1225856,1226016,1226274,1226444,1226633,      1216046,1216677,1216990,1217695,1219470,1221044,1221658,1222355,1222679,1222976,1223216,1223417,1225768,1225935,1226074,1226333,1226520,1227405,

txStart = 1215815
txEnd = 1227405
CDSstart = 1216041
CDSend = 1226990
intron5start = 1219470
intron5end = 1220950
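
The difference between "location2=CDS" and CDS_length can be sketched in Python using the coordinates of NM_001130413 listed above. This is an illustration, not AnnotSV's own code: the helper names are made up for this example, the coordinates are treated as half-open intervals, and the SV interval (1219500-1219557, inside intron5) is hypothetical since the actual SV coordinates are not given in this thread.

```python
# Transcript NM_001130413 (SCNN1D), copied from the genes.NM.sorted.bed line above.
CDS_START, CDS_END = 1216041, 1226990
EXON_STARTS = [1215815, 1216605, 1216790, 1217621, 1219357, 1220950,
               1221305, 1222147, 1222488, 1222887, 1223052, 1223318,
               1225650, 1225856, 1226016, 1226274, 1226444, 1226633]
EXON_ENDS = [1216046, 1216677, 1216990, 1217695, 1219470, 1221044,
             1221658, 1222355, 1222679, 1222976, 1223216, 1223417,
             1225768, 1225935, 1226074, 1226333, 1226520, 1227405]


def overlap(a_start, a_end, b_start, b_end):
    """Number of bases shared by two intervals (0 if they do not overlap)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlaps_cds_span(sv_start, sv_end):
    """True if the SV touches the region between CDS start and CDS end
    (introns included). This is the idea behind location2 = CDS."""
    return overlap(sv_start, sv_end, CDS_START, CDS_END) > 0


def coding_overlap_length(sv_start, sv_end):
    """Bases of coding exon sequence overlapped by the SV
    (the idea behind CDS_length)."""
    total = 0
    for exon_start, exon_end in zip(EXON_STARTS, EXON_ENDS):
        coding_start = max(exon_start, CDS_START)
        coding_end = min(exon_end, CDS_END)
        if coding_start < coding_end:
            total += overlap(sv_start, sv_end, coding_start, coding_end)
    return total


if __name__ == "__main__":
    sv_start, sv_end = 1219500, 1219557  # hypothetical SV inside intron5
    print(overlaps_cds_span(sv_start, sv_end))      # inside the CDS span
    print(coding_overlap_length(sv_start, sv_end))  # but no coding bases hit
```

An SV that lies entirely within intron5 is inside the CDS span, yet overlaps no coding bases, which matches location2 = CDS together with CDS_length = 0.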

Let me know if something is still unclear,
Best,
Véronique

from annotsv.

Jordi-V commented on July 18, 2024

Hi Véronique,

Thanks a lot for your reply. So if the SV overlaps an intronic region, and the intron lies within the CDS coordinates, then location2=CDS. For me this is quite confusing: doesn't CDS mean the exonic region (not in this program specifically, but in general)?
I plotted my AnnotSV results and saw that the majority of my SVs fall in intronic regions and CDS regions, hence my concern, because an intron is not a coding region of the gene. Or do SVs in the introns between CDS regions have some effect?

Sorry for so many questions, you have helped me a lot.
Best,
Jordi

from annotsv.

lgmgeo commented on July 18, 2024

Your understanding is now correct. Sorry for being confusing in the documentation.
I will add some explanations in the README file for a better understanding of the location and location2 features.
If the "CDS" term is confusing, can you suggest another term? I can modify it.

Currently, to know whether your SV falls in coding regions, please look only at CDS_length. If it is different from 0, then coding sequence is overlapped.
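
In practice this means filtering on the CDS_length column rather than on location2. A minimal Python sketch follows, assuming a tab-separated file with the column names shown in the example earlier in this thread; the second data row (GENE_X, NM_000000) is hypothetical, and treating an empty CDS_length as 0 is an assumption of this sketch.

```python
import csv
import io

# Small in-memory stand-in for an AnnotSV output file.
# The first data row is the example from this thread; the second is hypothetical.
ANNOTSV_TSV = (
    "AnnotSV_type\tGene_name\tNM\tCDS_length\ttx_length\tlocation\tlocation2\n"
    "split\tSCNN1D\tNM_001130413\t0\t57\tintron5-intron5\tCDS\n"
    "split\tGENE_X\tNM_000000\t120\t300\texon2-intron3\tCDS\n"
)


def rows_overlapping_coding_sequence(handle):
    """Return the rows whose SV overlaps coding sequence (CDS_length > 0)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row["CDS_length"] or 0) > 0]


if __name__ == "__main__":
    for row in rows_overlapping_coding_sequence(io.StringIO(ANNOTSV_TSV)):
        print(row["Gene_name"], row["location"], row["CDS_length"])
```

Both rows have location2 = CDS, but only the second one is kept, because only its CDS_length is non-zero.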

from annotsv.

Jordi-V commented on July 18, 2024

Hi Véronique,
Yes, we would appreciate a detailed explanation of what each variable means, because exon*-exon* could be read as the junction between two exons, for example... Detailing the location and location2 variables would clarify the interpretation for users.

As for me, if location2 is meant to report only coding regions, and intronic regions are not part of them, I would put a blank or "non-coding region", for example, because the intron in this case is not a coding region of the gene.

Thanks a lot for your help!

Jordi

from annotsv.

Jordi-V commented on July 18, 2024

Hi Véronique,

I just want to mention one thing. The RefSeq set that you use to annotate the SVs contains both NM and NR transcripts; maybe the RefSeq consortium made new updates and now includes the latter, which are MIR genes and others. I mention it because I suspect that, since the transcript column is named NM, you want to analyse only the protein-coding genes, but now others are included as well.

I hope this helps to improve the program.

Best

Jordi

from annotsv.

Jordi-V commented on July 18, 2024

Hi Véronique!

I hope you are enjoying the summer and that COVID isn't affecting the holidays too much...

I'm contacting you again because I have problems interpreting the location information.

I don't understand the difference between exon*-exon* and txStart-txEnd. Does the first mean that the SV affects only one exon, and the second that it affects the whole transcript? If so, what does exon-txEnd mean? Something like from the last exon of the gene to the end of the transcript or gene?

My apologies for so many questions...

from annotsv.

lgmgeo commented on July 18, 2024

Hi Jordi,

Regarding a gene, AnnotSV retrieves the following coordinates:

txStart (same coordinates as exon1_start)
exon1_start, exon2_start, exon3_start...
exon1_end, exon2_end, exon3_end...
txEnd (same coordinates as the last-exon_end)
cdsStart / cdsEnd

The "location" feature corresponds to the SV location in the gene:
Values: txStart, txEnd, exon’i’, intron’i’
e.g. « txStart-exon1 »

It indicates whether the whole gene is overlapped by the SV (or only part of the gene).

For example, "exon3-exon5" indicates that the SV breakpoints are in exon3 and in exon5.
And "txStart-txEnd" indicates that the whole gene is overlapped by the SV.
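
The rule described above can be sketched in Python as follows. This is not AnnotSV's implementation: the function names are made up, the exon coordinates are toy values, the transcript is assumed to be on the plus strand, and the handling of breakpoints that fall exactly on an exon boundary is an assumption of this sketch.

```python
def breakpoint_label(pos, exon_starts, exon_ends):
    """Label one SV breakpoint relative to a plus-strand transcript."""
    tx_start, tx_end = exon_starts[0], exon_ends[-1]
    if pos <= tx_start:
        return "txStart"  # breakpoint at or before the transcript start
    if pos >= tx_end:
        return "txEnd"    # breakpoint at or after the transcript end
    exons = zip(exon_starts, exon_ends)
    for i, (start, end) in enumerate(exons, start=1):
        if start <= pos <= end:
            return f"exon{i}"
        if pos < start:
            # Between the end of exon i-1 and the start of exon i.
            return f"intron{i - 1}"
    raise ValueError("exon coordinates are not sorted")


def sv_location(sv_start, sv_end, exon_starts, exon_ends):
    """Build a location string such as 'exon3-exon5' or 'txStart-txEnd'."""
    left = breakpoint_label(sv_start, exon_starts, exon_ends)
    right = breakpoint_label(sv_end, exon_starts, exon_ends)
    return f"{left}-{right}"


if __name__ == "__main__":
    # Toy transcript with five exons (coordinates are made up).
    starts = [100, 300, 500, 700, 900]
    ends = [200, 400, 600, 800, 1000]
    print(sv_location(550, 950, starts, ends))   # breakpoints in exon3 and exon5
    print(sv_location(50, 1100, starts, ends))   # whole gene overlapped
    print(sv_location(420, 450, starts, ends))   # both breakpoints in intron2
```

So a value such as "exon5-txEnd" would mean that one breakpoint lies in exon5 and the SV extends to (or beyond) the end of the transcript.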

Hope the "location" feature is clearer now...

from annotsv.

lgmgeo commented on July 18, 2024

Hi,

Concerning what you mentioned recently:

The RefSeq set that you use to annotate the SVs contains both NM and NR transcripts; maybe the RefSeq
consortium made new updates and now includes the latter, which are MIR genes and others. I mention it
because I suspect that, since the transcript column is named NM, you want to analyse only the
protein-coding genes, but now others are included as well.

Actually, all the genes are of interest.
So I changed the names of the values for the "tx" option (currently only in the patch_AnnotSV branch):
NM >> RefSeq
ENST >> ENSEMBL

Thanks, it will be clearer now.
Best,
Véronique

from annotsv.
