Dear all, I have a question about the pLI and haploinsufficiency (HI) values reported by AnnotSV.
The ExAC pLI score is derived from SNVs. When assessing the pathogenicity of SVs, do the pLI and HI values mean the same as for SNVs? Can we interpret them in exactly the same way, or are these values from those repositories only informative for SNVs? I ask because of the following sentence from the gnomAD SV project:
"Nearly all existing metrics, including pLI and LOEUF, are derived from SNVs. Although previous studies have attempted to compute similar scores using large CNVs detected by microarray and exome sequencing 29,30, or to correlate deletions with pLI 18, no gene-level metrics comparable to LOEUF exist for SVs at WGS resolution"
My objective is to say something like "these SVs act on loss-of-function-intolerant genes", using the HI and pLI values. Is that correct?
Thanks for your time
Jordi
from annotsv.
Hi,
pLI indicates the probability that a gene is intolerant to loss-of-function variation (nonsense, splice acceptor and splice donor variants caused by SNVs). ExAC considers genes with pLI >= 0.9 as an extremely LoF-intolerant set.
Haploinsufficiency (HI) indicates whether a single functional copy of a gene is insufficient to maintain normal function.
As detailed in DECIPHER:
- High ranks (e.g. 0-10%) indicate a gene is more likely to exhibit haploinsufficiency
- Low ranks (e.g. 90-100%) indicate a gene is more likely to NOT exhibit haploinsufficiency.
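The cutoffs quoted above can be written as a small helper. This is a minimal sketch, not part of AnnotSV: the function name and the label strings are hypothetical, and since pLI is derived from SNVs, the labels describe the gene's constraint rather than the effect of the SV itself.

```python
# Minimal sketch of the thresholds quoted above (hypothetical helper, not AnnotSV code).
# pLI >= 0.9: ExAC "extremely LoF intolerant"; DECIPHER HI rank 0-10%: likely HI,
# 90-100%: likely not HI. Missing values (None) are reported as "unknown".
from typing import Dict, Optional


def interpret_constraint(pli: Optional[float],
                         hi_rank_percent: Optional[float]) -> Dict[str, str]:
    if pli is None:
        pli_label = "unknown"
    elif pli >= 0.9:
        pli_label = "extremely LoF intolerant"
    else:
        pli_label = "below the ExAC 0.9 cutoff"

    if hi_rank_percent is None:
        hi_label = "unknown"
    elif hi_rank_percent <= 10:
        hi_label = "likely haploinsufficient"
    elif hi_rank_percent >= 90:
        hi_label = "likely not haploinsufficient"
    else:
        hi_label = "intermediate rank"  # between the two ranges DECIPHER describes

    return {"pLI": pli_label, "HI": hi_label}


print(interpret_constraint(0.97, 4.2))
# {'pLI': 'extremely LoF intolerant', 'HI': 'likely haploinsufficient'}
```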
Here is the information given in the gnomAD FAQ:
LOEUF stands for the "loss-of-function observed/expected upper bound fraction." It is a conservative estimate of the observed/expected ratio, based on the upper bound of a Poisson-derived confidence interval around the ratio. Low LOEUF scores indicate strong selection against predicted loss-of-function (pLoF) variation in a given gene, while high LOEUF scores suggest a relatively higher tolerance to inactivation.
Its advantage over pLI is that it can be used as a continuous value rather than a dichotomous scale (e.g. pLI > 0.9) - if such a single cutoff is still desired, pLI is a perfectly fine metric to use.
At large sample sizes, the observed/expected ratio will be a more appropriate measure for selection, but at the moment, LOEUF provides a good compromise of point estimate and significance measure.
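To make "upper bound of a Poisson-derived confidence interval around the ratio" concrete, here is a minimal sketch. Assumptions: the Poisson likelihood is evaluated on a grid of candidate ratios from 0 to 2 (step 0.001), and the bound is taken as the 0.95 quantile of the normalised likelihood (the upper end of a 90% interval). The function names are hypothetical and this is not gnomAD's code; in practice the expected counts come from gnomAD's mutational model, which is not reproduced here.

```python
# Sketch: observed/expected (o/e) ratio plus an upper bound from a Poisson
# likelihood over a grid of candidate ratios. Grid limits and the 0.95 quantile
# are assumptions for illustration, not gnomAD's production code.
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """log P(X = k) for X ~ Poisson(lam); handles lam == 0."""
    if lam == 0:
        return 0.0 if k == 0 else float("-inf")
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def oe_with_upper_bound(observed: int, expected: float,
                        quantile: float = 0.95,
                        max_ratio: float = 2.0, step: float = 0.001):
    """Return (point o/e, upper bound) for the pLoF counts of one gene."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    n_steps = int(round(max_ratio / step))
    grid = [i * step for i in range(n_steps + 1)]
    # Likelihood of the observed count if the true ratio were r.
    log_lik = [poisson_log_pmf(observed, expected * r) for r in grid]
    peak = max(log_lik)
    weights = [math.exp(v - peak) for v in log_lik]  # rescale to avoid underflow
    total = sum(weights)
    cumulative = 0.0
    for r, w in zip(grid, weights):
        cumulative += w / total
        if cumulative >= quantile:
            return observed / expected, r
    return observed / expected, max_ratio


# Zero observed pLoF variants where 10 were expected:
print(oe_with_upper_bound(0, 10))  # point estimate 0.0, upper bound close to 0.3
```

The point estimate alone would be 0.0 here; the upper bound is what keeps a gene with few expected variants from looking maximally constrained.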
Hope this helps,
Best,
Véronique
Thanks a lot for your help, Véronique.
Yes, I know the relevance of pLI and HI, but since EXAC_pLI is designed to calculate pLI from SNVs, I just wanted to know whether the EXAC_pLI value is useful for SVs. From your reply, I think it is :D.
I think I found a bug in AnnotSV. In the annotation file, when the location is intronic, location2 is normally "CDS". This does not seem possible to me, because the CDS is the coding region and an intron is not... Moreover, CDS_length is 0 in those cases, so I think the location2 value is not correct. Here is an example:
AnnotSV_type  Gene_name  NM            CDS_length  tx_length  location         location2
split         SCNN1D     NM_001130413  0           57         intron5-intron5  CDS
What surprises me is that tx_length has a value; doesn't that mean the SV touches some exon? Maybe I'm missing something... How would you explain this?
Thanks for your time
Jordi
Hi Jordi,
Actually, no bug.
- "location2=CDS" doesn't mean that a coding region is overlapped, but that a region between the CDS start and the CDS end is overlapped. In your example, intron5 is located between the start and the end of the CDS. That's why location2 is set to "CDS".
- To know whether a coding region (exons between the CDS start and the CDS end) is overlapped, you need to look at the CDS_length feature (in your example, CDS_length = 0, which is correct as the overlapped region is in intron5).
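A minimal sketch of these two checks, assuming half-open [start, end) intervals in the same coordinate system as the transcript. The helper names are hypothetical and this is not AnnotSV's implementation; the exon coordinates are exons 5 and 6 of NM_001130413 from the genes.NM.sorted.bed line in this reply, and the SV interval is made up for illustration.

```python
# Sketch of the two checks described above (hypothetical helpers, not AnnotSV code).
# All intervals are half-open [start, end) in the transcript's coordinate system.
from typing import List, Tuple


def overlap_len(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def touches_cds_span(cds_start: int, cds_end: int, sv_start: int, sv_end: int) -> bool:
    """True if the SV overlaps anything between CDS start and CDS end,
    introns included (the meaning of location2 = "CDS" described above)."""
    return overlap_len(cds_start, cds_end, sv_start, sv_end) > 0


def coding_bases_overlapped(cds_start: int, cds_end: int,
                            exons: List[Tuple[int, int]],
                            sv_start: int, sv_end: int) -> int:
    """Exonic bases inside the CDS span that the SV covers
    (the quantity CDS_length is described as reflecting)."""
    total = 0
    for ex_start, ex_end in exons:
        # Clip each exon to the CDS span so UTR bases are not counted.
        start, end = max(ex_start, cds_start), min(ex_end, cds_end)
        if start < end:
            total += overlap_len(start, end, sv_start, sv_end)
    return total


cds = (1216041, 1226990)                          # CDS start / end of NM_001130413
exons = [(1219357, 1219470), (1220950, 1221044)]  # exon5 and exon6
sv = (1219500, 1220000)                           # made-up SV lying inside intron5
print(touches_cds_span(*cds, *sv), coding_bases_overlapped(*cds, exons, *sv))
# True 0
```

An intronic SV inside the CDS span therefore gives "CDS"-style location2 with a coding overlap of 0, which is the combination seen in the example.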
Concerning your example, you can check the transcript data (NM_001130413) with the following command:
grep NM_001130413 $ANNOTSV/share/AnnotSV/Annotations_Human/Genes/GRCh37/genes.NM.sorted.bed
1 1215815 1227405 + SCNN1D NM_001130413 1216041 1226990 1215815,1216605,1216790,1217621,1219357,1220950,1221305,1222147,1222488,1222887,1223052,1223318,1225650,1225856,1226016,1226274,1226444,1226633, 1216046,1216677,1216990,1217695,1219470,1221044,1221658,1222355,1222679,1222976,1223216,1223417,1225768,1225935,1226074,1226333,1226520,1227405,
txStart = 1215815
txEnd = 1227405
CDSstart = 1216041
CDSend = 1226990
intron5start = 1219470
intron5end = 1220950
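The coordinates listed above can be derived from the BED line itself. A minimal sketch, assuming the column order (chrom, txStart, txEnd, strand, gene, transcript, cdsStart, cdsEnd, exonStarts, exonEnds) inferred from this single line; the helper names are hypothetical:

```python
# Sketch: parse the genes.NM.sorted.bed line shown above and derive introns.
# The column order is inferred from this one example and is an assumption.
from typing import List, Tuple

BED_LINE = (
    "1 1215815 1227405 + SCNN1D NM_001130413 1216041 1226990 "
    "1215815,1216605,1216790,1217621,1219357,1220950,1221305,1222147,1222488,"
    "1222887,1223052,1223318,1225650,1225856,1226016,1226274,1226444,1226633, "
    "1216046,1216677,1216990,1217695,1219470,1221044,1221658,1222355,1222679,"
    "1222976,1223216,1223417,1225768,1225935,1226074,1226333,1226520,1227405,"
)


def parse_gene_bed_line(line: str) -> dict:
    fields = line.split()  # works for tab- or space-separated lines
    starts = [int(x) for x in fields[8].rstrip(",").split(",")]
    ends = [int(x) for x in fields[9].rstrip(",").split(",")]
    return {
        "chrom": fields[0],
        "tx_start": int(fields[1]),
        "tx_end": int(fields[2]),
        "strand": fields[3],
        "gene": fields[4],
        "transcript": fields[5],
        "cds_start": int(fields[6]),
        "cds_end": int(fields[7]),
        "exons": list(zip(starts, ends)),
    }


def introns(exons: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Intron i runs from the end of exon i to the start of exon i+1.
    Numbering follows genomic order, which matches transcript order only on
    the + strand (as for SCNN1D); on the - strand it would be reversed."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


tx = parse_gene_bed_line(BED_LINE)
intron5 = introns(tx["exons"])[4]  # list index 4 = intron5
print(tx["gene"], tx["cds_start"], tx["cds_end"], intron5)
# SCNN1D 1216041 1226990 (1219470, 1220950)
```

Since 1216041 < 1219470 and 1220950 < 1226990, intron5 lies entirely between CDS start and CDS end.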
Let me know if something is still unclear,
Best,
Véronique
Hi Véronique,
Thanks a lot for your reply. So if the SV overlaps an intronic region, and the intron lies within the CDS coordinates, then location2=CDS. I find this quite confusing, because doesn't CDS mean the exonic (coding) region, not specifically in this program but in general?
I plotted my AnnotSV results and see that the majority of my SVs fall in intronic and CDS regions; hence my concern, because an intron is not a coding region... Or do introns between CDS regions have some effect?
Sorry for so many questions, you are helping me a lot.
Best,
Jordi
Your understanding is now correct. Sorry that the documentation was confusing.
I will add some explanations in the README file for a better understanding of the location and location2 features.
If the term "CDS" is confusing, do you have another term to propose? I can change it.
Currently, to know whether your SV falls in CDS regions, please look only at CDS_length. If it is different from 0, then the CDS is overlapped.
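A minimal sketch of that filter, assuming a tab-separated AnnotSV output with a header row containing a CDS_length column (as in the example earlier in this thread). The demo rows are toy data and the helper name and file name are hypothetical:

```python
# Sketch: keep only rows whose CDS_length is non-zero (SV overlaps coding bases).
# Assumes a tab-separated file with a "CDS_length" header column.
import csv
import io
from typing import Dict, Iterable, Iterator


def rows_overlapping_cds(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    reader = csv.DictReader(lines, delimiter="\t")
    for row in reader:
        value = row.get("CDS_length") or ""
        try:
            cds_len = float(value)
        except ValueError:
            continue  # empty or non-numeric value: no CDS information on this row
        if cds_len != 0:
            yield row


# Toy rows for illustration only.
demo = io.StringIO(
    "AnnotSV_type\tGene_name\tCDS_length\tlocation\tlocation2\n"
    "split\tSCNN1D\t0\tintron5-intron5\tCDS\n"
    "split\tSCNN1D\t94\texon6-exon6\tCDS\n"
)
for row in rows_overlapping_cds(demo):
    print(row["Gene_name"], row["location"], row["CDS_length"])
# SCNN1D exon6-exon6 94

# With a real file (hypothetical name):
# with open("my_sample.annotated.tsv", newline="") as fh:
#     hits = list(rows_overlapping_cds(fh))
```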
Hi Véronique,
Yes, we would appreciate a detailed explanation of what each variable you include means, because exon*-exon* could be read as the junction between two exons, for example... Detailing the location and location2 variables would clarify the interpretation for users.
For me, if location2 is meant to report coding regions only, and introns are not involved in these regions, I would put a blank or "no coding region", for example, because the intron, in this case, is not a coding region.
Thanks a lot for your help!
Jordi
Hi Véronique,
I just want to mention that the RefSeq data you use to annotate the SVs contains both NM and NR transcripts; maybe the RefSeq consortium has made new updates that now include these (MIR genes and others). I mention it because, since the column is named "NM", I suspect you want to analyse only protein-coding genes, but other genes are now included as well.
I hope this helps to improve the program.
Best
Jordi
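To see how many transcripts of each kind an annotation file contains, the accession prefixes can be tallied: NM_ accessions are curated mRNAs (protein-coding) and NR_ accessions are curated non-coding RNAs. A minimal sketch, assuming the transcript accession is in the sixth column as in the genes.NM.sorted.bed line quoted in this thread; the helper name and toy lines are hypothetical:

```python
# Sketch: count RefSeq accession prefixes (NM, NR, ...) in a genes BED file.
# Column index 5 (sixth column) as the transcript accession is an assumption.
from collections import Counter
from typing import Iterable


def count_refseq_prefixes(lines: Iterable[str], column: int = 5) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) > column:  # skip blank or malformed lines
            counts[fields[column].split("_", 1)[0]] += 1
    return counts


# Toy lines for illustration only.
TOY_LINES = [
    "1 100 200 + GENE_A NM_TOY1",
    "1 300 400 + GENE_B NR_TOY2",
    "1 500 600 + GENE_C NM_TOY3",
]
counts = count_refseq_prefixes(TOY_LINES)
print(counts["NM"], counts["NR"])
# 2 1

# On the real file from this thread:
# import os
# path = os.path.expandvars(
#     "$ANNOTSV/share/AnnotSV/Annotations_Human/Genes/GRCh37/genes.NM.sorted.bed")
# with open(path) as fh:
#     print(count_refseq_prefixes(fh))
```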
Hi Véronique!
I hope you are enjoying the summer and that COVID doesn't affect the holidays too much...
I'm contacting you again because I'm having trouble interpreting the location information...
I don't understand the difference between exon*-exon* and txStart-txEnd... Does the first mean that the SV affects only one exon, and the second that it affects the whole gene coding region (transcript)? If so, what does exon-txEnd mean? Something like from the last exon of the gene to the end of the transcript or gene?
My apologies for so many questions...
Hi Jordi,
Regarding a gene, AnnotSV retrieves the following coordinates:
- txStart (same coordinates as exon1_start)
- exon1_start, exon2_start, exon3_start...
- exon1_end, exon2_end, exon3_end...
- txEnd (same coordinates as the last-exon_end)
- cdsStart / cdsEnd
The "location" feature corresponds to the SV location in the gene:
Values: txStart, txEnd, exon’i’, intron’i’
e.g. « txStart-exon1 »
It indicates whether the whole gene is overlapped by the SV (or only part of the gene).
For example, "exon3-exon5" indicates that the SV breakpoints are in exon3 and in exon5.
And "txStart-txEnd" indicates that the whole gene is overlapped by the SV.
Hope the "location" feature is clearer now...
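A minimal sketch of how such a location string can follow from the two SV breakpoints, including a case like "exon3-txEnd". Assumptions: half-open [start, end) coordinates, exon/intron numbering in genomic order (valid for + strand genes), an SV that overlaps the transcript, and a toy transcript; the helper names are hypothetical and this is not AnnotSV's code:

```python
# Sketch: derive a "location" string from SV breakpoints (not AnnotSV's code).
from typing import List, Tuple

Exon = Tuple[int, int]


def feature_at(pos: int, tx_start: int, tx_end: int, exons: List[Exon]) -> str:
    """Name the gene feature containing one breakpoint position."""
    if pos < tx_start:
        return "txStart"  # breakpoint upstream of the transcript
    if pos >= tx_end:
        return "txEnd"    # breakpoint downstream of the transcript
    for i, (start, end) in enumerate(exons, 1):
        if start <= pos < end:
            return f"exon{i}"
    # Otherwise the position lies between exon i and exon i+1.
    for i in range(len(exons) - 1):
        if exons[i][1] <= pos < exons[i + 1][0]:
            return f"intron{i + 1}"
    raise ValueError("position not covered by exons or introns")


def sv_location(sv_start: int, sv_end: int, tx_start: int, tx_end: int,
                exons: List[Exon]) -> str:
    left = feature_at(sv_start, tx_start, tx_end, exons)
    right = feature_at(sv_end - 1, tx_start, tx_end, exons)  # last base of the SV
    return f"{left}-{right}"


# Toy transcript: three exons, txStart = 100, txEnd = 600.
TX_START, TX_END = 100, 600
EXONS = [(100, 200), (300, 400), (500, 600)]

for sv in [(50, 650), (150, 350), (250, 260), (550, 700)]:
    print(sv, sv_location(*sv, TX_START, TX_END, EXONS))
# (50, 650) txStart-txEnd
# (150, 350) exon1-exon2
# (250, 260) intron1-intron1
# (550, 700) exon3-txEnd
```

Under these assumptions, "exon3-txEnd" means the SV starts inside exon3 and extends past the end of the transcript.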
Hi,
Concerning what you mentioned recently:
The refseq that you take to annotate the SVs, contain the NM and NR transcripts, maybe the refseq consortium make new updates and now includes these ones... which are MIR genes and others. I tell you because I suspect that as you mention the column transcript NM, you want to analyse just the protein-coding genes, but now you include others than these ones.
Actually, all the genes are of interest.
So I changed the names of the values for the "tx" option (currently only in the patch_AnnotSV branch):
NM >> RefSeq
ENST >> ENSEMBL
Thanks, it will be clearer now.
Best,
Véronique