
bschilder commented on June 18, 2024

Assess pLI in HPO genes

from rare_disease_celltyping.

Comments (6)

bschilder commented on June 18, 2024

https://onlinelibrary.wiley.com/doi/10.1002/humu.23763

Found this paper quite helpful in understanding pLI (and its many shortcomings as a metric).

Some additional (or alternative) metrics to consider:

Variant Effect Predictor (VEP): now has a plugin for incorporating precomputed AlphaMissense scores! ensemblVEP is a Bioconductor package for interfacing with VEP annotations.


bschilder commented on June 18, 2024

Ok, so while the data in VEP is super useful, the VEP software to access these annotations is a hot garbage fire that is virtually uninstallable. I tried everything under the sun:
https://gist.github.com/bschilder/8a64d266e0e3ab18075274ad539985ac

However, I was able to extract the AlphaMissense predictions directly! Turns out they already computed per gene scores for the entire protein-coding genome here (see their README):

AlphaMissense_gene_hg19.tsv.gz, AlphaMissense_gene_hg38.tsv.gz
Gene-level average predictions, which were computed by taking the mean
alphamissense_pathogenicity over all possible missense variants in a transcript
(canonical transcript).
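
For reference, the gene-level averaging described in that README excerpt could be sketched roughly as below. This is only an illustration on made-up toy data: the transcript IDs, the scores, and the column names `am_pathogenicity` and `mean_am_pathogenicity` are assumptions here, not values taken from the actual AlphaMissense files.

```r
# Toy variant-level table: one row per possible missense variant.
# Transcript IDs and scores are invented for illustration only.
variants <- data.frame(
  transcript_id = c("ENST_TOY_A.1", "ENST_TOY_A.1", "ENST_TOY_A.1",
                    "ENST_TOY_B.2", "ENST_TOY_B.2"),
  am_pathogenicity = c(0.10, 0.30, 0.80, 0.90, 0.70)
)

# Gene-level score = mean pathogenicity over all variants in the transcript
gene_level <- aggregate(am_pathogenicity ~ transcript_id,
                        data = variants, FUN = mean)
names(gene_level)[2] <- "mean_am_pathogenicity"
gene_level
```

The real files are already aggregated this way, so this step is not needed when using the precomputed gene-level tables.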

With a little extra postprocessing, I got the gene symbols:

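# Download the precomputed gene-level AlphaMissense scores (hg38)
am <- data.table::fread("https://storage.googleapis.com/dm_alphamissense/AlphaMissense_gene_hg38.tsv.gz")
# Strip the version suffix from each Ensembl transcript ID
am$enst_id <- stringr::str_split(am$transcript_id, "\\.", simplify = TRUE)[, 1]
# Map the transcript IDs with orthogene (gene symbols end up in "name")
map <- orthogene::map_genes(genes = unique(am$enst_id),
                            target = "ENST",
                            species = "human",
                            drop_na = FALSE,
                            mthreshold = Inf)
# Attach the gene symbols to the AlphaMissense scores
am_mapped <- unique(map[, c("input", "name")]) |>
  data.table::data.table(key = "input") |>
  data.table::merge.data.table(am, by.x = "input", by.y = "enst_id")
am_mapped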

pLI is still probably worth looking at, but I think the ML-based AlphaMissense metric circumvents many of the shortcomings of the rule-based pLI metric.


bschilder commented on June 18, 2024

Found the latest pLI data from gnomAD as well:
https://gnomad.broadinstitute.org/downloads/#v4-constraint

Importing that now for comparison with AlphaMissense.

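# Read the gnomAD v4.0 constraint README (kept for reference; not used below)
readme <- suppressWarnings(
  readLines("https://storage.googleapis.com/gcp-public-data--gnomad/release/v4.0/constraint/README.txt")
)
# Import the transcript-level constraint metrics, which include pLI
pli <- data.table::fread("https://storage.googleapis.com/gcp-public-data--gnomad/release/v4.0/constraint/gnomad.v4.0.constraint_metrics.tsv")
# Sort so that MANE Select transcripts come first
data.table::setorderv(pli, "mane_select", order = -1)

# Average every numeric metric per gene over its MANE Select transcripts
mane <- pli[mane_select == TRUE, lapply(.SD, mean, na.rm = TRUE),
            .SDcols = is.numeric, by = "gene"][, mane_select := TRUE]
# Genes without a MANE Select transcript: average over all of their transcripts
pli_agg <- data.table::rbindlist(
  list(
    mane,
    pli[!gene %in% mane$gene, lapply(.SD, mean, na.rm = TRUE),
        .SDcols = is.numeric, by = "gene"][, mane_select := FALSE]
  )
)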
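
The comparison with AlphaMissense mentioned above might then look something like the following. This is a rough sketch on invented toy tables standing in for `am_mapped` and `pli_agg`; the gene names, the values, and the column names `mean_am_pathogenicity` and `pLI` are assumptions and would need to be swapped for the real column names in the downloaded files.

```r
# Toy stand-ins for am_mapped and pli_agg; all values are invented.
am_toy <- data.frame(
  name = c("GENE1", "GENE2", "GENE3", "GENE4"),
  mean_am_pathogenicity = c(0.2, 0.5, 0.7, 0.9)
)
pli_toy <- data.frame(
  gene = c("GENE1", "GENE2", "GENE3", "GENE5"),
  pLI = c(0.01, 0.40, 0.95, 0.99)
)

# Inner join on gene symbol: only genes present in both tables are kept
merged <- merge(am_toy, pli_toy, by.x = "name", by.y = "gene")

# Rank-based correlation, since pLI tends to pile up near 0 and 1
rho <- cor(merged$mean_am_pathogenicity, merged$pLI, method = "spearman")
rho
```

Spearman is used here only because it does not assume a linear relationship between the two scores; a scatter plot of the merged table would be a sensible next check.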


NathanSkene commented on June 18, 2024

How would an AI model give us population frequency? I thought current generation AI models of protein folding are also really bad at predicting variant effects?


KittyMurphy commented on June 18, 2024

Just having a look into what I did previously when looking at pLI and genes under selective pressure.

I used the pLI for human transcripts from this study: The mutational constraint spectrum quantified from variation in 141,456 humans.

I'm attaching the relevant supplementary table:
supplementary_dataset_11_full_constraint_metrics.tsv.zip.

But as you've shared, @bschilder, there is a more up-to-date version of the pLI data.


bschilder commented on June 18, 2024

> How would an AI model give us population frequency? I thought current generation AI models of protein folding are also really bad at predicting variant effects?

@NathanSkene Are you talking about variant population frequency, phenotype frequency, or disease frequency?
In any case, none of these were the intended usage of pLI as outlined here:
#50 (comment)

Also see here for my explanation of why pLI would not be appropriate for estimating population prevalence. Instead, getting epidemiological stats on population prevalence would make much more sense.

neurogenomics/RareDiseasePrioritisation#34 (comment)

