Dear authors, I want to calculate the spliced/unspliced gene ratio b

Calculating spliced/unspliced ratio and what to do with the ambiguous count (alevin-fry issue, closed)

combine-lab commented on June 14, 2024

DongzeHE commented on June 14, 2024

Hello @mdmanurung

Thanks so much for choosing alevin-fry.

In short, either removing the ambiguous counts or splitting them 50/50 between the spliced (S) and unspliced (U) counts is fine. Both are common practice in published research.

TL;DR:
When we say a gene in a cell has an ambiguous UMI, it means the splicing status of the mRNA molecule represented by this UMI cannot be determined; i.e., the reads of this UMI mapped equally well to some spliced transcripts and some introns of this gene. Therefore, when calculating the S/U ratio, the ambiguous UMI counts can either be ignored, because their splicing status is unknown, or split evenly between the S and U counts, because their reads mapped equally well to both.
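As a rough illustration of the two options, here is a minimal Python sketch. It assumes you have already extracted, for one gene in one cell, the spliced, unspliced and ambiguous UMI counts; the function name `su_ratio` and the `strategy` argument are made up for this example and are not part of alevin-fry.

```python
import math


def su_ratio(spliced, unspliced, ambiguous, strategy="split"):
    """Return the S/U ratio for one gene in one cell.

    strategy="drop":  ignore the ambiguous UMIs entirely.
    strategy="split": add half of the ambiguous UMIs to S and half to U.
    Returns NaN when the adjusted unspliced count is zero.
    (Hypothetical helper for illustration only.)
    """
    if strategy == "split":
        s = spliced + 0.5 * ambiguous
        u = unspliced + 0.5 * ambiguous
    elif strategy == "drop":
        s = float(spliced)
        u = float(unspliced)
    else:
        raise ValueError("strategy must be 'split' or 'drop'")
    # The ratio is undefined without any unspliced signal.
    return s / u if u > 0 else math.nan


# Toy counts (S, U, A), invented for the example.
for s, u, a in [(10, 5, 2), (4, 0, 2), (0, 0, 0)]:
    print(s, u, a, su_ratio(s, u, a, "drop"), su_ratio(s, u, a, "split"))
```

Note that the two strategies can disagree noticeably for lowly expressed genes, where the ambiguous count is large relative to S and U, so it may be worth checking both.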

More generally, this touches on an active research question: can S and U counts be compared directly, without any transfer learning or domain adaptation? The concern is that introns contain internal poly-A stretches, and those stretches can act as priming sites. If that happens, the priming mechanism of spliced transcripts (poly-A tail priming) may differ substantially from that of unspliced transcripts (poly-A tail priming + internal poly-A priming). See this technical note from 10x and this paper.

In addition, one caveat of the spliced and unspliced counts inferred by alevin-fry, and by all other mainstream quantification tools, is that unspliced UMI counts are represented by intronic UMI counts. However, unspliced transcripts also contain exons, so in practice UMIs are preferentially assigned as spliced rather than unspliced. People do this because they (and we) want to include as many UMIs as possible in the (spliced) count matrix.

Those are the caveats behind this question. Nonetheless, if we assume that the usual assumptions of single-cell analysis hold and that the effect of these caveats is minor, simply removing the ambiguous counts or splitting them 50/50 into S and U is fine.

Best,
Dongze

mdmanurung commented on June 14, 2024

Hi Dongze,

Thank you so much for the detailed answer. I'll need some time to let that sink in.

Best,
Mikhael
