Comments (2)
Hello @mdmanurung
Thanks so much for choosing alevin-fry.
In short, either removing the ambiguous counts or splitting them 50/50 between the spliced (S) and unspliced (U) counts is fine. Both are what people usually do in their research.
TL;DR:
When we say a gene in a cell has an ambiguous UMI, it means the splicing status of the mRNA molecule represented by this UMI is ambiguous; i.e., the reads of this UMI mapped equally well to some spliced transcripts and some introns of this gene. Therefore, when calculating the S/U ratio, these ambiguous UMIs can either be ignored, because their splicing status is unknown, or split evenly between the S and U counts, because their reads mapped equally well to both.
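As a rough illustration of the two options, here is a minimal Python sketch for a single gene in a single cell. The function name `resolve_ambiguous`, its parameters, and the example counts are hypothetical and are not part of alevin-fry; the same arithmetic would apply element-wise to full count matrices.

```python
def resolve_ambiguous(spliced, unspliced, ambiguous, strategy="split"):
    """Return (S, U) counts for one gene in one cell.

    strategy="drop":  ignore the ambiguous UMIs entirely.
    strategy="split": assign half of the ambiguous UMIs to S and half to U.
    """
    if strategy == "drop":
        return float(spliced), float(unspliced)
    if strategy == "split":
        half = ambiguous / 2.0
        return spliced + half, unspliced + half
    raise ValueError(f"unknown strategy: {strategy!r}")


# Hypothetical counts: 10 spliced, 4 unspliced, and 6 ambiguous UMIs.
s_drop, u_drop = resolve_ambiguous(10, 4, 6, strategy="drop")
s_split, u_split = resolve_ambiguous(10, 4, 6, strategy="split")

print(s_drop / u_drop)    # S/U ratio when ambiguous UMIs are ignored
print(s_split / u_split)  # S/U ratio when ambiguous UMIs are split 50/50
```

Note that the two strategies give different S/U ratios whenever S and U differ and the ambiguous count is nonzero, so it is worth using one strategy consistently across the whole dataset.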
More generally, this relates to an open research question: can we compare S and U counts directly, without any transfer learning or domain adaptation? The concern arises mainly because introns have internal poly-A stretches, and those stretches could act as priming sites. If this happens, the priming mechanism of spliced transcripts (poly-A tail priming) might be quite different from that of unspliced transcripts (poly-A tail priming + internal poly-A priming). See this technical note from 10x and this paper.
In addition, one caveat of the spliced and unspliced counts inferred by alevin-fry, and by all other mainstream quantification tools, is that unspliced UMI counts are represented by intronic UMI counts. However, unspliced transcripts also contain exons, which means UMIs are preferentially assigned as spliced rather than unspliced. People do this because they (and we) want to include as many UMIs as possible in the (spliced) count matrix.
These are all the dark sides of the question. Nonetheless, if we assume that the assumptions made in single-cell analysis are valid and that the effect of these caveats is minor, simply removing ambiguous counts or splitting them 50/50 between S and U is fine.
Best,
Dongze
from alevin-fry.
Hi Dongze,
Thank you so much for the detailed answer. I'll need some time to let that sink in.
Best,
Mikhael