Hi @cnk113,
Thanks for the question. If you are running the pipeline we used throughout the pre-print (that is, building a splici index, mapping reads to the index in --sketch mode, and then quantifying in USA mode with the cr-like resolution), then the matrix counts should already be integral. The only case in which you'd have non-integer counts under USA mode quantification is if you are using the cr-like-em resolution method. In that case, it should be reasonable to round entries if your downstream tools require it.
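As a rough illustration of the rounding step mentioned above, here is a minimal Python sketch. It assumes the count matrix has already been loaded as a dense NumPy array; the function name `round_counts` is hypothetical and not part of alevin-fry or any of its companion tools.

```python
import numpy as np


def round_counts(counts):
    """Round fractional counts (e.g. from cr-like-em) to the nearest integer.

    Hypothetical helper for illustration only. np.rint rounds exact halves
    to the nearest even integer, so 2.5 becomes 2 and 3.5 becomes 4.
    """
    rounded = np.rint(np.asarray(counts, dtype=float))
    return rounded.astype(np.int64)


if __name__ == "__main__":
    # Toy matrix: 2 cells x 2 features with fractional counts.
    toy = [[0.4, 1.6], [2.0, 3.49]]
    print(round_counts(toy))
```

For a large sparse matrix you would likely want to round only the stored non-zero values rather than densify it first.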
If you are running in some other configuration, it's likely worth evaluating if you should instead adopt the splici index and USA mode quantification, given the benefits it confers. Anyway, I'm happy to answer any follow-up questions.
Best,
Rob
from alevin-fry.
Yeah, I was running with the multimapping parameters on. One more question, if I were to use the output of the spliced/unspliced/ambi matrices would it be ~ the same as just manually adding matrices? Just want to clarify if there were any additional heuristics involved when quantifying those matrices separately.
Hi @cnk113,
I'm not completely sure I understand your follow-up question:
"If I were to use the output of the spliced/unspliced/ambi matrices would it be ~ the same as just manually adding matrices?"
The USA mode output, which consists of a cells x (3 * genes) matrix, allocates the counts within each gene to a given splicing state. So, given this matrix, you can sum the columns for the relevant splicing states to get the counts you desire for each gene. For example, in a single-nucleus experiment, you likely want to sum all 3 (spliced + unspliced + ambiguous). In a single-cell experiment, you generally want to sum spliced + ambiguous. In an RNA-velocity experiment, you'd want to provide spliced + ambiguous as one matrix and unspliced as the other.
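The summing described above can be sketched in Python as follows. This is only an illustrative sketch: it assumes a dense NumPy array whose columns are laid out as three consecutive blocks of equal width (all spliced columns, then all unspliced, then all ambiguous), which you should verify against the column names of your own output. The function names `split_usa` and `gene_counts` and the experiment labels are hypothetical.

```python
import numpy as np


def split_usa(counts, n_genes):
    """Split a cells x (3 * n_genes) matrix into S, U and A blocks.

    Assumption: columns are ordered as [spliced | unspliced | ambiguous].
    """
    counts = np.asarray(counts)
    if counts.ndim != 2 or counts.shape[1] != 3 * n_genes:
        raise ValueError("expected a matrix with 3 * n_genes columns")
    spliced = counts[:, :n_genes]
    unspliced = counts[:, n_genes:2 * n_genes]
    ambiguous = counts[:, 2 * n_genes:]
    return spliced, unspliced, ambiguous


def gene_counts(counts, n_genes, experiment):
    """Combine splicing states per gene for a given experiment type."""
    s, u, a = split_usa(counts, n_genes)
    if experiment == "single-nucleus":
        return s + u + a  # all three splicing states
    if experiment == "single-cell":
        return s + a  # spliced + ambiguous
    if experiment == "velocity":
        # Two matrices: spliced + ambiguous, and unspliced.
        return {"spliced": s + a, "unspliced": u}
    raise ValueError(f"unknown experiment type: {experiment}")


if __name__ == "__main__":
    # Toy example: 2 cells, 2 genes, so 6 columns.
    toy = np.array([[1, 2, 3, 4, 5, 6],
                    [0, 1, 0, 1, 0, 1]])
    print(gene_counts(toy, 2, "single-cell"))
```

Note that this only recombines counts that were already resolved jointly in USA mode; it does not stand in for quantifying the splicing states separately.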
However, it is important that the splicing states are quantified together (i.e. in USA mode). This is because, in order to resolve the most likely origin of a read and UMI, you would like to consider all possible mappings of that UMI simultaneously. There is information available when you look at the spliced and unspliced targets together that is lost if you look at them separately. This is why USA mode quantification counts UMIs for all splicing states at the same time and only separates them in the output matrix.
If you have further questions, please feel free to follow up.
Best,
Rob
Hey Rob,
Sorry, my mistake, I should've been clearer. I was wondering about the quantification differences between the USA matrix (specifically the spliced counts) and a non-USA mode that quantifies only spliced transcripts, as in a conventional scRNA-seq run. Either way, your explanation cleared up the confusion!
Thanks,
Chang
Hi Chang,
Great, that makes a lot of sense. So yes, the answer is that we generally recommend running in USA mode unless there is a particular reason it is infeasible, because mapping against only the spliced transcriptome can lead to an increased rate of spurious mapping. In particular, check out Table 1 and Figure 2 from the alevin-fry preprint.
Best,
Rob