Hello, I was wondering whether rounding the quantification would be justifiable.


Rounding the matrix? about alevin-fry HOT 5 CLOSED

combine-lab commented on June 14, 2024

Rounding the matrix?

from alevin-fry.

Comments (5)

rob-p commented on June 14, 2024

Hi @cnk113,

Thanks for the question. If you are running the pipeline we used throughout the pre-print, that is, building a splici index, mapping reads to the index in --sketch mode, and then quantifying in USA mode with the cr-like resolution, then the matrix counts should already be integral. The only case in which you'd have non-integer counts under USA mode quantification is if you are using the cr-like-em resolution method. In that case, it should be reasonable to round the entries if your downstream tools require integer counts.
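As a rough illustration of the rounding step, here is a minimal sketch using NumPy. It assumes the count matrix has already been loaded into a dense array; the names `em_counts`, `is_integral`, and `round_counts` are hypothetical and not part of alevin-fry. Note that `np.rint` rounds exact halves to the nearest even integer.

```python
import numpy as np


def is_integral(counts: np.ndarray) -> bool:
    """Return True if every entry is already a whole number."""
    return bool(np.all(counts == np.rint(counts)))


def round_counts(counts: np.ndarray) -> np.ndarray:
    """Round each entry to the nearest integer and store it as int64."""
    return np.rint(counts).astype(np.int64)


# Toy stand-in for a cr-like-em matrix (2 cells x 2 genes); values are made up.
em_counts = np.array([[0.4, 1.6],
                      [2.7, 0.0]])

# Only round when the matrix actually contains fractional counts.
rounded = em_counts if is_integral(em_counts) else round_counts(em_counts)
print(rounded)
```

For large matrices you would likely keep the data sparse and round only the stored non-zero values, then drop any entries that round to zero.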

If you are running in some other configuration, it's likely worth evaluating if you should instead adopt the splici index and USA mode quantification, given the benefits it confers. Anyway, I'm happy to answer any follow-up questions.

Best,
Rob


cnk113 commented on June 14, 2024

Yeah, I was running with the multimapping parameters on. One more question, if I were to use the output of the spliced/unspliced/ambi matrices would it be ~ the same as just manually adding matrices? I just want to clarify whether any additional heuristics are involved when quantifying those matrices separately.


rob-p commented on June 14, 2024

Hi @cnk113,

I'm not completely sure I understand your follow-up question:

"If I were to use the output of the spliced/unspliced/ambi matrices would it be ~ the same as just manually adding matrices?"

The USA mode output, which consists of a cell x 3*gene matrix, allocates the counts within each gene to a given splicing state. So, given this matrix, you can sum the columns for the relevant splicing states to get the counts you desire for each gene. For example, in a single-nucleus experiment, you likely want to sum all 3 (spliced + unspliced + ambiguous). In a single-cell experiment, you generally want to sum spliced + ambiguous. In an RNA-velocity experiment, you'd want to provide spliced + ambiguous as one matrix and unspliced as the other.
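To make the column sums concrete, here is a minimal sketch with NumPy. It assumes the matrix is laid out as three consecutive blocks of gene columns (all spliced, then all unspliced, then all ambiguous); check that ordering against your own output before relying on it. The function names and the `mode` labels are hypothetical, chosen only for this illustration.

```python
import numpy as np


def split_usa(counts: np.ndarray, n_genes: int):
    """Split a cells x (3 * n_genes) matrix into S, U and A blocks.

    ASSUMPTION: columns are ordered as [spliced | unspliced | ambiguous].
    """
    if counts.shape[1] != 3 * n_genes:
        raise ValueError("expected exactly 3 * n_genes columns")
    spliced = counts[:, :n_genes]
    unspliced = counts[:, n_genes:2 * n_genes]
    ambiguous = counts[:, 2 * n_genes:]
    return spliced, unspliced, ambiguous


def combine_usa(counts: np.ndarray, n_genes: int, mode: str):
    """Combine splicing states per gene for a given experiment type."""
    s, u, a = split_usa(counts, n_genes)
    if mode == "single-nucleus":
        return s + u + a          # all three states
    if mode == "single-cell":
        return s + a              # spliced + ambiguous
    if mode == "velocity":
        return s + a, u           # (spliced + ambiguous, unspliced)
    raise ValueError(f"unknown mode: {mode}")


# Toy matrix: 2 cells, 2 genes, so 6 columns (made-up values).
usa = np.array([[5, 0, 1, 2, 0, 1],
                [3, 4, 0, 0, 2, 2]])

nuc = combine_usa(usa, 2, "single-nucleus")
cell = combine_usa(usa, 2, "single-cell")
vel_spliced, vel_unspliced = combine_usa(usa, 2, "velocity")
print(nuc, cell, vel_spliced, vel_unspliced, sep="\n")
```

The sums are taken only after quantification, on a matrix whose states were resolved together.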

However, it is important that the splicing states are quantified together (i.e. in USA mode). This is because, in order to resolve the most likely origin of a read and UMI, you would like to consider all possible mappings of that UMI simultaneously. There is information available when you look at the spliced and unspliced targets simultaneously that is lost if you look at them separately. This is why USA mode quantification counts UMIs for all splicing states at the same time and only separates them in the output matrix.

If you have further questions, please feel free to follow up.

Best,
Rob


cnk113 commented on June 14, 2024

Hey Rob,

Sorry, my mistake, I should've been clearer. I was wondering about the quantification differences between the USA matrix (specifically spliced) and a non-USA mode that quantifies only spliced counts, as in a conventional scRNA-seq run. Either way, your explanation cleared up the confusion!

Thanks,
Chang


rob-p commented on June 14, 2024

Hi Chang,

Great --- that makes a lot of sense. So yes, the answer is that we generally recommend running in USA mode unless there is a particular reason it is infeasible, because just mapping against the spliced transcriptome can lead to an increased rate of spurious mapping. In particular, check out Table 1 and Figure 2 from the alevin-fry preprint.

Best,
Rob

