Rounding the matrix? (alevin-fry, closed)

combine-lab commented on June 14, 2024
Rounding the matrix?

from alevin-fry.

Comments (5)

rob-p commented on June 14, 2024

Hi @cnk113,

Thanks for the question. If you are running the pipeline we used throughout the pre-print (building a splici index, mapping reads to that index in --sketch mode, and then quantifying in USA mode with the cr-like resolution), then the matrix counts should already be integral. The only case in which you'd have non-integer counts under USA mode quantification is if you are using the cr-like-em resolution method. In that case, it should be reasonable to round entries if your downstream tools require it.

If you are running in some other configuration, it's likely worth evaluating if you should instead adopt the splici index and USA mode quantification, given the benefits it confers. Anyway, I'm happy to answer any follow-up questions.
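For the cr-like-em case mentioned above, rounding might look like the following minimal sketch. It assumes the count matrix has already been loaded into a dense NumPy array; the function name `round_counts` and the toy values are hypothetical and not part of alevin-fry.

```python
import numpy as np

def round_counts(counts):
    """Round fractional counts to the nearest integer and return an integer matrix.

    np.rint rounds exact halves to the nearest even value, so 0.5 -> 0 and 1.5 -> 2.
    """
    return np.rint(counts).astype(np.int64)

# Toy cells x genes matrix with fractional entries (made-up values).
toy = np.array([[0.0, 1.4, 2.6],
                [3.0, 0.2, 5.9]])

rounded = round_counts(toy)
print(rounded)

# How much the total count changed is a quick sanity check on the rounding.
print(abs(rounded.sum() - toy.sum()))
```

For large experiments the matrix is usually sparse, in which case the same rounding would be applied to the stored non-zero values rather than to a dense array.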

Best,
Rob

cnk113 commented on June 14, 2024

Yeah, I was running with the multimapping parameters on. One more question: if I were to use the output of the spliced/unspliced/ambi matrices, would it be ~ the same as just manually adding matrices? I just want to clarify whether there were any additional heuristics involved when quantifying those matrices separately.

rob-p commented on June 14, 2024

Hi @cnk113,

I'm not completely sure I understand your follow-up question:

"If I were to use the output of the spliced/unspliced/ambi matrices would it be ~ the same as just manually adding matrices?"

The USA mode output, which is a matrix of size cells x (3 * genes), allocates the counts within each gene to a given splicing state. So, given this matrix, you can sum the columns for the relevant splicing states to get the counts you want for each gene. For example, in a single-nucleus experiment, you likely want to sum all 3 (spliced + unspliced + ambiguous). In a single-cell experiment, you generally want to sum spliced + ambiguous. In an RNA-velocity experiment, you'd want to provide spliced + ambiguous as one matrix and unspliced as the other.

However, it is important that the splicing states are quantified together (i.e. in USA mode). This is because, in order to resolve the most likely origin of a read and UMI, you would like to consider all possible mappings of that UMI simultaneously. There is information available when you look at the spliced and unspliced targets simultaneously that is lost if you look at them separately. This is why USA mode quantification counts UMIs for all splicing states at the same time and only separates them in the output matrix.
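The column sums described above can be sketched as follows. This is an illustration only: it assumes the cells x (3 * genes) matrix is a dense NumPy array whose columns are laid out as three consecutive blocks (spliced, then unspliced, then ambiguous), which you should verify against your own output. The names `split_usa` and `combine_usa` and the mode labels are hypothetical.

```python
import numpy as np

def split_usa(counts, n_genes):
    """Split a cells x (3 * n_genes) matrix into (spliced, unspliced, ambiguous).

    Assumption: columns are ordered as [spliced | unspliced | ambiguous] blocks.
    """
    if counts.shape[1] != 3 * n_genes:
        raise ValueError("expected 3 * n_genes columns")
    spliced = counts[:, :n_genes]
    unspliced = counts[:, n_genes:2 * n_genes]
    ambiguous = counts[:, 2 * n_genes:]
    return spliced, unspliced, ambiguous

def combine_usa(counts, n_genes, mode):
    """Combine splicing states per gene for a given experiment type."""
    s, u, a = split_usa(counts, n_genes)
    if mode == "single-nucleus":
        return s + u + a          # all three states
    if mode == "single-cell":
        return s + a              # spliced + ambiguous
    if mode == "velocity":
        return s + a, u           # two matrices: (spliced + ambiguous, unspliced)
    raise ValueError(f"unknown mode: {mode}")

# Toy example: 2 cells, 2 genes, so 6 columns (made-up values).
toy = np.array([[1, 2, 3, 4, 5, 6],
                [0, 1, 0, 2, 0, 3]])

print(combine_usa(toy, 2, "single-nucleus"))
print(combine_usa(toy, 2, "single-cell"))
```

Note that this only post-processes a matrix that was quantified jointly in USA mode; it is not a substitute for quantifying the splicing states together.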

If you have further questions, please feel free to follow up.

Best,
Rob

cnk113 commented on June 14, 2024

Hey Rob,

Sorry, my mistake; I should've been clearer. I'm wondering about the quantification differences between the USA matrix (specifically spliced) and a non-USA mode that quantifies only spliced counts, as in a conventional scRNA-seq run. Either way, your explanation cleared up the confusion!

Thanks,
Chang

rob-p commented on June 14, 2024

Hi Chang,

Great --- that makes a lot of sense. So yes, the answer is that we generally recommend running in USA mode unless there is a particular reason it is infeasible, because just mapping against the spliced transcriptome can lead to an increased rate of spurious mapping. In particular, check out Table 1 and Figure 2 from the alevin-fry preprint.

Best,
Rob
