Comments (7)

LTLA commented on June 9, 2024

I would not be opposed to pairwiseTTests() emitting the t-statistic in its various DataFrames. But once it goes into combineMarkers(), there is no single statistic anymore; it all gets aggregated together, depending on the choice of aggregation method specified by pval.type=.

from scran.

LTLA commented on June 9, 2024

I forgot that, even within pairwiseTTests(), p-values are aggregated across blocks, so again there isn't necessarily a single t-statistic in those cases. I don't particularly fancy having a function that sometimes reports t-statistics and sometimes doesn't; that behavior seems too difficult to me.

Why do you need the t-statistics anyway? You can get very close with the standardized log-fold change; for any use of pairwiseTTests() with design=, Cohen's D only differs from a t-statistic by a constant scaling factor, so should be similarly applicable in any situation that I can imagine.
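The "constant scaling factor" point can be checked with a minimal base-R sketch. This is not scran's code: the simulated data, the variable names, and the use of lm.fit() are assumptions made for illustration. With a shared design matrix, the t-statistic and the standardized log-fold change share the same residual variance, so their ratio depends only on the design and is identical for every gene.

```r
set.seed(1)
n <- 40
group <- factor(rep(c("A", "B"), each = n / 2))
batch <- factor(rep(c("x", "y"), times = n / 2))
design <- model.matrix(~ group + batch)

# Hypothetical log-expression values: 5 genes (rows) by n cells (columns).
y <- matrix(rnorm(5 * n), nrow = 5)
y[1, group == "B"] <- y[1, group == "B"] + 2

# Part of the coefficient variance that depends only on the design.
unscaled <- solve(crossprod(design))["groupB", "groupB"]

stats <- t(apply(y, 1, function(expr) {
    fit <- lm.fit(design, expr)
    sigma2 <- sum(fit$residuals^2) / fit$df.residual  # residual variance
    lfc <- fit$coefficients[["groupB"]]
    c(d = lfc / sqrt(sigma2),             # standardized log-fold change
      t = lfc / sqrt(sigma2 * unscaled))  # t-statistic for the same coefficient
}))

# The same value for every gene: 1 / sqrt(unscaled).
ratio <- stats[, "t"] / stats[, "d"]
```

Because the ratio is the same for all genes, ranking or correlating genes by d gives the same result as using t within a single dataset.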

lcolladotor commented on June 9, 2024

Hi Aaron,

Ahh yeah, I forgot about how the p-values are combined. Just to expand a bit, @mattntran wants the pairwise t-stats to then compare them across datasets like we do at http://research.libd.org/spatialLIBD/reference/layer_stat_cor.html in order to make plots like in http://research.libd.org/spatialLIBD/reference/layer_stat_cor_plot.html.
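As a rough illustration of that kind of cross-dataset comparison, here is a minimal base-R sketch that correlates per-gene statistics between two datasets over their shared genes. It does not call spatialLIBD; the matrices, gene names, and cluster labels are all hypothetical.

```r
set.seed(100)
genes <- paste0("gene", 1:200)

# Hypothetical reference statistics: genes (rows) by layers (columns).
ref <- matrix(rnorm(200 * 3), ncol = 3,
              dimnames = list(genes, c("L1", "L2", "L3")))

# Hypothetical query statistics: clusterA resembles L1, clusterB resembles L3.
query <- cbind(clusterA = ref[, "L1"] + rnorm(200, sd = 0.5),
               clusterB = ref[, "L3"] + rnorm(200, sd = 0.5))

# Restrict to genes present in both datasets, then correlate column by column.
shared <- intersect(rownames(ref), rownames(query))
cor.mat <- cor(query[shared, ], ref[shared, ])
```

Here cor.mat has one row per query cluster and one column per reference layer, which is the shape needed for a heatmap of the correlations.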

However, instead of computing these stats at the pseudo-bulk level like we did, he wants to compute them at the single nucleus level for this snRNA-seq data. That's why he is using scran::findMarkers().

Regarding the aggregation methods controlled by pval.type, maybe when scran chooses the min or max p-value, scran could report the corresponding t-statistic for that test. As for t-statistics across blocks, hm..., maybe it would need to be a data frame with 1 column per block. I recognize that this sounds a bit complicated for supporting all the use cases (which again, I wasn't thinking about before, oops). So it's quite a bit of work.
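To make the min/max suggestion concrete, here is a minimal base-R sketch. It is not part of scran: pick_t() is a hypothetical helper, and the p-value and t-statistic matrices are made-up inputs with genes in rows and pairwise comparisons in columns.

```r
# Hypothetical per-comparison p-values and t-statistics for two genes.
p <- matrix(c(0.01, 0.20, 0.50,
              0.30, 0.04, 0.60), nrow = 2, byrow = TRUE,
            dimnames = list(c("gene1", "gene2"),
                            c("A_vs_B", "A_vs_C", "A_vs_D")))
tstat <- matrix(c( 2.8,  1.3, 0.7,
                  -1.1, -2.2, 0.5), nrow = 2, byrow = TRUE,
                dimnames = dimnames(p))

# For each gene, return the t-statistic of the comparison selected by
# which.fun (which.min or which.max) applied to that gene's p-values.
pick_t <- function(p, tstat, which.fun) {
    idx <- apply(p, 1, which.fun)
    tstat[cbind(seq_len(nrow(p)), idx)]
}

t.min <- pick_t(p, tstat, which.min)  # 2.8, -2.2
t.max <- pick_t(p, tstat, which.max)  # 0.7, 0.5
```

This only covers the case where one comparison is selected; methods that combine several p-values into one have no single matching t-statistic, which is the difficulty raised above.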

Given the complexity of expanding this for the general use case, maybe Matt and I just need to hack our way through the code to get the pairwise t-stats and close this issue. Or, like you said Aaron, use the standardized log-fold change. (Matt, is there a reason I'm missing as to why the correlation works best using the t-stats? Hm... I guess that the constant scaling factor (variance) can be really different between datasets.)

Best,
Leo

lcolladotor commented on June 9, 2024

Hi Aaron,

Looking in more detail, maybe we don't need to hack anything and can simply use std.lfc = TRUE (@mattntran pointed out this argument to me, and I now see it documented at

scran/R/pairwiseTTests.R

Lines 73 to 76 in cbc535d

#' If \code{std.lfc=TRUE}, the log-fold change for each gene is standardized by the variance.
#' When the Welch t-test is being used, this is equivalent to Cohen's d.
#' Standardized log-fold changes may be more appealing for visualization as it avoids large fold changes due to large variance.
#' The choice of \code{std.lfc} does not affect the calculation of the p-values.
). I see that when lfc is 0 (the default), the t-statistic is calculated using
cur.t <- cur.lfc/sqrt(cur.err)

I see that .fit_lm_internal() is used when the design is specified

scran/R/pairwiseTTests.R

Lines 184 to 187 in cbc535d

} else if (!is.null(design)) {
    .fit_lm_internal(x, subset.row, groups, design=design, direction=direction, lfc=lfc,
        std.lfc=std.lfc, gene.names=gene.names, log.p=log.p, BPPARAM=BPPARAM)
} else {
(earlier I looked at .test_block_internal()). When std.lfc = TRUE, I see at

scran/R/pairwiseTTests.R

Lines 405 to 408 in cbc535d

if (std.lfc) {
    # Computing Cohen's D.
    cur.lfc <- cur.lfc / sqrt(sigma2)
}
that Cohen's D is calculated, which is similar to
cur.t <- cur.lfc/sqrt(cur.err)
I just re-read https://en.wikipedia.org/wiki/Effect_size#Cohen's_d and I see that d = t / sqrt(N).

Earlier I didn't realize this and was thinking that regular logFC are not scaled by their variance and hence comparing them across datasets is hard. However, with Cohen's d we could do that.

Am I on the right track? That is, use std.lfc = TRUE and then either correlate the standardized LFC (Cohen's d) across datasets or multiply Cohen's d by sqrt(N) to recover the t-statistic, without having to hack anything.
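One caveat on the conversion: d = t / sqrt(N) is the one-sample form. For two groups the factor depends on both group sizes; with equal group sizes n it reduces to sqrt(n / 2). The base-R sketch below checks this on simulated data. It assumes that the standardization divides by the square root of the average of the two group variances, which is an assumption about the Welch case rather than something confirmed from the scran source.

```r
set.seed(2)
n <- 30                    # cells per group (equal sizes assumed)
a <- rnorm(n, mean = 0)    # hypothetical log-expression in group A
b <- rnorm(n, mean = 1)    # hypothetical log-expression in group B

lfc <- mean(b) - mean(a)

# Cohen's d, assuming the average of the two group variances is used.
d <- lfc / sqrt((var(a) + var(b)) / 2)

# Welch t-statistic from base R (t.test() defaults to var.equal = FALSE).
t.welch <- unname(t.test(b, a)$statistic)

# Recover t from d; this factor is only valid for equal group sizes.
t.from.d <- d * sqrt(n / 2)
```

With unequal group sizes or a blocking factor the factor is no longer a simple function of the cell count, so correlating d directly is the simpler route.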

Best,
Leo

LTLA commented on June 9, 2024

Yes, that sounds right. I would say that Cohen's D is even better for comparison across datasets, as it is not affected by whether one dataset has more or fewer cells than the other.
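A small base-R sketch of why the number of cells matters for t but not for d. The data are deterministic normal quantiles rather than real expression values, and welch_stats() is a hypothetical helper, not a scran function.

```r
# Cohen's d (average-variance form) and Welch t for two groups of values.
welch_stats <- function(a, b) {
    lfc <- mean(b) - mean(a)
    c(d = lfc / sqrt((var(a) + var(b)) / 2),
      t = lfc / sqrt(var(a) / length(a) + var(b) / length(b)))
}

# Same underlying effect (a shift of 1 with unit-scale spread), different cell counts.
make_groups <- function(n) {
    base <- qnorm(ppoints(n))  # deterministic, approximately standard normal
    list(a = base, b = base + 1)
}

small <- do.call(welch_stats, make_groups(50))
large <- do.call(welch_stats, make_groups(5000))

# d is nearly identical in both cases, while t grows roughly with sqrt(n).
rbind(small = small, large = large)
```

So a dataset with 100 times more cells yields a t-statistic about 10 times larger for the same effect, whereas d stays on the same scale.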

lcolladotor commented on June 9, 2024

Ahh, nice, thanks Aaron!

mattntran commented on June 9, 2024

Thanks for the feedback and insight, Aaron! I've started running iterations both with and without std.lfc = TRUE so we have both sets of results to reference, especially since the test runs really quickly 👍
