
Returning in findMarkers() the t-statistics (or other pairwise statistics calculated), about marionilab/scran

LTLA commented on June 9, 2024

I would not be opposed to pairwiseTTests() emitting the t-statistic in its various DataFrames. But once it goes into combineMarkers(), there is no single statistic anymore; it all gets aggregated together, depending on the choice of aggregation method specified by pval.type=.

from scran.

LTLA commented on June 9, 2024

I forgot that, even within pairwiseTTests(), p-values are aggregated across blocks, so again there isn't necessarily a single t-statistic in those cases. I don't particularly fancy having a function that sometimes reports t-statistics and sometimes doesn't; that behavior seems too difficult to me.

Why do you need the t-statistics anyway? You can get very close with the standardized log-fold change; for any use of pairwiseTTests() with design=, Cohen's D only differs from a t-statistic by a constant scaling factor, so should be similarly applicable in any situation that I can imagine.
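To illustrate the constant scaling factor described above, here is a minimal base R sketch. It is not scran's internal code. It assumes that, when a design matrix is used, the t-statistic for a coefficient is the estimate divided by `sqrt(sigma2 * c)`, where `c` depends only on the design and not on the gene. The simulated data, the `batch` covariate, and all variable names are hypothetical.

```r
set.seed(1)
n <- 40
group <- factor(rep(c("A", "B"), each = n / 2))
batch <- factor(rep(c("x", "y"), times = n / 2))   # hypothetical covariate
design <- model.matrix(~ group + batch)

# Hypothetical log-expression values: 5 genes (rows) by n cells (columns).
y <- matrix(rnorm(5 * n), nrow = 5)
y[1, group == "B"] <- y[1, group == "B"] + 2

# Fit the same linear model to every gene at once.
fit <- lm.fit(design, t(y))
j <- which(colnames(design) == "groupB")
coefs <- fit$coefficients[j, ]

# Residual variance per gene.
resid.df <- n - ncol(design)
sigma2 <- colSums(fit$residuals^2) / resid.df

# Design-dependent factor, identical for all genes.
c.factor <- solve(crossprod(design))[j, j]

t.stat <- coefs / sqrt(sigma2 * c.factor)   # ordinary t-statistic
cohen.d <- coefs / sqrt(sigma2)             # effect standardized by residual SD

# The ratio is the same for every gene and equals 1 / sqrt(c.factor).
print(t.stat / cohen.d)
print(1 / sqrt(c.factor))
```

Because the ratio is a single constant within one dataset, ranking or correlating genes by the standardized effect gives the same result as using the t-statistic within that dataset.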

lcolladotor commented on June 9, 2024

Hi Aaron,

Ahh yeah, I forgot about how the p-values are combined. Just to expand a bit, @mattntran wants the pairwise t-stats to then compare them across datasets like we do at http://research.libd.org/spatialLIBD/reference/layer_stat_cor.html in order to make plots like in http://research.libd.org/spatialLIBD/reference/layer_stat_cor_plot.html.

However, instead of computing these stats at the pseudo-bulk level like we did, he wants to compute them at the single nucleus level for this snRNA-seq data. That's why he is using scran::findMarkers().

Regarding the aggregation methods controlled by pval.type, maybe when scran chooses the min or max p-value, scran could report the corresponding t-statistic for that test. As for t-statistics across blocks, hm..., maybe it would need to be a data frame with 1 column per block. I recognize that this sounds a bit complicated for supporting all the use cases (which again, I wasn't thinking about before, oops). So it's quite a bit of work.

Given the complexity of expanding this for the general use case, maybe Matt and I just need to hack our way through the code to get the pairwise t-stats and close this issue. Or, like you said Aaron, use the standardized log-fold change (Matt, is there a reason I'm missing as to why the correlation works best using the t-stats? Hm... I guess that the constant scaling factor (variance) can be really different between data sets.)

Best,
Leo

lcolladotor commented on June 9, 2024

Hi Aaron,

Looking in more detail, maybe we don't need to hack anything and can simply use std.lfc = TRUE (@mattntran pointed out this argument to me, and I now see it documented at

scran/R/pairwiseTTests.R, lines 73 to 76 in cbc535d:

    #' If \code{std.lfc=TRUE}, the log-fold change for each gene is standardized by the variance.
    #' When the Welch t-test is being used, this is equivalent to Cohen's d.
    #' Standardized log-fold changes may be more appealing for visualization as it avoids large fold changes due to large variance.
    #' The choice of \code{std.lfc} does not affect the calculation of the p-values.

). I see that when lfc is 0 (the default), the t stat is calculated using

scran/R/pairwiseTTests.R, line 437 in cbc535d:

    cur.t <- cur.lfc/sqrt(cur.err)

I see that .fit_lm_internal() is used when the design is specified

scran/R/pairwiseTTests.R, lines 184 to 187 in cbc535d:

    } else if (!is.null(design)) {
        .fit_lm_internal(x, subset.row, groups, design=design, direction=direction, lfc=lfc,
            std.lfc=std.lfc, gene.names=gene.names, log.p=log.p, BPPARAM=BPPARAM)
    } else {

(earlier I looked at .test_block_internal()). When std.lfc = TRUE, I see at

scran/R/pairwiseTTests.R, lines 405 to 408 in cbc535d:

    if (std.lfc) {
        # Computing Cohen's D.
        cur.lfc <- cur.lfc / sqrt(sigma2)
    }

that Cohen's D is calculated, which is similar to

scran/R/pairwiseTTests.R, line 437 in cbc535d:

    cur.t <- cur.lfc/sqrt(cur.err)

I just re-read https://en.wikipedia.org/wiki/Effect_size#Cohen's_d and I see that d = t / sqrt(N).

Earlier I didn't realize this and was thinking that regular logFC are not scaled by their variance and hence comparing them across datasets is hard. However, with Cohen's d we could do that.

Am I on the right track? That is, use std.lfc = TRUE and then either correlate the standardized LFC (Cohen's d) across datasets and/or multiply Cohen's d by sqrt(N), without having to hack anything?
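As a rough check of the conversion between Cohen's d and a t-statistic, the base R sketch below uses a plain two-group comparison with a pooled variance. This is a textbook setting, not necessarily what scran computes internally. In this two-sample case the factor is `sqrt(1/n1 + 1/n2)` rather than `sqrt(N)`, which is the one-sample form. The data and variable names are made up for illustration.

```r
set.seed(42)
a <- rnorm(30, mean = 1)   # hypothetical log-expression values, group 1
b <- rnorm(50, mean = 0)   # hypothetical log-expression values, group 2
n1 <- length(a)
n2 <- length(b)

# Pooled variance and Cohen's d for two independent groups.
pooled.var <- ((n1 - 1) * var(a) + (n2 - 1) * var(b)) / (n1 + n2 - 2)
cohen.d <- (mean(a) - mean(b)) / sqrt(pooled.var)

# Convert d back to the equal-variance t-statistic.
t.from.d <- cohen.d / sqrt(1 / n1 + 1 / n2)

# Reference value from the standard two-sample t-test with pooled variance.
t.ref <- unname(t.test(a, b, var.equal = TRUE)$statistic)

print(all.equal(t.from.d, t.ref))
```

The conversion needs only the group sizes, so it can be applied after the fact to any pair of groups if a t-like statistic is still wanted.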

Best,
Leo

LTLA commented on June 9, 2024

Yes, that sounds right. I would say that Cohen's D is even better for comparison across datasets, as it is not affected by whether one dataset has more or fewer cells than the other.
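The point about cell numbers can be sketched in base R. The toy example below is an artificial illustration, not a scran workflow: it copies the same hypothetical values ten times to mimic a larger dataset with the same underlying effect. The helper names `welch_t` and `std_diff` are invented here, and `std_diff` uses one common equal-group-size definition of Cohen's d.

```r
# Welch t-statistic: difference in means over its standard error.
welch_t <- function(a, b) {
  (mean(a) - mean(b)) / sqrt(var(a) / length(a) + var(b) / length(b))
}

# Standardized difference: difference in means over the average SD.
std_diff <- function(a, b) {
  (mean(a) - mean(b)) / sqrt((var(a) + var(b)) / 2)
}

# Hypothetical log-expression values for one gene in two groups.
a <- c(2.1, 1.8, 2.5, 2.9, 1.6, 2.2)
b <- c(1.0, 1.4, 0.7, 1.9, 1.2, 0.9)

# A "larger dataset" with the same effect: every cell repeated 10 times.
a.big <- rep(a, 10)
b.big <- rep(b, 10)

t.small <- welch_t(a, b)
t.big <- welch_t(a.big, b.big)
d.small <- std_diff(a, b)
d.big <- std_diff(a.big, b.big)

# The t-statistic grows more than 3-fold with the extra cells,
# while the standardized difference changes by less than 10%.
print(c(t.ratio = t.big / t.small, d.ratio = d.big / d.small))
```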

lcolladotor commented on June 9, 2024

Ahh, nice, thanks Aaron!

mattntran commented on June 9, 2024

Thanks for the feedback and insight, Aaron! I've started running iterations both with and without std.lfc = TRUE so we have both sets of results to reference, since the test runs really quickly 👍
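For readers who want to try the same comparison, here is a hedged sketch of running pairwiseTTests() with and without std.lfc = TRUE. It requires the scran package to be installed. The simulated matrix, the two cluster labels, and the variable names are assumptions made for illustration; real use would pass the log-expression values and cluster labels from the actual snRNA-seq data.

```r
library(scran)

set.seed(100)
# Hypothetical log-expression matrix: 200 genes by 60 cells in two clusters.
x <- matrix(rnorm(200 * 60), nrow = 200,
            dimnames = list(paste0("gene", seq_len(200)), NULL))
clusters <- rep(c("A", "B"), each = 30)
x[1:20, clusters == "A"] <- x[1:20, clusters == "A"] + 1

# Pairwise tests with raw and with standardized log-fold changes.
raw <- pairwiseTTests(x, groups = clusters)
std <- pairwiseTTests(x, groups = clusters, std.lfc = TRUE)

# Pick the A-versus-B comparison from the list of pairwise results.
i <- which(raw$pairs$first == "A" & raw$pairs$second == "B")
raw.lfc <- raw$statistics[[i]]$logFC
std.lfc <- std$statistics[[i]]$logFC

# Compare the two versions of the effect size across genes.
print(cor(raw.lfc, std.lfc))
```

The standardized values from each dataset could then be matched by gene and correlated across datasets, as discussed earlier in the thread.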

