Comments (7)
I would not be opposed to pairwiseTTests() emitting the t-statistic in its various DataFrames. But once it goes into combineMarkers(), there is no single statistic anymore; it all gets aggregated together, depending on the choice of aggregation method specified by pval.type=.
from scran.
I forgot that, even within pairwiseTTests(), p-values are aggregated across blocks, so again there isn't necessarily a single t-statistic in those cases. I don't particularly fancy having a function that sometimes reports t-statistics and sometimes doesn't; that behavior seems too difficult to me.
Why do you need the t-statistics anyway? You can get very close with the standardized log-fold change; for any use of pairwiseTTests() with design=, Cohen's D only differs from a t-statistic by a constant scaling factor, so should be similarly applicable in any situation that I can imagine.
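To make the scaling relationship concrete, here is a minimal sketch in plain Python (not scran code). It assumes a pooled-variance two-sample t-test, which may differ from what scran does internally; the function name and the toy values are made up for illustration. Under that assumption the t-statistic is Cohen's d divided by a factor that depends only on the group sizes.

```python
import math
import statistics


def pooled_t_and_cohens_d(x, y):
    """Return (t, d) for a pooled-variance two-sample comparison of x vs y.

    Hypothetical helper for illustration; not part of scran.
    """
    n1, n2 = len(x), len(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    # Pooled variance across the two groups.
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    sd = math.sqrt(pooled_var)
    d = diff / sd  # Cohen's d: difference in means in units of the pooled SD
    t = diff / (sd * math.sqrt(1 / n1 + 1 / n2))  # two-sample t-statistic
    return t, d


# Toy log-expression values for one gene in two groups (made-up numbers).
x = [2.1, 2.5, 3.0, 2.8, 3.3]
y = [1.0, 1.4, 0.9, 1.6, 1.2, 1.1]

t, d = pooled_t_and_cohens_d(x, y)
scale = math.sqrt(1 / len(x) + 1 / len(y))  # depends only on group sizes
print(t, d, d / scale)  # t and d / scale agree
```

Because the factor is the same for every gene in a given pairwise comparison, ranking genes by d or by t gives the same order within that comparison.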
Hi Aaron,
Ahh yeah, I forgot about how the p-values are combined. Just to expand a bit, @mattntran wants the pairwise t-stats to then compare them across datasets like we do at http://research.libd.org/spatialLIBD/reference/layer_stat_cor.html in order to make plots like in http://research.libd.org/spatialLIBD/reference/layer_stat_cor_plot.html.
However, instead of computing these stats at the pseudo-bulk level like we did, he wants to compute them at the single nucleus level for this snRNA-seq data. That's why he is using scran::findMarkers().
Regarding the aggregation methods controlled by pval.type, maybe when scran chooses the min or max p-value, scran could report the corresponding t-statistic for that test. As for t-statistics across blocks, hm..., maybe it would need to be a data frame with 1 column per block. I recognize that this sounds a bit complicated for supporting all the use cases (which again, I wasn't thinking about before, oops). So it's quite a bit of work.
Given the complexity of expanding this for the general use case, maybe Matt and I just need to hack our way through the code to get the pairwise t-stats and close this issue. Or, like you said Aaron, use the standardized log-fold change. (Matt, is there a reason I'm missing as to why the correlation works best using the t-stats? Hm... I guess that the constant scaling factor (variance) can be really different between data sets.)
Best,
Leo
Hi Aaron,
Looking in more detail, maybe we don't need to hack anything and can simply use std.lfc = TRUE (@mattntran pointed out this argument to me, and I now see it documented at Lines 73 to 76 in cbc535d). When lfc is 0 (the default), the t stat is calculated using Line 437 in cbc535d. I see that .fit_lm_internal() is used when the design is specified (Lines 184 to 187 in cbc535d, .test_block_internal()). When std.lfc = TRUE, I see at Lines 405 to 408 in cbc535d and Line 437 in cbc535d that d = t / sqrt(N).
Earlier I didn't realize this and was thinking that regular logFC are not scaled by their variance and hence comparing them across datasets is hard. However, with Cohen's d we could do that.
Am I on the right track? As in, use std.lfc = TRUE, then either just correlate the standardized LFC (Cohen's d) across datasets and/or multiply Cohen's d by sqrt(N), without having to hack anything.
Best,
Leo
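As a rough sketch of the "correlate the standardized LFC across datasets" idea, the plain-Python snippet below matches genes by name between two datasets and computes a Pearson correlation of their Cohen's d values. The gene names, the d values, and the pearson() helper are all hypothetical and only illustrate the bookkeeping; this is not the spatialLIBD or scran implementation.

```python
import math


def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


# Made-up per-gene Cohen's d values for the same comparison in two datasets.
d_dataset1 = {"GeneA": 1.2, "GeneB": -0.4, "GeneC": 0.1, "GeneD": 2.0}
d_dataset2 = {"GeneB": -0.6, "GeneA": 0.9, "GeneD": 1.7, "GeneE": 0.3}

# Only genes present in both datasets can be compared.
shared = sorted(set(d_dataset1) & set(d_dataset2))
r = pearson(
    [d_dataset1[g] for g in shared],
    [d_dataset2[g] for g in shared],
)
print(shared, r)
```

In practice each dataset would contribute one vector of d values per pairwise comparison, and the correlation would be computed for every pair of comparisons across datasets.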
Yes, that sounds right. I would say that Cohen's D is even better for comparison across datasets, as it is not affected by whether one dataset has more or fewer cells than the other.
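The point about cell numbers can be illustrated with a small plain-Python sketch, again assuming a pooled-variance two-sample t-test rather than scran's exact computation. The "large" dataset here is simply the toy values repeated 50 times, a hypothetical stand-in for a dataset with many more cells but the same underlying effect.

```python
import math
import statistics


def t_and_d(x, y):
    """Return (t, d) under a pooled-variance two-sample t-test (illustrative)."""
    n1, n2 = len(x), len(y)
    diff = statistics.fmean(x) - statistics.fmean(y)
    pooled_var = (
        (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    ) / (n1 + n2 - 2)
    sd = math.sqrt(pooled_var)
    return diff / (sd * math.sqrt(1 / n1 + 1 / n2)), diff / sd


# Made-up values for one gene in two groups.
small_a = [2.1, 2.5, 3.0, 2.8, 3.3, 2.2]
small_b = [1.0, 1.4, 0.9, 1.6, 1.2, 1.1]

# Hypothetical larger dataset: same values, 50 times as many cells.
big_a, big_b = small_a * 50, small_b * 50

t_small, d_small = t_and_d(small_a, small_b)
t_big, d_big = t_and_d(big_a, big_b)

# d shifts only slightly (the n - 1 variance correction matters less with more
# cells), while t grows roughly with the square root of the number of cells.
print(d_small, d_big)
print(t_small, t_big)
```

So a correlation of t-statistics across datasets mixes effect size with dataset size, whereas d stays on a comparable scale.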
Ahh, nice, thanks Aaron!
Thanks for the feedback and insight, Aaron! I've started just running iterations with and without std.lfc = TRUE so we can have both observations to reference, and since the test runs really quickly 👍