Comments (4)
So helpful. Thanks again for taking the time to explain, I really appreciate it!
from decoupler-py.
Hi @adamklie
Glad you find the package useful!
As you know, decoupler takes an input matrix of gene expression (GEX) and a prior knowledge network (Net), which we transform into matrix format internally. In your case, the GEX matrix is made of the contrast statistics between conditions (if you only have one contrast, it has only one row). When you run ulm, for each "sample" (in your case, each contrast) and TF, you fit a univariate linear model. The response variable is the observed GEX (in your case the observed change between conditions) and the explanatory variable is the vector of weights for that TF. Once the model is fitted, we extract the t-value of the slope as the "activity". If the genes that belong to a certain TF show increased expression when their weights are positive, the slope will be positive and we will get a positive activity. If there is disagreement, for example genes with negative logFCs and positive weights, the slope will be negative.
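The per-TF fit described above can be illustrated with a minimal pure-Python sketch. This is not decoupler's implementation: the function name `ulm_t_value`, the toy logFCs, and the two TFs are all hypothetical, and the sketch assumes that genes which are not targets of a TF enter the model with weight 0.

```python
import math

def ulm_t_value(expression, weights):
    """t-value of the slope when regressing expression on one TF's weights."""
    n = len(expression)
    mean_x = sum(weights) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in weights)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weights, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(weights, expression))
    # Standard error of the slope, with n - 2 degrees of freedom.
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

# Hypothetical toy data: one contrast, six genes (values are logFCs).
log_fc = [2.1, 1.4, -1.8, 0.3, -0.2, 0.1]

# Hypothetical weights over the same six genes; 0 means "not a target".
net = {
    "TF_A": [1, 1, -1, 0, 0, 0],    # signs agree with the logFCs
    "TF_B": [-1, -1, 1, 0, 0, 0],   # signs disagree with the logFCs
}

# One independent model per TF, as in ulm.
activities = {tf: ulm_t_value(log_fc, w) for tf, w in net.items()}
print(activities)
```

In this toy case TF_A gets a positive activity because its weights agree in sign with the logFCs, while TF_B, whose weights are flipped, gets the same value with a negative sign.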
To briefly answer your question: yes, in ulm we fit a separate model for each TF. In mlm, however, we fit one model for each sample/contrast, where we include all TFs at the same time, hence the name "multivariate".
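For contrast with the per-TF fit, here is a rough sketch of the joint fit: a single ordinary least squares model per contrast with one coefficient per TF, from which a t-value per TF is read off. It is an illustration under assumptions, not decoupler's code: `invert`, `mlm_t_values`, and the toy data are hypothetical, and an intercept column is assumed.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: TF weights are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mlm_t_values(expression, weight_columns):
    """t-values of all TF coefficients from one joint linear model."""
    n = len(expression)
    # Design matrix: intercept plus one column of weights per TF.
    X = [[1.0] + [col[g] for col in weight_columns] for g in range(n)]
    p = len(X[0])
    xtx = [[sum(X[g][i] * X[g][j] for g in range(n)) for j in range(p)]
           for i in range(p)]
    xty = [sum(X[g][i] * expression[g] for g in range(n)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((expression[g] - sum(b * x for b, x in zip(beta, X[g]))) ** 2
              for g in range(n))
    sigma2 = rss / (n - p)  # residual variance
    t = [beta[i] / math.sqrt(sigma2 * inv[i][i]) for i in range(p)]
    return t[1:]  # drop the intercept

# Hypothetical toy data: one contrast, six genes, two TFs.
log_fc = [2.1, 1.4, -1.8, 0.3, -0.2, 0.1]
tf_a = [1, 1, -1, 0, 0, 0]
tf_c = [0, 0, 0, 1, -1, 1]

t_a, t_c = mlm_t_values(log_fc, [tf_a, tf_c])
print(t_a, t_c)
```

Passing the same weight column twice makes the normal equations singular and raises the `ValueError`, which is the extreme case of TFs whose weights are too similar to separate in a joint fit.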
Hope this is helpful! Feel free to ask more questions if needed.
Thank you so much for the detailed explanation and the figure! Makes it very clear how this is working. Now I'm trying to think about when you might expect one to work better than the other. I would expect that many of the explanatory TFs would have correlated weights, which might make it harder to fit an mlm, but also that many target genes should be explained by the action of multiple TFs. I guess it's not hard to try both and inspect the fit, but are there other considerations or sub-selections of the network or data that make sense before fitting the mlm?
Very good points. In the end there is no free lunch: there is a tradeoff.
The advantage of ulm is that we don't care if two TFs have highly correlated weights because we are testing them independently, but you might get false positives, and we are not accounting for TF interaction effects when modeling activities.
The advantage of mlm is that, since it is multivariate, it accounts for these interaction effects when modeling activities, but if the network is too co-linear it might break for those TFs. BTW, you can always check how co-linear a given net is for your data using:
dc.check_corr(net, mat=mat)
If you see that some TF pairs have high correlations (> 0.9), you should definitely double check how the obtained activities look for these when using mlm.
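To make the idea of the check concrete, here is a small sketch that computes pairwise Pearson correlations between TF weight vectors and lists the pairs above a threshold. It only illustrates the concept and is not how dc.check_corr is implemented: `pearson`, `correlated_pairs`, and the toy network are hypothetical, and flagging on the absolute value of the correlation is an assumption made here.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equally long weight vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(net, threshold=0.9):
    """Return (tf_1, tf_2, r) for every TF pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(sorted(net), 2):
        r = pearson(net[a], net[b])
        # Assumption: strongly negative correlations are flagged as well.
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical network over six genes; 0 means "not a target".
net = {
    "TF_A": [1, 1, -1, 0, 0, 0],
    "TF_B": [1, 0.8, -1, 0, 0, 0],   # nearly the same targets and signs as TF_A
    "TF_C": [0, 0, 0, 1, -1, 1],     # disjoint targets
}

pairs = correlated_pairs(net)
for tf_1, tf_2, r in pairs:
    print(f"{tf_1} vs {tf_2}: r = {r:.3f}")
```

Here only TF_A and TF_B are reported, so those are the two TFs whose mlm activities would deserve a closer look.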
Therefore, if you are not sure which one to pick, you can always use the consensus score, which by default models activities based on the results of ulm, norm_wsum and mlm. In our benchmarks we have seen that, depending on the data, one of these three methods outperforms the other two, but that their consensus is always the slightly better alternative.
Hope this is helpful!