How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? (decoupler-py issue, closed)

saezlab commented on June 8, 2024

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference?

from decoupler-py.

Comments (4)

adamklie commented on June 8, 2024

So helpful. Thanks again for taking the time to explain, I really appreciate it!

PauBadiaM commented on June 8, 2024

Hi @adamklie

Glad you find the package useful!

As you know, decoupler takes an input matrix of gene expression (GEX) and a prior knowledge network (Net), which we transform into matrix format internally. In your case, the GEX matrix is made of the contrast statistics between conditions (if you only have one contrast, it has only one row). When you run ulm, you fit a univariate linear model for each "sample" (in your case, each contrast) and each TF. The response variable is the observed GEX (in your case, the observed change between conditions) and the explanatory variable is the vector of weights for that TF. Once the model is fitted, we extract the t-value of the slope as the "activity". If the target genes of a TF show increased expression (positive logFCs) when their weights are positive, the slope will be positive and we will get a positive activity. If there is disagreement, for example genes with negative logFCs and positive weights, the slope will be negative.
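To make the per-TF fit concrete, here is a minimal NumPy sketch of the idea described above. It is not decoupler's implementation: the gene names, logFCs, network and the helper `slope_t_value` are all hypothetical, and giving non-target genes a weight of 0 is an assumption made for this illustration.

```python
import numpy as np

def slope_t_value(x, y):
    """t-value of the slope in the univariate model y = a + b * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    x_c = x - x.mean()
    sxx = np.sum(x_c ** 2)
    slope = np.sum(x_c * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    sigma2 = np.sum(resid ** 2) / (n - 2)   # residual variance
    return slope / np.sqrt(sigma2 / sxx)    # slope divided by its standard error

# Hypothetical toy data: logFCs of 8 genes from a single contrast.
genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
lfc = np.array([2.1, 1.8, -1.5, 0.2, -0.1, 0.3, -0.2, 0.1])

# Hypothetical toy network: TF -> {target gene: weight}.
net = {
    "TF_A": {"G1": 1.0, "G2": 1.0, "G3": -1.0},   # weights agree with the logFC signs
    "TF_B": {"G1": -1.0, "G2": -1.0, "G3": 1.0},  # weights disagree with the logFC signs
}

activities = {}
for tf, targets in net.items():
    # Assumption: genes that are not targets of the TF get weight 0.
    weights = np.array([targets.get(g, 0.0) for g in genes])
    activities[tf] = slope_t_value(weights, lfc)

for tf, t in activities.items():
    print(f"{tf}: t = {t:.2f}")
```

In this toy case TF_A gets a positive score and TF_B a negative one, because TF_B's weights are the exact mirror of TF_A's.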

To briefly answer your question, yes, we fit a separate model for each TF in ulm. However, in the case of mlm we fit one for each sample/contrast, where we include all TFs at the same time, thus the name "multivariate".
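For comparison, a rough sketch of the multivariate case: one model per contrast, with every TF's weights as a column of the design matrix. Again, this is an illustration under assumptions and not decoupler's code; the toy data and the helper `mlm_t_values` are hypothetical.

```python
import numpy as np

def mlm_t_values(weight_matrix, y):
    """t-values of each TF coefficient in y = a + sum_k b_k * w_k."""
    W = np.asarray(weight_matrix, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = W.shape
    X = np.column_stack([np.ones(n), W])          # intercept + one column per TF
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k - 1)          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the coefficients
    se = np.sqrt(np.diag(cov))
    return (beta / se)[1:]                        # drop the intercept's t-value

# Hypothetical toy data: logFCs of 8 genes from a single contrast.
genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
lfc = np.array([2.1, 1.8, -1.5, -1.2, -0.9, -1.4, 0.1, -0.1])

# Hypothetical toy network: TF -> {target gene: weight}.
net = {
    "TF_A": {"G1": 1.0, "G2": 1.0, "G3": -1.0},
    "TF_C": {"G4": 1.0, "G5": 1.0, "G6": 1.0},
}

tfs = list(net)
# Genes x TFs weight matrix; non-targets are assumed to have weight 0.
W = np.array([[net[tf].get(g, 0.0) for tf in tfs] for g in genes])

t_values = dict(zip(tfs, mlm_t_values(W, lfc)))
for tf, t in t_values.items():
    print(f"{tf}: t = {t:.2f}")
```

Because all TFs are fitted jointly, each coefficient is estimated while holding the other TFs' weights fixed, which is where the sensitivity to correlated weight columns comes from.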

Hope this is helpful! Feel free to ask more questions if needed.

adamklie commented on June 8, 2024

Thank you so much for the detailed explanation and the figure! Makes it very clear how this is working. Now I'm trying to think about when you might expect one to work better than the other. I would expect that many of the explanatory TFs would have correlated weights, which might make it harder to fit an mlm, but also that many target genes should be explained by the action of multiple TFs. I guess it's not hard to try both and inspect the fit, but are there other considerations, or sub-selections of the network or data, that make sense before fitting the mlm?

PauBadiaM commented on June 8, 2024

Very good points. In the end there is no free lunch; there is always a tradeoff.

The advantage of ulm is that it doesn't matter if two TFs have highly correlated weights, because we test them independently. However, you might get false positives, and we are not accounting for TF interaction effects when modeling activities.

The advantage of mlm is that, since it is multivariate, it accounts for these interaction effects when modeling activities, but if the network is too co-linear it might break for the affected TFs. BTW, you can always check how co-linear a given net is for your data using:

dc.check_corr(net, mat=mat)

If you see that some TF pairs have high correlations (> 0.9), you should definitely double check how the obtained activities look for these when using mlm.
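To illustrate what "highly correlated weights" means here, the following sketch computes pairwise Pearson correlations between the weight columns of a toy network and flags pairs above 0.9. It does not reproduce `dc.check_corr`; the network, gene list and threshold handling are hypothetical and only meant to show the kind of check involved.

```python
import numpy as np
from itertools import combinations

# Hypothetical toy network: TF -> {target gene: weight}.
genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
net = {
    "TF_A": {"G1": 1.0, "G2": 1.0, "G3": -1.0},
    "TF_C": {"G4": 1.0, "G5": 1.0, "G6": 1.0},
    "TF_D": {"G1": 1.0, "G2": 0.8, "G3": -1.0},  # nearly the same regulon as TF_A
}

tfs = list(net)
# Genes x TFs weight matrix; non-targets are assumed to have weight 0.
W = np.array([[net[tf].get(g, 0.0) for tf in tfs] for g in genes])

# Pearson correlation between every pair of TF weight columns.
corr = np.corrcoef(W, rowvar=False)

threshold = 0.9
flagged = [
    (tfs[i], tfs[j], corr[i, j])
    for i, j in combinations(range(len(tfs)), 2)
    if abs(corr[i, j]) > threshold
]

for tf_1, tf_2, r in flagged:
    print(f"{tf_1} ~ {tf_2}: r = {r:.3f}")
```

Only the TF_A / TF_D pair is flagged in this toy network, so those are the two TFs whose mlm activities would deserve a closer look.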

Therefore, if you are not sure of which one to pick, you can always use the consensus score, which by default models activities based on the results of ulm, norm_wsum and mlm. In our benchmarks we have seen that depending on the data, one of these three methods outperforms the other two, but that their consensus is always the slightly better alternative.

Hope this is helpful!
