which data to use to infer TF Activity about decoupler-py HOT 2 CLOSED

saezlab commented on June 16, 2024

which data to use to infer TF Activity

from decoupler-py.

Comments (2)

PauBadiaM commented on June 16, 2024

Hi @wariobrega

Thanks for checking out the package! Sorry for the late response, I was on vacation.

This is a fantastic question, and to be honest we don't have a perfect solution to it ourselves. The short answer is that it really depends; here comes the long one:

In my opinion, if possible, the best way to estimate activities is at the contrast level. First you run a differential expression analysis between conditions, preferably at the cell type pseudobulk level if you work with single-cell data, to obtain gene-level statistics (logFC, t-values or anything else). Using statistical estimates as input should, in theory, make the activity prediction more robust. In addition, because these statistics range from negative to positive values, methods like wsum can correctly estimate inhibiting sources. For example, if the target genes of a repressor TF have very low log-normalized values, meaning they are inhibited, wsum will give the TF a low activity when it should be highly active. This is not a problem for methods based on linear models such as ulm or mlm, though. The only downside of the contrast approach is that you need enough replicates (number of samples) to do it, I would say at least 3 per group.
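
A toy sketch of the repressor example above, written in plain NumPy rather than with decoupler's own functions. The gene values, the weights and the helper names `wsum` and `ulm_t` are made up for illustration; `ulm_t` is just the t-value of the slope in a one-predictor linear model, which is the idea behind ulm, not the package's implementation.

```python
import numpy as np

def wsum(values, weights):
    """Weighted sum of gene-level values over one TF's target weights."""
    return float(np.dot(values, weights))

def ulm_t(values, weights):
    """t-value of the slope when regressing gene values on the TF's weights."""
    n = len(values)
    r = np.corrcoef(values, weights)[0, 1]
    return float(r * np.sqrt((n - 2) / (1.0 - r ** 2)))

# Hypothetical repressor TF: three targets with weight -1, five non-targets.
weights = np.array([-1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0])

# Made-up log-normalized expression: the repressed targets sit near zero.
log_norm = np.array([0.1, 0.2, 0.0, 2.0, 1.5, 2.5, 1.8, 2.2])

# Made-up logFC from a contrast: the same targets go down.
logfc = np.array([-2.0, -1.5, -2.5, 0.1, -0.2, 0.3, 0.0, -0.1])

wsum_expr = wsum(log_norm, weights)   # cannot be positive on non-negative input
wsum_logfc = wsum(logfc, weights)     # positive: the repressor looks active
ulm_expr = ulm_t(log_norm, weights)   # positive even on non-negative input
ulm_logfc = ulm_t(logfc, weights)     # positive as well

print(f"wsum on log-normalized values: {wsum_expr:.2f}")
print(f"wsum on logFC:                 {wsum_logfc:.2f}")
print(f"ulm t-value on log-normalized: {ulm_expr:.2f}")
print(f"ulm t-value on logFC:          {ulm_logfc:.2f}")
```

With non-negative input and only negative weights, the weighted sum can never exceed zero, while the signed logFC lets it turn positive. The linear-model score compares targets against non-targets, so it comes out positive in both cases.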

Inference of activity at the sample (or cell) level is also possible from the log-normalized counts, but then you might have to deal with noisy values (especially in single cell). I would still do it for exploratory purposes. Another problem is that methods like wsum will not work correctly for sources with negative edges, as I explained before. One solution would be to scale the log-normalized counts (basically z-score them) in order to obtain positive and negative values. This works nicely in bulk but not so well in single cell, since there are many dropouts and all the genes with zeros will get assigned negative values by default. To correct this we tried scaling only the non-zero values in single cell, but the results were not that good. By the way, in case you are working with trajectories or cell fate in single cell, another alternative is to use the velocity vectors as input for activity inference; this is something we are currently exploring.
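
The scaling issue can be seen on a tiny made-up matrix. This is a NumPy sketch under stated assumptions: rows are cells, columns are genes, and `zscore_columns` and `zscore_nonzero` are hypothetical helpers, not decoupler functions. The second helper is only one possible reading of "scaling only the non-zero values".

```python
import numpy as np

def zscore_columns(x):
    """Z-score every gene (column) across all cells (rows)."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd = np.where(sd == 0, 1.0, sd)  # avoid division by zero for constant genes
    return (x - mu) / sd

def zscore_nonzero(x):
    """Z-score each gene using only its non-zero entries; zeros stay at 0."""
    out = np.zeros_like(x, dtype=float)
    for j in range(x.shape[1]):
        col = x[:, j]
        nz = col != 0
        if nz.sum() > 1 and col[nz].std() > 0:
            out[nz, j] = (col[nz] - col[nz].mean()) / col[nz].std()
    return out

# Made-up log-normalized counts: gene 0 has dropouts, gene 1 does not.
x = np.array([
    [0.0, 1.2],
    [0.0, 0.8],
    [2.0, 1.0],
    [3.0, 1.4],
    [0.0, 0.6],
])

z_all = zscore_columns(x)
z_nz = zscore_nonzero(x)

print("z-scored over all cells:\n", np.round(z_all, 2))
print("z-scored over non-zero entries only:\n", np.round(z_nz, 2))
```

In `z_all`, every dropout in gene 0 ends up with the same negative value, which a weighted method would then read as real down-regulation. In `z_nz` the dropouts stay at zero, but the remaining values are centred on very few observations.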

Regarding the selection of a gene universe, be it the highly variable genes or any other set of interest, I personally would be against it for activity inference: the more information that is available, the better. The only case where I would do it is to speed up calculations when working with a huge atlas, but in the Python version scalability shouldn't be a problem.

To sum up, you can use any gene statistic that you want as input for activity inference, but preferably you should use one that yields negative and positive values.

Hope this was helpful! Let me know if you have any more questions.

wariobrega commented on June 16, 2024

Dear @PauBadiaM ,

Thanks a lot for the nice reply, super helpful :)

I sent you a couple of other questions in private that are more project-specific than general, so I will close the issue here :)

Thanks again,

Daniele
