Comments (2)

PauBadiaM avatar PauBadiaM commented on June 16, 2024

Hi @wariobrega

Thanks for checking out the package! Sorry for the late response, I was on vacation.

This is a fantastic question, and to be honest we don't have a perfect solution for it ourselves. The short answer is that it really depends. Here comes the long one:

In my opinion, if possible, the best way to estimate activities is at the contrast level. First you perform differential expression analysis between conditions, preferably at the cell type pseudobulk level if you are working with single-cell data, to obtain gene-level statistics (logFC, t-values or anything else). Using statistical estimates as input makes the activity prediction more robust in theory. Additionally, because these statistics range from negative to positive values, methods like wsum can correctly estimate inhibiting sources. For example, if a repressor TF has target genes with very low log-normalized values, meaning they are inhibited, wsum will give it a low activity when it should be highly active. This is not a problem for methods based on linear models such as ulm or mlm, though. The only downside is that you need enough replicates (number of samples) to be able to do it; I would say at least 3 for each group.
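To make the repressor example above concrete, here is a minimal sketch in plain Python. It is not decoupler's implementation: `weighted_sum` and `linear_slope` are hypothetical helpers standing in for the idea behind wsum and ulm, and the gene values are made up for illustration.

```python
from statistics import mean


def weighted_sum(values, weights):
    """Plain weighted sum of gene values (simplified stand-in for wsum)."""
    return sum(v * w for v, w in zip(values, weights))


def linear_slope(values, weights):
    """Least-squares slope of gene values on edge weights (stand-in for ulm)."""
    mw, mv = mean(weights), mean(values)
    cov = sum((w - mw) * (v - mv) for w, v in zip(weights, values))
    var = sum((w - mw) ** 2 for w in weights)
    return cov / var


# Hypothetical repressor: three targets with weight -1, four non-targets with weight 0.
weights = [-1.0, -1.0, -1.0, 0.0, 0.0, 0.0, 0.0]

# Made-up log-normalized values: the repressed targets are close to zero.
lognorm = [0.1, 0.0, 0.2, 2.0, 1.5, 2.5, 1.8]

# Made-up contrast statistics (e.g. t-values): the targets are strongly negative.
t_values = [-3.0, -2.5, -4.0, 0.3, -0.2, 0.5, 0.1]

# Non-negative input times negative weights can never give a positive score.
wsum_lognorm = weighted_sum(lognorm, weights)

# Negative statistics times negative weights give a positive score.
wsum_contrast = weighted_sum(t_values, weights)

# The slope compares targets against the other genes, so it is positive here
# even on the log-normalized values.
slope_lognorm = linear_slope(lognorm, weights)

print(wsum_lognorm, wsum_contrast, slope_lognorm)
```

With non-negative input, the weighted sum for a source that only has negative edges is bounded above by zero, so an active repressor cannot be told apart from an inactive one. Signed statistics remove that bound, and the regression-style score avoids the issue because it uses the non-target genes as a baseline.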

Inference of activity at the sample (or cell) level is also possible from the log-normalized counts, but then you might have to deal with noisy values (especially in single cell). I would still do it for exploratory purposes. Another problem is that methods like wsum will not work correctly for some sources with negative edges, as I explained before. One solution is to scale the log-normalized counts (basically z-score them) in order to obtain positive and negative values. This works nicely in bulk but not so much in single cell, since there are many dropouts and all the genes with zeros get assigned low negative values by default. To correct this we tried scaling only the non-zero values in single cell, but the results were not that good. By the way, in case you are working with trajectories or cell fate in single cell, another alternative is to use the velocity vectors as input for activity inference; this is something we are currently exploring.
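As a rough illustration of the dropout issue with scaling, the sketch below z-scores one gene across samples using only the standard library. The `zscore` helper and both value lists are made up for illustration; this is not how any particular package scales its data.

```python
from statistics import mean, pstdev


def zscore(values):
    """Z-score one gene across samples or cells; constant genes map to zeros."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Made-up bulk-like gene: expressed in every sample, so scaling spreads it around zero.
bulk_gene = [4.1, 5.0, 3.8, 4.6, 5.3, 4.4]

# Made-up single-cell-like gene: mostly dropouts, detected in two cells only.
sc_gene = [0.0, 0.0, 0.0, 0.0, 1.2, 2.0]

bulk_z = zscore(bulk_gene)
sc_z = zscore(sc_gene)

# Every dropout ends up with the same negative z-score, whether the gene was
# truly silent in that cell or simply not captured.
dropout_z = [z for v, z in zip(sc_gene, sc_z) if v == 0.0]

print(bulk_z)
print(sc_z)
```

In the single-cell case the negative values mostly reflect missing detection rather than inhibition, which is why the scaled input is less informative there than in bulk.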

Regarding the selection of a gene universe, be it the highly variable genes or any other set of interest, I personally would be against it for activity inference: the more information that is available, the better. The only time I would do it is to speed up calculations in case you are working with a huge atlas, but in the Python version scalability shouldn't be a problem.

To sum up, you can use any gene statistic that you want as input for activity inference, but preferably you should use one that yields negative and positive values.

Hope this was helpful! Let me know if you have any more questions.

from decoupler-py.

wariobrega avatar wariobrega commented on June 16, 2024

Dear @PauBadiaM ,

Thanks a lot for the nice reply, super helpful :)

I sent you a couple more questions in private that are project-specific rather than general, so I'll close the issue here :)

Thanks again,

Daniele

