Hi @wariobrega
Thanks for checking out the package! Sorry for the late response, I was on vacation.
This is a fantastic question, and to be honest we don't have a perfect solution to it either. The short answer is that it depends; here is the long one:
In my opinion, if possible, the best way to estimate activities is at the contrast level. First you perform a differential expression analysis between conditions, preferably at the cell-type pseudobulk level if you are working with single-cell data, to obtain gene-level statistics (logFCs, t-values or anything else). Using statistical estimates as input should, in theory, make the activity prediction more robust. Additionally, because these statistics range from negative to positive values, methods like wsum can correctly estimate inhibiting sources. For example, with log-normalized values (which are never negative), a repressor TF whose target genes have very low values, meaning they are inhibited, will get a low activity when it should be highly active. This is not a problem for methods based on linear models, such as ulm or mlm, though. The only downside is that you need enough replicates (samples) to do this; I would say at least 3 for each group.
Inferring activities at the sample (or cell) level from the log-normalized counts is also possible, but then you might have to deal with noisy values (especially in single cell), although I would still do it for exploratory purposes. Another problem is that methods like wsum will not work correctly for some sources with negative edges, as I explained before. One solution is to scale the log-normalized counts (basically z-score them) to obtain positive and negative values. This works well in bulk but not so much in single cell, since there are many dropouts and all genes with zeros will get negative values by default. To correct this we tried scaling only the non-zero values in single cell, but the results were not that good. By the way, if you are working with trajectories or cell fate in single cell, another alternative is to use the RNA velocity vectors as input for activity inference; this is something we are currently exploring.
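The two scaling strategies mentioned above can be sketched in NumPy as follows, assuming a cells x genes matrix of log-normalized values. `zscore_genes` and `zscore_nonzero` are hypothetical helpers written for illustration (in practice a function such as scanpy's `sc.pp.scale` handles standard per-gene scaling), and the toy matrix is made up. The sketch shows why dropouts are a problem: after standard z-scoring every zero ends up below the gene mean, i.e. negative.

```python
import numpy as np

def zscore_genes(mat):
    """Z-score each gene (column) across cells; zero-variance genes are left at 0."""
    mean = mat.mean(axis=0)
    std = mat.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (mat - mean) / std

def zscore_nonzero(mat):
    """Z-score only the non-zero entries of each gene, keeping zeros (dropouts) at 0."""
    out = np.zeros_like(mat, dtype=float)
    for j in range(mat.shape[1]):
        nz = mat[:, j] != 0
        if nz.sum() > 1:
            v = mat[nz, j]
            sd = v.std()
            out[nz, j] = (v - v.mean()) / (sd if sd > 0 else 1.0)
    return out

# Made-up log-normalized matrix: 4 cells x 3 genes, with dropouts (zeros).
mat = np.array([
    [0.0, 1.2, 0.0],
    [2.1, 0.0, 0.5],
    [1.8, 1.0, 0.0],
    [0.0, 1.4, 0.7],
])

print(zscore_genes(mat))    # every zero becomes a negative value
print(zscore_nonzero(mat))  # zeros stay at 0, non-zero values are centred per gene
```

The second variant avoids penalizing dropouts, but as noted above it did not give good results in our hands, so treat it as an experiment rather than a recommendation.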
Regarding the selection of a gene universe, be it the highly variable genes or any other set of interest, I personally would be against it for activity inference: the more information available, the better. The only case where I would do it is to speed up calculations when working with a huge atlas, but in the Python version scalability shouldn't be a problem.
To sum up, you can use any gene statistic that you want as input for activity inference, but preferably you should use one that yields negative and positive values.
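As an example of a signed gene-level statistic, here is a minimal sketch computing per-gene Welch t-values between two groups of pseudobulk profiles with `scipy.stats.ttest_ind`. This is a simplification for illustration: a dedicated differential expression tool that models counts would normally be used, `gene_t_values` is a hypothetical helper, and the values (3 replicates per group, 3 genes) are made up.

```python
import numpy as np
from scipy import stats

def gene_t_values(group_a, group_b):
    """Per-gene Welch t-statistics for group_b vs group_a.

    Rows are pseudobulk samples (replicates), columns are genes.
    Positive values mean higher expression in group_b.
    """
    res = stats.ttest_ind(group_b, group_a, axis=0, equal_var=False)
    return res.statistic

# Made-up log-normalized pseudobulk profiles: 3 replicates per condition, 3 genes.
control = np.array([
    [5.0, 3.0, 2.0],
    [5.2, 3.1, 2.1],
    [4.9, 2.9, 1.9],
])
treated = np.array([
    [6.0, 2.0, 2.05],
    [6.3, 2.2, 1.95],
    [6.1, 1.9, 2.0],
])

t = gene_t_values(control, treated)
print(t)  # gene 0 up, gene 1 down, gene 2 roughly unchanged
```

The resulting vector takes both negative and positive values, so it can be used as the gene-level input for activity inference instead of the raw log-normalized expression.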
Hope this was helpful! Let me know if you have any more questions.
Dear @PauBadiaM,
Thanks a lot for the nice reply, super helpful :)
I sent you a couple more questions in private that are project-specific rather than general, so I'll close the issue here :)
Thanks again,
Daniele