
saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.

Home Page: https://decoupler-py.readthedocs.io/

License: GNU General Public License v3.0

Python 99.93% Dockerfile 0.07%
python data-science bioinformatics transcriptomics numba single-cell spatial-transcriptomics enrichment enrichment-analysis

decoupler-py's Introduction

decoupler - Ensemble of methods to infer biological activities


decoupler is a package containing different statistical methods for enrichment analysis, used to extract biological activities from omics data within a unified framework. This is the faster and more memory-efficient Python implementation; for the R version, go here.

For further information and example tutorials, please check our documentation.

If you have any questions or problems, do not hesitate to open an issue.

Installation

decoupler can be installed from pip (lightweight installation):

pip install decoupler

It can also be installed from conda or mamba (this includes extra dependencies):

mamba create -n=decoupler conda-forge::decoupler-py

Alternatively, to stay up-to-date with the newest unreleased version, install from source:

pip install git+https://github.com/saezlab/decoupler-py.git
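A minimal usage sketch (the toy matrix and network below are made up for illustration; run_ulm is one of the implemented methods, see the documentation for the full API and real workflows):

import decoupler as dc
import pandas as pd

# Toy expression matrix: rows are observations (samples), columns are features (genes).
mat = pd.DataFrame(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    index=['S1', 'S2'],
    columns=['G1', 'G2', 'G3'],
)

# Toy prior-knowledge network linking sources (e.g. TFs) to weighted targets.
net = pd.DataFrame({
    'source': ['T1', 'T1', 'T2'],
    'target': ['G1', 'G2', 'G3'],
    'weight': [1.0, 1.0, -1.0],
})

# Infer activities with the univariate linear model (ULM) method.
acts, pvals = dc.run_ulm(mat=mat, net=net, min_n=1)
print(acts)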

scverse

decoupler is part of the scverse ecosystem, a collection of tools for single-cell omics data analysis in Python. For more information, check the link.

License

Footprint methods inside decoupler can be used for academic or commercial purposes, except viper, which holds a non-commercial license.

The data redistributed by OmniPath does not have a license of its own; each original resource carries its own. Here one can find the license information for all the resources in OmniPath.

Citation

Badia-i-Mompel P., Vélez Santiago J., Braunger J., Geiss C., Dimitrov D., Müller-Dott S., Taus P., Dugourd A., Holland C.H., Ramirez Flores R.O. and Saez-Rodriguez J. 2022. decoupleR: Ensemble of computational methods to infer biological activities from omics data. Bioinformatics Advances. https://doi.org/10.1093/bioadv/vbac016

decoupler-py's People

Contributors

adugourd, deeenes, demian1, fedor-grigoryev, gabora, intron7, nic-nic, paubadiam, pcm32, smuellerd, zethson


decoupler-py's Issues

Weird reindexing

Hello there,
thank you very much for sharing decoupler-py and my compliments for this tool.

I am following the tutorial on Bulk functional analysis and I didn't have any problems until I ran the following:

dc.plot_targets(results_df,stat='stat', source_name='Jund',net=collectri,top=15)

I am receiving this error message:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/decoupler/plotting.py:338, in plot_targets(data, stat, source_name, net, source, target, weight, top, figsize, dpi, ax, return_fig, save)
    335 w = net[net[source] == source_name].set_index(target)[[weight]]
    337 # Join
--> 338 data = pd.concat([data, w], axis=1, join='inner')
    340 # Define activation/inhibition color
    341 pos = ((data[weight] >= 0) & (data[stat] >= 0)) | ((data[weight] < 0) & (data[stat] < 0))

Is there a way to deal with duplicated indexes and preserve the multiplicity of TF's targets?

I set up the conda env by using the latest git repo.
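A possible workaround while this is open (a hypothetical fix on the user side, not an official one): collapse duplicated TF-target edges before plotting, since plot_targets joins on the target index:

# Keep one edge per (source, target) pair, averaging duplicated weights.
dedup = (
    collectri
    .groupby(['source', 'target'], as_index=False)['weight']
    .mean()
)

dc.plot_targets(results_df, stat='stat', source_name='Jund', net=dedup, top=15)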

Are there missing pathways in REACTOME?

Hi,

Thanks for the great resource.
I am transitioning my workflows to Python-based tools, and decoupler is great. However, I am getting a few results that disagree with my analysis in R using XGR.

The main reason appears to be a few pathways are missing in the 'reactome_pathways' collection in decoupler. An example is Translocation_of_ZAP-70_at_immunological_synapse.

I was just wondering if these databases were trimmed down beforehand?

Many thanks

Please push version 1.4.0 to pypi, only 1.3.4 is available

Hi there,

thanks for this toolkit. I was trying your pseudobulk methods from the latest PyPI release, but it is missing certain things compared to the docs. I then noticed that the version on pip is not the latest release (1.4.0) but only 1.3.4; could you please push 1.4.0?

Thanks!

Which data to use to infer TF activity

hello everyone,

First of all, thanks for developing decoupleR, I am using it in my project and I am finding it extremely useful.

I have more of a theoretical question regarding the TF/pathway activity inference with the methods that you wrapped and ported to Python.

Which data would you recommend using for inferring the activity? At the moment I am using log-normalized data of all genes that are expressed in the cells (no highly variable gene selection, in order to get as many regulons as possible).

I don't know, however, if this is the correct approach. Could you shed some light on this?

Thanks again and keep up with the excellent work!

Daniele

Pseudobulk issue?

Hello,

I'm running into an issue when I try to generate a pseudobulk dataframe.

I'm currently running the code:

padata = dc.get_pseudobulk(adata, sample_col='sample', groups_col='bulk_labels', min_prop=0.01, min_smpls=1)

Where both sample_col and groups_col are found in the obs of my adata.

I'm receiving the error below when I run the code:

ValueError: could not broadcast input array from shape (378,17) into shape (17,)

I know I've set the min_prop and min_smpls low, but that was just out of initial testing/exploration.

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference?

Describe your question
Thanks once again for putting together this package. I've found it super usable and useful so far. I just have hopefully a quick question about the underlying method for analysis of bulk RNA-seq data.

I'm getting some awesome results from fitting a univariate linear model to log fold changes (LFC) from DESeq2 using the DoRothEA network, but I don't really understand how I'm getting these TF activities. I follow the general idea that "the observed molecular readouts in mat are the response variable and the regulator weights in net are the explanatory ones," but I guess I don't really understand what these are in my specific context and how I end up with an activity score for each TF. I'm not sure I even follow what each data point is in this context. Am I fitting a separate model for each TF?

If someone could help me understand this better or point me to the details (that I apologize in advance if I missed them) that would be amazing!
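For readers with the same question, a sketch of my understanding (an assumption based on the description quoted above, not decoupler's actual code): ULM fits one ordinary least-squares model per TF, where each gene is one data point, the response is the gene's LFC, the single explanatory variable is that TF's weight for the gene (0 if the gene is not a target), and the activity is the t-value of the slope:

import numpy as np
from scipy import stats

def ulm_activity(lfcs, weights):
    # lfcs:    per-gene statistics (e.g. DESeq2 LFCs), one value per gene.
    # weights: the TF's regulatory weights over the same genes,
    #          0 for genes that are not targets of this TF.
    res = stats.linregress(weights, lfcs)
    return res.slope / res.stderr  # t-statistic of the slope = activity score

# A TF whose positive targets tend to have positive LFCs gets a positive activity.
lfcs = np.array([2.0, 1.5, -0.3, 0.1, -1.0])
weights = np.array([1.0, 1.0, 0.0, 0.0, -1.0])
print(ulm_activity(lfcs, weights))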

Regarding the function 'get_metadata_association'

Hi,

I am new to decoupler and am now trying to go through the tutorial of pseudobulk analysis. However, I encountered a problem at step 12 where the tutorial code looks like this:
dc.get_metadata_associations(
    pp_pdata,
    obs_keys=['sex', 'disease', 'cell_type', 'psbulk_n_cells', 'psbulk_counts'],
    obsm_key='X_pca',     # where the PCs are stored
    uns_key='pca_anova',  # where the results are stored
    inplace=True,
)

And it returned the error
AttributeError: module 'decoupler' has no attribute 'get_metadata_associations'

I am using ver 1.4.0 of decoupler.

I searched the docs, but it seems to me that there really isn't a function named get_metadata_associations. This may be a dumb question, but what shall I do to correct the code? Is the function obsolete?

Error importing Progeny

Hi guys!

Congrats on this cool resource :). Very helpful to have the resources available in both Python and R.

I am having the following issue when trying to import progeny (dc.get_dorothea works fine for me):

model = dc.get_progeny(organism='human', top=100)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [69], in <cell line: 1>()
----> 1 model = dc.get_progeny(organism='human', top=100)

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/decoupler/omnip.py:40, in get_progeny(organism, top)
     38 p = op.requests.Annotations.get(resources='PROGENy')
     39 p = p.set_index(['record_id', 'uniprot', 'genesymbol', 'entity_type', 'source', 'label'])
---> 40 p = p.unstack('label').droplevel(axis=1, level=0).reset_index()
     41 p.columns.name = None
     42 p = p[['genesymbol', 'p_value', 'pathway', 'weight']]

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/pandas/core/frame.py:5016, in DataFrame.reset_index(self, level, drop, inplace, col_level, col_fill)
   5014         # to ndarray and maybe infer different dtype
   5015         level_values = maybe_casted_values(lev, lab)
-> 5016         new_obj.insert(0, name, level_values)
   5018 new_obj.index = new_index
   5019 if not inplace:

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/pandas/core/frame.py:3763, in DataFrame.insert(self, loc, column, value, allow_duplicates)
   3761 self._ensure_valid_index(value)
   3762 value = self._sanitize_column(column, value, broadcast=False)
-> 3763 self._mgr.insert(loc, column, value, allow_duplicates=allow_duplicates)

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/pandas/core/internals/managers.py:1197, in BlockManager.insert(self, loc, item, value, allow_duplicates)
   1194     raise TypeError("loc must be int")
   1196 # insert to the axis; this could possibly raise a TypeError
-> 1197 new_axis = self.items.insert(loc, item)
   1199 if value.ndim == self.ndim - 1 and not is_extension_array_dtype(value.dtype):
   1200     # TODO(EA2D): special case not needed with 2D EAs
   1201     value = safe_reshape(value, (1,) + value.shape)

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/pandas/core/indexes/extension.py:379, in NDArrayBackedExtensionIndex.insert(self, loc, item)
    361 """
    362 Make new Index inserting new item at location. Follows
    363 Python list.append semantics for negative values.
   (...)
    376 ValueError if the item is not valid for this dtype.
    377 """
    378 arr = self._data
--> 379 code = arr._validate_scalar(item)
    381 new_vals = np.concatenate((arr._ndarray[:loc], [code], arr._ndarray[loc:]))
    382 new_arr = arr._from_backing_data(new_vals)

File /projects/site/pred/SpatialOmics/ST_pipeline/envs/besca_st/lib/python3.8/site-packages/pandas/core/arrays/categorical.py:1251, in Categorical._validate_fill_value(self, fill_value)
   1249     fill_value = self._unbox_scalar(fill_value)
   1250 else:
-> 1251     raise TypeError(
   1252         f"'fill_value={fill_value}' is not present "
   1253         "in this Categorical's categories"
   1254     )
   1255 return fill_value

TypeError: 'fill_value=source' is not present in this Categorical's categories

Any idea?

Many thanks!
Alberto

enr_pvals ORA method

Hello there,

it's still me. I apologize again for opening another issue; I promise this is the last one.
Well, I was trying to run the ORA analysis (just to get an overview of the method) and I got stuck on the following:

enr_pvals = dc.get_ora_df(
    df=top_genes,
    net=msigdb,
    groupby='group',
    features='gene symbol',   #I just changed GeneName to gene symbol
    source='geneset',
    target='genesymbol'
)
enr_pvals
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[42], line 2
      1 # Run ora
----> 2 enr_pvals = dc.get_ora_df(
      3     df=top_genes,
      4     net=msigdb,
      5     groupby='group',
      6     features='gene symbol',
      7     source='geneset',
      8     target='genesymbol'
      9 )
     11 enr_pvals

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/decoupler/method_ora.py:150, in get_ora_df(df, net, groupby, features, source, target, n_background, min_n, verbose)
    148 if features not in cols:
    149     raise ValueError('Column name "{0}" for features not found in df. Please specify a valid column.'.format(features))
--> 150 c = np.unique(df[features].values)
    152 # Transform net
    153 net = rename_net(net, source=source, target=target, weight=None)

File <__array_function__ internals>:200, in unique(*args, **kwargs)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/numpy/lib/arraysetops.py:274, in unique(ar, return_index, return_inverse, return_counts, axis, equal_nan)
    272 ar = np.asanyarray(ar)
    273 if axis is None:
--> 274     ret = _unique1d(ar, return_index, return_inverse, return_counts, 
    275                     equal_nan=equal_nan)
    276     return _unpack_tuple(ret)
    278 # axis was specified and not None

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/numpy/lib/arraysetops.py:336, in _unique1d(ar, return_index, return_inverse, return_counts, equal_nan)
    334     aux = ar[perm]
    335 else:
--> 336     ar.sort()
    337     aux = ar
    338 mask = np.empty(aux.shape, dtype=np.bool_)

TypeError: '<' not supported between instances of 'float' and 'str'

Any idea?
Best,

Andrea

Check other options besides csr_matrix in check_X inside get_pseudobulk

Describe the bug
Some AnnData objects contain other scipy sparse matrix types in .X, such as csc (compressed sparse column), which causes an error from the np.isfinite function in check_X(X, mode=mode, skip_checks=skip_checks) that can't be skipped in get_pseudobulk.

To Reproduce
In your pseudobulk vignette, for example, do:
adata.layers['counts'] = adata.X.tocsc(copy = True)

Then just run the pseudobulk function and you will get the error

Expected behavior
I recommend either expanding the checks with isinstance(sc_dat.X, csc_matrix) or explicitly mentioning that you expect csr_matrix.
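A possible workaround in the meantime (an assumption, not a confirmed fix): convert the layer back to the CSR format the check expects before calling get_pseudobulk:

from scipy.sparse import csr_matrix

# Convert the CSC layer back to CSR so check_X does not fail.
adata.layers['counts'] = csr_matrix(adata.layers['counts'])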

How do you use the run_gsea method with an expression data frame (like a DESeq2 result)?

Hi! I'm trying to use the run_gsea method with similar inputs to get_ora_df:

This works:

dc.get_ora_df(expression_df, msigdb_hm, groupby='group', features='gene_symbols', source='geneset', target='genesymbol', min_n=3)

but this:

gsea_hm_low_exp = dc.run_gsea(expression_df, net=msigdb_hm, source="geneset", target='genesymbol')

doesn't. I suspect get_ora_df is doing something with the inputs before actually calling ora, and I guess something similar is needed here?

I don't see any parameters to specify the stat to use for ranking, or the index field for the expression data frame, which looks like this:

[screenshot of the expression data frame]

I couldn't find any examples in the readthedocs for run_gsea either. Thanks!
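For context, the run_* methods expect a wide matrix (rows = observations, columns = features) rather than the long data frame that get_ora_df takes. A sketch of the reshaping that is probably needed, assuming a DESeq2-style results frame indexed by gene symbol with a 'stat' column (the column name is an assumption):

# Turn a per-gene results frame into a 1 x n_genes matrix ranked by 'stat'.
mat = expression_df[['stat']].T.rename(index={'stat': 'contrast'})

# Rows = observations (a single contrast here), columns = genes.
gsea_results = dc.run_gsea(mat, net=msigdb_hm, source='geneset', target='genesymbol')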

Pseudobulk loses some samples

Describe the bug
When creating a pseudobulk object from an AnnData object that contains negative values, some samples are silently excluded.

While using negative values might not be intended for DE analysis, I think this should still work when aggregating scaled values by mean.

To Reproduce

import decoupler as dc
import scanpy as sc
import numpy as np

np.random.seed(42)

adata = sc.datasets.pbmc3k()
# assign random sample id
adata.obs["sample_id"] = np.random.randint(low=1, high=6, size=adata.shape[0]).astype(str)
sc.pp.scale(adata)

pb = dc.get_pseudobulk(adata, sample_col="sample_id", groups_col=None, mode="mean", skip_checks=True)

adata.obs["sample_id"].unique()
# array(['4', '5', '3', '2', '1'], dtype=object)

pb
# AnnData object with n_obs × n_vars = 3 × 16634
#    obs: 'sample_id', 'psbulk_n_cells', 'psbulk_counts'
#    var: 'gene_ids', 'mean', 'std'
#    layers: 'psbulk_props'

assert pb.shape[0] == adata.obs["sample_id"].nunique()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[40], line 1
----> 1 assert pb.shape[0] == adata.obs["sample_id"].nunique()

AssertionError: 

Expected behavior
n_obs of the output anndata object always equals to the number of unique values in the grouping variable.

System

  • decoupler installed from main branch commit b38c07a

`LinAlgError: Singular matrix` in `run_mlm` for zebrafish

Describe the bug
When trying to compute TF activities for zebrafish, I run into an error. This happened to me when working on real zebrafish data, which I cannot share. To give an example, I will use the pbmc3k_processed dataset and fake it into zebrafish.

To Reproduce

import decoupler as dc
import scanpy as sc

adata = sc.datasets.pbmc3k_processed()
adata.var_names = adata.var_names.str.lower() # turn the genesymbols into fake zebrafish symbols
sc.pp.subsample(adata, n_obs=100) # faster computation

net = dc.get_dorothea(organism='zebrafish', levels=['A'])
net = net[net['source'].isin(['myca', 'mycb'])] # reduce complexity of net to two sources to pinpoint the source of the error

dc.run_mlm(mat=adata, net=net, source='source', target='target', weight='weight', verbose=True, use_raw=False)
Running mlm on mat with 100 samples and 1838 targets for 2 sources.

---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
Cell In[246], line 14
     10 net = dc.get_dorothea(organism='zebrafish', levels=['A'])
     12 net = net[net['source'].isin(['myca', 'mycb'])]
---> 14 dc.run_mlm(mat=adata, net=net, source='source', target='target', weight='weight', verbose=True, use_raw=False)

File /opt/conda/lib/python3.9/site-packages/decoupler/method_mlm.py:121, in run_mlm(mat, net, source, target, weight, batch_size, min_n, verbose, use_raw)
    118     print('Running mlm on mat with {0} samples and {1} targets for {2} sources.'.format(m.shape[0], len(c), net.shape[1]))
    120 # Run MLM
--> 121 estimate, pvals = mlm(m, net, batch_size=batch_size, verbose=verbose)
    123 # Transform to df
    124 estimate = pd.DataFrame(estimate, index=r, columns=sources)

File /opt/conda/lib/python3.9/site-packages/decoupler/method_mlm.py:48, in mlm(mat, net, batch_size, verbose)
     45 net = np.column_stack((np.ones((n_features, ), dtype=np.float32), net))
     47 # Compute inv and df for lm
---> 48 inv = np.linalg.inv(np.dot(net.T, net))
     49 df = n_features - n_fsets
     51 # Init empty acts

File <__array_function__ internals>:180, in inv(*args, **kwargs)

File /opt/conda/lib/python3.9/site-packages/numpy/linalg/linalg.py:552, in inv(a)
    550 signature = 'D->D' if isComplexType(t) else 'd->d'
    551 extobj = get_linalg_error_extobj(_raise_linalgerror_singular)
--> 552 ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
    553 return wrap(ainv.astype(result_t, copy=False))

File /opt/conda/lib/python3.9/site-packages/numpy/linalg/linalg.py:89, in _raise_linalgerror_singular(err, flag)
     88 def _raise_linalgerror_singular(err, flag):
---> 89     raise LinAlgError("Singular matrix")

LinAlgError: Singular matrix

I think the problem in the above example is that the two sources myca and mycb have exactly the same targets in net.
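A quick way to confirm that hypothesis before calling run_mlm (a sketch on the user side, not part of decoupler):

# Find sources with identical target sets, which yield identical columns in
# the design matrix and hence a singular matrix in MLM.
target_sets = net.groupby('source')['target'].apply(frozenset)
print(target_sets[target_sets.duplicated(keep=False)])  # lists myca and mycb here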

Expected behavior
I expect that run_mlm would return TF activities, i.e. that the net returned by dc.get_dorothea() works with run_mlm.

System

  • OS: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
  • Python version 3.9.15
  • Versions of libraries involved
    scanpy==1.9.1 anndata==0.8.0 umap==0.5.3 numpy==1.23.5 scipy==1.10.0 pandas==1.5.3 scikit-learn==1.2.1 statsmodels==0.13.5 python-igraph==0.10.4 louvain==0.8.0 pynndescent==0.5.8 decoupler==1.3.3

Additional context

Differences between wmean and wsum methods

Describe your question
Hello, thanks a lot for developing this awesome package. I was wondering if I can inquire what the differences are between the implementations of wmean and wsum, because their docstrings seem to be the same. Thanks so much in advance!

wmean

    Weighted sum (WMEAN).
    WMEAN infers regulator activities by first multiplying each target feature by its associated weight which then are summed
    to an enrichment score (`wmean_estimate`). Furthermore, permutations of random target features can be performed to obtain a
    null distribution that can be used to compute a z-score (`wmean_norm`), or a corrected estimate (`wmean_corr`) by
    multiplying `wmean_estimate` by the minus log10 of the obtained empirical p-value.

wsum

    Weighted sum (WSUM).
    WSUM infers regulator activities by first multiplying each target feature by its associated weight which then are summed
    to an enrichment score (`wsum_estimate`). Furthermore, permutations of random target features can be performed to obtain a
    null distribution that can be used to compute a z-score (`wsum_norm`), or a corrected estimate (`wsum_corr`) by multiplying
    `wsum_estimate` by the minus log10 of the obtained empirical p-value.
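Not an authoritative answer, but the estimates appear to differ only in normalization: wmean divides the weighted sum by each source's total absolute weight. A sketch of that assumed relationship:

import numpy as np

def wsum_estimate(x, w):
    # x: (n_samples, n_features) matrix; w: (n_features, n_sources) weights.
    return x @ w

def wmean_estimate(x, w):
    # Same weighted sum, normalized by the total absolute weight per source.
    return (x @ w) / np.sum(np.abs(w), axis=0)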

dc.get_resource("MSigDB", organism = "mouse") fails with errors in pypath

Describe the bug
Decoupler fails to import the MSigDB resource when trying to get the "mouse" version of the genes.

I am not quite sure whether the problem lies in decoupler or in pypath. I've seen that similar issues have been raised in the past (saezlab/pypath#218), but none of the workarounds solved the error.

I've run into the error on Python 3.11 with the latest versions of pypath and decoupler-py installed from GitHub.

To Reproduce

import decoupler as dc
msigdb = dc.get_resource('MSigDB', organism = "mouse")
msigdb

Traceback:

>>> msigdb = dc.get_resource('MSigDB',organism = "mouse")                                                                                                                                                                        
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/decoupler/omnip.py", line 229, in get_resource
    df = translate_net(
         ^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/decoupler/omnip.py", line 586, in translate_net
    hom_net = homology.translate_df(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 1898, in translate_df
    return manager.translate_df(**args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 477, in translate_df
    table = self.which_table(
            ^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 178, in which_table
    self.load(key)
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 209, in load
    self.tables[key] = self._load(key)
                       ^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 219, in _load
    return ProteinHomology(
           ^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 712, in __init__
    self.load(source)
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 728, in load
    self.load_homologene(source)
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/homology.py", line 1203, in load_homologene
    mapping.map_name(e, 'entrez', 'uniprot', self.target)
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 3564, in map_name
    return mapper.map_name(
           ^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/share/common.py", line 2772, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 2197, in map_name
    mapped_names = self.uniprot_cleanup(
                   ^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 2226, in uniprot_cleanup
    uniprots = self.primary_uniprot(uniprots)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 2838, in primary_uniprot
    primary = self.map_name(
              ^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/share/common.py", line 2772, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 1982, in map_name
    mapped_names = self._map_name(
                   ^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 2514, in _map_name
    tbl = self.which_table(
          ^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 1575, in which_table
    self.load_mapping(
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 3212, in load_mapping
    reader = MapReader(param = resource, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 257, in __init__
    self.load()
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 287, in load
    self.read()
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 449, in read
    getattr(self, method)()
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/utils/mapping.py", line 560, in read_mapping_file
    for i, line in enumerate(infile):
  File "/omics/groups/OE0436/internal/heyer/conda/envs/decoupler_test/lib/python3.11/site-packages/pypath/inputs/uniprot.py", line 479, in get_uniprot_sec
    enumerate(c.result)
TypeError: 'NoneType' object is not iterable

Expected behavior
Expect array to be loaded with MSigDB data from given organism.

System

  • OS: CentOS 7 & MacOS 13.4.1
  • Python version 3.11.4
  • Versions of libraries involved

Let me know if you require any further information from me.

Additional context
Output of pip freeze:

(decoupler_test)$ pip freeze
anndata==0.9.1
attrs==23.1.0
bcrypt==4.0.1
beautifulsoup4==4.12.2
boltons==23.0.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
contourpy==1.1.0
cryptography==41.0.1
cycler==0.11.0
decoupler @ git+https://github.com/saezlab/decoupler-py@d768a29731377c7086c308d52004c4bf35014b43
dill==0.3.6
docrep==0.3.2
et-xmlfile==1.1.0
face==20.1.1
fonttools==4.40.0
future==0.18.3
glom==23.3.0
h5py==3.9.0
idna==3.4
inflect==6.0.4
kiwisolver==1.4.4
llvmlite==0.40.1
lxml==4.9.2
matplotlib==3.7.1
natsort==8.4.0
numba==0.57.1
numpy==1.24.4
omnipath==1.0.7
openpyxl==3.1.2
packaging==23.1
pandas==2.0.3
paramiko==3.2.0
Pillow==9.5.0
psutil==5.9.5
pycparser==2.21
pycurl==7.45.1
pydantic==1.10.9
PyNaCl==1.5.0
pyparsing==3.1.0
pypath-omnipath @ git+https://github.com/saezlab/pypath.git@d732a52f6a4ad30c0c56040d69d94c6583b06e42
pyreadr==0.4.7
pysftp==0.2.9
python-dateutil==2.8.2
pytz==2023.3
PyYAML==6.0
rdata==0.9
requests==2.31.0
scipy==1.11.1
six==1.16.0
soupsieve==2.4.1
sqlparse==0.4.4
tabulate==0.9.0
timeloop==1.0.2
toml==0.10.2
tqdm==4.65.0
typing_extensions==4.7.0
tzdata==2023.3
urllib3==2.0.3
wrapt==1.15.0
xarray==2023.6.0
xlrd==2.0.1
(decoupler_test) $ 

Are dc.run_consensus p-values valid?

I see that in the pseudobulk notebook dc.run_consensus is used twice without paying much attention to the pvals it generates. Are these p-values meaningless due to the method used to merge the results of the different methods, or why are they neglected in the tutorial?

Thanks!

ValueError: Invalid value `loops` for `InteractionsQuery`.

Running this line gives me the following error. I am on Ubuntu, with the latest version of decoupler (1.4.0) and the latest version of omnipath (1.0.7) installed.
On a side note, I didn't find the requirements file.
Thanks

net = dc.get_collectri(organism='human', split_complexes=False)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/decoupler/omnip.py", line 425, in get_collectri
    ct = op.interactions.CollecTRI.get(genesymbols=True, organism=_organism, loops=True, **kwargs)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/_core/requests/_utils.py", line 114, in wrapper
    return wrapped(*args, **kwargs)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/_core/requests/_utils.py", line 31, in _get_helper
    return cls()._get(**kwargs)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/_core/requests/_request.py", line 108, in _get
    kwargs = self._validate_params(kwargs)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/_core/requests/_request.py", line 195, in _validate_params
    res[self._query_type(k).param] = self._query_type(k)(v)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/_core/query/_query.py", line 136, in __call__
    return self.value(value)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/constants/_constants.py", line 51, in __call__
    return super().__call__(*args, **kw)
  File "/root/anaconda3/envs/py10/lib/python3.10/enum.py", line 385, in __call__
    return cls.__new__(cls, value)
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/constants/_constants.py", line 15, in wrapper
    raise e
  File "/root/anaconda3/envs/py10/lib/python3.10/site-packages/omnipath/constants/_constants.py", line 11, in wrapper
    return fun(*args, **kwargs)
  File "/root/anaconda3/envs/py10/lib/python3.10/enum.py", line 710, in __new__
    raise ve_exc
ValueError: Invalid value `loops` for `InteractionsQuery`.



UnboundLocalError in plot_psbulk_samples()

Description of the bug
Usage of plot_psbulk_samples() raises
UnboundLocalError: local variable 'fig' referenced before assignment

Full error

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
Cell In[6], line 21
     11 pdata = dc.get_pseudobulk(adata,
     12                       sample_col='patient',
     13                       groups_col=ct,
   (...)
     17                       min_counts=0
     18                      )
     19 print(pdata)
---> 21 dc.plot_psbulk_samples(pdata, groupby=ct, ax=axs[ii, 0])
     22 dc.plot_psbulk_samples(pdata, groupby='patient', ax=axs[ii, 1])
     23 dc.plot_psbulk_samples(pdata, groupby='condition.l3', ax=axs[ii, 2])

File ~/mambaforge/envs/scanpy/lib/python3.10/site-packages/decoupler/plotting.py:953, in plot_psbulk_samples(adata, groupby, figsize, dpi, ax, return_fig, save, **kwargs)
    950     ax.set_xlabel('Log10 number of cells')
    951     ax.set_ylabel('Log10 total sum of counts')
--> 953 save_plot(fig, ax, save)
    955 if return_fig:
    956     return fig

UnboundLocalError: local variable 'fig' referenced before assignment

To Reproduce
Use dc.plot_psbulk_samples(pdata, groupby='patient', ax=axs), i.e. plot into an existing figure.

Add pseudobulking by median

Hey,

currently you're creating pseudobulks by summing up counts (perfectly fine), but we need pseudobulks that take the median for an implementation of DIALOGUE. We're currently using a custom implementation, but I wouldn't be sad if your get_pseudobulk function had an option for that.

WDYT?
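In the meantime, a minimal sketch of median pseudobulking outside decoupler (assumes a dense .X and a 'sample_id' column in .obs; not the decoupler API):

import pandas as pd

# One median profile per sample.
df = pd.DataFrame(adata.X, index=adata.obs['sample_id'].values, columns=adata.var_names)
pb_median = df.groupby(level=0).median()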

Issue with `get_pseudobulk` on a large dataframe

Hi,

Thank you for the package; it looks very useful and I'm keen to try it on my data. I've executed the tutorial notebook, which, by the way, produces issues with get_pseudobulk if a simple pip install decoupler is used; installing the latest version from GitHub seems to fix that. Also, stat_res.lfc_shrink() refuses to work without coeff; perhaps some syntax has changed between pyDESeq2 versions? Fixing it to stat_res.lfc_shrink(coeff='disease_normal_vs_COVID-19') worked.

At any rate, I wanted to see if you'd have any advice on another issue. I am trying to analyse a very large dataset (440k cells, 43 cell types, 5 conditions). For whatever reason, get_pseudobulk on the raw matrix produces a much smaller data frame (the number of genes goes from 28k to about 8k), and no pseudobulk QC metrics are produced. Do you know what might cause this, and how can I debug it? Thank you in advance!
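One thing worth ruling out (a guess on my side): the gene filtering inside get_pseudobulk driven by min_prop and min_smpls, which could explain the drop from 28k to about 8k genes. Relaxing the thresholds should show whether that is the cause (column names here are illustrative):

# Disable the proportion/sample filters to see whether they drop the genes.
pdata = dc.get_pseudobulk(adata, sample_col='sample', groups_col='cell_type',
                          min_prop=0.0, min_smpls=0)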

run_ora using adata as the input

Describe the bug
Using an AnnData object as input, if there is an empty sample (i.e., the expression of the cell/sample is 0 for all genes), there will be an error when writing the results to adata.obsm, because the empty sample(s) are deleted.

dc.run_ora(
    mat=adata,
    net=msigdb,
    source='geneset',
    target='genesymbol',
    verbose=True,
    use_raw=False
)

To Reproduce
As described above.

Expected behavior
As described above.

System

  • OS: [e.g. macOS Ventura]
  • Python version [e.g. 3.9]

Additional context
By all means, an empty sample should not be present in the data. However, since decoupler actively checks for empty samples and genes and deletes them, for integrity it might be reasonable to return the repaired adata together with the results.
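A possible workaround until then (a hedged suggestion, not an official fix): drop empty observations before calling run_ora so the result rows line up with adata.obs:

import scanpy as sc

# Remove cells/samples with no detected genes so decoupler does not have to
# delete them internally.
sc.pp.filter_cells(adata, min_genes=1)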

Pypath error in species conversion of MSigDB

Tried to load the mouse version of msigdb with
dc.get_resource('MSigDB', organism = 'mouse')

also tried translating after loading human version
mouse_msigdb = dc.translate_net(msigdb, target_organism = 'mouse', unique_by = ('geneset', 'genesymbol'))

Both gives the following error:

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/decoupler/omnip.py:410, in translate_net(net, columns, source_organism, target_organism, id_type, unique_by, **kwargs)
    409 import pypath
--> 410 from pypath.utils import homology
    411 from pypath.share import common

File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/pypath/utils/homology.py:47, in <module>
     45 import pandas as pd
---> 47 import pypath.utils.mapping as mapping
     48 import pypath.share.common as common

File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/pypath/utils/mapping.py:72, in <module>
     71 import pypath.resources.urls as urls
---> 72 import pypath.share.curl as curl
     73 import pypath.inputs as inputs

File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/pypath/share/curl.py:49, in <module>
     47 _logger = session_mod.get_log()
---> 49 import pycurl
     50 try:

ImportError: pycurl: libcurl link-time ssl backends (secure-transport, openssl) do not include compile-time ssl backend (none/other)

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Input In [2], in <cell line: 1>()
----> 1 msigdb = dc.get_resource('MSigDB', organism = 'mouse')
      2 msigdb

File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/decoupler/omnip.py:198, in get_resource(name, organism)
    194 df = df.drop(columns=['record_id', 'uniprot', 'entity_type', 'source'])
    196 if not _is_human(organism):
--> 198     df = translate_net(
    199         df,
    200         target_organism=organism,
    201         columns='genesymbol',
    202         unique_by=None,
    203     )
    205 return df

File ~/miniconda3/envs/decoupler/lib/python3.9/site-packages/decoupler/omnip.py:426, in translate_net(net, columns, source_organism, target_organism, id_type, unique_by, **kwargs)
    419         raise RuntimeError(
    420             'The installed version of pypath-omnipath is too old, '
    421             f'the oldest compatible version is {PYPATH_MIN_VERSION}.'
    422         )
    424 except Exception:
--> 426     raise ImportError(
    427         'pypath-omnipath is not installed. Please install it with: '
    428         'pip install git+https://github.com/saezlab/pypath.git'
    429     )
    431 _source_organism = taxonomy.ensure_ncbi_tax_id(source_organism)
    432 _target_organism = taxonomy.ensure_ncbi_tax_id(target_organism)

ImportError: pypath-omnipath is not installed. Please install it with: pip install git+https://github.com/saezlab/pypath.git

Uninstalling and reinstalling pypath-omnipath via GitHub doesn't help.

Loading mouse dorothea and progeny works fine!

Can you please have a look @deeenes?
Thank you

module 'sklearn' has no attribute 'tree'

Describe the bug
The UDT method doesn't run properly, which I believe is due to sklearn API changes; here I used scikit-learn 1.1.2.

Traceback (most recent call last):
  File "./decoupleR.py", line 55, in <module>
    # estimate = dc.run_udt(mat=expr,net=net)
  File "/data/salomonis-archive/FASTQs/NCI-R01/ENCODE_KD_fastq_frank/decoupler_env/lib/python3.8/site-packages/decoupler/method_udt.py", line 105, in run_udt
    estimate = udt(m.A, net, min_leaf=min_leaf, seed=seed, verbose=verbose)
  File "/data/salomonis-archive/FASTQs/NCI-R01/ENCODE_KD_fastq_frank/decoupler_env/lib/python3.8/site-packages/decoupler/method_udt.py", line 44, in udt
    acts[i, j] = fit_dt(sk, net[:, j], mat[i], min_leaf=min_leaf, seed=seed)
  File "/data/salomonis-archive/FASTQs/NCI-R01/ENCODE_KD_fastq_frank/decoupler_env/lib/python3.8/site-packages/decoupler/method_udt.py", line 26, in fit_dt
    regr = sk.tree.DecisionTreeRegressor(min_samples_leaf=min_leaf, random_state=seed)
AttributeError: module 'sklearn' has no attribute 'tree'

dc.run_gsva Error

dc.run_gsva(mat=dc_data, net=MSigDB, source='geneset', target='genesymbol', verbose=True,use_raw=True)

AttributeError                            Traceback (most recent call last)
Input In [76], in <cell line: 3>()
      1 dc_data = sce.copy()
----> 3 dc.run_gsva(mat=dc_data, net=MSigDB, source='geneset', target='genesymbol', verbose=True,use_raw=True)

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/decoupler/method_gsva.py:205, in run_gsva(mat, net, source, target, kcdf, mx_diff, abs_rnk, min_n, seed, verbose, use_raw)
    158 """
    159 Gene Set Variation Analysis (GSVA).
    160 
   (...)
    201     GSVA scores. Stored in `.obsm['gsva_estimate']` if `mat` is AnnData.
    202 """
    204 # Extract sparse matrix and array of genes
--> 205 m, r, c = extract(mat, use_raw=use_raw, verbose=verbose)
    207 # Transform net
    208 net = rename_net(net, source=source, target=target, weight=None)

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/decoupler/pre.py:65, in extract(mat, use_raw, verbose, dtype)
     63     n_empty_samples = m.shape[0]
     64 else:
---> 65     msk = np.sum(m != 0, axis=0).A1 >= 3
     66     n_empty_samples = m.shape[0] - 3
     68 n_empty_features = np.sum(~msk)

AttributeError: 'numpy.ndarray' object has no attribute 'A1'

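The traceback suggests extract() assumes a scipy sparse matrix (.A1 is a sparse-matrix attribute). A possible workaround (an assumption, not a confirmed fix): sparsify .X and point run_gsva at it instead of .raw:

from scipy.sparse import csr_matrix

# Sparsify the dense matrix that extract() chokes on; with use_raw=True the
# matrix comes from .raw, so convert .X and pass use_raw=False instead.
dc_data.X = csr_matrix(dc_data.X)
dc.run_gsva(mat=dc_data, net=MSigDB, source='geneset', target='genesymbol',
            verbose=True, use_raw=False)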

Mouse genes

Hello,

I am trying to analyse a dataset of mouse cells and would like to perform over-representation analyses and trajectory inferences with decoupler. However, I can't download resources for mouse (only PROGENy), and the functions complain that the gene identifiers are not the same.
Is there a way to use these functions with mouse data?

Thanks a lot

Error importing decoupler library in python

Hello, I installed the decoupler library (v1.4.0) in a Python environment: Python v3.9, numpy v1.24.3.

The installation was successful, but when I try to import the library (import decoupler as dc) I get the error: AttributeError: module 'numpy' has no attribute 'long'

I think this is an issue with the numpy version, from what I found in this post: https://levelup.gitconnected.com/fix-attributeerror-module-numpy-has-no-attribute-float-d7d68c5a4971.

Would there be an update of the numpy code found in decoupler? I need to use a version of numpy >1.2, so downgrading numpy would not be the best solution for me.

Thank you!
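Two hedged options, since np.long was removed in numpy 1.24: upgrade decoupler in case a newer release has dropped the alias, or pin numpy just below 1.24:

pip install --upgrade decoupler

pip install "numpy<1.24"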

PanglaoDB

Hi,

I was testing decoupler-py on my AnnData object following the tutorial and I got this error:

markers = dc.get_resource('PanglaoDB')
markers

"PanglaoDB is not a valid resource. Please, run decoupler.show_resources to see the list of available resources."

How can I use the information from PanglaoDB with decoupler?

Best regards

run_ora

I was also wondering how, in the run_ora function, the top 5% most expressed genes are chosen. Thanks again!

global regulation

Hi everyone

Great to see that the package gets some recognition.

I have a theoretical question regarding the TFA inference methods. Do you have any experience or recommendations on whether/how to deal with potential global regulation (e.g. growth-rate dependence of RNA-seq data)?

Is this usually not a problem for TPM-normalized data/your method compendium or do you account for it during pre-/postprocessing?

Thanks a load!

PanglaoDB tutorial doesn't work

Dear Decoupler Team,
I was excited to use the method in the notebook Cell type annotation from marker genes on my data. However, it didn't work; I got the following error:
PanglaoDB is not a valid resource. Please, run decoupler.show_resources to see the list of available resources.
On the OmniPath website it still lists PanglaoDB. I have the newest PyPI versions of omnipath and decoupler installed.

SSL error

Thanks for the great tool. I'm running into an SSL error when trying to query progeny:

progeny = dc.get_progeny(top=300)

Gives:
SSLError: HTTPSConnectionPool(host='omnipathdb.org', port=443): Max retries exceeded with url: /annotations?format=tsv&resources=PROGENy (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)')))

Any idea how to fix this?

ChunkedEncodingError in downloading msigDB

Hi,
while trying to download the msigdb with decoupler:

msigdb = dc.get_resource('MSigDB',organism='mouse')
msigdb
254M/? [00:07<00:00, 38.9MB/s]

I meet this error:

---------------------------------------------------------------------------
IncompleteRead                            Traceback (most recent call last)
File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:705, in HTTPResponse._error_catcher(self)
    704 try:
--> 705     yield
    707 except SocketTimeout as e:
    708     # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
    709     # there is yet no clean way to get at it from this context.

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:1075, in HTTPResponse.read_chunked(self, amt, decode_content)
   1074     break
-> 1075 chunk = self._handle_chunk(amt)
   1076 decoded = self._decode(
   1077     chunk, decode_content=decode_content, flush_decoder=False
   1078 )

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:1026, in HTTPResponse._handle_chunk(self, amt)
   1025 else:  # amt > self.chunk_left
-> 1026     returned_chunk = self._fp._safe_read(self.chunk_left)  # type: ignore[union-attr]
   1027     self._fp._safe_read(2)  # type: ignore[union-attr] # Toss the CRLF at the end of the chunk.

File ~/miniconda3/envs/pydeseq2/lib/python3.10/http/client.py:633, in HTTPResponse._safe_read(self, amt)
    632 if len(data) < amt:
--> 633     raise IncompleteRead(data, amt-len(data))
    634 return data

IncompleteRead: IncompleteRead(7623 bytes read, 557 more expected)

The above exception was the direct cause of the following exception:

ProtocolError                             Traceback (most recent call last)
File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
    815 try:
--> 816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:932, in HTTPResponse.stream(self, amt, decode_content)
    931 if self.chunked and self.supports_chunked_reads():
--> 932     yield from self.read_chunked(amt, decode_content=decode_content)
    933 else:

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:1060, in HTTPResponse.read_chunked(self, amt, decode_content)
   1055     raise BodyNotHttplibCompatible(
   1056         "Body should be http.client.HTTPResponse like. "
   1057         "It should have have an fp attribute which returns raw chunks."
   1058     )
-> 1060 with self._error_catcher():
   1061     # Don't bother reading the body of a HEAD request.
   1062     if self._original_response and is_response_to_head(self._original_response):

File ~/miniconda3/envs/pydeseq2/lib/python3.10/contextlib.py:153, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
    152 try:
--> 153     self.gen.throw(typ, value, traceback)
    154 except StopIteration as exc:
    155     # Suppress StopIteration *unless* it's the same exception that
    156     # was passed to throw().  This prevents a StopIteration
    157     # raised inside the "with" statement from being suppressed.

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/urllib3/response.py:722, in HTTPResponse._error_catcher(self)
    720 except (HTTPException, OSError) as e:
    721     # This includes IncompleteRead.
--> 722     raise ProtocolError(f"Connection broken: {e!r}", e) from e
    724 # If no exception is thrown, we should avoid cleaning up
    725 # unnecessarily.

ProtocolError: ('Connection broken: IncompleteRead(7623 bytes read, 557 more expected)', IncompleteRead(7623 bytes read, 557 more expected))

During handling of the above exception, another exception occurred:

ChunkedEncodingError                      Traceback (most recent call last)
Cell In[34], line 1
----> 1 msigdb = dc.get_resource('MSigDB',organism='mouse')
      2 msigdb

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/decoupler/omnip.py:209, in get_resource(name, organism, **kwargs)
    205 assert name in resources, msg
    207 op = _check_if_omnipath()
--> 209 df = op.requests.Annotations.get(
    210     resources=name,
    211     entity_type='protein',
    212     **kwargs
    213 )
    214 df = df.set_index([
    215     'record_id', 'uniprot',
    216     'genesymbol', 'entity_type',
    217     'source', 'label',
    218 ])
    219 df = df.unstack('label').droplevel(axis=1, level=0)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/omnipath/_core/requests/_utils.py:114, in _inject_api_method.<locals>.wrapper(wrapped, _instance, args, kwargs)
    112 @wrapt.decorator(adapter=wrapt.adapter_factory(argspec_factory))
    113 def wrapper(wrapped, _instance, args, kwargs):
--> 114     return wrapped(*args, **kwargs)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/omnipath/_core/requests/_annotations.py:123, in Annotations.get(cls, proteins, resources, force_full_download, wide, **kwargs)
    109     return pd.concat(
    110         [
    111             inst._get(
   (...)
    118         ]
    119     )
    121 logging.info(f"Downloading annotations for all proteins from {res_info}")
--> 123 return inst._get(proteins=None, resources=resources, **kwargs)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/omnipath/_core/requests/_request.py:112, in OmnipathRequestABC._get(self, **kwargs)
    109 kwargs = self._finalize_params(kwargs)
    110 self._last_param["final"] = kwargs.copy()
--> 112 res = self._downloader.maybe_download(
    113     self._query_type.endpoint, params=kwargs, callback=callback, is_final=False
    114 )
    116 if self._downloader._options.convert_dtypes:
    117     res = self._convert_dtypes(res)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/omnipath/_core/downloader/_downloader.py:127, in Downloader.maybe_download(self, url, callback, params, cache, is_final, **_)
    125     res = self._options.cache[key]
    126 else:
--> 127     res = callback(self._download(req))
    128     if cache:
    129         logging.debug(f"Caching result to `{self._options.cache}[{key!r}]`")

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/omnipath/_core/downloader/_downloader.py:167, in Downloader._download(self, req)
    157 total = resp.headers.get("content-length", None)
    159 with tqdm(
    160     unit="B",
    161     unit_scale=True,
   (...)
    165     disable=not self._options.progress_bar,
    166 ) as t:
--> 167     for chunk in resp.iter_content(chunk_size=self._options.chunk_size):
    168         t.update(len(chunk))
    169         handle.write(chunk)

File ~/miniconda3/envs/pydeseq2/lib/python3.10/site-packages/requests/models.py:818, in Response.iter_content.<locals>.generate()
    816     yield from self.raw.stream(chunk_size, decode_content=True)
    817 except ProtocolError as e:
--> 818     raise ChunkedEncodingError(e)
    819 except DecodeError as e:
    820     raise ContentDecodingError(e)

ChunkedEncodingError: ('Connection broken: IncompleteRead(7623 bytes read, 557 more expected)', IncompleteRead(7623 bytes read, 557 more expected))

Is this related to some local setting on my network/machine, or does it depend on MSigDB?
Thanks again for taking the time to reply.
Best

plot_psbulk_samples does not exist

Describe the bug
module 'decoupler' has no attribute 'plot_psbulk_samples', though it is part of the package

To Reproduce
run your pipeline using the latest decoupler installation

Your own data

Expected behavior
plot as provided in pipeline

System
Google Colab
Additional context
NA

Pseudobulk improvements

As decoupler appears to become the go-to method of scverse to generate pseudobulk (e.g. theislab/single-cell-best-practices#141), I suggest a few improvements to the pseudobulk function in decoupler:

  1. documentation

    I feel a few points could be documented better:

    • How are the cutoffs defined, e.g. does min_prop mean one group needs to have more expressing cells than this fraction, or all cells (same applies for the other filtering attributes)?
    • How were the default values chosen, and are there recommendations to choose other values for certain use-cases? To me, 0.2 seems a bit conservative for general usage, but maybe you had reasons for this value.
  2. allow to specify a callback function for mode

    Taking a callback in addition to sum/mean/median would allow maximum flexibility.
    A use case that comes to mind is computing the fraction of cells expressing a gene. This could be used to make dot plots of pseudobulk (mode = lambda x: np.sum(x >= 1) / len(x)); see the sketch after this list.

    An extension of this idea is to allow passing a Dict[str, Callable] to allow different aggregations into different layers, e.g.

    dc.get_pseudobulk(adata, ..., mode = {
        'counts': np.sum, 
        'fractions': lambda x: np.sum(x >= 1) / len(x)
    })
  3. allow to group by multiple columns

    I commonly need pseudobulk by sample, condition and cell-type. It would be great if groups_col could take multiple keys, e.g.

     pb = dc.get_pseudobulk(adata, sample_col="patient", groups_col=["cell_type", "condition"])
  4. include stats in .obs: n_cells

    It would be nice to generate an additional column in obs that stores how many cells were included into the pseudobulk sample. This may be useful as an additional covariate when performing differential expression analysis.

CC @Zethson
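A sketch of what the callback in point 2 could compute today, outside decoupler (assumes a dense .X and the obs column names used above):

import numpy as np
import pandas as pd

# Fraction of cells expressing each gene per (sample, cell type) group,
# i.e. the proposed mode = lambda x: np.sum(x >= 1) / len(x).
df = pd.DataFrame(adata.X, columns=adata.var_names)
df['group'] = (adata.obs['patient'].astype(str) + '_'
               + adata.obs['cell_type'].astype(str)).values
fractions = df.groupby('group').agg(lambda x: np.sum(x >= 1) / len(x))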

MSigDB Loading Error

msigdb = dc.get_resource('MSigDB')


MaxRetryError Traceback (most recent call last)
File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/requests/adapters.py:440, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
439 if not chunked:
--> 440 resp = conn.urlopen(
441 method=request.method,
442 url=url,
443 body=request.body,
444 headers=request.headers,
445 redirect=False,
446 assert_same_host=False,
447 preload_content=False,
448 decode_content=False,
449 retries=self.max_retries,
450 timeout=timeout
451 )
453 # Send the request.
454 else:

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/urllib3/connectionpool.py:876, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
875 log.debug("Retry: %s", url)
--> 876 return self.urlopen(
877 method,
878 url,
879 body,
880 headers,
881 retries=retries,
882 redirect=redirect,
883 assert_same_host=assert_same_host,
884 timeout=timeout,
885 pool_timeout=pool_timeout,
886 release_conn=release_conn,
887 chunked=chunked,
888 body_pos=body_pos,
889 **response_kw
890 )
892 return response

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/urllib3/connectionpool.py:866, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
865 try:
--> 866 retries = retries.increment(method, url, response=response, _pool=self)
867 except MaxRetryError:

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='omnipathdb.org', port=443): Max retries exceeded with url: /annotations?entity_types=protein&format=tsv&resources=MSigDB (Caused by ResponseError('too many 500 error responses'))

During handling of the above exception, another exception occurred:

RetryError Traceback (most recent call last)
Input In [37], in <cell line: 1>()
----> 1 msigdb = dc.get_resource('MSigDB')
2 msigdb

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/decoupler/omnip.py:82, in get_resource(name)
80 op = check_if_omnipath()
---> 82 df = op.requests.Annotations.get(resources=name, entity_type="protein")

(intermediate omnipath and requests frames omitted)

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/requests/adapters.py:510, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
509 if isinstance(e.reason, ResponseError):
--> 510 raise RetryError(e, request=request)

RetryError: HTTPSConnectionPool(host='omnipathdb.org', port=443): Max retries exceeded with url: /annotations?entity_types=protein&format=tsv&resources=MSigDB (Caused by ResponseError('too many 500 error responses'))
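
The 500 responses come from the omnipathdb.org server, so the failure is usually transient rather than a client-side bug. A minimal retry wrapper, sketched below under that assumption (get_resource_with_retry is a hypothetical helper, not decoupler API), is often enough:

import time
import decoupler as dc

def get_resource_with_retry(name, tries=5, wait=30):
    # retry the download a few times, sleeping between attempts
    for i in range(tries):
        try:
            return dc.get_resource(name)
        except Exception as err:  # requests.RetryError and friends
            print(f'Attempt {i + 1}/{tries} failed: {err}')
            time.sleep(wait)
    raise RuntimeError(f'Could not download {name} after {tries} attempts')

msigdb = get_resource_with_retry('MSigDB')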

We cannot download the PROGENy database in decoupler-py.

Dear authors:
When I tried to use this function to download the PROGENy database, for example:

model = dc.get_progeny(organism='human', top=100)
model

It failed, and the error is:

RetryError: HTTPSConnectionPool(host='omnipathdb.org', port=443): Max retries exceeded with url: /annotations?format=tsv&resources=PROGENy (Caused by ResponseError('too many 500 error responses'))
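
As a quick sanity check, one can query the endpoint directly; a 5xx status confirms the problem is on the server side. A sketch using plain requests:

import requests

r = requests.get(
    'https://omnipathdb.org/annotations',
    params={'format': 'tsv', 'resources': 'PROGENy'},
    timeout=60,
)
print(r.status_code)  # 200 means the server is reachable again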

run_mdt: module has no attribute

Hi decoupler enthusiasts,

I am playing around with decoupler version 1.1.0 and noticed an issue with mdt: I cannot run it, neither via decouple nor via run_mdt. Output: AttributeError: "module 'decoupler' has no attribute 'run_mdt'".
I also quickly checked show_methods(), where run_mdt was not listed either.
The module exists in the package folder, though.

Cheers
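
A quick way to see what the installed package actually exposes is sketched below; one plausible explanation (an assumption, not confirmed here) is that run_mdt is only exported when its optional dependencies import cleanly.

import decoupler as dc

print(dc.__version__)                                # installed version
print([m for m in dir(dc) if m.startswith('run_')])  # run_* methods actually exported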

rank_sources_groups

Thank you in advance for taking the time to look at this and also for a great tool.

To Reproduce
Hello, I am going through the steps here to annotate my own data. The steps up to this point are exactly the same as in the tutorial.

https://decoupler-py.readthedocs.io/en/latest/notebooks/cell_annotation.html

And I run into issues with the following line:
df = dc.rank_sources_groups(acts, groupby='leiden', reference='rest', method='t-test_overestim_var')

Error:

AttributeError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 df = dc.rank_sources_groups(acts, groupby='leiden', reference='rest', method='t-test_overestim_var')
2 df

AttributeError: module 'decoupler' has no attribute 'rank_sources_groups'

My version of decoupler is 1.4.0

Thank you again, and sorry if this is a very basic question.
Carmen
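
For what it's worth, a common cause of this symptom is that the kernel imports a different (older) decoupler than the one pip reports. A quick sketch to check from inside the same session:

import decoupler as dc

print(dc.__version__)                      # version the kernel actually sees
print(dc.__file__)                         # where it was imported from
print(hasattr(dc, 'rank_sources_groups'))  # False on older releases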

run_mlm function process

Hi! I was wondering if you could help me understand how the multivariate linear model works for pathway activity inference. I understand that the independent variables are the interaction weights and the dependent variable is the gene expression, but if someone could write a step-by-step description of the function's process from input to output, that would be greatly appreciated! :)
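
To make the idea concrete, here is a simplified numpy sketch of that kind of model, under the assumption (my reading, not decoupler's actual code) that for each sample the expression of all genes is regressed on the gene-by-source weight matrix, and the t-values of the fitted coefficients are reported as activities:

import numpy as np

def mlm_one_sample(y, W):
    # y: expression of all genes in one sample, shape (n_genes,)
    # W: net weights, shape (n_genes, n_sources); zero where a gene is not a target
    X = np.column_stack([np.ones(len(y)), W])          # add an intercept column
    coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares fit
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]                      # residual degrees of freedom
    sigma2 = resid @ resid / dof                       # residual variance
    se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * sigma2)
    return (coef / se)[1:]                             # t-values, intercept dropped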

dc.run_mlm() and dc.get_gsea_df()

Describe your question
Hi, thanks for the great tool, I have several questions:

  1. How should one choose the number of top terms in dc.get_progeny()? I found that setting top=100 versus top=500 gives different results.
  2. For dc.run_mlm() to get the pathway activity for each cell (cell level) with subcluster annotation:
    i. Is it reasonable to just compare mlm_estimate across subclusters and plot the mean value per subcluster in a matrixplot? I found that the mean/median of some pathways has non-significant mlm_pvals.
    ii. Is it a better choice to treat subclusters as conditions (e.g., set subcluster A to condition A and all other subclusters to condition B) and run dc.run_mlm() at the pseudo-bulk level? I find the result is very close to the cell level.
  3. For dc.get_gsea_df() and dc.run_gsea():
    i. I cannot run dc.run_gsea() for each cell; I get a ZeroDivisionError.
    ii. For pseudo-bulk, I have two conditions (healthy and disease) and each condition has two time points (before and after treatment). I want the difference between before and after treatment across the two conditions. If I don't filter with dc.filter_by_expr(pdata, group='time_point', min_count=10, min_total_count=15) for each group, dc.get_gsea_df() also errors. But if I filter each group separately, the backgrounds of the two conditions will differ; is that acceptable (see the sketch after this list)?
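
On the last point, one way to keep a shared background (a sketch of a workaround, not an official recommendation) is to compute the expressed-gene mask once on the full pseudobulk object and subset before splitting into conditions:

import decoupler as dc

# compute the mask once across all samples so both conditions share the same genes
genes = dc.filter_by_expr(pdata, group='time_point', min_count=10, min_total_count=15)
pdata = pdata[:, genes].copy()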

Confidence levels

I have observed that neither the R nor the Python version of decoupler provides the regulons with confidence level E.
Why is this so?

The only way to retrieve these regulons is through the R dorothea library.

On the other hand, when using a Python version earlier than 3.8, the function summarize_acts() yields the same value for the whole row. I think this value corresponds to the mean of the means of all regulons within each row.

summarize_acts Error

I ran

dc.run_gsva(mat=dc_data, net=MSigDB, source='geneset', target='genesymbol', verbose=True, use_raw=False)
mean_enr = dc.summarize_acts(dc_data, groupby='leiden', min_std=1)

But I get:

Input In [45], in <cell line: 1>()
----> 1 mean_enr = dc.summarize_acts(dc_data, groupby='leiden', min_std=1)

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/decoupler/utils.py:289, in summarize_acts(acts, groupby, obs, var, mode, min_std)
    287 msk = obs == groups[i]
    288 if mode == 'mean':
--> 289     summary[i] = np.mean(acts[msk], axis=0, where=np.isfinite(acts[msk]))
    290 elif mode == 'median':
    291     summary[i] = np.median(acts[msk], axis=0)

File <__array_function__ internals>:5, in mean(*args, **kwargs)

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/numpy/core/fromnumeric.py:3438, in mean(a, axis, dtype, out, keepdims, where)
   3436         pass
   3437     else:
-> 3438         return mean(axis=axis, dtype=dtype, out=out, **kwargs)
   3440 return _methods._mean(a, axis=axis, dtype=dtype,
   3441                       out=out, **kwargs)

TypeError: mean() got an unexpected keyword argument 'where'
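
The where keyword of np.mean that summarize_acts() relies on was added in NumPy 1.20, so this TypeError points to an older NumPy rather than to decoupler itself. A quick check:

import numpy as np

print(np.__version__)  # anything below 1.20 lacks np.mean(..., where=...)
# if needed: pip install --upgrade "numpy>=1.20"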

TypingError Decoupler Import

Dear Decoupler team,

I followed the PerMedCoE hands-on course on Tuesday and I'm trying to apply the analysis to my data (spatial transcriptomics), following the Google Colab file.

I managed to create the 'signalling' environment and I believe I installed all the packages in Jupyter. However, I get a TypingError when importing decoupler in Python and can't figure out what's wrong (see below).

Any assistance would be very welcome.

Have a great day,

Johanna

My code:

!pip install cvxpy==1.3.1 cylp==0.91.5 gurobipy==10.0.1
!pip install pypath-omnipath
!pip install omnipath
!pip install decoupler
!pip install corneto-0.9.1a0-py3-none-any.whl
!pip install --upgrade numba

import decoupler as dc

Error message:

---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-66-3ed1422ba4b9> in <module>
----> 1 import decoupler as dc

~/opt/anaconda3/lib/python3.7/site-packages/decoupler/__init__.py in <module>
     15 from .method_ora import run_ora, test1r, get_ora_df  # noqa: F401
---> 16 from .method_gsva import run_gsva  # noqa: F401
     17 from .method_gsea import run_gsea  # noqa: F401

~/opt/anaconda3/lib/python3.7/site-packages/decoupler/method_gsva.py in <module>
---> 82 @nb.njit(nb.types.Tuple((nb.f4[:, :], nb.i8[:, :]))(nb.f4[:, :]), parallel=True, cache=True)
     83 def nb_get_D_I(mat):
     84     n = mat.shape[1]

(numba compiler frames omitted for brevity)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Invalid use of Function(<built-in function arange>) with argument(s) of type(s): (dtype=class(float32), start=int64, step=Literal[int](-1), stop=Literal[int](0))
 * parameterized
In definition 0:
    AssertionError: 
    raised from /Users/johannavogenstahl/opt/anaconda3/lib/python3.7/site-packages/numba/typing/npydecl.py:631
In definition 1:
    AssertionError: 
    raised from /Users/johannavogenstahl/opt/anaconda3/lib/python3.7/site-packages/numba/typing/npydecl.py:631
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<built-in function arange>)
[2] During: typing of call at /Users/johannavogenstahl/opt/anaconda3/lib/python3.7/site-packages/decoupler/method_gsva.py (85)


File "../../../opt/anaconda3/lib/python3.7/site-packages/decoupler/method_gsva.py", line 85:
def nb_get_D_I(mat):
    <source elided>
    n = mat.shape[1]
    rev_idx = np.abs(np.arange(start=n, stop=0, step=-1, dtype=nb.f4) - n / 2)
    ^
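
The frames go through numba/decorators.py, a module layout that only very old Numba releases have, which suggests the --upgrade numba step did not reach this Python 3.7 environment. A check from the same kernel (sketch):

import sys
import numba

print(sys.version)        # the interpreter the notebook actually uses
print(numba.__version__)  # this old release cannot type np.arange with dtype=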

Pseudobulk output AnnData retains only index in .var

Hi,
in the latest decoupler version 1.4.0 I noticed that the output AnnData of get_pseudobulk() is missing any information that was originally stored in the .var slot.

For example:

import scanpy as sc
import decoupler as dc

adata = sc.datasets.pbmc3k()

adata.obs["sample_id"] = "1"
adata.obs["cell_type"] = "all"
adata.layers['counts'] = adata.X.copy()

pdata = dc.get_pseudobulk(
    adata,
    sample_col="sample_id",
    groups_col="cell_type",
    layer="counts",
    mode="sum",
    min_cells=0,
    min_counts=0,
)

adata.var.columns
pdata.var.columns

While the original adata.var contains the index and a column gene_ids, pdata.var only keeps the index. Consequently, I have to reinsert any information I might want downstream, such as gene metadata from a GTF file.

I think the default behaviour should be to keep everything in .var.
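
Until that changes, a one-line workaround (sketch) is to copy the original gene annotations back by aligning on the gene index:

# realign the original .var annotations onto the pseudobulk object
pdata.var = pdata.var.join(adata.var, how='left')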

CytoSig resource providing incorrect data

Describe the bug
It's unclear what the CytoSig resource is outputting. For example,

res = dc.get_resource('CytoSig')
res

Provides a DataFrame with this info:

[screenshot of the returned DataFrame]

which has 36 unique cytokines.

Meanwhile, going to the CytoSig repo and downloading their data

pd.read_csv("https://raw.githubusercontent.com/data2intelligence/CytoSig/master/CytoSig/signature.centroid", sep="\t")

provides 43 columns (cytokines) with completely different names than the resource above, but names that make a lot more sense.

[screenshot of the upstream signature matrix]
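
To make the discrepancy concrete, a small sketch comparing the two name sets (the column name holding the cytokine in the decoupler resource is a guess here):

import decoupler as dc
import pandas as pd

res = dc.get_resource('CytoSig')
upstream = pd.read_csv(
    'https://raw.githubusercontent.com/data2intelligence/CytoSig/master/CytoSig/signature.centroid',
    sep='\t', index_col=0,
)
print(sorted(res['cytokine'].unique()))  # 'cytokine' column name is hypothetical
print(sorted(upstream.columns))          # 43 upstream signature names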
