saezlab / liana-py Goto Github PK

View Code? Open in Web Editor NEW

119.0 3.0 13.0 225.42 MB

LIANA+: an all-in-one framework for cell-cell communication

Home Page: http://liana-py.readthedocs.io/

License: GNU General Public License v3.0

Python 100.00%

cell-cell-communication ligand-receptor python single-cell single-cell-rna-seq spatial spatialomics

liana-py's Introduction

LIANA+: an all-in-one cell-cell communication framework

LIANA+ is a scalable framework that integrates and extends existing methods and knowledge to study cell-cell communication in single-cell, spatially-resolved, and multi-modal omics data. It is part of the scverse ecosystem, and relies on AnnData & MuData objects as input.

Development & Contributions

We welcome suggestions, ideas, and contributions! Please use do not hesitate to contact us, or use the issues or the LIANA+ Development project to make suggestions.

Tutorials

A set of extensive tutorials can be found in the LIANA+ documentation.

API

For further information please check LIANA's API documentation.

Cite LIANA+:

Dimitrov D., Schäfer P.S.L, Farr E., Rodriguez Mier P., Lobentanzer S., Dugourd A., Tanevski J., Ramirez Flores R.O. and Saez-Rodriguez J. 2023 LIANA+: an all-in-one cell-cell communication framework. BioRxiv. https://www.biorxiv.org/content/10.1101/2023.08.19.553863v1

Dimitrov, D., Türei, D., Garrido-Rodriguez M., Burmedi P.L., Nagai, J.S., Boys, C., Flores, R.O.R., Kim, H., Szalai, B., Costa, I.G., Valdeolivas, A., Dugourd, A. and Saez-Rodriguez, J. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat Commun 13, 3224 (2022). https://doi.org/10.1038/s41467-022-30755-0

Similarly, please consider citing any of the methods and/or resources implemented in liana, that were particularly relevant for your research!

liana-py's People

Contributors

Stargazers

Watchers

Forkers

eliasfarr dnjst demian1 schae211 driesschaumont pablormier svanuytven paubadiam baldanmatt complexdata gomez-dan soniyalama

liana-py's Issues

MuData & Decoupler-py as dependencies

Should include MuData to setup.py (currently liana's installation will complain).

Should also consider adding decoupler-py>=1.4.0 as it's used throughout most tutorials (and some internal functions depend on it).

liana-py on other species

Hi, liana team
glad to see python version of liana. and it is very helpful to my project.
can you give a tutorial on how to run liana-py on other species like mouse?
It seemed that liana R version moved further than the python version.
Thank you.

Wulin

KeyError: missing elements

Hi! I have tried to use the method rank_aggregate() and cellphonedb() in Liana. Both have the below issue. I checked my data as shown below. Wonder how this key error always happens. Thank you!

Run rank_aggregate

li.mt.rank_aggregate(adata, groupby='celltype.lv1', expr_prop=0.1, verbose=True, key_added='cpdb_res')

run cellphonedb

cellphonedb(adata, groupby='celltype.lv1', expr_prop=0.1, use_raw=False, return_all_lrs=True, verbose=True)

File ~/anaconda3/envs/scenicplus/lib/python3.8/site-packages/liana/method/_pipe_utils/_pre.py:57, in assert_covered(subset, superset, subset_name, superset_name, prop_missing_allowed, verbose)
50 if prop_missing > prop_missing_allowed:
51 msg = (
52 f"Please check if appropriate organism/ID type was provided! "
53 f"Allowed proportion ({prop_missing_allowed}) of missing "
54 f"{subset_name} elements exceeded ({prop_missing:.2f}). "
55 f"Too few features from the resource were found in the data."
56 )
---> 57 raise ValueError(msg + f" [{x_missing}] missing from {superset_name}")
59 if verbose & (prop_missing > 0):
60 print(f"{prop_missing:.2f} of entities in the resource are missing from the data.")

ValueError: Please check if appropriate organism/ID type was provided! Allowed proportion (0.98) of missing resource elements exceeded (1.00). Too few features from the resource were found in the data. [A1BG, A2M, AANAT, ABCA1, ACE, ACKR1, ACKR2, ACKR3, ACKR4, ACTR2, ACVR1, ACVR1B, ACVR1C, ACVR2A, ACVR2B, ACVRL1, ADA, ADAM10, ADAM11, ADAM12, ADAM15, ADAM17, ADAM2, ADAM22, ADAM23, ADAM28, ADAM29, ADAM7, ADAM9, ADAMTS3, ADCY1, ADCY7, ADCY8, ADCY9 ....] missing from var_names

print(adata)
AnnData object with n_obs × n_vars = 24062 × 2000
obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'percent.mt', 'time_point', 'RNA_snn_res.0.5', 'seurat_clusters', 'celltype.lv1', 'celltype.lv2', 'time_month', 'scDblFinder.class', 'scDblFinder.score', 'celltype.lv3'
var: 'vst.mean', 'vst.variance', 'vst.variance.expected', 'vst.variance.standardized', 'vst.variable'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'

adata.X, adata.raw.X
(array([[-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683,
2.08207319, -0.05807534],
[ 0.73692432, -0.34369911, -0.19467216, ..., 2.23740636,
1.67791563, -0.05807534],
[-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683,
-0.45570303, -0.05807534],
...,
[-0.32665763, -0.34369911, -0.19467216, ..., 2.89027397,
-0.45570303, -0.05807534],
[ 5.5336266 , -0.34369911, -0.19467216, ..., -0.48180683,
-0.45570303, -0.05807534],
[-0.32665763, -0.34369911, -0.19467216, ..., -0.48180683,
2.08376311, -0.05807534]]),
<24062x32285 sparse matrix of type '<class 'numpy.float64'>'
with 85893851 stored elements in Compressed Sparse Row format>)

What was steady_rank and why is it removed?

Hi, just a brief question, could you explain why version 0.1.9 removed the steady_rank metric, and how the magnitude_rank differs from that?

I had been conducting analyses previously with a cutoff value of "steady_rank" to determine significant interactions, since the interest is in confident interactions regardless of cell type specificity, and want to choose an appropriate replacement.

Thanks

Add aggregate function for MISTy models (extend plots)

When running multiple misty models across slides it would be good to have a function to aggregate them.

SyntaxError in Python 3.7

I used pip to install LIANA, the installation went smoothly. However, when import it shows

Traceback (most recent call last):

  File "/opt/miniconda3/envs/python37/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "/tmp/ipykernel_2999548/1757714486.py", line 1, in <module>
    import liana as li

  File "/opt/miniconda3/envs/python37/lib/python3.7/site-packages/liana/__init__.py", line 4, in <module>
    from liana import method as mt, plotting as pl, resource as rs, utils as ut, multi as mu, funcomics as fun

  File "/opt/miniconda3/envs/python37/lib/python3.7/site-packages/liana/method/__init__.py", line 1, in <module>
    from ._Method import Method, MethodMeta, _show_methods

  File "/opt/miniconda3/envs/python37/lib/python3.7/site-packages/liana/method/_Method.py", line 133
    for sample in (progress_bar := tqdm(samples, disable=not verbose)):
                                ^
SyntaxError: invalid syntax

I believe the error show up because the operation ":=" is not added into python until 3.8. I created another environment with python 3.8, the problem was solved. Can you fix this to make Liana suitable for 3.7?

Trouble with liana: 'liana.resource' has no attribute 'explode_complexes'

Hello,

To make my analysis reproducible, I am creating an environment for my decoupler analyses and I am not able to run the liana quantification anymore. I am using the code described in the doc to perform this analysis:

liana_lr = ln.resource.select_resource()
liana_lr = ln.resource.explode_complexes(liana_lr)

# Create two new DataFrames, each containing one of the pairs of columns to be concatenated
df1 = liana_lr[['interaction', 'ligand']]
df2 = liana_lr[['interaction', 'receptor']]

# Rename the columns in each new DataFrame
df1.columns = ['interaction', 'genes']
df2.columns = ['interaction', 'genes']

# Concatenate the two new DataFrames
liana_lr = pd.concat([df1, df2], axis=0)
liana_lr['weight'] = 1

# Find duplicated rows
duplicates = liana_lr.duplicated()

# Remove duplicated rows
liana_lr = liana_lr[~duplicates]

I get the following error message:

'liana.resource' has no attribute 'explode_complexes'

It seems that the code of liana select_resource was updated last month. I have the impression that the function ln.resource.explode_complexes is not accessible anymore.

Have you encountered a similar issue?

I am using the following versions:
decoupler 1.4.0
liana 1.0.3

Thanks for your help.

By default do not convert views to `csr_matrix` when building `MistyData`.

When building MistyData objects, the .X in each view is transformed into sparse csr_matrix format. This is not optimal to do in dense matrices with few zeros since the csr_matrix is going to consume more memory than its dense counterpart. I would maybe check the proportion of 0s and then decide.

Difference on dotplot when running Steady-state Ligand-Receptor inference vignette from the Rank Aggregate section

Describe the bug
Hi !
I'm currently running the Steady-state Ligand-Receptor inference vignette from (https://liana-py.readthedocs.io/en/latest/notebooks/basic_usage.html).

I noticed a bug when running the second dotplot from the Rank Aggregate section.

I obtained :

Instead of :

It should be noted that i change filter_fun parameter by filter_lambda parameter from li.pl.dotplot() function. #80

Best regards

Maxime

Verbosity/Logging

Need to update the way that logging is done in liana -> write one function with logging to handle different cases.

Option to pass tensor_c2c parameters from liana

Is it possible to add a parameter **kwargsto

liana-py/liana/multi/to_tensor_c2c.py

Lines 22 to 34 in 37a0be1

    
           def to_tensor_c2c(adata=None, 
        
                             liana_res=None, 
        
                             sample_key=None, 
        
                             source_key='source', 
        
                             target_key='target', 
        
                             ligand_key='ligand_complex', 
        
                             receptor_key='receptor_complex', 
        
                             score_key=None, 
        
                             non_expressed_fill=None, 
        
                             inverse_fun = lambda x: 1 - x, 
        
                             non_negative = True, 
        
                             return_dict=False, 
        
                             ):

So we can pass them to tensor-cell2cell directly here and control other properties when building the tensor:

liana-py/liana/multi/to_tensor_c2c.py

Lines 127 to 132 in 37a0be1

    
           tensor = c2c.tensor.dataframes_to_tensor(liana_res, 
        
                                                    sender_col=source_key, 
        
                                                    receiver_col=target_key, 
        
                                                    ligand_col=ligand_key, 
        
                                                    receptor_col=receptor_key, 
        
                                                    score_col=score_key)

For example, parameters such as how, outer_fraction, among others could be now controlled.

External + GPU-enabled MISTy models

I should add a class to define misty's models, and enable external object models to be passed, e.g. asserting and handling the features, passing kwargs

limit plotting in li.pl.dotplot to selected ligands/receptors

I was looking for a way to limit plotting using li.pl.dotplot function to a selection of ligand/receptor complexes. But the plot function does not seem to offer that. What are my options?

Numpy 1.24.x compatibility

Describe the bug
Lianapy is incompatible with numpy 1.24.x because "float" is no longer importable

With numpy version 1.23.5

0.40 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 713 samples and 17467 features
Assuming that counts were `natural` log-normalized!
.venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:419: RuntimeWarning: overflow encountered in power
.venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:412: RuntimeWarning: invalid value encountered in subtract
venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:412: RuntimeWarning: invalid value encountered in subtract
Running CellPhoneDB
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1136.74it/s]
Running Connectome
Running log2FC
Running NATMI
Running SingleCellSignalR
Running CellChat
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 187.00it/s]

With version 1.24.x

0.40 of entities in the resource are missing from the data.
Generating ligand-receptor stats for 713 samples and 17467 features
Assuming that counts were `natural` log-normalized!
.venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:419: RuntimeWarning: overflow encountered in power
.venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:412: RuntimeWarning: invalid value encountered in subtract
.venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py:412: RuntimeWarning: invalid value encountered in subtract
Running CellPhoneDB
Traceback (most recent call last):
  File "src/interpret/lianapy/script.py", line 63, in <module>
    main()
  File "script.py", line 41, in main
    liana.mt.rank_aggregate(
  File ".venv/lib/python3.10/site-packages/liana/method/sc/_rank_aggregate.py", line 139, in __call__
    liana_res = liana_pipe(adata=adata,
  File ".venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py", line 185, in liana_pipe
    _run_method(lr_res=lr_res.copy(),
  File "venv/lib/python3.10/site-packages/liana/method/_liana_pipe.py", line 462, in _run_method
    _get_means_perms(adata=adata,
  File "venv/lib/python3.10/site-packages/liana/method/_pipe_utils/_get_mean_perms.py", line 49, in _get_means_perms
    if isinstance(norm_factor, np.float):
  File ".venv/lib/python3.10/site-packages/numpy/__init__.py", line 305, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations. Did you mean: 'cfloat'?

To Reproduce
If possible please provide a minimal reproducible example.
Unfortunatly, I am not able to send any data. However, I think this bug can be reproduced by using any random anndata file as long as version of numpy is 1.24.0 or older.

Implement Geometric mean /w Perms

Querying Pypath & BioCypher knowledge graph

LIANA's resources should be directly accessed via Pypath.

Extensive PK graphs with BioCypher would also allow further refining of results and inference.

This is to be added in future updates.

The mouse resource are missing important interactions that should be possible to convert

Describe the bug
Hi and thanks for a nice tool. For example, Cxcl9 is missing as ligand in the database, while Cxcl10 is there, and they have many receptors in common. Converting CXCL9 to Cxcl9 should be doable? I'm guessing something has gone wrong and that many interactions are missing in the mouse data.

To Reproduce
Just look in the database

Implement CellChat

AttributeError: module 'corneto.methods.carnival' has no attribute 'select_mip_solver'

Hi
I am trying to replicate the results in this link:
https://liana-py.readthedocs.io/en/stable/notebooks/targeted.html
but got this error:
AttributeError: module 'corneto.methods.carnival' has no attribute 'select_mip_solver'
when I run the same code in your link

df_res, problem = li.mt.find_causalnet(
    prior_graph,
    input_scores,
    output_scores,
    node_weights,
    # penalize (max_penalty) nodes with counts in less than 0.1 of the cells
    node_cutoff=0.1,
    max_penalty=1,
    # the penaly of those in > 0.1 prop of cells set to:
    min_penalty=0.01,
    edge_penalty=0.01,
    verbose=True,
    max_seconds=60*3
    )

I appreciate any suggestions.

Visualize interactions across samples

Dotplot of ligand-receptor interactions across samples

Required to e.g. supplement analysis with Tensor-cell2cell

li.multi.df_to_lr function gave gene Value error

Hi !
I was trying "Hypothesis-testing for CCC & Downstream Signalling Networks" tutorial.

During the "DEA to Ligand-Receptor Interactions" part I confront an issue.

When I run,

lr_res = li.multi.df_to_lr(aadata,
dea_df=dea_df,
resource_name='consensus',
expr_prop=0.1, # calculated for adata as passed - used to filter interactions
groupby='cell_type2',
stat_keys=['stat', 'pvalue', 'padj'],
use_raw=False,
complex_col='stat', # NOTE: we use the Wald Stat to deal with complexes
verbose=True,
return_all_lrs=False,
)

I get warning that "all gene name values are not are not valid obs/ var names or indices."
I checked, that aadata's var names has all genes.
I am attaching a screenshot of the situation.

I wonder what went wrong.

Best regards

Metabolite-mediated CCC tutorial

TODO: add a tutorial on metabolite-mediated CCC using MetalinksDB

comparison across conditions

Hi, liana team
in liana R version, there is 'LIANA across Conditions with cell2cell-Tensor'.
Is it possible to add this feature (or even other features) to liana-py?
Since sc dataset can be run more efficiently in python and liana is such a useful tool.
Thank you.

About 'expr_prop' in rank aggregates

Hi,

According to the documentation, 'expr_prop' corresponds to the Minimum expression proportion for the ligands/receptors (and their subunits) in the corresponding cell identities. It is unclear to me what this means.

a) Is it referring to the ratio of expression between ligand / receptor in say Cell type 1? i.e. 0.5 would mean the receptor is twice as expressed as the ligand in Cell type 1 on average (for example, 3/6; ligand/receptor).

b) Is it referring to the minimum proportion of cells within Cell type 1 that express the ligand and the receptor? i.e. 0.5 would mean that both ligand and receptor are expressed in 50% of cells of Cell type 1.

ValueError: Observations annot. `var` must have as many rows as `X` has columns...

Hello!

First of all, thank you for this very complete package!

I am trying to run lr_bivar function on a spatial object but I am running into this error:

ValueError: Observations annot. `var` must have as many rows as `X` has columns (17353), but has 17506 rows.

On note, I am creating the anndata object from an .h5ad file writen from a Seurat object. Nevertheless, I checked var and X dimensions and they were the same (17353). I think the error may be related to lr_bivar creating another anndata object within the function.

I have included all the files needed to create the spatial anndata object in this folder.

Here is my code:

import pandas as pd
import anndata as ad
import liana as li

# Read H5AD
adata = ad.read_h5ad("path_to/CID44971.h5ad")
adata.layers['counts'] = adata.X.copy()

# Fill image info
from pathlib import Path
from matplotlib.image import imread
import json
path = Path("path_to/CID44971_spatial/slide1/")
library_id = "library_id"

adata.uns["spatial"] = {"library_id": {}}
files = dict(
            tissue_positions_file = path / "tissue_positions_list.csv",
            scalefactors_json_file = path / "scalefactors_json.json",
            hires_image = path / "tissue_hires_image.png",
            lowres_image = path / "tissue_lowres_image.png",
        )

adata.uns["spatial"][library_id]["images"] = dict()
for res in ["hires", "lowres"]:
    adata.uns["spatial"][library_id]["images"][res] = imread(
        str(files[f"{res}_image"])
    )

adata.uns["spatial"][library_id]["scalefactors"] = json.loads(
    files["scalefactors_json_file"].read_bytes()
)
positions = pd.read_csv(
    files["tissue_positions_file"],
    header = None,
    index_col = 0,
)
positions.columns = [
    "in_tissue",
    "array_row",
    "array_col",
    "pxl_col_in_fullres",
    "pxl_row_in_fullres",
]
adata.obs = adata.obs.join(positions, how = "left")
adata.obsm["spatial"] = adata.obs[["x", "y"]].to_numpy()
adata.obs.drop(columns=["x", "y"], inplace = True)

# Liana
li.ut.spatial_neighbors(adata, bandwidth = 200, cutoff = 0.1, kernel = "gaussian", set_diag = True)
li.mt.lr_bivar(adata,
               function_name = "cosine", # Name of the function
               n_perms = 100, # Number of permutations to calculate a p-value
               mask_negatives = False, # Whether to mask LowLow/NegativeNegative interactions
               add_categories = True, # Whether to add local categories to the results
               expr_prop = 0.2, # Minimum expr. proportion for ligands/receptors and their subunits
               use_raw = False,
               verbose = True)

And here is the complete error message:

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/liana/method/sp/_lr_bivar.py:86, in SpatialLR.__call__(self, adata, function_name, resource_name, resource, interactions, expr_prop, n_perms, mask_negatives, seed, add_categories, use_raw, layer, connectivity_key, inplace, key_added, obsm_added, lr_sep, verbose)
     15 def __call__(self,
     16              adata: AnnData,
     17              function_name: str,
   (...)
     33              verbose: Optional[bool] = False,
     34              ):
     35     \"\"\"
     36     Local ligand-receptor interaction metrics and global scores.
     37     
   (...)
     83     if `inplace=False`, returns a the above.
     84     \"\"\"
---> 86     lr_res, local_scores = super().__call__(
     87         mdata=adata,
     88         function_name=function_name,
     89         connectivity_key=connectivity_key,
     90         resource_name=resource_name,
     91         resource=resource,
     92         interactions=interactions,
     93         nz_threshold=expr_prop,
     94         n_perms=n_perms,
     95         mask_negatives=mask_negatives,
     96         add_categories=add_categories,
     97         x_mod=True,
     98         y_mod=True,
     99         x_use_raw=use_raw,
    100         x_layer=layer,
    101         seed=seed,
    102         verbose=verbose,
    103         xy_sep=lr_sep,
    104         x_name='ligand',
    105         y_name='receptor',
    106         inplace=False,
    107         complex_sep='_'
    108         )
    110     return self._handle_return(adata, lr_res, local_scores, key_added, obsm_added, inplace)

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/liana/method/sp/_SpatialBivariate.py:221, in SpatialBivariate.__call__(self, mdata, x_mod, y_mod, function_name, interactions, resource, resource_name, connectivity_key, mod_added, key_added, mask_negatives, add_categories, n_perms, seed, nz_threshold, x_use_raw, x_layer, x_transform, y_use_raw, y_layer, y_transform, x_name, y_name, complex_sep, xy_sep, remove_self_interactions, inplace, verbose)
    218 assert_covered(entities, adata.var_names, verbose=verbose)
    220 # Filter to only include the relevant features
--> 221 adata = adata[:, np.intersect1d(entities, adata.var.index)]
    223 xy_stats = pd.DataFrame({'means': adata.X.mean(axis=0).A.flatten(),
    224                          'props': _get_props(adata.X)},
    225                         index=adata.var_names
    226                         ).reset_index().rename(columns={'index': 'gene'})
    227 # join global stats to LRs from resource

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:1171, in AnnData.__getitem__(self, index)
   1169 \"\"\"Returns a sliced view of the object.\"\"\"
   1170 oidx, vidx = self._normalize_indices(index)
-> 1171 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:360, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    358     if not isinstance(X, AnnData):
    359         raise ValueError(\"`X` has to be an AnnData object.\")
--> 360     self._init_as_view(X, oidx, vidx)
    361 else:
    362     self._init_as_actual(
    363         X=X,
    364         obs=obs,
   (...)
    376         filemode=filemode,
    377     )

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:416, in AnnData._init_as_view(self, adata_ref, oidx, vidx)
    414 # fix categories
    415 uns = copy(adata_ref._uns)
--> 416 self._remove_unused_categories(adata_ref.obs, obs_sub, uns)
    417 self._remove_unused_categories(adata_ref.var, var_sub, uns)
    418 # set attributes

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:1181, in AnnData._remove_unused_categories(self, df_full, df_sub, uns)
   1179 all_categories = df_full[k].cat.categories
   1180 with pd.option_context(\"mode.chained_assignment\", None):
-> 1181     df_sub[k] = df_sub[k].cat.remove_unused_categories()
   1182 # also correct the colors...
   1183 color_key = f\"{k}_colors\"

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/views.py:78, in _SetItemMixin.__setitem__(self, idx, value)
     71 else:
     72     warnings.warn(
     73         f\"Trying to modify attribute `.{self._view_args.attrname}` of view, \"
     74         \"initializing view as actual.\",
     75         ImplicitModificationWarning,
     76         stacklevel=2,
     77     )
---> 78     with view_update(*self._view_args) as container:
     79         container[idx] = value

File ~/miniconda3/envs/liana/lib/python3.11/contextlib.py:137, in _GeneratorContextManager.__enter__(self)
    135 del self.args, self.kwds, self.func
    136 try:
--> 137     return next(self.gen)
    138 except StopIteration:
    139     raise RuntimeError(\"generator didn't yield\") from None

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/views.py:52, in view_update(adata_view, attr_name, keys)
     32 @contextmanager
     33 def view_update(adata_view: AnnData, attr_name: str, keys: tuple[str, ...]):
     34     \"\"\"Context manager for updating a view of an AnnData object.
     35 
     36     Contains logic for \"actualizing\" a view. Yields the object to be modified in-place.
   (...)
     50     `adata.attr[key1][key2][keyn]...`
     51     \"\"\"
---> 52     new = adata_view.copy()
     53     attr = getattr(new, attr_name)
     54     container = reduce(lambda d, k: d[k], keys, attr)

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:1582, in AnnData.copy(self, filename)
   1576 if not self.isbacked:
   1577     if self.is_view and self._has_X():
   1578         # TODO: How do I unambiguously check if this is a copy?
   1579         # Subsetting this way means we don’t have to have a view type
   1580         # defined for the matrix, which is needed for some of the
   1581         # current distributed backend. Specifically Dask.
-> 1582         return self._mutated_copy(
   1583             X=_subset(self._adata_ref.X, (self._oidx, self._vidx)).copy()
   1584         )
   1585     else:
   1586         return self._mutated_copy()

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:1527, in AnnData._mutated_copy(self, **kwargs)
   1525 elif self.raw is not None:
   1526     new[\"raw\"] = self.raw.copy()
-> 1527 return AnnData(**new)

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:362, in AnnData.__init__(self, X, obs, var, uns, obsm, varm, layers, raw, dtype, shape, filename, filemode, asview, obsp, varp, oidx, vidx)
    360     self._init_as_view(X, oidx, vidx)
    361 else:
--> 362     self._init_as_actual(
    363         X=X,
    364         obs=obs,
    365         var=var,
    366         uns=uns,
    367         obsm=obsm,
    368         varm=varm,
    369         raw=raw,
    370         layers=layers,
    371         dtype=dtype,
    372         shape=shape,
    373         obsp=obsp,
    374         varp=varp,
    375         filename=filename,
    376         filemode=filemode,
    377     )

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:548, in AnnData._init_as_actual(self, X, obs, var, uns, obsm, varm, varp, obsp, raw, layers, dtype, shape, filename, filemode)
    544 # annotations
    545 self._obs = _gen_dataframe(
    546     obs, [\"obs_names\", \"row_names\"], source=source, attr=\"obs\", length=n_obs
    547 )
--> 548 self._var = _gen_dataframe(
    549     var, [\"var_names\", \"col_names\"], source=source, attr=\"var\", length=n_vars
    550 )
    552 # now we can verify if indices match!
    553 for attr_name, x_name, idx in x_indices:

File ~/miniconda3/envs/liana/lib/python3.11/functools.py:909, in singledispatch.<locals>.wrapper(*args, **kw)
    905 if not args:
    906     raise TypeError(f'{funcname} requires at least '
    907                     '1 positional argument')
--> 909 return dispatch(args[0].__class__)(*args, **kw)

File ~/miniconda3/envs/liana/lib/python3.11/site-packages/anndata/_core/anndata.py:180, in _gen_dataframe_df(anno, index_names, source, attr, length)
    170 @_gen_dataframe.register(pd.DataFrame)
    171 def _gen_dataframe_df(
    172     anno: pd.DataFrame,
   (...)
    177     length: int | None = None,
    178 ):
    179     if length is not None and length != len(anno):
--> 180         raise _mk_df_error(source, attr, length, len(anno))
    181     anno = anno.copy(deep=False)
    182     if not is_string_dtype(anno.index):

ValueError: Observations annot. `var` must have as many rows as `X` has columns (17353), but has 17506 rows."
}

Thank you very much for your help!

Run permutation-based methods with `n_perms=0`

Currently permutation-based methods like cellchat and cellphonedb need have at least n_perms=1 to be able to run. Since in some situations we don't care about the permuted score or p-val, just the first estimate, it would be good to make this methods handle n_perms=1

The solution could be that the permuted score should be 1 in case of p-values and for z-scores you could set them to 0. Alternatively to nans.

Improve documentation

Currently, the documentation is repetitive and with redundant parameter descriptions.

docrep can be used to store all repetitive parameters as constants - e.g. as done in squidpy:
https://github.com/scverse/squidpy/blob/453015014a4b875db79a84e9ead9c0c59d72d909/squidpy/_docs.py#L11

Help understanding causal network output

Hello,

I wanted to ask for clarification regarding the output of the causal network vignette "Hypothesis-testing for CCC & Downstream Signalling Networks".

Here is my current understanding:
column name (observed values when only a few) - description
source - source node in the PPI, there are some values here that I don't understand ("_s","_pert_c0","_meas_c0","")
source_type (unmeasured,input) - are the input nodes the receptors that were identified in the previous step? are unmeasured genes that have no expression? or genes that are not part of that receptor group?
source_weight - not sure what this is, the example only has two values (0 and -4.66)
source_pred_val (1,0,-1) - node is upregulated, downregulated, is zero no differential expression?
target - target node in the PPI, there are some values here that I don't understand ("_s","_t","_meas_c0","")
target_type (unmeasured, output) - similar to "source_type" except "output" would be TFs?
target_weight - not sure what this is? the example has a few more values than "source_weight", they are all 0 or positive.
target_pred_val (1,-1) - node is upregulated, downregulated? no zeros in this one
edge_type (1,-1,0) - unsure what this is
edge_pred_val(1,-1) - whether the edge upregulates or downregulates the target node.

Thank you in advance!

Edgar

Hypothesis-testing for comparisons between conditions

Hi @dbdimitrov,

this is to follow up our discussion on methods for differential cell2cell communication in theislab/single-cell-best-practices#140.

In your NatComm paper and the CCC book chapter, you seem to focus mostly on methods for inferring steady-state cell2cell interactions. However, recently, I tend to encounter more datasets where I actually want to compare cell2cell communication between conditions.
For instance, in our lung cancer atlas paper, we performed comparisons between the LUAD and LUSC subtypes, or different patient subgroups, driven by the question "how could a certain cell-type shape the tumor microenvironment by expressing certain cell surface ligands?".

I am currently aware of the following methods to compare between conditions:

cellphonedb v3. Essentially, the differential expression of ligands and receptors is computed with a DE method of the user's choice.
differential niechenet. Again, this is based on a custom list of differentially expressed genes
iTALK also seems to be based on differential gene expression which can be computed either between cell-types or between conditions.

All these approaches suffer from the same pseudoreplication-bias problem as single-cell DE in general, so one would need to either use pseudo-bulk or LME models to avoid inflated p-values.

In the atlas (and also for other projects), so far I have been using DESeq2 on Pseudobulk and a custom Python reimplementation of the cellphonedb v3 algorithm (https://github.com/icbi-lab/nmp-liver/blob/main/lib/scanpy_helper/scanpy_helpers/cell2cell.py) combined with a list of ligands and receptors from omnipath. I ended up with the following visualization, which shows differentially expressed ligands between conditions at the top, and other cells that express the corresponding receptors (i.e. might be impacted by those changes) at the bottom:

Do you have any other suggestions/recommendations how to best compare cell-cell interactions between conditions? And do you plan to implement any of these methods in liana-py?

CC @Zethson @abyssum

Split global from local scores

I should separate Global Moran's R from local, and implement more simple bivariate metrics, e.g. Geary's C

Single-cell Multi-omics tutorial

LIANA+ supports MuData as input, hence with appropriate data transformation and resource; any multi-omics dataset can be used.

To highlight this, I should include a tutorial on multi-omics data - e.g. CITE-seq and spatial CITE-seq

Error in function documentation / vignette

Hello !
I'm currently running the Steady-state Ligand-Receptor inference vignette from (https://liana-py.readthedocs.io/en/latest/notebooks/basic_usage.html).

When running li.pl.dotplot() and li.pl.tileplot() from the vignette i have the following errors :

li.pl.dotplot(adata = adata,
              colour='lr_means',
              size='cellphone_pvals',
              inverse_size=True, # we inverse sign since we want small p-values to have large sizes
              source_labels=['CD34+', 'CD56+ NK', 'CD14+ Monocyte'],
              target_labels=['CD34+', 'CD56+ NK'],
              figure_size=(8, 7),
              # finally, since cpdbv2 suggests using a filter to FPs
              # we filter the pvals column to <= 0.05
              filter_fun=lambda x: x['cellphone_pvals'] <= 0.05,
              uns_key='cpdb_res' # uns_key to use, default is 'liana_res'
             )

Error :
TypeError: dotplot() got an unexpected keyword argument 'filter_fun'

I noticed that filter_fun parameter have been change as filter_lambda.

I get the same type of error when running li.pl.tileplot() :

my_plot = li.pl.tileplot(adata = adata,
                         # NOTE: fill & label need to exist for both
                         # ligand_ and receptor_ columns
                         fill='means',
                         label='props',
                         label_fun=lambda x: f'{x:.2f}',
                         top_n=10,
                         orderby='cellphone_pvals',
                         orderby_ascending=True,
                         source_labels=['CD34+', 'CD56+ NK', 'CD14+ Monocyte'],
                         target_labels=['CD34+', 'CD56+ NK'],
                         uns_key='cpdb_res', # NOTE: default is 'liana_res'
                         source_title='Ligand',
                         target_title='Receptor',
                         )
my_plot

Error :
TypeError: tileplot() got an unexpected keyword argument 'source_title'

source_title and target_title parameters have been removed

So i think this vignette and function documentation in the API are not up to date.

Thanks a lot.

Best regards.

Maxime

Getting TypeError: Accessing a group is done with bytes or str, not <class 'tuple'> when running `mu.tl.mofa()`

I have my own anndata running the through the Jupyter notebook. Everything works fine until the saving the h5ad files after running mofa and below is the error message in detail:

Warning: Output file models/mofacell.h5ad already exists, it will be replaced
Saving model in models/mofacell.h5ad...
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [13], line 1
----> 1 mu.tl.mofa(mdata,
      2            use_obs='union',
      3            convergence_mode='medium',
      4            n_factors=5,
      5            seed=1337,
      6            outfile='models/mofacell.h5ad',
      7            use_var='highly_variable',  
      8            gpu_mode=True
      9            
     10            )

File /software/team283/tl7/miniconda3/envs/scarches/lib/python3.9/site-packages/muon/_core/tools.py:585, in mofa(data, groups_label, use_raw, use_layer, use_var, use_obs, likelihoods, n_factors, scale_views, scale_groups, center_groups, ard_weights, ard_factors, spikeslab_weights, spikeslab_factors, n_iterations, convergence_mode, gpu_mode, use_float32, smooth_covariate, smooth_warping, smooth_kwargs, save_parameters, save_data, save_metadata, seed, outfile, expectations, save_interrupted, verbose, quiet, copy)
    582     data.obsm["X_mofa"] = z
    584 # Weights
--> 585 w = np.concatenate([v[:, :] for k, v in f["expectations"]["W"].items()], axis=1).T
    586 if use_var:
    587     # Set the weights of features that were not used to zero
    588     data.varm["LFs"] = np.zeros(shape=(data.n_vars, w.shape[1]))

File /software/team283/tl7/miniconda3/envs/scarches/lib/python3.9/site-packages/muon/_core/tools.py:585, in <listcomp>(.0)
    582     data.obsm["X_mofa"] = z
    584 # Weights
--> 585 w = np.concatenate([v[:, :] for k, v in f["expectations"]["W"].items()], axis=1).T
    586 if use_var:
    587     # Set the weights of features that were not used to zero
    588     data.varm["LFs"] = np.zeros(shape=(data.n_vars, w.shape[1]))

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File /software/team283/tl7/miniconda3/envs/scarches/lib/python3.9/site-packages/h5py/_hl/group.py:330, in Group.__getitem__(self, name)
    328     oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
    329 else:
--> 330     raise TypeError("Accessing a group is done with bytes or str, "
    331                     " not {}".format(type(name)))
    333 otype = h5i.get_type(oid)
    334 if otype == h5i.GROUP:

TypeError: Accessing a group is done with bytes or str,  not <class 'tuple'>

Check sample_key & groupby

Additional checks and transformation to categories for obs columns: groupby & sample_key

If either column is not categorical, liana may throw an exception

Should we exclude CellTalkDB from curated resources list?

@deeenes I keep finding strange interactions that occur as PPIs but not actual CCC and are reported as such in CellTalkDB - e.g. SORBS1-INSR or interactions with SMAD3. Might be best if we exclude CellTalkDB from the curated resources list in OmniPath, and hence LIANA.

AttributeError: module 'liana.method' has no attribute 'build_prior_network'

I ran the following code incorrectly:
prior_graph = li.method.build_prior_network(input_pkn, input_scores, output_scores, verbose=True)
The content of the error is:
AttributeError: module 'liana.method' has no attribute 'build_prior_network'

Additional metadata in results dataframe

Hi, I would really like a way to print the percent expression and CPM values for the (minimally expressed) ligand and receptor in the results dataframe for li.mt.rank_aggregate() or any of the individual methods (which I know also have the supp_cols argument - maybe it is possible there? but rank_aggregate() seems to lose supp_cols)

The ones of interest would be something like:

ligand_percent
receptor_percent
ligand_cpm
receptor_cpm

It'd improve the interpretability of the results a lot - for instance, when return_all_lrs=True and an interaction fails the tests, it gives insight into why it fails, because of the ligand or receptor. Also, it allows for simple TPM threshold-based interaction calling - the most rudimentary method discussed by Armingol et al. 2021 (https://doi.org/10.1038/s41576-020-00292-x), but still a useful one. I don't know if you think that CPM thresholding would work as a "method" to liana, but either way least returning this info in the dataframe would be nice.

Thanks for any input!

Clean up MISTy's pipeline

Refactor MISTy's pipeline & remove extra_masking
Check for targets with variance of 0 prior to modelling
Stream rather than concat

Error with import liana

Hi I updated to the new release of liana-py 0.1.7
and am getting an error with import liana, I did not have this error with the previous release on the same system and python environment

Python 3.7.2 (default, Dec 18 2019, 16:55:18)
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import liana
Traceback (most recent call last):
File "", line 1, in
File "/.mounts/labs/abelsonlab/public/Ido/python_venvs/liana_venv/lib/python3.7/site-packages/liana/init.py", line 4, in
from liana import method as mt, plotting as pl, resource as rs, utils as ut, multi as mu, funcomics as fun
File "/.mounts/labs/abelsonlab/public/Ido/python_venvs/liana_venv/lib/python3.7/site-packages/liana/method/init.py", line 1, in
from ._Method import Method, MethodMeta, _show_methods
File "/.mounts/labs/abelsonlab/public/Ido/python_venvs/liana_venv/lib/python3.7/site-packages/liana/method/_Method.py", line 133
for sample in (progress_bar := tqdm(samples, disable=not verbose)):
^
SyntaxError: invalid syntax

Missing Resources

I have been trying to run liana and I get this error. I have checked var_names and they are in the human gene symbol format. Furthermore, I have manually checked the var_names that are printed in the error message as "missing from var_names" are actually present in the adata. I was hoping to get some further guidance on how I could resolve this. Thank you.

li.mt.rank_aggregate(a_immune, groupby='cell_type', expr_prop=0.1, verbose=True)

Generate Documentation /w Basic tutorial

Cellphone DB (and other database versions)

Hi,

Sorry if this information is available somewhere but I cannot find it. I would like to know what version of cellphoneDB (and other databsases) you have inside the liana-py package.

I am aware you import your databases from OmniPathR (unless I am mistaken) and have checked their version of cellphoneDB: saezlab/OmnipathR#85

I wanted to double check that your version matches theirs and if not could it be updated?

Thanks,
Simon

Add bumpversion + metadata

Generate Union of Causal Net Predictions

Some MILP problems result in multiple equally good (local) solutions. A non-arbitrary way (independent of random state) to deal with those is to create a union of them.

SCIPy will additionally not converge properly with larger problems (now more notable due to recent update and size increase of the OmniPath PPI).

Implement an option to internally generate a union of solutions
Change example to use Gurobi

Automatic sorting of LR results

Right now LR results are not sorted by default, this can cause problems if users are not aware of it and just take the top N rows of the returned dataframe without checking the stats values.

A solution could be to sort the obtained results by the default stat of each method, be it magnitude or specificity score.

Bug report for a tileplot function.

Here is my code.

my_plot = li.pl.tileplot(adata = adata,
                         # NOTE: fill & label need to exist for both
                         # ligand_ and receptor_ columns
                         fill='means',
                         label='props',
                         label_fun=lambda x: f'{x:.2f}',
                         top_n=10,
                         orderby='cellphone_pvals',
                         orderby_ascending=True,
                         source_labels=["T cell", "NK cell",],
                         target_labels=["DC", "IFN-TAMs", "Inflam-TAMs", "Prolif-TAMs", "Reg-TAMs", "LA-TAMs", "Monocyte", "Neutrophils"],
                         # uns_key='cpdb_res' # NOTE: default is 'liana_res',
                         figure_size=(10, 6),
                         )
my_plot

This is my plot.

as you see, The scales of the left and right colormaps are different.

Thank you for fix, in advance.

Sort interactions names in `li.pl.tileplot` by the `orderby` argument

Currently, interactions are not sorted in li.pl.tileplot, would be good to sort them by the orderby argument.

Implement GPU to speed up the CCC caculation in process of rank_aggregate method (CellChat part)?

Hello, very powerful method to prediction CCC from scRNA-seq.
When I try to use the "li.mt.rank_aggregate", the CellChat run slowly.
Would you speed up the process based on GPU?

Thanks!

Clarification regarding hypothesis testing

Hi Daniel,
Thank you for the fantastic work!
I was looking into applying the hypothesis-testing workflow onto some of my samples and was wondering if you could offer some clarification.
When I run li.multi.df_to_lr after the DEA, is there any way to specify sampleID? If I have many different samples for each condition, is there a way for liana to ensure only sender and receiver cells from the same sample are used, as far as I see this approach is only used in the Tensor C2C or MOFA workflows with li.mt.rank_aggregate.by_sample, but not in this workflow. Would you recommend just filtering the adata to only include each sample one at a time before running liana?
Thank you,
Ido

Set scanpy as a "common" dependency

Currently scanpy is being used as a core dependency of LIANA.

However, in reality it's used only in a couple of places, the most problematic being the calculation of DEGs within _liana_pipe.py.
If these are changed, then scanpy can be removed from the core dependencies.

	def to_tensor_c2c(adata=None,
	liana_res=None,
	sample_key=None,
	source_key='source',
	target_key='target',
	ligand_key='ligand_complex',
	receptor_key='receptor_complex',
	score_key=None,
	non_expressed_fill=None,
	inverse_fun = lambda x: 1 - x,
	non_negative = True,
	return_dict=False,
	):

	tensor = c2c.tensor.dataframes_to_tensor(liana_res,
	sender_col=source_key,
	receiver_col=target_key,
	ligand_col=ligand_key,
	receptor_col=receptor_key,
	score_col=score_key)