theislab / cpa

The Compositional Perturbation Autoencoder (CPA) is a deep generative framework to learn effects of perturbations at the single-cell level. CPA performs OOD predictions of unseen combinations of drugs, learns interpretable embeddings, estimates dose-response curves, and provides uncertainty estimates.

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
single-cell-genomics single-cell-rna-seq scrna-seq human-cell-atlas

cpa's Introduction

CPA - Compositional Perturbation Autoencoder

What is CPA?

CPA is a framework for learning the effects of perturbations at the single-cell level. It encodes and learns phenotypic drug responses across different cell types, doses, and combinations. CPA lets you:

  • Make out-of-distribution predictions for unseen drug and gene combinations at various doses and across different cell types.
  • Learn interpretable drug and cell-type latent spaces.
  • Estimate the dose-response curve for each perturbation and for combinations of perturbations.
  • Transfer perturbation effects from one cell type to an unseen cell type.
  • Remove batch effects in the latent space and in gene expression space.

Installation

Installing CPA

You can install CPA with pip, or directly from GitHub to get the latest development version. See detailed instructions here.

How to use CPA

Several tutorials are available here to get you started with CPA. The following table contains the list of tutorials:

Description Link
Predicting combinatorial drug perturbations Open In Colab - Open In Documentation
Predicting unseen perturbations using external embeddings, enabling the model to predict responses to unseen drugs Open In Colab - Open In Documentation
Predicting combinatorial CRISPR perturbations Open In Colab - Open In Documentation
Context transfer (i.e. predict the effect of a perturbation (e.g. disease) on unseen cell types or transfer perturbation effects from one context to another) demo on IFN-β scRNA perturbation dataset Open In Colab - Open In Documentation
Batch effect removal in gene expression and latent space Open In Colab - Open In Documentation

How to optimize CPA hyperparameters for your data

We provide an example script to use the built-in hyperparameter optimization function in CPA (based on scvi-tools hyperparam optimizer). You can find the script at examples/tune_script.py.

After the hyperparameter optimization using tune_script.py is done, result_grid.pkl is saved in your current directory using the pickle library. You can load the results using the following code:

import pickle
with open('result_grid.pkl', 'rb') as f:
    result_grid = pickle.load(f)

From here, you can follow the instructions in the Ray documentation to analyze the run and choose the best hyperparameters for your data.
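As a rough sketch of that analysis step, assuming `result_grid` is a Ray Tune `ResultGrid`: its `get_best_result(metric=..., mode=...)` method returns the best trial, whose `config` and `metrics` attributes hold the sampled hyperparameters and the last reported metrics. The metric name `cpa_metric` and the `max` mode are assumptions here; substitute whatever metric your tuning script reported. The helpers `load_result_grid` and `summarize_best` are hypothetical names, not part of CPA.

```python
import pickle


def load_result_grid(path="result_grid.pkl"):
    """Unpickle the grid saved by the tuning script (Ray must be installed)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize_best(result_grid, metric="cpa_metric", mode="max"):
    """Return the config and final metric value of the best trial."""
    # `metric` and `mode` are assumptions; match them to your tuning setup.
    best = result_grid.get_best_result(metric=metric, mode=mode)
    return {"config": dict(best.config), metric: best.metrics[metric]}
```

Calling `summarize_best(load_result_grid())` would then give the hyperparameters to reuse when training the final model.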

You can also use the wandb integration to log the hyperparameter optimization results by setting use_wandb=True. You can find the script at examples/tune_script_wandb.py.

Everything is based on Ray Tune. You can find more information about hyperparameter optimization in the Ray Tune documentation.

The tuner is adapted from the one in scvi-tools v1.2.0 (unreleased); see the release notes.

Datasets and Pre-trained models

Datasets and pre-trained models are available here.

Recipe for pre-processing a custom scRNA-seq perturbation dataset

If you have access to your raw data, you can follow the steps below to pre-process your dataset. A raw dataset should be an AnnData object (as used by scanpy) containing raw counts and the required metadata (i.e. perturbation, dosage, etc.).

Pre-processing steps

  1. Check for required information in cell metadata:

    a) Perturbation information should be in adata.obs.
    b) Dosage information should be in adata.obs. In cases like CRISPR gene knockouts, disease states, time perturbations, etc., you can create and add a dummy dosage to your adata.obs. For example:

        adata.obs['dosage'] = adata.obs['perturbation'].astype(str).apply(lambda x: '+'.join(['1.0' for _ in x.split('+')])).values

    c) [If available] Cell type information should be in adata.obs.
    d) [Multi-batch integration] Batch information should be in adata.obs.

  2. Filter out cells with low number of counts (sc.pp.filter_cells). For example:

    sc.pp.filter_cells(adata, min_counts=100)

    [optional]

    sc.pp.filter_genes(adata, min_counts=5)
  3. Save the raw counts in adata.layers['counts'].

    adata.layers['counts'] = adata.X.copy()
  4. Normalize the counts (sc.pp.normalize_total).

    sc.pp.normalize_total(adata, target_sum=1e4, exclude_highly_expressed=True)
  5. Log transform the normalized counts (sc.pp.log1p).

    sc.pp.log1p(adata)
  6. Highly variable gene selection. There are two options:

    1. Use the sc.pp.highly_variable_genes function to select highly variable genes.

        sc.pp.highly_variable_genes(adata, n_top_genes=5000, subset=True)

    2. (Highly recommended, especially for multi-batch integration scenarios) Use scIB's highly variable gene selection function. This function is more robust to batch effects and can be used to select highly variable genes across multiple datasets.

        import scIB
        adata_hvg = scIB.pp.hvg_batch(adata, batch_key='batch', n_top_genes=5000, copy=True)
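To make the arithmetic behind steps 3 to 5 concrete, here is a small NumPy sketch on a toy count matrix. It is an illustration only, not scanpy's implementation: it works on a dense array, ignores the `exclude_highly_expressed` option, and the function name `normalize_and_log` is hypothetical.

```python
import numpy as np


def normalize_and_log(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # keep all-zero cells at zero instead of dividing by 0
    return np.log1p(counts / totals * target_sum)


raw = np.array([[10, 0, 90], [1, 1, 2]])  # toy data: 2 cells x 3 genes
layers = {"counts": raw.copy()}           # step 3: keep an untouched copy of the raw counts
X = normalize_and_log(raw)                # steps 4 and 5
```

Undoing the log with `np.expm1(X)` gives rows that each sum to `target_sum`, which is a quick sanity check that normalization was applied once and only once.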

Congrats! Now your dataset is ready to be used with CPA. Don't forget to save your pre-processed dataset using the adata.write_h5ad function.

Support and contribute

If you have a question, or a new architecture or model that could be integrated into our pipeline, you can post an issue.

Reference

If CPA is helpful in your research, please consider citing Lotfollahi et al. 2023:

@article{lotfollahi2023predicting,
    title={Predicting cellular responses to complex perturbations in high-throughput screens},
    author={Lotfollahi, Mohammad and Klimovskaia Susmelj, Anna and De Donno, Carlo and Hetzel, Leon and Ji, Yuge and Ibarra, Ignacio L and Srivatsan, Sanjay R and Naghipourfar, Mohsen and Daza, Riza M and Martin, Beth and others},
    journal={Molecular Systems Biology},
    pages={e11517},
    year={2023}
}

cpa's People

Contributors

alejandrotl, arianamani, m0hammadl, naghipourfar, zgr2788


cpa's Issues

How to use external gene embedding in CPA?

Thanks for the great work!

In the tutorial on predicting CRISPR perturbations, you mention that "if we want to use CPA to predict X+Y when either X, Y, or both are unseen", we need to use external gene embeddings. However, I cannot figure out how to provide them to CPA, even though I followed the drug example you referred to. Could you give a more detailed example of providing gene embeddings to CPA?

Thanks for your reply.

tutorial CPAModule.__init__

Hi all, great work.
I am trying to run this function from the tutorial.
I passed test arguments and am getting the error below. Any clue? It seems the packages may have been updated while the tutorial still fails to call the functions properly.

Tutorial

Thanks a lot!

model = cpa.CPA(adata=adata,
                n_latent=64,
                loss_ae='gauss',
                doser_type='logsigm',
                split_key='split',
                train_split='train',
                valid_split='test',
                test_split='ood',
                **ae_hparams,
                )


TypeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = cpa.CPA(adata=adata,
2 n_latent=64,
3 loss_ae='gauss',
4 doser_type='logsigm',
5 split_key='split',

/usr/local/lib/python3.10/dist-packages/cpa/_model.py in __init__(self, adata, split_key, train_split, valid_split, test_split, **hyper_params)
99 }
100
--> 101 self.module = CPAModule(
102 n_genes=adata.n_vars,
103 n_perts=len(self.pert_encoder),

TypeError: CPAModule.__init__() got an unexpected keyword argument 'loss_ae'

Model Training Error

Hi! Thanks for this great drug perturbation prediction approach.
I am trying to apply the CPA model to my RNA-Seq data, but unfortunately, while training the model on my data, I get the error:
"ValueError: Input X contains NaN.
NearestNeighbors does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values"

When I checked my data again for NaNs, I couldn't find any missing values. Do you have any idea what could be wrong? How important is the introduction of adata.uns or the split columns in the adata.obs?
I would be grateful for any help!
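One generic way to double-check a matrix such as adata.X (or a layer) for non-finite values is sketched below. It counts NaN and infinite entries in either a dense array or a SciPy-style sparse matrix, and lists cells whose counts sum to zero. This is a diagnostic under assumptions, not a confirmed cause of the error above; both helper names are hypothetical.

```python
import numpy as np


def count_nonfinite(X):
    """Count NaN/inf entries in a dense array or a SciPy-style sparse matrix."""
    # Sparse matrices only store explicit values; implicit zeros are always finite.
    values = X.tocsr().data if hasattr(X, "tocsr") else np.asarray(X)
    return int((~np.isfinite(values.astype(float))).sum())


def empty_rows(X):
    """Return the indices of rows (cells) whose entries sum to zero."""
    sums = np.asarray(X.sum(axis=1)).ravel()
    return np.flatnonzero(sums == 0)
```

Running both helpers on the raw counts and again after each pre-processing step can narrow down where a non-finite value first appears.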

AttributeError: module 'cpa' has no attribute 'CPA'

Hi - thanks for your awesome work!

I have not succeeded in running the Sci-Plex 2 Notebook on colab
I am getting:

AttributeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 cpa_real.CPA.setup_anndata(adata,
2 perturbation_keys={
3 'perturbation': 'condition',
4 'dosage': 'dose_val',
5 },

AttributeError: module 'cpa' has no attribute 'CPA'

when running:
cpa.CPA.setup_anndata(adata, perturbation_keys={ 'perturbation': 'condition', 'dosage': 'dose_val', }, categorical_covariate_keys=['cell_type'], control_key='control', )

Is there anything I can do to fix this?
Thanks a lot!

ImportError due to torchmetrics

Dear authors,

I received the following error when importing cpa:

ImportError Traceback (most recent call last)
/tmp/ipykernel_4076984/2672872602.py in
----> 1 import cpa
2 import scanpy as sc

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/cpa/__init__.py in
3 warnings.simplefilter('ignore')
4
----> 5 from ._model import CPA
6 from ._module import CPAModule
7 from . import _plotting as pl

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/cpa/_model.py in
7 import pandas as pd
8 import torch
----> 9 from pytorch_lightning.callbacks import EarlyStopping
10 from scvi.data import setup_anndata, register_tensor_from_anndata, transfer_anndata_setup, get_from_registry
11 from scvi.data._anndata import _check_anndata_setup_equivalence

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/pytorch_lightning/__init__.py in
18 _PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)
19
---> 20 from pytorch_lightning import metrics # noqa: E402
21 from pytorch_lightning.callbacks import Callback # noqa: E402
22 from pytorch_lightning.core import LightningDataModule, LightningModule # noqa: E402

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/pytorch_lightning/metrics/__init__.py in
13 # limitations under the License.
14
---> 15 from pytorch_lightning.metrics.classification import ( # noqa: F401
16 Accuracy,
17 AUC,

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/pytorch_lightning/metrics/classification/__init__.py in
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
---> 14 from pytorch_lightning.metrics.classification.accuracy import Accuracy # noqa: F401
15 from pytorch_lightning.metrics.classification.auc import AUC # noqa: F401
16 from pytorch_lightning.metrics.classification.auroc import AUROC # noqa: F401

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/pytorch_lightning/metrics/classification/accuracy.py in
16 from torchmetrics import Accuracy as _Accuracy
17
---> 18 from pytorch_lightning.metrics.utils import deprecated_metrics
19
20

/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/pytorch_lightning/metrics/utils.py in
20 from torchmetrics.utilities.data import dim_zero_mean as _dim_zero_mean
21 from torchmetrics.utilities.data import dim_zero_sum as _dim_zero_sum
---> 22 from torchmetrics.utilities.data import get_num_classes as _get_num_classes
23 from torchmetrics.utilities.data import select_topk as _select_topk
24 from torchmetrics.utilities.data import to_categorical as _to_categorical

ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/exports/para-lipg-hpc/mdmanurung/conda/envs/cpa-env/lib/python3.7/site-packages/torchmetrics/utilities/data.py)

This seems to be caused by a broken torchmetrics installation. I tried downgrading torchmetrics to version 0.6.0 as shown here, but it did not fix the problem. I would greatly appreciate it if you could solve this issue.

Thanks in advance.

Regards,
Mikhael

Colab error when running setup_anndata()

Hi,

When I run the CPA colab tutorial, in the following line:

cpa.CPA.setup_anndata(adata,
                      perturbation_keys={
                          'perturbation': 'condition',
                          'dosage': 'dose_val',
                      },
                      categorical_covariate_keys=['cell_type'],
                      control_key='control',
                      )

the error below occurs in both tutorials:

TypeError: CategoricalDataFrameField.__init__() got an unexpected keyword argument 'obs_key'

I would appreciate any help to be able to run it.

Errors in running notebook

Hello
When trying to run the notebook I run into a few problems:
1)
cpa.CPA.setup_anndata(adata,
                      drug_key='condition',
                      dose_key='dose_val',
                      categorical_covariate_keys=['cell_type'],
                      control_key='control',
                      combinatorial=True,
                      )
TypeError: register_fields() got unexpected keyword arguments {'combinatorial': True} passed without a source_registry.

2)
model = cpa.CPA(adata=adata,
                n_latent=256,
                loss_ae='gauss',
                doser_type='logsigm',
                split_key='split',
                **ae_hparams,
                )
TypeError: __init__() missing 1 required positional argument: 'n_cat_list'

Are you able to provide solutions?
Thank you

Pre-processing for bulk RNA-seq?

hello,

Thanks for this great software CPA.

I want to use it for bulk RNA-seq data collected after drug treatment.

Could you give some suggestions on pre-processing my data?
Should I normalize the data first and then convert it to h5ad format?

Thanks !!!

Trained models

Hello, thanks for the great work!
I noticed that you are still working on updating the tutorial and documentation (#15) to reflect the latest changes. Could you please also upload the trained models? That would be really helpful when running the code.
Best,

TypeError: __init__() got an unexpected keyword argument 'checkpoint_callback'

Hi
I just tried running the Norman data tutorial and get the following error:

model.train(max_epochs=2000,
            use_gpu=True,
            batch_size=1024,
            early_stopping=True,
            plan_kwargs=trainer_params,
            early_stopping_patience=15,
            check_val_every_n_epoch=5,
            save_path='/home/mohsen/projects/cpa/lightning_logs/Norman2019_prep_new/',
            )

TypeError Traceback (most recent call last)
Cell In [16], line 1
----> 1 model.train(max_epochs=2000,
2 use_gpu=True,
3 batch_size=1024,
4 early_stopping=True,
5 plan_kwargs=trainer_params,
6 early_stopping_patience=15,
7 check_val_every_n_epoch=5,
8 save_path='/home/mohsen/projects/cpa/lightning_logs/Norman2019_prep_new/',
9 )

File ~/miniconda3/envs/envCPA/lib/python3.9/site-packages/cpa/_model.py:317, in CPA.train(self, max_epochs, use_gpu, train_size, validation_size, batch_size, early_stopping, plan_kwargs, hyperopt, save_path, **trainer_kwargs)
314 checkpoint = SaveBestState(monitor='cpa_metric', mode='max', period=20, verbose=False)
315 trainer_kwargs['callbacks'].append(checkpoint)
--> 317 runner = TrainRunner(
318 self,
319 training_plan=self.training_plan,
320 data_splitter=data_splitter,
321 max_epochs=max_epochs,
322 use_gpu=use_gpu,
323 early_stopping_monitor="cpa_metric",
324 early_stopping_mode='max',
325 checkpoint_callback=True,
326 **trainer_kwargs,
327 )
328 runner()
330 self.epoch_history = pd.DataFrame().from_dict(self.training_plan.epoch_history)

File ~/miniconda3/envs/envCPA/lib/python3.9/site-packages/scvi/train/_trainrunner.py:67, in TrainRunner.__init__(self, model, training_plan, data_splitter, max_epochs, use_gpu, **trainer_kwargs)
65 self.lightning_devices = lightning_devices
66 self.device = device
---> 67 self.trainer = Trainer(
68 max_epochs=max_epochs,
69 accelerator=accelerator,
70 devices=lightning_devices,
71 gpus=None,
72 **trainer_kwargs,
73 )

File ~/miniconda3/envs/envCPA/lib/python3.9/site-packages/scvi/train/_trainer.py:141, in Trainer.__init__(self, gpus, benchmark, flush_logs_every_n_steps, check_val_every_n_epoch, max_epochs, default_root_dir, enable_checkpointing, num_sanity_val_steps, enable_model_summary, early_stopping, early_stopping_monitor, early_stopping_min_delta, early_stopping_patience, early_stopping_mode, enable_progress_bar, progress_bar_refresh_rate, simple_progress_bar, logger, log_every_n_steps, replace_sampler_ddp, **kwargs)
138 if logger is None:
139 logger = SimpleLogger()
--> 141 super().__init__(
142 gpus=gpus,
143 benchmark=benchmark,
144 check_val_every_n_epoch=check_val_every_n_epoch,
145 max_epochs=max_epochs,
146 default_root_dir=default_root_dir,
147 enable_checkpointing=enable_checkpointing,
148 num_sanity_val_steps=num_sanity_val_steps,
149 enable_model_summary=enable_model_summary,
150 logger=logger,
151 log_every_n_steps=log_every_n_steps,
152 replace_sampler_ddp=replace_sampler_ddp,
153 enable_progress_bar=enable_progress_bar,
154 **kwargs,
155 )

File ~/miniconda3/envs/envCPA/lib/python3.9/site-packages/pytorch_lightning/utilities/argparse.py:345, in _defaults_from_env_vars.<locals>.insert_env_defaults(self, *args, **kwargs)
342 kwargs = dict(list(env_variables.items()) + list(kwargs.items()))
344 # all args were already moved to kwargs
--> 345 return fn(self, **kwargs)

TypeError: __init__() got an unexpected keyword argument 'checkpoint_callback'

Are you able to advise how to solve this?

Thanks
Esther

cpa.pl.plot_history does not include valid

Hi,

I am training a CPA model on the sciplex2 dataset provided in the tutorial. When I run the cpa.pl.plot_history function, the plot does not include any points for the valid (orange) values. I checked the df variable inside cpa.pl.plot_history, and it seems no values are captured for the valid mode. I would appreciate a clue as to what the reason could be.

sciplex3-train-A549-K562-test-MCF7-uncertaintyFalse-seed13_cpa_modelling_history

Screenshot 2023-08-28 133728

Thanks!

Prediction on unseen dataset without overlap

Hi, I would like to understand the appropriate way to predict responses on my own dataset using the chemical/drug perturbation datasets from the tutorial.

My dataset has no overlap in cell types (or species) or treatments with the perturbation studies from the tutorial. How should I set up training so that the model can predict the perturbation effects? Or is such a task not suitable for CPA at all?

Thank you

Invalid dashes in "–extra-index-url" Installation instructions

What I did:

I attempted to install cpa-tools proceeding according to the instructions on the Installation page.

System: Linux (Debian)
pytorch variant: cpu

I was able to successfully create a virtual environment by running:

conda create -n cpa python=3.9

And I invoked this newly created environment using:

conda activate cpa

My issue started when trying to install torch:

pip install torch==1.13.1+cpu –extra-index-url https://download.pytorch.org/whl/cpu

Expected behavior:

When copying the installation instructions for the CPU only version of pytorch, I expected to get:

Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cpu
Collecting torch==1.13.1+cpu
  Using cached https://download.pytorch.org/whl/cpu/torch-1.13.1%2Bcpu-cp39-cp39-linux_x86_64.whl (199.1 MB)
Collecting typing-extensions (from torch==1.13.1+cpu)
  Using cached typing_extensions-4.9.0-py3-none-any.whl.metadata (3.0 kB)
Using cached typing_extensions-4.9.0-py3-none-any.whl (32 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.13.1+cpu typing-extensions-4.9.0

Observed behavior:
When copying this statement to the command line,

pip install torch==1.13.1+cpu –extra-index-url https://download.pytorch.org/whl/cpu

I obtained the error:

ERROR: Invalid requirement: '–extra-index-url'

An observer familiar with command line arguments will notice that it is unusual to have a long-form command line option with only a single dash ('-'). A keen observer will notice that the character on the installation page is, in fact, not a hyphen ('-') but an en dash ('–'). I suspect that the author of the documentation intended for "--" to be entered, but that readthedocs rendered this as an en dash.

Suggested fix:

Please replace all instances of the en dash ('–') with two hyphens ('--') on the lines concerning the installation of pytorch. Replacing the offending character as described above resolved the issue for me:

pip install torch==1.13.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
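A pasted command can be checked for look-alike dash characters before running it. The sketch below uses the standard `unicodedata` module to report every non-ASCII character in the Unicode "dash punctuation" category (`Pd`), together with its official name. The function name `find_unicode_dashes` is hypothetical.

```python
import unicodedata


def find_unicode_dashes(command):
    """Return (index, character, Unicode name) for every non-ASCII dash in `command`."""
    return [
        (i, ch, unicodedata.name(ch))
        for i, ch in enumerate(command)
        if ord(ch) > 127 and unicodedata.category(ch) == "Pd"
    ]


# The command as copied from the installation page; "\u2013" is the pasted dash character.
bad = "pip install torch==1.13.1+cpu \u2013extra-index-url https://download.pytorch.org/whl/cpu"
for index, char, name in find_unicode_dashes(bad):
    print(f"position {index}: {char!r} is {name}")
```

An empty result means the command contains only plain ASCII hyphens.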

Tuner isn't logging plan_kwargs on WandB

In the hyperparameter tuning code, here:

plan_kwargs[key] = train_args.pop(key)

when we pop the plan arguments from train_args, it results in plan arguments being ignored while reporting the parameters and you won't find their values in the wandb report analysis.

Possible fix:
Replace

cpa/cpa/_tuner.py

Lines 567 to 574 in f415f9f

for key in plan_kwargs_keys:
    plan_kwargs[key] = train_args.pop(key)
train_args = {
    "enable_progress_bar": True,
    "logger": experiment.get_logger(get_context().get_trial_name()),
    "callbacks": [experiment.metrics_callback],
    **train_args,
}

with

actual_train_args = {}
for key in train_args.keys():
    if key in plan_kwargs_keys:
        plan_kwargs[key] = train_args[key]
    else:
        actual_train_args[key] = train_args[key]

train_args = {
    "enable_progress_bar": True,
    "logger": experiment.get_logger(get_context().get_trial_name()),
    "callbacks": [experiment.metrics_callback],
    **actual_train_args,
}
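The difference between the two patterns can be reproduced with plain dictionaries. In the sketch below, `pop` removes the plan keys from the caller's dictionary, while the copying variant leaves it intact. The hyperparameter names `lr` and `max_epochs` and both function names are hypothetical; whether the mutated dictionary is the one reported to wandb is the claim made in this issue, not something the sketch verifies.

```python
def split_args_pop(train_args, plan_kwargs_keys):
    """Original pattern: moves plan keys out of `train_args`, mutating it in place."""
    plan_kwargs = {}
    for key in plan_kwargs_keys:
        plan_kwargs[key] = train_args.pop(key)
    return plan_kwargs, train_args


def split_args_copy(train_args, plan_kwargs_keys):
    """Proposed pattern: builds two new dicts and leaves `train_args` untouched."""
    plan_kwargs = {k: v for k, v in train_args.items() if k in plan_kwargs_keys}
    trainer_args = {k: v for k, v in train_args.items() if k not in plan_kwargs_keys}
    return plan_kwargs, trainer_args


sampled = {"lr": 1e-3, "max_epochs": 100}  # hypothetical sampled hyperparameters
plan, trainer = split_args_copy(sampled, ["lr"])
```

After the call, `sampled` still contains `lr`, so anything that later reads the sampled configuration sees the full set of values.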

Hyperparameter optimization

The provided tutorials seem to have very specific hyperparameters tuned to each dataset. Since we're interested in running CPA on new data, do you have recommendations for which hyperparameters are key for CPA performance?

Predicting the gene expression on a new anndata variable with specific perturbation.

Dear Authors,

I am trying to predict gene expression levels with the trained model on a test set of my anndata, where I want to predict how the cells respond to a perturbation given a condition as input.

Is there a function for this in the CPA model? I looked in the tutorial and the API and could not find one.

Thanks with Warm regards,

Rom Uddamvathanak

Generalization to unseen categories

In the context transfer tutorial (predicting perturbation responses for unseen cell-types), the train set contains the OOD cell-type (B cell):

OOD split

(adata[adata.obs['split_B'] == 'ood'].obs['cell_type'].values == 'B').sum()
# Prints 774

Train split

(adata[adata.obs['split_B'] == 'train'].obs['cell_type'].values == 'B').sum()
# Prints 543

I am interested in the scenario where certain conditions are not available at train time. When I do inference on unseen conditions using a trained CPA model, I get the following error:

ValueError: Category CATEGORY_NAME not found in source registry. Cannot transfer setup without `extend_categories = True`.

How can I set up CPA to generalize to unseen categories?

Understanding your tutorial using the Norman (2019) data ...

I installed CPA 0.8.2 and followed the tutorial "Predicting single-cell response to unseen combinatorial CRISPR perturbations". The goal is to predict the gene expression response to the perturbation X+Y when you have only seen single cells perturbed with X and with Y.
I can reproduce all the results from the tutorial, but I have difficulty understanding some points :-(

  1. adata.obs['split'] is filled randomly with 'train', 'valid' and 'test' values and then used for training the CPA model. But if the goal is to predict the effect of perturbations X+Y when I have only seen perturbations X and Y separately, then I should not provide X+Y in the training data !? So, is the construction of adata.obs['split'] correct ??

  2. I thought the whole point of CPA is to disentangle the effects of different perturbations in such a way that I can later apply such perturbations in different combinations. However, the model.predict() method that is used in the tutorials does not take any parameters to indicate which perturbations should be predicted. How does CPA know which perturbations to apply? And how can I specify that?

It seems I'm missing here something important and I'm grateful for any help!
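For the first point, one way to build a split column that keeps chosen combinations out of training is sketched below. It labels every cell whose perturbation is in a held-out set as 'ood' and randomly assigns the remaining cells to 'train' or 'valid'. The column name `perturbation`, the function name, and the 10% validation fraction are assumptions for illustration; whether the tutorial's random split is intended is for the authors to confirm.

```python
import numpy as np
import pandas as pd


def make_combo_holdout_split(obs, pert_key, holdout, valid_frac=0.1, seed=0):
    """Label held-out perturbations 'ood'; split the remaining cells into 'train'/'valid'."""
    rng = np.random.default_rng(seed)
    split = np.where(rng.random(len(obs)) < valid_frac, "valid", "train")
    split[obs[pert_key].isin(holdout).to_numpy()] = "ood"
    return pd.Series(split, index=obs.index, name="split")


# Toy metadata: single perturbations, one combination, and controls.
obs = pd.DataFrame({"perturbation": ["X", "Y", "X+Y", "ctrl", "X+Y", "X"]})
obs["split"] = make_combo_holdout_split(obs, "perturbation", holdout={"X+Y"})
```

With a split built this way, no X+Y cell contributes to training or validation, so predictions for X+Y are genuinely out of distribution.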

model.predict() error: cannot unpack non-iterable NoneType object

Hi,

I am getting the below error in the model.predict() line when reproducing the Sciplex2 tutorial example with CPA version 0.3.3:

TypeError: cannot unpack non-iterable NoneType object

I used the tutorial code and only changed a few parameter names to be consistent with the updated GitHub version. I would appreciate any guidance on what the source of this error could be.

Thanks!

Run CPA on Kang et al dataset without dosage information

Hello,
Thank you for your contribution to the field! I'm interested in training and testing CPA on the Kang et al. (PBMC) dataset. As far as I understand, this dataset doesn't include dosage information, and the data file isn't provided on the website either. Using the PBMC file provided on the scGen website, I was wondering if you could guide me on how to run the following line without the 'dosage' information, as it throws a KeyError: 'dosage' when I remove the 'dosage': 'dose_val' line.

cpa.CPA.setup_anndata(adata,
                      perturbation_keys={
                          'perturbation': 'condition',
                          'dosage': 'dose_val',
                      },
                      categorical_covariate_keys=['cell_type'],
                      control_key='control',
                      )

Thanks in advance!
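The pre-processing recipe in the README suggests adding a dummy dosage column for datasets without dose information. A minimal sketch of that expression on toy metadata follows, using the `condition` and `dose_val` column names from the call above; the example values are hypothetical.

```python
import pandas as pd

obs = pd.DataFrame({"condition": ["ctrl", "stimulated", "A+B"]})  # toy metadata

# One dummy dose of 1.0 per perturbation, joined with '+' to mirror the perturbation string.
obs["dose_val"] = (
    obs["condition"].astype(str).apply(lambda x: "+".join("1.0" for _ in x.split("+"))).values
)
```

On a real dataset the same assignment would be applied to `adata.obs`, so that the `'dosage': 'dose_val'` entry can stay in `perturbation_keys`.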

v0.5 "ValueError: Expected a parent" in tutorial

Hello Everybody,

I tried to follow the SciPlex2 tutorial with the new version 0.5.
After adjusting ae_hparams and trainer_params I call model.train() and the CPA crashes with "ValueError: Expected a parent".

Some debugging shows that the problem is in _validate_callbacks_list() when the is_overridden() method is called on the EarlyStopping callback. Since is_overridden() is called with 'parent=None' and there is no code to set 'parent' for an EarlyStopping instance the ValueError is raised.

What am I doing wrong?

Any help is appreciated,
Axel

AttributeError: 'tuple' object has no attribute 'copy'

Hello,

I downloaded the pretrained model:

146722 Jun 20 04:51 attr.pkl
89559 Jun 20 04:51 history.csv
39920131 Apr 12  2021 model_params.pt
39022 Jun 20 04:51 var_names.csv

Then I ran:

adata = sc.read('/path/GSM_new.h5ad')
cpa.CPA.setup_anndata(adata,
                      drug_key='condition',
                      dose_key='dose_val',
                      categorical_covariate_keys=['cell_type'],
                      control_key='control',
                      combinatorial=True,
                     )

model = cpa.CPA.load('/Path/GSM/test/', adata, use_gpu=True)

But I got errors:

   1551 # copy state_dict so _load_from_state_dict can modify it
   1552 metadata = getattr(state_dict, '_metadata', None)
-> 1553 state_dict = state_dict.copy()
   1554 if metadata is not None:
   1555     # mypy isn't aware that "_metadata" exists in state_dict
   1556     state_dict._metadata = metadata  # type: ignore[attr-defined]

AttributeError: 'tuple' object has no attribute 'copy'

Loading my own pretrained models works fine.

Could you help with this problem?

Thanks !!!

Predicting using trained model.

Hi, I've successfully trained a model from scratch by following the tutorial on the following link
https://cpa-tools.readthedocs.io/en/latest/tutorials/combosciplex_Rdkit_embeddings.html

However, I'm currently lost on how to use the trained model to make predictions on an unseen dataset. I tried creating a new anndata with an unseen perturbation, but the following error occurred.

INFO     Input AnnData not setup with scvi-tools. attempting to transfer AnnData setup                             
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[48], line 1
----> 1 model.predict(ood_adata, batch_size=1024)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\torch\autograd\grad_mode.py:27, in _DecoratorContextManager.__call__.<locals>.decorate_context(*args, **kwargs)
     24 @functools.wraps(func)
     25 def decorate_context(*args, **kwargs):
     26     with self.clone():
---> 27         return func(*args, **kwargs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\cpa\_model.py:679, in CPA.predict(self, adata, indices, batch_size, n_samples, return_mean)
    676 assert self.module.recon_loss in ["gauss", "nb", "zinb"]
    677 self.module.eval()
--> 679 adata = self._validate_anndata(adata)
    680 if indices is None:
    681     indices = np.arange(adata.n_obs)

File c:\Users\Ardo\.conda\envs\env.cpa\lib\site-packages\scvi\model\base\_base_model.py:415, in BaseModelClass._validate_anndata(self, adata, copy_if_view)
    409 if adata_manager is None:
    410     logger.info(
    411         "Input AnnData not setup with scvi-tools. "
    412         + "attempting to transfer AnnData setup"
    413     )
    414     self._register_manager_for_instance(
...
    230     self.attr_key,
    231     categorical_dtype=cat_dtype,
    232 )

ValueError: Category CHEMBL1213492+CHEMBL491473 not found in source registry. Cannot transfer setup without `extend_categories = True`.

Any help would be appreciated.
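The error reports that a perturbation label in the new data was never registered during training. The following diagnostic sketch lists such labels and, for combinations, which individual components were also never seen. It assumes labels are joined with '+', as in the error message above, and that the perturbation column is passed in as `key`; the function name is hypothetical. It does not resolve the error, it only shows what the model has and has not seen.

```python
import pandas as pd


def unseen_perturbations(train_obs, new_obs, key):
    """Map each label missing from training to its components that were never seen."""
    seen = set(train_obs[key].astype(str))
    seen_single = {part for label in seen for part in label.split("+")}
    report = {}
    for label in sorted(set(new_obs[key].astype(str)) - seen):
        # An empty list means every component was seen, just not in this combination.
        report[label] = [part for part in label.split("+") if part not in seen_single]
    return report
```

For example, `unseen_perturbations(adata.obs, ood_adata.obs, 'condition')` would show whether a combination is new only as a pairing or also contains a drug the model has never encountered.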

The versions for scvi-tools for CPA

Hello,
I have installed the CPA software, but my results are very different from the published ones.
I think this is because I installed the latest version of scvi-tools.
Could you please state the specific version of scvi-tools used with this software?

Best
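When comparing environments, it helps to record the exact versions that are installed. The sketch below reads them with the standard `importlib.metadata` module. The distribution names in the default list are assumptions (`cpa-tools` is the name used on the installation page; the others are the dependencies mentioned in the issues above), and the function name is hypothetical.

```python
from importlib import metadata

# Assumed distribution names; adjust to match your environment.
DEFAULT_PACKAGES = ("cpa-tools", "scvi-tools", "torch", "pytorch-lightning", "torchmetrics")


def installed_versions(packages=DEFAULT_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version}")
```

Including this output in a bug report makes it easier to tell whether a discrepancy comes from a dependency version.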
