I got CPA 0.8.2 and followed the tutorial "<a href="https://cpa-tools.readthedocs.io/e

Understanding your tutorial using the Norman (2019) data ... about cpa HOT 5 OPEN

AxKo commented on June 8, 2024

Understanding your tutorial using the Norman (2019) data ...


Comments (5)

ArianAmani commented on June 8, 2024

Thank you for your interest in CPA.
Regarding the first question, the notebook will be updated soon to contain more meaningful and curated splits for combinatorial perturbations. I'll post an update here as soon as possible.

As for your second question, the model.predict() method takes the perturbations and dosages from the perturbation_key and dosage_key columns of your input adata and applies them to the basal latent obtained from each cell.
So in the tutorial example you mentioned, the model takes the perturbations from the cond_harm column of the data and applies them in the predict method to produce the output:

cpa.CPA.setup_anndata(adata, 
                      perturbation_key='cond_harm',
                      control_group='ctrl',
                      dosage_key='dose_value',
                      categorical_covariate_keys=['cell_type'],
                      is_count_data=True,
                      deg_uns_key='rank_genes_groups_cov',
                      deg_uns_cat_key='cov_cond',
                      max_comb_len=2,
                     )


So if you'd like to predict a specific perturbation for a given cell, you can change the perturbation or dosage in the mentioned columns of your adata.
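As a rough illustration of "applying perturbations to the basal latent", here is a minimal sketch in plain Python. It is not CPA's implementation: the function name `apply_perturbations`, the toy embeddings, and the plain linear dose scaling are all assumptions made for illustration, and the real model learns its embeddings and may transform the dose differently.

```python
def apply_perturbations(basal, pert_embeddings, perts, doses):
    """Add dose-scaled perturbation embeddings to a basal latent vector.

    Hypothetical helper: linear dose scaling is an assumption of this sketch.
    """
    latent = list(basal)
    for name, dose in zip(perts, doses):
        embedding = pert_embeddings[name]
        latent = [z + dose * e for z, e in zip(latent, embedding)]
    return latent


# Toy values, invented for the example.
basal = [0.5, -1.0]
pert_embeddings = {"FOXL2": [1.0, 0.0], "HOXB9": [0.0, 2.0]}

print(apply_perturbations(basal, pert_embeddings, ["FOXL2", "HOXB9"], [1.0, 1.5]))
# [1.5, 2.0]
```

Changing the entries of `perts` or `doses` changes the predicted latent, which mirrors what editing the perturbation and dosage columns does for the real model.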

Feel free to reply if there are further issues.


AxKo commented on June 8, 2024

Ah, okay, thanks for that information.

But the cond_harm column takes a single value and not a list, which means that I can only apply a single perturbation to the basal latent representation. Is that correct?

And the values in the dosage_key column are strings like '1.0+1.0' (and not float values). So how can I specify a new value (e.g. 1.5) in a way that CPA understands?

Thanks


ArianAmani commented on June 8, 2024

You can apply combinations of perturbations. CPA uses strings with the following format for specifying perturbations and dosage values in the adata:

The value of the cond_harm column:
- "PERT1" --> A single perturbation (e.g. "SGK1")
- "PERT1+PERT2" --> Combination of perturbations PERT1 and PERT2 (e.g. "FOXL2+HOXB9")

So you can specify your combination of perturbations using the + character as the separator between different perturbations, and CPA will understand them.

The same thing applies to the dosage column. The dosages are given to the model as strings of the following format:
- "1.0" --> Dosage 1.0 when we have one perturbation (e.g. "1.5" or any other number)
- "1.0+1.5" --> Dosages 1.0 and 1.5 for PERT1 and PERT2 respectively

CPA splits these strings on the + character and converts the numeric strings to floats ("1.0+1.5" --> [1.0, 1.5]).
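The string format described above can be sketched in a few lines of plain Python. The helper names `format_condition` and `parse_condition` are hypothetical and not part of CPA; they only show how a pair of "+"-separated strings could be built from lists and read back again.

```python
def format_condition(perts, doses):
    """Build the '+'-separated perturbation and dosage strings."""
    if len(perts) != len(doses):
        raise ValueError("need exactly one dose per perturbation")
    pert_str = "+".join(perts)
    dose_str = "+".join(str(float(d)) for d in doses)
    return pert_str, dose_str


def parse_condition(pert_str, dose_str):
    """Split the strings back into a list of names and a list of floats."""
    perts = pert_str.split("+")
    doses = [float(d) for d in dose_str.split("+")]
    if len(perts) != len(doses):
        raise ValueError("perturbation and dosage strings disagree in length")
    return perts, doses


pert_str, dose_str = format_condition(["FOXL2", "HOXB9"], [1.0, 1.5])
print(pert_str, dose_str)                   # FOXL2+HOXB9 1.0+1.5
print(parse_condition(pert_str, dose_str))  # (['FOXL2', 'HOXB9'], [1.0, 1.5])
```

Strings built this way are the kind of values one would write into the perturbation and dosage columns to request a new dose such as 1.5.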

It is actually done in the setup_anndata method of the model:

cpa/cpa/_model.py

Lines 294 to 317 in c63d5cf

    
    pert_map = {}
    for condition in tqdm(perturbations):
        perts_list = np.where(np.isin(perts_names_unique, condition.split("+")))[0]
        pert_map[condition] = list(perts_list) + [
            CPA_REGISTRY_KEYS.PADDING_IDX
            for _ in range(max_comb_len - len(perts_list))
        ]

    dose_map = {}
    for dosage_str in tqdm(dosages):
        dosages_list = [float(i) for i in dosage_str.split("+")]
        dose_map[dosage_str] = list(dosages_list) + [
            0.0 for _ in range(max_comb_len - len(dosages_list))
        ]

    data_perts = np.vstack(
        np.vectorize(lambda x: pert_map[x], otypes=[np.ndarray])(perturbations)
    ).astype(int)
    adata.obsm[CPA_REGISTRY_KEYS.PERTURBATIONS] = data_perts

    data_perts_dosages = np.vstack(
        np.vectorize(lambda x: dose_map[x], otypes=[np.ndarray])(dosages)
    ).astype(float)
    adata.obsm[CPA_REGISTRY_KEYS.PERTURBATIONS_DOSAGES] = data_perts_dosages

As you can see in the code, setup_anndata creates lists of perturbation IDs and their respective dosages from the strings in the perturbation and dosage columns of adata.obs, saves them in adata.obsm, and uses them as the input data to the model.

If you check your adata after running setup_anndata, you will see the following obsm values:
- obsm: 'X_pca', 'X_umap', 'perts', 'perts_doses', 'deg_mask', 'deg_mask_r2'

Here perts is the list of perturbation IDs, which is used to retrieve perturbation embeddings from the PerturbationNetwork, and perts_doses holds the respective dosages.

1 perturbation: [image not preserved]

ID zero for perturbations is used for padding because vectors need to be the same length.

2 perturbations: [image not preserved]
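To make the padded layout concrete, here is a minimal plain-Python sketch of what the rows for one cell might look like with max_comb_len=2. The vocabulary `VOCAB`, its ID assignments, and the helper `encode` are hypothetical; the sketch also uses a simple dictionary lookup, whereas the quoted setup_anndata code uses np.isin, so treat it as an illustration of the padding idea only.

```python
PADDING_IDX = 0  # assumption based on "ID zero ... is used for padding"

# Hypothetical vocabulary; the real IDs depend on the dataset.
VOCAB = {"SGK1": 1, "FOXL2": 2, "HOXB9": 3}


def encode(pert_str, dose_str, vocab=VOCAB, max_comb_len=2):
    """Return fixed-length lists of perturbation IDs and doses for one cell."""
    ids = [vocab[name] for name in pert_str.split("+")]
    doses = [float(d) for d in dose_str.split("+")]
    if len(ids) > max_comb_len or len(doses) > max_comb_len:
        raise ValueError("more perturbations than max_comb_len allows")
    # Pad both lists so every cell has vectors of the same length.
    ids += [PADDING_IDX] * (max_comb_len - len(ids))
    doses += [0.0] * (max_comb_len - len(doses))
    return ids, doses


print(encode("SGK1", "1.5"))             # ([1, 0], [1.5, 0.0])
print(encode("FOXL2+HOXB9", "1.0+1.5"))  # ([2, 3], [1.0, 1.5])
```

The single-perturbation case gets a padding ID and a dose of 0.0 in the second slot, while the two-perturbation case fills both slots.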

I hope this helps, and again, feel free to reply if there are further issues.


AxKo commented on June 8, 2024

Very good, that's what I was looking for!

Actually, I have only now looked at your "Batch Correction in Expression Space" tutorial with the description of custom_predict() and how to use it. That is obviously the function I need!

Many thanks


AxKo commented on June 8, 2024

I am sorry, but I have to reopen this :-(

Looking at custom_predict, I see that it allows me to select the individual categorical covariates that I want to add, but it only allows me to add either all perturbations or none. So does that mean that if I want to add individual perturbations, I have to follow your advice from above?

I think I'm also confused about the difference between perturbations and categorical covariates. I thought perturbations would be continuous variables, but in many of the tutorials the perturbation comes in the form of discrete values (IFN stimulation or not, gene knockout or not, etc.). Does that mean these tutorials could have been written differently by declaring those 'perturbations' as categorical covariates?

Thanks
