
bbknn's People

Contributors

canergen · chuanxu1 · fbnrst · iandriver · ivirshup · jenzopr · ktpolanski · ryan-williams


bbknn's Issues

scanorama bbknn

Dear,
Scanorama handles mutual nearest neighbours-based matching, batch correction, and panorama assembly. I could not find an assembly function in pancreas-4-Scanorama.ipynb. What is the equivalent of scanorama's assembly function in bbknn (or scanpy)?
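For anyone comparing the two workflows, a minimal sketch (assuming `adata` already holds the concatenated batches with PCA computed): bbknn has no separate assembly step, because it rewrites the neighbour graph in place, and the joint embedding of that graph plays the role of scanorama's assembled panorama.

```python
import scanpy as sc
import bbknn

# bbknn rewrites the KNN graph so each cell's neighbours are drawn from
# every batch; there is no separate assembly step to call
bbknn.bbknn(adata, batch_key='batch')
# the joint embedding of that graph is the closest analogue of scanorama's
# panorama assembly
sc.tl.umap(adata)
```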

bbknn publication

Dear,

Do you plan to publish BBKNN in a high-impact-factor journal later? Some people argue that bioRxiv is not a serious venue, and we are a little worried, since it will take us considerable time to adopt BBKNN.

incompatible with annoy==1.17.0 ?

I was trying to run bbknn (and also scrublet), and both independently kept crashing my sessions.

I think it's related to annoy==1.17.0; simply downgrading to annoy==1.16.3 made everything run again.

Details for pbmc dataset used #30

[Reopening issue] Thanks for your quick reply! I am having trouble finding the 5′ dataset on the 10X Genomics website. Is it no longer available? Can you share the link?

"The input data was downloaded from the 10X Genomics website. The exact 5′dataset was ‘PBMCs of a healthy donor 5′gene expression’, under Cell Ranger 2.1.0, under V(D)J + 5′Gene Expression. The exact 3′dataset was ‘8k PBMCs from a Healthy Donor’, under Cell Ranger 2.1.0, under Chromium Demonstration (v2 Chemistry)."

Does bbknn really work?

I have tried bbknn following the scanpy tutorial, but it runs very quickly and the result is very similar to the original data (in fact, identical).
I just use:
bbknn.bbknn(adata, batch_key='batch')

Anything wrong here? Thanks.
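A note for readers hitting the same confusion: the quick run time is expected, because bbknn only rebuilds the neighbour graph and touches neither `.X` nor the PCA. Nothing visible changes until a graph-based step is recomputed afterwards. A minimal sketch, assuming `adata` has a `'batch'` column in `.obs`:

```python
import scanpy as sc
import bbknn

sc.pp.pca(adata)                       # bbknn operates on the PCA space
bbknn.bbknn(adata, batch_key='batch')  # fast: only the KNN graph is rebuilt
sc.tl.umap(adata)                      # recompute the embedding on the new graph
sc.pl.umap(adata, color='batch')       # the correction shows up here
```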

bbknn spark error when running with integrated scRNA-seq data

Hi @Teichlab,
Something weird happened with data integrated through scanpy using bbknn: the samples look integrated in the UMAP plot, but the t-SNE plot shows the data as not integrated.

[figure: UMAP plot showing the batches integrated]

[figure: t-SNE plot showing the batches not integrated]

The difference between UMAP and t-SNE is shown above. How can I get the same integration effect in t-SNE when using bbknn? If that cannot be done, is there another way to achieve it?

Best
hanhuihong
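A likely explanation (my reading, not from the thread): `sc.tl.umap` embeds the neighbour graph that bbknn corrected, while scanpy's t-SNE runs directly on the uncorrected PCA coordinates and never sees that graph, so the batch effect reappears there.

```python
import scanpy as sc

sc.tl.umap(adata)  # embeds the bbknn-corrected graph: batches mix
sc.tl.tsne(adata)  # works from adata.obsm['X_pca'], which bbknn leaves
                   # untouched, so the batches separate again
```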

Collaboration

Hi, Authors of BBKNN,

My name is Feng Zhang. I recently built a pipeline (BEER, published in Cell Discovery, https://github.com/jumphone/BEER) to remove batch-effect-related PCA subspaces.

I find that it works well when combined with BBKNN (especially when integrating scRNA-seq and scATAC-seq data).

Maybe we can collaborate with each other to further improve the performance.

Best,
Feng Zhang

problem: name 'logg' is not defined

I tried to rerun the bbknn demo (https://nbviewer.jupyter.org/github/Teichlab/bbknn/blob/master/examples/demo.ipynb)
but I ran into two problems:
1. NameError: name 'logg' is not defined
2. AttributeError: module 'bbknn' has no attribute 'ridge_regression'


NameError Traceback (most recent call last)
in <module>
3 except ImportError:
4 pass
----> 5 bbknn.bbknn(adata)
6 sc.tl.umap(adata)
7 sc.pl.umap(adata, color=['batch','celltype'])

~\miniconda3\envs\sc_trial\lib\site-packages\bbknn\__init__.py in bbknn(adata, batch_key, approx, metric, copy, **kwargs)
259 '''
260 start = logg.info('computing batch balanced neighbors')
--> 261 adata = adata.copy() if copy else adata
262 #basic sanity checks to begin
263 #is our batch key actually present in the object?

NameError: name 'logg' is not defined


2- regression :
AttributeError: module 'bbknn' has no attribute 'ridge_regression'

I am working on Windows and I tried bbknn 1.3.6, 1.4, and 1.5; the same problem occurs in all of them.
How can I solve these problems?

`ImportError` for `scikit_learn` >= 1.0.0

We currently import DistanceMetric from sklearn.neighbors in bbknn/matrix.py (link to code), but this only works for scikit-learn < 1.0.0; DistanceMetric moved to sklearn.metrics in later versions.

I will raise a PR to fix this by handling the ImportError -- but would like to hear the maintainers' opinions first!
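The version-tolerant import such a PR would presumably introduce looks roughly like this (a sketch, not the merged code):

```python
try:
    # scikit-learn >= 1.0 moved the class to sklearn.metrics
    from sklearn.metrics import DistanceMetric
except ImportError:
    # older scikit-learn keeps it in sklearn.neighbors
    from sklearn.neighbors import DistanceMetric
```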

TypeError: info() got an unexpected keyword argument 'r'

Good afternoon,

I'm running the pbmc notebook with python 3.7.7, scanpy 1.5.1, bbknn 1.3.3. After running bbknn, the following error message appeared. I tested sc.external.pp.bbknn and the same message was shown. Updating to python 3.8.0 and scanpy 1.6.0 did not work either.

TypeError Traceback (most recent call last)
in
----> 1 bdata = bbknn.bbknn(adata,batch_key='Sample',copy=True)

~\Anaconda3\lib\site-packages\bbknn\__init__.py in bbknn(adata, batch_key, approx, metric, copy, **kwargs)
259 If True, return a copy instead of writing to the supplied adata.
260 '''
--> 261 logg.info('computing batch balanced neighbors', r=True)
262 adata = adata.copy() if copy else adata
263 #basic sanity checks to begin

TypeError: info() got an unexpected keyword argument 'r'

Do you have any idea what causes this error? Thank you.

Any plan on making R package for bbknn?

I really like the robustness of the integration, and especially the fast performance.
I prefer using R, due to its intuitiveness.
So, are there any plans to make an R package for bbknn? It would be really wonderful.

Issues importing bbknn after successful install

Hello,

I was able to successfully install bbknn; however, when I went to run it in a notebook, I got the following error:

AssertionError: Failed in nopython mode pipeline (step: native lowering)
Storing i64 to ptr of i32 ('dim'). FE type int32

I realize that this error is related to numba (currently running 0.54.1). I tried downgrading it to 0.52.0; however, I got a few compatibility issues with bbknn. Do you have any suggestions?

Thank you!

X_umap_3D

Great package!

How did you go about initially computing the X_umap_3D coordinates without overwriting X_umap?
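One plausible recipe (an assumption on my part, not necessarily what the authors did): compute the 3D embedding first, stash it under a custom `.obsm` key, then recompute the default 2D embedding, which only overwrites `X_umap`.

```python
import scanpy as sc

sc.tl.umap(adata, n_components=3)                      # 3D embedding lands in X_umap
adata.obsm['X_umap_3D'] = adata.obsm['X_umap'].copy()  # stash it under its own key
sc.tl.umap(adata)                                      # recompute 2D; X_umap_3D survives
```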

error during ridge_regression

Hi there,

First of all, thank you for this great package! It has been very helpful for my graduation project so far.
I am currently following the bbknn tutorial notebook, but I get an error at the ridge_regression part of the tutorial.
"AttributeError: module 'numpy' has no attribute 'int'."

"np.int was a deprecated alias for the builtin int. To avoid this error in existing code, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20;""

I get this error when following your tutorial notebook with my own dataset. I am quite new to scRNA-seq analysis, and I am simply wondering whether it's my package management that's wrong or whether it might be something else.

Anyway, grateful in advance for your reply!

Greetings,

Julia
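For anyone else hitting this: it is a package-compatibility problem, not a usage mistake. NumPy 1.24 removed the `np.int` alias, so any library still using it breaks. The drop-in replacement on the library side is the builtin `int` (or an explicit width); until the installed bbknn carries that fix, pinning `numpy<1.24` is a common stopgap.

```python
import numpy as np

x = np.zeros(5, dtype=int)       # instead of the removed dtype=np.int
y = np.zeros(5, dtype=np.int64)  # or name the precision explicitly
```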

neighbors_within_batch argument usage in R?

Hello! I'm trying to recapitulate some results (using bbknn in R) from a paper that uses bbknn for scRNA-seq batch correction, where they say they set neighbors_within_batch to 10, but I'm running into an issue. I can run the code fine in R without setting a neighbors_within_batch argument (see a possible edit below to the py_to_r code at the end). When I set it, with bbknn$bbknn(adata, batch_key=0, neighbors_within_batch=10), I get an error with this traceback:

Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: 'float' object cannot be interpreted as an integer
5.
stop(structure(list(message = "TypeError: 'float' object cannot be interpreted as an integer",
call = py_call_impl(callable, dots$args, dots$keywords),
cppstack = structure(list(file = "", line = -1L, stack = c("1 reticulate.so 0x0000000114a023ed _ZN4Rcpp9exceptionC2EPKcb + 221",
"2 reticulate.so 0x0000000114a0a485 _ZN4Rcpp4stopERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE + 53", ...
4.
get_graph at __init__.py#148
3.
bbknn_pca_matrix at __init__.py#355
2.
bbknn at __init__.py#294
1.
bbknn$bbknn(adata, batch_key = 0, neighbors_within_batch = 10)

My full code is as follows (PCA matrix and batch assignment vector not shown; I don't think either of these is causing the error, since I can run this code minus the neighbors_within_batch argument, but I'm happy to post how I generated them / what they contain if useful):

adata = anndata$AnnData(X=pca, obs=batch)
sc$tl$pca(adata)
adata$obsm$X_pca = pca
bbknn$bbknn(adata, batch_key=0, neighbors_within_batch=10)
sc$tl$umap(adata)
umap = py_to_r(adata$obsm[["X_umap"]])

I'm at a loss for what's causing this error... do you have any idea what I'm doing wrong? I'm assuming I can use the neighbors_within_batch parameter in R?

Also, I think umap = py_to_r(adata$obsm$X_umap) should be umap = py_to_r(adata$obsm[["X_umap"]])? I was only able to get the latter to work...

Thanks,
Rachel

P.S. I'm sorry if I've missed including anything or if this looks funky when it gets posted; this is the first time I've asked about an issue on github. Happy to provide more details if needed!
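A likely cause, judging from the traceback (my guess, not confirmed in the thread): reticulate converts R numerics to Python floats, so `neighbors_within_batch=10` arrives as `10.0`, and bbknn's integer indexing then fails. Passing `10L` or `as.integer(10)` from R should produce the equivalent of this direct Python call:

```python
import bbknn

# neighbors_within_batch must be a Python int, not a float
bbknn.bbknn(adata, batch_key='batch', neighbors_within_batch=10)
```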

Performance varies across operating systems

Hello:
When I installed bbknn 1.4.0 on Ubuntu 16 and Ubuntu 18, I got different results on the two operating systems. Concretely, the cell similarity scores between batches varied when running bbknn on Ubuntu 16 versus Ubuntu 18. Can you help me solve this?

Thank you very much!

BBKNN for ATAC data

Hi! Thank you for the awesome package :)
I was wondering if it is possible to use dimensionality reduction methods other than PCA for bbknn.bbknn, such as LSI, which is commonly used for ATAC data? The use_rep argument would suggest it is possible, but I wanted to check what your thoughts were before running it!
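For reference, the call would look like the sketch below; `'X_lsi'` is a hypothetical `.obsm` key standing in for wherever the LSI coordinates are stored.

```python
import bbknn

# use_rep points bbknn at an arbitrary .obsm representation instead of X_pca
bbknn.bbknn(adata, batch_key='batch', use_rep='X_lsi')
```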

Update on bioconda

Hi,

Could you update the recipe on bioconda? The bioconda version is still at 1.5.1.

Thanks

KeyError: 'connectivities' when running the UMAP function

Hello,

I'm following the tutorial which integrates with Scanpy. I was able to successfully run this line of code:

sc.external.pp.bbknn(adata, batch_key='sample_id',metric='euclidean')

In the next step I get this error:

sc.tl.umap(adata)
computing UMAP

KeyError Traceback (most recent call last)
in <module>
----> 1 sc.tl.umap(adata)

/miniconda3/envs/scanpy/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method)
142 X_umap = simplicial_set_embedding(
143 X,
--> 144 adata.uns['neighbors']['connectivities'].tocoo(),
145 n_components,
146 alpha,

KeyError: 'connectivities'

Am I doing something incorrectly? How can I fix this problem? Thanks a lot for your help.
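This pattern usually indicates a bbknn/scanpy version mismatch: newer bbknn releases write the graph to `.obsp`, while older scanpy builds still read `.uns['neighbors']`. Aligning the two packages' versions is the clean fix; a hedged stopgap, assuming the graph did land in `.obsp`, is to mirror it into the slot the installed scanpy expects:

```python
# stopgap for mismatched bbknn/scanpy versions (assumes .obsp holds the graph)
adata.uns['neighbors']['connectivities'] = adata.obsp['connectivities']
adata.uns['neighbors']['distances'] = adata.obsp['distances']
```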

Kernel dies when using bbknn

I'm trying to run bbknn on my own single-cell data and on the demo data, but the kernel dies every time, whether I use my own data or the demo data.
Is this because of my machine, or something else?

sc.settings.verbosity = 3
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80)
sc.settings.set_figure_params(dpi=80)
scanpy==1.10.1 anndata==0.10.5.post1 umap==0.5.5 numpy==1.26.4 scipy==1.11.1 pandas==2.2.2 scikit-learn==1.3.0 statsmodels==0.14.0 igraph==0.11.5 louvain==0.8.2 pynndescent==0.5.11

adata = sc.read('pancreas.h5ad', backup_url='https://www.dropbox.com/s/qj1jlm9w10wmt0u/pancreas.h5ad?dl=1')
bbknn.bbknn(adata, batch_key='batch')

IndexError: index 2 is out of bounds for axis 0 with size 2

I have a dataset that includes one batch containing only 2 cells.
The command was sc.external.pp.bbknn(bdata,batch_key = "batch",approx=False)
It seems that bbknn doesn't like that small batch and gives me this error:

computing batch balanced neighbors
WARNING: unrecognised metric for type of neighbor calculation, switching to euclidean

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-274-086c83c814fc> in <module>
----> 1 sc.external.pp.bbknn(bdata,batch_key = "batch",approx=False)

/usr/local/lib/python3.6/dist-packages/scanpy/external/pp/_bbknn.py in bbknn(adata, batch_key, approx, metric, copy, n_pcs, trim, n_trees, use_faiss, set_op_mix_ratio, local_connectivity, **kwargs)
    118         set_op_mix_ratio=set_op_mix_ratio,
    119         local_connectivity=local_connectivity,
--> 120         **kwargs,
    121     )

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn(adata, batch_key, use_rep, approx, metric, copy, **kwargs)
    289         #call BBKNN proper
    290 	bbknn_out = bbknn_pca_matrix(pca=pca, batch_list=batch_list,
--> 291 								 approx=approx, metric=metric, **kwargs)
    292         #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key
    293         adata.uns['neighbors'] = {}

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn_pca_matrix(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, n_trees, use_faiss, metric, set_op_mix_ratio, local_connectivity)
    346 	knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,n_pcs=n_pcs,n_trees=n_trees,
    347                                                                                    approx=approx,metric=metric,use_faiss=use_faiss,
--> 348 										   neighbors_within_batch=neighbors_within_batch)
    349         #sort the neighbours so that they're actually in order from closest to furthest
    350         newidx = np.argsort(knn_distances,axis=1)

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in get_graph(pca, batch_list, neighbors_within_batch, n_pcs, approx, metric, use_faiss, n_trees)
    171                         for i in range(ckdout[1].shape[0]):
    172                                 for j in range(ckdout[1].shape[1]):
--> 173                                         ckdout[1][i,j] = ind_to[ckdout[1][i,j]]
    174                         #save the results within the appropriate rows and columns of the structures
    175                         col_range = np.arange(to_ind*neighbors_within_batch, (to_ind+1)*neighbors_within_batch)

IndexError: index 2 is out of bounds for axis 0 with size 2

Is that really due to the fact that one batch contains only two cells?
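Almost certainly yes: bbknn requests `neighbors_within_batch` (default 3) neighbours from every batch, and a 2-cell batch cannot supply them. One hedged workaround is to drop batches that are too small before running bbknn:

```python
# drop batches with fewer cells than neighbors_within_batch (default 3)
counts = bdata.obs['batch'].value_counts()
keep = counts.index[counts >= 3]
bdata = bdata[bdata.obs['batch'].isin(keep)].copy()
```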

ValueError: No hyperplanes of adequate size were found! When not using annoy

Hi there,
Having an issue when I try to run BBKNN without annoy. I had this error, then freshly installed everything in a new conda environment, and I'm still getting the error raised from pynndescent when I run:
bbknn.bbknn(adata,batch_key='batch_name',use_annoy=False,metric='manhattan',neighbors_within_batch=3)

Thanks so much! This package works amazingly for correcting batch-driven compositional problems!!

Full error message below:

    122         batch_list = adata.obs[batch_key].values
    123         #call BBKNN proper
--> 124 	bbknn_out = bbknn_matrix(pca=pca, batch_list=batch_list, approx=approx,
    125 							 use_annoy=use_annoy, metric=params['metric'], **kwargs)
    126         #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in bbknn(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, annoy_n_trees, pynndescent_n_neighbors, pynndescent_random_state, use_annoy, use_faiss, metric, set_op_mix_ratio, local_connectivity)
    312         params = check_knn_metric(params, counts)
    313         #obtain the batch balanced KNN graph
--> 314         knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,params=params)
    315         #sort the neighbours so that they're actually in order from closest to furthest
    316         newidx = np.argsort(knn_distances,axis=1)

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in get_graph(pca, batch_list, params)
    173                 ind_to = np.arange(len(batch_list))[mask_to]
    174                 #create the faiss/cKDTree/KDTree/annoy, depending on approx/metric
--> 175                 ckd = create_tree(data=pca[mask_to,:params['n_pcs']], params=params)
    176                 for from_ind in range(len(batches)):
    177                         #this is the batch that will have its neighbours identified

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in create_tree(data, params)
     95                                                                         n_neighbors=params['pynndescent_n_neighbors'],
     96 									random_state=params['pynndescent_random_state'])
---> 97                 ckd.prepare()
     98         elif params['computation'] == 'faiss':
     99                 ckd = faiss.IndexFlatL2(data.shape[1])

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in prepare(self)
   1524     def prepare(self):
   1525         if not hasattr(self, "_search_graph"):
-> 1526             self._init_search_graph()
   1527         if not hasattr(self, "_search_function"):
   1528             if self._is_sparse:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
    962                 best_trees = [self._rp_forest[idx] for idx in best_tree_indices]
    963                 del self._rp_forest
--> 964                 self._search_forest = [
    965                     convert_tree_format(tree, self._raw_data.shape[0])
    966                     for tree in best_trees

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in <listcomp>(.0)
    963                 del self._rp_forest
    964                 self._search_forest = [
--> 965                     convert_tree_format(tree, self._raw_data.shape[0])
    966                     for tree in best_trees
    967                 ]

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in convert_tree_format(tree, data_size)
   1161     if tree.hyperplanes[0].ndim == 1:
   1162         # dense hyperplanes
-> 1163         hyperplane_dim = dense_hyperplane_dim(tree.hyperplanes)
   1164         hyperplanes = np.zeros((n_nodes, hyperplane_dim), dtype=np.float32)
   1165     else:

~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in dense_hyperplane_dim()
   1143             return hyperplanes[i].shape[0]
   1144 
-> 1145     raise ValueError("No hyperplanes of adequate size were found!")
   1146 
   1147 

ValueError: No hyperplanes of adequate size were found!
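The failure comes from pynndescent's RP-tree preparation rather than from bbknn itself. A hedged workaround while that persists is to fall back to the default annoy backend, which bypasses pynndescent entirely:

```python
import bbknn

# use_annoy=True (the default) routes neighbour search through annoy
bbknn.bbknn(adata, batch_key='batch_name', use_annoy=True,
            metric='manhattan', neighbors_within_batch=3)
```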

scanpy update incompatibility

Calling bbknn either from scanpy.external (or bbknn directly) yields:

Error in py_call_impl(callable, dots$args, dots$keywords) : AttributeError: 'tuple' object has no attribute 'tocsr'

This is reported on scanpy (scverse/scanpy#1249), where the suggested solution is downgrading umap, but that's not ideal...

umap-learn 0.4.3 bbknn 1.3.4 scanpy 1.5.1 anndata 0.7.3

Any ideas?

should change location of 'distances' and 'connectivities' for new versions of anndata

/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py:294: FutureWarning: This location for 'distances' is deprecated. It has been moved to .obsp[distances], and will not be accesible here in a future version of anndata.
  adata.uns['neighbors']['distances'] = bbknn_out[0]
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py:295: FutureWarning: This location for 'connectivities' is deprecated. It has been moved to .obsp[connectivities], and will not be accesible here in a future version of anndata.
  adata.uns['neighbors']['connectivities'] = bbknn_out[1]

New versions of anndata/scanpy expect the distances and connectivities to be in adata.obsp instead of .uns['neighbors']. bbknn's outputs should probably be updated to write to adata.obsp as well.
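For objects already written by an older bbknn, a one-off migration along these lines (a sketch, assuming the graph currently sits in `.uns['neighbors']`) moves the matrices to the new location:

```python
# move the graph from the deprecated slots to the new anndata layout
adata.obsp['distances'] = adata.uns['neighbors'].pop('distances')
adata.obsp['connectivities'] = adata.uns['neighbors'].pop('connectivities')
```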

`n_trees` is now `annoy_n_trees`

Thanks for the great method!

The change of n_trees to annoy_n_trees seems to have broken compatibility with scanpy's bbknn wrapper (sc.external.pp.bbknn). Are there any plans to update that module as well?
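Until the wrapper catches up, calling bbknn directly avoids the stale keyword; a sketch:

```python
import bbknn

# the renamed parameter sets the number of trees in the annoy forest
bbknn.bbknn(adata, batch_key='batch', annoy_n_trees=10)
```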

Logging error

It looks like logging has been broken by an update to scanpy.

It's pretty straightforward to fix the timing error: logg.info now returns a datetime, which you pass to the subsequent logg.info call for which you want the elapsed time reported.

I'm not sure how to replace the end argument. @flying-sheep, any suggestion?

Please also require packaging in setup.py

Hi Krzysztof,

the bbknn bioconda build fails for the current versions 1.3.10 and 1.3.11 because the packaging package you introduced in d6c60f5 cannot be found. I added it to the recipe dependencies, but I recommend adding it to install_requires as well.

Thanks,
Jens
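The requested change amounts to declaring the dependency in setup.py, roughly like this (a sketch of the suggestion, not the actual file):

```python
from setuptools import setup

setup(
    name='bbknn',
    install_requires=[
        'packaging',
        # ...the existing dependencies...
    ],
)
```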

Understanding `neighbors_within_batch` parameter?

Thanks for the nice tool! I'm trying to conceptually understand the neighbors_within_batch parameter. I read the docstring, but I'm still not clear on exactly what it means. Is it 'k' when approx=True? Setting the value higher leads to a more spread-out UMAP (i.e. less correction), which may be preferable for some datasets? Is there a reason for the default value of 3?

bbknn/bbknn/__init__.py

Lines 216 to 218 in 7e736d4

neighbors_within_batch : ``int``, optional (default: 3)
How many top neighbours to report for each batch; total number of neighbours
will be this number times the number of batches.
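Per the docstring quoted above, it is not 'k' in the usual single-graph sense: the total neighbour count per cell is this parameter times the number of batches. A worked example of the arithmetic (assuming `adata` with a `'batch'` column):

```python
# with the default neighbors_within_batch=3 and, say, 4 batches, every cell
# gets 3 neighbours per batch, i.e. a 12-neighbour graph overall
n_batches = adata.obs['batch'].nunique()
k_total = 3 * n_batches
```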

"sklearn" as a dependency

Just a heads-up: you list "sklearn" as a dependency, but that isn't the correct package on PyPI. It should be "scikit-learn".

Something I noticed because I made the same mistake when writing a package a few years ago!

save_knn

Hi @ktpolanski, I find your save_knn function extremely useful (in fact, it was the primary reason I was using bbknn in the first place)! I am currently reverting to bbknn 1.3.0 so I can use save_knn, but it would be nice to have the save_knn option in future versions of bbknn, if possible.

Standalone function

Really exciting method! I don't usually use scanpy for my pipelines. Do you have a BBKNN function that works without it? Maybe something that takes in only PCs and batch labels?
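There is such an entry point: the tracebacks elsewhere on this page go through `bbknn.bbknn_pca_matrix`, which takes exactly a PC matrix and a per-cell batch label vector (in later releases the scanpy-free code lives in `bbknn.matrix`). A hedged sketch:

```python
import bbknn

# pca: (n_cells, n_pcs) array; batch_list: per-cell batch labels
out = bbknn.bbknn_pca_matrix(pca=pca, batch_list=batch_list)
distances, connectivities = out[0], out[1]  # sparse graph matrices
```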

Is it possible to identify marker genes?

Hello:

Suppose we ran a clustering method on the bbknn output and identified a few clusters that hopefully represent distinct cell types. Do I understand correctly that it's impossible to identify marker genes for those clusters? BBKNN doesn't alter the original data or the PCs obtained from it, so we never obtain gene expression adjusted for batch effect. If I am right, is there a method to adjust the original data for batch effect using the bbknn output?

Thanks in advance,
Nik
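Two notes for readers with the same question. Marker genes are still identifiable: clustering uses the corrected graph, while gene ranking reads the unmodified expression in `.X`, which is standard practice. And within this package, `bbknn.ridge_regression` (discussed elsewhere on this page) is the option for producing batch-adjusted expression values. A sketch of the first point:

```python
import scanpy as sc

sc.tl.leiden(adata)                               # clusters from the bbknn graph
sc.tl.rank_genes_groups(adata, groupby='leiden')  # markers from the original .X
```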

np.matrix and ridge_regression

Hello,
I'm trying to use bbknn.ridge_regression but get the following output when I run
bbknn.ridge_regression(adata, batch_key=['batch'], confounder_key=['cell_type'])
Is this an issue with compatibility with current numpy?
Many thanks


TypeError Traceback (most recent call last)
Cell In[19], line 9
7 import bbknn
8 # bbknn.bbknn(adata_v3)
----> 9 bbknn.ridge_regression(adata_v3, batch_key=['batch'], confounder_key=['cell_type'])
10 # scanpy.tl.pca(adata_v3)
11 # bbknn.bbknn(adata_v3)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/bbknn/__init__.py:196, in ridge_regression(adata, batch_key, confounder_key, chunksize, copy, **kwargs)
193 X_exp = X_exp.todense()
194 #fit the ridge regression model, compute the expression explained by the technical
195 #effect, and the remaining residual
--> 196 LR.fit(dummy,X_exp)
197 X_explained.append(dm.dot(LR.coef_[:,batch_index].T))
198 X_remain.append(X_exp - X_explained[-1])

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:1151, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1144 estimator._validate_params()
1146 with config_context(
1147 skip_parameter_validation=(
1148 prefer_skip_nested_validation or global_skip_validation
1149 )
1150 ):
-> 1151 return fit_method(estimator, *args, **kwargs)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/linear_model/_ridge.py:1134, in Ridge.fit(self, X, y, sample_weight)
1114 """Fit Ridge regression model.
1115
1116 Parameters
(...)
1131 Fitted estimator.
1132 """
1133 _accept_sparse = _get_valid_accept_sparse(sparse.issparse(X), self.solver)
-> 1134 X, y = self._validate_data(
1135 X,
1136 y,
1137 accept_sparse=_accept_sparse,
1138 dtype=[np.float64, np.float32],
1139 multi_output=True,
1140 y_numeric=True,
1141 )
1142 return super().fit(X, y, sample_weight=sample_weight)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:621, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
619 y = check_array(y, input_name="y", **check_y_params)
620 else:
--> 621 X, y = check_X_y(X, y, **check_params)
622 out = X, y
624 if not no_val_X and check_params.get("ensure_2d", True):

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1163, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
1143 raise ValueError(
1144 f"{estimator_name} requires y to be passed, but the target y is None"
1145 )
1147 X = check_array(
1148 X,
1149 accept_sparse=accept_sparse,
(...)
1160 input_name="X",
1161 )
-> 1163 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
1165 check_consistent_length(X, y)
1167 return X, y

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1173, in _check_y(y, multi_output, y_numeric, estimator)
1171 """Isolated part of check_X_y dedicated to y validation"""
1172 if multi_output:
-> 1173 y = check_array(
1174 y,
1175 accept_sparse="csr",
1176 force_all_finite=True,
1177 ensure_2d=False,
1178 dtype=None,
1179 input_name="y",
1180 estimator=estimator,
1181 )
1182 else:
1183 estimator_name = _check_estimator_name(estimator)

File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:753, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
662 """Input validation on an array, list, sparse matrix or similar.
663
664 By default, the input is checked to be a non-empty 2D array containing
(...)
750 The converted and validated array.
751 """
752 if isinstance(array, np.matrix):
--> 753 raise TypeError(
754 "np.matrix is not supported. Please convert to a numpy array with "
755 "np.asarray. For more information see: "
756 "https://numpy.org/doc/stable/reference/generated/numpy.matrix.html"
757 )
759 xp, is_array_api_compliant = get_namespace(array)
761 # store reference to original array to check if copy is needed when
762 # function returns

TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
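A hedged workaround (an assumption from the traceback, not an official fix): the rejected `np.matrix` comes from calling `.todense()` on a sparse `.X` inside `ridge_regression`, so densifying to a plain ndarray beforehand may sidestep it, at the cost of memory:

```python
import numpy as np
from scipy import sparse

# densify .X to a plain ndarray so the np.matrix code path is never reached
if sparse.issparse(adata.X):
    adata.X = adata.X.toarray()
```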

Details for pbmc dataset used

Hi,

I am using the dataset mentioned in the notebook linked to this rep.
The link to download the data is ftp://ngs.sanger.ac.uk/production/teichmann/BBKNN/PBMC.merged.h5ad

Can you provide details about where the dataset is obtained (sequencing technologies and such)? Is there a publication from your group which explains this dataset?

umap error after package updates

I updated scanpy (1.5.1), umap-learn (0.4.6), and BBKNN, but I got the following error when running the umap function:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra',
 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or
 'precomputed', or a callable

Here is my code:

pca <- sce@reductions$pca@cell.embeddings
anndata = import("anndata", convert=FALSE)
sc = import("scanpy",convert=FALSE)
np = import("numpy",convert=FALSE)
bbknn = import("bbknn", convert=FALSE)
adata = anndata$AnnData(X=pca, obs=sce$patient)
sc$tl$pca(adata)

adata$obsm$X_pca = pca
bbknn$bbknn(adata, batch_key=0)
sc$tl$umap(adata)

I could run this with no problem before I updated these packages. Could you help me figure it out?
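A plausible cause (my reading, not confirmed in the thread): bbknn's approximate mode records `'angular'` as the metric in the stored neighbour parameters, and newer umap-learn no longer accepts that name. One workaround is to rerun bbknn with a metric umap-learn recognises; in Python terms:

```python
import bbknn

# 'euclidean' is accepted by every umap-learn version
bbknn.bbknn(adata, batch_key='batch', metric='euclidean')
```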

error reproducing notebook

Hi there,

I tried to reproduce the pancreas Jupyter notebook (planning to eventually add some more data to it). However, after loading the 4 datasets into the holder, when I run adata = holder[0].concatenate(holder[1:], join='outer') I get the following error:

<ipython-input-27-8bddebd9bb8e> in <module>
----> 1 adata = holder[0].concatenate(holder[1:], join='outer')
      2 #adata.X = adata.X.tocsr()
      3 #adata = adata[:,['ERCC' not in item.upper() for item in adata.var_names]]
      4 #adata.raw = sc.pp.log1p(adata, copy=True)
      5 #sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)

/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in concatenate(self, join, batch_key, batch_categories, index_unique, *adatas)
   1807                 # constructed like that
   1808                 X[obs_i:obs_i+ad.n_obs,
-> 1809                   var_names.isin(vars_intersect)] = ad[:, vars_intersect].X
   1810             else:
   1811                 Xs.append(ad[:, vars_intersect].X)

/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in __getitem__(self, index)
   1299     def __getitem__(self, index):
   1300         """Returns a sliced view of the object."""
-> 1301         return self._getitem_view(index)
   1302 
   1303     def _getitem_view(self, index):

/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _getitem_view(self, index)
   1302 
   1303     def _getitem_view(self, index):
-> 1304         oidx, vidx = self._normalize_indices(index)
   1305         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1306 

/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _normalize_indices(self, index)
   1278                 return index
   1279         obs, var = super(AnnData, self)._unpack_index(index)
-> 1280         obs = _normalize_index(obs, self.obs_names)
   1281         var = _normalize_index(var, self.var_names)
   1282         return obs, var

/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _normalize_index(index, names)
    238     if not isinstance(names, RangeIndex):
    239         assert names.dtype != float and names.dtype != int, \
--> 240             'Don’t call _normalize_index with non-categorical/string names'
    241 
    242     # the following is insanely slow for sequences, we replaced it using pandas below

AssertionError: Don’t call _normalize_index with non-categorical/string names

These are the current versions I'm using:
scanpy==1.3.6 anndata==0.6.13 numpy==1.15.3 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1.

Do you know what is causing this?

Thank you!
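A hedged guess at a workaround: the assertion fires when `var_names`/`obs_names` are not strings (e.g. an integer index picked up during loading), so coercing them before concatenating may help:

```python
# coerce names to strings before concatenating (anndata indexes by name)
for ad in holder:
    ad.var_names = ad.var_names.astype(str)
    ad.obs_names = ad.obs_names.astype(str)
adata = holder[0].concatenate(holder[1:], join='outer')
```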

Bandwidth parameter no longer supported by scanpy or UMAP

This was originally reported by a scanpy user here: scverse/scanpy#632.

Scanpy has just removed a frozen version of the umap library we'd been using (PR: scverse/scanpy#576). The current version of umap doesn't support a bandwidth parameter, so now compute_connectivities_umap doesn't either. It looks like this is causing an issue with these lines, where bandwidth is explicitly passed:

bbknn/bbknn/__init__.py

Lines 272 to 274 in 93f25dc

dist, cnts = compute_connectivities_umap(knn_indices, knn_distances, knn_indices.shape[0],
knn_indices.shape[1], bandwidth=bandwidth,
local_connectivity=local_connectivity)

Sorry about the break with so little notice!
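The fix on the bbknn side is presumably just dropping the removed keyword from the lines quoted above, along these lines (a sketch, not the actual patch; the variables come from the surrounding bbknn code):

```python
from scanpy.neighbors import compute_connectivities_umap

# same call as before, minus the bandwidth keyword umap no longer supports
dist, cnts = compute_connectivities_umap(
    knn_indices, knn_distances,
    knn_indices.shape[0], knn_indices.shape[1],
    local_connectivity=local_connectivity,
)
```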

Export BBKNN results to R software

Dear BBKNN team,

I am using BBKNN in R as described on this GitHub page, and I am wondering how I could, for instance, export the BBKNN results to perform UMAP/clustering/trajectory analysis with customised scripts in R.

This is not an issue with the software at all, but I could not find the "batch-corrected data" to export from the AnnData object (I am aware that the algorithm does not change the data matrix). Which data could I use as input, for instance, to run UMAP with the umap R package?

Thank you in advance!
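For anyone with the same question: the "batch-corrected data" is the neighbour graph itself, stored as two sparse matrices (in older bbknn versions under `adata.uns['neighbors']` instead), and those are what to export via `py_to_r` to graph-based clustering or embedding tools in R. In Python terms:

```python
# bbknn's output lives in the graph, not in a corrected expression matrix
distances = adata.obsp['distances']            # sparse KNN distances
connectivities = adata.obsp['connectivities']  # sparse edge weights
```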

An error when running bbknn$bbknn(adata,batch_key=0)

Hi @Teichlab,

I want to use bbknn in R, but an error comes up, as follows:

bbknn = import(module = 'bbknn')
sc = import("scanpy",convert=FALSE)
np = import("numpy")
scipy = import("scipy")
b <- brca[[1]]
pca.input <- b@reductions$pca@cell.embeddings
batches <- b@meta.data$sample
adata <- anndata$AnnData(X = pca.input, obs = batches)
sc$tl$pca(adata)
None
adata$obsm$X_pca <- r_to_py(pca.input)
bbknn$bbknn(adata, batch_key = 0)

*** caught illegal operation ***
address 0x2b6db2dc781d, cause 'illegal operand'

Traceback:
1: py_call_impl(callable, dots$args, dots$keywords)
2: bbknn$bbknn(adata, batch_key = 0)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:

Could you please help me figure out this problem?

Thanks a lot

Best

Zhaohui Ruan

ModuleNotFoundError

A ModuleNotFoundError was raised when I tried the bbknn.bbknn(adata) function, at ./bbknn/__init__.py line 262, where you wrote

start = logg.info(...)

I know that this logg came from line 10

from scanpy import logging as logg

Here you use a try/except so that scanpy is allowed to be absent, but in the core function (line 262) it still seems to be mandatory.

Simply installing scanpy fixed it; I am just writing this down in case anyone else new to these pipelines encounters a similar issue.
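For context, the guarded import described above looks roughly like this, which is why `import bbknn` succeeds without scanpy while `bbknn.bbknn` later fails:

```python
# sketch of the pattern in bbknn/__init__.py: the import is optional here...
try:
    from scanpy import logging as logg
except ImportError:
    pass

# ...but bbknn.bbknn() still references logg unconditionally, so without
# scanpy installed the call fails at that line
```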

edge weights

Dear,
I am unfamiliar with graph theory. Why do you convert the neighbour distance collections into exponentially related connectivities? How are weights assigned to the edges? Does BBKNN construct the connectivity graph with the Jaccard index (which is used in Seurat and Scanpy for louvain clustering)?
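For reference (standard UMAP theory, not specific to this thread): bbknn hands the per-cell distances to `compute_connectivities_umap`, which builds UMAP's fuzzy simplicial set rather than a Jaccard graph, so the answer to the last question is no. Each directed edge gets an exponentially decaying weight,

$$ w_{ij} = \exp\!\left(-\,\frac{\max\bigl(0,\; d_{ij} - \rho_i\bigr)}{\sigma_i}\right), $$

where $d_{ij}$ is the distance between cells $i$ and $j$, $\rho_i$ is the distance from cell $i$ to its nearest neighbour, and $\sigma_i$ is a per-cell bandwidth found by binary search. The directed weights are then symmetrised by the fuzzy union $w_{ij} + w_{ji} - w_{ij} w_{ji}$, and these symmetrised weights are the edge weights used for clustering.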
