teichlab / bbknn
Batch balanced KNN
License: MIT License
Dear,
Scanorama handles the mutual nearest neighbors-based matching, batch correction, and panorama assembly. I have not found an assembly function in pancreas-4-Scanorama.ipynb. What is the corresponding function in bbknn (or scanpy) for Scanorama's assembly function?
Dear,
Do you plan to publish BBKNN in a high impact factor journal later? Some people argue that bioRxiv is not a serious journal and we are a little worried. After all, it will take us a lot of time to follow BBKNN.
I was trying to run bbknn (and also scrublet) and both independently kept crashing my sessions.
I think it has to do with annoy==1.17.0; simply downgrading to annoy==1.16.3 made everything run again.
"The input data was downloaded from the 10X Genomics website. The exact 5′dataset was ‘PBMCs of a healthy donor 5′gene expression’, under Cell Ranger 2.1.0, under V(D)J + 5′Gene Expression. The exact 3′dataset was ‘8k PBMCs from a Healthy Donor’, under Cell Ranger 2.1.0, under Chromium Demonstration (v2 Chemistry)."
I have tried bbknn following the scanpy tutorial, but it runs very quickly and the result is very similar to (in fact, the same as) the original data.
I just use:
bbknn.bbknn(adata,batch_key='batch')
Is anything wrong here? Thanks.
Hi @Teichlab,
I ran into something odd with data integrated through scanpy using bbknn.
The samples look integrated in the UMAP plot, but the t-SNE plot shows them as not integrated.
The difference between UMAP and t-SNE is shown below.
Given this, how can I get the same integration effect in t-SNE when using bbknn? If that cannot be done,
is there an alternative approach?
Best
hanhuihong
Hi, Authors of BBKNN,
My name is Feng Zhang. I recently built a pipeline (BEER, published in Cell Discovery, https://github.com/jumphone/BEER) to remove batch-effect related PCA subspaces.
And I find that it works well when combining it with BBKNN (especially when integrating scRNA-seq and scATAC-seq data).
Maybe we can collaborate with each other to further improve the performance.
Best,
Feng Zhang
I tried to reproduce the bbknn demo (https://nbviewer.jupyter.org/github/Teichlab/bbknn/blob/master/examples/demo.ipynb)
but I ran into two problems:
1- NameError: name 'logg' is not defined
2- AttributeError: module 'bbknn' has no attribute 'ridge_regression'
NameError Traceback (most recent call last)
in
3 except ImportError:
4 pass
----> 5 bbknn.bbknn(adata)
6 sc.tl.umap(adata)
7 sc.pl.umap(adata, color=['batch','celltype'])
~\miniconda3\envs\sc_trial\lib\site-packages\bbknn\__init__.py in bbknn(adata, batch_key, approx, metric, copy, **kwargs)
259 '''
260 start = logg.info('computing batch balanced neighbors')
--> 261 adata = adata.copy() if copy else adata
262 #basic sanity checks to begin
263 #is our batch key actually present in the object?
NameError: name 'logg' is not defined
2- regression :
AttributeError: module 'bbknn' has no attribute 'ridge_regression'
I am working on Windows and have tried bbknn 1.3.6, 1.4 and 1.5, with the same problem in each.
How can I solve these problems?
We are currently trying to import DistanceMetric from sklearn.neighbors in bbknn/matrix.py (link to code), but this is only possible for scikit-learn<1.0.0. DistanceMetric has moved to sklearn.metrics in later versions of scikit-learn.
I will raise a PR to fix this by handling the ImportError, but would like to hear opinions from the maintainers first!
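One way to handle the ImportError described above is to try the newer location first and fall back to the older one. This is only a minimal sketch, not the patch that was submitted: `import_first` is a hypothetical helper written for this example, and it returns `None` when scikit-learn is not installed at all.

```python
import importlib


def import_first(candidates):
    """Return the first attribute importable from a list of (module, name) pairs.

    Hypothetical helper for illustration; it is not part of bbknn.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            # This location is unavailable in the installed version; try the next one.
            continue
    return None


# Try the newer location first, then the older one.
DistanceMetric = import_first([
    ("sklearn.metrics", "DistanceMetric"),
    ("sklearn.neighbors", "DistanceMetric"),
])
```

The same effect can be had with a plain nested try/except around the two import statements; the helper just keeps the candidate list in one place.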
Good afternoon,
TypeError Traceback (most recent call last)
in
----> 1 bdata = bbknn.bbknn(adata,batch_key='Sample',copy=True)
~\Anaconda3\lib\site-packages\bbknn\__init__.py in bbknn(adata, batch_key, approx, metric, copy, **kwargs)
259 If True, return a copy instead of writing to the supplied adata.
260 '''
--> 261 logg.info('computing batch balanced neighbors', r=True)
262 adata = adata.copy() if copy else adata
263 #basic sanity checks to begin
TypeError: info() got an unexpected keyword argument 'r'
Do you have any idea what causes this error? Thank you.
I really like the robustness of your integration, and especially its speed.
I prefer working in R because I find it more intuitive.
So is there any plan to make an R package for bbknn? It would be really wonderful.
Hello,
I was able to successfully install bbknn; however, when I went to run it in a notebook, I got the following error:
AssertionError: Failed in nopython mode pipeline (step: native lowering)
Storing i64 to ptr of i32 ('dim'). FE type int32
I realize that this error is related to numba (currently running 0.54.1) . I tried downgrading it to 0.52.0; however, I got a few compatibility issues with bbknn. Do you have any suggestions?
Thank you!
Great package!
How did you go about initially computing the X_umap_3D coordinates without overwriting X_umap?
Hi there,
First of all, thank you for this great package! It has been very helpful for my graduation project so far.
I am currently following the bbknn tutorial notebook, but I get an error at the ridge_regression part of the tutorial.
"AttributeError: module 'numpy' has no attribute 'int'."
"np.int was a deprecated alias for the builtin int. To avoid this error in existing code, use int by itself. Doing this will not modify any behavior and is safe. When replacing np.int, you may wish to use e.g. np.int64 or np.int32 to specify the precision. If you wish to review your current use, check the release note link for additional information.
The aliases was originally deprecated in NumPy 1.20;"
I get this error when following your tutorial notebook with my own dataset. I am quite new to scRNAseq analysis and I am simply wondering if it's my package management that's wrong or that it might be something else?
Anyway, grateful in advance for your reply!
Greetings,
Julia
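The NumPy message quoted above amounts to replacing `np.int` with the builtin `int` or an explicit width. A minimal sketch of that replacement, assuming the failing call is a cast or dtype argument (the exact line inside bbknn is not shown in this report, and `values` is made-up data):

```python
import numpy as np

values = [1.0, 2.0, 3.0]  # made-up data for illustration

# Before (raises AttributeError once the alias has been removed from NumPy):
#   idx = np.array(values).astype(np.int)

# After: the builtin int behaves the same as the old alias.
idx = np.array(values).astype(int)

# Or pin the precision explicitly.
idx64 = np.array(values).astype(np.int64)
```

If the call sits inside an installed package rather than your own code, the usual options are upgrading that package to a release that no longer uses the alias, or pinning NumPy to a version that still provides it.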
Hello! I'm trying to recapitulate some results (using bbknn in R) from a paper that uses bbknn for scRNAseq batch correction, where they say they set neighbors_within_batch to 10, but I'm running into an issue. I'm able to run the code fine in R without setting a neighbors_within_batch argument (see a possible edit below to the end py_to_r code). When I set neighbors_within_batch, as in
bbknn$bbknn(adata, batch_key=0, neighbors_within_batch=10)
I get an error with this traceback:
Error in py_call_impl(callable, dots$args, dots$keywords) :
TypeError: 'float' object cannot be interpreted as an integer
5. stop(structure(list(message = "TypeError: 'float' object cannot be interpreted as an integer",
call = py_call_impl(callable, dots$args, dots$keywords),
cppstack = structure(list(file = "", line = -1L, stack = c("1 reticulate.so 0x0000000114a023ed _ZN4Rcpp9exceptionC2EPKcb + 221",
"2 reticulate.so 0x0000000114a0a485 _ZN4Rcpp4stopERKNSt3__112basic_stringIcNS0_11char_traitsIcEENS0_9allocatorIcEEEE + 53", ...
4. get_graph at __init__.py#148
3. bbknn_pca_matrix at __init__.py#355
2. bbknn at __init__.py#294
1. bbknn$bbknn(adata, batch_key = 0, neighbors_within_batch = 10)
My full code is as follows (pca matrix and batch assignment vector not shown; I don't think either of these is causing the error since I can run this code minus the neighbors_within_batch argument but happy to post how I generated them/what they contain if useful):
adata = anndata$AnnData(X=pca, obs=batch)
sc$tl$pca(adata)
adata$obsm$X_pca = pca
bbknn$bbknn(adata, batch_key=0, neighbors_within_batch=10)
sc$tl$umap(adata)
umap = py_to_r(adata$obsm[["X_umap"]])
I'm at a loss for what's causing this error... do you have any idea what I'm doing wrong? I'm assuming I can use the neighbors_within_batch parameter in R?
Also, I think umap = py_to_r(adata$obsm$X_umap) should be umap = py_to_r(adata$obsm[["X_umap"]])? I was only able to get the latter to work...
Thanks,
Rachel
P.S. I'm sorry if I've missed including anything or if this looks funky when it gets posted; this is the first time I've asked about an issue on github. Happy to provide more details if needed!
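A likely cause, offered as an assumption rather than a confirmed diagnosis: R treats a bare `10` as a double, and reticulate passes R doubles to Python as `float`, so the parameter arrives as `10.0` where an integer is expected. Writing `10L` in R sends a Python `int` instead. The Python side of that failure can be reproduced without bbknn; `neighbour_columns` below is a hypothetical stand-in for any code that uses the parameter as a count.

```python
def neighbour_columns(neighbors_within_batch):
    """Hypothetical stand-in for code that uses the parameter as an integer count."""
    return list(range(neighbors_within_batch))


try:
    neighbour_columns(10.0)  # how R's plain `10` arrives in Python
    message = None
except TypeError as err:
    message = str(err)

print(message)

ok = neighbour_columns(10)  # how R's `10L` arrives in Python
```

Under that assumption, the R call would become `bbknn$bbknn(adata, batch_key=0, neighbors_within_batch=10L)`.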
No more annoy dependency; pynndescent is also the default for umap, and it performs much better.
Hello:
When I installed bbknn 1.4.0 on ubuntu 16 and ubuntu 18, I got different performances on these two different operating systems. Concretely, the cell similarity scores varied between batches when running bbknn on ubuntu 16 and 18. Can you help me solve it?
Thank you very much!
Hi! Thank you for the awesome package :)
I was wondering if it is possible to use dimensionality reduction methods other than PCA for bbknn.bbknn, such as LSI, which is commonly used for ATAC data? The use_rep argument would suggest it is possible, but I wanted to check what your thoughts were before running it!
Hi,
Can you update the recipe on bioconda? The bioconda version is still at 1.5.1
Thanks
Hello,
I'm following the tutorial which integrates with Scanpy. I was able to successfully run this line of code:
sc.external.pp.bbknn(adata, batch_key='sample_id',metric='euclidean')
In the next step I get this error:
sc.tl.umap(adata)
KeyError Traceback (most recent call last)
in
----> 1 sc.tl.umap(adata)
/miniconda3/envs/scanpy/lib/python3.7/site-packages/scanpy/tools/_umap.py in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method)
142 X_umap = simplicial_set_embedding(
143 X,
--> 144 adata.uns['neighbors']['connectivities'].tocoo(),
145 n_components,
146 alpha,
KeyError: 'connectivities'
Am I doing something incorrectly ? How can I fix this problem ? Thanks a lot for your help.
I'm trying to run bbknn on my own single-cell data and on the demo data, but the kernel dies every time, with either dataset.
Is this because of my device, or something else?
sc.settings.verbosity = 3
sc.logging.print_header()
sc.settings.set_figure_params(dpi=80)
scanpy==1.10.1 anndata==0.10.5.post1 umap==0.5.5 numpy==1.26.4 scipy==1.11.1 pandas==2.2.2 scikit-learn==1.3.0 statsmodels==0.14.0 igraph==0.11.5 louvain==0.8.2 pynndescent==0.5.11
adata = sc.read('pancreas.h5ad', backup_url='https://www.dropbox.com/s/qj1jlm9w10wmt0u/pancreas.h5ad?dl=1')
bbknn.bbknn(adata, batch_key='batch')
I have a dataset that includes one batch containing only 2 cells.
The command was sc.external.pp.bbknn(bdata,batch_key = "batch",approx=False)
It seems that bbknn doesn't like that small batch and gave me this error:
computing batch balanced neighbors
WARNING: unrecognised metric for type of neighbor calculation, switching to euclidean
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-274-086c83c814fc> in <module>
----> 1 sc.external.pp.bbknn(bdata,batch_key = "batch",approx=False)
/usr/local/lib/python3.6/dist-packages/scanpy/external/pp/_bbknn.py in bbknn(adata, batch_key, approx, metric, copy, n_pcs, trim, n_trees, use_faiss, set_op_mix_ratio, local_connectivity, **kwargs)
118 set_op_mix_ratio=set_op_mix_ratio,
119 local_connectivity=local_connectivity,
--> 120 **kwargs,
121 )
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn(adata, batch_key, use_rep, approx, metric, copy, **kwargs)
289 #call BBKNN proper
290 bbknn_out = bbknn_pca_matrix(pca=pca, batch_list=batch_list,
--> 291 approx=approx, metric=metric, **kwargs)
292 #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key
293 adata.uns['neighbors'] = {}
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in bbknn_pca_matrix(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, n_trees, use_faiss, metric, set_op_mix_ratio, local_connectivity)
346 knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,n_pcs=n_pcs,n_trees=n_trees,
347 approx=approx,metric=metric,use_faiss=use_faiss,
--> 348 neighbors_within_batch=neighbors_within_batch)
349 #sort the neighbours so that they're actually in order from closest to furthest
350 newidx = np.argsort(knn_distances,axis=1)
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py in get_graph(pca, batch_list, neighbors_within_batch, n_pcs, approx, metric, use_faiss, n_trees)
171 for i in range(ckdout[1].shape[0]):
172 for j in range(ckdout[1].shape[1]):
--> 173 ckdout[1][i,j] = ind_to[ckdout[1][i,j]]
174 #save the results within the appropriate rows and columns of the structures
175 col_range = np.arange(to_ind*neighbors_within_batch, (to_ind+1)*neighbors_within_batch)
IndexError: index 2 is out of bounds for axis 0 with size 2
Is that really due to the fact that one batch contains only two cells?
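The traceback is consistent with a batch that has fewer cells than the number of neighbours requested from it, although that reading is an assumption and not something confirmed here. A quick way to check for such batches before calling bbknn, assuming a per-batch neighbour count of 3 (`undersized_batches` and the example labels are hypothetical):

```python
from collections import Counter


def undersized_batches(batch_labels, neighbors_within_batch=3):
    """Return {batch: n_cells} for batches with fewer cells than requested neighbours."""
    counts = Counter(batch_labels)
    return {b: n for b, n in counts.items() if n < neighbors_within_batch}


# Made-up labels: two large batches and one batch with only 2 cells.
labels = ["A"] * 50 + ["B"] * 40 + ["C"] * 2
small = undersized_batches(labels)
print(small)  # {'C': 2}
```

With an AnnData object the labels would come from the batch column of `adata.obs`. Batches flagged this way could be dropped or merged, or `neighbors_within_batch` lowered.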
Hi,
Is there a way to correct for more than 2 covariates (e.g. 10X batches and donors) using BBKNN?
Also, thank you for this super helpful tool!
Best,
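One workaround, not confirmed by the maintainers in this thread, is to merge the covariates into a single composite label and pass that as `batch_key`. A sketch of building such a label; `combine_covariates` and the column contents are hypothetical:

```python
def combine_covariates(*columns, sep="_"):
    """Join several per-cell covariate columns into one composite batch label."""
    lengths = {len(col) for col in columns}
    if len(lengths) != 1:
        raise ValueError("all covariate columns must have one entry per cell")
    return [sep.join(str(v) for v in row) for row in zip(*columns)]


tenx_batch = ["run1", "run1", "run2", "run2"]  # hypothetical 10X batch labels
donor = ["d1", "d2", "d1", "d2"]               # hypothetical donor labels
combined = combine_covariates(tenx_batch, donor)
print(combined)  # ['run1_d1', 'run1_d2', 'run2_d1', 'run2_d2']
```

Every combination becomes its own batch, so rare combinations may end up with very few cells.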
Hi there,
Having an issue when I try to run BBKNN without annoy. Had this error, then freshly installed everything in a new conda environment, I'm still getting the error passing from pynndescent when I run the code:
bbknn.bbknn(adata,batch_key='batch_name',use_annoy=False,metric='manhattan',neighbors_within_batch=3)
Thanks so much! This package works amazingly for correcting batch-driven compositional problems!!
Full error message below:
122 batch_list = adata.obs[batch_key].values
123 #call BBKNN proper
--> 124 bbknn_out = bbknn_matrix(pca=pca, batch_list=batch_list, approx=approx,
125 use_annoy=use_annoy, metric=params['metric'], **kwargs)
126 #store the parameters in .uns['neighbors']['params'], add use_rep and batch_key
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in bbknn(pca, batch_list, neighbors_within_batch, n_pcs, trim, approx, annoy_n_trees, pynndescent_n_neighbors, pynndescent_random_state, use_annoy, use_faiss, metric, set_op_mix_ratio, local_connectivity)
312 params = check_knn_metric(params, counts)
313 #obtain the batch balanced KNN graph
--> 314 knn_distances, knn_indices = get_graph(pca=pca,batch_list=batch_list,params=params)
315 #sort the neighbours so that they're actually in order from closest to furthest
316 newidx = np.argsort(knn_distances,axis=1)
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in get_graph(pca, batch_list, params)
173 ind_to = np.arange(len(batch_list))[mask_to]
174 #create the faiss/cKDTree/KDTree/annoy, depending on approx/metric
--> 175 ckd = create_tree(data=pca[mask_to,:params['n_pcs']], params=params)
176 for from_ind in range(len(batches)):
177 #this is the batch that will have its neighbours identified
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/bbknn/matrix.py in create_tree(data, params)
95 n_neighbors=params['pynndescent_n_neighbors'],
96 random_state=params['pynndescent_random_state'])
---> 97 ckd.prepare()
98 elif params['computation'] == 'faiss':
99 ckd = faiss.IndexFlatL2(data.shape[1])
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in prepare(self)
1524 def prepare(self):
1525 if not hasattr(self, "_search_graph"):
-> 1526 self._init_search_graph()
1527 if not hasattr(self, "_search_function"):
1528 if self._is_sparse:
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in _init_search_graph(self)
962 best_trees = [self._rp_forest[idx] for idx in best_tree_indices]
963 del self._rp_forest
--> 964 self._search_forest = [
965 convert_tree_format(tree, self._raw_data.shape[0])
966 for tree in best_trees
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/pynndescent_.py in <listcomp>(.0)
963 del self._rp_forest
964 self._search_forest = [
--> 965 convert_tree_format(tree, self._raw_data.shape[0])
966 for tree in best_trees
967 ]
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in convert_tree_format(tree, data_size)
1161 if tree.hyperplanes[0].ndim == 1:
1162 # dense hyperplanes
-> 1163 hyperplane_dim = dense_hyperplane_dim(tree.hyperplanes)
1164 hyperplanes = np.zeros((n_nodes, hyperplane_dim), dtype=np.float32)
1165 else:
~/utils/miniconda3/envs/scanpy/lib/python3.9/site-packages/pynndescent/rp_trees.py in dense_hyperplane_dim()
1143 return hyperplanes[i].shape[0]
1144
-> 1145 raise ValueError("No hyperplanes of adequate size were found!")
1146
1147
ValueError: No hyperplanes of adequate size were found!
Let's assume we have 3 batches, have you thought about using bbknn for batch correcting 2 out of 3 batches?
Calling bbknn either from scanpy.external or directly yields:
Error in py_call_impl(callable, dots$args, dots$keywords) : AttributeError: 'tuple' object has no attribute 'tocsr'
This is reported on scanpy scverse/scanpy#1249, where the solution may be downgrading umap, but that's not ideal...
umap-learn 0.4.3 bbknn 1.3.4 scanpy 1.5.1 anndata 0.7.3
Any ideas?
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py:294: FutureWarning: This location for 'distances' is deprecated. It has been moved to .obsp[distances], and will not be accesible here in a future version of anndata.
adata.uns['neighbors']['distances'] = bbknn_out[0]
/usr/local/lib/python3.6/dist-packages/bbknn/__init__.py:295: FutureWarning: This location for 'connectivities' is deprecated. It has been moved to .obsp[connectivities], and will not be accesible here in a future version of anndata.
adata.uns['neighbors']['connectivities'] = bbknn_out[1]
New versions of anndata/scanpy look for the distances and connectivities in adata.obsp instead of uns['neighbors']. The outputs of bbknn should probably be written to adata.obsp as well.
Thanks for the great method!
The change of n_trees to annoy_n_trees seems to have broken compatibility with scanpy's bbknn module (sc.external.pp.bbknn). Are there any plans to make changes to that module as well?
Hi,
I was wondering if it is possible for bbknn to correct for more than one batch indicator. i.e. different projects and different donors.
Thanks!
Connor
It looks like logging has been broken by an update to scanpy.
It's pretty straightforward to fix the timing error: logg.info now returns a datetime, which you pass to the next logg.info call when you want the elapsed time reported.
I'm not sure how to replace the end argument. @flying-sheep, any suggestion?
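The timing pattern described above can be sketched with a small stand-in. The `info` function below is a hypothetical mock written for this example, not scanpy's logger; it only mirrors the behaviour described in this report (return a timestamp, accept one back to report elapsed time), and the `time` keyword name is an assumption.

```python
from datetime import datetime


def info(message, time=None):
    """Hypothetical stand-in for a logger whose info() returns a timestamp."""
    now = datetime.now()
    if time is not None:
        # A previous timestamp was passed in, so append the elapsed time.
        message = f"{message} ({now - time})"
    print(message)
    return now


start = info("computing batch balanced neighbors")
end = info("    finished", time=start)
```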
Hi Krzysztof,
the bbknn bioconda build fails for the current versions 1.3.10 and 1.3.11 because the packaging package you introduced in d6c60f5 cannot be found. I added it to the recipe dependencies, but I recommend adding it to install_requires as well.
Thanks,
Jens
Thanks for the nice tool! I'm trying to conceptually understand the neighbors_within_batch parameter. I read the docstring, but I'm still not clear on exactly what it means. Is it 'k' when approx=True? Setting this value higher leads to a more spread out UMAP (i.e. less correction), which may be preferable for some datasets? Is there a reason for the default value of 3?
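A rough mental model, stated as an assumption and not a description of the library's internals: each cell takes its `neighbors_within_batch` nearest neighbours from every batch separately, so the total neighbour count is `neighbors_within_batch` times the number of batches. The brute-force sketch below illustrates that idea on made-up 2D points; `batch_balanced_neighbors` is a hypothetical function, and each point counts as its own nearest neighbour here.

```python
import math
from collections import defaultdict


def batch_balanced_neighbors(points, batches, neighbors_within_batch=3):
    """For each point, list its nearest neighbours taken separately from each batch.

    Brute-force illustration only; not bbknn's implementation.
    """
    # Group point indices by batch label.
    by_batch = defaultdict(list)
    for idx, label in enumerate(batches):
        by_batch[label].append(idx)

    result = []
    for p in points:
        neighbours = []
        for label in sorted(by_batch):
            # Rank this batch's members by Euclidean distance to p.
            ranked = sorted(by_batch[label], key=lambda j: math.dist(p, points[j]))
            neighbours.extend(ranked[:neighbors_within_batch])
        result.append(neighbours)
    return result


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0), (5.2, 5.0)]
batches = ["a", "a", "a", "b", "b", "b"]
graph = batch_balanced_neighbors(points, batches, neighbors_within_batch=2)
print(graph[0])  # [0, 1, 3, 4]
```

In this toy case the first point gets two neighbours from batch "a" and two from the distant batch "b". A plain kNN with k=4 would have picked only batch "a" points (it has just three, so the fourth would be forced across). Under this model, raising the parameter gives within-batch structure more weight in each neighbourhood.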
Lines 216 to 218 in 7e736d4
Just a heads up - you are using "sklearn" as a dependency but this isn't the correct package on PyPI. It should be "scikit-learn".
Something I noticed because I made the same mistake when writing a package a few years ago!
Hi @ktpolanski , I find your save_knn function extremely useful (in fact, it was the primary reason I was using bbknn in the first place) ! I am currently reverting to the 1.3.0 version of bbknn so I can use save_knn , but it would be nice to have the save_knn option in future versions of bbknn, if possible.
Really exciting method! I don't usually use scanpy for my pipelines. Do you have a BBKNN function that works without it? Maybe something that takes in only PCs and batch labels?
Hello:
Suppose we ran a clustering method on bbknn output and identified a few clusters that hopefully represent distinct cell types. Do I get it right that it's impossible to identify marker genes for those clusters? BBKNN doesn't alter the original data or PCs obtained from the original data, so we never obtain the gene expression adjusted for batch effect. If I am right, is there a method to adjust the original data for batch effect using bbknn output?
Thanks in advance,
Nik
Hello,
I'm trying to use bbknn.ridge_regression but get the following output when I run
bbknn.ridge_regression(adata, batch_key=['batch'], confounder_key=['cell_type'])
Is this an issue with compatibility with current numpy?
Many thanks
TypeError Traceback (most recent call last)
Cell In[19], line 9
7 import bbknn
8 # bbknn.bbknn(adata_v3)
----> 9 bbknn.ridge_regression(adata_v3, batch_key=['batch'], confounder_key=['cell_type'])
10 # scanpy.tl.pca(adata_v3)
11 # bbknn.bbknn(adata_v3)
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/bbknn/__init__.py:196, in ridge_regression(adata, batch_key, confounder_key, chunksize, copy, **kwargs)
193 X_exp = X_exp.todense()
194 #fit the ridge regression model, compute the expression explained by the technical
195 #effect, and the remaining residual
--> 196 LR.fit(dummy,X_exp)
197 X_explained.append(dm.dot(LR.coef_[:,batch_index].T))
198 X_remain.append(X_exp - X_explained[-1])
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:1151, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
1144 estimator._validate_params()
1146 with config_context(
1147 skip_parameter_validation=(
1148 prefer_skip_nested_validation or global_skip_validation
1149 )
1150 ):
-> 1151 return fit_method(estimator, *args, **kwargs)
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/linear_model/_ridge.py:1134, in Ridge.fit(self, X, y, sample_weight)
1114 """Fit Ridge regression model.
1115
1116 Parameters
(...)
1131 Fitted estimator.
1132 """
1133 _accept_sparse = _get_valid_accept_sparse(sparse.issparse(X), self.solver)
-> 1134 X, y = self._validate_data(
1135 X,
1136 y,
1137 accept_sparse=_accept_sparse,
1138 dtype=[np.float64, np.float32],
1139 multi_output=True,
1140 y_numeric=True,
1141 )
1142 return super().fit(X, y, sample_weight=sample_weight)
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/base.py:621, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
619 y = check_array(y, input_name="y", **check_y_params)
620 else:
--> 621 X, y = check_X_y(X, y, **check_params)
622 out = X, y
624 if not no_val_X and check_params.get("ensure_2d", True):
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1163, in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
1143 raise ValueError(
1144 f"{estimator_name} requires y to be passed, but the target y is None"
1145 )
1147 X = check_array(
1148 X,
1149 accept_sparse=accept_sparse,
(...)
1160 input_name="X",
1161 )
-> 1163 y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
1165 check_consistent_length(X, y)
1167 return X, y
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:1173, in _check_y(y, multi_output, y_numeric, estimator)
1171 """Isolated part of check_X_y dedicated to y validation"""
1172 if multi_output:
-> 1173 y = check_array(
1174 y,
1175 accept_sparse="csr",
1176 force_all_finite=True,
1177 ensure_2d=False,
1178 dtype=None,
1179 input_name="y",
1180 estimator=estimator,
1181 )
1182 else:
1183 estimator_name = _check_estimator_name(estimator)
File ~/miniforge3/envs/mypython3/lib/python3.9/site-packages/sklearn/utils/validation.py:753, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
662 """Input validation on an array, list, sparse matrix or similar.
663
664 By default, the input is checked to be a non-empty 2D array containing
(...)
750 The converted and validated array.
751 """
752 if isinstance(array, np.matrix):
--> 753 raise TypeError(
754 "np.matrix is not supported. Please convert to a numpy array with "
755 "np.asarray. For more information see: "
756 "https://numpy.org/doc/stable/reference/generated/numpy.matrix.html"
757 )
759 xp, is_array_api_compliant = get_namespace(array)
761 # store reference to original array to check if copy is needed when
762 # function returns
TypeError: np.matrix is not supported. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html
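The final error says scikit-learn rejects `np.matrix`, which is the type that a SciPy sparse matrix's `.todense()` returns. A minimal sketch of the conversion the message asks for, using a hand-made `np.matrix` as a stand-in for `X_exp` (the values are made up):

```python
import numpy as np

# Stand-in for the dense expression matrix produced by `.todense()`.
X_exp = np.matrix([[1.0, 0.0], [0.0, 2.0]])

# np.asarray returns a plain ndarray, which scikit-learn accepts.
X_arr = np.asarray(X_exp)

print(type(X_exp).__name__, "->", type(X_arr).__name__)  # matrix -> ndarray
```

On a SciPy sparse matrix, `.toarray()` gives a plain ndarray directly, so it can be used in place of `.todense()`.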
Hi,
I am using the dataset mentioned in the notebook linked to this repo.
The link to download the data is ftp://ngs.sanger.ac.uk/production/teichmann/BBKNN/PBMC.merged.h5ad
Can you provide details about where the dataset is obtained (sequencing technologies and such)? Is there a publication from your group which explains this dataset?
I updated scanpy (1.5.1), umap-learn (0.4.6) and BBKNN. But I found the following error when running the umap function:
Error in py_call_impl(callable, dots$args, dots$keywords) :
ValueError: Unknown metric angular. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra',
'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto',
'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or
'precomputed', or a callable
Here is my code:
pca <- sce@[email protected]
anndata = import("anndata", convert=FALSE)
sc = import("scanpy",convert=FALSE)
np = import("numpy",convert=FALSE)
bbknn = import("bbknn", convert=FALSE)
adata = anndata$AnnData(X=pca, obs=sce$patient)
sc$tl$pca(adata)
adata$obsm$X_pca = pca
bbknn$bbknn(adata, batch_key=0)
sc$tl$umap(adata)
I could run it with no problem before I updated these packages. Could you help me figure this out?
Is bbknn not able to work on the new mac chips? I get the following error:
mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
Hi there,
I tried to reproduce the pancreas Jupyter notebook (planning to eventually add some more data to it). However, after loading the 4 datasets into the holder, when I run adata = holder[0].concatenate(holder[1:], join='outer') I get the following error:
<ipython-input-27-8bddebd9bb8e> in <module>
----> 1 adata = holder[0].concatenate(holder[1:], join='outer')
2 #adata.X = adata.X.tocsr()
3 #adata = adata[:,['ERCC' not in item.upper() for item in adata.var_names]]
4 #adata.raw = sc.pp.log1p(adata, copy=True)
5 #sc.pp.normalize_per_cell(adata, counts_per_cell_after=1e4)
/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in concatenate(self, join, batch_key, batch_categories, index_unique, *adatas)
1807 # constructed like that
1808 X[obs_i:obs_i+ad.n_obs,
-> 1809 var_names.isin(vars_intersect)] = ad[:, vars_intersect].X
1810 else:
1811 Xs.append(ad[:, vars_intersect].X)
/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in __getitem__(self, index)
1299 def __getitem__(self, index):
1300 """Returns a sliced view of the object."""
-> 1301 return self._getitem_view(index)
1302
1303 def _getitem_view(self, index):
/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _getitem_view(self, index)
1302
1303 def _getitem_view(self, index):
-> 1304 oidx, vidx = self._normalize_indices(index)
1305 return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
1306
/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _normalize_indices(self, index)
1278 return index
1279 obs, var = super(AnnData, self)._unpack_index(index)
-> 1280 obs = _normalize_index(obs, self.obs_names)
1281 var = _normalize_index(var, self.var_names)
1282 return obs, var
/anaconda3/envs/leiden/lib/python3.6/site-packages/anndata/base.py in _normalize_index(index, names)
238 if not isinstance(names, RangeIndex):
239 assert names.dtype != float and names.dtype != int, \
--> 240 'Don’t call _normalize_index with non-categorical/string names'
241
242 # the following is insanely slow for sequences, we replaced it using pandas below
AssertionError: Don’t call _normalize_index with non-categorical/string names
These are the current versions I'm using:
scanpy==1.3.6 anndata==0.6.13 numpy==1.15.3 scipy==1.1.0 pandas==0.23.4 scikit-learn==0.20.0 statsmodels==0.9.0 python-igraph==0.7.1 louvain==0.6.1
Do you know what is causing this?
Thank you!
Are differences expected? If not, I can attach the code and data.
This was originally reported by a scanpy user here: scverse/scanpy#632.
Scanpy has just removed a frozen version of the umap library we'd been using (PR: scverse/scanpy#576). The current version of umap doesn't support a bandwidth parameter, so now compute_connectivities_umap doesn't either. It looks like this is causing an issue with these lines, where bandwidth is explicitly passed:
Lines 272 to 274 in 93f25dc
Sorry about the break with so little notice!
Dear BBKNN team,
I am using BBKNN in R as indicated in this github page and I am wondering how I could, for instance, export BBKNN results to perform UMAP/clustering/trajectory analysis with some customized scripts in R.
This is not an issue with the software at all, but I could not find the "batch-corrected data" to export from the anndata object (I am aware that the algorithm does not change the data matrix). Which data could I then use as input, for instance, to run umap with the umap R package?
Thank you in advance!
Hi @Teichlab,
I want to use bbknn in R, but I get the following error:
`
bbknn = import(module = 'bbknn')
sc = import("scanpy",convert=FALSE)
np = import("numpy")
scipy = import("scipy")
b <- brca[[1]]
pca.input <- b@reductions$[email protected]
batches <- [email protected]$sample
adata <- anndata$AnnData(X = pca.input, obs = batches)
sc$tl$pca(adata)
None
adata$obsm$X_pca <- r_to_py(pca.input)
bbknn$bbknn(adata, batch_key = 0)
*** caught illegal operation ***
address 0x2b6db2dc781d, cause 'illegal operand'
Traceback:
1: py_call_impl(callable, dots$args, dots$keywords)
2: bbknn$bbknn(adata, batch_key = 0)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection:
`
Could you please help me figure out this problem?
Thanks a lot
Best
Zhaohui Ruan
A ModuleNotFoundError was raised for me when I tried the bbknn.bbknn(adata) function, at ./bbknn/__init__.py line 262, where you wrote
start = logg.info(...)
I know that this logg comes from line 10:
from scanpy import logging as logg
Here you are using "try/except" syntax, which allows scanpy to be absent. But in the core function (line 262) it still seems to be mandatory.
Simply installing scanpy has already fixed it; I am just writing this down in case anyone else new to all these pipelines encounters a similar issue.
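The situation described above (an optional import that is still required later) can be avoided by giving the name a fallback when the import fails. A minimal sketch, not bbknn's actual code; `log_info` is a hypothetical wrapper:

```python
try:
    from scanpy import logging as logg  # optional dependency
except ImportError:
    logg = None  # remember that scanpy is unavailable


def log_info(message):
    """Log through scanpy when it is installed, otherwise fall back to print."""
    if logg is not None:
        return logg.info(message)
    print(message)
    return None


log_info("computing batch balanced neighbors")
```

With this guard the core function can call `log_info(...)` whether or not scanpy is present, so a missing scanpy does not cause an error at the logging call.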
Dear,
I am unfamiliar with graph theory. Why do you convert the neighbour distance collections to exponentially related connectivities? How are weights assigned to the edges? Does BBKNN construct the connectivity graph with the Jaccard index (which is used in Seurat and Scanpy for Louvain clustering)?
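As a rough illustration of the exponential relationship asked about, here is a simplified sketch of UMAP-style edge weights. It is a toy under stated assumptions and not BBKNN's code: `rho` is taken as the distance to the nearest neighbour, and `sigma` is passed in as a fixed scale, whereas UMAP fits it per cell. Both function names are hypothetical.

```python
import math


def edge_weights(distances, sigma=1.0):
    """Turn one cell's sorted neighbour distances into weights in (0, 1]."""
    rho = distances[0]  # distance to the closest neighbour
    return [math.exp(-max(0.0, d - rho) / sigma) for d in distances]


def symmetrise(w_ij, w_ji):
    """Combine the two directed weights of an edge as a fuzzy union: a + b - a*b."""
    return w_ij + w_ji - w_ij * w_ji


weights = edge_weights([0.5, 1.0, 2.0])
print(weights)
```

In this sketch the closest neighbour always gets weight 1 and weights decay exponentially with extra distance, so the weights come from distances and not from shared-neighbour overlap as a Jaccard index would.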