gao-lab / cell_blast

A BLAST-like toolkit for large-scale scRNA-seq data querying and annotation.

Home Page: http://cblast.gao-lab.org

License: MIT License

Python 100.00%
bioinformatics deep-learning single-cell single-cell-rna-seq

cell_blast's People

Contributors

jeff1995, szhorvat

cell_blast's Issues

The tensorflow showing error with no session attribute

Hi Developer,
I installed Cell BLAST in a conda environment with Python 3.9 and then installed TensorFlow with pip, which pulled the default version:
tensorflow 2.9.1
tensorflow-estimator 2.9.0

When I run the Cell BLAST workflow, TensorFlow triggers the error below:
from . import module, nn, utils
File "/share/apps/virtualEnv/miniconda3-py38/envs/CellBlast-CellAnno/lib/python3.9/site-packages/Cell_BLAST/module.py", line 15, in <module>
class Module(object):
File "/share/apps/virtualEnv/miniconda3-py38/envs/CellBlast-CellAnno/lib/python3.9/site-packages/Cell_BLAST/module.py", line 32, in Module
def _save_weights(self, sess: tf.Session, path: str) -> None:
AttributeError: module 'tensorflow' has no attribute 'Session'

How can I fix this issue?
Any advice would be appreciated.
hanhuihong
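
For context, tf.Session is a TensorFlow 1.x API; in TensorFlow 2.x it is only reachable as tf.compat.v1.Session, which is why the type annotation in module.py fails at import time. The sketch below is a hypothetical pre-flight check (the helper name has_top_level_session is not part of Cell_BLAST or TensorFlow) that decides from a version string alone whether the top-level tf.Session attribute can be expected. It assumes the installed Cell_BLAST release targets the 1.x API, as the traceback suggests.

```python
def has_top_level_session(tf_version: str) -> bool:
    """Return True if this TensorFlow version still exposes tf.Session.

    Hypothetical helper: tf.Session belongs to the 1.x API; from 2.0 on it
    is only available as tf.compat.v1.Session.
    """
    major = int(tf_version.split(".")[0])
    return major < 2


# 2.0.0 and 2.9.1 are the versions reported on this page;
# 1.15.0 is included only for contrast.
for version in ("1.15.0", "2.0.0", "2.9.1"):
    status = "ok" if has_top_level_session(version) else "tf.Session missing"
    print(f"tensorflow {version}: {status}")
```

In practice the version string would come from tf.__version__ or from pip show tensorflow.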

Tensorflow version 2.0.0 question

I ran import Cell_BLAST as cb and got the following output:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/__init__.py", line 11, in <module>
    from . import (blast, config, data, directi, latent, metrics, prob, rmbatch,
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py", line 19, in <module>
    from . import config, data, directi, metrics, utils
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/directi.py", line 16, in <module>
    from . import config, data, latent, model, prob, rmbatch, utils
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/latent.py", line 13, in <module>
    from . import module, nn, utils
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/module.py", line 15, in <module>
    class Module(object):
  File "/usr/local/lib/python3.6/dist-packages/Cell_BLAST/module.py", line 32, in Module
    def _save_weights(self, sess: tf.Session, path: str) -> None:
AttributeError: module 'tensorflow' has no attribute 'Session'
How to fix these problems?

how to get h5 files ?

The examples take "data.h5" as input. How can I generate an h5 file like that from an original expression profile?

Could we use immgen datasets as a reference?

Dear Developers,
Thank you for CellBlast.
I recently used SingleR and found the built-in ImmGen reference very useful. Could we use that dataset in CellBlast?

Thank you for your response.

Possible wasserstein_distance solution

Hi Cellblast Team,
I'm not a Python expert, and I have a problem with wasserstein_distance:

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function wasserstein_distance at 0x7f4491cae700>) found for signature:

wasserstein_distance(array(float64, 1d, C), array(float64, 1d, C))

There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload of function 'wasserstein_distance': File: Cell_BLAST/blast.py: Line 48.
With argument(s): '(array(float64, 1d, C), array(float64, 1d, C))':
No match.

During: resolving callee type: Function(<function wasserstein_distance at 0x7f4491cae700>)
During: typing of call at /home/yjc/anaconda3/envs/cellblast/lib/python3.9/site-packages/Cell_BLAST/blast.py (229)

File "../../../anaconda3/envs/cellblast/lib/python3.9/site-packages/Cell_BLAST/blast.py", line 229:
def npd_v1(

return 0.5 * (
scipy.stats.wasserstein_distance(
^

I asked ChatGPT for help, and it responded:

Given the code you've provided, I can see that you're trying to use the scipy.stats.wasserstein_distance function within a function that is being compiled by Numba with nopython=True. As I explained earlier, Numba doesn't support all Python libraries or functions in nopython mode. scipy.stats.wasserstein_distance is one of those unsupported functions.

To resolve this, you will need to create your own implementation of the wasserstein_distance function that can be compiled by Numba. The Wasserstein distance, also known as the earth mover's distance, can be computed using the cumulative distribution functions of the two distributions.

Here is a basic implementation of the wasserstein_distance function that should work with Numba:

@numba.njit
def wasserstein_distance(u_values, v_values):
    u_values_sorted = np.sort(u_values)
    v_values_sorted = np.sort(v_values)
    return np.mean(np.abs(np.cumsum(u_values_sorted) - np.cumsum(v_values_sorted)))

Then, replace the scipy.stats.wasserstein_distance calls in your npd_v1 function with this new wasserstein_distance function:

@numba.jit(nopython=True, nogil=True, cache=True)
def npd_v1(
     x: np.ndarray,
     y: np.ndarray,
     x_posterior: np.ndarray,
     y_posterior: np.ndarray,
     eps: float = 0.0,
) -> np.ndarray:
     ...
     return 0.5 * (
         wasserstein_distance(
             xy_posterior1[: len(x_posterior)], xy_posterior1[-len(y_posterior) :]
         )
         + wasserstein_distance(
             xy_posterior2[: len(x_posterior)], xy_posterior2[-len(y_posterior) :]
         )
     )

Fortunately, I managed to run this code, but I would like someone more experienced to confirm whether it is correct. I'd be happy if it is correct and helps you.
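
As a reference point for checking the snippet above, here is a minimal pure-Python sketch of the 1-D Wasserstein distance computed from empirical CDFs, which is the approach the quoted explanation describes. It is an illustration, not the Cell_BLAST implementation, and the function name wasserstein_1d is hypothetical. It integrates the absolute difference between the two empirical CDFs over the pooled sample values, so the two inputs may have different lengths.

```python
from bisect import bisect_right


def wasserstein_1d(u_values, v_values):
    """1-D Wasserstein distance between two samples via empirical CDFs."""
    u_sorted = sorted(u_values)
    v_sorted = sorted(v_values)
    pooled = sorted(u_sorted + v_sorted)

    total = 0.0
    for left, right in zip(pooled[:-1], pooled[1:]):
        # Empirical CDF value of each sample on the interval [left, right).
        u_cdf = bisect_right(u_sorted, left) / len(u_sorted)
        v_cdf = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(u_cdf - v_cdf) * (right - left)
    return total


# Small example with two samples of three points each.
distance = wasserstein_1d([0, 1, 3], [5, 6, 8])
print(round(distance, 6))
```

On this input the sketch gives 5, whereas the cumulative-sum formula quoted above (the mean of |cumsum(sorted u) - cumsum(sorted v)|) evaluates to 10, so the two are not equivalent. It may be worth comparing any replacement against scipy.stats.wasserstein_distance outside of Numba before relying on it.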

snakemake CPU question

Snakemake runs without any other problem, but at the run_ZIFA.py step the process occupies 100% CPU.
timeout 12h python -u run_ZIFA.py -i ../Datasets/data/Bach/data.h5 -o ../Results/ZIFA/Bach/dim_20/seed_15/result.h5 -g seurat_genes -d 20 -s 15 --clean cell_ontology_class > ../Results/ZIFA/Bach/dim_20/seed_15/log.txt 2>&1
After a few hours there is still no output. Is something wrong?

Update the reference dataset.

Hello, will you update the reference datasets, for example by adding fruit fly brain data?
Thank you for this useful tool.

how to use

Hi,
Thanks for developing such a good tool. I want to try it, but I cannot open the website https://cblast.gao-lab.org/download, and I cannot find a tutorial on how to use this tool. Please help me.
Thanks.

Error in "wasserstein_distance" function

Hi CellBLAST team!
I'm new to using your Python package and I encountered some problems. I've downloaded the "Chen" reference panel from your website to perform a first test and I've performed the following steps in the Python interpreter:

>>> import numpy as np
>>> import pandas as pd
>>> import tensorflow as tf
>>> import Cell_BLAST as cb
>>> reference = cb.data.ExprDataSet.read_dataset("/home/biobam/Downloads/Chen.h5")
>>> models = []
>>> for i in range(4):
...     models.append(cb.directi.fit_DIRECTi(reference, random_seed = i))
>>> blastdb = cb.blast.BLAST(models, reference)

And the last step raises this error, which I don't know how to solve:

>>> blastdb = cb.blast.BLAST(models, reference)
[INFO] Cell BLAST: Projecting to latent space...
[INFO] Cell BLAST: Fitting nearest neighbor trees...
[INFO] Cell BLAST: Sampling from posteriors...
[INFO] Cell BLAST: Generating empirical null distributions...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 473, in __init__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 615, in _force_components
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 602, in _get_empirical
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 1041, in __call__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 859, in dispatch_one_batch
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 777, in _dispatch
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 572, in __init__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in __call__
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/joblib/parallel.py", line 263, in <listcomp>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 414, in _compile_for_args
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/dispatcher.py", line 357, in error_rewrite
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function wasserstein_distance at 0x7f91740bbd90>) found for signature:
 
 >>> wasserstein_distance(array(float32, 1d, C), array(float32, 1d, C))
 
There are 2 candidate implementations:
   - Of which 2 did not match due to:
   Overload in function '_wasserstein_distance': File: Cell_BLAST/blast.py: Line 0.
     With argument(s): '(array(float32, 1d, C), array(float32, 1d, C))':
    Rejected as the implementation raised a specific error:
      RuntimeError: cannot cache function '_wasserstein_distance_impl': no locator available for file '/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py'
  raised from /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/numba/core/caching.py:352

During: resolving callee type: Function(<function wasserstein_distance at 0x7f91740bbd90>)
During: typing of call at /home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py (209)

File "anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/blast.py", line 209:
<source missing, REPL/exec in use?>

I'm running the python interpreter on the conda environment created for CellBLAST following the instructions in the installation guide:

(cb) biobam@biobam-500-526ns:~$ python3
Python 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56) 
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Thank you in advance!
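
The RuntimeError buried in the traceback ("cannot cache function ... no locator available") comes from Numba's on-disk cache rather than from the distance computation itself. A workaround sometimes used for this class of error, offered here as an assumption and not as a confirmed fix for this report, is to point Numba at a writable cache directory through the NUMBA_CACHE_DIR environment variable before Numba (and therefore Cell_BLAST) is first imported. The directory prefix below is arbitrary.

```python
import os
import tempfile

# Create a directory that the current user can write to.
cache_dir = tempfile.mkdtemp(prefix="numba_cache_")

# Set the variable before numba is first imported, i.e. ahead of
# `import Cell_BLAST`, so that Numba picks it up at import time.
os.environ["NUMBA_CACHE_DIR"] = cache_dir

print("Numba cache directory:", os.environ["NUMBA_CACHE_DIR"])
# import Cell_BLAST as cb  # import only after the variable is set
```

The same variable can also be exported in the shell before starting the interpreter.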

decoder

Hi,

Awesome tool! One thing I was having trouble figuring out was how to get reconstructed per-gene expression values using a trained model. For other NN models I'd typically do this with predictions = model.predict(some_data), but I can't seem to find the equivalent function for a trained Cell BLAST model. My goal is to use the model as a way of denoising and batch-correcting the data and returning it in high-dimensional space.

Many thanks,
Brian

how to build my own references

Hi, thank you for releasing such a good tool for annotation. I have some questions about building a personal reference. I read your scripts under /Datasets/collect and found that some datasets need annotation files (e.g. collect_macparland.py) while others don't (e.g. collect_chen.R, whose input is an rds file, which is confusing because I don't know the exact structure of that data). If you don't mind, would you please release the "download" directory or some of the data in it? They are the inputs of BLAST.ipynb and DIRECTi.ipynb.

The demo dataset to re-run the example

Hi Dr.,
Thanks for developing this great tool.
I want to use your data to test the pipeline before analyzing my own data, but I cannot get the example data (../../Datasets/data/Lawlor/data.h5) shown on the vignette page: https://cblast.gao-lab.org/doc-latest/_static/BLAST.html.
Your demonstration is already very clear, but for a few steps I cannot work out how to run Cell BLAST on my own data without first running the whole pipeline shown on the webpage.
I hope the dataset "data/Lawlor/data.h5" can be made available for download so that I can re-run the Cell BLAST pipeline, if possible.
Best,
hanhuihong

blast.save() TypeError: Object of type 'bytes' is not JSON serializable

Hi, thanks for building this great tool!

I have successfully computed 5 DIRECTi models and created a blast object. However, when I tried to save it I received the following error messages:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-1abd49d75b19> in <module>
----> 1 blast.save("../data/NEMO_ARCHIVE/Yao_mouse_MOp_Miniatlas/SMARTer_cells_MOp/outs/yao_MOp_blast")
      2 
      3 # This loads the blast object directly to memory
      4 # blast = cb.blast.BLAST.load("../data/Baron_2016_Pancreas/baron_human_blast")

~/miniconda3/envs/cellblast/lib/python3.6/site-packages/Cell_BLAST/blast.py in save(self, path, only_used_genes)
    690             ref.write_dataset(os.path.join(path, "ref.h5"))
    691         for i in range(len(self)):
--> 692             self.models[i].save(os.path.join(path, f"model_{i}"))
    693 
    694     @classmethod

~/miniconda3/envs/cellblast/lib/python3.6/site-packages/Cell_BLAST/model.py in save(self, path, config, weights)
    283         elif not os.path.exists(path):
    284             os.makedirs(path)
--> 285         self._save_config(os.path.join(path, config))
    286         self._save_weights(os.path.join(path, weights))
    287 

~/miniconda3/envs/cellblast/lib/python3.6/site-packages/Cell_BLAST/model.py in _save_config(self, file)
    228     def _save_config(self, file: str) -> None:
    229         with open(file, "w") as f:
--> 230             json.dump(self._get_config(), f, indent=4)
    231 
    232     @classmethod

~/miniconda3/envs/cellblast/lib/python3.6/json/__init__.py in dump(obj, fp, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    177     # could accelerate with writelines in some versions of Python, at
    178     # a debuggability cost
--> 179     for chunk in iterable:
    180         fp.write(chunk)
    181 

~/miniconda3/envs/cellblast/lib/python3.6/json/encoder.py in _iterencode(o, _current_indent_level)
    428             yield from _iterencode_list(o, _current_indent_level)
    429         elif isinstance(o, dict):
--> 430             yield from _iterencode_dict(o, _current_indent_level)
    431         else:
    432             if markers is not None:

~/miniconda3/envs/cellblast/lib/python3.6/json/encoder.py in _iterencode_dict(dct, _current_indent_level)
    402                 else:
    403                     chunks = _iterencode(value, _current_indent_level)
--> 404                 yield from chunks
    405         if newline_indent is not None:
    406             _current_indent_level -= 1

~/miniconda3/envs/cellblast/lib/python3.6/json/encoder.py in _iterencode_list(lst, _current_indent_level)
    323                 else:
    324                     chunks = _iterencode(value, _current_indent_level)
--> 325                 yield from chunks
    326         if newline_indent is not None:
    327             _current_indent_level -= 1

~/miniconda3/envs/cellblast/lib/python3.6/json/encoder.py in _iterencode(o, _current_indent_level)
    435                     raise ValueError("Circular reference detected")
    436                 markers[markerid] = o
--> 437             o = _default(o)
    438             yield from _iterencode(o, _current_indent_level)
    439             if markers is not None:

~/miniconda3/envs/cellblast/lib/python3.6/json/encoder.py in default(self, o)
    178         """
    179         raise TypeError("Object of type '%s' is not JSON serializable" %
--> 180                         o.__class__.__name__)
    181 
    182     def encode(self, o):

TypeError: Object of type 'bytes' is not JSON serializable

Querying cells against the blast object works properly, so this seems to be a bug in blast.save() itself.

Best,
Ray
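
The traceback shows json.dump failing on a bytes value somewhere inside the model config. One generic way around this, sketched below under the assumption that the offending values are UTF-8 encoded strings (for example names read back from an HDF5 file, which is a guess and not confirmed in this report), is to decode bytes recursively before serializing. The helper name decode_bytes and the sample config keys are hypothetical and not part of Cell_BLAST.

```python
import json


def decode_bytes(obj):
    """Recursively convert bytes to str inside dicts, lists and tuples."""
    if isinstance(obj, bytes):
        return obj.decode("utf-8")
    if isinstance(obj, dict):
        return {decode_bytes(k): decode_bytes(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [decode_bytes(item) for item in obj]
    return obj


# Hypothetical config resembling what a model might try to save.
config = {"genes": [b"INS", b"GCG"], "latent_dim": 10}

try:
    json.dumps(config)
except TypeError as err:
    print("before:", err)

print("after:", json.dumps(decode_bytes(config)))
```

An alternative with the same effect would be to pass a default= callable to json.dump that decodes bytes objects as they are encountered.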

optimizer = tf.train.__dict__[optimizer] in _compile KeyError: 'RMSPropOptimizer'

Hi,
I installed everything following your recommendations, but got an error when running the following command.

model = cb.directi.fit_DIRECTi(
    baron_human, baron_human.uns["seurat_genes"],
    latent_dim=10, cat_dim=20, epoch=50,
    path="./baron_human_model"
)

Error message:
File "/Users/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/directi.py", line 234, in _compile
self.step = tf.train.__dict__[optimizer](lr).minimize(

KeyError: 'RMSPropOptimizer'

Maybe there is a problem with tf.train.__dict__[optimizer]. Temporarily, I just use optimizer = tf.train.RMSPropOptimizer(lr) to bypass this issue.

Thanks.
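
Why a name can be missing from a namespace's __dict__ yet still be usable as an attribute can be illustrated without TensorFlow. The sketch below uses a hypothetical LazyNamespace class that resolves names through __getattr__; whether tf.train behaves this way in the reporter's TensorFlow version is an assumption, not something established in this thread. It shows that a direct __dict__ lookup raises KeyError while getattr succeeds, which would be consistent with the workaround above.

```python
class LazyNamespace:
    """Hypothetical stand-in for a module that resolves names lazily."""

    def __init__(self, factories):
        # Names are kept aside and only resolved on attribute access.
        self._factories = factories

    def __getattr__(self, name):
        # Called only when normal attribute lookup (including __dict__) fails.
        factories = self.__dict__.get("_factories", {})
        if name in factories:
            return factories[name]
        raise AttributeError(name)


train = LazyNamespace({"RMSPropOptimizer": lambda lr: f"RMSProp(lr={lr})"})

try:
    train.__dict__["RMSPropOptimizer"]
except KeyError as err:
    print("__dict__ lookup failed with KeyError:", err)

optimizer = getattr(train, "RMSPropOptimizer")(0.001)
print(optimizer)
```

If this is indeed the cause, replacing the __dict__ lookup with getattr(tf.train, optimizer) would be the analogous change, though that has not been verified against Cell_BLAST here.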

Error in "cb.directi.fit_DIRECTi()" function

Hi CellBLAST team!

I want to annotate a very small scRNA-seq dataset with CellBlast; it contains only about 60 cells. I encountered an error when running the cb.directi.fit_DIRECTi() function. Here are the error details.
image
Is it because the number of cells in the dataset is too small? I can train the model without problems on the demo dataset and on other, larger datasets.

Confusion about predicted cell type with my own reference

Hi CellBLAST team!

I'm using CellBlast to build my own reference and train models, but many cells get the predicted cell type "rejected" when I query a test dataset. Querying works well when I use another dataset to build the reference. What might be the cause?

KeyError: "Unable to open object (object 'obs' doesn't exist)"

Hi, it's me again!
I was able to construct the database, but now I'm having some issues with my input file.

Originally, my input file was in csv format so I converted it to h5 using the h5write() function from rhdf5 R library as follows:

library(rhdf5)
cells <- read.csv(file = "/home/biobam/Downloads/tabula_muris_dataset/Brain_Myeloid-counts.csv", header = TRUE, row.names = 1, quote = "")
cells <- as.matrix(cells)
h5write(cells, "/home/biobam/Downloads/brain_cells.h5", "brain")

But when I tried to read it, it raised the following error:

>>> cells  = cb.data.ExprDataSet.read_dataset("/home/biobam/Downloads/cell_blast_test/brain_cells.h5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/Cell_BLAST/data.py", line 553, in read_dataset
    dict_from_group(f["obs"]),
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/biobam/anaconda3/envs/cb/lib/python3.6/site-packages/h5py/_hl/group.py", line 288, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5o.pyx", line 190, in h5py.h5o.open
KeyError: "Unable to open object (object 'obs' doesn't exist)"

Is there a problem with my input file? Here you have a link to a Drive folder with the original csv and converted h5 files.

Thanks in advance!
Marta.

Interspecies integration

Hello,

Thanks again for such a great tool!

I'm currently trying to use Cell BLAST to integrate some pretty diverse datasets; several mouse scRNAseq atlases, a zebrafish dataset, and fly dataset (all of which are mostly from central nervous system).

I'm struggling to find a tool that's able to handle this amount of diversity, and have had variable success with Cell BLAST. Here are some steps I've taken:

  1. Convert all gene symbols to their human orthologues using homologene.
  2. Discard any genes that do not have 1:1 orthologues in humans. This leaves ~16,000 genes in mouse, ~10,000 genes in zebrafish, and ~1,000 genes in fly.
  3. Select genes using find_variable_genes() after reducing min_group_frac=, since the default 0.5 only returns ~60 genes (which doesn't seem like it would be enough info to integrate the datasets well). I've played around with this parameter and run DIRECTi with anywhere from 60 to 400 to 4,000 to all genes.
  4. Train the DIRECTi model on the raw expression data from all these datasets at once, while controlling for study and species as batch effects.
  5. Reduce the latent factors into 2D UMAP space (tried both using the visualize_latent() and manually running UMAP).
  6. In all cases, the datasets didn't seem to integrate very well, in the sense that I expect neurons to cluster with neurons, oligodendrocytes to cluster with oligodendrocytes (regardless of the species or study they're from). Instead, they just cluster by the dataset they're from (even with the same species).
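
Steps 1 and 2 above (mapping to human orthologues and keeping only 1:1 pairs) can be sketched in plain Python as follows. This is an illustration only: the function name keep_one_to_one and the toy gene names are hypothetical, and a real mapping would come from a resource such as homologene rather than a hand-written list.

```python
from collections import Counter


def keep_one_to_one(pairs):
    """Keep (species_gene, human_gene) pairs where both genes occur once."""
    pairs = sorted(set(pairs))  # drop exact duplicate rows first
    species_counts = Counter(species for species, _ in pairs)
    human_counts = Counter(human for _, human in pairs)
    return {
        species: human
        for species, human in pairs
        if species_counts[species] == 1 and human_counts[human] == 1
    }


# Toy mapping with made-up gene names.
toy_pairs = [
    ("geneA", "HUMAN_A"),    # clean 1:1 pair
    ("geneB", "HUMAN_B1"),
    ("geneB", "HUMAN_B2"),   # geneB maps to two human genes
    ("geneC1", "HUMAN_C"),
    ("geneC2", "HUMAN_C"),   # two genes map to the same human gene
]

print(keep_one_to_one(toy_pairs))
```

Only the geneA pair survives in this toy case, which mirrors how quickly the usable gene set shrinks for distant species.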

Screenshot 2021-04-23 at 22 30 55

var_genes_study, axes_study = combined_dataset.find_variable_genes(grouping="study", min_group_frac=.25)

model = cb.directi.fit_DIRECTi(combined_dataset, 
                               genes = var_genes_study, 
                               latent_dim=10, cat_dim=20, 
                               epoch=50,
                               batch_effect = ["study","species"],
                               path = model_dir
)

Is there anything you can see that I might be doing wrong, or do you have any recommendations to improve the integration in this case? I've been finding that most tools have trouble with integrating data from species this divergent, probably in part due to the fact that most genes are 0s for some species. I've also tried using gene intersections, but this only leaves ~400 genes across mouse + zebrafish + fly, which doesn't seem to be enough to differentiate cell-types (and certainly not sub-types).

Thanks so much in advance,
Brian

snakemake question help

When I tried to reproduce the results as the page describes by running snakemake -prk, I got an error like this:
Traceback (most recent call last):
File "/root/test_cb_repro/Cell_BLAST/Evaluation/.snakemake/scripts/tmpjl7pf7be.dimension_reduction_metrics.py", line 54, in <module>
main()
File "/root/test_cb_repro/Cell_BLAST/Evaluation/.snakemake/scripts/tmpjl7pf7be.dimension_reduction_metrics.py", line 36, in main
y = y[~utils.na_mask(y)]
AttributeError: module 'utils' has no attribute 'na_mask'
How can I fix this and rerun?

Cross-Species Celltype Annotation

Hi, Cell BLAST team!

Recently, I wanted to use your method for cross-species cell type annotation. However, I cannot find any cross-species cell type annotation examples on the Cell BLAST tutorial page.

So, I would like to know whether there are any examples I can use to get started quickly with cross-species cell type annotation using your great method.

h5 file generation problem

1. Why does the following error appear when generating the h5 file?
"MY_ERROR: Error in CreateSeuratObject (raw.data = object @ exprs, meta.data = object @ obs): The parameter is not useful (raw.data = object @ exprs) \ n".
I used the same data and R script from the collect part of your GitHub repository, and Seurat is installed.
2. Is cell_ontology necessary?
Thank you!

reconcile_models() problems

Hi @Jeff1995, I ran the code
data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)
and got the error below:
IndexError Traceback (most recent call last)
in
----> 1 data_obj2_hits = data_obj2_hits.reconcile_models().filter(by="pval", cutoff=0.05)

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in reconcile_models(self, dist_method, pval_method)
996 """
997 dist_method = self._get_reconcile_method(dist_method)
--> 998 dist = [dist_method(item, axis=1) for item in self.dist]
999 pval_method = self._get_reconcile_method(pval_method)
1000 pval = [pval_method(item, axis=1) for item in self.pval]

/usr/local/lib/python3.6/dist-packages/Cell_BLAST/blast.py in <listcomp>(.0)
996 """
997 dist_method = self._get_reconcile_method(dist_method)
--> 998 dist = [dist_method(item, axis=1) for item in self.dist]
999 pval_method = self._get_reconcile_method(pval_method)
1000 pval = [pval_method(item, axis=1) for item in self.pval]

<__array_function__ internals> in mean(*args, **kwargs)

/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py in mean(a, axis, dtype, out, keepdims)
3255
3256 return _methods._mean(a, axis=axis, dtype=dtype,
-> 3257 out=out, **kwargs)
3258
3259

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _mean(a, axis, dtype, out, keepdims)
136
137 is_float16_result = False
--> 138 rcount = _count_reduce_items(arr, axis)
139 # Make this warning show up first
140 if rcount == 0:

/usr/local/lib/python3.6/dist-packages/numpy/core/_methods.py in _count_reduce_items(arr, axis)
55 items = 1
56 for ax in axis:
---> 57 items *= arr.shape[ax]
58 return items
59

IndexError: tuple index out of range

I have no idea how to fix this.

Building models with subsets

Hi!
When I try to build a model with a subset of the HDF5 files provided by Cell BLAST, DIRECTi seems to get stuck. I would expect both Campbell and Campbell_subset to be usable for building models.

Details:

  1. Whether I pass the genes as a list or convert the list to a numpy.ndarray (Campbell_subset.uns['expressed_genes1']), building the model for Campbell_subset is unsuccessful.
  2. The same method successfully builds a model for Campbell.
Here is my code:
import time
import Cell_BLAST as cb

Campbell = cb.data.ExprDataSet.read_dataset("Campbell.h5")
# These genes are all contained in the expressed genes; only a few are listed here.
# Named gene_list to avoid shadowing the built-in list.
gene_list = ['Ace2', 'Adora2a', 'Aldh1l1', 'Amigo2', 'Ano3', 'Aqp4']
Campbell_subset = Campbell[:, gene_list]

%%capture
# Train four models on the subset (this is the run that appears to hang).
startblast = time.time()
models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        Campbell_subset, Campbell_subset.uns['expressed_genes1'], latent_dim=10, cat_dim=20,
        random_seed=i, path="Campbell_subset_blast_models2/model_%d" % i
    ))

%%capture
# Train four models on the full dataset (this run completes).
start_time = time.time()
models = []
for i in range(4):
    models.append(cb.directi.fit_DIRECTi(
        Campbell, Campbell.uns['expressed_genes1'], latent_dim=10, cat_dim=20,
        random_seed=i, path="Campbell_blast_models2/model_%d" % i
    ))

Sincerely waiting for your reply!
Thanks!
