
A guide to using a Seurat object in conjunction with RNA Velocity

Topics: seurat, integrating-loom, kallisto, bustools, kb-python, rna-velocity, anndata, velocyto, scvelo, kallisto-bustools

seurat-to-rna-velocity's Introduction

Seurat to RNA-Velocity

By Basil Khuder

Introduction

This guide demonstrates how to use a processed/normalized Seurat object in conjunction with an RNA Velocity analysis. Keep in mind that although Seurat is R-based, all of the available RNA Velocity software/packages are Python-based, so we will be moving back and forth between the two languages. We will be using the following programs:

  • scVelo (For RNA Velocity)
  • Velocyto or Kallisto Bustools (To produce our initial RNA Velocity Object)
  • Anndata (For manipulation of our RNA Velocity object)
  • Seurat
  • Samtools -- optional (Velocyto will run Samtools sort on unsorted .bam)
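
All of the Python-side packages are on PyPI, so a minimal setup (package names only; pin versions as you see fit) looks like:

pip install scvelo anndata

scVelo pulls in anndata and scanpy as dependencies, so installing scvelo alone is usually enough.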

Generating Loom files

To start, we will generate loom files (a file format designed for genomics datasets such as single-cell data) for every single-cell sample you used in your Seurat analysis. A loom file is different from the file format you used in Seurat; it has to be generated from the original FASTQ or BAM files for your sample(s).

[Image: a loom file description]

The two methods I will discuss for doing this are Velocyto's run command and Kallisto Bustools (KB).

Kallisto Bustools

Installation

pip install git+https://github.com/pachterlab/kb_python@devel

Usage

Using KB is a two-step process: you first create a reference and then generate the counts table. You'll need FASTA and GTF files for your species (download them from Ensembl if you need them: https://useast.ensembl.org/index.html).

The -g, -f1, -f2, -c1 and -c2 arguments are all files that will be generated by the kb ref step (you don't need to provide them, just give them names).

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 \
fasta.fa \
gtf.gtf

We'll then use kb count to generate our loom file. -x specifies the single-cell technology (use kb --list for all supported technologies) and --workflow lamanno specifies that we want to calculate RNA velocity. (If you split the index with -n during kb ref, pass the resulting index files to -i as a comma-delimited list; note that -n is slated for deprecation.)

kb count -i index.idx -g t2g.txt -x 10xv2 --workflow lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt read_1.fastq.gz read_2.fastq.gz

Velocyto

Installation

Velocyto can be installed through pip.

#Download dependencies first
conda install numpy scipy cython numba matplotlib scikit-learn h5py click
pip install velocyto

Usage

Note that I have found it easier to use velocyto run for whichever scRNA-seq chemistry you are working with rather than Velocyto's "ready-to-use subcommands."

velocyto run -b filtered_barcodes.tsv -o output_path -m repeat_msk_srt.gtf bam_file.bam annotation.gtf

If time or memory is limited, you can omit the -m parameter. Just note that skipping the repeat mask could introduce a confounding factor downstream in the analysis.

Once this step has finished and your loom file is generated, we can use anndata to import the loom file and make the necessary adjustments/additions.

But before we do that, we will export from our Seurat object all of the meta-data that our RNA Velocity object needs. This includes:

  • Filtered Cell Ids
  • UMAP or TSNE coordinates
  • Clusters (Optional)
  • Cluster Colors (Optional)

Extracting Meta-data

I've put the meta-data extraction steps in this Google Colab document if you're interested in playing around with the code.

One way we can access our filtered cell IDs is through Seurat's Cells function:

write.csv(Cells(seurat_object), file = "cellID_obs.csv", row.names = FALSE)

If your Seurat object is composed of multiple single-cell samples, you can either use the code above and then split the IDs by some pattern later (for example, if you added unique cell prefixes to each sample, you can match on that prefix), or create a cell ID observation file for every sample and use each one individually to filter the corresponding RNA Velocity object.

To get UMAP or TSNE coordinates, we use the Embeddings function:

write.csv(Embeddings(seurat_object, reduction = "umap"), file = "cell_embeddings.csv")

And finally, we can extract our clusters. Using Seurat's $ accessor returns a vector named by cell ID, so the cell names are kept as the CSV's row names:

write.csv(seurat_object$seurat_clusters, file = "clusters.csv")

If you assigned a custom color palette to your clusters, it can be written out the same way with write.csv.

Integrating Loom File and Meta-data

We can now import our loom file(s) and all of our Seurat meta-data using anndata:

import anndata
import scvelo as scv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%load_ext rpy2.ipython

sample_one = anndata.read_loom("sample_one.loom")
.... 
sample_n = anndata.read_loom("sample_n.loom")

sample_obs = pd.read_csv("cellID_obs.csv")
umap_cord = pd.read_csv("cell_embeddings.csv")
cell_clusters = pd.read_csv("clusters.csv")

With the Cell IDs we extracted from Seurat, we need to filter our imported loom file (now an anndata object) down to those cells:

sample_one = sample_one[np.isin(sample_one.obs.index, sample_obs["x"])]
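
If this filter unexpectedly returns zero cells, the barcode formatting in the loom file likely differs from Seurat's (Velocyto, for instance, writes names like sample_one:AAACCTGAGCTAGTGGx, while Seurat often stores AAACCTGAGCTAGTGG-1). A minimal sketch for reconciling them; the prefix and suffix here are illustrative, so adjust them to your own naming:

#Hypothetical example: strip a "sample_one:" prefix and a trailing "x" from the
#loom barcodes and append "-1" so they match Seurat-style cell IDs.
sample_one.obs.index = (sample_one.obs.index
                        .str.replace("sample_one:", "", regex = False)
                        .str.replace("x$", "-1", regex = True))
sample_one.obs.index[:5] #sanity-check against sample_obs["x"], then re-run the filter above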

Multiple-Sample Integration


If you have individual observation files for every sample, you'll do the filtering above one by one. If you have a combined observation file, you'll want to filter it based upon the cell pattern and then use that to filter the RNA Velocity sample. For example, if these were your Cell IDs:

Sample Cell IDs
sample1_ACTCACT
sample1_ACTCCAC
.....
sample2_CACACTG

You could use the patterns sample1_ and sample2_ to filter as such:

cellID_obs_sample_one = sample_obs[sample_obs["x"].str.contains("sample1_")]
cellID_obs_sample_two = sample_obs[sample_obs["x"].str.contains("sample2_")]
sample_one = sample_one[np.isin(sample_one.obs.index, cellID_obs_sample_one["x"])]
sample_two = sample_two[np.isin(sample_two.obs.index, cellID_obs_sample_two["x"])]

Once all the samples have been properly filtered, we can merge them into one.

sample_one = sample_one.concatenate(sample_two, sample_three, sample_four)
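
If the concatenation fails with "ValueError: cannot reindex from a duplicate axis", duplicated gene names in the looms are the usual culprit (anndata warns "Variable names are not unique" when reading such a file). The common fix is to make the variable names unique right after reading each loom:

sample_one.var_names_make_unique()
sample_two.var_names_make_unique()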

Now that we have our velocity object filtered based upon our Seurat object, we can go ahead and add the UMAP coordinates. We'll read them in from the embeddings file we exported earlier:

umap = pd.read_csv("cell_embeddings.csv")

With the coordinates, we need to make sure we add them in the same order as the Cell IDs in our anndata object. Our Cell IDs are the row names of the object's observations (obs), so we can view them by using the following:

sample_one.obs.index

Let's cast our index as a data frame and name its column. Setting the columns attribute directly sidesteps the case where the index already carries a name, in which case rename(columns = {0: ...}) would silently do nothing:

sample_one_index = pd.DataFrame(sample_one.obs.index)
sample_one_index.columns = ['Cell ID']

Let's also change the first column of our UMAP data frame to the same name:

umap = umap.rename(columns = {'Unnamed: 0':'Cell ID'})

Now if we merge our index data frame with our UMAP data frame, the order will match our anndata object.

umap_ordered = sample_one_index.merge(umap, on = "Cell ID")

Since we're certain the orders are the same, we can remove the first column of the data frame and add the UMAP coordinates to our anndata object.

umap_ordered = umap_ordered.iloc[:,1:]
sample_one.obsm['X_umap'] = umap_ordered.values
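
If this assignment fails with an error like "Value passed for key 'X_umap' is of incorrect shape", the merge dropped cells because some IDs didn't match between the loom file and the Seurat export. A quick sanity check (the assertion message is just illustrative):

assert umap_ordered.shape[0] == sample_one.n_obs, "Cell IDs in the loom file and the Seurat export don't fully match"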

Clusters and their cluster colors can be added in the same fashion (and again, they must match the order of the Cell IDs). The per-cell cluster assignments belong in the observations ('obs'), while the per-cluster color list goes under the unstructured annotations ('uns'), as shown in the sketch below.
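This is a minimal sketch, assuming clusters.csv carries the cell IDs in its first column (as exported above) and that you supply one hex color per cluster; the names clusters_ordered and cluster_colors are illustrative, not part of the original guide:

#Order the clusters to match the anndata object, mirroring the UMAP merge above.
cell_clusters = cell_clusters.rename(columns = {'Unnamed: 0':'Cell ID'})
clusters_ordered = sample_one_index.merge(cell_clusters, on = "Cell ID")

#Per-cell assignments go into obs as a categorical column.
sample_one.obs['seurat_clusters'] = pd.Categorical(clusters_ordered.iloc[:, 1].values)

#Per-cluster colors go into uns under "<obs key>_colors", the key that
#scanpy/scVelo use to look up a palette for that obs column.
cluster_colors = ["#1f77b4", "#ff7f0e", "#2ca02c"] #placeholder: one hex color per cluster
sample_one.uns['seurat_clusters_colors'] = cluster_colors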

Running RNA Velocity

At this point, we can now run the scVelo commands and generate our RNA Velocity plot based upon our Seurat UMAP coordinates.

scv.pp.filter_and_normalize(sample_one)
scv.pp.moments(sample_one)
scv.tl.velocity(sample_one, mode = "stochastic")
scv.tl.velocity_graph(sample_one)
scv.pl.velocity_embedding(sample_one, basis = 'umap')
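
scVelo's other embedding plots take the same arguments; for example, the streamline variant (often easier to read than per-cell arrows) is:

scv.pl.velocity_embedding_stream(sample_one, basis = 'umap')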

If you want to incorporate your clusters and cluster colors into the embedding plot, pass the name of the obs column to the color parameter; with the colors stored under uns['seurat_clusters_colors'] as above, they are picked up automatically:

scv.pl.velocity_embedding(sample_one, basis = 'umap', color = 'seurat_clusters')

FAQ

For Kallisto Bustools, why should I generate a loom file rather than an h5ad file?
Either one is fine. I recommended generating a loom file in case the user wanted to use Velocyto for their RNA velocity, but the h5ad format makes more sense if you're using scVelo (h5ad is AnnData's native on-disk format).
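
For reference, either format loads into Python in one line (the file names here are placeholders):

import anndata
adata = anndata.read_loom("sample_one.loom")   #loom
adata = anndata.read_h5ad("sample_one.h5ad")   #h5ad, AnnData's native on-disk format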

seurat-to-rna-velocity's People

Contributors

basilkhuder, jdariosolis


seurat-to-rna-velocity's Issues

Failed to incorporate colors with "color = sample_one.uns['Cluster_colors']"

Thanks very much for the helpful tutorial. I succeeded in incorporating the UMAP coordinates, but failed with the cluster colors. My AnnData object looks like:
AnnData object with n_obs × n_vars = 6667 × 1496
obs: 'orig.ident', 'nCount_spliced', 'nFeature_spliced', 'nCount_unspliced', 'nFeature_unspliced', 'nCount_ambiguous', 'nFeature_ambiguous', 'nCount_RNA', 'nFeature_RNA', 'nCount_SCT', 'nFeature_SCT', 'SCT_snn_res.0.8', 'seurat_clusters', 'initial_size_unspliced', 'initial_size_spliced', 'initial_size', 'n_counts', 'velocity_self_transition'
var: 'features', 'ambiguous_features', 'spliced_features', 'unspliced_features', 'velocity_gamma', 'velocity_r2', 'velocity_genes'
uns: 'pca', 'neighbors', 'velocity_settings', 'velocity_graph', 'velocity_graph_neg', 'seurat_clusters_colors'
obsm: 'X_umap', 'X_pca', 'velocity_umap'
varm: 'PCs'
layers: 'ambiguous', 'spliced', 'unspliced', 'Ms', 'Mu', 'velocity', 'variance_velocity'

When I use "scv.pl.velocity_embedding_stream(adata, basis="umap", color=adata.uns['seurat_clusters_colors'])";
image

I got a series of UMAP plots colored with seurat_clusters_colors, but not the expected seurat_clusters_colors for each cell cluster.
I am new to python, and I do appreciate any helpful suggestions.

high unspliced counts by velocyto

Hi,
Thank you for your incredible guide. I had a problem making a loom file with velocyto. Here is the code I used:

velocyto run -b filtered_barcodes.tsv -o velocyto_output -m mm10_repeats_repeatMasker.gtf possorted_genome_bam.bam refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf

I can successfully generate the loom file but the unspliced RNA is very high (unspliced:spliced=0.92:0.08). This is not expected since the dataset is generated from 10X 3' v3.1 single-cell library. Do you have any idea why I am getting this ratio?

I wonder if I used the wrong input for velocyto. Here are what I use:

  1. barcodes.tsv: unzipped file from CellRanger outs>filtered_feature_bc_matrix>barcodes.gz
  2. repeatMasker.gtf: downloaded from the UCSC table browser
  3. bam file: file from CellRanger outs>possorted_genome_bam.bam
  4. genes.gtf: mm10 mouse reference dataset provided by 10x

Any comment will be appreciated!

ValueError: cannot reindex from a duplicate axis when merging.

@basilkhuder
Hi, Basil

It works perfectly, but I was confused by one issue, as follows:

Dataprocess:
sample_obs = pd.read_csv("cellID.csv")
#I extracted Cell IDs from Seurat, and filtered uploaded loom.
sample_one = sample_one[np.isin(sample_one.obs.index, sample_obs["x"])]
sample_two= sample_two[np.isin(sample_two.obs.index, sample_obs["x"])]

#then, we merged them into one, but got the following error:

sample_one = sample_one.concatenate(sample_two, sample_three, sample_four)
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1696, in concatenate
out = concat(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/anndata/_core/merge.py", line 814, in concat
alt_annot = merge_dataframes(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/anndata/_core/merge.py", line 526, in merge_dataframes
dfs = [df.reindex(index=new_index) for df in dfs]
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/anndata/_core/merge.py", line 526, in
dfs = [df.reindex(index=new_index) for df in dfs]
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/util/_decorators.py", line 312, in wrapper
return func(*args, **kwargs)
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/frame.py", line 4173, in reindex
return super().reindex(**kwargs)
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/generic.py", line 4808, in reindex
return self._reindex_axes(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/frame.py", line 4019, in _reindex_axes
frame = frame._reindex_index(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/frame.py", line 4038, in _reindex_index
return self._reindex_with_indexers(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
new_data = new_data.reindex_indexer(
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 1301, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "/usr/local/anaconda3/envs/epigenetic/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3476, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis

So, how can I fix this problem?
Thank you for your generous help!

general question embedding projection

I generated my embedding using Seurat, utilizing the raw count matrix rather than the spliced matrix. Following your instructions to project velocities onto this UMAP embedding, the process involves recalculating new PCs and neighbors, using different genes compared to those initially employed to create the embedding. Is this still valid?

Additionally, I have data from 8 distinct donors/batches. How should I handle this aspect in the analysis?

Error: /lib64/libm.so.6: version `GLIBC_2.29' not found

Hi, when I am running the kb ref command I am getting a GLIBC_2.29 not found error message. Please see my command and the log contents below. I appreciate your help in this matter.

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 hg19/inputs/GRCh37.primary_assembly.genome.fa hg19/inputs/gencode.primary_assembly.annotation.gtf

[2021-09-22 16:38:17,436]    INFO [ref_lamanno] Concatenating cDNA and intron FASTAs to /crex/proj/snic2021-22-49/nobackup/Egle/RNA_velocity/scvelo/JMJ8/tmp/tmpq1p7cltf
[2021-09-22 16:38:53,325]    INFO [ref_lamanno] Creating transcript-to-gene mapping at t2g.txt
[2021-09-22 16:39:38,863]    INFO [ref_lamanno] Indexing cdna.fa to index.idx_cdna
[2021-09-22 16:39:39,981]   ERROR [ref_lamanno] /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto)
/home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto)
/home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto: /lib64/libstdc++.so.6: version `CXXABI_1.3.8' not found (required by /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto)
/home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto)
/home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.22' not found (required by /home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto)
[2021-09-22 16:39:39,982]   ERROR [main] An exception occurred
Traceback (most recent call last):
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/main.py", line 1326, in main
    COMMAND_TO_FUNCTION[args.command](parser, args, temp_dir=temp_dir)
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/main.py", line 222, in parse_ref
    ref_lamanno(
  File "/home/indranil/.local/lib/python3.9/site-packages/ngs_tools/logging.py", line 62, in inner
    return func(*args, **kwargs)
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/ref.py", line 748, in ref_lamanno
    cdna_index_result = kallisto_index(
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/ref.py", line 239, in kallisto_index
    run_executable(command)
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/dry/__init__.py", line 25, in inner
    return func(*args, **kwargs)
  File "/home/indranil/.local/lib/python3.9/site-packages/kb_python/utils.py", line 203, in run_executable
    raise sp.CalledProcessError(p.returncode, ' '.join(command))
subprocess.CalledProcessError: Command '/home/indranil/.local/lib/python3.9/site-packages/kb_python/bins/linux/kallisto/kallisto index -i index.idx_cdna -k 31 cdna.fa' returned non-zero exit status 1.

Fail to install velocyto

Hi,

I have trouble installing velocyto. See the output below. Any idea how to fix that? I have a MacBook with an M1 chip.

Thanks!

pip install velocyto
Collecting velocyto
Downloading velocyto-0.17.17.tar.gz (198 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 198.9/198.9 kB 2.8 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (1.23.5)
Requirement already satisfied: scipy in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (1.7.1)
Requirement already satisfied: cython in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (0.29.33)
Requirement already satisfied: numba in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (0.56.4)
Requirement already satisfied: matplotlib in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (3.7.0)
Requirement already satisfied: scikit-learn in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (1.2.1)
Requirement already satisfied: h5py in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (3.7.0)
Collecting loompy
Downloading loompy-3.0.7.tar.gz (4.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 4.8/4.8 MB 9.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting pysam
Downloading pysam-0.20.0-cp39-cp39-macosx_10_9_x86_64.whl (3.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 10.3 MB/s eta 0:00:00
Requirement already satisfied: Click in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from velocyto) (8.0.4)
Collecting pandas
Downloading pandas-1.5.3-cp39-cp39-macosx_10_9_x86_64.whl (12.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.0/12.0 MB 10.6 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from loompy->velocyto) (65.6.3)
Collecting numpy-groupies
Downloading numpy_groupies-0.9.20-py3-none-any.whl (25 kB)
Requirement already satisfied: packaging>=20.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (22.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (3.0.9)
Requirement already satisfied: pillow>=6.2.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (9.4.0)
Requirement already satisfied: importlib-resources>=3.2.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (5.2.0)
Requirement already satisfied: python-dateutil>=2.7 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (2.8.2)
Requirement already satisfied: fonttools>=4.22.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (4.25.0)
Requirement already satisfied: contourpy>=1.0.1 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (1.0.5)
Requirement already satisfied: cycler>=0.10 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (0.11.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from matplotlib->velocyto) (1.4.4)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from numba->velocyto) (0.39.1)
Requirement already satisfied: pytz>=2020.1 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from pandas->velocyto) (2022.7)
Requirement already satisfied: joblib>=1.1.1 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn->velocyto) (1.1.1)
Requirement already satisfied: threadpoolctl>=2.0.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from scikit-learn->velocyto) (2.2.0)
Collecting numpy
Downloading numpy-1.22.4-cp39-cp39-macosx_10_15_x86_64.whl (17.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.7/17.7 MB 2.1 MB/s eta 0:00:00
Requirement already satisfied: zipp>=3.1.0 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from importlib-resources>=3.2.0->matplotlib->velocyto) (3.11.0)
Requirement already satisfied: six>=1.5 in /Users/jsong/opt/anaconda3/lib/python3.9/site-packages (from python-dateutil>=2.7->matplotlib->velocyto) (1.16.0)
Building wheels for collected packages: velocyto, loompy
Building wheel for velocyto (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [43 lines of output]
['/private/var/folders/vl/x9ks4zqx3vx6jg0zgg1kzt780000gn/T/pip-install-r6ybq1f6/velocyto_0bad0e9ae1c543ce8c2238dcf18b0f0e/setup.py', 'bdist_wheel', '-d', '/private/var/folders/vl/x9ks4zqx3vx6jg0zgg1kzt780000gn/T/pip-wheel-xriwvgjl']
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-39
creating build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/transcript_model.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/diffusion.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/metadata.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/analysis.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/molitem.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/_version.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/neighbors.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/constants.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/estimation.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/feature.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/logic.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/indexes.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/utils.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/gene_info.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/r_interface.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/segment_match.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/counter.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/serialization.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/read.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
creating build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/velocyto.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run_smartseq2.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/dropest_bc_correct.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run_dropest.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run10x.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/_run.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
running build_ext
building 'velocyto.speedboosted' extension
creating build/temp.macosx-10.9-x86_64-cpython-39
creating build/temp.macosx-10.9-x86_64-cpython-39/velocyto
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/jsong/opt/anaconda3/include -arch x86_64 -I/Users/jsong/opt/anaconda3/include -fPIC -O2 -isystem /Users/jsong/opt/anaconda3/include -arch x86_64 -I/Users/jsong/opt/anaconda3/lib/python3.9/site-packages/numpy/core/include -I/Users/jsong/opt/anaconda3/include/python3.9 -c velocyto/speedboosted.c -o build/temp.macosx-10.9-x86_64-cpython-39/velocyto/speedboosted.o -fopenmp -ffast-math
clang: error: unsupported option '-fopenmp'
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for velocyto
Running setup.py clean for velocyto
Building wheel for loompy (setup.py) ... done
Created wheel for loompy: filename=loompy-3.0.7-py3-none-any.whl size=52018 sha256=0750011c1332a24b6160f3ecf61261bd9c19fe590c93b65046904228e61ee8e7
Stored in directory: /Users/jsong/Library/Caches/pip/wheels/8b/5b/87/3eabb82aa3b53152c1f25646389ceb6f083e614d928836c341
Successfully built loompy
Failed to build velocyto
Installing collected packages: pysam, numpy, pandas, numpy-groupies, loompy, velocyto
Attempting uninstall: numpy
Found existing installation: numpy 1.23.5
Uninstalling numpy-1.23.5:
Successfully uninstalled numpy-1.23.5
Running setup.py install for velocyto ... error
error: subprocess-exited-with-error

× Running setup.py install for velocyto did not run successfully.
│ exit code: 1
╰─> [49 lines of output]
['/private/var/folders/vl/x9ks4zqx3vx6jg0zgg1kzt780000gn/T/pip-install-r6ybq1f6/velocyto_0bad0e9ae1c543ce8c2238dcf18b0f0e/setup.py', 'install', '--record', '/private/var/folders/vl/x9ks4zqx3vx6jg0zgg1kzt780000gn/T/pip-record-aia4f9hi/install-record.txt', '--single-version-externally-managed', '--compile', '--install-headers', '/Users/jsong/opt/anaconda3/include/python3.9/velocyto']
Compiling velocyto/speedboosted.pyx because it depends on /Users/jsong/opt/anaconda3/lib/python3.9/site-packages/numpy/__init__.pxd.
[1/1] Cythonizing velocyto/speedboosted.pyx
/Users/jsong/opt/anaconda3/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /private/var/folders/vl/x9ks4zqx3vx6jg0zgg1kzt780000gn/T/pip-install-r6ybq1f6/velocyto_0bad0e9ae1c543ce8c2238dcf18b0f0e/velocyto/speedboosted.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
running install
/Users/jsong/opt/anaconda3/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-39
creating build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/transcript_model.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/diffusion.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/metadata.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/analysis.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/molitem.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/_version.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/neighbors.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/constants.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/estimation.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/feature.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/logic.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/indexes.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/utils.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/gene_info.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/r_interface.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/segment_match.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/counter.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/serialization.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
copying velocyto/read.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto
creating build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/velocyto.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run_smartseq2.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/dropest_bc_correct.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run_dropest.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/run10x.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
copying velocyto/commands/_run.py -> build/lib.macosx-10.9-x86_64-cpython-39/velocyto/commands
running build_ext
building 'velocyto.speedboosted' extension
creating build/temp.macosx-10.9-x86_64-cpython-39
creating build/temp.macosx-10.9-x86_64-cpython-39/velocyto
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/jsong/opt/anaconda3/include -arch x86_64 -I/Users/jsong/opt/anaconda3/include -fPIC -O2 -isystem /Users/jsong/opt/anaconda3/include -arch x86_64 -I/Users/jsong/opt/anaconda3/lib/python3.9/site-packages/numpy/core/include -I/Users/jsong/opt/anaconda3/include/python3.9 -c velocyto/speedboosted.c -o build/temp.macosx-10.9-x86_64-cpython-39/velocyto/speedboosted.o -fopenmp -ffast-math
clang: error: unsupported option '-fopenmp'
error: command '/usr/bin/clang' failed with exit code 1
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> velocyto

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Index splitting and specifying index for kb count

Hi Basil,

I'm trying to generate a loom file for RNA velocity using kb-python using the method you describe in your tutorial.

I just wrote out a detailed description of the problems I was having with kb count after generating a new reference from GENCODE vM25 - which led me to find a new troubleshooting option and the solution to my problem.

I ran the following command to build the reference; fasta.fa and genes.gtf are gunzipped copies of the GENCODE vM25 reference fa and gtf files:

kb ref -i indeces/index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 /lscratch/slurm-job-5124079/fasta.fa /lscratch/slurm-job-5124079/genes.gtf

Running this command generates a warning that the index splitting (-n) flag will be deprecated in the next major release - this led me to check the release notes for kb-python versions back to where index splitting was introduced in v0.25.0.

Use of the -n 4 flag in the kb ref command leads to the generation of four index files in the indeces directory:

  • index.idx_cdna
  • index.idx_intron.0
  • index.idx_intron.1
  • index.idx_intron.2

The v0.25.0 release notes state:

When -n is used the built indices must be passed in as a comma-delimited list to kb count

I made that change in my kb count command, which seems to have done the trick - although I don't see any loom files, so I may have to do some further tweaking.

I'd suggest a couple of updates to your tutorial (1) to remove index splitting from kb ref (or note that it will be deprecated), and/or (2) to clarify the specification of index file(s) in the kb count command. Your command has -i transcriptome.idx, which doesn't match the indexes generated by the kb ref command two lines above.

Thanks,

Chris

ValueError: Value passed for key 'X_umap' is of incorrect shape. Values of obsm must match dimensions (0,) of parent.

Hi @basilkhuder
Thank you for the tutorial.
I am trying to get RNA velocity for two samples (I have 30 samples, I am using only 2 for testing)
Following is my code.
NDN = C19_P1_NDN.concatenate(C19_P2_NDN) #concatenate the two filtered loom files in an annData object
NDN.obs.index
NDN_index = pd.DataFrame(NDN.obs.index)
NDN_index = NDN_index.rename(columns = {0:'Cell_ID'}) #change the column name from 0 to Cell_ID
umap_cord.insert(0, 'Cell_ID', cellID_obs ['x']) #adding Cell_ID column into umap coordinates csv file
umap_ordered = NDN_index.merge(umap_cord, on = "Cell_ID") #merge based on Cell_ID
umap_ordered = umap_ordered.iloc[:,1:]
NDN.obsm['X_umap'] = umap_ordered.values

ValueError: Value passed for key 'X_umap' is of incorrect shape. Values of obsm must match dimensions (0,) of parent. Value had shape (0, 2) while it should have had (14872,).

I think the error is at the merge step, because I am merging the data frame NDN_index with umap_cord based on Cell_ID, but not the actual annData object NDN. It is not clear to me what to do at this point in the tutorial.
I really appreciate your help here.

Imported tSNE coordinates not being read!

I was wondering if anyone could help me here: I managed to load my tSNE coordinates but ran into an issue when I open the loom file and try to set the obsm_mapping parameter:

adata = scv.read_loom("/home/ali/Dokumente/RPractise/Velocity/PythonCodes/adata.loom", obs_names="CellID", var_names="Gene", obsm_mapping="X_tsne")

Without setting the obsm_mapping parameter, the file loads normally and one can even see the tSNE coordinates. Seeing as the information is available in the loom file, I assume that it is in an incorrect format... would it have something to do with it being an array? I notice that there are no ["tsne_1", "tsne_2"] headers, as listed in the scvelo.read_loom API page.

Anyone know how to resolve this?

Question about the loom file

Hi!

I have been recently following your helpful tutorial and I have run into some problems. I am relatively new to python and would love to have some help!

First, my question would be about my loom file. I used kb to generate the loom file and ran it fine but it does not really seem correct.

In[15]: sample_1
Out[15]:
AnnData object with n_obs × n_vars = 191684 × 22582
obs: 'barcode'
var: 'gene_id', 'gene_name'
layers: 'matrix', 'spliced', 'unspliced'

Doesn't a loom file usually contain a lot of information already (e.g., clusters, X_pca, ...)? For me, I only have these few identities inside. Did I miss some steps that would transfer all the information about the object from Seurat?

And after filtering the cells and concatenating my samples, I am only left with 477 cells. Is it normal to lose so many cells to filtering?

In[24]: sample_1
Out[24]:
AnnData object with n_obs × n_vars = 477 × 22582
obs: 'batch'
var: 'gene_id', 'gene_name'
obsm: 'X_umap'
layers: 'matrix', 'spliced', 'unspliced'

Thank you so much for your help!

seurat object to RNA velocity question

Thank you for sharing the tutorial and code!
I'm new to Python and stuck at the first step of generating unspliced and spliced RNAs for the RNA velocity analysis.
I have a selected cell population in a Seurat object and would like to know if I can extract the unspliced and spliced RNA information from it, or do I have to go back to the original sequencing files, i.e. bam/fastq files?

10x 5 prime data

Good morning!

Thank you for the well-documented tutorial! Have you tried either kb_python or velocyto on 10x 5' data? Do you have any suggestions for getting the input files with 5' data? I noticed that 5' was not listed as a supported technology for kb_python, but I have some 5' data that I would love to run through RNA velocity.

Thank you!!

Variable names are not unique and 'cellID_obs' is not defined

Thanks @basilkhuder for this amazing tutorial!!

I run the python code (see below) and I get the following output error:

Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Traceback (most recent call last):
  File "/home/ali/Dokumente/RPractise/Velocity/PythonCodes/Step_2_scVelo.py", line 34, in <module>
    cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")]
NameError: name 'cellID_obs_WT3' is not defined

Reading #4, I changed WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3)] to WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3[x])], which did not work and gave me errors relating to pandas (KeyError: 0). I implemented this potential solution in my code, but I do not know if Python is implementing it properly, since I get no output from my WT3.var_names line. Anyone have any suggestions?

I also do not know how to go about addressing the NameError: name 'cellID_obs_WT3' is not defined error. From #9, I changed cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")] to cellID_obs_WT3 = cellID_obs[cellID_obs[0].str.contains("WT3_WT3_")] and also got the same pandas errors.

Anyone got any tips?

Python script

WT3 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/221929_WT3/velocyto/221929_WT3.loom")
WT3.var_names_make_unique()
WT4 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222863_WT4/velocyto/222863_WT4.loom")
WT4.var_names_make_unique()
KO4 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222862_KO4/velocyto/222862_KO4.loom")
KO4.var_names_make_unique()
KO5 = anndata.read_loom("/home/ali/Dokumente/RPractise/E18.5_rawdata/E18.5_raw_outputs/222864_KO5/velocyto/222864_KO5.loom")
KO5.var_names_make_unique()

WT3.var_names
WT4.var_names
KO4.var_names
KO5.var_names

cellID_obs = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/cellID_obs.csv")
TSNE_cord = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/cell_embeddings.csv")
cell_clusters = pd.read_csv("/home/ali/Dokumente/RPractise/E18.5/MTW Expected E18.5/clusters.csv")

#integration
cellID_obs_WT3 = cellID_obs[cellID_obs_WT3[0].str.contains("WT3_WT3_")]
cellID_obs_WT4 = cellID_obs[cellID_obs_WT4[0].str.contains("WT4_")]
cellID_obs_KO4 = cellID_obs[cellID_obs_KO4[0].str.contains("KO4_")]
cellID_obs_KO5 = cellID_obs[cellID_obs_KO5[0].str.contains("KO5_")]

WT3 = WT3[np.isin(WT3.obs.index, cellID_obs_WT3)]
WT4 = WT4[np.isin(WT4.obs.index, cellID_obs_WT4)]
KO4 = KO4[np.isin(KO4.obs.index, cellID_obs_KO4)]
KO5 = KO5[np.isin(KO5.obs.index, cellID_obs_KO5)]

sample_one = WT3.concatenate(WT4, KO4, KO5)

Cell ID examples

x
WT4_TGCGGGTAGTCCGGTC
KO4_GTTAAGCCATACCATG
KO5_TTTCCTCAGATCCCAT
WT3_WT3_AACCATGCAGCCTTGG

Fail to filter loom by CellID extracted from Seurat

Hi,
I have problem when trying to filter loom with CellID extracted from Seurat object.

  1. When I import the loom file by sample = anndata.read_loom("sample.loom"), I get this warning message: Variable names are not unique. To make them unique, call .var_names_make_unique. Do I need to run sample.var_names_make_unique()?
  2. If I ignore the warning in the above question and continue to load CellID_obs.csv and filter by sample = sample[sample[np.isin(sample.obs.index, cellID_obs[0])]], I get the following error:

KeyError Traceback (most recent call last)
~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2890 try:
-> 2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError Traceback (most recent call last)
in
----> 1 sample = sample[sample[np.isin(sample.obs.index,cellID_obs[0])]]

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in getitem(self, key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]

~/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2891 return self._engine.get_loc(casted_key)
2892 except KeyError as err:
-> 2893 raise KeyError(key) from err
2894
2895 if tolerance is not None:

KeyError: 0

Could you help me with this?
Thank you so much!

Any idea about integrating multiple sample looms with the loompy.combine function?

Hi all,
Thanks for developing this great pipeline for moving RNA velocity data between Seurat and anndata objects.
I have a question about the merge: the vignette you posted filters the samples one by one and then merges the loom data with the sample_one.concatenate function. In my case, I used the loompy.combine function to aggregate 20 samples' looms and then filtered for the barcodes identical to Seurat's barcodes; after that, I converted the Seurat object to anndata to analyze the velocity.
Is that right? And can the velocity analysis be applied to multiple samples?
Any advice would be appreciated.
Best,
Hanhuihong

Multiple-Sample Integration for filtering cell ID based off Seurat

Hello,

Thank you for the well-detailed instructions; they are very helpful. I am rather new to Python and am having a challenging time trying to filter the loom files to match my Seurat object. My Seurat object consists of 3 individual samples that are integrated together. I have three separate loom files that were made using Velocyto. I have followed all the instructions in your tutorial up to the filtering step for the loom files. After loading all the CSV files for the cell IDs, UMAP, and cluster IDs, I moved on to the Multiple-Sample Integration step, as my cellID_obs file has 3 samples combined just like your example table. I use the code:

cellID_obs_sample_one = cellID_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
cellID_obs_sample_two = cellID_obs[cellID_obs_sample_two[0].str.contrains("sample2_")]
cellID_obs_sample_three = cellID_obs[cellID_obs_sample_three[0].str.contrains("sample3_")]

sample_one = sample_one[np.isin(sample_one.obs.index, cellID_obs_sample_one)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]
sample_two = sample_one[np.isin(sample_two.obs.index, cellID_obs_sample_two)]

When I run the first line it errors out with:

cellID_obs_sample_one = sample_obs[cellID_obs_sample_one[0].str.contrains("sample1_")]
Traceback (most recent call last):
File "", line 1, in
NameError: name 'cellID_obs_sample_one' is not defined

If I separate the samples' cellID_obs from Seurat into 3 separate lists and run it, I still error out:

cellID_obs_sample1 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample1.csv")

sample_one = sample_one[np.isin(sample_one.obs.index,cellID_obs_sample1["x"])]
cellID_obs_sample2 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample2csv")
sample_two = sample_two[np.isin(sample_two.obs.index,cellID_obs_sample2["x"])]
cellID_obs_sample3 = pd.read_csv("/home/cfay/Documents/cellID_obs_sample3.csv")
sample_three = sample_three[np.isin(sample_three.obs.index,cellID_obs_sample3["x"])]
sample_one = sample_one.concatenate(sample_two, sample_three)
Traceback (most recent call last):
File "", line 1, in
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1710, in concatenate
out.obs = concat(
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 834, in obs
self._set_dim_df(value, "obs")
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 783, in _set_dim_df
value_idx = self._prep_dim_index(value.index, attr)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 810, in _prep_dim_index
value[0], (str, bytes)
File "/home/cfay/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4101, in getitem
return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0

I figure that I am doing some part of this wrong and wanted to know if you would be able to help me pinpoint the issue, as I want to calculate RNA velocity and use my Seurat UMAP.
Thank you for your help and consideration!

error merging Index data frame with UMAP

Hello, I am following the tutorial posted by basilkhuder on doing RNA velocity estimation on a Seurat object (https://github.com/basilkhuder/Seurat-to-RNA-Velocity).

I am having an error where the command given to merge the index data frame with UMAP (to match the order of the anndata object) fails. Please see the attached output:

import anndata
import scvelo as scv
import pandas as pd
import numpy as np
import matplotlib as plt
sample_one = anndata.read_loom("cellRanger.loom")
sample_obs = pd.read_csv("cellID_obs.csv")
umap_cord = pd.read_csv("cell_embeddings.csv")
cell_clusters = pd.read_csv("clusters.csv")
sample_one = sample_one[np.isin(sample_one.obs.index,sample_obs["x"])]
umap = pd.read_csv("cell_embeddings.csv")
sample_one.obs.index
sample_one_index = pd.DataFrame(sample_one.obs.index)
sample_one_index = sample_one_index.rename(columns = {0:'Cell ID'})
umap = umap.rename(columns = {'Unnamed: 0':'Cell ID'})
umap_ordered = sample_one_index.merge(umap, on = "Cell ID")

KeyError Traceback (most recent call last)
Cell In[16], line 1
----> 1 umap_ordered = sample_one_index.merge(umap, on = "Cell ID")

File ~/Documents/anaconda3/envs/ScveloR/lib/python3.10/site-packages/pandas/core/frame.py:10093, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
10074 @substitution("")
10075 @appender(_merge_doc, indents=2)
10076 def merge(
(...)
10089 validate: str | None = None,
10090 ) -> DataFrame:
10091 from pandas.core.reshape.merge import merge

10093 return merge(
10094 self,
10095 right,
10096 how=how,
10097 on=on,
10098 left_on=left_on,
10099 right_on=right_on,
10100 left_index=left_index,
10101 right_index=right_index,
10102 sort=sort,
10103 suffixes=suffixes,
10104 copy=copy,
10105 indicator=indicator,
10106 validate=validate,
10107 )

File ~/Documents/anaconda3/envs/ScveloR/lib/python3.10/site-packages/pandas/core/reshape/merge.py:110, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
93 @substitution("\nleft : DataFrame or named Series")
94 @appender(_merge_doc, indents=0)
95 def merge(
(...)
108 validate: str | None = None,
109 ) -> DataFrame:
--> 110 op = _MergeOperation(
111 left,
112 right,
113 how=how,
114 on=on,
115 left_on=left_on,
116 right_on=right_on,
117 left_index=left_index,
118 right_index=right_index,
119 sort=sort,
120 suffixes=suffixes,
121 indicator=indicator,
122 validate=validate,
123 )
124 return op.get_result(copy=copy)

File ~/Documents/anaconda3/envs/ScveloR/lib/python3.10/site-packages/pandas/core/reshape/merge.py:703, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, indicator, validate)
696 self._cross = cross_col
698 # note this function has side effects
699 (
700 self.left_join_keys,
701 self.right_join_keys,
702 self.join_names,
--> 703 ) = self._get_merge_keys()
705 # validate the merge keys dtypes. We may need to coerce
706 # to avoid incompatible dtypes
707 self._maybe_coerce_merge_keys()

File ~/Documents/anaconda3/envs/ScveloR/lib/python3.10/site-packages/pandas/core/reshape/merge.py:1179, in _MergeOperation._get_merge_keys(self)
1175 if lk is not None:
1176 # Then we're either Hashable or a wrong-length arraylike,
1177 # the latter of which will raise
1178 lk = cast(Hashable, lk)
-> 1179 left_keys.append(left._get_label_or_level_values(lk))
1180 join_names.append(lk)
1181 else:
1182 # work-around for merge_asof(left_index=True)

File ~/Documents/anaconda3/envs/ScveloR/lib/python3.10/site-packages/pandas/core/generic.py:1850, in NDFrame._get_label_or_level_values(self, key, axis)
1844 values = (
1845 self.axes[axis]
1846 .get_level_values(key) # type: ignore[assignment]
1847 ._values
1848 )
1849 else:
-> 1850 raise KeyError(key)
1852 # Check for duplicates
1853 if values.ndim > 1:

KeyError: 'Cell ID'

Can somebody please help me solve this error? I only have beginner-level expertise, so please excuse me if there is a very basic workaround for this issue.

Thank you.

raise ValueError("cannot reindex from a duplicate axis")

Hi, when I run the following commands, I encounter the following error:

import anndata
import scvelo as scv
import pandas as pd
import numpy as np
import matplotlib as plt

C183 = anndata.read_loom("C183.loom",validate=False)
C184 = anndata.read_loom("C184.loom",validate=False)
C185 = anndata.read_loom("C185.loom",validate=False)

sample_obs = pd.read_csv("hspc.three.cellID_obs.csv")
umap_cord = pd.read_csv("hspc.three.cell_embeddings.csv")
cell_clusters = pd.read_csv("hspc.three.clusters.csv")

C183.obs=C183.obs.rename(index = lambda x: x.replace('C183:', ''))
C183.obs=C183.obs.rename(index = lambda x: x.replace('x', ''))
C183.obs.head()

C184.obs=C184.obs.rename(index = lambda x: x.replace('C184:', ''))
C184.obs=C184.obs.rename(index = lambda x: x.replace('x', ''))
C184.obs.head()

C185.obs=C185.obs.rename(index = lambda x: x.replace('C185:', ''))
C185.obs=C185.obs.rename(index = lambda x: x.replace('x', ''))
C185.obs.head()

sample_obs.x=sample_obs.x.replace({"C183_":""},regex=True)
sample_obs.x=sample_obs.x.replace({"C184_":""},regex=True)
sample_obs.x=sample_obs.x.replace({"C185_":""},regex=True)

C183 = C183[np.isin(C183.obs.index,sample_obs["x"])]

C184 = C184[np.isin(C184.obs.index,sample_obs["x"])]

C185 = C185[np.isin(C185.obs.index,sample_obs["x"])]

###merge file

sample_one = C183.concatenate(C183,C184,C185)

The error is as follows:
Traceback (most recent call last):
File "", line 1, in
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/anndata/_core/anndata.py", line 1757, in concatenate
out = concat(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py", line 818, in concat
alt_annot = merge_dataframes(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py", line 531, in merge_dataframes
dfs = [df.reindex(index=new_index) for df in dfs]
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/anndata/_core/merge.py", line 531, in
dfs = [df.reindex(index=new_index) for df in dfs]
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/util/_decorators.py", line 324, in wrapper
return func(*args, **kwargs)
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 4767, in reindex
return super().reindex(**kwargs)
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 4809, in reindex
return self._reindex_axes(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 4592, in _reindex_axes
frame = frame._reindex_index(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 4611, in _reindex_index
return self._reindex_with_indexers(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
new_data = new_data.reindex_indexer(
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 663, in reindex_indexer
self.axes[axis]._validate_can_reindex(indexer)
File "/home/wenyl/miniconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
raise ValueError("cannot reindex from a duplicate axis")

I checked the cell IDs of the three samples and found no duplicates. I don't know how to fix this.

Kallisto Bustools "Failed to find compatible kallisto binary."

Hi! Thanks for this tutorial.

I'm stuck at the first part of the tutorial. I've managed to get the .gtf and .fa files, and this is my code, but I'm getting an error with kb-bustools:

kb ref -i index.idx -g t2g.txt -f1 cdna.fa -f2 intron.fa -c1 cdna_t2c.txt -c2 intron_t2c.txt --workflow lamanno -n 4 Homo_sapiens.GRCh38.cdna.all.fa Homo_sapiens.GRCh38.104.gtf

and the error generated is:
Traceback (most recent call last): File "/home/simonim/anaconda2/envs/kb/bin/kb", line 8, in <module> sys.exit(main()) File "/home/simonim/anaconda2/envs/kb/lib/python3.8/site-packages/ngs_tools/logging.py", line 62, in inner return func(*args, **kwargs) File "/home/simonim/anaconda2/envs/kb/lib/python3.8/site-packages/kb_python/main.py", line 1346, in main raise UnsupportedOSError( kb_python.config.UnsupportedOSError: Failed to find compatible kallisto binary. Provide a compatible binary with the --kallisto option or run kb compile.

I thought this had something to do with the file path, but I'm running the command in the same folder as the .fa and .gtf files, and have played around with that with no success. Any thoughts?

Mention velociraptor package

Hi Basil,
great tutorial! Since you've been moving back and forth between R and Python, I just wanted to add that the Bioconductor package velociraptor provides access to the scVelo functionality from within R, which lets the user stay in R the whole time.

errors when running kb count

Dear All,

I am trying to follow the tutorial. When I run
kb count -i transcriptome.idx -g t2g.txt -x 10xv2 --workflow lamanno --loom -c1 cdna_t2c.txt -c2 intron_t2c.txt read_1.fastq.gz read_2.fastq.gz

I get the errors:

Error: kallisto index file not found transcriptome.idx
Error: file not found read_1.fastq.gz
Error: file not found read_2.fastq.gz
[2023-05-13 03:04:36,697]   ERROR [main] An exception occurred

Could you tell me how to get these idx, read_1.fastq.gz, and read_2.fastq.gz files?

Thank you in Advance.

Getting AssertionError in scv.pp.moments

Hello, thank you for the very helpful tutorial. I am new to Python, so sorry if I ask a very naive question.
I am following the tutorial step by step and getting errors in scv.pp.moments. Before that I also had an issue with cell IDs, but after modifying the sample_obs csv file I was able to run it. Now I am facing another issue, as you can see below. Please help me with it. Many thanks.

computing neighbors

AssertionError Traceback (most recent call last)
/opt/miniconda3/lib/python3.8/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
743 try:
--> 744 yield
745 except NumbaError as e:

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower_block(self, block)
229 loc=self.loc, errcls_=defaulterrcls):
--> 230 self.lower_inst(inst)
231 self.post_block(block)

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower_inst(self, inst)
327 val = self.lower_assign(ty, inst)
--> 328 self.storevar(val, inst.target.name)
329

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in storevar(self, value, name)
1277 name=name)
-> 1278 raise AssertionError(msg)
1279

AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32

During handling of the above exception, another exception occurred:

LoweringError Traceback (most recent call last)
in
----> 1 scv.pp.moments(sample_one)

/opt/miniconda3/lib/python3.8/site-packages/scvelo/preprocessing/moments.py in moments(data, n_neighbors, n_pcs, mode, method, use_rep, use_highly_variable, copy)
62
63 if n_neighbors is not None and n_neighbors > get_n_neighs(adata):
---> 64 neighbors(
65 adata,
66 n_neighbors=n_neighbors,

/opt/miniconda3/lib/python3.8/site-packages/scvelo/preprocessing/neighbors.py in neighbors(adata, n_neighbors, n_pcs, use_rep, use_highly_variable, knn, random_state, method, metric, metric_kwds, num_threads, copy)
161 warnings.simplefilter("ignore")
162 neighbors = Neighbors(adata)
--> 163 neighbors.compute_neighbors(
164 n_neighbors=n_neighbors,
165 knn=knn,

/opt/miniconda3/lib/python3.8/site-packages/scanpy/neighbors/__init__.py in compute_neighbors(self, n_neighbors, knn, n_pcs, use_rep, method, random_state, write_knn_indices, metric, metric_kwds)
748 # we need self._distances also for method == 'gauss' if we didn't
749 # use dense distances
--> 750 self._distances, self._connectivities = _compute_connectivities_umap(
751 knn_indices,
752 knn_distances,

/opt/miniconda3/lib/python3.8/site-packages/scanpy/neighbors/__init__.py in _compute_connectivities_umap(knn_indices, knn_dists, n_obs, n_neighbors, set_op_mix_ratio, local_connectivity)
353 # umap 0.5.0
354 warnings.filterwarnings("ignore", message=r"Tensorflow not installed")
--> 355 from umap.umap_ import fuzzy_simplicial_set
356
357 X = coo_matrix(([], ([], [])), shape=(n_obs, 1))

/opt/miniconda3/lib/python3.8/site-packages/umap/__init__.py in <module>
----> 1 from .umap_ import UMAP
2
3 # Workaround: numba/numba#3341
4 import numba
5

/opt/miniconda3/lib/python3.8/site-packages/umap/umap_.py in <module>
52 from umap.spectral import spectral_layout
53 from umap.utils import deheap_sort, submatrix
---> 54 from umap.layouts import (
55 optimize_layout_euclidean,
56 optimize_layout_generic,

/opt/miniconda3/lib/python3.8/site-packages/umap/layouts.py in <module>
37 },
38 )
---> 39 def rdist(x, y):
40 """Reduced Euclidean distance.
41

/opt/miniconda3/lib/python3.8/site-packages/numba/core/decorators.py in wrapper(func)
219 with typeinfer.register_dispatcher(disp):
220 for sig in sigs:
--> 221 disp.compile(sig)
222 disp.disable_compile()
223 return disp

/opt/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py in compile(self, sig)
907 with ev.trigger_event("numba:compile", data=ev_details):
908 try:
--> 909 cres = self._compiler.compile(args, return_type)
910 except errors.ForceLiteralArg as e:
911 def folded(args, kws):

/opt/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
77
78 def compile(self, args, return_type):
---> 79 status, retval = self._compile_cached(args, return_type)
80 if status:
81 return retval

/opt/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
91
92 try:
---> 93 retval = self._compile_core(args, return_type)
94 except errors.TypingError as e:
95 self._failed_cache[key] = e

/opt/miniconda3/lib/python3.8/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
104
105 impl = self._get_implementation(args, {})
--> 106 cres = compiler.compile_extra(self.targetdescr.typing_context,
107 self.targetdescr.target_context,
108 impl,

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
604 pipeline = pipeline_class(typingctx, targetctx, library,
605 args, return_type, flags, locals)
--> 606 return pipeline.compile_extra(func)
607
608

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler.py in compile_extra(self, func)
351 self.state.lifted = ()
352 self.state.lifted_from = None
--> 353 return self._compile_bytecode()
354
355 def compile_ir(self, func_ir, lifted=(), lifted_from=None):

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler.py in _compile_bytecode(self)
413 """
414 assert self.state.func_ir is None
--> 415 return self._compile_core()
416
417 def _compile_ir(self):

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler.py in _compile_core(self)
393 self.state.status.fail_reason = e
394 if is_final_pipeline:
--> 395 raise e
396 else:
397 raise CompilerError("All available pipelines exhausted")

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler.py in _compile_core(self)
384 res = None
385 try:
--> 386 pm.run(self.state)
387 if self.state.cr is not None:
388 break

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler_machinery.py in run(self, state)
337 (self.pipeline_name, pass_desc)
338 patched_exception = self._patch_error(msg, e)
--> 339 raise patched_exception
340
341 def dependency_analysis(self):

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler_machinery.py in run(self, state)
328 pass_inst = _pass_registry.get(pss).pass_inst
329 if isinstance(pass_inst, CompilerPass):
--> 330 self._runPass(idx, pass_inst, state)
331 else:
332 raise BaseException("Legacy pass in use")

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
33 def _acquire_compile_lock(*args, **kwargs):
34 with self:
---> 35 return func(*args, **kwargs)
36 return _acquire_compile_lock
37

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
287 mutated |= check(pss.run_initialization, internal_state)
288 with SimpleTimer() as pass_time:
--> 289 mutated |= check(pss.run_pass, internal_state)
290 with SimpleTimer() as finalize_time:
291 mutated |= check(pss.run_finalizer, internal_state)

/opt/miniconda3/lib/python3.8/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
260
261 def check(func, compiler_state):
--> 262 mangled = func(compiler_state)
263 if mangled not in (True, False):
264 msg = ("CompilerPass implementations should return True/False. "

/opt/miniconda3/lib/python3.8/site-packages/numba/core/typed_passes.py in run_pass(self, state)
461
462 # TODO: Pull this out into the pipeline
--> 463 NativeLowering().run_pass(state)
464 lowered = state['cr']
465 signature = typing.signature(state.return_type, *state.args)

/opt/miniconda3/lib/python3.8/site-packages/numba/core/typed_passes.py in run_pass(self, state)
382 lower = lowering.Lower(targetctx, library, fndesc, interp,
383 metadata=metadata)
--> 384 lower.lower()
385 if not flags.no_cpython_wrapper:
386 lower.create_cpython_wrapper(flags.release_gil)

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower(self)
134 if self.generator_info is None:
135 self.genlower = None
--> 136 self.lower_normal_function(self.fndesc)
137 else:
138 self.genlower = self.GeneratorLower(self)

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower_normal_function(self, fndesc)
188 # Init argument values
189 self.extract_function_arguments()
--> 190 entry_block_tail = self.lower_function_body()
191
192 # Close tail of entry block

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower_function_body(self)
214 bb = self.blkmap[offset]
215 self.builder.position_at_end(bb)
--> 216 self.lower_block(block)
217 self.post_lower()
218 return entry_block_tail

/opt/miniconda3/lib/python3.8/site-packages/numba/core/lowering.py in lower_block(self, block)
228 with new_error_context('lowering "{inst}" at {loc}', inst=inst,
229 loc=self.loc, errcls_=defaulterrcls):
--> 230 self.lower_inst(inst)
231 self.post_block(block)
232

/opt/miniconda3/lib/python3.8/contextlib.py in __exit__(self, type, value, traceback)
129 value = type()
130 try:
--> 131 self.gen.throw(type, value, traceback)
132 except StopIteration as exc:
133 # Suppress StopIteration unless it's the same exception that

/opt/miniconda3/lib/python3.8/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
749 newerr = errcls(e).add_context(format_msg(fmt, args, kwargs))
750 tb = sys.exc_info()[2] if numba.core.config.FULL_TRACEBACKS else None
--> 751 raise newerr.with_traceback(tb)
752
753

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../opt/miniconda3/lib/python3.8/site-packages/umap/layouts.py", line 52:
def rdist(x, y):

result = 0.0
dim = x.shape[0]
^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=)" at /opt/miniconda3/lib/python3.8/site-packages/umap/layouts.py (52)

How to run velocyto/RNA velocity on a subset

Hi, I have some subclusters and would like to run RNA velocity on these subsets. But the loom file I generated was from the whole dataset (all clusters). How should I proceed with RNA velocity in this case? Thanks.
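One hedged way to approach this, since scVelo operates on an AnnData object: keep the loom file generated from the whole dataset and subset the AnnData to the subcluster barcodes exported from Seurat before computing moments and velocity. A minimal sketch, with hypothetical file and column names:

```
import anndata
import numpy as np
import pandas as pd

# Loom file generated from the whole dataset
adata = anndata.read_loom("whole_dataset.loom")

# Hypothetical CSV of subcluster barcodes exported from Seurat
# ("x" is write.csv's default column name for a character vector)
barcodes = pd.read_csv("subcluster_cellID_obs.csv")["x"].values

# Keep only the subcluster cells, then run the scVelo steps as usual
adata_subset = adata[np.isin(adata.obs_names, barcodes)].copy()
```

Keep in mind that the barcode formatting has to match the loom file's obs names (the same cell-ID adjustments discussed in the issues above), otherwise the subset comes back empty.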
