I've been trying to use the show.velocity.on.embedding.eu method on my scRNA-seq data,

cell trajectory modelling - running out of memory about velocyto.r HOT 2 OPEN

smaslova commented on September 3, 2024

cell trajectory modelling - running out of memory

from velocyto.r.

Comments (2)

ekernf01 commented on September 3, 2024

I am having a similar problem. If my reading of the code is correct, the bottleneck (cc according to discussion of #15 ) is a huge dense correlation matrix. It is N by N, where N is the number of cells. I wonder if it would be possible to work around this. It seems to be used only once, during formation of a matrix tp.

tp <- exp(cc/corr.sigma) * emb.knn

Unless there is already a fix in the works for this, I'd like to experiment by replacing cc with something sparser or smaller. First, let me make sure I understand what it is doing and what speed-up strategies might make sense.

In the supplemental info page 8, the velocyto team writes "We calculated a transition probability matrix P by applying an exponential kernel on the Pearson correlation coefficient between the velocity vector and cell state difference vectors." P is tp in the code, and cc is the Pearson correlations. As for how these are used downstream, the arrow for each cell is a weighted average of unit vectors pointing towards other cells, and the corresponding column of P gives the weights. Is that right?

To make cc and tp sparser or lower-dimensional, one could consider the k_cc nearest neighbors for each cell, or a subset of randomly chosen cells, or a set of metacells or grid-points. I considered various ways of using nearest neighbors, but I am afraid the fast ones introduce too much bias and the slow ones are slow. Randomly chosen cells would be less biased, but it neglects available information (and it makes the plot stochastic). Metacells are awkward because they require a suitably selected partitioning of cells. Grid-points are awkward because those with fewer nearby cells will be noisier, causing a downward bias in any naive correlation estimate. Velocyto devs, do you have thoughts on what might be best here?

Figure 3 of the paper has ~18k cells; was any special adaptation made to get that working?

from velocyto.r.

ekernf01 commented on September 3, 2024

UMAP supports projection of new points, so for those using UMAP, another strategy would be to take the "future" position of each cell as predicted by velocyto and project it onto the UMAP. But, this would be very awkward to implement in R. You'd need to feed the Python UMAP object to sklearn's transform, but some packages -- Seurat is what I am familiar with -- only store the embeddings.

I tried positioning each "future" cell on top of its nearest neighbor in the embedding, then drawing the arrow to that spot, but it yields pretty messy-looking estimates. Maybe some variant of this strategy would be more and reliable.

from velocyto.r.

cell trajectory modelling - running out of memory about velocyto.r HOT 2 OPEN

Comments (2)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent