
Comments (2)

ekernf01 commented on September 3, 2024

I am having a similar problem. If my reading of the code is correct, the bottleneck (cc, according to the discussion in #15) is a huge dense correlation matrix. It is N by N, where N is the number of cells. I wonder if it would be possible to work around this. It seems to be used only once, when forming the matrix tp:

tp <- exp(cc/corr.sigma) * emb.knn

Unless there is already a fix in the works for this, I'd like to experiment by replacing cc with something sparser or smaller. First, let me make sure I understand what it is doing and what speed-up strategies might make sense.

In the supplemental info page 8, the velocyto team writes "We calculated a transition probability matrix P by applying an exponential kernel on the Pearson correlation coefficient between the velocity vector and cell state difference vectors." P is tp in the code, and cc is the Pearson correlations. As for how these are used downstream, the arrow for each cell is a weighted average of unit vectors pointing towards other cells, and the corresponding column of P gives the weights. Is that right?
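To make that reading concrete, here is a minimal sketch of it in plain Python (not the package's R code). It assumes the weights for cell i are exp(correlation / sigma) normalised to sum to 1, and it leaves out the emb.knn neighbourhood mask that appears in the tp line above. The function names, the toy data and the sigma value are all made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)

def transition_weights(expr, velocity, i, sigma=0.05):
    """Weights from cell i to every other cell: exp(corr / sigma), normalised to sum to 1."""
    raw = []
    for j, other in enumerate(expr):
        if j == i:
            raw.append(0.0)  # a cell does not transition to itself in this sketch
            continue
        diff = [b - a for a, b in zip(expr[i], other)]  # cell state difference vector
        raw.append(math.exp(pearson(velocity[i], diff) / sigma))
    total = sum(raw)
    return [w / total for w in raw]

def arrow(embedding, weights, i):
    """Weighted average of unit vectors pointing from cell i to the other cells."""
    xi, yi = embedding[i]
    ax = ay = 0.0
    for (xj, yj), w in zip(embedding, weights):
        dx, dy = xj - xi, yj - yi
        norm = math.hypot(dx, dy)
        if norm > 0:
            ax += w * dx / norm
            ay += w * dy / norm
    return ax, ay

# Toy data (hypothetical): 3 cells, 3 genes, 2-D embedding.
expr = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [-1.0, -2.0, 0.0]]
velocity = [[1.0, 2.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
embedding = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]

weights = transition_weights(expr, velocity, 0)
a = arrow(embedding, weights, 0)
```

In this toy case the velocity of cell 0 points exactly at cell 1 in expression space, so nearly all of the weight goes to cell 1 and the arrow points along the positive x axis.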

To make cc and tp sparser or lower-dimensional, one could consider the k_cc nearest neighbors for each cell, or a subset of randomly chosen cells, or a set of metacells or grid-points. I considered various ways of using nearest neighbors, but I am afraid the fast ones introduce too much bias and the slow ones are slow. Randomly chosen cells would be less biased, but that approach neglects available information and makes the plot stochastic. Metacells are awkward because they require a suitably selected partitioning of cells. Grid-points are awkward because those with fewer nearby cells will be noisier, causing a downward bias in any naive correlation estimate. Velocyto devs, do you have thoughts on what might be best here?
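As a rough illustration of the nearest-neighbor option, the sketch below (plain Python, not the package's R code) only computes correlations between each cell and its k nearest neighbors in the embedding, so storage grows as N*k rather than N*N. The brute-force neighbor search is only for readability and is itself quadratic in time; a real attempt would use an approximate neighbor index. All names, the toy data and the sigma value are hypothetical, and the bias concern raised above is not addressed.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        return 0.0
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(sxx * syy)

def knn_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    dists = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return [j for _, j in sorted(dists)[:k]]

def sparse_transition(expr, velocity, embedding, k, sigma=0.05):
    """Map each cell i to {j: weight}, with j restricted to i's k nearest embedding neighbors."""
    tp = {}
    for i in range(len(expr)):
        row = {}
        for j in knn_indices(embedding, i, k):
            diff = [b - a for a, b in zip(expr[i], expr[j])]
            row[j] = math.exp(pearson(velocity[i], diff) / sigma)
        total = sum(row.values())
        tp[i] = {j: w / total for j, w in row.items()}  # each row sums to 1
    return tp

# Toy data (hypothetical): 4 cells, 3 genes, 2-D embedding.
expr = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [-1.0, -2.0, 0.0], [2.0, 4.0, 1.0]]
velocity = [[1.0, 2.0, 0.0] for _ in expr]
embedding = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0), (2.0, 0.0)]

tp = sparse_transition(expr, velocity, embedding, k=2)
stored = sum(len(row) for row in tp.values())
```

Here `stored` counts the nonzero entries that are kept, which is N*k instead of the N*N entries of a dense cc.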

Figure 3 of the paper has ~18k cells; was any special adaptation made to get that working?

from velocyto.r.

ekernf01 commented on September 3, 2024

UMAP supports projection of new points, so for those using UMAP, another strategy would be to take the "future" position of each cell as predicted by velocyto and project it onto the UMAP. But this would be very awkward to implement in R. You'd need to pass the future positions to the transform method of the fitted Python UMAP object (it follows the sklearn API), but some packages -- Seurat is what I am familiar with -- only store the embeddings.

I tried positioning each "future" cell on top of its nearest neighbor in the embedding, then drawing the arrow to that spot, but it yields pretty messy-looking estimates. Maybe some variant of this strategy would be more reliable.
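For reference, the snapping idea can be sketched roughly as follows, in plain Python rather than R. The sketch assumes the "future" state is the current expression plus the velocity times a step dt, and that the nearest neighbor is searched in expression space among the observed cells (the cell itself included, which gives a zero-length arrow if nothing else is closer). The function name, dt and the toy data are made up for illustration.

```python
import math

def snapped_arrow(expr, velocity, embedding, i, dt=1.0):
    """Snap the extrapolated state of cell i onto its nearest observed cell.

    Returns the index of that cell and the embedding-space arrow pointing to it.
    """
    future = [e + dt * v for e, v in zip(expr[i], velocity[i])]
    nearest = min(range(len(expr)), key=lambda j: math.dist(future, expr[j]))
    (x0, y0), (x1, y1) = embedding[i], embedding[nearest]
    return nearest, (x1 - x0, y1 - y0)

# Toy data (hypothetical): 3 cells, 2 genes, 2-D embedding.
expr = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
velocity = [[0.9, 0.1], [0.0, 0.0], [0.0, 0.0]]
embedding = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]

nearest, vec = snapped_arrow(expr, velocity, embedding, 0)
```

Because every arrow has to end exactly on an observed cell, small changes in the velocity can flip the chosen neighbor, which may be one reason the resulting plots look messy.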

