Comments (4)

cmdoret commented on August 16, 2024

Hi @PerrineLacour,

The main determining factor in Chromosight runtime is the maximum scanning distance.
When running in detect mode, you can just set it with a parameter. However, quantify will read the list of input coordinates (your loops) and set the max scanning distance based on the largest loop.

One possibility is that you have huge loops in your list of coordinates. Removing those would greatly reduce runtime.
In theory, the time complexity should be: D * (N**2 - (N-M)**2) / 2
Where:

  • N: Matrix dimension
  • M: Max scanning distance (your largest loop, here)
  • D: Density (proportion of pixels > 0)

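D is always between 0 and 1, so D << N.
Expanding the formula gives D * M * (N - M/2), which is O(MN) in general.
Since usually D << M << N, M can be treated as a small constant and the big O notation would be: O(N).
But when M approaches N, it becomes: O(MN), i.e. O(N**2)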

So basically, runtime is linear with matrix dimension for small M, but as M approaches N, runtime becomes quadratic.
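
To make the scaling concrete, here is a minimal sketch that just evaluates the formula above for a few values of M. The function name `scanned_pixels` and the example numbers are made up for illustration; this is not part of chromosight.

```python
def scanned_pixels(n, m, density=1.0):
    """Evaluate D * (N**2 - (N - M)**2) / 2 from the formula above.

    n: matrix dimension (N)
    m: max scanning distance (M), between 0 and n
    density: proportion of nonzero pixels (D), between 0 and 1
    """
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and n")
    return density * (n**2 - (n - m) ** 2) / 2


if __name__ == "__main__":
    n = 10_000  # hypothetical matrix dimension
    for m in (100, 1_000, 10_000):
        print(f"M={m}: ~{scanned_pixels(n, m):.0f} pixels scanned")
```

With a small M, doubling N roughly doubles the estimate, whereas with M = N doubling N quadruples it.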

Although the command line tool would take too much time to run, you could still use the API with a bit of Python scripting.
If you have already subsetted the matrix around a coordinate, you can call the chromosight quantification algorithm like this:

import numpy as np
import cooler
import chromosight.kernels as ck
import chromosight.utils.detection as cud
import chromosight.utils.preprocessing as cup

# Load and preprocess Hi-C data
clr = cooler.Cooler('sample.cool')
chrom_mat = clr.matrix(balance=True, sparse=True).fetch('chrom')
obs_exp = cup.detrend(chrom_mat)

# Subset Hi-C data for specific loop
x, y = 300, 450 # Coordinate of loop in chromosome matrix
width = 100
centered_mat = obs_exp[x-width: x+width, y-width:y+width]

# Run chromosight on that loop matrix region
kernel = np.array(ck.loops['kernels'][0])
conv, _ = cud.normxcorr2(centered_mat, kernel)
result = conv[width, width] # This is the quantify score (value at the window center)

You could implement this in a function and get it to loop over your 140 coordinates.
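
One way this loop might look is sketched below. The helper name `quantify_windows` is hypothetical, and the correlation function is passed in as an argument so the sketch does not depend on any particular chromosight signature. It only assumes what the snippet above shows: the function returns two values, and the first is a correlation matrix with the same shape as the window.

```python
import numpy as np


def quantify_windows(mat, coords, kernel, xcorr, width=100):
    """Score each (x, y) coordinate from a window centered on it.

    mat: detrended contact matrix (anything that supports 2D slicing)
    coords: iterable of (x, y) pixel coordinates
    kernel: 2D kernel passed through to xcorr
    xcorr: callable(window, kernel) -> (correlation_matrix, extra);
           in the snippet above this role is played by cud.normxcorr2
    width: half-size of the window, in pixels
    """
    scores = []
    for x, y in coords:
        # Windows that would run off the matrix cannot be centered: score NaN
        if min(x, y) < width or x + width > mat.shape[0] or y + width > mat.shape[1]:
            scores.append(np.nan)
            continue
        window = mat[x - width:x + width, y - width:y + width]
        conv, _ = xcorr(window, kernel)
        # The loop sits at the window center, as in the snippet above
        scores.append(conv[width, width])
    return np.array(scores, dtype=float)
```

Usage would then be something like `scores = quantify_windows(obs_exp, loop_coords, kernel, cud.normxcorr2)`, where `loop_coords` is your own list of 140 (x, y) pairs.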

from chromosight.

PerrineLacour commented on August 16, 2024

Thank you a lot for your answer,

There is still something that I don't understand: why is M still important for quantify? I thought that chromosight only looked at the coordinates provided.

Can the scaling of the kernel also have an impact on the computation time, since the correlation has to be computed on a matrix that is 4 or 9 times bigger (for a factor of 2 or 3)?

cmdoret commented on August 16, 2024

So, in earlier versions chromosight looked only at the regions around coordinates, but in most real life use cases (> 10k coordinates on large matrices), slicing the matrix so many times would be orders of magnitude slower than just scanning it once.

In practice the best approach would be to automatically choose between 2 approaches:

  • If M is large or there are few coordinates: look at each coordinate individually.
  • If there are many coordinates or M is small: scan the whole matrix at once.

But I have not had time to implement that yet.
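
As a rough illustration of what such a switch could look like (this is not how chromosight works today, and the cost model is an assumption), the sketch below compares the scanning estimate from the formula earlier in the thread against a naive per-window cost. The names `choose_strategy` and `per_window_overhead` are hypothetical; the overhead term stands in for the slicing cost mentioned above and would need to be measured.

```python
def choose_strategy(n, m, n_coords, width, density=1.0, per_window_overhead=0.0):
    """Pick 'per_coordinate' or 'scan' from a crude pixel-count cost model.

    n: matrix dimension (N)
    m: max scanning distance (M)
    n_coords: number of input coordinates
    width: half-size of the window around each coordinate
    density: proportion of nonzero pixels (D)
    per_window_overhead: assumed extra cost of slicing one window,
        expressed in the same "pixel" units (hypothetical, to be tuned)
    """
    # Cost of scanning the band up to distance M, from the formula above
    scan_cost = density * (n**2 - (n - m) ** 2) / 2
    # Cost of correlating one (2 * width) x (2 * width) window per coordinate
    window_cost = n_coords * (density * (2 * width) ** 2 + per_window_overhead)
    return "per_coordinate" if window_cost < scan_cost else "scan"
```

For example, 140 coordinates with a very large M would favor the per-coordinate route under this model, while 10k coordinates with a small M would favor a single scan.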

Kernel size also has a (quadratic) impact on computation time, sure, but since it is usually very small it's not too bad. If you are using a large kernel, I recommend using the tsvd option to decompose the kernel (see supp. materials of the paper).

PerrineLacour commented on August 16, 2024

Ok, that makes sense. Thank you for your help!
