Comments (4)
Hi @PerrineLacour,
The main determining factor in Chromosight runtime is the maximum scanning distance.
When running in detect mode, you can just set it with a parameter. However, quantify will read the list of input coordinates (your loops) and set the max scanning distance based on the largest loop.
One possibility is that you have huge loops in your list of coordinates. Removing those would greatly reduce runtime.
In theory, the time complexity should be: D* (N**2 - (N-M)**2) / 2
Where:
- N: Matrix dimension
- M: Max scanning distance (your largest loop, here)
- D: Density (proportion of pixels > 0)
D is always between 0 and 1, so D << N.
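Expanding the expression gives D * (N*M - M**2 / 2). Since usually D << M << N, and M is fixed by your largest loop, the big O notation in terms of N would be: O(N).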
But when M approaches N, it becomes: O(MN)
So basically, runtime is linear with matrix dimension for small M, but as M approaches N, runtime becomes quadratic.
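To make the scaling concrete, here is a small sketch that just evaluates the formula above for a few sizes. The function name scan_cost and the example values of N, M and D are made up for illustration; this is not Chromosight code and the result is a relative cost, not a runtime in seconds.

```python
def scan_cost(n, m, d):
    """Relative cost D * (N**2 - (N - M)**2) / 2 from the formula above."""
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and n")
    return d * (n**2 - (n - m) ** 2) / 2


density = 0.1  # hypothetical proportion of non-zero pixels

# Small fixed M: doubling N should roughly double the cost (linear regime).
small_m_ratio = scan_cost(20_000, 100, density) / scan_cost(10_000, 100, density)

# M equal to N: doubling N should quadruple the cost (quadratic regime).
full_m_ratio = scan_cost(20_000, 20_000, density) / scan_cost(10_000, 10_000, density)

print(f"M = 100: doubling N multiplies the cost by {small_m_ratio:.2f}")
print(f"M = N:   doubling N multiplies the cost by {full_m_ratio:.2f}")
```

The first ratio comes out close to 2 and the second is 4, which matches the linear and quadratic regimes described above.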
Although the command line tool would take too much time to run, you could still use the API with a bit of Python scripting.
If you have already subsetted the matrix around a coordinate, you can call Chromosight's quantification algorithm like this:
import numpy as np
import cooler
import chromosight.kernels as ck
import chromosight.utils.detection as cud
import chromosight.utils.preprocessing as cup
# Load and preprocess Hi-C data
clr = cooler.Cooler('sample.cool')
chrom_mat = clr.matrix(balance=True, sparse=True).fetch('chrom')
obs_exp = cup.detrend(chrom_mat)
# Subset Hi-C data for specific loop
x, y = 300, 450 # Coordinate of loop in chromosome matrix
width = 100
centered_mat = obs_exp[x-width: x+width, y-width:y+width]
# Run chromosight on that loop matrix region
kernel = np.array(ck.loops['kernels'][0])
conv_center, _ = cud.normxcorr2(centered_mat, kernel)
result = conv_center[width, width] # This is the quantify score
You could implement this in a function and get it to loop over your 140 coordinates.
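As a rough illustration of that loop, here is a self-contained sketch that does not call Chromosight at all. It swaps in a plain Pearson correlation between the kernel and a dense NumPy window of the same size, which is only a stand-in for cud.normxcorr2. The names pearson_score and quantify_coords, and the toy matrix, are hypothetical.

```python
import numpy as np


def pearson_score(window, kernel):
    """Pearson correlation between a window and a kernel of the same shape."""
    w = window.ravel() - window.mean()
    k = kernel.ravel() - kernel.mean()
    denom = np.sqrt((w**2).sum() * (k**2).sum())
    return float((w * k).sum() / denom) if denom > 0 else float("nan")


def quantify_coords(mat, coords, kernel):
    """Score each (row, col) coordinate, one kernel-sized window at a time."""
    n_rows, n_cols = kernel.shape
    scores = []
    for x, y in coords:
        r0, c0 = x - n_rows // 2, y - n_cols // 2
        r1, c1 = r0 + n_rows, c0 + n_cols
        if r0 < 0 or c0 < 0 or r1 > mat.shape[0] or c1 > mat.shape[1]:
            scores.append(float("nan"))  # window runs off the matrix edge
            continue
        scores.append(pearson_score(mat[r0:r1, c0:c1], kernel))
    return scores


# Toy data: weak random background with one kernel-shaped blob at (20, 30).
profile = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
kernel = np.outer(profile, profile)
rng = np.random.default_rng(0)
mat = rng.random((50, 50)) * 0.01
mat[18:23, 28:33] += kernel

coords = [(20, 30), (40, 40), (1, 1)]
scores = quantify_coords(mat, coords, kernel)
for (x, y), score in zip(coords, scores):
    print(f"({x}, {y}): {score:.3f}")
```

With real data you would replace pearson_score with the normxcorr2 call from the snippet above and pass the detrended matrix, so that only the windows around your coordinates are ever processed.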
from chromosight.
Thank you a lot for your answer,
There is still something that I don't understand: why is M still important for quantify? I thought that chromosight only looked at the coordinates provided.
Can the scaling of the kernel also have an impact on the computation time? Since the correlation has to be computed on a matrix that is 4 or 9 times bigger (for a factor of 2 or 3).
from chromosight.
So, in earlier versions chromosight looked only at the regions around coordinates, but in most real life use cases (> 10k coordinates on large matrices), slicing the matrix so many times would be orders of magnitude slower than just scanning it once.
In practice the best approach would be to automatically choose between 2 approaches:
- If M is large or there are few coordinates: look at each coordinate individually.
- If there are many coordinates or M is small: scan the whole matrix at once.
But I have not had time to implement that yet.
Kernel size also has a (quadratic) impact on computation time, sure, but since it is usually very small it's not too bad. If you are using a large kernel, I recommend using the tsvd option to decompose the kernel (see supp. materials of the paper).
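For readers unfamiliar with the idea, here is a generic sketch of decomposing a kernel with a truncated SVD, written with NumPy only. It is not Chromosight's implementation: the function name truncated_svd_kernel, the 0.999 variance threshold and the Gaussian test kernel are assumptions chosen for illustration. The point is that a k x k kernel of low rank r can be applied as r pairs of 1D convolutions, so the per-pixel cost drops from about k*k to about 2*k*r multiplications.

```python
import numpy as np


def truncated_svd_kernel(kernel, var_kept=0.999):
    """Keep the fewest singular components explaining var_kept of the kernel."""
    u, s, vt = np.linalg.svd(kernel, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative share reaches var_kept, as a count.
    rank = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
    return u[:, :rank], s[:rank], vt[:rank, :]


# A 2D Gaussian is an outer product of two 1D Gaussians, so it has rank 1.
axis = np.arange(-7, 8)
gauss_1d = np.exp(-(axis**2) / 8.0)
kernel = np.outer(gauss_1d, gauss_1d)

u, s, vt = truncated_svd_kernel(kernel)
rank = len(s)
approx = (u * s) @ vt  # rebuild the kernel from the retained components

k = kernel.shape[0]
print(f"retained rank: {rank}")
print(f"max reconstruction error: {np.abs(kernel - approx).max():.2e}")
print(f"multiplications per pixel: {k * k} (full) vs {2 * k * rank} (decomposed)")
```

Real pattern kernels are usually not exactly low rank, so the retained rank and the approximation error depend on the kernel and on the chosen threshold.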
from chromosight.
Ok, that makes sense. Thank you for your help!
from chromosight.