How does Chromosight compute the Pearson correlation ?

koszullab commented on August 16, 2024

How does Chromosight compute the Pearson correlation ?

Comments (2)

cmdoret commented on August 16, 2024

Hi @GeorgyChistov

The convolution operation (implemented in the xcorr2 function) is used several times, and its results are combined at the end to compute the map of Pearson coefficients.

The normxcorr2 function is where the Pearson correlation is computed. The code is hard to read because 1) it was written to work on sparse matrices and 2) we avoided temporary variables as much as possible to limit memory usage.

tl;dr: We compute the map of Pearson correlations at the very end by plugging the results of the convolution products into the different terms of the formula described in the methods section of the paper. In effect, we vectorize the Pearson formula over the whole matrix.

More details below:

Note: This is a slightly simplified version of what we do, because in practice we also account for missing bins (NaNs) by adjusting the denominator.

The basic concept is the following: A Pearson correlation coefficient is computed between each position of an image $IMG$ (the Hi-C map) and a template $TMP$ (the kernel). The result is an image of correlation coefficients $CORR$:

Assuming $TMP$ has $M_{TMP}$ rows and $N_{TMP}$ columns:

$CORR[i, j] = Corr(IMG[i: i+M_{TMP}, j: j+N_{TMP}], TMP)$
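To make the definition concrete, here is a minimal brute-force sketch that applies it literally, one window at a time. This is not Chromosight's code: the function name `naive_corr_map` is hypothetical, it works on dense NumPy arrays only, and it ignores the NaN handling mentioned above.

```python
import numpy as np

def naive_corr_map(img, tmp):
    """Pearson coefficient between tmp and every tmp-sized window of img."""
    m, n = tmp.shape
    out_rows = img.shape[0] - m + 1
    out_cols = img.shape[1] - n + 1
    corr = np.full((out_rows, out_cols), np.nan)
    flat_tmp = tmp.ravel()
    for i in range(out_rows):
        for j in range(out_cols):
            window = img[i:i + m, j:j + n].ravel()
            # A flat window has zero variance, so the coefficient is undefined:
            # leave NaN in that case.
            if window.std() > 0:
                corr[i, j] = np.corrcoef(window, flat_tmp)[0, 1]
    return corr

# Toy data (assumption: random values, not a real Hi-C map).
rng = np.random.default_rng(0)
img = rng.random((8, 8))
tmp = rng.random((3, 3))
img[2:5, 3:6] = tmp  # plant an exact copy of the template in the image
corr = naive_corr_map(img, tmp)
best = np.unravel_index(np.nanargmax(corr), corr.shape)
```

The best-scoring position is where the template was planted. This loop is far too slow for real contact maps, which is why the formula gets rewritten in terms of convolutions.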

Where $Corr(\cdot, \cdot)$ between images $X$ and $Y$ is defined as:

$Corr(X, Y) = \frac{ cov(X, Y) }{ \sigma(X) \cdot \sigma(Y) }$

$=\frac{ \overline{ (X - \overline{X}) \cdot (Y - \overline{Y}) } }{ \sqrt{ \overline{ (X - \overline{X})^2 } } \cdot \sqrt{ \overline{ (Y - \overline{Y})^2 } } }$
$=\frac{E[(X - E[X]) \cdot (Y - E[Y])]}{\sqrt{E[(X - E[X])^2]} \cdot \sqrt{E[(Y - E[Y])^2]}}$
$=\frac{E[XY] - E[X] \cdot E[Y]}{\sqrt{E[X^2] - E[X]^2} \cdot \sqrt{E[Y^2] - E[Y]^2}}$

Given that $X$ represents the image around a pixel of the Hi-C matrix and $Y$ represents the template, $E[Y]$ and $\sqrt{E[Y^2] - E[Y]^2}$ are the kernel's mean and standard deviation. Both are scalar (float) values and are constant across $IMG$.

The other values to compute are:

$E[XY]$: The convolution of the image with the kernel, divided by the number of kernel pixels ($M_{TMP} \cdot N_{TMP}$) so that each value is a mean rather than a sum.
$E[X]$: The convolution of the image with the uniform (mean) kernel. Each pixel $(i, j)$ of the resulting map gives the mean of the window $(i: i+M_{TMP}, j: j+N_{TMP})$. Values in this matrix can simply be squared to obtain $E[X]^2$.
$E[X^2]$: The convolution of the squared signal with the uniform (mean) kernel.

Which means there are 3 convolution products to compute in order to obtain a map of Pearson coefficients.
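As a rough illustration of the three-product idea, here is a dense NumPy sketch. It is a simplified stand-in, not the actual normxcorr2 code: the helper names `window_mean_product` and `pearson_map` are made up for this example, it only covers the "valid" region (no padding), and it skips both the sparse-matrix handling and the NaN adjustment.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_mean_product(img, kernel):
    """Mean of (window * kernel) for every kernel-sized window of img.

    This is a 'valid' cross-correlation divided by the kernel size.
    """
    windows = sliding_window_view(img, kernel.shape)  # (rows, cols, m, n)
    return np.einsum("ijmn,mn->ij", windows, kernel) / kernel.size

def pearson_map(img, tmp):
    """Map of Pearson coefficients built from three sliding products."""
    uniform = np.ones_like(tmp, dtype=float)
    e_xy = window_mean_product(img, tmp)           # E[XY]
    e_x = window_mean_product(img, uniform)        # E[X]
    e_x2 = window_mean_product(img ** 2, uniform)  # E[X^2]
    e_y = tmp.mean()   # scalar, constant across the image
    sd_y = tmp.std()   # scalar, constant across the image
    # Clip tiny negative variances caused by floating-point rounding.
    var_x = np.clip(e_x2 - e_x ** 2, 0, None)
    denom = np.sqrt(var_x) * sd_y
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = (e_xy - e_x * e_y) / denom
    corr[denom == 0] = np.nan  # undefined where a window or the kernel is flat
    return corr

# Toy data (assumption: random values, not a real Hi-C map).
rng = np.random.default_rng(1)
img = rng.random((10, 12))
tmp = rng.random((3, 4))
corr = pearson_map(img, tmp)
# Cross-check one pixel against the direct definition.
expected = np.corrcoef(img[2:5, 5:9].ravel(), tmp.ravel())[0, 1]
```

Each pixel of `corr` should match the coefficient computed directly on the corresponding window, up to floating-point error.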

cmdoret commented on August 16, 2024

As for the kernel of a TAD corner, in principle you'd like to use a simple kernel. It should have a decent correlation with most TAD corners (i.e. 1 quarter dark, 3 quarters light) in your dataset.

The problem might be the size: If the kernel is too large, you will miss small TADs (as the corner only fills a tiny portion of the kernel and will show poor correlation). If the kernel is too small, you will get many false positives due to noise being picked up as corners.
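For reference, a simple corner kernel of adjustable size could be built along these lines. Everything here is an assumption for illustration: the function name `corner_kernel`, the choice of 1.0 for "dark" and 0.0 for "light", and the choice of the lower-left quadrant as the dark one (which should be the inside of the TAD when the kernel is centered on a corner in the upper triangle of the map). Adjust these to match your data.

```python
import numpy as np

def corner_kernel(size, inside=1.0, outside=0.0):
    """Square kernel with one 'dark' quadrant and three 'light' ones."""
    half = size // 2
    kernel = np.full((size, size), outside, dtype=float)
    # Assumed orientation: lower-left quadrant is the inside of the TAD.
    kernel[half:, :half] = inside
    return kernel

# Candidate sizes to compare (even sizes give an exact one-quarter split).
small = corner_kernel(4)
large = corner_kernel(8)
```

Trying a few sizes on the same region and comparing the detections is probably the easiest way to see the trade-off between missing small TADs and picking up noise.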
