
spacetx-research's Issues

things to test

possible issues:

  1. The GP prior is quite far from the posterior (I see that in KL_LOGIT being large...)
  2. The encoder/decoder are too simple (increase zdim and maybe normalize by its running average)
  3. Use different betas for Adam since the batch_size is small

Sparsity is in range but the bounding boxes (BB) are not tight.
Maybe when increasing KL I should also increase SPARSITY.

Maybe the sparsity loss should be multiplied by both the balance term AND the sparsity term.

memory efficient implementation with many instances

Right now big_mask and big_imgs have the spatial extent of the original image plus an extra dimension for the number of instances. This will kill the memory for large images containing many cells. Is there a way to get around this? It is interesting to note that if only the combination sum_j p_j m_j enters the computation, then I can drastically reduce the memory usage. What about imgs? Is the same thing true, i.e. sum_j p_j img_j?
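A minimal sketch of the idea, assuming instances can be decoded one at a time by a hypothetical decode_instance helper: accumulate the weighted sum incrementally so the instance dimension is never materialized.

import torch

def mixing_without_instance_dim(p_list, decode_instance, height, width):
    # running sum of p_j * m_j; only one (height, width) buffer is kept in memory
    mixed = torch.zeros(height, width)
    for j, p_j in enumerate(p_list):
        m_j = decode_instance(j)   # hypothetical: returns the (height, width) mask of instance j
        mixed += p_j * m_j
    return mixed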

background of arbitrary complexity

You should be able to specify the complexity of the background and/or whether to use it at all.
Sometimes the background is so complex that there is no signal left for the cells.

robustness to different input sizes

need to adjust the initial value of length_scale_similarity based on the intercellular distance
need to initialize lambda (the parameter used to keep a running average of kl_logit) in some reasonable way
use only adaptive_avg_pool2d or adaptive_max_pool2d (see the sketch below)
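A minimal sketch, with made-up tensor sizes, of why adaptive pooling helps: it maps feature maps of any spatial size to a fixed output size, so downstream layers never see the input size.

import torch
import torch.nn.functional as F

# feature maps of different spatial sizes (made-up shapes) ...
x_small = torch.randn(1, 64, 40, 40)
x_large = torch.randn(1, 64, 97, 123)
# ... are both reduced to a fixed 8x8 grid
print(F.adaptive_avg_pool2d(x_small, output_size=(8, 8)).shape)  # torch.Size([1, 64, 8, 8])
print(F.adaptive_max_pool2d(x_large, output_size=(8, 8)).shape)  # torch.Size([1, 64, 8, 8])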

MEMO OF THINGS TO DO

  1. produce one large segmentation of smFISH_OLEH and VISIUM
  2. check batch_norm in UNET (currently it is not there)
  3. reflection padding in UNET (currently it is not there), or no padding and then prediction on a smaller region?
  4. the graph is not a K_NN graph. Is that ok? Optimize the radius. It seems that larger is better (i.e. 5 is better than 2). To evaluate this systematically you need to make plots of N_OBJECTS vs RESOLUTION parameters. Hopefully for a large radius we will see a plateau.
  5. is greedy modularity optimization the thing we are interested in? TIM suggests: If you aren't committed to greedy modularity maximization, one of the fastest libraries that will get you community detection (using Stochastic Block Models) is graph-tool (https://graph-tool.skewed.de/). It's C++ underneath (using Boost I believe), so it is very fast. The tradeoff is that it can be a huge pain in the ass to install, though I have heard it has recently been simplified.
  6. the graph is partitioned into disconnected components. Is there an advantage in treating each connected component separately? Is community detection faster? Can I use the same resolution parameter for all the different disconnected components? (See the sketch after this list.)
  7. loss function optimization. It seems that the best loss function was the one in
    folder: /home/jupyter/REPOS/spacetx-research/NEW_ARCHIVE/merfish_june22_v2
    commit 39d6bf2
    Change the master implementation back to that one. Try to understand the differences.
  8. can I reduce the operations for the creation of the graph to 1/4 by using roller2d on just one quadrant?
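A minimal sketch for item 6, assuming the cell graph is available as a networkx Graph G and a networkx version that supports the resolution argument: community detection run on each connected component separately.

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def communities_per_component(G: nx.Graph, resolution: float = 1.0):
    all_communities = []
    for nodes in nx.connected_components(G):
        # greedy modularity maximization on one connected component at a time
        subgraph = G.subgraph(nodes)
        all_communities.extend(greedy_modularity_communities(subgraph, resolution=resolution))
    return all_communities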

work with partial annotation

If a few images are partially annotated, then you can do supervised learning.
I would think that, given an integer_mask_annotation:

  1. compute target bounding boxes, centroids and widths/heights (using skimage; see the sketch below)
  2. identify which voxel is responsible for each target bounding box.
  3. add a regression loss between the target bounding box and the inferred bounding box (i.e. tx_map, ty_map, tw_map, th_map which are all in (0,1)). Note that only a few voxels will be "labelled", therefore the regression loss should be "masked".
  4. All locations inside the target bounding box should have a loss between p_map and the target probability. The target probability is 1 at the center of the bounding box and zero at all other locations of the bounding box, i.e. the probability is both pushed up (at the center) and down (at the periphery).
  5. identify the bb with the largest IoU with the target bounding box. For that bb put a cross-entropy classification loss between the inferred and target mask.

Note:
For most images there will not be any annotation, and even when an annotation is present it is only partial. Therefore the code needs to be written in such a way that this labelled loss defaults to zero in most cases.
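A minimal sketch of step 1, assuming integer_mask_annotation is a 2D integer label image with 0 as background:

import numpy as np
from skimage.measure import regionprops

def targets_from_annotation(integer_mask_annotation: np.ndarray):
    # one target per annotated instance: bounding box, centroid and width/height
    targets = []
    for region in regionprops(integer_mask_annotation):
        min_row, min_col, max_row, max_col = region.bbox
        targets.append({
            "bbox": (min_row, min_col, max_row, max_col),
            "centroid": region.centroid,        # (row, col)
            "width": max_col - min_col,
            "height": max_row - min_row,
        })
    return targets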

analyze real datasets

Tommaso Biancalani and Alma Andersson to check the Visium data

AGENDA (5/8/20)

  1. Complete review of outline of manuscript
    ---(Updated for accuracy)
  2. Discussion of how to integrate Visium/slide-seq
  3. Review of other tasks
    ---Additional annotations for final data sets?
    ---Segmentation group status?
    ---Visualization demo (this week or next week?)

    From Eeshit Dhaval Vaishnav to Me: (Privately) (2:03 PM)
    
aah nice to see the lake again

    From Richard Scheuermann to Everyone: (2:05 PM)
    
Has the segmentation been finalized?

    From Eeshit Dhaval Vaishnav to Everyone: (2:40 PM)
    
For comparing segmentation results, the F1 score, Dice index and Hausdorff distance would be good metrics. (I have used them before and have code for computing each, lmk if that is helpful during the segmentation comparison stage.)

    From Me to Everyone: (2:48 PM)
    
What is the required input? Share the script for the comparison.
Thanks!
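A minimal sketch, not the script mentioned in the chat, of one of these metrics: the Dice index between two binary segmentation masks of the same shape.

import numpy as np

def dice_index(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    # convention: two empty masks are considered identical
    return 2.0 * intersection / total if total > 0 else 1.0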

Improve GECO

What is the range of allowed values for the GECO hyper-parameters: (0, +infinity)?
The change of the hyper-parameters should be proportional to the distance to the target (see the sketch below).
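A minimal sketch of that idea, with hypothetical names (not the repo's GECO implementation): updating lambda in log-space keeps it in (0, +infinity), and the step is proportional to how far the constraint is from its target.

import torch

def update_log_lambda(log_lambda: torch.Tensor, constraint: torch.Tensor,
                      target: float, step_size: float = 0.01) -> torch.Tensor:
    # constraint above target -> lambda grows; below target -> lambda shrinks,
    # in both cases proportionally to the distance from the target
    return log_lambda + step_size * (constraint.detach() - target)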

strategy to deal with high-resolution images

the feature map can be taken from some lower level.
the unet can go down all the way to 1x1 so that I can extract the background easily.
the sliding_window can be smaller so that only 3-4 cells are in it (reducing N_BOX will be computationally convenient)
the unet can use dilation to deal with large images.

to do october

  1. the geco parameters trajectory should save only the image, not the chart, since the chart does not work
  2. run a simulation using Cromwell
  3. merge to master
  4. make a branch of master called experiment
  5. start experimenting (remember to save the source code in Neptune)
  6. visualize chart comparison in Neptune (learn). Coordinate plot

WHAT I AM LEARNING

  1. An informative latent space (with clusters) is antithetical to a generator which takes N(0,1) as input, since that requires a structureless latent space.

  2. sigma should be chosen so that the reconstruction term is of order 1 (and therefore balanced with the rest of the terms). A simple way to do it is: sigma2 = (x - x.mean()).pow(2).mean()

  3. At that point, all lambda terms can be between 0 and 5.

  4. RECONSTRUCTION IS ALWAYS ON. If in range do not change lambda. If out of range change lambda up or down. Lambda is clamped to [0.1, 10].

  5. SPARSITY IS ALWAYS ON. If in range do nothing. If out of range change lambda up or down. Lambda is clamped to [-10, 10]. The negative part is to get out of the empty solution if necessary.

  6. The user should provide a fg_mask, which can be easily obtained by Otsu or other thresholding methods.

Overlap immediately pushes the fg_fraction to zero. That makes sense since at the beginning y_k < 0.5 and y_k(1-y_k) is minimized by pushing all y_k to zero. Is there any incentive to learn non-overlapping instances (via KL) if there is no overlap?
I should reintroduce overlap as computed in terms of no self-interaction.

  • READ PAPERS ABOUT HOW THEY DO DYNAMICAL REGULARIZATION
  • I COULD CROP THE FEATURE MAP AT THE LEVEL OF THE PGRID B/C THE POINT IS THAT THE INTERACTION IS DISCOVERED AT THE COARSER LEVEL (SIMILAR TO MASK R-CNN)
  • power of methods would come from:
    --> combining dots (like Baysor)
    --> graph consensus
  • BACKGROUND LATENT CODE CAN BE 5x5. That way I can probably describe the spreading I see in MERFISH

If reconstruction is in range do nothing. When parameters are in range I should not change them, i.e. change g = min(x - x_min, x_max - x) to g = min(x - x_min, x_max - x).clamp(max=0)
4. sparsity should always be on.

OLD:
3. if reconstruction is high it overcomes the sparsity term and the overlap term -> therefore lambda_rec needs to multiply everything

conditional dependence between z_what and z_mask

Right now in the generative model we draw z_what and z_mask independently from each other. This is clearly bad. We should have conditional dependence between z_what and z_mask.

IN THE PRIOR:

  1. z_mask ~ N(0,1)
  2. mu, sigma = MLP(z_mask)
  3. z_what ~ N(mu,sigma)
    In this way we achieve conditional dependence:
    p_prior(z_what, z_mask) = p_prior(z_what | z_mask) p_prior(z_mask)

IN THE POSTERIOR

  1. decode z_mask to mask
  2. use mask to crop the raw image
  3. encode the masked image in z_what
    In this way we achieve p_posterior(z_what, z_mask) = p_posterior(z_what | z_mask) p_posterior(z_mask)
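A minimal sketch of the prior side described above, with hypothetical dimensions: an MLP maps z_mask to the mean and scale of z_what.

import torch
import torch.nn as nn

class ConditionalPrior(nn.Module):
    # p_prior(z_what, z_mask) = p_prior(z_what | z_mask) p_prior(z_mask)
    def __init__(self, dim_mask: int = 8, dim_what: int = 16, hidden: int = 64):
        super().__init__()
        self.dim_mask = dim_mask
        self.mlp = nn.Sequential(nn.Linear(dim_mask, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * dim_what))

    def sample(self, n: int):
        z_mask = torch.randn(n, self.dim_mask)                 # z_mask ~ N(0, 1)
        mu, log_sigma = self.mlp(z_mask).chunk(2, dim=-1)      # mu, sigma = MLP(z_mask)
        z_what = mu + log_sigma.exp() * torch.randn_like(mu)   # z_what ~ N(mu, sigma)
        return z_what, z_mask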

increase size of the cropped region

Right now the cropped region is 28x28 which might be too small for segmentation.
Maybe high resolution is necessary for good reconstruction (which we do not care about) but it is not necessary for segmentation (which we care about)

IDEAS TO CHECK

  1. use a higher resolution encoder/decoder (see commit: de168fa)
  2. crop the image directly instead of the feature map?

work in 3D

Asking the model to segment objects starting from images is a crazy request.
It should not be possible.
It is only possible if you have a richer dataset, such as:

  1. movie
  2. the same scene from different points of view
    By the way, predicting one z-slice from a different z-slice might be a good approach

This means that we need to work in 3D

generate large image?

Is there a way to generate a large image, or should we always work with small patches and glue them together? A large image is nicer b/c the generated pattern would show realistic variation and no boundary effects.

to do tomorrow

Merge 60 into 55 and then 55 into master

USE THE DATALOADER I HAVE.
JUST SAVE ON CPU AND LOAD TO GPU EVERY BATCH
put the result of the tiling function on CPU if necessary

LOAD CKPT FROM HERE AND USE THE PRETRAINED MODEL TO OBTAIN A GOOD SEGMENTATION:
ld-results-bucket/merfish_june25_v7

graphclustering 65
#TODO: Compute median density of connected components so that resolution parameter is about 1
self.reference_density = AUCH
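A minimal sketch of what this TODO could look like, assuming the graph is a networkx Graph and using nx.density as the (hypothetical) density definition:

import networkx as nx
import numpy as np

def median_component_density(G: nx.Graph) -> float:
    # density of each connected component, then the median across components
    densities = [nx.density(G.subgraph(nodes)) for nodes in nx.connected_components(G)]
    return float(np.median(densities))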

NAMEDTUPLE 151

#TODO: this might be too slow. Eliminate torch.bincount.
new_dict = dict(self.params)  # copy, so the original params are not mutated
new_dict["filter_by_size"] = (min_size, max_size)
new_membership = old_2_new[self.membership]
return self._replace(membership=new_membership, params=new_dict, sizes=torch.bincount(new_membership))

problem at TEST time

there is a problem at TEST time.
It is probably related to batch normalization.
I don't like BN since it has different behavior at train and test time.
Double check the PyTorch train/eval settings.

(Screenshot attached: 2020-06-15, 7:28 PM)

new graphical model

implement new graphical model where:

  1. in the generator z_what is conditioned on z_mask

KL_logit

  • attach KL_logit to KL_total

  • Make sure that the scale of kl_logit is not too big

  • Should I learn the length scale of the gaussian kernel? Probably yes

Prior for the probability is wrong

I have observed that:

  1. even after long training, the probability map generated by my prior (with learnable parameters) is very different from the probability map inferred from the data.
  2. this leads to KL(posterior || prior) being very large
  3. moreover, the learnable parameters of the prior change very slowly.

For all these reasons, I know that my prior is wrong, i.e. it is not flexible enough to capture the data.

First approach:

  1. have a more general kernel, K = k1 + k2 + k3
  2. logit = GP(K)
  3. p = sigmoid(a x + b)

Even in this approach I should decide whether to use a straight-through sigmoid or not.
The advantage of this approach is that the KL between the Gaussian posterior and the logit prior is analytically known. (See the sketch below.)
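A minimal sketch of this first approach, with a hypothetical parametrization: learnable length scales for a sum of RBF kernels, plus learnable a and b.

import torch
import torch.nn as nn

class SumKernelLogitPrior(nn.Module):
    def __init__(self, n_kernels: int = 3):
        super().__init__()
        self.log_length_scales = nn.Parameter(torch.zeros(n_kernels))
        self.a = nn.Parameter(torch.ones(1))
        self.b = nn.Parameter(torch.zeros(1))

    def kernel(self, coords: torch.Tensor) -> torch.Tensor:
        # coords: (n, 2) grid locations; K = k1 + k2 + k3 with different length scales
        d2 = torch.cdist(coords, coords).pow(2)
        K = sum(torch.exp(-0.5 * d2 / ls.exp().pow(2)) for ls in self.log_length_scales)
        return K + 1e-4 * torch.eye(coords.shape[0])   # jitter for numerical stability

    def sample_p(self, coords: torch.Tensor) -> torch.Tensor:
        L = torch.linalg.cholesky(self.kernel(coords))
        logit = (L @ torch.randn(coords.shape[0], 1)).squeeze(-1)   # logit = GP(K)
        return torch.sigmoid(self.a * logit + self.b)               # p = sigmoid(a x + b)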

Second approach -> DPP:

  1. The inference gives me: p = sigmoid(x)

  2. Do a straight-through Bernoulli, i.e. c = 0, 1 but the gradient will see the probability.

  3. The prior is a DPP, i.e. use a general kernel K = k1 + k2 + k3 which will be my covariance matrix

  4. compute log_p as log[ det(K_c) / det(K + I) ] (see the sketch below)

  5. Probably at each time step you need to do a Cholesky decomposition

  6. use the DPP. To compute the KL divergence I just need to compute log_P =
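A minimal sketch of item 4, treating K as an L-ensemble kernel (hypothetical helper; c is a binary vector selecting the "on" cells):

import torch

def dpp_log_prob(K: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
    # log p(c) = log det(K_c) - log det(K + I), with log-dets computed via Cholesky
    idx = c.bool()
    K_c = K[idx][:, idx]
    log_det_Kc = 2.0 * torch.linalg.cholesky(
        K_c + 1e-6 * torch.eye(K_c.shape[0], device=K.device)).diagonal().log().sum()
    log_det_KI = 2.0 * torch.linalg.cholesky(
        K + torch.eye(K.shape[0], device=K.device)).diagonal().log().sum()
    return log_det_Kc - log_det_KI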

to do when coming back from vacation

I have to:

  1. Increase the encoder from 28x28 to 56x56
  2. encode the background in a small latent variable
  3. change factor_balance_range: [0.1, 0.8, 0.9]
  4. Monitor the 3 different terms in sparsity
  5. Since I have sparsity (which constrains the number of pixels) maybe I can just sum the KL terms together without rescaling?!

  1. the real deal is to use multi-objective optimization instead of GECO
  2. test that the encoder/decoder are powerful enough. For that I can create a dataset of isolated cells

  1. use Neptune
  2. reduce the channels after the UNET to 2 (one should be the original image, the other something else)

data loader from fast.ai

I need a dataloader from fast.ai with all the nice things like
from_folder, data_augmentation, transform_y, etc.

FIX MOVIES in MAIN

Fix movies. Do not use
import moviepy.editor as mpy
use
from matplotlib import animation
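A minimal sketch of building the movie with matplotlib.animation instead of moviepy (assuming frames is an array of shape (T, H, W) and ffmpeg is available for saving):

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import animation

def save_movie(frames: np.ndarray, path: str = "movie.mp4", fps: int = 10):
    fig, ax = plt.subplots()
    im = ax.imshow(frames[0], cmap="gray")
    ax.axis("off")

    def update(t):
        im.set_data(frames[t])   # update the image data for frame t
        return (im,)

    ani = animation.FuncAnimation(fig, update, frames=frames.shape[0], blit=True)
    ani.save(path, fps=fps)
    plt.close(fig)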

tiling to analyze large images

def tiling(img, crop_w, crop_h, stride_w, stride_h):
    # "crops" iterates over sliding-window crops of img (size crop_w x crop_h, strides stride_w, stride_h)
    n_obj = 0
    integer_segmentation_mask = torch.zeros_like(img)
    vae.eval()
    for crop in crops:
        out = vae.forward(crop)
        segmentation = out.inference.integer_segmentation_mask

For each crop this yields an integer_segmentation_mask which:

  1. is censored outside the region that does not suffer from the boundary effect (how do I do this?)
  2. is shifted by the number of instances already found, i.e. mask = torch.where(mask > 0, mask + shift, 0)
  3. is pasted in the right place (see the sketch below)
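A minimal sketch of steps 1-3 for a single crop (hypothetical helper; assumes the crop's top-left corner (i, j) in the global image is known and a fixed border width is censored):

import torch

def paste_crop(global_mask, crop_mask, i, j, border, n_obj):
    h, w = crop_mask.shape
    # 1. censor the outer border, where boundary effects occur
    inner = crop_mask[border:h - border, border:w - border]
    # 2. shift instance ids by the number of instances already found
    shifted = torch.where(inner > 0, inner + n_obj, torch.zeros_like(inner))
    # 3. paste in the right place, overwriting only where a new instance was found
    target = global_mask[i + border:i + h - border, j + border:j + w - border]
    global_mask[i + border:i + h - border, j + border:j + w - border] = torch.where(
        shifted > 0, shifted, target)
    return n_obj + int(inner.max().item())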

poisson-gaussian observation model

In fluorescent microscopy the observation model should be Poisson-Gaussian: the brighter a pixel, the more photons, and the more Poisson noise. Things get complicated b/c CCD cameras have offsets. Probably the parameters of the observation model (constant term, linear term and offset) should be learned. Ask Mehrtash and Tianle.
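A minimal sketch of such an observation model, under the common Gaussian approximation where the variance grows linearly with the signal (my assumption, not the repo's implementation); the gain, read-noise and offset are learnable.

import torch
import torch.nn as nn

class PoissonGaussianNLL(nn.Module):
    def __init__(self):
        super().__init__()
        self.log_gain = nn.Parameter(torch.zeros(1))      # photon gain (linear term)
        self.log_read_var = nn.Parameter(torch.zeros(1))  # read-noise variance (constant term)
        self.offset = nn.Parameter(torch.zeros(1))        # CCD offset

    def forward(self, x_obs: torch.Tensor, mu: torch.Tensor) -> torch.Tensor:
        # variance grows linearly with the (offset-corrected) signal
        var = self.log_gain.exp() * (mu - self.offset).clamp(min=0.0) + self.log_read_var.exp()
        return 0.5 * (((x_obs - mu) ** 2) / var + var.log()).mean()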

Unet from fast AI without boundary effect

The current UNET sucks!
Use a pretrained UNet similar to fast.ai, in which the encoder architecture is resnet34.
Add hooks for the skip connections and to attach the 3 heads.
Keep track of the region without boundary effects (i.e. do not do padding).
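A minimal sketch of the hook part, using a torchvision resnet34 rather than the fast.ai learner itself: forward hooks collect the intermediate feature maps that would feed the skip connections.

import torch
import torchvision

encoder = torchvision.models.resnet34(pretrained=True)
features = {}

def save_hook(name):
    def hook(module, inputs, output):
        features[name] = output   # stash the feature map produced by this layer
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(encoder, name).register_forward_hook(save_hook(name))

x = torch.randn(1, 3, 224, 224)
_ = encoder(x)
# features["layer1"] ... features["layer4"] now hold the skip-connection tensors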
