spacetx / spacetx-research
Contains research projects related to image-based transcriptomics assays
Possible issues:
Sparsity is in range but the bounding boxes are not tight.
Maybe when increasing KL I should also increase SPARSITY.
Maybe sparsity should be multiplied by both the balance term AND the sparsity term.
Right now big_mask and big_imgs have the spatial extent of the original image plus an extra dimension for the number of instances. This will kill the memory for large images containing many cells. Is there a way to get away from this? It is interesting to note that if only the combination sum_j p_j m_j enters the computation, then I can drastically reduce the memory usage. What about imgs? Is the same thing true, i.e. sum_j p_j img_j? (See the sketch below.)
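A minimal sketch of the accumulation trick, assuming the per-instance probabilities p_j and masks m_j can be produced one at a time (the function and argument names are hypothetical):

def mixing_weighted_sum(p_iter, m_iter):
    # Accumulate sum_j p_j * m_j one instance at a time, so the full
    # (n_instances, H, W) stack behind big_mask never has to be materialized.
    out = None
    for p_j, m_j in zip(p_iter, m_iter):
        term = p_j * m_j
        out = term if out is None else out + term
    return out

The same accumulation would work for sum_j p_j img_j, so big_imgs could be avoided in the same way.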
You should be able to specify the complexity of the background and/or whether to use it or not.
Sometimes the background is so complex that there is no signal for the cells
Need to adjust the initial value of length_scale_similarity based on the intercellular distance.
Need to initialize lambda (the parameter used to keep a running average of kl_logit) in some reasonable way.
Use only adaptive_avg_pool2d or adaptive_max_pool2d.
If a few images are partially annotated then you can do supervised learning.
I would think that given an integer_mask_annotation:
Note:
For most images there would not be any annotation, and even when an annotation is present it is only partial. Therefore the code needs to be written in such a way that this labelled loss defaults to zero in most cases (see the sketch below).
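A minimal sketch of a labelled loss that defaults to zero, assuming a hypothetical per-pixel supervised loss tensor and the convention that label 0 means "unlabelled":

def labelled_loss(per_pixel_loss, integer_mask_annotation=None):
    # per_pixel_loss: torch.Tensor (H, W); integer_mask_annotation: None or integer mask
    if integer_mask_annotation is None:
        # no annotation for this image: contribute exactly zero
        return per_pixel_loss.new_zeros(())
    labelled = integer_mask_annotation > 0
    if not labelled.any():
        return per_pixel_loss.new_zeros(())
    # partial annotation: average only over the labelled pixels
    return per_pixel_loss[labelled].mean()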
Tommaso Biancalani and Alma Andersson check Visium data
AGENDA (5/8/20)
What is the range of allowed values for the GECO hyper-parameters: (0, +infinity)?
The change of hyper-parameters should be proportional to the distance to the target.
make encoder/decoder of size 28x28, 56x56, etc.
the architecture can be copied from https://github.com/AntixK/PyTorch-VAE/blob/master/models/vanilla_vae.py
the feature map can come from some lower level.
the UNet can go down all the way to 1x1 so that I can extract the background easily.
the sliding_window can be smaller so that only 3-4 cells are in it (reducing N_BOX will be computationally convenient).
the UNet can use dilation to deal with large images.
An informative latent space (with clusters) is antithetical to a generator which works by sampling from N(0,1), since that requires a structureless latent space.
sigma should be chosen so that the reconstruction term is of order 1 (and therefore balanced with the rest of the terms). A simple way to do it is: sigma2 = (x - x.mean()).pow(2).mean()
At that point, all lambda terms can be between 0 and 5
RECONSTRUCTION IS ALWAYS ON. If in range, do not change lambda. If out of range, change lambda up or down. Lambda is clamped to [0.1, 10].
SPARSITY IS ALWAYS ON. If in range, do nothing. If out of range, change lambda up or down. Lambda is clamped to [-10, 10]. The negative part is there to get out of the empty solution if necessary.
The user should provide a fg_mask, which can easily be obtained by Otsu or other thresholding methods (see the sketch below).
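For example (a sketch, assuming img is a 2D numpy array of raw intensities):

import numpy as np
from skimage.filters import threshold_otsu

def compute_fg_mask(img: np.ndarray) -> np.ndarray:
    # True where the pixel is brighter than the Otsu threshold
    return img > threshold_otsu(img)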
Overlap immediately pushes the fg_fraction to zero. That makes sense since at the beginning y_k < 0.5 and y_k (1 - y_k) is minimized by pushing all y_k to zero. Is there any incentive to learn non-overlapping instances (via KL) if there is no overlap?
I should reintroduce overlap computed in terms of no self-interaction.
If reconstruction is in range do nothing. When parameters are in range I should not change them, i.e. change g = min(x - x_min, x_max - x) to g = min(x - x_min, x_max - x).clamp(max=0). A sketch of the resulting schedule is below.
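A minimal sketch of the whole schedule; the signed-distance form and the multiplicative update are my assumptions, while the freeze-in-range trick and the lambda ranges come from the notes above:

import torch

def update_lambda(lam, x, x_min, x_max, lr=0.01, lam_min=0.1, lam_max=10.0):
    # Signed distance to the allowed range: zero inside [x_min, x_max],
    # positive above, negative below. Inside the range lambda is frozen,
    # which is what g = min(x - x_min, x_max - x).clamp(max=0) achieves.
    delta = (x - x_max).clamp(min=0.0) - (x_min - x).clamp(min=0.0)
    # change lambda up or down proportionally to the distance to the target
    lam = lam * torch.exp(lr * delta)
    return lam.clamp(min=lam_min, max=lam_max)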
4. sparsity should always be on.
OLD:
3. if reconstruction is high it overcomes the sparsity term and the overlap term -> therefore lambda_rec needs to multiply everything.
Right now in the generative model we draw z_what and z_mask independently from each other. This is clearly bad. We should have conditional dependence between z_what and z_mask.
IN THE PRIOR:
IN THE POSTERIOR:
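A minimal sketch of what conditional dependence could look like in the prior, with hypothetical latent dimensions dim_mask and dim_what; the posterior could be factorized analogously as q(z_mask | x) q(z_what | z_mask, x):

import torch
from torch import nn
from torch.distributions import Normal

class ConditionalPrior(nn.Module):
    # p(z_what | z_mask): a Gaussian whose mean and scale depend on z_mask,
    # replacing the current independent draws of z_what and z_mask.
    def __init__(self, dim_mask: int, dim_what: int):
        super().__init__()
        self.net = nn.Linear(dim_mask, 2 * dim_what)

    def forward(self, z_mask: torch.Tensor) -> Normal:
        mu, log_std = self.net(z_mask).chunk(2, dim=-1)
        return Normal(mu, log_std.exp())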
Right now the cropped region is 28x28 which might be too small for segmentation.
Maybe high resolution is necessary for good reconstruction (which we do not care about) but it is not necessary for segmentation (which we do care about).
Asking the model to segment objects starting from images alone is a crazy request.
It should not be possible.
It is only possible if you have a richer dataset, such as:
This means that we need to work in 3D
Is there a way to generate a large image, or should we always work with small patches and glue them together? A large image is nicer because the generated pattern would show realistic variation and no boundary effects.
Merge 60 into 55 and then 55 into master
USE THE DATALOADER I HAVE.
JUST SAVE ON CPU AND LOAD TO GPU EVERY BATCH
put the result of the tiling function on CPU if necessary
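A minimal sketch of the save-on-CPU, move-per-batch pattern, assuming a standard PyTorch DataLoader that yields CPU tensors:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for batch in dataloader:  # batches live on CPU until they are needed
    batch = batch.to(device, non_blocking=True)  # move only the current batch to GPU
    ...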
LOAD CKPT FROM HERE; USE THE PRETRAINED MODEL TO OBTAIN GOOD SEGMENTATION:
ld-results-bucket/merfish_june25_v7
graphclustering 65
#TODO: Compute median density of connected components so that resolution parameter is about 1
self.reference_density = AUCH  # placeholder, still to be computed (see the TODO above)
NAMEDTUPLE 151
#TODO: this might be too slow. Eliminate torch.bincount.
new_dict = dict(self.params)  # copy, so that the original params are not mutated
new_dict["filter_by_size"] = (min_size, max_size)
new_membership = old_2_new[self.membership]
return self._replace(membership=new_membership, params=new_dict, sizes=torch.bincount(new_membership))
implement new graphical model where:
attach KL_logit to KL_total
Make sure that the scale of kl_logit is not too big
Should I learn the length scale of the Gaussian kernel? Probably yes.
I have observed that:
For all these reasons, I know that my prior is wrong, i.e. it is not flexible enough to capture the data.
First approach:
Even in this approach I should decide whether to use a straight-through sigmoid or not.
The advantage of this approach is that the KL between the Gaussian posterior and the logit prior is analytically known.
Second approach -> DPP:
The inference gives me: p = sigmoid(x)
Do straight-through Bernoulli, i.e. c = 0, 1 but the gradient will see the probability.
The prior is a DPP, i.e. use a general kernel K = k1 + k2 + k3 which will be my covariance matrix.
Compute log_p via det(K(c)) / det(K + I).
Probably at each time step you need to do a Cholesky decomposition.
Use the DPP. To compute the KL divergence I just need to compute log_P(c) = log det(K_c) - log det(K + I) (see the sketch below).
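A minimal sketch of both pieces, assuming K is a positive-definite L-ensemble kernel and c a binary configuration (the function names are mine):

import torch

def straight_through_bernoulli(x):
    # forward: hard sample c in {0, 1}; backward: gradient flows through p = sigmoid(x)
    p = torch.sigmoid(x)
    c_hard = (torch.rand_like(p) < p).float()
    return c_hard + p - p.detach()

def dpp_log_prob(K, c):
    # log P(c) = log det(K_c) - log det(K + I), via the Cholesky factors
    # mentioned above (log det A = 2 * sum(log diag(chol(A)))).
    idx = c.bool()
    K_c = K[idx][:, idx]
    log_det_Kc = 2.0 * torch.linalg.cholesky(K_c).diagonal().log().sum()
    eye = torch.eye(K.shape[0], dtype=K.dtype, device=K.device)
    log_det_norm = 2.0 * torch.linalg.cholesky(K + eye).diagonal().log().sum()
    return log_det_Kc - log_det_norm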
I have to:
I need a dataloader from fast.ai with all the nice things like from_folder, data augmentation, transform_y, etc.
Fix movies. Do not use:
import moviepy.editor as mpy
Use instead:
from matplotlib import animation
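A minimal sketch of the matplotlib replacement, assuming frames is a list of 2D arrays (one image per time step):

import matplotlib.pyplot as plt
from matplotlib import animation

def make_movie(frames, interval_ms=100):
    fig, ax = plt.subplots()
    im = ax.imshow(frames[0])

    def update(i):
        im.set_data(frames[i])  # swap in frame i without redrawing the figure
        return (im,)

    return animation.FuncAnimation(fig, update, frames=len(frames),
                                   interval=interval_ms, blit=True)

The returned FuncAnimation can then be saved with .save('movie.mp4') given an ffmpeg writer, or embedded in a notebook via .to_html5_video().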
def tiling(img, crop_w, crop_h, stride_w, stride_h):
    # slide a (crop_w, crop_h) window over img with the given strides
    n_obj = 0
    integer_segmentation_mask = torch.zeros_like(img)
    vae.eval()
    for crop in crops:  # crops: the sliding-window patches of img (helper not shown)
        out = vae.forward(crop)
        segmentation = out.inference.integer_segmentation_mask
For each crop it gets the integer_segmentation_mask which:
In fluorescence microscopy the observation model should be Poisson-Gaussian, since the brighter a pixel the more photons and hence the more Poisson noise. Things get complicated because CCD cameras have offsets. Probably the parameters of the observation model (constant term, linear term and offset) should be learned. Ask Mehrtash and Tianle. A sketch is below.
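A minimal sketch of a learnable observation model in the Gaussian approximation of Poisson-Gaussian noise; the gain/offset/read-noise parametrization is my assumption:

import torch
from torch import nn

class PoissonGaussianNLL(nn.Module):
    # var(x) = gain * (mu - offset) + read_noise^2, with all three terms learned
    def __init__(self):
        super().__init__()
        self.log_gain = nn.Parameter(torch.zeros(()))
        self.offset = nn.Parameter(torch.zeros(()))
        self.log_read_noise = nn.Parameter(torch.zeros(()))

    def forward(self, x, mu):
        var = (self.log_gain.exp() * (mu - self.offset).clamp(min=0.0)
               + self.log_read_noise.exp().pow(2))
        # negative log-likelihood of x under N(mu, var), up to a constant
        return 0.5 * ((x - mu).pow(2) / var + var.log()).mean()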
The current UNET sucks!
Use a pretrained UNet similar to fast.ai, in which the encoder architecture is resnet34.
Add hooks for the skip connections and to attach the 3 heads (see the sketch below).
Keep track of the region without boundary effects (i.e. do not do padding).
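A minimal sketch of grabbing the skip-connection features from a pretrained resnet34 with forward hooks, assuming a recent torchvision; the layer choice and the skips dict are assumptions, and the 3 heads would read from the same hooks:

import torch
import torchvision

encoder = torchvision.models.resnet34(weights="IMAGENET1K_V1")
skips = {}

def save_output(name):
    def hook(module, inputs, output):
        skips[name] = output  # stash the feature map for the decoder / heads
    return hook

for name in ["layer1", "layer2", "layer3", "layer4"]:
    getattr(encoder, name).register_forward_hook(save_output(name))

_ = encoder(torch.randn(1, 3, 224, 224))
# skips now holds the four intermediate feature maps for the skip connections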