eidoslab / torchstain
Stain normalization tools for histological analysis and computational pathology
License: MIT License
Before release, it would be great if we could update the About section and tags on the right-hand side of the repo. As I don't own the repo, it would be great if someone could address this.
I would use a description similar to the one in the README (to cover all backends). I would also add the tags tensorflow, numpy, and macenko. Are there any other natural tags we should add? digital-pathology is a very common alternative to computational-pathology, so perhaps add that as well.
I had some ideas for feature requests that could greatly improve the usefulness of your tool.
I would be more than happy to help or assist in any way.
This feature request was mentioned in issue #3. However, I believe it would be better to track it in a separate issue.
The idea is basically to introduce the functionality StainTools has for generating H&E-like stain augmentations, as an alternative to stain normalization for improving model robustness to stain variation.
I can have a crack at it and make a PR when I have something working.
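As a rough illustration of the idea, here is a numpy sketch of StainTools-style H&E augmentation: pixels are converted to optical density, decomposed into haematoxylin and eosin concentrations, each stain's concentration is randomly scaled and shifted, and the result is converted back to RGB. StainTools estimates the stain matrix per image (with Macenko or Vahadane); for brevity this sketch assumes a fixed reference matrix (HE_REF, the values commonly used as HERef in Macenko implementations) and Io = 240. The function name augment_he and its parameters are hypothetical, not part of torchstain.

```python
import numpy as np

# Assumed fixed reference stain matrix (rows: R, G, B optical density;
# columns: haematoxylin, eosin). A real augmenter would estimate it per image.
HE_REF = np.array([[0.5626, 0.2159],
                   [0.7201, 0.8012],
                   [0.4062, 0.5581]])


def augment_he(img, sigma1=0.2, sigma2=0.2, Io=240, rng=None):
    """Randomly perturb H&E concentrations of an (H, W, 3) uint8 RGB image."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = img.shape

    # RGB -> optical density, one row per pixel
    od = -np.log((img.reshape(-1, 3).astype(np.float64) + 1) / Io)

    # Concentrations C (2 x N) such that HE_REF @ C approximates od.T
    conc = np.linalg.lstsq(HE_REF, od.T, rcond=None)[0]

    # Scale and shift each stain channel by a random amount
    alpha = rng.uniform(1 - sigma1, 1 + sigma1, size=(2, 1))
    beta = rng.uniform(-sigma2, sigma2, size=(2, 1))
    conc = conc * alpha + beta

    # Concentrations -> optical density -> RGB
    out = Io * np.exp(-(HE_REF @ conc))
    return np.clip(np.round(out.T), 0, 255).reshape(h, w, 3).astype(np.uint8)
```

Setting sigma1 = sigma2 = 0 gives (approximately) a projection of the image onto the two-stain space, which is a convenient sanity check.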
As lots of developers use Anaconda, we should test and verify whether torchstain can be installed and used seamlessly with conda, as an alternative to pip.
The simplest way of testing this is to add a new unit test, similar to the one we already made using pip.
It might not be crucial to test this for this release, but it is something to do in the future.
An improved Reinhard normalization technique was proposed in this paper:
https://ieeexplore.ieee.org/document/9616117
As the original Vahadane implementation is extremely slow compared to the existing implementations (Reinhard and Macenko), it might be a good idea to look into improving some of the existing methods.
The modified Reinhard does not seem to differ much from the original implementation. Hence, API-wise, it might be a good idea to add a method argument to Reinhard to choose which implementation of Reinhard to use.
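A minimal sketch of how such a method argument could look, using a hypothetical numpy ReinhardNormalizer. It only implements the original per-channel mean/std transfer and assumes the inputs have already been converted to LAB; the modified variant from the paper is left out, since its details are not described here. Variants are registered in a dictionary, so adding one does not change the constructor signature.

```python
import numpy as np


def _transfer_stats(src, tgt_mean, tgt_std, eps=1e-8):
    """Original Reinhard step: match per-channel mean and std to the target."""
    src_mean = src.mean(axis=(0, 1))
    src_std = src.std(axis=(0, 1))
    return (src - src_mean) / (src_std + eps) * tgt_std + tgt_mean


class ReinhardNormalizer:
    """Hypothetical sketch of one class exposing several Reinhard variants.

    Inputs are assumed to be (H, W, 3) float arrays already in LAB space.
    """

    # A "modified" variant would be registered here as another entry;
    # it is not implemented in this sketch.
    _METHODS = {"original": _transfer_stats}

    def __init__(self, method="original"):
        if method not in self._METHODS:
            raise ValueError(f"unsupported method {method!r}")
        self.method = method

    def fit(self, target):
        # Store the target's per-channel statistics
        self.target_mean = target.mean(axis=(0, 1))
        self.target_std = target.std(axis=(0, 1))
        return self

    def normalize(self, img):
        # Dispatch to the selected variant
        return self._METHODS[self.method](img, self.target_mean, self.target_std)
```

Keeping method="original" as the default means existing calls keep their current behaviour.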
On the right-hand side, there should normally be a Releases tab.
However, after the last commits, I can no longer see it in the main repo.
Has it been disabled by mistake in the settings maybe?
However, as I am contributing to this project, I was notified about the new release and given this link:
https://github.com/EIDOSLAB/torchstain/releases/tag/v1.2.0-stable
In my fork, the releases tab is still showing:
https://github.com/andreped/torchstain
Remember to add the generated wheel to the release tag, as well as update the Zenodo citation in the README:
https://github.com/EIDOSLAB/torchstain#citing
As we have implementations of the Macenko algorithm for the torch, tf, and numpy backends, it would be a good idea not to bundle everything into a single package, as that would mean the user would need to install both tf and torch to use torchstain, which often leads to dependency conflicts.
As discussed here, a good solution would be to make it possible to run pip install torchstain[backend], where one would select whether to install torchstain with the tf or torch backend. Numpy can be included in both.
My idea is to make submodules within the torchstain module, and then use the extras_require option in setuptools to install specific submodules. I have started the implementation in a separate branch:
https://github.com/andreped/torchstain/tree/backends/torchstain
Some pieces remain, but when this works properly I believe we are ready to make a new release.
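One way the runtime side of this could work is to import a backend only when it is requested, and to point users at the matching extra when it is missing. The sketch below is hypothetical: the tf extra matches the pip install torchstain[tf] command quoted in a later issue, the torch extra name is assumed, and the INSTALL_REQUIRES / EXTRAS_REQUIRE lists are what would be passed to setuptools.setup().

```python
import importlib

# Hypothetical packaging layout: numpy is a core dependency, while the heavy
# frameworks are optional extras passed to
# setuptools.setup(install_requires=..., extras_require=...).
INSTALL_REQUIRES = ["numpy"]
EXTRAS_REQUIRE = {
    "torch": ["torch"],
    "tf": ["tensorflow"],
}

# Backend name (as in MacenkoNormalizer(backend=...)) -> pip extra
_BACKEND_TO_EXTRA = {"torch": "torch", "tensorflow": "tf"}


def load_backend(name):
    """Import a backend lazily, with an install hint if it is missing."""
    if name == "numpy":
        return importlib.import_module("numpy")
    if name not in _BACKEND_TO_EXTRA:
        raise ValueError(f"unknown backend {name!r}")
    try:
        return importlib.import_module(name)
    except ImportError as err:
        raise ImportError(
            f"backend {name!r} is not installed; "
            f"try: pip install torchstain[{_BACKEND_TO_EXTRA[name]}]"
        ) from err
```

Because torch and tensorflow are only imported inside load_backend, a numpy-only install never touches either framework.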
Reinhard color normalization is a commonly used method that is much faster than Macenko and Vahadane (see here). A drawback is that it is less suitable for normalizing high-resolution patches, where the alternative methods are far superior.
However, for normalizing low-resolution images, Reinhard might be more suitable. Hence, it would be beneficial to have support for it.
It is also a very simple method to implement. I can make an attempt.
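To give a sense of how simple the method is, here is a numpy-only sketch: convert both images to a perceptual colour space, match the per-channel mean and standard deviation of the source to the target, and convert back. Reinhard et al. originally used the lαβ space; this sketch uses CIELAB with a D65 white point instead, and the function names are hypothetical.

```python
import numpy as np

# sRGB (D65) <-> CIELAB constants
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_M_INV = np.linalg.inv(_M)
_WHITE = np.array([0.95047, 1.0, 1.08883])
_D = 6 / 29


def rgb_to_lab(rgb):
    """(H, W, 3) RGB in [0, 255] -> CIELAB."""
    c = rgb.astype(np.float64) / 255.0
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    xyz = lin @ _M.T / _WHITE
    f = np.where(xyz > _D ** 3, np.cbrt(xyz), xyz / (3 * _D ** 2) + 4 / 29)
    L = 116 * f[..., 1] - 16
    a = 500 * (f[..., 0] - f[..., 1])
    b = 200 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)


def lab_to_rgb(lab):
    """CIELAB -> (H, W, 3) uint8 RGB, clipping out-of-gamut values."""
    fy = (lab[..., 0] + 16) / 116
    f = np.stack([fy + lab[..., 1] / 500, fy, fy - lab[..., 2] / 200], axis=-1)
    xyz = np.where(f > _D, f ** 3, 3 * _D ** 2 * (f - 4 / 29)) * _WHITE
    lin = np.clip(xyz @ _M_INV.T, 0.0, 1.0)
    c = np.where(lin <= 0.0031308, 12.92 * lin, 1.055 * lin ** (1 / 2.4) - 0.055)
    return np.round(c * 255).astype(np.uint8)


def reinhard(src, target, eps=1e-8):
    """Match per-channel LAB mean and std of src (RGB) to target (RGB)."""
    s, t = rgb_to_lab(src), rgb_to_lab(target)
    s_mean, s_std = s.mean(axis=(0, 1)), s.std(axis=(0, 1))
    t_mean, t_std = t.mean(axis=(0, 1)), t.std(axis=(0, 1))
    return lab_to_rgb((s - s_mean) / (s_std + eps) * t_std + t_mean)
```

In a normalizer class, the target statistics would be computed once in fit() and reused for every image.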
Hi,
I am sorry if my question is trivial, but I am having trouble using this package with the TensorFlow backend.
Using torchstain 1.2.0, I have no problem performing a Macenko normalization with numpy. But when I try with TensorFlow, it crashes on normalizer.fit:
import cv2
import tensorflow as tf
import torchstain
target_path = '/XXX.jpg'
target = cv2.cvtColor(cv2.imread(target_path), cv2.COLOR_BGR2RGB)
tf_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')
The only thing that I am doing differently from the provided example is the tensor conversion of the numpy array.
That is, I am not doing this
T = transforms.Compose([
transforms.ToTensor(),
transforms.Lambda(lambda x: x*255)
])
Instead, I tried this to match the transformation:
target = tf.constant(target, dtype=tf.float32)  # convert to tensor
target = tf.transpose(target, perm=[2, 0, 1])  # channels first
tf_normalizer.fit(target)
Is this why it crashes? Is there a way to run this without torchvision.transforms, on a pure TF basis?
I am using TensorFlow 2.10.0 and installed torchstain using pip install torchstain[tf].
I currently neither use nor have torchvision installed in my TF environment.
Thank you for your advice
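For reference, the torchvision pipeline quoted above can be reproduced in plain numpy, assuming a uint8 RGB input: ToTensor() converts an (H, W, C) uint8 array to a (C, H, W) float tensor scaled to [0, 1], and the Lambda multiplies by 255 again. The net effect is a channels-first float array in [0, 255], which appears to be what the tf.constant/tf.transpose code above also produces, so the conversion alone may not explain the crash. The helper name is hypothetical.

```python
import numpy as np


def to_chw_255(img):
    """numpy stand-in for Compose([ToTensor(), Lambda(lambda x: x * 255)]).

    img: (H, W, 3) uint8 RGB array. Returns a (3, H, W) float32 array with
    values in [0, 255] (equal to the torchvision result up to float rounding).
    """
    return np.transpose(img, (2, 0, 1)).astype(np.float32)
```

The result can then be wrapped with tf.constant(...) before calling fit.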
Hi, I'm new to this field, and I need this method to normalize my data because the variation in my data is too large. I work with WSI data but have limited hardware. I've tried several stain normalization packages, and all of them were killed due to memory limitations. The method works well on patch-based data, but unfortunately I need it for the whole slide. Any input would mean a lot to me. Thank you.
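A common way to work within a memory limit is to process the slide tile by tile, so that only one tile is held in memory at a time. The sketch below is generic and hypothetical: read_region and write_region are callables you would supply (for example a wrapper around OpenSlide's read_region for input and a numpy memmap for output), and normalize stands in for a normalizer fitted once on a reference patch.

```python
import numpy as np


def iter_tiles(height, width, tile=1024):
    """Yield (y0, y1, x0, x1) windows that cover a height x width image."""
    for y0 in range(0, height, tile):
        for x0 in range(0, width, tile):
            yield y0, min(y0 + tile, height), x0, min(x0 + tile, width)


def normalize_tiled(read_region, write_region, height, width, normalize, tile=1024):
    """Apply `normalize` tile by tile so only one tile is in memory at a time.

    read_region(y0, y1, x0, x1) -> (h, w, 3) uint8 array
    write_region(y0, y1, x0, x1, array) stores the normalized tile
    normalize(array) -> array of the same shape
    """
    for y0, y1, x0, x1 in iter_tiles(height, width, tile):
        write_region(y0, y1, x0, x1, normalize(read_region(y0, y1, x0, x1)))
```

Tiles that contain mostly background may need to be skipped, since Macenko-style normalizers can fail when too few tissue pixels are present.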
Add support for Vahadane normalization, as in https://github.com/Peter554/StainTools
Excellent tool. We are likely to cite you in the future.
Would you consider building out a JAX backend?
A nearly empty image cropped from a slide fails to normalize, and the program crashes.
Expected behavior:
The program should not crash on valid RGB images.
Steps to reproduce
# %% Import
import torch
from torchvision import transforms
import torchstain
import cv2
# %% Try transformation
target = cv2.cvtColor(cv2.imread("temp.png"), cv2.COLOR_BGR2RGB)
T = transforms.Compose([
transforms.ToTensor(),
transforms.Lambda(lambda x: x*255)
])
torch_normalizer = torchstain.MacenkoNormalizer(backend='torch')
torch_normalizer.fit(T(target))
Causes:
kthvalue(): Expected reduction dim 0 to have non-zero size.
Behavior is replicated in numpy implementation.
Environment: Ubuntu 18.04
$ python --version
Python 3.9.6
$ pip freeze
numpy==1.21.2
opencv-python==4.5.3.56
Pillow==8.3.1
torch==1.9.0
torchstain==1.1.0
torchvision==0.10.0
typing-extensions==3.10.0.0
Describe the bug
I use the Macenko normalizer as part of an augmentation pipeline in a PyTorch environment. It works fine almost every time, but once in a while (especially with small images below 512x512), it throws the error
IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.
and I have no idea why.
To Reproduce
I honestly couldn't say how to reproduce the bug; it is very rare, but when it happens it crashes the training process. My intuition is that it happens more often on small images with little tissue (white space, black marks, etc.), but I don't know for sure.
Expected behavior
It should not crash; instead, it throws the above error.
Logs
Original Traceback (most recent call last):
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/travail/repos/plac/lib/wsi_analysis/wsi_dataset.py", line 60, in __getitem__
    img = self.transforms(img)
  File "/home/travail/repos/plac/lib/utils/simsiam.py", line 20, in __call__
    k = self.base_transform(x)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/travail/repos/plac/lib/datasets/macenko.py", line 13, in forward
    x, _, _ = self.norm.normalize(x*255, stains=False)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 105, in normalize
    HE, C, maxC = self.__compute_matrices(I, Io, alpha, beta)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 68, in __compute_matrices
    HE = self.__find_HE(ODhat, eigvecs, alpha)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 39, in __find_HE
    minPhi = percentile(phi, alpha)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/utils/percentile.py", line 24, in percentile
    return t.view(-1).kthvalue(k).values
IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.
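The traceback ends in percentile(phi, alpha), and the error message says the tensor being reduced has zero size. In Macenko implementations, the pixels used to estimate the stain vectors are those whose optical density exceeds beta in every channel; on a crop that is almost entirely background, that set can be empty. A possible caller-side workaround is to check how many pixels survive the threshold before normalizing. The sketch below assumes Io = 240 and beta = 0.15 and uses hypothetical function names; it is not torchstain's own code.

```python
import numpy as np


def tissue_od(img, Io=240, beta=0.15):
    """Optical-density rows of pixels that pass the beta threshold.

    img: (H, W, 3) RGB array with values in [0, 255]. Mirrors the Macenko
    step ODhat = OD[~any(OD < beta, axis=1)] (Io and beta are assumed defaults).
    """
    od = -np.log((img.reshape(-1, 3).astype(np.float64) + 1) / Io)
    return od[~np.any(od < beta, axis=1)]


def has_enough_tissue(img, min_pixels=100, **kwargs):
    # Hypothetical guard: with too few surviving pixels, the later
    # percentile computation runs on an empty (or degenerate) set.
    return tissue_od(img, **kwargs).shape[0] >= min_pixels
```

For example, a transform could return the input unchanged when has_enough_tissue(img) is False; min_pixels=100 is an arbitrary placeholder.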
Following best practice, it is a good idea to make a release available on Zenodo. That way, if the GitHub repo is deleted or other unforeseen circumstances arise, people will still be able to access the source code.
In addition, Zenodo is better for referencing purposes, as there will be a citation count. Otherwise, people will just add a URL to the repo (which can also be fine in some scenarios). However, as lots of people use and will use this tool in their research, having a separate Zenodo DOI is probably the best option, especially as the tool has not been published as part of a paper or similar.
The citation style looks like this (much like any other paper):
https://github.com/andreped/GradientAccumulator#how-to-cite
And in some scenarios might also show up on Google Scholar.
However, as I am not the owner of this repo, I am unable to do this, so I suggest that the maintainer(s) look into it. It is really quick to do; just remember to set the author list (with full names, not GitHub usernames).
Describe the bug
It seems like all parameters of the torch-based normalizers are hard-coded as CPU tensors, so there is no GPU acceleration at all unless one explicitly moves the target, the input, and all stain vectors to the GPU.
An easier way to fix this might be to make the torch normalizers extend torch.nn.Module, as torchvision's transforms do, and use register_buffer to store the parameters, so that the whole normalizer object can be moved to the specified device at once.
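A minimal sketch of the suggested approach, assuming the constants live in an nn.Module subclass: register_buffer stores tensors that are not trainable parameters but are moved by Module.to(device) and included in state_dict(). The class name is hypothetical, the HERef and maxCRef values are assumed defaults, and forward only computes optical density rather than the full Macenko procedure.

```python
import torch
from torch import nn


class MacenkoBuffers(nn.Module):
    """Hypothetical sketch: normalizer constants stored as module buffers."""

    def __init__(self, Io=240.0):
        super().__init__()
        # Reference stain matrix and max concentrations (assumed values)
        self.register_buffer("HERef", torch.tensor([[0.5626, 0.2159],
                                                     [0.7201, 0.8012],
                                                     [0.4062, 0.5581]]))
        self.register_buffer("maxCRef", torch.tensor([1.9705, 1.0308]))
        self.register_buffer("Io", torch.tensor(Io))

    def forward(self, img):
        # img: (3, H, W) float tensor in [0, 255] on the module's device.
        # Returns the optical density, one column per pixel.
        return -torch.log((img.reshape(3, -1) + 1) / self.Io)
```

With this layout, MacenkoBuffers().to("cuda") moves all stored constants at once; the input image then needs to be on the same device.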
First time opening an issue, so I apologize if the approach is not correct.
I was getting an error with PyTorch 1.13, since torch.lstsq, used on line 52 of the torch macenko.py, was recently replaced by torch.linalg.lstsq.
Naively changing torch.lstsq to torch.linalg.lstsq would throw an error at line 106.
The correct approach is to make the following change at line 52:
torch.lstsq(Y, HE)[0][:2]
to torch.linalg.lstsq(HE, Y)[0]
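The reason the [:2] slice disappears can be checked in numpy, whose np.linalg.lstsq(A, B) uses the same argument order as torch.linalg.lstsq: it solves A @ X ≈ B and, for A of shape (m, n) and B of shape (m, k), returns X with shape (n, k). The deprecated torch.lstsq(B, A) took the arguments the other way round and returned a solution padded to max(m, n) rows, hence the slice. The stain matrix values below are assumed, and the data are synthetic.

```python
import numpy as np

# Assumed stain matrix: 3 optical-density channels x 2 stains
HE = np.array([[0.5626, 0.2159],
               [0.7201, 0.8012],
               [0.4062, 0.5581]])

# Synthetic optical densities Y (3 x N) built from known concentrations
rng = np.random.default_rng(0)
C_true = rng.uniform(0.0, 2.0, size=(2, 5))
Y = HE @ C_true

# lstsq(A, B) solves A @ X ~= B; with A = HE (3 x 2) and B = Y (3 x N),
# the solution already has shape (2, N), so no [:2] slicing is needed.
C = np.linalg.lstsq(HE, Y, rcond=None)[0]
```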
I will make a pull request to solve this issue. I hope this post helps!
We should update the runtimes presented in the README to also cover TensorFlow (the table mentioned here).
I would keep the avg runtime and remove the total runtime (as the latter doesn't add any new information), and then report the avg runtime for all backends: Numpy, PyTorch, and TensorFlow.
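If the table reports the average runtime per backend, one simple way to measure it is to time repeated calls after a warm-up call. The helper below is a hypothetical stdlib-only sketch; for GPU backends, asynchronous execution means the device should be synchronized before reading the clock (for example with torch.cuda.synchronize() for PyTorch).

```python
import statistics
import time


def avg_runtime(fn, *args, repeats=10, warmup=1):
    """Average wall-clock seconds per call of fn(*args)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        fn(*args)  # exclude one-off costs such as lazy initialisation
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.mean(times)
```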
As we have support for different backends, it would probably be a good idea to have unit tests to verify that we achieve the same results using the three different backends: pytorch, numpy, and tensorflow.
This can easily be achieved using GitHub Actions, which I have lots of experience with (e.g., this project).
I could make an attempt tomorrow and create a PR when I have something working. Shouldn't take that much time.
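A possible shape for such a test: run the same input through each backend, convert every output to a numpy array (.numpy() works for CPU torch tensors and eager TensorFlow tensors), and compare against the numpy result. The helper name and tolerances below are placeholders, since small floating-point differences between backends are expected.

```python
import numpy as np


def assert_backends_agree(results, reference="numpy", rtol=1e-4, atol=1e-3):
    """Check that every backend's output matches the reference backend.

    results maps a backend name to its output as a numpy array.
    """
    ref = np.asarray(results[reference], dtype=np.float64)
    for name, out in results.items():
        np.testing.assert_allclose(
            np.asarray(out, dtype=np.float64), ref, rtol=rtol, atol=atol,
            err_msg=f"backend {name!r} differs from {reference!r}",
        )
```

In a GitHub Actions workflow, this would typically run under pytest with the torch and tf extras installed.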
Also, I have observed that lots of people have taken an interest in the project. It would therefore be a good idea to create a blog post or similar to further reach the target audience, as this project is extremely valuable for researchers within computational pathology. Perhaps @carloalbertobarbano is interested in doing that?
Is your feature request related to a problem? Please describe.
I believe a lot of users would benefit from clearer documentation on which methods actually exist and how to use them. This is not so clear from the README, and there are numerous nuances in applications that we do not cover, which may result in end users having a hard time getting this to work.
Describe the solution you'd like
For a different Python package I am developing, I created documentation and hosted it for free on ReadTheDocs. This is extremely easy to do, and I have set up all the logic necessary for this to work.
However, in order for me to set up, test, and deploy the documentation, it would be nice if I had suitable permissions.
Alternatively, @carloalbertobarbano or someone else could create a user at ReadTheDocs (which can be linked to a GitHub account) and link this repo to a ReadTheDocs project. That way, deployment of the docs should work seamlessly.
I can start making the PR now. I will add a docs/ directory to the repo where all relevant information should lie. I will also use autodocs to build the API documentation, which will be parsed from the class/method descriptions in the code base.
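Assuming Sphinx is used as the builder (ReadTheDocs supports it, and the issue does not name a tool), a minimal docs/conf.py enabling autodoc could look like this. The theme and the Napoleon extension are optional choices, not requirements.

```python
# docs/conf.py: minimal Sphinx configuration (a sketch; assumes the package
# lives one directory above docs/)
import os
import sys

sys.path.insert(0, os.path.abspath(".."))  # make the package importable for autodoc

project = "torchstain"
extensions = [
    "sphinx.ext.autodoc",   # pull API docs from docstrings
    "sphinx.ext.napoleon",  # accept Google/NumPy-style docstrings
]
html_theme = "sphinx_rtd_theme"  # requires the sphinx-rtd-theme package
```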
Test not passing for tf.utils.percentile https://github.com/EIDOSLAB/torchstain/runs/7728799452
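When a backend's percentile test fails, it can help to compare against a numpy reference that uses the same k-th value rule. The torch traceback in the Macenko crash report shows the torch version returning t.view(-1).kthvalue(k).values; the formula for k below is an assumption. This rule differs from np.percentile's default linear interpolation, so a test comparing the two directly would not match. This is a debugging aid, not necessarily the cause of the failure linked above.

```python
import numpy as np


def kth_percentile(t, q):
    """numpy version of a kthvalue-based percentile.

    Assumes k = 1 + round(0.01 * q * (n - 1)) and returns the k-th smallest
    value (1-based), unlike np.percentile's default linear interpolation.
    """
    flat = np.asarray(t).ravel()
    if flat.size == 0:
        raise ValueError("percentile of an empty array is undefined")
    k = 1 + round(0.01 * float(q) * (flat.size - 1))
    return np.partition(flat, k - 1)[k - 1]
```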