eidoslab / torchstain

Stain normalization tools for histological analysis and computational pathology

License: MIT License

Python 100.00%
pytorch histopathology stain-normalization computational-pathology medical-imaging python digital-pathology numpy tensorflow

torchstain's People

Contributors

andreped, carloalbertobarbano, raphaelattias

torchstain's Issues

Update About and tags

Before release it would be great if we could update the About and tags on the right-hand side in the repo. As I don't own the repo, it would be great if someone would address this.

I would use a description similar to the one in the README (to cover all backends). I would also include the tags tensorflow, numpy, and macenko. Are there any other natural tags we should add? digital-pathology is a very common alternative to computational-pathology, so perhaps add that as well.

Feature requests

I have some ideas for features that could greatly improve the usefulness of your tool.

  1. Enable GPU mode (seems to only run on CPU as of now)
  2. Enable batch mode (seems to assume that the input is a single image)
  3. Add an augmentation option, similar to the one in StainTools

I would be more than happy to help or assist in any way.

Add stain augmentation alternative

This feature request was mentioned in issue #3. However, I believe it would be better to track it in a separate issue.

The idea is basically to introduce the functionality StainTools has for generating H&E-like stain augmentations, as an alternative to stain normalization for improving model robustness to stain variation.
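As a rough sketch of the idea: StainTools-style augmentation perturbs the per-stain concentrations rather than the RGB pixels. The example below assumes the stain concentrations have already been extracted (for instance by a Macenko-style decomposition). The function name `augment_concentrations` and the parameters `sigma1` and `sigma2` are illustrative and not part of torchstain.

```python
import numpy as np


def augment_concentrations(C, sigma1=0.2, sigma2=0.2, rng=None):
    """Randomly scale and shift each stain's concentration map.

    C has shape (n_pixels, n_stains). Each stain channel gets its own
    multiplicative factor alpha and additive offset beta.
    """
    rng = np.random.default_rng() if rng is None else rng
    C = np.asarray(C, dtype=np.float64)
    n_stains = C.shape[1]
    alpha = rng.uniform(1.0 - sigma1, 1.0 + sigma1, size=n_stains)
    beta = rng.uniform(-sigma2, sigma2, size=n_stains)
    # Broadcasting applies alpha[i] and beta[i] to column i.
    return C * alpha + beta


# Toy input: 4 pixels, 2 stains (hematoxylin, eosin).
C = np.array([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.0, 0.0]])
C_aug = augment_concentrations(C, rng=np.random.default_rng(0))
```

The augmented image would then be rebuilt from the perturbed concentrations and the stain matrix, which is not shown here.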

I can have a crack at it and make a PR when I have something working.

Conda compatibility?

As many developers use Anaconda, we should test and verify whether torchstain can be installed and used seamlessly with conda, as an alternative to pip.

The simplest way of testing this is adding a new unit test, similar to the one we already made using pip.

It might not be crucial to test this for the current release, but it is something to do in the future.

Support for Modified Reinhard

An improved Reinhard normalization technique was proposed in this paper:
https://ieeexplore.ieee.org/document/9616117

As the original Vahadane implementation is extremely slow compared to the already existing implementations, i.e., Reinhard and Macenko, it might be a good idea to look into improving some of the existing methods instead.

The modified Reinhard does not seem to differ much from the original method. Hence, API-wise, it might be a good idea to add a method argument to Reinhard to choose which variant to use.

Releases not showing anymore?

On the right hand side, there should normally be a Releases tab.
However, after the last commits, I cannot see that it is there in the main repo anymore.
Has it been disabled by mistake in the settings maybe?

However, as I am contributing to this project, I was notified about the new release and given this link:
https://github.com/EIDOSLAB/torchstain/releases/tag/v1.2.0-stable

In my fork, the releases tab is still showing:
https://github.com/andreped/torchstain

Remember to add the generated wheel to the release tag, as well as update the Zenodo citation in the README:
https://github.com/EIDOSLAB/torchstain#citing

Multiple backends support

As we have implementations of the Macenko algorithm for the torch, tf, and numpy backends, it would be a good idea not to bundle everything into a single package. Otherwise the user would need to install both tf and torch to use torchstain, and the two often collide through their underlying dependencies.

As discussed here, a good solution would be to make it possible to: pip install torchstain[backend]

where one would select whether to install torchstain with tf or torch backends. Numpy can be included in both.

My idea is to make submodules within the torchstain module, and then use the extras_require option in setuptools to install the dependencies for specific submodules. I have started the implementation in a separate branch:
https://github.com/andreped/torchstain/tree/backends/torchstain

Some pieces remain, but when this works properly I believe we are ready to make a new release.
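A minimal sketch of how optional backends could be wired up, assuming one pip extra per heavy dependency. The `tf` extra name matches the install command quoted elsewhere on this page; the `torch` extra name, the `BACKENDS` table, and `load_backend` are hypothetical and only illustrate the pattern.

```python
import importlib
import importlib.util

# Hypothetical mapping: backend name -> (module to import, pip extra).
BACKENDS = {
    "numpy": ("numpy", None),
    "torch": ("torch", "torch"),
    "tensorflow": ("tensorflow", "tf"),
}

# What the corresponding setuptools `extras_require` value could look like.
EXTRAS_REQUIRE = {"torch": ["torch"], "tf": ["tensorflow"]}


def load_backend(name):
    """Import the library behind a backend only when it is requested."""
    key = name.lower()
    if key not in BACKENDS:
        raise ValueError(f"Unknown backend {name!r}; choose from {sorted(BACKENDS)}")
    module_name, extra = BACKENDS[key]
    if importlib.util.find_spec(module_name) is None:
        hint = (f"pip install torchstain[{extra}]" if extra
                else f"pip install {module_name}")
        raise ImportError(f"Backend {name!r} is not installed; try: {hint}")
    return importlib.import_module(module_name)
```

Deferring the import this way means a user who installs only one extra never triggers an import of the other framework.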

Support for Reinhard

Reinhard color normalization is a commonly used method which is much faster than Macenko and Vahadane (see here). A drawback is that it is not as suitable for normalizing high-resolution patches, where the alternative methods are far superior.

However, for normalizing low-resolution images, Reinhard might be more suitable. Hence, it would be beneficial to have support for it.

It is also a very simple method to implement. I can make an attempt.
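To illustrate why the method is simple: at its core, Reinhard normalization matches the per-channel mean and standard deviation of a source image to those of a target image in a perceptual color space. The sketch below assumes both images have already been converted to a Lab-like space and only shows the statistics-matching step. `match_channel_stats` is an illustrative name, not a torchstain function.

```python
import numpy as np


def match_channel_stats(source, target, eps=1e-8):
    """Shift and scale each channel of `source` to the stats of `target`.

    Both inputs are (H, W, 3) arrays in the same (Lab-like) color space.
    """
    src = np.asarray(source, dtype=np.float64)
    tgt = np.asarray(target, dtype=np.float64)
    axes = (0, 1)  # reduce over pixels, keep channels
    src_mean, src_std = src.mean(axis=axes), src.std(axis=axes)
    tgt_mean, tgt_std = tgt.mean(axis=axes), tgt.std(axis=axes)
    # eps guards against division by zero for flat channels.
    return (src - src_mean) / (src_std + eps) * tgt_std + tgt_mean


rng = np.random.default_rng(0)
source_lab = rng.normal(50.0, 10.0, size=(32, 32, 3))
target_lab = rng.normal(70.0, 5.0, size=(32, 32, 3))
result = match_channel_stats(source_lab, target_lab)
```

A full implementation would wrap this step between an RGB-to-Lab conversion and the inverse conversion.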

Crash when calling MacenkoNormalizer.fit with tensorflow backend

Hi,
I am sorry if my question is trivial, but I have trouble using this package with the tensorflow backend.
Using torchstain 1.2.0, I have no problem performing a Macenko normalization with numpy. But when I try with tensorflow, it crashes when calling normalizer.fit

import cv2
import tensorflow as tf
import torchstain

target_path = '/XXX.jpg'
target = cv2.cvtColor(cv2.imread(target_path), cv2.COLOR_BGR2RGB)
tf_normalizer = torchstain.normalizers.MacenkoNormalizer(backend='tensorflow')

The only thing that I am doing differently from the provided example is the tensor conversion of the numpy array.
That is, I am not doing this

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])

But rather tried this to match the transformation:

target = tf.constant(target, dtype=tf.float32)  #convert to tensor
target = tf.transpose(target, perm=[2, 0, 1])  #channel first

tf_normalizer.fit(target)

Is this why it crashes? Is there a way to run this without using torchvision.transforms, i.e., on a pure TF basis?

I am using TensorFlow 2.10.0 and have installed torchstain using pip install torchstain[tf].
I currently neither use nor have installed torchvision in my TF environment.

Thank you for your advice

Whole Slide Images

Hi, I'm new to this field and need this method to normalize my data because the variation in the data is too large. I work with WSI data but have limited hardware. I've tried several stain normalization packages, and all of them were killed due to memory limitations. The method works well on patch-based data, but unfortunately I need to normalize the whole slide. Any input would mean a lot to me. Thank you.
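One common way to stay within memory limits is to process the slide tile by tile and write each result into a preallocated (possibly disk-backed) output. The sketch below is only an illustration of that pattern with plain NumPy arrays. `iter_tiles` and `apply_tilewise` are hypothetical helpers, and `fn` stands in for a normalizer that was fitted once on a reference image; re-estimating stains independently per tile may produce visible seams.

```python
import numpy as np


def iter_tiles(height, width, tile):
    """Yield (y0, x0, y1, x1) boxes covering a height x width grid."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield y, x, min(y + tile, height), min(x + tile, width)


def apply_tilewise(image, fn, tile=512, out=None):
    """Apply `fn` to each tile of `image`, writing into `out`.

    `image` and `out` can be any array-like supporting slicing,
    for example numpy.memmap objects backed by files on disk.
    """
    h, w = image.shape[:2]
    out = np.empty_like(image) if out is None else out
    for y0, x0, y1, x1 in iter_tiles(h, w, tile):
        out[y0:y1, x0:x1] = fn(image[y0:y1, x0:x1])
    return out


# Toy example: invert a blank "slide" in 256-pixel tiles.
slide = np.zeros((1000, 700, 3), dtype=np.uint8)
result = apply_tilewise(slide, lambda t: 255 - t, tile=256)
```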

Support for Jax

Excellent tool. We are likely to cite you in the future.

Would you consider building out a Jax backend?

Fails with `kthvalue(): Expected reduction dim 0 to have non-zero size.`

A nearly empty image cropped from a slide fails to normalize and crashes.

Expected behavior:

Program does not crash on valid RGB images.

Steps to reproduce

# %% Import

import torch
from torchvision import transforms
import torchstain
import cv2
# %% Try transformation

target = cv2.cvtColor(cv2.imread("temp.png"), cv2.COLOR_BGR2RGB)

T = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x*255)
])

torch_normalizer = torchstain.MacenkoNormalizer(backend='torch')
torch_normalizer.fit(T(target))

Causes:
kthvalue(): Expected reduction dim 0 to have non-zero size.

Behavior is replicated in numpy implementation.

"temp.png" used.

Environment: Ubuntu 18.04

$ python --version
Python 3.9.6
$ pip freeze
numpy==1.21.2
opencv-python==4.5.3.56
Pillow==8.3.1
torch==1.9.0
torchstain==1.1.0
torchvision==0.10.0
typing-extensions==3.10.0.0

Typo in rgb2lab transform

Describe the bug
In THIS LINE from rgb2lab implementation, the value 166 should be 116.

Expected behavior
The line should be changed to arr.masked_scatter_(not_mask, 7.787 * torch.masked_select(arr, not_mask) + 16 / 116) to match numpy implementation (e.g. THIS one from skimage)
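A quick numerical check makes the typo visible. In the CIE Lab conversion, the helper function switches from a cube root to a linear segment at a small threshold, and the offset 16/116 is what makes the two pieces meet. The sketch below uses NumPy only; `lab_f` is an illustrative name, not the torchstain function.

```python
import numpy as np

EPSILON = 0.008856  # threshold in the CIE Lab definition, about (6/29)**3


def lab_f(t, offset=16.0 / 116.0):
    """Piecewise helper used when converting XYZ to Lab."""
    t = np.asarray(t, dtype=np.float64)
    return np.where(t > EPSILON, np.cbrt(t), 7.787 * t + offset)


# With 16/116 the linear and cube-root branches agree at the threshold.
just_below = lab_f(EPSILON)
just_above = lab_f(EPSILON + 1e-9)
gap_correct = abs(float(just_above - just_below))

# With the mistyped 16/166 the linear branch falls short of the cube root.
gap_typo = abs(float(np.cbrt(EPSILON) - lab_f(EPSILON, offset=16.0 / 166.0)))
```

Here `gap_correct` is negligible while `gap_typo` is clearly nonzero, so dark pixels would be mapped to different Lab values than in the numpy implementation.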

IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.

Describe the bug
I use the Macenko normalizer as part of an augmentation pipeline in a PyTorch environment. It works fine almost every time, but once in a while (especially when working with small images below 512x512) it throws the error

IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.

and I have no idea why.

To Reproduce
I honestly couldn't say how to reproduce the bug. It is very rare, but when it happens it crashes the training process. My intuition is that it happens more often on small images that contain little tissue (white space, black marks, etc.), but I don't know for sure.

Expected behavior
It crashes and throws the above error.

Logs

Original Traceback (most recent call last):
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/travail/anaconda3/envs/placenta/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/travail/repos/plac/lib/wsi_analysis/wsi_dataset.py", line 60, in __getitem__
    img = self.transforms(img)
  File "/home/travail/repos/plac/lib/utils/simsiam.py", line 20, in __call__
    k = self.base_transform(x)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchvision/transforms/transforms.py", line 95, in __call__
    img = t(img)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/travail/repos/plac/lib/datasets/macenko.py", line 13, in forward
    x, _, _ = self.norm.normalize(x*255, stains=False)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 105, in normalize
    HE, C, maxC = self.__compute_matrices(I, Io, alpha, beta)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 68, in __compute_matrices
    HE = self.__find_HE(ODhat, eigvecs, alpha)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/normalizers/macenko.py", line 39, in __find_HE
    minPhi = percentile(phi, alpha)
  File "/home/travail/anaconda3/envs/pla/lib/python3.10/site-packages/torchstain/torch/utils/percentile.py", line 24, in percentile
    return t.view(-1).kthvalue(k).values
IndexError: kthvalue(): Expected reduction dim 0 to have non-zero size.
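The traceback ends in a percentile computed over an empty tensor, which would be consistent with every pixel being discarded before the stain vectors are estimated. The sketch below is a possible pre-check under that assumption: it mimics the usual Macenko optical-density thresholding in NumPy and counts how many pixels would survive. The defaults `Io=240` and `beta=0.15` are the values commonly used for Macenko; `count_tissue_pixels`, `safe_to_normalize`, and `min_pixels` are hypothetical and not part of torchstain.

```python
import numpy as np


def count_tissue_pixels(img, Io=240, beta=0.15):
    """Count pixels whose optical density is at least beta in all channels.

    img is an (H, W, 3) RGB array with values in [0, 255].
    """
    pixels = np.asarray(img, dtype=np.float64).reshape(-1, 3)
    od = -np.log((pixels + 1.0) / Io)
    # Pixels with any near-transparent channel are treated as background.
    keep = ~np.any(od < beta, axis=1)
    return int(keep.sum())


def safe_to_normalize(img, min_pixels=100, **kwargs):
    """Return True if enough tissue-like pixels remain for fitting."""
    return count_tissue_pixels(img, **kwargs) >= min_pixels


white = np.full((64, 64, 3), 255, dtype=np.uint8)
stained = np.tile(np.array([100, 50, 150], dtype=np.uint8), (64, 64, 1))
```

In a data loader, images failing this check could be skipped or passed through unnormalized instead of crashing the worker.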


Desktop (please complete the following information):

  • OS: Linux Fedora
  • Version 36

Add Zenodo DOI

Following best practice, it is a good idea to make a release available on Zenodo. That way, if the GitHub repo is deleted or other unforeseen circumstances arise, people will still be able to access the source code.

In addition, Zenodo is better for referencing purposes, as there will be a citation count. Otherwise, people will just add a URL to the repo (which can also be fine in some scenarios). However, as this is a tool that many people use and will use in their research, having a separate Zenodo DOI is probably the best option, especially as this tool has not been published as part of a paper or similar.

The citation style looks like this (somewhat like any other paper):
https://github.com/andreped/GradientAccumulator#how-to-cite

And in some scenarios might also show up on Google Scholar.

However, as I am not the owner of this repo, I am unable to do so, so I suggest the maintainer(s) look into this. It is really fast to do; just remember to set the author list (with full names and not GitHub usernames).

No GPU acceleration in torch backend

Describe the bug
It seems that all parameters of the torch-based normalizers are hard-coded as tensors on the CPU, so there is no GPU acceleration at all unless one explicitly moves the target, input, and all stain vectors to a GPU device.
An easier way to fix this might be to make the torch normalizers extend torch.nn.Module, like torchvision's transforms do, and use register_buffer to store the parameters, so that the whole normalizer object can be moved to a specified device at once.
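A minimal sketch of the suggested pattern, assuming PyTorch is installed. The class `BufferedStainReference` is hypothetical and only demonstrates how `register_buffer` lets reference tensors follow the module across devices. The numeric values are the reference stain matrix and maximum concentrations commonly used in Macenko implementations, and should be treated as illustrative here.

```python
import torch
from torch import nn


class BufferedStainReference(nn.Module):
    """Holds reference stain parameters as buffers (not trainable)."""

    def __init__(self):
        super().__init__()
        # Buffers move with .to(device) / .cuda() and appear in state_dict.
        self.register_buffer("HERef", torch.tensor([[0.5626, 0.2159],
                                                    [0.7201, 0.8012],
                                                    [0.4062, 0.5581]]))
        self.register_buffer("maxCRef", torch.tensor([1.9705, 1.0308]))

    def forward(self, C, maxC):
        # C: (2, N) stain concentrations, maxC: (2,) per-stain maxima.
        C = C * (self.maxCRef / maxC).unsqueeze(-1)
        # Reconstruct optical density with the reference stain matrix.
        return self.HERef @ C


device = "cuda" if torch.cuda.is_available() else "cpu"
module = BufferedStainReference().to(device)
od = module(torch.rand(2, 10, device=device), torch.ones(2, device=device))
```

Because the reference tensors are buffers, a single `.to(device)` call is enough, and no tensor inside the module has to be moved by hand.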

Support for torch.linalg.lstsq

First time opening an issue, so I apologize if the approach is not correct.

I ran into an issue with PyTorch 1.13, where torch.lstsq (used in line 52 of macenko.py for the torch backend) was recently replaced by torch.linalg.lstsq.

Naively changing torch.lstsq to torch.linalg.lstsq would throw an error at line 106.

The correct approach is to make the following change at line 52:

torch.lstsq(Y, HE)[0][:2] to torch.linalg.lstsq(HE, Y)[0]
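The argument order is the main trap: the old torch.lstsq took the right-hand side first and returned a padded solution (hence the [:2] slice), whereas torch.linalg.lstsq takes the coefficient matrix first and returns only the solution rows. The sketch below uses random stand-ins for HE and Y with the shapes implied above (3x2 stain matrix, 3xN optical densities) and cross-checks the result against a pseudo-inverse.

```python
import torch

torch.manual_seed(0)
HE = torch.rand(3, 2)  # stand-in stain matrix (3 channels x 2 stains)
Y = torch.rand(3, 5)   # stand-in optical densities (3 channels x 5 pixels)

# New API: coefficient matrix first, right-hand side second.
C = torch.linalg.lstsq(HE, Y)[0]  # same as the .solution field

# Independent least-squares solution for comparison.
C_ref = torch.linalg.pinv(HE) @ Y
```

Here `C` has shape (2, 5), one row of concentrations per stain, so no extra slicing is needed.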

I will open a pull request to solve this issue. I hope this post helps!

Update runtimes

We should update the runtimes presented in the README to also cover TensorFlow (the table mentioned here).

I would keep the avg runtime and remove the total runtime (as the latter doesn't add any new information). Then have the avg runtime for all backends: NumPy, PyTorch, and TensorFlow.

Add unit tests?

As we have support for different backends, it would probably be a good idea to have unit tests to verify that we achieve the same results using the three different backends: pytorch, numpy, and tensorflow.

This can easily be achieved using GitHub Actions, which I have lots of experience with (e.g., this project).
I could make an attempt tomorrow and create a PR when I have something working. Shouldn't take that much time.

Also, I observed that lots of people have taken an interest in the project. It would therefore be a good idea to create a blog post or similar to further reach the target audience, as this project is extremely valuable for researchers within computational pathology. Perhaps @carloalbertobarbano is interested in doing that?

Documentations?

Is your feature request related to a problem? Please describe.
I believe a lot of users would benefit from clearer documentation on which methods actually exist and how to use them. This is not so clear from the README, and there are numerous nuances in usage which we do not cover, which may result in end users having a hard time getting this to work.

Describe the solution you'd like
For a different Python package I am developing, I created documentation and hosted it for free on ReadTheDocs. This is extremely easy to do, and I have already set up all the logic necessary for this to work.

However, in order for me to set up, test, and deploy the documentation, it would be nice if I had suitable permissions.
Alternatively, @carloalbertobarbano or someone else could create a user at ReadTheDocs (it can be linked to a GitHub account) and link this repo to a ReadTheDocs project. That way, deployment of the docs should work seamlessly.

I can start making the PR now. I will add a docs/ directory to the repo where all relevant information should lie. I will also use autodoc to build the API documentation, which will be generated from the class/method descriptions in the code base.
