andreped / pathology-streaming-pipeline

This project forked from diagnijmegen/pathology-streaming-pipeline

Use streaming to train on whole-slide images with single image-level labels, reducing GPU memory requirements by 99%.

License: Apache License 2.0

C++ 1.59% Python 98.41%

pathology-streaming-pipeline's Introduction

Whole-slide classification pipeline - end-to-end

This repository gives an overview of how to use streaming to train networks on whole-slide images with a single label per slide. Streaming is an implementation of convolutions that uses tiling and gradient checkpointing to save memory.
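To illustrate the tiling half of that idea, here is a minimal pure-Python sketch, not the repository's implementation. It computes a one-dimensional "valid" convolution one tile at a time and checks that the result matches the convolution over the whole input. The function names and the toy signal are hypothetical, and gradient checkpointing is not shown.

```python
def conv1d_valid(signal, kernel):
    """Plain 'valid' cross-correlation: no padding, stride 1."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def conv1d_tiled(signal, kernel, tile_size):
    """Same result as conv1d_valid, but only one tile is processed at a time.

    Neighbouring tiles overlap by len(kernel) - 1 samples so that every
    output position still sees its full receptive field.
    """
    k = len(kernel)
    if tile_size < k:
        raise ValueError("tile_size must be at least the kernel size")
    step = tile_size - (k - 1)  # number of new outputs produced per full tile
    output = []
    for start in range(0, len(signal) - k + 1, step):
        tile = signal[start:start + tile_size]  # the last tile may be shorter
        output.extend(conv1d_valid(tile, kernel))
    return output


# Toy data (hypothetical): 50 samples and a 3-tap smoothing kernel.
signal = [float(x % 7) for x in range(50)]
kernel = [0.25, 0.5, 0.25]

full = conv1d_valid(signal, kernel)
tiled = conv1d_tiled(signal, kernel, tile_size=8)
print(tiled == full)
```

Only one tile has to be held in memory at a time, which is why smaller tiles lower peak memory use at the cost of recomputing the overlapping borders.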

Papers published so far about this method (please consider citing them when using this code):

  • Application on prostate data, paper: H. Pinckaers, W. Bulten, J. Van der Laak and G. Litjens, "Detection of prostate cancer in whole-slide images through end-to-end training with image-level labels," in IEEE Transactions on Medical Imaging, doi: 10.1109/TMI.2021.3066295 - Open Access.

  • Methods paper: H. Pinckaers, B. van Ginneken and G. Litjens, "Streaming convolutional neural networks for end-to-end learning with multi-megapixel images," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2020.3019563 - older preprint

Requirements

Packages:

  • Install libvips
  • See requirements.txt, install via pip install -r requirements.txt
  • PyTorch (1.6+) for mixed-precision support
  • Make sure the repo is in your $PYTHONPATH

Hardware requirements:

  • GPU with 11 GB memory (less could work with smaller tile sizes)
  • Preferably 32+ GB RAM (use fewer workers if you have less memory available)

Windows users

  • Please see issues #2 and #3 for help with building the C++ extensions.

Network

For now, only the ResNet-34 implementation has been tested. Other networks could be implemented (please open an issue, I can help).

Input sizes

Recommended image sizes (microscopy magnification):

  • 4096x4096 for spacing 4.0 (2.5x)
  • 8192x8192 for spacing 2.0 (5x)
  • 16384x16384 for spacing 1.0 (10x)
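The three recommendations above follow a simple pattern: halving the spacing doubles both the side length and the magnification. The sketch below encodes that pattern. The function names are hypothetical, the formula is extrapolated from the three listed rows only, and the spacing unit is assumed to be micrometres per pixel; values outside the listed ones are not recommendations from this repository.

```python
BASE_SIZE = 16384         # recommended side length at spacing 1.0 (from the list above)
BASE_MAGNIFICATION = 10   # magnification listed for spacing 1.0


def recommended_image_size(spacing):
    """Side length in pixels; assumes size scales inversely with spacing."""
    return round(BASE_SIZE / spacing)


def approximate_magnification(spacing):
    """Microscope magnification; assumes 10x corresponds to spacing 1.0."""
    return BASE_MAGNIFICATION / spacing


for spacing in (4.0, 2.0, 1.0):
    size = recommended_image_size(spacing)
    print(f"spacing {spacing}: {size}x{size} ({approximate_magnification(spacing)}x)")
```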

Steps

0. Prepare train.csv and val.csv

This pipeline needs two CSV files: train.csv and val.csv. Each has a header row followed by one line per slide:

slide_id,label
TRAIN_1,1
TRAIN_2,1
...
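As a convenience, a file in this layout can be written and checked with Python's standard csv module. This is a minimal sketch; the helper names are hypothetical, and it assumes integer class labels as in the example above.

```python
import csv
import io


def write_label_csv(rows, handle):
    """Write (slide_id, label) pairs with the slide_id,label header."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["slide_id", "label"])
    for slide_id, label in rows:
        writer.writerow([slide_id, int(label)])


def read_label_csv(handle):
    """Read the file back and fail early if the header is not as expected."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != ["slide_id", "label"]:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [(row["slide_id"], int(row["label"])) for row in reader]


# An in-memory buffer stands in for open("train.csv", "w", newline="").
buffer = io.StringIO()
write_label_csv([("TRAIN_1", 1), ("TRAIN_2", 1)], buffer)
csv_text = buffer.getvalue()
print(csv_text)

rows = read_label_csv(io.StringIO(csv_text))
```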

1. Prepare data

python streaming/trim_tissue.py \
    --csv='' \
    --slide-dir='' \
    --filetype='tif' \
    --save-dir='' \
    --output-spacing=1.0

2. Train network!

python streaming/train.py \
    --name=test-name \
    --train_csv='train.csv' \
    --val_csv='val.csv' \
    --data_dir='/local/data' \
    --save_dir='/home/user/models' \
    --lr=2e-4 \
    --num_workers=1 \
    --tile_size=5120

3. Options

There are quite a few options. Boolean options are disabled by prepending no_, e.g., no_mixedprecision:

| Required options | Description |
| --- | --- |
| name: str | The name of the current experiment, used for saving checkpoints. |
| num_classes: int | The number of classes in the task. |
| train_csv: str | The filenames (without extension) and labels of the train set. |
| val_csv: str | The filenames (without extension) and labels of the validation or test set. |
| data_dir: str | The directory where the images reside. |
| save_dir: str | Where to save the checkpoints. |
| Optional options | |
| filetype: str | default: '.jpg'. The file extension of the images. |
| Train options | |
| lr: float | default: 1e-4. Learning rate. |
| batch_size: int | default: 16. Effective mini-batch size. |
| pretrained: bool | default: True. Whether to use ImageNet weights. |
| image_size: int | default: 16384. Effective input size of the network. |
| tile_size: int | default: 5120. The input/tile size of the streaming part of the network. |
| epochs: int | default: 50. How many epochs to train. |
| multilabel: bool | default: False. |
| regression: bool | default: False. |
| Validation options | |
| validation: bool | default: True. Whether to run on the validation set. |
| validation_interval: int | default: 1. Run on the validation set after every n train epochs. |
| epoch_multiply: int | default: 1. This will increase the size of one train epoch by reusing train images. |
| Increase speed | |
| mixedprecision: bool | default: True. The paper's models were trained with full precision, but this is faster. |
| variable_input_shapes: bool | default: False. When the images vary a lot in size, this helps with speed. |
| normalize_on_gpu: bool | default: True. Helps with RAM usage of dataloaders. |
| num_workers: int | default: 2. Number of dataloader workers. |
| convert_to_vips: bool | default: False. |
| Model options | |
| resnet: bool | default: True. Only ResNet is tested so far. |
| mobilenet: bool | default: False. Experimental. |
| train_all_layers: bool | default: False. Whether to finetune the whole network, or only the last block. |
| Save and logging options | |
| resuming: bool | default: True. Will restart from the last checkpoint with the same experiment name. |
| resume_name: str | default: ''. Restart from another experiment with this name. |
| resume_epoch: int | default: -1. Restart from a specific epoch. |
| save: bool | default: True. Save checkpoints. |
| progressbar: bool | default: True. Show the progress bar. |
| Evaluation options | |
| weight_averaging: bool | default: False. Average weights over 5 epochs around the picked epoch. |
| only_eval: bool | default: False. Only do one evaluation epoch. |
| Obscure train options | |
| gather_batch_on_one_gpu: bool | default: False. |
| accumulate_batch: int | default: -1. Do not touch; it is calculated automatically. |
| weighted_sampler: bool | default: False. Oversample the minority class; only works in binary tasks. |
| train_set_size: int | default: -1. Limits the number of train images, for testing on a smaller train set. |
| train_streaming_layers: bool | default: True. Whether to backpropagate through the streaming part of the network. |
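
The no_ convention for boolean options can be illustrated with a small argparse sketch. This is not the repository's actual argument parser: the helper add_bool_option is hypothetical, and the sketch only shows how a paired flag such as --mixedprecision / --no_mixedprecision could map onto one boolean with a default.

```python
import argparse


def add_bool_option(parser, name, default):
    """Register --<name> and --no_<name>, both writing to the same destination."""
    parser.add_argument(f"--{name}", dest=name, action="store_true")
    parser.add_argument(f"--no_{name}", dest=name, action="store_false")
    parser.set_defaults(**{name: default})


parser = argparse.ArgumentParser()
add_bool_option(parser, "mixedprecision", default=True)     # default taken from the table
add_bool_option(parser, "weighted_sampler", default=False)  # default taken from the table

# No flags given: the defaults apply.
defaults = parser.parse_args([])

# Disable one option and enable another.
args = parser.parse_args(["--no_mixedprecision", "--weighted_sampler"])
print(args.mixedprecision, args.weighted_sampler)
```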

pathology-streaming-pipeline's People

Contributors

hanspinckaers
