Giter Site home page Giter Site logo

histofl's Introduction

HistoFL

Federated Learning for Computational Pathology on Gigapixel Whole Slide Images

Medical Image Analysis

Read Link | Journal Link

© This code is made available for non-commercial academic purposes.

TL;DR: Code accompanying our paper for simulating weakly-supervised classification and survival prediction on multi-institutional histopathology WSI data using federated learning with differential privacy

Preprocess/Setup

Installation Guide for Linux (using anaconda)

Installation Guide

Data Preparation

We chose to encode each tissue patch with a 1024-dim feature vector using a truncated, pretrained ResNet50. For each WSI, these features are expected to be saved as matrices of torch tensors of size N x 1024, where N is the number of patches from each WSI (varies from slide to slide). The following folder structure is assumed:

DATA_ROOT_DIR/
    └──FEATURES_DIR/
         ├── h5_files
                ├── slide_1.h5
                ├── slide_2.h5
                └── ...
         └── pt_files
                ├── slide_1.pt
                ├── slide_2.pt
                └── ...

DATA_ROOT_DIR is the base directory of all datasets (e.g. the directory to your SSD). DATASET_DIR is the name of the folder containing data specific to one experiment and features from each slide is stored as .pt files.

Please refer to refer to CLAM for examples on how perform this feature extraction step.

Results by default are saved to ./results folder. Predictions for each sample in the test set are stored in .pkl files under the directory specified --exp_code (see below).

Classification

use main.py --h for help on list of possible commandline arguments See the definition of args.task in main.py for how to setup experiments with your own dataset (currently a dummy task is provided for illustration corresponding to the csv_file dataset_csv/classification_fl_dummy_dataset.csv and features stored under DATA_ROOT_DIR/classification_features_dir). See splits/classification_dummy for examples on the expected format of train/val/test splits for k-fold experiments.

Individual Site

Selecting data from a single institution for training can be done by specifying --inst_name NAME_OF_INSTITUTE. Example command for training on data from "institute_0" is as follows:

CUDA_VISIBLE_DEVICES=0 python main.py --k 5 --exp_code classification_institute_0_only --weighted_sample --task classification --inst_name institute_0 --data_root_dir DATA_ROOT_DIR --split_dir classification_dummy

No FL (Centralized)

Training on data on all institutions without federated learning is toggled on using the --no_fl flag.

CUDA_VISIBLE_DEVICES=0 python main.py --k 5 --exp_code classification_no_fl --weighted_sample --task classification --no_fl --data_root_dir DATA_ROOT_DIR --split_dir classification_dummy

FL with varying noise level (Gaussian Mechanism)

Training with federated averaging can be simulated by removing the --no_fl flag, as well as providing the level of noise (alpha parameter) via --noise_level. Example with alpha = 0.001 is as follows:

CUDA_VISIBLE_DEVICES=0,1 python main.py --k 5 --exp_code classification_noise_0.001 --weighted_sample --task classification --noise_level 0.001 --weighted_fl_avg --data_root_dir DATA_ROOT_DIR --split_dir classification_dummy

Survival Prediction

use main_surv.py --h for help on list of possible commandline arguments See the definition of args.task in main.py for how to setup experiments with your own dataset (currently a dummy task is provided for illustration corresponding to the csv_file dataset_csv/survival_fl_dummy_dataset.csv and features stored under DATA_ROOT_DIR/survival_features_dir). See splits/survival_dummy for examples on the expected format of train/val/test splits for k-fold experiments.

Individual sites

Selecting data from a single institution for training can be done by specifying --inst_name NAME_OF_INSTITUTE. Example command for training on data from "institute_0" is as follows:

CUDA_VISIBLE_DEVICES=0 python main_surv.py --k 5 --exp_code survival_institute_0_only  --task survival --inst_name institute_0 --data_root_dir DATA_ROOT_DIR --split_dir survival_dummy

No FL (Centralized)

Similar to classification, training on data on all institutions without federated learning is toggled on using the --no_fl flag.

CUDA_VISIBLE_DEVICES=0 python main_surv.py --k 5 --exp_code survival_no_fl --task survival --no_fl --data_root_dir DATA_ROOT_DIR --split_dir survival_dummy

FL with varying noise level (Gaussian Mechanism)

Training with federated averaging can be simulated by removing the --no_fl flag, as well as providing the level of noise (alpha parameter) via --noise_level. Example with alpha = 0.001 is as follows:

CUDA_VISIBLE_DEVICES=0,1 python main_surv.py --k 5 --exp_code survival_noise_0.001 --task survival --noise_level 0.001 --weighted_fl_avg --data_root_dir DATA_ROOT_DIR --split_dir survival_dummy

License

© Mahmood Lab - This code is made available under the GPLv3 License and is available for non-commercial academic purposes.

Reference

If you find our work useful in your research or if you use parts of this code please consider citing our paper:

@article{lu2020federated,
  title={Federated Learning for Computational Pathology on Gigapixel Whole Slide Images},
  author={Lu, Ming Y and Kong, Dehan and Lipkova, Jana and Chen, Richard J and Singh, Rajendra and Williamsona, Drew FK and Chena, Tiffany Y and Mahmood, Faisal},
  journal={arXiv preprint arXiv:2009.10190},
  year={2020}
}

histofl's People

Contributors

faisalml avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

histofl's Issues

Pre-trained model

Hi,
Do you have any pre-trained model for this predictive model?
If so, could you please share that with me?

Thanks

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.