

[DALI 2022] "Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study" by Gregory Holste, Song Wang, Ziyu Jiang, Thomas C. Shen, George Shih, Ronald M. Summers, Yifan Peng, and Zhangyang Wang



Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study

Gregory Holste, Song Wang, Ziyu Jiang, Thomas C. Shen, George Shih, Ronald M. Summers, Yifan Peng, Zhangyang Wang
[Oral Presentation] MICCAI Workshop on Data Augmentation, Labelling, and Imperfections (DALI). 2022.

[Paper] | [arXiv] | [Oral Presentation]

Abstract

Imaging exams, such as chest radiography, will yield a small set of common findings and a much larger set of uncommon findings. While a trained radiologist can learn the visual presentation of rare conditions by studying a few representative examples, teaching a machine to learn from such a “long-tailed” distribution is much more difficult, as standard methods would be easily biased toward the most frequent classes. In this paper, we present a comprehensive benchmark study of the long-tailed learning problem in the specific domain of thorax diseases on chest X-rays. We focus on learning from naturally distributed chest X-ray data, optimizing classification accuracy over not only the common “head” classes, but also the rare yet critical “tail” classes. To accomplish this, we introduce a challenging new long-tailed chest X-ray benchmark to facilitate research on developing long-tailed learning methods for medical image classification. The benchmark consists of two chest X-ray datasets for 19- and 20-way thorax disease classification, containing classes with as many as 53,000 and as few as 7 labeled training images. We evaluate both standard and state-of-the-art long-tailed learning methods on this new benchmark, analyzing which aspects of these methods are most beneficial for long-tailed medical image classification and summarizing insights for future algorithm design. The datasets, trained models, and code are available at https://github.com/VITA-Group/LongTailCXR.


Results & Model Weights

All trained model weights are available below. In the following table, best results are bolded and second-best results are underlined. See paper for full results (bAcc = balanced accuracy).

| Method | NIH-CXR-LT bAcc | MIMIC-CXR-LT bAcc | NIH-CXR-LT Weights | MIMIC-CXR-LT Weights |
|---|---|---|---|---|
| Softmax | 0.115 | 0.169 | link | link |
| CB Softmax | 0.269 | 0.227 | link | link |
| RW Softmax | 0.260 | 0.211 | link | link |
| Focal Loss | 0.122 | 0.172 | link | link |
| CB Focal Loss | 0.232 | 0.191 | link | link |
| RW Focal Loss | 0.197 | 0.239 | link | link |
| LDAM | 0.178 | 0.165 | link | link |
| CB LDAM | 0.235 | 0.225 | link | link |
| CB LDAM-DRW | 0.281 | 0.267 | link | link |
| RW LDAM | 0.279 | 0.243 | link | link |
| RW LDAM-DRW | <u>0.289</u> | <u>0.275</u> | link | link |
| MixUp | 0.118 | 0.176 | link | link |
| Balanced-MixUp | 0.155 | 0.168 | link | link |
| Decoupling (cRT) | **0.294** | **0.296** | link | link |
| Decoupling (tau-norm) | 0.214 | 0.230 | -- | -- |
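
For readers unfamiliar with the metric, here is a minimal sketch of balanced accuracy, assuming bAcc follows the usual definition (the unweighted mean of per-class recall). The function name `balanced_accuracy` and the toy labels are illustrative only; the evaluation code in this repository may compute the metric differently.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy long-tailed labels (made up for illustration): class 0 is the "head",
# class 2 is the "tail". The predictor mostly outputs the head class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bacc = balanced_accuracy(y_true, y_pred)
print(f"accuracy={plain_acc:.3f}, balanced accuracy={bacc:.3f}")
```

In this toy case plain accuracy looks reasonable because the head class dominates, while balanced accuracy is much lower because every class counts equally, which is why a head-biased classifier scores poorly on it.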

Data Access

Labels for the MIMIC-CXR-LT dataset presented in this paper can be found in the labels/ directory. Labels for NIH-CXR-LT can be found at https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/174256157515. For both datasets, there is one csv file for each data split ("train", "balanced-val", "test", and "balanced-test").
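
As a rough sketch of how a split file could be inspected, the snippet below counts positive labels per class to expose the long-tailed distribution. The CSV layout is an assumption made for illustration (an `id` column followed by one 0/1 column per finding), and the names `class_counts`, `Finding A`, and so on are hypothetical; check the actual files for their real column names before reusing this.

```python
import csv
import io


def class_counts(csv_file, id_column="id"):
    """Sum each 0/1 label column and return (class, count) pairs, most frequent first."""
    counts = {}
    for row in csv.DictReader(csv_file):
        for name, value in row.items():
            if name == id_column:
                continue  # skip the image identifier column
            counts[name] = counts.get(name, 0) + int(value)
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for a "train" split file; the real files may be laid out differently.
toy_csv = io.StringIO(
    "id,Finding A,Finding B,Finding C\n"
    "img_001.png,1,0,0\n"
    "img_002.png,1,0,0\n"
    "img_003.png,1,1,0\n"
    "img_004.png,1,0,0\n"
    "img_005.png,0,1,0\n"
    "img_006.png,0,0,1\n"
)

counts = class_counts(toy_csv)
for name, n in counts:
    print(f"{name}: {n}")
```

To run this on a real split, pass a file opened with `open(path, newline="")` instead of the `io.StringIO` object, and adjust `id_column` (and skip any other non-label columns) to match the file.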


Usage

To reproduce the results presented in this paper:

  1. Register to download the MIMIC-CXR dataset from https://physionet.org/content/mimic-cxr/2.0.0/, and download the NIH ChestXRay14 dataset from https://nihcc.app.box.com/v/ChestXray-NIHCC/.
  2. Install prerequisite packages with Anaconda: run conda env create -f lt_cxr.yml, then conda activate lt_cxr.
  3. Run all MIMIC-CXR-LT experiments: bash run_mimic-cxr-lt_experiments.sh (first changing the --data_dir argument to your MIMIC-CXR path).
  4. Run all NIH-CXR-LT experiments: bash run_nih-cxr-lt_experiments.sh (first changing the --data_dir argument to your NIH ChestXRay14 path).

Citation

@inproceedings{holste2022long,
  title={Long-Tailed Classification of Thorax Diseases on Chest X-Ray: A New Benchmark Study},
  author={Holste, Gregory and Wang, Song and Jiang, Ziyu and Shen, Thomas C and Shih, George and Summers, Ronald M and Peng, Yifan and Wang, Zhangyang},
  booktitle={MICCAI Workshop on Data Augmentation, Labelling, and Imperfections},
  pages={22--32},
  year={2022},
  organization={Springer}
}

Contact

Feel free to contact me (Greg Holste) at [email protected] with any questions!
