Abnormality detection in mammogram images using Deep Convolutional Neural Networks

License: MIT License

medicalcnn's Introduction

MedicalCNN

This project performs abnormality classification in mammography using Convolutional Neural Networks. The dataset of interest is CBIS-DDSM. Its mammogram images feature two kinds of breast abnormalities, masses and calcifications, each of which can be either benign or malignant.
The classification task consists of distinguishing between the four cases:

  • Benign mass
  • Malignant mass
  • Benign calcification
  • Malignant calcification

A simpler subtask is to distinguish masses from calcifications, regardless of whether they are benign or malignant.
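
The relationship between the 4-class task and its 2-class subtasks can be sketched as a simple label mapping. The integer encoding below (0 to 3) is a hypothetical choice made for this example; the labels stored in the dataset files may be encoded differently, so check them before reusing this.

```python
from enum import IntEnum


class Abnormality(IntEnum):
    # Hypothetical encoding, chosen for illustration only.
    BENIGN_MASS = 0
    MALIGNANT_MASS = 1
    BENIGN_CALCIFICATION = 2
    MALIGNANT_CALCIFICATION = 3


MASS, CALCIFICATION = 0, 1
BENIGN, MALIGNANT = 0, 1


def to_mass_vs_calcification(label):
    """Collapse a 4-class label into the mass/calcification subtask label."""
    label = Abnormality(label)  # raises ValueError for unknown labels
    if label in (Abnormality.BENIGN_MASS, Abnormality.MALIGNANT_MASS):
        return MASS
    return CALCIFICATION


def to_benign_vs_malignant(label):
    """Collapse a 4-class label into the benign/malignant subtask label."""
    label = Abnormality(label)
    if label in (Abnormality.BENIGN_MASS, Abnormality.BENIGN_CALCIFICATION):
        return BENIGN
    return MALIGNANT


four_class_labels = [0, 1, 2, 3, 2]
print([to_mass_vs_calcification(y) for y in four_class_labels])
print([to_benign_vs_malignant(y) for y in four_class_labels])
```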

The full detailed report is available here.

Left: example of a mass. Right: example of a calcification.


Repo structure

All the Jupyter notebooks used for the experiments are collected in the scripts folder.

Specifically:

  • Scratch_CNN_2_class: CNN built from scratch for the 2-class classification task.
  • Scratch_CNN_4_class: CNN built from scratch for the 4-class classification task.
  • Scratch_CNN_ben_mal: CNN built from scratch for benign-malignant classification.
  • VGG16_2_class: VGG16 with feature extraction and fine-tuning for the 2-class classification task.
  • VGG16_4_class: VGG16 with feature extraction and fine-tuning for the 4-class classification task.
  • Baseline_Dual_CNN: Dual CNN model that also exploits images of nearby healthy tissue.
  • Composite_4_class: Two parallel CNN models that decompose the 4-class classification task.
  • Baseline_Siamese: Siamese CNN that also exploits images of nearby healthy tissue.
  • Ensemble_2_class: Ensemble of different CNN models for the 2-class classification task.
  • Ensemble_4_class: Ensemble of different CNN models for the 4-class classification task.
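
To illustrate the idea behind Composite_4_class, here is a minimal sketch of one way two binary models could be combined into a 4-class prediction. It assumes one model outputs the probability of "calcification" and the other the probability of "malignant", and that the two outputs can be treated as independent. The notebook may combine its models differently; the function name and class names here are hypothetical.

```python
def compose_four_class(p_calcification, p_malignant):
    """Combine two binary probabilities into a 4-class distribution.

    Assumes the two predictions are independent, so each joint
    probability is the product of the two marginals.
    """
    p_mass = 1.0 - p_calcification
    p_benign = 1.0 - p_malignant
    return {
        "benign_mass": p_mass * p_benign,
        "malignant_mass": p_mass * p_malignant,
        "benign_calcification": p_calcification * p_benign,
        "malignant_calcification": p_calcification * p_malignant,
    }


# Example: the first model is fairly sure it sees a calcification,
# the second model leans towards benign.
distribution = compose_four_class(p_calcification=0.9, p_malignant=0.2)
predicted = max(distribution, key=distribution.get)
print(predicted)
```

The four products always sum to 1, so the result can be read as a probability distribution over the four cases.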

Extra:

  • LearningRate: Experiments tuning the learning rate for different optimizers.
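
For the Ensemble_2_class and Ensemble_4_class notebooks listed above, a common way to combine several classifiers is soft voting: average the per-class probabilities of all models and pick the class with the highest mean. The sketch below assumes this strategy for illustration; the notebooks may weight or combine the models in another way, and the function name is hypothetical.

```python
def soft_vote(model_probabilities):
    """Average per-class probabilities across models.

    model_probabilities: one list of class probabilities per model,
    all for the same input image.
    Returns (averaged probabilities, index of the predicted class).
    """
    if not model_probabilities:
        raise ValueError("at least one model output is required")
    n_models = len(model_probabilities)
    n_classes = len(model_probabilities[0])
    averaged = [
        sum(probs[c] for probs in model_probabilities) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted


# Three hypothetical models scoring one image on the 2-class task.
outputs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
averaged, predicted = soft_vote(outputs)
print(averaged, predicted)
```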

You can download the dataset from Google Drive. All the notebooks assume that the dataset zip file is located in the root of your Google Drive folder; if you store it elsewhere, change the path at the top of each notebook.
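
A minimal sketch of how the dataset location could be made configurable. The zip file name and the Drive mount point below are placeholders, not the names used in the notebooks; in Colab, Drive is typically mounted first with `google.colab.drive.mount`.

```python
import zipfile
from pathlib import Path

# Hypothetical location: adjust to wherever your copy of the zip lives.
DATASET_ZIP = Path("/content/drive/My Drive/dataset.zip")


def extract_dataset(zip_path, target_dir):
    """Extract the dataset archive and return the sorted list of its entries."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"dataset archive not found: {zip_path}")
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


# Usage (uncomment once DATASET_ZIP points at a real file):
# entries = extract_dataset(DATASET_ZIP, "data")
# print(len(entries), "entries extracted")
```

Keeping the path in a single constant means only one line needs to change when the archive is stored somewhere other than the Drive root.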


Experiments and results

I developed and tested many models for the 2-class and 4-class tasks.

The best model for the 2-class task obtained a 91.37% accuracy on the test set. The best model for the 4-class task obtained a 61.01% accuracy on the test set.

Compared with the results reported in the literature [1][2][3], the models reach state-of-the-art accuracy.

[1] Neeraj Dhungel, Gustavo Carneiro, and Andrew P. Bradley. “Automated mass detection in mammograms using cascaded deep learning and random forests”. In: 2015 international conference on digital image computing
[2] Dina A. Ragab et al. “Breast cancer detection using deep convolutional neural networks and support vector machines”. In: PeerJ 7 (2019)
[3] Li Shen et al. “Deep learning to improve breast cancer detection on screening mammography”. In: Scientific Reports 9.1 (2019)

See the report for full details.


Tools

The project was developed using the following technologies:

  • Python: scripting language
  • Keras: open-source library for experimentation with deep neural networks
  • Google Colab: free cloud-based Jupyter notebook environment by Google

About the dataset

The dataset of interest is CBIS-DDSM (Curated Breast Imaging Subset of Digital Database for Screening Mammography), a collection of mammography images by Lee et al. It is an updated version of the original DDSM dataset, in which all the images have been segmented and labeled.

Rebecca Sawyer Lee, Francisco Gimenez, Assaf Hoogi, Daniel Rubin (2016). Curated Breast Imaging Subset of DDSM [Dataset]. The Cancer Imaging Archive. DOI: 10.7937/K9/TCIA.2016.7O02S9CY

Credits

The author (Leonardo Lai) designed and performed all the experiments listed in the project.

If you want to cite this work, please use the following:

@software{leonardo_lai_2021_4700130,
  author       = {Leonardo Lai},
  title        = {leoll2/MedicalCNN: v1.0},
  month        = apr,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {1.0},
  doi          = {10.5281/zenodo.4700130},
  url          = {https://doi.org/10.5281/zenodo.4700130}
}

medicalcnn's Issues

Train-Test Split & Class Distribution

Hello leoll2,
I hope you are doing well. First, I want to thank you for sharing your source code and experiments with the AI community.
I have some questions regarding your NumPy tensors and PNG folder. I am working on benign vs. malignant classification, so
I first looked into your PNG images and found these results:

Split : count - percentage of total
Train images : 2676 - 79.93%
Test images : 672 - 20.07%
Total images : 3348

Then I looked at the class distribution within each split:

Train

BENIGN : 1568 - 58.59%
MALIGNANT : 1108 - 41.41%
Total: 2676

Test

BENIGN : 406 - 60.42%
MALIGNANT : 266 - 39.58%
Total: 672

As you can see, there are 2676 train images and 672 test images.
But when I checked your Jupyter notebook, it reports:
Train size: 2676 Test size: 336

Where did the other test images go?

Then I explored further and checked your train and validation generators.
The train generator contains 17 batches and the validation generator contains 5, so:
train = 17 * 128 = 2176
validation = 5 * 128 = 640
2176 + 640 = 2816, which is more than the number of train images. Why?

My last question: what criteria did you use for image rescaling?

Thank you & regards
Farhan Shahid
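
One possible explanation for the batch counts in the question above, offered as a sketch rather than a confirmed answer: a generator yields ceil(samples / batch size) batches, and the last batch is usually only partially filled. Multiplying the number of batches by the batch size therefore gives an upper bound on the number of images, not the exact count. The 80/20 train/validation split used below is an assumption; it is not stated in the thread, but it happens to reproduce the 17 and 5 batches reported.

```python
import math


def batches_per_epoch(n_samples, batch_size):
    """Number of batches needed to cover n_samples; the last one may be partial."""
    return math.ceil(n_samples / batch_size)


n_total = 2676           # train images reported in the question
batch_size = 128         # batch size implied by the question
validation_split = 0.2   # assumption, not confirmed by the thread

n_val = int(n_total * validation_split)   # 535 images held out for validation
n_train = n_total - n_val                 # 2141 images left for training

train_batches = batches_per_epoch(n_train, batch_size)
val_batches = batches_per_epoch(n_val, batch_size)

print(train_batches, val_batches)                               # 17 5
print(train_batches * batch_size, val_batches * batch_size)     # 2176 640 (upper bounds)
print(n_train + n_val)                                          # 2676 (actual image count)
```

Under this assumption, 2176 + 640 = 2816 overcounts because both final batches are partially empty, while the real number of images seen per epoch is still 2676.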

Number of Classes inside .zip file

Hi! First of all, well done on the project. I do have a question, though. You refer to a 4-class classification task, but when I loaded the data from the link you provided, I saw 5 classes. Are these classes organized as follows?

  • Normal/Background
  • Benign Mass
  • Malignant Mass
  • Benign Calcification
  • Malignant Calcification

Thanks in advance!
