
This project was forked from adamouization/breast-cancer-detection-mammogram-deep-learning.


Master's dissertation for breast cancer detection in mammograms using deep learning techniques. Contains source code and report used.

Home Page: http://doi.org/10.5281/zenodo.3985051

License: BSD 2-Clause "Simplified" License



Breast Cancer Detection in Mammograms Using Deep Learning Techniques

Dissertation for the MSc Artificial Intelligence at the University of St Andrews (2020).

The final report can be read here: Breast Cancer Detection in Mammograms using Deep Learning Techniques, Adam Jaamour (2020)

Abstract

The objective of this dissertation is to explore various deep learning techniques that can be used to implement a system which learns how to detect instances of breast cancer in mammograms. Nowadays, breast cancer claims 11,400 lives on average every year in the UK, making it one of the deadliest diseases. Mammography is the gold standard for detecting early signs of breast cancer, which can help cure the disease during its early stages. However, incorrect mammography diagnoses are common and may harm patients through unnecessary treatments and operations (or a lack of treatments). Therefore, systems that can learn to detect breast cancer on their own could help reduce the number of incorrect interpretations and missed cases.

Convolutional Neural Networks (CNNs) are used as part of a deep learning pipeline that was initially developed as a group and further extended individually. A bag-of-tricks approach is followed to analyse the effects of diverse deep learning techniques on performance and efficiency, such as different architectures (VGG19, ResNet50, InceptionV3, DenseNet121, MobileNetV2), class weights, input sizes, amounts of transfer learning, and types of mammograms.

CNN Model

Ultimately, 67.08% accuracy is achieved on the CBIS-DDSM dataset by transferring pre-trained ImageNet weights to a MobileNetV2 architecture, and weights pre-trained on a binary version of the mini-MIAS dataset to the model's fully connected layers. Furthermore, using class weights to counter the imbalanced datasets and splitting CBIS-DDSM samples between masses and calcifications also increase the overall accuracy. Other techniques tested, such as data augmentation and larger image sizes, do not yield higher accuracies, while the mini-MIAS dataset proves too small for any meaningful results using deep learning techniques. These results are compared with other papers using the CBIS-DDSM and mini-MIAS datasets, and with the baseline set during the implementation of the deep learning pipeline developed as a group.
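
As a rough illustration (not the exact dissertation pipeline), a transfer-learning setup of this kind can be sketched in TensorFlow/Keras as follows; the input size, layer sizes and class-weight values are illustrative assumptions:

import tensorflow as tf

# Load a MobileNetV2 backbone pre-trained on ImageNet, without its classifier head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # phase 1: keep the pre-trained convolutional layers frozen

# Attach new fully connected layers for the binary benign/malignant task.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy", metrics=["accuracy"])

# Class weights penalise mistakes on the under-represented class more heavily.
class_weight = {0: 1.0, 1: 2.5}  # illustrative values
# model.fit(train_data, epochs=100, class_weight=class_weight)

# Phase 2: unfreeze the backbone and fine-tune with a lower learning rate.
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_data, epochs=50, class_weight=class_weight)

The two frozen/unfrozen phases correspond to the MAX_EPOCH_FROZEN and MAX_EPOCH_UNFROZEN arguments described in the usage section below.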

Usage on a GPU lab machine

Clone the repository:

cd ~/Projects
git clone https://github.com/Adamouization/Breast-Cancer-Detection-Code

Build the virtual environment used to install TensorFlow 2 with CUDA 10 for Python (GPU usage):

cd libraries/tf2
tar xvzf tensorflow2-cuda-10-1-e5bd53b3b5e6.tar.gz
sh build.sh

Activate the virtual environment:

source /cs/scratch/<username>/tf2/venv/bin/activate

Create output and saved_models directories to store the results:

mkdir output
mkdir saved_models

cd into the src directory and run the code (an example invocation is shown after the argument list below):

main.py [-h] -d DATASET [-mt MAMMOGRAMTYPE] -m MODEL [-r RUNMODE] [-lr LEARNING_RATE] [-b BATCHSIZE] [-e1 MAX_EPOCH_FROZEN] [-e2 MAX_EPOCH_UNFROZEN] [-roi] [-v] [-n NAME]

where:

  • -h is a flag for help on how to run the code.
  • DATASET is the dataset to use. Must be either mini-MIAS, mini-MIAS-binary or CBIS-DDSM. Defaults to CBIS-DDSM.
  • MAMMOGRAMTYPE is the type of mammograms to use. Can be either calc, mass or all. Defaults to all.
  • MODEL is the model to use. Must be either VGG-common, VGG, ResNet, Inception, DenseNet, MobileNet or CNN.
  • RUNMODE is the mode to run in (train or test). Default value is train.
  • LEARNING_RATE is the optimiser's initial learning rate when training the model during the first training phase (frozen layers). Defaults to 0.001. Must be a positive float.
  • BATCHSIZE is the batch size to use when training the model. Defaults to 2. Must be a positive integer.
  • MAX_EPOCH_FROZEN is the maximum number of epochs in the first training phase (with frozen layers). Defaults to 100.
  • MAX_EPOCH_UNFROZEN is the maximum number of epochs in the second training phase (with unfrozen layers). Defaults to 50.
  • -roi is a flag to use versions of the images cropped around the ROI. Only usable with the mini-MIAS dataset. Defaults to False.
  • -v is a flag controlling verbose mode, which prints additional statements for debugging purposes.
  • NAME is the name of the experiment being tested (used for saving plots and model weights). Defaults to an empty string.
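
For example, the following invocation (flag values are illustrative) trains a MobileNetV2-based model on the mass samples of the CBIS-DDSM dataset with a batch size of 8:

python3 main.py -d CBIS-DDSM -mt mass -m MobileNet -r train -lr 0.001 -b 8 -e1 100 -e2 50 -n mobilenet_mass_example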

Dataset installation

mini-MIAS dataset

  • This example will use the mini-MIAS dataset. After cloning the project, navigate to the data/mini-MIAS directory (there should be 3 files in it).

  • Create images_original and images_processed directories in this directory:

cd data/mini-MIAS/
mkdir images_original
mkdir images_processed
  • Move to the images_original directory and download the raw un-processed images:
cd images_original
wget http://peipa.essex.ac.uk/pix/mias/all-mias.tar.gz
  • Unzip the dataset then delete all non-image files:
tar xvzf all-mias.tar.gz
rm -rf *.txt 
rm -rf README 
  • Move up one level and into the images_processed directory, then create 3 new directories there (benign_cases, malignant_cases and normal_cases):
cd ../images_processed
mkdir benign_cases
mkdir malignant_cases
mkdir normal_cases
  • Now run the Python script to process the dataset and render it usable with TensorFlow and Keras:
python3 ../../../src/dataset_processing_scripts/mini-MIAS-initial-pre-processing.py

DDSM and CBIS-DDSM datasets

These datasets are very large (exceeding 160GB) and more complex to use than the mini-MIAS dataset. They were downloaded by the University of St Andrews School of Computer Science computing officers onto BigTMP, a 15TB filesystem mounted on the CentOS 7 lab machines with NVIDIA GPUs that is usually used for storing large working data sets. The download process for these datasets is therefore not covered in these instructions.

The generated CSV files to use these datasets can be found in the /data/CBIS-DDSM directory, but the mammograms will have to be downloaded separately. The DDSM dataset can be downloaded here, while the CBIS-DDSM dataset can be downloaded here.
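
As a rough sketch of how such a CSV could be used to split samples between masses and calcifications (mirroring the MAMMOGRAMTYPE argument), the snippet below uses pandas; the file name and column name are hypothetical, not the actual schema of the generated files:

import pandas as pd

# Hypothetical file and column names, for illustration only.
df = pd.read_csv("data/CBIS-DDSM/training.csv")
mass_df = df[df["abnormality_type"] == "mass"]
calc_df = df[df["abnormality_type"] == "calc"]
print(f"{len(mass_df)} mass samples, {len(calc_df)} calcification samples")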

License

This project is released under the BSD 2-Clause "Simplified" License.

Code Authors

  • Adam Jaamour
  • Ashay Patel
  • Shuen-Jen Chen

The common pipeline can be found at DOI 10.5281/zenodo.3975092

Contact

