Giter Site home page Giter Site logo

breastcancer_detection's Introduction

About

Breast cancer is the most common cancer in American women except for skin cancers and according to the World Cancer Research Fund there were over 2 million new cases in the year 2018. Breast cancer detection is a highly time-consuming task and it is dependent on the expertise of pathologists. With the rapid advancements of Machine Learning and Deep Learning, computer-aided diagnosis (CAD) can be a useful support to pathologists.

Dataset

Spanhol et al. published the Breast Cancer Histopathological Database (BreakHis) in 2016.

tissue_images

From the dataset's webpage:

The Breast Cancer Histopathological Image Classification (BreakHis) is composed of 9,109 microscopic images of breast tumor tissue collected from 82 patients using different magnifying factors (40X, 100X, 200X, and 400X). To date, it contains 2,480 benign and 5,429 malignant samples (700X460 pixels, 3-channel RGB, 8-bit depth in each channel, PNG format).

The dataset BreaKHis is divided into two main groups: benign tumors and malignant tumors. Histologically benign is a term referring to a lesion that does not match any criteria of malignancy – e.g., marked cellular atypia, mitosis, disruption of basement membranes, metastasize, etc. Normally, benign tumors are relatively “innocents”, presents slow growing and remains localized. Malignant tumor is a synonym for cancer: lesion can invade and destroy adjacent structures (locally invasive) and spread to distant sites (metastasize) to cause death.

The dataset is structured as follows:

Magnification Benign Malignant Total
40x 652 1,370 1,995
100x 644 1,437 2,081
200x 623 1,390 2,013
400x 588 1,232 1820
Total 2,480 5,429 7,909

To simplify things, I will only use the images with the 40x magnification in my experiments. That means we will try to train a classifier with only 1,995 images in total. Spanhol et al. (2016) achieved an accuracy of 83.8% on the 40x magnication images with their best classifier. Note that they wrote their paper in 2016, so let's see what accuracy we can achieve with recent Deep Learning developments like transfer learning or data augmentation!

Classification Model

Like in Spanhol et al.'s paper, we will split the dataset in a 70/30 split. That means we have 1397 training images and 598 validation images. We will use a pre-trained Convolution Neural Network (in this case resnet50) to use the benefits of transfer learning. We apply data augmentation as well, since slight rotation or flips of pathology images should be fine. Moreover, thanks to the efficient implementation of the fastai library (like the 1cycle policy of varied learning rates and other optimizations during training time), we can train a top classifier in just a few minutes.

Results

On the 40x magnification set, we achieve an accuracy of 93.31% which is almost 10% higher than in the original paper. Total training time was under 10 minutes.

Confusion matrix:

Confusion_Matrix

Especially in cancer diagnosis, the right measure/metric is of uttermost importance. Accuracy alone is not sufficient to tell if a screening model is good enough or not. Sensitivity (also known as recall) and specificity are two important metrics. According to the WHO:

Sensitivity: the effectiveness of a test in detecting a cancer in those who have the disease.

Specificity: the extent to which a test gives negative results in those that are free of the disease.

A screening test aims to be sure that as few as possible with the disease get through undetected (high sensitivity) and as few as possible without the disease are subject to further diagnostic tests (high specificity).

Our model output the following metrics:

Metric Score in %
Precision 95.3%
Sensitivity 94.8%
Specificity 90.2%
F1 95.0%

Since all of our metrics are about and above 90%, we can say that our classifier has high enough credibility.

Summary

We trained a breast cancer classification model on the "Breast Cancer Histopathological Database (BreakHis)" dataset. On the 40x magnification images (1397 training images and 598 validation images), we achieve an accuracy of 93.31% which is almost 10% higher than in the original paper from 2016. Our other metrics like precision, sensitivity or specificity are also in the 90% range which makes this classifier acceptable enough. Thanks to the efficient fastai libray, we also trained the whole model in less than 10 minutes in total which shows the rapid advancement in Deep Learning in the last years.

breastcancer_detection's People

Contributors

magicaltrap avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.