Breast-cancer-detection

This is a practical case study of logistic regression. Logistic regression has many applications in data science, but in the world of healthcare, it can really drive life-changing action. In this case study, we apply a logistic regrression model on a real-world dataset and predict whether the tumor is benign(not breast cancer) or malignant(breast cancer) based off its characteristics.

Read the article here.

The different independent variables in the dataset are

Clump thickiness.
Uniformity of cell size.
Uniformity of cell shape.
Marginal adhesion.
Single epithelial cell.
Bares Nuclei.
Bland chromatin.
Normal nucleoli.
Mitoses.

Installation

pip install numpy
pip install matplotlib
pip install pandas

About Dataset

The dataset used for this case study was extracted from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29

Attribute Information

Sample code number: id number
Clump Thickness: 1 - 10
Uniformity of Cell Size: 1 - 10
Uniformity of Cell Shape: 1 - 10
Marginal Adhesion: 1 - 10
Single Epithelial Cell Size: 1 - 10
Bare Nuclei: 1 - 10
Bland Chromatin: 1 - 10
Normal Nucleoli: 1 - 10
Mitoses: 1 - 10
Class: (2 for benign, 4 for malignant)

Steps Involved

Importing the libraries.
Importing the dataset.
Splitting the dataset into training and testing sets.
Training the logistic regression model on the training set.
Predicting the test result.
Creating the confusion matrix.
Calculating the accuracy with k-fold cross validation.

Observation

The result of the confusion matrix are as follows:

84	3
3	47

Where,
True positive = 84
False negative = 3
True negative = 47
False positve = 3

The overall accuaracy of the classification model using k-fold validation was observed to be 96.70 %.

The standard deviation was observed to be 1.97%.

Note* : The dataset filename is "breast_cancer.csv".

maskey71098 / breast-cancer-detection Goto Github PK