
Breast Cancer Classification

Breast cancer classification using machine learning techniques has become an essential area of research for improving early detection and diagnosis. This project focuses on developing a reliable model that can accurately differentiate between malignant and benign breast tumors.

In this breast cancer classification project, a Support Vector Machine (SVM) algorithm was employed to develop an accurate model for distinguishing between malignant and benign breast tumors.

The dataset used comprises 569 cases, with 212 cases labeled as malignant and 357 cases labeled as benign. The dataset consists of 30 features, including mean radius, mean texture, mean perimeter, mean area, mean smoothness, mean compactness, mean concavity, and mean concave points, among others.

Exploratory Data Analysis

mean radius mean texture mean perimeter mean area mean smoothness mean compactness mean concavity mean concave points mean symmetry mean fractal dimension ... worst texture worst perimeter worst area worst smoothness worst compactness worst concavity worst concave points worst symmetry worst fractal dimension target
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 ... 17.33 184.60 2019.0 0.1622 0.6656 0.7119 0.2654 0.4601 0.11890 0.0
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 ... 23.41 158.80 1956.0 0.1238 0.1866 0.2416 0.1860 0.2750 0.08902 0.0
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 ... 25.53 152.50 1709.0 0.1444 0.4245 0.4504 0.2430 0.3613 0.08758 0.0
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 ... 26.50 98.87 567.7 0.2098 0.8663 0.6869 0.2575 0.6638 0.17300 0.0
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 ... 16.67 152.20 1575.0 0.1374 0.2050 0.4000 0.1625 0.2364 0.07678 0.0

5 rows × 31 columns
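As a reference, here is a minimal sketch of how such a DataFrame could be built, assuming the Wisconsin Breast Cancer dataset bundled with scikit-learn (its 569 samples, 212 malignant / 357 benign split, and 30 features match the description above):

import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the Wisconsin Breast Cancer dataset (assumed source;
# in the target column 0 = malignant, 1 = benign).
data = load_breast_cancer()

# Combine features and target into a single DataFrame df.
df = pd.DataFrame(data.data, columns=data.feature_names)
df["target"] = data.target

print(df.shape)                      # (569, 31)
print(df["target"].value_counts())   # 357 benign, 212 malignant
df.head()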

Data Visualization

[Figure: distributions of the mean features, split by malignant/benign class]

0 - indicates Malignant --> the life-threatening case

1 - indicates Benign

Some observations:

  • Looking at the distributions of mean radius, mean area, and mean perimeter, we see that Malignant cases tend to be larger than Benign ones.

  • Looking at mean texture, Malignant cases have a higher mean texture than Benign ones.

  • Benign cases have a higher mean smoothness than Malignant ones.

Case count

[Figure: count of Benign vs. Malignant cases]

Some observations:

  • There are more Benign cases than Malignant cases in the dataset.

Correlation

[Figure: correlation heatmap of the features]
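A minimal sketch of how plots like these could be produced, assuming matplotlib and seaborn (neither is shown in the original code) and the df built above:

import matplotlib.pyplot as plt
import seaborn as sns

# Distributions of a few mean features, split by class (0 = malignant, 1 = benign).
for feature in ["mean radius", "mean area", "mean perimeter", "mean texture", "mean smoothness"]:
    sns.kdeplot(data=df, x=feature, hue="target", common_norm=False)
    plt.title(feature)
    plt.show()

# Count of cases per class.
sns.countplot(x="target", data=df)
plt.show()

# Correlation heatmap over all 30 features and the target.
plt.figure(figsize=(18, 12))
sns.heatmap(df.corr(), cmap="coolwarm")
plt.show()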

Model Training

  • Define the matrix of features X and the target y

  • Split into training and testing data

  • Fit the SVM model

# Matrix of features X and target y
# (df is the DataFrame of 30 features plus the "target" column built earlier).
X = df.drop(["target"], axis=1)
y = df["target"]

# Split into training (80%) and testing (20%) data.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=5)

# Fit a Support Vector Machine classifier with default hyperparameters.
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

model1 = SVC()
model1.fit(X_train, y_train)

Model Evaluation

Confusion Matrix

[Figure: confusion matrix of the baseline SVM]

Looking at these results:

  • We have 0 type II errors, i.e. the model did not produce any false negatives.
  • There are 7 type I errors: the model produced 7 false positives.
  • When a cell was malignant, the model correctly identified 41 out of 48 cases; when the cell was benign, it correctly identified all 66.
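
A minimal sketch of how the confusion matrix and the later classification reports can be produced from the fitted model, reusing the variable names from the training code above:

import matplotlib.pyplot as plt
import seaborn as sns

# Predict on the held-out test set.
y_pred = model1.predict(X_test)

# Confusion matrix: rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt="d")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()

# Per-class precision, recall, and f1-score.
print(classification_report(y_test, y_pred))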

Improving Model

  1. Feature scaling (unity-based normalization).

  2. Grid search for SVM parameter optimization (a minimal sketch of both steps follows this list):

    • C parameter: controls the trade-off between classifying training points correctly and having a smooth decision boundary.
      • Small C (loose): makes the cost of misclassification low (soft margin).
      • Large C (strict): makes the cost of misclassification high, forcing the model to fit the training data more strictly, potentially overfitting.
    • Gamma parameter: controls how far the influence of a single training example reaches.
      • Large gamma: close reach (closer data points have more weight).
      • Small gamma: far reach (a more generalized solution).
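
A minimal sketch of both improvement steps, assuming min-max scaling with MinMaxScaler for the unity-based normalization and an illustrative parameter grid for C and gamma (the exact grid values are an assumption, not taken from the original):

from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# 1. Unity-based normalization: rescale each feature to [0, 1].
#    Fit the scaler on the training data only, then apply it to both splits.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# 2. Grid search over C and gamma (illustrative, assumed values).
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [1, 0.1, 0.01, 0.001],
    "kernel": ["rbf"],
}
grid = GridSearchCV(SVC(), param_grid, cv=5, refit=True)
grid.fit(X_train_scaled, y_train)

print(grid.best_params_)
y_pred_grid = grid.predict(X_test_scaled)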

1. Results After Feature Scaling

[Figure: confusion matrix after feature scaling]

precision recall f1-score support
0.0 1.00 0.92 0.96 48
1.0 0.94 1.00 0.97 66
accuracy 0.96 114
macro avg 0.97 0.96 0.96 114
weighted avg 0.97 0.96 0.96 114

2. Results After Grid Search

[Figure: confusion matrix after grid search]

precision recall f1-score support
0.0 1.00 0.92 0.96 48
1.0 0.94 1.00 0.97 66
accuracy 0.96 114
macro avg 0.97 0.96 0.96 114
weighted avg 0.97 0.96 0.96 114

In this case, the grid search parameter optimization did not change the results compared with feature scaling alone.

After these improvements the model makes only 4 type I errors and 0 type II errors.

Conclusion

  • Built a model that can classify breast tumors as Benign or Malignant.
  • The model had a precision of 97%, with only 4 type I errors and 0 type II errors. There is still room for improvement.

What I have learned:

  • How to implement a Support Vector Machine classifier
  • Feature scaling
  • Grid search for parameter optimization
