Giter Site home page Giter Site logo

datasciencevishal / breast_cancer_prediction Goto Github PK

View Code? Open in Web Editor NEW
0.0 1.0 0.0 631 KB

Breast_Cancer_Prediction using XGBoostClassifier & AdaboostClassifier

Jupyter Notebook 100.00%
adaboost-classifier eda gridsearchcv hyperparameter-tuning xgboost-classifier

breast_cancer_prediction's Introduction

Breast Cancer Wisconsin (Diagnostic) Dataset Analysis and Classification

Introduction

The Breast Cancer Wisconsin (Diagnostic) dataset is a classic dataset used for machine learning classification tasks. It contains features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass, and the target variable indicates whether the mass is malignant or benign. This dataset is widely used for developing and testing classification algorithms in the field of medical diagnostics.

Dataset Overview

  • Source: The dataset is obtained from the load_breast_cancer function in the sklearn.datasets module.
  • Number of Instances: 569
  • Number of Attributes: 30 numeric, predictive attributes
  • Attribute Information: Features include measures of radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension, computed for each image.
  • Target Variable: The target variable indicates the class of the tumor:
    • WDBC-Malignant
    • WDBC-Benign
  • Missing Attribute Values: None
  • Class Distribution: 212 malignant, 357 benign

Problem Statement

The goal of this project is to develop a classification model to predict whether a breast mass is malignant or benign based on the provided features. The main objective is to accurately classify tumors to assist medical professionals in diagnosis and treatment decisions.

Libraries Used

  • NumPy: For numerical computations.
  • Pandas: For data manipulation and analysis.
  • XGBoost: For implementing the XGBoost classifier model.
  • Sklearn: For data preprocessing, model evaluation, and hyperparameter tuning.
  • Matplotlib and Seaborn: For data visualization.

Approach

  1. Data Loading: Load the breast cancer dataset using the load_breast_cancer function.
  2. Exploratory Data Analysis (EDA): Understand the distribution of features, visualize relationships between variables, and identify any patterns or trends.
  3. Data Preprocessing: Handle any missing values and perform feature scaling if required.
  4. Model Building: Implement the XGBoost classifier model for breast cancer classification.
  5. Hyperparameter Tuning: Fine-tune model parameters using GridSearchCV to optimize model performance.
  6. Model Evaluation: Assess model performance using metrics such as AUC score, accuracy, precision, recall, and F1 score.

Findings

After conducting EDA and hyperparameter tuning using GridSearchCV, the XGBoost classifier model achieved an AUC score of 100% on the training dataset and 98.87% on the test dataset. This indicates that the model can effectively differentiate between malignant and benign tumors with high accuracy.

Conclusion

The XGBoost classifier model developed in this project demonstrates promising results in accurately classifying breast masses as malignant or benign. Medical professionals can leverage this model to assist in diagnosing breast cancer and making informed treatment decisions.

Further analysis and validation using additional datasets can enhance the robustness and generalization capabilities of the model.

For detailed implementation and code, refer to the Jupyter notebook provided in this repository.

Feel free to explore further and contribute to the enhancement of the project!

breast_cancer_prediction's People

Contributors

datasciencevishal avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.