ierolsen / deep-dive-into-breast-cancer-dataset


Breast cancer classification with advanced machine learning techniques.


Deep Dive into Breast Cancer

About a year ago, I practised on this dataset. Back then I had only just started with data science, so there were methods I did not know and had never applied, such as NCA and PCA.


Explanations about Graphs

1-) Box Plots

[Figure: box_plot_before_stand] This is the box plot drawn before standardization. It shows that the data is not on a uniform scale: some columns hold very large values and others very small ones, which is why some boxes and whiskers stretch into long lines on the graph.

[Figure: box_plot_after_stand] This is the same plot after standardization. The graph looks much more homogeneous: the median values are clearly visible and the long lines from the first plot are gone. This is why we use standardization. Without it, features with large values dominate the rest, and a distance-based model such as KNN struggles to learn from the smaller-scale features.
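To make the effect of standardization concrete, here is a minimal sketch. It assumes scikit-learn's bundled copy of the Wisconsin breast cancer data (`load_breast_cancer`) rather than the Kaggle CSV used in the notebook, and it scales the whole dataset at once purely for illustration; in a real workflow the scaler would be fitted on the training split only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

# Before: feature ranges differ by orders of magnitude.
raw_ranges = X.max(axis=0) - X.min(axis=0)
print("largest / smallest raw range:", raw_ranges.max() / raw_ranges.min())

# After: every column has mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X)
print("max |column mean|:", np.abs(X_scaled.mean(axis=0)).max())
print("column std min/max:", X_scaled.std(axis=0).min(), X_scaled.std(axis=0).max())
```

The ratio printed for the raw data is what produces the long lines in the first box plot; after scaling, all columns share the same spread.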

2-) Pair Plots

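[Figures: comp, pair_plot] As you can see, standardization does not affect the pair plot. Why? Because a pair plot shows the relationships (correlations) between features and the shape of each distribution, such as its skewness. Standardization only shifts and rescales each column, so the axis values change but the correlations and the skewness stay the same.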
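The claim about pair plots can be checked numerically. The sketch below again assumes scikit-learn's bundled dataset in place of the notebook's CSV, and compares the correlation matrix and per-feature skewness before and after `StandardScaler`.

```python
import numpy as np
from scipy.stats import skew
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the Kaggle CSV.
X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Pearson correlation is invariant to shifting and positive rescaling.
corr_same = np.allclose(np.corrcoef(X, rowvar=False),
                        np.corrcoef(X_scaled, rowvar=False))

# Skewness is a standardized moment, so it is unchanged as well.
skew_same = np.allclose(skew(X, axis=0), skew(X_scaled, axis=0))

print("correlations unchanged:", corr_same)
print("skewness unchanged:", skew_same)
```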

3-) Principal Component Analysis (PCA)

[Figure: pca_classes] PCA keeps as much information (variance) as possible while reducing the dimensionality of the data. We used it to transform the data from 30 dimensions down to 2, and we then train our KNN algorithm on these 2 dimensions. On the graph, you can see a rough border that separates classes 0 and 1.

In the next graph [Figure: pca], our KNN_Best_Params() function selects n_neighbors: 9 as the best parameter. Even so, as we can see, the KNN model still misclassifies some points.
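A rough sketch of this PCA plus KNN step follows. The notebook's `KNN_Best_Params()` is not shown here, so `GridSearchCV` is used as a stand-in; the split size, random seed, search range, and dataset loader are all assumptions, and the selected `n_neighbors` may differ from the 9 reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, 70/30 stratified split, fixed seed.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Fit the scaler and PCA on the training split only, then apply to both.
scaler = StandardScaler().fit(X_train)
pca = PCA(n_components=2).fit(scaler.transform(X_train))
X_train_pca = pca.transform(scaler.transform(X_train))
X_test_pca = pca.transform(scaler.transform(X_test))
print("variance kept by 2 components:", pca.explained_variance_ratio_.sum())

# Stand-in for KNN_Best_Params(): cross-validated search over n_neighbors.
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": list(range(1, 31))}, cv=5)
grid.fit(X_train_pca, y_train)
test_acc_pca = grid.score(X_test_pca, y_test)

print("best params:", grid.best_params_)
print("test accuracy:", test_acc_pca)
```

Some misclassified points are expected here, because two components cannot keep all the information in the original 30 features.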

4-) Neighborhood Components Analysis (NCA)

[Figure: nca classes] "Neighborhood Components Analysis is a distance metric learning algorithm which aims to improve the accuracy of nearest neighbors classification compared to the standard Euclidean distance." (from sklearn)

[Figure: nca]
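For reference, this is roughly how NCA can be applied with scikit-learn to get a 2-dimensional embedding. Unlike PCA, NCA is supervised, so the labels are passed to `fit_transform`. The dataset loader and the random seed are assumptions, not taken from the notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# NCA learns a linear map that pulls same-class neighbors together.
nca = NeighborhoodComponentsAnalysis(n_components=2, random_state=42)
X_nca = nca.fit_transform(X_scaled, y)  # labels are required

print("embedded shape:", X_nca.shape)
print("learned transformation shape:", nca.components_.shape)
```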

I want to point out something on the graph. You might think the model is overfitted, but it actually is not.

[Figure: blue_border] This area looks like overfitting, but it is not. Our KNN_Best_Params() function selected n_neighbors: 1. With a single neighbor, every location near this blue point takes its label from that point alone, so the point creates a small blue area around itself.

This is the model evaluation graph: [Figure: model_evaluation]
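The small isolated region can be reproduced on toy data. In this sketch the five points are hypothetical, with class 1 playing the role of the lone blue point; it shows that the island exists with `n_neighbors=1` and disappears with `n_neighbors=3`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one class-1 point surrounded by class-0 points.
X_toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
y_toy = np.array([0, 0, 0, 0, 1])

# A query location right next to the lone class-1 point.
query = np.array([[0.55, 0.5]])

knn_1 = KNeighborsClassifier(n_neighbors=1).fit(X_toy, y_toy)
knn_3 = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)

pred_1 = knn_1.predict(query)  # decided by the single nearest point
pred_3 = knn_3.predict(query)  # decided by a majority of three points

print("n_neighbors=1:", pred_1)  # [1]
print("n_neighbors=3:", pred_3)  # [0]
```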


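I need to explain one more thing about this project, because it is not clear from the notebook alone. After applying NCA, the results are Training Acc: 1.0 and Test Acc: 0.99. You might think the model is overfitted, but the test accuracy shows that it is not. With n_neighbors: 1, a training accuracy of 1.0 is expected, because each training point is its own nearest neighbor. What matters is the gap between training and test accuracy: if the test accuracy were noticeably lower, say around 0.90, we could say the model is overfitted.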
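The train versus test comparison can be sketched as a single pipeline. The split, seed, and dataset loader are assumptions, so the exact accuracies will likely differ from the 1.0 and 0.99 reported above; the point is only how the gap is measured.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: bundled dataset, 70/30 stratified split, fixed seed.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Scale, embed with NCA into 2 dimensions, then classify with 1-NN.
model = make_pipeline(
    StandardScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=42),
    KNeighborsClassifier(n_neighbors=1),
)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

print("training accuracy:", train_acc)
print("test accuracy:", test_acc)
print("gap:", train_acc - test_acc)
```

A small gap suggests the model generalizes; a large gap would be the warning sign for overfitting.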
