nexer8 / myocardial_infraction_diagnosis_with_k-nn

Myocardial Infarction Diagnosis using k-nearest neighbors algorithm (k-NN).

Python 7.42% Jupyter Notebook 92.58%
knn-classification medical-diagnosis machine-learning

Myocardial Infarction Diagnosis using k-nearest neighbors algorithm (k-NN)

Authors

Problem Description

The medical problem addressed here is the use of an artificial intelligence method, the k-NN algorithm, to aid in diagnosing heart attacks. The solution is built from training data describing patients' medical profiles. Based on this data, the program either links a patient's symptoms and profile to a specific heart disease or reports that the symptoms cannot be linked to such a cause. This makes the task a multi-class classification problem.

About the Dataset

Heart-related diseases are classified from features in the training data that describe the patient's medical history and health condition. The aggregate dataset contains five classes and 901 records in total. During preprocessing, some features are omitted based on a feature ranking computed with the chi-squared test. To optimize the model, the hyperparameters specific to the k-NN classifier are tuned.
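The chi-squared feature ranking mentioned above can be sketched as follows. This is a minimal, dependency-free illustration and not the repository's code: it assumes categorical feature values, uses the textbook Pearson statistic on a feature-versus-class contingency table, and the names `chi2_statistic` and `rank_features` as well as the toy data are hypothetical.

```python
from collections import Counter


def chi2_statistic(feature_values, labels):
    """Pearson chi-squared statistic for one categorical feature vs. the class label."""
    n = len(labels)
    observed = Counter(zip(feature_values, labels))  # contingency table cells
    value_totals = Counter(feature_values)           # row totals
    class_totals = Counter(labels)                   # column totals
    stat = 0.0
    for value, value_count in value_totals.items():
        for label, class_count in class_totals.items():
            expected = value_count * class_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def rank_features(rows, labels):
    """Return feature indices sorted from most to least class-dependent."""
    n_features = len(rows[0])
    scores = [
        chi2_statistic([row[j] for row in rows], labels) for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Toy data (hypothetical): feature 0 follows the label, feature 1 is independent of it.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = ["mi", "mi", "other", "other"]
ranking = rank_features(rows, labels)
print(ranking)
```

Keeping only the first n indices of such a ranking gives the "best n features" subsets that the experiments iterate over.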

Data Distribution

The classes (diagnoses) and their sizes (the number of patients in each) are listed below. All classes relate to the diagnosis of heart disease.

  • Pain of non-heart origin - 230
  • Angina pectoris - 142
  • Angina pectoris (Prinzmetal variant) - 68
  • Myocardial infarction (transmural) - 263
  • Myocardial infarction (subendocardial) - 198

Experiments

  • Evaluation of the classifier using 5 times repeated 2-fold cross-validation.
  • Classification quality measured as the frequency of correct diagnoses (accuracy) on the test set.
  • Use of 3 different k values: 1, 5 and 10.
  • Use of 2 different distance measures: Manhattan and Euclidean.
  • Tests were conducted for a varying number of features, beginning with one - the best according to the calculated ranking - and then increasing one by one.
  • Conducting a statistical analysis of the obtained results.
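The protocol listed above can be sketched without any dependencies: k-NN with Manhattan and Euclidean distances, scored by 5 times repeated 2-fold cross-validation for k in 1, 5 and 10. This is not the repository's code. The function names, the plain (non-stratified) random splits, the tie-breaking rule and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train_X, train_y, query, k, distance):
    """Majority vote among the k training points closest to `query`.

    Ties go to the label that appears first among the nearest neighbors.
    """
    nearest = sorted(
        range(len(train_X)), key=lambda i: distance(train_X[i], query)
    )[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def repeated_two_fold_accuracy(X, y, k, distance, repeats=5, seed=0):
    """Mean accuracy over `repeats` random 2-fold splits (2 * repeats scores)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        half = len(indices) // 2
        folds = [indices[:half], indices[half:]]
        # Each half serves once as the test set and once as the training set.
        for test_idx, train_idx in (folds, folds[::-1]):
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            correct = sum(
                knn_predict(train_X, train_y, X[i], k, distance) == y[i]
                for i in test_idx
            )
            scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two well-separated clusters of 20 points each.
X = [(i % 5, i // 5) for i in range(20)]
X += [(50 + i % 5, 50 + i // 5) for i in range(20)]
y = ["a"] * 20 + ["b"] * 20

results = {
    (name, k): repeated_two_fold_accuracy(X, y, k, dist)
    for name, dist in (("manhattan", manhattan), ("euclidean", euclidean))
    for k in (1, 5, 10)
}
best = max(results, key=results.get)
print(best, results[best])
```

To vary the number of features as in the experiments, the same loop would be repeated on the first n columns of a feature ranking, for n starting at 1 and increasing one by one.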

Results

The best average accuracy (the highest frequency of correct solutions) obtained from the k-NN algorithm is 0.732. It was obtained for a model with the following parameters:

  • metric: Manhattan,
  • number of neighbors: 10,
  • number of features: 58.

Manhattan Metric

Plot of the frequency of correct diagnoses depending on the number of features for the Manhattan distance metric.

Euclidean Metric

Plot of the frequency of correct diagnoses depending on the number of features for the Euclidean distance metric.

Metrics Comparison

Plot of the frequency of correct diagnoses against the number of features, comparing the Manhattan and Euclidean metrics, each at its best number of neighbors.

High Scores Ranking Report

Class                                    precision  recall  f1-score
Angina pectoris                          0.69       0.56    0.62
Angina pectoris (Prinzmetal variant)     0.07       0.29    0.11
Myocardial infarction (subendocardial)   0.52       0.79    0.63
Myocardial infarction (transmural)       0.91       0.72    0.80
Pain of non-heart origin                 0.93       0.87    0.90
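As a quick consistency check on the report above, the F1 score is the harmonic mean of precision and recall, so the third column can be recomputed from the first two. The sketch below copies the reported values; the helper name `f1_score` is an assumption for illustration and not taken from the repository.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the report above.
report = {
    "Angina pectoris": (0.69, 0.56, 0.62),
    "Angina pectoris (Prinzmetal variant)": (0.07, 0.29, 0.11),
    "Myocardial infarction (subendocardial)": (0.52, 0.79, 0.63),
    "Myocardial infarction (transmural)": (0.91, 0.72, 0.80),
    "Pain of non-heart origin": (0.93, 0.87, 0.90),
}

recomputed = {
    name: round(f1_score(p, r), 2) for name, (p, r, _) in report.items()
}
for name, value in recomputed.items():
    print(f"{name}: reported {report[name][2]:.2f}, recomputed {value:.2f}")
```

The recomputed values agree with the reported column to two decimal places. The low F1 for the Prinzmetal variant follows directly from its precision of 0.07.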

Remarks

The Manhattan and Euclidean plots show how the frequency of correct diagnoses changes with the number of features and the value of k for each metric.

For both the Euclidean and Manhattan metrics, the highest accuracy was obtained with k (the number of neighbors) equal to 10. In both cases the lowest accuracy occurred for k equal to 1, with the results for k equal to 5 falling in between. Both figures suggest that, for this data set and the tested values, a larger k gave a higher frequency of correct diagnoses.

The Euclidean plot shows that, for the Euclidean metric and regardless of the value of k, accuracy rose up to a certain number of features (14) and then gradually declined. The Manhattan metric does not follow this pattern (see the Manhattan plot): it copes very well with a larger number of dimensions, and its best result was obtained with 58 features.

The full report is available in Polish in raport.pdf.

Contributors

nexer8, olaziobrowska
