ML-model-for-Diabetes.

A machine learning model to classify whether patients in a dataset have diabetes.

Problem Statement

Given a dataset of patients' information, predict whether a patient has diabetes.

Dataset

The dataset is called "Pima_Indian_Diabetes" and is provided as a CSV file. It has 9 columns: pregnancies, glucose level, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, age and the outcome (diabetic or not). It contains records for more than 750 patients.

The histograms shown are in the following order: Pregnancies, Glucose, BloodPressure, Diabetes Pedigree Function, BMI, Insulin, Skin Thickness and Age.

Installation Requirements

pip3 install numpy
pip3 install pandas
pip3 install scikit-learn 

Data Preprocessing

We added a new column named "Patient_ID" to identify each patient.

The provided dataset had many missing (NULL) values. A value can be missing for one of two reasons: either it was not recorded, or it does not exist at all.

Pregnancies

For NULL values in this column, we assume the patients are male and fill the values with 0.

All other columns

The NULL values in all the other columns were filled according to the following scheme:

  • If a patient is diabetic, each NULL value for that patient is replaced by the mean of the corresponding column, computed over the diabetic patients only.
  • Similarly, if a patient is non-diabetic, each NULL value for that patient is replaced by the mean of the corresponding column, computed over the non-diabetic patients only.
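
The scheme above can be sketched with pandas as follows. This is a minimal illustration, not the code in Model.py: the helper name fill_with_class_means, the target column name "Outcome", and the toy values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fill_with_class_means(df, target="Outcome", skip=("Pregnancies",)):
    """Fill NaNs in each column with the mean of that column taken over
    the rows that share the same target value (diabetic or non-diabetic).

    `target` and `skip` are hypothetical defaults; Pregnancies is skipped
    because it is handled separately (filled with 0).
    """
    filled = df.copy()
    feature_cols = [c for c in df.columns if c != target and c not in skip]
    for col in feature_cols:
        # Per-row class mean, aligned with the original index; NaNs are
        # ignored when the mean is computed.
        class_means = filled.groupby(target)[col].transform("mean")
        filled[col] = filled[col].fillna(class_means)
    return filled


# Tiny made-up frame for illustration only, not real patient data.
toy = pd.DataFrame({
    "Glucose": [150.0, np.nan, 170.0, 90.0, np.nan, 100.0],
    "BMI": [35.0, 33.0, np.nan, 24.0, 26.0, np.nan],
    "Outcome": [1, 1, 1, 0, 0, 0],
})
filled_toy = fill_with_class_means(toy)
print(filled_toy)
```

Using groupby with transform keeps the result aligned row by row, so each missing entry receives the mean of its own class rather than the overall column mean.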

Model building

  • The features we use are pregnancies, glucose level, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age; the target is the outcome.
  • We then split the dataset into training and testing subsets using a split ratio of 0.3.
  • Since the target outcome can only be 0 or 1, we apply logistic regression to the feature vectors, repeating the procedure for 1000 different shuffles of the data.
  • For each shuffle, we fit the logmodel on the training data and use the predict function in scikit-learn to predict the outcome on the test data.
  • We also use the accuracy-measuring tools of scikit-learn to measure the accuracy of our model.
  • Running instructions:
python3 Model.py
  • The final accuracies are:
    • Maximum accuracy: 84.84%
    • Mean accuracy: 76.32%
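
The model-building steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of Model.py: the helper name evaluate_over_shuffles is hypothetical, the ratio of 0.3 is interpreted as test_size=0.3 (30% of the rows held out for testing), max_iter=1000 is an added setting to help the solver converge, and synthetic data from make_classification stands in for the CSV so the block runs on its own. The numbers it prints are therefore not the accuracies reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def evaluate_over_shuffles(X, y, n_shuffles=1000, test_size=0.3):
    """Fit logistic regression on n_shuffles random train/test splits and
    return (maximum accuracy, mean accuracy) over the held-out subsets."""
    scores = []
    for seed in range(n_shuffles):
        # A different random_state gives a different shuffle of the rows.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed, shuffle=True
        )
        logmodel = LogisticRegression(max_iter=1000)
        logmodel.fit(X_train, y_train)
        predictions = logmodel.predict(X_test)
        scores.append(accuracy_score(y_test, predictions))
    return max(scores), float(np.mean(scores))


# Synthetic stand-in with 8 features and a binary target (assumption:
# the real run would pass the 8 feature columns and the outcome column).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Fewer shuffles than the 1000 described above, to keep the demo quick.
max_acc, mean_acc = evaluate_over_shuffles(X_demo, y_demo, n_shuffles=20)
print(f"Maximum accuracy: {max_acc:.2%}")
print(f"Mean accuracy: {mean_acc:.2%}")
```

Reporting both the maximum and the mean over many shuffles shows how much the score depends on the particular split; the mean is the more representative of the two.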

Contributors

  • Gandharv Suri : IMT2017017
  • Advait Lonkar : IMT2017002
  • Mili Goyal : IMT2017513
