Diabetes Prediction Project

This project predicts whether a person has diabetes from demographic and clinical features such as age, BMI, HbA1c level, and blood glucose level. It includes the dataset file diabetes_prediction_dataset.csv and the Jupyter Notebook notebook.ipynb, which contains the code and analysis.

Dataset

The dataset file diabetes_prediction_dataset.csv contains 100,000 rows and 9 columns with the following column names:

gender: Three unique values - male, female, other.
age: Age of the individual.
hypertension: Binary value (0 or 1) indicating the presence of hypertension.
heart_disease: Binary value (0 or 1) indicating the presence of heart disease.
smoking_history: Six unique values - never, no info, current, former, ever, not current.
bmi: Body Mass Index (BMI) of the individual.
HbA1c_level: Level of HbA1c (glycated hemoglobin) in the blood.
blood_glucose_level: Blood glucose level of the individual.
diabetes: Binary value (0 or 1) indicating the presence of diabetes.
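
A quick way to confirm that a local copy of the CSV matches this description is to load it with pandas and check the columns. The sketch below assumes pandas is installed; check_schema and the constants are illustrative names, not part of the project. It only counts distinct category values, because the exact capitalization in the file (for example "Male" vs. "male") may differ from the list above.

```python
import pandas as pd

# Column names and value rules taken from the dataset description above.
EXPECTED_COLUMNS = [
    "gender", "age", "hypertension", "heart_disease", "smoking_history",
    "bmi", "HbA1c_level", "blood_glucose_level", "diabetes",
]
BINARY_COLUMNS = ["hypertension", "heart_disease", "diabetes"]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame matches the description."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems
    for col in BINARY_COLUMNS:
        # tolist() converts NumPy scalars to plain Python values for readable messages.
        bad = sorted(set(df[col].dropna().tolist()) - {0, 1})
        if bad:
            problems.append(f"{col} has non-binary values: {bad}")
    if df["gender"].nunique() > 3:
        problems.append("gender has more than three distinct values")
    if df["smoking_history"].nunique() > 6:
        problems.append("smoking_history has more than six distinct values")
    return problems


# Usage (assumes the CSV is in the working directory):
# df = pd.read_csv("diabetes_prediction_dataset.csv")
# print(df.shape)          # the description above states 100,000 rows and 9 columns
# print(check_schema(df))  # [] if the file matches the description
```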

Notebook

The Jupyter Notebook file notebook.ipynb contains the code and analysis for the diabetes prediction project. Here is an overview of the steps performed in the notebook:

Importing required libraries.
Importing the diabetes_prediction_dataset.csv file.
Removing duplicate rows from the dataset.
Data visualization:
- Count plot of individuals by smoking history.
- Count plot of individuals by smoking history and diabetes status.
- Count plot of individuals by gender.
- Count plot of individuals by gender and diabetes status.
- Histogram of the age distribution.
- Box plot of age by diabetes status.
- Count plot of individuals by hypertension and diabetes status.
- Count plot of individuals by heart disease and diabetes status.
- Box plot of BMI by diabetes status.
- Box plot of HbA1c level by diabetes status.
- Box plot of blood glucose level by diabetes status.
- Correlation heatmap.
Performing one-hot encoding on the gender and smoking_history columns.
Concatenating the encoded columns with other columns (age, hypertension, heart_disease, bmi, HbA1c_level, blood_glucose_level) and saving the resulting dataframe in the variable X.
Normalizing X using MinMaxScaler.
Defining the target variable y as df['diabetes'].
Balancing the classes with SMOTE (Synthetic Minority Over-sampling Technique), since after duplicate removal there are 87,664 non-diabetic (0) records and only 8,482 diabetic (1) records.
Defining a dictionary of algorithms in the algos variable.
Training the algorithms with different hyperparameters and saving the model, best score, and best parameters in the scores variable.
Converting the scores into a dataframe.
Splitting X and y into training, test, and validation sets using train_test_split.
Training a Random Forest Classifier with n_estimators=100 and criterion='entropy'.
Calculating the accuracy score.
Predicting labels for the held-out data X_valid and saving the predictions in y_pred.
Creating a confusion matrix and heatmap of the confusion matrix.
Creating a classification report.
Applying PCA (Principal Component Analysis) to X for dimensionality reduction.
Retraining the Random Forest Classifier with the same parameters on the reduced data.
Creating a confusion matrix and classification report for the model trained on the reduced dimensions.
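
The preprocessing steps above (deduplication, one-hot encoding, concatenation, min-max scaling, and SMOTE balancing) can be sketched as follows. This is an illustrative reconstruction, not the notebook's code: build_features and smote_oversample are hypothetical names, and smote_oversample is a simplified from-scratch version of SMOTE built on NumPy and scikit-learn's NearestNeighbors. On the full dataset, the imbalanced-learn package's SMOTE class (via fit_resample) is the usual choice.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

CATEGORICAL = ["gender", "smoking_history"]
NUMERIC = ["age", "hypertension", "heart_disease", "bmi",
           "HbA1c_level", "blood_glucose_level"]


def build_features(df: pd.DataFrame):
    """Drop duplicate rows, one-hot encode, concatenate, and min-max scale."""
    df = df.drop_duplicates()
    encoded = pd.get_dummies(df[CATEGORICAL], dtype=float)  # e.g. gender_female, ...
    X = pd.concat([encoded, df[NUMERIC]], axis=1)
    X_scaled = MinMaxScaler().fit_transform(X)  # every column mapped to [0, 1]
    y = df["diabetes"].to_numpy()
    return X_scaled, y, list(X.columns)


def smote_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (illustration only): add synthetic minority samples
    until both classes have the same count.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority-class neighbours.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_needed = counts.max() - counts.min()
    X_min = X[y == minority]
    k = min(k, len(X_min) - 1)
    # Ask for k + 1 neighbours because the first one returned is normally the point itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    base = rng.integers(0, len(X_min), size=n_needed)
    chosen = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    synthetic = X_min[base] + gap * (X_min[chosen] - X_min[base])
    X_res = np.vstack([X, synthetic])
    y_res = np.concatenate([y, np.full(n_needed, minority)])
    return X_res, y_res


# Usage (assumes the CSV is in the working directory):
# X, y, columns = build_features(pd.read_csv("diabetes_prediction_dataset.csv"))
# X_bal, y_bal = smote_oversample(X, y)
```

Because each synthetic point is an interpolation between two real diabetic records, the minority class grows to match the non-diabetic count without simply repeating existing rows.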

Please refer to the notebook.ipynb file for detailed code implementation and further analysis of the diabetes prediction project.
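
The model-selection and Random Forest evaluation steps can be sketched with scikit-learn's GridSearchCV, train_test_split, and metrics functions. The contents of algos below are hypothetical (the README does not list which algorithms or hyperparameter grids the notebook uses), and the sketch uses a single held-out split rather than separate test and validation sets. Only the final classifier settings, n_estimators=100 and criterion='entropy', come from the description above.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical candidate set; the notebook's actual algos dictionary may differ.
algos = {
    "logistic_regression": {
        "model": LogisticRegression(max_iter=1000),
        "params": {"C": [0.1, 1.0, 10.0]},
    },
    "random_forest": {
        "model": RandomForestClassifier(random_state=0),
        "params": {"n_estimators": [50, 100], "criterion": ["gini", "entropy"]},
    },
}


def search_models(X, y, algos, cv=3):
    """Grid-search each algorithm and tabulate its best score and parameters."""
    scores = []
    for name, spec in algos.items():
        gs = GridSearchCV(spec["model"], spec["params"], cv=cv)
        gs.fit(X, y)
        scores.append({"model": name, "best_score": gs.best_score_,
                       "best_params": gs.best_params_})
    return pd.DataFrame(scores)


def evaluate_random_forest(X, y, n_components=None, seed=0):
    """Train RandomForest(n_estimators=100, criterion='entropy') and evaluate it
    on a held-out split, optionally after reducing X with PCA."""
    if n_components is not None:
        X = PCA(n_components=n_components).fit_transform(X)
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                                 random_state=seed)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_valid)
    return {
        "accuracy": accuracy_score(y_valid, y_pred),
        "confusion_matrix": confusion_matrix(y_valid, y_pred),
        "report": classification_report(y_valid, y_pred),
    }


# Usage, with X and y being the scaled and balanced feature matrix and labels:
# print(search_models(X, y, algos))
# print(evaluate_random_forest(X, y)["report"])
# print(evaluate_random_forest(X, y, n_components=5)["report"])
```

This sketch fits PCA on all of X before splitting, mirroring the order of the steps above; fitting PCA (and SMOTE) on the training split only would keep information from the held-out rows out of training.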

Deployment Model

For deployment, we trained a Random Forest Classifier on the dataset and saved it as model.pickle so that the web application can load it to make predictions.

You can find the code for training the model in the deployment_model.ipynb notebook.
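
Saving and reloading the trained model can be done with Python's standard pickle module. This is a minimal sketch: the helper names save_model and load_model are hypothetical, and only the file name model.pickle comes from the text above.

```python
import pickle


def save_model(model, path="model.pickle"):
    """Serialize a fitted estimator to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path="model.pickle"):
    """Load a previously pickled estimator, e.g. when the web application starts."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Load the pickle with the same scikit-learn version that was used to train it, and only unpickle files from a trusted source, since unpickling can execute arbitrary code.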

Flask Web Application

We have created a Flask web application for diabetes prediction using the trained Random Forest Classifier model. The application includes the following files:

app.py: This is the main Flask application file that handles routing and prediction.
requirements.txt: Contains the list of Python packages and dependencies required to run the Flask application.
templates directory: Contains the HTML templates for the web pages:
- index.html: The main page for entering user data and getting predictions.
- true.html: Displayed when the prediction result is "Diabetic."
- false.html: Displayed when the prediction result is "Non-Diabetic."
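
A minimal sketch of how app.py could connect these templates to the saved model. It is a hypothetical reconstruction, not the project's actual file: the /predict route, the form field names in FEATURE_ORDER, and the helper functions are assumptions, and FEATURE_ORDER must match the columns (and any scaling) that the saved model was trained on.

```python
import pickle

from flask import Flask, render_template, request

# Hypothetical form fields; they must match, in order, the feature columns
# the pickled model expects.
FEATURE_ORDER = ["age", "hypertension", "heart_disease", "bmi",
                 "HbA1c_level", "blood_glucose_level"]

app = Flask(__name__)
_model = None


def get_model(path="model.pickle"):
    """Load the pickled model on first use and cache it for later requests."""
    global _model
    if _model is None:
        with open(path, "rb") as f:
            _model = pickle.load(f)
    return _model


def predict_from_form(model, form) -> int:
    """Convert submitted form values to floats in training order; return 0 or 1."""
    row = [[float(form[name]) for name in FEATURE_ORDER]]
    return int(model.predict(row)[0])


@app.route("/", methods=["GET"])
def index():
    return render_template("index.html")


@app.route("/predict", methods=["POST"])
def predict():
    result = predict_from_form(get_model(), request.form)
    return render_template("true.html" if result == 1 else "false.html")
```

With a Flask version that supports the --app option (2.2 or later), this sketch can be started with flask --app app run; alternatively, add an app.run() call under an if __name__ == "__main__": guard so that python app.py works.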

To run the Flask web application, install the dependencies listed in requirements.txt (for example, with pip install -r requirements.txt) and then run python app.py.

Feel free to explore the web application and make predictions based on the trained model.
