logreg_heart_disease's Introduction

Classifying Heart Disease

The purpose of this project is to predict the presence of heart disease with a logistic regression model. We use the "Heart Disease Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The data come from the Cleveland Clinic Foundation, which recorded patient characteristics such as age and chest pain type in order to classify whether an individual has heart disease.

The data set has 14 attributes, 13 predictors plus the target (num), which we will consider in our model:

age: age in years
sex: sex (1 = male; 0 = female)
cp: chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
trestbps: resting blood pressure (in mm Hg on admission to the hospital)
chol: serum cholesterol in mg/dl
fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
restecg: resting electrocardiographic results (0: normal, 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
thalach: maximum heart rate achieved
exang: exercise induced angina (1 = yes; 0 = no)
oldpeak: ST depression induced by exercise relative to rest
slope: the slope of the peak exercise ST segment (1: upsloping, 2: flat, 3: downsloping)
ca: number of major vessels (0-3) colored by fluoroscopy
thal: a blood disorder called thalassemia (3 = normal; 6 = fixed defect; 7 = reversible defect)
num: diagnosis of heart disease, angiographic disease status (0: < 50% diameter narrowing; 1-4: > 50% diameter narrowing)
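
Logistic regression needs a binary target, while num takes values from 0 to 4. The sketch below shows one possible way to attach the 14 attribute names to the data and collapse num into a 0/1 label. It is a minimal illustration rather than the project's actual loading code: it assumes a comma-separated file with no header row and "?" marking missing values, and the names COLUMNS, SAMPLE, load_rows, binarize_target, and the "disease" key are hypothetical. The sample rows are invented for illustration and are not taken from the real data.

```python
import csv
import io

# Attribute names in the order listed above; "num" (last) is the target.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]

# Hypothetical rows for illustration only, NOT real patient records.
# Assumed layout: comma-separated, no header, "?" for a missing value.
SAMPLE = """\
54,1,4,130,250,0,0,150,1,1.5,2,0,7,2
45,0,2,120,210,0,0,170,0,0.0,1,0,3,0
61,1,3,140,260,1,2,140,0,2.0,2,?,6,1
"""


def load_rows(text):
    """Parse headerless CSV text into dicts, mapping '?' to None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if v.strip() == "?" else float(v) for v in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def binarize_target(rows):
    """Add a 'disease' key: 0 when num == 0, 1 when num is 1-4.

    Assumes num itself is never missing.
    """
    for row in rows:
        row["disease"] = int(row["num"] > 0)
    return rows


rows = binarize_target(load_rows(SAMPLE))
print([row["disease"] for row in rows])  # prints [1, 0, 1]
```

To use this on a real file, the text passed to load_rows would come from reading that file instead of SAMPLE. Rows with a missing predictor (None) would still need to be dropped or imputed before fitting a model.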