The purpose of this project is to predict heart disease with a logistic regression model. We use the "Heart Disease Data Set" from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Heart+Disease). The data come from the Cleveland Clinic Foundation, which recorded patient characteristics such as age and chest pain type; we use them to classify whether an individual has heart disease.
We will consider 14 attributes: 13 predictors and the diagnosis (num), which is the target of the model:
- age: age in years
- sex: sex (1 = male; 0 = female)
- cp: chest pain type (1: typical angina, 2: atypical angina, 3: non-anginal pain, 4: asymptomatic)
- trestbps: resting blood pressure (in mm Hg on admission to the hospital)
- chol: serum cholesterol in mg/dl
- fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
- restecg: resting electrocardiographic results (0: normal, 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV), 2: showing probable or definite left ventricular hypertrophy by Estes' criteria)
- thalach: maximum heart rate achieved
- exang: exercise induced angina (1 = yes; 0 = no)
- oldpeak: ST depression induced by exercise relative to rest
- slope: the slope of the peak exercise ST segment (1: upsloping, 2: flat, 3: downsloping)
- ca: number of major vessels (0-3) colored by fluoroscopy
- thal: result of the thallium stress test (3 = normal; 6 = fixed defect; 7 = reversible defect)
- num: diagnosis of heart disease, angiographic disease status (0: < 50% diameter narrowing; 1-4: > 50% diameter narrowing)
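
To make the attribute list concrete, here is a minimal sketch of how rows with these 14 columns could be parsed and how num could be collapsed into a binary target. It is not taken from the notebook. It assumes the file is comma-separated with columns in the order listed above and that missing values are written as "?"; the helper names `parse_rows` and `add_binary_target` and the `disease` key are hypothetical, and the sample rows are made up for illustration.

```python
import csv

# Column order as listed above (assumed to match the data file).
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_rows(lines):
    """Parse comma-separated lines into dicts keyed by COLUMNS.

    Assumption: a missing value is written as "?" and is stored as None.
    """
    records = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(row)}")
        records.append({
            name: (None if value.strip() == "?" else float(value))
            for name, value in zip(COLUMNS, row)
        })
    return records


def add_binary_target(records):
    """Add a 'disease' key: 0 when num is 0, 1 when num is 1-4."""
    for record in records:
        num = record["num"]
        record["disease"] = None if num is None else int(num > 0)
    return records


# Made-up rows for illustration only (not real patient records).
sample_lines = [
    "50,1,4,130,250,0,0,140,1,1.2,2,?,7,3",
    "45,0,2,120,200,0,0,170,0,0.0,1,0,3,0",
]
sample_records = add_binary_target(parse_rows(sample_lines))
for record in sample_records:
    print(record["age"], record["ca"], record["num"], record["disease"])
```

Collapsing num to 0/1 is one common way to fit a binary logistic regression on this data; whether the notebook does the same is not stated here.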
The procedure is described in the attached notebook, classifying_heart_disease.ipynb.
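
For readers unfamiliar with the model, the sketch below shows how a fitted logistic regression turns a patient's features into a probability of heart disease: a weighted sum of the features is passed through the sigmoid function. It is illustrative only and independent of the notebook; the function names, weights, and bias are hypothetical placeholders, not fitted values.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(features, weights, bias):
    """Probability of the positive class for one feature vector."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for two features (say, scaled age and exang).
example_weights = [0.8, 1.5]
example_bias = -1.0
probability = predict_proba([0.5, 1.0], example_weights, example_bias)
print(f"predicted probability: {probability:.3f}")
```

A prediction of heart disease is then typically made when the probability exceeds a chosen threshold such as 0.5.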