In this project, we built models to predict the heart disease rate of U.S. counties from a range of features. Our dataset comes from Kaggle and contains county-level data for the United States: 3,199 data points and 33 variables, grouped into three aspects. The first aspect is the area type, which classifies each county as rural or urban. The second is the county's economic condition, which falls into one of six categories: farming, mining, manufacturing, Federal/State government, recreation, and non-specialized. The third aspect covers various health factors. To find the best predictor, we tried several methods, including KNN, linear regression, random forest, and decision tree, among others. We then compared the models' test scores to determine the best model.
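The train-and-compare workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code. It uses synthetic county data in place of the Kaggle dataset and a hypothetical feature layout: an urban/rural flag, the six economic types one-hot encoded, and one health factor scaled to [0, 1]. It treats R² on a held-out 20% split as the "test score" and uses hand-rolled versions of a mean baseline, linear regression, and KNN rather than a library such as scikit-learn.

```python
import math
import random

# Hypothetical labels for the six economic types named in the README. They are
# one-hot encoded with the first category dropped so the dummies are not
# collinear with the intercept used by linear regression.
ECON_TYPES = ["farming", "mining", "manufacturing", "government",
              "recreation", "nonspecialized"]

def encode(is_urban, econ, health):
    """Turn one county's raw fields into a numeric feature vector."""
    onehot = [1.0 if econ == t else 0.0 for t in ECON_TYPES[1:]]
    return [1.0 if is_urban else 0.0] + onehot + [health]

def make_synthetic_counties(n, rng):
    """Synthetic stand-in for the Kaggle data (NOT real values)."""
    offsets = dict(zip(ECON_TYPES, [0, 15, -10, 5, -20, 10]))
    X, y = [], []
    for _ in range(n):
        is_urban = rng.random() < 0.5
        econ = rng.choice(ECON_TYPES)
        health = rng.random()  # hypothetical health factor in [0, 1]
        urban_effect = 30 if is_urban else 0
        rate = 200 + urban_effect + offsets[econ] + 100 * health + rng.gauss(0, 10)
        X.append(encode(is_urban, econ, health))
        y.append(rate)
    return X, y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    Xb = [[1.0] + row for row in X]  # prepend an intercept column
    d = len(Xb[0])
    XtX = [[sum(r[i] * r[j] for r in Xb) for j in range(d)] for i in range(d)]
    Xty = [sum(r[i] * t for r, t in zip(Xb, y)) for i in range(d)]
    w = solve(XtX, Xty)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit_knn(X, y, k=5):
    """k-nearest-neighbors regression: average the k closest training targets."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / k
    return predict

def fit_mean(X, y):
    """Baseline that always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda x: m

def r2_score(y_true, y_pred):
    """Coefficient of determination on the held-out data."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Shuffle once, then hold out 20% of the counties as the test set.
rng = random.Random(42)
X, y = make_synthetic_counties(300, rng)
idx = list(range(len(X)))
rng.shuffle(idx)
cut = int(0.8 * len(idx))
X_tr, y_tr = [X[i] for i in idx[:cut]], [y[i] for i in idx[:cut]]
X_te, y_te = [X[i] for i in idx[cut:]], [y[i] for i in idx[cut:]]

# Fit every candidate on the training split and score it on the test split.
scores = {}
for name, fit in [("mean baseline", fit_mean), ("linear regression", fit_linear),
                  ("k-nearest neighbors", fit_knn)]:
    model = fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, [model(x) for x in X_te])

best = max(scores, key=scores.get)
print(scores, "best:", best)
```

Random forest or decision tree models from a library would slot into the same fit-then-score loop; the model with the highest held-out score is the one selected.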
minglyubyte / ml-heart-diease-prediction
License: MIT License