Prediction model for housing prices in King County, Washington
Overview: This project explores a King County, Washington housing dataset to create features and build a model for predicting sales prices in King County.
Business Aim: Create a model that predicts the sales price of property in King County, Washington. A real estate firm is looking for a model that predicts housing prices to help it better tailor its business objectives.
Approach:
- Data Cleaning - remove outliers and check for any null values
- Data Visualization - serves two purposes in this project: checking the dataset for irregularities, and further analyzing and exploring significant features
- Exploratory Data Analysis (EDA) - statistical analysis to determine the most significant features
- Feature Engineering - after analyzing the existing dataset, new features are created based on statistical significance
- Model Testing - test different linear regression models to determine which one provides the best predictions
- Final Model - fit the final model to the dataset
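As a rough illustration of the data cleaning step, the sketch below counts null values and drops price outliers with an interquartile-range (IQR) rule. The IQR rule, the 1.5 multiplier, the `price` and `sqft_living` column names, the `clean_housing` helper, and the toy data are all assumptions made for illustration; the project does not state which outlier rule was actually used.

```python
import pandas as pd


def clean_housing(df: pd.DataFrame, target: str = "price", k: float = 1.5) -> pd.DataFrame:
    """Drop rows with nulls, then drop rows whose target falls outside the IQR fences.

    Hypothetical helper: the outlier rule and column name are assumptions.
    """
    df = df.dropna()
    q1, q3 = df[target].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return df[df[target].between(lower, upper)].reset_index(drop=True)


# Toy data standing in for the real dataset (one missing price, one extreme price).
toy = pd.DataFrame({
    "price": [300_000, 320_000, 310_000, 305_000, 5_000_000, None],
    "sqft_living": [1500, 1600, 1550, 1520, 9000, 1400],
})

print(toy.isnull().sum())   # null count per column
cleaned = clean_housing(toy)
print(len(cleaned))         # rows remaining after cleaning
```

On the real dataset, the same check could be repeated for other skewed columns such as lot size, though whether to drop or cap outliers there is a judgment call.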
Summary: The King County Housing dataset was analyzed, and some of the significant features for predicting housing sale prices include living-area and lot square footage, whether the home has a basement, the grade and condition of the home, as well as several other features. Using SelectKBest to narrow the set to 200 features and then RFE to reduce it further resulted in my lowest RMSE.
Future: Future work should look into additional features not offered within the dataset, such as economic and educational statistics. Additional time would also allow for further research and testing of the model.
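The two-stage feature selection described in the summary can be sketched with scikit-learn as follows. This is a minimal sketch on synthetic data, not the project's actual code: the `f_regression` scoring function, the feature counts (15, then 8, instead of 200), the train/test split, and the plain `LinearRegression` estimator are assumptions chosen so the example runs quickly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE, SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the engineered housing features (assumption).
X, y = make_regression(
    n_samples=500, n_features=30, n_informative=8, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

pipe = Pipeline([
    # Stage 1: keep the k features with the highest univariate F-scores.
    ("kbest", SelectKBest(score_func=f_regression, k=15)),
    # Stage 2: recursively eliminate features using the linear model's coefficients.
    ("rfe", RFE(estimator=LinearRegression(), n_features_to_select=8)),
    # Final model fitted on the surviving features.
    ("model", LinearRegression()),
])
pipe.fit(X_train, y_train)

# RMSE on held-out data, computed as the square root of the MSE.
rmse = float(np.sqrt(mean_squared_error(y_test, pipe.predict(X_test))))
n_kept = int(pipe.named_steps["rfe"].support_.sum())
print(f"features kept: {n_kept}, test RMSE: {rmse:.2f}")
```

Wrapping both selectors in a `Pipeline` means they are fitted on the training split only, which keeps information from the test split out of the selection step.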