
fridahkimathi / house-sales-in-king-county-washington-usa


The project used Python to perform exploratory and predictive analysis on 21,597 homes sold in King County in 2014 and 2015, and built a linear regression model to predict house sale prices for a real estate agency.

data-science pandas python scikit-learn


House Sales in King County, USA

A picture of houses in King County

Overview


The project analyzes 21,597 houses sold in King County, Washington, USA in 2014 and 2015. The dataset features used for analysis and linear regression in this project are: house ID, sale date, price, number of bedrooms and bathrooms, square footage of the living space and of the lot, number of floors, the condition and grade of the house according to the King County grading system, and the year the house was built.

Business Problem


The project aims to help a Real Estate Agency in King County, Washington, USA predict the prices of single-family homes that its clients are looking to buy and/or sell. The business problem the project tackles is identifying the main factors that influence a home's value in King County.

Data


The King County Housing Data Set contains information about the size, location, condition, and other features of houses in Washington's King County. More details about the columns can be found in the King County Residential Glossary of Terms.

Methods


After exploring, splitting and preprocessing the data, multiple linear regression models were built with price as the dependent variable.
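The README does not include code for this step. The repository topics list scikit-learn, whose train_test_split and LinearRegression would normally fill these roles; the sketch below is a dependency-light NumPy stand-in for the same workflow: shuffle and split the rows, fit ordinary least squares on the training rows, and predict on the held-out rows. The 80/20 split, the seed, the helper names, and the toy data are all assumptions for illustration, not the notebook's code.

```python
import numpy as np


def train_test_split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return idx[n_test:], idx[:n_test]


def fit_ols(X, y):
    """Fit y = b0 + X @ b by ordinary least squares and return (b0, b)."""
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0], beta[1:]


def predict(b0, b, X):
    """Linear prediction for each row of X."""
    return b0 + X @ b


# Hypothetical toy data (NOT the King County data): two predictors and a response
# generated from a known linear rule plus small noise, so the fit can be checked.
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(7.5, 0.4, 200), rng.integers(5, 12, 200)])
y = 1.0 + 0.8 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 0.05, 200)

train_idx, test_idx = train_test_split_indices(len(X))
b0, b = fit_ols(X[train_idx], y[train_idx])
y_pred = predict(b0, b, X[test_idx])
```

Because the toy response is generated from known coefficients (0.8 and 0.2), the fitted b can be compared against them as a sanity check before applying the same steps to real data.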

Results


House price shows a relatively strong correlation with two columns in particular: the square footage of the home (sqft_living) and grade.
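A minimal sketch of how such a correlation ranking can be computed with NumPy; the notebook may instead have used pandas' DataFrame.corr. The column names follow the dataset (sqft_living, grade, bedrooms), but the values are hypothetical toy numbers chosen so the result is easy to verify by hand.

```python
import numpy as np


def correlations_with_target(features, target):
    """Pearson correlation between each named feature column and the target."""
    return {name: float(np.corrcoef(col, target)[0, 1]) for name, col in features.items()}


# Hypothetical toy values for illustration only; not rows from the dataset.
price = np.array([200_000.0, 300_000.0, 400_000.0, 500_000.0, 600_000.0])
features = {
    "sqft_living": np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0]),
    "grade": np.array([6.0, 7.0, 7.0, 8.0, 9.0]),
    "bedrooms": np.array([3.0, 2.0, 4.0, 3.0, 3.0]),
}

corr = correlations_with_target(features, price)
# Rank columns by the strength (absolute value) of their correlation with price.
ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
print(ranked)  # ['sqft_living', 'grade', 'bedrooms']
```

Ranking by absolute value keeps strongly negative correlations from being overlooked.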

Features such as the number of bedrooms and bathrooms, the size of the living space and lot, the number of floors, whether the house is on the waterfront, and the grade of the house were included in the final multiple linear regression model.

The model satisfied the linear regression assumptions of linearity, normality of residuals, no severe multicollinearity, and homoscedasticity.
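The README does not say how multicollinearity was checked. One common diagnostic is the variance inflation factor (VIF); statsmodels ships variance_inflation_factor in statsmodels.stats.outliers_influence, and the sketch below computes a VIF directly with NumPy, including an intercept in each auxiliary regression. A VIF above roughly 5 or 10 is a common rule-of-thumb warning sign. The toy predictors are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Remaining columns plus an intercept term.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vifs


# Hypothetical toy predictors: b is nearly a copy of a, while c is independent.
rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = a + rng.normal(scale=0.1, size=500)
c = rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([a, b, c]))
```

Here the near-duplicate columns a and b get large VIFs, while the independent column c stays close to 1.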

Visualizations

Figures: Linearity Assumption, Normality Assumption, Homoscedasticity Assumption.

Each predictor's p-value was below 0.05, so for every predictor the null hypothesis that it has no relationship with the response variable can be rejected. The model's R-squared was 0.63.
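For reference, these are the standard ordinary-least-squares definitions behind the two statistics quoted above, with n observations, k predictors, fitted values \hat{y}_i and estimated coefficients \hat{\beta}_j; the README does not show the notebook's exact computation.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)},
\qquad
p_j = 2\,\Pr\left(T_{n-k-1} \ge \lvert t_j \rvert\right)
```

Under the null hypothesis \beta_j = 0, t_j follows a t distribution with n - k - 1 degrees of freedom, and p_j < 0.05 rejects that hypothesis at the 5% level. An R-squared of 0.63 means the model accounts for about 63% of the variance in the response on the data it was evaluated on.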

Conclusions


This analysis leads to three recommendations for improving operations of the Real Estate Agency:

  • Homeowners interested in selling should focus on improving the design and construction quality of their homes. This may in turn raise the home's grade.
  • Home buyers should consider older homes with a high grade. Older homes tend to be slightly cheaper, so an old home with a high grade may be one of the least expensive good-quality homes a buyer can find. Homes with fewer bathrooms also tend to be cheaper.
  • If possible, homeowners should also expand the square footage of living space on the lot. The larger the living space, the higher the sale price tends to be.

Limitations


The model has some limitations. Some of the variables had to be log-transformed to satisfy the regression assumptions, so any new data used with the model must undergo the same preprocessing. Additionally, because outliers were removed, the model may not accurately predict extreme values. Future analysis should explore the best predictors of home prices outside King County, as well as for homes with extreme prices.
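A minimal sketch of what "the same preprocessing" means for new data, assuming natural-log transforms and assuming price itself was modeled on the log scale; the README does not list which variables were transformed, so LOG_COLUMNS and both helper functions are hypothetical.

```python
import math

# Hypothetical set of log-transformed columns; the README does not list which
# variables the notebook actually transformed.
LOG_COLUMNS = ("sqft_living", "sqft_lot")


def preprocess(record, log_columns=LOG_COLUMNS):
    """Apply the training-time natural-log transform to a new record (dict)."""
    out = dict(record)  # leave the caller's record untouched
    for col in log_columns:
        value = out[col]
        if value <= 0:
            raise ValueError(f"{col} must be positive to log-transform, got {value}")
        out[col] = math.log(value)
    return out


def back_transform_price(log_price_pred):
    """Map a prediction made on the log(price) scale back to dollars (assumes log price)."""
    return math.exp(log_price_pred)


new_home = {"sqft_living": 2000, "sqft_lot": 5000, "grade": 7}
model_input = preprocess(new_home)
```

If price was indeed log-transformed, exponentiating the predicted log price estimates something closer to the median than the mean sale price when errors are roughly log-normal.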

Next Steps


Further analyses could yield additional insights to further improve operations at the Real Estate Agency:

  • Prediction of house price based on the crime rate of the neighborhood where the house is located. This model could use already available data, such as latitude and longitude coordinates, and police records.

  • Prediction of house price based on the average income of residents in the surrounding area.

  • Prediction of house price based on the ethnic and racial makeup of residents in an area. This model could identify the effect of ethnic and racial discrimination on house prices.

For More Information

See the full analysis in the Jupyter Notebook or review this presentation.

For additional info, contact Fridah Kimathi at [email protected] or via my LinkedIn profile.

Repository Structure

├── .gitignore
├── data
├── images
├── House Sale King County,USA Presentation.pdf
├── README.md
└── index.ipynb
