King_County_Housing_Data

Housing is important to the health and vitality of communities. For most Americans, a house is the largest single asset they own and most likely the largest purchase they will make in their lifetimes. For homeowners, residential housing values influence personal wealth. City managers track and study housing data for tax and revenue generation, for zoning and planning purposes, and for the distribution of government services. These factors influenced how the authors, Helen Levy-Myers and Linh Pham, examined the King County (WA) housing data. They set two goals. The first was to develop a linear model that predicts home values in the immediate future, which both city officials and homeowners could use, for example when considering next year’s budgets. The second was to understand how home values might be improved, both for individuals considering selling or buying a home and for the city when facing policy decisions that affect the residential market.

Background

The data set includes the year each house was built. Binning that data by decade produced an interesting visualization of the economic history of Seattle and King County. Housing starts are considered a leading economic indicator: increases indicate economic growth, and declines beyond the usual winter slowdown can signal a coming recession. In this data set, the year built acts as an approximate proxy for housing starts, so the number of houses built in each decade indicates how Seattle’s economy performed during that decade. There is a drop during the Great Depression, and from the beginning of World War II to the end of the 1960s there was sustained economic growth with a huge increase in population, and people need homes to live in. However, the largest employer, Boeing, cut its workforce by more than 50 percent in less than two years around 1971, resulting in a local depression that reduced the number of homes built. Then in the 1980s, Microsoft grew in the Seattle area and spun off numerous businesses that diversified the economy.
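The decade binning described above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the column name `yr_built` and the sample years are hypothetical, not taken from the authors' notebooks.

```python
import pandas as pd

def decade_counts(yr_built: pd.Series) -> pd.Series:
    """Count homes per decade of construction (1955 -> 1950, 2004 -> 2000)."""
    decades = (yr_built // 10) * 10
    return decades.value_counts().sort_index()

# Hypothetical sample of construction years, for illustration only.
sample = pd.Series(
    [1905, 1931, 1938, 1955, 1958, 1959, 1972, 1987, 2004, 2014],
    name="yr_built",
)
counts = decade_counts(sample)
print(counts)
```

Plotting `counts` as a bar chart would give a decade-by-decade view similar in spirit to the visualization described above.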

Knowing where homes were built is vital for home buyers who want a particular style of home. For city officials, knowing where new homes are built can affect where to locate schools, retail, hospitals, roads, and more. This map shows where homes were built in each decade. The blues represent homes built early in the 20th century; the darker reds represent homes built in the 21st century. Again, interesting patterns emerge. At the beginning of the 20th century, homes were built closer to downtown Seattle, because there were fewer transportation options and more jobs were located there. Over the course of the 20th century, floating bridges were built across Lake Washington, making the areas east of the city an easy commute. Microsoft also moved to Redmond, on the east side of Lake Washington, in 1986.

The Data

The King County Housing Data set includes 21,597 observations, all involving residential home sales between May 2014 and May 2015. Each observation includes the following variables: the parcel lot number from the tax records, the price paid, size of the home (sq. ft.), bedrooms, bathrooms, floors, living space (sq. ft.), lot size (sq. ft.), basement (sq. ft.), above grade (sq. ft.), zip code, latitude, longitude, living space of the nearest 15 neighbors (sq. ft.), lot size of the nearest 15 neighbors (sq. ft.), building status, condition, grade, year built, year renovated, waterfront, views, and date sold.

For the final model, the authors applied several data cleaning steps. In the Basement, Waterfront, Views, and Year Renovated columns, missing values and question-mark placeholders were flagged by adding a new NULL indicator column for each. The data set contained more than 100 distinct years, so to make trends easier to see, Year Built and Year Renovated were binned into 12 decades. Zip codes were converted from numbers to categorical data. Three new variables were created: (1) living space ratio, the living space of the individual home divided by the living space of the nearest 15 neighbors; (2) lot size ratio, the lot size of the individual home divided by the lot size of the nearest 15 neighbors; and (3) month, extracted from the date sold variable.
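A minimal sketch of these cleaning steps in pandas follows. It is not the authors' code: the column names (`sqft_basement`, `sqft_living15`, `yr_built`, and so on), the helper `clean`, and the three sample rows are all assumptions made for illustration.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a copy of df."""
    out = df.copy()

    # Flag missing values and "?" placeholders with a 0/1 indicator column.
    for col in ["sqft_basement", "waterfront", "view", "yr_renovated"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")  # "?" becomes NaN
        out[col + "_null"] = out[col].isna().astype(int)

    # Bin years into decades (1987 -> 1980); NaN stays NaN.
    out["yr_built_decade"] = (out["yr_built"] // 10) * 10
    out["yr_renovated_decade"] = (out["yr_renovated"] // 10) * 10

    # Treat zip codes as categories rather than numbers.
    out["zipcode"] = out["zipcode"].astype(str).astype("category")

    # Ratios of each home to its nearest 15 neighbors.
    out["living_ratio"] = out["sqft_living"] / out["sqft_living15"]
    out["lot_ratio"] = out["sqft_lot"] / out["sqft_lot15"]

    # Month of sale taken from the sale date.
    out["month_sold"] = pd.to_datetime(out["date"]).dt.month
    return out

# Hypothetical rows, for illustration only.
raw = pd.DataFrame({
    "date": ["2014-05-02", "2015-01-15", "2014-11-30"],
    "sqft_living": [2000, 1500, 3000],
    "sqft_living15": [1000, 1500, 2000],
    "sqft_lot": [5000, 4000, 9000],
    "sqft_lot15": [5000, 8000, 6000],
    "sqft_basement": ["400", "?", "0"],
    "waterfront": [0, None, 1],
    "view": [0, 2, None],
    "yr_built": [1955, 1987, 2004],
    "yr_renovated": [None, 1999, None],
    "zipcode": [98178, 98125, 98028],
})
cleaned = clean(raw)
print(cleaned[["sqft_basement_null", "yr_built_decade", "living_ratio", "month_sold"]])
```

Keeping the indicator columns alongside the coerced values lets a model distinguish "unknown" from a genuine zero, which is one plausible reason for adding them.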

Deciding on a focus and modeling

A heat map with annotated values was used to decide which variables to focus on, since high collinearity between two variables usually indicates that only one of them should be included in the model. In the end, the price variable was transformed and the residuals were checked. Transforming price and adding more categorical variables made a significant difference in the model's performance, as seen in the Q-Q plots. The model's price predictions were checked against two baselines, the mean of price and the mean of log price, and the model performed better than both.
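The two ideas in this section, screening for collinear variables and comparing a model on transformed price against mean baselines, can be sketched as below. Everything here is an assumption for illustration: the data is synthetic, the 0.75 correlation threshold and the helper names are hypothetical, the transform is taken to be a natural log, and ordinary least squares via `numpy.linalg.lstsq` stands in for whatever fitting routine the notebooks used.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.75):
    """List (col_a, col_b, r) for column pairs whose |correlation| exceeds threshold."""
    corr = df.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

def rmse(y_true, y_pred):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Synthetic data, for illustration only: sqft_above is built to track sqft_living.
rng = np.random.default_rng(0)
n = 200
sqft_living = rng.uniform(800, 4000, size=n)
sqft_above = 0.9 * sqft_living + rng.normal(0, 50, size=n)
grade = rng.integers(5, 12, size=n).astype(float)
price = np.exp(11.5 + 0.0004 * sqft_living + 0.1 * grade + rng.normal(0, 0.1, size=n))

features = pd.DataFrame(
    {"sqft_living": sqft_living, "sqft_above": sqft_above, "grade": grade}
)
pairs = high_corr_pairs(features)  # candidates where only one variable should be kept

# Fit ordinary least squares on log(price), keeping one variable of the collinear pair.
X = features[["sqft_living", "grade"]].to_numpy()
A = np.column_stack([np.ones(n), X])  # add an intercept column
coef, *_ = np.linalg.lstsq(A, np.log(price), rcond=None)
log_pred = A @ coef

# Compare against the two mean baselines, on the log scale and the price scale.
log_model_rmse = rmse(np.log(price), log_pred)
log_baseline_rmse = rmse(np.log(price), np.full(n, np.log(price).mean()))
price_model_rmse = rmse(price, np.exp(log_pred))
price_baseline_rmse = rmse(price, np.full(n, price.mean()))

print(pairs)
print(log_model_rmse, log_baseline_rmse)
print(price_model_rmse, price_baseline_rmse)
```

A model that cannot beat "always predict the mean" has learned nothing useful, which is why the mean serves as the minimum bar in both comparisons.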

The Jupyter notebooks and PowerPoint presentation were created for the Flatiron Data Science Boot Camp and are the work of both Helen Levy-Myers and Linh Pham.
