
midbootcampproject's Introduction

Ironhack Midbootcamp Project - Regression Case Study

by Federica Riva & Cath Vos, May 2021

Predicting the selling price of houses


Project Brief

Scenario: We are working as analysts for a real estate company. Our company wants to build a machine learning model to predict the selling prices of houses based on a variety of features on which the value of the house is evaluated.

Challenge: To build a model that will predict the price of a house based on features provided in the dataset.

Senior management also wants to explore the characteristics of the houses using business intelligence tools. One of their goals is to understand which factors are responsible for a higher property value ($650K and above).
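The $650K threshold can be turned into a simple flag for this kind of exploration. The following is a minimal sketch, assuming a pandas DataFrame with a `price` column. The sample rows, the `sqft_living` column and the `high_value` column name are made up for illustration and are not taken from the project notebook.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has many more rows and columns.
houses = pd.DataFrame({
    "price": [221_900, 650_000, 1_225_000, 510_000],
    "sqft_living": [1180, 2570, 5420, 1680],
})

HIGH_VALUE_THRESHOLD = 650_000  # "$650K and above" from the brief

# Flag properties at or above the threshold.
houses["high_value"] = houses["price"] >= HIGH_VALUE_THRESHOLD

# Compare the average living area between the two groups.
summary = houses.groupby("high_value")["sqft_living"].mean()
print(summary)
```

The same grouping can be repeated for any other feature to see how high-value houses differ from the rest.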

Problem: Can we build a machine learning model that predicts the selling price of houses?

Data

Working with the data provided, we mainly used Tableau's data visualisation tools, as well as Python's, to explore the relationships between features.

To find out more about the distribution of the important features, have a look at our Tableau dashboard: Tableau Dashboard


For further details on all features, please refer to the notebook.

Process & Tools

Process


  • GitHub: set up our GitHub repo to collaborate on.
  • Trello: set up our Trello board to stay organised and reprioritise daily.
  • SQL: completed the SQL queries.
  • EDA: assessed the dataframe to prepare for cleaning.
  • Data cleaning & wrangling in Python: dropped the 'date' column, checked for duplicates, dropped null values, converted float columns to int.
  • Preprocessing: 3 methods - data cleaning, data transformation and data reduction.
  • Machine Learning Model: using scikit-learn
    - iteration 1 (X): ran the base model after basic cleaning steps (duplicates and nulls), excluding ID and date, to use it as a benchmark
    - iteration 2 (X_i2): removed ID duplicates, dropped a wrongly reported data point, dummified yr_renovated, transformed yr_built to age and zip code to distance from the most expensive zip code
    - iteration 3 (X_i3): checked for multicollinearity and dropped sqft_above, sqft_living15 and sqft_lot15
    - iteration 4 (X_i4): dummified view and sqft_basement, grouped condition 1 and 2 together and dummified this column
    - iteration 5 (X_i5): dropped age and sqft_lot because their correlation with price is lower than 0.1
    - iteration 6 (X_i5): used a Decision Tree model on the dataframe of iteration 2 to check which model is stronger
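The cleaning steps and the comparison between a linear model and a decision tree can be sketched as follows. This is an illustration under stated assumptions, not the code from our notebook: the data is synthetic, the column names `sqft_living` and `yr_renovated`, the reference year 2015 and the helper names `clean`, `engineer` and `evaluate` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption: not the real file).
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "id": np.arange(n),
    "date": ["2015-01-01"] * n,
    "sqft_living": rng.integers(500, 5000, n).astype(float),
    "yr_built": rng.integers(1900, 2015, n),
    "yr_renovated": rng.choice([0, 1995, 2005], n),
})
df["price"] = 150 * df["sqft_living"] + rng.normal(0, 50_000, n)


def clean(data: pd.DataFrame) -> pd.DataFrame:
    """Basic cleaning: drop 'date', then remove duplicates and nulls."""
    return data.drop(columns=["date"]).drop_duplicates().dropna()


def engineer(data: pd.DataFrame, reference_year: int = 2015) -> pd.DataFrame:
    """Iteration-2 style features: age from yr_built, renovation as a 0/1 dummy."""
    out = data.copy()
    out["age"] = reference_year - out["yr_built"]
    out["renovated"] = (out["yr_renovated"] > 0).astype(int)
    return out.drop(columns=["yr_built", "yr_renovated"])


def evaluate(model, X: pd.DataFrame, y: pd.Series) -> tuple[float, float]:
    """Fit on a train split and return (R2, MAE) on the held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return r2_score(y_test, pred), mean_absolute_error(y_test, pred)


cleaned = engineer(clean(df))
X = cleaned.drop(columns=["id", "price"])  # the ID is not used as a feature
y = cleaned["price"]

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=42),
}
results = {name: evaluate(model, X, y) for name, model in models.items()}
for name, (r2, mae) in results.items():
    print(f"{name}: R2={r2:.3f}, MAE={mae:,.0f}")
```

Keeping the train/test split and the metrics inside one `evaluate` helper means every iteration is scored the same way, so the benchmark and later iterations stay comparable.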

Tools

Visualizations

For further visualisations, check out our Tableau workbooks or our presentation, linked below.

Tableau
Tableau extra
Presentation

Key Takeaways

1. Our model can predict the price of a house with an accuracy of 80%

2. The mean absolute error of the best model is around $100K
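For readers unfamiliar with these metrics, here is a small worked example of how the mean absolute error and the R² score are computed. It assumes the "accuracy" above refers to an R² score, which is a common convention for regression models but is not stated explicitly here. The prices are made up for illustration.

```python
# Hypothetical actual and predicted prices, not taken from the project.
actual = [300_000, 500_000, 700_000]
predicted = [350_000, 450_000, 800_000]

n = len(actual)

# Mean absolute error: average size of the prediction errors, in dollars.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# R2: 1 minus the ratio of residual variation to total variation.
mean_actual = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(f"MAE: {mae:,.2f}")  # MAE: 66,666.67
print(f"R2: {r2:.4f}")     # R2: 0.8125
```

The MAE is expressed in the same unit as the price, which makes it easier to communicate than R² alone.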

Thank you for reading!
If you have any questions, please reach out to us.

