thealexsamexe / boston-house-price-prediction

This project is based on Boston House Price Prediction using Linear Regression and Random Forest Regressor.

boston-house-price-prediction's Introduction

Install

This project requires at least Python 3.1 and the following Python libraries installed:

Data

The dataset used in this project is included as BostonHousing.csv. It is freely available from the UCI Machine Learning Repository and has the following attributes:

Features

Features: crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, b, lstat

crim = per capita crime rate by town

zn = proportion of residential land zoned for lots over 25,000 sq.ft.

indus = proportion of non-retail business acres per town

chas = Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)

nox = nitric oxides concentration (parts per 10 million)

rm = average number of rooms per dwelling

age = proportion of owner-occupied units built prior to 1940

dis = weighted distances to five Boston employment centers

rad = index of accessibility to radial highways

tax = full-value property-tax rate per 10,000 USD

ptratio = pupil-teacher ratio by town

b = 1000(Bk - 0.63)² where Bk is the proportion of blacks by town

lstat = % lower status of the population

Target Variable

Target: medv (median value of owner-occupied homes, in 1,000 USD)
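A minimal sketch of separating the feature columns from the target, assuming the CSV is read with pandas and uses the lowercase column names listed above. `split_features_target` is a hypothetical helper name, and the small inline frame is a made-up stand-in so the snippet runs without the file.

```python
import pandas as pd


def split_features_target(df, target="medv"):
    """Return (X, y): every column except `target`, and the target column."""
    X = df.drop(columns=[target])
    y = df[target]
    return X, y


# With the real data this would be: df = pd.read_csv("BostonHousing.csv")
# Made-up rows with a subset of the columns, for illustration only.
df = pd.DataFrame({
    "rm": [6.1, 5.8, 7.0],
    "lstat": [5.0, 12.4, 3.1],
    "medv": [24.0, 19.5, 33.2],
})
X, y = split_features_target(df)
print(list(X.columns))
```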

Data Visualization

A heatmap was drawn to show the pairwise correlation between the features available in the dataset.
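A correlation heatmap along these lines could be sketched as follows. This is an illustration rather than the notebook's code: it uses plain matplotlib (the notebook may well have used `seaborn.heatmap` instead), `plot_correlation_heatmap` is a hypothetical name, and the data is synthetic.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_correlation_heatmap(df):
    """Draw the Pearson correlation matrix of df's columns; return (corr, ax)."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax)
    return corr, ax


# Synthetic stand-in data: medv is built to depend on rm, lstat is independent.
rng = np.random.default_rng(0)
rm = rng.normal(6.0, 0.7, 100)
demo = pd.DataFrame({
    "rm": rm,
    "lstat": rng.uniform(2, 30, 100),
    "medv": 5 * rm + rng.normal(0, 1, 100),
})
corr, ax = plot_correlation_heatmap(demo)
print(corr.round(2))
```

Fixing the color scale to the range -1 to 1 keeps zero correlation at the neutral midpoint of the colormap, which makes positive and negative relationships easy to tell apart.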

Furthermore, a heatmap of the null values present in the dataset was plotted as part of the EDA (Exploratory Data Analysis), to check for missing data.
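The check behind such a plot can be sketched with pandas alone. `df.isnull()` gives the boolean frame that a null-value heatmap would draw, and summing it per column gives the missing-value counts. The helper name and the rows below are made up for illustration.

```python
import numpy as np
import pandas as pd


def null_summary(df):
    """Return the number of missing values in each column."""
    return df.isnull().sum()


# Made-up rows with a few gaps, for illustration only.
demo = pd.DataFrame({
    "rm": [6.1, np.nan, 7.0, 5.9],
    "crim": [0.02, 0.10, np.nan, np.nan],
    "medv": [24.0, 19.5, 33.2, 21.0],
})
counts = null_summary(demo)
print(counts)
```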

The frequency of values in column rad was visualized using a histogram.

Similarly, the frequency of values in column chas was visualized using a histogram.

Later on, a count plot of column rad grouped by column chas was drawn to examine the relationship between the two.
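The numbers behind a count plot of this kind are a cross-tabulation of the two columns. A small sketch, assuming pandas; `count_table` is a hypothetical helper and the rows are made up:

```python
import pandas as pd


def count_table(df, row="rad", col="chas"):
    """Count how many rows fall into each (row, col) combination."""
    return pd.crosstab(df[row], df[col])


# Made-up rows, for illustration only.
demo = pd.DataFrame({
    "rad": [1, 1, 4, 4, 4, 24],
    "chas": [0, 1, 0, 0, 1, 0],
})
table = count_table(demo)
print(table)
```

Each cell is the height of one bar in the count plot. Combinations that never occur show up as 0.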

To understand the distribution of values in column age, a histogram was plotted.

The same procedure was followed with columns crim and rm.

To identify the importance of each feature with regard to the target variable, feature importance scores were generated.
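The README does not say how the importance scores were produced. One common approach, shown here purely as an assumption, is to read `feature_importances_` from a fitted scikit-learn tree ensemble. The data is synthetic and the column names `noise_a` and `noise_b` are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importance(X, y, random_state=0):
    """Fit a random forest and return importances sorted from high to low."""
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X, y)
    scores = pd.Series(model.feature_importances_, index=X.columns)
    return scores.sort_values(ascending=False)


# Synthetic data: the target depends on rm only; the other columns are noise.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "rm": rng.normal(6.0, 0.7, 200),
    "noise_a": rng.uniform(0, 1, 200),
    "noise_b": rng.uniform(0, 1, 200),
})
y = 5 * X["rm"] + rng.normal(0, 0.2, 200)

importances = rank_feature_importance(X, y)
print(importances)
```

The scores are normalized to sum to 1, so each value can be read as a share of the total importance.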

Model Fitting

Linear Regression

train_test_split() was applied to the dataset, with 70% of the data used as training data and 30% as testing data.
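A 70/30 split with scikit-learn's `train_test_split` could look like the sketch below. The `random_state` value is an assumption (the README does not state one), and the arrays are placeholders rather than the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays: 100 samples with 2 features each.
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# test_size=0.3 keeps 30% of the rows for testing and 70% for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)
```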

The training accuracy was measured as 76.45%, the testing accuracy as 67.33% and the model accuracy as 73.78%.
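The README does not say how these accuracies were computed. For a scikit-learn regressor, `score()` returns the coefficient of determination (R²), so the sketch below assumes that is the metric being reported. It runs on synthetic data and will not reproduce the figures above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, roughly linear data standing in for the housing features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 1, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# For regressors, score() is R^2 on the given data.
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2: {train_r2:.4f}, test R^2: {test_r2:.4f}")
```

Comparing the two scores is a quick check for overfitting: a training score well above the testing score suggests the model does not generalize as well as the training figure implies.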

To visualize the fit, a scatter plot of the Actual prices against the Predicted prices was drawn.

The scatter plot shows that many predictions deviate noticeably from the actual prices, so a second scatter plot was drawn of the Residuals against the Predicted values.

To see how the Residuals are distributed, a histogram of the residual values and their Frequency was plotted.
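The residual analysis can be sketched numerically as follows, assuming the usual convention residual = actual - predicted. `residual_histogram` is a hypothetical helper and the prices are made up.

```python
import numpy as np


def residual_histogram(y_true, y_pred, bins=10):
    """Return residuals (actual - predicted) plus histogram counts and bin edges."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    counts, edges = np.histogram(residuals, bins=bins)
    return residuals, counts, edges


# Made-up actual and predicted prices, for illustration only.
y_true = [24.0, 22.0, 35.0, 33.0, 36.0]
y_pred = [25.0, 23.0, 31.0, 31.0, 36.0]

residuals, counts, edges = residual_histogram(y_true, y_pred, bins=3)
print(residuals)
print(counts)
```

A histogram centered near zero and roughly symmetric suggests the errors are unbiased. A long tail on one side points to systematic under- or over-prediction.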

Random Forest Regression

The next algorithm applied to the dataset was Random Forest Regression. The data was again split 70-30. The training accuracy obtained was 99.99% and the testing accuracy was 99.97%. This algorithm performed better than Linear Regression, giving much higher accuracies on both the training and the testing data.
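A side-by-side comparison of the two models could be sketched like this. The hyperparameters and `random_state` values are assumptions, and the synthetic target is deliberately non-linear, so the scores illustrate the comparison only and say nothing about the figures reported above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic non-linear data standing in for the housing features.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = 5 * np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.3, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

linear = LinearRegression().fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# score() returns R^2 for both regressors.
lin_test = linear.score(X_test, y_test)
rf_train = forest.score(X_train, y_train)
rf_test = forest.score(X_test, y_test)
print(f"linear test R^2: {lin_test:.3f}")
print(f"forest train R^2: {rf_train:.3f}, forest test R^2: {rf_test:.3f}")
```

Random forests fit training data very closely, so the testing score is the more informative one. It is also worth confirming that the target column is not among the input features, since that kind of leakage produces near-perfect scores.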

A scatter plot of the Actual prices against the Predicted prices illustrates this.
