- Designed a web app that predicts the price of a laptop given its configuration.
- Scraped laptop data from flipkart.com using Python and the BeautifulSoup package.
- Developed Linear, Lasso, and Random Forest regressors, using GridSearchCV to select the best model.
- Deployed the machine learning model on Heroku using the Streamlit library and Flask.
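The model-selection step above can be sketched as follows: a GridSearchCV sweep over the three regressor families, keeping the estimator with the best cross-validated R². The synthetic data and the parameter grids are illustrative stand-ins for the real scraped dataset and the grids actually tuned.

```python
# Minimal sketch of model selection with GridSearchCV; the data and
# hyperparameter grids here are illustrative, not the project's real ones.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # stand-in for the laptop feature matrix
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": (LinearRegression(), {}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
    "rf": (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
}

best_name, best_score, best_model = None, -np.inf, None
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="r2")
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_name = name
        best_score = search.best_score_
        best_model = search.best_estimator_

print(best_name, round(best_score, 3))
```

`best_model` is then the fitted estimator that gets pickled for the web app.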
This is the Flipkart website, which lists different laptops; each page contains the specifications of 24 laptops. Looking at a page, we extract the following features for each laptop:
- Description
- Processor
- RAM
- Storage
- Display
- Warranty
- Price
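The extraction itself can be sketched with BeautifulSoup. The HTML below is a toy stand-in; the real Flipkart page uses different (auto-generated) class names, so the selectors here are assumptions for illustration only.

```python
# Hedged sketch of the scraping step: parse product cards and pull out
# the description, spec list, and price. Class names are illustrative,
# not Flipkart's actual markup.
from bs4 import BeautifulSoup

html = """
<div class="product">
  <div class="desc">ASUS VivoBook 15 Core i3 10th Gen</div>
  <ul class="specs">
    <li>Intel Core i3 Processor (10th Gen)</li>
    <li>8 GB DDR4 RAM</li>
    <li>512 GB SSD</li>
  </ul>
  <div class="price">36,990</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.find_all("div", class_="product"):
    rows.append({
        "Description": card.find("div", class_="desc").get_text(strip=True),
        "Specs": [li.get_text(strip=True) for li in card.find_all("li")],
        "Price": card.find("div", class_="price").get_text(strip=True),
    })

print(rows[0]["Description"])
```

On the real site, `requests.get` fetches each results page and the same loop runs over every product card.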
Link to my article: https://towardsdatascience.com/learn-web-scraping-in-15-minutes-27e5ebb1c28e
We go through the features one by one, adding new derived columns. I have made the following changes and created new variables:
- RAM - Made columns for RAM capacity (in GB) and the DDR version
- Processor - Made columns for the processor's name, type, and generation
- Operating System - Parsed the operating system from this column into a new column
- Storage - Made new columns for the type and capacity of the disk drive
- Display - Made new columns for the screen size (in inches) and touchscreen
- Description - Made new columns for the company and the graphics card
A few of these columns are categorical but actually contain numerical values, so we need to convert them to numerical columns. These are DDR_Version, Generation, Storage_GB, and Price.
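The conversion can be sketched with `pd.to_numeric`, which coerces string-typed columns to numbers (the sample values are illustrative):

```python
# Sketch of the type-conversion step: columns parsed from text arrive
# as strings; pd.to_numeric turns them into numeric dtypes, with
# errors="coerce" mapping any unparsable entry to NaN.
import pandas as pd

df = pd.DataFrame({"DDR_Version": ["4", "5"], "Price": ["36990", "54990"]})
for col in ["DDR_Version", "Price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes.astype(str).to_dict())
```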
Link to my article: https://towardsdatascience.com/leverage-the-power-of-pycaret-d5c3da3adb9b
I have deployed the model on Heroku, a Platform as a Service (PaaS), using the Streamlit library and the Flask framework.
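A common way to run a Streamlit app on Heroku is a small `setup.sh` plus a one-line `Procfile`; the sketch below assumes the app's entry point is named `app.py`, which is an assumption about this repo's layout.

```shell
# setup.sh -- writes the Streamlit config Heroku needs: headless mode
# and the dynamically assigned $PORT.
mkdir -p ~/.streamlit/
printf '[server]\nheadless = true\nport = %s\nenableCORS = false\n' "$PORT" \
  > ~/.streamlit/config.toml

# Procfile (single line) -- tells Heroku how to start the dyno;
# "app.py" is an assumed filename:
# web: sh setup.sh && streamlit run app.py
```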
Web application: https://laptop-prices-predictor.herokuapp.com/