- Designed a web app that predicts the price of a laptop given its configuration.
- Scraped laptop data from flipkart.com using Python and the BeautifulSoup package.
- Developed Linear, Lasso, and Random Forest regressors, using GridSearchCV to select the best model.
- Deployed the machine learning model on Heroku using the Streamlit library and Flask.
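The model-selection step above can be sketched as follows: a GridSearchCV sweep over the three regressor families, keeping the estimator with the best cross-validated R². The synthetic data and the parameter grids are illustrative stand-ins for the real scraped dataset and the grids actually tuned.

```python
# Minimal sketch of model selection with GridSearchCV; the data and
# hyperparameter grids here are illustrative, not the project's real ones.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # stand-in for the laptop feature matrix
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=200)

candidates = {
    "linear": (LinearRegression(), {}),
    "lasso": (Lasso(), {"alpha": [0.01, 0.1, 1.0]}),
    "rf": (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
}

best_name, best_score, best_model = None, -np.inf, None
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=5, scoring="r2")
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_name = name
        best_score = search.best_score_
        best_model = search.best_estimator_

print(best_name, round(best_score, 3))
```

`best_model` is then the fitted estimator that gets pickled for the web app.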
This is the Flipkart website, which lists different laptops; each page contains the specifications of 24 laptops. Looking at a page, we extract the following features for each laptop:
- Description
- Processor
- RAM
- Storage
- Display
- Warranty
- Price
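The extraction itself can be sketched with BeautifulSoup. The HTML below is a toy stand-in; the real Flipkart page uses different (auto-generated) class names, so the selectors here are assumptions for illustration only.

```python
# Hedged sketch of the scraping step: parse product cards and pull out
# the description, spec list, and price. Class names are illustrative,
# not Flipkart's actual markup.
from bs4 import BeautifulSoup

html = """
<div class="product">
  <div class="desc">ASUS VivoBook 15 Core i3 10th Gen</div>
  <ul class="specs">
    <li>Intel Core i3 Processor (10th Gen)</li>
    <li>8 GB DDR4 RAM</li>
    <li>512 GB SSD</li>
  </ul>
  <div class="price">36,990</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for card in soup.find_all("div", class_="product"):
    rows.append({
        "Description": card.find("div", class_="desc").get_text(strip=True),
        "Specs": [li.get_text(strip=True) for li in card.find_all("li")],
        "Price": card.find("div", class_="price").get_text(strip=True),
    })

print(rows[0]["Description"])
```

On the real site, `requests.get` fetches each results page and the same loop runs over every product card.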
Link to my article: https://towardsdatascience.com/learn-web-scraping-in-15-minutes-27e5ebb1c28e
We go through the features one by one, adding new derived columns. I have made the following changes and created new variables:
- RAM - Made columns for RAM capacity (in GB) and the DDR version
- Processor - Made columns for the processor's name, type, and generation
- Operating System - Parsed the operating system from this column into a new column
- Storage - Made new columns for the type and capacity of the disk drive
- Display - Made new columns for the screen size (in inches) and touchscreen
- Description - Made new columns for the company and the graphics card
A few of these columns are categorical but actually contain numerical values, so we need to convert them to numerical columns. These are DDR_Version, Generation, Storage_GB, and Price.
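The conversion can be sketched with `pd.to_numeric`, which coerces string-typed columns to numbers (the sample values are illustrative):

```python
# Sketch of the type-conversion step: columns parsed from text arrive
# as strings; pd.to_numeric turns them into numeric dtypes, with
# errors="coerce" mapping any unparsable entry to NaN.
import pandas as pd

df = pd.DataFrame({"DDR_Version": ["4", "5"], "Price": ["36990", "54990"]})
for col in ["DDR_Version", "Price"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes.astype(str).to_dict())
```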
Link to my article: https://towardsdatascience.com/leverage-the-power-of-pycaret-d5c3da3adb9b
I have deployed the model on Heroku, a Platform as a Service (PaaS), using the Streamlit library and the Flask framework.
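A common way to run a Streamlit app on Heroku is a small `setup.sh` plus a one-line `Procfile`; the sketch below assumes the app's entry point is named `app.py`, which is an assumption about this repo's layout.

```shell
# setup.sh -- writes the Streamlit config Heroku needs: headless mode
# and the dynamically assigned $PORT.
mkdir -p ~/.streamlit/
printf '[server]\nheadless = true\nport = %s\nenableCORS = false\n' "$PORT" \
  > ~/.streamlit/config.toml

# Procfile (single line) -- tells Heroku how to start the dyno;
# "app.py" is an assumed filename:
# web: sh setup.sh && streamlit run app.py
```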
Web application: https://laptop-prices-predictor.herokuapp.com/