Giter Site home page Giter Site logo

car-price-prediction's Introduction

Car Price Prediction Data Analysis

This project involves analyzing a car price dataset using Python libraries such as NumPy, Pandas, Matplotlib, Seaborn, and sklearn. The goal is to pre-process the dataset by applying feature engineering, feature selection, and exploratory data analysis. Note that actual model building is not part of this project.

Table of Contents

Project Description

The goal of this project is to analyze a car price dataset and perform various data preprocessing tasks, including feature engineering, feature selection, and exploratory data analysis. This analysis will help develop machine learning models that can accurately predict car prices based on various features such as model, production year, category, brand, fuel type, engine volume, mileage, cylinders, vehicle style, and others.

Dataset

The dataset consists of 19,237 samples and includes features such as:

  • Model
  • Production year
  • Category
  • Brand
  • Fuel type
  • Engine volume
  • Mileage
  • Cylinders
  • Vehicle style
  • Price (target variable)

Installation

To run this project, you'll need to install the following dependencies:

  • Python 3.x
  • NumPy
  • Pandas
  • Matplotlib
  • Seaborn
  • scikit-learn
  • Jupyter Notebook

You can install the required packages using pip:

pip install numpy pandas matplotlib seaborn scikit-learn jupyter

Usage

git clone https://github.com/yourusername/car-price-prediction.git
cd car-price-prediction
python -m venv envsource env/bin/activate # On Windows use \`env\\Scripts\\activate\`
pip install numpy pandas matplotlib seaborn scikit-learn jupyter
jupyter notebook

Open car_price_analysis.ipynb and run the cells sequentially to perform the data analysis.

Project Structure

The project directory contains the following files:

  • car_price_analysis.ipynb: The Jupyter Notebook containing the data analysis and preprocessing steps.

  • price_prediction_batch_23.csv: The dataset used for the analysis.

  • README.md: The project documentation.

Analysis Steps

Section 1: Data Types and Preprocessing

  • Analyze data types of features and update if required.

  • Process the 'Levy' column and convert it to an integer type.

  • Process the 'Mileage' column and convert it to an integer.

  • Check for NaN values in the data and remove them if necessary.

Section 2: Data Cleaning and Exploration

  • Check for duplicates and remove them.

  • Check for outliers using boxplots and statistical methods, and remove them if necessary.

  • Draw countplots for categorical features and write observations.

  • Draw histograms for numeric features, compute skewness, and apply transformation functions if needed.

Section 3: Feature Engineering and Encoding

  • Create a joint plot with the hue parameter and write observations.

  • Apply scaling methods to independent features.

  • Convert categorical features into numeric ones using appropriate encoding techniques.

  • Combine results and compute the correlation among all independent features using a heatmap. Discard one of the variables if high correlation is detected (above 0.7).

Section 4: Feature Selection Based on Correlation

  • Split the dataset into training and testing sets using an 80-20 split.

  • Compute the correlation of each independent feature with the dependent variable 'Price'.

  • Select the seven most important independent features based on the correlation values.

Section 5: Feature Selection Using SelectKBest

  • Apply the SelectKBest method to the dataset to reduce the feature set to the seven most important features.

Conclusions

  • The analysis identified key features influencing car prices and reduced the feature set to the most relevant ones for potential model building.

  • Data preprocessing steps, including handling missing values, outliers, and feature scaling, were crucial in preparing the data for analysis.

Contributing

Contributions are welcome! Please fork this repository and submit a pull request for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

car-price-prediction's People

Contributors

thabhelo avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.