This project forked from nourkamaly/airlineticketpriceprediction

Airline ticket price prediction from end to end (analysis - preprocessing - modeling - testing - deployment - documentation) for Indian airlines

AirlineTicketPricePrediction from End to End

This machine learning project has two separate goals:

  1. Predicting the airline ticket price (regression problem).
  2. Classifying the ticket price range into 4 categories: cheap, moderate, expensive, very expensive.

Both tasks rely on 10 features: date, airline, ch code (airline code), num code, departure time, time taken, stop, arrival time, type, and route.

Data format: comma-separated values (CSV) file.
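
As a rough illustration of the classification target, the sketch below reads a CSV and derives the four price ranges from the price column. The column names, the inline sample data, and the use of quartile-based bins are all assumptions made for the example; the README does not state how the category boundaries were chosen.

```python
import io

import pandas as pd

# Tiny inline CSV standing in for the real data file.
# Column names and values are hypothetical.
csv_text = """airline,price
A,2000
B,3500
A,5200
C,7800
B,9100
C,12500
A,15000
B,30000
"""
df = pd.read_csv(io.StringIO(csv_text))

labels = ["cheap", "moderate", "expensive", "very expensive"]

# Assumption: split the price into four equally populated bins (quartiles).
# The project may have used fixed thresholds instead.
df["price_range"] = pd.qcut(df["price"], q=4, labels=labels)

print(df)
```

With quartile bins each category holds roughly the same number of tickets, which keeps the classification labels balanced.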

Project Lifecycle

  1. Data Analysis
  2. Preprocessing
  3. Modeling
  4. Testing
  5. Models Analysis (not done yet)
  6. Deployment on Heroku

Project can be found at: https://github.com/NourKamaly/AirlineTicketPricePrediction

Tech Stack

Programming Languages: Python 3.9, JavaScript

Markup Languages: HTML, CSS

Tools used: Power BI

Frameworks: Bootstrap

Libraries used: NumPy, pandas, dataprep, matplotlib, SciPy, seaborn, TensorFlow, XGBoost, scikit-learn, joblib, Flask.

Data Analysis:

Analysis was done in Power BI; the data analysis report answers 20 questions about the data.

Preprocessing:

  1. Because the data includes a date feature, it was handled as a time series forecasting problem.
  2. Data was sorted (mergesort) by month, day, flight departure hour, and flight departure minute to prevent data leakage when splitting the data into the training and validation sets.
  3. Features extracted: weekday of flight, flight day, flight month, and distance between the source and destination cities.
  4. Feature balancing applied to airline, as some categories had relatively low frequency.
  5. Outlier detection using the interquartile range on the label (price).
  6. Feature engineering applied to the other features.
  7. Feature selection using the p-value.
  8. Data transformation using the discrete cosine transform, since this is time series data and we suspected that it might be periodic.

Multiple encoders were used, which resulted in 3 different datasets; training was done on each of them separately.
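
The sorting, outlier, and splitting steps above can be sketched roughly as follows. This is a minimal sketch, not the project's code: the column names (`month`, `day`, `dep_hour`, `dep_minute`, `price`), the 1.5 x IQR fence, and the 80/20 split ratio are assumptions chosen for illustration.

```python
import pandas as pd


def sort_chronologically(df):
    """Order flights by departure time so later rows never precede earlier ones."""
    keys = ["month", "day", "dep_hour", "dep_minute"]  # hypothetical column names
    # pandas documents `kind` as applying only to single-key sorts;
    # it is passed here to mirror the README's mergesort choice.
    return df.sort_values(keys, kind="mergesort").reset_index(drop=True)


def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return values.between(q1 - k * iqr, q3 + k * iqr)


def time_ordered_split(df, train_fraction=0.8):
    """Split without shuffling: the earliest rows train, the latest rows validate."""
    cut = int(len(df) * train_fraction)
    return df.iloc[:cut], df.iloc[cut:]


# Made-up rows, deliberately out of order, with one extreme price.
flights = pd.DataFrame(
    {
        "month": [3, 2, 2, 3, 2, 3],
        "day": [1, 15, 15, 1, 11, 2],
        "dep_hour": [9, 18, 6, 7, 12, 8],
        "dep_minute": [0, 30, 15, 45, 0, 10],
        "price": [5000, 5200, 4800, 5100, 5300, 60000],
    }
)

ordered = sort_chronologically(flights)
clean = ordered[iqr_mask(ordered["price"])]
train, valid = time_ordered_split(clean)

print(train)
print(valid)
```

Because the split is positional rather than random, every validation row departs after every training row, which is what prevents leakage from future flights into training.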

Modeling :

11 models were tried for regression (10 individual models and one ensemble):

  1. eXtreme Gradient Boosting Regressor
  2. Poisson Regressor
  3. Histogram Gradient Boosting Regressor
  4. Linear Regression
  5. Light Gradient Boosting Machine Regressor
  6. Gradient Boosting Regressor
  7. Extra Trees Regressor
  8. Bagging Regressor
  9. Decision Tree Regressor
  10. Random Forest Regressor
  11. A bagging ensemble learning model (simple averaging) made with: HistGradientBoostingRegressor, LGBMRegressor, ExtraTreesRegressor, BaggingRegressor, RandomForestRegressor

The ensemble model and random forest achieved the two best R² scores on the regression test set:

Ensemble model R² score: 0.982

Random Forest R² score: 0.980
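
For reference, the two regression metrics mentioned in this section can be computed as below. The function names and the sample prices are made up for illustration and are unrelated to the scores reported above.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, the share of variance explained by the model."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def mean_sq_error(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical ticket prices and predictions.
actual = [3000, 5000, 7000, 9000]
predicted = [3200, 4900, 7100, 8800]

print(r_squared(actual, predicted))      # approximately 0.995
print(mean_sq_error(actual, predicted))  # 25000.0
```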

(Figure: random forest MSE)

9 models were tried for classification:

  1. AdaBoost
  2. Gradient Boosting Classifier
  3. Bagging Classifier
  4. Random Forest
  5. eXtreme Gradient Boosting Classifier
  6. Decision Tree Classifier
  7. Histogram Gradient Boosting Classifier
  8. Extra Trees Classifier
  9. Ensemble stacking model that consists of random forest, bagging classifier, and extra trees classifier (the best-performing models)
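
A stacking ensemble of the three base models named in the last item could look like the sketch below, using scikit-learn's `StackingClassifier`. The synthetic four-class data, the logistic regression meta-learner, the 3-fold cross-validation, and all hyperparameters are assumptions; the README does not say which final estimator the project used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic 4-class problem standing in for the four price ranges.
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=5,
    n_classes=4,
    n_clusters_per_class=1,
    random_state=0,
)
X_train, X_test = X[:225], X[225:]
y_train, y_test = y[:225], y[225:]

base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("bag", BaggingClassifier(random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]

# Assumption: a logistic regression learns how to combine the base models'
# out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
stack.fit(X_train, y_train)
pred = stack.predict(X_test)

print(stack.score(X_test, y_test))
```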

Deployment:

The interface was built with HTML, CSS, JavaScript, and Bootstrap, and the backend was built with Flask.

The Website Link: https://airline-ticket-prediction-app.herokuapp.com/
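
A Flask backend for this kind of app usually exposes an endpoint that receives the form values and returns the model's prediction. The sketch below is hypothetical: the `/predict` route, the JSON field names, and the `DummyModel` class are invented for illustration. In a real deployment the model would more likely be loaded from a file saved with joblib.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class DummyModel:
    """Stand-in for a trained regressor (hypothetical pricing rule)."""

    def predict(self, rows):
        # Pretend each stop adds 500 to a base price of 1000.
        return [1000.0 + 500.0 * row[0] for row in rows]


# Assumption: a real app would load a persisted model here instead.
model = DummyModel()


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(force=True)
    # Field names are illustrative, not the project's actual form fields.
    features = [[float(payload["stops"]), float(payload["duration_hours"])]]
    price = model.predict(features)[0]
    return jsonify({"predicted_price": price})
```

The endpoint can be exercised without starting a server through Flask's test client, for example `app.test_client().post("/predict", json={"stops": 1, "duration_hours": 2.5})`.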

Contributors

abdelrhman2023, ahmeedsamyy, mohamednour2019, norhanabdelhafez, nourkamaly
