ayoub-etoullali / spark-mllib-ntt-data

This GitHub repository contains code for predicting the country destination of new Airbnb users using machine learning techniques on the "Airbnb New User Bookings" dataset from a Kaggle competition.

Airbnb New User Booking Prediction

This repository contains the code and resources for predicting the country destination of new users on Airbnb. The project uses machine learning techniques to build a model that predicts the country where a new user will make their first travel booking.

Dataset

The dataset used for this project is the "Airbnb New User Bookings" dataset obtained from the Kaggle competition hosted by Airbnb. It includes various user features such as age, gender, signup method, and more, along with the target variable "country_destination."
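A minimal sketch of loading the user table and checking how the target classes are distributed, using only the Python standard library. The project's notebooks may load the data differently (for example with Spark). The file path `data/train_users_2.csv` and the `-unknown-` gender placeholder in the sample rows are assumptions about the Kaggle files, not taken from this repository.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, TextIO


def load_users(handle: TextIO) -> List[Dict[str, str]]:
    """Read user rows from a CSV file handle into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle))


def target_distribution(rows: Iterable[Dict[str, str]],
                        target: str = "country_destination") -> Dict[str, float]:
    """Return the share of rows in each target class, most frequent class first."""
    counts = Counter(row[target] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


if __name__ == "__main__":
    # Tiny inline sample; with the real data, use something like
    # open("data/train_users_2.csv", newline="")  (assumed path).
    sample = io.StringIO(
        "id,age,gender,signup_method,country_destination\n"
        "u1,34,FEMALE,basic,US\n"
        "u2,,-unknown-,facebook,NDF\n"
        "u3,29,MALE,basic,NDF\n"
        "u4,41,FEMALE,basic,FR\n"
    )
    users = load_users(sample)
    print(target_distribution(users))
```

Checking the class shares early shows how imbalanced the target is, which matters when choosing evaluation metrics.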

Approach

The goal of this project is to predict a categorical target variable ("country_destination") based on user features. Here's an overview of the approach:

  1. Data Preprocessing: Load, clean, and preprocess the dataset. Handle missing values and convert categorical variables into numerical format.

  2. Feature Engineering: Extract relevant features from the dataset, such as age and gender.

  3. Model Selection: Since the target variable is categorical, we use classification algorithms such as the Random Forest classifier.

  4. Model Training: Train the chosen classification algorithm on the preprocessed data.

  5. Model Evaluation: Evaluate the model's performance using classification metrics like accuracy, precision, recall, and F1-score.

  6. Prediction and Submission: Use the trained model to predict the country destinations for new users in the test dataset and prepare a submission file.
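The preprocessing and feature-engineering steps (1 and 2) can be sketched in plain Python as follows. This is an illustration under assumptions, not the notebooks' code: the column names `age`, `gender`, `signup_method` and `date_account_created` (ISO `YYYY-MM-DD` dates), the plausible age range of 14 to 100, median imputation and label encoding are all illustrative choices, and `fit`/`transform` are hypothetical helper names.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Tuple

# Assumed subset of categorical columns; extend with the other categorical fields as needed.
CATEGORICAL = ["gender", "signup_method"]


def normalise(value: str) -> str:
    """Upper-case a category and map empty or '-unknown-' values to one UNKNOWN level."""
    value = value.strip().upper()
    return "UNKNOWN" if value in ("", "-UNKNOWN-") else value


def parse_age(raw: str, low: float = 14, high: float = 100) -> Optional[float]:
    """Return the age, or None when it is missing or outside the (assumed) plausible range."""
    try:
        age = float(raw)
    except ValueError:
        return None
    return age if low <= age <= high else None  # NaN also fails this check


def fit(rows: List[Dict[str, str]]) -> Tuple[Dict[str, Dict[str, int]], float]:
    """Learn label encodings and the median age from the training rows only."""
    encoders = {
        col: {v: i for i, v in enumerate(sorted({normalise(r.get(col, "")) for r in rows}))}
        for col in CATEGORICAL
    }
    known = [a for a in (parse_age(r.get("age", "")) for r in rows) if a is not None]
    fill_age = median(known) if known else 0.0
    return encoders, fill_age


def transform(rows: List[Dict[str, str]],
              encoders: Dict[str, Dict[str, int]],
              fill_age: float) -> List[List[float]]:
    """Turn each row into a numeric feature vector; unseen categories are encoded as -1."""
    features = []
    for r in rows:
        age = parse_age(r.get("age", ""))
        created = date.fromisoformat(r["date_account_created"])
        vec = [
            age if age is not None else fill_age,
            1.0 if age is None else 0.0,  # flag: age was missing or implausible
            float(created.year),
            float(created.month),
            float(created.weekday()),  # Monday == 0
        ]
        vec += [float(encoders[col].get(normalise(r.get(col, "")), -1)) for col in CATEGORICAL]
        features.append(vec)
    return features
```

Fitting the encoders and the fill value on the training rows, then reusing them on the test rows, keeps the codes consistent between the two sets. Integer label codes are usually acceptable for tree ensembles such as a Random Forest; a linear model would typically need one-hot encoding instead.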

Files and Directories

  • data/: Directory containing the dataset files.
  • model_selection/: Trained model files.
  • README.md: This file, which provides an overview of the project.
  • Additional notebooks and supporting files.

Requirements

  • Python 3.x

Getting Started

  1. Clone this repository.
  2. Install the required libraries.
  3. Analyse the data.
  4. Clean and preprocess the data.
  5. Train and evaluate the models.

Results

Kaggle

[Figure: Kaggle results]
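Scoring on Kaggle requires a submission file (step 6 of the approach). A minimal sketch, assuming per-class probabilities are available for each test user (for example from the Random Forest model): write the top-ranked countries to CSV. As far as we understand the competition format, submissions use an `id,country` header with up to five rows per user, ordered from most to least likely; treat that format, and the hypothetical helper names `top_k` and `write_submission`, as assumptions to check against the competition's sample submission.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple


def top_k(probs: Dict[str, float], k: int = 5) -> List[str]:
    """Return the k most likely countries; ties are broken alphabetically for determinism."""
    ranked = sorted(probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [country for country, _ in ranked[:k]]


def write_submission(predictions: Iterable[Tuple[str, Dict[str, float]]],
                     out: TextIO, k: int = 5) -> None:
    """Write one 'id,country' row per ranked country, best guess first for each user."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "country"])
    for user_id, probs in predictions:
        for country in top_k(probs, k):
            writer.writerow([user_id, country])


if __name__ == "__main__":
    buf = io.StringIO()  # replace with open("submission.csv", "w", newline="") in practice
    write_submission([("u1", {"NDF": 0.6, "US": 0.3, "FR": 0.1})], buf, k=2)
    print(buf.getvalue())
```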

Predicted Country Distribution

[Figure: predicted country distribution]

Challenges and Future Work

Improve the dataset and evaluate the model with additional evaluation metrics.
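One way to extend the evaluation is to report the classification metrics listed in step 5 alongside a ranking metric. To our understanding, the Kaggle competition scored submissions with NDCG@5, where each user has a single correct country, so a correct prediction at rank i (counting from 1) contributes 1 / log2(i + 1) and a miss contributes 0. The sketch below is a stdlib-only illustration with hypothetical function names, not the notebooks' implementation; the macro averages are taken over every class that appears in either the true or the predicted labels.

```python
import math
from typing import Dict, Sequence


def classification_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 for top-1 predictions."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {"accuracy": accuracy, "precision": sum(precisions) / n,
            "recall": sum(recalls) / n, "f1": sum(f1s) / n}


def ndcg_at_k(y_true: Sequence[str], ranked: Sequence[Sequence[str]], k: int = 5) -> float:
    """Mean NDCG@k with one relevant country per user, so the ideal DCG is 1."""
    scores = []
    for truth, preds in zip(y_true, ranked):
        top = list(preds[:k])
        # index 0 (rank 1) gives 1 / log2(2) = 1.0; a miss scores 0
        scores.append(1.0 / math.log2(top.index(truth) + 2) if truth in top else 0.0)
    return sum(scores) / len(scores)
```

Because most users in this kind of data book nowhere or in a few popular countries, accuracy alone can look good for a trivial model; macro-averaged scores and NDCG@5 give a more complete picture.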

License

This project is licensed under the MIT License.

Feel free to contact us for any questions or collaborations.

Contributors

ayoub-etoullali, hajaremr, hamzaabth, wissu12
