
danieldacosta / recommendation-system


The project's goal is to create diverse recommendation systems that predict user-item ratings

License: MIT License

Python 100.00%
item-based-recommendation model-based-recommendation recommendation-system


Recommendation System

The project builds several types of recommendation systems on the Yelp dataset to predict the star rating that a given user (user_id) would give a given business (business_id).

Dataset

A filtered subset of the original Yelp review dataset.

  1. yelp_train.csv: the training data, which includes only the columns user_id, business_id, and stars.
  2. yelp_val.csv: the validation data, in the same format as the training data.
  3. We are not sharing the test dataset.
  4. review_train.json: review data only for the training pairs (user, business)
  5. user.json: all user metadata
  6. business.json: all business metadata, including locations, attributes, and categories
  7. checkin.json: user check-ins for individual businesses
  8. tip.json: tips (short reviews) written by a user about a business
  9. photo.json: photo data, including captions and classifications
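A minimal loading sketch follows. It assumes yelp_train.csv and yelp_val.csv have a header row with the user_id, business_id, and stars columns listed above, and that the .json files are newline-delimited JSON (one record per line), as in the public Yelp dataset. The function names load_ratings and load_json_lines are hypothetical.

```python
import csv
import json
from collections import defaultdict


def load_ratings(csv_path):
    """Read a user_id,business_id,stars CSV into two nested dicts:
    user -> {business: stars} and business -> {user: stars}."""
    user_ratings = defaultdict(dict)
    business_ratings = defaultdict(dict)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            stars = float(row["stars"])
            user_ratings[row["user_id"]][row["business_id"]] = stars
            business_ratings[row["business_id"]][row["user_id"]] = stars
    return user_ratings, business_ratings


def load_json_lines(json_path):
    """Yield one record per non-empty line (assumes newline-delimited JSON)."""
    with open(json_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Keeping both the user-to-business and business-to-user maps makes the item-based lookups below cheap, at the cost of storing every rating twice.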

Models

Item-based CF recommendation system with Pearson similarity

The idea behind item-to-item collaborative filtering is to match the items a user has rated to similar items, rather than matching similar users. In practice, this often leads to faster online systems and better recommendations. Similarities between pairs of items i and j are computed offline with the Pearson correlation, and the rating of user “a” on item “i” is predicted as a simple weighted average of that user's ratings on items similar to i.
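The sketch below illustrates Pearson item similarity and the weighted-average prediction r(a, i) = Σ_j w(i, j) · r(a, j) / Σ_j |w(i, j)|. It is an illustration under assumptions, not the code in code/competition.py: similarity is computed over co-rating users only, only positively correlated neighbors are kept, and the neighbor count (n_neighbors=20) and cold-start fallback (default=3.0) are hypothetical values.

```python
import math


def pearson_similarity(ratings_i, ratings_j):
    """Pearson correlation between two items over the users who rated both.

    ratings_i, ratings_j: dict user_id -> stars for each item.
    Returns 0.0 with fewer than two co-raters or zero variance.
    """
    common = ratings_i.keys() & ratings_j.keys()
    if len(common) < 2:
        return 0.0
    mean_i = sum(ratings_i[u] for u in common) / len(common)
    mean_j = sum(ratings_j[u] for u in common) / len(common)
    num = sum((ratings_i[u] - mean_i) * (ratings_j[u] - mean_j) for u in common)
    den = math.sqrt(sum((ratings_i[u] - mean_i) ** 2 for u in common)) * math.sqrt(
        sum((ratings_j[u] - mean_j) ** 2 for u in common)
    )
    return num / den if den else 0.0


def predict_item_based(user, item, user_ratings, item_ratings, n_neighbors=20, default=3.0):
    """Weighted average of the user's ratings on the items most similar to `item`.

    user_ratings: user -> {item: stars}; item_ratings: item -> {user: stars}.
    n_neighbors and default are hypothetical settings.
    """
    rated = user_ratings.get(user, {})
    if item not in item_ratings or not rated:
        return default  # cold start: unknown user or item
    neighbors = []
    for other, stars in rated.items():
        if other == item or other not in item_ratings:
            continue
        w = pearson_similarity(item_ratings[item], item_ratings[other])
        if w > 0:  # keep only positively correlated items
            neighbors.append((w, stars))
    neighbors.sort(reverse=True)  # most similar first
    top = neighbors[:n_neighbors]
    denom = sum(abs(w) for w, _ in top)
    if denom == 0:
        return default
    return sum(w * stars for w, stars in top) / denom
```

Because similarities depend only on the item columns, they can be precomputed offline and cached, which is the speed advantage over user-based matching mentioned above.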

Results

  • File: code/competition.py
  • RMSE: 1.0575379905
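RMSE is the metric reported throughout this README. As a sketch, assuming predictions and true stars come from a labeled held-out set such as yelp_val.csv, it can be computed like this (the function name rmse is hypothetical):

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predictions) != len(actuals) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(predictions))
```

For example, predicting [1, 3] when the true stars are [3, 1] gives an RMSE of 2.0.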

Model-based

In this project, I implemented a model-based approach to predict user ratings. I used an XGBoost model and tuned its hyperparameters with RandomizedSearchCV over the search space below, with the goal of minimizing prediction error.

Fine-tuning

The following hyperparameter search space was used:

 {'max_depth': [7, 8, 9], 'learning_rate': [0.01, 0.03, 0.05, 0.07, 0.1], 'n_estimators': [512], 'colsample_bytree': np.arange(0.7, 1, 0.1), 'colsample_bylevel': np.arange(0.7, 1.0, 0.1)}
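The project uses scikit-learn's RandomizedSearchCV for this step. The stdlib-only sketch below shows just the sampling logic: when every parameter is given as a list, RandomizedSearchCV samples distinct configurations without replacement from the full grid. The colsample ranges are written out as 0.7, 0.8, 0.9, which is an assumption about the intended values (floating-point np.arange can be off by one at the endpoint). The function random_search and the toy scoring function in the test are hypothetical.

```python
import itertools
import random

# Search space from the README, with the np.arange ranges written out explicitly
# (assumed values).
SEARCH_SPACE = {
    "max_depth": [7, 8, 9],
    "learning_rate": [0.01, 0.03, 0.05, 0.07, 0.1],
    "n_estimators": [512],
    "colsample_bytree": [0.7, 0.8, 0.9],
    "colsample_bylevel": [0.7, 0.8, 0.9],
}


def random_search(space, score_fn, n_iter, seed=0):
    """Sample up to n_iter distinct configurations and return (best, best_score).

    score_fn(params) should return a loss to minimize, e.g. cross-validated RMSE.
    """
    keys = sorted(space)
    # Enumerate the full grid so sampling can be done without replacement.
    grid = [dict(zip(keys, values)) for values in itertools.product(*(space[k] for k in keys))]
    rng = random.Random(seed)
    candidates = rng.sample(grid, min(n_iter, len(grid)))
    scored = [(score_fn(params), params) for params in candidates]
    best_score, best = min(scored, key=lambda pair: pair[0])
    return best, best_score
```

In the real pipeline, score_fn would train an XGBoost model with the sampled parameters and return its validation RMSE. RandomizedSearchCV handles this itself through its param_distributions, n_iter, cv, and scoring arguments (for example scoring="neg_root_mean_squared_error").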

I trained the model with cross-validation, fitting one model per fold to obtain 10 distinct models (the ensembling technique referred to here as model stacking). The final rating prediction is the average of the 10 individual model predictions.
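The fold-and-average scheme can be sketched as follows, assuming 10 folds and using a 1-D least-squares line as a hypothetical stand-in for the XGBoost model so the example runs with the standard library alone. In practice each held-out fold would typically be used to score its model or for early stopping; the sketch leaves it unused. All function names are hypothetical.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def fit_line(xs, ys):
    """Hypothetical stand-in model: 1-D least squares, returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def train_fold_models(xs, ys, k=10, fit=fit_line):
    """Fit one model per fold, each trained on the other k - 1 folds."""
    models = []
    for held_out in kfold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        models.append(fit([xs[i] for i in train], [ys[i] for i in train]))
    return models


def predict_average(models, x):
    """Final prediction: the mean of the per-fold model predictions."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)
```

Each model sees a slightly different 90% of the data, so averaging their outputs tends to reduce variance compared with a single model trained once.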

For consistency and reproducibility, I saved all 10 models to the model/ folder and used the joblib library to load them when making predictions.
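The project uses joblib.dump and joblib.load for persistence. The sketch below swaps in the standard-library pickle module, which follows the same dump/load pattern, so it runs without extra dependencies. The LinearModel class and the model_00.pkl ... model_09.pkl naming scheme are hypothetical. As with joblib, only load files from a trusted source, since unpickling can execute code.

```python
import pickle
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LinearModel:
    """Hypothetical picklable model used only for illustration."""
    slope: float
    intercept: float

    def predict(self, x):
        return self.slope * x + self.intercept


def save_models(models, folder):
    """Write each model to folder/model_NN.pkl (zero-padded so names sort numerically)."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    for i, model in enumerate(models):
        with open(folder / f"model_{i:02d}.pkl", "wb") as f:
            pickle.dump(model, f)


def load_models(folder):
    """Load every model_*.pkl file in folder, in file-name order."""
    models = []
    for path in sorted(Path(folder).glob("model_*.pkl")):
        with open(path, "rb") as f:
            models.append(pickle.load(f))
    return models
```

At prediction time, the loaded models' outputs are averaged exactly as during training.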

Additionally, I experimented with combining user-based, item-based, and model-based predictions using both switching and weighting techniques. However, the model-based prediction consistently outperformed the other two methods.
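For reference, the two hybrid strategies can be sketched as below. The blend weight alpha=0.1 and the switching threshold min_neighbors=5 are hypothetical, since the README does not state the values that were tried. Clipping to 1 to 5 stars reflects the Yelp rating scale.

```python
def weighted_hybrid(item_pred, model_pred, alpha=0.1):
    """Weighting: linear blend; alpha (hypothetical) is the weight on the item-based prediction."""
    return alpha * item_pred + (1 - alpha) * model_pred


def switching_hybrid(item_pred, model_pred, n_neighbors, min_neighbors=5):
    """Switching: use item-based CF only when enough similar rated items were found."""
    return item_pred if n_neighbors >= min_neighbors else model_pred


def clip_stars(x, low=1.0, high=5.0):
    """Keep a combined prediction within the 1-5 star range."""
    return max(low, min(high, x))
```

Switching relies on a confidence signal (here, the neighbor count), while weighting always mixes both predictions; either way, the model-based prediction alone remained the better choice in this project.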

Features

In this project, significant effort was dedicated to feature engineering to minimize the Root Mean Squared Error (RMSE). A diverse set of features was explored, combining raw data from the User and Business datasets with newly created features. The following features were developed:

  1. n_attributes: The number of attributes associated with each business_id, providing additional information about the businesses' characteristics.

  2. average_stars_user: The average star rating given by each user, offering insights into their general reviewing behavior.

  3. avg_star_category: The average star rating each user has given to businesses in a particular category, enabling the model to capture preferences across different types of businesses. Businesses were grouped into the following categories: ['restaurants', 'shopping', 'food', 'beauty', 'health', 'home', 'nightlife', 'automotive', 'bars', 'local'].

  4. yelping_since_year: The year the user joined Yelp (the yelping_since field in user.json), potentially uncovering differences in reviewing habits between long-time and newer users.

  5. review_count_business: The number of reviews each business has received, which may reveal patterns related to business popularity or activity.
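A sketch of how these features could be derived from user.json and business.json records follows. It assumes the public Yelp field names (attributes, which may be null; a comma-separated categories string; review_count; average_stars; yelping_since as a "YYYY-MM-DD..." string). Mapping categories onto the ten groups by substring match, and falling back to average_stars when a user has no ratings in a business's groups, are hypothetical choices. All function names are illustrative.

```python
from collections import defaultdict

CATEGORIES = ['restaurants', 'shopping', 'food', 'beauty', 'health',
              'home', 'nightlife', 'automotive', 'bars', 'local']


def n_attributes(business):
    """Number of attributes on a business record (attributes may be null)."""
    return len(business.get("attributes") or {})


def business_groups(business):
    """Groups whose name appears in the categories string (substring match, an assumption)."""
    raw = (business.get("categories") or "").lower()
    return [c for c in CATEGORIES if c in raw]


def avg_star_category(ratings, businesses):
    """ratings: iterable of (user_id, business_id, stars).
    Returns {(user_id, group): mean stars given by that user in that group}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for user_id, business_id, stars in ratings:
        for group in business_groups(businesses.get(business_id, {})):
            sums[(user_id, group)] += stars
            counts[(user_id, group)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def yelping_since_year(user):
    """Year the user joined Yelp, parsed from the yelping_since string."""
    return int(user["yelping_since"][:4])


def feature_row(user, business, category_means):
    """Assemble the features for one (user, business) pair."""
    uid = user["user_id"]
    cat = [category_means[(uid, g)] for g in business_groups(business)
           if (uid, g) in category_means]
    return {
        "n_attributes": n_attributes(business),
        "average_stars_user": user["average_stars"],
        # Hypothetical fallback: the user's overall average when no group matches.
        "avg_star_category": sum(cat) / len(cat) if cat else user["average_stars"],
        "yelping_since_year": yelping_since_year(user),
        "review_count_business": business["review_count"],
    }
```

avg_star_category should be computed from training ratings only, so that validation stars do not leak into the features.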

Results

  • Files: competition.py and train.py
  • RMSE: 0.9772904711772428
