ML-3A-project: Game recommender system based on Graph Neural Networks

Jhon Sebastian Rojas Rodriguez

Task Description

The goal of the project is to build a recommender system based on Graph Neural Networks. The problem can be formulated as a link prediction task: predicting whether an edge exists between two particular nodes of a graph. The dataset is a Steam recommendation dataset publicly available on Kaggle. The data is represented as a bipartite graph whose vertex set splits into two disjoint sets, users and games. The weight of each edge is 1 if the user liked the game and -1 if they disliked it.

This project formulates the link prediction problem as a binary classification problem as follows:

  • Treat the edges in the graph as positive examples.
  • Sample a number of non-existent edges as negative examples.
  • Divide the positive examples and negative examples into training, validation and test sets.
  • Evaluate the model with a binary classification metric such as Area Under Curve (AUC).
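
To make the evaluation step concrete, here is a minimal sketch of AUC computed directly from its pairwise definition: the fraction of (positive, negative) pairs in which the existing edge receives the higher score. The function name `auc_score` and the toy scores are assumptions for illustration, not the project's code; a library routine would normally be used on real data.

```python
def auc_score(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This is O(P * N), which is fine for a
    small illustration but far too slow for millions of edges.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos_scores) * len(neg_scores))


# Hypothetical scores for existing edges (positives) and sampled non-edges (negatives).
pos = [0.9, 0.8, 0.4]
neg = [0.7, 0.3, 0.1]
auc = auc_score(pos, neg)  # 8 of the 9 pairs are ordered correctly
```

An AUC of 1.0 means every existing edge is scored above every sampled non-edge, while 0.5 corresponds to random ranking.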

Model Description

The model consists of two GraphSAGE layers. Each layer computes new node representations by averaging neighbor information. The model is built with the DGL framework, which provides an implementation of the GraphSAGE layer along with optimized functions for the required computations.
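
The neighbor averaging can be sketched in plain Python. This is not DGL's implementation. It is a toy version of the common mean-aggregator form, in which a node's new representation combines its own features with the mean of its neighbors' features. The weight matrices, the ReLU activation, and the tiny graph are all assumptions made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One mean-aggregator layer: relu(W_self h_v + W_neigh mean(h_u for u in N(v)))."""
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        dim = len(h_v)
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(dim)]
        else:
            mean = [0.0] * dim  # isolated node: no neighbor information
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        out[v] = [max(0.0, zi) for zi in z]  # ReLU (assumed activation)
    return out


# Toy bipartite graph: two users who both reviewed one game.
features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "g1": [2.0, 2.0]}
neighbors = {"u1": ["g1"], "u2": ["g1"], "g1": ["u1", "u2"]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder weights; real ones are learned

h1 = sage_mean_layer(features, neighbors, identity, identity)
h2 = sage_mean_layer(h1, neighbors, identity, identity)  # second stacked layer
```

Stacking two layers lets each node's embedding depend on nodes up to two hops away, so a user's embedding reflects other users who reviewed the same games.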

To classify a candidate edge, a binary operator such as the dot product or the L1 / L2 norm is typically used to combine the two node embeddings into a single edge score. A logistic regression model then classifies that score.
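
As a rough illustration, the operators mentioned above can be written as follows. The function names, the example embeddings, and the `weight` and `bias` parameters of the one-feature logistic model are hypothetical. In practice the logistic parameters would be learned from the training edges.

```python
import math


def dot(u, v):
    """Dot product of two embeddings (higher means more similar)."""
    return sum(a * b for a, b in zip(u, v))


def l1_distance(u, v):
    """L1 norm of the difference between two embeddings."""
    return sum(abs(a - b) for a, b in zip(u, v))


def l2_distance(u, v):
    """L2 norm of the difference between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def edge_probability(u, v, weight=1.0, bias=0.0):
    """Logistic regression on a single feature, the dot-product edge score."""
    return sigmoid(weight * dot(u, v) + bias)


# Hypothetical 2-dimensional embeddings for one user and one game.
user_emb = [1.0, 2.0]
game_emb = [3.0, -1.0]

score = dot(user_emb, game_emb)            # 1*3 + 2*(-1)
dist_l1 = l1_distance(user_emb, game_emb)  # |1-3| + |2-(-1)|
dist_l2 = l2_distance(user_emb, game_emb)
prob = edge_probability(user_emb, game_emb)
```

With the default `weight=1.0` and `bias=0.0`, the logistic model reduces to a sigmoid applied to the dot product.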

Dataset Description

The dataset contains over 41 million cleaned and preprocessed user recommendations (reviews) from the Steam Store. Steam is a leading online platform for purchasing and downloading video games, DLC, and other gaming-related content. The dataset also contains detailed information about games and add-ons. This project mainly uses the user-game recommendation data (likes and dislikes). However, the game and user data could be included in the graph as additional custom node features.

The data was preprocessed with Pandas to build a DGLDataset class that represents the data as a graph with 27573556 nodes and 38354101 edges. The edges are split into training, validation and test sets, with 10% of the edges used for validation and another 10% for testing.
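
A split with these proportions could look like the sketch below. The helper name `split_edges`, the fixed seed, and the toy edge list are assumptions. The project's actual splitting code is not shown in this document.

```python
import random


def split_edges(edges, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle edges and split them into train / validation / test lists."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)  # seeded for reproducibility
    n_val = int(len(edges) * val_frac)
    n_test = int(len(edges) * test_frac)
    val = edges[:n_val]
    test = edges[n_val:n_val + n_test]
    train = edges[n_val + n_test:]  # the remaining ~80%
    return train, val, test


# 100 toy (user, game) edges standing in for the real recommendation edges.
edges = [(u, g) for u in range(10) for g in range(10)]
train, val, test = split_edges(edges)
```

The three lists are disjoint and together cover every input edge, so no edge used for evaluation is also used as a training example.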

An equal number of negative examples was sampled to train the classifier, and these were split using the same percentages for validation and test.
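
One simple way to draw such negatives is rejection sampling: pick random user-game pairs and keep only those that are not existing edges. The sketch below assumes users and games are indexed separately from 0. The function name and the toy edges are hypothetical, and this is not necessarily how the project samples its negatives.

```python
import random


def sample_negative_edges(pos_edges, num_users, num_games, seed=0):
    """Sample as many non-existent (user, game) pairs as there are positive edges."""
    existing = set(pos_edges)
    num_neg = len(existing)
    if num_neg > num_users * num_games - len(existing):
        raise ValueError("not enough non-existent edges to sample from")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < num_neg:
        pair = (rng.randrange(num_users), rng.randrange(num_games))
        if pair not in existing:  # reject pairs that are real edges
            negatives.add(pair)   # the set also discards repeated draws
    return sorted(negatives)


# Toy graph: 3 users, 4 games, 4 existing recommendation edges.
pos_edges = [(0, 0), (0, 1), (1, 2), (2, 3)]
neg_edges = sample_negative_edges(pos_edges, num_users=3, num_games=4)
```

Rejection sampling is efficient when the graph is sparse, as it is here, because almost every random pair is a non-edge.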

Results

Different embedding sizes were tested in an attempt to overfit the model:

Train example 1

Train example 2

Train example 3

Train example 4

One additional SAGE layer was added to the model, but it still did not overfit.

Train example 4

In all of the training examples, the loss decreases steadily for both the training and validation sets. The highest AUC (0.9984655101101535) was achieved by the last model.
