
This project forked from almightygosu/thedatasetsdilemma


Code for our WSDM 2022 paper titled "The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?"

License: GNU General Public License v3.0



Code for the WSDM 2022 paper

The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?


Hello! :)

This repository contains the source code, as well as other useful information, for the paper "The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?" in WSDM 2022.

The paper is available here: Paper (Best Paper Award Runner-up)

For a quick overview of the paper, you can refer to these slides: The Datasets Dilemma Slides

Reference

Please consider citing our work if you find it useful, thank you!

@inproceedings{10.1145/3488560.3498519,
  author = {Chin, Jin Yao and Chen, Yile and Cong, Gao},
  title = {The Datasets Dilemma: How Much Do We Really Know About Recommendation Datasets?},
  year = {2022},
  isbn = {9781450391320},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3488560.3498519},
  doi = {10.1145/3488560.3498519},
  booktitle = {Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining},
  pages = {141–149},
  numpages = {9},
  keywords = {datasets, item recommendation, evaluation, data characteristics},
  location = {Virtual Event, AZ, USA},
  series = {WSDM '22}
}

Outline

In our paper, we try to address the "datasets dilemma" using 3 main steps.

  1. How are different datasets being utilised in recent papers?
    • Are there any patterns?
    • Code can be found in the ./Step 1/ folder (Please refer to its README file)
  2. What are the similarities as well as differences between various datasets?
    • Can we define them using objective measures?
    • Code can be found in the ./Step 2/ folder (Please refer to its README file)
  3. Could the choice of datasets influence the observations and/or conclusions obtained?
    • Empirical study using a variety of item recommendation algorithms
    • Code can be found in the ./Step 3/ folder (Please refer to its README file)
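As an illustration of what an "objective measure" in Step 2 might look like, here is a minimal sketch that computes a few basic statistics from a list of (user, item) interactions. The choice of measures (user count, item count, density) and the function name `interaction_stats` are assumptions made for this example; the measures actually used in the paper are documented in the ./Step 2/ folder.

```python
def interaction_stats(interactions):
    """Summarise a dataset given as an iterable of (user, item) pairs.

    Hypothetical helper for illustration only; not taken from the repository.
    """
    pairs = set(interactions)          # drop duplicate interactions
    users = {u for u, _ in pairs}
    items = {i for _, i in pairs}
    n_users, n_items, n_pairs = len(users), len(items), len(pairs)
    # Density: fraction of the user-item matrix that is observed.
    density = n_pairs / (n_users * n_items) if pairs else 0.0
    return {
        "users": n_users,
        "items": n_items,
        "interactions": n_pairs,
        "density": density,
    }


if __name__ == "__main__":
    toy = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u1", "i1")]
    print(interaction_stats(toy))
```

Comparing such statistics across datasets gives one simple, reproducible way to describe how they differ.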

The ./Datasets/ folder

Environment Setup

  1. Python 3.6.8
  2. PyTorch 1.4.0
  3. TensorFlow 2.3.0
  4. numpy 1.17.2
  5. pandas 0.25.3
  6. matplotlib 3.3.2
  7. scikit-learn 0.23.2
  8. scipy 1.3.0
  9. scikit-optimize 0.8.1
  10. mlxtend 0.18.0 (for frequent itemset mining)
  11. implicit 0.4.4 (for Weighted Matrix Factorization (WMF))

Analyses and experiments were conducted on an Ubuntu 16.04.6 LTS server with conda 4.8.4.
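To check a local environment against the versions listed above, a small script along these lines may help. This is a sketch rather than part of the repository: the distribution names (for example `torch` for PyTorch) are assumed to be the usual PyPI names, and the helper names `installed_version` and `find_mismatches` are hypothetical.

```python
import sys

try:
    from importlib import metadata            # Python 3.8+
except ImportError:                            # e.g. the listed Python 3.6.8
    import importlib_metadata as metadata      # backport, installed separately

# Versions copied from the list above; keys are assumed PyPI distribution names.
PINNED = {
    "torch": "1.4.0",
    "tensorflow": "2.3.0",
    "numpy": "1.17.2",
    "pandas": "0.25.3",
    "matplotlib": "3.3.2",
    "scikit-learn": "0.23.2",
    "scipy": "1.3.0",
    "scikit-optimize": "0.8.1",
    "mlxtend": "0.18.0",
    "implicit": "0.4.4",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(README lists 3.6.8)")
    for name, (wanted, found) in sorted(find_mismatches(PINNED).items()):
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running the script prints one line per package that is missing or differs from the listed version; no output after the Python line means the environment matches.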
