
ictir23-cost-sensitive

This is our implementation and experimental data for the paper:

Roger Zhe Li, Julián Urbano, Alan Hanjalic (2023). Mitigating Mainstream Bias in Recommendation via Cost-sensitive Learning. In Proceedings of ICTIR'23, Taipei, Taiwan, July 23, 2023.

Please cite our ICTIR'23 paper if you use our code and data. Thanks!

Author: Roger Zhe Li (https://www.zhe-li.me)

Environment Settings

We use PyTorch 1.6.0 as the main deep learning framework.
Debugging relies heavily on the TorchSnooper package.
Dataset processing and splitting depend on PyLensKit.

Figures and related analyses in the paper are mainly produced with plotnine, a Python implementation of the grammar of graphics modeled on ggplot2 in R.

File and Folder Structure

./cost_sensitive_func.ipynb: a notebook illustrating the cost-sensitive weights based on a truncated normal distribution, and the method for controlling the contrast in importance between the most and the least non-mainstream users. The values computed here are used in the core runs.
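The notebook's exact parameterization is not reproduced here. As a minimal sketch, assume each user has a mainstreaminess score normalized to [0, 1] (for example by quantile normalization, cf. the --norm quantile flag), where 0 means least mainstream, and assume the weight is proportional to the density of a normal distribution with mean 0 and standard deviation sigma, truncated to [0, 1]. Since truncation only rescales the density by a constant, the contrast between the least and the most mainstream user is w(0) / w(1) = exp(1 / (2 * sigma^2)), which can be solved for sigma. The function names are hypothetical.

```python
import math

import numpy as np


def truncnorm_weights(scores, sigma):
    """Hypothetical cost-sensitive weights from a normal density truncated to [0, 1].

    scores: mainstreaminess in [0, 1], where 0 = least mainstream (assumption).
    The truncation constant cancels after mean-normalization, so only the
    unnormalized Gaussian kernel exp(-s^2 / (2 sigma^2)) is needed.
    """
    scores = np.clip(np.asarray(scores, dtype=float), 0.0, 1.0)
    kernel = np.exp(-scores ** 2 / (2.0 * sigma ** 2))
    # Rescale so the average weight is 1, preserving the overall loss scale.
    return kernel / kernel.mean()


def sigma_for_contrast(contrast):
    """Solve w(0) / w(1) = exp(1 / (2 sigma^2)) for sigma (requires contrast > 1)."""
    if contrast <= 1.0:
        raise ValueError("contrast must be greater than 1")
    return math.sqrt(1.0 / (2.0 * math.log(contrast)))


# Least mainstream user weighted 10x as much as the most mainstream one.
sigma = sigma_for_contrast(10.0)
w = truncnorm_weights([0.0, 0.25, 0.5, 0.75, 1.0], sigma)
print(round(float(w[0] / w[-1]), 6))  # 10.0
```

Under this assumption a smaller sigma yields a sharper contrast, while a large sigma flattens the weights towards uniform.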

./analysis/movielens.py: processing the MovieLens dataset;
./analysis/Amazon.py: processing the Amazon datasets;
./analysis/beer.py: processing the BeerAdvocate dataset;
./dataset_prep.py: script for train-val-test data splits under different conditions.
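The split conditions themselves are described in the paper. Purely as a hypothetical illustration, a per-user random split that holds out a fixed number of each user's items for validation and testing, and drops users who would keep too few training items, could look like the sketch below. The parameter names n_valid, n_test and min_train are assumptions and are not claimed to match the --train_low / --valid_low flags.

```python
import random
from collections import defaultdict


def per_user_split(interactions, n_valid, n_test, min_train, seed=8964):
    """Hypothetical per-user random train/validation/test split.

    interactions: iterable of (user, item) pairs.
    Returns three lists of (user, item) pairs. Users with fewer than
    min_train + n_valid + n_test interactions are dropped entirely.
    """
    rng = random.Random(seed)  # dedicated RNG so the split is reproducible
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)

    train, valid, test = [], [], []
    for user in sorted(by_user):  # sorted for a deterministic iteration order
        items = by_user[user][:]
        if len(items) < min_train + n_valid + n_test:
            continue
        rng.shuffle(items)
        test += [(user, i) for i in items[:n_test]]
        valid += [(user, i) for i in items[n_test:n_test + n_valid]]
        train += [(user, i) for i in items[n_test + n_valid:]]
    return train, valid, test


data = [(u, i) for u in range(3) for i in range(u * 4 + 4)]  # users with 4, 8, 12 items
tr, va, te = per_user_split(data, n_valid=1, n_test=1, min_train=5)
print(len(tr), len(va), len(te))  # 16 2 2
```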

./dataset: stores the raw files, the preprocessed tables, and the train-val-test splits of the datasets;
./results: stores the aggregated results of all experiments, including the effectiveness of cost-sensitive learning and the exploration of the data splits needed for reliable results. These are aggregated from the files in ./results_random, which can be generated by running the source code. Each file corresponds to one dataset.

All other .py files are related to the experiments and result analysis. See below for a brief introduction.

./vanilla.py: source code for running Factorization Machines to obtain the prerequisites for calculating cost-sensitive weights;
./data_loader_random.py: data loader module for training/validation/testing;
./csl_random.py: source code for cost-sensitive training and validation;
./eval_random.py: model evaluation;
./corr_valid_test.py: correlation analysis for different numbers of items used for validation and testing;
./stat_line.py: visualization of correlation analysis mentioned above;
./data_aggregation.py: puts all experimental results together, one file per dataset;
./plot_csl.py: visualization of the artifacts of cost-sensitive strategies.
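vanilla.py trains Factorization Machines. For readers unfamiliar with the model, the standard second-order FM prediction (Rendle, 2010) can be computed in O(k n) time using the identity sum_{i<j} <v_i, v_j> x_i x_j = 1/2 * sum_f [(sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2]. This NumPy sketch only illustrates that formula and checks it against the naive pairwise sum; it is not the repository's PyTorch model, and the function names are hypothetical.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine prediction for one feature vector.

    x:  (n,) feature values (e.g. one-hot user and item indicators)
    w0: global bias; w: (n,) linear weights; V: (n, k) latent factors
    """
    linear = w0 + w @ x
    xv = x @ V                                   # (k,) sum_i v_{i,f} x_i
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Reference O(k n^2) version that sums over all feature pairs i < j."""
    n = len(x)
    total = w0 + w @ x
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return total


rng = np.random.default_rng(8964)
x = rng.random(6)
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 3))
print(np.isclose(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V)))  # True
```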
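corr_valid_test.py analyses how well validation results agree with test results for different numbers of held-out items. The coefficient the script uses is not stated here; as a hypothetical illustration, Kendall's tau (tau-a, no tie correction) between two score lists, such as the scores of several model configurations on validation and on test, can be computed as follows. The scores below are made up for the example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two equally long score lists (tied pairs count as neither)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


valid_scores = [0.31, 0.35, 0.28, 0.40]  # made-up validation scores of 4 configurations
test_scores = [0.30, 0.36, 0.29, 0.33]   # made-up test scores of the same configurations
print(round(kendall_tau_a(valid_scores, test_scores), 4))  # 0.6667
```

Here configurations 1 and 3 swap places between validation and test, giving 5 concordant and 1 discordant pair out of 6.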

Example to run the code

The command-line arguments are documented in the code (see the parse_args function in ./util/parser.py). All random seeds default to 8964.
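How the seed is applied inside the scripts is not shown here. A common pattern for reproducible runs with the default seed 8964 is to seed Python's, NumPy's and, if installed, PyTorch's random number generators in one place; the helper name set_seed is hypothetical.

```python
import random

import numpy as np


def set_seed(seed=8964):
    """Seed the Python, NumPy and (if available) PyTorch RNGs with one value."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; Python and NumPy are still seeded
    torch.manual_seed(seed)  # seeds PyTorch's random number generator


set_seed()
first = np.random.rand(3)
set_seed()
second = np.random.rand(3)
print(np.array_equal(first, second))  # True
```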

Run example (after all data processing is done):

Step 1: get the vanilla results from FM

python3 vanilla.py --dataset movielens --train_low 5 --valid_low 5 --seed 8964

Step 2: cost-sensitive learning

python3 csl_random.py --dataset movielens --train_low 5 --valid_low 5 --seed 8964 --norm quantile --sigma_type 0
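As a rough illustration of the idea behind this step, and not the code in csl_random.py: in cost-sensitive learning each training interaction's loss is multiplied by the weight of the user who produced it, so errors on users with larger weights cost more. The sketch assumes a squared-error loss and given per-user weights; the function name and the choice of loss are assumptions.

```python
import numpy as np


def cost_sensitive_mse(pred, target, user_ids, user_weights):
    """Hypothetical per-user weighted squared-error loss.

    pred, target: (m,) predictions and observed values for m interactions
    user_ids:     (m,) integer index of the user behind each interaction
    user_weights: (n_users,) cost-sensitive weight of each user
    """
    weights = user_weights[user_ids]  # look up one weight per interaction
    sq_err = (pred - target) ** 2
    return np.sum(weights * sq_err) / np.sum(weights)  # weighted mean


pred = np.array([3.5, 4.0, 2.0, 5.0])
target = np.array([4.0, 4.0, 3.0, 4.0])
user_ids = np.array([0, 0, 1, 1])
user_weights = np.array([1.0, 3.0])  # user 1 counts three times as much as user 0
print(cost_sensitive_mse(pred, target, user_ids, user_weights))  # 0.78125
```

Dividing by the sum of weights keeps the loss on the same scale as an unweighted mean squared error, so with equal weights the two coincide.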

Dataset

We provide four processed datasets with three different train-test splits under the ./dataset folder. The processing methods are stated in the paper.

The original MovieLens dataset is available in this repo. The original versions of the two Amazon datasets are too large to host here. You can find them here. At the request of the data providers, the BeerAdvocate dataset is no longer publicly available.

Cite

Please cite our ICTIR'23 paper if you use the code.

License

Last Update Date: June 27, 2023
