upc_sdg's Introduction

Privacy-Preserving Synthetic Data Generation for Recommendation Systems

This is our implementation for the paper:

Fan Liu, Zhiyong Cheng, Huilin Chen, Yinwei Wei, Liqiang Nie, and Mohan Kankanhalli. 2022. Privacy-Preserving Synthetic Data Generation for Recommendation Systems. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). Association for Computing Machinery, New York, NY, USA, 1379–1389.

Please cite our SIGIR '22 paper if you use our code. Thanks!

Updates

  • Update (November 17, 2022) This update shares our training dataset with public researchers. Please check the Dataset section.

  • Update (June 11, 2022) This update integrates the manual generation of user privacy settings (process_data.py) into the model. You can now specify users' privacy sensitivity with the --privacy_ratio or --privacy_settings_json parameter.

Table of contents

  1. Requirement
  2. Dataset
  3. Usage
  4. Generated train data
  5. Results

Environment Requirement

  1. Install via pip: pip install -r requirements.txt
  2. Create two empty folders, output and data.
  3. Download the train data from the Amazon Review Data and SNAP pages; see the Dataset section for details.
  4. Download the pre-trained user/item embedding weights from Google Drive and put them in ./code/embedding

Dataset

We provide three processed datasets: Office, Clothing, and Gowalla. We also share our training dataset on Google Drive with public researchers.

Dataset    #Users   #Items   #Interactions   Sparsity
Office      4,874    2,405          52,957     99.55%
Clothing   18,209   17,317         150,889     99.95%
Gowalla    29,858   40,981       1,027,370     99.91%
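
The sparsity figures can be checked from the other three numbers in each row. The sketch below assumes the usual definition, 1 - #interactions / (#users x #items), which this README does not state explicitly, and reads the three numeric columns as users, items, and interactions. The function name is ours, not part of the repository.

```python
def sparsity(num_users: int, num_items: int, num_interactions: int) -> float:
    """Fraction of the user-item matrix with no interaction (assumed definition)."""
    return 1.0 - num_interactions / (num_users * num_items)


# (users, items, interactions) copied from the dataset statistics above.
DATASETS = {
    "Office": (4_874, 2_405, 52_957),
    "Clothing": (18_209, 17_317, 150_889),
    "Gowalla": (29_858, 40_981, 1_027_370),
}

for name, stats in DATASETS.items():
    print(f"{name}: {sparsity(*stats):.4%}")
```

Under this definition the computed values agree with the reported percentages to within 0.01 percentage points.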

  • train.txt: Train file. Each line holds a user and her/his positive interactions with items (userID and itemID).
  • test.txt: Test file. Each line holds a user and several of her/his positive interactions with items (userID and itemID).
  • user_privacy.json: Users' privacy settings. Each element is a user's privacy sensitivity for the original items.
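
To make the train/test file layout concrete, here is a minimal parsing sketch. It assumes each line is a run of whitespace-separated integers, with the user ID first and that user's item IDs after it. That layout is our reading of the description, not something the README confirms, and parse_interactions is a hypothetical helper rather than a function from this repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_interactions(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Map each user ID to the item IDs listed on that user's line.

    Assumed layout (not confirmed by the README): whitespace-separated
    integers, the first being the user ID and the rest item IDs.
    """
    interactions: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        user_id = int(fields[0])
        interactions[user_id].extend(int(f) for f in fields[1:])
    return dict(interactions)


# Tiny in-memory sample standing in for a real train.txt.
sample = ["0 10 11 12", "1 11", ""]
print(parse_interactions(sample))
```

Because an open file object iterates over its lines, the same function can be given a file handle instead of a list of strings.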

Note: If you need to add another dataset, follow these steps:

  1. Add the dataset to the data folder, including train.txt and test.txt.
  2. Add the new dataset name in ./code/world.py and ./code/register.py.
  3. Extend dataloader.py if needed.

Run UPC-SDG Model

Run the UPC-SDG model to generate new train data that accounts for user privacy sensitivity. The parameters for each dataset are shown below:

Run model on Office dataset:

python -u ./code/main.py --decay=1e-1 --lr=0.001 --seed=2022 --dataset="Office" --topks="[20]" --recdim=64 --bpr_batch=2048 --load=1 --replace_ratio=0.2 --privacy_ratio=0.1 --bpr_loss_d=1 --similarity_loss_d=3

Run model on Clothing dataset:

python -u ./code/main.py --decay=1e-1 --lr=0.001 --seed=2022 --dataset="Clothing" --topks="[20]" --recdim=64 --bpr_batch=2048 --load=1 --replace_ratio=0.2 --privacy_ratio=0.1 --bpr_loss_d=1 --similarity_loss_d=3

Run model on Gowalla dataset:

python -u ./code/main.py --decay=1e-3 --lr=0.001 --seed=2022 --dataset="gowalla" --topks="[20]" --recdim=64 --bpr_batch=2048 --load=1 --replace_ratio=0.2 --privacy_ratio=0.1 --bpr_loss_d=1 --similarity_loss_d=3
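
The three commands differ only in --dataset and --decay, so they can be generated from a small table. The following is a sketch of that idea; it uses only the flags and values shown above, while the PER_DATASET table and build_command helper are our own illustrative names. It prints the commands rather than running them.

```python
import shlex
from typing import List

# Only --dataset and --decay differ between the three commands above.
PER_DATASET = {
    "Office": {"decay": "1e-1"},
    "Clothing": {"decay": "1e-1"},
    "gowalla": {"decay": "1e-3"},
}


def build_command(dataset: str) -> List[str]:
    """Return the argument list for one run, using the flags shown above."""
    decay = PER_DATASET[dataset]["decay"]
    return [
        "python", "-u", "./code/main.py",
        f"--decay={decay}", "--lr=0.001", "--seed=2022",
        f"--dataset={dataset}", "--topks=[20]", "--recdim=64",
        "--bpr_batch=2048", "--load=1", "--replace_ratio=0.2",
        "--privacy_ratio=0.1", "--bpr_loss_d=1", "--similarity_loss_d=3",
    ]


for name in PER_DATASET:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(build_command(name)))
```

To launch a run instead of printing it, the returned list can be passed to subprocess.run(build_command("Office"), check=True) from the repository root.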

Extension: Set user privacy settings (optional)

To load custom user privacy settings, pass the path of the settings file in the run command (e.g. --privacy_settings_json='./data/privacy_example.json').

We also provide process_data.py, which generates the user_privacy.json file in the dataset folder:

python process_data.py --data_path="Office" --privacy_ration=0.7

Note:

  • data_path: the path of the train data folder, which includes the train.txt and test.txt files.
  • privacy_ration: the privacy sensitivity for the original items, a value in the range (0, 1). For validation, we used 0.1, 0.3, 0.5, 0.7, and 0.9 in our paper.
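
As a rough illustration of what a uniform privacy setting could look like, the sketch below assigns every user the same sensitivity and enforces the (0, 1) range. It is not the logic of process_data.py: the JSON layout (a flat list with one value per user) and the make_privacy_settings helper are assumptions made for this example, so check the repository's privacy_example.json for the actual format.

```python
import json
from typing import List


def make_privacy_settings(num_users: int, sensitivity: float) -> List[float]:
    """Give every user the same sensitivity (hypothetical file layout)."""
    # The README states the valid range is the open interval (0, 1).
    if not 0.0 < sensitivity < 1.0:
        raise ValueError("sensitivity must lie strictly between 0 and 1")
    return [sensitivity] * num_users


settings = make_privacy_settings(num_users=3, sensitivity=0.7)
print(json.dumps(settings))
```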

Evaluate model effectiveness

When training of the UPC-SDG model finishes, the model writes the new train data to the output folder, named {dataset name}-replace{replace ratio}-{output prefix}.txt. The new train file and the original test file then form the privacy-guaranteed dataset, which can be fed into other recommender systems (e.g. BPRMF, NeuMF, and LightGCN) for evaluation.

You can find the generated train data used for evaluation in the output folder, or generate a new file as needed.
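
When scripting the evaluation step, it can help to compute the expected output file name up front. The helper below is a sketch of the naming pattern described above. How the real code renders the replace ratio (for example 0.2 versus 0.20) is an assumption here, and the prefix "demo" is a made-up value.

```python
def output_filename(dataset: str, replace_ratio: float, prefix: str) -> str:
    """Build a name of the form {dataset name}-replace{replace ratio}-{output prefix}.txt."""
    # Assumption: the ratio appears in Python's default float formatting.
    return f"{dataset}-replace{replace_ratio}-{prefix}.txt"


print(output_filename("Office", 0.2, "demo"))
```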

Generated train data

For convenience, we provide the generated train data used in our paper. You can download them from Google Drive and feed them into other recommender systems for evaluation.

Results

All metrics are reported at top-20.

[Figure: results]
