
k-anonml

k-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers

This repository contains the Python code for applying four k-anonymisation algorithms, namely Optimal Lattice Anonymization (OLA), Mondrian, Top-Down Greedy Anonymisation (TDG) and k-NN Clustering-Based (CB) Anonymisation, to datasets and for measuring their effects on Machine Learning (ML) classifiers, as presented in the paper "k-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers".

@misc{slijepčević2021kanonymity,
    title        = {$k$-Anonymity in Practice: How Generalisation and Suppression Affect Machine Learning Classifiers},
    author       = {Djordje Slijepčević and
                    Maximilian Henzl and
                    Lukas Daniel Klausner and
                    Tobias Dam and
                    Peter Kieseberg and
                    Matthias Zeppelzauer},
    year         = 2021,
    eprint       = {2102.04763},
    archiveprefix = {arXiv},
    primaryclass = {cs.LG}
}
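k-anonymity requires every combination of quasi-identifier (QID) values in the released table to be shared by at least k records. Generalisation coarsens QID values until this holds, and suppression removes the records that still violate it. As an illustration only (not code from this repository), a minimal pure-Python sketch of the check and of record suppression follows; the list-of-dicts record layout and all function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical record layout: one dict per row, attribute name -> value.
Record = Dict[str, object]


def equivalence_classes(records: Sequence[Record], qids: Sequence[str]) -> Counter:
    """Count how many records share each combination of QID values."""
    return Counter(tuple(r[q] for q in qids) for r in records)


def is_k_anonymous(records: Sequence[Record], qids: Sequence[str], k: int) -> bool:
    """True if every QID combination occurs at least k times."""
    return all(count >= k for count in equivalence_classes(records, qids).values())


def suppress_small_classes(records: Sequence[Record], qids: Sequence[str], k: int) -> List[Record]:
    """Suppress (drop) records whose equivalence class has fewer than k members."""
    counts = equivalence_classes(records, qids)
    return [r for r in records if counts[tuple(r[q] for q in qids)] >= k]


if __name__ == "__main__":
    table = [
        {"age": "20-29", "zip": "130**", "disease": "flu"},
        {"age": "20-29", "zip": "130**", "disease": "cold"},
        {"age": "30-39", "zip": "148**", "disease": "flu"},
    ]
    qids = ["age", "zip"]
    print(is_k_anonymous(table, qids, k=2))  # False: the 30-39 row is unique
    kept = suppress_small_classes(table, qids, k=2)
    print(len(kept), is_k_anonymous(kept, qids, k=2))  # 2 True
```

Only the QID columns ("age", "zip") enter the check; the sensitive attribute ("disease") is carried along unchanged.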

Setup

To install the requirements, use either pipenv install or pip3 install -r requirements.txt. Then activate the virtual environment, e.g. with pipenv shell.

Code & Usage

The code is written in Python 3 and performs the following steps for each experiment:

  • read the specified dataset
  • measure the performance of the specified ML algorithm on the original dataset
  • anonymise the dataset with the specified algorithm and the current value of k
  • measure the performance of the specified ML algorithm on the anonymised dataset
  • repeat the previous steps for the other configured values of k
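The experiment loop described above can be sketched as follows. This is a simplified, hypothetical outline rather than the logic of baseline_with_repetitions.py: the evaluate and anonymise callables stand in for the real ML and anonymisation code, and treating --stop-k as an inclusive bound is an assumption based on its help text ("last value for k").

```python
from typing import Callable, Dict, Iterable, List

Dataset = List[dict]


def k_values(start_k: int, stop_k: int, step_k: int) -> Iterable[int]:
    """Yield k from start_k up to and including stop_k (inclusive end is assumed)."""
    return range(start_k, stop_k + 1, step_k)


def run_experiment(
    dataset: Dataset,
    evaluate: Callable[[Dataset], float],
    anonymise: Callable[[Dataset, int], Dataset],
    start_k: int,
    stop_k: int,
    step_k: int,
) -> Dict[str, float]:
    """Score a classifier on the original data, then on one anonymised copy per k."""
    results = {"original": evaluate(dataset)}
    for k in k_values(start_k, stop_k, step_k):
        anonymised = anonymise(dataset, k)
        results[f"k={k}"] = evaluate(anonymised)
    return results


if __name__ == "__main__":
    def coarsen(data: Dataset, k: int) -> Dataset:
        # Toy "anonymiser": round each age down to a multiple of k.
        return [{"age": r["age"] // k * k} for r in data]

    def distinct_ages(data: Dataset) -> float:
        # Toy "score": number of distinct age values that survive.
        return float(len({r["age"] for r in data}))

    toy = [{"age": a} for a in range(10)]
    print(run_experiment(toy, distinct_ages, coarsen, start_k=2, stop_k=6, step_k=2))
    # {'original': 10.0, 'k=2': 5.0, 'k=4': 3.0, 'k=6': 2.0}
```

Passing the classifier evaluation and the anonymiser in as functions keeps the loop independent of the concrete dataset, classifier and algorithm choice, which mirrors how those three are selected independently on the command line.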

The parameters (dataset, ML algorithm, k-anonymisation algorithm and k) are passed as command-line arguments as follows:

usage: baseline_with_repetitions.py [-h] [--start-k START_K] [--stop-k STOP_K] [--step-k STEP_K] [--debug] [--verbose] [{cmc,mgm,adult,cahousing}] [{rf,knn,svm,xgb}] {mondrian,ola,tdg,cb} ...

Anonymize data utilising different algorithms and analyse the effects of the anonymization on the data

positional arguments:
  {cmc,mgm,adult,cahousing}
                        the dataset used for anonymization
  {rf,knn,svm,xgb}      machine learning classifier
  {mondrian,ola,tdg,cb}
    mondrian            mondrian anonymization algorithm
    ola                 ola anonymization algorithm
    tdg                 tdg anonymization algorithm
    cb                  cb anonymization algorithm

optional arguments:
  -h, --help            show this help message and exit
  --start-k START_K     initial value for k of k-anonymity
  --stop-k STOP_K       last value for k of k-anonymity
  --step-k STEP_K       step for increasing k of k-anonymity
  --debug, -d           enable debugging
  --verbose, -v
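To show how the arguments above fit together, here is a hypothetical argparse reconstruction of a parser with a similar usage line. It is not the actual parser in baseline_with_repetitions.py: the defaults, the type of --verbose (assumed to be a plain flag) and any extra arguments accepted by the algorithm subcommands (the "..." in the usage) are unknown and therefore omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI described by the usage text."""
    parser = argparse.ArgumentParser(
        prog="baseline_with_repetitions.py",
        description="Anonymize data utilising different algorithms and analyse "
        "the effects of the anonymization on the data",
    )
    parser.add_argument("--start-k", type=int, help="initial value for k of k-anonymity")
    parser.add_argument("--stop-k", type=int, help="last value for k of k-anonymity")
    parser.add_argument("--step-k", type=int, help="step for increasing k of k-anonymity")
    parser.add_argument("--debug", "-d", action="store_true", help="enable debugging")
    parser.add_argument("--verbose", "-v", action="store_true")  # assumed flag
    # Optional positionals, matching the square brackets in the usage line.
    parser.add_argument("dataset", nargs="?", choices=["cmc", "mgm", "adult", "cahousing"],
                        help="the dataset used for anonymization")
    parser.add_argument("classifier", nargs="?", choices=["rf", "knn", "svm", "xgb"],
                        help="machine learning classifier")
    # One subcommand per anonymisation algorithm (required, as in the usage line).
    subparsers = parser.add_subparsers(dest="algorithm", required=True)
    for name in ["mondrian", "ola", "tdg", "cb"]:
        subparsers.add_parser(name, help=f"{name} anonymization algorithm")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--start-k", "2", "--stop-k", "10", "--step-k", "2", "adult", "rf", "mondrian"]
    )
    print(args.dataset, args.classifier, args.algorithm, args.start_k, args.stop_k, args.step_k)
```

With this layout the optional arguments must precede the positionals: everything after the algorithm name is handed to that subcommand's own parser.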

The k-anonymisation algorithms "k-NN Clustering-Based Anonymisation", "Mondrian" and "Top-Down Greedy Anonymisation", located in the folders clustering_based, basic_mondrian and top_down_greedy, are based on the open-source implementations by Qiyuan Gong.

The original repositories can be found on github.com:

Our changes include migrating the code from Python 2 to Python 3, adding an option to leave non-QID (non-quasi-identifier) attributes and the target variable non-anonymised, supporting floating-point values in datasets, and removing and cleaning up files and code that were irrelevant to our project.
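Mondrian partitions the table top-down: it repeatedly cuts a partition at the median of one QID, as long as both halves still contain at least k records, and then generalises each final partition to value ranges. The sketch below follows that general description for numeric QIDs only; it is not derived from the basic_mondrian folder, it omits categorical hierarchies, and its function names are hypothetical. As in the modified code described above, non-QID attributes (here "income") are left untouched.

```python
from statistics import median
from typing import Dict, List, Sequence

Record = Dict[str, float]


def _span(part: Sequence[Record], attr: str) -> float:
    """Width of the value range of attr within a partition."""
    values = [r[attr] for r in part]
    return max(values) - min(values)


def mondrian(records: List[Record], qids: Sequence[str], k: int) -> List[List[Record]]:
    """Partition records by strict median cuts so every partition has >= k records."""
    if k < 1 or len(records) < k:
        raise ValueError("k must be >= 1 and the table must contain at least k records")
    # Normalise spans by each attribute's range over the whole table
    # (1.0 avoids division by zero for constant attributes).
    full = {a: _span(records, a) or 1.0 for a in qids}

    def split(part: List[Record]) -> List[List[Record]]:
        # Try attributes from the widest to the narrowest normalised span.
        for attr in sorted(qids, key=lambda a: _span(part, a) / full[a], reverse=True):
            m = median(r[attr] for r in part)
            left = [r for r in part if r[attr] <= m]
            right = [r for r in part if r[attr] > m]
            if len(left) >= k and len(right) >= k:  # allowable cut
                return split(left) + split(right)
        return [part]  # no allowable cut: keep as one equivalence class

    return split(records)


def generalise(partitions: List[List[Record]], qids: Sequence[str]) -> List[Dict[str, object]]:
    """Replace each QID value with the (min, max) range of its partition."""
    out: List[Dict[str, object]] = []
    for part in partitions:
        ranges = {a: (min(r[a] for r in part), max(r[a] for r in part)) for a in qids}
        out.extend({**r, **ranges} for r in part)
    return out


if __name__ == "__main__":
    table = [{"age": float(a), "income": a % 2} for a in range(20, 28)]
    parts = mondrian(table, ["age"], k=2)
    print([len(p) for p in parts])  # [2, 2, 2, 2]
    print(generalise(parts, ["age"])[0])  # {'age': (20.0, 21.0), 'income': 0}
```

Because a cut is only accepted when both halves keep at least k records, every resulting partition, and hence every generalised equivalence class, is k-anonymous by construction.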

Data

The repository contains the following locations for data:

  • datasets
    • contains all available datasets in separate folders
  • generalization/hierarchies
    • contains our defined generalization hierarchies per attribute and dataset
  • results
    • all computed results (anonymised datasets, ML performance, etc.) are stored in a separate folder structure inside results for each experiment
  • paper_results
    • contains the results we used for analyses and plots in our paper
  • figures
    • contains the figures used in our paper
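The generalization hierarchies map each original QID value to progressively coarser values. The file format used in generalization/hierarchies is not described here, so the following sketch assumes, purely for illustration, one semicolon-separated row per original value (original value first, most general value last). The helper names are hypothetical. Only attributes listed in levels (the QIDs) are generalised; all other attributes, including the target variable, are copied unchanged.

```python
import csv
import io
from typing import Dict, Iterable, List, Sequence

# Hypothetical in-memory hierarchy: value -> [level 0 (original), level 1, ...].
Hierarchy = Dict[str, List[str]]


def hierarchy_from_rows(rows: Iterable[Sequence[str]]) -> Hierarchy:
    """Build a hierarchy from rows of the assumed form [value, level1, level2, ...]."""
    return {row[0]: list(row) for row in rows}


def generalise_record(
    record: Dict[str, str],
    hierarchies: Dict[str, Hierarchy],
    levels: Dict[str, int],
) -> Dict[str, str]:
    """Generalise the attributes named in levels; copy every other attribute as-is."""
    out = dict(record)
    for attr, level in levels.items():
        out[attr] = hierarchies[attr][record[attr]][level]
    return out


if __name__ == "__main__":
    # Assumed on-disk layout: one semicolon-separated row per original value.
    text = "13053;1305*;130**;*\n13068;1306*;130**;*\n"
    zip_hierarchy = hierarchy_from_rows(csv.reader(io.StringIO(text), delimiter=";"))
    record = {"zip": "13068", "hours": "40", "income": ">50K"}
    print(generalise_record(record, {"zip": zip_hierarchy}, {"zip": 2}))
    # {'zip': '130**', 'hours': '40', 'income': '>50K'}
```

A full-domain algorithm such as OLA picks one level per QID for the whole table; the levels dictionary above corresponds to one such choice.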

Contributors

fhstpmc, tobiasdam
