stefanrmmr / differentially_private_synthetic_data

Differentially Private Synthetic Data Generation [DP-SDG] - Experimental Setups & Knowledge Base - WORK IN PROGRESS

Experimental Implementation of DP-WGAN
Differentially Private Synthetic Data Generation

For Continuous Data with binary Targets using the Differentially Private Wasserstein GAN

  1. DP-WGAN Synthetic Data for "Health care: Heart attack possibility" Kaggle Dataset --> view Notebook
  2. DP-WGAN Synthetic Data for "BankNote Authentication UCI" Kaggle Dataset --> view Notebook


Metrics achieved for DP-WGAN on the Heart Disease Dataset


[Image: synthdata_sc1]

* After multiple attempts using normalized input data, with epsilon ≈ 3.4 and delta = 1e-5
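
For orientation, the two privacy parameters quoted above are the ones in the standard definition of (ε, δ)-differential privacy. The statement below is the textbook definition, not something taken from the notebook; the symbols M (the randomized training mechanism), D and D' (datasets differing in a single record) and S (a set of possible outputs) are introduced here only for illustration.

```latex
% (epsilon, delta)-differential privacy: for all neighbouring datasets D, D'
% and every measurable set S of outputs of the mechanism M,
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon}\,\Pr[\,M(D') \in S\,] + \delta
% With the values quoted above: e^{3.4} \approx 30 and \delta = 10^{-5}.
```

Read this way, ε ≈ 3.4 bounds the ratio of output probabilities between neighbouring datasets by a factor of about 30, up to an additive slack of δ.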

Process Steps & Key Concepts

  • The data must be in CSV format and must be partitioned into train and test sets before it is fed to the models.
  • Missing values are not supported and must be replaced appropriately by the user beforehand.
  • If the data has both continuous and categorical attributes, it must be pre-processed
    (discretization for continuous values, encoding for categorical attributes).
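
The preparation steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the notebook's actual code (which would more likely rely on pandas and scikit-learn). The column names, the toy rows, the median imputation and the min-max scaling are all assumptions made for the example; the toy values are invented and not taken from either Kaggle dataset.

```python
import csv
import io
import random
import statistics

# Made-up toy rows (NOT from the Kaggle datasets); one cell is missing on purpose.
CSV_TEXT = """x1,x2,target
1.0,10.0,1
2.0,20.0,1
3.0,,1
4.0,40.0,1
5.0,50.0,0
6.0,60.0,0
7.0,70.0,0
8.0,80.0,1
9.0,90.0,0
10.0,100.0,0
"""


def load_rows(csv_text):
    """Parse CSV text into float rows; empty cells become None."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [[float(c) if c.strip() else None for c in row] for row in reader]
    return header, rows


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


def fit_column_stats(train_rows):
    """Per column: (median, min, max) of the observed training values."""
    stats = []
    for j in range(len(train_rows[0])):
        observed = [r[j] for r in train_rows if r[j] is not None]
        stats.append((statistics.median(observed), min(observed), max(observed)))
    return stats


def transform(rows, stats):
    """Impute missing cells with the training median, then min-max scale."""
    out = []
    for row in rows:
        new_row = []
        for value, (median, lo, hi) in zip(row, stats):
            value = median if value is None else value
            new_row.append((value - lo) / (hi - lo) if hi > lo else 0.0)
        out.append(new_row)
    return out


header, rows = load_rows(CSV_TEXT)
train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=0)
stats = fit_column_stats(train_rows)   # statistics come from the train split only
train_X = transform(train_rows, stats)
test_X = transform(test_rows, stats)
print(len(train_X), "train rows,", len(test_X), "test rows")
```

The statistics are fitted on the training split only, so that nothing about the held-out test rows leaks into the pre-processing.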

  • The GAN-based generative models are trained on the training dataset.
  • The generative model is used to create a synthetic version of the training dataset.
  • To compensate for irregularities, multiple GAN generator models are trained.
  • For the same reason, multiple synthetic datasets are generated, and
    the best-performing dataset (the one that yields the maximum AUC) is selected.
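
The selection step can be pictured as a simple arg-max over candidate datasets. The sketch below is an assumption about how such a selection might look, not code from the notebook: `select_best_dataset` and `evaluate_auc` are hypothetical names, and the AUC values in the lookup table are placeholders standing in for a real evaluation.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Dataset = TypeVar("Dataset")


def select_best_dataset(
    candidates: Sequence[Dataset],
    evaluate_auc: Callable[[Dataset], float],
) -> Tuple[int, float]:
    """Return (index, AUC) of the candidate with the highest AUC.

    `evaluate_auc` is a hypothetical callable that trains a classifier on one
    synthetic dataset and returns its AUC on the held-out real test data.
    Ties are resolved in favour of the earliest candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate dataset")
    aucs = [evaluate_auc(candidate) for candidate in candidates]
    best_index = max(range(len(aucs)), key=aucs.__getitem__)
    return best_index, aucs[best_index]


# Placeholder candidates and placeholder AUC values, for illustration only.
candidates = ["synthetic_run_0", "synthetic_run_1", "synthetic_run_2"]
placeholder_aucs = {
    "synthetic_run_0": 0.5,
    "synthetic_run_1": 0.75,
    "synthetic_run_2": 0.625,
}

best_index, best_auc = select_best_dataset(candidates, placeholder_aucs.__getitem__)
print("selected:", candidates[best_index], "AUC:", best_auc)
```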

  • Logistic regression classifiers are trained on the real data as well as on the synthetically generated dataset.
  • Both classifiers are evaluated on the held-out real test dataset (preserved for evaluation).
  • Relevant metrics (mainly AUC) and visualizations of the correlation matrices of the synthetic datasets are generated.
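
The evaluation described above (train one classifier on real data and one on synthetic data, then score both on the same held-out real test set) might look roughly like the sketch below. It is a dependency-free illustration under stated assumptions, not the notebook's implementation: the logistic regression is fitted by plain gradient descent, the data are random toy blobs, and the "synthetic" set is merely a noisier draw standing in for DP-WGAN output.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_scores(model, X):
    """Predicted probability of the positive class for each row."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both classes in y_true")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n, rng, spread):
    """Two Gaussian blobs (toy data); label 1 is centred at (4, 4)."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(4.0 * label, spread), rng.gauss(4.0 * label, spread)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_toy_data(200, rng, spread=1.0)   # "real" training data
X_test, y_test = make_toy_data(100, rng, spread=1.0)     # held-out real test data
X_synth, y_synth = make_toy_data(200, rng, spread=1.5)   # stand-in for GAN output

model_real = fit_logistic_regression(X_train, y_train)
model_synth = fit_logistic_regression(X_synth, y_synth)

auc_real = roc_auc(y_test, predict_scores(model_real, X_test))
auc_synth = roc_auc(y_test, predict_scores(model_synth, X_test))
print("AUC (trained on real):", auc_real)
print("AUC (trained on synthetic stand-in):", auc_synth)
```

Both classifiers are scored on the same real test rows, so the gap between the two AUC values gives a rough indication of how much utility is lost by training on synthetic data.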

Acknowledgements & Sources

Major parts of this summary notebook were extracted from the BOREALIS Private Data Generation GitHub repository by BorealisAI. Note that this Jupyter notebook covers only one (DP-WGAN) of the various possible datasets and generative models for differentially private synthetic data generation. The analysis approaches described above yielded the results reported here, as extracted from the original notebook. For more information regarding the differential-privacy parameters delta and epsilon, please refer to the info page by Microsoft.
