kiprotect / dwork

A Python library for anonymous data science and analytics that uses differential privacy (DP) and similar privacy-enhancing technologies to protect personal or sensitive data.

License: Other

Makefile 2.14% Python 97.86%
differential-privacy privacy privacy-enhancing-technologies privacy-tools python pandas

dwork's Introduction

Attention: This is still an early version intended only for testing; do not use it in production yet.

Dwork

Dwork is a Python toolkit for anonymous data science & analytics. It leverages modern privacy concepts like differential privacy (DP) to anonymize sensitive and personal data on the fly while (hopefully) keeping that data easy to work with. Dwork is free to use; it is released under the BSD-3 license.

Installing

You can install the latest Dwork version using pip:

pip install dwork

Alternatively, you can download this repository and install Dwork directly from the main directory of the repository:

pip install .

Quick Example

To learn more about how to use Dwork for anonymous data science, please have a look at the examples directory and our documentation (coming soon). As a start, here's a quick and simple example:

We first load a CSV file using Dwork's pandas interface:

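import pandas as pd

from dwork.dataset.pandas import PandasDataset
from dwork.dataschema import DataSchema
from dwork.ast.types import Integer

# Declare the type and value range of each column Dwork should know about
class AbsenteeismSchema(DataSchema):
    Weight = Integer(min=0, max=200)
    Height = Integer(min=0, max=200)

# The CSV file uses ";" as its column separator
filename = "absenteeism_at_work.csv"
df = pd.read_csv(filename, sep=";")
# Wrap the dataframe so that queries on it can be answered with DP
ds = PandasDataset(AbsenteeismSchema, df)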

Here we have also defined a schema for the dataset, which is necessary to tell Dwork about the data types and their ranges. Dwork uses this information, e.g., to calculate sensitivities and to add the proper amount of random noise to the results of our analyses.
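Why do the declared ranges matter? A DP mechanism can only bound how much a single record changes a result if every value really lies inside its declared range. The following minimal sketch illustrates this with a hypothetical `Bounds` class (not part of Dwork's API): values are clamped into `[low, high]`, and the sensitivity of a sum follows directly from the bounds. Whether Dwork clamps, rejects, or otherwise handles out-of-range values is an assumption here; this README does not say.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Bounds:
    """Hypothetical stand-in for a schema field such as Integer(min=0, max=200)."""

    low: int
    high: int

    def clamp(self, values: Iterable[int]) -> List[int]:
        # Force every value into [low, high] so the declared range really holds.
        return [min(max(v, self.low), self.high) for v in values]

    def sum_sensitivity(self) -> int:
        # Adding or removing one record changes a sum by at most the largest
        # absolute value the field can take.
        return max(abs(self.low), abs(self.high))


weight = Bounds(low=0, high=200)
print(weight.clamp([72, -5, 250]))  # [72, 0, 200]
print(weight.sum_sensitivity())     # 200
```

Without clamping (or an equivalent check), a single outlier such as a mistyped weight of 2500 would break the sensitivity bound and, with it, the privacy guarantee.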

The loaded dataset can then be used almost like a normal dataframe. For example, the expression

(ds["Weight"].sum()/ds.len()).dp(0.5)

returns the mean weight of our dataset, computed with a differentially private (DP) mechanism using the privacy parameter epsilon = 0.5. Here, Dwork automatically calculates the sensitivity and the amount of noise that must be added to the result of the calculation to achieve 0.5-DP. Neat, isn't it?
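To make the idea concrete, here is one standard way such a DP mean can be computed. It is a sketch under stated assumptions, not Dwork's actual implementation, and the names `laplace_noise` and `dp_mean` are hypothetical. Assuming each weight is clamped to the schema range [0, 200], adding or removing one record changes the sum by at most 200 and the count by at most 1. Splitting epsilon evenly between a noisy sum and a noisy count (both via the Laplace mechanism) gives an epsilon-DP mean by sequential composition; the final division is post-processing and does not weaken the guarantee.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential variables with mean `scale`
    # follows a Laplace(0, scale) distribution.
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_mean(
    values: Sequence[float],
    low: float,
    high: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Sketch of an epsilon-DP mean as a noisy sum divided by a noisy count."""
    if rng is None:
        rng = random.Random()
    clamped = [min(max(v, low), high) for v in values]
    # Spend half of the privacy budget on each of the two queries.
    eps_half = epsilon / 2
    sum_sensitivity = max(abs(low), abs(high))  # max change of the sum per record
    count_sensitivity = 1.0                     # max change of the count per record
    noisy_sum = sum(clamped) + laplace_noise(sum_sensitivity / eps_half, rng)
    noisy_count = len(clamped) + laplace_noise(count_sensitivity / eps_half, rng)
    # Guard against tiny or negative noisy counts (post-processing keeps DP).
    return noisy_sum / max(noisy_count, 1.0)


weights = [62, 75, 80, 91, 58, 70]
print(dp_mean(weights, low=0, high=200, epsilon=0.5, rng=random.Random(42)))
```

With epsilon = 0.5, the noise added to the sum has scale 200 / 0.25 = 800, so on a handful of records the result can be far from the true mean; the relative error shrinks as the dataset grows.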

Right now, Dwork supports basic operations like addition, multiplication and division on types such as integers and floats. It can also perform basic aggregation operations on arrays of these values, such as calculating sums or lengths. In addition, Dwork's internal semantics and type system make it easy to add new data types and expressions.
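This README does not describe how Dwork's type system works internally. As an illustration of why typed value ranges help, the toy sketch below propagates bounds through addition, multiplication and division using interval arithmetic, which is the kind of information a sensitivity calculation needs for intermediate results. The `Interval` class is hypothetical and is not Dwork's `dwork.ast.types` API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical value range [low, high] attached to an expression."""

    low: float
    high: float

    def __add__(self, other: Interval) -> Interval:
        return Interval(self.low + other.low, self.high + other.high)

    def __mul__(self, other: Interval) -> Interval:
        # The extremes of a product lie at one of the four corner combinations.
        corners = [
            self.low * other.low,
            self.low * other.high,
            self.high * other.low,
            self.high * other.high,
        ]
        return Interval(min(corners), max(corners))

    def __truediv__(self, other: Interval) -> Interval:
        if other.low <= 0 <= other.high:
            raise ZeroDivisionError("divisor range includes zero; result is unbounded")
        # For a divisor range that excludes zero, 1/x spans [1/high, 1/low].
        return self * Interval(1 / other.high, 1 / other.low)


weight = Interval(0, 200)
height = Interval(100, 200)  # hypothetical: height in cm, bounded away from zero
print(weight + weight)       # Interval(low=0, high=400)
print(weight / height)       # Interval(low=0.0, high=2.0)
```

Note that dividing by the schema's `Height = Integer(min=0, max=200)` as declared would raise here, because its range includes zero; a real system has to decide how to handle such cases.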

Information For Developers

If you want to work on Dwork itself, you can install the package in development mode, which will not copy files but instead link them to your virtual environment so that you can edit them and see changes immediately:

pip install -e .

To run the tests, you also need the test dependencies and a virtual environment, which you can set up with:

make setup

The following sections are only relevant for developers of Dwork; if you are a user, you can disregard them.

Running tests

Dwork comes with automated code formatting via black, static type analysis via mypy and testing via py.test / unittest. You can run all of the above with a single make command:

make

To only run tests, simply run

make test

You can also pass arguments to py.test via the testargs parameter:

make test testargs="-x -k TestDatapoints"

Upgrading packages

You can use the fabulous pur tool to upgrade packages in the requirements files:

# will update normal requirements
pur -v -r requirements.txt
# will update test requirements
pur -v -r requirements-test.txt

Building Wheels

For security reasons, we install all packages from local wheels whenever possible. To generate these wheels, use the following commands:

pip wheel --wheel-dir wheels -r requirements.txt
pip wheel --wheel-dir wheels -r requirements-test.txt

Making a New Release

To release a new version of Dwork, follow these steps:

  • Make sure all tests pass for the new release.

  • Update setup.py with the new version number. We follow the semantic versioning standard for our version numbers.

  • Add a changelog entry in the README.md.

  • Commit the updated setup.py and README.md files to the repository.

  • Create a new tag with the version number (which is required for CI integration):

    git tag -a v0.1.4 -m "v0.1.4"
    
  • Push the tag to the main repository together with the commit:

    git push origin master --tags
    
  • GitLab/Travis will pick up the version tag and create the release for us.

  • Alternatively, you can create the distribution packages using setup.py:

    python setup.py sdist bdist_wheel
    
  • You can also manually publish the packages to PyPI via Twine (not recommended):

    twine upload dist/*
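When preparing a release, the version in setup.py and the tag name have to agree. A small helper along the following lines can compute the next semantic version and the matching tag. It is a hypothetical sketch, not part of the repository, and it only handles plain MAJOR.MINOR.PATCH versions, not pre-release or build suffixes.

```python
import re
from typing import Tuple

# MAJOR.MINOR.PATCH, optionally prefixed with "v" as in the tag example above.
SEMVER = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, int, int]:
    match = SEMVER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def bump(version: str, part: str) -> str:
    """Return the next version; `part` is 'major', 'minor' or 'patch'."""
    major, minor, patch = parse_version(version)
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def tag_name(version: str) -> str:
    # The release tags use a leading "v", e.g. v0.1.4.
    return "v" + ".".join(str(n) for n in parse_version(version))


print(bump("0.1.4", "patch"))  # 0.1.5
print(tag_name("0.1.5"))       # v0.1.5
```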
    

dwork's People

Contributors

adewes
