yuxiangw / autodp

autodp: A flexible and easy-to-use package for differential privacy

License: Apache License 2.0

Python 24.71% Jupyter Notebook 75.14% Starlark 0.15%

autodp's Issues

issue with "privacy_calibrator.subsample_epsdelta_inverse(eps,delta,prob=gamma)"

Thanks for making this tool available for DP research, I appreciate the great work.

I was going through your tutorial on privacy calibrator (section 4) https://github.com/yuxiangw/autodp/blob/master/tutorials/tutorial_privacy_calibrator.ipynb

I am not sure whether the function "privacy_calibrator.subsample_epsdelta_inverse(eps, delta, gamma)" gives the right answer. For example:

from autodp import privacy_calibrator

eps = 1
delta = 1e-6
gamma = 0.01

First, apply subsampling lemma to calibrate the basic privacy needed

eps0,delta0 = privacy_calibrator.subsample_epsdelta_inverse(eps,delta,prob=gamma)

Then we can get the amount of noise needed from the base mechanism

print((eps0,delta0))
params = privacy_calibrator.gaussian_mech(eps0,delta0)
print(f'Gaussian: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])

It gives the answer

Gaussian: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level sigma= 0.9366237019634324

However, I was expecting sigma = 1.258483615711703, matching the result we get from

params = privacy_calibrator.gaussian_mech(eps,delta,prob=gamma)
print(f'Gaussian: eps,delta,gamma = ({eps},{delta},{gamma}) ==> Noise level sigma=',params['sigma'])

Gaussian: eps,delta,gamma = (1,1e-06,0.01) ==> Noise level sigma= 1.258483615711703
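
For reference, the classical privacy-amplification-by-subsampling lemma for (eps, delta)-DP and its inverse can be sketched in a few lines of plain Python. This is a standalone illustration, not autodp's implementation: the function names are hypothetical, and it assumes the standard bound eps = log(1 + gamma * (exp(eps0) - 1)), delta = gamma * delta0. Two calibration routes that rely on different bounds need not produce the same sigma.

```python
import math


def subsample_epsdelta(eps0, delta0, gamma):
    """Amplified (eps, delta) after subsampling with rate gamma (assumed standard lemma)."""
    eps = math.log1p(gamma * math.expm1(eps0))
    return eps, gamma * delta0


def subsample_epsdelta_inverse_sketch(eps, delta, gamma):
    """Base (eps0, delta0) that the lemma above maps to the target (eps, delta)."""
    eps0 = math.log1p(math.expm1(eps) / gamma)
    return eps0, delta / gamma


# Values from the report above: eps = 1, delta = 1e-6, gamma = 0.01.
eps0, delta0 = subsample_epsdelta_inverse_sketch(1.0, 1e-6, 0.01)
print(eps0, delta0)
```

Feeding (eps0, delta0) back through subsample_epsdelta recovers the target (eps, delta), which is a quick way to check any inverse of this kind.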

`bounds` cannot be used together with `method=Brent` in latest version of scipy (>= v1.10.1)

SciPy (>= v1.10.1) will complain about this line

autodp/autodp/converter.py, line 95 in 5fad5e1:

results = minimize_scalar(fun, method='Brent', bracket=(1, 2), bounds=[1, 100000])

because it no longer supports method='Brent' when bounds are given (SciPy source):

if bounds is not None and meth in {'brent', 'golden'}:
    message = f"Use of `bounds` is incompatible with 'method={method}'."
    raise ValueError(message)

Switching to method='Bounded' bypasses this issue.
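
A minimal sketch of the suggested change, using a toy objective in place of the real `fun` (the objective here is hypothetical). It assumes that dropping `bracket` is acceptable, since `bracket` only applies to the Brent and Golden methods. The bounded method searches only inside the interval, so results may differ slightly from an unconstrained Brent search.

```python
from scipy.optimize import minimize_scalar


def objective(x):
    # Toy convex objective with its minimum at x = 3 (stand-in for `fun`).
    return (x - 3.0) ** 2


# `bounds` is mandatory for method='bounded' and is accepted on SciPy >= 1.10.1.
result = minimize_scalar(objective, method='bounded', bounds=(1, 100000))
print(result.x)
```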

About the bisection method used for converting RDP to approximate DP

Thanks for the great work!
Not sure if I should submit the issue here, but let me just do it anyway.

In the paper, it is suggested that one should use bisection to solve Equation 2 efficiently, inferring that the sum of a monotonically increasing function and a monotonically decreasing function is quasi-convex/unimodal (Corollary 38).
This however does not seem to be correct as the sum of these functions is not always quasi-convex/unimodal. See this example.

Therefore, it seems to me that bisection cannot be used to convert RDP to approximate DP to arbitrary precision, since the objective is not guaranteed to be quasi-convex/unimodal?
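
The general claim can be checked with a small generic counterexample. The sketch below is not taken from the paper or from autodp, and it says nothing about the specific functions that appear in the RDP conversion (which have additional structure); it only shows that a strictly increasing function plus a strictly decreasing function can have several local minima.

```python
import math


def increasing(x):
    # Derivative is 1 + 0.9*cos(x) >= 0.1, so this is strictly increasing.
    return x + 0.9 * math.sin(x)


def decreasing(x):
    # Derivative is -1 + 0.9*cos(x) <= -0.1, so this is strictly decreasing.
    return -x + 0.9 * math.sin(x)


def total(x):
    # The sum equals 1.8*sin(x), which oscillates.
    return increasing(x) + decreasing(x)


def count_local_minima(fn, lo, hi, n=2001):
    """Count strict interior local minima of fn on a uniform grid over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [fn(x) for x in xs]
    return sum(1 for i in range(1, n - 1) if ys[i] < ys[i - 1] and ys[i] < ys[i + 1])


print(count_local_minima(total, 0.0, 4.0 * math.pi))
```

A bisection or ternary search on such a sum can converge to either local minimum depending on the starting interval.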

huge fan of this work

Not listing a problem - just saying that I think this library is extremely cool and I'm very glad you've taken the time to make it.

Issue with SSP_scale and AdaSSP_scale inheritance

I was running the tutorial_AdaSSP_vs_noisyGD.ipynb Tutorial Notebook on Google Colab. I encountered the following issue while running the 4th Cell Block of the notebook:

AttributeError: 'SSP_scale' object has no attribute 'set_all_representation'.

The expanded error is as follows:

Kindly have a look at your earliest convenience, @yuxiangw. Thanks in advance!

Can PATE be used in knowledge distillation to calculate privacy budgets?

Can PATE be used in knowledge distillation to calculate privacy budgets?
If the temperature is too high, can PATE still be used?

Slow privacy calibration

Noise calibration takes a very long time and does not return a result even after 22 minutes (at least when prob < 1 and eps is small). Is there any fix for this?

%time ans = privacy_calibrator.gaussian_mech(0.1, 1e-9, k=128, prob=0.1)

/usr/local/lib/python3.6/dist-packages/autodp/utils.py:21: RuntimeWarning: divide by zero encountered in log
  mag = y + np.log(1 - np.exp(x - y))
/usr/local/lib/python3.6/dist-packages/autodp/utils.py:24: RuntimeWarning: divide by zero encountered in log
  mag = x + np.log(1 - np.exp(y - x))
CPU times: user 22min 9s, sys: 2.31 s, total: 22min 11s
Wall time: 22min 12s
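
The two RuntimeWarnings come from evaluating log(1 - exp(x - y)) when x equals y, which is log(0). Below is a minimal sketch of a log-difference-of-exponentials helper that returns -inf in that case without a warning. The function name is hypothetical and this is not autodp's code. Whether the warnings are related to the long running time is not established by the log above.

```python
import math


def log_diff_exp(x, y):
    """Return log(|exp(x) - exp(y)|) computed in log space; -inf when x == y."""
    if x == y:
        return -math.inf
    hi, lo = (x, y) if x > y else (y, x)
    d = lo - hi  # strictly negative
    if d > -math.log(2.0):
        # exp(d) is close to 1: expm1 avoids catastrophic cancellation.
        return hi + math.log(-math.expm1(d))
    # exp(d) is small: log1p keeps full precision.
    return hi + math.log1p(-math.exp(d))


print(log_diff_exp(math.log(3.0), 0.0))  # log(3 - 1)
```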

difference between eps using this method and abadi

The implementation of Abadi et al. computes a smaller eps than this method. I would appreciate your opinion on this. Is their method tighter?

https://github.com/tensorflow/privacy/tree/master/tutorials

Amplification by sampling without replacement throws the following error

Hi everyone,

When applying amplification by sampling without replacement to the Gaussian mechanism, it throws AssertionError: mechanism's add-remove notion of DP is incompatible with Privacy Amplification by subsampling without replacements. Here is the code snippet to reproduce the error. Is there anything that I am doing wrong?

from autodp import mechanism_zoo, transformer_zoo

subsample = transformer_zoo.AmplificationBySampling(PoissonSampling=False)
mech = mechanism_zoo.GaussianMechanism(sigma=0.1)
prob = 0.1

SubsampledGaussian_mech = subsample(mech,prob,improved_bound_flag=True)

An issue when I installed "autodp": Preparing metadata (setup.py) ... error

The following error occurred when I ran "pip install autodp", and I am not sure how to solve it.

Collecting autodp
Using cached autodp-0.2.3.1.tar.gz (56 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-c9_gcmpt\autodp_184d6ab919d64a7f98792f3b252bbe16\setup.py", line 9, in <module>
long_description = f.read()
UnicodeDecodeError: 'gbk' codec can't decode byte 0x9a in position 3594: illegal multibyte sequence
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

the noise is much greater than the probability. How should I use it correctly?

I want to add noise to the probability distribution over words, but the PATE framework provided seems to be designed for counting tasks. If I use it directly, the noise is much greater than the probabilities themselves. How should I use it correctly? If I reduce sigma to 0.5, the calculated eps becomes very large.

Pure f-DP Gaussian mechanism doesn't work under composition of multiple rounds

Calling get_fDP(delta) on a Gaussian mechanism with only f-DP enabled works fine, but after composing that mechanism over several rounds, get_approxDP(delta) on the composition always returns inf.

f-DP does not seem to work under Composition or AmplificationBySampling.

from autodp.mechanism_zoo import GaussianMechanism
from autodp.transformer_zoo import Composition


def compute_amplified_fl_privacy(num_rounds=60, noise_multiplier=20, num_users=500, users_per_round=100):
    gm1 = GaussianMechanism(sigma=noise_multiplier, RDP_off=True, approxDP_off=True, fdp_off=False)

    compose = Composition()
    num_rounds = [num_rounds]
    q = users_per_round / num_users
    delta = num_users ** (-1)

    composed_fdp = compose([gm1], num_rounds)

    composed_fdp_eps = composed_fdp.get_fDP(delta)
    composed_fdp_approxdp = composed_fdp.get_approxDP(delta)

    mechanism_fdp_eps = gm1.get_fDP(delta)
    mechanism_fdp_approxdp = gm1.get_approxDP(delta)
    print('---------------------------------------------------')
    print('composed fdp eps = ', composed_fdp_eps, ', at delta = ', delta)
    print('composed fdp eps_approxdp = ', composed_fdp_approxdp, ', at delta = ', delta)

    print('mechanism fdp eps = ', mechanism_fdp_eps, ', at delta = ', delta)
    print('mechanism fdp approxdp = ', mechanism_fdp_approxdp, ', at delta = ', delta)


def main():
    compute_amplified_fl_privacy(num_rounds=60, noise_multiplier=20, num_users=500, users_per_round=100)


if __name__ == '__main__':
    main()
    print('DONE')
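
As an independent sanity check for the unsubsampled case, composition of Gaussian mechanisms has a closed form: k rounds with noise multiplier sigma form a mu-GDP mechanism with mu = sqrt(k) / sigma, and delta(eps) = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2). The sketch below solves this for eps by bisection. It does not use autodp, the function names are hypothetical, and it ignores subsampling entirely.

```python
import math


def norm_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def gdp_delta(eps, mu):
    """delta(eps) of a mu-GDP mechanism."""
    return norm_cdf(-eps / mu + mu / 2.0) - math.exp(eps) * norm_cdf(-eps / mu - mu / 2.0)


def composed_gaussian_eps(sigma, rounds, delta, hi=100.0):
    """Smallest eps (up to bisection precision) with gdp_delta(eps, mu) <= delta."""
    mu = math.sqrt(rounds) / sigma
    lo = 0.0
    if gdp_delta(lo, mu) <= delta:
        return 0.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi


# Parameters from the snippet above: 60 rounds, sigma = 20, delta = 1/500.
print(composed_gaussian_eps(sigma=20.0, rounds=60, delta=1.0 / 500))
```

A finite value here suggests that an inf returned for the same composition comes from the conversion step rather than from the mechanism itself, under the assumptions stated above.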

Can't install in GBK locale

pip install autodp fails as shown below. The setup script should specify an encoding in open(...).

C:\Users\xxx>pip install autodp
Looking in indexes: https://mirror.baidu.com/pypi/simple
Collecting autodp
  Using cached https://mirror.baidu.com/pypi/packages/78/7c/63aa6d37b9d9f0f68d1231e1b3247c3ac83c634f451f8bcbd9a5c7a55db0/autodp-0.2.tar.gz (39 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\xxx\AppData\Local\Temp\pip-install-l8spogwd\autodp_e20c1a3119ab4b0c8d685149702e4657\setup.py", line 6, in <module>
          long_description = f.read()
      UnicodeDecodeError: 'gbk' codec can't decode byte 0x9a in position 3594: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
WARNING: You are using pip version 22.0.4; however, version 22.1.2 is available.
You should consider upgrading via the 'd:\pysyft\Scripts\python.exe -m pip install --upgrade pip' command.
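
A minimal sketch of the suggested fix: pass an explicit encoding so that reading the long description does not depend on the system locale. The helper name and the README.md default path are assumptions made for illustration; the actual file read by setup.py is not shown in the log.

```python
def read_long_description(path="README.md"):
    # An explicit encoding avoids falling back to the locale default (e.g. GBK on Windows).
    with open(path, encoding="utf-8") as f:
        return f.read()
```

In setup.py, the result would then be passed as the long_description argument.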

Composing different mechanisms with different sensitivities

@yuxiangw Great talk at MIT! I have a question about composing different mechanisms with different numbers of rounds and different sensitivities. Assume that we are composing in the following way:

Gaussian mechanism with sensitivity L1 and noise level sigma1 for T1 rounds
Subsampled Gaussian mechanism with sensitivity L2 and noise level sigma2 with sampling rate Gamma for T2 rounds

Then, to get epsilon for delta = 1e-6, is this the right way to pass the parameter configuration?

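from autodp.autodp_core import Mechanism
from autodp.mechanism_zoo import GaussianMechanism
from autodp.transformer_zoo import AmplificationBySampling, Composition


class TestMech(Mechanism):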
    def __init__(self, params, name="TestMech"):
        Mechanism.__init__(self)
        subsample = AmplificationBySampling(PoissonSampling=False)
        mech1 = GaussianMechanism(sigma=params["sigma1"] )
        mech2 = GaussianMechanism(sigma=params["sigma2"] )
        mech2.neighboring = "replace_one"
        submech2 = subsample(mech2, params["prob"], improved_bound_flag=True)
        compose = Composition()
        mech = compose([mech1, submech2], [params["T1"] , params["T2"]])
        rdp_total = mech.RenyiDP
        self.propagate_updates(rdp_total, type_of_update="RDP")

params = {}
params["sigma1"] = sigma1/(L1)  # This is correct right ?
params["sigma2"] = sigma2/(L2)  # This is correct right ?
params["T1"] = T1
params["T2"] = T2
params["prob"] = Gamma  # sampling rate of the subsampled mechanism
mech = TestMech(params)
mech.get_approxDP(delta=1e-6)

My main question is about the scaling of the sigma parameters, params["sigma1"] = sigma1/(L1) and params["sigma2"] = sigma2/(L2). As far as I can understand, this scaling is necessary. Is that right? Thanks!
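
The scaling question can be checked against the closed-form RDP of the Gaussian mechanism, alpha * L^2 / (2 * sigma^2), which depends on sigma and L only through the ratio sigma / L. The sketch below is standalone and uses hypothetical function names. It assumes that the mechanism's sigma argument is interpreted as a noise multiplier for a unit-sensitivity query, which the snippet above does not confirm.

```python
import math


def gaussian_rdp(alpha, sigma, sensitivity=1.0):
    """RDP of order alpha for Gaussian noise with std sigma and the given L2 sensitivity."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def noise_multiplier(sigma, sensitivity):
    """Noise std expressed in units of the sensitivity."""
    return sigma / sensitivity


# Same guarantee: (sigma=6, L=3) versus (sigma=2, L=1).
print(gaussian_rdp(8, sigma=6.0, sensitivity=3.0))
print(gaussian_rdp(8, sigma=noise_multiplier(6.0, 3.0)))
```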

documentation question

Hi,

thanks a lot for this work. It is very helpful.

Just one quick question: what is the role of coeff in compose_subsampled_mechanism?

thanks a lot

Does autodp support arbitrary group size?

Hi! I am wondering how we should use autodp when the adjacent datasets differ by more than one data point. I noticed there is a parameter called group_size when initializing the Mechanism, but I cannot find any other usage of this parameter. Is it left on purpose for future use, or am I missing something here?

For now, I am manually increasing my noise scale sqrt(n) times if the adjacent datasets differ by n points, but I would really appreciate any advice on how to achieve this goal in a smarter way. Thanks!
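
For the Gaussian mechanism specifically, the effect of group size can be worked out by hand. This is a sketch under a worst-case assumption, not autodp's group_size logic, and the function names are hypothetical. If each record shifts the query by at most L in L2 norm, then n records can shift it by up to n * L (triangle inequality), so the RDP grows with n^2 and matching the single-record guarantee takes n times the noise rather than sqrt(n) times.

```python
import math


def gaussian_rdp_group(alpha, sigma, group_size, sensitivity=1.0):
    """RDP of order alpha when neighbors differ in group_size records (worst case)."""
    worst_case_shift = group_size * sensitivity
    return alpha * worst_case_shift ** 2 / (2.0 * sigma ** 2)


def sigma_for_group(sigma_single, group_size):
    """Noise std that keeps the single-record RDP curve for a group of the given size."""
    return sigma_single * group_size


print(gaussian_rdp_group(4, sigma_for_group(2.0, 5), 5))  # matches the single-record value
print(gaussian_rdp_group(4, 2.0 * math.sqrt(5), 5))       # sqrt(n) scaling leaves a larger RDP
```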

AFA of composition of subsampled Laplace Mechanism breaks down

Following the tutorial here, I tried to compute optimal accounting of composition of subsampled Gaussian and Laplace Mechanisms:

from autodp.mechanism_zoo import GaussianMechanism, LaplaceMechanism
from autodp.transformer_zoo import ComposeAFA
from autodp.transformer_zoo import AmplificationBySampling_pld

sigma = 1.0
b = 1.0
delta = 1e-5
prob=.1

gm1 = GaussianMechanism(sigma, phi_off=False, name='phi_GM1')
lm1 = LaplaceMechanism(b, phi_off=False, name='phi_LM1')


transformer_remove_only = AmplificationBySampling_pld(PoissonSampling=True, neighboring='remove_only')
transformer_add_only = AmplificationBySampling_pld(PoissonSampling=True, neighboring='add_only')
sample_gau_remove_only =transformer_remove_only(gm1, prob)
sample_lap_remove_only =transformer_remove_only(lm1, prob)
compose_gm = ComposeAFA()
compose_lm = ComposeAFA()
composed_gm_afa = compose_gm([sample_gau_remove_only], [10])
composed_lm_afa = compose_lm([sample_lap_remove_only], [10])

eps_gm_afa = composed_gm_afa.get_approxDP(delta)
eps_lm_afa = composed_lm_afa.get_approxDP(delta)

The Gaussian proceeds normally. The Laplace breaks down with the following error:

  File "AUTODPHOME/gmvslm.py", line 25, in <module>
    eps_lm_afa = composed_lm_afa.get_approxDP(delta)
  File "AUTODPHOME/autodp/autodp_core.py", line 113, in get_approxDP
    return self.approxDP(delta)
  File "AUTODPHOME/autodp/converter.py", line 1118, in min_f1_f2
    return np.minimum(f1(x), f2(x))
  File "AUTODPHOME/autodp/converter.py", line 824, in approxdp
    t = exp_eps(1 - delta)
  File "AUTODPHOME/autodp/converter.py", line 1080, in inv_f
    results = minimize_scalar(normal_equation, bounds=bounds, bracket=[1,2], tol=tol)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_minimize.py", line 879, in minimize_scalar
    return _minimize_scalar_brent(fun, bracket, args, **options)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2511, in _minimize_scalar_brent
    brent.optimize()
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2281, in optimize
    xa, xb, xc, fa, fb, fc, funcalls = self.get_bracket_info()
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2257, in get_bracket_info
    xa, xb, xc, fa, fb, fc, funcalls = bracket(func, xa=brack[0],
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/optimize/_optimize.py", line 2765, in bracket
    fa = func(*(xa,) + args)
  File "AUTODPHOME/autodp/converter.py", line 1077, in normal_equation
    return abs(fun(x))
  File "AUTODPHOME/autodp/converter.py", line 1073, in fun
    return f(x) - y
  File "AUTODPHOME/autodp/converter.py", line 818, in trade_off
    result = cdf_p(log_e) + x*cdf_q(-log_e)
  File "AUTODPHOME/autodp/autodp_core.py", line 324, in <lambda>
    cdf_p2q = lambda x: converter.phi_to_cdf(log_phi_p2q, x, n_quad = n_quad)
  File "AUTODPHOME/autodp/converter.py", line 924, in phi_to_cdf
    res = integrate.fixed_quad(inte_f, -1.0, 1.0, n =n_quad)
  File "AUTODPHOME/venv/lib/python3.8/site-packages/scipy/integrate/_quadrature.py", line 151, in fixed_quad
    return (b-a)/2.0 * np.sum(w*func(y, *args), axis=-1), None
  File "AUTODPHOME/autodp/converter.py", line 923, in <lambda>
    inte_f = lambda t: qua(t) * (1 + t ** 2) / ((1 - t ** 2) ** 2)
  File "AUTODPHOME/autodp/converter.py", line 919, in qua
    phi_result = [log_phi(x) for x in new_t]
  File "AUTODPHOME/autodp/converter.py", line 919, in <listcomp>
    phi_result = [log_phi(x) for x in new_t]
  File "AUTODPHOME/autodp/transformer_zoo.py", line 111, in new_log_phi_p2q
    return sum([c * mech.log_phi_p2q(x) for (mech, c) in zip(mechanism_list, coeff_list)])
  File "AUTODPHOME/autodp/transformer_zoo.py", line 111, in <listcomp>
    return sum([c * mech.log_phi_p2q(x) for (mech, c) in zip(mechanism_list, coeff_list)])
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'

Update Installation Instructions

The latest version on PyPI is quite outdated, so installing with pip leaves users with issues that have since been fixed in the code.

We should either update the PyPI version (i.e. do a v0.3 release) or, if development is ongoing, update the install instructions to use `python setup.py install`.

Looseness in analytic Gaussian mechanism?

Here's a minimal example to demonstrate the issue:

from autodp import privacy_calibrator, dp_bank
import numpy as np

sigma = privacy_calibrator.ana_gaussian_mech(1.0, 1e-6)['sigma']
delta = np.exp(dp_bank.get_logdelta_ana_gaussian(1.0, sigma))
print(delta)

1.901276833828726e-05

I expected delta = 1e-6, but according to dp_bank it is nearly 20x larger.
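
One way to cross-check the two functions is against the closed-form delta of the Gaussian mechanism with unit sensitivity, delta(eps, sigma) = Phi(1/(2*sigma) - eps*sigma) - exp(eps) * Phi(-1/(2*sigma) - eps*sigma). The sketch below calibrates sigma by bisection and evaluates delta at the result. It is independent of autodp, the function names are hypothetical, and whether ana_gaussian_mech and get_logdelta_ana_gaussian use this same parameterization is an assumption that would need checking.

```python
import math


def norm_cdf(z):
    # Standard normal CDF via the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def gaussian_delta(eps, sigma, sensitivity=1.0):
    """Exact delta of the Gaussian mechanism at a given eps."""
    a = sensitivity / (2.0 * sigma)
    b = eps * sigma / sensitivity
    return norm_cdf(a - b) - math.exp(eps) * norm_cdf(-a - b)


def calibrate_sigma(eps, delta, lo=1e-3, hi=1e3, iters=200):
    """Smallest sigma (up to bisection precision) with gaussian_delta(eps, sigma) <= delta."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a log scale
        if gaussian_delta(eps, mid) > delta:
            lo = mid
        else:
            hi = mid
    return hi


sigma = calibrate_sigma(1.0, 1e-6)
print(sigma, gaussian_delta(1.0, sigma))
```

Comparing this sigma and delta with the values returned by the library helps narrow down which of the two functions deviates.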