dingfanchen / gs-wgan
Official implementation of "GS-WGAN: A Gradient-Sanitized Approach for Learning Differentially Private Generators" (NeurIPS 2020)
License: MIT License
Hi,
Thanks a lot for open-sourcing the code.
I am wondering if you have any advice on how to select proper hyper-parameters given a low eps, especially when eps <= 1.
Thanks :)
Hi @DingfanChen:
Hope you are doing well. I wonder why the privacy budget is calculated this way rather than as in DP-SGD.
The variables that differ from DP-SGD are prob = 1. / config['num_discriminators'] and coeff = n_steps * batch_size. ^_^
from autodp import rdp_acct, rdp_bank  # RDP accountant (assuming the autodp package)

delta = 1e-5
batch_size = config['batchsize']
prob = 1. / config['num_discriminators']  # subsampling rate
n_steps = config['iterations']  # training iterations
sigma = config['noise_multiplier']  # noise scale
func = lambda x: rdp_bank.RDP_gaussian({'sigma': sigma}, x)
acct = rdp_acct.anaRDPacct()
acct.compose_subsampled_mechanism(func, prob, coeff=n_steps * batch_size)
epsilon = acct.get_eps(delta)
print("Privacy cost is: epsilon={}, delta={}".format(epsilon, delta))
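For intuition, here is a minimal, self-contained sketch of the RDP-accounting idea the snippet above relies on, restricted to the plain (non-subsampled) Gaussian mechanism. The function names, the candidate order range, and the example numbers are my own choices, not from the repo:

```python
import math

def gaussian_rdp(alpha, sigma):
    """RDP of the Gaussian mechanism at order alpha: alpha / (2 * sigma^2)."""
    return alpha / (2.0 * sigma ** 2)

def eps_from_rdp(sigma, n_steps, delta, alphas=range(2, 256)):
    """Compose n_steps Gaussian mechanisms in RDP, then convert to (eps, delta).

    Conversion: eps(alpha) = n_steps * rdp(alpha) + log(1/delta) / (alpha - 1),
    minimized over the candidate orders.
    """
    return min(n_steps * gaussian_rdp(a, sigma) + math.log(1.0 / delta) / (a - 1)
               for a in alphas)

eps = eps_from_rdp(sigma=1.07, n_steps=1000, delta=1e-5)
```

The accountant in the repo additionally applies privacy amplification by subsampling with prob = 1. / config['num_discriminators'] and composes n_steps * batch_size mechanisms, whereas standard DP-SGD accounting would use the minibatch sampling rate (batch_size / dataset_size) and compose n_steps mechanisms.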
Hi, I am currently doing my bachelor's thesis and I want to reproduce the results from your paper in the federated setting. Do you happen to have code or steps available to reproduce the findings for the federated setting with EMNIST?
Hi. I've tried pre-training 1000 discriminators for 2K iterations each (as per the subsampling rate mentioned in the appendix). The metrics (G_cost, D_cost, Wasserstein) are in roughly the [-0.5, 2.4] interval; however, the corresponding stats for the generator that loads and uses these are much higher (~13 or ~11 for each). The metrics don't improve at all (when training for 20K iterations in main.py) and the final images are all just blank.
I am using the default noise-multiplier, and default architectures. Am I missing something here?
Hi Dingfan, Thank you for your work!
I'm a postgraduate student at Beihang University. Recently I've read your paper and tried to work out how the mechanism affects the gradient in the backward pass.
In your code (source/main.py, line 294), you set dynamic_hook_function = dp_conv_hook, i.e., you swap dummy_hook for dp_conv_hook so that the DP mechanism (gradient clipping and noise addition) takes effect.
However, I noticed that in lines 301-302 you set p.requires_grad = False, which fixes the netD parameters, so it seems dp_conv_hook will not modify the gradients in the backward pass. How can the hook take effect? Or what should I do to make dp_conv_hook work?
Thank you!
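Whether the hook fires is easy to check in isolation. Below is a toy sketch (my own, with made-up module and names, not the repo's code) showing that a backward hook on a module still runs and can rewrite grad_input even when the module's own parameters are frozen with requires_grad = False, because the gradient still has to flow through the module back to the generator's output:

```python
import torch
import torch.nn as nn

# Toy "discriminator": parameters frozen, but gradients still flow *through* it
netD = nn.Linear(4, 1)
for p in netD.parameters():
    p.requires_grad = False  # freeze D, as in main.py

record = {}

def toy_hook(module, grad_input, grad_output):
    # grad_input[0] is the grad w.r.t. the module's input (the fake image);
    # returning a modified tuple rewrites the gradient flowing upstream.
    record['fired'] = True
    return (grad_input[0] * 0.5,)  # halve the gradient, standing in for clip+noise

netD.register_full_backward_hook(toy_hook)

# "Generator output": a tensor that does require grad
fake = torch.ones(2, 4, requires_grad=True)
netD(fake).sum().backward()
```

The gradient w.r.t. netD's parameters is indeed never computed here, but the gradient w.r.t. the module's input (the fake image) is, and that is the quantity the hook sanitizes before it reaches the generator. (The repo uses the older register_backward_hook API; this sketch uses the newer full-hook variant, which has cleaner grad_input semantics.)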
Dear author, I couldn't find the evaluation code for Inception Score (IS) and FID. Would you mind sharing it in your project?
Dear Author,
Does the GPU have to be an RTX 8000? Has the GPU-environment problem been solved?
Hi, I have tried to train the generator using 4 GPUs. I used your pretrained D and the default configuration, changing only the number of GPUs. The final generated images are terrible. So, have you tested the case of using multiple GPUs? Should I change some configurations, e.g., noise scale and iterations?
Hi @DingfanChen:
Hope you are doing well. I wonder whether this can be applied to custom images? Is changing the dataloader enough in that case?
Dear author,
I have a question about the gradient clipping operation in your code.
From my point of view, gradient clipping should scale down gradients whose norm exceeds CLIP_BOUND, while gradients already smaller than CLIP_BOUND should be left unchanged.
But in your code,
Lines 72 to 74 in 5f33f21
it seems all gradients are rescaled (even those whose norm is smaller than CLIP_BOUND), so it seems we need to add the following line:
clip_coef = clip_coef.clamp(max=1.0)
Is this the right implementation? Or am I missing something?
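For what it's worth, the dp_conv_hook code quoted later in this thread caps the coefficient with torch.min(clip_coef, torch.ones_like(clip_coef)), which is equivalent to the clamp suggested above. A quick check with made-up coefficient values:

```python
import torch

# clip_coef = clip_bound / (norm + eps); values > 1 correspond to gradients
# already inside the bound, which must not be scaled *up*
clip_coef = torch.tensor([0.3, 1.7, 1.0, 5.0])

capped_min = torch.min(clip_coef, torch.ones_like(clip_coef))
capped_clamp = clip_coef.clamp(max=1.0)

# both leave small-norm gradients (coef > 1) at scale 1.0
assert torch.equal(capped_min, capped_clamp)
```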
Hi Dingfan! Sorry to bother you again.
When I'm evaluating the privacy cost of the GS-WGAN code from GS-WGAN/evaluation/privacy_analysis.py:
def main(config):
    delta = 1e-5
    batch_size = config['batchsize']
    prob = 1. / config['num_discriminators']  # subsampling rate
    n_steps = config['iterations']  # training iterations
    sigma = config['noise_multiplier']  # noise scale
    func = lambda x: rdp_bank.RDP_gaussian({'sigma': sigma}, x)
    acct = rdp_acct.anaRDPacct()
    acct.compose_subsampled_mechanism(func, prob, coeff=n_steps * batch_size)
    epsilon = acct.get_eps(delta)
    print("Privacy cost is: epsilon={}, delta={}".format(epsilon, delta))
It seems that the final privacy cost is computed only from delta, batch_size, num_discriminators, iterations, and noise_multiplier, but in the dp_conv_hook function in main.py:
def dp_conv_hook(module, grad_input, grad_output):
    '''
    gradient modification + noise hook
    :param module:
    :param grad_input:
    :param grad_output:
    :return:
    '''
    global noise_multiplier
    ### get grad wrt. input (image)
    grad_wrt_image = grad_input[0]
    grad_input_shape = grad_wrt_image.size()
    batchsize = grad_input_shape[0]
    clip_bound_ = CLIP_BOUND / batchsize
    grad_wrt_image = grad_wrt_image.view(batchsize, -1)
    grad_input_norm = torch.norm(grad_wrt_image, p=2, dim=1)
    ### clip
    clip_coef = clip_bound_ / (grad_input_norm + 1e-10)
    clip_coef = torch.min(clip_coef, torch.ones_like(clip_coef))
    clip_coef = clip_coef.unsqueeze(-1)
    grad_wrt_image = clip_coef * grad_wrt_image
    ### add noise
    noise = clip_bound_ * noise_multiplier * SENSITIVITY * torch.randn_like(grad_wrt_image)
    grad_wrt_image = grad_wrt_image + noise
    grad_input_new = [grad_wrt_image.view(grad_input_shape)]
    for i in range(len(grad_input) - 1):
        grad_input_new.append(grad_input[i + 1])
    return tuple(grad_input_new)
It seems that there are also hyperparameters like CLIP_BOUND and SENSITIVITY associated with the backward pass, so I'm confused about two questions:
1. Why are CLIP_BOUND and SENSITIVITY not included in the privacy_analysis function?
2. Why is prob set in privacy_analysis.py's main function as 1/config['num_discriminators']? Does this mean that privacy problems may occur when num_discriminators is small?
Thank you very much for reading the above, and I'm looking forward to your reply if you have time to answer.
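On question 1, one hedged reading (my own, not the author's answer): the per-sample gradient is clipped to clip_bound_, and the noise std in dp_conv_hook is clip_bound_ * noise_multiplier * SENSITIVITY, so the noise-to-sensitivity ratio that the RDP accountant actually needs depends only on noise_multiplier and CLIP_BOUND cancels out. A quick numeric check with illustrative values (not the repo's defaults):

```python
# Illustrative numbers; CLIP_BOUND = 1 and these batch sizes are assumptions.
CLIP_BOUND = 1.0
SENSITIVITY = 2.0
noise_multiplier = 1.07

effective_sigmas = []
for batchsize in (32, 64, 128):
    clip_bound_ = CLIP_BOUND / batchsize                      # per-sample clipping bound
    sensitivity_l2 = SENSITIVITY * clip_bound_                # L2 sensitivity of the clipped grad
    noise_std = clip_bound_ * noise_multiplier * SENSITIVITY  # noise scale as in dp_conv_hook
    # the accountant only sees this ratio; clip_bound_ cancels
    effective_sigmas.append(noise_std / sensitivity_l2)
```

Under this reading, the same effective sigma comes out for every choice of CLIP_BOUND and batch size, which would explain why only noise_multiplier enters privacy_analysis.py.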
Hi Dingfan,
I am not very clear on line 67 in main.py. Why is CLIP_BOUND divided by the batch size? And what is the full expression of 'GP'?
Thank you!
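One plausible reading of the division (an assumption on my part, not the author's answer): the hook clips the gradient of each of the batchsize samples to CLIP_BOUND / batchsize, so by the triangle inequality the gradient of the summed batch loss stays within CLIP_BOUND overall. A quick check with made-up values:

```python
import torch

torch.manual_seed(0)
CLIP_BOUND, batchsize = 1.0, 8  # illustrative values, not the repo defaults
clip_bound_ = CLIP_BOUND / batchsize

grads = torch.randn(batchsize, 100) * 10.0  # deliberately large per-sample gradients
norms = grads.norm(p=2, dim=1)
coef = (clip_bound_ / (norms + 1e-10)).clamp(max=1.0).unsqueeze(-1)
clipped = coef * grads

# each per-sample gradient now has norm <= clip_bound_, so the batch sum
# has norm <= batchsize * clip_bound_ = CLIP_BOUND
batch_grad_norm = clipped.sum(dim=0).norm().item()
```

As for 'GP': in WGAN code, GP conventionally stands for the gradient penalty term of WGAN-GP, which this repo builds on.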