pfedhn's Introduction

Personalized Federated Learning using Hypernetworks [ICML 2021]

This is an official implementation of the paper Personalized Federated Learning using Hypernetworks. [Link]

Installation

  • Create a virtual environment with conda/virtualenv
  • Clone the repo
  • Run: cd <PATH_TO_THE_CLONED_REPO>
  • Run: pip install -e . to install the necessary packages and set up path links.

Reproduce Paper Results


pFedHN Results on CIFAR10
  • Run: cd experiments/pfedhn
  • Run: python trainer.py

pFedHN-PC Results on CIFAR10
  • Run: cd experiments/pfedhn_pc
  • Run: python trainer.py

Citation

If you find pFedHN to be useful in your own research, please consider citing the following paper:

@inproceedings{shamsian2021personalized,
  title={Personalized federated learning using hypernetworks},
  author={Shamsian, Aviv and Navon, Aviv and Fetaya, Ethan and Chechik, Gal},
  booktitle={International Conference on Machine Learning},
  pages={9489--9502},
  year={2021},
  organization={PMLR}
}

pfedhn's People

Contributors

avivnavon, avivsham


pfedhn's Issues

About some parameters?

I'm new to pFedHN and have read the sources a few times, but how is an epoch counted? Does the eval_every parameter correspond to one epoch? Thank you.

The experiments about model size

Hi,
I want to test the effect of model size under the 75-node setting on CIFAR-100, but it fails with:

  assert (classes_per_user * num_users) % num_classes == 0, "equal classes appearance is needed"
AssertionError: equal classes appearance is needed

If I ignore this assert, another error appears:

    class_partitions['prob'].append([class_dict[i]['prob'].pop() for i in c])
IndexError: pop from empty list

Everything works fine with other node counts or on CIFAR-10. Could you please tell me how to fix this? Thanks a lot.
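For what it's worth, the assert only requires that classes_per_user * num_users be divisible by num_classes, so one quick way to see which settings pass it is to enumerate them. This is just a sketch of the constraint (variable names taken from the assert above), not a fix inside the repo's partition code:

```python
# Enumerate classes_per_user values that satisfy the divisibility assert
# for the 75-node CIFAR-100 setting described above.
num_users, num_classes = 75, 100
valid = [c for c in range(1, num_classes + 1)
         if (c * num_users) % num_classes == 0]
print(valid[:5])  # -> [4, 8, 12, 16, 20]
```

With 75 users on CIFAR-100, classes_per_user must therefore be a multiple of 4 for every class to appear equally often across clients.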

Algorithm question

Thank you for sharing the source code, it is extremely useful and straightforward. This paper will be one of the main references for my future work.

I have one question regarding the algorithm: is there any specific reason for choosing only one user per iteration? I guess one reason is to reduce the computational cost; I would appreciate it if you could share your thoughts on this.

Thank you very much for your time. Please stay safe during this time!

how to update v_i

Dear authors,

It is unclear to me where in your code you update v_i, as in line 11 of Algorithm 1 in your paper (Personalized Federated Learning using Hypernetworks).

In your code (main/experiments/pfedhn/trainer.py), you compute hnet_grads (line 172 in trainer.py), which is (\nabla_\phi \theta)^T \Delta\theta, used to update \phi in line 10 of Alg. 1. You then assign the values of hnet_grads to update the hnet weights (line 177 in trainer.py), which includes the v_i update. In that case, it seems to me that you update v_i as follows:

v_i = v_i - \alpha (\nabla_\phi \theta)^T \Delta\theta.

I do not see where the (\nabla_{v_i} \phi)^T components from line 11 of Alg. 1 come in.

Could you please explain the v_i update?

Thanks a lot!
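For readers puzzled by the same point: in PyTorch, the client embedding v_i is itself a parameter feeding the hypernetwork, so differentiating the surrogate objective \theta^T \Delta\theta with respect to all parameters yields the v_i gradient automatically via the chain rule; no separate update rule appears in code. A minimal standalone sketch (toy module sizes, not the repo's actual networks):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumed sizes): 10 client embeddings of dim 4, and a linear
# "hypernetwork" mapping an embedding to a 6-dim target-network weight vector.
emb = nn.Embedding(10, 4)      # rows are the client descriptors v_1..v_10
hnet = nn.Linear(4, 6)         # theta_i = h(v_i; phi)

i = torch.tensor([3])          # selected client
theta = hnet(emb(i))
delta_theta = torch.randn_like(theta)   # pretend client update

# Surrogate loss theta^T delta_theta: its gradient w.r.t. phi AND v_i gives
# both updates in one backward pass.
surrogate = (theta * delta_theta).sum()
grads = torch.autograd.grad(surrogate,
                            list(hnet.parameters()) + list(emb.parameters()))
emb_grad = grads[-1]
# Only the selected client's row of the embedding gradient is nonzero;
# all other rows are exactly zero.
print((emb_grad[3] != 0).any().item(), (emb_grad[[0, 1, 2, 4]] != 0).any().item())
```

So the v_i gradient in line 11 of Alg. 1 is obtained implicitly: autograd propagates (\nabla_\phi \theta)^T \Delta\theta back through the embedding layer as well.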

v-i update

Dear authors,

When I look more closely at Alg. 1 and the code, I have a concern about the v_i update.

As I tested, with num_nodes = 50 the embedding size is [50, 13], and with num_nodes = 75 it is [75, 19]. Since each iteration uses one client to update the model, only one row of the embedding should be updated. I also checked the gradient, and it matches my understanding: only one row of the gradient is nonzero, and all other rows are zero.

However, the optimizer is SGD with momentum=0.9. In that case it also updates the other rows of the embedding, not only the selected client's. This differs from the update rule in Alg. 1. With SGD without momentum, only one row is updated, as expected (matching Alg. 1).

Do you have any reason or insight here? Why use one client per iteration but update the embeddings of all clients in your experiments, which differs from Alg. 1?
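The behavior described above is a generic property of SGD with momentum: a parameter row whose current gradient is zero still moves, because the optimizer applies its stored velocity buffer. A tiny self-contained illustration (toy tensors, unrelated to the repo's actual embedding):

```python
import torch

# A 3x2 "embedding" and SGD with momentum, as in the issue above.
emb = torch.nn.Parameter(torch.zeros(3, 2))
opt = torch.optim.SGD([emb], lr=0.1, momentum=0.9)

# Step 1: gradient only on row 0 (one selected client).
emb.grad = torch.zeros(3, 2)
emb.grad[0] = 1.0
opt.step()

# Step 2: gradient only on row 1; row 0's gradient is now zero...
emb.grad = torch.zeros(3, 2)
emb.grad[1] = 1.0
before = emb.data.clone()
opt.step()

# ...but row 0 still changed, driven purely by its momentum buffer.
print(torch.equal(before[0], emb.data[0]))  # False
```

A row that has never received a gradient (row 2 here) has a zero buffer and never moves, which is why plain SGD reproduces the one-row update of Alg. 1 exactly while momentum does not.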

Generalization to Novel Clients

Hi,

In Section 5.3, you mention generalization to novel clients. However, I couldn't find the implementation of this in the repo. Could you please provide the code or details for reproducing the result in Figure 2?

Thanks!
