
Personalized federated learning codebase for research

Home Page: http://aka.ms/pfed

License: MIT License



PersonalizedFL: Personalized Federated Learning Codebase

An easy-to-learn, easy-to-extend codebase for fair comparison, based on PyTorch, for federated learning (FL). Please note that this repository is designed mainly for research, so we omit many extensions that are unnecessary for a quick start. Example use cases: recognizing the activities of different people without accessing their private data, or building a model from multiple patients' data without accessing any individual patient's records.

Implemented Algorithms

As an initial version, we support the following algorithms. We are working on adding more.

  1. Baseline: train on each client without communication.
  2. FedAvg [1].
  3. FedProx [2].
  4. FedBN [3].
  5. FedAP [4].
  6. MetaFed [5].
  7. FedCLIP [6].

NOTE: The code for FedCLIP is located in ./fedclip. This folder is independent of the other folders in this repo; you can download just this folder and run it for that algorithm.

Installation

git clone https://github.com/microsoft/PersonalizedFL.git
cd PersonalizedFL

We recommend using Python 3.7.1 and torch 1.7.1, which match our development environment. For more environment details and a full reproduction of our results, please refer to the docker images luwang0517/torch10:latest or jindongwang/docker.

Dataset

Our code supports the following datasets:

If you want to use your own dataset, please modify datautil/prepare_data.py to include it.
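When adding a dataset, the main task is to produce per-client train/val/test index splits. The helper below is a minimal sketch of that idea, not the repo's actual prepare_data.py API; the function name and split ratios are assumptions for illustration.

```python
import numpy as np

def split_client_indices(n_samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle sample indices and split them into train/val/test parts.

    ratios: fractions assigned to train and val; the remainder goes to test.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_tr = int(ratios[0] * n_samples)
    n_va = int(ratios[1] * n_samples)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```

You would call this once per client and wrap each index array in a DataLoader, mirroring the (train_loaders, val_loaders, test_loaders) structure the rest of the code expects.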

Usage

  1. Modify the script files in scripts/.
  2. Run bash run.sh.

Benchmark

We offer a benchmark for OrganS-MNIST. Please note that the results are based on the data splits in split/medmnist0.1. Different data splits may lead to different results. For the complete parameters, please refer to run.sh.

| Non-IID alpha | Base  | FedAvg | FedProx | FedBN | FedAP | MetaFed |
|---------------|-------|--------|---------|-------|-------|---------|
| 0.1           | 73.99 | 75.62  | 75.97   | 79.96 | 81.33 | 83.87   |
| 0.01          | 75.83 | 74.81  | 75.09   | 81.85 | 82.87 | 84.98   |

Customization

It is easy to design your own method by following these steps:

  1. Add your method to alg/, and register it in alg/algs.py.

  2. Modify scripts/run.sh and execute it.
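The registration step above can be sketched as a simple name-to-class registry. The class and function names here are assumptions for illustration, not the repo's actual alg/algs.py API.

```python
class FedAlgBase:
    """Minimal base: each algorithm owns a client update and a server step."""

    def client_train(self, client_id, dataloader):
        raise NotImplementedError

    def server_aggregate(self):
        raise NotImplementedError


class MyAlg(FedAlgBase):
    """Your new method: override the two hooks with real training logic."""

    def client_train(self, client_id, dataloader):
        return f"trained client {client_id}"

    def server_aggregate(self):
        return "aggregated"


# Map the --alg command-line name to its implementing class.
ALGORITHMS = {"fedavgbase": FedAlgBase, "myalg": MyAlg}

def get_algorithm(name):
    """Look up an algorithm class by its --alg name (case-insensitive)."""
    return ALGORITHMS[name.lower()]
```

With this pattern, adding a method is one new class plus one dictionary entry, and main.py can stay untouched.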

Contribution

The toolkit is under active development and contributions are welcome! Feel free to submit issues and PRs to ask questions or contribute your code. If you would like to implement new features, please submit an issue to discuss with us first.

Reference

[1] McMahan, Brendan, et al. "Communication-efficient learning of deep networks from decentralized data." Artificial intelligence and statistics. PMLR, 2017.

[2] Li, Tian, et al. "Federated optimization in heterogeneous networks." Proceedings of Machine Learning and Systems 2 (2020): 429-450.

[3] Li, Xiaoxiao, et al. "FedBN: Federated Learning on Non-IID Features via Local Batch Normalization." International Conference on Learning Representations. 2021.

[4] Lu, Wang, et al. "Personalized Federated Learning with Adaptive Batchnorm for Healthcare." IEEE Transactions on Big Data (2022).

[5] Chen, Yiqiang, et al. "MetaFed: Federated Learning among Federations with Cyclic Knowledge Distillation for Personalized Healthcare." FL-IJCAI Workshop 2022.

[6] Lu, Wang, et al. "FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning." IEEE Data Engineering Bulletin 2023.

Citation

If you think this toolkit or the results are helpful to you and your research, please cite us!

@Misc{PersonalizedFL,
  howpublished = {\url{https://github.com/microsoft/PersonalizedFL}},
  title = {PersonalizedFL: Personalized Federated Learning Toolkit},
  author = {Lu, Wang and Wang, Jindong}
}

Contact

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

personalizedfl's People

Contributors: dependabot[bot], jindongwang, lw0517, microsoftopensource


personalizedfl's Issues

About method getallfea() in models

Hi, your work is very inspiring.
I found some of the feature-extraction code confusing, and I wonder whether it is intentional.
In models.py, the getallfea() method in the PamapModel and lenet5v classes sometimes puts the output of the conv layers, rather than the output of the BN layers, into fealist.
But in the AlexNet class, the features returned by getallfea() are all outputs of BN layers.
Is this correct?
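One way to make such a method consistently return BN-layer outputs is to collect them with forward hooks. This is a hypothetical sketch of that mechanism, not the repo's actual getallfea() code; the model used here is a stand-in.

```python
import torch
import torch.nn as nn

def get_bn_features(model, x):
    """Run x through model and return the output of every BatchNorm layer."""
    feats, hooks = [], []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            # The hook fires during forward() and records this layer's output.
            hooks.append(m.register_forward_hook(
                lambda mod, inp, out: feats.append(out.detach())))
    model(x)
    for h in hooks:
        h.remove()  # clean up so the hooks do not linger on the model
    return feats

# Stand-in model with one conv -> BN -> ReLU stage.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
feats = get_bn_features(model, torch.randn(2, 3, 8, 8))
```

Because the hook is attached only to BatchNorm modules, the collected features are unambiguously BN outputs regardless of the model architecture.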

Bug in FedCLIP

Thanks for your work! While reproducing FedCLIP, I found that the training accuracy does not change from epoch to epoch when test_envs is 3. Strangely, when I change the order of the domains in the dataset from ['art_painting', 'cartoon', 'photo', 'sketch'] to ['sketch', 'art_painting', 'cartoon', 'photo'] and leave test_envs set to 3, the problem still occurs. It also appears on other datasets. I tried to locate the bug, but I could not find it.

============ Train epoch 0 ============
100%|██████████| 39/39 [00:03<00:00, 12.25it/s]
100%|██████████| 44/44 [00:03<00:00, 12.64it/s]
100%|██████████| 32/32 [00:02<00:00, 12.25it/s]
Site-0| Train Acc: 97.3127
Site-0| Val Acc: 97.5550
Site-1| Train Acc: 99.2176
Site-1| Val Acc: 99.3590
Site-2| Train Acc: 100.0000
Site-2| Val Acc: 100.0000
Test site-0| Test Acc: 97.5550
Test site-1| Test Acc: 98.7179
Test site-2| Test Acc: 99.7006
Test site-3| Test Acc: 89.3612
One epoc takes time: 32.73s
============ Train epoch 1 ============
100%|██████████| 39/39 [00:02<00:00, 13.66it/s]
100%|██████████| 44/44 [00:03<00:00, 12.43it/s]
100%|██████████| 32/32 [00:02<00:00, 12.91it/s]
Site-0| Train Acc: 97.2313
Site-0| Val Acc: 97.3105
Site-1| Train Acc: 99.2176
Site-1| Val Acc: 99.3590
Site-2| Train Acc: 100.0000
Site-2| Val Acc: 100.0000
Test site-0| Test Acc: 97.5550
Test site-1| Test Acc: 98.5043
Test site-2| Test Acc: 99.7006
Test site-3| Test Acc: 89.1575
One epoc takes time: 31.47s
============ Train epoch 2 ============
100%|██████████| 39/39 [00:02<00:00, 13.64it/s]
100%|██████████| 44/44 [00:03<00:00, 13.76it/s]
100%|██████████| 32/32 [00:02<00:00, 13.73it/s]
Site-0| Train Acc: 97.2313
Site-0| Val Acc: 97.0660
Site-1| Train Acc: 99.2176
Site-1| Val Acc: 99.3590
Site-2| Train Acc: 100.0000
Site-2| Val Acc: 100.0000
Test site-0| Test Acc: 97.5550
Test site-1| Test Acc: 98.7179
Test site-2| Test Acc: 99.7006
0%| | 0/39 [00:00<?, ?it/s] Test site-3| Test Acc: 89.0812
One epoc takes time: 30.90s
============ Train epoch 3 ============
100%|██████████| 39/39 [00:02<00:00, 13.50it/s]
100%|██████████| 44/44 [00:03<00:00, 14.01it/s]
100%|██████████| 32/32 [00:02<00:00, 13.75it/s]
Site-0| Train Acc: 97.2313
Site-0| Val Acc: 97.0660
Site-1| Train Acc: 99.2176
Site-1| Val Acc: 99.3590
Site-2| Train Acc: 100.0000
Site-2| Val Acc: 100.0000
Test site-0| Test Acc: 97.5550
Test site-1| Test Acc: 98.7179
Test site-2| Test Acc: 99.7006
Test site-3| Test Acc: 89.0048
One epoc takes time: 30.68s
============ Train epoch 4 ============
100%|██████████| 39/39 [00:02<00:00, 13.59it/s]
100%|██████████| 44/44 [00:03<00:00, 13.75it/s]
100%|██████████| 32/32 [00:02<00:00, 13.56it/s]
Site-0| Train Acc: 97.3127
Site-0| Val Acc: 97.0660
Site-1| Train Acc: 99.2176
Site-1| Val Acc: 99.3590
Site-2| Train Acc: 100.0000
Site-2| Val Acc: 100.0000
Test site-0| Test Acc: 97.5550
Test site-1| Test Acc: 98.7179
Test site-2| Test Acc: 99.7006
Test site-3| Test Acc: 89.0303
One epoc takes time: 30.87s

I would be very grateful if you could spare some time to fix this bug. Thank you for your patience.

About the threshold value

I notice that you set the threshold value to 1.1 in run.sh.

But the check against the threshold in metafed.py will never be triggered if you set it to 1.1, right?

if val_acc > self.args.threshold: self.flagl[idx] = True

As a result, the experimental results you provide here may not be representative of MetaFed; they would just be normal training results, right?

Please let me know if I have any misunderstandings.

Thanks!
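The point raised above can be illustrated in a few lines: if validation accuracy is a fraction in [0, 1] (an assumption about how metafed.py reports it), a threshold of 1.1 can never be exceeded, so the flag is never set. This is a standalone sketch, not the repo's code.

```python
def update_flags(val_accs, threshold):
    """Mark clients whose validation accuracy exceeds the threshold."""
    return [acc > threshold for acc in val_accs]

# With threshold 1.1, no fractional accuracy can pass:
assert not any(update_flags([0.55, 0.83, 0.99], threshold=1.1))

# With a threshold inside [0, 1], some clients pass:
print(update_flags([0.55, 0.83, 0.99], threshold=0.6))  # [False, True, True]
```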

Did you use multi-threaded code?

When I train on the dataset you provided, my GPU memory usage is higher than when training on CIFAR-10 (which I added myself), and training is faster than on CIFAR-10.
Did you use multi-threaded code? (I tried to find it, but failed.)

GPU 0 is training on cifar10.
GPU 1 is training on covid.

MetaFed hyperparameters?

Hi,

Thanks for sharing the code. The hyperparameters for MetaFed seem to be missing from run.sh, so I just ran python main.py --alg metafed --dataset medmnist --iters 300 --wk_iters 1 --non_iid_alpha 0.1 --model_momentum 0.3 and got an average accuracy of 54.53. But the accuracy in the benchmark is 83.87, and it is 90.81 in Table 3 of your paper.


Thanks for your attention!

A question about MetaFed

In init_model_flag, all clients already complete one federated round, and then main.py decrements iters:
algclass.init_model_flag(train_loaders, val_loaders)
args.iters = args.iters - 1  # one round has already been trained during init
Since knowledge distillation can already start in the subsequent client training, why does the client_train method in metafed.py still check round == 0 (wasn't round 0 already handled in init_model_flag)? Looking forward to your reply.

About the meaning of the non_iid_alpha parameter

When running experiments, I want a setting where the data is non-IID across clients, so I would like to ask about the meaning of the non_iid_alpha and non_iid_dirichlet parameters, and whether they can help me partition the data into such a non-IID split. Thanks a lot!
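For context on the question above: a common way to create non-IID label splits is a Dirichlet prior over per-client class proportions, which is what non_iid_dirichlet-style partitioners typically implement; smaller non_iid_alpha means more skewed per-client label distributions. The sketch below illustrates the idea generically and is not the repo's exact datasplit.py code.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients with class shares drawn from Dir(alpha).

    Smaller alpha -> each class concentrates on fewer clients (more non-IID).
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # Draw this class's share for each client from a Dirichlet prior.
        props = rng.dirichlet([alpha] * n_clients)
        cuts = (np.cumsum(props)[:-1] * len(idx_c)).astype(int)
        for cid, part in enumerate(np.split(idx_c, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx
```

With alpha around 0.1 or 0.01 (the values used in the benchmark splits), some clients end up seeing only a few classes, which is the "non-IID" regime the parameter controls.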

cannot download the dataset

The download link is invalid:

<Error>
  <Code>AccountIsDisabled</Code>
  <Message>The specified account is disabled. RequestId:898e522c-e01e-002d-0fde-0a0e8b000000 Time:2022-12-08T08:21:34.5410795Z</Message>
</Error>

changing partition data way

Hello,
I am training this code on the medmnist dataset and everything goes well, but when I change the data-partition method to random, or change the --non_iid_alpha argument to something other than 0.1 or 0.01 (because, as far as I can see, only those partitions exist in the split directory), it raises an error:

File "/home/sanaa/PHD/fedsim/PersonalizedFL/main.py", line 75, in <module>
  train_loaders, val_loaders, test_loaders = get_data(args.dataset)(args)
File "/home/sanaa/PHD/fedsim/PersonalizedFL/datautil/prepare_data.py", line 201, in medmnist
  trd, vad, ted = getlabeldataloader(args, data)
File "/home/sanaa/PHD/fedsim/PersonalizedFL/datautil/prepare_data.py", line 187, in getlabeldataloader
  trl, val, tel = getdataloader(args, data)
File "/home/sanaa/PHD/fedsim/PersonalizedFL/datautil/datasplit.py", line 315, in getdataloader
  tmparr = np.array(tmparr)
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (60,) + inhomogeneous

FedAvg algorithm in CLIP settings

Sincerely, thank you for providing the code. I am having some trouble reproducing the FedAvg algorithm in the CLIP setting. I optimized both the visual encoder and the text encoder with the prompt "a photo of a [class]", but I get a low score. I set the learning rate to 5e-5 and use the Adam optimizer. Could you please give me some suggestions? I noticed that the paper "FedCLIP: Fast Generalization and Personalization for CLIP in Federated Learning" reported FedAvg results with ViT32, so I would like to know how to reproduce those results.
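For readers trying to reproduce such baselines, the FedAvg aggregation step itself is straightforward: average client parameters weighted by local dataset size. The sketch below is framework-agnostic (it assumes each client's model is a dict of numpy arrays) and is an illustration of the algorithm, not this repo's implementation.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Average client parameter dicts, weighted by local dataset size."""
    total = float(sum(client_sizes))
    return {
        key: sum(w[key] * (n / total)
                 for w, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }

# Two clients with one shared parameter tensor each:
w1 = {"w": np.array([0.0, 2.0])}
w2 = {"w": np.array([2.0, 4.0])}
global_w = fedavg_aggregate([w1, w2], client_sizes=[1, 1])
```

In a CLIP setting, the same averaging would be applied to whichever parameter subset is actually trained (e.g. both encoders, or only an adapter), which is often where reproduction attempts diverge.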

about setup

Hi, could I run this code in Anaconda alone, without Docker?
