alibaba / federatedscope

An easy-to-use federated learning platform

Home Page: https://www.federatedscope.io

License: Apache License 2.0

Python 91.57% Shell 5.35% Dockerfile 0.68% Jupyter Notebook 2.40%
federated-learning machine-learning pytorch

federatedscope's Introduction

FederatedScope is a comprehensive federated learning platform that provides convenient usage and flexible customization for various federated learning tasks in both academia and industry. Based on an event-driven architecture, FederatedScope integrates rich collections of functionalities to satisfy the burgeoning demands from federated learning, and aims to build up an easy-to-use platform for promoting learning safely and effectively.

A detailed tutorial is provided on our website: federatedscope.io

You can try FederatedScope via FederatedScope Playground or Google Colab.

| Code Structure | Quick Start | Advanced | Documentation | Publications | Contributing |

News

  • [05-17-2023] Our paper FS-REAL has been accepted by KDD'2023!
  • [05-17-2023] Our benchmark paper for FL backdoor attacks Backdoor Attacks Bench has been accepted by KDD'2023!
  • [05-17-2023] Our paper Communication Efficient and Differentially Private Logistic Regression under the Distributed Setting has been accepted by KDD'2023!
  • [04-25-2023] Our paper pFedGate has been accepted by ICML'2023!
  • [04-25-2023] Our benchmark paper for FedHPO FedHPO-Bench has been accepted by ICML'2023!
  • [04-03-2023] We release FederatedScope v0.3.0!
  • [02-10-2022] Our paper elaborating on FederatedScope is accepted by VLDB'23!
  • [10-05-2022] Our benchmark paper for personalized FL, pFL-Bench has been accepted by NeurIPS'22, Dataset and Benchmark Track!
  • [08-18-2022] Our KDD 2022 paper on federated graph learning receives the KDD Best Paper Award for ADS track!
  • [07-30-2022] We release FederatedScope v0.2.0!
  • [06-17-2022] We release pFL-Bench, a comprehensive benchmark for personalized Federated Learning (pFL), containing 10+ datasets and 20+ baselines. [code, pdf]
  • [06-17-2022] We release FedHPO-Bench, a benchmark suite for studying federated hyperparameter optimization. [code, pdf]
  • [06-17-2022] We release B-FHTL, a benchmark suite for studying federated hetero-task learning. [code, pdf]
  • [06-13-2022] Our project came under attack; the issue has been resolved. More details.
  • [05-25-2022] Our paper FederatedScope-GNN has been accepted by KDD'2022!
  • [05-06-2022] We release FederatedScope v0.1.0!

Code Structure

FederatedScope
├── federatedscope
│   ├── core           
│   |   ├── workers              # Behaviors of participants (i.e., server and clients)
│   |   ├── trainers             # Details of local training
│   |   ├── aggregators          # Details of federated aggregation
│   |   ├── configs              # Customizable configurations
│   |   ├── monitors             # The monitor module for logging and demonstrating  
│   |   ├── communication.py     # Implementation of communication among participants   
│   |   ├── fed_runner.py        # The runner for building and running an FL course
│   |   ├── ... ..
│   ├── cv                       # Federated learning in CV        
│   ├── nlp                      # Federated learning in NLP          
│   ├── gfl                      # Graph federated learning          
│   ├── autotune                 # Auto-tuning for federated learning         
│   ├── vertical_fl              # Vertical federated learning         
│   ├── contrib                          
│   ├── main.py           
│   ├── ... ...          
├── scripts                      # Scripts for reproducing existing algorithms
├── benchmark                    # We release several benchmarks for convenient and fair comparisons
├── doc                          # For automatic documentation
├── environment                  # Installation requirements and provided docker files
├── materials                    # Materials of related topics (e.g., paper lists)
│   ├── notebook                        
│   ├── paper_list                                        
│   ├── tutorial                                       
│   ├── ... ...                                      
├── tests                        # Unittest modules for continuous integration
├── LICENSE
└── setup.py

Quick Start

We provide an end-to-end example for users to start running a standard FL course with FederatedScope.

Step 1. Installation

First of all, users need to clone the source code and install the required packages (we suggest python version >= 3.9). You can choose between the following two installation methods (via docker or conda) to install FederatedScope.

git clone https://github.com/alibaba/FederatedScope.git
cd FederatedScope

Use Docker

You can build a Docker image and run it in a Docker environment (CUDA 11 and torch 1.10):

docker build -f environment/docker_files/federatedscope-torch1.10.Dockerfile -t alibaba/federatedscope:base-env-torch1.10 .
docker run --gpus device=all --rm -it --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash

If you need to run with down-stream tasks such as graph FL, change the requirement/docker file name into another one when executing the above commands:

# environment/requirements-torch1.10.txt -> 
environment/requirements-torch1.10-application.txt

# environment/docker_files/federatedscope-torch1.10.Dockerfile ->
environment/docker_files/federatedscope-torch1.10-application.Dockerfile

Note: You can use CUDA 10 and torch 1.8 instead by changing torch1.10 to torch1.8. The Docker images are based on nvidia-docker. Please pre-install the NVIDIA drivers and nvidia-docker2 on the host machine. See more details here.

Use Conda

We recommend using a new virtual environment to install FederatedScope:

conda create -n fs python=3.9
conda activate fs

If your backend is torch, please install torch in advance (torch-get-started). For example, if your cuda version is 11.3 please execute the following command:

conda install -y pytorch=1.10.1 torchvision=0.11.2 torchaudio=0.10.1 torchtext=0.11.1 cudatoolkit=11.3 -c pytorch -c conda-forge

For users with Apple M1 chips:

conda install pytorch torchvision torchaudio -c pytorch
# Downgrade torchvision to avoid segmentation fault
python -m pip install torchvision==0.11.3

Finally, after the backend is installed, you can install FederatedScope from source:

# Editable mode
pip install -e .

# Or (developers for dev mode)
pip install -e .[dev]
pre-commit install

Now you have successfully installed the minimal version of FederatedScope. (Optional) For the application version, which includes graph, NLP and speech, run:

bash environment/extra_dependencies_torch1.10-application.sh

Step 2. Prepare datasets

To run an FL task, users should prepare a dataset. The DataZoo provided in FederatedScope can automatically download and preprocess widely used public datasets for various FL applications, including CV, NLP, graph learning, recommendation, etc. Users can directly specify cfg.data.type = DATASET_NAME in the configuration. For example,

cfg.data.type = 'femnist'

To use customized datasets, you need to prepare the datasets in a certain format and register them. Please refer to Customized Datasets for more details.

Step 3. Prepare models

Then, users should specify the model architecture that will be trained in the FL course. FederatedScope provides a ModelZoo that contains the implementation of widely adopted model architectures for various FL applications. Users can set up cfg.model.type = MODEL_NAME to apply a specific model architecture in FL tasks. For example,

cfg.model.type = 'convnet2'

FederatedScope allows users to use customized models via registering. Please refer to Customized Models for more details about how to customize a model architecture.
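
As a rough illustration of what "registering" a customized model can mean, here is a minimal, generic registry sketch. It is not FederatedScope's actual registration API (see Customized Models for that); the names `MODEL_BUILDERS`, `register_builder`, `get_model` and `my_linear` are hypothetical and exist only for this example.

```python
# Generic registry pattern; all names here are hypothetical, not FederatedScope's API.
MODEL_BUILDERS = {}


def register_builder(name):
    """Decorator that stores a model-building function under `name`."""
    def decorator(builder):
        if name in MODEL_BUILDERS:
            raise ValueError(f"model type {name!r} is already registered")
        MODEL_BUILDERS[name] = builder
        return builder
    return decorator


@register_builder("my_linear")
def build_my_linear(in_dim, out_dim):
    # A stand-in for a real model object, kept as a plain dict for illustration.
    return {"type": "my_linear", "shape": (in_dim, out_dim)}


def get_model(model_type, **kwargs):
    """Look up the builder selected by a config value such as cfg.model.type."""
    try:
        builder = MODEL_BUILDERS[model_type]
    except KeyError:
        raise ValueError(f"unknown model type {model_type!r}") from None
    return builder(**kwargs)


model = get_model("my_linear", in_dim=784, out_dim=62)
print(model)
```

The point of the pattern is that the configuration only carries a string, while the mapping from that string to code lives in one place that user code can extend.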

Step 4. Start running an FL task

Note that FederatedScope provides a unified interface for both standalone mode and distributed mode, and allows users to switch between them through configuration.

Standalone mode

The standalone mode in FederatedScope simulates multiple participants (servers and clients) on a single device, while the participants' data are isolated from each other and their models can be shared via message passing.

Here we demonstrate how to run a standard FL task with FederatedScope, setting cfg.data.type = 'femnist' and cfg.model.type = 'convnet2' to run vanilla FedAvg for an image classification task. Users can customize training configurations, such as cfg.federate.total_round_num, cfg.dataloader.batch_size, and cfg.train.optimizer.lr, in the configuration (a .yaml file), and run a standard FL task as:

# Run with default configurations
python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml
# Or with custom configurations
python federatedscope/main.py --cfg scripts/example_configs/femnist.yaml federate.total_round_num 50 dataloader.batch_size 128
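
The trailing `key value` pairs in the second command override entries loaded from the .yaml file. The sketch below shows one way such dotted-key overrides can be merged into a nested configuration. It uses a plain dict rather than FederatedScope's own config object, `apply_overrides` is a hypothetical helper, and the starting values 20 and 10 are placeholders, not the real defaults.

```python
import ast


def apply_overrides(cfg, pairs):
    """Apply ['a.b', 'value', ...] style overrides to a nested dict in place."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for dotted_key, raw in zip(pairs[0::2], pairs[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(dotted_key)
        try:
            value = ast.literal_eval(raw)  # '50' -> 50, '0.01' -> 0.01
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as 'femnist'
        node[leaf] = value
    return cfg


# Placeholder starting values, standing in for what a .yaml file would provide.
cfg = {
    "federate": {"total_round_num": 20},
    "dataloader": {"batch_size": 10},
}
apply_overrides(cfg, ["federate.total_round_num", "50",
                      "dataloader.batch_size", "128"])
print(cfg)
```

Rejecting unknown keys, as this sketch does, catches typos in section names such as `federated` instead of `federate` before a run starts.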

Then you can observe some monitored metrics during the training process as:

INFO: Server has been set up ...
INFO: Model meta-info: <class 'federatedscope.cv.model.cnn.ConvNet2'>.
... ...
INFO: Client has been set up ...
INFO: Model meta-info: <class 'federatedscope.cv.model.cnn.ConvNet2'>.
... ...
INFO: {'Role': 'Client #5', 'Round': 0, 'Results_raw': {'train_loss': 207.6341676712036, 'train_acc': 0.02, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.152683353424072}}
INFO: {'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_loss': 209.0940284729004, 'train_acc': 0.02, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1818805694580075}}
INFO: {'Role': 'Client #8', 'Round': 0, 'Results_raw': {'train_loss': 202.24929332733154, 'train_acc': 0.04, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.0449858665466305}}
INFO: {'Role': 'Client #6', 'Round': 0, 'Results_raw': {'train_loss': 209.43883895874023, 'train_acc': 0.06, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1887767791748045}}
INFO: {'Role': 'Client #9', 'Round': 0, 'Results_raw': {'train_loss': 208.83140087127686, 'train_acc': 0.0, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.1766280174255375}}
INFO: ----------- Starting a new training round (Round #1) -------------
... ...
INFO: Server: Training is finished! Starting evaluation.
INFO: Client #1: (Evaluation (test set) at Round #20) test_loss is 163.029045
... ...
INFO: Server: Final evaluation is finished! Starting merging results.
... ...
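
The per-client result lines in the log above are printed as Python dict literals, so they can be read back with the standard library. The following is a minimal sketch, assuming each result line has the form `INFO: {...}` as shown; `parse_results` is a hypothetical helper, not part of FederatedScope.

```python
import ast


def parse_results(line):
    """Return the dict from an 'INFO: {...}' line, or None for any other line."""
    _, _, payload = line.partition("INFO: ")
    payload = payload.strip()
    if not payload.startswith("{"):
        return None
    try:
        record = ast.literal_eval(payload)
    except (ValueError, SyntaxError):
        return None
    return record if isinstance(record, dict) else None


lines = [
    "INFO: Server has been set up ...",
    "INFO: {'Role': 'Client #5', 'Round': 0, 'Results_raw': {'train_loss': 207.6341676712036, 'train_acc': 0.02, 'train_total': 50, 'train_loss_regular': 0.0, 'train_avg_loss': 4.152683353424072}}",
    "INFO: ----------- Starting a new training round (Round #1) -------------",
]
records = [r for r in map(parse_results, lines) if r is not None]
for r in records:
    print(r["Role"], r["Round"], round(r["Results_raw"]["train_avg_loss"], 4))
```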

Distributed mode

The distributed mode in FederatedScope runs multiple processes to build up an FL course, where each process acts as a participant (server or client) that instantiates its own model and loads its own data. The communication between participants is provided by the communication module of FederatedScope.

To run with distributed mode, you only need to:

  • Prepare an isolated data file and set cfg.data.file_path = PATH/TO/DATA for each participant;
  • Set cfg.federate.mode = 'distributed', and specify the role of each participant with cfg.distribute.role = 'server'/'client';
  • Set up a valid address with cfg.distribute.server_host/client_host = x.x.x.x and cfg.distribute.server_port/client_port = xxxx. (Note that a server needs server_host and server_port to listen for messages, while a client needs client_host and client_port for listening as well as server_host and server_port for joining an FL course.)

We prepare a synthetic example for running with distributed mode:

# For server
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml data.file_path 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx

# For clients
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml data.file_path 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml data.file_path 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml data.file_path 'PATH/TO/DATA' distribute.server_host x.x.x.x distribute.server_port xxxx distribute.client_host x.x.x.x distribute.client_port xxxx

An executable example with generated toy data can be run with (a script can be found in scripts/run_distributed_lr.sh):

# Generate the toy data
python scripts/distributed_scripts/gen_data.py

# First, start the server, which waits for clients to join
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_server.yaml data.file_path toy_data/server_data distribute.server_host 127.0.0.1 distribute.server_port 50051

# Start the client #1 (with another process)
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_1.yaml data.file_path toy_data/client_1_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50052
# Start the client #2 (with another process)
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_2.yaml data.file_path toy_data/client_2_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50053
# Start the client #3 (with another process)
python federatedscope/main.py --cfg scripts/distributed_scripts/distributed_configs/distributed_client_3.yaml data.file_path toy_data/client_3_data distribute.server_host 127.0.0.1 distribute.server_port 50051 distribute.client_host 127.0.0.1 distribute.client_port 50054
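
The four commands above differ only in the config file, the data path and the client port. As a sketch, they could be generated as follows; `build_command` is a hypothetical helper, and the paths, hosts and ports are the ones used in the toy example.

```python
import shlex

CONFIG_DIR = "scripts/distributed_scripts/distributed_configs"


def build_command(role_cfg, data_path, server, client=None):
    """Build the argv list for one participant; server/client are (host, port)."""
    argv = ["python", "federatedscope/main.py",
            "--cfg", f"{CONFIG_DIR}/{role_cfg}",
            "data.file_path", data_path,
            "distribute.server_host", server[0],
            "distribute.server_port", str(server[1])]
    if client is not None:
        # Clients also listen on their own address.
        argv += ["distribute.client_host", client[0],
                 "distribute.client_port", str(client[1])]
    return argv


server = ("127.0.0.1", 50051)
commands = [build_command("distributed_server.yaml",
                          "toy_data/server_data", server)]
for i in range(1, 4):
    commands.append(build_command(f"distributed_client_{i}.yaml",
                                  f"toy_data/client_{i}_data",
                                  server, client=("127.0.0.1", 50051 + i)))
for argv in commands:
    print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.Popen` to start the participants from a single script; the server should be started first.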

And you can observe the results as (the IP addresses are anonymized with 'x.x.x.x'):

INFO: Server: Listen to x.x.x.x:xxxx...
INFO: Server has been set up ...
Model meta-info: <class 'federatedscope.core.lr.LogisticRegression'>.
... ...
INFO: Client: Listen to x.x.x.x:xxxx...
INFO: Client (address x.x.x.x:xxxx) has been set up ...
Client (address x.x.x.x:xxxx) is assigned with #1.
INFO: Model meta-info: <class 'federatedscope.core.lr.LogisticRegression'>.
... ...
{'Role': 'Client #2', 'Round': 0, 'Results_raw': {'train_avg_loss': 5.215108394622803, 'train_loss': 333.7669372558594, 'train_total': 64}}
{'Role': 'Client #1', 'Round': 0, 'Results_raw': {'train_total': 64, 'train_loss': 290.9668884277344, 'train_avg_loss': 4.54635763168335}}
----------- Starting a new training round (Round #1) -------------
... ...
INFO: Server: Training is finished! Starting evaluation.
INFO: Client #1: (Evaluation (test set) at Round #20) test_loss is 30.387419
... ...
INFO: Server: Final evaluation is finished! Starting merging results.
... ...

Advanced

As a comprehensive FL platform, FederatedScope provides the fundamental implementation to support requirements of various FL applications and frontier studies, towards both convenient usage and flexible extension, including:

  • Personalized Federated Learning: Client-specific model architectures and training configurations are applied to handle the non-IID issues caused by the diverse data distributions and heterogeneous system resources.
  • Federated Hyperparameter Optimization: When hyperparameter optimization (HPO) is applied to federated learning, each trial is extremely costly because it involves multiple rounds of communication across participants. HPO under FL is therefore a distinct problem that calls for dedicated techniques, such as low-fidelity HPO.
  • Privacy Attacker: Privacy attack algorithms are an important and convenient way to verify the privacy protection strength of designed FL systems and algorithms, and research on them is growing along with federated learning.
  • Graph Federated Learning: Working on the ubiquitous graph data, Graph Federated Learning aims to exploit isolated sub-graph data to learn a global model, and has attracted increasing popularity.
  • Recommendation: As a number of laws and regulations go into effect all over the world, more and more people are aware of the importance of privacy protection, which urges the recommender system to learn from user data in a privacy-preserving manner.
  • Differential Privacy: Unlike encryption algorithms, which require a large amount of computation resources, differential privacy is an economical yet flexible technique for protecting privacy. It has achieved great success in databases and is increasingly used in federated learning.
  • ...

More features are coming soon! We have prepared a tutorial with more details about how to use FederatedScope for your federated learning work.

Materials on related topics are constantly being updated. Please refer to FL-Recommendation, Federated-HPO, Personalized FL, Federated Graph Learning, FL-NLP, FL-Attacker, FL-Incentive-Mechanism, FL-Fairness and so on.

Documentation

The classes and methods of FederatedScope have been well documented so that users can generate the API references by:

cd doc
pip install -r requirements.txt
make html

NOTE:

  • The doc/requirements.txt file is only needed for building the API documentation with Sphinx. The documentation can also be generated automatically by the GitHub Actions workflow .github/workflows/sphinx.yml (triggered by a pull request with DOC in the title).
  • The generated documentation can be downloaded via Artifacts in GitHub Actions.

We put the API references on our website.

Besides, we provide documents for executable scripts and customizable configurations.

License

FederatedScope is released under Apache License 2.0.

Publications

If you find FederatedScope useful for your research or development, please cite the following paper:

@article{federatedscope,
  title = {FederatedScope: A Flexible Federated Learning Platform for Heterogeneity},
  author = {Xie, Yuexiang and Wang, Zhen and Gao, Dawei and Chen, Daoyuan and Yao, Liuyi and Kuang, Weirui and Li, Yaliang and Ding, Bolin and Zhou, Jingren},
  journal={Proceedings of the VLDB Endowment},
  volume={16},
  number={5},
  pages={1059--1072},
  year={2023}
}

More publications can be found in the Publications.

Contributing

We greatly appreciate any contribution to FederatedScope! We provide a developer version of FederatedScope with additional pre-commit hooks to perform commit checks compared to the official version:

# Install the developer version
pip install -e .[dev]
pre-commit install

# Or switch to the developer version from the official version
pip install pre-commit
pre-commit install
pre-commit run --all-files

You can refer to Contributing to FederatedScope for more details.

You are welcome to join our Slack channel or DingDing group (please scan the following QR code) for discussion.

federatedscope's People

Contributors

ahn1340, alan-qin, cheneydon, cuiyuebing, davdgao, eltociear, joneswong, osier-yi, pan-x-c, private-mechanism, qbc2016, rayrayraykk, sundave1998, thesunwillrise, wanghh7, xieyxclack, xkxxfyf, yxdyc

federatedscope's Issues

report cuda error when trying to launch up the demo case

Hi, when I try to launch the demo case, a CUDA-related error is reported, as shown below.

I am using conda to manage the environment. In my other environments, PyTorch works with CUDA without any problem.
I think this could be an installation issue: I did not install anything myself and followed your guidance exactly.
My CUDA version:
NVIDIA-SMI 510.47.03 Driver Version: 510.47.03 CUDA Version: 11.6
My torch version: 1.10.1

(fedscope) liangma@lMa-X1:~/prj/FederatedScope$ python federatedscope/main.py --cfg federatedscope/example_configs/femnist.yaml

...
2022-05-13 22:06:09,249 (server:520) INFO: ----------- Starting training (Round #0) -------------
Traceback (most recent call last):
 File "/home/liangma/prj/FederatedScope/federatedscope/main.py", line 41, in <module>
   _ = runner.run()
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 136, in run
   self._handle_msg(msg)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/fed_runner.py", line 254, in _handle_msg
   self.client[each_receiver].msg_handlers[msg.msg_type](msg)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/worker/client.py", line 202, in callback_funcs_for_model_para
   sample_size, model_para_all, results = self.trainer.train()
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 374, in train
   self._run_routine("train", hooks_set, target_data_split_name)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 208, in _run_routine
   hook(self.ctx)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/trainer.py", line 474, in _hook_on_fit_start_init
   ctx.model.to(ctx.device)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 899, in to
   return self._apply(convert)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 570, in _apply
   module._apply(fn)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 593, in _apply
   param_applied = fn(param)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/nn/modules/module.py", line 897, in convert
   return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
 File "/home/liangma/miniconda3/envs/fedscope/lib/python3.9/site-packages/torch/cuda/__init__.py", line 208, in _lazy_init
   raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

A keyword indicates the task type

Is it possible to add a keyword, such as cfg.federate.task_type, to indicate the task type of each client? It is useful in calculating the loss function because y_true should be long and float respectively for classification and regression tasks.

Keep the label distributions of train and test set consistent within one client

When using the splitter for customized datasets, the label distributions of the train and test sets are independent within one client.
Because of this independence, observations of client-wise performance may be meaningless.

IMO, FederatedScope could provide an option for keeping the label distributions of the train and test sets consistent within one client when using the splitter, which would be useful in tasks such as personalized federated learning.
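
One way to keep the two distributions aligned is a per-label (stratified) split of each client's samples. The sketch below illustrates the idea with the standard library only; `stratified_split` is a hypothetical helper, not an existing FederatedScope splitter option, and the 80/20 label mix is made-up example data.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.2, seed=0):
    """Split sample indices so each label keeps about the same share in train and test."""
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_ratio)
        test_idx += idxs[:n_test]
        train_idx += idxs[n_test:]
    return train_idx, test_idx


labels = [0] * 80 + [1] * 20  # one client's skewed local labels (example data)
train_idx, test_idx = stratified_split(labels)
train_share = sum(labels[i] == 1 for i in train_idx) / len(train_idx)
test_share = sum(labels[i] == 1 for i in test_idx) / len(test_idx)
print(train_share, test_share)
```

With very few samples of a label, rounding can leave that label out of the test set entirely, so a real implementation would need a policy for such cases.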

Improve the current finetune mechanism

The current finetune is implemented by reusing the training routine, so we have to store the variables in the context that belong to the training process and recover them after finetuning. In addition, the current finetune doesn't support training by epoch. So maybe we can separate it into its own routine with a dedicated value of "ctx.cur_mode", for example, ctx.cur_mode=="finetune".

Cannot import federatedscope

Although I have successfully completed the installation procedure, I cannot import federatedscope from a path other than the root of the cloned repo.

Mechanisms to support asynchronous training protocol in FL

As the title said.
Some ideas can be borrowed from asynchronous SGD, including but not limited to:

  • Timeout strategy
  • oversampling
  • staleness toleration
  • variances toleration

A simulator can also be provided for running asynchronous FL standalone.
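
To make the "staleness toleration" idea concrete, here is a small sketch of staleness-aware aggregation over scalar updates. It is an assumption-laden illustration, not a proposal for FederatedScope's implementation: the polynomial discount is just one common choice in the asynchronous FL literature, and `staleness_weight`, `aggregate`, `alpha` and `max_staleness` are hypothetical names.

```python
def staleness_weight(staleness, alpha=0.5):
    """Polynomial discount: fresh updates get weight 1, stale ones get less."""
    return (1.0 + staleness) ** (-alpha)


def aggregate(updates, current_round, max_staleness=None, alpha=0.5):
    """Weighted average of updates given as (value, sample_size, round_sent)."""
    total, norm = 0.0, 0.0
    for value, sample_size, round_sent in updates:
        staleness = current_round - round_sent
        if max_staleness is not None and staleness > max_staleness:
            continue  # staleness toleration: drop updates that are too old
        weight = sample_size * staleness_weight(staleness, alpha)
        total += weight * value
        norm += weight
    if norm == 0:
        raise ValueError("no usable updates in this round")
    return total / norm


updates = [(1.0, 10, 5), (3.0, 10, 5), (100.0, 10, 0)]
print(aggregate(updates, current_round=5, max_staleness=2))
print(aggregate(updates, current_round=5))
```

The first call drops the update that is five rounds old; the second keeps it but gives it a smaller weight than the fresh ones.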

Some questions about cross-device FL.

Hello guys!
I have read the tutorial about FederatedScope. It seems that the whole project is based on Python and that the cross-device part is only a simulation.
I wonder whether there is any cross-language design for the communication between the client and the server, for example, with Android (Java) on the mobile phone and Linux (Java/Python) on the server, because some devices lack a Python environment.
What's more, has there been any trial on real devices, especially for the cross-device part?
I would appreciate it if you could help answer my questions.

Thanks for your work on FederatedScope!

Confusing caliber

'Results_raw': {'client_individual': {'val_loss': 0.7106942534446716, 'test_loss': 0.7106942534446716, 'test_avg_loss': 0.7106942534446716, 'test_total': 1.0, 'val_avg_loss': 0.7106942534446716, 'val_total': 1.0}, 'client_summarized_weighted_avg': {'val_loss'

It is difficult for users to know client_individual means the best individual results.

Unexpected behavior when checking the sample_client_num

sample_client_num_valid = (0 < cfg.federate.sample_client_num <=
                           cfg.federate.client_num)

The term cfg.federate.client_num is allowed to be 0, which implies that the client_num would be determined by the dataset after the loading process. However, the above assertion happens before data loading and thus causes some unexpected behaviors, such as forcibly setting sample_client_num to the same value as cfg.federate.client_num:

if non_sample_case or not sample_cfg_valid:
    # (a) use all clients
    cfg.federate.sample_client_num = cfg.federate.client_num
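
One possible way to avoid this, sketched below, is to defer the check whenever client_num is still 0 and to repeat it after data loading. `resolve_sample_client_num` is a hypothetical helper written for this issue, not code from the repository.

```python
def resolve_sample_client_num(sample_client_num, client_num):
    """Return the validated value, or None when the check must wait for data loading."""
    if client_num == 0:
        # client_num is decided by the dataset later, so nothing can be checked yet.
        return None
    if 0 < sample_client_num <= client_num:
        return sample_client_num
    # Invalid or unset: fall back to using all clients.
    return client_num


print(resolve_sample_client_num(5, 0))    # before data loading
print(resolve_sample_client_num(5, 10))   # after data loading, valid request
print(resolve_sample_client_num(20, 10))  # after data loading, too many requested
```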

Fail to run the standalone example with the minimal version of requirement

An Error happens in FederatedScope/federatedscope/cv/dataset/leaf_cv.py:

When it is required to download the dataset, a function download_url is imported from the module torch_geometric.data, which is not included in the minimal version of the requirement, (of course it shouldn't be included in the minimal version, in my opinion).

So maybe we should replace the download_url with another implementation here. Thanks :)

Monitoring improvement: visualization support

The monitoring can be improved in terms of the visualization support.

To use visualization tools such as wandb or tensorboard, we currently parse the log file after the results are saved. We need to support logging results in real time rather than in this two-step style. In addition, the parsing process should be automatic for better usability.

We can discuss other requirements for visualization here @rayrayraykk

Terse logging

The results (most are floating numbers) are printed to the stdout without controlling the preserved precision. Thus, the reported results look lengthy.
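
A small sketch of one possible remedy: round every float in the nested results dict before printing. `round_floats` is a hypothetical helper, and four digits is an arbitrary choice; the sample values are taken from the Quick Start log.

```python
def round_floats(obj, ndigits=4):
    """Recursively round floats inside dicts, lists and tuples; leave other values alone."""
    if isinstance(obj, float):
        return round(obj, ndigits)
    if isinstance(obj, dict):
        return {k: round_floats(v, ndigits) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(round_floats(v, ndigits) for v in obj)
    return obj


results = {
    "Role": "Client #5",
    "Round": 0,
    "Results_raw": {
        "train_loss": 207.6341676712036,
        "train_total": 50,
        "train_avg_loss": 4.152683353424072,
    },
}
print(round_floats(results))
```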

Guidance for running the example in FederatedScope/scripts/

Although FederatedScope provides different kinds of scripts in FederatedScope/scripts/, it is a little hard for users to understand these scripts without some guidance, such as which example is running for a certain script and how to use these scripts for customized tasks.

Maybe some detailed guidance on the scripts should be provided for users. Thanks :)

Redundancy in the log files

A FedAvg trial on 5% of FEMNIST produces about 500 KB of log per round:
roughly 80% is evaluation logs such as 2022-04-13 16:33:24,901 (client:264) INFO: Client #1: (Evaluation (test set) at Round #26) test_loss is 79.352451, 10% is server results, and 10% is training information.

If the round is 500, 1000, or much larger, the log files will take up too much space with a lot of redundancy. @yxdyc

The optimizer may track some values across different training routines

  • Since the optimizer is initialized within the context, different training routines will use the same optimizer.
  • Some optimizers, like Adam, will track the past momentum.

Therefore, the optimizer may track some state variables across different training routines. Considering the initialized model of each training routine is broadcast by the server, maybe it is unnecessary or even wrong to track past variables.
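
The effect can be reproduced with a toy momentum optimizer, without any deep learning framework. In this sketch, `MomentumSGD` and `run_routine` are hypothetical stand-ins for a real optimizer and a training routine; the parameter is a single float that the "server" resets to 0.0 before each routine.

```python
class MomentumSGD:
    """Toy scalar SGD with momentum; `velocity` is the state that persists."""

    def __init__(self, lr=0.1, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0

    def step(self, param, grad):
        self.velocity = self.momentum * self.velocity + grad
        return param - self.lr * self.velocity

    def reset(self):
        """Forget past momentum, e.g. at the start of a new training routine."""
        self.velocity = 0.0


def run_routine(opt, param, grads):
    for g in grads:
        param = opt.step(param, g)
    return param


shared = MomentumSGD()
run_routine(shared, 0.0, [1.0])                 # routine 1 leaves velocity behind
after_shared = run_routine(shared, 0.0, [1.0])  # routine 2 starts from the broadcast 0.0

after_fresh = run_routine(MomentumSGD(), 0.0, [1.0])

shared.reset()
after_reset = run_routine(shared, 0.0, [1.0])
print(after_shared, after_fresh, after_reset)
```

With the shared optimizer, the second routine takes a larger step than a fresh optimizer would, even though both start from the same broadcast parameter; resetting the state removes the difference.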

Is there a complete tutorial on how to plug in my own data/model/trainer and run an FL task?

Hi, I am trying to follow the guidance at https://federatedscope.io/docs/own-case/ to add my own data/model/trainer, etc., to the project and run an FL task.
However, the guidance is not very clear, in particular: 1) how the config section works; 2) how to combine the customized data/model/trainer/config to complete an FL task.
Would it be possible to provide a complete guide using something like mnist/cifar10?

Incorrect Evaluation

In each round, multiple evaluation results are reported in the logs, each of which seems to be the results on a fraction of clients.

Invalid system metric names

There are metrics that have invalid names. It seems that this is caused by incorrect recursive concatenation of strings.

Improve logging content with more metrics and better readability

We aim to improve logging content with more metrics and better readability:

  • the systematic metrics are missing. We need to add more metrics to reflect system performance, such as communication and computational efficiency.

  • we may add metric to reflect the convergence, e.g., the number of rounds to converge

We can discuss specific metric requirements and logging timing here @rayrayraykk

in Quick Start, build docker image and run with docker env command error

In Quick Start, the command for running the built Docker image contains an error.
In the docs, the command is:
docker run --gpus device=all --rm --it --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash
It should be:
docker run --gpus device=all --rm -it --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash
or:
docker run --gpus device=all --rm -i -t --name "fedscope" -w $(pwd) alibaba/federatedscope:base-env-torch1.10 /bin/bash

An Error happens when batch_size is larger than the number of clients' local samples

When drop_last=True and batch_size is larger than the number of clients' local samples, no data is used for local training and the following error happens:

File "/root/miniconda3/lib/python3.9/site-packages/federatedscope-0.1.0-py3.9.egg/federatedscope/core/trainers/context.py", line 154, in pre_calculate_batch_epoch_num
    num_train_epoch = math.ceil(local_update_steps / num_train_batch)
ZeroDivisionError: division by zero
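
The arithmetic behind the failure can be sketched as follows: with drop_last=True the number of batches is the floor of samples divided by batch size, which is 0 when the batch is larger than the local dataset. `num_batches` and `epochs_for_steps` are hypothetical helpers that mirror the formula in the traceback; the guard is one possible fix, not the project's actual patch.

```python
import math


def num_batches(num_samples, batch_size, drop_last):
    """Batches per epoch, following the usual drop_last convention."""
    if drop_last:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


def epochs_for_steps(local_update_steps, num_samples, batch_size, drop_last):
    """Fail early with a readable message instead of dividing by zero."""
    n = num_batches(num_samples, batch_size, drop_last)
    if n == 0:
        raise ValueError(
            f"batch_size={batch_size} exceeds the {num_samples} local samples "
            "while drop_last=True, so no batch would be produced")
    return math.ceil(local_update_steps / n)


print(num_batches(50, 128, drop_last=True))           # 0, the failing case
print(num_batches(50, 128, drop_last=False))          # 1
print(epochs_for_steps(4, 50, 128, drop_last=False))  # 4
```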

A gRPC-related suggestion

In the current main branch (commit 954322c), the implementation of the gRPCCommManager class (see federatedscope/core/communication.py) largely refers to that in FedML (see https://github.com/FedML-AI/FedML/blob/master/fedml_core/distributed/communication/gRPC/grpc_comm_manager.py in commit 0fb63dd157e55ee603b7049568bf4c4ed0586e71), as commented in FederatedScope's codebase. This class is based on gRPC, a modern open-source high-performance Remote Procedure Call (RPC) framework. A gRPCCommManager (i) keeps the addresses of potential message receivers in a dict/list collection and (ii) has wrapper functions that call gRPC APIs for sending and receiving messages. Many similar variants of such wrapper functions have been widely adopted in related packages. Although FederatedScope complies with the Apache-2.0 License and included a declaration of FedML's copyright, in order to avoid the risk of unintended infringement and unnecessary disputes with FedML, we re-implement this class by referring to the examples in the gRPC tutorial in this commit.
