Comments (10)
> FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.
> In hierarchical FL, group parameter servers split the total client set into multiple client subsets.
> In decentralized FL, each client acts as a parameter server.
> Please refer to the following links for details:
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/hierarchical_fl
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/decentralized
@prosopher Thanks. But I guess he was discussing the distributed computing setting, not the standalone version.
from fedml.
@chaoyanghe Thanks for your detailed explanation. Perhaps I can try to implement it myself, and when I finish I would like to push it to your master branch.
from fedml.
BytePS targets data center-based distributed training, while FedML (e.g., FedAvg) targets edge-based distributed training. The particular assumptions of FL include:
- heterogeneous data distribution across devices (non-IID)
- resource-constrained edge devices (memory, computation, and communication)
- label deficiency (data points are harder to label because of privacy)
- security and privacy concerns

So what do you mean by "adding more servers in FedAvg"?
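The non-IID assumption above is commonly simulated with a Dirichlet split over labels. A minimal sketch (illustrative only, not FedML's partitioning API; smaller alpha means more skew):

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Assign each sample index to one client, skewing per-class shares."""
    rng = np.random.default_rng(seed)
    clients = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Draw per-client proportions for this class, then split indices.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.array([0, 1] * 50)                      # toy balanced dataset
parts = dirichlet_partition(labels, n_clients=4, alpha=0.1)
assert sum(len(p) for p in parts) == len(labels)    # every sample assigned once
```

With alpha as low as 0.1, most clients end up dominated by one class, which is the kind of label skew FedAvg-style algorithms are evaluated against.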
from fedml.
> BytePS targets data center-based distributed training, while FedML (e.g., FedAvg) targets edge-based distributed training. The particular assumptions of FL include:
> - heterogeneous data distribution across devices (non-IID)
> - resource-constrained edge devices (memory, computation, and communication)
> - label deficiency (data points are harder to label because of privacy)
> - security and privacy concerns
>
> So what do you mean by "adding more servers in FedAvg"?
I mean adding more parameter servers to improve communication efficiency. Perhaps this is only suitable in a cluster environment, not a true federated learning environment with resource-constrained edge devices. However, it can still accelerate training when doing research.
from fedml.
FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.
In hierarchical FL, group parameter servers split the total client set into multiple client subsets.
In decentralized FL, each client acts as a parameter server.
Please refer to the following links for details:
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/hierarchical_fl
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/decentralized
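The two-level averaging behind hierarchical FL can be sketched as follows (illustrative only, not the FedML implementation; plain NumPy, with toy 1-D "models"):

```python
import numpy as np

def fedavg(models, weights):
    """Weighted average of model vectors, as in vanilla FedAvg."""
    w = np.asarray(weights, dtype=float)
    return np.average(np.stack(models), axis=0, weights=w)

# 4 clients holding 1-D "models", split into 2 groups of 2 clients each.
client_models = [np.array([1.0]), np.array([3.0]), np.array([5.0]), np.array([7.0])]
client_sizes  = [10, 10, 10, 10]
groups = [(0, 1), (2, 3)]

# Level 1: each group parameter server averages only its own clients.
group_models, group_sizes = [], []
for g in groups:
    group_models.append(fedavg([client_models[i] for i in g],
                               [client_sizes[i] for i in g]))
    group_sizes.append(sum(client_sizes[i] for i in g))

# Level 2: the global server averages the group results.
global_model = fedavg(group_models, group_sizes)
```

With sample-count weighting at both levels, the result matches flat FedAvg over all clients; the benefit is that each group server only talks to its own subset, cutting traffic to the global server.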
from fedml.
@wizard1203 Thanks for your suggestion. As for acceleration, FedML is the only research-oriented FL framework that supports cross-machine multi-GPU distributed training. To accelerate further, we can certainly use many techniques from traditional distributed training (a very mature area that receives much less research attention). I will elaborate on a few here:
- AllReduce-based GPU-GPU communication using InfiniBand. However, this is not a real FL setting. As you said, it is only useful for evaluating algorithms or models that are not sensitive to the training speed.
- Hybrid parallelism (model + data parallelism) + pipelining
- Bucketing BP (gradient bucketing, as introduced in the PyTorch VLDB 2020 paper)
- low-bit precision (e.g., half precision)
- pruning
- ...
As @prosopher pointed out, you can design any topology you like; our topology configuration is very flexible. In the distributed computing setting, you can refer to the following algorithms with different topologies:
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed
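For the decentralized topology mentioned above, one round of gossip averaging over a ring can be sketched like this (illustrative only; FedML's actual topology configuration differs):

```python
import numpy as np

def gossip_round(models, mix=1/3):
    """One gossip step on a ring: each client mixes equally with its two neighbors."""
    n = len(models)
    out = []
    for i in range(n):
        left, right = models[(i - 1) % n], models[(i + 1) % n]
        out.append(mix * (left + models[i] + right))
    return out

# 4 clients on a ring, each starting from a different 1-D "model".
models = [np.array([float(i)]) for i in range(4)]
for _ in range(50):
    models = gossip_round(models)
# Repeated mixing drives every client toward the global mean (1.5 here),
# with no central parameter server involved.
```

The uniform 1/3 weights form a doubly stochastic mixing matrix, so the average is preserved every round and all clients converge to it; more elaborate topologies just change that matrix.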
In addition, I have to point out that "adding more parameter servers to improve the communication efficiency" is a bit confusing conceptually. We cannot say that using more computational resources improves communication efficiency. Normally, the relationship between computation and communication is a trade-off: using more parallel computation does not change the communication itself, and it does not necessarily speed up training, since cross-machine communication may dominate the training time. But I agree with your idea of borrowing traditional distributed computing techniques to accelerate FL research. Thanks.
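The communication side of this trade-off is easy to quantify for the half-precision item above: casting gradients to float16 halves the payload per round, but leaves the number of rounds unchanged and introduces a small cast error. A back-of-the-envelope sketch (names are illustrative, not a FedML API):

```python
import numpy as np

# A synthetic 1M-parameter gradient, as a stand-in for one communication round.
grad = np.random.default_rng(0).standard_normal(1_000_000).astype(np.float32)

payload_fp32 = grad.nbytes                      # bytes sent at full precision
payload_fp16 = grad.astype(np.float16).nbytes   # bytes sent at half precision
assert payload_fp16 * 2 == payload_fp32         # exactly half the traffic

# The round-trip cast introduces a small quantization error per element.
err = np.abs(grad.astype(np.float16).astype(np.float32) - grad).max()
```

So compression reduces bytes per round, not rounds; whether the end-to-end training time drops still depends on whether communication or computation dominates.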
from fedml.
> FedML supports multiple parameter servers for communication efficiency via hierarchical FL and decentralized FL.
> In hierarchical FL, group parameter servers split the total client set into multiple client subsets.
> In decentralized FL, each client acts as a parameter server.
> Please refer to the following links for details:
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/hierarchical_fl
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/standalone/decentralized
@prosopher Thanks for this; I will read them carefully.
from fedml.
> @chaoyanghe Thanks for your detailed explanation. Perhaps I can try to implement it myself, and when I finish I would like to push it to your master branch.
Thanks. Looking forward to your contribution.
from fedml.
@wizard1203 Do you mean modifying based on this code?
https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed/fedavg
from fedml.
> @wizard1203 Do you mean modifying based on this code?
> https://github.com/FedML-AI/FedML/tree/master/fedml_experiments/distributed/fedavg
@chaoyanghe No, it probably needs to be based on the code in fedml_core. In any case, I may try it in a few days; in fact, there are some other algorithms I want to implement more urgently than this.
from fedml.
Related Issues (20)
- fed_cifar10 sample does not download the dataset correctly
- KeyError. msg_type = 5. Please check whether you launch the server or client with the correct args.rank HOT 1
- Where can I find FedGraphNN? HOT 2
- On the problem of gradient processing in FedML HOT 1
- Running fedml.run_simulation() raises TypeError: bind_simulation_device() takes 2 positional arguments but 3 were given HOT 4
- where is FedGraphNN HOT 3
- FedOpt for cross-silo HOT 2
- trained model path in single process simulation examples
- The compatibility issues of Nvidia Jetson
- Quickstart Guide
- log_file_dir arg not work
- Rookie question HOT 1
- Importing ServerManager, ClientManager, and CommManager from fedml.core.distributed shows an error
- Which communication protocol and serialization method is supported?
- typo "salve" instead of "slave" in identifiers
- possible bug in python/fedml/core/distributed/communication/trpc/utils.py
- FedGraphnn -- wandb utilization HOT 2
- [FedML-HE] How is the merging of decrypted weights done? HOT 1
- In Fed-ML HE example, the client model weights are not encrypted.
- bind_simulation_device() takes 2 positional arguments but 3 were given