
litian96 / fedprox


Federated Optimization in Heterogeneous Networks (MLSys '20)

License: MIT License

Python 93.09% Shell 6.91%
parallel-learning distributed-optimization large-scale-learning federated-optimization

fedprox's People

Contributors

litian96


fedprox's Issues

module version

Could you pin the correct version of each module in requirements.txt? Otherwise pip downloads the latest versions.
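A quick way to see which requirements are not pinned is to scan the file for lines without `==`. The sketch below is only an illustration: the function name `unpinned_requirements` and the example entries other than `tensorflow-gpu==1.10` (which appears in another issue on this page) are hypothetical and do not reflect the repository's actual requirements.txt.

```python
def unpinned_requirements(lines):
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if "==" not in line:
            unpinned.append(line)
    return unpinned


# Hypothetical file contents, for illustration only.
example = ["tensorflow-gpu==1.10", "numpy", "scipy>=1.0  # loose bound", ""]
print(unpinned_requirements(example))
```

To check a real file, pass `open("requirements.txt").read().splitlines()` instead of the example list.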

Where is gamma in the code implementation?

According to Algorithm 2, the input includes a parameter gamma, which measures how much local computation is performed to solve the local subproblem on device k at the t-th round.
[image]
But I can't find gamma in the code implementation.
In https://github.com/litian96/FedProx/blob/master/flearn/models/mnist/mclr.py there is only a variable num_epochs:
    def solve_inner(self, data, num_epochs=1, batch_size=32):
        '''Solves local optimization problem'''
        for _ in trange(num_epochs, desc='Epoch: ', leave=False, ncols=120):
            for X, y in batch_data(data, batch_size):
                with self.graph.as_default():
                    self.sess.run(self.train_op,
                        feed_dict={self.features: X, self.labels: y})
        soln = self.get_params()
        comp = num_epochs * (len(data['y'])//batch_size) * batch_size * self.flops
        return soln, comp
So could you please help me find gamma?
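For reference, the paper defines a gamma-inexact solution w* of the local subproblem h_k(w; w^t) = F_k(w) + (mu/2) ||w - w^t||^2 by the condition ||grad h_k(w*; w^t)|| <= gamma * ||grad h_k(w^t; w^t)||. A fixed num_epochs can be read as a proxy for gamma: more local work corresponds to a smaller gamma. The sketch below makes that link concrete on a toy quadratic loss. The loss, the helper names, and the step size are all assumptions chosen for illustration, not code from the repository.

```python
import math


def grad_local(w, a, b):
    """Gradient of the toy local loss F_k(w) = 0.5 * sum(a_i * (w_i - b_i)^2)."""
    return [ai * (wi - bi) for wi, ai, bi in zip(w, a, b)]


def grad_h(w, w_global, a, b, mu):
    """Gradient of h_k(w; w^t) = F_k(w) + (mu / 2) * ||w - w^t||^2."""
    g_local = grad_local(w, a, b)
    return [g + mu * (wi - wg) for g, wi, wg in zip(g_local, w, w_global)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def solve_until_gamma_inexact(w_global, a, b, mu, gamma, lr=0.1, max_steps=10_000):
    """Take gradient steps on h_k until ||grad h(w)|| <= gamma * ||grad h(w^t)||."""
    target = gamma * norm(grad_h(w_global, w_global, a, b, mu))
    w = list(w_global)  # local solve starts from the global model
    for step in range(max_steps):
        g = grad_h(w, w_global, a, b, mu)
        if norm(g) <= target:
            return w, step
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w, max_steps


w_t = [0.0, 0.0]
a, b = [1.0, 2.0], [1.0, -1.0]
for gamma in (0.5, 0.1, 0.01):
    _, steps = solve_until_gamma_inexact(w_t, a, b, mu=1.0, gamma=gamma)
    print(f"gamma={gamma}: {steps} local steps")
```

In this toy setting a smaller gamma needs more local steps, which is the sense in which a fixed number of epochs implicitly selects a gamma for each device.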

The FEMNIST data generation

In my_sample.py, which generates the FEMNIST data, the < in one line seems like it should be >. Otherwise, the samples retrieved for a given class are always the same ones from the beginning. I checked the data files shared on Google Drive, and each user does have several identical images within the same class.

All clients are sharing the same underlying learner.

self.clients = self.setup_clients(dataset, self.client_model)

Please take a look at this line. It seems that all clients are using the same ML model for local training. In other words, there is no local model, but a global model which is sequentially trained on each client.

This can be verified by the following code snippet (I have tested it on flearn/trainers/fedavg.py).

            csolns = []  # buffer for receiving client solutions

            lastc = None
            for idx, c in enumerate(active_clients.tolist()):  # simply drop the slow devices
                print(i, idx)
                if lastc is not None:
                    for j in range(len(lastc)):
                        print("Are the current client's parameters (before training) the same as the previous client's parameters (after training)?: %s" % (c.get_params()[j] == lastc[j]).all())
                    from time import sleep
                    sleep(1)
                else:
                    print('The first client.')
                # communicate the latest model
                c.set_params(self.latest_model)

                # solve minimization locally
                soln, stats = c.solve_inner(num_epochs=self.num_epochs, batch_size=self.batch_size)
                lastc = c.get_params()

                # gather solutions from client
                csolns.append(soln)

                # track communication cost
                self.metrics.update(rnd=i, cid=c.id, stats=stats)

            # update models
            self.latest_model = self.aggregate(csolns)

In my opinion, this is not expected for federated learning.
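One thing worth checking here: the comparison in the snippet runs before c.set_params(self.latest_model), so with a shared model object it will always report the previous client's trained parameters. What matters for correctness is whether the parameters are reset to the global model before each local solve. The toy sketch below, with a hypothetical ToyModel class and a scalar parameter, compares a shared model object against one fresh model per client. It assumes set_params overwrites all trainable state. Any optimizer state that survives set_params (momentum, for instance) is not modeled.

```python
class ToyModel:
    """Hypothetical stand-in for a model with get_params/set_params/solve_inner."""

    def __init__(self):
        self.w = 0.0

    def get_params(self):
        return self.w

    def set_params(self, w):
        self.w = w

    def solve_inner(self, target, steps=5, lr=0.1):
        # A few gradient steps on the toy loss 0.5 * (w - target)^2.
        for _ in range(steps):
            self.w -= lr * (self.w - target)
        return self.w


def run_round(global_w, client_targets, shared):
    """Train every client once, using one shared model or one model per client."""
    shared_model = ToyModel()
    solutions = []
    for target in client_targets:
        model = shared_model if shared else ToyModel()
        model.set_params(global_w)  # reset to the global model before local training
        solutions.append(model.solve_inner(target))
    return solutions


targets = [1.0, -2.0, 3.0]
print(run_round(0.5, targets, shared=True))
print(run_round(0.5, targets, shared=False))
```

Under these assumptions both variants return the same client solutions, so sharing one object saves memory without turning the round into sequential training.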

problem about setup

Hello, when I tried to run this code and ran pip3 install -r requirements.txt in my Anaconda environment, the following error appeared:
[image]
How do I solve it? Thanks.

No module named 'flearn.models.nist.stacked_lstm'

Hi,
when I run main.py I get this error:
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py', wdir='C:/Users/Administrator/Desktop/federated learning/code/FedProx-master')

File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
execfile(filename, namespace)

File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 130, in
main()

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 118, in main
options, learner, optimizer = read_options()

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 94, in read_options
mod = importlib.import_module(model_path)

File "E:\anaconda\Anaconda\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)

File "", line 978, in _gcd_import

File "", line 961, in _find_and_load

File "", line 936, in _find_and_load_unlocked

File "", line 205, in _call_with_frames_removed

File "", line 978, in _gcd_import

File "", line 961, in _find_and_load

File "", line 948, in _find_and_load_unlocked

ModuleNotFoundError: No module named 'flearn.models.nist.stacked_lstm'

Indeed, flearn.models.nist.stacked_lstm does not exist in the repository. Why?
Thank you so much.
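The traceback shows main.py passing a dotted module path to importlib.import_module, so the error means that no module exists for this dataset and model combination. A small check before importing can give a clearer message. The sketch below assumes the path follows the pattern flearn.models.<dataset>.<model>, which is inferred from the error message rather than confirmed from the source. The helper names are hypothetical.

```python
import importlib.util


def module_exists(dotted_name):
    """Return True if the module can be located (parent packages get imported)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package in the dotted path is missing.
        return False


def model_module_path(dataset, model, package="flearn.models"):
    """Assumed naming scheme: <package>.<dataset>.<model>."""
    return f"{package}.{dataset}.{model}"


path = model_module_path("nist", "stacked_lstm")
if not module_exists(path):
    print(f"{path} not found; check the --dataset/--model combination")
```

Run from the repository root, this reports whether a model file exists for the chosen dataset before training starts.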

ModuleNotFoundError: No module named 'FedML'

After running this: !python experiments/centralized/moleculenet/molecule_classification_multilabel.py

Getting this Error Message:
Traceback (most recent call last):
File "experiments/centralized/moleculenet/molecule_classification_multilabel.py", line 11, in
from data_preprocessing.molecule.data_loader import get_dataloader, get_data
File "/content/drive/My Drive/Colab Notebooks/FedGraphNN/data_preprocessing/molecule/data_loader.py", line 12, in
from FedML.fedml_core.non_iid_partition.noniid_partition import partition_class_samples_with_dirichlet_distribution
ModuleNotFoundError: No module named 'FedML'

Should the global model replace the client model?

Hi, I read your paper and code, and this work has inspired much of my own work on federated learning optimization. I am trying to reproduce FedProx in PyTorch and I am confused about a small detail. In the algorithm in the paper, the local client model does not seem to be replaced by the global model, i.e. there is no step $w_k^t = w^t$.

image

But when I read your code, I found that there is actually a REPLACE operation.

self.latest_model = self.aggregate(csolns)
self.client_model.set_params(self.latest_model)

I also found a similar operation in a PyTorch replication repository:

FedMA

https://github.com/IBM/FedMA/blob/4b586a5a22002dc955d025b890bc632daa3c01c7/main.py#L863-L883

Q1: Actually, should I use this aggregated model to replace the local client model after aggregation?

Q2: Without the replacement, the procedure can be interpreted as the local model $w_k^t$ trying to approximate the global model $w^t$. From another point of view, does this help alleviate the catastrophic forgetting problem?

If I have misunderstood something, please let me know. I look forward to hearing from you.
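For reference, one round of Algorithm 2 as I read it from the paper can be written as follows, where $S_t$ is the set of $K$ devices selected at round $t$. The algorithm sends $w^t$ to each selected device but does not state where the local solver starts. Setting the client model to $w^t$ before the local solve is one natural choice, and that is what the replace operation does.

```latex
\begin{align*}
w_k^{t+1} &\approx \arg\min_{w}\; h_k(w;\, w^t)
  = F_k(w) + \frac{\mu}{2}\,\lVert w - w^t \rVert^2
  && \text{(local solve on device } k \in S_t\text{)} \\
w^{t+1} &= \frac{1}{K} \sum_{k \in S_t} w_k^{t+1}
  && \text{(server aggregation)}
\end{align*}
```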

python version

Which Python version are you using? With Python 3.6 some packages fail to import, so I switched to 3.5, but 3.5 has been deprecated and the dependencies can no longer be downloaded.

Dynamic μ

Does the current implementation provide the option for heuristic μ as discussed in "C.3.3 Adaptively setting μ" from https://arxiv.org/pdf/1812.06127.pdf?

We decrease μ by 0.1 when the loss continues to decrease for 5 rounds and increase μ by 0.1 when we see the loss increase.

I assume that you mean that you use the same μ for all clients, and that you refer to the global loss, right?

Thank you
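The quoted heuristic can be sketched as a small controller that tracks a single loss value per round. This is only one reading of the sentence from the paper, not code from the repository. Using one shared μ driven by the global loss, resetting the counter after each change, leaving μ unchanged when the loss is flat, and clamping μ at zero are all assumptions.

```python
class AdaptiveMu:
    """Hypothetical controller for the 'adaptively setting mu' heuristic."""

    def __init__(self, mu=0.0, step=0.1, patience=5):
        self.mu = mu
        self.step = step
        self.patience = patience
        self.prev_loss = None
        self.decreasing_rounds = 0

    def update(self, loss):
        """Feed the loss of the latest round and return the mu for the next one."""
        if self.prev_loss is not None:
            if loss > self.prev_loss:
                # Loss went up: strengthen the proximal term.
                self.mu += self.step
                self.decreasing_rounds = 0
            elif loss < self.prev_loss:
                self.decreasing_rounds += 1
                if self.decreasing_rounds >= self.patience:
                    # Loss fell for `patience` rounds in a row: relax the term.
                    self.mu = max(0.0, self.mu - self.step)  # assumed clamp at 0
                    self.decreasing_rounds = 0
        self.prev_loss = loss
        return self.mu


controller = AdaptiveMu(mu=1.0)
for round_loss in [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.6]:
    print(round_loss, controller.update(round_loss))
```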

Algorithm 2 inconsistent with code

Thanks for the work :)
I have read the code and the corresponding issue #10, but some parts still seem inconsistent with the paper. Please correct me if I am wrong.

  • In Algorithm 2, line 7, we compute the norm of the difference between the local model and the global model. But the code uses the l2 norm of the local model alone, without considering the global model. Take mnist/mclr.py line 40 for example.

  • I also checked the NLP experiments on Shakespeare, but I didn't find the regularization part in create_model (shakespeare/stacked_lstm.py, create_model).

Thank you!
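To make the distinction in this issue concrete, the sketch below contrasts a plain gradient step, a step with ordinary l2 regularization, and a step with the proximal term from the paper. The proximal term does not have to appear in the model's loss: it can be applied inside the optimizer update, which may be why it is not visible in create_model. That is an assumption to verify against the fedprox and pgd code mentioned in another issue on this page. The function names are hypothetical.

```python
def sgd_step(w, grad, lr):
    """Plain gradient step on the local loss F_k."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def l2_step(w, grad, lr, lam):
    """Step on F_k(w) + (lam / 2) * ||w||^2: pulls w toward zero."""
    return [wi - lr * (gi + lam * wi) for wi, gi in zip(w, grad)]


def proximal_step(w, grad, w_global, lr, mu):
    """Step on F_k(w) + (mu / 2) * ||w - w^t||^2: pulls w toward w^t."""
    return [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, grad, w_global)]


# At w == w^t the proximal term has zero gradient; the l2 term does not.
w, w_global, grad = [1.0, 2.0], [1.0, 2.0], [0.5, -0.5]
print(sgd_step(w, grad, lr=0.1))
print(l2_step(w, grad, lr=0.1, lam=1.0))
print(proximal_step(w, grad, w_global, lr=0.1, mu=1.0))
```

With mu set to 0 the proximal step reduces to the plain step, which matches running the fedprox optimizer with --mu=0.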

fig1

problems when run shakespeare and sent140

Dear Tian,
When I run the command below on a CPU:
python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
    --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
    --mu=0 --eval_every=1 --batch_size=10 \
    --num_epochs=1 \
    --model='stacked_lstm' | tee logs/'logs_sent140_mu0_E1_fedprox'

it runs very slowly, and worse, the outputs keep repeating the same numbers. The result is below:
5726 Clients in Total
Training with 10 workers ---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914

The same thing happens when I run shakespeare, but mnist and nist perform well.
How can I solve this? Is there something wrong with stacked_lstm?
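Two patterns in the log may help narrow this down. The test accuracy only ever takes two values that sum to 1, which is what a binary classifier produces when it predicts a single class for every example and the predicted class flips between rounds. The round-0 training loss is also very close to ln 2, the cross-entropy of a binary classifier that outputs 0.5 for everything. The sketch below checks both observations. It is a diagnostic aid under those assumptions, not a fix, and the function name is hypothetical.

```python
import math


def looks_collapsed(accuracies, tol=1e-9):
    """True if every accuracy equals one of two values p and 1 - p."""
    p = accuracies[0]
    return all(abs(a - p) < tol or abs(a - (1.0 - p)) < tol for a in accuracies)


# Test accuracies copied from rounds 0-3 of the log above.
acc = [0.4060871469235822, 0.5939128530764178,
       0.4060871469235822, 0.5939128530764178]
round0_loss = 0.6931471925528921  # training loss at round 0 in the log

print(looks_collapsed(acc))
print(abs(round0_loss - math.log(2)))  # distance from the loss of a 0.5 predictor
```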

about personalized FL

In FedProx, only the performance on each client's own test set is evaluated, with no global test set. I know that personalized FL sometimes focuses on the clients' test sets, but why not compare personalized FL against purely local training on each client?
If only the clients' own test sets matter, I think comparison experiments against the performance of local training are necessary.

No model named 'mnist.cnn'

Hi there,
I see that there is an option to use a CNN on the MNIST dataset, but I don't see its implementation among the models.
Could you provide it later?

BTW, I was also at ICML this year but was unable to attend the poster session. Could you put your poster on your homepage as well?

H.

Tensorflow installation

Hi, I ran into this problem on both macOS and Windows:

~ % pip install tensorflow-gpu==1.10
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.10 (from versions: none)
ERROR: No matching distribution found for tensorflow-gpu==1.10

The same happens with pip3.

Did I miss anything?
Thanks

Obtain \nabla h_k(w_t, w_t) in FedProx

Hi,

I studied your paper and code, and I am trying to obtain \nabla h_k(w_t, w_t) to use as a local optimization criterion. In the fedprox and pgd code, it is not clear to me where the gradients \nabla h_k(w, w_t) are evaluated. Could you help me with this?

If I can understand where these gradients are evaluated, I could simply pass w_t, the self.latest_model in fedprox, to this function instead of using the local model.

Best Regards,
Mairton
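A short derivation may help here. Using the paper's definition of the local subproblem, the proximal term contributes nothing to the gradient at $w = w_t$, so $\nabla h_k(w_t; w_t)$ is simply the gradient of the local loss at the global model. Where this is evaluated in the code is not something the derivation settles.

```latex
\begin{align*}
h_k(w;\, w_t) &= F_k(w) + \frac{\mu}{2}\,\lVert w - w_t \rVert^2 \\
\nabla_w h_k(w;\, w_t) &= \nabla F_k(w) + \mu\,(w - w_t) \\
\nabla_w h_k(w_t;\, w_t) &= \nabla F_k(w_t)
\end{align*}
```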
