I am confused about the code on line 98 of "per-fedavg/perfedavg.py": param.data.sub_(self.beta * grad1 - self.beta * self.alpha * grad2)
According to the formula in the article, I think the term "self.beta * self.alpha * grad2" is missing a factor of "grad1".
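To make the question concrete, here is a minimal sketch of the update as I read it from the article: w <- w - beta * (I - alpha * H(w)) * grad f(w - alpha * grad f(w)), where H(w) is the Hessian of f at w. The sketch uses a toy quadratic loss and ignores the separate data batches; the names A, grad_fn, hvp_finite_diff and meta_step are my own and do not come from the repository. It also assumes that grad2 is meant to be the Hessian-vector product H(w) * grad1, which I have not confirmed from the code.

```python
# Sketch only: toy loss f(w) = 0.5 * sum(a_i * w_i^2) with a diagonal Hessian.
# All names are hypothetical and are not taken from perfedavg.py.

A = [2.0, 0.5]  # diagonal entries of the Hessian of the toy loss (assumption)


def grad_fn(w):
    """Gradient of the toy loss: component i is a_i * w_i."""
    return [a * x for a, x in zip(A, w)]


def hvp_finite_diff(w, v, delta=1e-3):
    """Approximate the Hessian-vector product H(w) v by central differences."""
    g_plus = grad_fn([x + delta * y for x, y in zip(w, v)])
    g_minus = grad_fn([x - delta * y for x, y in zip(w, v)])
    return [(p - m) / (2 * delta) for p, m in zip(g_plus, g_minus)]


def meta_step(w, alpha, beta):
    """One update w - beta * (I - alpha * H(w)) * grad f(w - alpha * grad f(w))."""
    # Inner step: w_tilde = w - alpha * grad f(w)
    g0 = grad_fn(w)
    w_tilde = [x - alpha * g for x, g in zip(w, g0)]
    # grad1: gradient evaluated at the adapted weights
    grad1 = grad_fn(w_tilde)
    # grad2: assumed to be H(w) applied to grad1, so grad1 is already inside it
    grad2 = hvp_finite_diff(w, grad1)
    # Same shape as the line in question: beta * grad1 - beta * alpha * grad2
    return [x - (beta * g1 - beta * alpha * g2)
            for x, g1, g2 in zip(w, grad1, grad2)]


new_w = meta_step([1.0, -2.0], alpha=0.1, beta=0.5)
```

If grad2 in the repository is computed like hvp_finite_diff here, then grad1 is already contained in it and the line matches the formula. If grad2 is something else, a grad1 factor still looks missing to me.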
In Section 5, the authors wrote: "Note that the model obtained by any of these three methods is later updated using one step of stochastic gradient descent at the test time".
May I ask why the test set data is used here instead of the validation set data? This confuses me a lot.