
Comments (25)

ferasbg commented on August 17, 2024

To resolve the server-side metrics issue, I need to process the log data passed from app.py within server.app during federated server-side parameter evaluation, so that I can collect the dictionary objects and logs needed to create my plots directly. After each round I currently get inadequate data on both the client side and the server side. To fix this, I need to wrap the strategy on the server side to stream in the FitRes data, along with more extensive log information from the functions in app.py. I can emulate other code that works around this limitation on the user's end by extending the dictionary collection on the client side and server side in a more coherent way that doesn't lose information. I have no direct, concrete solution yet; I'll re-evaluate my current understanding and work toward a solution that addresses the framework's limitations, its lack of communication-level visibility, and the data I need to stream and store.
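
As a first pass, something like the following hedged sketch could work (the class name and fit_metrics_history attribute are mine, not existing code; it assumes the older flwr strategy API used elsewhere in this thread, where aggregate_fit receives (rnd, results, failures) and FitRes carries a metrics dict):

from typing import Dict, List, Tuple

import flwr as fl
from flwr.common import FitRes, Scalar
from flwr.server.client_proxy import ClientProxy


class MetricsLoggingFedAvg(fl.server.strategy.FedAvg):
    """Hypothetical wrapper (not existing code): keep every client's FitRes
    metrics per round so the plotting code can read them after training."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # round number -> list of per-client metrics dicts
        self.fit_metrics_history: Dict[int, List[Dict[str, Scalar]]] = {}

    def aggregate_fit(
        self,
        rnd: int,
        results: List[Tuple[ClientProxy, FitRes]],
        failures: List[BaseException],
    ):
        # Stash the raw client metrics before aggregation throws them away
        self.fit_metrics_history[rnd] = [fit_res.metrics for _, fit_res in results]
        return super().aggregate_fit(rnd, results, failures)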

ferasbg commented on August 17, 2024

Test EvaluateRes and FitRes as return parameters in order to expose more log information during client-side training and evaluation.
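
Concretely, whatever the client packs into the third element returned by fit/evaluate comes back to the server as FitRes.metrics / EvaluateRes.metrics. A minimal sketch against the older flwr NumPyClient API, with model and data names as placeholders:

import flwr as fl


class AdvRegClient(fl.client.NumPyClient):
    def __init__(self, model, x_train, y_train, x_test, y_test):
        self.model = model
        self.x_train, self.y_train = x_train, y_train
        self.x_test, self.y_test = x_test, y_test

    def get_parameters(self):
        return self.model.get_weights()

    def fit(self, parameters, config):
        self.model.set_weights(parameters)
        history = self.model.fit(self.x_train, self.y_train, epochs=1, verbose=0)
        # Everything placed in this dict reaches the server as FitRes.metrics
        metrics = {k: float(v[-1]) for k, v in history.history.items()}
        return self.model.get_weights(), len(self.x_train), metrics

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)
        loss, accuracy = self.model.evaluate(self.x_test, self.y_test, verbose=0)
        # Extra keys here surface in EvaluateRes.metrics on the server side
        return float(loss), len(self.x_test), {"accuracy": float(accuracy)}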

ferasbg commented on August 17, 2024

Reveal the training history for the clients so that the training metrics can be accessed before aggregation (and thus before they affect the server model), for measuring adversarially regularized accuracy and loss information (dicts).

  • write a script to iterate over all the baselines + the experimental model / IV
  • process the log information that stores the clients' training history as they run through the files executed on the flwr backend

ferasbg commented on August 17, 2024

Write a client-side function that acts as a logger or processor of log information from the flwr backend while the Client object is running on the flwr instance.
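
One low-effort sketch of this, assuming Flower emits its logs under the "flwr" logger name: attach a file handler before starting the client so the backend's log stream is captured for later processing (function name and file path are arbitrary):

import logging


def attach_client_log_file(path: str = "client_flwr.log") -> logging.Logger:
    # Mirror everything the flwr backend logs while the client runs into a file
    flwr_logger = logging.getLogger("flwr")
    handler = logging.FileHandler(path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)s | %(message)s")
    )
    flwr_logger.addHandler(handler)
    flwr_logger.setLevel(logging.DEBUG)
    return flwr_logger


# Call this before fl.client.start_numpy_client(...) so the whole run is captured.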

ferasbg commented on August 17, 2024

Use this and these code references (this, and this)

ferasbg commented on August 17, 2024

Write a wrapper for the strategies so that the server-side log information is returned, and then use aggregate_evaluate with the client-side objects to return the regularization training dictionaries before the weights are aggregated into the server model.

ferasbg commented on August 17, 2024

We can aggregate individual client metrics via the following:

from typing import List, Optional, Tuple

import flwr as fl
from flwr.common import EvaluateRes
from flwr.server.client_proxy import ClientProxy


class AggregateCustomMetricStrategy(fl.server.strategy.FedAvg):
    def aggregate_evaluate(
        self,
        rnd: int,
        results: List[Tuple[ClientProxy, EvaluateRes]],
        failures: List[BaseException],
    ) -> Optional[float]:
        """Aggregate evaluation losses using weighted average."""
        if not results:
            return None

        # Weigh accuracy of each client by number of examples used
        accuracies = [r.metrics["accuracy"] * r.num_examples for _, r in results]
        examples = [r.num_examples for _, r in results]

        # Aggregate and print custom metric
        accuracy_aggregated = sum(accuracies) / sum(examples)
        print(f"Round {rnd} accuracy aggregated from client results: {accuracy_aggregated}")

        # Call aggregate_evaluate from base class (FedAvg)
        return super().aggregate_evaluate(rnd, results, failures)

# Create strategy and run server
strategy = AggregateCustomMetricStrategy(
    # (same arguments as FedAvg here)
)
fl.server.start_server(strategy=strategy)

ferasbg commented on August 17, 2024
# Excerpt adapted from an older Flower strategy with completion-rate checks;
# the imports, class wrapper, and __init__ below are added here so the snippet
# reads as a complete unit (older flwr API: parameters_to_weights /
# weights_to_parameters, and EvaluateRes still carrying an .accuracy field).
from typing import Dict, List, Optional, Tuple

import flwr as fl
from flwr.common import (
    EvaluateRes,
    FitRes,
    Parameters,
    Scalar,
    parameters_to_weights,
    weights_to_parameters,
)
from flwr.server.client_proxy import ClientProxy
from flwr.server.strategy.aggregate import aggregate, weighted_loss_avg


class CompletionRateFedAvg(fl.server.strategy.FedAvg):
    def __init__(
        self,
        completion_rate_fit: float = 0.5,
        completion_rate_evaluate: float = 0.5,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.completion_rate_fit = completion_rate_fit
        self.completion_rate_evaluate = completion_rate_evaluate

    def aggregate_fit(
        self,
        rnd: int,
        results: List[Tuple[ClientProxy, FitRes]],
        failures: List[BaseException],
    ) -> Tuple[Optional[Parameters], Dict[str, Scalar]]:
        """Aggregate fit results using weighted average."""
        if not results:
            return None, {}
        # Check if enough results are available
        completion_rate = len(results) / (len(results) + len(failures))
        if completion_rate < self.completion_rate_fit:
            # Not enough results for aggregation
            return None, {}
        # Convert results to (weights, num_examples) pairs, then aggregate
        weights_results = [
            (parameters_to_weights(fit_res.parameters), fit_res.num_examples)
            for client, fit_res in results
        ]
        return weights_to_parameters(aggregate(weights_results)), {}

    def aggregate_evaluate(
        self,
        rnd: int,
        results: List[Tuple[ClientProxy, EvaluateRes]],
        failures: List[BaseException],
    ) -> Tuple[Optional[float], Dict[str, Scalar]]:
        """Aggregate evaluation losses using weighted average."""
        if not results:
            return None, {}
        # Check if enough results are available
        completion_rate = len(results) / (len(results) + len(failures))
        if completion_rate < self.completion_rate_evaluate:
            # Not enough results for aggregation
            return None, {}
        # weighted_loss_avg expects (num_examples, loss, accuracy) tuples in this API
        return (
            weighted_loss_avg(
                [
                    (
                        evaluate_res.num_examples,
                        evaluate_res.loss,
                        evaluate_res.accuracy,
                    )
                    for client, evaluate_res in results
                ]
            ),
            {},
        )

ferasbg commented on August 17, 2024

https://github.com/adap/flower/blob/79bcf952e746ed0d1544a30cea251ae082494666/src/py/flwr/server/strategy/aggregate_test.py

ferasbg commented on August 17, 2024

If we are adapting the defense to surface variations and corrupted data, that's not where our contribution comes from. Instead, our contribution comes from integrating the regularization into the adaptive strategy not in an additive manner, but interoperably, as its own function. Beyond that, the question is whether neural structured learning is proven (C1) to provide better results given the adaptive strategy (C2) and given communication optimizations (C3). The secure aggregation method should be replaced by an adversarially regularized adaptive strategy that operates on the adversarially regularized gradient information, assuming the constraints of the adversarial perturbation norm constant (ε = 0.05) and norm type (l-inf). We can compare adversarial regularization techniques to show that neural structured learning is more effective given an adaptive strategy fitted to the adversarial regularization technique (C4).
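
As a concrete anchor for those constraints, a minimal neural-structured-learning configuration sketch; base_model and the multiplier value are placeholders rather than settled choices:

import neural_structured_learning as nsl
import tensorflow as tf

# Placeholder Keras model; the real one comes from the project's model code.
base_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1), name="feature"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

adv_config = nsl.configs.make_adv_reg_config(
    multiplier=0.2,           # weight of the adversarial loss term (illustrative)
    adv_step_size=0.05,       # ε: norm bound of the perturbation
    adv_grad_norm="infinity", # l-inf norm type
)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=["label"], adv_config=adv_config
)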

ferasbg commented on August 17, 2024

That requires reviewing this paper and using a modified set of equations, in terms of the input/output variables, from the equations specific to the neural structured learning process (whether that's combining multiple losses over adversarial neighbors into an adversarial loss, etc.). This way, we can modify the constraints of the input to the adaptive strategy, which just takes in the gradients of each local client model and then applies its procedure independently of secure aggregation's method definition.
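
For reference, the kind of per-client objective being modified can be written in the standard adversarial-regularization form (my own restatement, not an equation lifted from the linked paper):

% Per-client adversarially regularized objective on local data D_k
% (supervised loss plus worst-case loss inside the l-infinity ball of radius epsilon):
\min_{\theta} \; \mathbb{E}_{(x,y) \sim \mathcal{D}_k}
\Big[ \ell\big(y, f_\theta(x)\big)
  + \lambda \max_{\|\delta\|_\infty \le \epsilon} \ell\big(y, f_\theta(x + \delta)\big) \Big],
\qquad \epsilon = 0.05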

ferasbg commented on August 17, 2024

What would make a robust (adversarially regularized) model brittle? The brittleness would arise between the point where the gradients are adversarially regularized and the point where the local client models carrying those gradients (with either constant or differing surface variations, corruptions, or other perturbations) are aggregated. This requires understanding FedAdagrad versus FedAvg.
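
For comparison, the server updates in the FedOpt framing look roughly as follows (paraphrasing Reddi et al., 2021; notation may differ slightly from the paper):

% Average client update returned after local (adversarially regularized) training:
\Delta_t = \tfrac{1}{|S_t|} \sum_{i \in S_t} \big( x_i^t - x_t \big)

% FedAvg-style server step: plain SGD on the averaged update
x_{t+1} = x_t + \eta \, \Delta_t

% FedAdagrad: per-coordinate adaptive scaling of the same averaged update
m_t = \beta_1 m_{t-1} + (1 - \beta_1) \Delta_t, \qquad
v_t = v_{t-1} + \Delta_t^2, \qquad
x_{t+1} = x_t + \eta \, \frac{m_t}{\sqrt{v_t} + \tau}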

ferasbg commented on August 17, 2024

Metrics

Creating all the plots for the initial paper.

Parameters & Constants

  • Rounds: 10-4000
  • Clients: 10-100
  • Client Learning Rate: 0.1-1.0
  • Server Learning Rate: 0.1-1.0
  • Epsilon-Robust Perturbation Value: 0.05 (Server-Side)
  • Perturbation Norm Type: L-infinity
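
For bookkeeping, these could be pinned down in a single experiment config; a hypothetical sketch using the values above (key names are placeholders):

# Hypothetical experiment config collecting the constants listed above;
# the keys are placeholders, the values come from the ranges in this comment.
EXPERIMENT_CONFIG = {
    "num_rounds": 4000,            # swept over 10-4000
    "num_clients": 100,            # swept over 10-100
    "client_learning_rate": 0.1,   # swept over 0.1-1.0
    "server_learning_rate": 0.1,   # swept over 0.1-1.0
    "adv_step_size": 0.05,         # epsilon-robust perturbation value (server-side)
    "adv_grad_norm": "infinity",   # perturbation norm type
}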

It seems that I am attempting to minimize the communication cost by reducing the number of model updates under limited adversarial samples, which can affect convergence independently of data volume?

And this is against some relative communication budget that I am not even tracking yet.

I change the adaptivity of the strategy and the adversarial regularization technique. These appear unrelated and uncorrelated in their effect on the convergence time (how do I measure it?), the communication cost (how do I measure it?), and the overall robustness of the system, measured by the server-side model's loss and accuracy under norm-bounded perturbation attacks (after the regularization and the adaptive strategy are applied).

By using different adversarial regularization techniques, all I am changing is the client-level convergence time, which by itself doesn't indicate the communication cost or even the effect on the server-side model. I think this is where the discussion comes in: linking the effect that the reduction in convergence time has on the strategy (whether it's FedAvg, FedOpt, FedAdagrad, FedYogi, or FedQ). Since we are measuring several things, I'd change only one variable, measure the set of plots, and then change the next variable, for all the variables I need to test in order to draw my conclusions, and then synthesize the conclusion in terms of the encompassing goal (an optimization formulation for robustness in a federated context).

Plots I am going to produce

The plots I'd collect would be the following:

  • client-side regularized accuracy vs. rounds (one curve per regularization technique)
  • server-side accuracy/loss for adaptive and non-adaptive federated strategies with a CONSTANT adversarial regularization technique: reset the system and iterate over each strategy, then change the adv-reg technique and repeat; at the end, change the I.V. to measure some other variable

Specific Details on Getting Data

  • For each round m_i in the round set M over N rounds (0 <= m_i <= N), average the client-side regularized-accuracy data across all the clients (see the sketch after this list).
  • Keep the norm type and norm value constant, so that the norm value and type used for the server-side model are consistent.
  • Make sure to differentiate between each corruption type and corruption method when plotting each regularization technique.
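
A minimal sketch of the averaging and plotting described in the first bullet, assuming a hypothetical client_histories structure keyed by regularization technique and client id:

from typing import Dict, List

import matplotlib.pyplot as plt
import numpy as np


def plot_avg_client_accuracy(
    client_histories: Dict[str, Dict[int, List[float]]],
) -> None:
    """client_histories[technique][client_id] = [acc_round_1, ..., acc_round_N]."""
    for technique, per_client in client_histories.items():
        # rows = clients, cols = rounds; average across clients for each round
        acc_matrix = np.array(list(per_client.values()))
        mean_per_round = acc_matrix.mean(axis=0)
        plt.plot(range(1, len(mean_per_round) + 1), mean_per_round, label=technique)
    plt.xlabel("Round")
    plt.ylabel("Mean client-side regularized accuracy")
    plt.legend()
    plt.savefig("client_accuracy_vs_rounds.png")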

Questions

  • How do we relate communication cost and convergence time on the client side? How do we relate adversarial regularization and federated optimization / secure aggregation? --> Needs an equation derived from the several papers that define each respective component. There are features within the strategy that make it advantageous to receive the gradient inputs in a certain manner (the regularization's effect on the strategy, first independent of adaptivity, then dependent on it).

Notes to keep in mind

I'd use proofs relating the adversarial change and the strategy to back the later explanations that correlate conditioning local clients with minimizing model updates, while simultaneously operating on less data overall. (Why do both? Because in real-world federated systems, robustness is often constrained by the incoherent optimization methods used to minimize communication cost and improve efficiency, perhaps because adversarial regularization was never related to adaptive federated optimization in a way that interoperably reduces communication costs.)

Technical notes

Need to figure out how to remove the stalling issue with the clients and the strategy (FedAdagrad with nsl_model vs. FedAvg with base_model).

I have noticed that the basic FedAvg creates problems when I use it. I should look into that, then wrap FedAdagrad with the MetricsAggregator class that tracks the actual metrics dict data.

ferasbg commented on August 17, 2024

Resolve the Wrapper for the strategy object, and remove erroneous code specific to accessing the History objects for the individual clients.

ferasbg commented on August 17, 2024

Check for examples that use aggregate_fit in their Client objects. Check for strategy_name.py files that use aggregate_fit. Check the source files that track the History objects, and check for code references for the methods in flwr.server.history.
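
On the flwr.server.history side: fl.server.start_server (and the simulation entry point) return a History object, so a quick way to inspect what actually gets tracked is something like the sketch below (whether per-round client metrics land in metrics_distributed depends on the flwr version and on the strategy forwarding them):

import flwr as fl

strategy = fl.server.strategy.FedAvg()  # or a custom metrics-tracking wrapper

# start_server blocks until training finishes and returns a History object
history = fl.server.start_server(strategy=strategy)

print(history.losses_distributed)   # [(round, loss), ...] from aggregate_evaluate
print(history.metrics_distributed)  # {"metric_name": [(round, value), ...]}, if filled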

ferasbg commented on August 17, 2024

Algorithm 1 (FederatedAveraging). The K clients are indexed by k; B is the local minibatch size, E is the number of local epochs, and η is the learning rate.

Server executes:
    initialize w_0
    for each round t = 1, 2, ... do
        m ← max(C · K, 1)
        S_t ← (random set of m clients)
        for each client k ∈ S_t in parallel do
            w_{t+1}^k ← ClientUpdate(k, w_t)
        w_{t+1} ← Σ_{k=1}^{K} (n_k / n) · w_{t+1}^k

ClientUpdate(k, w):  // Run on client k
    B ← (split P_k into batches of size B)
    for each local epoch i from 1 to E do
        for batch b ∈ B do
            w ← w − η ∇ℓ(w; b)
    return w to server
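
The server line w_{t+1} ← Σ_k (n_k / n) · w_{t+1}^k is just an example-count-weighted average of the client weights; a minimal sketch of that step in isolation (names are placeholders, not the project's code):

from typing import List, Tuple

import numpy as np


def weighted_average(results: List[Tuple[List[np.ndarray], int]]) -> List[np.ndarray]:
    """results: list of (client_weights, num_examples) pairs, one per client."""
    total_examples = sum(num for _, num in results)
    num_layers = len(results[0][0])
    # For each layer, sum n_k * w_k over clients and divide by n = sum of n_k
    return [
        sum(weights[layer] * num for weights, num in results) / total_examples
        for layer in range(num_layers)
    ]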

ferasbg commented on August 17, 2024

Communication efficiency & local steps. The total communication cost of the algorithms depends on the number of communication rounds T. From Corollary 1 & 2, it is clear that a larger K leads to fewer rounds of communication as long as K = O(T σ_l² / σ_g²). Thus, the number of local iterations can be large when either the ratio σ_l² / σ_g² or T is large. In the i.i.d. setting where σ_g = 0, unsurprisingly, K can be very large. (Reddi et al., 2021)

ferasbg commented on August 17, 2024

Change the learning-rate decay rather than the learning rate in order to determine the effect on communication cost (given the adversarially regularized condition).

ferasbg commented on August 17, 2024

Given that increasing the client learning rate improves convergence more than increasing the server learning rate does.

ferasbg commented on August 17, 2024

We also know that non-adaptive methods and adaptive methods have the same communication cost.

ferasbg commented on August 17, 2024

We relate the model update frequency to communication cost.

ferasbg commented on August 17, 2024

"Standard optimization methods, such as distributed SGD, are often unsuitable in FL and can incur
high communication costs. To remedy this, many federated optimization methods use local client
updates, in which clients update their models multiple times before communicating with the server.
This can greatly reduce the amount of communication required to train a model. One such method is
FEDAVG (McMahan et al., 2017), in which clients perform multiple epochs of SGD on their local
datasets. The clients communicate their models to the server, which averages them to form a new
global model. While FEDAVG has seen great success, recent works have highlighted its convergence
issues in some settings (Karimireddy et al., 2019; Hsu et al., 2019). This is due to a variety of factors
including (1) client drift (Karimireddy et al., 2019), where local client models move away from
globally optimal models, and (2) a lack of adaptivity. FEDAVG is similar in spirit to SGD, and may
be unsuitable for settings with heavy-tail stochastic gradient noise distributions, which often arise
when training language models (Zhang et al., 2019a). Such settings benefit from adaptive learning
rates, which incorporate knowledge of past iterations to perform more informed optimization (Reddi, et. al, 2021)".

To expand on minimizing communication cost, we can further apply adversarial regularization to make the client models robust and thereby reduce the number of local client updates, given a reduction in convergence time (not sure).

Such redundancy shows up in my analyses of prior works, though admittedly based on a surface-level understanding of their equations and algorithms. A practical question to answer is where an aggregator and a regularizer act in solving the optimization problem posed by the server-side model and its ReLU functions, taking into consideration re-formulations in prior work as well as how federation enters a proof or re-formulation. Such a certification could be extended to a dual Lagrangian if need be, and if it stays consistent. Beyond simply posing an optimization problem over an aggregator, a regularizer, and a client-server architecture, it would be valuable to measure the effect that the aggregator's adaptivity has on resolving the optimization problem that proves or certifies the robustness of the federated setup, correct?

ferasbg commented on August 17, 2024

Other than communication cost, the other goal is certifying the system's defense based on the regularization and the strategy.

ferasbg commented on August 17, 2024

Read how and what they certify before using the equations used for FedAdagrad and NSL together.

ferasbg commented on August 17, 2024

Possible solutions: accessing the log info with the current client and server file setup non-trivially, or using simulation.py to actually run each of the experiments from a consolidated file (and to track the state of all the data in a single driver that runs many threads).
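
For the simulation.py route, a hedged sketch assuming a recent flwr version where fl.simulation.start_simulation takes client_fn, num_clients, config, and strategy; client_fn and the strategy here are placeholders for the existing client/server code:

import flwr as fl


def client_fn(cid: str):
    # Placeholder: build and return the project's Client for partition `cid`
    # (e.g., an adversarially regularized client as sketched earlier).
    raise NotImplementedError


strategy = fl.server.strategy.FedAvg()  # or the metrics-logging wrapper above

# Runs every client in one process, so all metrics stay reachable in one place
history = fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=10,
    config=fl.server.ServerConfig(num_rounds=5),
    strategy=strategy,
)
# `history` then holds the per-round losses/metrics for the whole run.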
