google / fedjax



FedJAX is a JAX-based open source library for Federated Learning simulations that emphasizes ease-of-use in research.

License: Apache License 2.0

Python 89.84% Jupyter Notebook 9.22% Shell 0.93%

jax federated-learning

fedjax's Introduction

FedJAX: Federated learning simulation with JAX

Documentation | Paper

NOTE: FedJAX is not an officially supported Google product. FedJAX is still in the early stages and the API will likely continue to change.

What is FedJAX?

FedJAX is a JAX-based open source library for Federated Learning simulations that emphasizes ease-of-use in research. With its simple primitives for implementing federated learning algorithms, prepackaged datasets, models and algorithms, and fast simulation speed, FedJAX aims to make developing and evaluating federated algorithms faster and easier for researchers. FedJAX works on accelerators (GPU and TPU) without much additional effort. Additional details and benchmarks can be found in our paper.

Installation

You will need a moderately recent version of Python. Please check the PyPI page for the up-to-date version requirement.

First, install JAX. For a CPU-only version:

pip install --upgrade pip
pip install --upgrade jax jaxlib  # CPU-only version

For other devices (e.g. GPU), follow the JAX installation instructions.

Then, install FedJAX from PyPI:

pip install fedjax

Or, to install the latest development version of FedJAX directly from GitHub:

pip install --upgrade git+https://github.com/google/fedjax.git

Getting Started

Below is a simple example to verify FedJAX is installed correctly.

import fedjax
import jax
import jax.numpy as jnp
import numpy as np

# {'client_id': client_dataset}.
fd = fedjax.InMemoryFederatedData({
    'a': {
        'x': np.array([1.0, 2.0, 3.0]),
        'y': np.array([2.0, 4.0, 6.0]),
    },
    'b': {
        'x': np.array([4.0]),
        'y': np.array([12.0])
    }
})
# Initial model parameters.
params = jnp.array(0.5)
# Mean squared error.
mse_loss = lambda params, batch: jnp.mean(
    (jnp.dot(batch['x'], params) - batch['y'])**2)
# Loss for clients 'a' and 'b' (should print 10.5 and 100.0).
print(f"client a loss = {mse_loss(params, fd.get_client('a').all_examples())}")
print(f"client b loss = {mse_loss(params, fd.get_client('b').all_examples())}")
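Going one step beyond evaluating the loss, the sketch below runs a few rounds of federated averaging on the same two toy clients. It is a minimal NumPy sketch, not FedJAX's own fed_avg implementation: the MSE gradient is written out by hand, and the learning rate, number of local steps, and helper names (grad_mse, client_update, fed_avg_round, pooled_loss) are illustrative assumptions.

```python
import numpy as np

# Same toy clients as above: {'client_id': {'x': ..., 'y': ...}}.
clients = {
    'a': {'x': np.array([1.0, 2.0, 3.0]), 'y': np.array([2.0, 4.0, 6.0])},
    'b': {'x': np.array([4.0]), 'y': np.array([12.0])},
}


def mse_loss(w, batch):
    return np.mean((batch['x'] * w - batch['y']) ** 2)


def grad_mse(w, batch):
    # d/dw mean((x * w - y)^2) = mean(2 * (x * w - y) * x)
    return np.mean(2.0 * (batch['x'] * w - batch['y']) * batch['x'])


def client_update(w, batch, lr=0.01, num_steps=5):
    # A few full-batch gradient descent steps on one client's data.
    for _ in range(num_steps):
        w = w - lr * grad_mse(w, batch)
    return w


def fed_avg_round(w, clients):
    # Server step: average the client models, weighted by example count.
    client_ws = [client_update(w, data) for data in clients.values()]
    num_examples = [len(data['x']) for data in clients.values()]
    return np.average(client_ws, weights=num_examples)


def pooled_loss(w):
    # Loss over all examples from all clients, for monitoring only.
    x = np.concatenate([data['x'] for data in clients.values()])
    y = np.concatenate([data['y'] for data in clients.values()])
    return mse_loss(w, {'x': x, 'y': y})


w = 0.5
for round_num in range(1, 21):
    w = fed_avg_round(w, clients)
    if round_num % 5 == 0:
        print(f'round {round_num}: w = {w:.4f}, pooled loss = {pooled_loss(w):.4f}')
```

Client 'a' alone is fit by w = 2 and client 'b' alone by w = 3. With several local steps per round, the averaged model settles between the two but not exactly at the pooled least-squares solution 76/30 ≈ 2.53, a small instance of client drift.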

The following tutorial notebooks provide an introduction to FedJAX:

You can also take a look at some of our working examples:

Citing FedJAX

To cite this repository:

@article{fedjax2021,
  title={{F}ed{JAX}: Federated learning simulation with {JAX}},
  author={Jae Hun Ro and Ananda Theertha Suresh and Ke Wu},
  journal={arXiv preprint arXiv:2108.02117},
  year={2021}
}

Useful pointers

fedjax's People

Contributors

Stargazers

Watchers

Forkers

alshedivat jaehunro iamthatiam777 alabid akaanirban kho methimpact dedsec-9 tangx-yy isabella232 stheertha saipraneet anukaal degregat modestgoblin stjordanis shubhammittal98 ai-hub-deep-learning-fundamental omeremhan yang-zheming metavai nerdai ichiruchan aouedions11 yushuaiji marcociccone mahi97 python-repository-hub pp-qq cmoyacal amitport mbrukman siabdullah4 jungsungjae lando-l ethicalsecurity-agency ghas-results mvandermeulen

fedjax's Issues

Implement standard CIFAR-100 model in fedjax.models.cifar100

Add a standard implementation of the model for the CIFAR-100 task. The dataset can be found in fedjax.datasets.cifar100.

For the model architecture, we should follow “Adaptive Federated Optimization”. The model architecture is detailed in section 4 as a ResNet-18 (replacing batch norm with group norm). Code for this paper and a Keras implementation of the model can be found here. We suggest using either haiku or flax to implement the model for use with JAX.
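For reference, here is what the group normalization that replaces batch norm computes. Statistics are taken per example over spatial positions and a group of channels, so one example's output never depends on the rest of the batch. This is a plain NumPy sketch. In the actual model one would presumably use the GroupNorm layer that Haiku (hk.GroupNorm) or Flax (flax.linen.GroupNorm) provides. The NHWC layout and the helper name group_norm are assumptions.

```python
import numpy as np


def group_norm(x, num_groups, gamma, beta, eps=1e-5):
    """Group normalization for an NHWC batch (illustrative sketch).

    Mean and variance are computed per example over (H, W, channels in
    the group), then a per-channel scale (gamma) and shift (beta) are applied.
    """
    n, h, w, c = x.shape
    assert c % num_groups == 0, 'channels must split evenly into groups'
    xg = x.reshape(n, h, w, num_groups, c // num_groups)
    mean = xg.mean(axis=(1, 2, 4), keepdims=True)
    var = xg.var(axis=(1, 2, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)
    return xg.reshape(n, h, w, c) * gamma + beta
```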

If you choose to use haiku, you can use fedjax.create_model_from_haiku to create a fedjax compatible model. If you choose to use flax, wrapping it in a fedjax.Model is fairly straightforward and we can provide guidance for this.

A good example to follow is #265 that checks in a simple linear model for CIFAR-100 and includes the model implementation, tests, and baseline results with FedAvg using this script. Make sure to add a flags file similar to https://github.com/google/fedjax/blob/main/experiments/fed_avg/fed_avg.CIFAR100_LOGISTIC.flags and add the new task to https://github.com/google/fedjax/blob/main/fedjax/training/tasks.py.

Thanks for your contributions!

Problem of Quick Start in Readme.md

I tried to run the code in the QuickStart and ran into some problems.
federated_data = fedjax.FederatedData() cannot be executed because FederatedData is an abstract class, so I replaced it with:

client_a_data = {
        'x': np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
        'y': np.array([7, 8])
    }
client_b_data = {'x': np.array([[9.0, 10.0, 11.0]]), 'y': np.array([12])}
client_to_data_mapping = {'a': client_a_data, 'b': client_b_data}
federated_data = fedjax.InMemoryFederatedData(client_to_data_mapping)

Everything else is the same as in the QuickStart, but I got an error:

for client_id, client_output, _ in func(shared_input, clients):
for client_id, client_batches, client_input in clients:
ValueError: not enough values to unpack (expected 3, got 2)

It seems that client_batches is missing and that the dataset needs to be batched, but there is no example that covers this situation.
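The traceback shows that the loop unpacks each client as a 3-tuple (client_id, client_batches, client_input), while the QuickStart data only provides pairs. The sketch below shows one way to build such triples from the in-memory mapping above. It is plain Python/NumPy: make_batches is a hypothetical helper, and using an integer seed as client_input is purely illustrative (FedJAX's client datasets also provide their own batching methods; check the documentation for their exact signatures).

```python
import numpy as np

client_to_data_mapping = {
    'a': {'x': np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]), 'y': np.array([7, 8])},
    'b': {'x': np.array([[9.0, 10.0, 11.0]]), 'y': np.array([12])},
}


def make_batches(examples, batch_size):
    """Hypothetical helper: split a dict of equal-length arrays into batches."""
    num_examples = len(next(iter(examples.values())))
    for start in range(0, num_examples, batch_size):
        yield {k: v[start:start + batch_size] for k, v in examples.items()}


# Each element is (client_id, client_batches, client_input); here client_input
# is just an illustrative per-client seed.
clients = [
    (client_id, list(make_batches(data, batch_size=1)), seed)
    for seed, (client_id, data) in enumerate(client_to_data_mapping.items())
]

for client_id, client_batches, client_input in clients:
    print(client_id, len(client_batches), client_input)
```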

Clarifying the meaning of "weight"

In the Intro notebook, the backward_pass_output returned by model.backward has a weight field.
It seems to me that this is used to perform weighted averaging in FedAvg, but it is not clear to me how. Perhaps it could be renamed to batch_size?
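To illustrate why a per-output weight matters, the sketch below averages per-batch mean losses. Weighting each batch by the number of examples it contains recovers the mean over all examples, whereas a plain average of batch means does not. Reading weight as such an example count is an assumption here, not a statement about FedJAX internals. FedAvg's server step uses the same kind of weighted average, with each client weighted by its number of examples.

```python
import numpy as np


def weighted_mean(values, weights):
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return np.sum(values * weights) / np.sum(weights)


# Per-example losses split into two batches of unequal size.
per_example_losses = np.array([1.0, 2.0, 3.0, 10.0])
batches = [per_example_losses[:3], per_example_losses[3:]]
batch_means = [b.mean() for b in batches]
batch_weights = [len(b) for b in batches]

# Weighting by example count recovers the overall mean (4.0);
# the unweighted mean of batch means (6.0) over-counts the small batch.
overall = weighted_mean(batch_means, batch_weights)
unweighted = np.mean(batch_means)
print(overall, unweighted)
```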

Feature request: Convert standard dataset into a federated dataset

Synthetic federated datasets can be constructed from standard centralized ones by artificially splitting them among clients. This is usually done using a Dirichlet distribution (e.g. Hsu et al. 2019).
Such synthetic datasets are very useful since we can explicitly control the total number of users, as well as the heterogeneity.

It would be great to have primitives that automatically convert a standard NumPy dataset into a FedJAX dataset.
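A minimal sketch of such a primitive, assuming labeled NumPy arrays x and y. For each class, the class's examples are divided among clients in proportions drawn from Dirichlet(alpha), so a smaller alpha gives more heterogeneous clients. This per-class split is one common way to implement Dirichlet label skew. It is not necessarily the exact procedure of Hsu et al. 2019, and the function name dirichlet_partition is hypothetical. The output has the {'client_id': {'x': ..., 'y': ...}} shape used in the Getting Started example, so it could presumably be passed to fedjax.InMemoryFederatedData.

```python
import numpy as np


def dirichlet_partition(x, y, num_clients, alpha, seed=0):
    """Hypothetical helper: split (x, y) among clients with Dirichlet label skew."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cut points that split this class's indices by the sampled proportions.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return {
        str(client): {'x': x[np.array(ids, dtype=int)], 'y': y[np.array(ids, dtype=int)]}
        for client, ids in enumerate(client_indices)
    }


# Toy centralized dataset: 50 examples, 5 classes with 10 examples each.
x = np.arange(100, dtype=float).reshape(50, 2)
y = np.repeat(np.arange(5), 10)
client_data = dirichlet_partition(x, y, num_clients=4, alpha=0.5)
print({cid: np.bincount(d['y'], minlength=5).tolist() for cid, d in client_data.items()})
```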

Full EMNIST example does not exhibit parallelization

Hi! I am facing an issue with parallelizing the base code provided by the developers.

My local workstation contains two GPUs.
I installed FedJax in a conda environment
I downloaded "emnist_fed_avg.py" from the "examples" folder, deleted the "fedjax.training.set_tf_cpu_only()" line, and replaced fed_avg.federated_averaging with fedjax.algorithms.fed_avg.federated_averaging on line 61
Having activated the conda environment, I ran the file with python emnist_fed_avg.py. The file runs correctly and prints the expected output (round nums and train/test metrics on each 10th round)
The nvidia-smi command shows zero percent utilization and almost zero memory usage on one of the GPUs (and ~40% utilization and maximum memory usage on the other one)

Any ideas what I am doing wrong?

Support for haiku models with non-trainable state

Hi!
Congrats on this great library! I started using it a few days ago and I love it!

Is there any way to use a haiku model with a non-trainable state (e.g. to use batch norm)?
I didn't find any nontrivial way, but maybe I'm missing something.

Thanks a lot for your help!
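To make the question concrete, the sketch below keeps batch-norm running statistics as non-trainable state, separate from the trainable params, and threads it through a pure function. This mirrors the (params, state, inputs) -> (outputs, new_state) convention of Haiku's hk.transform_with_state (whose apply also takes an rng argument). It is written in NumPy, and the names init and apply are illustrative. How FedJAX's model wrapper should carry such state, and how the server should aggregate it, is exactly the open question and is not shown here.

```python
import numpy as np


def init(num_features):
    params = {'scale': np.ones(num_features), 'offset': np.zeros(num_features)}
    # Non-trainable state: running statistics, updated by apply, never by gradients.
    state = {'mean': np.zeros(num_features), 'var': np.ones(num_features)}
    return params, state


def apply(params, state, x, is_training, momentum=0.9, eps=1e-5):
    if is_training:
        # Normalize with batch statistics and update the running averages.
        mean, var = x.mean(axis=0), x.var(axis=0)
        new_state = {
            'mean': momentum * state['mean'] + (1 - momentum) * mean,
            'var': momentum * state['var'] + (1 - momentum) * var,
        }
    else:
        # Evaluation uses the stored running statistics; state is unchanged.
        mean, var = state['mean'], state['var']
        new_state = state
    y = (x - mean) / np.sqrt(var + eps) * params['scale'] + params['offset']
    return y, new_state
```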

FedJax depends on TensorFlow Federated?

I am helping users install FedJax for use in their federated learning research projects and I noticed that installing FedJax is pulling in TensorFlow Federated (0.17) and TensorFlow (2.3). I don't see either of these listed as dependencies of FedJax so I am trying to understand why they are being pulled in by pip install fedjax.

Support for manually modifying client/server learning rate

Hi,
I'm experimenting with the client learning rate, but I cannot find a clean way to modify it.

Basically, I need to change the LR following a schedule based on the current round.
Is that possible?

Thanks
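A per-round schedule is just a function from the round number to a learning rate. The sketch below shows two such functions in plain Python. Applying the value each round assumes that the client optimizer (and therefore the algorithm) can be rebuilt between rounds, which is a hypothetical workaround rather than a documented FedJAX feature. step_decay and cosine_decay are illustrative names.

```python
import math


def step_decay(round_num, base_lr=0.1, decay=0.5, every=100):
    """Multiply the learning rate by `decay` every `every` rounds."""
    return base_lr * decay ** (round_num // every)


def cosine_decay(round_num, base_lr, num_rounds, min_lr=0.0):
    """Anneal from base_lr at round 0 to min_lr at round num_rounds."""
    t = min(round_num, num_rounds) / num_rounds
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))


num_rounds = 300
for round_num in range(num_rounds):
    client_lr = step_decay(round_num)
    # Hypothetical: rebuild the client optimizer with client_lr here,
    # then run this round of the federated algorithm.
    if round_num % 100 == 0:
        print(round_num, client_lr, cosine_decay(round_num, 0.1, num_rounds))
```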

CIFAR 100 Questions

Hi, thanks for the awesome library! I want to ask a couple of questions related to the CIFAR100 dataset.

I noticed that while the dataset is available in the library, the model is not. Curious if a model for CIFAR100 is work-in-progress, or if there is no short-term plan for this?
Looking at the CIFAR100 dataset, its preprocessing seems to be inconsistent with Google's TFF. Notably, the crop size and normalization differ from TFF. Is this intentional? Would it be correct to say that we can expect this to mirror TFF's design eventually?

Thanks in advance for all the help!

Support for gldv2 and inaturalist datasets

I think it would be great to port these datasets from tff to fedjax.
I would be happy to make the effort and contribute to the library, but I need a bit of support from the fedjax team 🙂

Looking at the tff codebase (gldv2, inaturalist), it looks like the load_data_from_cache function creates a tfrecords file for each client.

The only concrete classes that I see are SQLiteFederatedData and InMemoryFederatedData, but I don't think they are meant for this use case. What would be the best way to map the clients into a FederatedDataset?
We could replicate something like FilePerUserClientData.

Thanks!
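The idea behind FilePerUserClientData can be sketched without TensorFlow: keep an index from client ID to file and read a client's file only when that client is requested. The class below is hypothetical, stores each client as a NumPy .npz file instead of tfrecords, and does not implement the fedjax.FederatedData interface. It only illustrates the lazy, one-file-per-client mapping.

```python
import os
import tempfile

import numpy as np


class FilePerClientData:
    """Hypothetical lazy loader: one .npz file per client, read on demand."""

    def __init__(self, directory):
        self._paths = {
            os.path.splitext(name)[0]: os.path.join(directory, name)
            for name in sorted(os.listdir(directory))
            if name.endswith('.npz')
        }

    def client_ids(self):
        return list(self._paths)

    def num_clients(self):
        return len(self._paths)

    def get_client(self, client_id):
        # Only this client's file is read, so memory scales with one client.
        with np.load(self._paths[client_id]) as data:
            return {key: data[key] for key in data.files}


with tempfile.TemporaryDirectory() as tmp:
    np.savez(os.path.join(tmp, 'client_a.npz'), x=np.zeros((3, 2)), y=np.arange(3))
    np.savez(os.path.join(tmp, 'client_b.npz'), x=np.ones((1, 2)), y=np.array([7]))
    fd = FilePerClientData(tmp)
    ids = fd.client_ids()
    client_b = fd.get_client('client_b')
print(ids, client_b['y'])
```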

Implementing SCAFFOLD

It might be a good idea to have an implementation of SCAFFOLD as well in the algorithms. I think this can be done by modifying the existing Mime implementation.
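For reference, the sketch below follows the SCAFFOLD update rules (Karimireddy et al., 2020) with the "option II" control-variate update, on hypothetical scalar quadratic clients so that the gradients are exact. It is written in plain NumPy with full client participation and is not derived from FedJAX's Mime implementation. Setting correct_drift=False drops the control variates and gives plain FedAvg with the same local steps, for comparison.

```python
import numpy as np

# Hypothetical clients: f_i(w) = 0.5 * a_i * (w - b_i)^2, grad f_i(w) = a_i * (w - b_i).
# The minimizer of the average objective is sum(a * b) / sum(a) = 2.0.
a = np.array([1.0, 4.0, 0.5])
b = np.array([0.0, 3.0, -2.0])
num_clients = len(a)


def grad(i, w):
    return a[i] * (w - b[i])


def run(correct_drift, num_rounds=100, local_steps=10, lr_local=0.1, lr_global=1.0):
    x = 0.0                      # server model
    c = 0.0                      # server control variate
    c_i = np.zeros(num_clients)  # client control variates
    for _ in range(num_rounds):
        delta_y = np.zeros(num_clients)
        delta_c = np.zeros(num_clients)
        for i in range(num_clients):  # full participation, for simplicity
            correction = (c - c_i[i]) if correct_drift else 0.0
            y = x
            for _ in range(local_steps):
                # Local step: y <- y - lr * (g_i(y) - c_i + c).
                y = y - lr_local * (grad(i, y) + correction)
            delta_y[i] = y - x
            if correct_drift:
                # Option II: c_i+ = c_i - c + (x - y) / (K * lr_local).
                c_new = c_i[i] - c + (x - y) / (local_steps * lr_local)
                delta_c[i] = c_new - c_i[i]
                c_i[i] = c_new
        x = x + lr_global * delta_y.mean()
        c = c + delta_c.mean()  # the |S| / N factor is 1 with full participation
    return x


scaffold_x = run(correct_drift=True)
fedavg_x = run(correct_drift=False)
print(f'SCAFFOLD: {scaffold_x:.6f}, FedAvg: {fedavg_x:.6f}')
```

With exact gradients and full participation, the drift correction lets SCAFFOLD reach the minimizer 2.0, while FedAvg with the same ten local steps per round settles noticeably away from it.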

PatentMime^

tensorflow/federated#1950
🧾 works well for sort(ing) new patent(s) comparably with the older suite. Hath structural kinks, wh0 doesn’t er which’s worry, 🎗️ ;]

Add support for stateful clients

At the moment I don't see how to implement a fedjax.FederatedAlgorithm with stateful clients, which would be necessary to implement personalized federated algorithms. It would be great to include an example similar to the one in, e.g., TensorFlow Federated.
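The core pattern is that the server loop keeps a dictionary from client ID to client state that persists across rounds, and each sampled client returns both an update for the shared model and its new private state. The sketch below shows this in plain NumPy with a toy personalization rule chosen only for illustration. The names client_targets, client_step, and run are hypothetical, and this is not a FedJAX API.

```python
import numpy as np

# Toy setup: each client's data is summarized by a scalar target it wants to fit.
client_targets = {'a': 1.0, 'b': 3.0, 'c': 8.0}


def client_step(global_w, state, target, lr=0.5):
    """Returns (update for the server, new private client state)."""
    if state is None:  # first participation: initialize from the global model
        state = {'personal': global_w, 'rounds_seen': 0}
    # Shared part: one gradient step on 0.5 * (w - target)^2, sent to the server.
    new_w = global_w - lr * (global_w - target)
    # Personal part: stays on the client across rounds, never averaged.
    personal = state['personal'] - lr * (state['personal'] - target)
    new_state = {'personal': personal, 'rounds_seen': state['rounds_seen'] + 1}
    return new_w - global_w, new_state


def run(num_rounds=30, clients_per_round=2, seed=0):
    rng = np.random.default_rng(seed)
    global_w = 0.0
    client_states = {}  # client_id -> state, persisted across rounds
    ids = sorted(client_targets)
    for _ in range(num_rounds):
        sampled = rng.choice(ids, size=clients_per_round, replace=False)
        deltas = []
        for cid in (str(s) for s in sampled):
            delta, client_states[cid] = client_step(
                global_w, client_states.get(cid), client_targets[cid])
            deltas.append(delta)
        global_w += np.mean(deltas)
    return global_w, client_states


global_w, client_states = run()
print(global_w, client_states)
```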

Error in the Stackoverflow Tokenizer example

TensorFlow version: 2.5.3
fedjax version: 0.0.16
jax version: 0.4.8

When I follow the docs (https://fedjax.readthedocs.io/en/latest/fedjax.datasets.html#fedjax.datasets.stackoverflow.load_data) to process the Stackoverflow dataset using the following code:

from fedjax.datasets import stackoverflow
# Load partially preprocessed splits.
train, held_out, test = stackoverflow.load_data(cache_dir='../data')
# Apply tokenizer during batching.
tokenizer = stackoverflow.StackoverflowTokenizer()
train_max_length, eval_max_length = 20, 30
train_for_train = train.preprocess_batch(
    tokenizer.as_preprocess_batch(train_max_length))
train_for_eval = train.preprocess_batch(
    tokenizer.as_preprocess_batch(eval_max_length))

it fails with the following error:

2023-05-06 23:46:33.460149: W tensorflow/core/platform/cloud/google_auth_provider.cc:184] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Failed precondition: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Could not resolve host: metadata".
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    tokenizer = stackoverflow.StackoverflowTokenizer()
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/fedjax/datasets/stackoverflow.py", line 185, in __init__
    self._table = tf.lookup.StaticVocabularyTable(
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/ops/lookup_ops.py", line 1255, in __init__
    raise TypeError("Invalid key dtype, expected one of %s, but got %s." %
TypeError: Invalid key dtype, expected one of (tf.int64, tf.string), but got <dtype: 'float32'>.
Exception ignored in: <function CapturableResource.__del__ at 0x2b2156f4c040>
Traceback (most recent call last):
  File "/home/yik/anaconda2/envs/fl/lib/python3.8/site-packages/tensorflow/python/training/tracking/tracking.py", line 269, in __del__
    with self._destruction_context():
AttributeError: 'StaticVocabularyTable' object has no attribute '_destruction_context'

Could you please help fix this?