
This project forked from symbioticlab/fedscale


FedScale: Benchmarking Model and System Performance of Federated Learning

License: Apache License 2.0


FedScale: Benchmarking Model and System Performance of Federated Learning (Paper)

This repository contains scripts and instructions for building FedScale, a diverse set of challenging and realistic benchmark datasets that facilitate scalable, comprehensive, and reproducible federated learning (FL) research. FedScale datasets are large-scale and cover a diverse range of important FL tasks, such as image classification, object detection, language modeling, speech recognition, and reinforcement learning. For each dataset, we provide a unified evaluation protocol using realistic data splits and evaluation metrics. To meet the pressing need for reproducing realistic FL at scale, we have also built an efficient evaluation platform, FedScale Automated Runtime (FAR), which simplifies and standardizes FL experimental setup and model evaluation. The platform provides flexible APIs for implementing new FL algorithms and adding new execution backends with minimal developer effort.

FedScale is open-source with permissive licenses and actively maintained, and we welcome feedback and contributions from the community! If you have any questions or comments, please join our Slack channel.


Getting Started

Our install.sh will install the following automatically:

  • Anaconda Package Manager
  • CUDA 10.2

Note: if you prefer different versions of conda or CUDA, please check the comments in install.sh for details.

Run the following commands to install FedScale.

git clone https://github.com/SymbioticLab/FedScale
cd FedScale
source install.sh 
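After the installer finishes, a quick sanity check can confirm that the tools it installs are reachable. The sketch below is only an illustration: it assumes that install.sh leaves the `conda` and `nvcc` executables on your PATH in the current shell, which this README does not state, and the helper name `check_tools` is hypothetical.

```python
import shutil


def check_tools(tools=("conda", "nvcc")):
    """Return a mapping from tool name to whether it is found on PATH."""
    # Assumption: the installer exposes these executables on PATH.
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_tools().items():
        print(f"{tool}: {'found' if found else 'not found on PATH'}")
```

If a tool is reported as missing, opening a new shell or re-running `source install.sh` may be needed before retrying.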

Realistic FL Datasets

We are adding more datasets! Please feel free to contribute!

We provide real-world datasets for the federated learning community and plan to release many more soon. Each dataset comes with training, validation, and testing splits. The tables below summarize the statistics of the training datasets, and you can refer to each dataset folder for more details. Because the datasets are very large, we are still uploading the data and carefully validating their integration with FAR, so datasets become available for FAR experiments incrementally.

CV tasks:

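| Dataset         | Data Type | # of Clients | # of Samples | Example Task                            |
|-----------------|-----------|--------------|--------------|-----------------------------------------|
| iNature         | Image     | 2,295        | 193K         | Classification                          |
| FMNIST          | Image     | 3,400        | 640K         | Classification                          |
| OpenImage       | Image     | 13,771       | 1.3M         | Classification, Object detection        |
| Google Landmark | Image     | 43,484       | 3.6M         | Classification                          |
| Charades        | Video     | 266          | 10K          | Action recognition                      |
| VLOG            | Video     | 4,900        | 9.6K         | Video classification, Object detection  |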

NLP tasks:

| Dataset       | Data Type | # of Clients | # of Samples | Example Task                    |
|---------------|-----------|--------------|--------------|---------------------------------|
| Europarl      | Text      | 27,835       | 1.2M         | Text translation                |
| Blog Corpus   | Text      | 19,320       | 137M         | Word prediction                 |
| Stackoverflow | Text      | 342,477      | 135M         | Word prediction, Classification |
| Reddit        | Text      | 1,660,820    | 351M         | Word prediction                 |
| Amazon Review | Text      | 1,822,925    | 166M         | Classification, Word prediction |
| CoQA          | Text      | 7,189        | 114K         | Question answering              |
| LibriTTS      | Text      | 2,456        | 37K          | Text to speech                  |
| Google Speech | Audio     | 2,618        | 105K         | Speech recognition              |
| Common Voice  | Audio     | 12,976       | 1.1M         | Speech recognition              |

Misc Applications:

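| Dataset    | Data Type | # of Clients | # of Samples | Example Task           |
|------------|-----------|--------------|--------------|------------------------|
| Taobao     | Text      | 182,806      | 0.9M         | Recommendation         |
| Go dataset | Text      | 150,333      | 4.9M         | Reinforcement learning |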

Note that no details of any participant's age, gender, or location were kept, and a random ID was assigned to each individual. In using these datasets, we strictly comply with their licenses. The datasets provided in this repo should be used for research purposes only.

Please go to the ./dataset directory and follow the dataset README for more details.
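The client and sample counts in the tables above imply very different amounts of data per client across datasets. As a rough illustration, the sketch below computes the mean number of samples per client for the CV datasets. The figures are copied from the CV tasks table, where sample counts are rounded (for example, 193K is treated as 193,000), so the results are approximate; the helper name `mean_samples_per_client` is hypothetical.

```python
# (number of clients, number of samples) copied from the CV tasks table.
# Sample counts are rounded in the table, so the ratios are approximate.
CV_DATASETS = {
    "iNature": (2_295, 193_000),
    "FMNIST": (3_400, 640_000),
    "OpenImage": (13_771, 1_300_000),
    "Google Landmark": (43_484, 3_600_000),
    "Charades": (266, 10_000),
    "VLOG": (4_900, 9_600),
}


def mean_samples_per_client(stats):
    """Map each dataset name to its average number of samples per client."""
    return {name: samples / clients for name, (clients, samples) in stats.items()}


if __name__ == "__main__":
    ratios = mean_samples_per_client(CV_DATASETS)
    # Print datasets from the most to the least data per client.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:16s} {ratio:8.1f} samples/client")
```

A mean hides how unevenly samples are spread across clients, so the per-dataset READMEs remain the place to look for the actual distributions.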

Run Experiments with FAR

FedScale Automated Runtime (FAR) is an automated and easily deployable evaluation platform that simplifies and standardizes FL experimental setup and model evaluation under practical settings. FAR is based on our Oort project, which has been shown to scale well and can emulate FL training of thousands of clients in each round.

FAR enables developers to benchmark various FL efforts with practical FL data and metrics.

Please go to the ./core directory and follow the FAR README to set up FL training scripts.
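To make "emulating FL training of many clients in each round" more concrete, here is a generic, minimal sketch of federated averaging (FedAvg) rounds with uniformly random participant selection. It is not FAR's API and not Oort's guided selection algorithm. The function names (`select_clients`, `local_update`, `run_round`), the one-parameter toy model, and the synthetic client data are all hypothetical and chosen only to keep the example self-contained.

```python
import random


def select_clients(client_ids, num_participants, rng):
    """Uniformly sample the participants of one round (a stand-in for any selector)."""
    return rng.sample(client_ids, min(num_participants, len(client_ids)))


def local_update(weight, data, lr=0.1, epochs=5):
    """Toy local training: gradient descent on mean squared error for a scalar model."""
    for _ in range(epochs):
        grad = sum(2 * (weight - x) for x in data) / len(data)
        weight -= lr * grad
    return weight


def run_round(global_weight, client_data, num_participants, rng):
    """One FedAvg round: select clients, train locally, average by sample count."""
    participants = select_clients(sorted(client_data), num_participants, rng)
    total = sum(len(client_data[c]) for c in participants)
    return sum(
        local_update(global_weight, client_data[c]) * len(client_data[c])
        for c in participants
    ) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical non-IID data: each client's samples cluster around its own mean.
    client_data = {
        cid: [rng.gauss(cid % 5, 1.0) for _ in range(rng.randint(5, 50))]
        for cid in range(1000)
    }
    weight = 0.0
    for _ in range(20):
        weight = run_round(weight, client_data, num_participants=100, rng=rng)
    print(f"global weight after 20 rounds: {weight:.3f}")
```

Weighting each client's result by its sample count is the standard FedAvg aggregation rule; a real platform additionally has to handle model serialization, client availability, and system heterogeneity, which this sketch leaves out.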

Repo Structure

Repo Root
|---- dataset     # Realistic datasets in FedScale
|---- core        # Experiment platform of FedScale
    |---- examples  # Examples of new plugins
    |---- evals     # Backend of job submission
    

Notes

Please consider citing our papers if you use the code or data in your research project.

@inproceedings{fedscale-arxiv,
  title={FedScale: Benchmarking Model and System Performance of Federated Learning},
  author={Fan Lai and Yinwei Dai and Xiangfeng Zhu and Mosharaf Chowdhury},
  booktitle={arXiv:2105.11367},
  year={2021}
}

and

@inproceedings{oort-osdi21,
  title={Oort: Efficient Federated Learning via Guided Participant Selection},
  author={Fan Lai and Xiangfeng Zhu and Harsha V. Madhyastha and Mosharaf Chowdhury},
  booktitle={USENIX Symposium on Operating Systems Design and Implementation (OSDI)},
  year={2021}
}

Contact

Fan Lai, Yinwei Dai, Xiangfeng Zhu, and Mosharaf Chowdhury from the University of Michigan.
