
CANOSP-2019 internship project meta-repo for the Private Federated Learning research project

License: Mozilla Public License 2.0


canosp-2019's Introduction


CANOSP-2019

This project implements a minimal server that accepts messages from clients and performs federated learning with differential privacy.

Getting Started

We are using Miniconda to manage the environment. It can be installed using one of the installers available here. For macOS, the bash installer is recommended.

Make sure to have conda init run during conda installation so that your PATH is set properly.

Installing locally to run tests

To install and run the tests for this project you can run:

# Set up the environment.
$ make setup_conda
$ conda activate mozfldp

# Run tests
$ make pytest

Running the server

You can run the server locally, serving requests on port 8000, using:

$ python -m mozfldp.server

Building a release

$ python setup.py sdist

Running from Docker

The server can also be built and run as a Docker container. First, install Docker.

Once you have Docker installed, you can build the container and run tests using:

$ make build_image
$ make docker_tests

To run the service in the container, use:

$ make up

Note that in the above command, we are exposing the container's port 8000 by binding it to port 8090 on the host computer.
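
(This port mapping corresponds to Docker's -p 8090:8000 flag: container port 8000 is published on host port 8090. The exact docker run invocation is defined in the Makefile.)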

Sending data to the server

You can submit arbitrary JSON blobs to the server using HTTP POST.

A sample curl invocation that will work is:

curl -X POST http://127.0.0.1:8000/api/v1/compute_new_weights

{"result":"ok","weights":[[[0.0,0.0,0.0,.... }

Note: if you are running locally, the port will be 8000; port 8090 is used if you are running in a Docker container.
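
The same request can be made from Python. A minimal sketch using the requests package (the empty JSON body is just illustrative, since the server accepts arbitrary blobs):

import requests

# Use port 8000 for a local run, 8090 when hitting the Docker container.
resp = requests.post("http://127.0.0.1:8000/api/v1/compute_new_weights", json={})
print(resp.json()["result"])  # "ok"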

canosp-2019's People

Contributors

bgluth, crankycoder, dzeber, jason-cooke, maharshmellow, mlopatka, shivansh2407, zhaoyuxuan


canosp-2019's Issues

Implement actual minibatch learning

The federated client update algorithm calls for splitting a client's data into "minibatches" of size B and updating the weights once per minibatch. The SGDClassifier doesn't in fact support this (there is an open issue) - it updates the weights once for each training example, which is what is traditionally meant by SGD, "stochastic" because it traverses the dataset in a random order (when shuffle=True).

Using partial_fit on a batch still causes one weight update per training example in the batch, so this isn't really getting us minibatch learning. The difference from fit is that partial_fit starts from the current state of the model and runs one weight update for each example in the supplied dataset, which may be different from the "main" dataset.

However, we can compute the minibatch update by averaging the pointwise updates across the minibatch. Given a minibatch of data x = (x_1, ..., x_k), we want to compute w = w_0 - \eta \nabla E(w_0, x). But since the error E on the minibatch is the average of the error on each individual training example, E(w, x) = \frac{1}{k} \sum_i E(w, x_i), we get the same result by computing pointwise updates w^{(i)} = w_0 - \eta \nabla E(w_0, x_i) (as is done at each step by the SGDClassifier) and averaging them: w = \frac{1}{k} \sum_i w^{(i)}. The difference is that the built-in method updates the weights progressively, whereas we want the initial w_0 to be the same for each i.

I think this can be done with an approach similar to the current version of client_update: for each training example in the minibatch, reset the weights to their value on entering the minibatch and run partial_fit on that single example, then average the resulting weights at the end of the minibatch. We may want to create a fresh classifier instance each time rather than keep reusing the same one, because if we reuse it, the internal step counter will keep incrementing. We should check the source code to see whether that has any unintended consequences.
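
As a minimal sketch of the averaging idea, in pure NumPy and using squared error on a single linear output (the function names are hypothetical, not part of the codebase):

import numpy as np

def pointwise_update(w0, x_i, y_i, eta):
    # One SGD step on a single example, for E(w, x_i) = 0.5 * (w @ x_i - y_i)**2.
    grad = (w0 @ x_i - y_i) * x_i
    return w0 - eta * grad

def minibatch_update(w0, X, y, eta):
    # Average the pointwise updates, each starting from the SAME w0.
    # By linearity this equals w0 - eta * (mean gradient over the minibatch).
    updates = [pointwise_update(w0, x_i, y_i, eta) for x_i, y_i in zip(X, y)]
    return np.mean(updates, axis=0)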

One outstanding issue is the choice of learning rate \eta. The default choice changes on each "time step", i.e. each per-example weight update. Let's pass this through as a param we handle ourselves: keep it constant for now, and maybe implement something adaptive later on.
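
For reference, scikit-learn supports a constant rate directly, which we could surface as a simulation param:

from sklearn.linear_model import SGDClassifier

# learning_rate="constant" disables the default per-step schedule;
# eta0 is then the fixed step size used for every update.
clf = SGDClassifier(learning_rate="constant", eta0=0.01)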

Implement a Client class

Currently the data is loaded and apportioned to the clients by server_update. It would be better, on setting up the runner, to instantiate multiple Client instances that each manage their own data. client_update could then be a method of Client, and the server piece would receive model updates from clients and combine them.
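
A rough sketch of the shape this could take (names and signatures are hypothetical):

import numpy as np

class Client:
    def __init__(self, client_id, features, labels):
        self.client_id = client_id
        self.features = features  # this client's private data
        self.labels = labels

    def client_update(self, init_weights, params):
        # Train locally starting from the server's current weights and
        # return the update; the server never sees the raw data.
        weights = np.array(init_weights)
        # ... run the local minibatch updates here (see #30) ...
        return weights

def server_update(clients, init_weights, params):
    # The server piece only combines the updates it receives.
    updates = [c.client_update(init_weights, params) for c in clients]
    return np.mean(updates, axis=0)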

Allow for customization of the SGDClassifier

We should be able to use custom settings for the SGDClassifier params and have these carry across the full simulation.

It may be worthwhile writing a "Classifier" wrapper class around SGDClassifier that we can instantiate at the beginning and pass in to the runner. This would handle setting params, and it could also be made to handle the minibatch update gymnastics described in #30 (e.g. a batch_update function which internally spawns clones of the SGDClassifier to compute the single-point updates and combines them); see the sketch below.
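
A minimal sketch of such a wrapper, assuming we use sklearn.base.clone to spawn the fresh copies:

from sklearn.base import clone
from sklearn.linear_model import SGDClassifier

class Classifier:
    def __init__(self, **sgd_params):
        # Owns the SGDClassifier params for the whole simulation.
        self._prototype = SGDClassifier(**sgd_params)

    def spawn(self):
        # Fresh, unfitted copy with identical params; sidesteps the
        # internal step counter issue from reusing one instance (#30).
        return clone(self._prototype)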

Tests

Certain portions of the project are still missing test coverage, in particular simulation_runner.py.

Outstanding error in Federated_Learning_Simulation notebook

Flagging that the issue with the notebook crashing, raised in #23, is still outstanding. The error is actually occurring in simulation_utils.client_update. It seems to be related to the steps that convert the parameters from coef/intercept to a combined list and back to separate np arrays.

We might want to refactor to keep coef and intercept separate everywhere to avoid having to do conversions, since this is what the underlying classifier uses.
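
To illustrate the kind of conversion involved (hypothetical helpers; shapes follow the classifier's coef_ (n_classes, n_features) and intercept_ (n_classes,) convention):

import numpy as np

def pack(coef, intercept):
    # Flatten coef and intercept into a single flat list.
    return np.concatenate([coef.ravel(), intercept]).tolist()

def unpack(flat, n_classes, n_features):
    # Inverse of pack; a shape mismatch here is the kind of bug
    # suspected in client_update.
    arr = np.asarray(flat, dtype=float)
    split = n_classes * n_features
    return arr[:split].reshape(n_classes, n_features), arr[split:]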
