This project forked from cuihenggang/geeps

GPU-specialized parameter server for GPU machine learning.

License: BSD 3-Clause "New" or "Revised" License

GeePS

GeePS is a parameter server library that scales single-machine GPU machine learning applications (such as Caffe) to a cluster of machines.

Download and build GeePS and Caffe application

Run the following command to download GeePS and (our slightly modified) Caffe:

git clone --recurse-submodules https://github.com/cuihenggang/geeps.git

If you are running Ubuntu 14.04, you can install the dependencies by running the following commands from the geeps root directory:

./scripts/install-geeps-deps-ubuntu14.sh
./scripts/install-caffe-deps-ubuntu14.sh

Also, please make sure your CUDA library is installed in /usr/local/cuda.
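
As a quick sanity check before building, a sketch like the following tests whether a CUDA directory contains an executable nvcc. The helper name check_cuda_dir is hypothetical and not part of GeePS, and the presence of bin/nvcc is only an assumed proxy for a usable CUDA installation:

```shell
# Hypothetical helper (not part of GeePS): check that a CUDA directory
# contains an executable nvcc. Defaults to /usr/local/cuda.
check_cuda_dir() {
    cuda_dir="${1:-/usr/local/cuda}"
    if [ -x "$cuda_dir/bin/nvcc" ]; then
        echo "found nvcc in $cuda_dir/bin"
    else
        echo "nvcc not found under $cuda_dir" >&2
        return 1
    fi
}
```

Calling check_cuda_dir with no argument checks /usr/local/cuda. A non-zero exit status suggests that CUDA is missing or installed elsewhere.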

After installing the dependencies, you can build GeePS by running this command from the geeps root directory:

scons -j8

You can then build (our slightly modified) Caffe by first entering the apps/caffe directory and then running make -j8:

cd apps/caffe
make -j8

Caffe's CIFAR-10 example on two machines

With GeePS, you can run Caffe distributed across a cluster of machines. This section walks through the steps to run Caffe's CIFAR-10 example on two machines.

All commands in this section are executed from the apps/caffe directory:

cd apps/caffe

You will first need to prepare a machine file as examples/cifar10/2parts/machinefile, with each line being the host name of one machine. Since we use two machines in this example, this machine file should have two lines, such as:

host0
host1

We will use pdsh to launch commands on those machines over ssh, so please make sure that you can ssh to each of them without a password.
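
Before launching anything, it can help to confirm that every host in the machine file accepts a non-interactive ssh login. The following is a minimal sketch of such a check. The function names list_hosts and check_ssh_hosts are hypothetical and not part of GeePS, and the 5-second connect timeout is an arbitrary choice:

```shell
# Hypothetical helpers (not part of GeePS).

# Print the hosts in a machine file, skipping blank lines.
list_hosts() {
    grep -v '^[[:space:]]*$' "$1"
}

# Try a non-interactive ssh login to every host in the machine file.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_hosts() {
    status=0
    for host in $(list_hosts "$1"); do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ok: $host"
        else
            echo "FAILED: $host" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, check_ssh_hosts examples/cifar10/2parts/machinefile should print one "ok" line per host. One way to exercise pdsh itself by hand is pdsh -R ssh -w ^examples/cifar10/2parts/machinefile hostname, where the ^ prefix tells pdsh to read the host list from a file. This is not necessarily how the GeePS scripts invoke pdsh.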

Once your machine file is ready, run the following commands to download and prepare the CIFAR-10 dataset:

./data/cifar10/get_cifar10.sh
./examples/cifar10/2parts/create_cifar10_pdsh.sh

Our script partitions the dataset into two parts, one for each machine. You can then train an Inception network on the partitioned data with this command:

./examples/cifar10/2parts/train_inception.sh

Please look at our wiki for more details. Happy training!

Reference Paper

Henggang Cui, Hao Zhang, Gregory R. Ganger, Phillip B. Gibbons, and Eric P. Xing. GeePS: Scalable Deep Learning on Distributed GPUs with a GPU-Specialized Parameter Server. In ACM European Conference on Computer Systems, 2016 (EuroSys'16).
