DeepPlan

Title: Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access

1. Experimental Environment

1.1 Hardware

AWS P3.8xlarge instance
GPU: NVIDIA V100 (16GB) x 4
Memory: 244GB DDR4 DRAM
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
NVLink 2.0
PCIe 3.0

For the EuroSys '23 Artifact Evaluation Committee: if you do not have a machine that meets these requirements, we can provide the AWS instance we used. Please let us know through the HotCRP portal.

1.2 Software requirements

Operating system: Ubuntu 18.04
CUDA v11.3
CuDNN v8.2.1
ProtoBuf v3.11.4
Boost v1.65
TBB (Threading Building Blocks) v2017_U7
PyTorch v1.9
Matplotlib v3.3.4 (for generating graphs)

2. Build software components

2.1 Dependent packages

build-essential

$ sudo apt update
$ sudo apt install build-essential

C++ Library on Ubuntu

$ sudo apt-get install libtbb-dev libboost1.65-all-dev

CUDA Toolkit v11.3 & CuDNN v8.2.1

DeepPlan works with the PyTorch DL framework. PyTorch in turn depends on CUDA and CuDNN, so install both before building it.

To install the CUDA Toolkit, see this link: Download Installer for Linux Ubuntu 18.04 x86_64

To install the CuDNN Library, see this link: Installation Guide and CuDNN Archive

ProtoBuf v3.11.4

DeepPlan uses the ProtoBuf library to serialize and deserialize plans, so ProtoBuf is required to build DeepPlan. To install ProtoBuf, see the following link: https://github.com/protocolbuffers/protobuf/blob/main/src/README.md
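
Before moving on to PyTorch, it can help to confirm that the command-line tools installed in this section are on the PATH. The following is a minimal sketch, not part of DeepPlan: `check_tools` is a hypothetical helper, and the tool names in the usage note (`gcc`, `g++`, and `make` from build-essential, `nvcc` from the CUDA Toolkit, `protoc` from ProtoBuf) are assumptions about what those packages install.

```shell
# check_tools TOOL...
# Hypothetical helper (not part of DeepPlan).
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools gcc g++ make nvcc protoc` prints a `found:` or `missing:` line for each tool. The check only looks for executables; it does not verify versions or libraries such as CuDNN, TBB, and Boost.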

2.2 PyTorch

DeepPlan requires a modified PyTorch (v1.9) framework. To simplify applying the code changes, we provide a patch file for DeepPlan. The following commands apply the patch to PyTorch v1.9.0.

$ cd $HOME
$ # Let's first clone the DeepPlan repository and set the path
$ git clone https://github.com/csl-ajou/DeepPlan/
$ DEEPPLAN_HOME=$HOME/DeepPlan
$
$ # Let's download the PyTorch v1.9.0 package and set the path
$ git clone --recursive https://github.com/pytorch/pytorch -b v1.9.0
$ PYTORCH_HOME=$HOME/pytorch
$
$ cd $PYTORCH_HOME
$ patch -p1 < $DEEPPLAN_HOME/pytorch.patch
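
A failed or partial `patch` run can leave the PyTorch tree in a state that is awkward to recover, for instance if the checkout is not exactly v1.9.0. As a hedged sketch, the helper below first tries the patch with GNU patch's `--dry-run` option and applies it only when the trial succeeds. `apply_patch_checked` is a hypothetical name, not something the DeepPlan repository provides, and the patch path is assumed to be absolute.

```shell
# apply_patch_checked SRC_DIR PATCH_FILE
# Hypothetical helper (not part of DeepPlan). PATCH_FILE must be an
# absolute path, because the helper changes into SRC_DIR before reading it.
apply_patch_checked() {
    src_dir=$1
    patch_file=$2
    (
        cd "$src_dir" || exit 1
        # Trial run: --dry-run reports problems without modifying any file;
        # --forward keeps patch from prompting if the patch looks already applied.
        if patch -p1 --forward --dry-run < "$patch_file" >/dev/null 2>&1; then
            patch -p1 --forward < "$patch_file"
        else
            echo "patch does not apply cleanly in $src_dir; nothing was changed" >&2
            exit 1
        fi
    )
}
```

With the variables set above, the call would be `apply_patch_checked "$PYTORCH_HOME" "$DEEPPLAN_HOME/pytorch.patch"`.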

After applying the patch, compile PyTorch.

$ python3 setup.py install

In addition to PyTorch, install the required pip modules with the command below, run from DeepPlan's home directory.

$ cd $DEEPPLAN_HOME
$ pip3 install -r requirements.txt

2.3 DeepPlan

After successfully patching and building the PyTorch framework, you can build DeepPlan, which provides the tool that generates inference execution plans and the DL server prototype.

$ cd $DEEPPLAN_HOME
$ mkdir build
$ cd build
$ cmake -DCMAKE_PREFIX_PATH=$PYTORCH_HOME ..
$ make

3. Setup execution plans

Before running a model, you need to create an execution plan for it. In this tutorial, the target model is ResNet50. The Python module plan.py already imports the pre-trained models evaluated in the paper, so you only need to pass the name of the model.

# Create Plan
$ cd $DEEPPLAN_HOME
$ mkdir -p plan_repo
$ python3 plan.py -m resnet50 -p plan_repo
# The plan generated by this command is saved in the plan_repo directory
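
If you need plans for several models, the same command can be wrapped in a loop. The sketch below is a convenience built on assumptions, not part of DeepPlan: `generate_plans` and the `PLAN_RUNNER` variable are hypothetical names, and only the `-m` and `-p` options shown above are used. Model names other than `resnet50` are not listed here, so check plan.py for the names it accepts.

```shell
# generate_plans REPO_DIR MODEL...
# Hypothetical helper (not part of DeepPlan). Run it from $DEEPPLAN_HOME,
# since plan.py is referenced by a relative path.
# PLAN_RUNNER is an assumed hook: it defaults to python3; set it to "echo"
# to print the commands instead of running them.
generate_plans() {
    repo=$1
    shift
    mkdir -p "$repo" || return 1
    for model in "$@"; do
        if ! "${PLAN_RUNNER:-python3}" plan.py -m "$model" -p "$repo"; then
            echo "plan generation failed for model: $model" >&2
            return 1
        fi
    done
}
```

Running `PLAN_RUNNER=echo generate_plans plan_repo resnet50` previews the command, and `generate_plans plan_repo resnet50` executes it. The loop stops at the first model whose plan cannot be generated.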

If you want to look at the generated plans (Table 3 in the paper), follow the link below.

Plans

4. Run benchmarks

Once DeepPlan generates the execution plan for a given model, you can run model inference with the DeepPlan engine through the commands below, from DeepPlan's home directory. Here, we use ResNet50 as an example. In this section, we describe how to run the five execution methods explained in our paper: Baseline (on-demand), PipeSwitch, DeepPlan (DHA), DeepPlan (PT), and DeepPlan (PT+DHA).

Before running model inference, you have to set the PLAN_REPO environment variable, which specifies where the plans are stored.

# The plan repository must be the same path that was specified when creating the plan above
$ export PLAN_REPO=$DEEPPLAN_HOME/plan_repo
$ cd $DEEPPLAN_HOME
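
The benchmark commands depend on both PLAN_REPO and the binary built in Section 2.3, so a small preflight check can catch a missing variable or an empty plan directory before a run fails. This is a minimal sketch: `preflight` is a hypothetical helper, and treating an empty plan directory as an error is an assumption of the sketch, not something DeepPlan itself enforces.

```shell
# preflight BENCHMARK_PATH
# Hypothetical helper (not part of DeepPlan). Checks that PLAN_REPO points at
# a non-empty directory and that the benchmark binary exists and is executable.
preflight() {
    bench=$1
    if [ -z "${PLAN_REPO:-}" ]; then
        echo "PLAN_REPO is not set" >&2
        return 1
    fi
    if [ ! -d "$PLAN_REPO" ]; then
        echo "PLAN_REPO is not a directory: $PLAN_REPO" >&2
        return 1
    fi
    # Assumption: a plan repository with no entries means no plan was generated.
    if [ -z "$(ls -A "$PLAN_REPO")" ]; then
        echo "PLAN_REPO is empty: $PLAN_REPO" >&2
        return 1
    fi
    if [ ! -x "$bench" ]; then
        echo "benchmark binary not found or not executable: $bench" >&2
        return 1
    fi
    echo "ready: plans in $PLAN_REPO, binary at $bench"
}
```

One way to use it is to chain it in front of a run, for example `preflight ./build/benchmark && ./build/benchmark -m resnet50 -e demand`.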

Baseline (on-demand)

$ ./build/benchmark -m resnet50 -e demand