MIGProfiler is a toolkit for benchmarking NVIDIA Multi-Instance GPU (MIG) technology. It profiles multiple deep learning training and inference tasks on MIG GPUs.
MIGProfiler features:
- Supports many deep learning tasks and open-source models across a variety of benchmark types
- Presents comprehensive benchmark results
- Easy to use with a configuration file (WIP)
The project is under rapid development! Please check our benchmark website and join us!
Coming soon!
Requirements:
- PyTorch with CUDA
- OpenCV
- Sanic
- Transformers
- Tqdm
- Prometheus client
# create virtual environment
conda create -n mig-perf python=3.8
conda activate mig-perf
# install required packages
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c conda-forge opencv
pip install transformers
pip install sanic tqdm prometheus_client
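After installing, it can help to confirm that every listed requirement is visible from the new environment. The snippet below is a minimal sketch, not part of MIGProfiler; the module names are the usual import names for the packages above (OpenCV imports as `cv2`, PyTorch as `torch`), and `find_missing` is a hypothetical helper.

```python
import importlib.util

# Import names for the packages listed under Requirements.
# OpenCV is imported as "cv2" and PyTorch as "torch".
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "cv2",
    "sanic",
    "transformers",
    "tqdm",
    "prometheus_client",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only checks that the packages can be found; it does not verify that PyTorch was built with CUDA support.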
WIP
WIP
You can easily profile workloads on a MIG GPU. Below are some common deep learning tasks to play with.
First, create a 1g.10gb MIG device:
# enable MIG
sudo nvidia-smi -i 0 -mig 1
# create MIG instance
sudo nvidia-smi mig -cgi 1g.10gb -C
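To confirm that the MIG device was created, one option is to parse the output of `nvidia-smi -L`. The sketch below assumes the line layout printed by recent drivers (`MIG <profile> Device <n>: (UUID: MIG-...)`), which may differ on other driver versions; `parse_mig_devices` and `list_mig_devices` are hypothetical helpers, not part of MIGProfiler.

```python
import re
import subprocess

# Matches lines such as "  MIG 1g.10gb     Device  0: (UUID: MIG-...)".
# The layout is an assumption based on recent driver output.
MIG_LINE = re.compile(
    r"MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):\s+\(UUID:\s+(?P<uuid>[^)]+)\)"
)


def parse_mig_devices(listing):
    """Extract (profile, index, uuid) tuples from `nvidia-smi -L` output."""
    return [
        (m.group("profile"), int(m.group("index")), m.group("uuid"))
        for m in MIG_LINE.finditer(listing)
    ]


def list_mig_devices():
    """Run `nvidia-smi -L` and return the MIG devices it reports."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_devices(result.stdout)
```

Calling `list_mig_devices()` on the benchmark machine should return one entry with the `1g.10gb` profile if the commands above succeeded.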
Start the DCGM metrics exporter:
docker run -d --rm --gpus all --net mig_perf -p 9400:9400 \
-v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
--name dcgm_exporter --cap-add SYS_ADMIN nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
-c 500 -f /etc/dcgm-exporter/customized.csv -d f
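Once the exporter is running, its metrics can be read over HTTP in the Prometheus text format. The following is a minimal sketch using only the standard library; it assumes the exporter is reachable at `http://localhost:9400/metrics` (port 9400 is published by the command above), and `parse_metrics` and `fetch_metrics` are hypothetical helpers rather than MIGProfiler code. The parser handles simple sample lines only and is not a full implementation of the exposition format.

```python
import re
import urllib.request

# Assumed endpoint: the container above publishes port 9400 on the host.
METRICS_URL = "http://localhost:9400/metrics"

# One sample per line: metric name, optional {labels}, then a numeric value.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)"
)
LABEL_PAIR = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse Prometheus text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_PAIR.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Download the exporter's metrics page and parse it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Which metric names appear depends on the CSV file mounted into the container, so filter the result of `fetch_metrics()` by the names listed there.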
Start profiling:
cd mig_perf/profiler
export PYTHONPATH=$PWD
python train/train_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
Remember to disable MIG after the benchmark finishes:
# destroy compute instances, then GPU instances, before turning MIG mode off
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi
sudo nvidia-smi -i 0 -mig 0
Start the DCGM metrics exporter:
docker run -d --rm --gpus all --net mig_perf -p 9400:9400 \
-v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
--name dcgm_exporter --cap-add SYS_ADMIN nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
-c 500 -f /etc/dcgm-exporter/customized.csv -d f
Start profiling:
cd mig_perf/profiler
export PYTHONPATH=$PWD
python client/block_infernece_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
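To profile several configurations in a row, the commands above can be generated programmatically. This is a rough sketch that uses only the flags shown in this README (`--bs`, `--model`, `--num_batches`, `--mig-device-id`); `build_commands` and `run_sweep` are hypothetical helpers, and the batch sizes in the usage note are arbitrary examples. It assumes it is run from `mig_perf/profiler` with `PYTHONPATH` set as above.

```python
import itertools
import subprocess
import sys


def build_commands(script, models, batch_sizes, num_batches=500, mig_device_id=0):
    """Build one command line per (model, batch size) combination."""
    commands = []
    for model, bs in itertools.product(models, batch_sizes):
        commands.append([
            sys.executable,
            script,
            f"--bs={bs}",
            f"--model={model}",
            f"--num_batches={num_batches}",
            f"--mig-device-id={mig_device_id}",
        ])
    return commands


def run_sweep(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_sweep(build_commands("train/train_cv.py", ["resnet50"], [16, 32, 64]))` would run the training benchmark three times with different batch sizes. Passing the inference script path instead sweeps the inference benchmark.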
See more benchmark experiments in ./exp.
- in notebook
- in Prometheus (under improvement)
@article{zhang2022migperf,
title={MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs},
author={Zhang, Huaizheng and Li, Yuanming and Xiao, Wencong and Huang, Yizheng and Di, Xing and Yin, Jianxiong and See, Simon and Luo, Yong and Lau, Chiew Tong and You, Yang},
journal={arXiv preprint arXiv:2301.00407},
year={2023}
}
- Yuanming Li
- Huaizheng Zhang
- Yizheng Huang
- Xing Di
Special thanks to Aliyun and NVIDIA AI Tech Center for providing the MIG GPU servers used for benchmarking.
This repository is released under the MIT License.