MIG Profiler

MIGProfiler is a toolkit for benchmarking NVIDIA Multi-Instance GPU (MIG) technology. It profiles multiple deep learning training and inference tasks on MIG GPUs.

MIGProfiler features:

  • 🎨 Support for many deep learning tasks and open-source models across a variety of benchmark types
  • 📈 Comprehensive benchmark results
  • 🐣 Easy setup through a configuration file (WIP)

The project is under rapid development! Please check our benchmark website and join us!

Benchmark Website 📈

Coming soon!

Install 📦

Manual install

Requirements:

  • PyTorch with CUDA
  • OpenCV
  • Sanic
  • Transformers
  • Tqdm
  • Prometheus client

# create virtual environment
conda create -n mig-perf python=3.8
conda activate mig-perf

# install required packages
conda install pytorch torchvision pytorch-cuda=11.6 -c pytorch -c nvidia
conda install -c conda-forge opencv
pip install transformers
pip install sanic tqdm prometheus_client
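
After installing, it can help to confirm that every requirement is importable from the new environment. The following is a minimal sketch, not part of MIGProfiler; the module names are assumptions based on how these packages are usually imported (for example, OpenCV as `cv2`).

```python
import importlib.util

# Import names for the packages listed under Requirements. This mapping is an
# assumption: OpenCV is usually imported as cv2, PyTorch as torch.
REQUIRED_MODULES = [
    "torch", "torchvision", "cv2", "sanic",
    "transformers", "tqdm", "prometheus_client",
]


def find_missing(modules):
    """Return the names in `modules` that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it inside the activated `mig-perf` environment. It only checks that the modules can be located; it does not verify that PyTorch was built with CUDA.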

PyPI install

WIP

Use Docker

WIP

Quick Start 🚚

You can easily profile workloads on a MIG GPU. Below are some common deep learning tasks to try.

1. MIG training benchmark

First, create a 1g.10gb MIG device:

# enable MIG
sudo nvidia-smi -i 0 -mig 1
# create MIG instance
sudo nvidia-smi mig -cgi 1g.10gb -C
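
To confirm that the instance was created, you can list the devices with `nvidia-smi -L` and pick out the MIG entries. The sketch below is illustrative only. It assumes the listing prints each MIG device as `MIG <profile> Device <n>: (UUID: ...)`, which may differ between driver versions.

```python
import re
import subprocess

# Assumed line shape: "  MIG 1g.10gb     Device  0: (UUID: MIG-...)"
MIG_LINE = re.compile(
    r"^\s*MIG\s+(?P<profile>\S+)\s+Device\s+(?P<index>\d+):\s+\(UUID:\s+(?P<uuid>[^)]+)\)"
)


def parse_mig_devices(listing):
    """Extract (profile, index, uuid) tuples from `nvidia-smi -L` output."""
    devices = []
    for line in listing.splitlines():
        match = MIG_LINE.match(line)
        if match:
            devices.append((match["profile"], int(match["index"]), match["uuid"]))
    return devices


def list_mig_devices():
    """Run `nvidia-smi -L` and return the MIG devices it reports."""
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return parse_mig_devices(result.stdout)
```

An empty list from `list_mig_devices()` suggests that MIG mode is off or that no instance exists yet.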

Start the DCGM metrics exporter:

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f
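
Once the exporter is running, a quick way to check that it is serving data is to fetch its metrics page and look at the DCGM samples. This is a rough sketch that makes three assumptions: the exporter is reachable at `localhost:9400` (as published by `-p 9400:9400`), it serves the usual Prometheus `/metrics` path, and samples carry no trailing timestamp. `parse_samples` and `fetch_samples` are hypothetical helpers, not part of MIGProfiler.

```python
import urllib.request

# Assumed endpoint: port 9400 published on localhost, standard /metrics path.
METRICS_URL = "http://localhost:9400/metrics"


def parse_samples(text, prefix="DCGM_"):
    """Return (metric_with_labels, value) pairs from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, "# HELP" / "# TYPE" comments, and other metrics.
        if not line or line.startswith("#") or not line.startswith(prefix):
            continue
        # Assumes no timestamp, so the value is the last space-separated field.
        name_and_labels, _, value = line.rpartition(" ")
        samples.append((name_and_labels, float(value)))
    return samples


def fetch_samples(url=METRICS_URL):
    """Download the exporter's current metrics and parse them."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_samples(response.read().decode("utf-8"))
```

Which metric names appear depends on the fields enabled in `dcp-metrics-included.csv`.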

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python train/train_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0
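
To repeat the training benchmark over several batch sizes, the command above can be wrapped in a small driver. This is a sketch under stated assumptions: it reuses only the flags shown above, it must be started from `mig_perf/profiler` with `PYTHONPATH` set as above, and `build_train_command` and `run_sweep` are hypothetical helper names.

```python
import subprocess
import sys


def build_train_command(batch_size, model="resnet50", num_batches=500, mig_device_id=0):
    """Build the train_cv.py command line for one benchmark run."""
    return [
        sys.executable, "train/train_cv.py",
        f"--bs={batch_size}",
        f"--model={model}",
        f"--num_batches={num_batches}",
        f"--mig-device-id={mig_device_id}",
    ]


def run_sweep(batch_sizes, **kwargs):
    """Run the training benchmark once per batch size, stopping on the first failure."""
    for batch_size in batch_sizes:
        subprocess.run(build_train_command(batch_size, **kwargs), check=True)
```

For example, `run_sweep([1, 8, 32])` would launch three runs one after another; the batch sizes here are arbitrary.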

Remember to disable MIG after the benchmark finishes:

# destroy compute instances, then GPU instances
sudo nvidia-smi mig -i 0 -dci
sudo nvidia-smi mig -i 0 -dgi
# disable MIG
sudo nvidia-smi -i 0 -mig 0
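
Since it is easy to forget the teardown, one option is to wrap setup and cleanup in a Python context manager so MIG is disabled even when a benchmark crashes. The following is a minimal sketch, not part of MIGProfiler. `mig_instance` is a hypothetical helper, and the destroy commands go through the `mig` subcommand, which is where `nvidia-smi` documents `-dci` and `-dgi`.

```python
import subprocess
from contextlib import contextmanager


def _run(command):
    """Default runner: execute a command and raise if it fails."""
    subprocess.run(command, check=True)


@contextmanager
def mig_instance(gpu_id=0, profile="1g.10gb", run=_run):
    """Enable MIG and create one instance, then tear everything down on exit."""
    gpu = str(gpu_id)
    run(["sudo", "nvidia-smi", "-i", gpu, "-mig", "1"])
    try:
        run(["sudo", "nvidia-smi", "mig", "-i", gpu, "-cgi", profile, "-C"])
        yield
    finally:
        # Destroy compute instances, then GPU instances, then disable MIG mode.
        # With the default runner, teardown stops at the first failing command.
        run(["sudo", "nvidia-smi", "mig", "-i", gpu, "-dci"])
        run(["sudo", "nvidia-smi", "mig", "-i", gpu, "-dgi"])
        run(["sudo", "nvidia-smi", "-i", gpu, "-mig", "0"])
```

Code placed inside `with mig_instance():` runs while the instance exists. The `run` parameter makes it possible to pass a fake runner for dry runs.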

2. MIG inference benchmark

Start the DCGM metrics exporter:

docker run -d --rm --gpus all --net mig_perf -p 9400:9400  \
    -v "${PWD}/mig_perf/profiler/client/dcp-metrics-included.csv:/etc/dcgm-exporter/customized.csv" \
    --name dcgm_exporter --cap-add SYS_ADMIN   nvcr.io/nvidia/k8s/dcgm-exporter:2.4.7-2.6.11-ubuntu20.04 \
    -c 500 -f /etc/dcgm-exporter/customized.csv -d f

Start profiling:

cd mig_perf/profiler
export PYTHONPATH=$PWD
python client/block_infernece_cv.py --bs=32 --model=resnet50 --num_batches=500 --mig-device-id=0

See more benchmark experiments in ./exp.

3. Visualize

  • In a notebook
  • In Prometheus (under improvement)

Cite Us 🌱

@article{zhang2022migperf,
  title={MIGPerf: A Comprehensive Benchmark for Deep Learning Training and Inference Workloads on Multi-Instance GPUs},
  author={Zhang, Huaizheng and Li, Yuanming and Xiao, Wencong and Huang, Yizheng and Di, Xing and Yin, Jianxiong and See, Simon and Luo, Yong and Lau, Chiew Tong and You, Yang},
  journal={arXiv preprint arXiv:2301.00407},
  year={2023}
}

Contributors 👥

  • Yuanming Li
  • Huaizheng Zhang
  • Yizheng Huang
  • Xing Di

Acknowledgement

Special thanks to Aliyun and NVIDIA AI Tech Center for providing the MIG GPU servers used for benchmarking.

License

This repository is open-sourced under the MIT License.
