MLPerf™ Storage Benchmark Suite

MLPerf Storage is a benchmark suite to characterize the performance of storage systems that support machine learning workloads.

Overview

This section describes how to use the MLPerf™ Storage Benchmark to measure the performance of a storage system supporting a compute cluster running AI/ML training tasks.

This benchmark attempts to balance two goals:

  1. Comparability between benchmark submissions to enable decision making by the AI/ML Community.
  2. Flexibility to enable experimentation and to show off unique storage system features that will benefit the AI/ML Community.

To that end we have defined two classes of submissions: CLOSED and OPEN.

CLOSED represents a level playing field where all results are comparable across submissions. CLOSED explicitly forfeits flexibility in order to enable easy comparability. Since the benchmark supports both PyTorch and TensorFlow data formats, and those formats apply very different loads to the storage system, cross-format comparisons are not appropriate, even among CLOSED submissions. Only comparisons between CLOSED PyTorch runs, or between CLOSED TensorFlow runs, are valid. As data formats beyond PyTorch and TensorFlow are added to the benchmark, that categorization will grow.

OPEN allows more flexibility to tune and change both the benchmark and the storage system configuration to show off new approaches or new features that will benefit the AI/ML Community. OPEN explicitly forfeits comparability to allow showcasing innovation.

Benchmark output metric

For each workload, the benchmark output metric is accelerator under-utilization (AUU), where lower is better. The total ideal compute time is derived from the batch size, total dataset size, number of simulated accelerators, and sleep time:

total_compute_time = (records_per_file * total_files) / simulated_accelerators / batch_size * sleep_time

AUU is then computed as:

AUU = (total_benchmark_running_time - total_compute_time) / total_compute_time

Note that the sleep time was determined by running the workloads, including the compute step, on real hardware, and is dependent on the accelerator type. In this preview package we include sleep times for NVIDIA V100 GPUs, as measured in an NVIDIA DGX-1 system.
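
As an illustration, here is a minimal Python sketch of the AUU calculation described above; every input value is a hypothetical placeholder, not a number from a real run.

# Minimal sketch of the AUU calculation (hypothetical inputs only).
records_per_file = 1                 # samples stored in each file (hypothetical)
total_files = 1000                   # files in the dataset (hypothetical)
simulated_accelerators = 8           # number of simulated accelerators (hypothetical)
batch_size = 4                       # per-accelerator batch size (hypothetical)
sleep_time = 1.36                    # emulated per-batch compute time, in seconds (hypothetical)

# Ideal compute time, i.e. the run time if storage never stalled the accelerators.
total_compute_time = (records_per_file * total_files) / simulated_accelerators / batch_size * sleep_time

total_benchmark_running_time = 47.5  # measured wall-clock time of the run, in seconds (hypothetical)

# AUU = 0 means the storage system kept the accelerators fully busy; lower is better.
auu = (total_benchmark_running_time - total_compute_time) / total_compute_time
print(f"total_compute_time = {total_compute_time:.2f} s, AUU = {auu:.3f}")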

In addition to AUU, submissions are expected to report details such as the number of MPI processes run on the DLIO host, as well as the amount of main memory on the DLIO host.

Future work

In a future version of the benchmark, the MLPerf Storage WG plans to add support for the “data preparation” phase of AI/ML workloads, which we believe places a significant load on a storage system and is not well represented by existing AI/ML benchmarks. The current version only requires that a static copy of the dataset exist on the storage system before the start of the run.

In a future version of the benchmark, the MLPerf Storage WG plans to add support for benchmarking a storage system while running more than one MLPerf Storage workload at the same time (i.e., more than one training job type, such as U-Net3D and a recommender, running concurrently). The current version requires that a submission include only one such job type.

In a future version of the benchmark, we aim to include sleep times for different accelerator types, including different types of GPUs and other ASICs.

Installation

Install dependencies using your system package manager.

  • mpich for MPI package
  • sysstat for iostat package

For example, when running on Ubuntu:

sudo apt-get install mpich sysstat

Clone the latest release from the MLCommons Storage repository and install the Python dependencies.

git clone -b v0.5-rc0 --recurse-submodules https://github.com/mlcommons/storage.git
cd storage
pip3 install -r dlio_benchmark/requirements.txt

The working directory structure is as follows

|---storage
       |---benchmark.sh
       |---dlio_benchmark
       |---storage-conf
           |---workload (contains the configs of all workloads)
               |---unet3d.yaml
               |---bert.yaml

The benchmark simulation is performed through the dlio_benchmark code, a benchmark suite for emulating the I/O patterns of deep learning workloads. dlio_benchmark is currently included as a submodule of this MLPerf Storage repo. The DLIO configuration of each workload is specified through a YAML file; the configs of all MLPerf Storage workloads are in the storage-conf folder. benchmark.sh is a wrapper script which launches dlio_benchmark to run the MLPerf Storage workloads.

./benchmark.sh -h

Usage: ./benchmark.sh [datagen/run/configview/reportgen] [options]
Script to launch the MLPerf Storage benchmark.

Configuration

The benchmark suite consists of 3 distinct phases:

  1. Synthetic data is generated based on the workload requested by the user.
./benchmark.sh datagen -h

Usage: ./benchmark.sh datagen [options]
Generate benchmark dataset based on the specified options.


Options:
  -h, --help			Print this message
  -c, --category		Benchmark category to be submitted. Possible options are 'closed'(default)
  -w, --workload		Workload dataset to be generated. Possible options are 'unet3d', 'bert'
  -n, --num-parallel		Number of parallel jobs used to generate the dataset
  -r, --results-dir		Location to the results directory. Default is ./results/workload.model/DATE-TIME
  -p, --param			DLIO param when set, will override the config file value

Example:

To generate training data for the unet3d workload in the unet3d_data directory, split across 10 subfolders, using 8 parallel jobs:

./benchmark.sh datagen --workload unet3d --num-parallel 8 --param dataset.num_subfolders_train=10 --param dataset.data_folder=unet3d_data
  2. The benchmark is run on the generated data. Device stats are collected continuously using the iostat profiler during the benchmark run.
./benchmark.sh run -h

Usage: ./benchmark.sh run [options]
Run benchmark on the generated dataset based on the specified options.


Options:
  -h, --help			Print this message
  -c, --category		Benchmark category to be submitted. Possible options are 'closed'(default)
  -w, --workload		Workload to be run. Possible options are 'unet3d', 'bert'
  -g, --accelerator-type	Simulated accelerator type used for the benchmark. Possible options are 'v100-32gb'(default)
  -n, --num-accelerators	Simulated number of accelerators of same accelerator type
  -r, --results-dir		Location to the results directory. Default is ./results/workload.model/DATE-TIME
  -p, --param			DLIO param when set, will override the config file value

Example:

To run the benchmark on the unet3d workload, with data located in the unet3d_data directory, using 4 accelerators and writing results to the unet3d_results directory:

./benchmark.sh run --workload unet3d --num-accelerators 4 --results-dir unet3d_results --param dataset.data_folder=unet3d_data
  3. Reports are generated from the benchmark results.
./benchmark.sh reportgen -h

Usage: ./benchmark.sh reportgen [options]
Generate a report from the benchmark results.


Options:
  -h, --help			Print this message
  -r, --results-dir		Location to the results directory

Workloads

Currently, the storage benchmark suite supports benchmarking of 3 deep learning workloads

  • Image segmentation using U-Net3D model (unet3d)
  • Natural language processing using BERT model (bert)
  • Recommendation using DLRM model (TODO)

U-Net3D Workload

Generate data for the benchmark run

./benchmark.sh datagen --workload unet3d --num-parallel 8

Flush the filesystem caches before the benchmark run.

sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

Run the benchmark.

./benchmark.sh run --workload unet3d --num-accelerators 8

All results will be stored in the results/unet3d/$DATE-$TIME folder, or in the directory specified with the --results-dir (or -r) argument. To generate the final report, run:

./benchmark.sh reportgen --results-dir results/unet3d/$DATE-$TIME

This will generate DLIO_$model_report.txt in the output folder.

BERT Workload

Generate data for the benchmark run

./benchmark.sh datagen --workload bert --num-parallel 8

Flush the filesystem caches before the benchmark run.

sudo sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

Run the benchmark

./benchmark.sh run --workload bert --num-accelerators 8

All results will be stored in the results/bert/$DATE-$TIME folder, or in the directory specified with the --results-dir (or -r) argument. To generate the final report, run:

./benchmark.sh reportgen -r results/bert/$DATE-$TIME

This will generate DLIO_$model_report.txt in the output folder.

DLRM Workload

To be added

Parameters

The table below lists the configurable parameters for the benchmark.

Parameter                      Description                                                 Default

Dataset params
dataset.num_files_train        Number of files for the training set                       --
dataset.num_subfolders_train   Number of subfolders in which the training set is stored   0
dataset.data_folder            The path where the dataset is stored                       --
dataset.keep_files             Whether to keep the dataset files after the run            True

Reader params
reader.read_threads            Number of threads used to load the data                    --
reader.computation_threads     Number of threads used to preprocess the data (BERT only)  --
reader.prefetch_size           Number of batches to prefetch                              0

Checkpoint params
checkpoint.checkpoint_folder   The folder in which to save the checkpoints                --

Storage params
storage.storage_root           The storage root directory                                 ./
storage.storage_type           The storage type                                           local_fs

Time to completion

Here we provide the expected time to completion for each workload.

  • UNet3D
Total time = (num_files_train * num_samples_per_file) / batch_size * computation_time / num_gpus * epochs
  • BERT
Total time = computation_time * total_training_steps
Parameter              UNet3D    BERT
num_files_train        168       500
num_samples_per_file   1         313532
batch_size             4         48
computation_time       1.3604    0.968
epochs / steps         10        500
Time (lower bound)     571.37    4840.00

For UNet3D, the total time scales with dataset_scale_factor/num_gpus. For BERT, it remains the same.
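
As a quick check, the following Python sketch plugs the UNet3D values from the table into the formula above. It assumes a single simulated GPU (num_gpus = 1) and that computation_time is expressed in seconds; both are our assumptions, not values stated by the benchmark.

# Worked example of the UNet3D lower-bound time, using the table values above.
num_files_train = 168
num_samples_per_file = 1
batch_size = 4
computation_time = 1.3604   # per-batch compute (sleep) time; assumed to be in seconds
num_gpus = 1                # assumed: a single simulated accelerator
epochs = 10

total_time = (num_files_train * num_samples_per_file) / batch_size * computation_time / num_gpus * epochs
print(f"UNet3D lower-bound time: {total_time:.2f}")   # prints 571.37, matching the table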

Releases

v0.5-rc0 (2022-02-03)

First MLPerf Storage benchmark preview release
