GBM-Benchmarks

A GBM benchmark suite for evaluating the speed of XGBoost GPU on multi-GPU systems with large datasets.

This benchmark is designed to run on an AWS p3.16xlarge instance with 8 V100 GPUs. We recommend the Deep Learning Base AMI (Ubuntu) and at least 150 GB of storage.

Requirements

  • CUDA 9.0
  • Python 3
  • scikit-learn, pandas, NumPy
  • Kaggle CLI with a valid API token

Usage

Install XGBoost, LightGBM and CatBoost:

sh install_gbm.sh

Run the benchmarks:

python3 benchmarks.py

It can be useful to run the benchmarks with a small number of rows/rounds to quickly check everything is working:

python3 benchmarks.py --rows 100 --num_rounds 10

Benchmark parameters:

usage: benchmark.py [-h] [--rows ROWS] [--num_rounds NUM_ROUNDS]
                    [--datasets DATASETS] [--algs ALGS]

optional arguments:
  -h, --help            show this help message and exit
  --rows ROWS           Max rows to benchmark for each dataset. (default:
                        None)
  --num_rounds NUM_ROUNDS
                        Boosting rounds. (default: 500)
  --datasets DATASETS   Datasets to run. (default:
                        YearPredictionMSD,Synthetic,Higgs,Cover
                        Type,Bosch,Airline,)
  --algs ALGS           Boosting algorithms to run. (default: xgb-cpu-
                        hist,xgb-gpu-hist,lightgbm-cpu,lightgbm-gpu,cat-
                        cpu,cat-gpu)
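The options above could be declared roughly as follows. This is a minimal sketch reconstructed from the help text, not the actual contents of benchmarks.py; the `build_parser` and `split_csv` helper names are hypothetical, and the argument types are assumptions.

```python
import argparse


def build_parser():
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help
    # string, matching the format of the help text shown above.
    parser = argparse.ArgumentParser(
        formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument("--rows", type=int, default=None,
                        help="Max rows to benchmark for each dataset.")
    parser.add_argument("--num_rounds", type=int, default=500,
                        help="Boosting rounds.")
    parser.add_argument(
        "--datasets",
        default="YearPredictionMSD,Synthetic,Higgs,Cover Type,Bosch,Airline,",
        help="Datasets to run.")
    parser.add_argument(
        "--algs",
        default="xgb-cpu-hist,xgb-gpu-hist,lightgbm-cpu,lightgbm-gpu,"
                "cat-cpu,cat-gpu",
        help="Boosting algorithms to run.")
    return parser


def split_csv(value):
    # Split a comma-separated option and drop empty entries, such as the
    # one produced by the trailing comma in the default dataset list.
    return [item.strip() for item in value.split(",") if item.strip()]


args = build_parser().parse_args(["--rows", "100", "--num_rounds", "10"])
print(args.rows, args.num_rounds, split_csv(args.datasets))
```

Passing comma-separated strings keeps the command line short. For example, `--algs xgb-gpu-hist,cat-gpu` would select only the two GPU algorithms under this sketch.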

Datasets

Datasets are loaded using ml_dataset_loader. Each dataset is downloaded automatically the first time it is needed and cached for subsequent runs, so allow extra time for downloads on the first run.
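The download-once, reuse-afterwards behaviour can be illustrated with a generic caching helper. This sketch does not use or reproduce the ml_dataset_loader API; `cached_load`, `fake_fetch`, and the pickle-based cache layout are all hypothetical and exist only to show the pattern.

```python
import os
import pickle
import tempfile


def cached_load(name, fetch, cache_dir):
    """Return dataset `name`, calling `fetch()` only if no cached copy exists."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name + ".pkl")
    if os.path.exists(path):
        # Cache hit: skip the (slow) download entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: fetch once, then store the result for later runs.
    data = fetch()
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data


calls = []


def fake_fetch():
    # Stand-in for a real download; records how often it is invoked.
    calls.append(1)
    return {"X": [[1.0, 2.0]], "y": [0]}


cache_dir = tempfile.mkdtemp()
first = cached_load("demo", fake_fetch, cache_dir)
second = cached_load("demo", fake_fetch, cache_dir)
print(len(calls))  # the fetch function ran only on the first call
```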

Example results

Run on 7 June 2018

Training time in seconds (rounded to two decimal places):

Algorithm     YearPredictionMSD  Synthetic   Higgs  Cover Type   Bosch  Airline
xgb-cpu-hist             397.27     565.29  470.09      464.05  752.59  1948.26
xgb-gpu-hist              34.26      38.49   34.08      103.39   32.13   144.86
lightgbm-cpu              38.13     421.05  306.98       83.77  250.10   916.04
lightgbm-gpu              80.05     609.48  529.54      126.53  487.15   614.74
cat-cpu                   38.50     436.59  397.02      288.11  242.90  2949.04
cat-gpu                    9.80      35.47   30.15         N/A     N/A   303.36

Evaluation metric (rounded to six decimal places):

Algorithm     YearPredictionMSD  Synthetic     Higgs  Cover Type     Bosch   Airline
Metric                     RMSE       RMSE  Accuracy    Accuracy  Accuracy  Accuracy
xgb-cpu-hist           8.879391  13.610471  0.747435    0.891982  0.994454  0.749430
xgb-gpu-hist           8.879936  13.460577  0.747475    0.892869  0.994424  0.749484
lightgbm-cpu           8.877691  13.585035  0.747380    0.892834  0.994391  0.750491
lightgbm-gpu           8.881752  13.585007  0.747000    0.893058  0.994408  0.749949
cat-cpu                8.994799   9.389984  0.740694    0.851863  0.994416  0.726571
cat-gpu                9.036474   9.399964  0.740618         N/A       N/A  0.727705
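To read the timings as relative speedups, divide the CPU time by the GPU time for each dataset. The sketch below does this for xgb-cpu-hist against xgb-gpu-hist, using the times from the results above rounded to two decimal places; the variable names are illustrative only.

```python
# (xgb-cpu-hist seconds, xgb-gpu-hist seconds) per dataset, taken from the
# example results above and rounded to two decimal places.
xgb_times = {
    "YearPredictionMSD": (397.27, 34.26),
    "Synthetic": (565.29, 38.49),
    "Higgs": (470.09, 34.08),
    "Cover Type": (464.05, 103.39),
    "Bosch": (752.59, 32.13),
    "Airline": (1948.26, 144.86),
}

# Speedup = CPU time / GPU time; values above 1 mean the GPU run was faster.
speedups = {name: cpu / gpu for name, (cpu, gpu) in xgb_times.items()}

for name, speedup in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{name:<18} {speedup:5.1f}x")
```

In this run the GPU histogram method was faster than the CPU one on every dataset, with the smallest gain on Cover Type and the largest on Bosch.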

Scalability test

We test the scalability of multi-GPU XGBoost by training on the airline dataset with 1 to 8 GPUs and timing each run.

python3 scalability.py -h
usage: scalability.py [-h] [--rows ROWS] [--num_rounds NUM_ROUNDS]

optional arguments:
  -h, --help            show this help message and exit
  --rows ROWS           Max rows to benchmark for each dataset. (default:
                        None)
  --num_rounds NUM_ROUNDS
                        Boosting rounds. (default: 500)
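A scalability run of this kind boils down to timing the same training job at each GPU count and comparing against the single-GPU time. The sketch below shows that pattern with a stand-in workload; it is not the code in scalability.py, and `time_training`, `scaling_report`, and `fake_train` are hypothetical names. A real run would replace `fake_train` with a function that trains XGBoost on the given number of GPUs.

```python
import time


def time_training(train, gpu_counts):
    """Call `train(n_gpus)` for each GPU count and return wall-clock seconds."""
    timings = {}
    for n_gpus in gpu_counts:
        start = time.perf_counter()
        train(n_gpus)
        timings[n_gpus] = time.perf_counter() - start
    return timings


def scaling_report(timings):
    """Map GPU count to (speedup over 1 GPU, parallel efficiency)."""
    base = timings[1]  # assumes a single-GPU run is included
    report = {}
    for n_gpus, seconds in sorted(timings.items()):
        speedup = base / seconds
        # Efficiency is 1.0 when speedup grows linearly with GPU count.
        report[n_gpus] = (speedup, speedup / n_gpus)
    return report


def fake_train(n_gpus):
    # Stand-in workload that gets faster with more GPUs.
    time.sleep(0.02 / n_gpus)


timings = time_training(fake_train, [1, 2, 4, 8])
for n_gpus, (speedup, efficiency) in scaling_report(timings).items():
    print(f"{n_gpus} GPU(s): speedup {speedup:.2f}x, efficiency {efficiency:.2f}")
```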

Contributors

ramitchell, noxoomo
