apac-hpc-ai-2019-apacsc13

This repository contains the final source code and job scripts of team GIST for the 2019 APAC HPC/AI Competition.

Only the key source files are uploaded here. The complete set of files is on the NSCC server, and the jobs should be run there.

Login information

  • Login ID: apacsc13

Base code information

1) Framework

  • TensorFlow (from TensorFlow Benchmarks)
    • Version

      commit ID:  4828965154c424bc61a7ec361edb67bb267869f4
      commit date: Thu Apr 11 21:37:22 2019 -0700
      
    • File location on the server:

      /home/users/industry/ai-hpc/apacsc13/benchmarks
      
  • Horovod 0.13.10
  • OpenMPI 3.0.0
  • CUDA 10.0.130

2) Model

  • ResNet-50

3) Dataset

  • ImageNet 2012

    • File location on the server:
      /scratch/users/industry/ai-hpc/apacsc13/ILSVRC2012
      

Optimizations we made

1) Data preprocessing

  • We converted the input data from .jpg files to TFRecords for faster input processing and better accuracy.
    • setDataset.sh
    • build_imagenet_data.pbs
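
As a rough illustration of the conversion step, the sketch below shows how a list of JPEG paths might be divided into TFRecord shards before writing. The shard count, the file naming pattern, and the helper names `shard_name` and `assign_shards` are assumptions made for this example; the actual logic lives in setDataset.sh and build_imagenet_data.pbs on the server.

```python
def shard_name(split, index, num_shards):
    """Return a shard file name such as 'train-00003-of-01024' (assumed pattern)."""
    return "%s-%05d-of-%05d" % (split, index, num_shards)


def assign_shards(filenames, num_shards):
    """Split filenames into num_shards contiguous, nearly equal slices."""
    total = len(filenames)
    shards = []
    for i in range(num_shards):
        start = i * total // num_shards
        end = (i + 1) * total // num_shards
        shards.append(filenames[start:end])
    return shards


# Hypothetical file list, used only to show the split.
files = ["img_%04d.JPEG" % i for i in range(10)]
for i, chunk in enumerate(assign_shards(files, 4)):
    print(shard_name("train", i, 4), len(chunk))
```

Writing many moderately sized shards rather than one file per image lets the input pipeline read large sequential blocks, which is the usual reason TFRecords speed up training input.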

2) Hyperparameter tuning

  • We tuned batch_size, optimizer, num_epochs, and weight_decay to improve both accuracy and throughput (total images/sec).
    • Check the end of the final PBS file.
    • For example,
      python tf_cnn_benchmarks.py --data_format=NCHW --batch_size=256 \
      --model=resnet50 --optimizer=momentum --variable_update=replicated \
      --nodistortions --gradient_repacking=8 --num_gpus=8 \
      --num_epochs=10 --weight_decay=1e-3 --data_dir=${DATA_DIR} --use_fp16 \
      --train_dir=${CKPT_DIR}
      
  • For details, see the official paper.
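
To make the effect of these flags concrete, here is a small sketch of the arithmetic behind them. It assumes that batch_size is applied per GPU (as in tf_cnn_benchmarks), that the training set is the 1,281,167-image ILSVRC2012 split, and that steps are rounded up per epoch; the function names are hypothetical and the step counts are approximations, not values taken from the job logs.

```python
import math

IMAGENET_TRAIN_IMAGES = 1281167  # size of the ILSVRC2012 training split


def training_plan(batch_size_per_gpu, num_gpus, num_epochs,
                  num_images=IMAGENET_TRAIN_IMAGES):
    """Estimate the global batch size and the number of training steps."""
    global_batch = batch_size_per_gpu * num_gpus
    steps_per_epoch = math.ceil(num_images / global_batch)
    return {
        "global_batch": global_batch,
        "steps_per_epoch": steps_per_epoch,
        "total_steps": steps_per_epoch * num_epochs,
    }


def estimated_seconds(num_epochs, images_per_sec,
                      num_images=IMAGENET_TRAIN_IMAGES):
    """Estimate wall-clock training time from a measured throughput."""
    return num_epochs * num_images / images_per_sec


# Settings from the example command: 256 images per GPU, 8 GPUs, 10 epochs.
plan = training_plan(batch_size_per_gpu=256, num_gpus=8, num_epochs=10)
print(plan)
```

Passing a measured throughput to `estimated_seconds` gives a quick way to check whether a planned run fits inside a job's wall-time limit.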

Results

  • From our experiments, we expect the following settings to give a maximum accuracy of about 93.088 and a throughput of 5436.71 total images/sec. This holds only for the Momentum optimizer.

    • batch_size=256
    • num_epochs=10
    • weight_decay=1e-3
  • In experiments with other optimizers, SGD gave the best performance with all other settings unchanged.

  • However, we could not complete the Horovod-based distributed training code, so we need more practice and effort to complete this competition.

  • For details, see the official paper.

    [Figure: accuracy_by_optimizer]
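
The optimizer comparison described above could be scripted along these lines. This is a minimal sketch, not the team's actual job script: it reuses the flags from the example command, changes only --optimizer, and uses placeholder paths and a hypothetical `build_command` helper.

```python
import shlex

# Flags copied from the example command; only the optimizer varies per run.
BASE_FLAGS = {
    "data_format": "NCHW",
    "batch_size": 256,
    "model": "resnet50",
    "variable_update": "replicated",
    "gradient_repacking": 8,
    "num_gpus": 8,
    "num_epochs": 10,
    "weight_decay": "1e-3",
}
BOOLEAN_FLAGS = ["--nodistortions", "--use_fp16"]


def build_command(optimizer, data_dir, train_dir):
    """Return the argument list for one tf_cnn_benchmarks run."""
    flags = dict(BASE_FLAGS, optimizer=optimizer,
                 data_dir=data_dir, train_dir=train_dir)
    args = ["python", "tf_cnn_benchmarks.py"]
    args += ["--%s=%s" % (name, value) for name, value in flags.items()]
    args += BOOLEAN_FLAGS
    return args


# Placeholder paths; substitute the real dataset and checkpoint directories.
for opt in ("momentum", "sgd"):
    cmd = build_command(opt, "/path/to/data", "/path/to/ckpt_" + opt)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Giving each optimizer its own checkpoint directory keeps the runs from overwriting one another, so their accuracy can be compared afterwards.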

Running the code

  1. Go to
    /ILSVRC2012
    
    on the NSCC GTX-1 server and run
    ./setdataset.sh
    
  2. Go to
    /home/project/21170158/apacsc13/pbsfiles
    
    on the NSCC GTX-1 server and run
    qsub DockerSingleTrain.pbs
    
    or
    qsub DockerHorovod.pbs
    

Authors

See also the list of contributors who participated in this project.

  • s00hyun
