
Comments (3)

yxlao commented on May 21, 2024

Summary

  • Selected the implementation from Google's benchmark scripts for ResNet50.
  • List of XLA ops needed:
    ['add', 'arg', 'bitcast', 'broadcast', 'concatenate', 'constant', 
     'convolution', 'copy', 'divide', 'dot', 'equal', 'exponential', 
     'fusion', 'get', 'greater', 'less', 'log', 'logical', 'map', 'maximum', 
     'multiply', 'negate', 'pad', 'reduce', 'reshape', 'reverse', 'select',
     'subtract', 'transpose', 'tuple']
    
  • Updates from the Aug 8 standup:
    • ResNet 56 on the CIFAR dataset could be achieved first, before ResNet 50 on i1k.

Comparison of the two implementations

Candidate 1: Google's benchmark implementation (link)

This implementation is based on the first ResNet paper and is also called ResNetV1. It includes ResNet 18, 34, 50, 101, and 152.

  • Pros
    • It matches our original requirement of ResNet50 (although we are free to adjust that requirement).
    • It aligns with our benchmark team's repo, which makes comparison easy.
  • Cons
    • It is wrapped in benchmark scripts for many other NN models and needs cleanup.
    • The script is intended for benchmarking; it is not meant to be an exact reproduction of the original paper's results. For example, the optimizer and learning rates could differ.
    • Also, according to our benchmark team (Jing Huang), the convergence and accuracy of the model have not been systematically tested yet, but there are plans to do so in the future.

Candidate 2: Google's model implementation (link)

This implementation is based on the second ResNet paper and is also called ResNetV2. It includes ResNet 32, 110, 164, and 1001.

  • Pros
    • Cleaner implementation; only minimal cleanup is needed.
    • Google has published accuracy results that we can use to check convergence, since this implementation is meant for model training rather than benchmarking.
  • Cons
    • It's not ResNet50.
    • It does not align with our benchmark efforts.

Running and getting the XLA Ops

  • Step 1: Build and install TF with XLA.
  • Step 2: Modify the file to enable XLA.
    • Add config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
      to the create_config_proto() function in tf_cnn_benchmarks.py. For more info, see this doc.
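      As a minimal sketch (assuming the TF 1.x API; everything except the global_jit_level line is hypothetical surrounding code, not the actual function body from tf_cnn_benchmarks.py), the modified function could look like this:

      import tensorflow as tf

      def create_config_proto():
          # Build the session config as the benchmark script does
          # (the other fields here are illustrative assumptions).
          config = tf.ConfigProto()
          config.allow_soft_placement = True
          # The actual change: turn on XLA JIT compilation globally.
          config.graph_options.optimizer_options.global_jit_level = (
              tf.OptimizerOptions.ON_1)
          return config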
  • Step 3: Run resnet model, dumping XLA graph to dot files
    TF_XLA_FLAGS="--xla_generate_hlo_graph=.*" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
    
  • Step 4: Grep the .dot files. There are more than 10,000 .dot files; most of them are intermediate graphs from the compiler passes, but collectively they contain a superset of all the ops used in the model.
    import os
    import re
    
    # get all .dot files
    file_names = []
    for file_name in os.listdir("/tmp"):
        if file_name.endswith(".dot"):
            file_names.append(os.path.join("/tmp", file_name))
    
    # get all opcodes
    opcode_re = r'%[a-z]+'
    all_opcodes = set()
    for file_name in file_names:
        with open(file_name) as f:
            for l in f.readlines():
                opcodes = set(re.findall(opcode_re, l))
                all_opcodes |= opcodes
    all_opcodes = [opcode[1:] for opcode in all_opcodes]
    print(sorted(all_opcodes))
    The output is the list at the very beginning.

TODO

  • The current dump is from XLA-GPU; for some reason hlo_graph_dumper.cc does not produce output for CPU here, which needs investigation. The GPU path may also introduce extra ops, such as fusion.
  • Extract from the benchmark scripts to make a clean ResNet50 implementation as our example.


yxlao commented on May 21, 2024

While we're here, I also ran some benchmarks on Candidate 1 above.

  • Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
  • GeForce GTX 1080 Ti (x2)
  • Tested with MKL vs non-MKL

CPU

CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
images/sec: 1.8 +/- 0.0 (jitter = 0.0)

CPU MKL

CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
images/sec: 3.5 +/- 0.0 (jitter = 0.0)

GPU

CUDA_VISIBLE_DEVICES="0" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --da
ta_format NHWC
images/sec: 131.0 +/- 0.0 (jitter = 0.0)

2xGPU

CUDA_VISIBLE_DEVICES="0,1" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 -
-data_format NHWC --num_gpus 2
images/sec: 250.1 +/- 0.0 (jitter = 0.0)

XLA CPU

CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
images/sec: 1.8 +/- 0.0 (jitter = 0.0)

XLA GPU

CUDA_VISIBLE_DEVICES="0" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --da
ta_format NHWC
images/sec: 112.9 +/- 0.0 (jitter = 0.0)

XLA 2xGPU

CUDA_VISIBLE_DEVICES="0,1" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 -
-data_format NHWC --num_gpus 2
images/sec: 160.1 +/- 0.0 (jitter = 0.0)

The CPU results are very poor, which is strange. With Candidate 2, non-XLA CPU runs use all CPU threads fully, but with Candidate 1 we only see about 30% CPU thread usage. It may be a settings problem.
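If it is a settings problem, one place to look (a guess on my part, not something verified here) is the session's thread-pool configuration; TF's defaults can undersubscribe cores. A minimal sketch using the TF 1.x ConfigProto fields:

import multiprocessing
import tensorflow as tf

num_cores = multiprocessing.cpu_count()
config = tf.ConfigProto()
# Threads available within a single op (e.g., MKL-backed convolutions).
config.intra_op_parallelism_threads = num_cores
# Threads used to run independent ops concurrently.
config.inter_op_parallelism_threads = num_cores
sess = tf.Session(config=config)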


diyessi commented on May 21, 2024

Model chosen.
