Summary
- Selected Google's benchmark scripts implementation for ResNet50.
- List of XLA ops needed:
['add', 'arg', 'bitcast', 'broadcast', 'concatenate', 'constant', 'convolution', 'copy', 'divide', 'dot', 'equal', 'exponential', 'fusion', 'get', 'greater', 'less', 'log', 'logical', 'map', 'maximum', 'multiply', 'negate', 'pad', 'reduce', 'reshape', 'reverse', 'select', 'subtract', 'transpose', 'tuple']
- Updates from the Aug 8 standup:
- ResNet-56 on the CIFAR dataset could be achieved first, before ResNet-50 on i1k (ImageNet-1k).
Comparison of the two implementations
Candidate 1: Google's benchmark implementation (link)
This implementation is based on the first ResNet paper, so it is also called ResNetV1. It includes ResNet 18, 34, 50, 101, and 152.
- Pros
- It meets our original requirement of ResNet50 (although the requirement is ours to set).
- It aligns with our benchmark team's repo, easy for comparison.
- Cons
- It has a bunch of benchmark scripts for other NN models wrapped around it, which need cleaning up.
- The script is intended for benchmarking; it is not meant to exactly reproduce the original paper's results. For example, the optimizer and learning rates may differ.
- Also, according to our benchmark team (Jing Huang), the convergence and accuracy of the model have not been systematically tested, though there are plans to do so.
Candidate 2: Google's model implementation (link)
This implementation is based on the second ResNet paper, so it is also called ResNetV2. It includes ResNet 32, 110, 164, and 1001.
- Pros
- Cleaner implementation, only minimal clean up needed.
- Google has published accuracy results that we can use to check convergence. This implementation is meant for model training rather than benchmarking.
- Cons
- It's not ResNet50.
- Does not align with our benchmark efforts.
Running and getting the XLA Ops
- Step 1: Build and install TF with XLA.
- Step 2: Enable XLA by adding

  ```python
  config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
  ```

  to the `create_config_proto()` function in `tf_cnn_benchmarks.py`. For more info, see this doc.
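For context, the enclosing function would look roughly like this. This is a sketch against the TF 1.x API, not the exact contents of `tf_cnn_benchmarks.py`:

```python
import tensorflow as tf

def create_config_proto():
    """Build the session config with global XLA JIT compilation enabled (sketch)."""
    config = tf.ConfigProto()
    # ON_1 turns on XLA JIT compilation globally (TF 1.x API).
    config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1
    return config
```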
- Step 3: Run the ResNet model, dumping the XLA graphs to `.dot` files:

  ```shell
  TF_XLA_FLAGS=--xla_generate_hlo_graph=.* python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
  ```
- Step 4: Grep the opcodes from the `.dot` files. There are more than 10,000 `.dot` files; most of them are intermediate graphs from the passes, but collectively they represent a superset of all the ops used in the model. The output is the list at the very beginning.

  ```python
  import os
  import re

  # collect all .dot files dumped to /tmp
  file_names = []
  for file_name in os.listdir("/tmp"):
      if file_name.endswith(".dot"):
          file_names.append(os.path.join("/tmp", file_name))

  # extract all opcodes (HLO instruction names are prefixed with '%')
  opcode_re = r'%[a-z]+'
  all_opcodes = set()
  for file_name in file_names:
      with open(file_name) as f:
          for l in f.readlines():
              opcodes = set(re.findall(opcode_re, l))
              all_opcodes |= opcodes

  # strip the leading '%'
  all_opcodes = [opcode[1:] for opcode in all_opcodes]
  print(sorted(all_opcodes))
  ```
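One caveat about the grep script (my observation, not from the original write-up): the pattern `%[a-z]+` stops at the first hyphen, so multi-word HLO opcodes get truncated, which is likely why entries such as 'get' and 'logical' appear in the ops list. A minimal demonstration on a made-up HLO-style line:

```python
import re

# A made-up line in the style of an HLO .dot dump (hypothetical example).
line = "%get-tuple-element.5 = f32[] get-tuple-element(%reduce.3), index=0"

# The original pattern stops at the first hyphen, truncating the opcode.
print(re.findall(r'%[a-z]+', line))   # ['%get', '%reduce']

# Allowing hyphens in the character class keeps multi-word opcodes intact.
print(re.findall(r'%[a-z-]+', line))  # ['%get-tuple-element', '%reduce']
```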
TODO
- The current dump is with XLA-GPU; somehow `hlo_graph_dumper.cc` does not work for CPU here. Need to investigate. This may add extra ops to the list, such as `fusion`.
- Extract from the benchmark scripts to make a clean ResNet50 implementation as our example.
from ngraph.
While we're here, I also ran some benchmarks on Candidate 1 above.
- Intel(R) Core(TM) i7-6850K CPU @ 3.60GHz
- GeForce GTX 1080 Ti (x2)
- Tested with MKL vs non-MKL
CPU

```shell
CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
```

images/sec: 1.8 +/- 0.0 (jitter = 0.0)

CPU MKL

```shell
CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
```

images/sec: 3.5 +/- 0.0 (jitter = 0.0)

GPU

```shell
CUDA_VISIBLE_DEVICES="0" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
```

images/sec: 131.0 +/- 0.0 (jitter = 0.0)

2xGPU

```shell
CUDA_VISIBLE_DEVICES="0,1" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC --num_gpus 2
```

images/sec: 250.1 +/- 0.0 (jitter = 0.0)

XLA CPU

```shell
CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
```

images/sec: 1.8 +/- 0.0 (jitter = 0.0)

XLA GPU

```shell
CUDA_VISIBLE_DEVICES="0" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC
```

images/sec: 112.9 +/- 0.0 (jitter = 0.0)

XLA 2xGPU

```shell
CUDA_VISIBLE_DEVICES="0,1" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 --data_format NHWC --num_gpus 2
```

images/sec: 160.1 +/- 0.0 (jitter = 0.0)
The CPU results are very poor, which is quite strange. With Candidate 2, running non-XLA on CPU I see all CPU threads fully utilized, but with Candidate 1 we only see about 30% usage of the CPU threads. It may be a settings problem.
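If the low utilization is a thread-pool configuration issue, one thing worth trying is pinning the intra-/inter-op thread counts explicitly. The flags below are an assumption (they exist in recent versions of tf_cnn_benchmarks; verify with `--help` against this checkout), and the counts are illustrative:

```shell
# Hypothetical tuning run: pin TF's thread pools instead of using the defaults.
# --num_intra_threads / --num_inter_threads are assumed flags for this checkout.
CUDA_VISIBLE_DEVICES="" python tf_cnn_benchmarks.py --model resnet50 --batch_size 32 \
    --data_format NHWC --num_intra_threads 12 --num_inter_threads 2
```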
Model chosen.