Comments (13)
Dear Roma,
Thanks for your quick reply.
In my case, there is little performance change even after applying the latest MKLML version:
-MKLURL="https://github.com/01org/mkl-dnn/releases/download/v0.7/mklml_lnx_2017.0.2.20170209.tgz"
+MKLURL="https://github.com/01org/mkl-dnn/releases/download/v0.7/mklml_lnx_2017.0.3.20170424.tgz"
With regard to the compiler, I have used gcc version 4.8.5 20150623.
Does it make sense to use the Intel compiler? Is there a makefile for the Intel compiler?
For your reference, I have attached my makefile.
Thank you,
Daejin Jung.
from onednn.
Hello Daejin, I did not realize that now IntelCaffe builds MKL-DNN on its own...
I'm not suggesting using the latest mklml, but mkldnn. But if you are relying on IntelCaffe to build MKL-DNN, then you are already using whatever is the latest.
For the best performance, build both IntelCaffe and MKL-DNN with icc/icpc. The reason is not better code generation, but that using icc's OpenMP library tends to result in better performance. To build Caffe with icc it is typically sufficient to run:
CC=icc CXX=icpc make
Hi Daejin,
"I desperately want to improve mkl-dnn to the performance of mkl2017."
That's what the team is currently working on.
Did you use this prototxt? Can you please provide per-layer timings?
Thanks,
Roma
Dear Roma,
I used my modified prototxt, based on the mkl2017-resnet_50 included in IntelCaffe, to test the backward path. The prototxt you attached does not work in the backward path.
I also submitted the per-layer timings measured from my prototxt.
full log here
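For reference, per-layer timings like these can be collected with Caffe's built-in time benchmark and summarized with a short pipeline. The paths and log lines below are illustrative (modeled on the layer names discussed in this thread), not taken from the attached log:

```shell
# Caffe's benchmark mode prints per-layer forward/backward times (path is an example):
#   build/tools/caffe time --model=resnet_50.prototxt --iterations=50 2> timing.log
# Sample log lines of that shape, sorted to surface the slowest layers first:
printf '%s\n' \
  'BW_bn4j_branch2c backward: 250.1 ms.' \
  'conv1 forward: 12.3 ms.' \
  'res2a forward: 5.0 ms.' |
  awk '{print $(NF-1), $1, $2}' | sort -rn | head -3
```

The `awk` step pulls the millisecond value in front of the layer name so `sort -rn` can rank layers by cost.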
Actually, the performance drop of MKLDNN compared to MKL2017 is considerably large, and some layers (e.g., BW_bn4j_branch2c, BW_bn2c_branch2c, BW_bn3h_branch2c) are up to 500 times slower.
I still wonder why this performance difference occurs.
Thank you,
Daejin Jung.
Thanks for the timings. Can you please try the latest version? I see Vadim has just published 0.7.
Also, which compiler do you use to build mkldnn?
Thanks,
Roma
Hello Roma,
I need a little more of your help with the Intel compiler.
When using icc to build mkl-dnn, I encountered the following errors.
src/caffe/mkldnn_memory.cpp(242): error: no suitable constructor exists to convert from "long" to "boost::shared_ptr<caffe::PrvMemDescr>"
      blob->set_prv_diff_descriptor(NULL);
                                    ^
    detected during instantiation of "boost::shared_ptr<mkldnn::primitive> caffe::MKLDNNMemoryDescriptor<Dtype, is_diff>::get_blob_prv_primitive(caffe::Blob<Dtype> *, bool, bool, caffe::MKLDNNMemoryDescriptor<Dtype, is_diff> *) [with Dtype=double, is_diff=true]" at line 394

src/caffe/mkldnn_memory.cpp(247): error: no suitable constructor exists to convert from "long" to "boost::shared_ptr<caffe::PrvMemDescr>"
      blob->set_prv_data_descriptor(NULL);
                                    ^
    detected during instantiation of "boost::shared_ptr<mkldnn::primitive> caffe::MKLDNNMemoryDescriptor<Dtype, is_diff>::get_blob_prv_primitive(caffe::Blob<Dtype> *, bool, bool, caffe::MKLDNNMemoryDescriptor<Dtype, is_diff> *) [with Dtype=double, is_diff=true]" at line 394

src/caffe/layers/mkldnn_inner_product_layer.cpp(70): error: no instance of constructor "boost::shared_ptr<T>::shared_ptr [with T=caffe::MKLDNNDiff<double>]" matches the argument list
    argument types are: (long)
      bwdw_weights_diff(NULL),
      ^
It seems that icc and Boost are not compatible, so I am trying to use a newer Intel compiler.
Do you have any suggestions?
my icc version: 17.0.1 (gcc version 4.8.5 compatibility)
Intel® Parallel Studio version: 2017.1.132
When I updated the Boost version to 1.64, I ran into other errors, shown below.
ld: warning: libimf.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libsvml.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libirng.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libintlc.so.5, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libimf.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libsvml.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libirng.so, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: warning: libintlc.so.5, needed by external/mkldnn/install/lib/libmkldnn.so, not found (try using -rpath or -rpath-link)
ld: .build_release/tools/convert_imageset.bin: hidden symbol `__intel_cpu_features_init_x' in /opt/intel/compilers_and_libraries_2017.1.132/linux/compiler/lib/intel64_lin/libirc.a(cpu_feature_disp.o) is referenced by DSO
ld: final link failed: Bad value
make: *** [.build_release/tools/convert_imageset.bin] Error 1
I have looked through the related discussions but have not found a clear answer yet. :(
I have just tried reproducing this with 1.63.0 and latest IntelCaffe from github and did not run into any issues.
The messages above suggest that icc's libraries are not in LD_LIBRARY_PATH, so I suspect environment setup issues.
Can you please try setting CUSTOM_CXX := icpc in your Makefile.config? Also, please make sure that you do a make clean first.
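The missing libimf.so/libsvml.so/libirng.so/libintlc.so.5 warnings earlier in the thread usually mean the Intel compiler's runtime libraries are not on the linker/loader path. A minimal sketch of the environment setup (the install path matches the Parallel Studio version mentioned above, but adjust it to your install; this is an illustration, not the thread's confirmed resolution):

```shell
# Load the icc environment so libimf.so, libsvml.so, libirng.so, libintlc.so.5
# are found at link time and run time:
source /opt/intel/compilers_and_libraries_2017.1.132/linux/bin/compilervars.sh intel64

# In Makefile.config, point Caffe at icpc:
#   CUSTOM_CXX := icpc
# Then rebuild from scratch:
make clean
CC=icc CXX=icpc make -j"$(nproc)"
```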
Clarification on IntelCaffe: it's locked to a specific commit of Intel(R) MKL-DNN, tracked in mkldnn.commit. Currently it builds the version from March 17th.
@rsdubtso
Hello Roma,
I have successfully built IntelCaffe with your help.
When I checked again, MKLDNN is about three times slower than MKL2017 on my ResNet-152 prototxt.
However, the performance difference between compilers (e.g., GCC vs. ICC) does not seem to be large.
The execution times of MKLDNN-GCC, MKLDNN-ICC, and MKL2017 are attached.
At this point, is it reasonable that MKLDNN is 2 to 3 times slower than MKL2017?
Thank you very much for your help.
Daejin Jung
I briefly looked at the logs, and I see that the top gap is for a 1x1 convolution. If I remember correctly, 1x1s have been improved in the latest MKL-DNN. So as soon as IntelCaffe moves on to the 0.7 mkl-dnn release, you should see at least some speedup. For the March release the performance ratio looks about right. Ultimately, we want MKL-DNN to show the same performance as MKL.
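For readers unfamiliar with the layer type in question: a 1x1 convolution in a Caffe prototxt looks like the fragment below (illustrative, modeled on the ResNet branch2c layers discussed in this thread; names and channel counts are examples):

```
layer {
  name: "res2a_branch2c"
  type: "Convolution"
  bottom: "res2a_branch2b"
  top: "res2a_branch2c"
  convolution_param {
    num_output: 256
    kernel_size: 1   # the 1x1 kernel whose backward pass shows the gap
    stride: 1
    pad: 0
  }
}
```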
@rsdubtso
Dear Roma,
In addition, I confirmed that the performance of MKLDNN and MKL2017 is almost the same for VGG.
In the case of ResNet, the most significant performance degradation occurs in certain layers such as BW_res2c_branch2c, BW_bn2c_branch2c, BW_res3h_branch2c, and BW_bn3h_branch2c.
These are the first convolution and batch normalization layers encountered in the backward path after the eltwise operation when crossing over from convN-1 to convN. I think additional optimization is also needed at this point to improve performance on ResNet.
Thank you for your help and support.
Daejin Jung
Glad to help! I'm closing this issue then. Please feel free to open a new one in case you have any questions or run into any issues.