Comments (2)
Hi @ItayLasch,
It looks like there is no optimized implementation for a binary post-op with broadcast mask 1; the matmul falls back to the reference implementation (ref:any):
$ DNNL_VERBOSE=1 ./benchdnn --matmul --wtag=abc --dt=s8:s8:s8 --attr-post-ops="binary_mul:u8:1+binary_mul:f32:1+eltwise_clip:-128:127" 6x200x16:6x16x200
onednn_verbose,info,oneDNN v3.3.0 (commit dc66df7b18ad12ecd5fa438a5055bbae4628f481)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:48
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0,,,6x1x1,0.0319824
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_u8::blocked:abc::f0,,,6x1x1,0.0288086
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_s8::blocked:abc::f0,,,6x16x200,0.026123
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_s8::blocked:abc::f0,,,6x200x16,0.0249023
onednn_verbose,primitive,exec,cpu,matmul,ref:any,undef,src_s8:a:blocked:abc::f0 wei_s8::blocked:abc::f0 dst_s8:a:blocked:abc::f0,attr-post-ops:binary_mul:u8:1+binary_mul:f32:1+eltwise_clip:-128:127 ,,6x200x16:6x16x200,9.46313
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0,,,6x200x200,0.072998
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_s8::blocked:abc::f0 dst_f32::blocked:abc::f0,,,6x200x200,0.0688477
0:PASSED __REPRO: --matmul --dt=s8:s8:s8 --wtag=abc --attr-post-ops=mul:u8:1+mul:f32:1+clip:-128:127 6x200x16:6x16x200
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.03s; fill: 0.00s (16%); compute_ref: 0.01s (18%); compare: 0.01s (30%);
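To make the mask values concrete, here is a minimal sketch of how a broadcast mask could map to the shape of the binary post-op operand. It assumes that bit i of the mask keeps destination dimension i and that a cleared bit means broadcast (size 1) along that dimension. This matches the 6x1x1 reorder in the log above for mask 1; the shape for mask 2 is derived from the same assumption rather than taken from a log. The helper name `binary_src1_dims` is hypothetical and is not part of oneDNN or benchdnn.

```python
def binary_src1_dims(dst_dims, mask):
    """Return the dims of a binary post-op operand for a broadcast mask.

    Assumption: bit i of the mask set means the operand keeps dst
    dimension i; a cleared bit means the operand is broadcast (size 1)
    along that dimension.
    """
    return [d if (mask >> i) & 1 else 1 for i, d in enumerate(dst_dims)]


# dst shape of the 6x200x16:6x16x200 matmul problem is 6x200x200
dst = [6, 200, 200]
for mask in (0, 1, 2):
    dims = binary_src1_dims(dst, mask)
    print(f"mask {mask}: " + "x".join(str(d) for d in dims))
```

Under this assumption, mask 1 means one value per batch, mask 0 means a single common value, and mask 2 means one value per row of the destination.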
Either mask 0 or mask 2 goes to the optimized implementation (brg:avx512_core_vnni). For example, with mask 0:
$ DNNL_VERBOSE=1 ./benchdnn --matmul --wtag=abc --dt=s8:s8:s8 --attr-post-ops="binary_mul:u8:0+binary_mul:f32:0+eltwise_clip:-128:127" 6x200x16:6x16x200
onednn_verbose,info,oneDNN v3.3.0 (commit dc66df7b18ad12ecd5fa438a5055bbae4628f481)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:48
onednn_verbose,info,cpu,isa:Intel AVX-512 with Intel DL Boost
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,graph,backend,0:dnnl_backend
onednn_verbose,primitive,info,template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,graph,info,template:operation,engine,partition_id,partition_kind,op_names,data_formats,logical_tensors,fpmath_mode,backend,exec_time
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0,,,1x1x1,1.69897
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_u8::blocked:abc::f0,,,1x1x1,0.0158691
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_s8::blocked:abc::f0,,,6x16x200,0.0168457
onednn_verbose,primitive,exec,cpu,reorder,rnn_data_reorder,undef,src_f32::blocked:abc::f0 dst_s8::blocked:abc::f0,,,6x200x16,0.0180664
onednn_verbose,primitive,exec,cpu,matmul,brg:avx512_core_vnni,undef,src_s8:a:blocked:abc::f0 wei_s8::blocked:abc::f0 dst_s8:a:blocked:abc::f0,attr-post-ops:binary_mul:u8:0+binary_mul:f32:0+eltwise_clip:-128:127 ,,6x200x16:6x16x200,0.156982
onednn_verbose,primitive,exec,cpu,reorder,simple:any,undef,src_f32::blocked:abc::f0 dst_f32::blocked:abc::f0,,,6x200x200,0.0529785
onednn_verbose,primitive,exec,cpu,reorder,jit:uni,undef,src_s8::blocked:abc::f0 dst_f32::blocked:abc::f0,,,6x200x200,0.0510254
0:PASSED __REPRO: --matmul --dt=s8:s8:s8 --wtag=abc --attr-post-ops=mul:u8:0+mul:f32:0+clip:-128:127 6x200x16:6x16x200
tests:1 passed:1 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:0 listed:0
total: 0.01s; fill: 0.00s (29%); compute_ref: 0.00s (34%); compare: 0.00s (21%);
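If you want to check this in your own runs, a small script can scan DNNL_VERBOSE output for the matmul line and report which implementation was dispatched. The sketch below is only an illustration: it assumes the comma-separated field layout seen in the two logs above (implementation in the sixth field, exec_time in the last one) and treats any implementation name starting with "ref" as the reference path. The function names are hypothetical, and the sample text is just the two matmul lines copied from the logs.

```python
LOG = """\
onednn_verbose,primitive,exec,cpu,matmul,ref:any,undef,src_s8:a:blocked:abc::f0 wei_s8::blocked:abc::f0 dst_s8:a:blocked:abc::f0,attr-post-ops:binary_mul:u8:1+binary_mul:f32:1+eltwise_clip:-128:127 ,,6x200x16:6x16x200,9.46313
onednn_verbose,primitive,exec,cpu,matmul,brg:avx512_core_vnni,undef,src_s8:a:blocked:abc::f0 wei_s8::blocked:abc::f0 dst_s8:a:blocked:abc::f0,attr-post-ops:binary_mul:u8:0+binary_mul:f32:0+eltwise_clip:-128:127 ,,6x200x16:6x16x200,0.156982
"""


def matmul_impls(verbose_text):
    """Collect (implementation, exec_time) for every executed matmul line."""
    found = []
    for line in verbose_text.splitlines():
        fields = line.strip().split(",")
        # Assumed layout, as in the logs above:
        # onednn_verbose,primitive,exec,<engine>,<primitive>,<impl>,...,<exec_time>
        if len(fields) < 7 or fields[:3] != ["onednn_verbose", "primitive", "exec"]:
            continue
        if fields[4] == "matmul":
            found.append((fields[5], float(fields[-1])))
    return found


def is_reference(impl):
    """Assumption: reference implementations are named 'ref...'."""
    return impl.startswith("ref")


impls = matmul_impls(LOG)
for impl, exec_time in impls:
    kind = "reference" if is_reference(impl) else "optimized"
    print(f"{impl}: {kind}, exec_time={exec_time}")

# Ratio of the two execution times reported above (mask 1 vs. mask 0)
print(f"slowdown: {impls[0][1] / impls[1][1]:.1f}x")
```

For these two single runs, the reference path with mask 1 takes roughly 60 times longer than the brgemm path with mask 0.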
from onednn.
This is documented in the 'Attributes and Post-Ops' section of the oneDNN developer guide.