My project has to use gemm with bfloat16 data, but dnnl_gemm_bf16bf16f32 is not in dnn


Why is dnnl_gemm_bf16bf16f32 much slower than dnnl_sgemm? (oneDNN issue, closed)

wells-wei-wei commented on June 7, 2024

Why is dnnl_gemm_bf16bf16f32 much slower than dnnl_sgemm?


Comments (2)

yehudaorel commented on June 7, 2024

Hi @wells-wei-wei,

Regarding dnnl_gemm_bf16bf16f32: it is an internal GEMM API, and the recommended public interface for this functionality is the MatMul primitive.

Please see the solution provided here, as well as this example demonstrating MatMul as a replacement for the sgemm functions, for additional details.
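As its name suggests, dnnl_gemm_bf16bf16f32 takes bf16 A and B matrices and produces an f32 C. The sketch below is a naive reference of that computation in standard C++, assuming row-major storage, no transposes, and f32 accumulation. It is not oneDNN's implementation or signature, and the helper names (f32_to_bf16, bf16_to_f32, ref_gemm_bf16bf16f32) are hypothetical. It is mainly useful for checking the results of a MatMul-based replacement on small shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 keeps the upper 16 bits of an IEEE-754 f32 (1 sign, 8 exponent, 7 mantissa bits).
// Hypothetical helper: f32 -> bf16 with round-to-nearest-even. NaN inputs are not handled.
std::uint16_t f32_to_bf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    std::uint32_t lsb = (bits >> 16) & 1u;  // lowest bit that survives, used to break ties
    bits += 0x7FFFu + lsb;                  // round to nearest, ties to even
    return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical helper: bf16 -> f32 is exact (shift the bits back into the upper half).
float bf16_to_f32(std::uint16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Naive reference (assumed conventions): C = alpha * A * B + beta * C.
// A: M x K bf16, B: K x N bf16, C: M x N f32, all row-major, no transposes.
// Products are accumulated in f32; beta == 0 overwrites C (BLAS-style).
void ref_gemm_bf16bf16f32(std::size_t M, std::size_t N, std::size_t K, float alpha,
                          const std::vector<std::uint16_t>& A,
                          const std::vector<std::uint16_t>& B, float beta,
                          std::vector<float>& C) {
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k)
                acc += bf16_to_f32(A[m * K + k]) * bf16_to_f32(B[k * N + n]);
            float prev = (beta == 0.0f) ? 0.0f : beta * C[m * N + n];
            C[m * N + n] = alpha * acc + prev;
        }
    }
}
```

When comparing against library output, quantize the f32 inputs with f32_to_bf16 first and compare with a tolerance, since the library's accumulation order will differ. This checks correctness only; it says nothing about performance.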

But when I use it, I find that dnnl_gemm_bf16bf16f32 is much slower than dnnl_sgemm. Why is that? I'm sure the host CPU supports avx512_bf16. Is it because I didn't enable some option at build time?

In general, without Intel AMX support, bf16 GEMM-based operations show little to no performance gain over f32, depending on problem shape, hardware, etc., which might explain your results.

Could you run the workload with the environment variable ONEDNN_VERBOSE=all and share the output? It will show the ISAs supported on your machine and which implementation each primitive dispatches to.

Additional information, such as problem size/shape, CPU, and OS, would also be helpful!


vpirogov commented on June 7, 2024

@wells-wei-wei, in addition to what @yehudaorel shared, there's an example that benchmarks and reports matmul primitive performance with various data types.
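The linked benchmark example is not reproduced here. As a rough sketch of how such comparisons are usually reported, the helpers below convert a measured time into GFLOP/s using the standard 2·M·N·K operation count for a GEMM, so that sgemm and bf16 runs of the same shape can be compared on one scale. The names gemm_gflops and best_time_seconds are hypothetical, not oneDNN APIs.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <limits>

// FLOP count of one M x N x K GEMM: each of the M*N outputs needs K multiplies and K adds.
double gemm_gflops(std::int64_t M, std::int64_t N, std::int64_t K, double seconds) {
    return 2.0 * static_cast<double>(M) * N * K / seconds / 1e9;
}

// Runs `work` once as a warm-up, then `reps` timed times; returns the best wall time in seconds.
double best_time_seconds(const std::function<void()>& work, int reps) {
    work();  // warm-up: the first call may include one-time setup costs
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < reps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        work();
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

The lambda passed to best_time_seconds would wrap the call being measured (e.g. the sgemm call, or a MatMul execute followed by a stream wait). Object creation, such as building the MatMul primitive, stays outside the timed region, so only execution is compared.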
