
How scales work in oneDNN · onednn · HOT 9 · CLOSED

vineel96 avatar vineel96 commented on June 21, 2024
How scales work in oneDNN

from onednn.

Comments (9)

jondea avatar jondea commented on June 21, 2024 3

@igorsafo You are correct, we removed the ACL quantized matmul implementations in oneDNN v3.0 because ACL currently needs the quantization parameters to be supplied at compile time. We are working on enabling runtime scales in ACL, specifically for int8*int8->FP32, hopefully in time for oneDNN v3.5.

@vineel96 Are you able to share what kinds of quantization/types you are interested in?


igorsafo avatar igorsafo commented on June 21, 2024 1

@yehudaorel As far as I know, the ACL backend has not supported scaling since oneDNN v3.0, when the API moved from output scales to argument scales. As a result, a primitive with scales on an Arm platform should dispatch to the reference matmul.

@vineel96 Could you please provide the output with ONEDNN_VERBOSE=all for both cases (with and without post-ops)?
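For reference, verbose output is controlled via an environment variable; the benchdnn arguments below are only a placeholder to be replaced with the actual reproducer:

```shell
# Print dispatching and execution details for every primitive created/run.
# Substitute your real benchdnn (or application) command line.
ONEDNN_VERBOSE=all ./benchdnn --matmul <your-test-case>
```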

+@cfRod @milpuz01 @kawakami-k


yehudaorel avatar yehudaorel commented on June 21, 2024

Hi @vineel96,

For reduced-precision computation, the oneDNN library leverages the Primitive Attributes: Quantization API to set the scaling factors for supported primitives; for general dynamic (de)quantization of data, the reorder primitive can be used.

In oneDNN, scales are used as quantization factors to map the original values to the corresponding lower/higher-precision data types (e.g., FP32 -> INT8). In general, the quantization formula is:

$$ X_{f32} := scale_{X} \cdot (X_{int8}(:) - \mathit{zero\_point}_{X}) $$

The value of the scaling factor affects the accuracy/precision of the computation, and it is up to the user to choose optimal values. Please check out the MatMul Tutorial: Quantization for full details, as well as this white paper for motivation. If you would like to look at the source code of the matmul reference implementation, see here.
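The formula above can be sketched in plain Python (a minimal illustration, not oneDNN code; the scale and zero-point values are arbitrary):

```python
# Minimal sketch of the oneDNN quantization formula:
#   X_f32 = scale_X * (X_int8 - zero_point_X)
# The scale/zero-point values here are illustrative, not from oneDNN.

def quantize(x_f32, scale, zero_point):
    """Map an fp32 value to int8: divide by scale, shift, clamp to [-128, 127]."""
    q = round(x_f32 / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(x_int8, scale, zero_point):
    """Recover an approximate fp32 value from the int8 representation."""
    return scale * (x_int8 - zero_point)

scale, zp = 0.05, 10
x = 1.37
q = quantize(x, scale, zp)          # -> 37
x_back = dequantize(q, scale, zp)   # ~1.35, close to the original 1.37
```

The round trip is lossy (1.37 comes back as roughly 1.35): the rounding error is bounded by half the scale, which is why the choice of scale drives accuracy.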

Hope this helps, let us know if you have any other questions!


vineel96 avatar vineel96 commented on June 21, 2024

Hi @yehudaorel,
Thank you for the reply.
Where does quantization happen in the brgemm algorithm in this file: https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/matmul/brgemm_matmul.cpp? I could not find the exact location where quantization happens. Also, is it instruction-length dependent?


yehudaorel avatar yehudaorel commented on June 21, 2024

@vineel96, Quantization itself does not happen within the matmul primitive; only scaling and zero points are applied. These are passed by the user at initialization (/cpu/x64/matmul/brgemm_matmul.cpp#L241) and precomputed during execution (/cpu/x64/matmul/brgemm_matmul.cpp#L271).

The input data for the primitive has to be in the proper format, hence the reorder primitive is needed prior to the matmul operation. For example, if you wanted to do an f32 -> matmul(u8, u8) -> f32 matmul, you would need a reorder primitive to convert the source and weights to u8.
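As a toy illustration of that flow (plain Python, not the oneDNN API; the shapes and scales are made up), the inputs are quantized to u8, multiplied in integers, and the integer accumulator is rescaled back to fp32:

```python
# Toy f32 -> matmul(u8, u8) -> f32 flow (illustrative only, not oneDNN code).
# Inputs are assumed non-negative, so u8 with zero_point = 0 suffices.

def quantize_u8(mat, scale):
    """Quantize an fp32 matrix to u8 (the role of the reorder primitive here)."""
    return [[min(255, max(0, round(v / scale))) for v in row] for row in mat]

def matmul_int(a, b):
    """Integer matmul; Python ints stand in for int32 accumulators."""
    n, k, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

src_f32 = [[0.5, 1.0], [1.5, 2.0]]
wei_f32 = [[2.0, 0.0], [0.0, 2.0]]
s_src, s_wei = 0.01, 0.01                 # per-tensor scales (made up)

src_u8 = quantize_u8(src_f32, s_src)      # "reorder" src to u8
wei_u8 = quantize_u8(wei_f32, s_wei)      # "reorder" weights to u8
acc = matmul_int(src_u8, wei_u8)          # integer compute
# Dequantize: the combined output scale is s_src * s_wei.
dst_f32 = [[s_src * s_wei * v for v in row] for row in acc]
# dst_f32 is approximately the fp32 matmul of src_f32 and wei_f32.
```

The key point mirrors the comment above: the "matmul" itself is pure integer arithmetic; scales only enter when mapping the inputs to u8 and the accumulator back to fp32.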

Please check out the sample code at https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp; it should have everything you need.


vineel96 avatar vineel96 commented on June 21, 2024

Hi @yehudaorel,
Thanks for the links and insights. I am testing matmul on the Arm architecture.
benchDNN tests are failing for cases where we have scaling factors for source, weights, and destination, specifically when there are no post-ops (like ReLU). When there are post-ops, we get correct output and the tests pass.
My doubts are:

  1. Is it vector-length dependent? If it is, where in the source can I find this?
  2. In which file are the scales applied when post-ops are present? Without post-ops the benchdnn tests are failing, so how are post-ops applied together with scales?


yehudaorel avatar yehudaorel commented on June 21, 2024

@vineel96 Are you using the ACL backend? Can you provide the benchdnn input used to reproduce the failing test?

@igorsafo Are there any specific limitations of the acl_matmul implementation with scaling?


vpirogov avatar vpirogov commented on June 21, 2024

+@jondea


vineel96 avatar vineel96 commented on June 21, 2024

Hi all,
Thank you for all the insights and information. I was able to check on the issue, and it might take some time, so I am closing it for now.

