
How scales work in oneDNN · onednn · HOT 9 · CLOSED

vineel96 avatar vineel96 commented on June 21, 2024
How scales work in oneDNN

from onednn.

Comments (9)

jondea avatar jondea commented on June 21, 2024 3

@igorsafo You are correct, we removed the ACL quantized matmul implementations in oneDNN v3.0 because ACL currently needs the quantization parameters to be supplied at compile time. We are working on enabling runtime scales in ACL, specifically for int8*int8->FP32, hopefully in time for oneDNN v3.5.

@vineel96 Are you able to share what kinds of quantization/types you are interested in?


igorsafo avatar igorsafo commented on June 21, 2024 1

@yehudaorel As far as I know, the ACL backend has not supported scaling since oneDNN v3.0, when the API moved from output scales to argument scales. As a result, a primitive with scales on an Arm platform should dispatch to the reference matmul.

@vineel96 Could you please provide the output with ONEDNN_VERBOSE=all for both cases (with and without post-ops)?
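For reference, verbose output is controlled via an environment variable; the benchdnn arguments below are only a placeholder to be replaced with the actual reproducer:

```shell
# Print dispatching and execution details for every primitive created/run.
# Substitute your real benchdnn (or application) command line.
ONEDNN_VERBOSE=all ./benchdnn --matmul <your-test-case>
```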

+@cfRod @milpuz01 @kawakami-k


yehudaorel avatar yehudaorel commented on June 21, 2024

Hi @vineel96,

For reduced-precision computation, the oneDNN library leverages the Primitive Attributes: Quantization API to set the scaling factors for supported primitives; for general dynamic (de)quantization of data, the reorder primitive can be used.

In oneDNN, scales are used as quantization factors to map the original values to the corresponding lower/higher-precision data types (e.g., FP32 -> INT8). In general, the quantization formula is:

$$ X_{f32} := scale_{X} \cdot (X_{int8}(:) - \mathit{zero\_point}_{X}) $$

The value of the scaling factor affects the accuracy/precision of the computation, and it is up to the user to choose optimal values. Please check out the MatMul Tutorial: Quantization for full details, as well as this white paper for motivation. If you would like to look at the source code of the matmul reference implementation, see here.
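The formula above can be sketched in plain Python (a minimal illustration, not oneDNN code; the scale and zero-point values are arbitrary):

```python
# Minimal sketch of the oneDNN quantization formula:
#   X_f32 = scale_X * (X_int8 - zero_point_X)
# The scale/zero-point values here are illustrative, not from oneDNN.

def quantize(x_f32, scale, zero_point):
    """Map an fp32 value to int8: divide by scale, shift, clamp to [-128, 127]."""
    q = round(x_f32 / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(x_int8, scale, zero_point):
    """Recover an approximate fp32 value from the int8 representation."""
    return scale * (x_int8 - zero_point)

scale, zp = 0.05, 10
x = 1.37
q = quantize(x, scale, zp)          # -> 37
x_back = dequantize(q, scale, zp)   # ~1.35, close to the original 1.37
```

The round trip is lossy (1.37 comes back as roughly 1.35): the rounding error is bounded by half the scale, which is why the choice of scale drives accuracy.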

Hope this helps, let us know if you have any other questions!


vineel96 avatar vineel96 commented on June 21, 2024

Hi @yehudaorel,
Thank you for the reply.
Where does quantization happen in the brgemm algorithm in this file: https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/matmul/brgemm_matmul.cpp? I could not find the exact location where quantization happens. Also, is it instruction-length dependent?


yehudaorel avatar yehudaorel commented on June 21, 2024

@vineel96, Quantization itself does not happen within the matmul primitive; only scaling and zero points are applied. These are passed by the user at initialization (/cpu/x64/matmul/brgemm_matmul.cpp#L241) and precomputed during execution (/cpu/x64/matmul/brgemm_matmul.cpp#L271).

The input data for the primitive has to be in the proper format, hence the reorder primitive is needed prior to the matmul operation. For example, if you wanted to do an f32 -> matmul(u8, u8) -> f32 matmul, you would need a reorder primitive to convert the source and weights to u8.
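As a toy illustration of that flow (plain Python, not the oneDNN API; the shapes and scales are made up), the inputs are quantized to u8, multiplied in integers, and the integer accumulator is rescaled back to fp32:

```python
# Toy f32 -> matmul(u8, u8) -> f32 flow (illustrative only, not oneDNN code).
# Inputs are assumed non-negative, so u8 with zero_point = 0 suffices.

def quantize_u8(mat, scale):
    """Quantize an fp32 matrix to u8 (the role of the reorder primitive here)."""
    return [[min(255, max(0, round(v / scale))) for v in row] for row in mat]

def matmul_int(a, b):
    """Integer matmul; Python ints stand in for int32 accumulators."""
    n, k, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

src_f32 = [[0.5, 1.0], [1.5, 2.0]]
wei_f32 = [[2.0, 0.0], [0.0, 2.0]]
s_src, s_wei = 0.01, 0.01                 # per-tensor scales (made up)

src_u8 = quantize_u8(src_f32, s_src)      # "reorder" src to u8
wei_u8 = quantize_u8(wei_f32, s_wei)      # "reorder" weights to u8
acc = matmul_int(src_u8, wei_u8)          # integer compute
# Dequantize: the combined output scale is s_src * s_wei.
dst_f32 = [[s_src * s_wei * v for v in row] for row in acc]
# dst_f32 is approximately the fp32 matmul of src_f32 and wei_f32.
```

The key point mirrors the comment above: the "matmul" itself is pure integer arithmetic; scales only enter when mapping the inputs to u8 and the accumulator back to fp32.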

Please check out the sample code at https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp; it should have everything you need.


vineel96 avatar vineel96 commented on June 21, 2024

Hi @yehudaorel,
Thanks for the links and insights. I am testing matmul on the Arm architecture.
benchDNN tests are failing for cases where we have scaling factors for source, weights, and destination, specifically when there are no post-ops (like ReLU). When there are post-ops, we get correct output and the tests pass.
My doubts are:

  1. Is it vector-length dependent? If it is, where in the source can I find this?
  2. In which file are the scales applied when post-ops are present? Without post-ops the benchdnn tests are failing, so how are post-ops applied together with scales?


yehudaorel avatar yehudaorel commented on June 21, 2024

@vineel96 Are you using the ACL backend? Can you provide the benchdnn input used to reproduce the failing test?

@igorsafo Are there any specific limitations of the acl_matmul implementation with scaling?


vpirogov avatar vpirogov commented on June 21, 2024

+@jondea


vineel96 avatar vineel96 commented on June 21, 2024

Hi all,
Thank you for all the insights and information. I was able to check on the issue, and it might take some time, so I am closing it for now.

