Comments (9)
@igorsafo You are correct, we removed the ACL quantized matmul implementations in oneDNN v3.0 because ACL currently needs the quantization parameters to be supplied at compile time. We are working on enabling runtime scales in ACL, specifically for int8*int8->FP32, hopefully in time for oneDNN v3.5.
@vineel96 Are you able to share what kinds of quantization/types you are interested in?
from onednn.
@yehudaorel As far as I know, the ACL backend has not supported scaling since oneDNN v3.0, when the API moved from Output Scales to Arg Scales. As a result, a matmul primitive with scales on an Arm platform should dispatch into the reference implementation.
@vineel96 Could you please provide the output with ONEDNN_VERBOSE=all set, for both cases (with and without post-ops)?
Hi @vineel96,
For reduced-precision computations, oneDNN leverages the Primitive Attributes: Quantization API to set the scaling factors for supported primitives; for general dynamic data (de)quantization, the reorder primitive can be used.
In oneDNN, scales are used as quantization factors to map the original values into the corresponding lower/higher-precision data types (e.g. FP32 -> INT8). In general, the affine quantization formula is x_f32[:] = scale * (x_int8[:] - zero_point), or equivalently x_int8[:] = round(x_f32[:] / scale) + zero_point.
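As a small illustration of that formula (plain Python, not the oneDNN API; the helper names are my own), quantizing and dequantizing a single value with a per-tensor scale looks like:

```python
def quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Affine quantization: x_int8 = clip(round(x_f32 / scale) + zp)."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point=0):
    """Inverse mapping: x_f32 = scale * (x_int8 - zp)."""
    return scale * (q - zero_point)

scale = 0.05           # chosen by the user, e.g. max(|x|) / 127
q = quantize(1.23, scale)           # -> 25
x_back = dequantize(q, scale)       # -> 1.25, close to the original 1.23
print(q, x_back)
```

Note the round trip is lossy: the reconstruction error is bounded by half the scale, which is exactly why the scale choice matters.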
The value of the scaling factor affects the accuracy/precision of the computation, and it is up to the user to choose optimal values. Please check out the MatMul Tutorial: Quantization for full details, as well as this white paper for motivation. If you would like to check out the source code for the matmul reference implementation, see here.
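To see concretely why the scale choice matters, here is a plain-Python sketch (my own helper, not oneDNN code) comparing the worst round-trip error of a scale matched to the data range against a deliberately oversized one:

```python
def roundtrip_error(values, scale, qmin=-128, qmax=127):
    """Quantize each value to int8 and back; return the worst
    absolute reconstruction error for the given scale."""
    err = 0.0
    for x in values:
        q = max(qmin, min(qmax, round(x / scale)))
        err = max(err, abs(x - scale * q))
    return err

data = [-0.9, -0.3, 0.0, 0.4, 1.0]
good_scale = max(abs(v) for v in data) / 127  # spans the full int8 range
bad_scale = 10 * good_scale                   # wastes most of the range
print(roundtrip_error(data, good_scale), roundtrip_error(data, bad_scale))
```

The matched scale keeps the error near scale/2, while the oversized one loses roughly an order of magnitude of precision; a too-small scale would instead clip large values.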
Hope this helps, let us know if you have any other questions!
Hi @yehudaorel,
Thank you for the reply.
Where does quantization happen in the brgemm algorithm in this file: https://github.com/oneapi-src/oneDNN/blob/main/src/cpu/x64/matmul/brgemm_matmul.cpp? I could not find the exact location where quantization happens. Also, is it instruction-length dependent?
@vineel96, Quantization itself does not happen within the matmul primitive; the primitive only applies the scales and zero points, which are passed by the user during initialization (/cpu/x64/matmul/brgemm_matmul.cpp#L241) and precomputed during execution (/cpu/x64/matmul/brgemm_matmul.cpp#L271).
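The idea behind precomputing scales can be sketched in a few lines of plain Python (an assumption-laden sketch of the general technique, not the brgemm kernel): the per-tensor src/weights/dst scales fold into a single multiplier that is applied once to the integer accumulator in the epilogue.

```python
def fold_scales(src_scale, wei_scale, dst_scale):
    """Combined multiplier: the f32 result of an integer accumulator is
    src_scale * wei_scale * acc, then re-quantized by dividing by
    dst_scale, so all three collapse into one factor."""
    return src_scale * wei_scale / dst_scale

combined = fold_scales(0.02, 0.05, 0.1)  # precomputed once
acc = 1234                               # int32 dot-product accumulator
dst_q = round(acc * combined)            # single multiply per output
print(dst_q)
```

This is why the scales only appear at initialization and in the execution prologue: the inner integer GEMM loop itself is scale-free.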
The input data for the primitive has to be in the proper format, hence a reorder primitive is needed prior to the matmul operation. For example, for an f32 -> matmul(u8, u8) -> f32 flow, you would need a reorder primitive to convert the source and weights to u8.
Please check out the sample code here: https://oneapi-src.github.io/oneDNN/page_cpu_matmul_quantization_cpp.html#doxid-cpu-matmul-quantization-cpp; it should have everything you need.
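For intuition, the f32 -> reorder(u8) -> integer matmul -> f32 flow can be simulated numerically in plain Python (a sketch under my own naming, not the oneDNN API; see the linked sample for the real primitives):

```python
def quantize_u8(vec, scale):
    """Reorder-style step: f32 values -> u8 with a per-tensor scale."""
    return [max(0, min(255, round(v / scale))) for v in vec]

def quant_dot(a_f32, b_f32, a_scale, b_scale):
    """Quantize both operands, accumulate in integers, dequantize."""
    a_q = quantize_u8(a_f32, a_scale)
    b_q = quantize_u8(b_f32, b_scale)
    acc = sum(x * y for x, y in zip(a_q, b_q))   # integer accumulation
    return acc * a_scale * b_scale               # dequantized f32 result

a = [0.5, 1.0, 1.5]
b = [2.0, 0.25, 1.0]
approx = quant_dot(a, b, a_scale=1.5 / 255, b_scale=2.0 / 255)
exact = sum(x * y for x, y in zip(a, b))         # 2.75 in f32
print(exact, approx)
```

The quantized result lands close to the f32 reference, with the gap governed by the scale choices, which mirrors the accuracy trade-off discussed above.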
Hi @yehudaorel,
Thanks for the links and insights. I am testing matmul on the Arm architecture.
benchdnn tests are failing for cases that have scaling factors for source, weights & destination, particularly when there are no post-ops (like relu). When post-ops are present, we get correct output and the tests pass.
My doubts are:
- Is it vector-length dependent? If so, where in the file is that handled?
- Also, in which file are scales applied alongside post-ops? Since the benchdnn tests fail without post-ops, how are post-ops applied together with scales?
@vineel96 Are you using the ACL backend? Can you provide the benchdnn input used to reproduce the test failure?
@igorsafo Are there any specific limitations to the acl_matmul implementation with scaling?
Hi all,
Thank you for all the insights and information. I was able to look into the issue, and it may take some time to resolve. Closing the issue for now.