Comments (9)

igorsafo commented on June 1, 2024

Hi @dagdoron, sorry for the late response; I was trying to understand what happens in the example and how it can be mitigated. The issue is that the binary primitive is optimized for operations where the src0 and dst memory descriptors are similar (same tag, same data type), while src1 can be broadcast or have a different data type. In your case src0 and dst have different data types, so binary dispatches to the reference implementation.

For (de-)quantization, oneDNN uses the reorder primitive, which has optimized implementations for reorders between different data types. So one potential solution would be to use a reorder with a src scale (it must be in f32) to dequantize the data from s8 to f32, and then separately use the binary primitive to complete the dequantization.
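
To make the arithmetic of this two-step approach concrete, here is a minimal plain C++ sketch. It does not call the oneDNN API; it only emulates what the two steps would compute element-wise, assuming a single f32 scale, a zero point of 0, a channels-last (nhwc) layout, and a per-channel src1 of shape 1xCx1x1. The function names `dequantize_s8` and `mul_per_channel_nhwc` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Step 1 (emulates the s8 -> f32 reorder with one f32 source scale):
// each element is converted to float and multiplied by the scale.
// Assumption: zero point is 0.
std::vector<float> dequantize_s8(const std::vector<int8_t>& src, float scale) {
    std::vector<float> dst(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = scale * static_cast<float>(src[i]);
    return dst;
}

// Step 2 (emulates binary_mul with a broadcast src1 of shape 1xCx1x1):
// in an nhwc buffer the channel index varies fastest, so element i
// belongs to channel i % C.
void mul_per_channel_nhwc(std::vector<float>& data,
                          const std::vector<float>& per_channel) {
    const std::size_t channels = per_channel.size();
    assert(channels > 0 && data.size() % channels == 0);
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] *= per_channel[i % channels];
}
```

In this sketch src0 and dst of the second step share the same data type (f32), which is the condition described above for the optimized binary path.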

I see that your dequantization involves a shift and a scale, but why do you multiply by the shift rather than add or subtract it? This prevents implementing the whole dequantization as a single reorder. Here is the quantization that is supported by oneDNN: https://oneapi-src.github.io/oneDNN/dev_guide_attributes_quantization.html

from onednn.

dagdoron commented on June 1, 2024

Hi @igorsafo
Thanks,
I'll try a reorder before the binary and let you know whether that helps.
We are using a slightly different dequantization scheme, where the shifts are actual bit shifts, and we try to emulate the HW by multiplying by 2^x instead of shifting.
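
A minimal sketch of that emulation in plain C++, assuming the shift amount is a small non-negative integer and the multiply is done in f32 (the function names are hypothetical). Multiplying by a power of two is exact in floating point, so the two functions agree as long as the value fits in a float's 24 significant bits and the result fits in s32.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Left shift performed directly on the integer bits.
// The cast through uint32_t avoids shifting a negative signed value.
int32_t shift_left(int32_t value, int shift) {
    assert(shift >= 0 && shift < 31);
    return static_cast<int32_t>(static_cast<uint32_t>(value) << shift);
}

// The same operation emulated with a float multiply by 2^shift.
// Assumption: |value| < 2^24 and the product fits in int32_t,
// otherwise the conversion to or from float is no longer exact.
int32_t shift_left_via_mul(int32_t value, int shift) {
    assert(shift >= 0 && shift < 31);
    const float factor = std::ldexp(1.0f, shift); // 2^shift
    return static_cast<int32_t>(static_cast<float>(value) * factor);
}
```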

dagdoron commented on June 1, 2024

Hi @igorsafo

I've changed src0 to s32 to match the type of tmp0, so now src0 and dst have the same type.

e.g.
std::vector<int32_t> src0(512, 1);
memory::desc src0_md = memory::desc(dims, dt::s32, tag::nhwc);

The execution still falls back to ref:
onednn_verbose,create:cache_miss,cpu,binary,ref:any,undef,src_s32::blocked:acdb::f0 src_f32::blocked:acdb::f0 dst_s32::blocked:acdb::f0,attr-post-ops:binary_mul:s8:2 ,alg:binary_mul,1x8x8x8:1x8x1x1,0.104004
onednn_verbose,exec,cpu,binary,ref:any,undef,src_s32::blocked:acdb::f0 src_f32::blocked:acdb::f0 dst_s32::blocked:acdb::f0,attr-post-ops:binary_mul:s8:2 ,alg:binary_mul,1x8x8x8:1x8x1x1,1.11914
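
Checking for this fallback by eye gets tedious, so here is a small plain C++ sketch that pulls the implementation name out of a verbose line. It assumes the comma-separated field order seen in the lines above (marker, phase, engine, primitive, implementation, ...); other oneDNN versions may order the fields differently. The function names are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits one onednn_verbose line on commas.
std::vector<std::string> split_fields(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) fields.push_back(field);
    return fields;
}

// Returns the implementation field, e.g. "ref:any".
// Assumption: it is the fifth field, as in the log lines of this thread.
std::string implementation_of(const std::string& line) {
    const std::vector<std::string> fields = split_fields(line);
    assert(fields.size() > 4 && fields[0] == "onednn_verbose");
    return fields[4];
}

// True when the implementation name starts with "ref".
bool is_reference_fallback(const std::string& line) {
    return implementation_of(line).rfind("ref", 0) == 0;
}
```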

igorsafo commented on June 1, 2024

Yes, I was able to reproduce it. Another limitation of the JIT implementation I found is that it doesn't support the s32 data type. I will create an internal ticket to track this issue.

igorsafo commented on June 1, 2024

@dagdoron s32 support is in progress for the JIT implementation.

A separate question: would a shift operation serve you better, or is binary mul with s32 support enough for your use cases?

dagdoron commented on June 1, 2024

@igorsafo
s32 would be good enough; however, if you can support shifts, that would be best. It would save us some cycles converting them, and I guess integer shifts may be faster than float mul.

igorsafo commented on June 1, 2024

@dagdoron Could you please try the latest version of the master branch? Support was added in 46135fd.

dagdoron commented on June 1, 2024

@igorsafo - Thanks for the fast response and fix.

With this commit, the binary is executing the jit version:

onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_s8::blocked:acdb::f0 dst_s32::blocked:acdb::f0,,,1x256x128x128,0.911133
onednn_verbose,exec,cpu,binary,jit:uni,undef,src_s32::blocked:acdb::f0 src_f32::blocked:acdb::f0 dst_s32::blocked:acdb::f0,attr-scratchpad:user attr-post-ops:binary_mul:s8:2 ,alg:binary_mul,1x256x128x128:1x256x1x1,7.27001
onednn_verbose,exec,cpu,binary,jit:uni,undef,src_s8::blocked:acdb::f0 src_f32::blocked:acdb::f0 dst_s8:a:blocked:acdb::f0,attr-scratchpad:user attr-post-ops:binary_mul:s8:2+binary_add:s32:14:acdb ,alg:binary_mul,1x256x128x128:1x256x1x1,1.86791

igorsafo commented on June 1, 2024

Great to know! I added an internal request for the shift operation, but there is no guarantee it will be implemented until we have more use cases and users, because it would require considerably more resources on our side to implement and maintain.

I am closing this issue since the performance issue is fixed. Feel free to re-open or create a separate issue if you have any other requests.
