Comments (9)
void convolution_node::backward2data(const dnnl::memory& diff_dst)
{
    // tag::any lets oneDNN choose the optimal (usually blocked) layouts.
    m_src_diff_md = dnnl::memory::desc(m_src_dims, dt::f32, tag::any);
    m_weights_diff_md = dnnl::memory::desc(m_weights_dims, dt::f32, tag::any);
    m_dst_diff_md = dnnl::memory::desc(m_dst_dims, dt::f32, tag::any);

    m_conv_bwd_data_desc = dnnl::convolution_backward_data::primitive_desc(m_engine,
        dnnl::algorithm::convolution_direct,
        m_src_diff_md, m_weights_md, m_dst_diff_md,
        m_stride_dims, m_dilation_dims, m_padding_dims, m_padding_dims,
        m_conv_fwd_desc);

    // Reorder diff_dst into the layout the primitive expects, if needed.
    m_arg_diff_dst = diff_dst;
    if (diff_dst.get_desc() != m_conv_bwd_data_desc.diff_dst_desc()) {
        m_arg_diff_dst = dnnl::memory(m_conv_bwd_data_desc.diff_dst_desc(), m_engine);
        m_net_bwd_data.push_back(dnnl::reorder(diff_dst, m_arg_diff_dst));
        m_net_bwd_data_args.push_back({ {DNNL_ARG_FROM, diff_dst},
                                        {DNNL_ARG_TO, m_arg_diff_dst} });
    }

    m_arg_diff_src = dnnl::memory(m_conv_bwd_data_desc.diff_src_desc(), m_engine);
    m_net_bwd_data.push_back(dnnl::convolution_backward_data(m_conv_bwd_data_desc));
    m_net_bwd_data_args.push_back(
        { {DNNL_ARG_DIFF_SRC, m_arg_diff_src},
          {DNNL_ARG_DIFF_DST, m_arg_diff_dst},
          // If something does not work, check this: some reordering may be
          // needed, done in a similar fashion to cnn_training_f32.cpp.
          {DNNL_ARG_WEIGHTS, m_arg_weights} });

    // Reorder diff_src back to the user's plain nchw layout, if needed.
    auto user_diff_src_md = dnnl::memory::desc(m_src_dims, dt::f32, tag::nchw);
    m_user_diff_src = m_arg_diff_src;
    if (m_arg_diff_src.get_desc() != user_diff_src_md) {
        m_user_diff_src = dnnl::memory(user_diff_src_md, m_engine);
        m_net_bwd_data.push_back(dnnl::reorder(m_arg_diff_src, m_user_diff_src));
        m_net_bwd_data_args.push_back({ {DNNL_ARG_FROM, m_arg_diff_src},
                                        {DNNL_ARG_TO, m_user_diff_src} });
    }

    assert(m_net_bwd_data.size() == m_net_bwd_data_args.size() && "something is missing");
}
from onednn.
dnnl::convolution_backward_data is quite time-consuming:
infer cost (ms): 10
backward2data cost (ms): 232 (PyTorch/libtorch takes 30~50 ms for the same case)
backward2weights cost (ms): 12
Hi @w1005444804 , could you please run oneDNN with verbose enabled?
Here is the documentation: https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html?highlight=verbose
@igorsafo thanks. Activating ONEDNN_VERBOSE does have some effect, but the timing is very unstable: it went from the previous 230 ms to a dynamic range of 60-200 ms.
onednn_verbose,188439297.948300,exec,cpu,convolution,jit:avx2,backward_data,src_f32:ap:blocked:aBcd8b::f0 wei_f32:ap:blocked:ABcd8a8b::f0 bia_undef::undef::: dst_f32:ap:blocked:aBcd8b::f0,,alg:convolution_direct,mb10_ic3oc6_ih160oh156kh5sh1dh0ph0_iw160ow156kw5sw1dw0pw0,100.937
Hi @igorsafo , Is the problem caused by me?
@w1005444804 Thanks for the additional information! It looks like it is not an integration problem, because the data formats are blocked and an optimized implementation is called. I was also able to reproduce the low performance for this case. It is not running on a single thread, but the optimized implementation seems to have a gap for these kinds of shapes.
Is it the first layer in the model? You usually don't need to compute the backward pass w.r.t. data for the first layer. Unfortunately, if there are other layers before this convolution, then the gradient is required.
If you can provide more details about the use case (model, hardware/ISA), that would be helpful. How much time does this convolution take compared to the overall model time?
@igorsafo Yes, it is the first layer. My model is a single conv layer; I just wanted to test the speed of forward and backward propagation of convolutions, and found this issue when comparing with PyTorch.
Thank you for your reply!
The code is roughly as follows:
...
dnnl::memory::dims conv1_src_tz = { 10, 3, 160, 160 };
auto conv1_src_memory = dnnl::memory({ {conv1_src_tz}, dt::f32, tag::nchw }, engine);
convolution_node conv1(engine, 3, 6, 5, 1, 0, 0, 1, 0, 1, conv1_src_memory);
...
for (size_t i = 0; i < conv1.m_net_fwd.size(); i++) {
    conv1.m_net_fwd[i].execute(s, conv1.m_net_fwd_args[i]);
}
...
conv1.backward2data(top_memory);
for (size_t i = 0; i < conv1.m_net_bwd_data.size(); i++) {
    conv1.m_net_bwd_data[i].execute(s, conv1.m_net_bwd_data_args[i]);
}
Hi @w1005444804 ,
Thank you for the information. I have created an internal tracker for this issue; however, I can't guarantee it will be fixed until we have more requests/use cases for this particular shape.