Comments (6)
Did you build in Release? What do the apply_ functions do? The ggml convolution operations are for sure not very optimal, but a 100x difference is too much.
from ggml.
Thanks for your reply. I built on the master branch. The apply_ functions are wrappers around the conv operations, as follows:
static ggml_tensor * apply_conv2d_no_clamp(ggml_context * ctx, ggml_tensor * input, const conv2d_layer & layer)
{
    // plain 2-D convolution, no activation
    ggml_tensor * result = ggml_conv_2d(ctx, layer.weights, input,
                                        layer.stride,   layer.stride,
                                        layer.padding,  layer.padding,
                                        layer.dilation, layer.dilation);
    return result;
}

static ggml_tensor * apply_conv2d(ggml_context * ctx, ggml_tensor * input, const conv2d_layer & layer)
{
    // 2-D convolution followed by a clamp to [0, 6]
    ggml_tensor * result = ggml_conv_2d(ctx, layer.weights, input,
                                        layer.stride,   layer.stride,
                                        layer.padding,  layer.padding,
                                        layer.dilation, layer.dilation);
    result = ggml_clamp(ctx, result, 0.0f, 6.0f);
    return result;
}

static ggml_tensor * apply_conv_depthwise_2d(ggml_context * ctx, ggml_tensor * input, const conv2d_layer & layer)
{
    // depthwise 2-D convolution followed by a clamp to [0, 6]
    ggml_tensor * result = ggml_conv_depthwise_2d(ctx, layer.weights, input,
                                                  layer.stride,   layer.stride,
                                                  layer.padding,  layer.padding,
                                                  layer.dilation, layer.dilation);
    result = ggml_clamp(ctx, result, 0.0f, 6.0f);
    return result;
}
I tested MobileNetV2 inference with the release-branch code, and the inference time was about the same.
By Release I mean building with -O3 optimization flags. What hardware are you running on?
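For reference, one common way to get those flags with a CMake-based ggml build (a sketch, not the thread author's exact commands; the build directory name is an assumption, and -march=native additionally enables AVX2/FMA where the host CPU supports them):

```shell
# Configure an out-of-tree Release build: CMAKE_BUILD_TYPE=Release implies -O3,
# and -march=native lets the compiler emit AVX2/FMA code for the host CPU.
cmake -B build -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native"
cmake --build build -j
```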
I built with -O3 flags and inference did get faster, but it is still not ideal: about 15x slower than onnxruntime. I tested on my PC; CPU: Intel(R) Core(TM) i7-7560U @ 2.40GHz.
Make sure you are building with AVX2 support and ramp up the threads a bit:
const int n_threads = 4;
ggml_graph_compute_with_ctx(ctx0, gf, n_threads);