
Comments (7)

mansnils avatar mansnils commented on July 30, 2024

Hi @0xyd,

CMSIS-NN should be compatible with the TFLM and TFL reference kernels. TensorFlow Lite (TFL) also supports kernels other than the reference kernels, for example optimized kernels. These may sometimes differ slightly, and CMSIS-NN is only compatible with the reference kernels. Even the TFL and TFLM reference kernels may differ for some operators.
It seems you were able to run the model with TFLM using CMSIS-NN kernels?
If so, are you able to run the model with the TFLM reference kernels?
The output should match.

One way to make sure it is a fair match could be to use the network tester example:
https://github.com/tensorflow/tflite-micro/tree/main/tensorflow/lite/micro/examples/network_tester
It requires converting the model to the tflite file format and then to a C source file with xxd.
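In case xxd is not at hand, a rough Python stand-in for `xxd -i` looks like the following (the variable name network_model is just an assumption to match the example header):

```python
# Minimal stand-in for `xxd -i model.tflite`: dumps a binary file
# as a C byte array suitable for pasting into network_model.h.
def tflite_to_c_array(data: bytes, var_name: str = "network_model") -> str:
    lines = [f"unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append("  " + chunk + ",")
    lines.append("};")
    lines.append(f"unsigned int {var_name}_len = {len(data)};")
    return "\n".join(lines)

# Usage: header = tflite_to_c_array(open("model.tflite", "rb").read())
```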

from cmsis-nn.

0xyd avatar 0xyd commented on July 30, 2024

Hello, thank you for your reply. I actually don't know what "run the model with TFLM with CMSIS-NN kernels" means, but my approach is based on C code generation: I first build a model in TensorFlow, convert it into a quantized model in TensorFlow Lite, and parse its computational graph to get the parameters I require. My platform is a bit restricted, so I have to use its toolchain for building models from C.


mansnils avatar mansnils commented on July 30, 2024

CMSIS-NN is a library, so the recommended way is to run a model with TensorFlow Lite Micro (TFLM), which then uses CMSIS-NN kernels for any operators supported by CMSIS-NN. Calling CMSIS-NN directly is cumbersome and error-prone, since you would need to calculate many parameters and multipliers/shifts manually. Maybe this is what you are doing?
What I meant was that you could use xxd to convert your model in tflite format into: https://github.com/tensorflow/tflite-micro/blob/main/tensorflow/lite/micro/examples/network_tester/network_model.h and then run the network tester example with and without CMSIS-NN kernels, printing and comparing the output for each:
make -f tensorflow/lite/micro/tools/make/Makefile OPTIMIZED_KERNEL_DIR=cmsis_nn TARGET=cortex_m_corstone_300 TARGET_ARCH=cortex-m55 test_network_tester_test
make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_corstone_300 TARGET_ARCH=cortex-m55 test_network_tester_test
Or if you could provide a script that can generate your model in tflite format, I could check it.


0xyd avatar 0xyd commented on July 30, 2024

Yes, my code calculates the multipliers/shifts based on the quantization parameters stored in the TensorFlow Lite model. So far, convolution, max pooling, average pooling, fully-connected, and softmax work fine (the output of these operations is identical to the TF Lite models). The link you provided tests the depthwise convolution, which I haven't checked yet. So what you suggest is to follow a similar approach to build an example with the element-wise add operator?
The way I generate the element-wise add code is very similar to here. As my previous residual block shows, there are two input tensors for the KL.Add: one is the output of the first convolution layer and the other is the output of the last convolution layer. The KL.Add has one output tensor in the end. Using the testing code as a guide, I implemented the code to calculate the multipliers/shifts as follows:

import math

# Input quantization parameters; the input zero points are negated
# because the kernel adds the offsets
inputScale_1 = inputTensor_1['quantization']['scale']
inputZeroPoint_1 = -1 * inputTensor_1['quantization']['zeroPoint']
inputScale_2 = inputTensor_2['quantization']['scale']
inputZeroPoint_2 = -1 * inputTensor_2['quantization']['zeroPoint']

# Common scale both inputs are rescaled onto
doubleMaxInputScale = max(inputScale_1, inputScale_2) * 2

# Encode each real multiplier as a Q31 mantissa plus an exponent (shift)
inputMultiplier_1, shift_1 = math.frexp(inputScale_1 / doubleMaxInputScale)
inputMultiplier_1 = round(inputMultiplier_1 * (1 << 31))
inputMultiplier_2, shift_2 = math.frexp(inputScale_2 / doubleMaxInputScale)
inputMultiplier_2 = round(inputMultiplier_2 * (1 << 31))

outputScale = outputTensor['quantization']['scale']
outputZeroPoint = outputTensor['quantization']['zeroPoint']

# leftShift mirrors the value used in the testing code
leftShift = 20
realOutputScale = doubleMaxInputScale / ((1 << leftShift) * outputScale)
outputMultiplier, outputShift = math.frexp(realOutputScale)
outputMultiplier = round(outputMultiplier * (1 << 31))

The model is quantized with the INT8 scheme, so I use arm_elementwise_add_s8 in my CMSIS-NN code. Because I just mimicked the testing code, there are several things I am not sure about:

  1. What is leftShift and why is it 20?
  2. Why do we need to take the maximum of the two input scales? Why do we need to double it?
  3. Why do we need to calibrate the output scale?

I checked the CMSIS-NN paper but I still couldn't figure it out. It would be nice if you could explain how this mechanism is implemented in TFLM together with CMSIS-NN.

For the tflite model, you can just copy the code at the beginning of the thread and quantize it with the INT8 scheme.


mansnils avatar mansnils commented on July 30, 2024

What I am suggesting, based on your first question ("I am wondering is the function proved to be compatible with tensorflow-lite or do I miss something?"), is some way to verify that the generated model is bit-exact between TensorFlow Lite and TFLM+CMSIS-NN kernels. I am pretty sure there is no diff, judging by the code you provided. However, another option, apart from what I already suggested, might be to use this script, which extracts multipliers etc. for a given model so that CMSIS-NN can use that information and run the model without TFLM: https://github.com/ARM-software/CMSIS-NN/blob/main/Tests/UnitTest/model_extractor.py
Perhaps this will give you some insight related to your own script, as the two scripts (model_extractor.py and yours) should basically generate the same output.
The short answer to your questions 1-3 is that if those steps are not done, CMSIS-NN will not be bit-exact against the TensorFlow reference kernels.
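To illustrate why those steps matter, the rescaling pipeline that the reference ADD performs can be sketched in Python. This is a rough simulation, not the actual kernel code, and the scales and quantized values below are hypothetical; zero points are set to 0 for brevity:

```python
import math

LEFT_SHIFT = 20  # extra headroom so rescaled int8 inputs keep precision in int32

def quantize_multiplier(real):
    # Split real into a mantissa in [0.5, 1) and an exponent, then encode
    # the mantissa as a Q31 fixed-point integer.
    m, shift = math.frexp(real)
    q = round(m * (1 << 31))
    if q == (1 << 31):  # mantissa can round up to 1.0
        q //= 2
        shift += 1
    return q, shift

def rescale(x, mult, shift):
    # x * mult * 2^(shift - 31), rounded to nearest (positive x assumed)
    total = 31 - shift
    return (x * mult + (1 << (total - 1))) >> total

# Hypothetical quantization parameters
s1, s2, s_out = 0.1, 0.05, 0.15
q1, q2 = 40, 60                      # real values 4.0 and 3.0

# Both inputs are mapped onto a common scale: twice the larger input scale.
# Doubling guarantees both real multipliers are <= 0.5, so they are
# representable as "smaller than one" fixed-point multipliers.
twice_max = 2 * max(s1, s2)
m1, sh1 = quantize_multiplier(s1 / twice_max)
m2, sh2 = quantize_multiplier(s2 / twice_max)
# The output multiplier undoes both the common scale and the left shift,
# which is why the output scale has to be calibrated.
m_out, sh_out = quantize_multiplier(twice_max / ((1 << LEFT_SHIFT) * s_out))

acc = rescale(q1 << LEFT_SHIFT, m1, sh1) + rescale(q2 << LEFT_SHIFT, m2, sh2)
q_out = rescale(acc, m_out, sh_out)

print(q_out, q_out * s_out)          # close to the float sum 7.0
```

Dropping the left shift or the doubling changes the intermediate rounding, which is exactly where the one-off differences against the reference kernels come from.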


0xyd avatar 0xyd commented on July 30, 2024

Hi, I checked the script you provided and "ADD" is not included in it, so the script does not help me verify whether I implement the ADD operation correctly (it apparently just skips the ADD).
Since arm_elementwise_add_s8 already existed in the commit here, I expect the function has been tested with tensorflow-lite. But maybe it was not tested with 3-dimensional feature maps as input?


mansnils avatar mansnils commented on July 30, 2024

I can't really see a difference between the TFLM reference and CMSIS-NN kernels related to input dimensions. Perhaps your ADD requires broadcasting, which CMSIS-NN does not support: https://github.com/tensorflow/tflite-micro/blob/2f2c744672f176d3b774cb105402ba705922ca7e/tensorflow/lite/micro/kernels/cmsis_nn/add.cc#L136
I can't say for sure whether it is a bug, something not supported, or something in your setup. If you can provide a script that generates a tflite file with the ADD, I could verify it.
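As a rough illustration (hypothetical helper, not the actual TFLM code), the guard at that line boils down to a shape-equality check before the CMSIS-NN path is taken:

```python
def can_use_cmsis_nn_add(shape1, shape2):
    # arm_elementwise_add_s8 has no broadcasting support, so TFLM only
    # dispatches to it when both input shapes match exactly; otherwise
    # it falls back to the broadcasting reference kernel.
    return tuple(shape1) == tuple(shape2)

print(can_use_cmsis_nn_add((1, 8, 8, 16), (1, 8, 8, 16)))  # identical shapes
print(can_use_cmsis_nn_add((1, 8, 8, 16), (1, 1, 1, 16)))  # needs broadcast
```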

