ARM performance (nrsc5 issue, closed)

theori-io commented on June 27, 2024
ARM performance

from nrsc5.

Comments (8)

mrbubble62 commented on June 27, 2024

On the Allwinner R8 (C.H.I.P.) there is still more work to do, but it is a definite improvement: playing back sample.xz, audio drops out less than 50% of the time versus more than 90% before. Build options:
CMAKE_C_FLAGS "-mcpu=cortex-a8 -mfloat-abi=hard -mfpu=neon"
-DUSE_THREADS=ON -DUSE_NEON=ON -DUSE_FAST_MATH=ON
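For context on what -mfpu=neon and USE_NEON can buy: NEON lets the CPU perform four float multiply-accumulates per instruction in hot loops. The sketch below is a generic illustration, not nrsc5's code, and the function name dot_f32 is hypothetical. It uses NEON intrinsics when the compiler defines __ARM_NEON or __ARM_NEON__ (GCC does so when NEON is enabled, e.g. with -mfpu=neon) and falls back to plain C otherwise.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two float arrays of length n. */
float dot_f32(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
    float sum = 0.0f;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four lanes of multiply-accumulate per iteration. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4)
        acc = vmlaq_f32(acc, vld1q_f32(a + i), vld1q_f32(b + i));
    /* Horizontal add via lane extraction (works on ARMv7 and AArch64). */
    sum = vgetq_lane_f32(acc, 0) + vgetq_lane_f32(acc, 1)
        + vgetq_lane_f32(acc, 2) + vgetq_lane_f32(acc, 3);
#endif
    /* Scalar tail, or the whole loop when NEON is unavailable. */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

The NEON path adds the products in a different order than the scalar loop, so results can differ in the last bits; that is normal floating-point behaviour rather than a bug.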

Have to say it works brilliantly on i386 :) Thank you!

awesie commented on June 27, 2024

If you are testing with sample.xz, make sure that you decompress it first, and then test the performance. The xz tool itself will use quite a bit of CPU.

mrbubble62 commented on June 27, 2024

Decompressed the sample, but there is no detectable difference with nrsc5 -r ../support/sample 0

awesie commented on June 27, 2024

Great to know, thanks!

awesie commented on June 27, 2024

I decreased the number of taps in the filters when USE_FAST_MATH is set. This should shave off another 10-20% of CPU usage. I'd be curious whether this makes things any better.
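Why fewer taps helps: in a direct-form FIR filter, every output sample costs one multiply-accumulate per tap, so filter work scales linearly with the tap count. The sketch below makes that explicit by counting the multiply-accumulates. The FILTER_TAPS values and the fir_filter name are hypothetical and are not nrsc5's real filter lengths. Because filtering is only one part of the pipeline, cutting taps reduces total CPU by less than the ratio of tap counts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tap counts for illustration only. */
#ifdef USE_FAST_MATH
#define FILTER_TAPS 16
#else
#define FILTER_TAPS 32
#endif

/* Direct-form FIR: out[i] = sum over k of h[k] * in[i + FILTER_TAPS - 1 - k].
 * Requires n_in >= FILTER_TAPS and writes n_in - FILTER_TAPS + 1 outputs.
 * Returns the number of multiply-accumulates performed. */
size_t fir_filter(const float h[FILTER_TAPS], const float *in, size_t n_in,
                  float *out)
{
    assert(n_in >= FILTER_TAPS);
    size_t macs = 0;
    for (size_t i = 0; i + FILTER_TAPS <= n_in; i++) {
        float acc = 0.0f;
        for (size_t k = 0; k < FILTER_TAPS; k++) {
            acc += h[k] * in[i + FILTER_TAPS - 1 - k];
            macs++; /* one multiply-accumulate per tap per output */
        }
        out[i] = acc;
    }
    return macs;
}
```

Halving FILTER_TAPS halves the returned count for the same input length, which is the whole saving; the trade-off is a wider transition band and weaker stopband rejection.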

Useful metrics for performance would be:

time src/nrsc5 -r sample -o /dev/null -f wav -q 0
time src/nrsc5 -r sample -o /dev/null -f adts -q 0

The adts run shows how much time is required to process the data, and the wav run shows how much time is required to process the data and also decode it to audio.
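To turn these timings into a real-time factor, divide the elapsed time by the duration of signal contained in the capture. This sketch assumes the sample is raw 8-bit interleaved I/Q (two bytes per complex sample, the format rtl_sdr writes) recorded at the 1488375 Hz rate shown in the RTL-SDR log later in this thread; both are assumptions about the sample file, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumption: 8-bit I and 8-bit Q interleaved, i.e. 2 bytes per complex sample. */
#define BYTES_PER_COMPLEX_SAMPLE 2.0

/* Seconds of signal contained in a raw IQ capture of file_bytes bytes. */
double capture_seconds(long long file_bytes, double sample_rate_hz)
{
    assert(file_bytes >= 0 && sample_rate_hz > 0.0);
    return (double)file_bytes / BYTES_PER_COMPLEX_SAMPLE / sample_rate_hz;
}

/* Processing time divided by signal time; below 1.0 means faster than real time. */
double realtime_factor(double processing_seconds, double signal_seconds)
{
    assert(signal_seconds > 0.0);
    return processing_seconds / signal_seconds;
}
```

Use the real (wall-clock) figure reported by time. A factor comfortably below 1.0 leaves headroom for live reception; values near or above 1.0 would be expected to show up as input buffer overflows when running from a tuner.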

mrbubble62 commented on June 27, 2024

Results:

chip@chip:~/nrsc5/build$ time src/nrsc5 -r sample -o /dev/null -f adts -q 0
real    0m0.238s
user    0m0.215s
sys     0m0.020s
chip@chip:~/nrsc5/build$ time src/nrsc5 -r sample -o /dev/null -f wav -q 0
real    0m0.218s
user    0m0.205s
sys     0m0.015s

mrbubble62 commented on June 27, 2024

Performance has definitely improved: with a strong signal, audio now decodes occasionally.

chip@chip:~/nrsc5/build$ nrsc5 -p 12  88500000 0
12:10:30 INFO  main.c:176: [0] Generic RTL2832U OEM
Found Rafael Micro R820T tuner
Exact sample rate is: 1488375.071248 Hz
12:10:31 INFO  main.c:63: Gain: 0.0 dB, CNR: 13.824152 dB
12:10:31 INFO  main.c:63: Gain: 0.9 dB, CNR: 14.034353 dB
12:10:32 INFO  main.c:63: Gain: 1.4 dB, CNR: 14.064837 dB
12:10:32 INFO  main.c:63: Gain: 2.7 dB, CNR: 14.218107 dB
12:10:32 INFO  main.c:63: Gain: 3.7 dB, CNR: 14.165344 dB
12:10:33 INFO  main.c:63: Gain: 7.7 dB, CNR: 13.962760 dB
12:10:33 INFO  main.c:63: Gain: 8.7 dB, CNR: 13.858078 dB
12:10:33 INFO  main.c:63: Gain: 12.5 dB, CNR: 13.359507 dB
12:10:34 INFO  main.c:63: Gain: 14.4 dB, CNR: 13.144488 dB
12:10:34 INFO  main.c:63: Gain: 15.7 dB, CNR: 12.828616 dB
12:10:35 INFO  main.c:63: Gain: 16.6 dB, CNR: 12.347807 dB
12:10:35 INFO  main.c:63: Gain: 19.7 dB, CNR: 10.950316 dB
12:10:35 DEBUG main.c:67: Best gain: 27
12:10:38 INFO  input.c:154: CFO: 1090.118408 Hz (12 ppm)
12:10:38 DEBUG sync.c:244: First block @ 15
12:10:39 INFO  sync.c:222: Synchronized!
12:10:41 INFO  sync.c:298: MER: 7.237570 dB (lower), 7.242609 dB (upper)
12:10:41 INFO  decode.c:74: BER: 0.000027, avg: 0.000027, min: 0.000027, max: 0.000027
12:10:41 DEBUG frame.c:168: pdu_seq: 1, seq: 32, nop: 33
12:10:41 DEBUG frame.c:197: ignoring partial pdu
12:10:43 INFO  sync.c:298: MER: 7.404940 dB (lower), 7.376904 dB (upper)
12:10:44 INFO  decode.c:74: BER: 0.000022, avg: 0.000025, min: 0.000022, max: 0.000027
12:10:44 DEBUG frame.c:168: pdu_seq: 0, seq: 0, nop: 33
12:10:46 INFO  sync.c:298: MER: -1.457466 dB (lower), -3.960696 dB (upper)
12:10:47 INFO  decode.c:74: BER: 0.062330, avg: 0.020793, min: 0.000022, max: 0.062330
12:10:47 DEBUG frame.c:168: pdu_seq: 1, seq: 32, nop: 33
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 DEBUG sync.c:199: lost sync (-1, -1)!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 DEBUG sync.c:244: First block @ 11
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:48 ERROR input.c:265: input buffer overflow!
12:10:49 DEBUG sync.c:244: First block @ 30
12:10:50 DEBUG sync.c:244: First block @ 3
12:10:50 DEBUG sync.c:244: First block @ 1
12:10:51 DEBUG sync.c:244: First block @ 0
12:10:51 INFO  sync.c:222: Synchronized!
12:10:52 INFO  acquire.c:98: Timing offset: 642.187500, slope: -4.199219 (adjust)
12:10:52 INFO  sync.c:298: MER: 6.963532 dB (lower), 6.934787 dB (upper)
12:10:53 INFO  decode.c:74: BER: 0.000022, avg: 0.015600, min: 0.000022, max: 0.062330
12:10:53 DEBUG frame.c:168: pdu_seq: 0, seq: 0, nop: 33
12:10:53 ERROR output.c:125: Decode error: Array index out of range
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:54 ERROR input.c:265: input buffer overflow!
12:10:56 INFO  sync.c:298: MER: 0.483052 dB (lower), -0.037717 dB (upper)
12:10:56 INFO  decode.c:74: BER: 0.207394, avg: 0.053959, min: 0.000022, max: 0.207394
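The repeated "input buffer overflow!" errors most likely mean that samples arrived from the tuner faster than the demodulator drained them, so some were dropped; in the log they are followed by lost sync. The bounded buffer below illustrates that failure mode in general terms. It is a single-threaded sketch with hypothetical names and a deliberately tiny capacity, not nrsc5's input.c; a real program whose producer runs in an SDR callback thread would also need a lock or atomics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8 /* hypothetical, tiny for illustration */

typedef struct {
    float data[RING_CAPACITY];
    size_t head;      /* next write index */
    size_t count;     /* samples currently stored */
    size_t overflows; /* samples dropped because the buffer was full */
} ring_t;

/* Producer side (e.g. the tuner callback). Drops the sample and counts an
 * overflow when the consumer has fallen behind. */
bool ring_push(ring_t *r, float s)
{
    assert(r != NULL);
    if (r->count == RING_CAPACITY) {
        r->overflows++;
        return false;
    }
    r->data[r->head] = s;
    r->head = (r->head + 1) % RING_CAPACITY;
    r->count++;
    return true;
}

/* Consumer side (the demodulator). Returns false when the buffer is empty. */
bool ring_pop(ring_t *r, float *s)
{
    assert(r != NULL && s != NULL);
    if (r->count == 0)
        return false;
    size_t tail = (r->head + RING_CAPACITY - r->count) % RING_CAPACITY;
    *s = r->data[tail];
    r->count--;
    return true;
}
```

A larger buffer only absorbs short bursts; if processing is slower than the sample rate on average, overflows return as soon as the buffer fills, which is why reducing per-sample CPU cost is the real fix.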

argilo commented on June 27, 2024

#95, #106 and #107 have made significant improvements in ARM performance, and it looks like USE_FAST_MATH is no longer required. 15-minute load average is around 0.55 on a Raspberry Pi 3 with USE_NEON. I have some further improvement in mind, but I think I'll close this issue for now as ARM performance seems to be adequate already.
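A load average is easier to read next to the core count: the Raspberry Pi 3 has four cores, so 0.55 corresponds to roughly 0.14 runnable tasks per core. The sketch below reads the same 15-minute figure on another board, assuming Linux, where the first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages; the function names are hypothetical. Linux's load average also counts tasks in uninterruptible sleep, so it only approximates CPU use.

```c
#include <assert.h>
#include <stdio.h>

/* Load average divided by core count: a rough fraction of total CPU capacity in use. */
double per_core_load(double load, int cores)
{
    assert(cores > 0);
    return load / (double)cores;
}

/* Reads the 15-minute load average from /proc/loadavg.
 * Returns -1.0 if it cannot be read (e.g. on a non-Linux system). */
double read_loadavg_15min(void)
{
    double l1, l5, l15;
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return -1.0;
    int n = fscanf(f, "%lf %lf %lf", &l1, &l5, &l15);
    fclose(f);
    return n == 3 ? l15 : -1.0;
}
```

On a Raspberry Pi 3, per_core_load(read_loadavg_15min(), 4) gives the per-core figure while nrsc5 is running.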
