Giter Site home page Giter Site logo

williamyang98 / viterbidecodercpp Goto Github PK

View Code? Open in Web Editor NEW
7.0 2.0 2.0 218 KB

Viterbi decoder with vectorisation written in C++

License: GNU Lesser General Public License v2.1

CMake 0.28% C++ 98.97% C 0.75%
convolutional-encoder decoding-algorithm dsp viterbi-decoder

viterbidecodercpp's Introduction

Introduction

x86-windows x86-linux arm-linux

This is a C++ port of Phil Karn's Viterbi decoder which can be found here.

This is a header only library. Just copy and paste the header files to your desired location.

See examples/run_simple.cpp for a common usage scenario.

Modifications include:

  • Templated code for creating decoders of any code rate and constraint length
  • Branch table can be specified at runtime and shared between multiple decoders
  • Initial error values and renormalisation threshold can be specified
  • Vectorisation using intrinsics for arbitary constraint lengths (where possible) for significant speedups

Performance is similar to Phil Karn's original C implementation for provided decoders.

Heavy templating is used for better performance. Compared to code that uses constraint length (K) and code rate (R) as runtime parameters here, the templated version is up to 50% faster. This is because the compiler can perform more optimisations if the constraint length and code rate are known ahead of time.

Intrinsics support

For x86 processors AVX2 or SSE4.1 is required for vectorisation.

For arm processors aarch64 is required for vectorisation.

The following intrinsic implementations exist:

  • 16bit error metrics and soft decision values
  • 8bit error metrics and soft decision values

Each vectorisaton type requires the convolution code to have a minimum constraint length (K)

Type Width Kmin Speedup
Scalar 2 1x
x86 SSE4.1 16bit 5 8x
x86 SSE4.1 8bit 6 16x
x86 AVX2 16bit 6 16x
x86 AVX2 8bit 7 32x
ARM Neon 16bit 5 8x
ARM Neon 8bit 6 16x

Benchmarks show that the vectorised decoders have significiant speedups that can approach or supercede the theoretical values.

Using the 16bit and 8bit based intrinsics implementations as a guide you can make your own intrinsics implementation.

Additional notes

  • Significant performance improvements can be achieved with using offset binary
    • Soft decision values take the form of offset binary given by: 0 to N
    • The branch table needs values: 0 and N
    • Instead of performing a subtract then absolute, you can use an XOR operation which behaves like conditional negation
    • Refer to the original Phil Karn code for this improvement
    • Explanation with example
      • Branch table has values: 0 or 255
      • Soft decision values in offset binary: 0 to 255
      • Consider a soft decision value of x
      • If branch value is 0, XOR will return x
      • If branch value is 255, XOR will return 255-x
  • Performance improvements can be achieved with using signed integer types
    • Using signed integer types allows for the use of modular arithmetic instead of saturated arithmetic. This can provide a up to a 33% speed boost due to CPI decreasing from 0.5 to 0.33.
    • Unsigned integer types are used since they increase the range of error values after renormalisation, and saturated arithmetic will prevent overflows/underflows.
  • The implementations uses template parameters and static asserts to:
    • Check if the provided constraint length and code rate meet the vectorisation requirements
    • Generate aligned data structures and provide the compiler more information about the decoder for better optimisation
  • Unsigned 8bit error metrics have severe limitations. These include:
    • Limited range of soft decision values to avoid overflowing past the renormalisation threshold.
    • Higher code rates (such as Cassini) will quickly reach the renormalisation threshold and deteriorate in accuracy. This is because the maximum error for each branch is a multiple of the code rate.
  • If you are only interested in hard decision decoding then using 8bit error metrics is the better choice
    • Use hard decision values [-1,+1] of type int8_t
    • Due to the small maximum branch error of 2*R we can set the renormalisation threshold to be quite high.
    • renormalisation_threshold = UINT8_MAX - (2*R*10)
    • This is less accurate compared to 16bit soft decision decoding but with up to 2x performance due to the usage of 8bit values.
    • image
  • Depending on your usage requirements changes to the library are absolutely encouraged
  • Additionally check out Phil Karn's fine tuned assembly code here for the best possible performance
  • This code is not considered heavily tested and your mileage may vary. Refer to /examples for applications to benchmark performance and verify accuracy.

Useful alternatives

  • Spiral project aims to auto-generate high performance code for any input parameters
  • ka9q/libfec is Phil Karn's original C implementation

viterbidecodercpp's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Forkers

jimzgchow mfkiwl

viterbidecodercpp's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.