Giter Site home page Giter Site logo

py-tans's Introduction

Py-tANS

This repository contains an implementation of the tANS algorithm developer by both Yann Collet and Jarek Duda.

Asymmetric Numeral Systems is an approach to entropy encoding discovered by Jarek Duda, detailed in the following report. https://arxiv.org/abs/1311.2540

The code in this repository is an adaptation of the method of tANS used by Yann Collet within Zstd, developed for Facebook, specifically the Finite State Entropy code. The original code detailed its usage with the C programming language while this repo contains a Python Implementation. https://github.com/Cyan4973/FiniteStateEntropy http://fastcompression.blogspot.com/2013/12/finite-state-entropy-new-breed-of.html

Algorithm Overview

tANS or FSE is a method of range coding using an Asymmetric Numeral System. A set of states are used so that the probability of a state occuring is as close to equal as possible as the probability of the state it represents.

The action of changing states is analagous to changing state within a finite state machine, where we can output new symbols to a bit stream when we change state by encoding one more character.

To encode symbols using the number of bits that we can theoretically reach by calculating the optimal bits per symbol using the Shannon Entropy calculation, is the hardest part of this algorithm to understand. We only output an integer number of bits for symbols which may have a fractional number of bits as their Shannon Entropy optimal value, therefore we must occasionally use fewer or more bits to encode a symbol as to best match the fractional bits used on average.

This process can be achieved by outputting a variable number of bits depending on which "subrange" of states the current state falls into. The idea is that the number of bits output will not be optimal on a one output basis, but instead will average the the correct fraction.

Another factor to consider is the concept of moving between states, which correctly tell us what symbol was last used. This is accomplished by outputting the Binary represntation of how many states we must move as the bit output. This therefore requires the spacing between states to be consistent with the sub-range bits used seen in the above picture.

To acomplish this problem, we spread the states across the state space in such a way to provide the correct gaps between sub-range values.

This spreading of symbols can be done in multiple different ways, and can even be scrambled randomly based on some cryptographic key, meaning we may not keep optimal performance but will actually provide security by only being able to decode using a correct key.

Limitations

There are issues associated with this algorithm such as the fragility of the encoding, where if one bit is corrupted in our bitstream, we may have a totally different output as a consequence.

py-tans's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.