alessandroaimar / ryg_rans Goto Github PK
View Code? Open in Web Editor NEWThis project forked from rygorous/ryg_rans
Simple rANS encoder/decoder (arithmetic coding-ish entropy coder).
This project forked from rygorous/ryg_rans
Simple rANS encoder/decoder (arithmetic coding-ish entropy coder).
This is a public-domain implementation of several rANS variants. rANS is an entropy coder from the ANS family, as described in Jarek Duda's paper "Asymmetric numeral systems" (http://arxiv.org/abs/1311.2540). - "rans_byte.h" has a byte-aligned rANS encoder/decoder and some comments on how to use it. This implementation should work on all 32-bit architectures. "main.cpp" is an example program that shows how to use it. - "rans64.h" is a 64-bit version that emits entire 32-bit words at a time. It is (usually) a good deal faster than rans_byte on 64-bit architectures, and also makes for a very precise arithmetic coder (i.e. it gets quite close to entropy). The trade-off is that this version will be slower on 32-bit machines, and the output bitstream is not endian-neutral. "main64.cpp" is the corresponding example. - "rans_word_sse41.h" has a SIMD decoder (SSE 4.1 to be precise) that does IO in units of 16-bit words. It has less precision than either rans_byte or rans64 (meaning that it doesn't get as close to entropy) and requires at least 4 independent streams of data to be useful; however, it is also a good deal faster. "main_simd.cpp" shows how to use it. See my blog http://fgiesen.wordpress.com/ for some notes on the design. I've also written a paper on interleaving output streams from multiple entropy coders: http://arxiv.org/abs/1402.3392 this documents the underlying design for "rans_word_sse41", and also shows how the same approach generalizes to e.g. GPU implementations, provided there are enough independent contexts coded at the same time to fill up a warp/wavefront or whatever your favorite GPU's terminology for its native SIMD width is. Finally, there's also "main_alias.cpp", which shows how to combine rANS with the alias method to get O(1) symbol lookup with table size proportional to the number of symbols. I presented an overview of the underlying idea here: http://fgiesen.wordpress.com/2014/02/18/rans-with-static-probability-distributions/ Results on my machine (Sandy Bridge i7-2600K) with rans_byte in 64-bit mode: ---- rANS encode: 12896496 clocks, 16.8 clocks/symbol (192.8MiB/s) 12486912 clocks, 16.2 clocks/symbol (199.2MiB/s) 12511975 clocks, 16.3 clocks/symbol (198.8MiB/s) 12660765 clocks, 16.5 clocks/symbol (196.4MiB/s) 12550285 clocks, 16.3 clocks/symbol (198.2MiB/s) rANS: 435113 bytes 17023550 clocks, 22.1 clocks/symbol (146.1MiB/s) 18081509 clocks, 23.5 clocks/symbol (137.5MiB/s) 16901632 clocks, 22.0 clocks/symbol (147.1MiB/s) 17166188 clocks, 22.3 clocks/symbol (144.9MiB/s) 17235859 clocks, 22.4 clocks/symbol (144.3MiB/s) decode ok! interleaved rANS encode: 9618004 clocks, 12.5 clocks/symbol (258.6MiB/s) 9488277 clocks, 12.3 clocks/symbol (262.1MiB/s) 9460194 clocks, 12.3 clocks/symbol (262.9MiB/s) 9582025 clocks, 12.5 clocks/symbol (259.5MiB/s) 9332017 clocks, 12.1 clocks/symbol (266.5MiB/s) interleaved rANS: 435117 bytes 10687601 clocks, 13.9 clocks/symbol (232.7MB/s) 10637918 clocks, 13.8 clocks/symbol (233.8MB/s) 10909652 clocks, 14.2 clocks/symbol (227.9MB/s) 10947637 clocks, 14.2 clocks/symbol (227.2MB/s) 10529464 clocks, 13.7 clocks/symbol (236.2MB/s) decode ok! ---- And here's rans64 in 64-bit mode: ---- rANS encode: 10256075 clocks, 13.3 clocks/symbol (242.3MiB/s) 10620132 clocks, 13.8 clocks/symbol (234.1MiB/s) 10043080 clocks, 13.1 clocks/symbol (247.6MiB/s) 9878205 clocks, 12.8 clocks/symbol (251.8MiB/s) 10122645 clocks, 13.2 clocks/symbol (245.7MiB/s) rANS: 435116 bytes 14244155 clocks, 18.5 clocks/symbol (174.6MiB/s) 15072524 clocks, 19.6 clocks/symbol (165.0MiB/s) 14787604 clocks, 19.2 clocks/symbol (168.2MiB/s) 14736556 clocks, 19.2 clocks/symbol (168.8MiB/s) 14686129 clocks, 19.1 clocks/symbol (169.3MiB/s) decode ok! interleaved rANS encode: 7691159 clocks, 10.0 clocks/symbol (323.3MiB/s) 7182692 clocks, 9.3 clocks/symbol (346.2MiB/s) 7060804 clocks, 9.2 clocks/symbol (352.2MiB/s) 6949201 clocks, 9.0 clocks/symbol (357.9MiB/s) 6876415 clocks, 8.9 clocks/symbol (361.6MiB/s) interleaved rANS: 435120 bytes 8133574 clocks, 10.6 clocks/symbol (305.7MB/s) 8631618 clocks, 11.2 clocks/symbol (288.1MB/s) 8643790 clocks, 11.2 clocks/symbol (287.7MB/s) 8449364 clocks, 11.0 clocks/symbol (294.3MB/s) 8331444 clocks, 10.8 clocks/symbol (298.5MB/s) decode ok! ---- Finally, here's the rans_word_sse41 decoder on an 8-way interleaved stream: ---- SIMD rANS: 435626 bytes 4597641 clocks, 6.0 clocks/symbol (540.8MB/s) 4514356 clocks, 5.9 clocks/symbol (550.8MB/s) 4780918 clocks, 6.2 clocks/symbol (520.1MB/s) 4532913 clocks, 5.9 clocks/symbol (548.5MB/s) 4554527 clocks, 5.9 clocks/symbol (545.9MB/s) decode ok! ---- There's also an experimental 16-way interleaved AVX2 version that hits faster rates still, developed by my colleague Won Chun; I will post it soon. Note that this is running "book1" which is a relatively short test, and the measurement setup is not great, so take the results with a grain of salt. -Fabian "ryg" Giesen, Feb 2014.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.