rainerzufalldererste / hypersonic-rle-kit

The fastest Run-Length-Encoding on the Planet (for x64)

Home Page: https://rainerzufalldererste.github.io/hypersonic-rle-kit/

License: BSD 2-Clause "Simplified" License

hypersonic-rle-kit's Introduction

What is it?

  • A collection of 120+ RLE and related codecs optimized for all kinds of different inputs and scenarios.
  • Usually the fastest run length en/decoder by far. Single Core Decompression Speeds > 34 GB/s and Compression Speeds > 28 GB/s have been observed with large files. (small files can exceed 120 GB/s decode, 60 GB/s encode)
  • Written in C.
  • SIMD variants for AVX-512F, AVX2, AVX, SSE4.1, SSSE3 and SSE2 are available for various decoders and encoders. The variant is picked automatically at runtime based on the extensions available on the current platform.
  • Variants include: Single RLE Symbol, Short Strings of RLE Symbols, Byte Aligned, Symbol Aligned, 8 Bit, 16 Bit, 24 Bit, 32 Bit, 48 Bit, 64 Bit, 128 Bit, Different probabilities of recurring symbols, ...
  • OpenCL variant available for some of the decoders.
  • Licensed under Two Clause BSD.
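
The runtime selection of SIMD variants mentioned above can be pictured roughly as follows. This is only a minimal sketch of the general dispatch technique, not the kit's actual code: the kernel (`count_scalar` / `count_avx2`), the `count_fn` type and `select_count_impl` are hypothetical names invented for illustration, and the CPU check relies on the GCC/Clang builtin `__builtin_cpu_supports`, so it falls back to the scalar path on other compilers or non-x86 targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel signature: counts the bytes equal to `symbol`. */
typedef size_t (*count_fn)(const uint8_t *data, size_t size, uint8_t symbol);

/* Portable fallback that works on every platform. */
static size_t count_scalar(const uint8_t *data, size_t size, uint8_t symbol)
{
  assert(data != NULL || size == 0);
  size_t n = 0;
  for (size_t i = 0; i < size; i++)
    n += (data[i] == symbol);
  return n;
}

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_DISPATCH 1
#include <immintrin.h>

/* AVX2 variant: compares 32 bytes at once and counts the set mask bits. */
__attribute__((target("avx2")))
static size_t count_avx2(const uint8_t *data, size_t size, uint8_t symbol)
{
  const __m256i needle = _mm256_set1_epi8((char)symbol);
  size_t n = 0, i = 0;
  for (; i + 32 <= size; i += 32)
  {
    const __m256i v = _mm256_loadu_si256((const __m256i *)(data + i));
    const uint32_t mask = (uint32_t)_mm256_movemask_epi8(_mm256_cmpeq_epi8(v, needle));
    n += (size_t)__builtin_popcount(mask);
  }
  return n + count_scalar(data + i, size - i, symbol); /* remaining tail bytes */
}
#endif

/* Picks the best available implementation once; callers keep the pointer. */
static count_fn select_count_impl(void)
{
#ifdef HAVE_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx2"))
    return count_avx2;
#endif
  return count_scalar;
}
```

A caller would typically resolve the pointer once (`count_fn count = select_count_impl();`) and reuse it for every buffer, so the feature check stays out of the hot loop.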

Benchmark

See the Full Benchmark with Graphs; the tables below contain only a tiny selection of the 100+ codecs.

  • Single-Threaded
  • Running on an AMD Ryzen 9 7950X, 32 GB DDR5-6000 CL30 on Windows 11.
  • Compiled with Visual Studio 2022.
  • Compared to TurboRLE and MRLE (with AVX2 enabled; benchmarking results have been converted from MB/s to MiB/s) (7f6e486, April 2023)
  • Contained Codecs w/ Compression / Decompression Speed vs. Ratio Pareto + Notable Entropy Highlighted

video-frame.raw (heavily quantized video frame DCTs, 88,473,600 Bytes)

Type | Ratio | Encoding Speed | Decoding Speed | Entropy Compressible To
8 Bit 19.34 % 3,319.6 MiB/s 25,830.4 MiB/s 9.50 %
8 Bit Packed 17.95 % 2,907.5 MiB/s 19,783.6 MiB/s 9.69 %
8 Bit 1LUT Short 16.93 % 2,365.1 MiB/s 11,069.1 MiB/s 9.87 %
8 Bit 3LUT 17.41 % 2,146.7 MiB/s 15,819.2 MiB/s 9.80 %
8 Bit 3LUT Short 16.49 % 2,085.6 MiB/s 8,100.4 MiB/s 9.93 %
8 Bit 7LUT 17.46 % 2,094.6 MiB/s 16,100.9 MiB/s 9.94 %
8 Bit Single 18.31 % 1,654.7 MiB/s 23,750.2 MiB/s 9.71 %
8 Bit Single Short 17.05 % 1,414.4 MiB/s 14,143.6 MiB/s 9.81 %
16 Bit 1LUT Short (Symbol) 18.23 % 3,783.5 MiB/s 13,705.5 MiB/s 9.72 %
16 Bit 1LUT Short (Byte) 18.11 % 3,745.3 MiB/s 12,951.5 MiB/s 9.90 %
16 Bit 3LUT (Byte) 17.74 % 3,708.2 MiB/s 15,577.4 MiB/s 9.82 %
16 Bit 7LUT (Byte) 17.53 % 3,526.7 MiB/s 13,249.8 MiB/s 9.96 %
16 Bit 7LUT Short Greedy (Byte) 15.37 % 691.3 MiB/s 3,075.2 MiB/s 10.59 %
24 Bit Packed (Byte) 19.19 % 6,890.9 MiB/s 21,560.0 MiB/s 9.57 %
24 Bit 3LUT (Byte) 18.52 % 5,367.1 MiB/s 18,636.5 MiB/s 9.49 %
24 Bit 7LUT (Byte) 18.44 % 5,196.5 MiB/s 17,679.4 MiB/s 9.59 %
24 Bit 7LUT Short Greedy (Byte) 15.46 % 639.3 MiB/s 3,375.4 MiB/s 10.67 %
32 Bit Packed (Byte) 19.82 % 8,946.4 MiB/s 23,569.8 MiB/s 9.46 %
32 Bit 3LUT (Byte) 19.25 % 7,149.3 MiB/s 23,580.3 MiB/s 9.34 %
32 Bit 7LUT (Byte) 19.22 % 7,066.1 MiB/s 23,033.8 MiB/s 9.45 %
32 Bit 7LUT Short Greedy (Byte) 15.09 % 663.1 MiB/s 3,298.1 MiB/s 10.59 %
48 Bit Packed (Byte) 20.84 % 10,911.9 MiB/s 26,397.7 MiB/s 9.40 %
48 Bit 3LUT (Byte) 20.41 % 9,304.9 MiB/s 25,603.9 MiB/s 9.25 %
48 Bit 3LUT Short Greedy (Byte) 15.88 % 931.3 MiB/s 6,064.8 MiB/s 10.04 %
48 Bit 7LUT (Byte) 20.42 % 9,457.1 MiB/s 25,088.5 MiB/s 9.34 %
64 Bit Packed (Symbol) 22.34 % 12,697.9 MiB/s 26,375.4 MiB/s 9.23 %
64 Bit (Byte) 23.66 % 13,514.1 MiB/s 25,773.6 MiB/s 9.63 %
64 Bit Packed (Byte) 21.63 % 13,306.5 MiB/s 26,183.1 MiB/s 9.40 %
64 Bit 3LUT (Byte) 21.30 % 11,442.9 MiB/s 27,864.0 MiB/s 9.25 %
64 Bit 3LUT Short Greedy (Byte) 15.66 % 905.2 MiB/s 6,293.0 MiB/s 9.95 %
64 Bit 7LUT (Byte) 21.32 % 11,479.3 MiB/s 26,879.6 MiB/s 9.32 %
8 Bit RLE + Huffman-esque 12.51 % 332.0 MiB/s 1,676.7 MiB/s 9.82 %
Low Entropy 19.93 % 800.2 MiB/s 2,690.6 MiB/s 9.05 %
Low Entropy Single 19.96 % 802.4 MiB/s 4,389.0 MiB/s 9.10 %
- - - - -
memcpy 100.00 % 28,667.8 MiB/s 28,590.1 MiB/s 14.03 %
trle 16.0 % 1,177.7 MiB/s 5,356.0 MiB/s -
srle 0 17.8 % 1,182.3 MiB/s 11,214.7 MiB/s -
srle 8 18.7 % 2,044.3 MiB/s 12,300.2 MiB/s -
srle 16 21.3 % 1,997.0 MiB/s 15,479.0 MiB/s -
srle 32 24.2 % 3,782.2 MiB/s 18,975.8 MiB/s -
srle 64 27.5 % 6,581.4 MiB/s 23,128.3 MiB/s -
mrle 19.7 % 450.3 MiB/s 3,168.4 MiB/s -

1034.db (Checkers program "End Game Table Base", 419,225,625 Bytes)

Type | Ratio | Encoding Speed | Decoding Speed | Entropy Compressible To
8 Bit 23.02 % 2,443.8 MiB/s 21,198.0 MiB/s 12.08 %
8 Bit Packed 20.10 % 2,199.4 MiB/s 15,151.8 MiB/s 12.74 %
8 Bit 1LUT Short 18.91 % 1,767.7 MiB/s 7,905.5 MiB/s 12.46 %
8 Bit 3LUT 19.77 % 1,841.7 MiB/s 14,211.6 MiB/s 12.31 %
8 Bit 7LUT 19.75 % 1,768.7 MiB/s 14,629.5 MiB/s 12.50 %
8 Bit Single 20.59 % 1,325.0 MiB/s 18,469.7 MiB/s 12.06 %
8 Bit Single Short 18.55 % 1,157.3 MiB/s 9,756.9 MiB/s 12.09 %
16 Bit Packed (Byte) 20.65 % 3,378.7 MiB/s 15,516.6 MiB/s 12.59 %
16 Bit 3LUT (Byte) 20.20 % 2,641.6 MiB/s 15,500.6 MiB/s 12.30 %
16 Bit 7LUT (Byte) 20.14 % 2,708.0 MiB/s 13,971.5 MiB/s 12.35 %
24 Bit Packed (Byte) 22.05 % 4,233.2 MiB/s 17,670.5 MiB/s 12.34 %
24 Bit 3LUT (Byte) 21.52 % 3,448.9 MiB/s 16,464.5 MiB/s 11.96 %
32 Bit Packed (Byte) 23.44 % 5,240.1 MiB/s 17,864.4 MiB/s 12.15 %
32 Bit 3LUT (Byte) 22.86 % 4,289.1 MiB/s 20,829.1 MiB/s 11.79 %
32 Bit 7LUT (Byte) 22.89 % 4,294.6 MiB/s 20,065.2 MiB/s 11.86 %
48 Bit Packed (Byte) 25.87 % 6,516.7 MiB/s 22,210.4 MiB/s 11.95 %
48 Bit 3LUT (Byte) 25.41 % 5,712.4 MiB/s 20,904.4 MiB/s 11.66 %
64 Bit 3LUT (Symbol) 28.66 % 6,464.5 MiB/s 23,401.9 MiB/s 11.56 %
64 Bit (Byte) 31.66 % 8,034.7 MiB/s 21,153.2 MiB/s 12.16 %
64 Bit Packed (Byte) 27.29 % 7,836.3 MiB/s 22,219.0 MiB/s 11.90 %
64 Bit 3LUT (Byte) 26.90 % 6,780.3 MiB/s 23,777.1 MiB/s 11.63 %
64 Bit 7LUT (Byte) 27.05 % 6,839.2 MiB/s 22,970.8 MiB/s 11.76 %
128 Bit (Byte) 37.25 % 6,763.4 MiB/s 24,011.4 MiB/s 12.28 %
128 Bit Packed (Byte) 31.21 % 6,582.9 MiB/s 23,994.6 MiB/s 11.98 %
8 Bit RLE + Huffman-esque 16.76 % 326.5 MiB/s 1,456.9 MiB/s 12.36 %
Low Entropy 21.15 % 799.7 MiB/s 4,282.1 MiB/s 10.37 %
Low Entropy Single 21.15 % 765.9 MiB/s 4,292.2 MiB/s 10.37 %
- - - - -
memcpy 100.00 % 26,689.1 MiB/s 27,304.6 MiB/s 16.76 %
trle 17.4 % 1,112.8 MiB/s 4,097.4 MiB/s -
srle 0 20.2 % 1,080.3 MiB/s 8,027.7 MiB/s -
srle 8 22.0 % 1,816.5 MiB/s 8,802.3 MiB/s -
srle 16 27.1 % 1,644.2 MiB/s 9,915.1 MiB/s -
srle 32 32.7 % 2,870.4 MiB/s 13,289.7 MiB/s -
srle 64 39.5 % 4,763.9 MiB/s 18,966.4 MiB/s -
mrle 21.0 % 446.9 MiB/s 3,115.4 MiB/s -

enwik9.bwt (Wikipedia extract enwik9 encoded using libdivsufsort, 1,000,000,124 Bytes)

Type | Ratio | Encoding Speed | Decoding Speed | Entropy Compressible To
8 Bit 48.80 % 1,225.6 MiB/s 16,097.6 MiB/s 34.13 %
8 Bit Short 44.42 % 899.3 MiB/s 6,975.0 MiB/s 32.26 %
8 Bit Packed 44.86 % 1,038.7 MiB/s 7,514.9 MiB/s 32.95 %
8 Bit 1LUT Short 41.87 % 828.2 MiB/s 4,088.0 MiB/s 31.38 %
8 Bit 3LUT 43.29 % 775.1 MiB/s 6,116.6 MiB/s 32.27 %
8 Bit 3LUT Short 40.34 % 765.3 MiB/s 3,046.5 MiB/s 30.63 %
8 Bit 7LUT 42.98 % 720.4 MiB/s 4,845.2 MiB/s 32.42 %
16 Bit Packed (Symbol) 47.89 % 1,871.4 MiB/s 9,417.5 MiB/s 34.25 %
16 Bit 1LUT Short (Symbol) 46.43 % 1,425.4 MiB/s 5,926.8 MiB/s 33.47 %
16 Bit Packed (Byte) 46.99 % 1,975.6 MiB/s 8,322.4 MiB/s 33.96 %
16 Bit 1LUT Short (Byte) 46.15 % 1,404.6 MiB/s 5,410.5 MiB/s 33.47 %
16 Bit 3LUT (Byte) 44.91 % 1,332.9 MiB/s 5,786.9 MiB/s 33.36 %
16 Bit 7LUT (Byte) 44.17 % 1,269.3 MiB/s 4,792.9 MiB/s 33.00 %
24 Bit Packed (Byte) 49.39 % 2,846.0 MiB/s 10,096.3 MiB/s 35.00 %
24 Bit 3LUT (Byte) 47.63 % 2,154.2 MiB/s 8,316.6 MiB/s 34.23 %
24 Bit 7LUT (Byte) 47.15 % 2,021.5 MiB/s 7,081.2 MiB/s 33.86 %
32 Bit Packed (Byte) 51.63 % 3,991.3 MiB/s 12,316.8 MiB/s 35.94 %
32 Bit 3LUT (Byte) 50.02 % 3,049.7 MiB/s 11,014.7 MiB/s 35.15 %
32 Bit 7LUT (Byte) 49.72 % 2,929.3 MiB/s 9,734.1 MiB/s 34.97 %
48 Bit (Byte) 57.58 % 5,568.3 MiB/s 17,036.4 MiB/s 38.86 %
48 Bit Packed (Byte) 55.01 % 5,463.9 MiB/s 16,106.6 MiB/s 37.48 %
48 Bit 3LUT (Byte) 53.66 % 4,511.6 MiB/s 14,552.7 MiB/s 36.78 %
48 Bit 7LUT (Byte) 53.48 % 4,362.4 MiB/s 13,340.3 MiB/s 36.64 %
64 Bit Packed (Symbol) 58.87 % 6,995.2 MiB/s 17,252.6 MiB/s 39.42 %
64 Bit (Byte) 59.94 % 7,594.6 MiB/s 17,821.4 MiB/s 40.05 %
64 Bit Packed (Byte) 57.33 % 7,285.0 MiB/s 16,557.0 MiB/s 38.65 %
64 Bit 1LUT Short (Byte) 57.24 % 6,176.4 MiB/s 16,263.8 MiB/s 38.60 %
64 Bit 3LUT (Byte) 56.28 % 6,050.7 MiB/s 17,019.3 MiB/s 38.04 %
64 Bit 7LUT (Byte) 56.10 % 5,933.0 MiB/s 16,061.7 MiB/s 37.92 %
128 Bit Packed (Symbol) 64.32 % 5,640.8 MiB/s 20,237.7 MiB/s 42.35 %
128 Bit Packed (Byte) 62.94 % 5,650.8 MiB/s 20,158.4 MiB/s 41.61 %
8 Bit MMTF 128 63.49 % 1,101.2 MiB/s 1,299.9 MiB/s 29.84 %
Low Entropy 64.19 % 383.3 MiB/s 913.9 MiB/s 34.06 %
Low Entropy Single 89.22 % 561.6 MiB/s 7,491.0 MiB/s 59.50 %
Low Entropy Short 66.17 % 394.1 MiB/s 1,412.4 MiB/s 36.11 %
Low Entropy Short Single 89.65 % 576.5 MiB/s 9,841.2 MiB/s 61.05 %
Multi MTF 128 Bit (Transform) 100.00 % 1,206.2 MiB/s 1,289.8 MiB/s 33.92 %
Multi MTF 256 Bit (Transform) 100.00 % 2,012.2 MiB/s 1,966.1 MiB/s 35.65 %
Bit MMTF 8 Bit (Transform) 100.00 % 1,843.2 MiB/s 1,973.5 MiB/s 35.87 %
Bit MMTF 16 Bit (Transform) 100.00 % 2,309.8 MiB/s 2,272.9 MiB/s 38.34 %
- - - - -
memcpy 100.00 % 26,973.3 MiB/s 27,041.2 MiB/s 65.94 %
trle 42.0 % 697.6 MiB/s 2,062.9 MiB/s -
srle 0 46.4 % 680.6 MiB/s 5,004.5 MiB/s -
srle 8 46.9 % 858.6 MiB/s 6,095.1 MiB/s -
srle 16 54.4 % 906.4 MiB/s 8,676.0 MiB/s -
srle 32 61.0 % 1,766.0 MiB/s 13,019.7 MiB/s -
srle 64 66.8 % 3,289.0 MiB/s 18,491.6 MiB/s -
mrle 64.1 % 320.2 MiB/s 1,104.0 MiB/s -

The 24 Bit and 48 Bit Variants allow for run length encoding of common data layouts that are usually not covered by RLE implementations:

Pixel Art Bitmap Image (PNG converted to BMP, 123,710,454 Bytes)

Type | Ratio | Encoding Speed | Decoding Speed | Entropy Compressible To
8 Bit 99.99 % 6,995.3 MiB/s 18,357.0 MiB/s 76.10 %
8 Bit 1LUT Short 99.98 % 6,974.3 MiB/s 18,023.2 MiB/s 75.63 %
8 Bit 3LUT 99.99 % 6,936.1 MiB/s 18,595.7 MiB/s 75.94 %
8 Bit Single 100.00 % 3,512.8 MiB/s 18,470.1 MiB/s 77.03 %
16 Bit Short (Byte) 99.99 % 18,127.2 MiB/s 18,953.0 MiB/s 75.56 %
16 Bit 3LUT Short (Byte) 99.99 % 17,991.9 MiB/s 19,252.9 MiB/s 75.71 %
24 Bit (Symbol) 1.84 % 24,537.1 MiB/s 33,067.5 MiB/s 1.32 %
24 Bit Short (Symbol) 2.08 % 21,904.2 MiB/s 32,713.6 MiB/s 1.52 %
24 Bit Packed (Symbol) 2.00 % 23,590.4 MiB/s 32,955.6 MiB/s 1.39 %
24 Bit 1LUT Short (Symbol) 2.08 % 19,894.5 MiB/s 32,751.7 MiB/s 1.51 %
24 Bit 3LUT (Symbol) 1.31 % 21,232.5 MiB/s 33,073.5 MiB/s 0.93 %
24 Bit 3LUT Short (Symbol) 1.53 % 19,147.7 MiB/s 32,890.5 MiB/s 1.04 %
24 Bit 7LUT (Symbol) 1.20 % 20,599.0 MiB/s 32,846.9 MiB/s 0.83 %
24 Bit 7LUT Short (Symbol) 1.41 % 18,916.5 MiB/s 31,750.2 MiB/s 0.92 %
24 Bit (Byte) 2.13 % 25,939.7 MiB/s 33,113.4 MiB/s 1.44 %
24 Bit Short (Byte) 2.17 % 23,579.9 MiB/s 32,825.6 MiB/s 1.58 %
24 Bit Packed (Byte) 2.32 % 26,738.2 MiB/s 32,812.0 MiB/s 1.51 %
24 Bit 1LUT Short (Byte) 2.17 % 21,550.6 MiB/s 32,835.9 MiB/s 1.59 %
24 Bit 1LUT Short Greedy (Byte) 2.17 % 5,534.5 MiB/s 32,679.0 MiB/s 1.59 %
24 Bit 3LUT (Byte) 1.49 % 23,408.0 MiB/s 33,010.6 MiB/s 1.03 %
24 Bit 3LUT Short (Byte) 1.63 % 21,063.2 MiB/s 32,713.6 MiB/s 1.14 %
24 Bit 3LUT Short Greedy (Byte) 1.63 % 5,403.1 MiB/s 32,578.9 MiB/s 1.14 %
24 Bit 7LUT (Byte) 1.37 % 22,589.6 MiB/s 32,629.3 MiB/s 0.98 %
24 Bit 7LUT Short (Byte) 1.52 % 20,560.6 MiB/s 31,915.9 MiB/s 0.97 %
24 Bit 7LUT Short Greedy (Byte) 1.52 % 5,436.8 MiB/s 32,009.5 MiB/s 0.97 %
32 Bit Packed (Byte) 99.99 % 16,961.2 MiB/s 19,048.9 MiB/s 75.71 %
32 Bit 7LUT Short (Byte) 99.99 % 16,887.8 MiB/s 19,240.9 MiB/s 75.79 %
48 Bit (Symbol) 2.78 % 23,645.9 MiB/s 32,870.0 MiB/s 2.12 %
48 Bit Short (Symbol) 2.79 % 21,386.8 MiB/s 32,833.3 MiB/s 2.26 %
48 Bit Packed (Symbol) 2.87 % 22,575.5 MiB/s 32,913.6 MiB/s 2.17 %
48 Bit 1LUT Short (Symbol) 3.09 % 19,362.4 MiB/s 32,750.9 MiB/s 2.41 %
48 Bit 3LUT (Symbol) 1.71 % 20,826.5 MiB/s 32,843.5 MiB/s 1.33 %
48 Bit 3LUT Short (Symbol) 1.99 % 18,834.0 MiB/s 32,717.0 MiB/s 1.47 %
48 Bit 7LUT (Symbol) 1.49 % 20,185.7 MiB/s 32,619.2 MiB/s 1.15 %
48 Bit 7LUT Short (Symbol) 1.77 % 18,184.3 MiB/s 31,846.0 MiB/s 1.26 %
48 Bit (Byte) 3.16 % 25,473.5 MiB/s 32,911.9 MiB/s 2.31 %
48 Bit Short (Byte) 3.20 % 23,439.2 MiB/s 32,689.1 MiB/s 2.48 %
48 Bit Packed (Byte) 3.35 % 25,371.9 MiB/s 32,548.7 MiB/s 2.40 %
48 Bit 1LUT Short (Byte) 3.20 % 20,935.0 MiB/s 32,603.2 MiB/s 2.48 %
48 Bit 1LUT Short Greedy (Byte) 3.20 % 9,723.8 MiB/s 32,490.3 MiB/s 2.48 %
48 Bit 3LUT (Byte) 1.98 % 22,790.5 MiB/s 32,861.4 MiB/s 1.49 %
48 Bit 3LUT Short (Byte) 2.13 % 20,273.7 MiB/s 32,717.9 MiB/s 1.62 %
48 Bit 3LUT Short Greedy (Byte) 2.13 % 8,178.3 MiB/s 32,537.9 MiB/s 1.62 %
48 Bit 7LUT (Byte) 1.76 % 21,776.5 MiB/s 32,615.9 MiB/s 1.34 %
48 Bit 7LUT Short (Byte) 1.90 % 20,145.3 MiB/s 31,847.6 MiB/s 1.35 %
48 Bit 7LUT Short Greedy (Byte) 1.90 % 7,895.9 MiB/s 31,782.1 MiB/s 1.35 %
64 Bit (Symbol) 99.99 % 14,820.8 MiB/s 19,040.6 MiB/s 75.87 %
64 Bit 7LUT (Byte) 99.99 % 14,692.8 MiB/s 19,217.2 MiB/s 75.78 %
8 Bit MMTF 128 17.70 % 5,287.0 MiB/s 2,978.1 MiB/s 12.31 %
Multi MTF 128 Bit (Transform) 100.00 % 3,474.5 MiB/s 3,482.1 MiB/s 19.94 %
Multi MTF 256 Bit (Transform) 100.00 % 5,030.6 MiB/s 5,028.6 MiB/s 27.52 %
Bit MMTF 8 Bit (Transform) 100.00 % 1,861.6 MiB/s 1,984.2 MiB/s 73.85 %
Bit MMTF 16 Bit (Transform) 100.00 % 2,340.1 MiB/s 2,293.9 MiB/s 72.48 %
- - - - -
memcpy 100.00 % 28,288.5 MiB/s 28,261.3 MiB/s 77.03 %
trle 100.0 % 420.41 MiB/s 5,473.00 MiB/s -
srle 0 100.0 % 329.09 MiB/s 18,106.88 MiB/s -
srle 8 100.0 % 468.74 MiB/s 17,951.84 MiB/s -
srle 16 100.0 % 575.39 MiB/s 20,198.96 MiB/s -
srle 32 100.0 % 1,195.77 MiB/s 18,437.51 MiB/s -
srle 64 100.0 % 2,294.94 MiB/s 19,407.80 MiB/s -
mrle 100.0 % 383.18 MiB/s 2,551.12 MiB/s -
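
The gap between the 8 bit and the 24 bit rows above is what one would expect if the bitmap stores 24 bit pixels: identical neighboring pixels then repeat with a period of three bytes, which a byte-wise run search cannot see unless all three channels happen to be equal. The following sketch illustrates that effect under this assumption. `run_length_symbols` is a hypothetical helper written for illustration and is not part of the kit's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns how many identical symbols of `symbolBytes` bytes each start the
   buffer (0 if the buffer is shorter than one symbol). */
static size_t run_length_symbols(const uint8_t *data, size_t size, size_t symbolBytes)
{
  assert(symbolBytes > 0);
  if (size < symbolBytes)
    return 0;
  size_t count = 1;
  /* Compare every following symbol against the first one. */
  while ((count + 1) * symbolBytes <= size &&
         memcmp(data, data + count * symbolBytes, symbolBytes) == 0)
    count++;
  return count;
}
```

For four identical pixels `10 20 30` followed by a different one, a 3-byte symbol width reports a run of 4, while a 1-byte symbol width reports a run of 1 because the second byte already differs from the first.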

Variants

8, 16, 24, 32, 48, 64, 128 Bit (Byte Aligned + Symbol Aligned)

  • Extremely Fast.
  • Variants for always aligning with the symbol width or allowing byte-wide repeats even for > 8 bit symbols.
  • Decoder interprets blocks of data to boil down to a highly optimized memcpy, memset, memcpy, memset (with various different byte-lengths).
  • Encoder searches for repeats and their respective lengths using movemask instructions.
  • Optional variant for 8 bit single symbol encoding.
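
The two ideas in the list above (finding run lengths with movemask instructions, and decoding as an alternation of memcpy and memset) can be sketched like this. Everything here is an assumption made for illustration: the function names are hypothetical, and the block layout `[literalCount][runLength][symbol][literal bytes...]` is a toy format, not the kit's actual bitstream. The SSE2 path is only compiled where `__SSE2__` is defined; otherwise the scalar loop does all the work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Encoder side: length of the run of data[0] at the start of the buffer. */
static size_t run_length_bytes(const uint8_t *data, size_t size)
{
  if (size == 0)
    return 0;
  const uint8_t symbol = data[0];
  size_t i = 0;
#if defined(__SSE2__)
  const __m128i needle = _mm_set1_epi8((char)symbol);
  for (; i + 16 <= size; i += 16)
  {
    const __m128i v = _mm_loadu_si128((const __m128i *)(data + i));
    /* One mask bit per byte: 1 = equal to the run symbol. */
    const unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(v, needle));
    if (mask != 0xFFFFu)
    {
      const unsigned mismatch = ~mask & 0xFFFFu;
      unsigned bit = 0;
      while (((mismatch >> bit) & 1u) == 0) /* lowest set bit = first mismatch */
        bit++;
      return i + bit;
    }
  }
#endif
  while (i < size && data[i] == symbol) /* scalar tail (or full fallback) */
    i++;
  return i;
}

/* Decoder side for the toy layout: copy the literals, then fill the run.
   Returns the number of bytes written, or 0 on malformed input / small output. */
static size_t decode_blocks(const uint8_t *in, size_t inSize, uint8_t *out, size_t outCapacity)
{
  assert(in != NULL && out != NULL);
  size_t ip = 0, op = 0;
  while (ip + 3 <= inSize)
  {
    const size_t literalCount = in[ip];
    const size_t runLength = in[ip + 1];
    const uint8_t symbol = in[ip + 2];
    ip += 3;
    if (literalCount > inSize - ip || literalCount + runLength > outCapacity - op)
      return 0;
    memcpy(out + op, in + ip, literalCount); /* literal bytes */
    ip += literalCount;
    op += literalCount;
    memset(out + op, symbol, runLength);     /* repeated symbol */
    op += runLength;
  }
  return op;
}
```

The decoder never inspects individual bytes of a run, which is why this style of RLE can approach memcpy speeds on highly repetitive inputs.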

8, 16, 24, 32, 48, 64, 128 Bit Packed (Byte Aligned + Symbol Aligned)

  • Similar to the base variant, but keeps around the previously used RLE symbol which is usually very beneficial to the compression ratio and tries to pack lengths a bit more optimistically.
  • Also Extremely Fast, whilst providing better compression ratio for many inputs
  • Also has those variants for always aligning with the symbol width or allowing byte-wide repeats even for > 8 bit symbols.
  • Optional variant for 8 bit single symbol encoding (only has the optimistic packing, as it already knows what the next symbol is going to be anyways).

1 Sym LUT / 3 Sym LUT / 3 Sym LUT Short / 7 Sym LUT / 7 Sym LUT Short

  • Similar to the base variant, but keeps around one / three / seven of the previously used RLE symbols, usually further improving compression ratios.
  • Short Variant: Packs Range & Count Bits to fit the entire RLE command into just one byte for short ranges.
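
To make the LUT and Short ideas more concrete, here is a small sketch. It is not the kit's format: the `symbol_lut` type, the move-to-front replacement policy, and the one-byte command layout (2 bits LUT index, 3 bits range, 3 bits count) are all assumptions chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LUT_SIZE 3 /* three remembered symbols, as in the "3 Sym LUT" naming */

typedef struct { uint8_t symbols[LUT_SIZE]; } symbol_lut;

/* Looks up `symbol` and returns its index, or LUT_SIZE if it was not present.
   In both cases the symbol ends up at the front of the table afterwards. */
static unsigned lut_use(symbol_lut *lut, uint8_t symbol)
{
  assert(lut != NULL);
  unsigned index = 0;
  while (index < LUT_SIZE && lut->symbols[index] != symbol)
    index++;
  const unsigned found = index;
  if (index == LUT_SIZE)
    index = LUT_SIZE - 1; /* not present: the oldest entry gets evicted */
  memmove(lut->symbols + 1, lut->symbols, index);
  lut->symbols[0] = symbol;
  return found;
}

/* Hypothetical short command: [2 bits LUT index][3 bits range][3 bits count].
   Returns 1 if the values fit into one byte, 0 if a longer form is needed. */
static int pack_short(unsigned lutIndex, unsigned range, unsigned count, uint8_t *pCommand)
{
  if (lutIndex > 3 || range > 7 || count > 7)
    return 0;
  *pCommand = (uint8_t)((lutIndex << 6) | (range << 3) | count);
  return 1;
}
```

With such a layout, an index equal to `LUT_SIZE` could signal that a new symbol follows the command byte, so a hit in the table saves the symbol bytes entirely.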

1 Sym LUT Short Greedy / 3 Sym LUT Short Greedy / 7 Sym LUT Short Greedy

  • Similar to n-Sym LUT Short (actually uses the exact same decoder), but also tries to compress partial occurrences of a symbol if that would save space.
  • Only available when byte-aligned for 16, 24, 32, 48, 64 Bit Variants.
  • Comparatively slow to en/decode.

Low Entropy / Low Entropy Short

  • Tries to preserve the general symbol frequency to improve the compression ratio of an entropy encoder that could run after the Run Length Encoding, like ANS, Arithmetic Coding or Huffman.
  • Parses the output for run-length-encodable symbols, which are specified in the header.
  • Has a single-symbol variant, that only encodes the most run-length-encodable symbol (useful for some image codecs).
  • Short Variant: Same as Low Entropy, but optimized for shorter strings of run-length-encodable symbols, usually faster, also has a single-symbol variant.

8 Bit RLE + MMTF (Multi Move-To-Front Transformation)

  • Runs a block-wide vectorized MTF transform on the input and depending on how many bits this needs to represent a given block (if it's not entirely representable by a variant of memset) uses only the required amount of bits to encode the block.
  • Performs well on a wide variety of inputs, but usually doesn't produce the best compression ratios or (de-) compression speeds, as all blocks need to be decoded and cannot simply be memcpyd if they don't contain an encodable symbol.
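
For readers unfamiliar with the transform, the following is the classic scalar move-to-front transform with a single 256-entry table. It is deliberately not the block-wide vectorized multi-MTF described above (whose exact layout is not documented here); the function names are hypothetical and the sketch only illustrates the underlying principle that recently seen symbols map to small values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Replaces every byte by its current rank in a recency list, then moves
   that byte to the front of the list. */
static void mtf_encode(const uint8_t *in, size_t size, uint8_t *out)
{
  assert(in != NULL && out != NULL);
  uint8_t table[256];
  for (unsigned i = 0; i < 256; i++)
    table[i] = (uint8_t)i;
  for (size_t i = 0; i < size; i++)
  {
    const uint8_t symbol = in[i];
    unsigned rank = 0;
    while (table[rank] != symbol) /* every byte value is always in the table */
      rank++;
    out[i] = (uint8_t)rank;
    memmove(table + 1, table, rank);
    table[0] = symbol;
  }
}

/* Inverse transform: looks up the symbol at the given rank and applies the
   same table update as the encoder. */
static void mtf_decode(const uint8_t *in, size_t size, uint8_t *out)
{
  assert(in != NULL && out != NULL);
  uint8_t table[256];
  for (unsigned i = 0; i < 256; i++)
    table[i] = (uint8_t)i;
  for (size_t i = 0; i < size; i++)
  {
    const unsigned rank = in[i];
    const uint8_t symbol = table[rank];
    out[i] = symbol;
    memmove(table + 1, table, rank);
    table[0] = symbol;
  }
}
```

Repeated symbols turn into runs of zeros, so the input `aaabbb` becomes `97 0 0 98 0 0`; the transform itself does not shrink the data, it only makes it easier to compress afterwards.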

RLE + Huffman-esque

  • Uses a separate header that contains Huffman-esque instructions to place recent high-prevalence symbols, or to copy or set a specific symbol to a block.
  • Usually very high compression ratios, but comparatively slow to en- & decode.

Raw MMTF (Multi Move-To-Front Transformation)

  • Simply runs the block-wide vectorized MTF transform on the input to improve compressibility for some scenarios.
  • Doesn't compress itself.

Setup

git clone https://github.com/rainerzufalldererste/hypersonic-rle-kit.git
cd hypersonic-rle-kit
git submodule update --init --recursive

On Windows

create_project.bat

Choose your preferred compiler toolset

MSBuild /p:Configuration=Release /nologo /v:m

On Linux

premake/premake5 gmake2
config=release_x64 make

How to use it?

#include <stdint.h> // uint8_t, uint32_t
#include <stdlib.h> // malloc, free
#include "rle.h"

uint8_t *pUncompressedData; // Some Data.
uint32_t fileSize; // Some Size.

// Get Compress Bounds.
const uint32_t compressedBufferSize = rle_compress_bounds(fileSize);
uint8_t *pCompressedData = (uint8_t *)malloc(compressedBufferSize);

// Compress.
const uint32_t compressedSize = rle8_multi_compress(pUncompressedData, fileSize, pCompressedData, compressedBufferSize);

// Allocate Output Buffer.
uint8_t *pDecompressedData = (uint8_t *)malloc(fileSize + rle_decompress_additional_size());

// Decompress.
const uint32_t decompressedSize = rle8_decompress(pCompressedData, compressedSize, pDecompressedData, fileSize);

// Cleanup.
free(pCompressedData);
free(pDecompressedData);

License

Two Clause BSD

hypersonic-rle-kit's Issues

Benchmarking 'Bansheequeen' and 'Overwatch' drawings

Hi Christoph,
wanted to see your superb RLE [de]crunching two random drawings from my 'Deathship' compression corpus:

  • The_Undead_Lady_Sylvanas_Windrunner_(The_Banshee_Queen)_Warchief_of_the_Horde_3840x1600.bmp 17.5 MB (18,432,056 bytes)
  • Overwatch-Cast-4K-Wallpaper.bmp 24.7 MB (25,987,472 bytes)

Previously (the old trle) did scream on 'overwatch' setting a Pareto:
Sanmayce/Nakamichi#1 (comment)

C Size     ratio   C MB/s   D MB/s    Name
16056915   61.8    310.68   7771.37   trle
25987476   100.0   9991.34  10191.17  memcpy

The 'Banshee Queen' is not so RLEish, wonder how your RLE8 extreme fares against the trle...

The .BMPs were created out by these .JPGs:

RLE8 is not a true Run Length Encoding

RLE8 is more LZ than RLE. This is not the definition of run length encoding.
You must compare the speed at a similar compression ratio. TurboRLE is compressing better.
In general RLE is used as preprocessing before entropy coding. RLE8 has a worse compression ratio because of the LZ similarity, and you are not showing a practical usage with true entropy coding like turborc, huffman,....
The entropy coding numbers are also missing for TurboRLE. In this Benchmark the results are first sorted by file size and then by speed. The entropy benchmark is showing true EC compression and not an estimation.
