lemire / simdcompressionandintersection Goto Github PK

A C++ library to compress and intersect sorted lists of integers using SIMD instructions

License: Apache License 2.0

Shell 0.04% C++ 67.68% C 32.13% Makefile 0.14%

compression integer-compression simd-instructions simd algorithms intersection

simdcompressionandintersection's Introduction

SIMDCompressionAndIntersection

As the name suggests, this is a C/C++ library for fast compression and intersection of lists of sorted integers using SIMD instructions. The library focuses on innovative techniques and very fast schemes, with particular attention to differential coding. It introduces new SIMD intersections schemes such as SIMD Galloping.

This library can decode at least 4 billions of compressed integers per second on most desktop or laptop processors. That is, it can decompress data at a rate of 15 GB/s. This is significantly faster than generic codecs like gzip, LZO, Snappy or LZ4.

Authors: Leonid Boystov, Nathan Kurz, Daniel Lemire, Owen Kaser, Andrew Consroe, Shlomi Vaknin, Christoph Rupp, Bradley Grainger, and others.

Documentation

Daniel Lemire, Nathan Kurz, Christoph Rupp, Stream VByte: Faster Byte-Oriented Integer Compression, Information Processing Letters 130, 2018 https://arxiv.org/abs/1709.08990
Daniel Lemire, Leonid Boytsov, Nathan Kurz, SIMD Compression and the Intersection of Sorted Integers, Software Practice & Experience 46 (6), 2016 http://arxiv.org/abs/1401.6399
Daniel Lemire and Leonid Boytsov, Decoding billions of integers per second through vectorization, Software Practice & Experience 45 (1), 2015. http://arxiv.org/abs/1209.2137 http://onlinelibrary.wiley.com/doi/10.1002/spe.2203/abstract
Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015, 2015. http://arxiv.org/abs/1503.07387
Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen, A General SIMD-based Approach to Accelerating Compression Algorithms, ACM Transactions on Information Systems 33 (3), 2015. http://arxiv.org/abs/1502.01916

This work has also inspired other work such as...

P. Damme, D. Habich, J. Hildebrandt, W. Lehner, Lightweight Data Compression Algorithms: An Experimental Survey (Experiments and Analyses), EDBT 2017 http://openproceedings.org/2017/conf/edbt/paper-146.pdf
P. Damme, D. Habich, J. Hildebrandt, W. Lehner, Insights into the Comparative Evaluation of Lightweight Data Compression Algorithms, EDBT 2017 http://openproceedings.org/2017/conf/edbt/paper-414.pdf
T. D. Wu, Bitpacking techniques for indexing genomes: I. Hash tables, Algorithms for Molecular Biology 11 (5), 2016. http://almob.biomedcentral.com/articles/10.1186/s13015-016-0069-5
Jianguo Wang, Chunbin Lin, Yannis Papakonstantinou, Steven Swanson, An Experimental Study of Bitmap Compression vs. Inverted List Compression, SIGMOD 2017 http://db.ucsd.edu/wp-content/uploads/2017/03/sidm338-wangA.pdf

Simple demo

Check out example.cpp

You can run it like so (e.g., under Linux or macOS):

Usage (Linux, macOS and similar systems)

make
./unit

A static library file is built as libSIMDCompressionAndIntersection.a which you can use in your own projects along with our header files located in the include subdirectory.

You may also build and run our example:

make example
./example

To run tests, you can do

./testcodecs

(follow the instructions)

Usage (Windows users)

Windows users wishing to build using Visual Studio should go into a Developer Powershell, which is accessible through the menus in the Visual Studio interface, and run the following from the directory of the project:

nmake -f .\makefile.vc
 .\example.exe
 .\unit.exe

Under Windows, the static library is built as the file simdcomp_a.lib which you can use in your own projects, along with our header files located in the include subdirectory.

For a simple C library

This library is a C++ research library. For something simpler, written in C, see:

https://github.com/lemire/simdcomp

Comparison with the FastPFOR C++ library

The FastPFOR C++ Library available at https://github.com/lemire/FastPFor implements some of the same compression schemes except that it is not optimized for the compression of sorted lists of integers.

Other recommended libraries

libvbyte: A fast implementation for varbyte 32bit/64bit integer compression https://github.com/cruppstahl/libvbyte
TurboPFor is a C library that offers lots of interesting optimizations. Well worth checking! (GPL license) https://github.com/powturbo/TurboPFor
Oroch is a C++ library that offers a usable API (MIT license) https://github.com/ademakov/Oroch
CSharpFastPFOR: A C# integer compression library https://github.com/Genbox/CSharpFastPFOR
JavaFastPFOR: A java integer compression library https://github.com/lemire/JavaFastPFOR
Encoding: Integer Compression Libraries for Go https://github.com/zhenjl/encoding
FrameOfReference is a C++ library dedicated to frame-of-reference (FOR) compression: https://github.com/lemire/FrameOfReference

Licensing

Apache License, Version 2.0

As far as the authors know, this work is patent-free.

Requirements

A CPU (AMD or Intel) with support for SSE2 (Pentium 4 or better) is required while a CPU with SSE 4.1* (Penryn [2007] processors or better) is recommended.

A recent GCC (4.7 or better), Clang, Intel or Visual C++ compiler.

A processor support AVX (Intel or AMD).

Tested on Linux, MacOS and Windows. It should be portable to other platforms.

*- The default makefile might assume AVX support, but AVX is not required. For GCC compilers, you might need the -msse2 flag, but you will not need the -mavx flag.

For advanced benchmarking, please see

advancedbenchmarking/README.md

where there is additional information as well as links to real data sets.

Acknowledgement

Thanks to Kelly Sommers for useful feedback.

This work was supported by NSERC grant number 26143.

simdcompressionandintersection's People

Contributors

Stargazers

Watchers

Forkers

searchivarius pombredanne kod3r pombreda zuowang philippecp cruppstahl sparklexs rom3r4 shlomiv alnorris caomw bgrainger joebone dwintour ladislavsopko datatechboost ssmccoy rbrovko ronymin mukul2604 graphprocessor galaxysubrepos qandrew olaossama lixuxian vic4key dnbaker sebastianlng boynanboy cuisonghui fyy-100 iraqigeek luyuncheng clayne yonik byronhe benrobby xinrunxue jqk6 5ace turningme sguinebert greenlion liujp deerjason grideyes-2010 chenyang8094 martinbirks flowold doogplus honbinz zifengyu darkersquirrel pratyakshk004 yuyfbj waldoweng

simdcompressionandintersection's Issues

Wanted: Exception when provided an insufficient buffer

When trying to decode data using uncompress function I receive an Unhandled exception (Access violation).

The size sending to function is too small, and I expected to get NotEnoughStorage exception so the buffer will resize to the right uncompressed size.

Is this will be ported to java

Sorry for asking but do you plan to port this code to java? I saw your implementation of roaring bitmaps for java (and this question may be obsolete), and the readme point out that some functionality is missing in fastPForDelta (thus in java impl. too). Otherwise its very impressive stuff albeit i iterate trough the academic papers again because its somewhat complicated for me.

arm64 support

I recently start using this library in our project and it greatly helps our implementation, thanks so much.

I wonder if there is any chance this library could be ported to support arm64 architecture. I guess there is no such plan yet since this library is called "SIMD"comp... :) If users, actually it is myself, without too much knowledge on simd programming currently, would like to do such a porting task, is this feasible and how much effort it may take typically? Thanks.

Conflicting Namespaces

SIMDCompressionAndIntersection shares many classes and header file names with FastPFor. When using them together, there are issues since this library doesn't enclose anything in a namespace.
I'm proposing renaming all #ifndef NAME_H_ to #ifndef SIMDCompressionAndIntersection_NAME_H_ and enclosing all relevant pieces in a namespace, maybe called SIMDCompressionLib?
If these sound like reasonable names and changes, I will go ahead and do so.

how to get the unaligned version to work?

Hey,

Is there a simple way to get an unaligned codec? I see you have a usimdbitpacking.cpp/h, but doesnt seem like its easy to make a codec out of it..

By manually sed-ding load and store to _mm_loadu_si128 _mm_storeu_si128, I verified that on recent haswell architecture there is very little difference in performance (YMMV), and it makes sense to use unaligned access in my case.

Thanks for this really awesome library,
Shlomi

typo

codec factory header defines "s4-bp128-dm" twice, first looks like it should be the -ni version.

Add the vectorized encoder for stream vbyte

We have a vectorized encoder:

https://github.com/lemire/streamvbyte

[Discussion] Implementation in Go

I'm interested in using SIMD compression and intersection from within a Go program. Do you have any tips for implementing this library in Go? Do you have plans for implementing the algorithms in any other languages?

Also, do you know the extent to which your library is being used in industry?

Add API hooks so that users can quickly get the uncompressed size of an array given the compressed output

fastpfor codec has a bug

./example
You are using 2.09 bits per integer.
terminate called after throwing an instance of 'std::runtime_error'
  what():  bug!
Aborted (core dumped)

C++/Java compression output interoperability

It is more a question than an issue. Are outputs of Java and C++ implementation interoperable? Can I create a binary output in Java (using, for example, new SkippableIntegratedComposition(new IntegratedBinaryPacking(),new IntegratedVariableByte())) save it to file and then load and unpack it in C++ using bp32 codec?
I guess, one of the issues could be the endianness, but it would be easily fixed. Something else I have to consider?

Decoding problems

When trying to decode values after encoding them I receive error.
valuesBug.txt
I'm attaching a file with 1931 numbers, the compression seems to be working fine, decoding's failing.

I've used:
CompositeCodec<BinaryPacking,
VariableByte> codec = CompositeCodec<BinaryPacking,
VariableByte>();

scalar vs nate_scalar

The intersection header has this method:

/*
 * Given two arrays, this writes the intersection to out. Returns the
 * cardinality of the intersection.
 *
 * This is a well-written, but otherwise unsophisticated function.
 * Written by N. Kurz.
 */
size_t nate_scalar(const uint32_t *set1, const size_t length1,
                   const uint32_t *set2, const size_t length2, uint32_t *out);

However, it is not implemented anywhere. There is an implemented method called scalar which doesn't appear in the header. Is one of them named from?

@nkurz @lemire

Question: theoretical compression ratio limit

Dear Prof. Lemire,

I am interested in this compression algorithm for sorted integers. I am curious about what you think are rule-of-thumb rules for theoretical compression ratio limit, and what this library can achieve.

Let's say we have an array like 3, 6, 9, 12 ... which is the case in the example and the ideal case for compression. I see it uses 2 bits per 32-bit int. However when we have an array like 0, rand(), rand() + rand(), rand() + rand() + rand(), this is probably the worst possible case for compression, I see it uses 14 bits per int.

Are there any rules of thumb I can apply for what compression ratio I should expect? Is there a way to do better than 14 bits for random sequence?

Thank you!