Comments (13)
Note that my PR will come 'soon'. (Not weeks or months.)
from croaring.
OK. I merged the PR and we now have sensible benchmarks built with Google Benchmark.
The instructions are...
Running microbenchmarks
We have microbenchmarks built with Google Benchmark.
Under Linux or macOS, you may run them as follows:
cmake -B build
cmake --build build
./build/microbenchmarks/bench
By default, the benchmark tool picks one data set (e.g., CRoaring/benchmarks/realdata/census1881).
We have several data sets and you may pick others:
./build/microbenchmarks/bench benchmarks/realdata/wikileaks-noquotes
You may disable some functionality for the purpose of benchmarking. For example, you could
benchmark the code without AVX-512 even if both your processor and compiler support it:
cmake -B buildnoavx512 -D ROARING_DISABLE_AVX512=ON
cmake --build buildnoavx512
./buildnoavx512/microbenchmarks/bench
from croaring.
The benchmarks that come with the library were always so-so. I wrote most of them very quickly and we did not rely on them to optimize the library. We should scrap them and do better.
I will prepare something, a first step forward, soon...
from croaring.
I need to write a better benchmark harness for this project. Give me a bit of time.
from croaring.
I tried to use AVX2 to optimize union_vector16, but there is no improvement. In fact, I ran the test comparing union_vector16 to union_uint16, and the result is the same. Have you noticed this?
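For readers unfamiliar with what is being compared here, the following is a minimal scalar sketch of a union of two sorted 16-bit arrays, the kind of work a function such as union_uint16 performs. The name sketch_union_uint16 is hypothetical and this is not CRoaring's actual code; it assumes both inputs are sorted and free of duplicates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT CRoaring's union_uint16.
 * Merges two sorted, duplicate-free arrays into `out`, which must have
 * room for len_a + len_b values. Returns the number of values written. */
size_t sketch_union_uint16(const uint16_t *a, size_t len_a,
                           const uint16_t *b, size_t len_b,
                           uint16_t *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < len_a && j < len_b) {
        if (a[i] < b[j]) {
            out[k++] = a[i++];
        } else if (b[j] < a[i]) {
            out[k++] = b[j++];
        } else {
            /* Equal values: emit once and advance both inputs. */
            out[k++] = a[i++];
            j++;
        }
    }
    /* Copy whatever remains in the input that was not exhausted. */
    while (i < len_a) out[k++] = a[i++];
    while (j < len_b) out[k++] = b[j++];
    return k;
}
```

The loop is branch-heavy, which is the usual motivation for trying a SIMD variant. Whether a vectorized version actually wins depends on the input sizes and the hardware, so it has to be measured.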
from croaring.
I ran the test comparing union_vector16 to union_uint16, and the result is the same.
@huihan365 In our production environment, there is an obvious performance gap between them. Are you sure you've tested it properly?
from croaring.
I ran the test comparing union_vector16 to union_uint16, and the result is the same.
@huihan365 In our production environment, there is an obvious performance gap between them. Are you sure you've tested it properly?
@CharlesChen888 I used the latest version and ran the benchmark (Ice Lake, Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz):
(base) root@icx-07:/data_ext/bak/hanhui/CRoaring/buildnoavx# taskset -c 0 benchmarks/array_container_benchmark
==intersection and union test 1
input 1 cardinality = 21846, input 2 cardinality = 13108
union cardinality = 30584
B1 card = 21846 B2 card = 13108
union_test(B1, B2, BO): 0.87 cycles per operation
==intersection and union test 2
input 1 cardinality = 4096, input 2 cardinality = 16
union cardinality = 4100
B1 card = 4096 B2 card = 16
union_test(B1, B2, BO): 0.08 cycles per operation
(base) root@icx-07:/data_ext/bak/hanhui/CRoaring/buildavx2# taskset -c 0 benchmarks/array_container_benchmark
==intersection and union test 1
input 1 cardinality = 21846, input 2 cardinality = 13108
union cardinality = 30584
B1 card = 21846 B2 card = 13108
union_test(B1, B2, BO): 0.99 cycles per operation
input 1 cardinality = 4096, input 2 cardinality = 16
union cardinality = 4100
B1 card = 4096 B2 card = 16
union_test(B1, B2, BO): 0.08 cycles per operation
from croaring.
@huihan365 OK... I just ran a benchmark on my Mac (i7-9750H), and the results are similar. We need to test it with a wider range of datasets.
AVX2 disabled:
==intersection and union test 1
union_test(B1, B2, BO): 0.72 cycles per operation
==intersection and union test 2
union_test(B1, B2, BO): 0.06 cycles per operation
AVX2 enabled:
==intersection and union test 1
union_test(B1, B2, BO): 0.53 cycles per operation
==intersection and union test 2
union_test(B1, B2, BO): 0.06 cycles per operation
from croaring.
@huihan365 @CharlesChen888 Have a look at PR #460
I will merge it as soon as the tests are green. This gives you a much better way to benchmark the code. If you have privileged access to the system (via sudo) you will get performance counters.
from croaring.
@huihan365 One can do some profiling using the new microbenchmarks, to identify the functions worth optimizing.
./build/microbenchmarks/bench --benchmark_filter=SuccessiveIntersectionCardinality
43.72% bench libroaring.so.1.0.1 [.] intersect_skewed_uint16_cardinality
35.35% bench libroaring.so.1.0.1 [.] intersect_vector16_cardinality
11.83% bench libroaring.so.1.0.1 [.] roaring_bitmap_and_cardinality
./build/microbenchmarks/bench --benchmark_filter=SuccessiveUnionCardinality
28.57% bench libroaring.so.1.0.1 [.] intersect_skewed_uint16_cardinality
24.45% bench libroaring.so.1.0.1 [.] intersect_vector16_cardinality
22.58% bench libroaring.so.1.0.1 [.] roaring_bitmap_get_cardinality
8.06% bench libroaring.so.1.0.1 [.] roaring_bitmap_and_cardinality
4.52% bench libroaring.so.1.0.1 [.] _avx512_run_container_cardinality.isra.0
./build/microbenchmarks/bench --benchmark_filter=ToArray
93.92% bench libroaring.so.1.0.1 [.] array_container_to_uint32_array
2.29% bench libroaring.so.1.0.1 [.] run_container_to_uint32_array
1.73% bench libroaring.so.1.0.1 [.] ra_to_uint32_array
./build/microbenchmarks/bench --benchmark_filter="TotalUnion$"
82.96% bench libroaring.so.1.0.1 [.] bitset_set_list
4.74% bench libroaring.so.1.0.1 [.] bitset_container_from_array
2.21% bench libroaring.so.1.0.1 [.] roaring_bitmap_lazy_or_inplace
1.55% bench libc.so.6 [.] ____wcstod_l_internal
1.45% bench libc.so.6 [.] ____wcstold_l_internal
1.26% bench bench [.] load
0.85% bench libroaring.so.1.0.1 [.] avx512_vpopcount.constprop.0
./build/microbenchmarks/bench --benchmark_filter="SuccessiveUnion$"
21.74% bench libc.so.6 [.] ____wcstold_l_internal
11.01% bench libroaring.so.1.0.1 [.] union_uint16
9.00% bench libc.so.6 [.] ____wcstof_l_internal
5.98% bench libc.so.6 [.] ____wcstod_l_internal
4.47% bench libroaring.so.1.0.1 [.] array_run_container_union
4.19% bench libroaring.so.1.0.1 [.] convert_run_to_efficient_container
4.19% bench libroaring.so.1.0.1 [.] union_vector16
./build/microbenchmarks/bench --benchmark_filter="SuccessiveIntersection$"
28.75% bench libroaring.so.1.0.1 [.] intersect_skewed_uint16
19.10% bench libroaring.so.1.0.1 [.] intersect_vector16
13.75% bench libc.so.6 [.] ____wcstof_l_internal
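The SuccessiveUnionCardinality profile above is dominated by intersection routines. That is consistent with computing a union cardinality by inclusion-exclusion, |A ∪ B| = |A| + |B| - |A ∩ B|. The sketch below illustrates the idea for two sorted 16-bit arrays of very different sizes, using a binary search per value of the small array. Both function names are hypothetical and the code is an illustration under those assumptions, not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT CRoaring's intersect_skewed_uint16_cardinality.
 * Counts the values common to a small and a large array; both must be
 * sorted and duplicate-free. */
size_t sketch_intersect_skewed_cardinality(const uint16_t *small, size_t len_s,
                                           const uint16_t *large, size_t len_l) {
    size_t count = 0;
    size_t lo = 0; /* inputs are sorted, so the search window only moves right */
    for (size_t i = 0; i < len_s; i++) {
        size_t hi = len_l;
        /* Lower bound: first position in large[lo, len_l) that is >= small[i]. */
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (large[mid] < small[i]) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if (lo == len_l) break; /* every remaining small value is too large */
        if (large[lo] == small[i]) {
            count++;
            lo++;
        }
    }
    return count;
}

/* Union cardinality by inclusion-exclusion; no union is materialized. */
size_t sketch_union_cardinality(const uint16_t *small, size_t len_s,
                                const uint16_t *large, size_t len_l) {
    return len_s + len_l -
           sketch_intersect_skewed_cardinality(small, len_s, large, len_l);
}
```

The same identity can be checked against the numbers reported earlier in the thread: inputs of cardinality 4096 and 16 with a union of 4100 imply an intersection of 12 values.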
from croaring.
Note that you can disable AVX (and AVX-512) entirely if you want...
cmake -B buildnoavx -D ROARING_DISABLE_AVX=ON
cmake --build buildnoavx
./buildnoavx/microbenchmarks/bench
from croaring.
I have opened a PR that vectorizes array_container_to_uint32_array.
from croaring.
My commit is in the main branch. The difference between a vectorized and a non-vectorized approach is rather obvious:
$ ./build/microbenchmarks/bench --benchmark_filter=ToArray
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
ToArray        137763 ns       137104 ns         5276
$ ./noavx/microbenchmarks/bench --benchmark_filter=ToArray
-----------------------------------------------------
Benchmark           Time             CPU   Iterations
-----------------------------------------------------
ToArray        570715 ns       570574 ns         1221
Roughly a 4x gain (570715 ns vs. 137763 ns).
The AVX-512 routine is not faster in this instance, but SIMD definitely helps.
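To make concrete what is being vectorized, here is a minimal scalar sketch of converting an array container's 16-bit values into 32-bit output values. The function name and signature are hypothetical, not the library's; the sketch assumes `base` already holds the container's upper 16 bits shifted into place, with the low 16 bits clear.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT CRoaring's array_container_to_uint32_array.
 * Widens each 16-bit value to 32 bits and combines it with `base`, which is
 * assumed to carry the upper 16 bits (low 16 bits zero).
 * Returns the number of values written to `out`. */
size_t sketch_array_to_uint32(uint32_t *out, const uint16_t *values,
                              size_t len, uint32_t base) {
    for (size_t i = 0; i < len; i++) {
        out[i] = base | (uint32_t)values[i];
    }
    return len;
}
```

Each iteration is independent of the others, which is why a loop of this shape lends itself to SIMD: several values can be widened and combined with the base in a single instruction.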
from croaring.