Comments (3)
Initial attempt using memchr
crate caused a performance regression of about 20% in all benchmarks which used the search functions:
Benchmark is comparing 62dec88 versus 7979569:
skewed/search/first_key time: [228.0429 cycles 228.0435 cycles 228.0440 cycles]
change: [+11.766% +11.768% +11.771%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/search/middle_key
time: [17035.9464 cycles 17036.2015 cycles 17036.4510 cycles]
change: [+23.159% +23.163% +23.168%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/search/last_key time: [33660.2670 cycles 33660.3431 cycles 33660.4204 cycles]
change: [+23.257% +23.260% +23.263%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/minimum time: [10985.9127 cycles 10985.9386 cycles 10985.9655 cycles]
change: [-0.0017% +0.0008% +0.0032%] (p = 0.55 > 0.05)
No change in performance detected.
skewed/maximum time: [108.0185 cycles 108.0187 cycles 108.0190 cycles]
change: [-0.0021% +0.0004% +0.0028%] (p = 0.76 > 0.05)
No change in performance detected.
fixed_length/search/first_key
time: [1163.2561 cycles 1163.2592 cycles 1163.2622 cycles]
change: [+20.767% +20.771% +20.774%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/search/middle_key
time: [1219.2575 cycles 1219.2607 cycles 1219.2639 cycles]
change: [+18.693% +18.697% +18.701%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/search/last_key
time: [1275.2677 cycles 1275.2709 cycles 1275.2742 cycles]
change: [+18.606% +18.609% +18.612%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/minimum time: [406.2254 cycles 406.2330 cycles 406.2404 cycles]
change: [+0.0260% +0.0319% +0.0376%] (p = 0.00 < 0.05)
Change within noise threshold.
fixed_length/maximum time: [423.1099 cycles 423.1113 cycles 423.1128 cycles]
change: [-0.0035% +0.0007% +0.0047%] (p = 0.74 > 0.05)
No change in performance detected.
large_prefixes/search/first_key
time: [1255.2685 cycles 1255.2718 cycles 1255.2752 cycles]
change: [+18.953% +18.958% +18.963%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/search/middle_key
time: [1326.2755 cycles 1326.2791 cycles 1326.2826 cycles]
change: [+16.924% +16.928% +16.931%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/search/last_key
time: [1382.2774 cycles 1382.2809 cycles 1382.2844 cycles]
change: [+16.925% +16.928% +16.931%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/minimum time: [406.1022 cycles 406.1036 cycles 406.1052 cycles]
change: [-0.0023% +0.0019% +0.0071%] (p = 0.44 > 0.05)
No change in performance detected.
large_prefixes/maximum time: [423.1083 cycles 423.1098 cycles 423.1113 cycles]
change: [-0.0052% +0.0000% +0.0042%] (p = 0.99 > 0.05)
No change in performance detected.
My current guess is that the memchr
optimizations are not useful for the small vector sizes (4,16), so the linear search wins.
from blart.
Next step for this would be implementing a generalized search function using std::simd
from blart.
Initial attempt at optimizing with std::simd
resulted in significant slowdown. Benchmark is comparing commit b1b1cac to b75697d :
skewed/search/first_key time: [16.128 ns 16.159 ns 16.191 ns]
change: [+19.492% +19.649% +19.783%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/search/middle_key
time: [1.1508 µs 1.1509 µs 1.1510 µs]
change: [+49.440% +49.519% +49.593%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/search/last_key time: [2.2543 µs 2.2549 µs 2.2555 µs]
change: [+42.548% +42.761% +42.974%] (p = 0.00 < 0.05)
Performance has regressed.
skewed/minimum time: [901.53 ns 901.70 ns 901.92 ns]
change: [+0.1286% +0.1762% +0.2229%] (p = 0.00 < 0.05)
Change within noise threshold.
skewed/maximum time: [7.3368 ns 7.3502 ns 7.3632 ns]
change: [-6.4921% -6.3671% -6.2193%] (p = 0.00 < 0.05)
Performance has improved.
fixed_length/search/first_key
time: [85.108 ns 85.174 ns 85.248 ns]
change: [+32.576% +32.684% +32.800%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/search/middle_key
time: [84.978 ns 84.995 ns 85.017 ns]
change: [+23.260% +23.433% +23.600%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/search/last_key
time: [87.592 ns 87.658 ns 87.729 ns]
change: [+26.340% +26.450% +26.549%] (p = 0.00 < 0.05)
Performance has regressed.
fixed_length/minimum time: [24.959 ns 25.004 ns 25.052 ns]
change: [-7.5806% -7.4452% -7.3023%] (p = 0.00 < 0.05)
Performance has improved.
fixed_length/maximum time: [24.572 ns 24.586 ns 24.601 ns]
change: [-10.081% -10.012% -9.9429%] (p = 0.00 < 0.05)
Performance has improved.
large_prefixes/search/first_key
time: [89.627 ns 89.644 ns 89.667 ns]
change: [+29.362% +29.514% +29.665%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/search/middle_key
time: [89.319 ns 89.342 ns 89.377 ns]
change: [+27.225% +27.362% +27.492%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/search/last_key
time: [93.617 ns 93.786 ns 93.966 ns]
change: [+29.486% +29.680% +29.881%] (p = 0.00 < 0.05)
Performance has regressed.
large_prefixes/minimum time: [24.519 ns 24.535 ns 24.553 ns]
change: [-11.876% -11.768% -11.663%] (p = 0.00 < 0.05)
Performance has improved.
large_prefixes/maximum time: [24.548 ns 24.563 ns 24.581 ns]
change: [-13.087% -12.959% -12.832%] (p = 0.00 < 0.05)
Performance has improved.
from blart.
Related Issues (20)
- Implement tree delete HOT 3
- Implement bulk insert on create
- Compare current iterator implementation versus parent pointers
- Add benchmarks for memory usage
- Fix current prefix compression implementation to match paper HOT 1
- Implement Map wrapper over raw node operations HOT 3
- Rework tree insert to remove unneeded parent update
- Make minimum/maximum function infalliable HOT 2
- Create `Key` trait HOT 6
- Optimizing TreeMap clone
- Optimize check for key existence
- Optimize IntoIter implementation
- Compile crate with stable rust HOT 2
- Implement `AsBytes` for endian types and `Ipv*` types HOT 1
- Implement `TreeSet` interface
- Implement remaining `TreeMap` functions
- Implement `Cursor` API for BTreeMap
- I made a fork, what do I do ? HOT 16
- `Concat` ordering is a bit weird HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from blart.