Giter Site home page Giter Site logo

sparkey-java's Introduction

This is the java version of sparkey. It's not a binding, but a full reimplementation. See Sparkey for more documentation on how it works.

Travis

Continuous integration with travis.

Build Status

Dependencies

  • Java 6 or higher
  • Maven

Building

mvn package

Changelog

See changelog.

Usage

Sparkey is meant to be used as a library embedded in other software.

To import it with maven, use this for Java 8 or higher:

<dependency>
  <groupId>com.spotify.sparkey</groupId>
  <artifactId>sparkey</artifactId>
  <version>3.0.0</version>
</dependency>

Use this for Java 6 or 7:

<dependency>
  <groupId>com.spotify.sparkey</groupId>
  <artifactId>sparkey</artifactId>
  <version>2.3.2</version>
</dependency>

To help get started, take a look at the API documentation or an example usage: SparkeyExample

License

Apache License, Version 2.0

Performance

This data is the direct output from running the benchmarks via IntelliJ, using the JMH plugin on a machine with Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz

Random lookups

Benchmark             (numElements)  (type)   Mode  Cnt        Score         Error  Units
LookupBenchmark.test           1000    NONE  thrpt    4  4768851.493 ±  934818.190  ops/s
LookupBenchmark.test          10000    NONE  thrpt    4  4404571.185 ±  952648.631  ops/s
LookupBenchmark.test         100000    NONE  thrpt    4  3892021.755 ±  560657.509  ops/s
LookupBenchmark.test        1000000    NONE  thrpt    4  2748648.598 ± 1345168.410  ops/s
LookupBenchmark.test       10000000    NONE  thrpt    4  1921539.397 ±   64678.755  ops/s
LookupBenchmark.test      100000000    NONE  thrpt    4     8576.763 ±   10666.193  ops/s
LookupBenchmark.test           1000  SNAPPY  thrpt    4   776222.884 ±   46536.257  ops/s
LookupBenchmark.test          10000  SNAPPY  thrpt    4   707242.387 ±  460934.026  ops/s
LookupBenchmark.test         100000  SNAPPY  thrpt    4   651857.795 ±  975531.050  ops/s
LookupBenchmark.test        1000000  SNAPPY  thrpt    4   791848.718 ±   19363.131  ops/s
LookupBenchmark.test       10000000  SNAPPY  thrpt    4   700438.201 ±   27910.579  ops/s
LookupBenchmark.test      100000000  SNAPPY  thrpt    4   681790.103 ±   45388.918  ops/s

Here we see that performance goes down slightly as more elements are added. This is caused by:

  • More frequent CPU cache misses.
  • More page faults.

If you can mlock the full dataset, the performance should be more predictable.

Appending data

Benchmark                   (type)   Mode  Cnt         Score         Error  Units
AppendBenchmark.testMedium    NONE  thrpt    4   1595668.598 ±  850581.241  ops/s
AppendBenchmark.testMedium  SNAPPY  thrpt    4    919539.081 ±  205638.008  ops/s
AppendBenchmark.testSmall     NONE  thrpt    4   9800098.075 ±  707288.665  ops/s
AppendBenchmark.testSmall   SNAPPY  thrpt    4  20227222.636 ± 7567353.506  ops/s

This is mostly disk bound, CPU is not fully utilized. So even though Snappy uses more CPU, it's still faster because there's less data to write to disk, especially when the keys and values are small. (The score is number of appends, not amount of data, so this may be slightly misleading)

Writing hash file

Benchmark                (constructionMethod)  (numElements)  Mode  Cnt   Score   Error  Units
WriteHashBenchmark.test             IN_MEMORY           1000    ss    4   0.003 ± 0.001   s/op
WriteHashBenchmark.test             IN_MEMORY          10000    ss    4   0.009 ± 0.016   s/op
WriteHashBenchmark.test             IN_MEMORY         100000    ss    4   0.035 ± 0.088   s/op
WriteHashBenchmark.test             IN_MEMORY        1000000    ss    4   0.273 ± 0.120   s/op
WriteHashBenchmark.test             IN_MEMORY       10000000    ss    4   4.862 ± 0.970   s/op
WriteHashBenchmark.test               SORTING           1000    ss    4   0.005 ± 0.033   s/op
WriteHashBenchmark.test               SORTING          10000    ss    4   0.014 ± 0.008   s/op
WriteHashBenchmark.test               SORTING         100000    ss    4   0.070 ± 0.111   s/op
WriteHashBenchmark.test               SORTING        1000000    ss    4   0.835 ± 0.310   s/op
WriteHashBenchmark.test               SORTING       10000000    ss    4  13.919 ± 1.908   s/op

Writing using the in-memory method is faster, but only works if you can keep the full hash file in memory while building it. Sorting is about 3x slower, but is more reliably for very large data sets.

sparkey-java's People

Contributors

krka avatar mattnworb avatar mbruggmann avatar nresare avatar spkrka avatar vasi-stripe avatar yonromai avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.