Comments (8)

tdunning commented on May 23, 2024

These results are anomalous.

In my own tests, I add a million or so points to a t-digest and see
insertion times of about 200 ns. You are seeing times that are more than
100 times longer than that.

One question I have is whether you are initializing the latencies to
interesting values or if they are all set to zero. If you have identical
values, there might be a corner case that is causing an inefficient data
structure.

Have you tried this experiment with the MergingDigest?

To do this, use the one line:

TDigest tDigest = MergingDigest.create(100);

On Tue, Dec 15, 2015 at 12:19 PM, akalani [email protected] wrote:

I have been playing with the t-digest on a sample dataset of latencies and
comparing performance against raw sorting to derive quantile statistics. I
am using the standard APIs (and recommended configuration values) to build
the t-digest; however, I am observing significantly worse times than raw
sorting. Is this to be expected? I would appreciate any help in this regard.

Here is my sample code:

long[] latencies = new long[1000000];

TDigest tDigest = TDigest.createDigest(100);
for (int i = 0; i < 1000000; i++) {
    tDigest.add(latencies[i]);
}
System.out.println("Median = " + tDigest.quantile(0.5));

I used a dataset with 1 million latency values (in ms). The max is 100,000.
It took about 27 secs to build the digest and extract the 50% quantile
value. With a dataset of 2 million latency values, the computation time
doubled. Raw sorting is under 100 ms.

Reducing the compression factor to 5 helps, but computation time is still
excessively high.


Reply to this email directly or view it on GitHub
#62.
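A minimal, self-contained version of that suggestion might look like the sketch below. The random placeholder values (addressing the question above about non-identical latencies) and the use of the MergingDigest.createDigest factory that appears later in this thread are assumptions, not part of the original report:

import com.tdunning.math.stats.MergingDigest;
import com.tdunning.math.stats.TDigest;
import java.util.Random;

public class MergingDigestExample {
    public static void main(String[] args) {
        // Random, non-identical values so the digest is not fed a degenerate all-zero stream.
        Random rand = new Random(42);
        double[] latencies = new double[1_000_000];
        for (int i = 0; i < latencies.length; i++) {
            latencies[i] = rand.nextDouble() * 100.0;  // placeholder data; the real test read a file
        }

        TDigest tDigest = MergingDigest.createDigest(100);  // factory used later in this thread
        for (double latency : latencies) {
            tDigest.add(latency);
        }
        System.out.println("Median = " + tDigest.quantile(0.5));
    }
}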

akalani commented on May 23, 2024

I initialized the latencies by randomizing and duplicating a sample set of Google ping latencies. The values are in milliseconds; the max (the timeout value) in the set is 100,000, though the max values are outliers and occur infrequently.

Could the outlier timeout values be causing performance degradation?

I have built the TDigest using the MergingDigest API, but see no improvement in performance.

tdunning commented on May 23, 2024

The outliers don't cause any problems at all. Nor does the actual distribution cause any issues, except possibly when there are massive numbers of duplicates. The MergingDigest should not have any distributional issues.

Can you share your code? Your data?

akalani commented on May 23, 2024

Here is the test that I wrote:
http://pastebin.com/Rmue62ch

The Java class takes the path to the latencies file as an input. Here is an output that it generated for the latencies dataset with 1 million records (time is in ms, and memory size is in KB):

TDigest: time = 1218
TDigest: median = 44.43894320782193
TDigest: mem size = 21496
Raw: time = 54
Raw: median = 45
Raw: mem size = 8000016

The data file is attached.
latencies_1m.txt.zip

akalani commented on May 23, 2024

A minor correction, the memory size reported above is in bytes, not kilobytes.
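For what it's worth, that figure lines up with simple array-layout arithmetic (a sketch assuming a typical 64-bit JVM; the roughly 16-byte array header is an assumption, not something the test reported):

// 1,000,000 longs at 8 bytes each plus ~16 bytes of array header
// matches the reported "Raw: mem size = 8000016".
long rawBytes = 16L + 1_000_000L * Long.BYTES;   // = 8,000,016 bytes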

tdunning commented on May 23, 2024

OK. I find a few issues.

  1. You are passing in longs; the digest works in doubles. This costs a bit.

  2. Your timing was confused by the prints. I changed the code to store the times in variables.

  3. There is some warmup to be had (see the sketch after this list).
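A minimal sketch of how that warmup can be made visible (the pass count of five matches the repeated result blocks further down; runOnePass is a hypothetical helper wrapping the timing code below):

// Sketch only: repeat the whole measurement so JIT warmup shows up as a slower first pass.
for (int run = 0; run < 5; run++) {
    runOnePass(args[0]);   // hypothetical helper holding the read/digest/sort timing code below
}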

Here is my adapted code:

long start = System.currentTimeMillis();   // taken before the file read; used for "Read time" below
double[] latencies = readLatenciesFile(args[0]);

long t1 = System.currentTimeMillis();
TDigest tDigest = MergingDigest.createDigest(200);
for (int i = 0; i < RECORD_COUNT; i++) {
    tDigest.add(latencies[i]);
}
long t2 = System.currentTimeMillis();
Arrays.sort(latencies);
long t3 = System.currentTimeMillis();

System.out.println("Read time = " + (t1 - start));
System.out.println("TDigest: time = " + (t2 - t1));
System.out.println("TDigest: median = " + tDigest.quantile(0.5));
// System.out.println("TDigest: mem size = " + SizeEstimator.estimate(tDigest));

System.out.println("Raw: time = " + (t3 - t2));

System.out.println("Raw: median = " + latencies[latencies.length / 2]);
System.out.println();

After this, I get results like this on my laptop (note that these are with compression = 200):

Read time = 551
TDigest: time = 571
TDigest: median = 44.55266098760193
Raw: time = 302
Raw: median = 44.552607870527396

Read time = 204
TDigest: time = 325
TDigest: median = 44.54911853552305
Raw: time = 78
Raw: median = 44.549239966209704

Read time = 156
TDigest: time = 295
TDigest: median = 44.5561107724475
Raw: time = 83
Raw: median = 44.55575939279197

Read time = 136
TDigest: time = 297
TDigest: median = 44.552868775685916
Raw: time = 83
Raw: median = 44.55243261312767

Read time = 139
TDigest: time = 307
TDigest: median = 44.557862132539675
Raw: time = 82
Raw: median = 44.557593756576985

This means that the t-digest is running about 4x slower than sorting the array, and the cost per element is about 300 ns, which isn't far from the result I had from micro-benchmarking. I think there is about another 2-4x speedup available in the code through simple improvements, but I don't have time to do it, so these times will have to stand.

It makes some sense that a raw sort of a million items would be very fast relative to the digest, since the merging digest has to touch the data more often. If there are k elements in the insertion buffer and n in the retained buffer of the merging digest, the sorting work is N log k for the digest versus N log N (with similar constants) for sorting the entire array; with k = 32 and N = 10^6, that is log k = 5 against log N = 20. Insertion also requires a pass over the retention buffer for each sort of the insertion buffer, which adds roughly n N/k = 6 N touches. Given how well the Java library writers have optimized the built-in sort, this result isn't unreasonable.
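Spelled out with the same assumed constants, the arithmetic above looks roughly like this (an illustration of the cost model, not a measurement):

// Back-of-the-envelope cost model using the constants assumed above.
int N = 1_000_000;                           // points added
int k = 32;                                  // assumed insertion-buffer size
double logK = Math.log(k) / Math.log(2);     // ~5 comparisons per element to sort small buffers
double logN = Math.log(N) / Math.log(2);     // ~20 comparisons per element for one full sort
double digestSortWork = N * logK;            // ~5 N: sorting the insertion buffers
double mergeTouches   = 6.0 * N;             // ~n * N / k: passes over the retained buffer
double fullSortWork   = N * logN;            // ~20 N: Arrays.sort over the whole array
System.out.printf("digest ~ %.0f ops vs raw sort ~ %.0f ops (constants ignored)%n",
        digestSortWork + mergeTouches, fullSortWork);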

akalani commented on May 23, 2024

Thanks very much for your help. I will keep the above in mind.

tdunning commented on May 23, 2024

I just did some experiments (see the t-digests-benchmark repo). I think that your observations lead to some good speedups. I experimented with allocating bigger insertion buffers and found that, due to the phenomenon you noted (the built-in sort is fast), we can make some big improvements.

Here are the results:

Benchmark            (compression) (factor)   Mode   Samples         Mean   Mean error    Units
c.t.MergeBench.add             100        1   avgt         5     2747.205       71.506    ns/op
c.t.MergeBench.add             100       10   avgt         5      625.746       14.814    ns/op
c.t.MergeBench.add             100      100   avgt         5      416.151       14.514    ns/op
c.t.MergeBench.add             200        1   avgt         5     3137.047       97.135    ns/op
c.t.MergeBench.add             200       10   avgt         5      659.008       33.876    ns/op
c.t.MergeBench.add             200      100   avgt         5      422.534        8.808    ns/op
c.t.MergeBench.add             500        1   avgt         5     3843.771      137.530    ns/op
c.t.MergeBench.add             500       10   avgt         5      749.032       27.951    ns/op
c.t.MergeBench.add             500      100   avgt         5      442.522       14.167    ns/op
c.t.MergeBench.add            1000        1   avgt         5     5875.645      218.997    ns/op
c.t.MergeBench.add            1000       10   avgt         5      953.075       19.278    ns/op
c.t.MergeBench.add            1000      100   avgt         5      474.294       17.505    ns/op

Mean is the time you want to look at. Factor determines how much more space I am allocating. Allocating 100x more space makes a whacking big difference. I will look at this some more and decide if that is a reasonable thing to do. If so, good news for speed, especially for higher compression factors.
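A rough sketch of the idea behind the factor column follows. This is only an illustration of the buffering strategy, not the library's actual implementation; the class and the mergeIntoCentroids stand-in are hypothetical:

import java.util.Arrays;

/** Hypothetical sketch: add() mostly just appends to a buffer, and the sort plus the
 *  merge over retained centroids happens `factor` times less often with a bigger buffer. */
class BufferedAddSketch {
    final double[] buffer;
    int used = 0;

    BufferedAddSketch(int factor) {
        buffer = new double[32 * factor];        // factor = 1, 10, 100 in the benchmark above
    }

    void add(double x) {
        buffer[used++] = x;
        if (used == buffer.length) {
            Arrays.sort(buffer, 0, used);        // the cheap built-in sort noted above
            mergeIntoCentroids(buffer, used);    // hypothetical stand-in for the real merge pass
            used = 0;
        }
    }

    private void mergeIntoCentroids(double[] sorted, int n) {
        // stand-in: the real MergingDigest walks its retained centroids once per merge
    }
}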
