The q-digest has a size of at most 3k, where k is the compression parameter.


Space complexity of t-digest (t-digest issue, closed, 5 comments)

tdunning commented on May 25, 2024

Space complexity of t-digest

from t-digest.

Comments (5)

tibbe commented on May 25, 2024

Answering my own question. From section 3.3 of the paper.

Size of the digest also grows roughly with the log of the number of samples observed.

Assuming I sample 10,000 values per second (common for, e.g., a web server) and that storing 10,000 samples requires 10 kB, the server will use log2(7 × 3600 × 24) × 10 kB ≈ 192 kB after one week. Is that correct?
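The estimate above can be reproduced directly. Note that the log is taken over the number of seconds in a week, while the number of samples observed in that week is about 6 billion, the range the reply addresses. The constants below are the question's own assumptions, not measured t-digest sizes.

```python
import math

SAMPLES_PER_SECOND = 10_000   # sampling rate assumed in the question
KB_PER_10K_SAMPLES = 10       # "storing 10,000 samples requires 10 kB" (assumption from the question)

seconds_per_week = 7 * 3600 * 24
samples_per_week = SAMPLES_PER_SECOND * seconds_per_week

# The question's back-of-envelope estimate: log2(seconds in a week) x 10 kB
estimate_kb = math.log2(seconds_per_week) * KB_PER_10K_SAMPLES

print(f"seconds per week: {seconds_per_week}")     # 604800
print(f"samples per week: {samples_per_week:,}")   # 6,048,000,000
print(f"estimated size:   {estimate_kb:.1f} kB")   # 192.1 kB
```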

tdunning commented on May 25, 2024

Empirically speaking, it grows with the log, but accuracy also increases
with N. I believe that this can be tightened.

None of the tests I have done have pushed the size experiments beyond a few
million. You are asking about numbers in the 10 billion range, so I
wouldn't trust the empirical extrapolation.

This is a very rare use case, since anybody with billions of measurements
is typically computing statistics on a massive number of subsets of the data,
not on the entire set. This makes the storage size of small digests more
important than the size of a few large ones.

For constant accuracy, you can force a constant bound on memory except for
adversarial inputs.

Ordered inputs can cause considerably larger transient memory usage, but
when you persist or compress the digest, it reverts to the normal bound.
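To illustrate how a constant memory bound can be enforced for a fixed accuracy setting, here is a minimal sketch of a merging-style digest. It assumes the k_1 scale function described in later t-digest work, k(q) = δ/(2π) · asin(2q − 1), and a hypothetical `MergingDigest` class; it is not the reference Java implementation. Each merge pass lets a centroid grow only while it spans at most one unit of k. Since k runs from −δ/4 to δ/4, any two adjacent centroids together span more than one unit, so the centroid count after a pass stays below about δ however many points were added.

```python
import math
import random


def k1(q, delta):
    """Scale function k_1 (assumed): maps quantile q in [0, 1] to k in [-delta/4, delta/4]."""
    q = min(max(q, 0.0), 1.0)  # guard against rounding outside asin's domain
    return delta / (2 * math.pi) * math.asin(2 * q - 1)


class MergingDigest:
    """Hypothetical minimal merging digest: sorted (mean, weight) centroids plus an input buffer."""

    def __init__(self, delta=100, buffer_size=1000):
        self.delta = delta
        self.buffer_size = buffer_size
        self.centroids = []  # sorted (mean, weight) pairs
        self.buffer = []     # unmerged (value, 1.0) points; the only transient memory

    def add(self, x):
        self.buffer.append((x, 1.0))
        if len(self.buffer) >= self.buffer_size:
            self.compress()

    def compress(self):
        """Merge buffered points into the centroids in one sorted pass."""
        items = sorted(self.centroids + self.buffer)
        self.buffer = []
        if not items:
            return
        total = sum(w for _, w in items)
        merged = []
        cur_mean, cur_w = items[0]
        w_before = 0.0                # weight to the left of the current centroid
        k_left = k1(0.0, self.delta)  # k value at the current centroid's left edge
        for mean, w in items[1:]:
            q_right = (w_before + cur_w + w) / total
            if k1(q_right, self.delta) - k_left <= 1.0:
                # Absorb the item: the centroid still spans at most one unit of k.
                cur_w += w
                cur_mean += (mean - cur_mean) * w / cur_w
            else:
                # Close the current centroid and start a new one at this item.
                merged.append((cur_mean, cur_w))
                w_before += cur_w
                k_left = k1(w_before / total, self.delta)
                cur_mean, cur_w = mean, w
        merged.append((cur_mean, cur_w))
        self.centroids = merged


if __name__ == "__main__":
    random.seed(42)
    for n in (1_000, 10_000, 100_000):
        d = MergingDigest(delta=100)
        for _ in range(n):
            d.add(random.random())
        d.compress()
        print(n, len(d.centroids))  # centroid count stays bounded as n grows
```

In this sketch the number of centroids depends only on δ, not on how many points were added; the buffer holds at most `buffer_size` points between merge passes.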

(In reply to Johan Tibell, Thu, May 15, 2014 at 7:13 AM.)

tdunning commented on May 25, 2024

Mind if I close this issue?

Or is there a documentation effort required?

tibbe commented on May 25, 2024

Please go ahead. I leave it up to you whether you want to talk about space complexity in the paper. I believe people using t-digest for long running server applications would appreciate it.

tdunning commented on May 25, 2024

I definitely think that the topic is important and should be addressed more
carefully than I have done thus far.

(In reply to Johan Tibell, Sun, Aug 10, 2014 at 11:06 PM.)
