Comments (5)

tibbe avatar tibbe commented on May 25, 2024

Answering my own question. From section 3.3 of the paper.

Size of the digest also grows roughly with the log of the number of samples observed.

Assuming I sample 10,000 values per second (common for e.g. a web server) and that storing 10,000 samples requires 10kb, the server will use log2(7*3600*24)*10kb = 192kb after one week. Is that correct?
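The arithmetic in that estimate can be checked with a short sketch. It takes the question's assumptions at face value: 10kb for a digest holding 10,000 samples, and one extra 10kb per doubling of the number of one-second batches. The function name `estimated_digest_kb` is made up for illustration, and the formula is only the extrapolation proposed here, not a result from the paper or the library.

```python
import math

SECONDS_PER_WEEK = 7 * 3600 * 24   # 604,800 one-second batches
SAMPLES_PER_SECOND = 10_000        # assumed sampling rate from the question
BASE_SIZE_KB = 10                  # assumed size of a digest holding 10,000 samples


def estimated_digest_kb(batches: int, base_kb: float = BASE_SIZE_KB) -> float:
    """Hypothetical extrapolation: size grows by base_kb per doubling of batches."""
    return math.log2(batches) * base_kb


total_samples = SECONDS_PER_WEEK * SAMPLES_PER_SECOND
estimate_kb = estimated_digest_kb(SECONDS_PER_WEEK)

print(f"samples after one week: {total_samples:,}")
print(f"estimated digest size:  {estimate_kb:.0f}kb")
```

Under these assumptions the estimate comes out at roughly 192kb, for about 6 billion samples in total.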

from t-digest.

tdunning avatar tdunning commented on May 25, 2024

Empirically speaking, the digest size grows with the log of N, but accuracy
also increases with N. I believe that this bound can be tightened.

None of the tests I have done have pushed the size experiments beyond a few
million. You are asking about numbers in the 10 billion range so I
wouldn't trust the empirical extrapolation.

This is a very rare use case since anybody with billions of measurements
typically is doing stats on a massive number of subsets of the data, not on
the entire set. This makes the storage size of small digests more
important than the size of a few large ones.

For constant accuracy, you can force a constant bound on memory except for
adversarial inputs.

Ordered inputs can cause transient memory usage that is considerably larger,
but when you persist or compress the digest, it reverts to the normal bound.


tdunning avatar tdunning commented on May 25, 2024

Mind if I close this issue?

Or is there a documentation effort required?

tibbe avatar tibbe commented on May 25, 2024

Please go ahead. I leave it up to you whether you want to talk about space complexity in the paper. I believe people using t-digest for long-running server applications would appreciate it.

tdunning avatar tdunning commented on May 25, 2024

I definitely think that the topic is important and should be addressed more
carefully than I have done thus far.

