Comments (5)
Answering my own question. From section 3.3 of the paper.
Size of the digest also grows roughly with the log of the number of samples observed.
Assuming I sample 10,000 values per second (common for e.g. a web server) and that storing 10,000 samples requires 10kb, the server will use log2(7*3600*24)*10kb = 192kb after one week. Is that correct?
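The arithmetic in the estimate above can be checked with a short sketch. It only reproduces the formula as posted, under the stated assumptions (10,000 samples per second, 10kb for the first 10,000 samples); the constant and function names are made up for illustration, and this is not a size model taken from the paper or from the t-digest library.

```python
import math

# Assumptions taken from the question, not measured values.
SECONDS_PER_WEEK = 7 * 3600 * 24   # 604,800 seconds
SAMPLES_PER_SECOND = 10_000        # assumed sampling rate
BASE_SIZE_KB = 10.0                # assumed size after the first 10,000 samples


def estimate_as_posted_kb() -> float:
    """Evaluate log2(7*3600*24) * 10kb exactly as written in the question."""
    return math.log2(SECONDS_PER_WEEK) * BASE_SIZE_KB


# Total number of samples observed in one week at the assumed rate.
total_samples = SECONDS_PER_WEEK * SAMPLES_PER_SECOND

if __name__ == "__main__":
    print(f"estimate as posted: {estimate_as_posted_kb():.1f} kb")  # about 192 kb
    print(f"samples in one week: {total_samples:,}")
```

Note that the formula takes the log of the number of seconds, while the total number of samples in a week at this rate is roughly 6 billion.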
from t-digest.
Empirically speaking, the size grows with the log of N, but accuracy also increases
with N. I believe that this bound can be tightened.
None of the tests I have done have pushed the size experiments beyond a few
million. You are asking about numbers in the 10 billion range so I
wouldn't trust the empirical extrapolation.
This is a very rare use case since anybody with billions of measurements
typically is doing stats on a massive number of subsets of the data, not on
the entire set. This makes the storage size of small digests more
important than the size of a few large ones.
For constant accuracy, you can force a constant bound on memory except for
adversarial inputs.
Ordered inputs can cause transient memory usage considerably larger than that, but
when you persist or compress the digest, it reverts to the normal bound.
On Thu, May 15, 2014 at 7:13 AM, Johan Tibell wrote:
Answering my own question. From section 3.3 of the paper.
Size of the digest also grows roughly with the log of the number of
samples observed.
Assuming I sample 10,000 values per second (common for e.g. a web server)
and that storing 10,000 samples requires 10kb, the server will use
log2(7*3600*24)*10kb = 192kb after one week.
Mind if I close this issue?
Or is there a documentation effort required?
Please go ahead. I leave it up to you whether you want to talk about space complexity in the paper. I believe people using t-digest for long-running server applications would appreciate it.
I definitely think that the topic is important and should be addressed more
carefully than I have done thus far.
On Sun, Aug 10, 2014 at 11:06 PM, Johan Tibell wrote:
Please go ahead. I leave it up to you whether you want to talk about space
complexity in the paper. I believe people using t-digest for long-running
server applications would appreciate it.