Comments (2)
Yeah... there has been a fair bit of discussion on this.
The core question is what the t-digest invariant actually means with non-integer weights.
Do you have thoughts on that?
The key problems in the past include:
- Violation of the invariant has been allowed for centroids with weight = 1. If weights < 1 are allowed, what is the status of this exemption? Is a weight less than one still an indivisible value?
- If the exemption is removed so that all centroids must meet the invariant, we will require an infinite number of centroids for K=2 or K=3. How could that be resolved?
- If we assume that any centroid that is added represents a single sample with variable weight, rather than the number of samples represented by the weight, then we could allow the exemption from the scale invariant for all samples with weight <= 1. But what happens if we merge two such centroids and the weight is still < 1? Do we have to remember that this new centroid has more than one sample?
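To make the second point concrete, here is a minimal sketch of why the end centroids can never satisfy the invariant without an exemption. It assumes that K=2 refers to a log-odds style scale function of the form k(q) = c * log(q / (1 - q)); the constant `c` and the helper names `k_log_odds` and `k_sizes` are hypothetical and are not taken from the library.

```python
import math

def k_log_odds(q, c=10.0):
    """Log-odds scale function; c is a hypothetical normalizing constant."""
    if q <= 0.0:
        return -math.inf
    if q >= 1.0:
        return math.inf
    return c * math.log(q / (1.0 - q))

def k_sizes(weights, c=10.0):
    """Return k(q_right) - k(q_left) for each centroid, in order."""
    total = sum(weights)
    sizes = []
    cum = 0.0
    for w in weights:
        q_left = cum / total
        cum += w
        q_right = cum / total
        sizes.append(k_log_odds(q_right, c) - k_log_odds(q_left, c))
    return sizes

# The first and last centroids always span an infinite range in k-space,
# no matter how small their weights are.
sizes = k_sizes([0.1, 0.25, 0.2])
print(sizes)
```

Under this assumption, splitting the first centroid further does not help: whichever piece touches q = 0 still has an infinite k-size, which is why some form of exemption seems unavoidable.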
So, what do you think?
from t-digest.
Centroids now have to track their cardinality (the number of samples they represent). The exemption can then be based on the cardinality, not the weight (in fact, weight used to act as a kind of cardinality, under the assumption of unit weights).
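A rough sketch of what this could look like, purely as an illustration: the `Centroid` class, its `count` field, and `is_exempt` are hypothetical names, not the library's API.

```python
from dataclasses import dataclass

@dataclass
class Centroid:
    mean: float
    weight: float   # total weight, possibly fractional
    count: int = 1  # cardinality: number of samples merged into this centroid

    def merge(self, other):
        """Combine two centroids; weights and cardinalities both add up."""
        w = self.weight + other.weight
        mean = (self.mean * self.weight + other.mean * other.weight) / w
        return Centroid(mean, w, self.count + other.count)

    def is_exempt(self):
        """Assumed rule: only single-sample centroids may violate the invariant."""
        return self.count == 1

a = Centroid(2.0, 0.25)
b = Centroid(2.0, 0.2)
merged = a.merge(b)
print(merged.weight < 1, merged.count, merged.is_exempt())
```

With this, a merged centroid whose weight is still below 1 is no longer treated as indivisible, because the decision looks at `count` rather than `weight`.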
With non-integer weights, the t-digest invariant
Then, the invariant should behave identically to the case of equivalent integer weights.
For example, non-integer weighted samples (sample value, sample weight)
(1, 0.1), (2, 0.25), (2, 0.2)
with quantiles
(0, 0.1/0.55), (0.1/0.55, 1)
are equivalent to these integer-weighted samples:
(1, 2), (2, 9)
with quantiles
(0, 2/11), (2/11, 1)
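The equivalence in this example can be checked with exact rational arithmetic. This is only a sketch; `quantile_ranges` is a hypothetical helper that groups samples by value and returns the quantile interval covered by each distinct value.

```python
from fractions import Fraction

def quantile_ranges(samples):
    """samples: list of (value, weight). Returns (q_left, q_right) per distinct value."""
    totals = {}
    for value, weight in samples:
        totals[value] = totals.get(value, 0) + weight
    total = sum(totals.values())
    ranges = []
    cum = 0
    for value in sorted(totals):
        left = cum / total
        cum += totals[value]
        ranges.append((left, cum / total))
    return ranges

# Fractional weights from the example, kept exact by parsing decimal strings.
fractional = [(1, Fraction("0.1")), (2, Fraction("0.25")), (2, Fraction("0.2"))]
# The integer-weighted equivalent (every weight scaled by 20).
integer = [(1, Fraction(2)), (2, Fraction(9))]

print(quantile_ranges(fractional))
print(quantile_ranges(integer))
```

Both calls produce the same intervals, since scaling every weight by the same factor leaves the quantiles unchanged.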
Difference is that a cluster