Decay TDigest (t-digest issue, open)

rnataf commented on September 25, 2024
Decay TDigest

from t-digest.

Comments (4)

tdunning commented on September 25, 2024

rnataf commented on September 25, 2024

Hi Ted, thank you for your answer.
I want to make sure I understand you correctly:

  • My first idea was to divide the weight of all the centroids by some factor at each time period. This way, the more time passes, the less influence old data has. If I understand you correctly, this may be a bad approach, since we are then no longer guaranteed that the centroids in the tails each represent a single element, so it is not clear what accuracy we will get. Conversely, we could multiply the weight of each new entry (or simulate inserting the same entry several times) in order to give more importance to new data. This way, it would still hold that the centroids at the tail contain a single entry.
  • What you're suggesting, though, is to use several instances of tDigest, each one for a defined time period, and then merge them. Is that correct? If so, I don't see how that would make it possible to simulate a time decay. Could you explain a little more, please?

tdunning commented on September 25, 2024

Yes. You have the first point correct. Suppose a centroid has one sample (weight 1) and then decays to 1/2. Add another sample (weight 3/2) and decay by a factor of 2/3: we now have a weight of 1, but it represents two samples. Normally, we would treat a weight of 1 specially for interpolation purposes to gain accuracy, but we can't do that any more.
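The arithmetic above can be checked with a minimal sketch. This is illustrative only and is not code from the t-digest library; the variables `weight` and `samples` are hypothetical stand-ins for a single centroid's decayed weight and its true sample count.

```python
from fractions import Fraction

# Hypothetical stand-in for one centroid: the decayed weight, and
# separately the number of samples that were actually folded in.
weight = Fraction(0)
samples = 0

# First sample arrives, then everything decays by a factor of 1/2.
weight += 1
samples += 1
weight *= Fraction(1, 2)   # weight is now 1/2

# Second sample arrives, then everything decays by a factor of 2/3.
weight += 1                # weight is now 3/2
samples += 1
weight *= Fraction(2, 3)   # weight is back to exactly 1

print(weight, samples)
```

The weight ends at exactly 1 while the sample count is 2, so a weight of 1 can no longer be read as "this centroid is a single exact sample".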

One fix for this would be to keep separate counters for the number of samples and the total weight of a centroid. But once a centroid has at least one sample, it can never decay to the point where it can be merged if it is in the tail, and thus you force other samples to be merged.

Regarding your second point, the most common reason for wanting exponential decay of a digest is to estimate the distribution over a particular (recent) time period. With digests that each cover a short time range, you don't have to estimate ... you can just combine the pertinent digests and you have exactly what you want.
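A rough sketch of the "one digest per short time range" idea, under stated assumptions: plain Python lists stand in for real t-digests so the example has no dependencies, and `PERIOD_SECONDS`, `BucketedSamples`, and `quantile` are hypothetical names, not part of any library.

```python
from collections import defaultdict

PERIOD_SECONDS = 60  # hypothetical bucket width


class BucketedSamples:
    """One container per time period. Lists stand in for real digests."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, timestamp, value):
        # Route each sample to the bucket for its time period.
        self.buckets[int(timestamp // PERIOD_SECONDS)].append(value)

    def merged(self, first_period, last_period):
        """Combine the buckets for first_period..last_period (inclusive)."""
        out = []
        for period in range(first_period, last_period + 1):
            out.extend(self.buckets.get(period, []))
        return sorted(out)


def quantile(sorted_values, q):
    """Simple rank-based quantile on an already sorted list."""
    if not sorted_values:
        raise ValueError("no samples in the requested range")
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


store = BucketedSamples()
for t, v in [(5, 10.0), (30, 20.0), (70, 1.0), (100, 2.0), (130, 3.0)]:
    store.add(t, v)

# Only periods 1 and 2 are "recent"; period 0 is simply left out.
recent = store.merged(1, 2)
median_recent = quantile(recent, 0.5)
```

With real digests, the `merged` step would combine the per-period digests instead of concatenating lists; the point of the sketch is only that old data is excluded by selecting buckets rather than by decaying weights.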

If you want some sort of weighting over a time period, you can do that too. When you want to do a quantile or CDF operation, you translate it into composite operations on each set of digests that share the same weight.
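One way to read "composite operations" is sketched below, as an assumption rather than a description of the library: the weighted CDF is the weight-scaled sum of the per-group counts below x, divided by the total weighted count, and a quantile is found by searching that composite CDF. Sorted lists again stand in for digests, and the weights 0.25, 0.5, and 1.0 are made up for illustration.

```python
from bisect import bisect_right


def weighted_cdf(groups, x):
    """groups: list of (weight, sorted_values), one entry per weight level."""
    num = 0.0
    den = 0.0
    for w, values in groups:
        num += w * bisect_right(values, x)  # weighted count of samples <= x
        den += w * len(values)              # weighted total count
    if den == 0:
        raise ValueError("no samples")
    return num / den


def weighted_quantile(groups, q):
    """Smallest observed value whose composite CDF reaches q."""
    candidates = sorted({v for _, values in groups for v in values})
    if not candidates:
        raise ValueError("no samples")
    for x in candidates:
        if weighted_cdf(groups, x) >= q:
            return x
    return candidates[-1]


groups = [
    (0.25, [10.0, 20.0]),  # oldest period, smallest (hypothetical) weight
    (0.5, [1.0, 2.0]),     # middle period
    (1.0, [3.0, 4.0]),     # current period, full weight
]
cdf_at_3 = weighted_cdf(groups, 3.0)
median = weighted_quantile(groups, 0.5)
```

In this sketch the stored samples are never modified; the weights are applied only at query time, so each group keeps its exact per-sample information.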

rnataf commented on September 25, 2024

Regarding your last point, I understand that you suggest keeping one digest per time period. Then I can decrease the weights of "old" digests once I no longer use them. Let's say I'm now in time period 2 and I have decreased the weights of the digests for time periods 0 and 1. When I want to compute a quantile, I should merge the digests of periods 0, 1 and 2 and then compute the quantile on the merged result. Did I understand you correctly?
