Giter Site home page Giter Site logo

datadog / sketches-js Goto Github PK

View Code? Open in Web Editor NEW
8.0 22.0 6.0 405 KB

TypeScript implementation of the distributed quantile sketch algorithm DDSketch

License: Other

JavaScript 33.45% TypeScript 66.55%
histogram quantiles percentiles relative-error ddsketch

sketches-js's Introduction

sketches-js

Continuous Integration License

This repo contains the TypeScript implementation of the distributed quantile sketch algorithm DDSketch. DDSketch is mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.

Installation

The package is under @datadog/sketches-js and can be installed through NPM or Yarn:

# NPM
npm install @datadog/sketches-js

# Yarn
yarn add @datadog/sketches-js

When using Protobuf serialization, the protobufjs module must also be installed manually:

# NPM
npm install protobufjs

# Yarn
yarn add protobufjs

Usage

Initialize a sketch

To initialize a sketch with the default parameters:

import { DDSketch } from '@datadog/sketches-js'; // or const { DDSketch } = require('@datadog/sketches-js');
const sketch = new DDSketch();

Modify the relativeAccuracy

If you want more granular control over how accurate the sketch's results will be, you can pass a relativeAccuracy parameter when initializing a sketch.

Whereas other histograms use rank error guarantees (i.e. retrieving the p99 of the histogram will give you a value between p98.9 and p99.1), DDSketch uses a relative error guarantee (if the actual value at p99 is 100, the value will be between 99 and 101 for a relativeAccuracy of 0.01).

This property makes DDSketch especially useful for long-tailed distributions of data, like measurements of latency.

import { DDSketch } from '@datadog/sketches-js';

const sketch = new DDSketch({
  relativeAccuracy: 0.01, // `relativeAccuracy` must be between 0 and 1
});

Add values to a sketch

To add a number to a sketch, call sketch.accept(value). Both positive and negative numbers are supported.

const measurementOne = 1607374726;
const measurementTwo = 0;
const measurementThree = -3.1415;

sketch.accept(measurementOne);
sketch.accept(measurementTwo);
sketch.accept(measurementThree);

Retrieve measurements from the sketch

To retrieve measurements from a sketch, use sketch.getValueAtQuantile(quantile). Any number between 0 and 1 (inclusive) can be used as a quantile.

Additionally, common summary statistics are available such as sketch.min, sketch.max, sketch.sum, and sketch.count:

const measurementOne = 1607374726;
const measurementTwo = 0;
const measurementThree = -3.1415;

sketch.accept(measurementOne);
sketch.accept(measurementTwo);
sketch.accept(measurementThree);

sketch.getValueAtQuantile(0)     // -3.1415
sketch.getValueAtQuantile(0.5)   // 0
sketch.getValueAtQuantile(0.99)  // 1607374726
sketch.getValueAtQuantile(1)     // 1607374726

sketch.min                       // -3.1415
sketch.max                       // 1607374726
sketch.count                     // 3
sketch.sum                       // 1607374722.86

Merge multiple sketches

Independent sketches can be merged together, provided that they were initialized with the same relativeAccuracy. This allows collecting and transmitting measurements in a distributed manner, and merging their results together while preserving the relativeAccuracy guarantee.

import { DDSketch } from '@datadog/sketches-js';

const sketch1 = new DDSketch();
const sketch2 = new DDSketch();

[1,2,3,4,5].forEach(value => sketch1.accept(value));
[6,7,8,9,10].forEach(value => sketch2.accept(value));

// `sketch2` is merged into `sketch1`, without modifying `sketch2`
sketch1.merge(sketch2);

sketch1.getValueAtQuantile(1) // 10

References

sketches-js's People

Contributors

brimtown avatar flentini avatar githomin avatar qard avatar rochdev avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

sketches-js's Issues

Protobuf .d.ts file overwritten at compile time?

I noticed that I couldn't compile a TypeScript project which depended on sketch-js, because it couldn't find interfaces exported by the proto:

node_modules/@datadog/sketches-js/dist/ddsketch/mapping/CubicallyInterpolatedMapping.d.ts:2:10 - error TS2305: Module '"../proto/compiled"' has no exported member 'IndexMapping'.

Looking in the node_modules of my project, I noticed that @datadog/sketches-js/dist/ddsketch/proto/compiled.d.ts contained only the lines:

export = $root;
declare var $root: $protobuf.Root;
import $protobuf = require("protobufjs/minimal");

...instead of the rich types you provide in that file in this git repo.

This looks similar to an issue in another project, which I found when googling the error message I was seeing:
cryptowatch/cw-sdk-node#31

The problem is that tsc, when generating declarations, overrides the existing compiled.d.ts file in its zeal to declare types for compiled.js.

When I simply paste the real file into my node_modules, I'm able to successfully build my dependent project.

I looked for ways to avoid generating declarations for only one file, but couldn't find an option like that in tsc.

I believe this should be resolvable by appending a copy to the build script in package.json:

    "build": "yarn clean; tsc -p tsconfig.build.json; cp src/ddsketch/proto/compiled.d.ts dist/ddsketch/proto/compiled.d.ts",

Merging sketches with negative numbers appears to generate incorrect results

Say you have a range of negative numbers, -10 -> -1. You add the lower half of the range to one sketch, and the upper half of the range to another sketch. If you add these sketches together ("Sketch AB"), it will produce different getValueAtQuantile() results than the same code in Python or the same data from both halves added to a single sketch ("Sketch C") in TypeScript. This only happens with negative numbers: for positive ranges the values match. Full snippets for the ts-node and python interpreters is below, but here are how the results differ for a single sketch reading all the data versus a merge of two sketches with half the data:

q0.01	c: -10.07470	  ab: -10.07470
q0.25	c: -7.92497	  ab: -7.92497
q0.50	c: -5.98951	  ab: -5.98951
q0.75	c: -4.01484	  ab: 0.00000
q0.99	c: -1.99366	  ab: 0.00000

Whereas in Python the same test returns:

q0.01	c: -10.07470	  ab: -10.07470
q0.25	c: -7.92497	  ab: -7.92497
q0.50	c: -5.98951	  ab: -5.98951
q0.75	c: -4.01484	  ab: -4.01484
q0.99	c: -1.99366	  ab: -1.99366

To reproduce in ts-node:

import { DDSketch } from '@datadog/sketches-js'

// test data
const posA = [1, 2, 3, 4, 5]
const posB = [6, 7, 8, 9, 10]

// these two will each get half the positive data
const ddsPosA = new DDSketch()
const ddsPosB = new DDSketch()

// and then merge into this one
const ddsPosAB = new DDSketch()

// whereas this one reads all the positive data sequentially
const ddsPosC = new DDSketch()

posA.forEach(s => {
  ddsPosA.accept(s)
  ddsPosC.accept(s)
})

posB.forEach(s => {
  ddsPosB.accept(s)
  ddsPosC.accept(s)
})

ddsPosAB._copy(ddsPosA)
ddsPosAB.merge(ddsPosB)

// we'll repeat this for negative numbers
const negA = [-10, -9, -8, -7, -6]
const negB = [-5, -4, -3, -2, -1]
const ddsNegA = new DDSketch()
const ddsNegB = new DDSketch()
const ddsNegAB = new DDSketch()
const ddsNegC = new DDSketch()

negA.forEach(s => {
  ddsNegA.accept(s)
  ddsNegC.accept(s)
})

negB.forEach(s => {
  ddsNegB.accept(s)
  ddsNegC.accept(s)
})

ddsNegAB._copy(ddsNegA)
ddsNegAB.merge(ddsNegB)

// the percentiles should be the same
// for the sequential c sketches
// and the merged ab sketches
const qs = [0.01, 0.25, 0.50, 0.75, 0.99]

qs.forEach(q => {
  console.log(`pq: ${q.toFixed(2)}\tc: ${ddsPosC.getValueAtQuantile(q).toFixed(5)}\tab: ${ddsPosAB.getValueAtQuantile(q).toFixed(5)}`)
})

qs.forEach(q => {
  console.log(`nq: ${q.toFixed(2)}\tc: ${ddsNegC.getValueAtQuantile(q).toFixed(5)}\tab: ${ddsNegAB.getValueAtQuantile(q).toFixed(5)}`)
})

To see what I believe are "correct" results in the python implementation:

from ddsketch import DDSketch

# test data
pos_a = [1, 2, 3, 4, 5]
pos_b = [6, 7, 8, 9, 10]

# these two will each get half the positive data
dds_pos_a = DDSketch()
dds_pos_b = DDSketch()

# and then merge into this one
dds_pos_ab = DDSketch()

# whereas this one reads all the positive data seqquentially
dds_pos_c = DDSketch()

for s in pos_a:
  dds_pos_a.add(s)
  dds_pos_c.add(s)

for s in pos_b:
  dds_pos_b.add(s)
  dds_pos_c.add(s)

dds_pos_ab._copy(dds_pos_a)
dds_pos_ab.merge(dds_pos_b)

# we'll repeat this for negative numbers
neg_a = [-10, -9, -8, -7, -6]
neg_b = [-5, -4, -3, -2, -1]
dds_neg_a = DDSketch()
dds_neg_b = DDSketch()
dds_neg_c = DDSketch()
dds_neg_ab = DDSketch()

for s in neg_a:
  dds_neg_a.add(s)
  dds_neg_c.add(s)

for s in neg_b:
  dds_neg_b.add(s)
  dds_neg_c.add(s)

dds_neg_ab._copy(dds_neg_a)
dds_neg_ab.merge(dds_neg_b)

# the percentiles should be the same
# for the sequential c sketches
# and the merged ab sketches
qs = [0.01, 0.25, 0.50, 0.75, 0.99]

for q in qs:
  print("pq: {:.2f}\tc: {:.5f}\tab: {:.5f}".format(q, dds_pos_c.get_quantile_value(q), dds_pos_ab.get_quantile_value(q)))

for q in qs:
  print("nq: {:.2f}\tc: {:.5f}\tab: {:.5f}".format(q, dds_neg_c.get_quantile_value(q), dds_neg_ab.get_quantile_value(q)))

I've tried comparing the TypeScript vs Python implementations looking for differences that could explain this behavior, but nothing jumped out at me. It does look like the sumOfRange() utility function in the store is used in _adjust() and merge() in situations where the equivalent Python code might be an exclusive range while sumOfRange() is inclusive, but when I tried overriding to add a -1 to the end index for those calls, it didn't seem to change anything.

If anyone has any ideas what the problem might be, I'd be happy to look further and keep testing.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.