Giter Site home page Giter Site logo

s4ayub / tempo Goto Github PK

View Code? Open in Web Editor NEW
2.0 4.0 0.0 24 KB

A data structure for efficient in-memory storage of time-series data; inspired by Facebook's Gorilla DB :hourglass_flowing_sand:

License: MIT License

Go 100.00%
gorilla time-series map golang delta-encoding xor-encoder

tempo's Introduction

tempo

Go implementation of Facebook's GorillaDB: an in-memory write-through cache for time-series data

import (
  "time"
  "github.com/s4ayub/tempo"
)

func main() {
  // A time series is the underlying data structure behind the time series map.
  // A user of the package will never interact with the TimeSeries struct, but the
  // example code highlights our implementation of the delta-of-delta encoding for time and XOR encoding for values.

  startTime := time.Now()
  ts := NewTimeSeries(startTime)

  t1 := startTime.Add(time.Second * time.Duration(60)).Unix() // store in seconds
  ts.timeEncode(t1)
  ts.dataEncode(2)

  t2 := t1.Add(time.Second * time.Duration(60)).Unix()
  ts.timeEncode(t2)
  ts.dataEncode(3)
  
  fmt.Println(ts)
}

Changes to the encoding scheme:

Tempo stores its timestamps and data values in a single byte-stream (for now). The encoding schemes used by GorillaDB takes advantage of bit-level granularity.

Here, we discuss how we changed the algorithms to be byte-friendly, and the ramifications for doing so.

Timeseries encoding scheme:

  1. The tag bits are always stored in 8 bits instead of 4.

    • The compression ratio for timestamps is reduced
  2. The value bits associated with each tag is changed:

    • [0] -> 0 bits (the same as before)

    • [-63, 64] -> 8 bits

    • [-255, 256] -> 16 bits

    • [-2047, 2048] -> 16 bits

    • [Otherwise] -> 32 bits (the same as before)

    • Again, the compression ratio drops as unused bits are added.

    • [-255, 256] for example, requires only 9 bits but 16 bits are reserved.

  3. Example of an encoded timestamp entry

    • {Tag bits}{Value}
      • 'Tag bits' is the tag defining the number of bits the time value will use
      • 'Value' is the actual delta-of-delta time value
      • Each { } item is 1 byte, but '...' represents a variable amount of bytes

Data encoding scheme:

  1. The dissimilar bit, the control bit, and the leading zeroes are stored in the first byte.

    • [xxxx xxxx]
    • b[7] is the dissimilar bit
    • b[6] is the control bit
    • b[5:0] is the number of leading zeroes of the meaningful xor
      • LZ is encoded in GorillaDB, but it is not encoded in Tempo
      • These 6 bits would have been here regardless of whether we used it to store LZ
  2. The control bit represents whether the length of the meaningful xor is reused

    • In GorillaDB, this is an indicator for whether the LZ and the MXOR length is reusued
    • In Tempo, this is an indicator for whether the MXOR length is reused
      • The LZ is always in the first byte
  3. Example of an encoded data entry

    • {Header}{MXOR length}{MXOR...}
      • 'Header' is the byte containing the dissimilar bit, control bit, and LZ
      • 'MXOR length' is the number of bits used to store the MXOR
      • 'MXOR' is the actual meaningful xor bits
      • Each { } item is 1 byte, but '...' represents a variable amount of bytes

To do:

  • Lay out the TimeSeries struct to encase timestamps and data in a single byte stream
  • Implement and test XOR encoding for data values
  • Implement and test delta-of-delta encoding for timestamps
  • Write a decoder to return decoded blocks of time series data
  • Build the TimeSeriesMap structure which will abstract away the TimeSeries struct

tempo's People

Contributors

arnettus avatar s4ayub avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

tempo's Issues

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.