
lysevi / dariadb


dariadb is a numeric time-series database storage engine.

License: Apache License 2.0

CMake 2.94% C++ 95.42% Shell 0.60% PowerShell 0.02% Go 1.02%
c-plus-plus database timeseries engine no-sql



Features

  • True columnar storage
  • Can be used as a server application or an embedded library.
  • Full-featured HTTP API.
  • Golang client (see folders "go" and "examples/go")
  • Accepts unordered data.
  • Each measurement contains:
    • Id - 32-bit unsigned integer.
    • Time - 64-bit timestamp.
    • Value - 64-bit float.
    • Flag - 32-bit unsigned integer.
  • Write strategies:
    • wal - small cache; all values are stored on disk in a write-ahead log. Optimised for heavy write load (but slower than the 'memory' strategy).
    • compressed - all values are compressed for efficient disk usage, without writing to the sorted layer.
    • memory - all values are stored in memory and dropped to disk when the memory limit is reached.
    • cache - all values are stored in memory and also written to disk.
    • memory-only - all values are stored only in memory.
  • LSM-like storage structure with three layers:
    • Memory cache or append-only files layer, for fast writes and crash safety (if the strategy is 'wal').
    • Old values are stored in compressed blocks for better disk space usage.
  • High write speed:
    • as an embedded engine writing to disk - 1.5-3.5 million values per second.
    • as memory storage (when the strategy is 'memory') - 7-9 million values per second.
    • across the network - 700k-800k values per second.
  • Shard engine: you can split values across shards on disk, for better compaction and faster reads.
  • Crash recovery.
  • CRC32 for all values.
  • Two variants of API:
    • Functor API (async) - the engine applies a given function to each measurement in the incoming request.
    • Standard API - you can query an interval as a list, or values at a time point as a dictionary.
  • Compaction of old data with filtering support.
  • Statistics:
    • time min/max
    • value min/max
    • measurement count
    • values sum
  • Statistical functions:
    • minimum
    • maximum
    • count
    • average
    • median
    • sigma(standard deviation)
    • percentile90
    • percentile99
  • Interval aggregation support. Available intervals: raw, minute, half hour, hour, day, week, month, year.
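
A minimal sketch of the measurement record and of the statistical functions listed above, for orientation only. The names Measurement, Summary, percentile and summarize are hypothetical and are not dariadb's API. The list does not say how sigma and the percentiles are defined, so this sketch assumes the population standard deviation and the nearest-rank percentile.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record type mirroring the four fields listed above.
struct Measurement {
  std::uint32_t id;    // series identifier
  std::uint64_t time;  // timestamp
  double value;        // 64-bit float
  std::uint32_t flag;  // user-defined flag
};

// Hypothetical result type; all fields stay 0 for an empty input.
struct Summary {
  std::size_t count = 0;
  double minimum = 0, maximum = 0, average = 0, median = 0, sigma = 0;
  double percentile90 = 0;
};

// Nearest-rank percentile over an ascending-sorted, non-empty vector (assumed definition).
inline double percentile(const std::vector<double>& sorted, double p) {
  std::size_t rank = static_cast<std::size_t>(std::ceil(p * sorted.size() / 100.0));
  if (rank == 0) rank = 1;
  return sorted[rank - 1];
}

inline Summary summarize(const std::vector<Measurement>& ms) {
  Summary s;
  if (ms.empty()) return s;
  std::vector<double> v;
  v.reserve(ms.size());
  for (const auto& m : ms) v.push_back(m.value);
  std::sort(v.begin(), v.end());

  s.count = v.size();
  s.minimum = v.front();
  s.maximum = v.back();

  double sum = 0;
  for (double x : v) sum += x;
  s.average = sum / v.size();

  const std::size_t mid = v.size() / 2;
  s.median = (v.size() % 2 != 0) ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;

  double sq = 0;
  for (double x : v) sq += (x - s.average) * (x - s.average);
  s.sigma = std::sqrt(sq / v.size());  // population standard deviation (assumption)

  s.percentile90 = percentile(v, 90.0);
  return s;
}
```

Interval aggregation could then be built by grouping measurements into minute, hour or day buckets by timestamp and calling summarize on each bucket.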

Usage example

  • See folder "examples"
  • How to use dariadb as an embedded storage engine: dariadb-example

Dependencies

  • Boost 1.54.0 or higher: system, filesystem, date_time, regex, program_options, asio.
  • CMake 3.1 or higher
  • C++14/17 compiler (MSVC 2015, gcc 6.0, clang 3.8)

Build


Install dependencies

$ sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
$ sudo apt-get update
$ sudo apt-get install -y libboost-dev  libboost-filesystem-dev libboost-program-options-dev libasio-dev libboost-date-time-dev cmake  g++-6  gcc-6 cpp-6 
$ export CC="gcc-6"
$ export CXX="g++-6"

Jemalloc

Optionally, you can install jemalloc for better memory usage.

$ sudo apt-get install libjemalloc-dev

Or you may use the jemalloc source bundled with dariadb: just add the build option -DDARIADB_SYSTEM_JEMALLOC=OFF

Git submodules

$ cd dariadb
$ git submodule init
$ git submodule update

Available build options

  • DARIADB_ENABLE_TESTS - Enable testing of dariadb. Default: ON
  • DARIADB_ENABLE_INTEGRATION_TESTS - Enable integration tests. Default: ON
  • DARIADB_ENABLE_SERVER - Build the dariadb server. Default: ON
  • DARIADB_ENABLE_BENCHMARKS - Build the dariadb benchmarks. Default: ON
  • DARIADB_ENABLE_SAMPLES - Build the dariadb sample programs. Default: ON
  • DARIADB_ASAN_UBSAN - Enable the address and undefined behavior sanitizers for the binary. Default: OFF
  • DARIADB_MSAN - Enable the memory sanitizer for the binary. Default: OFF
  • DARIADB_SYSTEM_JEMALLOC - Use the jemalloc installed in the system. Default: ON

Configure to build with all benchmarks, but without tests and server.


$ cmake  -DCMAKE_BUILD_TYPE=Release -DDARIADB_ENABLE_TESTS=OFF -DDARIADB_ENABLE_INTEGRATION_TESTS=OFF -DDARIADB_ENABLE_BENCHMARKS=ON -DDARIADB_ENABLE_SERVER=OFF . 

clang


Clang is currently not supported.

$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_CXX_FLAGS_RELEASE="${CMAKE_CXX_FLAGS_RELEASE} -stdlib=libc++" -DCMAKE_EXE_LINKER_FLAGS="${CMAKE_EXE_LINKER_FLAGS} -lstdc++" .
$ make

gcc


$ cmake -DCMAKE_BUILD_TYPE=Release .
$ make

Microsoft Visual Studio


$ cmake -G "Visual Studio 14 2015 Win64" .
$ cmake --build .

If you want to build benchmarks and tests:

$ cmake -G "Visual Studio 14 2015 Win64" -DBUILD_SHARED_LIBS=FALSE  .
$ cmake --build .

Build with a non-system-installed Boost


$ cmake  -DCMAKE_BUILD_TYPE=Release -DBOOST_ROOT="path/to/boost/" .
$ make

dariadb's People

Contributors

lysevi, xgdgsc


dariadb's Issues

Bucket

In-memory storage can accept only data that is sorted by time.

  1. In a bucket, measurements are stored in sets sorted by time.
  2. When a set is full, all measurements in it are written to memstorage.
  3. membucket checks the time of each written measurement: it must be greater than or equal to a given time (the write window).
  • TimeOrderedSet
  • Bucket
    • list of TimeOrderedSet
    • add meas with max time
    • insert in middle of set
    • insert in closest set
    • add meas with time less than min (equal max time of older set)
    • check time window
    • max count of appended meas
    • dump older set to memstorage.
    • split TimeOrderedSets by memseries id. (use inner meas representation)
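
The bucket described above can be sketched as follows. This is an illustration under assumptions, not the project's implementation: the Meas layout, the constructor parameters, and the split-on-overflow rule used to keep the sets non-overlapping are all hypothetical, and set_capacity is assumed to be at least 1. Memstorage is modelled as a plain vector that must receive values in time order.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>
#include <map>
#include <utility>
#include <vector>

using Time = std::uint64_t;
struct Meas { std::uint32_t id; Time time; double value; };  // hypothetical layout
using TimeOrderedSet = std::multimap<Time, Meas>;            // always sorted by time

class Bucket {
public:
  Bucket(std::size_t set_capacity, std::size_t max_sets, std::vector<Meas>* memstorage)
      : set_capacity_(set_capacity), max_sets_(max_sets), out_(memstorage) {}

  // Returns false when the measurement falls before the write window.
  bool append(const Meas& m) {
    if (m.time < window_start_) return false;
    // Find the first set whose newest value is not older than m.
    auto it = sets_.begin();
    while (it != sets_.end() && std::prev(it->end())->first < m.time) ++it;
    if (it == sets_.end()) {
      // Newer than everything stored: extend the last set or open a new one.
      if (sets_.empty() || sets_.back().size() >= set_capacity_) sets_.emplace_back();
      it = std::prev(sets_.end());
    }
    it->emplace(m.time, m);
    if (it->size() > set_capacity_) split(it);
    while (sets_.size() > max_sets_) dump_oldest();
    return true;
  }

  std::size_t set_count() const { return sets_.size(); }

private:
  // Moves the upper half of an overfull set into a new set placed right after it.
  void split(std::list<TimeOrderedSet>::iterator it) {
    auto mid = std::next(it->begin(), static_cast<std::ptrdiff_t>(it->size() / 2));
    TimeOrderedSet upper(mid, it->end());
    it->erase(mid, it->end());
    sets_.insert(std::next(it), std::move(upper));
  }

  // Writes the oldest set to memstorage and advances the write window.
  void dump_oldest() {
    TimeOrderedSet& oldest = sets_.front();
    for (const auto& kv : oldest) out_->push_back(kv.second);
    window_start_ = std::prev(oldest.end())->first;
    sets_.pop_front();
  }

  std::size_t set_capacity_;
  std::size_t max_sets_;
  std::vector<Meas>* out_;
  std::list<TimeOrderedSet> sets_;  // ordered oldest to newest, ranges do not overlap
  Time window_start_ = 0;
};
```

Because the sets never overlap in time, dumping the oldest set first means memstorage only ever sees non-decreasing timestamps.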

refact

  • rename storeage_test to storage_bench; add "mem" prefix
  • page_manager => storage dir

write to past

  • capacitor: support unordered data.
  • Chunk::Info::is_sorted flag: true if the data is ordered by time.
  • chunk: support unordered data. New class OverlappedChunks: a set of chunks with overlapping times; returns the result of a k-way merge of the chunks' data for an interval.
  • engine: support unordered data.
  • test writing random data (engine and capacitor).
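
A rough sketch of the k-way merge mentioned for overlapped chunks, assuming each chunk is individually sorted by time. The Chunk alias and merge_interval are hypothetical names; a min-heap holds one cursor per chunk, so the output comes out in global time order.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

struct Meas { std::uint64_t time; double value; };  // hypothetical layout
using Chunk = std::vector<Meas>;                    // each chunk is sorted by time

// Returns all measurements with from <= time <= to, merged in time order.
std::vector<Meas> merge_interval(const std::vector<Chunk>& chunks,
                                 std::uint64_t from, std::uint64_t to) {
  // Heap entry: (time, chunk index, position inside the chunk).
  using Entry = std::tuple<std::uint64_t, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t i = 0; i < chunks.size(); ++i) {
    if (!chunks[i].empty()) heap.emplace(chunks[i][0].time, i, std::size_t{0});
  }

  std::vector<Meas> result;
  while (!heap.empty()) {
    const Entry e = heap.top();
    heap.pop();
    const std::uint64_t time = std::get<0>(e);
    const std::size_t ci = std::get<1>(e);
    const std::size_t pos = std::get<2>(e);
    if (time > to) break;  // the heap minimum is past the interval, so we are done
    if (time >= from) result.push_back(chunks[ci][pos]);
    if (pos + 1 < chunks[ci].size()) heap.emplace(chunks[ci][pos + 1].time, ci, pos + 1);
  }
  return result;
}
```

With k chunks and n values in total, the merge costs O(n log k), which is why it is usually preferred over concatenating and re-sorting.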

continue writing

  • Pagemanager: load chunks that were not closed
  • Memstorage: method to add chunks (and fill FreeChunks)

Capacitor flush by timer

Capacitor should sync with storage by timer (period equal to write window size + const).
Add checks to the common test after filling:

... sleep ...
dariadb::Meas::MeasList current_mlist;
as->currentValue(dariadb::IdArray{}, 0)->readAll(&current_mlist);

extended filter

  • positive only
  • negative only
  • all by default
  • by measurement source
    • bloom filter

Page manager

  • create an empty page with an index partition (allocated in memory, for testing)
    • calculate the index size
    • readers count
    • writers lock
  • append a new chunk:
    • add the chunk buffer to the chunks partition
    • add index info
  • read a chunk into memory storage in read-only mode (or copy on read)
  • rewrite the old chunk on append
  • read cursor
  • send all/old chunks from memstorage to page storage
  • save BinaryBuffer params (position etc.) in the chunk
  • restore meas id
  • page is a mapped file
    • create new
    • reopen
  • read page info (without loading the whole page)
    • free space
  • benchmark
    • write, single thread
    • read, single thread

dtw

Dynamic time warping algorithm
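
For reference, a minimal sketch of the classic dynamic-programming form of dynamic time warping over two value sequences. The function name dtw_distance and the absolute-difference cost are choices made for this example; the issue does not specify either.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Returns the minimal accumulated |a[i] - b[j]| cost over all monotonic alignments.
// Two empty sequences give 0; one empty sequence gives infinity.
double dtw_distance(const std::vector<double>& a, const std::vector<double>& b) {
  const std::size_t n = a.size();
  const std::size_t m = b.size();
  const double inf = std::numeric_limits<double>::infinity();

  // cost[i][j] = best cost of aligning the first i values of a with the first j of b.
  std::vector<std::vector<double>> cost(n + 1, std::vector<double>(m + 1, inf));
  cost[0][0] = 0.0;

  for (std::size_t i = 1; i <= n; ++i) {
    for (std::size_t j = 1; j <= m; ++j) {
      const double d = std::abs(a[i - 1] - b[j - 1]);
      // Extend the cheapest of: step in a only, step in b only, step in both.
      cost[i][j] = d + std::min({cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]});
    }
  }
  return cost[n][m];
}
```

This version uses O(n*m) time and memory; keeping only two rows of the table would reduce memory to O(m).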

Statistic

  • average
  • integral
  • median
  • benchmarks
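
One possible reading of "integral" for a time series is the area under the value curve. The sketch below uses the trapezoidal rule over time-sorted points; Point and integral are hypothetical names, and the choice of rule is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Point { std::uint64_t time; double value; };  // hypothetical layout

// Trapezoidal integral of value over time; `series` must be sorted by time.
double integral(const std::vector<Point>& series) {
  double area = 0.0;
  for (std::size_t i = 1; i < series.size(); ++i) {
    const double dt = static_cast<double>(series[i].time - series[i - 1].time);
    area += 0.5 * (series[i].value + series[i - 1].value) * dt;
  }
  return area;
}
```

Dividing the result by the covered time span gives a time-weighted average, which differs from the plain average when samples are unevenly spaced.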

stored procedure language

  • All statistical functions should be available for stored procedures.
  • scripts (lua):
    • build option SYSTEM_LUA (ON by default); if OFF, use the git subproject lua.
  • API:
    • create database
    • readInterval
    • readInTimePoint
    • join:
      • table pretty print
    • add continuous query rule with map callback (for example: add to QueryTable the sum of id=1 and id=2 for each time).
    • load raw to bystep with step.

read optimisation

Flag isOrderedStorage: stop reading in Reader when the Time that was read is > to.
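
The idea can be illustrated with a small reader loop. This is a sketch with hypothetical names (read_interval, is_ordered_storage, visited); it only shows why an ordered storage allows the scan to stop early.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Time = std::uint64_t;
struct Meas { std::uint32_t id; Time time; double value; };  // hypothetical layout

// Collects measurements with from <= time <= to.
// `visited` (optional) reports how many elements were inspected.
std::vector<Meas> read_interval(const std::vector<Meas>& storage, Time from, Time to,
                                bool is_ordered_storage, std::size_t* visited = nullptr) {
  std::vector<Meas> result;
  std::size_t seen = 0;
  for (const Meas& m : storage) {
    ++seen;
    if (m.time > to) {
      if (is_ordered_storage) break;  // sorted by time: nothing later can match
      continue;                       // unsorted: later elements may still match
    }
    if (m.time >= from) result.push_back(m);
  }
  if (visited != nullptr) *visited = seen;
  return result;
}
```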

subscribe

Callback on new data, filtered by id.
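
A minimal sketch of such a subscription mechanism, assuming callbacks are invoked synchronously on write. Subscriptions, subscribe and publish are hypothetical names; treating an empty id set as "all ids" is also an assumption.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <utility>
#include <vector>

struct Meas { std::uint32_t id; std::uint64_t time; double value; };  // hypothetical layout

class Subscriptions {
public:
  using Callback = std::function<void(const Meas&)>;

  // Registers a callback for the given ids; an empty set means "all ids".
  void subscribe(std::unordered_set<std::uint32_t> ids, Callback cb) {
    subs_.push_back(Sub{std::move(ids), std::move(cb)});
  }

  // Called for each newly written measurement.
  void publish(const Meas& m) const {
    for (const Sub& s : subs_) {
      if (s.ids.empty() || s.ids.count(m.id) != 0) s.cb(m);
    }
  }

private:
  struct Sub {
    std::unordered_set<std::uint32_t> ids;
    Callback cb;
  };
  std::vector<Sub> subs_;
};
```

The sketch is not thread-safe; a real engine would need to guard the subscriber list or dispatch callbacks from a dedicated thread.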

time series analysis helpers

  • flags
  • steps
    • byChange - is default
    • byStep
  • reader must be a callback
  • clone readers
  • reader: reset position to read from the beginning.

steps

  • byChange - the default
  • byTime - with a step value in milliseconds

delta encoding

Store the full time when delta > 2048 (currently only the uint32 part is stored).
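
To illustrate the rule, here is a sketch of a tagged timestamp encoding: small deltas are stored in two bytes, anything else falls back to the full 64-bit time. Only the 2048 threshold comes from the note above; the one-byte tag, the little-endian layout and the function names are assumptions and do not describe dariadb's on-disk format.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace {
const std::uint64_t kMaxDelta = 2048;  // threshold taken from the note above
const std::uint8_t kDeltaTag = 0;      // hypothetical tag: 2-byte delta follows
const std::uint8_t kFullTag = 1;       // hypothetical tag: 8-byte full time follows

void put_le(std::vector<std::uint8_t>& out, std::uint64_t v, int bytes) {
  for (int i = 0; i < bytes; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint64_t get_le(const std::vector<std::uint8_t>& in, std::size_t& pos, int bytes) {
  std::uint64_t v = 0;
  for (int i = 0; i < bytes; ++i) v |= static_cast<std::uint64_t>(in[pos++]) << (8 * i);
  return v;
}
}  // namespace

std::vector<std::uint8_t> encode_times(const std::vector<std::uint64_t>& times) {
  std::vector<std::uint8_t> out;
  std::uint64_t prev = 0;
  bool first = true;
  for (std::uint64_t t : times) {
    // The first value, backward jumps and large deltas are stored in full.
    const bool small = !first && t >= prev && (t - prev) <= kMaxDelta;
    out.push_back(small ? kDeltaTag : kFullTag);
    put_le(out, small ? t - prev : t, small ? 2 : 8);
    prev = t;
    first = false;
  }
  return out;
}

// Inverse of encode_times; assumes well-formed input.
std::vector<std::uint64_t> decode_times(const std::vector<std::uint8_t>& in) {
  std::vector<std::uint64_t> times;
  std::uint64_t prev = 0;
  std::size_t pos = 0;
  while (pos < in.size()) {
    const bool full = in[pos++] == kFullTag;
    const std::uint64_t v = get_le(in, pos, full ? 8 : 2);
    prev = full ? v : prev + v;
    times.push_back(prev);
  }
  return times;
}
```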

chunks pool

ChunksPoll class: creates a new chunk if none is free. Chunks are not deleted, just returned to the pool. The chunks pool stores buffers of uint8_t.
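
A small sketch of the pooling idea described above, assuming fixed-size byte buffers. BufferPool, acquire, release and created are hypothetical names chosen for this example, and the sketch is single-threaded.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class BufferPool {
public:
  using Buffer = std::vector<std::uint8_t>;

  explicit BufferPool(std::size_t buffer_size) : buffer_size_(buffer_size) {}

  // Reuses a free buffer when one exists, otherwise allocates a new one.
  Buffer acquire() {
    if (free_.empty()) {
      ++created_;
      return Buffer(buffer_size_);
    }
    Buffer b = std::move(free_.back());
    free_.pop_back();
    return b;
  }

  // Buffers are not destroyed: they go back to the pool for later reuse.
  void release(Buffer b) {
    b.assign(buffer_size_, 0);  // reset the contents and restore the size
    free_.push_back(std::move(b));
  }

  std::size_t created() const { return created_; }

private:
  std::size_t buffer_size_;
  std::size_t created_ = 0;
  std::vector<Buffer> free_;
};
```

The benefit is fewer allocations on the write path: after warm-up, acquiring a chunk buffer costs a vector move instead of a heap allocation.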

UnionStorage: read query to pagemanager

  • Chunk readOnly param.
  • implement methods readInterval and readInTimePoint in AbstractStorage.
  • AbstractStorage is derived from ChunkStorage.
  • rename AbstractStorage to BaseStorage.
  • implement chunks operation in PageManager.
  • implement chunks operation in UnionStorage.
  • MemStorage dropAll

storage test

Fill file storage. Read interval from minTime to maxTime.
