tsm-bench (forked from exascaleinfolab/tsm-bench)

Comprehensive Benchmark for Time Series Database Systems

TSM-Bench is a new benchmark that compares seven Time Series Database Systems (TSDBs) using a mixed set of workloads. It can be easily extended with new systems, queries, datasets, and workloads. The benchmark introduces a novel data generation method that augments seed real-world time series datasets, enabling realistic and scalable benchmarking. Technical details can be found in the paper TSM-Bench: Benchmarking Time Series Database Systems for Monitoring Applications, PVLDB'23.

  • List of benchmarked systems: ClickHouse, Druid, eXtremeDB*, InfluxDB, MonetDB, QuestDB, TimescaleDB.
  • The benchmark evaluates bulk-loading, storage performance, offline/online query performance, and the impact of time series features on compression.
  • We use two datasets for the evaluation: D-LONG [d1] and D-MULTI [d2]. The evaluated datasets can be found here.
  • *Note: Due to license restrictions, we can only share the evaluation version of eXtremeDB. Results obtained with the public version might diverge from those of the benchmarked version.

Prerequisites | Installation | Datasets Loading | Experiments | Benchmark Extension | Technical Report | Data Generation | Contributors


Prerequisites

  • Ubuntu 20 (including Ubuntu derivatives, e.g., Xubuntu); 128 GB RAM
  • Clone this repository (this can take a couple of minutes as it downloads one of the datasets)

Systems Setup

  • Install the dependencies and activate the created virtual environment
cd systems/
sh install_dep.sh
source TSMvenv/bin/activate
  • Install all the systems (takes ~15mins)
sh install_all_sys.sh

Datasets Loading

  • Download and decompress Dataset 1 (takes ~ 3 mins)
cd ../datasets
sh build.sh d1
  • Load Dataset 1 into all the systems (takes ~ 2 hours)
sh load_all.sh d1
  • Note: To build and load the larger dataset d2, replace d1 with d2.

Experiments

Offline Workload

  • Activate the virtual environment, if not already done:

    source systems/TSMvenv/bin/activate
  • The offline queries for all systems can be executed from the root folder using:

    python3 tsm_eval.py [args]
  • Mandatory Arguments: [args] should be replaced with the name of the system(s), query (or queries), and dataset(s):

    | --systems   | --queries         | --datasets |
    |-------------|-------------------|------------|
    | clickhouse  | q1 (selection)    | d1         |
    | druid       | q2 (filtering)    | d2         |
    | extremedb*  | q3 (aggregation)  |            |
    | influx      | q4 (downsampling) |            |
    | monetdb     | q5 (upsampling)   |            |
    | questdb     | q6 (average)      |            |
    | timescaledb | q7 (correlation)  |            |
    | all         | all               | all        |
  • Optional Arguments: The following arguments allow varying the number of stations and sensors and dynamically changing the predicate ranges:

    • --nb_st: Number of queried stations when varying other dimensions (Default = 1)
    • --nb_sr: Number of queried sensors when varying other dimensions (Default = 3)
    • --range: Query range value when varying other dimensions (Default = 1)
    • --rangeUnit: Query range unit when varying other dimensions (Default = day)
    • --timeout: Maximum query time after five runs (s) (Default = 20)
    • --min_ts: Minimum query timestamp (Default = "2019-04-01T00:00:00")
    • --max_ts: Maximum query timestamp (Default = "2019-04-30T00:00:00")
  • Results: All the runtimes and plots will be added to the results folder.

    • The runtime results of the systems for a given dataset and query will be added to: results/offline/{dataset}/{query}/runtime/. The runtime plots will be added to the folder results/offline/{dataset}/{query}/plots/.

    • All the queries return the runtimes by varying the number of stations (nb_st), number of sensors (nb_sr), and the range.

  • Examples:

  1. Run query q1 on extremedb for Dataset 1 using default parameters (nb_st=1, nb_sr=3, range=1 day)
     python3 tsm_eval.py --systems extremedb --queries q1 --datasets d1
  2. Run q2 and q3 on extremedb and timescaledb for Dataset 1
     python3 tsm_eval.py --systems extremedb timescaledb --queries q2 q3 --datasets d1
  3. Run the full offline workload on all systems for Dataset 1 (takes ~3 hours)
     python3 tsm_eval.py --systems all --queries all --datasets d1
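Once a run completes, the per-query runtime files can be post-processed programmatically. The sketch below assumes a hypothetical CSV layout (`system,parameter,runtime_s`) for the files under results/offline/{dataset}/{query}/runtime/; the actual format produced by the benchmark may differ:

```python
# Sketch: summarizing offline runtime results. The "system,parameter,runtime_s"
# CSV layout is an assumption for illustration, not the benchmark's exact format.
import csv
import io
from collections import defaultdict

def mean_runtimes(csv_text: str) -> dict:
    """Return the mean runtime (seconds) per system from CSV text."""
    samples = defaultdict(list)
    for system, _param, runtime in csv.reader(io.StringIO(csv_text)):
        samples[system].append(float(runtime))
    return {s: sum(v) / len(v) for s, v in samples.items()}

sample = "clickhouse,1,0.12\nclickhouse,3,0.18\nquestdb,1,0.09\n"
print(mean_runtimes(sample))  # ≈ {'clickhouse': 0.15, 'questdb': 0.09}
```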

Online Workload

This workload requires two servers: the first serves as a host machine to deploy the systems (similar to above), and the second runs as a client to generate writes and queries.

Client Setup

  • Clone this repo

  • Install dependencies:

    cd systems/
    sh install_dep.sh
    source TSMvenv/bin/activate
  • Install the system libraries

    sh install_client_lib.sh

Query Execution

  1. Run the system on the host side

    cd systems/{system}
    sh launch.sh
  2. If the virtual environment is not already activated, activate it from the root folder:

    source systems/TSMvenv/bin/activate
  3. Execute the online query on the client side using the --host flag (see examples below).

  4. Stop the system on the host server

    sh stop.sh

Optional Arguments:

  • --host : remote host machine name (Default = "localhost")
  • --n_threads: Number of threads to use (Default = 10)
  • --batch_start: Initial number of data points inserted each second (if possible) by each thread (Default = 10000)
  • --batch_step: Increment by which the per-thread insertion batch grows at each step (Default = 10000)
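As a rough sketch of how these two flags interact (an assumption about the ramp-up logic, not the script's exact implementation): each thread starts at `--batch_start` points per second and the batch grows by `--batch_step` at every step, so the ingestion rate increases until the system saturates:

```python
# Sketch (assumption): an ingestion ramp driven by batch_start/batch_step.
# Each yielded value is the per-thread batch size for one step of the ramp.
def ingestion_schedule(batch_start: int, batch_step: int, n_steps: int):
    """Yield the per-thread batch size (points/second) for each ramp step."""
    batch = batch_start
    for _ in range(n_steps):
        yield batch
        batch += batch_step

# With the defaults, the first four steps are 10k, 20k, 30k, 40k points/s per thread.
print(list(ingestion_schedule(10000, 10000, 4)))  # [10000, 20000, 30000, 40000]
```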

Examples:

  1. Run query q1 in an online manner on clickhouse.
     python3 tsm_eval_online.py --system clickhouse --queries q1 --host "host_address"
  2. Run all queries online on questdb using one thread.
     python3 tsm_eval_online.py --system questdb --queries all --n_threads 1 --host "host_address"

Notes:

  • We launch each system separately on the host machine and execute the online query on the client machine using the --host flag.
  • The maximal batch size depends on your architecture and the selected TSDB.
  • Druid supports ingestion and queries concurrently, while QuestDB does not support multithreading.
  • If you stop the program before it terminates, or shut down the system, the database might not be reset to its initial state properly. In that case, reload the dataset on the host machine:
    cd systems/{system}
    sh load.sh

Results:

  • The runtime results of the systems will be added to: results/online/{dataset}/{query}/runtime/.
  • The runtime plots will be added to the folder results/online/{dataset}/{query}/plots/.
  • All the queries return the runtimes by varying the ingestion rate.

Storage Performance

  • To compute the storage performance for a given system:
    cd systems/{system}
    sh compression.sh
  • Note: {system} needs to be replaced with the name of one of the systems from the table below.
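For reference, storage performance is usually reported as a compression ratio: the raw dataset size divided by the system's on-disk footprint. The helper below illustrates that standard definition; it is not what compression.sh literally executes:

```python
# Sketch: the standard compression-ratio definition used when comparing
# TSDB storage footprints (an illustration, not compression.sh itself).
def compression_ratio(raw_bytes: int, stored_bytes: int) -> float:
    """Raw dataset size divided by the size the TSDB stores on disk."""
    if stored_bytes <= 0:
        raise ValueError("stored_bytes must be positive")
    return raw_bytes / stored_bytes

# E.g., a 4 GB raw dataset stored in 500 MB gives an 8x ratio.
print(compression_ratio(4_000_000_000, 500_000_000))  # 8.0
```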

Benchmark Extension

TSM-Bench allows seamless integration of new systems. We provide a step-by-step tutorial on how to integrate your system into the benchmark.

Should users wish, new queries can also be added to the benchmark. They must be added under each system's {system}/queries.sql file. Note that the order of the queries should be respected (e.g., q8 is the eighth query in the file).
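The positional convention can be illustrated with a small sketch that maps the N-th statement in queries.sql to the identifier qN (the `;`-separated file layout is an assumption for illustration):

```python
# Sketch: mapping queries to identifiers by their order in queries.sql.
# Assumes ';'-terminated SQL statements, which may differ from the real files.
def load_queries(sql_text: str) -> dict:
    """Map q1..qN to statements by their order of appearance."""
    stmts = [s.strip() for s in sql_text.split(";") if s.strip()]
    return {f"q{i}": stmt for i, stmt in enumerate(stmts, start=1)}

sample = "SELECT 1;\nSELECT 2;\nSELECT 3;"
q = load_queries(sample)
print(q["q3"])  # SELECT 3
```

This is why appending a new query makes it q8: identifiers are assigned purely by position in the file.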


Time Series Generation

We provide a GAN-based generation method that augments a seed dataset with more and/or longer time series whose properties are akin to the seed ones. The generator can be used either as a pre-trained model or retrained from scratch.


Technical Report

Additional results not reported in the paper can be found here. The additional experiments cover:

  • Advanced analytical queries in SQL and UDF
  • Selection of the evaluated systems
  • Parameterization of the systems
  • Impact of data characteristics

Contributors

Abdelouahab Khelifati ([email protected]), Mourad Khayati ([email protected]) and Luca Althaus.


Citation

@article{DBLP:journals/pvldb/KhelifatiKDDC23,
  author       = {Abdelouahab Khelifati and
                  Mourad Khayati and
                  Anton Dign{\"{o}}s and
                  Djellel Eddine Difallah and
                  Philippe Cudr{\'{e}}{-}Mauroux},
  title        = {TSM-Bench: Benchmarking Time Series Database Systems for Monitoring
                  Applications},
  journal      = {Proc. {VLDB} Endow.},
  volume       = {16},
  number       = {11},
  pages        = {3363--3376},
  year         = {2023},
  url          = {https://www.vldb.org/pvldb/vol16/p3363-khelifati.pdf},
  doi          = {10.14778/3611479.3611532},
  timestamp    = {Mon, 23 Oct 2023 16:16:16 +0200},
  biburl       = {https://dblp.org/rec/journals/pvldb/KhelifatiKDDC23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
