percona-lab / promhouse

PromHouse is a long-term remote storage with built-in clustering and downsampling for Prometheus 2.x on top of ClickHouse.

License: Apache License 2.0

promhouse's Introduction

PromHouse

PromHouse is a long-term remote storage with built-in clustering and downsampling for Prometheus 2.x on top of ClickHouse. Or, rather, it will be someday. Feel free to ~~like, share, retweet,~~ star and watch it, but do not use it in production yet.

Database Schema

CREATE TABLE time_series (
    date Date CODEC(Delta),
    fingerprint UInt64,
    labels String
)
ENGINE = ReplacingMergeTree
    PARTITION BY date
    ORDER BY fingerprint;

CREATE TABLE samples (
    fingerprint UInt64,
    timestamp_ms Int64 CODEC(Delta),
    value Float64 CODEC(Delta)
)
ENGINE = MergeTree
    PARTITION BY toDate(timestamp_ms / 1000)
    ORDER BY (fingerprint, timestamp_ms);

SELECT * FROM time_series WHERE fingerprint = 7975981685167825999;

┌───────date─┬─────────fingerprint─┬─labels─────────────────────────────────────────────────────────────────────────────────┐
│ 2017-12-31 │ 7975981685167825999 │ {"__name__":"up","instance":"promhouse_clickhouse_exporter_1:9116","job":"clickhouse"} │
└────────────┴─────────────────────┴────────────────────────────────────────────────────────────────────────────────────────┘

SELECT * FROM samples WHERE fingerprint = 7975981685167825999 LIMIT 3;

┌─────────fingerprint─┬──timestamp_ms─┬─value─┐
│ 7975981685167825999 │ 1514730532900 │     0 │
│ 7975981685167825999 │ 1514730533901 │     1 │
│ 7975981685167825999 │ 1514730534901 │     1 │
└─────────────────────┴───────────────┴───────┘

Time series in Prometheus are identified by label name/value pairs, including the __name__ label, which stores the metric name. The hash of those pairs is called a fingerprint. PromHouse uses the same hash algorithm as Prometheus to simplify data migration. During operation, all fingerprints and label name/value pairs are kept in memory for fast access. New time series are written to ClickHouse for persistence. They are also periodically read back from it to discover new time series written by other ClickHouse instances. ReplacingMergeTree ensures there are no duplicates if several ClickHouse instances write the same time series at the same time.
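The fingerprinting and in-memory lookup described above can be sketched roughly as follows. This is a minimal illustration, not PromHouse's actual code: it assumes 64-bit FNV-1a over the sorted label pairs with a 0xFF separator byte, which is how the Prometheus common/model package is generally understood to compute fingerprints. The names fingerprint, separator, and known are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// separator is written after every label name and value so that
// {"a": "bc"} and {"ab": "c"} hash differently (assumed value: 0xFF).
const separator = 0xFF

// fingerprint returns a 64-bit FNV-1a hash over the label pairs,
// visited in sorted name order so that map iteration order does not matter.
func fingerprint(labels map[string]string) uint64 {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	h := fnv.New64a()
	for _, name := range names {
		h.Write([]byte(name))
		h.Write([]byte{separator})
		h.Write([]byte(labels[name]))
		h.Write([]byte{separator})
	}
	return h.Sum64()
}

func main() {
	labels := map[string]string{
		"__name__": "up",
		"instance": "promhouse_clickhouse_exporter_1:9116",
		"job":      "clickhouse",
	}

	// Hypothetical in-memory index from fingerprint to labels.
	known := map[uint64]map[string]string{}

	fp := fingerprint(labels)
	if _, ok := known[fp]; !ok {
		// A series seen for the first time would also be written
		// to the time_series table at this point.
		known[fp] = labels
	}
	fmt.Println(fp, len(known))
}
```

Sorting the label names first is what makes the hash independent of the order in which labels arrive, so the same series always maps to the same row key.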

PromHouse currently stores 24 bytes per sample: 8 bytes for the UInt64 fingerprint, 8 bytes for the Int64 timestamp, and 8 bytes for the Float64 value. The actual compression ratio is about 4.5:1, which works out to about 24/4.5 = 5.3 bytes per sample. Prometheus local storage compresses 16 bytes (timestamp and value) per sample to 1.37 bytes, a ratio of about 12:1.
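The arithmetic behind those figures can be checked with a few lines of Go. This is only a worked example of the numbers quoted above; the helper names bytesPerSample and ratio are made up for illustration.

```go
package main

import "fmt"

// bytesPerSample returns the average stored size of one sample given
// its raw size in bytes and a compression ratio (4.5 means 4.5:1).
func bytesPerSample(rawBytes, compressionRatio float64) float64 {
	return rawBytes / compressionRatio
}

// ratio returns the compression ratio achieved when rawBytes
// shrink to storedBytes.
func ratio(rawBytes, storedBytes float64) float64 {
	return rawBytes / storedBytes
}

func main() {
	// 8 (fingerprint) + 8 (timestamp) + 8 (value) = 24 raw bytes.
	fmt.Printf("PromHouse:  %.1f bytes per sample\n", bytesPerSample(24, 4.5))
	// 8 (timestamp) + 8 (value) = 16 raw bytes stored in 1.37 bytes.
	fmt.Printf("Prometheus: %.1f:1\n", ratio(16, 1.37))
}
```

This prints 5.3 bytes per sample for PromHouse and 11.7:1 for Prometheus, so the 12:1 figure is a rounded value.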

Since ClickHouse v19.3.3 it is possible to use delta and double-delta encoding for compression, which should make storage almost as efficient as TSDB's.

Outstanding features in the roadmap

Downsampling (possible since ClickHouse v18.12.14)
Query hints (possible since Prometheus PR 4122; help wanted: issue #24)

SQL queries

The largest jobs and instances by time series count:

SELECT
    job,
    instance,
    COUNT(*) AS value
FROM time_series
GROUP BY
    visitParamExtractString(labels, 'job') AS job,
    visitParamExtractString(labels, 'instance') AS instance
ORDER BY value DESC LIMIT 10

The largest metrics by time series count (cardinality):

SELECT
    name,
    COUNT(*) AS value
FROM time_series
GROUP BY
    visitParamExtractString(labels, '__name__') AS name
ORDER BY value DESC LIMIT 10

The largest time series by samples count:

SELECT
    labels,
    value
FROM time_series
ANY INNER JOIN
(
    SELECT
        fingerprint,
        COUNT(*) AS value
    FROM samples
    GROUP BY fingerprint
    ORDER BY value DESC
    LIMIT 10
) USING (fingerprint)

promhouse's Issues

Memory usage

Hello!

We are testing PromHouse. We have 25 hosts with 3000 metrics at a 1-minute interval. Memory usage is already at 8 GB and is slowly increasing... We want to use PromHouse on 100 hosts, but we don't understand why it uses that much memory. Can we decrease PromHouse's memory usage?

Experiment with gogoprotobuf

It can be faster.

Add some variance to promhouse-spread

At the moment promhouse-spread stores exactly the same metric values for each generated instance. That leads to an unrealistically good compression ratio of 75:1. We should add some variance to them. Say, ±10% to each metric value that doesn't look like a small natural number or 0.
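One possible shape for that variance is sketched below. It is an illustration only, not promhouse-spread's code: the cutoff for a "small natural number" (smallNaturalLimit) and the function names are assumptions, and the random source is passed in as a function so the behavior is easy to test.

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// smallNaturalLimit is a hypothetical cutoff: non-negative whole numbers
// below it are treated as flags, states, or tiny counts and left untouched.
const smallNaturalLimit = 10

// looksLikeSmallNatural reports whether v is 0 or a small whole number.
func looksLikeSmallNatural(v float64) bool {
	return v >= 0 && v < smallNaturalLimit && v == math.Trunc(v)
}

// addVariance scales v by a factor in [0.9, 1.1), i.e. roughly ±10%,
// unless v looks like a small natural number. next must return values
// in [0, 1); NaN and infinities are passed through unchanged.
func addVariance(next func() float64, v float64) float64 {
	if math.IsNaN(v) || math.IsInf(v, 0) || looksLikeSmallNatural(v) {
		return v
	}
	return v * (0.9 + 0.2*next())
}

func main() {
	// A fixed seed is used only to make the demo repeatable.
	r := rand.New(rand.NewSource(1))
	for _, v := range []float64{0, 1, 3.5, 1514.25} {
		fmt.Println(v, "->", addVariance(r.Float64, v))
	}
}
```

Values such as 0 and 1 (typical for up or boolean-like gauges) stay exact, while larger or fractional values get a different factor per call, which should make generated instances compress less uniformly.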

how to use it?

Hello. Please write a quick tutorial on how to start and use it.
I don't have Docker or Kubernetes.
Thanks

Update README to include CODEC example

Now that PR ClickHouse/ClickHouse#4052 has been merged, the next ClickHouse release, 19.3.3, will already include it.
The README must be updated accordingly.

Rename metrics table to time_series

That's what we actually store there.

Use custom partition key

Fuzz fancy stuff

We have some tricky stuff like custom JSON marshaller/unmarshaller. We should fuzz it.

Some random thoughts on technical aspects of the project

GraphiteMergeTree, if I'm correct, only works properly with a table of exactly 4 columns: name, timestamp, value, date. I'm not sure it's OK to use it with another set of columns, so I'd suggest going with AggregatingMergeTree.
You should allow people to create Replicated and Distributed tables. You also need to take into account that it's generally a bad idea to insert directly into a Distributed table, but it's perfectly OK to read from it.
It doesn't really matter whether you add an array of strings as tags or dynamically create columns that contain tags. In ClickHouse you can specify the index only at table creation time, so all those columns will be outside the index, which won't be efficient in terms of query speed. So it would be much better to have another table that gives you the tags->names mapping.

You should also consider talking with developers in the Telegram chats about ClickHouse (https://t.me/clickhouse_ru or https://t.me/clickhouse_en)

We tried using ClickHouse as a remote storage for Prometheus.

We tried using ClickHouse as a remote storage for Prometheus.
Gauge, Histogram, and so on work fine; however, how should Prometheus Counters be handled?
Is it expected to provide functions like MySQL's lag() or lead()?
e.g. http://www.silota.com/docs/recipes/sql-mom-growth-rate.html

I tried to store metrics with Nested and expected to be able to calculate a growth rate.

CREATE TABLE IF NOT EXISTS app_metrics(
   partition Date DEFAULT toDate(timestamp),
   timestamp DateTime DEFAULT now(),
   instance_id String,
   metric String,
   tags Nested
   (
     key String,
     value String
   ),
   value Float64
  ) ENGINE = MergeTree(partition, (timestamp, metric), 8192);

Use gogoprotobuf extensions

https://github.com/gogo/protobuf/blob/master/extensions.md
Specifically, nullable should give us a noticeable performance boost.

Support read query hints

prometheus/prometheus#2580

What happened to the project?

I see it is not actively maintained. Did you run into any limitations with ClickHouse?

Allow user to select engine

Allow switching between various MergeTree variants, including aggregated and distributed.

Implement fake_exporter

It should scrape upstream exporter and expose new metrics for new instances.

Implement downsampling

Add support for 1.7/1.8 remote storage APIs

If possible and not too invasive.

Extract common labels

__name__
job
instance

Store (ID, value) pairs in separate tables, and store NameID, JobID, and InstanceID in the samples table. Storing them as UInt32 IDs instead of Strings should make storage more efficient.

In theory, that will allow us to optimize simple queries that use only those labels.
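The ID scheme proposed here could look roughly like the sketch below. It is a hypothetical in-memory model of the idea, not an implementation plan: the dictionary type stands in for one (ID, value) table, and the sample struct shows what a row might carry instead of label strings.

```go
package main

import "fmt"

// dictionary assigns a stable 32-bit ID to every distinct string value,
// standing in for one (ID, value) table such as a hypothetical jobs table.
type dictionary struct {
	ids    map[string]uint32
	values []string
}

func newDictionary() *dictionary {
	return &dictionary{ids: make(map[string]uint32)}
}

// id returns the existing ID for value or allocates the next free one.
func (d *dictionary) id(value string) uint32 {
	if id, ok := d.ids[value]; ok {
		return id
	}
	id := uint32(len(d.values))
	d.ids[value] = id
	d.values = append(d.values, value)
	return id
}

// sample is what a row in the samples table could carry under this scheme.
type sample struct {
	NameID, JobID, InstanceID uint32
	TimestampMs               int64
	Value                     float64
}

func main() {
	names, jobs, instances := newDictionary(), newDictionary(), newDictionary()
	s := sample{
		NameID:      names.id("up"),
		JobID:       jobs.id("clickhouse"),
		InstanceID:  instances.id("promhouse_clickhouse_exporter_1:9116"),
		TimestampMs: 1514730532900,
		Value:       0,
	}
	fmt.Printf("%+v\n", s)
}
```

A query that filters only on job or instance could then compare a single UInt32 per row rather than parsing the labels string, which is the optimization the last sentence hints at.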