
danielqsj / kafka_exporter


Kafka exporter for Prometheus

License: Apache License 2.0

Languages: Go 88.89%, Makefile 7.90%, Mustache 2.57%, Dockerfile 0.64%
Topics: prometheus, prometheus-exporter, kafka, kafka-metrics, metrics

kafka_exporter's Introduction

kafka_exporter


Kafka exporter for Prometheus. For other metrics from Kafka, have a look at the JMX exporter.


Compatibility

Supports Apache Kafka version 0.10.1.0 and later.

Dependency

Download

Binaries can be downloaded from the Releases page.

Compile

Build Binary

make

Build Docker Image

make docker

Docker Hub Image

docker pull danielqsj/kafka-exporter:latest

The prebuilt image can be used directly instead of building it yourself (Docker Hub: danielqsj/kafka-exporter).

Run

Run Binary

kafka_exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]

Run Docker Image

docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]

Run Docker Compose

Create a docker-compose.yml file:

services:
  kafka-exporter:
    image: danielqsj/kafka-exporter 
    command: ["--kafka.server=kafka:9092", "[--kafka.server=another-server ...]"]
    ports:
      - 9308:9308     

then run it

docker-compose up -d

Flags

The exporter is configurable using the following flags:

Flag name Default Description
kafka.server kafka:9092 Addresses (host:port) of Kafka server
kafka.version 2.0.0 Kafka broker version
sasl.enabled false Connect using SASL/PLAIN
sasl.handshake true Only set this to false if using a non-Kafka SASL proxy
sasl.username SASL user name
sasl.password SASL user password
sasl.mechanism SASL mechanism can be plain, scram-sha512, scram-sha256
sasl.service-name Service name when using Kerberos Auth
sasl.kerberos-config-path Kerberos config path
sasl.realm Kerberos realm
sasl.keytab-path Kerberos keytab file path
sasl.kerberos-auth-type Kerberos auth type. Either 'keytabAuth' or 'userAuth'
tls.enabled false Connect to Kafka using TLS
tls.server-name Used to verify the hostname on the returned certificates unless tls.insecure-skip-tls-verify is given. The kafka server's name should be given
tls.ca-file The optional certificate authority file for Kafka TLS client authentication
tls.cert-file The optional certificate file for Kafka client authentication
tls.key-file The optional key file for Kafka client authentication
tls.insecure-skip-tls-verify false If true, the server's certificate will not be checked for validity
server.tls.enabled false Enable TLS for web server
server.tls.mutual-auth-enabled false Enable TLS client mutual authentication
server.tls.ca-file The certificate authority file for the web server
server.tls.cert-file The certificate file for the web server
server.tls.key-file The key file for the web server
topic.filter .* Regex that determines which topics to collect
topic.exclude ^$ Regex that determines which topics to exclude
group.filter .* Regex that determines which consumer groups to collect
group.exclude ^$ Regex that determines which consumer groups to exclude
web.listen-address :9308 Address to listen on for web interface and telemetry
web.telemetry-path /metrics Path under which to expose metrics
log.enable-sarama false Turn on Sarama logging
use.consumelag.zookeeper false Set to true to collect consumer group lag from ZooKeeper
zookeeper.server localhost:2181 Address (hosts) of zookeeper server
kafka.labels Kafka cluster name
refresh.metadata 30s Metadata refresh interval
offset.show-all true Whether to show offset/lag for all consumer groups; if false, only connected consumer groups are shown
concurrent.enable false If true, all scrapes trigger Kafka operations; otherwise they share results. Warning: this should be disabled on large clusters
topic.workers 100 Number of topic workers
verbosity 0 Verbosity log level
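As an illustration, several of the flags above can be combined in one invocation. A minimal sketch, assuming a SASL/SCRAM-secured, TLS-enabled cluster; the server addresses, credentials, and CA path below are placeholders, not real values:

```shell
# Placeholder values throughout: adjust servers, credentials, and paths
# to your environment before running.
kafka_exporter \
  --kafka.server=kafka-1:9092 --kafka.server=kafka-2:9092 \
  --kafka.version=2.0.0 \
  --sasl.enabled --sasl.mechanism=scram-sha512 \
  --sasl.username=monitor --sasl.password=secret \
  --tls.enabled --tls.ca-file=/etc/kafka/ca.pem \
  --topic.filter='.*' --group.filter='.*' \
  --web.listen-address=:9308
```

Quoting the regex flags protects them from shell globbing when they contain metacharacters.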

Notes

Boolean flags are managed by Kingpin. Each boolean flag has a negative complement: --<name> and --no-<name>.

For example:

If you need to disable sasl.handshake, add the flag --no-sasl.handshake.
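Both forms of a boolean flag look like this (kafka:9092 is a placeholder address):

```shell
# Explicit positive form (same as the default for sasl.handshake)
kafka_exporter --kafka.server=kafka:9092 --sasl.handshake

# Kingpin's negative complement disables it
kafka_exporter --kafka.server=kafka:9092 --no-sasl.handshake
```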

Metrics

Documentation of the exposed Prometheus metrics.

For details on the underlying metrics please see Apache Kafka.

Brokers

Metrics details

Name Exposed information
kafka_brokers Number of Brokers in the Kafka Cluster

Metrics output example

# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 3
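Output like the above can be pulled from a running exporter with a plain HTTP request; a sketch assuming the default listen address (:9308) and telemetry path (/metrics):

```shell
# Fetch the metrics page and keep only the kafka_brokers sample
curl -s http://localhost:9308/metrics | grep '^kafka_brokers'
```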

Topics

Metrics details

Name Exposed information
kafka_topic_partitions Number of partitions for this Topic
kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
kafka_topic_partition_in_sync_replica Number of In-Sync Replicas for this Topic/Partition
kafka_topic_partition_leader Leader Broker ID of this Topic/Partition
kafka_topic_partition_leader_is_preferred 1 if Topic/Partition is using the Preferred Broker
kafka_topic_partition_replicas Number of Replicas for this Topic/Partition
kafka_topic_partition_under_replicated_partition 1 if Topic/Partition is under Replicated

Metrics output example

# HELP kafka_topic_partitions Number of partitions for this Topic
# TYPE kafka_topic_partitions gauge
kafka_topic_partitions{topic="__consumer_offsets"} 50

# HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_current_offset gauge
kafka_topic_partition_current_offset{partition="0",topic="__consumer_offsets"} 0

# HELP kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_oldest_offset gauge
kafka_topic_partition_oldest_offset{partition="0",topic="__consumer_offsets"} 0

# HELP kafka_topic_partition_in_sync_replica Number of In-Sync Replicas for this Topic/Partition
# TYPE kafka_topic_partition_in_sync_replica gauge
kafka_topic_partition_in_sync_replica{partition="0",topic="__consumer_offsets"} 3

# HELP kafka_topic_partition_leader Leader Broker ID of this Topic/Partition
# TYPE kafka_topic_partition_leader gauge
kafka_topic_partition_leader{partition="0",topic="__consumer_offsets"} 0

# HELP kafka_topic_partition_leader_is_preferred 1 if Topic/Partition is using the Preferred Broker
# TYPE kafka_topic_partition_leader_is_preferred gauge
kafka_topic_partition_leader_is_preferred{partition="0",topic="__consumer_offsets"} 1

# HELP kafka_topic_partition_replicas Number of Replicas for this Topic/Partition
# TYPE kafka_topic_partition_replicas gauge
kafka_topic_partition_replicas{partition="0",topic="__consumer_offsets"} 3

# HELP kafka_topic_partition_under_replicated_partition 1 if Topic/Partition is under Replicated
# TYPE kafka_topic_partition_under_replicated_partition gauge
kafka_topic_partition_under_replicated_partition{partition="0",topic="__consumer_offsets"} 0

Consumer Groups

Metrics details

Name Exposed information
kafka_consumergroup_current_offset Current Offset of a ConsumerGroup at Topic/Partition
kafka_consumergroup_lag Current Approximate Lag of a ConsumerGroup at Topic/Partition
kafka_consumergroupzookeeper_lag_zookeeper Current Approximate Lag(zookeeper) of a ConsumerGroup at Topic/Partition

Important Note

To collect the kafka_consumergroupzookeeper_lag_zookeeper metric, you must set the following flags:

  • use.consumelag.zookeeper: enables collecting consumer lag from ZooKeeper
  • zookeeper.server: address for connecting to ZooKeeper
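For example, a sketch of an invocation with both flags set (the server addresses are placeholders):

```shell
# Enable ZooKeeper-based consumer lag collection; addresses are placeholders.
kafka_exporter \
  --kafka.server=kafka:9092 \
  --use.consumelag.zookeeper \
  --zookeeper.server=zookeeper:2181
```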

Metrics output example

# HELP kafka_consumergroup_current_offset Current Offset of a ConsumerGroup at Topic/Partition
# TYPE kafka_consumergroup_current_offset gauge
kafka_consumergroup_current_offset{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} -1

# HELP kafka_consumergroup_lag Current Approximate Lag of a ConsumerGroup at Topic/Partition
# TYPE kafka_consumergroup_lag gauge
kafka_consumergroup_lag{consumergroup="KMOffsetCache-kafka-manager-3806276532-ml44w",partition="0",topic="__consumer_offsets"} 1

Grafana Dashboard

Grafana Dashboard ID: 7589, name: Kafka Exporter Overview.

For details of the dashboard please see Kafka Exporter Overview.

Contribute

If you like Kafka Exporter, please give it a star. This helps more people discover Kafka Exporter.

Please feel free to send me pull requests.

Contributors ✨

Thanks goes to these wonderful people:

Star ⭐

Stargazers over time

Donation

Your donation will encourage me to continue improving Kafka Exporter. Donations via Alipay are supported.

License

Code is licensed under the Apache License 2.0.

kafka_exporter's People

Contributors

alesj avatar alx-th avatar avanier avatar caizhihao1 avatar chenwumail avatar crypto89 avatar danielqsj avatar darklore avatar dependabot[bot] avatar gpaggi avatar hateeyan avatar iamgd67 avatar jorgelbg avatar joway avatar matheusdaluz avatar mihail-i4v avatar nandanrao avatar nikunjy avatar orlandoburli avatar piclemx avatar qclaogui avatar rashmichandrashekar avatar saurabhshendye avatar sidong-wei avatar skiloop avatar sknot-rh avatar spirin22 avatar squ1d123 avatar wangqinghuan avatar zeoses avatar


kafka_exporter's Issues

sasl.username/password required by default; can't see lag in metrics

Hello, I found two issues in v1.0:

kafka_exporter requires sasl.username/password even when sasl.enabled is not set.

To work around it, I just added --sasl.username=" " --sasl.password=" " (empty strings) to the run command.

  2. I can't see lag in the metrics and can't understand why. These metrics are absent from my exporter:
    kafka_consumergroup_current_offset
    kafka_consumergroup_lag

kafka 0.10.2.1
kafka-exporter 1.0.0
Command line used to run:
kafka_exporter --kafka.server=localhost:9092 --log.level="debug" --sasl.username=" " --sasl.password=" "

Failed to read SASL handshake header : EOF

I need to use the following configuration to connect to my Kafka broker (e.g. as used for kafka-console-consumer.sh):

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="xxx" password="yyy";

I hit "Failed to read SASL handshake header : EOF" when using Kafka exporter with --sasl.enabled --sasl.username=xxx --sasl.password=yyy.

I'd appreciate suggestions on how to resolve the problem.

Invalid kafka_consumergroup_lag reported when there is no current-offset.

The kafka_consumergroup_lag reported by the kafka_exporter is invalid when there is no current offset for a partition. Here is an example with a topic with 3 partitions, but only 1 consumer.
When I perform the command: kafka-consumer-groups.sh --describe --group g I get the following results:

TOPIC  PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID                                       HOST  CLIENT-ID
a      0          -               3083            -    consumer-17-4dfbf580-772d-4e3d-89fb-1ecbe2f2ec6d  /...  consumer-17
a      1          -               0               -    consumer-33-c3a75e00-d6ed-4725-842e-30753082566a  /...  consumer-33
a      2          4388            4388            0    consumer-4-2df6bdff-b47b-47f7-918f-1f2721325c91   /...  consumer-4

As you can see, the lag for partitions that are not consumed is denoted with '-'.

These are the metrics reported by the kafka_exporter:

kafka_consumergroup_current_offset{consumergroup="g",partition="0",topic="a"} -1
kafka_consumergroup_current_offset{consumergroup="g",partition="1",topic="a"} -1
kafka_consumergroup_current_offset{consumergroup="g",partition="2",topic="a"} 4388
kafka_consumergroup_lag{consumergroup="g",partition="0",topic="a"} 3084
kafka_consumergroup_lag{consumergroup="g",partition="1",topic="a"} 1
kafka_consumergroup_lag{consumergroup="g",partition="2",topic="a"} 0

As you can see, for partition 0 we get a lag of 3084 and for partition 1 a lag of 1. Both are incorrect: in the kafka-consumer-groups.sh output the lag is denoted with a '-', i.e. it is unknown. So it is strange to get a lag of 3084 or 1 back from the kafka_exporter.

Topic being recreated after deletion

Version: danielqsj/kafka-exporter:v1.1.0

Observation

I have a hunch that there is a race condition:
Step 1. kafka-exporter requests the current topics.
Step 2. Someone deletes a topic.
Step 3. kafka-exporter requests metadata for the deleted topic, which causes the topic to be recreated with default parameters.

Reproduce

While kafka exporter is running.

Step 1: Create Topic
kafka-topics --zookeeper $ZOOKEEPER --create --topic delete-me --partitions 3 --replication-factor 1

Step 2: Delete Topic
kafka-topics --zookeeper $ZOOKEEPER --delete --topic delete-me

Step 3: List topics; wait 0-30 seconds
kafka-topics --zookeeper $ZOOKEEPER --list

Can't connect to Kafka

➜ kafka docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=localhost:9092 --kafka.server=localhost:9093 --kafka.server=localhost:9094
INFO[0000] Starting kafka_exporter (version=1.1.0, branch=HEAD, revision=b87796c32ab4376fe6d47fb17df6a41cca21b046) source="kafka_exporter.go:463"
INFO[0000] Build context (go=go1.9, user=travis@travis-job-a0256c7c-80b7-43b1-8756-c973a09fc3c9, date=20180521-08:56:09) source="kafka_exporter.go:464"
ERRO[0000] Error Init Kafka Client source="kafka_exporter.go:209"
panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

goroutine 1 [running]:
main.NewExporter(0xc420144440, 0x3, 0x4, 0x100, 0x8ffc58, 0x0, 0x8ffc58, 0x0, 0x0, 0x8ffc58, ...)
/home/travis/gopath/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:210 +0x94a
main.main()
/home/travis/gopath/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:470 +0x1f95

Noticed Abnormal GC/Memory Usage

All three lines represent the resident memory size (process_resident_memory_bytes{}) reported from the same environment/cluster. I am not sure why it regulates itself in certain cases and not in others. I am looking further into it, but wanted to share the current situation in case anyone has any insight.

4w time period: process_resident_memory_bytes{}


Error when connecting via SSL with client certificate

Enabling TLS and verifying the server certificate with tls.ca-file seems to work fine. As soon as I pass tls.key-file and tls.cert-file, I receive the following log output:

time="2018-07-26T13:23:34Z" level=info msg="Starting kafka_exporter (version=1.2.0, branch=HEAD, revision=830660212e6c109e69dcb1cb58f5159fe3b38903)" source="kafka_exporter.go:474"
time="2018-07-26T13:23:34Z" level=info msg="Build context (go=go1.9, user=travis@travis-job-2baddb7a-390a-490a-9723-d39790ff6c41, date=20180707-14:33:44)" source="kafka_exporter.go:475"
time="2018-07-26T13:23:35Z" level=error msg="Error Init Kafka Client" source="kafka_exporter.go:210"
panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

Connecting to the cluster using the exact same certificates is possible with the Python client.

Missing ConsumerGroup metrics

I am using:
kafka_exporter: 1.1.0
Kafka Version: 0.10.1.0

./kafka_exporter --kafka.server=127.0.0.1:9092 --kafka.version=0.10.1.0 --log.level=debug
INFO[0000] Starting kafka_exporter (version=1.1.0, branch=HEAD, revision=84cee7e0672f0161c05a93557cc2794a48c8a024) source="kafka_exporter.go:463"
INFO[0000] Build context (go=go1.10.1, user=root@1df3c8748bcd, date=20180407-12:31:56) source="kafka_exporter.go:464"
INFO[0000] Done Init Clients source="kafka_exporter.go:212"
INFO[0000] Listening on :9308 source="kafka_exporter.go:488"

I also tried --log.enable-sarama but I didn't get anything helpful:

[root@ip-10-89-1-90 kafka_exporter-1.1.0.linux-amd64]# ./kafka_exporter --kafka.server=127.0.0.1:9092 --kafka.version=0.10.1.0 --log.enable-sarama
INFO[0000] Starting kafka_exporter (version=1.1.0, branch=HEAD, revision=84cee7e0672f0161c05a93557cc2794a48c8a024) source="kafka_exporter.go:463"
INFO[0000] Build context (go=go1.10.1, user=root@1df3c8748bcd, date=20180407-12:31:56) source="kafka_exporter.go:464"
[sarama] 2018/06/06 01:56:45 Initializing new client
[sarama] 2018/06/06 01:56:45 client/metadata fetching metadata for all topics from broker 127.0.0.1:9092
[sarama] 2018/06/06 01:56:45 Connected to broker at 127.0.0.1:9092 (unregistered)
[sarama] 2018/06/06 01:56:45 client/brokers registered new broker #0 at 127.0.0.1:9092
[sarama] 2018/06/06 01:56:45 Successfully initialized new client
INFO[0000] Done Init Clients source="kafka_exporter.go:212"
INFO[0000] Listening on :9308 source="kafka_exporter.go:488"
[sarama] 2018/06/06 01:56:47 client/metadata fetching metadata for all topics from broker 127.0.0.1:9092
[sarama] 2018/06/06 01:56:47 Connected to broker at 127.0.0.1:9092 (registered as #0)
[sarama] 2018/06/06 01:56:47 Closed connection to broker 127.0.0.1:9092
[sarama] 2018/06/06 01:56:54 client/metadata fetching metadata for all topics from broker 127.0.0.1:9092
[sarama] 2018/06/06 01:56:54 Connected to broker at 127.0.0.1:9092 (registered as #0)
[sarama] 2018/06/06 01:56:54 Closed connection to broker 127.0.0.1:9092

Metrics output:

# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 3.7732e-05
go_gc_duration_seconds{quantile="0.25"} 3.7732e-05
go_gc_duration_seconds{quantile="0.5"} 4.5654e-05
go_gc_duration_seconds{quantile="0.75"} 0.000156033
go_gc_duration_seconds{quantile="1"} 0.000156033
go_gc_duration_seconds_sum 0.000239419
go_gc_duration_seconds_count 3

# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10

# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 568448

# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 7.617624e+06

# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.443617e+06

# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 16058

# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 405504

# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 568448

# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 4.431872e+06

# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.400832e+06

# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 5217

# HELP go_memstats_heap_released_bytes_total Total number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes_total counter
go_memstats_heap_released_bytes_total 0

# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 5.832704e+06

# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5282498239396126e+09

# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 80

# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 21275

# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 3472

# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384

# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 24472

# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 49152

# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06

# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 778711

# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 458752

# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 458752

# HELP go_memstats_sys_bytes Number of bytes obtained by system. Sum of all system allocations.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 8.984824e+06

# HELP kafka_brokers Number of Brokers in the Kafka Cluster.
# TYPE kafka_brokers gauge
kafka_brokers 1

# HELP kafka_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which kafka_exporter was built.
# TYPE kafka_exporter_build_info gauge
kafka_exporter_build_info{branch="HEAD",goversion="go1.10.1",revision="84cee7e0672f0161c05a93557cc2794a48c8a024",version="1.1.0"} 1

# HELP kafka_topic_partition_current_offset Current Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_current_offset gauge
kafka_topic_partition_current_offset{partition="0",topic="venkata1"} 6
kafka_topic_partition_current_offset{partition="1",topic="venkata1"} 6
kafka_topic_partition_current_offset{partition="2",topic="venkata1"} 6

# HELP kafka_topic_partition_in_sync_replica Number of In-Sync Replicas for this Topic/Partition
# TYPE kafka_topic_partition_in_sync_replica gauge
kafka_topic_partition_in_sync_replica{partition="0",topic="venkata1"} 1
kafka_topic_partition_in_sync_replica{partition="1",topic="venkata1"} 1
kafka_topic_partition_in_sync_replica{partition="2",topic="venkata1"} 1

# HELP kafka_topic_partition_leader Leader Broker ID of this Topic/Partition
# TYPE kafka_topic_partition_leader gauge
kafka_topic_partition_leader{partition="0",topic="venkata1"} 0
kafka_topic_partition_leader{partition="1",topic="venkata1"} 0
kafka_topic_partition_leader{partition="2",topic="venkata1"} 0

# HELP kafka_topic_partition_leader_is_preferred 1 if Topic/Partition is using the Preferred Broker
# TYPE kafka_topic_partition_leader_is_preferred gauge
kafka_topic_partition_leader_is_preferred{partition="0",topic="venkata1"} 1
kafka_topic_partition_leader_is_preferred{partition="1",topic="venkata1"} 1
kafka_topic_partition_leader_is_preferred{partition="2",topic="venkata1"} 1

# HELP kafka_topic_partition_oldest_offset Oldest Offset of a Broker at Topic/Partition
# TYPE kafka_topic_partition_oldest_offset gauge
kafka_topic_partition_oldest_offset{partition="0",topic="venkata1"} 0
kafka_topic_partition_oldest_offset{partition="1",topic="venkata1"} 0
kafka_topic_partition_oldest_offset{partition="2",topic="venkata1"} 0

# HELP kafka_topic_partition_replicas Number of Replicas for this Topic/Partition
# TYPE kafka_topic_partition_replicas gauge
kafka_topic_partition_replicas{partition="0",topic="venkata1"} 1
kafka_topic_partition_replicas{partition="1",topic="venkata1"} 1
kafka_topic_partition_replicas{partition="2",topic="venkata1"} 1

# HELP kafka_topic_partition_under_replicated_partition 1 if Topic/Partition is under Replicated
# TYPE kafka_topic_partition_under_replicated_partition gauge
kafka_topic_partition_under_replicated_partition{partition="0",topic="venkata1"} 0
kafka_topic_partition_under_replicated_partition{partition="1",topic="venkata1"} 0
kafka_topic_partition_under_replicated_partition{partition="2",topic="venkata1"} 0

# HELP kafka_topic_partitions Number of partitions for this Topic
# TYPE kafka_topic_partitions gauge
kafka_topic_partitions{topic="venkata1"} 3

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.02

# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024

# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 8

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 8.835072e+06

# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.52824969805e+09

# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.7281024e+07

I cannot find the following metrics:
kafka_consumergroup_current_offset
kafka_consumergroup_lag

@danielqsj Any ideas on what I might be doing incorrectly? Or is this a known bug?

"Index out of range" error when cluster is rebalancing

panic: runtime error: index out of range

goroutine 729480 [running]:
main.(*Exporter).Collect(0xc4200df0a0, 0xc42eeb6a80)
	/go/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:201 +0x2d15
github.com/danielqsj/kafka_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func2(0xc42ef98230, 0xc42eeb6a80, 0xb3ce60, 0xc4200df0a0)
	/go/src/github.com/danielqsj/kafka_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:433 +0x61
created by github.com/danielqsj/kafka_exporter/vendor/github.com/prometheus/client_golang/prometheus.(*Registry).Gather
	/go/src/github.com/danielqsj/kafka_exporter/vendor/github.com/prometheus/client_golang/prometheus/registry.go:431 +0x2e1

kafka_exporter, version 0.2.0 (branch: HEAD, revision: 9aecc7d)

Slow for clusters with a lot of topics with a lot of partitions.

Working with about 60+ topics with 100 partitions each, it takes about 5 minutes to pull the metrics, and that is only scraping from a single stack. I'll have to test and see how it does with multiple scrapes.

I suspect it would benefit from both parallelism and probably some ability to enable caching.

Health of Cluster

How can I monitor the number of healthy machines in the Kafka cluster?
Why did "kafka_brokers" not change when I killed a machine?
Must I deploy a kafka_exporter for each machine?

INVALID: Removing special characters from metric and label names

If I have a "-" in my Kafka topic name, the metric exported by the exporter is invalid in Prometheus at scrape time.

Need changes in "KafkaCollector.java":
String metricName = topic.replaceAll("[^A-Za-z0-9]","_") + "_" + record.getName().replaceAll("[^A-Za-z0-9]","_");
labelNames.add(entry.getKey().replaceAll("[^A-Za-z0-9]","_"));

Unable to see New Topics

Whenever I introduce a new topic to the Kafka cluster, it won't show up in the metrics unless I restart the exporter. Is there a timer, or is it not being picked up for some reason? Is this intended?

prometheus output context deadline exceeded

I installed kafka_exporter for a Kafka cluster in a Prometheus environment via Docker Swarm like this:
kafka_exporter:
  image: danielqsj/kafka-exporter:v1.1.0
  volumes:
    - /etc/localtime:/etc/localtime:ro
  command: --kafka.server=10.110.25.212:9094 --kafka.server=10.110.25.213:9094 --kafka.server=10.110.25.214:9094
  ports:
    - 9308:9308
  networks:
    - monitor
Prometheus reports "context deadline exceeded" for kafka_exporter, and I found this error in the exporter log:
time="2018-05-22T14:26:49+08:00" level=error msg="Cannot get leader of topic __consumer_offsets partition 20: kafka server: In the middle of a leadership election, there is currently no leader for this partition and hence it is unavailable for writes." source="kafka_exporter.go:273"

Metrics Do Not Return Consumer Group Lag Time

Hello,

My team really loves this project and we are very thankful that you wrote this.

We are seeing one slight issue where the exporter is not reporting on Consumer Group Lag Time.
Here are the metrics we are not seeing:

kafka_consumergroup_lag
kafka_consumergroup_current_offset

All of the other metrics listed in the README have been observed.

We are running this on Kubernetes 1.7 using the Docker image found in the documentation:
https://hub.docker.com/r/danielqsj/kafka-exporter/

Here is the version of the image that we are using:
danielqsj/kafka-exporter:1.0.1

The version of Kafka that we are using is 0.10.0.1.
We also tried an updated version of Kafka, at version 0.10.1.0 and still encountered the same issues.

Here is the command we are running when starting the container:

/bin/kafka_exporter --kafka.server=host1:9092 --kafka.server=host2:9092 --kafka.server=host3:9092 --log.enable-sarama --log.level=debug --topic.filter=(topic1|topic2)

We appreciate you taking a look into this issue, please let us know if we can provide any additional diagnostic information to solve this.

Thank You,
Matt

Cache the topics metadata for some time to reduce load on the cluster

When the exporter is connected to a big cluster each call to the RefreshMetadata() method is expensive because Kafka needs to pull data from the entire cluster. Since the metadata info doesn't change very often it would be helpful to be able to control this with a refresh interval.

This means that we could keep a low scrape interval on the Prometheus side for the lag/offsets and just query the metadata every once in a while. The specific configuration would be dependent on how often a new topic/partition is added/changed in the cluster.

We could use a custom flag like --refresh.metadata with a default value of 30s. Still, the metrics would be accurate because we don't cache the lag/offsets, only the metadata.

panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

[root@hadoop001 kafka_exporter-1.0.1.linux-amd64]# ./kafka_exporter \
> --kafka.server=192.169.137.141:9092 \
> --kafka.server=192.169.137.142:9092 \
> --kafka.server=192.169.137.143:9092 

INFO[0000] Starting kafka_exporter (version=1.0.1, branch=HEAD, revision=f1639a649ebcfe11bce6782ab281a59d05b6d9e9) source="kafka_exporter.go:432"
INFO[0000] Build context (go=go1.9.2, user=root@bc4118dc5bad, date=20180112-13:07:13) source="kafka_exporter.go:433"

Error Init Kafka Client
panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

goroutine 1 [running]:
main.NewExporter(0xc42000ea80, 0x3, 0x4, 0x100, 0x8ea778, 0x0, 0x8ea778, 0x0, 0x0, 0x8ea778, ...)
/go/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:205 +0x92d
main.main()
/go/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:439 +0x1d6c
[root@hadoop001 kafka_exporter-1.0.1.linux-amd64]#

Update Prometheus server?

I am unable to add it as a data source using both Grafana 5.1.5 and 5.2.2.
All I am seeing is an error icon with no text.

Sometimes, kafka_exporter cannot connect to brokers and generates too many sockets.

Usually it works well.

Sometimes it cannot connect to the brokers, opens too many sockets (up to about 1,000), and generates the following error messages.

Whenever I restart kafka_exporter, it works well again.


Jul 20 12:36:38 172.27.115.150 kafka_exporter[20251]: time="2018-07-20T12:36:38+09:00" level=error msg="Can't get current offset of topic test partition %!s(int32=10): kafka: broker not connected" source="kafka_exporter.go:271"
...


Get log size

Hello, how can I get the number of messages per partition?
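Assuming the exporter's current-offset and oldest-offset gauges behave as the README describes, the number of retained messages in a partition is roughly the difference between them. A minimal sketch of that arithmetic:

```python
def partition_log_size(current_offset, oldest_offset):
    """Approximate number of retained messages in a partition, computed
    from the exporter's current/oldest offset gauges (metric names and
    semantics assumed from the README, hedged accordingly)."""
    return max(current_offset - oldest_offset, 0)
```

In Prometheus itself the equivalent query would be along the lines of `kafka_topic_partition_current_offset - kafka_topic_partition_oldest_offset`, grouped by topic and partition.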

Can't monitor Kafka version 0.8.2

I got

ERRO[0000] Error Init Kafka Client source="kafka_exporter.go:170"
panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)

when connecting to Kafka 0.8.2.

No metrics when topics or partitions are not available in the Kafka cluster

I noticed errors like,

time="2018-06-21T14:39:09Z" level=error msg="Cannot get in-sync replicas of topic xxxx xx: kafka server: Replica information not available, one or more brokers are down." source="kafka_exporter.go:312"

I don't understand what the point of using this exporter is when it itself fails whenever a single partition or topic is unavailable.

Consumer lag as time.

The current consumer lag is just a count of missed messages, but sometimes it is useful to know how far back in time a consumer group is. If I look at a consumer that is 10,000 messages behind, I can't easily tell if the client is 1 minute/hour/day behind.

I'm not sure how hard it would be to export the timestamp of the latest offset and the consumer's current offset, but it would be really useful.
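One way such a metric could be derived, sketched under stated assumptions: periodically sample (offset, timestamp) pairs at the head of each partition, then interpolate the consumer's position into that series. The exporter does not do this today; the function and its inputs are entirely hypothetical.

```python
import bisect

def estimate_time_lag(samples, consumer_offset, latest_offset, latest_ts):
    """Rough 'lag in seconds' estimate from periodic (offset, unix_ts)
    samples of a partition head. Hypothetical: sketches what a time-based
    lag metric could look like, not something the exporter implements.
    samples must be sorted by offset."""
    if consumer_offset >= latest_offset:
        return 0.0
    offsets = [o for o, _ in samples]
    i = bisect.bisect_right(offsets, consumer_offset) - 1
    if i < 0:
        # Consumer is behind our oldest sample; lag is at least this much.
        return latest_ts - samples[0][1]
    o0, t0 = samples[i]
    o1, t1 = samples[i + 1] if i + 1 < len(samples) else (latest_offset, latest_ts)
    if o1 == o0:
        return latest_ts - t0
    # Linear interpolation between the two surrounding samples.
    frac = (consumer_offset - o0) / (o1 - o0)
    consumer_ts = t0 + frac * (t1 - t0)
    return latest_ts - consumer_ts
```

The accuracy depends on how evenly messages arrive between samples, but even a coarse estimate answers the "minute/hour/day behind?" question.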

Unable to understand lag results.

I am unable to understand the consumer lag metrics.
Sometimes the value column reports positive, negative, and 0 values.

What is positive lag? What is negative lag?

kafka_consumergroup_lag{consumergroup="Service",instance="172.28.18.166:9308",job="qa-kafka",partition="0",topic="asset"} -2

kafka_consumergroup_lag{consumergroup="Service",instance="172.28.18.166:9308",job="qa-kafka",partition="1",topic="asset"} -1

kafka_consumergroup_lag{consumergroup="Service",instance="172.28.18.166:9308",job="qa-kafka",partition="2",topic="asset"} 0
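A plausible reading of these values, hedged because it depends on exporter internals: Kafka reports a committed offset of -1 when a group has never committed an offset for a partition, and small negative lags can also appear transiently because the consumer offset and the broker's newest offset are fetched at slightly different moments. A hypothetical helper capturing that interpretation:

```python
def interpret_lag(lag):
    """Hypothetical helper for reading kafka_consumergroup_lag values.
    Assumptions (not confirmed by the exporter's docs): negative values
    mean either 'no committed offset for this partition' or a race
    between the two offset reads; 0 means fully caught up."""
    if lag < 0:
        return "no committed offset (or race between offset reads)"
    if lag == 0:
        return "caught up"
    return f"{lag} messages behind"
```

Positive lag is the ordinary case: the count of messages the group has yet to consume.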

Grafana dashboard

This exporter is great because it gives exactly the semantic metrics that are useful with Kafka (rather than the generic JMX ones).

Is there available a Grafana dashboard preconfigured for it?

Filter Topics

Reading through the code, it doesn't appear that you can filter which topics are collected. Is this possible?

If not, it would be nice to be able to do a regex-based filter on topics. Where "multi-tenancy" is happening in the broker, this would allow collecting metrics on only the topics we are interested in.

Without this, the number of metrics collected is very large.
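The behaviour being asked for (and later provided by the exporter's `--topic.filter` flag) amounts to applying a regex to each topic name. A minimal sketch, assuming whole-name matching, which may differ from the exporter's actual (Go `regexp`) match semantics:

```python
import re

def filter_topics(topics, pattern):
    """Regex-based topic filtering, as --topic.filter is expected to
    behave. Whole-name matching is an assumption here; the exporter's
    own implementation may match substrings instead."""
    rx = re.compile(pattern)
    return [t for t in topics if rx.fullmatch(t)]
```

In a multi-tenant broker this lets one exporter instance scrape only, say, one tenant's topics.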

Kerberos support

Hi,

Is anyone using this exporter against a kerberized cluster?

We have a kerberized cluster with TLS enabled. We had some issues with TLS, but with
info from [https://github.com//issues/51] those seem solved.
I'm not sure, though, as I still get:
panic: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
I do a kinit before starting the exporter.

Tried various combinations of the following:

./kafka_exporter --kafka.server=node1:9093
--kafka.server=node2:9093
--tls.enabled
--tls.ca-file=//ca.crt
--tls.cert-file=//kafka-exporter-cert.pem
--tls.key-file=//kafka-exporter-key.pem
--web.listen-address=9308

--sasl.enabled
--sasl.username=
--sasl.password= \

--no-sasl.handshake

any ideas?

Regards

Jan

SSL kafka exporter

Hello,
Can the same exporter be used for SSL kafka with listeners on 9093?

How to use SSL connection

At my site, I use SSL to connect to the Kafka cluster, e.g.:

security.protocol=SSL
ssl.truststore.location=/kafka-ssl-client/client_java_client.truststore.jks
ssl.keystore.location=/client_java_client.keystore.jks
ssl.truststore.password=test
ssl.keystore.password=test
ssl.key.password=test

How can I configure kafka_exporter to connect successfully?

Can't override default value of kafka:9092 for kafka.server

Hi,

We're unable to completely override the default value of kafka:9092 -- when we pass a kafka.server URI, it seems to append it instead. I think it is due to how Kingpin treats composite types (in this case, a []string) -- check out alecthomas/kingpin#206.

We think this is true because the Sarama client connects to, and logs success for, our passed kafka.server parameter, but also complains that there is no such host at kafka:9092 (which there isn't). That leads me to believe Kingpin is appending our parameter in addition to the default value of kafka:9092:

kingpin.Flag("kafka.server", "Address (host:port) of Kafka server.").Default("kafka:9092").StringsVar(&opts.uri)

We're more than happy to submit a PR for a fix, but how should we do it? We could remove the default value, or adjust the usage of Kingpin to override instead of append.

Thanks!
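The same class of bug can be demonstrated in Python's argparse, which is not the library at fault here but exhibits identical append-to-default behaviour for list-valued flags, along with the workaround the reporters suggest (no default, fallback after parsing):

```python
import argparse

# A list-valued flag with action="append" keeps its default list and
# appends user-supplied values to it -- analogous to Kingpin appending
# to the "kafka:9092" default in this issue.
parser = argparse.ArgumentParser()
parser.add_argument("--kafka-server", action="append", default=["kafka:9092"])
args = parser.parse_args(["--kafka-server", "broker1:9092"])
# args.kafka_server is now ["kafka:9092", "broker1:9092"] -- default kept.

# Workaround: default to None and apply the fallback after parsing,
# so a user-supplied value fully replaces the default.
parser2 = argparse.ArgumentParser()
parser2.add_argument("--kafka-server", action="append", default=None)
args2 = parser2.parse_args(["--kafka-server", "broker1:9092"])
servers = args2.kafka_server or ["kafka:9092"]
```

The second pattern is the shape of the "override instead of append" fix proposed above.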

Cannot get replicas and in-sync replicas of topic when I killed a kafka instance

When I killed a Kafka instance, kafka_exporter reported the following errors:
time="2018-05-08T16:00:00+08:00" level=error msg="Cannot get in-sync replicas of topic test partition 0: kafka server: Replica information not available, one or more brokers are down." source="kafka_exporter.go:312"
time="2018-05-08T16:00:00+08:00" level=error msg="Cannot get replicas of topic 6547921892168160511_192-168-1-88_8250 partition 0: kafka server: Replica information not available, one or more brokers are down." source="kafka_exporter.go:303"

The Kafka cluster has three nodes and the other two nodes are alive, but the kafka_topic_partition_in_sync_replica and kafka_topic_partition_replicas metrics are no longer present.

Using on 2-way SSL secured Kafka cluster: error reading /etc/kafka/secrets/ca-file, certificate and key must be supplied as a pair

Hi, thanks for uploading. I've been using this repo to great effect so far, but am encountering issues when running against an authenticating Kafka Cluster.

Now that my cluster has ACLs and client certs are mandatory, I need to figure out how to get this exporter to make use of client certs. When configuring the tls.ca-file switch, I try pointing it to a CA file that looks something like this:

-----BEGIN CERTIFICATE-----
MIIDkzCCAnugAwIBAgIJAJtmWRyaaaaaaaaaaaaaaaaaaaaaaaaaaa...
................
.................==
-----END CERTIFICATE-----

But I get an error on boot:

error reading /etc/kafka/secrets/ca-file, certificate and key must be supplied as a pair"

After looking into things, it looks like maybe I need to point to a file that consists of server.crt and server.key. I'll give that a try tomorrow, but thought I'd leave breadcrumbs here, since .jks files are the file types I've grown accustomed to after working with Kafka.

High CPU usage

Hi, after deploying to the server I found that CPU usage is very high. How can I solve this? Thanks.

Run into some issues

I ran the exporter but get the following errors. Any help will be appreciated!

ERRO[0349] Can't get current offset of topic <....> partition %!s(int32=1): EOF source="kafka_exporter.go:271"
ERRO[0349] Can't get oldest offset of topic <....> partition %!s(int32=1): EOF source="kafka_exporter.go:283"

Does anyone know how I can solve this?

After running for some hours, can't get metrics

The kafka_exporter error log:
time="2018-02-11T13:48:50+08:00" level=error msg="Can't get current offset of topic vpcprod_sanya_onlineSwitch partition %!s(int32=0): kafka: broker not connected" source="kafka_exporter.go:271"
time="2018-02-11T13:48:50+08:00" level=error msg="Can't get current offset of topic prod_erp_delivery_data partition %!s(int32=1): kafka: broker not connected" source="kafka_exporter.go:271"
time="2018-02-11T13:48:50+08:00" level=error msg="Can't get current offset of topic prod_erp_delivery_data partition %!s(int32=1): kafka: broker not connected" source="kafka_exporter.go:271"
time="2018-02-11T13:48:50+08:00" level=error msg="Can't get current offset of topic vpcprod_sanya_onlineSwitch partition %!s(int32=0): kafka: broker not connected" source="kafka_exporter.go:271"

Metric for tracking timestamp of oldest message

Scenario: Use a size-based retention strategy. The idea is that I want to track the timestamp of the oldest available message in each partition. This way I know how long consumers can be down (or at least not processing) before data is lost unprocessed.

If I'm not mistaken, each message has metadata associated with it containing a timestamp, and if that is the case, it should be straightforward to collect and expose those timestamps.

The idea is that given an increasing amount of throughput, I'd like to make sure that I have at least X amount of time where it is safe for consumers to be down.
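Given such a metric, the "safe downtime window" follows directly: under size-based retention at roughly steady throughput, the age of the oldest retained message approximates how long data survives before being deleted. A sketch of that calculation (the metric itself is only proposed here, not something the exporter currently exports):

```python
def safe_downtime_seconds(oldest_message_ts, now_ts):
    """Approximate how long a caught-up consumer could remain down before
    unprocessed data ages out, given the (proposed, not yet exported)
    timestamp of the oldest retained message in a partition. Assumes
    size-based retention at roughly constant throughput."""
    return max(now_ts - oldest_message_ts, 0.0)
```

In Prometheus this would be an expression like `time() - oldest_message_timestamp`, alertable when it drops below the required X amount of time.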

Not able to run Docker image

Hi,

I am getting the following error while trying to use the Docker image:

sudo docker run -ti --rm -p 9308:9308 danielqsj/kafka-exporter --kafka.server=kafka1ip:9092 --kafka.server=kafka2ip:9092 --kafka.server=kafka3ip:9092
INFO[0000] Starting kafka_exporter (version=1.2.0, branch=HEAD, revision=74450bffe4fd6df1a8c85c54bb2babb7506ac7fc) source="kafka_exporter.go:423"
INFO[0000] Build context (go=go1.9, user=travis@travis-job-999c3d4e-5314-489c-97c2-b78ee2befa68, date=20181008-03:07:38) source="kafka_exporter.go:424"
panic: runtime error: index out of range

goroutine 1 [running]:
main.main()
/home/travis/gopath/src/github.com/danielqsj/kafka_exporter/kafka_exporter.go:429 +0x520c

Can you please help?
Go version: go1.9.3 linux/amd64

[question] No kafka metric in Grafana/prometheus

Thanks for this repo. I successfully deployed the Helm charts prometheus-operator, kube-prometheus, and kafka (tried both images danielqsj/kafka_exporter v1.0.1 and v1.2.0).

Installed mostly with default values; RBAC is enabled.

I can see 3 up nodes in the Kafka target list in Prometheus, but when I go into Grafana, I can't see any Kafka metrics.

Did I miss anything, or what can I check to fix this issue?

env:

kubernetes: AWS EKS (Kubernetes version 1.10.x)
grafana dashboard: kafka overview

Branch

Hello, do I need to create a branch to open a pull request?

Thanks
