opstree / druid-exporter
A Golang-based exporter that captures Druid API metrics and receives Druid-emitted HTTP JSON data.
Home Page: https://opstree.github.io
License: Apache License 2.0
Hi!
I was following the installation guide in the documentation. Some of the metrics are available, but for others the druid-exporter throws these error messages:
{"level":"error","msg":"Error on GET request for druid healthcheck: Get "druid-router.druid:8888/status/health": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot collect data for druid segments: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot retrieve data for druid's workers: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot retrieve data for druid's tasks: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot collect data for druid's supervisors: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:23Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:27Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:31Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/broker","time":"2021-03-04T15:05:33Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:36Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/router","time":"2021-03-04T15:05:37Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:39Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:41Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:41Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:44Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:48Z"}
{"level":"error","msg":"Error on GET request for druid healthcheck: Get "druid-router.druid:8888/status/health": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot collect data for druid segments: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot retrieve data for druid's workers: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot retrieve data for druid's tasks: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot collect data for druid's supervisors: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
Is this an issue with the newest Druid version, 0.20.1?
Hello!
Thank you for your great work!
Would you be able to fix this bug?
{"err":"json: cannot unmarshal number 0.2525 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.180Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.180Z"}
{"err":"json: cannot unmarshal number 0.038 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.497Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.497Z"}
{"err":"json: cannot unmarshal number 0.1245 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.862Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.862Z"}
{"err":"json: cannot unmarshal number 0.052333333333333336 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:58.540Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:58.541Z"}
Druid metrics are missing after commit 7596baf:
git reset --hard && git checkout 7596baf && make build-code && ./druid-exporter --druid.uri=http://localhost:8888 --log.level=debug
# HELP druid_health_status Health of Druid, 1 is healthy 0 is not
# TYPE druid_health_status counter
druid_health_status{druid="health"} 1
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.3313e-05
go_gc_duration_seconds{quantile="0.25"} 7.3313e-05
go_gc_duration_seconds{quantile="0.5"} 7.3313e-05
go_gc_duration_seconds{quantile="0.75"} 7.3313e-05
go_gc_duration_seconds{quantile="1"} 7.3313e-05
go_gc_duration_seconds_sum 7.3313e-05
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
...
git reset --hard && git checkout f7e673f && make build-code && ./druid-exporter --druid.uri=http://localhost:8888 --log.level=debug
# HELP druid_emitted_metrics Druid emitted metrics from druid emitter
# TYPE druid_emitted_metrics gauge
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="compact-task-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jetty-numOpenConnections",service="druid-coordinator"} 6
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-capacity",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-used",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-percent",service="druid-coordinator"} 0.01749650069986003
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-sys",service="druid-coordinator"} 170
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-total",service="druid-coordinator"} 1050
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-user",service="druid-coordinator"} 880
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-cpu",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-capacity",service="druid-coordinator"} 9.9614728e+07
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-init",service="druid-coordinator"} 2.52706824e+08
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-max",service="druid-coordinator"} 2.68435464e+08
...
...
...
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
In production, we frequently see errors similar to this:
2021/02/04 16:13:23 http: panic serving *.*.*.*:29223: interface conversion: interface {} is nil, not string
goroutine 881818 [running]:
net/http.(*conn).serve.func1(0xc000c82280)
/usr/local/go/src/net/http/server.go:1800 +0x139
panic(0x8b91a0, 0xc000494060)
/usr/local/go/src/runtime/panic.go:975 +0x3e3
druid-exporter/listener.DruidHTTPEndpoint.func1(0x9f4600, 0xc0007f61c0, 0xc00021c600)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1596
net/http.HandlerFunc.ServeHTTP(0xc0001372c0, 0x9f4600, 0xc0007f61c0, 0xc00021c600)
/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001a6000, 0x9f4600, 0xc0007f61c0, 0xc00021c400)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc0001b4000, 0x9f4600, 0xc0007f61c0, 0xc00021c400)
/usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc000c82280, 0x9f5cc0, 0xc0010f0740)
/usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2962 +0x35c
time="2021-02-04T16:13:23Z" level=info msg="Successfully collected data from druid emitter, broker"
These errors seem to correspond with blank stretches in the Grafana dashboards that pull this data from Prometheus.
We are running Druid 0.19.0 with the Docker image quay.io/opstree/druid-exporter:v0.8,
deployed to Kubernetes with the Helm chart. A Kubernetes Ingress forwards the emitted metrics into the cluster.
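The panic at listener/druid_endpoint.go:33 is an interface conversion on a nil value, which is what an unchecked type assertion does when a JSON field is missing from an emitter event. A minimal sketch of the failure mode and the comma-ok guard (the "service" field name is an assumption for illustration):

```go
package main

import "fmt"

func main() {
	// A decoded emitter event where the "service" key is absent.
	event := map[string]interface{}{"metric": "query/time"}

	// An unchecked assertion on the missing key would panic with:
	//   interface conversion: interface {} is nil, not string
	// service := event["service"].(string)

	// The comma-ok form fails safely instead of panicking.
	service, ok := event["service"].(string)
	fmt.Println(service, ok) // empty string, false
}
```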
Ok, I've got myself an X-Files case here.
Our druid-exporter suddenly died yesterday around 20:00 (CET).
These are the logs:
Jan 21 18:55:31 druid-exporter-1 druid-prometheus-exporter[2114]: time="2021-01-21T18:55:31Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Main process exited, code=killed, status=9/KILL
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Failed with result 'signal'.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Service RestartSec=100ms expired, scheduling restart.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Scheduled restart job, restart counter is at 1.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Stopped Apache Druid Prometheus Exporter..
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Started Apache Druid Prometheus Exporter..
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid exporter started listening on: 8080"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Metrics endpoint - http://0.0.0.0:8080/metrics"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:8080/druid"
Since then, we've got 0 metrics. Tried reinstalling, rebooting the instance, restarting the router, broker, overlord, coordinator... 0 metrics.
Logs are empty, besides the typical "starting 8080, endpoint X, endpoint Y"
A tcpdump shows that the instance is receiving metrics from druid.
curl-ing the exporter's /metrics endpoint does return metrics, but takes ages; I just timed it:
$ time curl localhost:8080/metrics
[...]
real 1m38.608s
user 0m0.007s
sys 0m0.011s
Any idea what might have happened?
If the Druid cluster is in a state where there are no workers, druid-exporter will crash on
hostname = workers[rand.Intn(len(workers))].hostname()
since rand.Intn takes a strictly positive argument, but workers is [].
Should druid-exporter skip the worker-related metrics when there are none?
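As the report says, rand.Intn panics for any n <= 0, so an empty workers slice is fatal. A minimal sketch of one possible guard (type and field names are hypothetical, not the exporter's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

type worker struct{ host string }

// pickWorker returns a random worker hostname, or ok=false when the
// cluster currently has no workers (rand.Intn panics for n <= 0).
func pickWorker(workers []worker) (string, bool) {
	if len(workers) == 0 {
		return "", false
	}
	return workers[rand.Intn(len(workers))].host, true
}

func main() {
	if _, ok := pickWorker(nil); !ok {
		fmt.Println("no workers: skipping worker metrics this scrape")
	}
}
```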
Currently I have the exporter successfully configured to listen to the Druid cluster and serve the metrics over HTTP, i.e.
http://localhost:8080/metrics
Is there an option to serve them over HTTPS, i.e.
https://localhost:4443/metrics ?
I didn't see an option to add a username & password, or to specify a CA cert, when accessing a Druid coordinator endpoint. Any chance to add those?
Hello!
I use the druid-exporter in my Druid cluster. I run it with ./druid-exporter --druid.uri=http://172.16.1.84:8888 (this is the router node) and visit http://172.16.1.84:8080/metrics. I can see monitoring items, but very few: only some Druid task metrics (druid_completed_tasks, druid_pending_tasks, ...), Druid segment metrics (druid_segment_count, druid_segment_size, ...), druid_health_status, and druid_datasource. How can I configure this to enable more monitoring?
I configured apache-druid-0.20.0/conf/druid/cluster/_common/common.runtime.properties with:
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://172.16.1.84:8080/druid
But when I visit http://172.16.1.84:8080/druid, it is blank, and I can't see any monitoring content.
Please help me: how can I configure it to show more metrics?
Thank you!
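For reference, the emitted-metrics coverage depends on which monitors Druid itself runs; the HTTP emitter only forwards what some monitor actually produces. A common.runtime.properties sketch (the monitor list is just an example, copied from another report on this exporter; adjust per process type):

```properties
# Example monitor list; the HTTP emitter only forwards
# metrics that some enabled monitor generates.
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.client.cache.CacheMonitor"]
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://172.16.1.84:8080/druid
```

Note also that the /druid endpoint is where Druid POSTs emitter events, so a blank page on a browser GET there does not by itself indicate a problem.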
Hey there,
additional task metrics would be useful to support HPA:
* druid_tasks_pending
* druid_tasks_running
* druid_tasks_waiting
* druid_tasks_completed
Any chance that you would implement them?
Best
schmichri
OS: Debian 10
Go: go version go1.15.5 linux/amd64
druid-exporter: downloaded and built 2020/11/13
This shows up in the logs a lot. Judging by the error, there is some issue parsing the IP from Druid:
DRUID_URL=XXX.YYY.ZZZ.AAA
http: panic serving XXX.YYY.ZZZ.AAA:60190: interface conversion: interface {} is nil, not string
goroutine 1074098 [running]:
net/http.(*conn).serve.func1(0xc00055a320)
#011/usr/local/go/src/net/http/server.go:1767 +0x139
panic(0x8f6ba0, 0xc06bd7b5c0)
#011/usr/local/go/src/runtime/panic.go:679 +0x1b2
druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc001110000, 0xc0012a4200)
#011/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
net/http.HandlerFunc.ServeHTTP(0xc00009bea0, 0xa415c0, 0xc001110000, 0xc0012a4200)
#011/usr/local/go/src/net/http/server.go:2007 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000118000, 0xa415c0, 0xc001110000, 0xc0012a4000)
#011/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc00012e000, 0xa415c0, 0xc001110000, 0xc0012a4000)
#011/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc00055a320, 0xa42c80, 0xc00170e000)
#011/usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
#011/usr/local/go/src/net/http/server.go:2927 +0x38e
Hello,
I'm running the exporter using image quay.io/opstree/druid-exporter:v0.6
and Druid 0.18.0.
I am getting the below error when decoding the collected metric.
level=debug msg="Error decoding JSON sent by druid: json: cannot unmarshal array into Go struct field DruidEmittedData.dataSource of type string"
Given the error message, I suspect the exporter cannot parse metrics whose dataSource field contains certain values.
For instance, no druid_emitted_metrics series is created for the metric collected from druid/middleManager with the datasource below:
90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user
However, I can see this datasource in the druid_datasource metric:
druid_datasource{datasource="90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user"}
On the other hand, the metric below is parsed and exposed as prometheus metric.
{metrics 2020-06-17 11:53:57.146 +0000 UTC druid/middleManager 192.168.128.4:8117 0.18.0 ingest/sink/count 90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user_activity 0}
Prometheus metric is:
druid_emitted_metrics{datasource="90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user_activity",service="druid-middleManager", metric_name="ingest-sink-count"}
Is Druid 0.16 not supported by version 0.5?
When using commit db1c4bd,
I got the following stdout logs from druid-exporter.
The command:
/usr/local/bin/druid-exporter --druid.uri=http://10.18.64.247 --port=10086 --log.level=debug
Hi.
I just installed druid-exporter via docker image and getting below error.
2020/10/06 12:15:47 http: panic serving X.X.X.X:48434: interface conversion: interface {} is nil, not string
goroutine 21184 [running]:
net/http.(*conn).serve.func1(0xc0000b60a0)
/usr/local/go/src/net/http/server.go:1800 +0x139
panic(0x8b91a0, 0xc000c083c0)
/usr/local/go/src/runtime/panic.go:975 +0x3e3
druid-exporter/listener.DruidHTTPEndpoint.func1(0x9f4600, 0xc0000d00e0, 0xc0006a6200)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1596
net/http.HandlerFunc.ServeHTTP(0xc00008e4e0, 0x9f4600, 0xc0000d00e0, 0xc0006a6200)
/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0000b0000, 0x9f4600, 0xc0000d00e0, 0xc0006a6000)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc0000d0000, 0x9f4600, 0xc0000d00e0, 0xc0006a6000)
/usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc0000b60a0, 0x9f5cc0, 0xc0002e82c0)
/usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2962 +0x35c
Druid version is 0.19.0, and the config on the Druid side is like this:
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.client.cache.CacheMonitor"]
druid.emitter.logging.logLevel=info
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://exporter:8080/druid
I don't know what I did wrong. Any ideas?
After the latest release (just downloaded and built), I can see this line happening quite often:
Jan 14 11:01:41 druid-exporter-1 druid-prometheus-exporter[2114]: time="2021-01-14T11:01:41Z" level=error msg="Unable to read JSON response: unexpected EOF"
There's a minor typo on the releases page; it's a little confusing.
An error has occurred while serving metrics:
39 error(s) occurred:
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-statistics" > label:<name:"groupd_id" value:"index_kafka_xapi-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"back-ad-srv-statistics" > label:<name:"groupd_id" value:"index_kafka_back-ad-srv-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-elapsed-time" > label:<name:"groupd_id" value:"index_kafka_xapi-elapsed-time" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-inventories" > label:<name:"groupd_id" value:"index_kafka_xapi-inventories" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
...
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"back-ad-srv-statistics" > label:<name:"groupd_id" value:"index_kafka_back-ad-srv-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label valu
Version 0.9
Druid version 0.20.0
panic: invalid argument to Intn

goroutine 23214 [running]:
math/rand.(*Rand).Intn(0xc00009e060, 0x0, 0xc00115e000)
/usr/local/go/src/math/rand/rand.go:169 +0x9c
math/rand.Intn(...)
/usr/local/go/src/math/rand/rand.go:337
druid-exporter/collector.(*MetricCollector).Collect(0xc000376140, 0xc0000ad200)
/go/src/druid-exporter/collector/druid.go:171 +0xad2
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:443 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:454 +0x599
Hi folks, my team and I are trying to send metrics to Prometheus. We see that the exporter is collecting data, however we don't see it in Prometheus. Is there some configuration we have to do?
Using Prometheus Operator
Azure Kubernetes Service
Hello,
Druid emits some of the metrics with dimensions such as datasource, tier, taskid etc. It is possible to use these dimension values as Prometheus metric labels.
It would be beneficial to see metrics by datasource. Is there any plan to add labels (mostly datasource) to exported metrics?
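As a sketch of the idea (not the exporter's actual code), each emitted event's dimension values could be rendered as Prometheus label values; the `event` type and `promLine` helper here are illustrative:

```go
package main

import "fmt"

// event carries the dimension values Druid emits alongside a metric.
type event struct {
	Metric     string
	DataSource string
	Value      float64
}

// promLine renders one event as a Prometheus sample with the
// datasource dimension exposed as a label.
func promLine(e event) string {
	return fmt.Sprintf(`druid_emitted_metrics{datasource=%q, metric_name=%q} %g`,
		e.DataSource, e.Metric, e.Value)
}

func main() {
	fmt.Println(promLine(event{Metric: "segment-count", DataSource: "foo", Value: 12}))
	// → druid_emitted_metrics{datasource="foo", metric_name="segment-count"} 12
}
```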
Hi,
Shortly after startup I get these errors (metrics are collected, but this panic message is logged continuously):
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Druid exporter started listening on: 8080"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Metrics endpoint - http://0.0.0.0:8080/metrics"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:8080/druid"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47654: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 12 [running]:
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve.func1(0xc0001da140)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1767 +0x139
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: panic(0x8f6ba0, 0xc000274720)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/runtime/panic.go:679 +0x1b2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001342a0, 0xc0000ded00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.HandlerFunc.ServeHTTP(0xc0000a7f60, 0xa415c0, 0xc0001342a0, 0xc0000ded00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2007 +0x44
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc000122000, 0xa415c0, 0xc0001342a0, 0xc0000deb00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.serverHandler.ServeHTTP(0xc000134000, 0xa415c0, 0xc0001342a0, 0xc0000deb00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2802 +0xa4
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve(0xc0001da140, 0xa42c80, 0xc00028a280)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1890 +0x875
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: created by net/http.(*Server).Serve
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2927 +0x38e
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47656: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 13 [running]:
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve.func1(0xc0001da1e0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1767 +0x139
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: panic(0x8f6ba0, 0xc0003711a0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/runtime/panic.go:679 +0x1b2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001561c0, 0xc000158c00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.HandlerFunc.ServeHTTP(0xc0000a7f60, 0xa415c0, 0xc0001561c0, 0xc000158c00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2007 +0x44
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc000122000, 0xa415c0, 0xc0001561c0, 0xc000158a00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.serverHandler.ServeHTTP(0xc000134000, 0xa415c0, 0xc0001561c0, 0xc000158a00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2802 +0xa4
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve(0xc0001da1e0, 0xa42c80, 0xc0000690c0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1890 +0x875
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: created by net/http.(*Server).Serve
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2927 +0x38e
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47658: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 33 [running]:
One of our druid-overlord instances failed over to another instance, and since that happened the exporter reported a constant (false) value until I restarted the process.
It seems to me that the exporter keeps reporting the last known value. If that's the case, maybe you could set a TTL, after which it should report null.
Thanks!
Hello,
I congratulate you on the work you are doing. We were wondering if it would be possible to cut a release any time soon; we're interested in this fix: #37
Thanks,
Keep it up.
We're seeing crashes in druid nodes like this:
2020-10-29T13:46:30,713 ERROR [HttpPostEmitter-1] org.apache.druid.java.util.emitter.core.HttpPostEmitter - Failed to send events to url[http://druid-exporter:8080/druid]
Druid exporter output is:
2020/10/29 14:00:06 http: panic serving XX.XX.XX.XX:53522: interface conversion: interface {} is nil, not string
net/http.(*conn).serve.func1(0xc00012cfa0)
/usr/local/go/src/net/http/server.go:1767 +0x139
panic(0x8f6ba0, 0xc000981cb0)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001460e0, 0xc000464200)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
net/http.HandlerFunc.ServeHTTP(0xc00000e480, 0xa415c0, 0xc0001460e0, 0xc000464200)
/usr/local/go/src/net/http/server.go:2007 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000132000, 0xa415c0, 0xc0001460e0, 0xc000464000)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc000146000, 0xa415c0, 0xc0001460e0, 0xc000464000)
/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc00012cfa0, 0xa42c80, 0xc0006ca600)
/usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2927 +0x38e
Any idea why it might be?
Thanks a lot!!!
After applying commit db1c4bd, here are the messages.
The command:
/usr/local/bin/druid-exporter --druid.uri=http://10.18.64.247 --port=10086 --log.level=debug
When tracking query latencies, it is very useful to compute percentiles to understand how the majority of queries are performing and to spot outliers.
Currently, this is not possible because the value emitted by Druid is directly exported at the endpoint scraped by Prometheus:
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical"} 9
It would be more useful to export buckets of metrics for the observed latencies:
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="1"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="5"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="10"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="20"} 6
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="50"} 6
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="100"} 7
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="250"} 10
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="500"} 10
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="1000"} 10
In this case, a query with a latency of < 1 ms would increment the le="1" bucket counter and, since buckets are cumulative, every larger bucket as well. A query that took 3 ms would increment le="5" and all larger buckets, but not le="1".
With this way of exporting latency metrics, it would be possible to compute dynamic percentiles via prometheus with the recommended approach (https://prometheus.io/docs/practices/histograms/#quantiles):
histogram_quantile(0.95, sum(rate(druid_emitted_metrics[5m])) by (le))
Hello, I downloaded druid-exporter (druid-exporter-v0.9-linux-amd64.gz); my system is CentOS 7.2,
and the error is "cannot execute".
Please help me, thank you.
Hi!
My team and I use Druid, ingesting data with the Kafka indexer, and I'd like to track any possible lag on Druid's side.
I've seen that Druid's UI displays the "Total rows" counter on the Datasources pane, and I'd like to bring it into the project. Would it be useful to its users? If yes, I'd like to contribute.
With this data I could use Grafana to measure any kind of bottleneck in the process of gathering data from my origin datasource, passing through Kafka (with Connect, monitored by kafka-lag-exporter) and getting processed in Druid.
Exporter is emitting the below metric :
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="segment-assigned-count",service="coordinator"} 230
But according to the Druid metrics docs, there should be a tier dimension here.
I didn't find any parameter to configure this.
Looking at the druid_endpoint.go file, I see that only four dimensions (datasource, host, metric_name, service) are supported.
Can someone please guide me on what changes are required to get all dimensions?
As discussed in #74, pending/running task counts alone wouldn't help in scaling up/down using HPA. We would also need the total worker capacity of the Druid cluster to decide whether the HPA controller can scale down. An additional metric like druid_worker_capacity_total would be useful.
As the exporter already emits the workers-used metric, this would be a pretty small change. I tried to push my branch to contribute but I got access denied. @iamabhishek-dubey can you help me out here?
The exporter is producing thousands of metrics, and too many metrics can overload Prometheus. We only need some of them. It would be nice to have an option to supply a regexp to exclude metrics that match it (or, alternatively, to include only those that do).
Hi
After updating to v0.4:
{"level":"info","msg":"GET request is successful for druid api","ts":"2020-05-21T06:50:22.761Z","url":"http://data-xxx:8888/druid/indexer/v1/supervisor?full"}
{"level":"info","msg":"Successfully retrieved the data for druid's supervisors tasks","ts":"2020-05-21T06:50:22.761Z"}
{"err":"json: cannot unmarshal array into Go struct field DruidEmittedData.dataSource of type string","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-21T06:50:23.327Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-21T06:50:23.328Z"}
DruidEmittedData.Value takes an int type, while some Druid metrics send out float values.
1. Run druid-exporter-v0.8-linux-amd64 with the following command:
druid-exporter --druid.uri="http://0.0.0.0" --port="10086"
2. Add the following configuration to Druid (version 0.17.1) common.runtime.properties:
druid.emitter=composing
druid.emitter.composing.emitters=["logging", "http"]
druid.emitter.logging.logLevel=info
druid.emitter.http.flushMillis=60000
druid.emitter.http.flushTimeOut=60000
druid.emitter.http.flushCount=500
druid.emitter.http.recipientBaseUrl=http://10.18.64.247:10086
Druid sends the emitter data successfully:
2020-08-13T12:01:21,153 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.153Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"jvm/heapAlloc/bytes","value":574232}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/numEntries","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/sizeBytes","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/hits","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/misses","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/evictions","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/hitRate","value":0.0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/averageBytes","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/timeouts","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/errors","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/put/ok","value":0}
3. Visit http://10.18.64.247:10086/metrics
# HELP druid_health_status Health of Druid, 1 is healthy 0 is not
# TYPE druid_health_status counter
druid_health_status{druid="health"} 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
[... standard go_* and process_* runtime metrics elided ...]
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
I cannot see any druid_emitted_metrics, and druid-exporter prints a 404:
$ ./druid-exporter --druid.uri=http://172.18.19.243:8081 --port=7451 --log.level="trace"
time="2021-01-24T02:44:25Z" level=info msg="Druid exporter started listening on: 7451"
time="2021-01-24T02:44:25Z" level=info msg="Metrics endpoint - http://0.0.0.0:7451/metrics"
time="2021-01-24T02:44:25Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:7451/druid"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid healthcheck"
time="2021-01-24T02:44:52Z" level=debug msg="Successful healthcheck request for druid - http://172.18.19.243:8081/status/health"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/status/health"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/coordinator/v1/datasources?simple"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid segment"
time="2021-01-24T02:44:52Z" level=debug msg="Druid segment's metric data, []"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/workers"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's workers"
time="2021-01-24T02:44:52Z" level=debug msg="Druid workers's metric data, [{{172.18.21.206:8091 0 172.18.21.206 2} 1 [index_kafka_slaprod.1_42bb463eb8d970b_dbbmhjgl]} {{172.18.20.68:8091 0 172.18.20.68 2} 0 []} {{172.18.19.4:8091 0 172.18.19.4 2} 1 [index_kafka_default.0_c5b5622de0f241a_okpnnpbf]}]"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/tasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's tasks"
time="2021-01-24T02:44:52Z" level=debug msg="Druid tasks's metric data, [{index_kafka_slaprod.1_42bb463eb8d970b_eginaogo index_kafka 2021-01-23T14:11:58.724Z SUCCESS NONE 2.8806753e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_lbpjhimb index_kafka 2021-01-23T14:11:32.712Z SUCCESS NONE 2.8807223e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_neomaeel index_kafka 2021-01-23T06:11:52.904Z SUCCESS NONE 2.8806483e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_iagokihf index_kafka 2021-01-23T06:11:26.818Z SUCCESS NONE 2.8806557e+07 default.0} {index_hadoop_slaprod.1_2021-01-23T03:31:48.670Z index_hadoop 2021-01-23T03:31:48.670Z SUCCESS NONE 431484 slaprod.1} {index_hadoop_default.0_2021-01-23T03:30:18.435Z index_hadoop 2021-01-23T03:30:18.435Z SUCCESS NONE 61964 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_ochhnnkp index_kafka 2021-01-22T22:11:46.954Z SUCCESS NONE 2.8806628e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_elnchikp index_kafka 2021-01-22T22:11:20.018Z SUCCESS NONE 2.8807396e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_fllkfpce index_kafka 2021-01-22T14:11:40.540Z SUCCESS NONE 2.8807006e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_ogilknne index_kafka 2021-01-22T14:11:12.759Z SUCCESS NONE 2.8807885e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_bohicikp index_kafka 2021-01-22T06:11:34.804Z SUCCESS NONE 2.8806441e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_fkhbiddb index_kafka 2021-01-22T06:11:04.844Z SUCCESS NONE 2.8808537e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_cplijlpk index_kafka 2021-01-22T04:11:11.944Z FAILED NONE -1 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_hfhbikdf index_kafka 2021-01-22T04:10:41.976Z FAILED NONE -1 default.0} {index_hadoop_slaprod.1_2021-01-22T03:31:46.511Z index_hadoop 2021-01-22T03:31:46.511Z SUCCESS NONE 456767 slaprod.1} {index_hadoop_default.0_2021-01-22T03:30:16.295Z index_hadoop 2021-01-22T03:30:16.295Z SUCCESS NONE 65224 default.0} 
{index_kafka_default.0_c5b5622de0f241a_okpnnpbf index_kafka 2021-01-23T22:11:39.347Z RUNNING RUNNING 0 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_dbbmhjgl index_kafka 2021-01-23T22:12:04.859Z RUNNING RUNNING 0 slaprod.1}]"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/supervisor?full"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid's supervisors"
time="2021-01-24T02:44:52Z" level=error msg="Cannot parse JSON data: json: cannot unmarshal string into Go value of type map[string]interface {}"
time="2021-01-24T02:44:52Z" level=error msg="Druid's API response is not 200, Status Code - 404"
time="2021-01-24T02:44:52Z" level=error msg="Possible issue can be with Druid's URL, Username or Password"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's datasources rows"
time="2021-01-24T02:44:52Z" level=error msg="Cannot parse JSON data: invalid character '<' looking for beginning of value"
$ curl -sS http://172.18.99.134:7451/metrics
An error has occurred while serving metrics:
2 error(s) occurred:
* [from Gatherer #2] collected metric "druid_workers_capacity_used" { label:<name:"pod" value:"172" > label:<name:"version" value:"0" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "druid_workers_capacity_used" { label:<name:"pod" value:"172" > label:<name:"version" value:"0" > gauge:<value:0 > } was collected before with the same name and label values
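The "was collected before with the same name and label values" error means a collector pushed two samples with an identical label set during one scrape. A minimal sketch (the `worker` struct and field names below are hypothetical stand-ins for the Druid workers API response, not the exporter's actual types) of deduplicating by label key before emitting:

```go
package main

import "fmt"

// worker is a simplified stand-in for one entry in Druid's workers API response.
type worker struct {
	Pod          string
	Version      string
	CapacityUsed float64
}

// dedupeByLabels keeps only the last sample per (pod, version) pair, so the
// Prometheus registry never sees two samples with the same label set.
func dedupeByLabels(workers []worker) map[[2]string]float64 {
	out := make(map[[2]string]float64)
	for _, w := range workers {
		out[[2]string{w.Pod, w.Version}] = w.CapacityUsed
	}
	return out
}

func main() {
	workers := []worker{
		{"172", "0", 1},
		{"172", "0", 0}, // the duplicate label set from the error above
	}
	deduped := dedupeByLabels(workers)
	fmt.Println(len(deduped)) // one sample per unique label set
}
```

Collapsing duplicates before calling `Set` on the gauge avoids the registry-level error entirely, whichever value the exporter decides should win.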
I am getting the above errors with the latest release. Not sure if I am missing any configuration on Druid.
Great project thus far! However, I feel that there are some issues regarding the direction the exporter is taking. It seems like there is a mismatch of objectives, i.e. between a Kubernetes environment and a bare-metal environment.
For example, in the emitted metrics, the host parameter has been changed to pod (k8s infra).
// listener/druid-endpoint.go
gauge.With(prometheus.Labels{
	"metric_name": strings.Replace(metricName, "/", "-", 3),
	"service":     strings.Replace(serviceName, "/", "-", 3),
	"datasource":  datasource,
	"pod":         podName,
}).Set(value)
I understand that podName is important in a Kubernetes environment; however, a more generic host label should suffice (correct me if I'm wrong). I'll be submitting a few PRs that restructure this to do a reverse DNS lookup to obtain the pod name, while still allowing host=localhost without errors.
@exherb, kindly review the PR to see if it would solve your use case with k8s.
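The reverse-lookup-with-fallback idea can be sketched with the standard library's `net.LookupAddr` (this is an illustration of the approach, not the code from the PR):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolveHostLabel tries a reverse DNS lookup on the emitted host value
// (useful for recovering pod names in Kubernetes) and falls back to the
// raw string, so bare-metal values like "localhost" pass through untouched.
func resolveHostLabel(host string) string {
	names, err := net.LookupAddr(host)
	if err != nil || len(names) == 0 {
		return host // not an IP, or no PTR record: keep the original value
	}
	// PTR records end with a trailing dot; strip it for a clean label.
	return strings.TrimSuffix(names[0], ".")
}

func main() {
	fmt.Println(resolveHostLabel("localhost"))
}
```

Because the fallback returns the input unchanged, the same code path works for both infrastructures without a k8s-specific label name.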
After downloading and building the latest version (right now) I am seeing this message roughly every 2 seconds.
Nov 13 16:00:08 druid-exporter-1 druid-prometheus-exporter[29815]: time="2020-11-13T16:00:08Z" level=error msg="Unable to read JSON response: unexpected EOF"
Any idea why? Thanks a lot!
With the current implementation of the Helm chart it is not possible to configure Kubernetes resource requests and limits.
see: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Using the Docker image with the latest tag, a POST request to the /druid endpoint gives "404 page not found".
For tag v0.8, it works fine.
Hi, I was trying to deploy using the Helm chart and I don't see any option for a username and password to authenticate with Druid. I get this message:
level=error msg="Possible issue can be with Druid's URL, Username or Password"
Druid's health check endpoints return true for all individual services, but the druid-exporter metric druid_health_status returns a value of 0. What parameters affect this overall health check metric?
druid-exporter requires the Druid query server to be constantly running. Once you kill the service, druid-exporter panics after a healthcheck query is issued.
{"err":"Get \"<QUERY ENDPOINT>:8888/status/health\": dial tcp 123.123.123.123:8888: connect: connection refused","level":"error","msg":"Error while making GET request for druid healthcheck","ts":"2020-05-26T01:47:08.674Z"}
{"level":"info","msg":"GET request is successful on druid healthcheck","ts":"2020-05-26T01:47:08.683Z","url":"<QUERY ENDPOINT>:8888/status/health"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x731e30]
goroutine 336113 [running]:
druid-exporter/utils.GetHealth(0xc0000fa240, 0x32, 0x0)
~/repo/druid-exporter/utils/http.go:22 +0x270
druid-exporter/collector.GetDruidHealthMetrics(0x0)
~/repo/druid-exporter/collector/druid.go:24 +0x179
druid-exporter/collector.(*MetricCollector).Collect(0xc0000bacc0, 0xc000446120)
~/repo/druid-exporter/collector/druid.go:100 +0x37
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
~/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:443 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
~/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:454 +0x599
Extension of issue #19
I don't know if this something to necessarily fix but I do wish to point out something.
When accessing a Druid cluster health endpoint that enforces authentication, you can still reach the health check with bad credentials while skipping TLS verification (-k on curl):
curl -k -u LOLBADUSER:LOLBADPASSWORD https://DRUIDCLUSTERENDPOINT:9088/status/health |jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4 0 4 0 0 133 0 --:--:-- --:--:-- --:--:-- 137
true
I believe some better debugging around the user/password check would help out. We didn't catch this problem because we pass the user & password via environment variables in our Docker configuration. I didn't set up the environment variable (DRUID_PASSWORD) according to:
https://github.com/opstree/druid-exporter/blob/master/utils/http.go#L14
I had it set up as DRUID_PASS, so the password field was empty.
However, since health checks can still return from Druid regardless of authentication, the exporter still appears to work as far as health checks go. I think a debug statement saying "bad username & password" on the response at this line would alleviate a lot of potential problems in the future:
https://github.com/opstree/druid-exporter/blob/master/utils/http.go#L38
if *user != "" && *password != "" {
	req.SetBasicAuth(*user, *password)
}
resp, err := client.Do(req)
if err != nil {
	logrus.Errorf("Error on GET request for druid healthcheck: %v", err)
	return 0
}
logrus.Debugf("Successful healthcheck request for druid - %v", url)
When running
druid-exporter --version
it always reports 0.5. This happens in both v0.8 and v0.9.
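A stale version string like this usually comes from the common Go pattern of a package-level variable meant to be overridden at build time with `-ldflags "-X ..."`; if the build never passes the flag (or hardcodes it), every release reports the old default. A minimal sketch of the pattern (assuming, not confirming, that the exporter uses it):

```go
package main

import "fmt"

// version defaults to a placeholder and is meant to be overridden per release:
//
//	go build -ldflags "-X main.version=v0.9"
//
// If the build script omits the -X flag or pins a literal like "0.5",
// the binary reports the stale value regardless of the actual release tag.
var version = "dev"

func main() {
	fmt.Println("druid-exporter version:", version)
}
```

Checking whether the release pipeline actually passes the current tag through `-ldflags` would be the first thing to verify here.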