opstree / druid-exporter
A Golang-based exporter that captures Druid API metrics and receives Druid-emitted HTTP JSON data.
Home Page: https://opstree.github.io
License: Apache License 2.0
Hi!
I was following the installation guide in the documentation. Some of the metrics are available, but for others the druid-exporter throws these error messages:
{"level":"error","msg":"Error on GET request for druid healthcheck: Get "druid-router.druid:8888/status/health": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot collect data for druid segments: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot retrieve data for druid's workers: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot retrieve data for druid's tasks: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:19Z"}
{"level":"error","msg":"Cannot collect data for druid's supervisors: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:19Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:23Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:27Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:31Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/broker","time":"2021-03-04T15:05:33Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/coordinator","time":"2021-03-04T15:05:36Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/router","time":"2021-03-04T15:05:37Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:39Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:41Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:41Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:44Z"}
{"level":"info","msg":"Successfully collected data from druid emitter, druid/middleManager","time":"2021-03-04T15:05:48Z"}
{"level":"error","msg":"Error on GET request for druid healthcheck: Get "druid-router.druid:8888/status/health": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot collect data for druid segments: Get "druid-router.druid:8888/druid/coordinator/v1/datasources?simple": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot retrieve data for druid's workers: Get "druid-router.druid:8888/druid/indexer/v1/workers": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot retrieve data for druid's tasks: Get "druid-router.druid:8888/druid/indexer/v1/tasks": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Error on making http request for druid: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Possible issue can be with Druid's URL, Username or Password","time":"2021-03-04T15:05:49Z"}
{"level":"error","msg":"Cannot collect data for druid's supervisors: Get "druid-router.druid:8888/druid/indexer/v1/supervisor?full": unsupported protocol scheme "druid-router.druid"","time":"2021-03-04T15:05:49Z"}
Is this an issue with the newest Druid version, 0.20.1?
Hello!
Thank you for your great work!
Would you be able to fix this bug?
{"err":"json: cannot unmarshal number 0.2525 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.180Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.180Z"}
{"err":"json: cannot unmarshal number 0.038 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.497Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.497Z"}
{"err":"json: cannot unmarshal number 0.1245 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:57.862Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:57.862Z"}
{"err":"json: cannot unmarshal number 0.052333333333333336 into Go struct field DruidEmittedData.value of type int","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-18T15:14:58.540Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-18T15:14:58.541Z"}
Druid metrics are missing after commit 7596baf:
git reset --hard && git checkout 7596baf && make build-code && ./druid-exporter --druid.uri=http://localhost:8888 --log.level=debug
# HELP druid_health_status Health of Druid, 1 is healthy 0 is not
# TYPE druid_health_status counter
druid_health_status{druid="health"} 1
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 7.3313e-05
go_gc_duration_seconds{quantile="0.25"} 7.3313e-05
go_gc_duration_seconds{quantile="0.5"} 7.3313e-05
go_gc_duration_seconds{quantile="0.75"} 7.3313e-05
go_gc_duration_seconds{quantile="1"} 7.3313e-05
go_gc_duration_seconds_sum 7.3313e-05
go_gc_duration_seconds_count 1
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
...
git reset --hard && git checkout f7e673f && make build-code && ./druid-exporter --druid.uri=http://localhost:8888 --log.level=debug
# HELP druid_emitted_metrics Druid emitted metrics from druid emitter
# TYPE druid_emitted_metrics gauge
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="compact-task-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jetty-numOpenConnections",service="druid-coordinator"} 6
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-capacity",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-bufferpool-used",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-percent",service="druid-coordinator"} 0.01749650069986003
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-sys",service="druid-coordinator"} 170
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-total",service="druid-coordinator"} 1050
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-cpu-user",service="druid-coordinator"} 880
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-count",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-cpu",service="druid-coordinator"} 0
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-capacity",service="druid-coordinator"} 9.9614728e+07
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-init",service="druid-coordinator"} 2.52706824e+08
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="jvm-gc-mem-max",service="druid-coordinator"} 2.68435464e+08
...
...
...
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
In production, we frequently see errors similar to this:
2021/02/04 16:13:23 http: panic serving *.*.*.*:29223: interface conversion: interface {} is nil, not string
goroutine 881818 [running]:
net/http.(*conn).serve.func1(0xc000c82280)
/usr/local/go/src/net/http/server.go:1800 +0x139
panic(0x8b91a0, 0xc000494060)
/usr/local/go/src/runtime/panic.go:975 +0x3e3
druid-exporter/listener.DruidHTTPEndpoint.func1(0x9f4600, 0xc0007f61c0, 0xc00021c600)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1596
net/http.HandlerFunc.ServeHTTP(0xc0001372c0, 0x9f4600, 0xc0007f61c0, 0xc00021c600)
/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0001a6000, 0x9f4600, 0xc0007f61c0, 0xc00021c400)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc0001b4000, 0x9f4600, 0xc0007f61c0, 0xc00021c400)
/usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc000c82280, 0x9f5cc0, 0xc0010f0740)
/usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2962 +0x35c
time="2021-02-04T16:13:23Z" level=info msg="Successfully collected data from druid emitter, broker"
These errors seem to correspond with blank stretches in the Grafana dashboards that pull this data from Prometheus.
We are running Druid 0.19.0 with the Docker image quay.io/opstree/druid-exporter:v0.8,
deployed to Kubernetes with the Helm chart. A Kubernetes Ingress forwards the emitted metrics into the cluster.
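The panic at listener/druid_endpoint.go:33 is an interface conversion on a nil value, which is what an unchecked type assertion does when a JSON field is missing from an emitter event. A minimal sketch of the failure mode and the comma-ok guard (the "service" field name is an assumption for illustration):

```go
package main

import "fmt"

func main() {
	// A decoded emitter event where the "service" key is absent.
	event := map[string]interface{}{"metric": "query/time"}

	// An unchecked assertion on the missing key would panic with:
	//   interface conversion: interface {} is nil, not string
	// service := event["service"].(string)

	// The comma-ok form fails safely instead of panicking.
	service, ok := event["service"].(string)
	fmt.Println(service, ok) // empty string, false
}
```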
Ok, I've got myself an X-Files case here.
Our druid-exporter suddenly died yesterday around 20:00 (CET).
These are the logs:
Jan 21 18:55:31 druid-exporter-1 druid-prometheus-exporter[2114]: time="2021-01-21T18:55:31Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Main process exited, code=killed, status=9/KILL
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Failed with result 'signal'.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Service RestartSec=100ms expired, scheduling restart.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: druid-prometheus-exporter.service: Scheduled restart job, restart counter is at 1.
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Stopped Apache Druid Prometheus Exporter..
Jan 21 18:56:12 druid-exporter-1 systemd[1]: Started Apache Druid Prometheus Exporter..
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid exporter started listening on: 8080"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Metrics endpoint - http://0.0.0.0:8080/metrics"
Jan 21 18:56:13 druid-exporter-1 druid-prometheus-exporter[11122]: time="2021-01-21T18:56:13Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:8080/druid"
Since then, we've got 0 metrics. Tried reinstalling, rebooting the instance, restarting the router, broker, overlord, coordinator... 0 metrics.
Logs are empty, besides the typical "starting 8080, endpoint X, endpoint Y"
A tcpdump shows that the instance is receiving metrics from druid.
curl-ing the exporter's /metrics endpoint does return metrics, but takes ages; I just timed it:
$ time curl localhost:8080/metrics
[...]
real 1m38.608s
user 0m0.007s
sys 0m0.011s
Any idea what might have happened?
If the Druid cluster is in a state where there are no workers, druid-exporter will crash on
hostname = workers[rand.Intn(len(workers))].hostname()
since rand.Intn takes a strictly positive argument, but workers is [].
Should druid-exporter skip the worker-related metrics when there are none?
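As the report says, rand.Intn panics for any n <= 0, so an empty workers slice is fatal. A minimal sketch of one possible guard (type and field names are hypothetical, not the exporter's actual code):

```go
package main

import (
	"fmt"
	"math/rand"
)

type worker struct{ host string }

// pickWorker returns a random worker hostname, or ok=false when the
// cluster currently has no workers (rand.Intn panics for n <= 0).
func pickWorker(workers []worker) (string, bool) {
	if len(workers) == 0 {
		return "", false
	}
	return workers[rand.Intn(len(workers))].host, true
}

func main() {
	if _, ok := pickWorker(nil); !ok {
		fmt.Println("no workers: skipping worker metrics this scrape")
	}
}
```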
Currently I have the exporter successfully configured to listen to the Druid cluster and serve the metrics over HTTP, i.e.
http://localhost:8080/metrics
Is there an option to serve them over HTTPS, i.e.
https://localhost:4443/metrics ?
I didn't see an option to add a username & password, or to specify a CA cert, when accessing a Druid coordinator endpoint. Any chance to add those?
Hello!
I use the druid-exporter in my Druid cluster. I run it with ./druid-exporter --druid.uri=http://172.16.1.84:8888 (this is the router node) and visit http://172.16.1.84:8080/metrics. I can see monitoring items, but very few: only some Druid task metrics (druid_completed_tasks, druid_pending_tasks, ...), Druid segment metrics (druid_segment_count, druid_segment_size, ...), druid_health_status, and druid_datasource. How can I configure this to enable more monitoring?
I configured apache-druid-0.20.0/conf/druid/cluster/_common/common.runtime.properties with:
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://172.16.1.84:8080/druid
But when I visit http://172.16.1.84:8080/druid, it is blank, and I can't see any monitoring content.
Please help me: how can I configure it to show more metrics?
Thank you!
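For reference, the emitted-metrics coverage depends on which monitors Druid itself runs; the HTTP emitter only forwards what some monitor actually produces. A common.runtime.properties sketch (the monitor list is just an example, copied from another report on this exporter; adjust per process type):

```properties
# Example monitor list; the HTTP emitter only forwards
# metrics that some enabled monitor generates.
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.client.cache.CacheMonitor"]
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://172.16.1.84:8080/druid
```

Note also that the /druid endpoint is where Druid POSTs emitter events, so a blank page on a browser GET there does not by itself indicate a problem.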
Hey there,
additional task metrics would be useful to support HPA:
* druid_tasks_pending
* druid_tasks_running
* druid_tasks_waiting
* druid_tasks_completed
Any chance that you would implement them?
Best
schmichri
OS: Debian 10
Go: go version go1.15.5 linux/amd64
druid-exporter: downloaded and built 2020/11/13
This shows up in the logs a lot. Judging by the error, there is some issue parsing the IP from Druid:
DRUID_URL=XXX.YYY.ZZZ.AAA
http: panic serving XXX.YYY.ZZZ.AAA:60190: interface conversion: interface {} is nil, not string
goroutine 1074098 [running]:
net/http.(*conn).serve.func1(0xc00055a320)
#011/usr/local/go/src/net/http/server.go:1767 +0x139
panic(0x8f6ba0, 0xc06bd7b5c0)
#011/usr/local/go/src/runtime/panic.go:679 +0x1b2
druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc001110000, 0xc0012a4200)
#011/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
net/http.HandlerFunc.ServeHTTP(0xc00009bea0, 0xa415c0, 0xc001110000, 0xc0012a4200)
#011/usr/local/go/src/net/http/server.go:2007 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000118000, 0xa415c0, 0xc001110000, 0xc0012a4000)
#011/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc00012e000, 0xa415c0, 0xc001110000, 0xc0012a4000)
#011/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc00055a320, 0xa42c80, 0xc00170e000)
#011/usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
#011/usr/local/go/src/net/http/server.go:2927 +0x38e
Hello,
I'm running the exporter using image quay.io/opstree/druid-exporter:v0.6
and Druid 0.18.0.
I am getting the below error when decoding the collected metric.
level=debug msg="Error decoding JSON sent by druid: json: cannot unmarshal array into Go struct field DruidEmittedData.dataSource of type string"
Given the error message, I suspect the exporter cannot parse metrics whose dataSource field contains certain values.
For instance, no druid_emitted_metrics series is created for the metric collected from druid/middleManager with the datasource below:
90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user
However, I can see this datasource in the druid_datasource metric:
druid_datasource{datasource="90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user"}
On the other hand, the metric below is parsed and exposed as prometheus metric.
{metrics 2020-06-17 11:53:57.146 +0000 UTC druid/middleManager 192.168.128.4:8117 0.18.0 ingest/sink/count 90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user_activity 0}
Prometheus metric is:
druid_emitted_metrics{datasource="90df4ca9-7ae6-11e8-bf0a-0fb906e37fcf_events_user_activity",service="druid-middleManager", metric_name="ingest-sink-count"}
Is Druid 0.16 not supported by version 0.5?
When using commit db1c4bd,
I got the following stdout logs from druid-exporter.
The command:
/usr/local/bin/druid-exporter --druid.uri=http://10.18.64.247 --port=10086 --log.level=debug
Hi.
I just installed druid-exporter via docker image and getting below error.
2020/10/06 12:15:47 http: panic serving X.X.X.X:48434: interface conversion: interface {} is nil, not string
goroutine 21184 [running]:
net/http.(*conn).serve.func1(0xc0000b60a0)
/usr/local/go/src/net/http/server.go:1800 +0x139
panic(0x8b91a0, 0xc000c083c0)
/usr/local/go/src/runtime/panic.go:975 +0x3e3
druid-exporter/listener.DruidHTTPEndpoint.func1(0x9f4600, 0xc0000d00e0, 0xc0006a6200)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1596
net/http.HandlerFunc.ServeHTTP(0xc00008e4e0, 0x9f4600, 0xc0000d00e0, 0xc0006a6200)
/usr/local/go/src/net/http/server.go:2041 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc0000b0000, 0x9f4600, 0xc0000d00e0, 0xc0006a6000)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc0000d0000, 0x9f4600, 0xc0000d00e0, 0xc0006a6000)
/usr/local/go/src/net/http/server.go:2836 +0xa3
net/http.(*conn).serve(0xc0000b60a0, 0x9f5cc0, 0xc0002e82c0)
/usr/local/go/src/net/http/server.go:1924 +0x86c
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2962 +0x35c
Druid version is 0.19.0, and the config on the Druid side is like this:
druid.monitoring.monitors=["org.apache.druid.java.util.metrics.JvmMonitor", "org.apache.druid.client.cache.CacheMonitor"]
druid.emitter.logging.logLevel=info
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http://exporter:8080/druid
I don't know what I did wrong. Any ideas?
After the latest release (just downloaded and built), I can see this line happening quite often:
Jan 14 11:01:41 druid-exporter-1 druid-prometheus-exporter[2114]: time="2021-01-14T11:01:41Z" level=error msg="Unable to read JSON response: unexpected EOF"
There's a minor typo on the releases page; it's a little confusing.
An error has occurred while serving metrics:
39 error(s) occurred:
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-statistics" > label:<name:"groupd_id" value:"index_kafka_xapi-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"back-ad-srv-statistics" > label:<name:"groupd_id" value:"index_kafka_back-ad-srv-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-elapsed-time" > label:<name:"groupd_id" value:"index_kafka_xapi-elapsed-time" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"xapi-inventories" > label:<name:"groupd_id" value:"index_kafka_xapi-inventories" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label values
...
* collected metric "druid_tasks_duration" { label:<name:"created_time" value:"2020-06-17T10:14:17.654Z" > label:<name:"datasource" value:"back-ad-srv-statistics" > label:<name:"groupd_id" value:"index_kafka_back-ad-srv-statistics" > label:<name:"task_status" value:"RUNNING" > gauge:<value:0 > } was collected before with the same name and label valu
Version 0.9
Druid version 0.20.0
panic: invalid argument to Intn

goroutine 23214 [running]:
math/rand.(*Rand).Intn(0xc00009e060, 0x0, 0xc00115e000)
/usr/local/go/src/math/rand/rand.go:169 +0x9c
math/rand.Intn(...)
/usr/local/go/src/math/rand/rand.go:337
druid-exporter/collector.(*MetricCollector).Collect(0xc000376140, 0xc0000ad200)
/go/src/druid-exporter/collector/druid.go:171 +0xad2
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:443 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:454 +0x599
Hi folks, my team and I are trying to send metrics to Prometheus. We see that the exporter is collecting data, however we don't see it in Prometheus. Is there some configuration we have to do?
Using Prometheus Operator
Azure Kubernetes Service
Hello,
Druid emits some of the metrics with dimensions such as datasource, tier, taskid etc. It is possible to use these dimension values as Prometheus metric labels.
It would be beneficial to see metrics by datasource. Is there any plan to add labels (mostly datasource) to exported metrics?
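As a sketch of the idea (not the exporter's actual code), each emitted event's dimension values could be rendered as Prometheus label values; the `event` type and `promLine` helper here are illustrative:

```go
package main

import "fmt"

// event carries the dimension values Druid emits alongside a metric.
type event struct {
	Metric     string
	DataSource string
	Value      float64
}

// promLine renders one event as a Prometheus sample with the
// datasource dimension exposed as a label.
func promLine(e event) string {
	return fmt.Sprintf(`druid_emitted_metrics{datasource=%q, metric_name=%q} %g`,
		e.DataSource, e.Metric, e.Value)
}

func main() {
	fmt.Println(promLine(event{Metric: "segment-count", DataSource: "foo", Value: 12}))
	// → druid_emitted_metrics{datasource="foo", metric_name="segment-count"} 12
}
```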
Hi,
Shortly after startup I get these errors (metrics are collected, but this panic message is logged continuously):
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Druid exporter started listening on: 8080"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Metrics endpoint - http://0.0.0.0:8080/metrics"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:8080/druid"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:36:59 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:36:59Z" level=info msg="Successfully collected data from druid emitter, druid/broker"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/middleManager"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: time="2020-08-11T06:37:00Z" level=info msg="Successfully collected data from druid emitter, druid/historical"
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47654: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 12 [running]:
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve.func1(0xc0001da140)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1767 +0x139
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: panic(0x8f6ba0, 0xc000274720)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/runtime/panic.go:679 +0x1b2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001342a0, 0xc0000ded00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.HandlerFunc.ServeHTTP(0xc0000a7f60, 0xa415c0, 0xc0001342a0, 0xc0000ded00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2007 +0x44
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc000122000, 0xa415c0, 0xc0001342a0, 0xc0000deb00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.serverHandler.ServeHTTP(0xc000134000, 0xa415c0, 0xc0001342a0, 0xc0000deb00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2802 +0xa4
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve(0xc0001da140, 0xa42c80, 0xc00028a280)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1890 +0x875
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: created by net/http.(*Server).Serve
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2927 +0x38e
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47656: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 13 [running]:
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve.func1(0xc0001da1e0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1767 +0x139
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: panic(0x8f6ba0, 0xc0003711a0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/runtime/panic.go:679 +0x1b2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001561c0, 0xc000158c00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.HandlerFunc.ServeHTTP(0xc0000a7f60, 0xa415c0, 0xc0001561c0, 0xc000158c00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2007 +0x44
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: github.com/gorilla/mux.(*Router).ServeHTTP(0xc000122000, 0xa415c0, 0xc0001561c0, 0xc000158a00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.serverHandler.ServeHTTP(0xc000134000, 0xa415c0, 0xc0001561c0, 0xc000158a00)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2802 +0xa4
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: net/http.(*conn).serve(0xc0001da1e0, 0xa42c80, 0xc0000690c0)
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:1890 +0x875
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: created by net/http.(*Server).Serve
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: /usr/local/go/src/net/http/server.go:2927 +0x38e
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: 2020/08/11 06:37:00 http: panic serving 127.0.0.1:47658: interface conversion: interface {} is nil, not string
Aug 11 06:37:00 ip-172-31-61-156 druid-exporter[6641]: goroutine 33 [running]:
One of our druid-overlord instances failed over to another instance, and since that happened the exporter reported a constant (false) value until I restarted the process.
It seems to me that the exporter keeps reporting the last known value. If that's the case, maybe you could set a TTL, after which it should report null.
Thanks!
Hello,
I congratulate you on the work you are doing. We were wondering if it would be possible to cut a release any time soon; we're interested in this fix: #37
Thanks,
Keep it up.
We're seeing crashes in druid nodes like this:
2020-10-29T13:46:30,713 ERROR [HttpPostEmitter-1] org.apache.druid.java.util.emitter.core.HttpPostEmitter - Failed to send events to url[http://druid-exporter:8080/druid]
Druid exporter output is:
2020/10/29 14:00:06 http: panic serving XX.XX.XX.XX:53522: interface conversion: interface {} is nil, not string
net/http.(*conn).serve.func1(0xc00012cfa0)
/usr/local/go/src/net/http/server.go:1767 +0x139
panic(0x8f6ba0, 0xc000981cb0)
/usr/local/go/src/runtime/panic.go:679 +0x1b2
druid-exporter/listener.DruidHTTPEndpoint.func1(0xa415c0, 0xc0001460e0, 0xc000464200)
/go/src/druid-exporter/listener/druid_endpoint.go:33 +0x1578
net/http.HandlerFunc.ServeHTTP(0xc00000e480, 0xa415c0, 0xc0001460e0, 0xc000464200)
/usr/local/go/src/net/http/server.go:2007 +0x44
github.com/gorilla/mux.(*Router).ServeHTTP(0xc000132000, 0xa415c0, 0xc0001460e0, 0xc000464000)
/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:210 +0xe2
net/http.serverHandler.ServeHTTP(0xc000146000, 0xa415c0, 0xc0001460e0, 0xc000464000)
/usr/local/go/src/net/http/server.go:2802 +0xa4
net/http.(*conn).serve(0xc00012cfa0, 0xa42c80, 0xc0006ca600)
/usr/local/go/src/net/http/server.go:1890 +0x875
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2927 +0x38e
Any idea why it might be?
Thanks a lot!!!
After applying commit db1c4bd, here are the messages.
The command:
/usr/local/bin/druid-exporter --druid.uri=http://10.18.64.247 --port=10086 --log.level=debug
When tracking query latencies, it is very useful to compute percentiles to understand how the majority of queries are performing and to spot outliers.
Currently, this is not possible because the value emitted by Druid is directly exported at the endpoint scraped by Prometheus:
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical"} 9
It would be more useful to export buckets of metrics for the observed latencies:
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="1"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="5"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="10"} 5
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="20"} 6
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="50"} 6
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="100"} 7
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="250"} 10
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="500"} 10
druid_emitted_metrics{datasource="foo", metric_name="query-time", service="druid-historical", le="1000"} 10
In this case, a query with a latency of < 1 ms would increment the le="1" bucket counter and, since buckets are cumulative, every larger bucket as well. A query that took 3 ms would increment le="5" and all larger buckets, but not le="1".
With this way of exporting latency metrics, it would be possible to compute dynamic percentiles via prometheus with the recommended approach (https://prometheus.io/docs/practices/histograms/#quantiles):
histogram_quantile(0.95, sum(rate(druid_emitted_metrics[5m])) by (le))
Hello, I downloaded druid-exporter (druid-exporter-v0.9-linux-amd64.gz); my system is CentOS 7.2,
and the error is "cannot execute".
Please help me, thank you.
Hi!
My team and I use Druid, ingesting data with the Kafka indexer, and I'd like to track any possible lag on Druid's side.
I've seen that Druid's UI displays the "Total rows" counter on the Datasources pane, and I'd like to bring it into the project. Would it be useful to its users? If yes, I'd like to contribute.
With this data I could use Grafana to measure any kind of bottleneck in the process of gathering data from my origin datasource, passing through Kafka (with Connect, monitored by kafka-lag-exporter) and getting processed in Druid.
Exporter is emitting the below metric :
druid_emitted_metrics{datasource="",host="localhost:8081",metric_name="segment-assigned-count",service="coordinator"} 230
But according to the Druid metrics docs, there should be a tier dimension here.
I didn't find any parameter to configure this.
Looking at the druid_endpoint.go file, I see that only four dimensions (datasource, host, metric_name, service) are supported.
Can someone please guide me on what changes are required to get all dimensions?
As discussed in #74, pending/running task counts alone wouldn't help in scaling up/down using HPA. We would also need the total worker capacity of the Druid cluster to decide whether the HPA controller can scale down. An additional metric like druid_worker_capacity_total would be useful.
As the exporter already emits the workers-used metric, this would be a pretty small change. I tried to push my branch to contribute but I got access denied. @iamabhishek-dubey can you help me out here?
The exporter is producing thousands of metrics, and too many metrics can overload Prometheus. We only need some of them. It would be nice to have an option to supply a regexp to exclude metrics that match it (or, alternatively, to include only those that do).
Hi
After updating to v0.4:
{"level":"info","msg":"GET request is successful for druid api","ts":"2020-05-21T06:50:22.761Z","url":"http://data-xxx:8888/druid/indexer/v1/supervisor?full"}
{"level":"info","msg":"Successfully retrieved the data for druid's supervisors tasks","ts":"2020-05-21T06:50:22.761Z"}
{"err":"json: cannot unmarshal array into Go struct field DruidEmittedData.dataSource of type string","level":"error","msg":"Error in decoding JSON sent by druid","ts":"2020-05-21T06:50:23.327Z"}
{"level":"info","msg":"Successfully recieved data from druid emitter","ts":"2020-05-21T06:50:23.328Z"}
DruidEmittedData.Value takes an int type, while some Druid metrics send out float values.
1. Run druid-exporter-v0.8-linux-amd64 with the following command:
druid-exporter --druid.uri="http://0.0.0.0" --port="10086"
2. Add the following configuration to Druid (version 0.17.1) common.runtime.properties:
druid.emitter=composing
druid.emitter.composing.emitters=["logging", "http"]
druid.emitter.logging.logLevel=info
druid.emitter.http.flushMillis=60000
druid.emitter.http.flushTimeOut=60000
druid.emitter.http.flushCount=500
druid.emitter.http.recipientBaseUrl=http://10.18.64.247:10086
Druid sends the emitter data successfully:
2020-08-13T12:01:21,153 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.153Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"jvm/heapAlloc/bytes","value":574232}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/numEntries","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/sizeBytes","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/hits","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/misses","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/evictions","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/hitRate","value":0.0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/averageBytes","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/timeouts","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/errors","value":0}
2020-08-13T12:01:21,243 INFO [MonitorScheduler-0] org.apache.druid.java.util.emitter.core.LoggingEmitter - {"feed":"metrics","timestamp":"2020-08-13T04:01:21.243Z","service":"druid/historical","host":"10.10.18.102:8083","version":"0.17.1","metric":"query/cache/delta/put/ok","value":0}
3. Visit http://10.18.64.247:10086/metrics
# HELP druid_health_status Health of Druid, 1 is healthy 0 is not
# TYPE druid_health_status counter
druid_health_status{druid="health"} 0
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
[... standard go_* and process_* runtime metrics elided ...]
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 0
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
I cannot see any druid_emitted_metrics, and druid-exporter prints a 404:
$ ./druid-exporter --druid.uri=http://172.18.19.243:8081 --port=7451 --log.level="trace"
time="2021-01-24T02:44:25Z" level=info msg="Druid exporter started listening on: 7451"
time="2021-01-24T02:44:25Z" level=info msg="Metrics endpoint - http://0.0.0.0:7451/metrics"
time="2021-01-24T02:44:25Z" level=info msg="Druid emitter endpoint - http://0.0.0.0:7451/druid"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid healthcheck"
time="2021-01-24T02:44:52Z" level=debug msg="Successful healthcheck request for druid - http://172.18.19.243:8081/status/health"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/status/health"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/coordinator/v1/datasources?simple"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid segment"
time="2021-01-24T02:44:52Z" level=debug msg="Druid segment's metric data, []"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/runningTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/waitingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/completeTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid task: /druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected tasks status count: /druid/indexer/v1/pendingTasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/workers"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's workers"
time="2021-01-24T02:44:52Z" level=debug msg="Druid workers's metric data, [{{172.18.21.206:8091 0 172.18.21.206 2} 1 [index_kafka_slaprod.1_42bb463eb8d970b_dbbmhjgl]} {{172.18.20.68:8091 0 172.18.20.68 2} 0 []} {{172.18.19.4:8091 0 172.18.19.4 2} 1 [index_kafka_default.0_c5b5622de0f241a_okpnnpbf]}]"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/tasks"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's tasks"
time="2021-01-24T02:44:52Z" level=debug msg="Druid tasks's metric data, [{index_kafka_slaprod.1_42bb463eb8d970b_eginaogo index_kafka 2021-01-23T14:11:58.724Z SUCCESS NONE 2.8806753e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_lbpjhimb index_kafka 2021-01-23T14:11:32.712Z SUCCESS NONE 2.8807223e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_neomaeel index_kafka 2021-01-23T06:11:52.904Z SUCCESS NONE 2.8806483e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_iagokihf index_kafka 2021-01-23T06:11:26.818Z SUCCESS NONE 2.8806557e+07 default.0} {index_hadoop_slaprod.1_2021-01-23T03:31:48.670Z index_hadoop 2021-01-23T03:31:48.670Z SUCCESS NONE 431484 slaprod.1} {index_hadoop_default.0_2021-01-23T03:30:18.435Z index_hadoop 2021-01-23T03:30:18.435Z SUCCESS NONE 61964 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_ochhnnkp index_kafka 2021-01-22T22:11:46.954Z SUCCESS NONE 2.8806628e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_elnchikp index_kafka 2021-01-22T22:11:20.018Z SUCCESS NONE 2.8807396e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_fllkfpce index_kafka 2021-01-22T14:11:40.540Z SUCCESS NONE 2.8807006e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_ogilknne index_kafka 2021-01-22T14:11:12.759Z SUCCESS NONE 2.8807885e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_bohicikp index_kafka 2021-01-22T06:11:34.804Z SUCCESS NONE 2.8806441e+07 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_fkhbiddb index_kafka 2021-01-22T06:11:04.844Z SUCCESS NONE 2.8808537e+07 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_cplijlpk index_kafka 2021-01-22T04:11:11.944Z FAILED NONE -1 slaprod.1} {index_kafka_default.0_c5b5622de0f241a_hfhbikdf index_kafka 2021-01-22T04:10:41.976Z FAILED NONE -1 default.0} {index_hadoop_slaprod.1_2021-01-22T03:31:46.511Z index_hadoop 2021-01-22T03:31:46.511Z SUCCESS NONE 456767 slaprod.1} {index_hadoop_default.0_2021-01-22T03:30:16.295Z index_hadoop 2021-01-22T03:30:16.295Z SUCCESS NONE 65224 default.0} 
{index_kafka_default.0_c5b5622de0f241a_okpnnpbf index_kafka 2021-01-23T22:11:39.347Z RUNNING RUNNING 0 default.0} {index_kafka_slaprod.1_42bb463eb8d970b_dbbmhjgl index_kafka 2021-01-23T22:12:04.859Z RUNNING RUNNING 0 slaprod.1}]"
time="2021-01-24T02:44:52Z" level=debug msg="Successful GET request on Druid API - http://172.18.19.243:8081/druid/indexer/v1/supervisor?full"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully collected the data for druid's supervisors"
time="2021-01-24T02:44:52Z" level=error msg="Cannot parse JSON data: json: cannot unmarshal string into Go value of type map[string]interface {}"
time="2021-01-24T02:44:52Z" level=error msg="Druid's API response is not 200, Status Code - 404"
time="2021-01-24T02:44:52Z" level=error msg="Possible issue can be with Druid's URL, Username or Password"
time="2021-01-24T02:44:52Z" level=debug msg="Successfully retrieved the data for druid's datasources rows"
time="2021-01-24T02:44:52Z" level=error msg="Cannot parse JSON data: invalid character '<' looking for beginning of value"
$ curl -sS http://172.18.99.134:7451/metrics
An error has occurred while serving metrics:
2 error(s) occurred:
* [from Gatherer #2] collected metric "druid_workers_capacity_used" { label:<name:"pod" value:"172" > label:<name:"version" value:"0" > gauge:<value:1 > } was collected before with the same name and label values
* [from Gatherer #2] collected metric "druid_workers_capacity_used" { label:<name:"pod" value:"172" > label:<name:"version" value:"0" > gauge:<value:0 > } was collected before with the same name and label values
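The "was collected before with the same name and label values" error means a collector pushed two samples with an identical label set during one scrape. A minimal sketch (the `worker` struct and field names below are hypothetical stand-ins for the Druid workers API response, not the exporter's actual types) of deduplicating by label key before emitting:

```go
package main

import "fmt"

// worker is a simplified stand-in for one entry in Druid's workers API response.
type worker struct {
	Pod          string
	Version      string
	CapacityUsed float64
}

// dedupeByLabels keeps only the last sample per (pod, version) pair, so the
// Prometheus registry never sees two samples with the same label set.
func dedupeByLabels(workers []worker) map[[2]string]float64 {
	out := make(map[[2]string]float64)
	for _, w := range workers {
		out[[2]string{w.Pod, w.Version}] = w.CapacityUsed
	}
	return out
}

func main() {
	workers := []worker{
		{"172", "0", 1},
		{"172", "0", 0}, // the duplicate label set from the error above
	}
	deduped := dedupeByLabels(workers)
	fmt.Println(len(deduped)) // one sample per unique label set
}
```

Collapsing duplicates before calling `Set` on the gauge avoids the registry-level error entirely, whichever value the exporter decides should win.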
I am getting the above errors with the latest release. Not sure if I am missing any configuration on Druid.
Great project thus far! However, I feel that there are some issues regarding the direction the exporter is taking. It seems like there is a mismatch of objectives, i.e. between a Kubernetes environment and a bare-metal environment.
For example, in the emitted metrics, the host parameter has been changed to pod (k8s infra).
// listener/druid-endpoint.go
gauge.With(prometheus.Labels{
	"metric_name": strings.Replace(metricName, "/", "-", 3),
	"service":     strings.Replace(serviceName, "/", "-", 3),
	"datasource":  datasource,
	"pod":         podName,
}).Set(value)
I understand that podName is important in a Kubernetes environment; however, a more generic host label should suffice (correct me if I'm wrong). I'll be submitting a few PRs that restructure this to do a reverse DNS lookup to obtain the pod name, while still allowing host=localhost without errors.
@exherb, kindly review the PR to see if it would solve your use case with k8s.
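The reverse-lookup-with-fallback idea can be sketched with the standard library's `net.LookupAddr` (this is an illustration of the approach, not the code from the PR):

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolveHostLabel tries a reverse DNS lookup on the emitted host value
// (useful for recovering pod names in Kubernetes) and falls back to the
// raw string, so bare-metal values like "localhost" pass through untouched.
func resolveHostLabel(host string) string {
	names, err := net.LookupAddr(host)
	if err != nil || len(names) == 0 {
		return host // not an IP, or no PTR record: keep the original value
	}
	// PTR records end with a trailing dot; strip it for a clean label.
	return strings.TrimSuffix(names[0], ".")
}

func main() {
	fmt.Println(resolveHostLabel("localhost"))
}
```

Because the fallback returns the input unchanged, the same code path works for both infrastructures without a k8s-specific label name.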
After downloading and building the latest version (right now) I am seeing this message roughly every 2 seconds.
Nov 13 16:00:08 druid-exporter-1 druid-prometheus-exporter[29815]: time="2020-11-13T16:00:08Z" level=error msg="Unable to read JSON response: unexpected EOF"
Any idea why? Thanks a lot!
With the current implementation of the Helm chart it is not possible to configure Kubernetes resource requests and limits.
see: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
Using the Docker image with the latest tag, a POST request to the /druid endpoint gives "404 page not found".
For tag v0.8, it works fine.
Hi, I was trying to deploy using the Helm chart and I don't see any option for a username and password to authenticate with Druid. I get this message:
level=error msg="Possible issue can be with Druid's URL, Username or Password"
Druid's health check endpoints return true for all individual services, but the druid-exporter metric druid_health_status returns a value of 0. What parameters affect this overall health check metric?
druid-exporter requires the Druid query server to be constantly running. Once you kill the service, druid-exporter panics after a healthcheck query is issued.
{"err":"Get \"<QUERY ENDPOINT>:8888/status/health\": dial tcp 123.123.123.123:8888: connect: connection refused","level":"error","msg":"Error while making GET request for druid healthcheck","ts":"2020-05-26T01:47:08.674Z"}
{"level":"info","msg":"GET request is successful on druid healthcheck","ts":"2020-05-26T01:47:08.683Z","url":"<QUERY ENDPOINT>:8888/status/health"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x731e30]
goroutine 336113 [running]:
druid-exporter/utils.GetHealth(0xc0000fa240, 0x32, 0x0)
~/repo/druid-exporter/utils/http.go:22 +0x270
druid-exporter/collector.GetDruidHealthMetrics(0x0)
~/repo/druid-exporter/collector/druid.go:24 +0x179
druid-exporter/collector.(*MetricCollector).Collect(0xc0000bacc0, 0xc000446120)
~/repo/druid-exporter/collector/druid.go:100 +0x37
github.com/prometheus/client_golang/prometheus.(*Registry).Gather.func1()
~/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:443 +0x19d
created by github.com/prometheus/client_golang/prometheus.(*Registry).Gather
~/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/registry.go:454 +0x599
Extension of issue #19
I don't know if this something to necessarily fix but I do wish to point out something.
When accessing a Druid cluster health endpoint that enforces authentication, you can still reach the health check with bad credentials while skipping TLS verification (-k on curl):
curl -k -u LOLBADUSER:LOLBADPASSWORD https://DRUIDCLUSTERENDPOINT:9088/status/health |jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4 0 4 0 0 133 0 --:--:-- --:--:-- --:--:-- 137
true
I believe some better debugging around the user/password check would help out. We didn't catch this problem because we pass the user & password via environment variables in our Docker configuration. I didn't set up the environment variable (DRUID_PASSWORD) according to:
https://github.com/opstree/druid-exporter/blob/master/utils/http.go#L14
I had it set up as DRUID_PASS, so the password field was empty.
However, since health checks can still return from Druid regardless of authentication, the exporter still appears to work as far as health checks go. I think a debug statement saying "bad username & password" on the response at this line would alleviate a lot of potential problems in the future:
https://github.com/opstree/druid-exporter/blob/master/utils/http.go#L38
if *user != "" && *password != "" {
	req.SetBasicAuth(*user, *password)
}
resp, err := client.Do(req)
if err != nil {
	logrus.Errorf("Error on GET request for druid healthcheck: %v", err)
	return 0
}
logrus.Debugf("Successful healthcheck request for druid - %v", url)
When running
druid-exporter --version
it always reports 0.5. This happens in both v0.8 and v0.9.
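A stale version string like this usually comes from the common Go pattern of a package-level variable meant to be overridden at build time with `-ldflags "-X ..."`; if the build never passes the flag (or hardcodes it), every release reports the old default. A minimal sketch of the pattern (assuming, not confirming, that the exporter uses it):

```go
package main

import "fmt"

// version defaults to a placeholder and is meant to be overridden per release:
//
//	go build -ldflags "-X main.version=v0.9"
//
// If the build script omits the -X flag or pins a literal like "0.5",
// the binary reports the stale value regardless of the actual release tag.
var version = "dev"

func main() {
	fmt.Println("druid-exporter version:", version)
}
```

Checking whether the release pipeline actually passes the current tag through `-ldflags` would be the first thing to verify here.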