prometheus-es-exporter's People

Contributors

braedon, echojc, giganteous, hemanthkini, mpursley, nithyashree675, robertkirk, weasellin


prometheus-es-exporter's Issues

View aliases in prometheus metric es_indices_mappings_field_count

I would like to be able to filter on aliases for indices.

For example, our ES cluster stores 60 indices, generally named logstash-yyyy-mm-dd.

The issue I have is that, because the index date changes, I am unable to create a Prometheus alert from es_indices_mappings_field_count to monitor the field count of today's index.

https://stackoverflow.com/questions/58734793/use-today-date-in-label-values-in-prometheus-alerting-rules/58745416#58745416

Support Authorization: header for connection to ElasticSearch

First of all, thanks for sharing your code.

I would like to use it to perform queries against the log store of Red Hat's OpenShift, but OpenShift places its Elasticsearch cluster behind an OAuth proxy that requires bearer authentication (please see this).

The Elasticsearch constructor supports an additional headers parameter, where we could add an Authorization: Bearer ... header. That would probably mean adding a new CLI flag to provide the bearer token.

I'll try to submit a PR for this, if you agree.
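For reference, a minimal sketch of what this might look like via the headers parameter on the elasticsearch-py client constructor (the helper name here is hypothetical, and the wiring is an assumption about how the exporter would use it):

```python
def bearer_headers(token):
    """Build the extra headers dict to pass to the Elasticsearch constructor."""
    return {'Authorization': 'Bearer {}'.format(token)}

# With the elasticsearch package installed, wiring it up might look like:
#   from elasticsearch import Elasticsearch
#   es_client = Elasticsearch(['https://es.example:9200'],
#                             headers=bearer_headers(token))
```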

Elasticsearch queries with duplicated timeseries can block metrics endpoint

Hi there,

I believe I have found a case where Prometheus occasionally reports context deadline exceeded while trying to scrape an instance of prometheus-es-exporter configured with many queries whose execution times vary widely.

In particular, I've noticed this instance has some long-running queries (~4 seconds), and if I repeatedly curl the /metrics endpoint I find it occasionally hangs before responding with the metrics.

I have not done enough in-depth analysis to prove anything, but my hypothesis is that the exporter blocks while waiting for Elasticsearch to respond so it can update its metrics, and while it is blocked, Prometheus times out waiting for the response and moves on to the next scrape cycle.

One fix is to increase the timeout that Prometheus will wait. Another is to make Elasticsearch respond faster to those queries, or to analyze the queries to see whether they can be made more efficient.

I am creating this issue because, while those are obviously problems I should fix, I think it is worth raising attention to this matter in case it affects others. The above-mentioned strategies might work for them.

If my hypothesis is correct, then at the cost of more complexity, threads or something similar might enable more responsive metrics responses during those long-running queries. I can see that these would be stale metrics, though, so perhaps it is better to fail in this case instead of fibbing. I am also not excited about asking to make an exporter more complex than it has to be.

Anyway, feel free to close this issue if users just need to correct it on their own, otherwise perhaps this can serve as a starting point for discussing a solution.

Error in parsing aggregated response

I'm currently working on a big query containing aggregated values. I'm only interested in aggregations.1.value, which in this case would be 396.

The following is the response I get after running the query in ES (v7.3.0):

{
  "took" : 17509,
  "timed_out" : false,
  "_shards" : {
    "total" : 304,
    "successful" : 304,
    "skipped" : 290,
    "failed" : 0
  },
  "_clusters" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "1" : {
      "value" : 396
    }
  }
}

Here is the error message I get in return:

Traceback (most recent call last):
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 117, in run_query
    metrics = parse_response(response, [name])
  File "/usr/src/app/prometheus_es_exporter/parser.py", line 86, in parse_response
    total = response['hits']['total']
KeyError: 'total'

Thanks for getting back!
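The KeyError stems from hits.total changing shape in Elasticsearch 7: it is now an object rather than an integer. A hedged sketch of version-tolerant parsing (not the exporter's actual parser code):

```python
def parse_hits_total(response):
    """Return the hit count from an Elasticsearch search response.

    ES 7 returns an object like {"value": 10000, "relation": "gte"} for
    hits.total; earlier versions return a plain integer.
    """
    total = response['hits']['total']
    if isinstance(total, dict):
        return total['value']
    return total
```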

Add helm chart

Hi

Just FYI, I've made a prometheus-es-exporter helm chart, since it seemed there wasn't one. I've opened an MR to the official charts repository.

Thanks for your work!

yyyy.mm prefix for indices

Could you add an example of how to reference monthly-prefixed indices? The example has QueryIndices = <logstash-{now/d}>. I'm having a hard time figuring out how to deal with yyyy.MM-indexname style indices; {now/m} doesn't work. It seems to be related to the Python config parser, but I'm not fluent in Python and haven't found an answer quickly. Please help.
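For what it's worth, Elasticsearch date math uses uppercase M for months (lowercase m means minutes), and an explicit format can be supplied inside the date-math index name. So a monthly pattern might look like the following (a sketch based on the Elasticsearch date-math index name syntax; adjust to your actual naming):

```ini
QueryIndices = <logstash-{now/M{yyyy.MM}}>
```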

Limit nodes_stats to specific fields with command line argument

I found it useful to limit nodes_stats to specific fields, because nodes_stats exports a lot of metrics on bigger clusters. Applying this patch does the job:

--- a/prometheus_es_exporter/__init__.py
+++ b/prometheus_es_exporter/__init__.py
@@ -86,9 +86,9 @@ def get_cluster_health(es_client, level):
         update_gauges(metrics)
 
 
-def get_nodes_stats(es_client):
+def get_nodes_stats(es_client,fields):
     try:
-        response = es_client.nodes.stats()
+        response = es_client.nodes.stats(metric=fields)
 
         metrics = nodes_stats_parser.parse_response(response, ['es', 'nodes_stats'])
     except Exception:
@@ -167,6 +167,8 @@ def main():
                         help='disable nodes stats monitoring.')
     parser.add_argument('--nodes-stats-interval', type=float, default=10,
                         help='polling interval for nodes stats monitoring in seconds. (default: 10)')
+    parser.add_argument('--nodes-stats-fields', default='thread_pool',
+                        help='filter for node stats fields (default: thread_pool)')
     parser.add_argument('--indices-stats-disable', action='store_true',
                         help='disable indices stats monitoring.')
     parser.add_argument('--indices-stats-interval', type=float, default=10,
@@ -223,7 +225,7 @@ def main():
         run_scheduler(scheduler, args.cluster_health_interval, cluster_health_func)
 
     if not args.nodes_stats_disable:
-        nodes_stats_func = partial(get_nodes_stats, es_client)
+        nodes_stats_func = partial(get_nodes_stats, es_client, args.nodes_stats_fields)
         run_scheduler(scheduler, args.nodes_stats_interval, nodes_stats_func)
 
     if not args.indices_stats_disable:

Use the result of a query in another query

Hello. There is some data about customers stored in ES, and I'd like to be able to query it repeatedly. The problem is that the set of customers changes, so I'd like to first fetch the list of customers and then query per customer. Is that possible with your exporter? Thanks.

No token found

When I tried to add prometheus-es-exporter as a target for Prometheus, I got a "no token found" error.
I used the promtool provided by Prometheus to verify the metrics exposed by prometheus-es-exporter.
I found that Prometheus couldn't parse the following metrics; when these lines were removed, the metrics were verified properly by promtool.

Can you check this out and fix it?

# HELP es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_outgoing_searches
# TYPE es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_outgoing_searches gauge
es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_outgoing_searches{node_id="5JNU1rP3Tr-VOkEnN_hZcQ",node_name="5JNU1rP"} 0.0
# HELP es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_queue_size
# TYPE es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_queue_size gauge
es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_queue_size{node_id="5JNU1rP3Tr-VOkEnN_hZcQ",node_name="5JNU1rP"} 0.0
# HELP es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_service_time_ns
# TYPE es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_service_time_ns gauge
es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_service_time_ns{node_id="5JNU1rP3Tr-VOkEnN_hZcQ",node_name="5JNU1rP"} 143206.0
# HELP es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_response_time_ns
# TYPE es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_response_time_ns gauge
es_nodes_stats_adaptive_selection_5JNU1rP3Tr-VOkEnN_hZcQ_avg_response_time_ns{node_id="5JNU1rP3Tr-VOkEnN_hZcQ",node_name="5JNU1rP"} 249566.0
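The likely culprit is the hyphen in the node-id portion of the metric names: Prometheus metric names must match [a-zA-Z_:][a-zA-Z0-9_:]*. A hedged sketch of the kind of sanitization that would avoid this (hypothetical helper, not current exporter code):

```python
import re

# Characters that are invalid in a Prometheus metric name.
INVALID_METRIC_CHARS = re.compile(r'[^a-zA-Z0-9_:]')

def sanitize_metric_name(name):
    """Replace characters that are invalid in Prometheus metric names (e.g. '-')."""
    return INVALID_METRIC_CHARS.sub('_', name)
```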

Handle merging metric dict when query match metric value drops

When the number of documents matching a query in Elasticsearch drops to zero, the merge_metric_dicts function crashes calling .copy() on None. The exporter then keeps serving the last metric value to the Prometheus scraper, which is annoying, as prometheus-es-exporter has to be restarted to get correct metrics again.

Error Logs:

Traceback (most recent call last):
  File "/usr/src/app/prometheus_es_exporter/scheduler.py", line 16, in scheduled_run
    func(*args, **kwargs)
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 187, in run_query
    zero_missing=True)
  File "/usr/src/app/prometheus_es_exporter/metrics.py", line 164, in merge_metric_dicts
    in old_metric_dict.items()
  File "/usr/src/app/prometheus_es_exporter/metrics.py", line 163, in <dictcomp>
    for metric_name, (metric_doc, label_keys, old_value_dict)
  File "/usr/src/app/prometheus_es_exporter/metrics.py", line 124, in merge_value_dicts
    value_dict = new_value_dict.copy()
AttributeError: 'NoneType' object has no attribute 'copy'

"filename": "scheduler.py", "funcName": "scheduled_run", "levelname": "ERROR", "levelno": 40, "lineno": 18, "log": "prometheus_es_exporter.scheduler.ERROR MainThread Error while running scheduled job." "message": "Error while running scheduled job.", "message_format": "Error while running scheduled job.", "module": "scheduler", "name": "prometheus_es_exporter.scheduler", "pathname": "/usr/src/app/prometheus_es_exporter/scheduler.py", "process": 1, "processName": "MainProcess", "relativeCreated": 957360909.1424942, "thread": 140296627201856, "threadName": "MainThread"
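A minimal sketch of a None-tolerant merge (the signature is assumed from the traceback; the real function lives in prometheus_es_exporter/metrics.py):

```python
def merge_value_dicts(old_value_dict, new_value_dict, zero_missing=False):
    """Merge old and new label-values -> value dicts, tolerating a None new dict.

    Treating a missing new dict as empty lets stale series be zeroed out
    (zero_missing=True) instead of crashing on None.copy().
    """
    new_value_dict = new_value_dict or {}
    value_dict = new_value_dict.copy()
    for label_values, old_value in old_value_dict.items():
        if label_values not in value_dict:
            value_dict[label_values] = 0 if zero_missing else old_value
    return value_dict
```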

Name of indices with incrementing IDs

Hi, I'd like to avoid querying all indices, to keep the load reasonable, but unfortunately we're not using date-based index names, rather incrementing IDs.
Is there any way to do something like QueryIndices = <myIndex_{highest_id}>?

Aggregations on date values return also a key 'value_as_string'

Hi Braedon. First of all: nice software. I had an itch, and this scratches it.

I have two problems: one is covered by #41; the other is that the result set returns a value_as_string for a timestamp:

{
    "size": 0,
    "query": {
            "query_string": {
                "query": "somefield:somevalue"
            }
    },
    "aggs": {
        "max_timestamp": { "max": { "field": "Timestamp" } }
    }
}

This returns

{
....
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": null,
        "hits": []
    },
    "aggregations": {
        "max_timestamp": {
            "value": 1563960734019.0,
            "value_as_string": "2019-07-24T09:32:14.019Z"
        }
    }
}

I can easily find the line in parse_agg() that could skip this, but I can't find references to why this "value_as_string" is included; ES 6 and 7 both return the extra field.

Can I change my query so it doesn't return that 'value_as_string', or is this a bug in the exporter?
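As far as I know, value_as_string is just Elasticsearch's formatted rendering of numeric date values and can't be suppressed from the query side, so the parser would have to skip non-numeric keys. A hedged sketch of the idea (hypothetical helper, not the exporter's parse_agg()):

```python
def numeric_agg_values(agg):
    """Keep only the numeric results of an aggregation object, dropping
    formatted duplicates such as 'value_as_string'."""
    return {key: val for key, val in agg.items()
            if isinstance(val, (int, float)) and not isinstance(val, bool)}
```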

Enable data table queries

Steps to reproduce:

  • Query from a data table visualization
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "component: switch",
            "analyze_wildcard": true
          }
        },
        {
          "range": {
            "fecha": {
              "from":"now-1d"
            }
          }
        }
      ],
      "must_not": []
    }
  },
  "aggs": {
    "table": {
      "terms": {
        "field": "userId",
        "size": 25,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
  • Response from ES
{
  "took": 9,
  "timed_out": false,
  "_shards": {
    "total": 12,
    "successful": 12,
    "failed": 0
  },
  "hits": {
    "total": 118526,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "table": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "user1",
          "doc_count": 99816
        },
        {
          "key": "user2",
          "doc_count": 14711
        },
        {
          "key": "user3",
          "doc_count": 1720
        }
      ]
    }
  },
  "status": 200
}
  • Actual result: only "doc_count_error_upper_bound" and "sum_other_doc_count" come through as metrics

  • Benefit: we could send only one query and receive several metrics
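A terms aggregation like the one above maps naturally onto labeled samples; a minimal sketch of the idea (hypothetical helper, not the exporter's code):

```python
def buckets_to_metrics(agg_name, agg):
    """Turn a terms aggregation into (metric_name, labels, value) tuples,
    skipping bookkeeping keys like doc_count_error_upper_bound."""
    for bucket in agg['buckets']:
        yield ('{}_doc_count'.format(agg_name),
               {agg_name: bucket['key']},
               bucket['doc_count'])
```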

More examples for metric names and labels in exporter.cfg

Hi,

Just curious: what should I do if I'd like the Prometheus metrics to come out with labels or a custom name?

To demo this, suppose I have the following defined in exporter.cfg:

[query_all]
# How often to run this query.
QueryIntervalSecs = 5
# The indices to run the query on. Any way of specifying indices supported by your Elasticsearch version can be used.
# This key is optional - if not specified, the default is '_all'.
QueryIndices = _all
# The search query to run.
QueryJson = { "size": 0, "query": { "match_all": {} } }

[query_login]
QueryIntervalSecs = 20
QueryIndices = _all
QueryJson = { "query": { "query_string": { "query": "path: \"*login\" AND method: \"GET\" AND status:\"200\"" } } }

So Prometheus metrics come out with:

# HELP login_hits 
# TYPE login_hits gauge
login_hits 1756.0
...
...
# HELP all_hits 
# TYPE all_hits gauge
all_hits 18923.0

Now what should I change in "[query_login]" in order to get something like the below?

http_hits{path="/login", method="GET", status="200"} 1756.0

Also, what should I change in exporter.cfg in order to get this?

avg_req_size{path="/login", method="GET", status="200"} 1234.5
avg_resp_size{path="/login", method="GET", status="200"} 1234.5

Thanks.
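As far as I understand the exporter, labels come from bucket aggregations (each terms level becomes a label), so a sketch of the direction might look like the below; the field names path.keyword, method.keyword and status are assumptions about your mapping:

```ini
[query_http]
QueryIntervalSecs = 20
QueryIndices = _all
QueryJson = { "size": 0,
              "query": { "query_string": { "query": "path: \"*login\" AND method: \"GET\" AND status: \"200\"" } },
              "aggs": { "path": { "terms": { "field": "path.keyword" },
                                  "aggs": { "method": { "terms": { "field": "method.keyword" } } } } } }
```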

ConnectionRefusedError: [Errno 111] Connection refused

Hey! Please refer to the logs below, which are produced on deploying the pod; the exporter is also not accessible through curl.

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 157, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 84, in create_connection
raise err
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 74, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 238, in perform_request
method, url, body, retries=Retry(False), headers=request_headers, **kw
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 376, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.6/http/client.py", line 1254, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1300, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1249, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.6/http/client.py", line 1036, in _send_output
self.send(msg)
File "/usr/local/lib/python3.6/http/client.py", line 974, in send
self.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 184, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7ffb7b0f6198>: Failed to establish a new connection: [Errno 111] Connection refused
[2019-12-26 06:26:54,300] elasticsearch.WARNING MainThread GET http://ingest-node-elk-final-lb.t004-u000005:9200/_cluster/health?level=indices [status:N/A request:0.002s]

On executing curl this is the error, please refer and suggest:

curl -v -g http://[fd74:ca9b:172:21::17:6032]:9206

  • About to connect() to fd74:ca9b:172:21::17:6032 port 9206 (#0)
  • Trying fd74:ca9b:172:21::17:6032...
  • Connection refused
  • Failed connect to fd74:ca9b:172:21::17:6032:9206; Connection refused
  • Closing connection 0
    curl: (7) Failed connect to fd74:ca9b:172:21::17:6032:9206; Connection refused
    [root@uhn7kbrc1rbrm001 ~]# curl -v -g http://[fd74:ca9b:172:19::d58d]:9206
  • About to connect() to fd74:ca9b:172:19::d58d port 9206 (#0)
  • Trying fd74:ca9b:172:19::d58d...
  • Connection refused
  • Failed connect to fd74:ca9b:172:19::d58d:9206; Connection refused
  • Closing connection 0
    curl: (7) Failed connect to fd74:ca9b:172:19::d58d:9206; Connection refused

Support ChaosSearch Elastic Search Endpoint

Hi, do you support the ChaosSearch ES endpoint? https://docs.chaossearch.io/docs/chaossearch-api

Their authentication method is via AWS access key and tokens.

from requests_aws4auth import AWS4Auth
from elasticsearch import Elasticsearch, RequestsHttpConnection

awsauth = AWS4Auth("Access-Key-ID", "Secret-Access-Key", "us-east-1", "s3")
es = Elasticsearch(
    hosts=[{'host': 'poc-trial.chaossearch.io', 'port': 443, 'url_prefix': '/elastic', 'use_ssl': True}],
    http_auth=awsauth,
    connection_class=RequestsHttpConnection,
    verify_certs=True
)

No way to get per-node index stats

es_indices_stats_primaries_docs_count contains aggregated stats for the entire cluster. But there's no corresponding per-node index data, even though the code looks like it's supposed to default to getting everything from /_nodes/stats. For example:

$ curl -s http://10.4.14.73:9206/metrics | grep es_nodes | grep indic | wc -l
0

Here is the complete output. My flags (in Kubernetes format):

        args:
        - --query-disable
        - --cluster-health-timeout
        - "30"
        - --nodes-stats-timeout
        - "30"
        - --indices-stats-timeout
        - "30"
        - --indices-stats-mode
        - indices
        - -e
        - localhost:9200

Not getting anything different if I try to force it with --nodes-stats-metrics indices.

Edit: I see that the code explicitly drops this information:

excluded_keys = [
    'timestamp',
    'indices',
]

Is there any reason why?

Error when run

prometheus-es-exporter -p 9206 -e http://localhost:9200 -c /opt/prometheus-es-exporter/exporter.cfg
Traceback (most recent call last):
  File "/opt/prometheus-es-exporter/prometheus-es-exporter/bin/prometheus-es-exporter", line 7, in <module>
    from prometheus_es_exporter import main
  File "/opt/prometheus-es-exporter/prometheus-es-exporter/lib/python3.4/site-packages/prometheus_es_exporter/__init__.py", line 17, in <module>
    from prometheus_es_exporter import cluster_health_parser
  File "/opt/prometheus-es-exporter/prometheus-es-exporter/lib/python3.4/site-packages/prometheus_es_exporter/cluster_health_parser.py", line 34
    result.extend(parse_block(n_value, metric=metric + [key], labels={**labels, singular_key: [n_key]}))
                                                                      ^
SyntaxError: invalid syntax

Can you advise?
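The {**labels, singular_key: [n_key]} expression uses PEP 448 dict unpacking, which arrived in Python 3.5; the paths above show Python 3.4, where that syntax is a SyntaxError, so running under Python 3.5+ should resolve it. For illustration, a 3.4-compatible equivalent of that merge:

```python
def merged_labels(labels, singular_key, n_key):
    """Python 3.4-compatible equivalent of {**labels, singular_key: [n_key]}."""
    merged = dict(labels)          # copy the existing labels
    merged[singular_key] = [n_key] # add/overwrite the singular key
    return merged
```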

Can't aggregate by numeric field

Bucket label generation expects string bucket names, and throws an exception on numeric keys:

  File "/usr/local/bin/prometheus-es-exporter", line 9, in <module>
    load_entry_point('prometheus-es-exporter', 'console_scripts', 'prometheus-es-exporter')()
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 135, in main
    scheduler.run()
  File "/usr/local/lib/python3.5/sched.py", line 147, in run
    action(*argument, **kwargs)
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 65, in scheduled_run
    update_gauges(metrics)
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 31, in update_gauges
    for key in label_keys
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 31, in <listcomp>
    for key in label_keys
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 16, in format_label_value
    return '_'.join(value_list).replace('.', '_')
TypeError: sequence item 1: expected str instance, int found
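Coercing each component to a string before joining would avoid this; a hedged sketch of format_label_value (the real one is in prometheus_es_exporter/__init__.py):

```python
def format_label_value(value_list):
    """Join label-value components into one label value, coercing numeric
    bucket keys to str first and replacing '.' (invalid in metric names)."""
    return '_'.join(str(value) for value in value_list).replace('.', '_')
```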

configparser.InterpolationSyntaxError if using regex with %

Hi,
I've been building a query that includes a regex with % in it, which the configparser doesn't seem to like:

Traceback (most recent call last):
  File "/usr/local/bin/prometheus-es-exporter", line 11, in <module>
    load_entry_point('prometheus-es-exporter', 'console_scripts', 'prometheus-es-exporter')()
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 382, in main
    query = json.loads(config.get(section, 'QueryJson'))
  File "/usr/local/lib/python3.7/configparser.py", line 799, in get
    d)
  File "/usr/local/lib/python3.7/configparser.py", line 394, in before_get
    self._interpolate_some(parser, option, L, value, section, defaults, 1)
  File "/usr/local/lib/python3.7/configparser.py", line 444, in _interpolate_some
    "found: %r" % (rest,))
configparser.InterpolationSyntaxError: '%' must be followed by '%' or '(', found: '%3[aA])[0-9]+|.+\\%40.+).*\'\n}\n}\n]\n}\n},\n"aggs": {\n"request": {\n"terms": {\n"field": "request_path",\n"size": 100\n},\n"aggs": {\n"response": {\n"terms": {\n"field": "response"\n}\n}\n}\n}\n}\n}'

The (simplified) config looks like this:

QueryJson = {
    "size": 0,
    "query": {
      "bool": {
        "must_not": [
          {
            "regexp": {
              "request_path": ".*(.+(\/|:|%3a)[0-9]+|.+%40.+).*"
            }
          }
        ]
      }
    }
  }
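configparser treats % as the start of an interpolation sequence. As a workaround, every literal % in the config value can be doubled to %%; alternatively, the exporter could construct its parser with interpolation disabled. A small sketch of the latter:

```python
import configparser

# With interpolation=None, literal '%' characters in values survive untouched.
config = configparser.ConfigParser(interpolation=None)
config.read_string('[query_x]\nQueryJson = {"regexp": ".*%3a.*"}\n')
query_json = config.get('query_x', 'QueryJson')
```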

_search just return 10000

I'm trying to use _search; when the number of matches is above 10000, it just returns 10000:

POST logstash*/_search?size=0
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": """ xxx """,
            "analyze_wildcard": true
          }
        }
      ],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gt": "now-10m",
              "lt": "now"
            }
          }
        }
      ]
    }
  }
}

--return
{
  "took" : 74,
  "timed_out" : false,
  "_shards" : {
    "total" : 355,
    "successful" : 355,
    "skipped" : 331,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
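This is expected Elasticsearch 7 behaviour: hits.total is capped at 10,000 by default (hence "relation" : "gte"). Requesting an exact count takes track_total_hits in the query body, e.g. (match_all stands in for the elided query):

```json
{
  "size": 0,
  "track_total_hits": true,
  "query": { "match_all": {} }
}
```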

On query failure metric is not removed

I have a terms query setup via an alias.
I launch the exporter and all is good, the metric is there and then I get a doc count from it.

I change the index associated with the alias and I see the following error in the logs:

  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 115, in run_query
    response = es_client.search(index=indices, body=query, request_timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 660, in search
    doc_type, '_search'), params=params, body=body)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 318, in perform_request
    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 186, in perform_request
    self._raise_error(response.status, raw_data)
  File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: RequestError(400, 'search_phase_execution_exception', 'failed to create query:

The terms query count does not disappear, so I cannot react to it with an absent() check; the last value remains in place.
I have not looked at the code, but it looks like the exception is not handled in a way that updates the exported metrics.

Add `took` from query response as a metric?

The performance profile of the queries you run for monitoring could change over time, e.g. when bringing on new clients in a multi-tenanted setup. Adding the time taken to execute each query would be a simple way to monitor the monitoring, so to speak.

Aggregate by two labels

Hi,
We want to export a metric with two labels and get a result like this:

Metric :
ins_lighthouse {uri="tenant.com", metric="performance"} 100.0
ins_lighthouse {uri="tenant.com", metric="seo"} 100.0

instead of the current result, which is:

ins_lighthouse_seo{uri="tenant.com"} 100.0
ins_lighthouse_performance{uri="tenant.com"} 100.0

Is there a way to get this result? Any work in progress? If not, maybe we will try to develop some code for it.

Thanks

Invalid metric name

If you configure a percentile query against Elasticsearch, you receive the following error (invalid metric name):

Traceback (most recent call last):
  File "./prometheus-es-exporter", line 11, in <module>
    sys.exit(main())
  File "/home/logiadmin/.local/lib/python3.5/site-packages/prometheus_es_exporter/__init__.py", line 135, in main
    scheduler.run()
  File "/usr/lib/python3.5/sched.py", line 147, in run
    action(*argument, **kwargs)
  File "/home/logiadmin/.local/lib/python3.5/site-packages/prometheus_es_exporter/__init__.py", line 65, in scheduled_run
    update_gauges(metrics)
  File "/home/logiadmin/.local/lib/python3.5/site-packages/prometheus_es_exporter/__init__.py", line 41, in update_gauges
    gauge = Gauge(metric_name, '', label_keys)
  File "/home/logiadmin/.local/lib/python3.5/site-packages/prometheus_client/core.py", line 333, in __init__
    raise ValueError('Invalid metric name: ' + full_name)
ValueError: Invalid metric name: homemvc_perc_values_50.0

I have reviewed the prometheus_client project and I see that Prometheus can't create a metric with "." in its name.

How to Extend Metric Labelling

Hi,
Is it possible to annotate metrics such as es_cluster_health_status with meaningful labels such as colour=<green|yellow|red>? I'm planning some dashboards and this would be very helpful.

Documentation update with new features is needed

I'm working on a fresh setup of Prometheus monitoring and found prometheus-es-exporter useful for scraping log exceptions from an Elasticsearch index.
While setting up prometheus-es-exporter, SSL certificate configuration was used for the Elasticsearch connection. Here is a snippet from my Kubernetes YAML:

      containers:
      - args:
        - --indices-stats-disable                                                                                                                                                                     
        - -e                                                                                                                                                                                                  
        - https://elasticsearch:9200                                                                                                                                                                  
        - --ca-certs                                                                                                                                                                                  
        - /elastic-certs/ca.crt                                                                                                                                                                       
        - --client-cert                                                                                                                                                                               
        - /elastic-certs/kibana.crt
        - --client-key
        - /elastic-certs/kibana-nopasswd.key

Unfortunately this feature is not documented yet.
Could you please add some documentation about this and other useful features?


Metrics didn't get the label from aggs

Here is my config

[query_es_test]
QueryIntervalSecs = 30
QueryTimeoutSecs = 15
QueryIndices = 
QueryJson = {
                "size": 0,
                "query": {
                    "bool": {
                        "must": [{ "match": { "state": "FAILURE"} }],
                        "filter": {
                            "bool": {
                                "must": [
                                    { "range": { "@timestamp": { "gte": "now-24h", "lt": "now-6h" } } }
                                ]
                            }
                        }
                    }
                },
                "aggs": {
                    "env": {
                        "terms": {"field": "env.keyword"},
                        "aggs": {
                            "val_sum": {
                                "sum": {
                                    "field": "val",
                                    "missing": 1
                                }
                            }
                        }
                    }
                }
            }

And the result I get by using

curl -X GET "http://my-es:9200/_search" -H 'Content-Type: application/json' -d'
{
    "size": 0,
    "query": {
        "bool": {
            "must": [{ "match": { "state": "FAILURE"} }],
            "filter": {
                "bool": {
                    "must": [
                        { "range": { "@timestamp": { "gte": "now-24h", "lt": "now-6h" } } }
                    ]
                }
            }
        }
    },
    "aggs": {
        "env": {
            "terms": {"field": "env.keyword"},
            "aggs": {
                "val_sum": {
                    "sum": {
                        "field": "val",
                        "missing": 1
                    }
                }
            }
        }
    }
}' | jq .
{
  "took": 101,
  "timed_out": false,
  "_shards": {
    "total": 451,
    "successful": 451,
    "skipped": 401,
    "failed": 0
  },
  "hits": {
    "total": 256,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "env": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "staging",
          "doc_count": 164,
          "val_sum": {
            "value": 164
          }
        },
        {
          "key": "production",
          "doc_count": 92,
          "val_sum": {
            "value": 92
          }
        }
      ]
    }
  }
}

I thought I would get labels like env="production"; however, the labels weren't applied to the metrics. Is there anything I did wrong?

es_test_hits 0.0
# HELP es_test_took_milliseconds 
# TYPE es_test_took_milliseconds gauge
es_test_took_milliseconds 11.0
# HELP es_test_env_doc_count_error_upper_bound 
# TYPE es_test_env_doc_count_error_upper_bound gauge
es_test_env_doc_count_error_upper_bound 0.0
# HELP es_test_env_sum_other_doc_count 
# TYPE es_test_env_sum_other_doc_count gauge
es_test_env_sum_other_doc_count 0.0
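For comparison, when a terms aggregation does produce labelled series, each bucket's key normally becomes a label value on the metric. A minimal sketch of that flattening (function and metric names here are illustrative, not the exporter's actual internals):

```python
# Illustrative sketch: flatten a terms aggregation result into
# (metric_name, labels, value) tuples, with the bucket key as a label.

def flatten_agg(agg_name, agg_result, prefix="es_test"):
    """Turn one terms aggregation into labelled metric tuples."""
    metrics = []
    # Top-level scalars get no bucket label.
    for scalar in ("doc_count_error_upper_bound", "sum_other_doc_count"):
        if scalar in agg_result:
            metrics.append((f"{prefix}_{agg_name}_{scalar}", {}, agg_result[scalar]))
    # Each bucket key becomes a label value, e.g. env="staging".
    for bucket in agg_result.get("buckets", []):
        labels = {agg_name: bucket["key"]}
        metrics.append((f"{prefix}_{agg_name}_doc_count", labels, bucket["doc_count"]))
        # Sub-aggregations like val_sum contribute their own metrics.
        for sub_name, sub in bucket.items():
            if isinstance(sub, dict) and "value" in sub:
                metrics.append((f"{prefix}_{agg_name}_{sub_name}_value", labels, sub["value"]))
    return metrics

agg = {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
        {"key": "staging", "doc_count": 164, "val_sum": {"value": 164}},
        {"key": "production", "doc_count": 92, "val_sum": {"value": 92}},
    ],
}
for name, labels, value in flatten_agg("env", agg):
    print(name, labels, value)
```

Since the output above only shows the top-level scalars (`..._doc_count_error_upper_bound`, `..._sum_other_doc_count`) and no per-bucket series, it looks like the bucket flattening step never saw any buckets for that query run.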

ValueError: Duplicated timeseries in CollectorRegistry - when using top_hits aggregations

Hi @braedon

I'm trying to set up a query with aggregations, and I'm hitting this (periodic) exception:

[2019-03-25 13:56:29,145] root.ERROR MainThread Error while running scheduled job.
Traceback (most recent call last):
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 210, in scheduled_run
    func()
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 121, in run_query
    update_gauges(metrics)
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 78, in update_gauges
    gauge = Gauge(metric_name, '', label_keys)
  File "/usr/local/lib/python3.6/site-packages/prometheus_client/metrics.py", line 320, in __init__
    labelvalues=labelvalues,
  File "/usr/local/lib/python3.6/site-packages/prometheus_client/metrics.py", line 103, in __init__
    registry.register(self)
  File "/usr/local/lib/python3.6/site-packages/prometheus_client/registry.py", line 29, in register
    duplicates))
ValueError: Duplicated timeseries in CollectorRegistry: {'es_query_all_non_kubesystem_messages_by_containername_tops_hits_hits'}

First of all: I didn't check the code at all, nor did I attempt to debug it. I think the exception is harmless, as I'm getting all the other fields. What I think happens is that the result contains:

  "aggregations": {
    "by_namespace": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 45,
      "buckets": [
// ...
        {
          "key": "some-container-name",
          "doc_count": 24,
          "tops": {
            "hits": {
              "total": 24,
              "max_score": 1.4e-45,
              "hits": [
                {

the last block is repeated for each bucket, and since the metric name is derived from the aggregation name, the names collide. I'm not very experienced with Elasticsearch either, but this is the query I'm running:

    QueryJson = {
        "query": {
          "bool": {
            "must_not": { "match": {"kubernetes.namespace_name": "kube-system" }},
            "filter": {
              "bool": {
                "must": [
                  { "range": { "@timestamp": { "gte": "now-1m", "lt": "now" } } }
                ]
              }
            }
          }
        },
        "aggs": {
          "by_containername": {
            "terms": {
              "field": "kubernetes.container_name.keyword"
            },
            "aggs": {
              "tops": {
                "top_hits": {
                  "size": 10
                }
              }
            }
          }
        }
      }

... and the goal is to have the top 10 containers with the most logging lines. If I remove the "tops" section, then I get counters for all containers, which is a bit too much. I'm not sure how to fix this at the Elasticsearch level, and it's possible that the schema of the results has changed - or that people are not using aggregations at all? Do you think there's room for a fix? (I'm guessing the es-exporter will soon be on my result list for that query, with all the logging from the exceptions eheheh)

How to enrich Prometheus metrics with variable labels

Hi,

I am currently using Grafana with an ElasticSearch datasource to do variable queries based on my dashboard variables (e.g.: query for all or a specific node):

node:"$node.*" AND logfile:"docker" AND status:404

I wanted to switch to your prometheus-es-exported in order to unify everything in Prometheus without direct queries to other data source.

However, now I am unable to do variablized queries and I was asking myself if it is possible to somehow enrich the results with custom labels. Currently I only see two labels:

  • instance
  • job


Is it somehow possible to group those results based on custom labels such as:

test_404_access_logs_last_15_min_hits{node="node1",instance="i", job="j"}   5
test_404_access_logs_last_15_min_hits{node="node2",instance="i", job="j"}   4
test_404_access_logs_last_15_min_hits{node="node3",instance="i", job="j"}   6
test_404_access_logs_last_15_min_hits{node="node4",instance="i", job="j"}   8

This node would be a key from an elastic search query and the actual query definition would only count per unique node keys.

This way I could use those metrics in Grafana with variables on nodes and use a single dashboard to selectively show logs for all nodes or a very specific node.
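For what it's worth, per-node labels can usually be obtained with a terms aggregation, so each node key becomes its own bucket and hence its own labelled series. A sketch of such a query config (query name, index pattern, and field names are assumptions based on your example):

```ini
[query_404_access_logs_last_15_min]
QueryIntervalSecs = 60
QueryIndices = logstash-*
QueryJson = {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {"match": {"logfile": "docker"}},
                    {"match": {"status": 404}}
                ],
                "filter": {"range": {"@timestamp": {"gte": "now-15m"}}}
            }
        },
        "aggs": {
            "node": {"terms": {"field": "node.keyword"}}
        }
    }
```

Each bucket would then typically surface as its own series with a `node` label, e.g. `test_404_access_logs_last_15_min_node_doc_count{node="node1"}`.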

Nodes stats parser fails on AWS managed Elasticsearch

Hi,

I've run into errors in some queries. The errors are as follows (note that the response code was 200):

[2019-08-19 10:06:45,275] elasticsearch.INFO Thread-2 GET https://my-es-instance.ap-southeast-1.es.amazonaws.com:443/_nodes/stats [status:200 request:0.800s]
[2019-08-19 10:06:45,475] root.ERROR Thread-2 Error while fetching Nodes Stats.
Traceback (most recent call last):
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 168, in collect
    metrics = nodes_stats_parser.parse_response(response, self.metric_name_list)
  File "/usr/src/app/prometheus_es_exporter/nodes_stats_parser.py", line 77, in parse_response
    result.extend(parse_node(value, metric=metric, labels=OrderedDict({'node_id': [key]})))
  File "/usr/src/app/prometheus_es_exporter/nodes_stats_parser.py", line 66, in parse_node
    return parse_block(node, metric=metric, labels=labels)
  File "/usr/src/app/prometheus_es_exporter/nodes_stats_parser.py", line 47, in parse_block
    result.extend(parse_block(value, metric=metric + [key], labels=labels))
  File "/usr/src/app/prometheus_es_exporter/nodes_stats_parser.py", line 52, in parse_block
    bucket_name = n_value[bucket_name_key]
KeyError: 'path'

I use AWS managed ES version 6.3. I've just upgraded the es exporter to the latest version (0.5.2), but the issue still persists.

Please suggest.
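As a stopgap idea until it's fixed upstream, the parser could fall back to the list index when the expected naming key is missing from a stats entry. A hedged sketch of that approach (this is not the exporter's actual code, and the `data` → `path` mapping is an assumption):

```python
# Illustrative sketch: parse a list-valued nodes-stats block, naming each
# entry by a known key (e.g. 'path' for filesystem 'data' entries), but
# falling back to the list index when that key is absent - as it appears
# to be on AWS managed Elasticsearch - instead of raising KeyError.

SINGULAR_NAME_KEYS = {'data': 'path'}  # assumed mapping: list key -> name key

def parse_list_block(key, entries, metric):
    """Return (metric_path, labels, value) tuples for a list of stat dicts."""
    metrics = []
    name_key = SINGULAR_NAME_KEYS.get(key)
    for i, entry in enumerate(entries):
        # Use the naming key if present, otherwise the index as a fallback.
        bucket_name = entry.get(name_key, str(i)) if name_key else str(i)
        for field, value in entry.items():
            if isinstance(value, (int, float)):
                metrics.append((metric + [key, field], {key: bucket_name}, value))
    return metrics

# An entry without the expected 'path' key no longer crashes the parser.
entries = [{'total_in_bytes': 100, 'free_in_bytes': 40}]
print(parse_list_block('data', entries, ['fs']))
```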

Arbitrary labels for metrics

I'm using this exporter to alert via Prometheus when a query hit count is over a certain limit. When an alert is triggered, the content of the alert is fairly basic, only telling the user the name of the alert and not much else. What I would like to do is to include a link to Kibana with the search that triggered the alert. E.g. the alert would look something like

Alert: App error(s) seen in the last 5 minutes - critical
 Cluster:  https://my-cluster
 Graph: <link to Prometheus graph>
 Details:
    • alertname: AppErrors
    ...
    • query: <link to Kibana search using this query>

Is that something this exporter supports?
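The exporter itself only exposes numeric series; a link like that usually lives in the alert rule's annotations on the Prometheus side instead. A sketch of such a rule (the metric name and Kibana URL are assumptions):

```yaml
groups:
  - name: app-errors
    rules:
      - alert: AppErrors
        expr: app_errors_last_5_min_hits > 0
        labels:
          severity: critical
        annotations:
          summary: "App error(s) seen in the last 5 minutes"
          # Hypothetical Kibana deep link reproducing the exporter's query
          query: "https://kibana.example.com/app/discover#/?_a=(query:(language:lucene,query:'level:ERROR'))"
```

Alertmanager templates can then render the `query` annotation as a clickable link in the notification.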

Total field metric

Do you have an example of how to create a Prometheus metric from the total field count?

For example, under Management/Kibana/Index Patterns you can see the total number of fields. I would like to create a Prometheus metric so I can alert if there are 1000 fields or more.
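Assuming the exporter's indices-mappings metrics are enabled and the field-count metric carries an index label (an assumption), an alert expression along these lines could work:

```
es_indices_mappings_field_count{index=~"logstash-.*"} >= 1000
```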

es index storage size is inconsistent

I query the index sizes with `curl -X GET '_cat/indices?v&s=index'` and also via the exporter's `--es.indices` option, but the `elasticsearch_indices_store_size_bytes_total` values are not the same. Why is that?

ES 7+ compatibility

I've upgraded ES to 7.2 and noticed that mine metrics against ES failed. After some research I found that metric if failing with:

[2019-07-19 17:50:56,967] root.ERROR MainThread Error while running scheduled job.
Traceback (most recent call last):
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 210, in scheduled_run
    func()
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 121, in run_query
    update_gauges(metrics)
  File "/usr/src/app/prometheus_es_exporter/__init__.py", line 89, in update_gauges
    gauge.set(value)
  File "/usr/local/lib/python3.6/site-packages/prometheus_client/metrics.py", line 340, in set
    self._value.set(float(value))
TypeError: float() argument must be a string or a number, not 'dict'

I think the reason is that the hits output changed in ES7, which breaks things. So either the additional parameter rest_total_hits_as_int=true should be added to search query URLs, or proper ES7 support should be introduced, since the rest_total_hits_as_int parameter will be removed in ES8.

Please advise how this issue could be fixed
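Until the exporter handles this natively, the needed normalisation can be sketched as a small shim (illustrative code, not the exporter's actual fix):

```python
# Hedged sketch: normalise the ES7 `hits.total` object back to a plain
# integer, so gauge-updating code keeps receiving a number.

def normalise_hits_total(response):
    total = response.get('hits', {}).get('total', 0)
    if isinstance(total, dict):
        # ES7+ shape: {"value": 256, "relation": "eq"}
        return total['value']
    # ES6 and earlier: already a plain int
    return total

print(normalise_hits_total({'hits': {'total': 256}}))                               # ES6 response
print(normalise_hits_total({'hits': {'total': {'value': 256, 'relation': 'eq'}}}))  # ES7 response
```

Both calls print 256, so the same downstream gauge code works against either ES version.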

how can i get the result about "Prometheus gauge metrics"

Hi,

I run it as:

prometheus-es-exporter -c ./exporter.cfg -p 9200 -e 192.168.200.138:9200

and it's running :

[2017-08-24 11:29:54,125] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/_cluster/health?level=indices [status:200 request:0.198s]
[2017-08-24 11:29:54,440] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/_nodes/stats [status:200 request:0.313s]
[2017-08-24 11:29:54,496] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/_stats [status:200 request:0.048s]
[2017-08-24 11:29:54,499] root.INFO MainThread Starting server...
[2017-08-24 11:29:54,499] root.INFO MainThread Server started on port 9200
[2017-08-24 11:29:54,541] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/_all/_search [status:200 request:0.041s]
[2017-08-24 11:29:54,672] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/%3Cmegacorp%3E/_search [status:200 request:0.130s]
[2017-08-24 11:30:09,047] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/_all/_search [status:200 request:0.114s]
[2017-08-24 11:30:14,027] elasticsearch.INFO MainThread GET http://192.168.200.138:9200/%3Cmegacorp%3E/_search [status:200 request:0.095s]

But on what path is the result served? How can I get the resulting "Prometheus gauge metrics"?
Thanks

Labels

I would like to add pod level labels to exported series. Is there a way to accomplish this?

Thank you so much for sharing your work.
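One option that doesn't involve the exporter itself is to attach static labels to the scrape target in Prometheus's own configuration; a sketch (target address and label values are assumptions):

```yaml
scrape_configs:
  - job_name: es-exporter
    static_configs:
      - targets: ['es-exporter:8080']
        labels:
          pod: my-pod-name   # hypothetical label, applied to every scraped series
```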

No logging when `-c` given no value or if value is incorrect

$ docker run --rm --name es-exporter -v /tmp/query.cfg:/mnt/configs/exporter.cfg -p 9988:8080 braedon/prometheus-es-exporter:0.1.0 -e mon-es1.movio.co 
[2016-07-21 02:36:11,376] root.INFO MainThread Starting server...
[2016-07-21 02:36:11,376] root.INFO MainThread Server started on port 8080
[2016-07-21 02:36:11,377] root.INFO MainThread Shutting down

Support for client-certs?

Hi,
How can I provide my client cert / key to the application? I only see a parameter for a CA cert, but not for a client cert and its key.

thx

Support Elastic Scroll API

Hi,

I have a use case where I get thousands of results.

Currently I use a workaround with multiple queries, which is not ideal...

Is there a way to query Elasticsearch using the scroll API?
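The pagination loop itself is simple; a hedged sketch of scroll-style paging, shown against a stub client so it runs standalone (with the real elasticsearch-py client the equivalent calls are `es.search(..., scroll='1m')` and `es.scroll(...)`, and `helpers.scan` wraps the same pattern):

```python
# Illustrative scroll-pagination loop: follow the scroll cursor until an
# empty page signals exhaustion. The StubClient stands in for a real
# Elasticsearch client so this sketch is runnable on its own.

def scroll_all(client, index, query, page_size=1000):
    """Yield every hit by following the scroll cursor until exhausted."""
    page = client.search(index=index, body=query, scroll='1m', size=page_size)
    while page['hits']['hits']:
        yield from page['hits']['hits']
        page = client.scroll(scroll_id=page['_scroll_id'], scroll='1m')

class StubClient:
    """Serves three canned pages of fake hits to exercise the loop."""
    def __init__(self):
        self.pages = [[1, 2], [3], []]
    def search(self, **kwargs):
        return {'_scroll_id': 'sid', 'hits': {'hits': self.pages.pop(0)}}
    def scroll(self, **kwargs):
        return {'_scroll_id': 'sid', 'hits': {'hits': self.pages.pop(0)}}

hits = list(scroll_all(StubClient(), 'my-index', {'query': {'match_all': {}}}))
print(hits)  # all hits across all pages
```

With a real client you would also want to clear the scroll context when done; for aggregation-only queries, though, scrolling is unnecessary since aggregations summarise all matching documents regardless of page size.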

Can I exclude the `python_*` and `es_*` metrics somehow?

Hello! Many thanks for making this project!

I was wondering if there is a way to configure this to exclude the python_* and es_* metrics, so as to only have metrics based on my "own" queries. I'm hoping this will improve the response time of the entire metrics page.
