dabz / ccloudexporter
Prometheus exporter for the Confluent Cloud Metrics API
Home Page: https://docs.confluent.io/current/cloud/metrics-api.html
In order to incorporate this as part of a production monitoring solution, I'm looking for a "health" endpoint that would fail if this app is down. Is there such an endpoint, other than the metrics endpoint, which seems to hit the cluster every time?
Thanks a lot for this project.
I noticed that the admin client is failing to connect. It appears unused (and should be unnecessary) so I recommend removing references to it: https://github.com/Dabz/ccloudexporter/blob/master/cmd/ccloudexporter/ccloudexporter.go#L30-L40
Hello,
I've followed the exact instructions for both docker and Go.
I'm getting the following error message, even though the API key is valid:
{
"Endpoint": "https://api.telemetry.confluent.cloud/v1/metrics/cloud/descriptors",
"StatusCode": 403,
"body": "eyJlcnJvciI6eyJjb2RlIjo0MDMsIm1lc3NhZ2UiOiJpbnZhbGlkIEFQSSBrZXkifX0K",
"level": "fatal",
"msg": "Received status code 403 instead of 200 for GET on https://api.telemetry.confluent.cloud/v1/metrics/cloud/descriptors. \n\n{\"error\":{\"code\":403,\"message\":\"invalid API key\"}}\n\n\n",
"time": "2021-01-14T00:20:31Z"
}
Would appreciate any help with this.
The Prometheus format is available and works really well. To elevate ccloudexporter to the de facto standard for gathering CCloud metrics, instead of relying on multiple different solutions, I want to propose adding a Kafka sink to the code base.
The defaults don't need to change at all, but if we can provide a Kafka sink, it would give customers the option to stream the data to Kafka. For customers who already stream data from Kafka to aggregation platforms like Splunk or Elasticsearch via a sink connector, this would mean just adding a topic name from this component to the sink connector config in order to stream the API data from Kafka as well.
P.S.: I do understand that keeping the data of the system being monitored on the system itself is an anti-pattern, but if we can give a choice, a lot of customers might like it.
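A rough sketch of what such a Kafka sink could look like, assuming the confluent-kafka-go client; the DataPoint and KafkaSink types and all other names here are hypothetical, not part of the current code base.

package sink

import (
    "encoding/json"
    "time"

    "github.com/confluentinc/confluent-kafka-go/kafka"
)

// DataPoint is a hypothetical representation of a single metric sample.
type DataPoint struct {
    Metric    string            `json:"metric"`
    Value     float64           `json:"value"`
    Labels    map[string]string `json:"labels"`
    Timestamp time.Time         `json:"timestamp"`
}

// KafkaSink publishes data points to a Kafka topic, in addition to (or
// instead of) exposing them on the Prometheus endpoint.
type KafkaSink struct {
    producer *kafka.Producer
    topic    string
}

// NewKafkaSink creates a producer pointed at the given cluster and topic.
func NewKafkaSink(bootstrapServers, topic string) (*KafkaSink, error) {
    p, err := kafka.NewProducer(&kafka.ConfigMap{"bootstrap.servers": bootstrapServers})
    if err != nil {
        return nil, err
    }
    return &KafkaSink{producer: p, topic: topic}, nil
}

// Send serializes a data point as JSON and produces it asynchronously.
func (s *KafkaSink) Send(dp DataPoint) error {
    payload, err := json.Marshal(dp)
    if err != nil {
        return err
    }
    return s.producer.Produce(&kafka.Message{
        TopicPartition: kafka.TopicPartition{Topic: &s.topic, Partition: kafka.PartitionAny},
        Value:          payload,
    }, nil)
}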
Hello.
Is it possible to pass a configuration with other metrics, or to change the metrics we export from Confluent, when using Docker or Kubernetes?
Thank you in advance.
I started using the latest image and I am getting 403 instead of 200 when posting to https://api.telemetry.confluent.cloud//v1/metrics/cloud/query, and I am not getting all the metrics per topic:
Received status code 403 instead of 200
I never got the 403 when using the Dockerfile at tree 296442c (but the size of the image is huge).
Recommend using the 'attributes' endpoint to get the topic list to avoid any future dependency on the admin client: https://github.com/Dabz/ccloudexporter/blob/master/cmd/internal/scrapper/scrapper.go#L38-L54
https://api.telemetry.confluent.cloud/v1/metrics/{dataset}/attributes
Hi all,
I have followed the https://docs.microsoft.com/en-us/azure/azure-monitor/containers/container-insights-prometheus-integration
documentation, and currently we have a cluster-wide omsagent replicaset (singleton) scraping the ccloudexporter deployment in AKS via a Kubernetes service.
The kubectl logs <ccloudexporter> pod says "Listening on http://:2112/metrics\n", and as an example, one of the lines in the kubectl logs <omsagent rs> pod says:
> prometheus, address=x.x.x.x, scrapeUrl=http://ccloudexportersvc.ccloudexporternamespace.svc.cluster.local:2112/metrics go_memstats_heap_objects=2577
However, in the Azure portal > Kubernetes service > one of the clusters > Logs, I've run queries such as
InsightsMetrics
| where Namespace contains "prometheus"
| where Computer contains "<hostname/node of the omsagent rs>"
but the query returns no result.
When running the ccloud-exporter container within Kubernetes, the CCloud Metrics API rate limit of 50 requests / minute is easily hit, even when the pod has livenessProbe and readinessProbe disabled. Example of such an error:
{
"error": "Received status code 429 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ()",
"level": "error",
"msg": "Query did not succeed",
"optimizedQuery": {
"aggregations": [
{
"agg": "SUM",
"metric": "io.confluent.kafka.server/partition_count"
}
],
"filter": {
"op": "AND",
"filters": [
{
"op": "OR",
"filters": [
{
"field": "resource.kafka.id",
"op": "EQ",
"value": "<redacted>"
}
]
}
]
},
"granularity": "PT1M",
"group_by": [],
"intervals": [
"2021-09-06T17:09:00Z/PT1M"
],
"limit": 1000
},
"response": {
"data": null
},
"time": "2021-09-06T17:11:30Z"
}
Once the API rate limit is triggered, it tends to self-sustain in an infinite loop, probably because ccloud-exporter retries in quick succession without giving enough wait time between metric collections.
Add a new config option, secondsBetweenRetry, as the waiting time between retries when an access to the CCloud Metrics API has failed. Ideally this pause should apply to each individual API request, and not to the batch of requests (like 9 at a time as in config.simple.yaml).
Even better, this pause duration should follow an "exponential backoff": for example, start at 5 seconds, then double at each new retry, capped at, let's say, 5 minutes.
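A minimal sketch of such a retry loop, assuming the suggested 5-second initial pause doubling up to a 5-minute cap; retryWithBackoff and its parameters are hypothetical names, not existing ccloudexporter configuration.

package main

import (
    "fmt"
    "time"
)

// retryWithBackoff retries the given call, sleeping between attempts and
// doubling the pause each time until it reaches maxDelay.
func retryWithBackoff(do func() error, initialDelay, maxDelay time.Duration, maxRetries int) error {
    delay := initialDelay
    var err error
    for attempt := 1; attempt <= maxRetries; attempt++ {
        if err = do(); err == nil {
            return nil
        }
        fmt.Printf("attempt %d failed (%v), retrying in %s\n", attempt, err, delay)
        time.Sleep(delay)
        delay *= 2
        if delay > maxDelay {
            delay = maxDelay
        }
    }
    return err
}

func main() {
    calls := 0
    _ = retryWithBackoff(func() error {
        calls++
        if calls < 3 {
            return fmt.Errorf("simulated 429 from the Metrics API")
        }
        return nil
    }, 5*time.Second, 5*time.Minute, 5)
}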
I am not sure if this feature already exists. This may be another major iteration, or may need a different app altogether.
The Metrics API recently saw a downtime of a few hours.
The idea here is to have something like a simple API call to back-fill those few hours of data, with an overridable endpoint + metric format (e.g. Influx or Datadog), all while maintaining the same metric names as exposed by Prometheus in real time.
When this issue was raised with Confluent support, they simply asked us to implement a retry mechanism, as recommended in the documentation. Is there a way this can be implemented in ccloudexporter?
{
"Endpoint": "https://api.telemetry.confluent.cloud//v1/metrics/cloud/query",
"StatusCode": 503,
"body": "upstream connect error or disconnect/reset before headers. reset reason: overflow",
"level": "error",
"msg": "Received invalid response",
"time": "2021-04-08T05:53:22Z"
}
{
"Endpoint": "https://api.telemetry.confluent.cloud//v1/metrics/cloud/query",
"StatusCode": 500,
"body": "{\"errors\":[{\"status\":\"500\",\"detail\":\"There was an error processing your request. It has been logged (ID xxxxxxxxxxxxxxx).\"}]}",
"level": "error",
"msg": "Received invalid response",
"time": "2021-04-08T05:59:30Z"
}
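For illustration only (this is not an existing ccloudexporter feature), back-filling a missed window would essentially mean issuing the usual query with an explicit past interval. In the sketch below the metric, label, and cluster values are placeholders.

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "time"
)

func main() {
    // Window to back-fill, e.g. the hours missed during the outage above.
    from := time.Date(2021, 4, 8, 5, 0, 0, 0, time.UTC)
    to := time.Date(2021, 4, 8, 8, 0, 0, 0, time.UTC)

    query := map[string]interface{}{
        "aggregations": []map[string]string{
            {"agg": "SUM", "metric": "io.confluent.kafka.server/received_bytes"},
        },
        "filter": map[string]string{
            "field": "resource.kafka.id", "op": "EQ", "value": "lkc-xxxxx",
        },
        "granularity": "PT1M",
        "intervals":   []string{fmt.Sprintf("%s/%s", from.Format(time.RFC3339), to.Format(time.RFC3339))},
        "group_by":    []string{"metric.topic"},
        "limit":       1000,
    }

    body, _ := json.Marshal(query)
    req, _ := http.NewRequest("POST", "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query", bytes.NewReader(body))
    req.SetBasicAuth(os.Getenv("CCLOUD_API_KEY"), os.Getenv("CCLOUD_API_SECRET"))
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    // The response would then be re-formatted and pushed to the target
    // system (Influx, Datadog, ...) using the same metric names.
    fmt.Println("status:", resp.Status)
}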
We are running this exporter and the Kubernetes pod keeps crashing and restarting. We aren't sure what's happening since no logging is occurring; we just know that the probe on /metrics fails and Kubernetes restarts the pod. Is there a way to get additional logging to figure out what's going on?
TYPE confluent_kafka_server_request_bytes gauge
All queries are sent synchronously; as the number of metrics increases (and thus the number of queries to send), the scrape duration increases.
The exporter should execute the queries asynchronously in order to reduce the scrape duration (and avoid reaching the scrape timeout).
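A minimal sketch of the asynchronous approach using goroutines and a WaitGroup; runQueriesAsync is a hypothetical helper, not the exporter's actual code.

package main

import (
    "fmt"
    "sync"
    "time"
)

// runQueriesAsync fires all queries concurrently, so the scrape duration is
// bounded by the slowest query rather than by the sum of all of them.
func runQueriesAsync(queries []string, run func(string) error) []error {
    var wg sync.WaitGroup
    errs := make([]error, len(queries))
    for i, q := range queries {
        wg.Add(1)
        go func(i int, q string) {
            defer wg.Done()
            errs[i] = run(q)
        }(i, q)
    }
    wg.Wait()
    return errs
}

func main() {
    queries := []string{"received_bytes", "sent_bytes", "retained_bytes"}
    start := time.Now()
    runQueriesAsync(queries, func(q string) error {
        time.Sleep(500 * time.Millisecond) // stand-in for one Metrics API call
        return nil
    })
    fmt.Println("all queries finished in", time.Since(start))
}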
It's easy to forget you have a ccloudexporter
instance running against a cluster that has been torn down. This results in spamming 403 errors until someone notices. Since 403 errors are usually permanent failures that require user intervention to fix, consider crashing the ccloudexporter
to fail fast if one is returned.
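A minimal sketch of the fail-fast idea; checkAuthFailure is a hypothetical helper, not existing exporter code.

package main

import (
    "log"
    "net/http"
)

// checkAuthFailure exits the process on a 403: a forbidden response usually
// means a bad API key or a deleted cluster, which retrying will never fix,
// so crashing is more visible than logging the same error forever.
func checkAuthFailure(resp *http.Response, endpoint string) {
    if resp.StatusCode == http.StatusForbidden {
        log.Fatalf("received 403 for %s: credentials or cluster are invalid, exiting", endpoint)
    }
}

func main() {
    // Example usage with a fake response; in the exporter this would be
    // called right after each Metrics API request.
    checkAuthFailure(&http.Response{StatusCode: http.StatusOK}, "https://api.telemetry.confluent.cloud/v2/metrics/cloud/query")
}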
It seems that grouping per partition generates a lot of data and makes it harder to use. On top of that, it seems that not all metrics will be able to be grouped per partition in the future.
We should remove the grouping per partition; maybe we could reintroduce it later on, but with more restrictions (e.g. only when a single topic is specified).
It seems that, if you have multiple clusters configured in the configuration file, the data points are no longer grouped by cluster.
Cause: the exporter relies on the "labels" field of the descriptor endpoint to find out which labels can be used to "group by" the metrics (https://github.com/Dabz/ccloudexporter/blob/master/cmd/internal/collector/collector.go#L144-L152). It seems that the Metrics API no longer exposes the cluster_id in the list of labels.
Workaround: have multiple rules in your configuration file, or multiple instances of the exporter.
Hello,
While using ccloudexporter to access the Metrics API v1, we occasionally see a 429 response code from the Metrics API. When we reached out to Confluent, they pointed us to this document - https://api.telemetry.confluent.cloud/docs#section/Object-Model/Datasets
Basically, there is a limit of 50 requests/minute per IP.
Question: Does the exporter respect this limit for V1 API as of now? If not, are there plans to support it?
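One way an exporter could respect such a limit client-side is a token-bucket throttle; below is a sketch using golang.org/x/time/rate, not something ccloudexporter is known to do today.

package main

import (
    "context"
    "fmt"
    "time"

    "golang.org/x/time/rate"
)

func main() {
    // Allow at most 50 requests per minute, matching the documented per-IP limit.
    limiter := rate.NewLimiter(rate.Every(time.Minute/50), 1)
    for i := 0; i < 3; i++ {
        if err := limiter.Wait(context.Background()); err != nil {
            panic(err)
        }
        fmt.Printf("query %d allowed at %s\n", i+1, time.Now().Format(time.RFC3339))
        // ... issue the Metrics API request here ...
    }
}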
Hi All,
I am trying to run the exporter using the Docker command below to extract metrics from our Confluent Cloud setup.
docker run \
-e CCLOUD_API_KEY=$CCLOUD_API_KEY \
-e CCLOUD_API_SECRET=$CCLOUD_API_SECRET \
-e CCLOUD_CLUSTER=lkc-abc123 \
-p 2112:2112 \
dabz/ccloudexporter:latest
But I am getting the following error:
{
"error": "Get \"https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources\": x509: certificate signed by unknown authority",
"level": "fatal",
"msg": "HTTP query for the descriptor endpoint failed",
"time": "2021-12-08T08:42:53Z"
}
Is it because we enabled the X.509 certificate at Confluent Cloud? Does anyone know how to solve it? Appreciate your help. Thanks.
Hi,
We have connectors running in dev clusters and more than one dedicated cluster in our environments. We are able to see the metrics of the Kafka cluster, which have the kafka.id dimension, whereas the connector metrics don't have any dimension (kafka.id). Would it be possible to add the cluster for the connectors, so that we can find which connector is running associated with which kafka.id?
I don't see a kafka.id label when I query the Metrics API manually.
thanks
Niranjan
Converted the ccloudexporter Kubernetes files into a Helm chart and am running into a timeout issue.
The deployment has these env vars set:
env:
  - name: CCLOUD_API_KEY
    value: "vault:secret/grafana/kafka/ccloud#CCLOUD_API_KEY"
  - name: CCLOUD_API_SECRET
    value: "vault:secret/grafana/kafka/ccloud#CCLOUD_API_SECRET"
  - name: CCLOUD_CLUSTER
    value: {{ .Values.cluster }}
Seeing:
kubectl logs -n grafana ccloud-exporter-deployment-cdcbbbb67-wq9hr -f
{
"error": "Get \"https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources\": dial tcp 52.38.184.52:443: i/o timeout",
"level": "fatal",
"msg": "HTTP query for the descriptor endpoint failed",
"time": "2021-08-31T18:18:57Z"
}
which tells me the env vars (api_key|secret) are valid but the request is timing out.
Did a little test with a test pod:
› cat test.yaml ☠️
apiVersion: v1
kind: Pod
metadata:
  name: test-pod-name
  namespace: grafana
spec:
  containers:
    - name: test-pod-name
      env:
        - name: CCLOUD_API_KEY
          value: vault:secret/grafana/kafka/ccloud#CCLOUD_API_KEY
        - name: CCLOUD_API_SECRET
          value: vault:secret/grafana/kafka/ccloud#CCLOUD_API_SECRET
      command: ["/bin/bash", "-c"]
      args:
        - curl -u $CCLOUD_API_KEY:$CCLOUD_API_SECRET https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources\?resource_type\=kafka
and I see:
› kubectl logs -n grafana test-pod-name -f
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 590 100 590 0 0 2 0 0:04:55 0:03:35 0:01:20 131
{"data":[{"type":"kafka","description":"A Kafka cluster","labels":[{"description":"ID of the Kafka cluster","key":"kafka.id"}]},{"type":"connector","description":"A Kafka Connector","labels":[{"description":"ID of the connector","key":"connector.id"}]},{"type":"ksql","description":"A ksqlDB application","labels":[{"description":"ID of the ksqlDB application","key":"ksql.id"}]},{"type":"schema_registry","description":"A schema registry","labels":[{"description":"ID of the schema registry","key":"schema_registry.id"}]}],"meta":{"pagination":{"page_size":100,"total_size":4}},"links":{}}%
and sometimes:
› kubectl logs -n grafana test-pod-name -f
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:04:17 --:--:-- 0
curl: (28) Failed to connect to api.telemetry.confluent.cloud port 443: Connection timed out
Looks like it's taking too long and ends up timing out at times.
Any idea what could cause this in EKS?
As of 2021-09-02, ccloud-exporter exposes only one endpoint, localhost:2112/metrics. When an HTTP request is made on this /metrics endpoint, ccloud-exporter makes outgoing requests to the Confluent Cloud Metrics API, which is the normal and expected behaviour.
In the context of Kubernetes, ccloud-exporter runs within a pod with a livenessProbe and a readinessProbe. As /metrics is the only endpoint exposed by ccloud-exporter, we might be tempted to use this endpoint to probe the readiness status of the ccloud-exporter container.
As a result, each time the /metrics endpoint is probed (and the probe frequency is high, every 5 seconds in this example), the probe request triggers a collection of requests to the Confluent Cloud Metrics API. The quick repeats of probing on the /metrics endpoint then exhaust the CCloud Metrics API rate limit of 50 requests / minute.
{
"Endpoint": "https://api.telemetry.confluent.cloud//v2/metrics/cloud/query",
"StatusCode": 429,
"body": "",
"level": "error",
"msg": "Received invalid response",
"time": "2021-09-02T14:36:40Z"
}
{
"error": "Received status code 429 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ()",
"level": "error",
"msg": "Query did not succeed",
... etc...
}
In the case of this example, the API rate limit error (status 429) occurs within 15 seconds. Then ccloud-exporter is stuck in an infinite loop of "StatusCode": 429, because Kubernetes will endlessly probe the /metrics endpoint to check the health of the pod.
Add a separate endpoint for self health-check, for example localhost:2113/selfcheck, which returns OK if ccloud-exporter is in good shape. This helps Kubernetes manage the life cycle of the container, for example to restart it if it is stuck in a non-functional state.
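A minimal sketch of such a self-check endpoint; the port and /selfcheck path are only examples, as above.

package main

import (
    "fmt"
    "log"
    "net/http"
)

// The self-check handler reports on internal state only and never calls the
// Metrics API, so Kubernetes probes do not consume the API rate limit.
func main() {
    http.HandleFunc("/selfcheck", func(w http.ResponseWriter, r *http.Request) {
        // In a real implementation this would check internal state,
        // e.g. whether the last collection succeeded.
        w.WriteHeader(http.StatusOK)
        fmt.Fprintln(w, "OK")
    })
    log.Fatal(http.ListenAndServe(":2113", nil))
}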
To reproduce: uncomment the livenessProbe and readinessProbe sections in the manifest below, set the CCLOUD_... environment variables, and deploy it on your Kubernetes cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ccloud-exporter
  namespace: monitoring
  labels:
    app: ccloud-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ccloud-exporter
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: ccloud-exporter
    spec:
      containers:
        - name: ccloud-exporter
          image: dabz/ccloudexporter:latest
          imagePullPolicy: IfNotPresent
          env:
            - name: CCLOUD_API_KEY
              value: CloudAPIKey?????
            - name: CCLOUD_API_SECRET
              value: CloudAPISecret?????
            - name: CCLOUD_CLUSTER
              value: lkc-?????
          ports:
            - name: metrics
              containerPort: 2112
              protocol: TCP
          # livenessProbe:
          #   httpGet:
          #     path: /metrics
          #     port: metrics
          #     scheme: HTTP
          #   initialDelaySeconds: 30
          #   timeoutSeconds: 30
          #   periodSeconds: 15
          #   successThreshold: 1
          #   failureThreshold: 3
          # readinessProbe:
          #   httpGet:
          #     path: /metrics
          #     port: metrics
          #     scheme: HTTP
          #   initialDelaySeconds: 30
          #   timeoutSeconds: 30
          #   periodSeconds: 5
          #   successThreshold: 1
          #   failureThreshold: 3
          resources:
            requests:
              cpu: "250m"
              memory: "128Mi"
            limits:
              cpu: "500m"
              memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: ccloud-exporter-service
  namespace: monitoring
  labels:
    app: ccloud-exporter
spec:
  ports:
    - name: metrics
      protocol: TCP
      port: 2112
      targetPort: 2112
  selector:
    app: ccloud-exporter
The V2 API added the resource io.confluent.kafka.schema_registry and an initial metric, io.confluent.kafka.schema_registry/schema_count. The exporter could collect this metric.
Sample query:
{
"aggregations": [
{
"agg": "SUM",
"metric": "io.confluent.kafka.schema_registry/schema_count"
}
],
"filter": {
"field": "resource.schema_registry.id",
"op": "EQ",
"value": "lsrc-xxxxx"
},
"granularity": "PT1H",
"intervals": [
"2021-02-23T11:00:00+11:00/P0Y0M0DT1H0M0S"
],
"group_by": [
"resource.schema_registry.id"
]
}
Example response/metric:
{
"data": [
{
"timestamp": "2021-02-23T00:00:00Z",
"value": 9.0,
"resource.schema_registry.id": "lsrc-rw6m7"
}
]
}
Please add a central LICENSE document declaring the whole repository under the MIT license.
For easier integration with k8s secrets?
Thanks for this project!
Hi,
I created a config.yml file based on the default configuration on the README.md page, using:
rules:
  - clusters:
      - $CCLOUD_CLUSTER
On metrics fetch, I see errors such as:
Received status code 403 instead of 200 for POST on https://api.telemetry.confluent.cloud//v2/metrics/cloud/query ({\"errors\":[{\"status\":\"403\",\"detail\":\"Query must filter by at least one of your authorized resources
Running with docker-compose, I exec a shell in the ccloud_exporter container and display the environment variables:
$ env
...
CCLOUD_CLUSTER=lkc-xxxx1
...
When I set the real value (lkc-xxxx1) instead of the environment variable $CCLOUD_CLUSTER, metrics are correctly fetched for the cluster.
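One possible fix on the exporter side, sketched under the assumption that the configuration file is read as raw text before parsing: expand environment variables with os.ExpandEnv so that $CCLOUD_CLUSTER resolves to the real cluster id. The loadConfig helper below is hypothetical.

package main

import (
    "fmt"
    "os"
)

// loadConfig reads the raw config file and expands $VAR references, so a
// value like `- $CCLOUD_CLUSTER` becomes the actual lkc-... id before the
// YAML is parsed.
func loadConfig(path string) (string, error) {
    raw, err := os.ReadFile(path)
    if err != nil {
        return "", err
    }
    return os.ExpandEnv(string(raw)), nil
}

func main() {
    os.Setenv("CCLOUD_CLUSTER", "lkc-xxxx1")
    expanded, err := loadConfig("config.yml")
    if err != nil {
        fmt.Println("could not read config:", err)
        return
    }
    fmt.Println(expanded)
}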
It seems that you have more control over the timestamp of the metrics if you implement a custom collector. This could be useful in the case of Confluent Cloud as the latest data point might not be accurate and we might need to update old data points.
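A minimal sketch of such a custom collector, using client_golang's prometheus.NewMetricWithTimestamp to attach the timestamp returned by the Metrics API instead of the scrape time; the metric name and values below are placeholders.

package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// ccloudCollector emits one gauge whose sample carries an explicit timestamp
// (e.g. the timestamp of the Metrics API data point) rather than "now".
type ccloudCollector struct {
    desc *prometheus.Desc
}

func (c *ccloudCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.desc
}

func (c *ccloudCollector) Collect(ch chan<- prometheus.Metric) {
    // In the real exporter, value and timestamp would come from the Metrics API response.
    value := 42.0
    ts := time.Now().Add(-2 * time.Minute)
    m := prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, value, "lkc-xxxxx")
    ch <- prometheus.NewMetricWithTimestamp(ts, m)
}

func main() {
    collector := &ccloudCollector{
        desc: prometheus.NewDesc("confluent_kafka_server_received_bytes",
            "Example metric with an explicit timestamp", []string{"cluster_id"}, nil),
    }
    registry := prometheus.NewRegistry()
    registry.MustRegister(collector)
    http.Handle("/metrics", promhttp.HandlerFor(registry, promhttp.HandlerOpts{}))
    log.Fatal(http.ListenAndServe(":2112", nil))
}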
Hi,
This kind of error happened for the active_connection_count metric.
Received status code 400 instead of 200 for POST on https://api.telemetry.confluent.cloud/v1/metrics/cloud/query with {"aggregations":[{"agg":"SUM","metric":"io.confluent.kafka.server/active_connection_count"}],"filter":{"op":"AND","filters":[{"field":"metric.label.cluster_id","op":"EQ","value":"lkc-aaaaa"}]},"granularity":"PT1M","group_by":["metric.label.topic"],"intervals":["2020-03-17T09:38:25+01:00/2020-03-17T09:39:25+01:00"],"limit":1000}
According to Confluent Cloud support, you can remove "group_by":["metric.label.topic"]
While running 'docker-compose up -d' I am getting the error below.
I installed Docker and docker-compose, and added my user to the docker group.
Traceback (most recent call last):
File "urllib3/connectionpool.py", line 677, in urlopen
File "urllib3/connectionpool.py", line 392, in _make_request
File "http/client.py", line 1277, in request
File "http/client.py", line 1323, in _send_request
File "http/client.py", line 1272, in endheaders
File "http/client.py", line 1032, in _send_output
File "http/client.py", line 972, in send
File "docker/transport/unixconn.py", line 43, in connect
PermissionError: [Errno 13] Permission denied
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "requests/adapters.py", line 449, in send
File "urllib3/connectionpool.py", line 727, in urlopen
File "urllib3/util/retry.py", line 410, in increment
File "urllib3/packages/six.py", line 734, in reraise
File "urllib3/connectionpool.py", line 677, in urlopen
File "urllib3/connectionpool.py", line 392, in _make_request
File "http/client.py", line 1277, in request
File "http/client.py", line 1323, in _send_request
File "http/client.py", line 1272, in endheaders
File "http/client.py", line 1032, in _send_output
File "http/client.py", line 972, in send
File "docker/transport/unixconn.py", line 43, in connect
urllib3.exceptions.ProtocolError: ('Connection aborted.', PermissionError(13, 'Permission denied'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "docker/api/client.py", line 214, in _retrieve_server_version
File "docker/api/daemon.py", line 181, in version
File "docker/utils/decorators.py", line 46, in inner
File "docker/api/client.py", line 237, in _get
File "requests/sessions.py", line 543, in get
File "requests/sessions.py", line 530, in request
File "requests/sessions.py", line 643, in send
File "requests/adapters.py", line 498, in send
requests.exceptions.ConnectionError: ('Connection aborted.', PermissionError(13, 'Permission denied'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "docker-compose", line 3, in
File "compose/cli/main.py", line 80, in main
File "compose/cli/main.py", line 189, in perform_command
File "compose/cli/command.py", line 70, in project_from_options
File "compose/cli/command.py", line 153, in get_project
File "compose/cli/docker_client.py", line 43, in get_client
File "compose/cli/docker_client.py", line 170, in docker_client
File "docker/api/client.py", line 197, in init
File "docker/api/client.py", line 222, in _retrieve_server_version
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))
[3257] Failed to execute script docker-compose
I'm seeing pretty regular gaps in the data, probably due to interval misses. I noticed here we're scraping between now and the previous minute:
https://github.com/Dabz/ccloudexporter/blob/master/cmd/internal/scrapper/query.go#L67-L76
If Prometheus easily supports updating values, I'd recommend just widening the query window to 5 minutes so you get data points. Updates on the fly would also help with values that are still stabilizing (you'll notice that we expose data right away rather than waiting for late arrivals, then update on the fly).
This looks like a substitution error, but it's very misleading, as it makes you think that there is something wrong with the host name:
{
"level": "info",
"msg": "Listening on http://:2112/metrics\n",
"time": "2021-07-22T18:34:31Z"
}
Recommend a find-and-replace of scrapper -> scraper to fix the typo, since it's used in many places.
It would be useful to be able to support exporting for multiple clusters. The Metrics API can do this by using an OR:
"filter": { "filters": [ { "field": "metric.label.cluster_id", "op": "EQ", "value": "lkc-XXXX1" }, { "field": "metric.label.cluster_id", "op": "EQ", "value": "lkc-XXXX2" } ], "op": "OR" },
Results from the query can also be grouped by both the cluster id and the topic name to avoid collisions of topic names across clusters:
"group_by": [ "metric.label.cluster_id", "metric.label.topic" ],
Finally, you can also use the new GROUPED format by passing in the format parameter to make the grouping more explicit, if that is helpful.
Right now the list of metrics is hardcoded:
https://github.com/Dabz/ccloudexporter/blob/master/cmd/internal/scrapper/query.go#L71-L73
I recommend using the descriptors endpoint https://api.telemetry.confluent.cloud/v1/metrics/{dataset}/descriptors
An example on getting the available metrics here:
https://docs.confluent.io/current/cloud/metrics-api.html#list-the-available-metrics
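A sketch of discovering the available metrics from the descriptors endpoint at startup instead of hardcoding them; the shape of the response decoded below is an assumption based on the documentation linked above and may need adjusting.

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "os"
)

// descriptorResponse keeps only the field we need from the descriptors reply.
type descriptorResponse struct {
    Data []struct {
        Name string `json:"name"`
    } `json:"data"`
}

func main() {
    req, err := http.NewRequest("GET", "https://api.telemetry.confluent.cloud/v1/metrics/cloud/descriptors", nil)
    if err != nil {
        panic(err)
    }
    req.SetBasicAuth(os.Getenv("CCLOUD_API_KEY"), os.Getenv("CCLOUD_API_SECRET"))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var descriptors descriptorResponse
    if err := json.NewDecoder(resp.Body).Decode(&descriptors); err != nil {
        panic(err)
    }
    for _, metric := range descriptors.Data {
        fmt.Println(metric.Name) // build the query list from this instead of a hardcoded slice
    }
}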
Username/password authentication to the Metrics API was only supported in the preview phase. For GA release, only API key/secret authentication is officially supported.
Rename the CCLOUD_USER and CCLOUD_PASSWORD environment variables to CCLOUD_APIKEY and CCLOUD_APISECRET, respectively.
The number of metrics available in Confluent Cloud Metrics API is increasing every day. As we have more and more data, exposing all of them does not make sense. Instead, we should:
Currently, ccloudexporter allows filtering based on the topics specified.
It should also allow an exclusion list, where the listed topics are excluded from the Prometheus metrics endpoint.
When the query is filtering to a single metric.label.cluster_id:
"filter" : {
"op": "EQ",
"field": "metric.label.cluster_id",
"value": "lkc-12345"
}
there is no need to also specify a group_by, since we know that all results have the same cluster_id (lkc-12345 in this example).
"group_by": [
"metric.label.cluster_id"
]
This superfluous group_by causes the query to be more expensive on the backend. We can explore optimizing this out on the backend, but it is difficult to do since the filter can contain arbitrarily complex boolean expressions.
Some of the topics on CCloud are listed with production and consumption as null; these are not fetched. How can we fetch these as well?
The API now also exposes numbers for successful authentications.
Currently, the Metric API does not expose the consumer lag. But we could retrieve it in multiple ways, e.g. the exporter could rely on the Admin API to expose it.
The User-Agent should be specified, with the format ccloudexporter/<commit version>, in order to help the Confluent Cloud team identify the origin of requests and have a way to easily trace the source of unusual workloads.
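A minimal sketch of setting such a User-Agent on every request; the version variable and its build-time injection are assumptions, not the exporter's current behaviour.

package main

import (
    "fmt"
    "net/http"
)

// version would typically be injected at build time, e.g. with
// -ldflags "-X main.version=<commit>".
var version = "dev"

// newMetricsAPIRequest builds a request carrying the ccloudexporter/<version>
// User-Agent so Confluent can trace the origin of the traffic.
func newMetricsAPIRequest(method, url string) (*http.Request, error) {
    req, err := http.NewRequest(method, url, nil)
    if err != nil {
        return nil, err
    }
    req.Header.Set("User-Agent", fmt.Sprintf("ccloudexporter/%s", version))
    return req, nil
}

func main() {
    req, _ := newMetricsAPIRequest("GET", "https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources")
    fmt.Println(req.Header.Get("User-Agent"))
}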
Hi
Using this code, I tried to connect to Confluent using the docker-compose method. The code is able to connect and pull the metrics only when I pass a single cluster in CCLOUD_CLUSTER. I am having the following issue and need it resolved.
I'd appreciate it if anyone is able to help me with this.
ccloud_exporter container logs below:
{
"error": "Get \"https://api.telemetry.confluent.cloud/v2/metrics/cloud/descriptors/resources\": dial tcp: lookup api.telemetry.confluent.cloud on 127.0.0.11:53: read udp 127.0.0.1:57113-\u003e127.0.0.11:53: i/o timeout",
"level": "fatal",
"msg": "HTTP query for the descriptor endpoint failed",
"time": "2021-09-21T14:12:54Z"
}
I tried to increase the timeout to 120 seconds as provided in README.md but no luck.
flag.IntVar(&Context.HTTPTimeout, "timeout", 120, "Timeout, in second, to use for all REST call with the Metric API")
Thanks in advance!
During a recent vulnerability scan we ran internally, this was identified in the ccloudexporter binary.
Could I ask for a fix for this, please?
{
"Target": "ccloudexporter",
"Type": "gobinary",
"Vulnerabilities": [
{
"VulnerabilityID": "CVE-2019-11254",
"PkgName": "gopkg.in/yaml.v2",
"InstalledVersion": "v2.2.5",
"FixedVersion": "v2.2.8",
"Layer": {
"DiffID": "sha256:c87148c01e568bde3a58ce90550eb43596a0d9c36bb0bfcb25d31df097c8439f"
},
"SeveritySource": "nvd",
"PrimaryURL": "https://nvd.nist.gov/vuln/detail/CVE-2019-11254",
"Title": "kubernetes: Denial of service in API server via crafted YAML payloads by authorized users",
"Description": "The Kubernetes API Server component in versions 1.1-1.14, and versions prior to 1.15.10, 1.16.7 and 1.17.3 allows an authorized user who sends malicious YAML payloads to cause the kube-apiserver to consume excessive CPU cycles while parsing YAML.",
"Severity": "MEDIUM",
"CVSS": {
"nvd": {
"V2Vector": "AV:N/AC:L/Au:S/C:N/I:N/A:P",
"V3Vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"V2Score": 4,
"V3Score": 6.5
},
"redhat": {
"V3Vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H",
"V3Score": 6.5
}
},
"References": [
"https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-11254",
"https://github.com/kubernetes/kubernetes/issues/89535",
"https://groups.google.com/d/msg/kubernetes-announce/ALL9s73E5ck/4yHe8J-PBAAJ",
"https://groups.google.com/forum/#!topic/kubernetes-security-announce/wuwEwZigXBc",
"https://linux.oracle.com/cve/CVE-2019-11254.html",
"https://linux.oracle.com/errata/ELSA-2020-5653.html",
"https://security.netapp.com/advisory/ntap-20200413-0003/"
],
"PublishedDate": "2020-04-01T21:15:00Z",
"LastModifiedDate": "2020-10-02T17:37:00Z"
}
]
}
Even though this has only 2 endpoints, it would be nice to add a swagger.yaml to the repo and serve it as a path. This would formalise the API and also make automation easier.
The query interval timeFrom is taken by applying the configured delay to time.Now(). Instead, the start time should be time.Now() with the seconds truncated (i.e. rounded down to the nearest minute). Since the Metrics API only stores data at minutely granularity, using time.Now() is effectively rounding up to the next minute, which makes the effective delay less than the configured delay.
For example, with a configured delay of 120 seconds and time.Now() = 00:10:05:
- query interval sent: 00:08:05 / PT1M
- data point matched: 00:09:00 / PT1M (only metrics with timestamp 00:09:00 will be matched)
- effective delay: 65 seconds (00:10:05 - 00:09:00)
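A small sketch of the proposed truncation, using the numbers from the example above.

package main

import (
    "fmt"
    "time"
)

func main() {
    delay := 120 * time.Second
    now := time.Date(2021, 1, 1, 0, 10, 5, 0, time.UTC) // stands in for time.Now() = 00:10:05

    current := now.Add(-delay)                        // 00:08:05 -> only 00:09:00 matches, effective delay 65s
    proposed := now.Truncate(time.Minute).Add(-delay) // 00:08:00 -> effective delay is at least the configured 120s

    fmt.Println("current timeFrom: ", current.Format("15:04:05"))
    fmt.Println("proposed timeFrom:", proposed.Format("15:04:05"))
}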