
k6 extension to output real-time test metrics using Prometheus Remote Write.

License: GNU Affero General Public License v3.0

Languages: Go 96.89%, Dockerfile 0.77%, Makefile 2.34%
Topics: grafana-dashboard, k6, k6-output, prometheus, remote-write, xk6

xk6-output-prometheus-remote's Introduction

xk6-output-prometheus-remote

The xk6-output-prometheus-remote extension allows you to publish test-run metrics to Prometheus via its Remote Write endpoint.

⚠️ Be careful not to confuse this with the Prometheus Remote Write client extension, which is used for load and performance testing of Prometheus itself.

As of k6 v0.42.0, this extension is available within k6 as an experimental module. This means that the extension is in the process of being fully merged into the core of k6 and no longer requires a special build with xk6. For further details, read the extension graduation guide.

Usage

Consult the Prometheus remote write guide in the k6 docs to explore the various methods and options for sending k6 metrics to a Prometheus remote-write endpoint.
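As a quick orientation (a minimal sketch; the endpoint URL is a placeholder, and the exact flags and options are documented in the guide above), a run with the built-in experimental output looks roughly like this:

# Point the output at a Prometheus remote-write endpoint (example URL).
export K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write

# Run the test with the experimental Prometheus remote-write output.
k6 run -o experimental-prometheus-rw script.js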

Development

For developing or testing this extension, you can build a k6 binary with the local extension using xk6:

xk6 build --with github.com/grafana/xk6-output-prometheus-remote=. 
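The resulting k6 binary registers the extension's output; depending on the extension version it is exposed as output-prometheus-remote or experimental-prometheus-rw (both names appear in the issues below). A hypothetical local run, assuming the newer environment variable name, could look like:

K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write \
  ./k6 run -o experimental-prometheus-rw script.js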

For more details, refer to the k6 docs.

Dashboards


This repo contains the source code of two Grafana dashboards designed to visualize test results: k6 Prometheus and k6 Prometheus (Native Histograms).

Visit the documentation to learn more about these dashboards. You can import them into your Grafana instance or use them via the docker-compose example in this repo.

🌟 Special thanks to jwcastillo for his contributions and dedication to improving the dashboards.

xk6-output-prometheus-remote's People

Contributors

7olstoy, arukiidou, chr15murray, codebien, javaducky, johncming, jwcastillo, mem, mstoykov, na--, olegbespalov, oleiade, pablochacin, pkwarren, ppcano, robholland, sea-you, yorugac


xk6-output-prometheus-remote's Issues

Custom tags

Is it possible to add custom tags to the default metrics? For example, runId or app_name, so that results can be filtered when multiple different tests run at the same time.
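For reference, k6's --tag flag attaches a key/value pair to every emitted metric sample and is used the same way elsewhere in this repo (e.g. for testid); the tag values below are hypothetical:

k6 run -o experimental-prometheus-rw \
  --tag testid=nightly-run-1 \
  --tag app_name=checkout-service \
  script.js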

OAuth Authentication

Currently, Basic Authentication is supported. Does OAuth2 support exist? Not all Prometheus instances can enable Basic Auth, and using OAuth2 is considered better practice.

K6 Remote write took 5.1603032s while flush period is 1s. Some samples may be dropped.

Brief summary

When using the k6 Prometheus Remote Write extension, samples are being dropped under loads of 700 TPS.
No custom tags are being used, and Prometheus doesn't appear to be under CPU or memory stress.
The same problem occurs if we remote write to Mimir.

k6 version

k6 = v0.38.0 extension = v0.0.2

OS

Windows

Docker version and image (if applicable)

No response

Steps to reproduce the problem

Simple K6 test using the K6 prometheus remote write extension.

import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
  let res1 = http.get('http://simple apache endpoint');
  sleep(.0001);
}

{
  "stages": [
    { "duration": "10s", "target": 5 },
    { "duration": "600s", "target": 5 },
    { "duration": "10s", "target": 1 }
  ],
  "noConnectionReuse": true,
  "userAgent": "MyK6UserAgentString/1.0"
}

Expected behaviour

Samples should be written out within the 1s flush period for the tested TPS. We would like to run error-free at 1200 TPS.

Actual behaviour

WARN[0432] Remote write took 5.1603032s while flush period is 1s. Some samples may be dropped. nts=150005

Can't get example dashboards to fully function

I've been struggling for a couple of days to get the provided Grafana dashboard dashboard-results.json to function fully.
After reading through the closed issues I finally discovered I need to pass K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true to k6, but that hasn't solved it for me.
I do have the correct version of Prometheus and have --enable-feature=native-histograms set. I'm receiving no errors in the Grafana dashboard now, so I think everything is set up as it should be.
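For context, the moving parts in that setup are roughly these (a sketch; the endpoint URL is a placeholder and flag spellings may vary between Prometheus versions):

# Prometheus: accept remote writes and enable native histograms
prometheus --web.enable-remote-write-receiver --enable-feature=native-histograms

# k6: send trend metrics as native histograms to that endpoint
K6_PROMETHEUS_RW_SERVER_URL=http://localhost:9090/api/v1/write \
K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM=true \
k6 run -o experimental-prometheus-rw test.js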

We are running Grafana 9.3.6 and Prometheus 2.42.0

Here are my thoughts / questions in a list:

  • The README.md should be updated with better instructions on everything that needs to be set up, or at minimum a link to https://k6.io/docs/results-output/real-time/prometheus-remote-write/#about-metrics-mapping
  • What is testid? Nothing ever appears in that list and I can't find it anywhere in the k6 docs. All the queries seem to rely on it, but I don't know what it is, where it comes from, or whether there is something I should be setting.
  • P95 Response Time is always 0
  • Selecting a URL in the URL drop-down does not seem to do anything, i.e. data for unselected URLs still shows on the graphs
  • My main graph only has "Active VUs" and "Request Rate", whereas your screenshots also have "Failed request rate" and "Response Time". I could see failed request rate not showing as I have no failed requests, but response time is kind of important.
  • What am I missing?

See below for a screenshot of my dashboard

[screenshot of my dashboard]

Setup CLA

Add CLA to the repo before any potential outside contributions.

K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM doesn't generate histogram

When K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM is set to true, no histogram is generated at all:
https://imgur.com/a/HRes6iz

I'm using k6-operator with the following Docker image:
ghcr.io/grafana/operator:latest-runner (at the time of my testing it was
ghcr.io/grafana/operator@sha256:3f33890b2c94f55d8d886ee184dde69025dc2357970b63f0f837a4659b812aa4),
and this is the CRD config:

apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: k6-test
spec:
  parallelism: 4
  script:
    configMap:
      name: k6-test
      file: test.js
  arguments: --out experimental-prometheus-rw --tag testid=k6-test
  runner:
    env:
      - name: K6_PROMETHEUS_RW_SERVER_URL
        value: http://mimir-gateway.monitoring.svc.cluster.local/api/v1/push
      - name: K6_PROMETHEUS_RW_PUSH_INTERVAL
        value: 1s
      - name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
        value: 'true'

Stale marker breaks metric

Unfortunately, @javaducky discovered that the latest version doesn't report accurate metrics. The issue seems to be caused by the most recently added feature (the stale marker).

For this reason, if the test is short, the time series may not return any data points for a query. For example, querying the instant vector selector k6_http_reqs_total with some time frame and min step values could fail to show the right total.

The suggestion is to roll back to v0.0.7 until a fix is available.

metrics k6_http_req_duration_seconds

Hi, I can't find the k6_http_req_duration_seconds metric; there are only metrics with the 99th percentile.
How do I get the k6_http_req_duration_seconds metric now?

Occasional "duplicate sample for timestamp" errors

I use the latest version of the extension with k6 0.35.0. Occasionally during the test I get the following error:

ERRO[0370] Failed to store timeseries. error="server returned HTTP status 400 Bad Request: duplicate sample for timestamp"

The errors are not terribly frequent, but not rare either: I got about 25 errors after 10 minutes of a test run with 255 VUs generating the load.

Is it possible to somehow mitigate the issue so that the samples are not dropped? Also, is it possible to assess the number of duplicate samples, i.e. how many of them were not written because of the error?

Improve Grafana dashboard in the example

Improve upon the currently provided dashboard (example/dashboards/default.json).

Another goal for this change would be to align the look-and-feel for the Prometheus-backed dashboard (this extension), the xk6-output-influxdb, and xk6-output-timescaledb possibly with the output provided by the k6 Cloud App. This would provide a consistent user experience for the varied data sources.

Ideally, this dashboard would be promoted to the Grafana site as the "official" dashboard for displaying k6 metrics backed by Prometheus/Cortex/Mimir datasources.

Better API for HTTP Headers

k6 does not support the most common use case of the current API for setting custom HTTP headers, which is Cortex authentication. See grafana/k6#2904 for details.

The suggestion is to deprecate the current API and replace it with a fixed environment variable name supported by k6 (e.g. K6_PROMETHEUS_RW_HTTP_HEADERS), using the value for the key-value binding. k6 uses the same approach for tag customization in the InfluxDBv1 output.

K6_PROMETHEUS_RW_HTTP_HEADERS=X-Scope-OrgID:<org>,X-ANOTHER_HEADER:val

Filtering metrics by tags

To avoid quickly overshooting server-side limits, it would be good to have a configurable option for how many metrics and how many labels are sent, and perhaps which ones are sent in the first place.

Partially related k6 issue

Incorrect data saved to prometheus

Hi

I have a problem with Prometheus. When I run a script with metrics saved to Prometheus, I see approximately 5 minutes of spurious data after the end of the test-run metrics.

It reproduces with the sample from the README file. If you look at Prometheus after the test.js script execution, you'll see 5 minutes of data following the 10 seconds of data your test script actually saved.

[screenshot 2022-08-31_16-56-33]

You can see this spurious data also in the Grafana dashboard from your example.

[screenshot 2022-08-31_17-02-33]

Reproduced with k6 versions 0.38 and 0.39.

Thank you for your work!

`K6_KEEP_URL_TAG` option not working in `v0.0.4`

Hi, thank you for this excellent extension.

We've recently tried upgrading to v0.0.4 as there was a bug fix that we need in k6 v0.40.0.

We were using the K6_KEEP_URL_TAG option, set to false, as we have a fair few URLs that incorporate a UUID. This was working in v0.0.3.

Since the upgrade, we have found that the url tag is arriving in Prometheus again, despite leaving K6_KEEP_URL_TAG="false" set. We've also tried K6_KEEP_URL_TAG=false, as I noticed that the option expects a Bool, but no luck there either.

Would you be able to offer any help? Thanks!

Multiple remote write endpoints

We have a replicated (HA) Prometheus setup in our cluster, so when using remote write to push metrics into Prometheus, we have to push to all replicas in parallel—otherwise, only one replica will have the pushed metrics, and you have a random chance of seeing them or not depending on which Prometheus you happen to be routed to when querying.

For this reason Prometheus remote write typically allows you to specify a list of endpoints to push to (see upstream docs where remote_write is an array). At the moment it seems xk6-output-prometheus-remote only supports a single remote write endpoint (from K6_PROMETHEUS_REMOTE_URL). Could we have some way of submitting a list instead?
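For comparison, this is what the upstream Prometheus configuration itself allows; remote_write is a list of endpoints (the URLs below are placeholders):

remote_write:
  - url: http://prometheus-replica-a:9090/api/v1/write
  - url: http://prometheus-replica-b:9090/api/v1/write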

Strange behavior of the dashboard.

Firstly, thanks for sharing these great dashboards for visualization. They look awesome.
On the other hand, I saw strange behavior while testing the dashboards with my test cases, and I have a question about data accuracy, as the data reported on the dashboards doesn't seem to align with the test results on the command line.

Let's take the first metric, "Request Made", on the "Test Result" dashboard as an example. There were two values reported, which is quite confusing, and neither of these values reflected the actual number of requests generated over the test case.
[screenshot]
And if you look at the P95 Response Time metric on the dashboard, it was 3x faster than the p95 response time reported in the test summary on the command-line side.
[screenshot]

Guidance for remote write time > flush period

Hey again

During our load testing we are hitting the "Remote write took Ys while flush period is Xs" log message, so samples are likely being dropped. In our setup we are writing directly to Prometheus with the remote-write-receiver feature.

I noticed on the README that this sentence refers to the remote_write.queue_config for tuning.

Depending on exact setup, it may be necessary to configure Prometheus and / or remote-write agent to handle the load. For example, see queue_config parameter of Prometheus.

However, this configuration can only be applied when Prometheus (or the target agent) is itself configured to publish to a remote write endpoint, since queue_config is a subset of remote_write, where remote_write.url is a required field.
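To illustrate, queue_config only exists nested inside a remote_write entry of the Prometheus configuration, roughly like this (the URL and values are illustrative):

remote_write:
  - url: http://some-downstream-endpoint/api/v1/write   # url is required for any remote_write entry
    queue_config:
      capacity: 10000
      max_shards: 50
      max_samples_per_send: 2000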

Is my understanding of this correct?

For our use case, we don't necessarily need the metrics in real time. Would it be possible to have the k6 metrics inserted sequentially so that if remote write receiver latency > flush period, the extension would keep hold of samples until all are published?

Thanks!

Unable to build on latest version

It seems that k6 has had some package restructuring.

Error

go: downloading github.com/hashicorp/golang-lru v0.5.4
go: downloading github.com/Azure/go-autorest/autorest/validation v0.3.1
go: downloading github.com/Azure/go-autorest/autorest/to v0.4.0
go: finding module for package go.k6.io/k6/stats
k6 imports
	github.com/grafana/xk6-output-prometheus-remote imports
	github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite imports
	go.k6.io/k6/stats: module go.k6.io/k6@latest found (v0.38.0), but does not contain package go.k6.io/k6/stats
2022/05/05 11:23:50 [INFO] Cleaning up temporary folder: /tmp/buildenv_2022-05-05-1122.130079637
2022/05/05 11:23:50 [FATAL] exit status 1
1 error occurred:
	* Status: The command '/bin/sh -c CGO_ENABLED=0 GOPRIVATE="go.k6.io/k6" xk6 build     --with github.com/grafana/xk6-output-prometheus-remote=.     --output /tmp/k6' returned a non-zero code: 1, Code: 1

Bug: incorrect grouping of metrics during aggregation

Described in community forum: https://community.k6.io/t/counters-not-starting-from-zero/2737
Both actual and expected behavior are included in the above description.

The cause: internal aggregation of metrics in the extension happens by metric name and doesn't take labels into account. Ideally, this should be resolved by grafana/k6#1831. However, that issue may take quite a lot of time to resolve fully, so it would be good if a quicker solution could be added here first; e.g. once design solutions are proposed for grafana/k6#1831, it might make sense to evaluate them here, in the extension, first.

Currently considered blocked by grafana/k6#1831 and should be re-visited upon progress there.

k6 standard output results are wrong when prometheus output is enabled

Hi! I've discovered some strange behaviour when the Prometheus output is enabled: the final test results seem to be wrong, and the difference is quite substantial.

Example scenario:

import http from 'k6/http';
import { sleep } from 'k6';

export default function () {
  http.get('https://example.com');
  sleep(1);
  http.get('https://example.com/foo');
  sleep(1);
  http.get('https://example.com/bar');
  sleep(1);
}

Settings for k6:

  • virtual users: 1
  • duration: 10s
  • k6 built from the latest main Dockerfile
  • env variable: K6_PROMETHEUS_REMOTE_URL=http://prometheus:9090/api/v1/write
  • the same binary/environment used in both cases below

Assumptions

  • the target site is very simple, it responds almost immediately
  • with the previous point in mind, the average iteration time should be ~3 seconds, so during the 10s duration of the whole test we should see ~3-4 completed iterations
  • each iteration consists of 3 HTTP requests, so the overall number of requests should range from 9 to 12

Case #1 - running without the prometheus output enabled

Command:

k6 run --vus 1 --duration 10s --tag source=example-site-test.js -- example-site-test.js

Output:

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: example-site-test.js
     output: -

  scenarios: (100.00%) 1 scenario, 1 max VUs, 40s max duration (incl. graceful stop):
           * default: 1 looping VUs for 10s (gracefulStop: 30s)


running (01.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  10% ] 1 VUs  01.0s/10s

running (02.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  20% ] 1 VUs  02.0s/10s

running (03.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  30% ] 1 VUs  03.0s/10s

running (04.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  40% ] 1 VUs  04.0s/10s

running (05.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  50% ] 1 VUs  05.0s/10s

running (06.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  60% ] 1 VUs  06.0s/10s

running (07.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  70% ] 1 VUs  07.0s/10s

running (08.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  80% ] 1 VUs  08.0s/10s

running (09.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  90% ] 1 VUs  09.0s/10s

running (10.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default   [ 100% ] 1 VUs  10.0s/10s

running (11.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default ↓ [ 100% ] 1 VUs  10s

running (12.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default ↓ [ 100% ] 1 VUs  10s

running (12.1s), 0/1 VUs, 4 complete and 0 interrupted iterations
default ✓ [ 100% ] 1 VUs  10s

     data_received..................: 17 kB 1.4 kB/s
     data_sent......................: 988 B 82 B/s
     http_req_blocked...............: avg=4.86ms   min=481ns   med=821ns    max=58.37ms  p(90)=1.12µs   p(95)=26.26ms 
     http_req_connecting............: avg=116.7µs  min=0s      med=0s       max=1.4ms    p(90)=0s       p(95)=630.22µs
     http_req_duration..............: avg=3.98ms   min=3.14ms  med=3.89ms   max=4.91ms   p(90)=4.8ms    p(95)=4.89ms  
       { expected_response:true }...: avg=3.98ms   min=3.14ms  med=3.89ms   max=4.91ms   p(90)=4.8ms    p(95)=4.89ms  
     http_req_failed................: 0.00% ✓ 0        ✗ 12 
     http_req_receiving.............: avg=84.46µs  min=65.65µs med=79.99µs  max=136.04µs p(90)=92.44µs  p(95)=112.27µs
     http_req_sending...............: avg=117.82µs min=95.55µs med=110.21µs max=200.26µs p(90)=125.14µs p(95)=159.04µs
     http_req_tls_handshaking.......: avg=1.69ms   min=0s      med=0s       max=20.32ms  p(90)=0s       p(95)=9.14ms  
     http_req_waiting...............: avg=3.78ms   min=2.93ms  med=3.7ms    max=4.69ms   p(90)=4.5ms    p(95)=4.62ms  
     http_reqs......................: 12    0.990406/s
     iteration_duration.............: avg=3.02s    min=3.01s   med=3.01s    max=3.07s    p(90)=3.05s    p(95)=3.06s   
     iterations.....................: 4     0.330135/s
     vus............................: 1     min=1      max=1
     vus_max........................: 1     min=1      max=1

Everything's correct: we have 4 completed iterations and 12 HTTP requests; it looks OK!

Case #2 - running with the prometheus output enabled

Command:

k6 run --vus 1 --duration 10s -o output-prometheus-remote --tag source=example-site-test.js -- example-site-test.js

Output:

          /\      |‾‾| /‾‾/   /‾‾/   
     /\  /  \     |  |/  /   /  /    
    /  \/    \    |     (   /   ‾‾\  
   /          \   |  |\  \ |  (‾)  | 
  / __________ \  |__| \__\ \_____/ .io

time="2022-05-31T12:06:19Z" level=info msg="Prometheus: configuring remote-write with prometheus mapping"
  execution: local
     script: example-site-test.js
     output: Output k6 metrics to prometheus remote-write endpoint

  scenarios: (100.00%) 1 scenario, 1 max VUs, 40s max duration (incl. graceful stop):
           * default: 1 looping VUs for 10s (gracefulStop: 30s)


running (01.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  10% ] 1 VUs  01.0s/10s

running (02.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  20% ] 1 VUs  02.0s/10s

running (03.0s), 1/1 VUs, 0 complete and 0 interrupted iterations
default   [  30% ] 1 VUs  03.0s/10s

running (04.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  40% ] 1 VUs  04.0s/10s

running (05.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  50% ] 1 VUs  05.0s/10s

running (06.0s), 1/1 VUs, 1 complete and 0 interrupted iterations
default   [  60% ] 1 VUs  06.0s/10s

running (07.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  70% ] 1 VUs  07.0s/10s

running (08.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  80% ] 1 VUs  08.0s/10s

running (09.0s), 1/1 VUs, 2 complete and 0 interrupted iterations
default   [  90% ] 1 VUs  09.0s/10s

running (10.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default   [ 100% ] 1 VUs  10.0s/10s

running (11.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default ↓ [ 100% ] 1 VUs  10s

running (12.0s), 1/1 VUs, 3 complete and 0 interrupted iterations
default ↓ [ 100% ] 1 VUs  10s

running (12.1s), 0/1 VUs, 4 complete and 0 interrupted iterations
default ✓ [ 100% ] 1 VUs  10s

     data_received..................: 30 kB  2.5 kB/s
     data_sent......................: 1.9 kB 154 B/s
     http_req_blocked...............: avg=3.91ms   min=625ns   med=928ns    max=46.98ms  p(90)=1.2µs   p(95)=39.94ms 
     http_req_connecting............: avg=87.27µs  min=0s      med=0s       max=1.04ms   p(90)=0s      p(95)=890.15µs
     http_req_duration..............: avg=5.09ms   min=3.53ms  med=4.54ms   max=7.65ms   p(90)=7.64ms  p(95)=7.65ms  
       { expected_response:true }...: avg=5.09ms   min=3.53ms  med=4.54ms   max=7.65ms   p(90)=7.54ms  p(95)=7.64ms  
     http_req_failed................: 0.00%  ✓ 0        ✗ 24 
     http_req_receiving.............: avg=86.17µs  min=64.84µs med=80.54µs  max=153.48µs p(90)=96.78µs p(95)=144.98µs
     http_req_sending...............: avg=120.22µs min=82.75µs med=115.69µs max=187.3µs  p(90)=149.8µs p(95)=181.67µs
     http_req_tls_handshaking.......: avg=2.12ms   min=0s      med=0s       max=25.44ms  p(90)=0s      p(95)=21.62ms 
     http_req_waiting...............: avg=4.89ms   min=3.36ms  med=4.37ms   max=7.45ms   p(90)=7.3ms   p(95)=7.42ms  
     http_reqs......................: 24     1.980191/s
     iteration_duration.............: avg=3.03s    min=3.01s   med=3.02s    max=3.06s    p(90)=3.06s   p(95)=3.06s   
     iterations.....................: 7      0.577556/s
     vus............................: 1      min=1      max=1
     vus_max........................: 1      min=1      max=1

Now it's getting weird: the final result shows 7 completed iterations (despite the fact that the "progress bar" stopped at 4) and 24 HTTP requests, and the rates are also different. Some of the metrics look doubled (including data_received and data_sent), BUT in both cases (Prometheus output enabled/disabled) the target service got the same number of requests, 12 (I checked that in the logs of the target service).

I don't know what I'm doing wrong, could you check that please?

Add instance label

Is it possible to add an instance label to each metric?
Prometheus fails to record metrics correctly when multiple k6 instances write to a single Prometheus.

Improve metrics processing and aggregation

The extension must send samples with values that are understandable to the remote agent. To achieve that, the k6 aggregation methods of the Sink interface are used. This leads to the following problems:

  1. Sink's Add and Format methods are not very efficient; additionally, Trend's Sink is not fit for such usage at all, so ad-hoc optimizations must be used, as is done here (related k6 issue)
  2. attempting to group samples for each metric into TimeSeries leads to losing data in tags / labels (related k6 issue)
  3. converting all samples into separate TimeSeries results, at best, in quickly hitting whatever usage limit is currently set on the Prometheus side

Since adding any kind of further pre-processing here might clash with the future work on Metric Refactoring at k6, improving these can be considered blocked for now.

Can we use this extension with pushgateway

I tried using this extension with Pushgateway and got this error: server returned HTTP status 405 Method Not Allowed: Method Not Allowed.

If this extension does not work with Pushgateway, is there any other way to send metrics through Pushgateway?

Unable to build prometheus-remote extension with xk6

go version go1.19.1

2022/11/09 20:35:04 [INFO] exec (timeout=0s): /usr/local/go/bin/go mod tidy -compat=1.17
2022/11/09 20:35:05 [INFO] exec (timeout=0s): /usr/local/go/bin/go build -o /usr/local/bin/k6 -ldflags -w -s -trimpath
# github.com/grafana/xk6-output-prometheus-remote/pkg/remotewrite
../../../go/pkg/mod/github.com/grafana/[email protected]/pkg/remotewrite/remotewrite.go:286:18: undefined: metrics.SampleTags
../../../go/pkg/mod/github.com/grafana/[email protected]/pkg/remotewrite/remotewrite.go:291:59: undefined: metrics.SampleTags
../../../go/pkg/mod/github.com/grafana/[email protected]/pkg/remotewrite/prometheus.go:12:27: undefined: metrics.SampleTags
2022/11/09 20:36:17 [INFO] Cleaning up temporary folder: /Users/srinivasan/mywork/k6build/buildenv_2022-11-09-2032.682437367
2022/11/09 20:36:17 [FATAL] exit status 2

#56 (comment)

Http requests count double with 'constant-arrival-rate' executor

I'm using the latest v0.38.0 version of k6 with the xk6-output-prometheus-remote extension.
I've tried several scenarios with the constant-arrival-rate executor and always got the same results: the iterations metric was counted at twice the expected value.
I've created a simple Counter to prove my theory and got the same 'doubled' results.

Here is an example script:

import http from 'k6/http';
import {Counter} from 'k6/metrics';

export const options = {
    linger: true,
    scenarios: {
        contacts: {
            executor: 'constant-arrival-rate',
            duration: '30s',
            rate: 10,
            timeUnit: '1s',
            preAllocatedVUs: 1,
            maxVUs: 150,
        },
    },
};


const durationCounter = new Counter('dur_counter');

export default function () {

    var res = http.get('http://test.k6.io');

    if (res.status == 200) {
        durationCounter.add(1);
    }
}

Here are the results
[screenshot 2022-06-05 at 21:46:44]

The number of HTTP requests sent to the server is correct, I've checked on my local machine (10 qps).

High cardinality on URL labels with using the Xk6 extension for prometheus remote write

I am using the latest code to build the promremotewrite extension for k6 and have encountered some behaviour we don't see with the k6 core code.

We are using k6 to performance test at a high rate with a large number of time series. We also have a range of labels specified.
What we see when we query Prometheus is a label called url with a very high number of values, as we are using dynamic URLs in our testing. We do not specify url as a label in the script; if we do specify it, it has no impact on this behaviour.
Due to the high cardinality of this url label we get severe performance issues in Prometheus (several hundred thousand values).
In the k6 core version we note that the name label needs to be specified, as it otherwise defaults to the url value.
But with the k6 Prometheus extension we no longer get a name label even though we have specified it, and url now appears as a label.
We tried overriding the url label but it has no effect.

Expected behaviour

  • url does not appear as a label in the k6 metric, as this can cause high-cardinality issues with dynamic URLs
  • the name label, when specified in the k6 script, should appear as a label in Prometheus

Observed behaviour

  • url appears as a label in the k6 metric when queried in Prometheus; this causes performance issues due to high cardinality
  • the name label does not appear in Prometheus even though it is specified in the script

Sample script

import http from 'k6/http';
import { sleep } from 'k6';
import { Rate } from 'k6/metrics';
import { Counter } from 'k6/metrics';

export default function () {

  let res1 = http.get('http://metrics-observability-metric-engcoeliberty-dev-di1001.apps-int.di1001.cpaas.test/', {
    tags: { type: 'API Call', group: 'Login', name: 'Directory', scenario: 'Smoke Test1', name: 'name3' },
  });

  let responses = http.batch([
    [
      'GET',
      'http://metrics-observability-metric-engcoeliberty-dev-di1001.apps-int.di1001.cpaas.test/foo',
      null,
      { tags: { type: 'staticContent', group: 'Folder', name: 'Directory2', scenario: 'Smoke Test1', name: 'name1' } },
    ],
    [
      'GET',
      'http://metrics-observability-metric-engcoeliberty-dev-di1001.apps-int.di1001.cpaas.test/bar',
      null,
      { tags: { type: 'staticContent', group: 'WebServer', name: 'Dir List2', scenario: 'Batch Test1', name: 'name2' } },
    ],
  ]);

  sleep(1);
}

Add support to export Thresholds

With k6, using thresholds to indicate potential issues under load is a common practice in load testing. Looking at the k6 summary in our logs shows whether the thresholds passed or failed. When using the Prometheus output, the thresholds are not pushed into Prometheus, or at least we are not able to find them. This seems to be a big piece of testing information that is missing if we choose to output to Prometheus. We did try the xk6-output-timescaledb output, and it writes the information to a thresholds table that is used in its sample dashboard. We prefer using PromQL, so we would like to be able to see the thresholds in Grafana. I believe we can manually set up thresholds in the dashboard, but those would be static thresholds that can't be defined in our k6 test script or options file. Are we missing something obvious, or are there plans to add thresholds in the future?
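For context, the thresholds in question are the ones declared in the test script or options file, e.g. (a minimal example; the limits are arbitrary):

export const options = {
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95th percentile response time below 500ms
    http_req_failed: ['rate<0.01'],   // less than 1% failed requests
  },
};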

Investigate potential for concurrency processing

There are two main ways to add concurrency to the extension:

  1. concurrency at the level of processing metrics pre-request

Details: Output receives batches of metrics that must be iterated over and converted into remote write TimeSeries. This may seem like a natural point to add concurrency, for example:

   samplesContainers := o.GetBufferedSamples()
   // integer chunk size; the last chunk would pick up any remainder
   step := len(samplesContainers) / concurrencyLimit

   var wg sync.WaitGroup
   gatherpoint := make([][]prompb.TimeSeries, concurrencyLimit)

   for i := 0; i < concurrencyLimit; i++ {
      wg.Add(1)
      // get chunk of samplesContainers from i * step to (i+1) * step
      go func(...) {
         defer wg.Done()
         ...
         gatherpoint[i] = convertToTimeSeries(chunk)
         ...
      }(...)
   }
   wg.Wait()

   // merge the per-goroutine results back into a single slice
   allTS := make([]prompb.TimeSeries, 0)
   for i := 0; i < concurrencyLimit; i++ {
      allTS = append(allTS, gatherpoint[i]...)
   }

   // encode and send remote write request
But this processing must be done within the 1-second flush period. Basic experiments so far showed next to no improvement from trying to spawn goroutines within that time limit. This result will likely be impacted by changes from the Metric Refactoring in k6 and might need more investigation.

  2. concurrency at remote write requests

Details: this is blocked by the inability to compile TimeSeries (group samples). Attempting to send disjointed samples concurrently would only result in out-of-order errors.

Push Docker image

Hi folks! Why don't you push your Docker images to the official registry? For example, I want to export k6 performance-test results to Prometheus in Keptn and I must specify an image name; I can't build it there. For now I've pushed a fork, but maybe you're going to use GHA to push an image?

filter the vus by scenarios

Each scenario can have a different number of VUs; is it possible to add the scenario tag to the check metric?

Add option for metrics sampling

We've noticed that this plugin sends metrics to Prometheus that are sampled every 50ms. This results in an absolutely monumental amount of data volume being ingested, which overloads our system. All of our other services have metrics that are sampled at around 30s intervals. Even 1s would be fine, but 50ms is complete overkill since that level of detail will never show up in query results or a dashboard.

We've worked around the problem in our fork of xk6-output-prometheus-remote by applying the following diff:

--- a/kube/charts/k6/xk6-output-prometheus-remote/pkg/remotewrite/remotewrite.go
+++ b/kube/charts/k6/xk6-output-prometheus-remote/pkg/remotewrite/remotewrite.go
@@ -131,6 +131,8 @@ func (o *Output) flush() {
 func (o *Output) convertToTimeSeries(samplesContainers []stats.SampleContainer) []prompb.TimeSeries {
        promTimeSeries := make([]prompb.TimeSeries, 0)

+       seen := map[string]bool{}
+
        for _, samplesContainer := range samplesContainers {
                samples := samplesContainer.GetSamples()

@@ -148,9 +150,17 @@ func (o *Output) convertToTimeSeries(samplesContainers []stats.SampleContainer)

                        if newts, err := o.metrics.transform(o.mapping, sample, labels); err != nil {
                                o.logger.Error(err)
-                       } else {
+                       } else if !seen[sample.Metric.Name] {
                                promTimeSeries = append(promTimeSeries, newts...)
                        }
+
+                       // We only need 1 sample per metric per remote
+                       // write, not one every 50ms(!!). Can't do
+                       // this earlier in the function because
+                       // counters have to be computed using all
+                       // incoming samples. We just refrain from
+                       // sending the final value multiple times.
+                       seen[sample.Metric.Name] = true
                }

                // Do not blow up if remote endpoint is overloaded and responds too slowly.

This is clearly a hack, as it would be better to just set a specific time interval and have the sampling done intelligently, rather than hardcoding to once per remote write. Also, this implementation sends the oldest datapoint per interval, rather than the most recent which is probably what is desired. But, Prometheus simply can't manage with hundreds of thousands of time series generated by a 50ms sample interval, so we needed to do something to get the plugin working.

Would a change to make the sampling interval customizable be accepted upstream?

k6_http_req_failed is always 1

The metric k6_http_req_failed is always 1, rather than a counter, so we can't see a count via increase or rate.

There's a good chance I've misunderstood this metric - is it supposed to be a gauge indicating a scenario failure? Should we add our own error rate metric?
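For what it's worth, a custom error-rate metric of the kind mentioned above would look roughly like this in a test script (the metric name and URL are arbitrary):

import http from 'k6/http';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('custom_error_rate');

export default function () {
  const res = http.get('https://example.com');
  errorRate.add(res.status !== 200); // 1 = failed request, 0 = OK
}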

[screenshot]

Build command:

go install go.k6.io/xk6/cmd/xk6@latest
xk6 build \
    --with github.com/grafana/xk6-output-prometheus-remote@latest

Version:

xk6-output-prometheus-remote 0.06

go: downloading go.k6.io/k6 v0.41.1-0.20221116104224-5fa71b761185

Update protobuf library

#49 is going to replace the current golang/protobuf with gogo/protobuf so it aligns with the library currently used by Prometheus. gogo has a good reputation for performance efficiency, but unfortunately, it has been deprecated so it would be better to switch to a more reliable library. planetscale/vtprotobuf could be a good alternative.

Benchmarks and tests should be added for a good comparison.
