
avalanche's Issues

Need information on setting the Username and Password for Configured Thanos RemoteWrite Url

Hi All,

I am using the following command to configure avalanche to send data to Thanos using the remote-write API:
docker run -p 9001:9001 quay.io/freshtracks.io/avalanche --metric-count=5 --label-count=2 --const-label=TenantId=tenant1 --const-label=ProductName=Pname --remote-url=http:///api/v1/receive

I need information on how to set the username and password for the Thanos remote-write URL in order to push data to Thanos.
When I checked the help options, I didn't see any option to set a username and password.
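As the help output suggests, avalanche exposes no credential flags here; common workarounds are putting a reverse proxy with auth in front of the receiver, or patching the client. As a purely illustrative Go sketch (the URL and credentials below are made up), attaching basic auth to a remote-write request amounts to this:

```go
package main

import (
	"fmt"
	"net/http"
)

// basicAuthRequest builds a remote-write POST carrying an Authorization
// header, the way a patched client could attach credentials.
func basicAuthRequest(url, user, pass string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, url, nil)
	if err != nil {
		return nil, err
	}
	// Sets "Authorization: Basic <base64(user:pass)>" on the request.
	req.SetBasicAuth(user, pass)
	return req, nil
}

func main() {
	req, err := basicAuthRequest("http://thanos.example.com/api/v1/receive", "tenant1", "s3cret")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```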

Support custom HTTP headers

In addition to the tenant header, we need to add custom HTTP headers to remote-write requests.
How about providing a feature to add custom HTTP headers, like the following?

Flag Name

--remote-custom-header

Example

avalanche --remote-url "http://host/prom/push" \
        --remote-tenant "fake" \
        --remote-tenant-header "x-my-tenant" \
        --remote-custom-header "x-custom-header: test1" \
        --remote-custom-header "Authorization: Bearer abcdefg"
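Parsing such a flag would be cheap; a hypothetical helper (not avalanche code) could split each "Name: value" argument like this:

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaderFlag splits one --remote-custom-header value of the form
// "Name: value" into a header name and value (hypothetical helper).
func parseHeaderFlag(s string) (name, value string, ok bool) {
	parts := strings.SplitN(s, ":", 2)
	if len(parts) != 2 {
		return "", "", false
	}
	return strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1]), true
}

func main() {
	for _, f := range []string{"x-custom-header: test1", "Authorization: Bearer abcdefg"} {
		name, value, ok := parseHeaderFlag(f)
		fmt.Println(ok, name, value)
	}
}
```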

add process_cpu_seconds_total to metrics

It would be handy if avalanche included process_cpu_seconds_total in its metrics.

Example from prometheus/alertmanager/grafana metrics:

# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.61
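For what it's worth, in Go this metric is usually provided by prometheus/client_golang's process collector rather than hand-rolled; the underlying value is just the process's user plus system CPU time. A stdlib-only sketch for Unix (illustrative, not avalanche code):

```go
package main

import (
	"fmt"
	"syscall"
)

// processCPUSeconds returns the total user+system CPU time of the current
// process in seconds, i.e. the value process_cpu_seconds_total would expose.
// Unix-only sketch; client_golang's process collector does this for you.
func processCPUSeconds() float64 {
	var ru syscall.Rusage
	if err := syscall.Getrusage(syscall.RUSAGE_SELF, &ru); err != nil {
		return 0
	}
	sec := func(tv syscall.Timeval) float64 {
		return float64(tv.Sec) + float64(tv.Usec)/1e6
	}
	return sec(ru.Utime) + sec(ru.Stime)
}

func main() {
	fmt.Printf("process_cpu_seconds_total %.2f\n", processCPUSeconds())
}
```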

avalanche targets cannot be observed

I added the scrape_configs to the Prometheus ConfigMap and checked that the configuration was updated, but the new job is never detected by Prometheus.

Deployment file:

#create namespace avalanche
apiVersion: v1
kind: Namespace
metadata:
  name: avalanche
spec:
  finalizers:
  - kubernetes
---
#create deployment avalanche
apiVersion: apps/v1
kind: Deployment
metadata:
  name: avalanche
  namespace: avalanche
  labels:
    name: avalanche
spec:
  selector:
    matchLabels:
      app: avalanche
  replicas: 1 # tells deployment to run 1 pod matching the template
  template:
    metadata:
      labels:
        app: avalanche
    spec:
      containers:
      - name: pg-avalanche
        image: quay.io/freshtracks.io/avalanche:latest
        args:
        - "--metric-count=1000"
        - "--series-count=50"
        - "--port=9001"
        ports:
        - containerPort: 9001
---
#create service avalanche-svc
apiVersion: v1
kind: Service
metadata:
  name: avalanche-svc
  namespace: avalanche
  labels:
    app: avalanche
spec:
  ports:
  # the port that this service should serve on
  - port: 9001
    targetPort: 9001
    name: http-avalanche
  type: ClusterIP
  clusterIP: None
  # label keys and values that must match in order to receive traffic for this service
  selector:
    app: avalanche

scrape_configs:

  - job_name: ft-avalanche
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_avalanche_scrape]
        action: keep
        regex: true
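One thing worth checking: the keep rule above only retains pods carrying the matching annotation, and the Deployment's pod template defines none, so every avalanche pod would be dropped. A possible fix (the exact annotation key is a guess; it just needs to sanitize to avalanche_scrape) is to annotate the pod template:

```yaml
template:
  metadata:
    labels:
      app: avalanche
    annotations:
      avalanche_scrape: "true"  # becomes __meta_kubernetes_pod_annotation_avalanche_scrape
```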

High Churn from avalanche even though series-interval and metric-interval parameters are set high

Hi Team,

I wanted to understand why Avalanche produces such high churn even though series-interval and metric-interval are set high. The following are the parameters I have set in my Kubernetes YAML, which deploys 2 replicas of avalanche:

    - "--metric-count=20"
    - "--series-count=100"
    - "--label-count=10"
    - "--port=9001"
    - "--value-interval=1"
    - "--series-interval=316000"
    - "--metric-interval=316000"

This basically ensures that there are 20 metrics with 100 series per metric. These exist without changing/cycling for 316000 seconds; what changes is the value of each metric-series combination, which updates every second.

So my understanding is that churn should not be high, since no new series is created on every scrape; the series-metric combinations remain the same and only the values change. Yet what I observe is a churn of 2000, which would mean 2000 series are added every scrape, and that should not be the case.
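As a sanity check on the numbers (plain arithmetic, not avalanche code): each replica exposes metric-count × series-count series, which happens to match the observed churn figure:

```go
package main

import "fmt"

// expectedSeries is the number of series one avalanche replica exposes.
func expectedSeries(metricCount, seriesCount int) int {
	return metricCount * seriesCount
}

func main() {
	perReplica := expectedSeries(20, 100)
	fmt.Println(perReplica)     // 2000 series per replica
	fmt.Println(perReplica * 2) // 4000 series across both replicas
}
```

A per-target churn of 2000 equals one replica's entire series set, which may suggest the whole set is being counted as newly added rather than a small fraction of it cycling.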

I calculate the churn by querying Prometheus with the following query:
topk(200, sum without(instance)(sum_over_time(scrape_series_added[1h])))
This gives a per-target churn value over the past hour, aggregates it ignoring the instance label, and then finds the biggest 200 churners.
Reference link: https://www.robustperception.io/finding-churning-targets-in-prometheus-with-scrape_series_added

My questions now are -

  1. Why are the avalanche replicas creating this level of churn even though series and metrics are not newly created on every scrape? I really hope this is not a limitation of Avalanche.
  2. What is the number of series per chunk in the head block in the case of Avalanche? I am not sure if it matters, but I seem to see on average 1 chunk created for every 2 series (for metrics generated by a non-avalanche load it is about 1 chunk per series). Not sure if this affects churn, but I guess it affects the samples/second of the remote write from Prometheus to Thanos. Again, not sure if this is the right forum to ask.

Please let me know if you need any further information from me on the same.

race condition in code causes creation of more timeseries than requested

Let's say you run avalanche with the following arguments:

      - --metric-count=1
      - --series-count=1
      - --label-count=1
      - --metricname-length=10
      - --labelname-length=10
      - --value-interval=60 # 1 min
      - --series-interval=300 # 5 min
      - --metric-interval=86400 # 1 day

The expectation is that there will be 1 timeseries, and every 5 minutes the old timeseries will be deleted and a new one created. However, due to a race condition in the code, new timeseries are created while the old ones are kept.

The race condition is between the goroutines for the value tick and the series tick. On the 5th minute both fire, and they race because the series goroutine does three things that may interleave with the work the value goroutine does.

One way to fix the race condition is to use a single select loop instead of three goroutines.

remote_write feature documentation/expectations

First, thanks for making this, I'm in the early stages of a custom TSDB -> prometheus based TSDB evaluation for a platform that does ~9M metrics/sec currently, and load testing my ingestion path with something like this is going to be critical, so, thanks.

I'm checking out the remote_write feature, and I wanted to see if I've got things correct:

with the /metrics endpoint, avalanche will keep running, refreshing the metrics/labels/samples at the defined intervals, continuing to make metrics available for scraping and simulating some metric churn.

However, when using the remote_write feature, it's more of a "generate and run once"? It seems avalanche will spin up, flush the generated metrics to the remote_write URL, do some rotation of metrics/labels/samples if it takes long enough, but once everything has been sent (or the request limit is reached?) it will stop and shut down, as opposed to running continuously like the /metrics endpoint does?

Do I have that correct?

I'm looking to simulate a significant amount of remote_write volume, so I was hoping to spin up a few hundred instances of avalanche to keep sending metrics, as opposed to scraping them all and remote-writing from there continuously.

any clarity on the expected use of the feature would be helpful, thanks!

Add usage documentation

Instead of linking to the blog post we should have basic usage instructions here itself. Because:

  1. There is a discrepancy in the Docker image URL. The post suggests freshtracks.io/avalanche while the README gives prometheuscommunity/avalanche. I assume we want to suggest the latter.
  2. The blog post has images whose contents cannot be copy-pasted for convenience and must be typed out manually.
  3. Editorial control should be maintained in the repo.

I'm happy to make a PR on this. I think we should suggest a compose file:

version: "3.0"
services:
  prometheus:
    container_name: prometheus
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./config/Prometheus.yml:/etc/prometheus/prometheus.yml
    networks:
      - prom
  avalanche:
    container_name: avalanche
    image: quay.io/prometheuscommunity/avalanche:main
    ports:
      - "9001:9001"
    networks:
      prom:
        aliases:
          - avalanche
networks:
  prom:
    name: prom

add option to run avalanche without refreshing values (e.g. let intervals take a value of zero)

I would like to run avalanche without it ever refreshing the series_id or __name__ label.
One way to specify this would be an interval value of zero.

$ ./cmd --series-interval=0 --metric-interval=0
panic: non-positive interval for NewTicker

goroutine 1 [running]:
time.NewTicker(0x0, 0xc0001aa500)
        /usr/lib/go-1.13/src/time/tick.go:23 +0x147
github.com/open-fresh/avalanche/metrics.RunMetrics(0xa, 0xa, 0xa, 0x5, 0x5, 0x1e, 0x0, 0x0, 0x0, 0x0, ...)
        /home/ubuntu/go/src/github.com/open-fresh/avalanche/metrics/serve.go:97 +0x64e
main.main()
        /home/ubuntu/go/src/github.com/open-fresh/avalanche/cmd/avalanche.go:44 +0x1a8

https://github.com/open-fresh/avalanche/blob/0c1c64c97f6dc5630ae5c057821cc8e08a12922b/cmd/avalanche.go#L24-L25

https://github.com/open-fresh/avalanche/blob/0c1c64c97f6dc5630ae5c057821cc8e08a12922b/metrics/serve.go#L97-L98
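One way to support a zero interval (helper name is mine, not avalanche's): skip time.NewTicker and return a nil channel, since receiving from a nil channel blocks forever, which quietly disables that select case instead of panicking:

```go
package main

import (
	"fmt"
	"time"
)

// tickChan returns a ticker channel, or nil when the interval is zero.
// A nil channel blocks forever in a select, so that case never fires,
// avoiding the "non-positive interval for NewTicker" panic.
func tickChan(intervalSeconds int) <-chan time.Time {
	if intervalSeconds <= 0 {
		return nil
	}
	return time.NewTicker(time.Duration(intervalSeconds) * time.Second).C
}

func main() {
	valueTick := tickChan(1)
	seriesTick := tickChan(0) // --series-interval=0: never cycle series
	timeout := time.After(1500 * time.Millisecond)
	for {
		select {
		case <-valueTick:
			fmt.Println("refreshing metric values")
		case <-seriesTick:
			fmt.Println("refreshing series cycle") // disabled: never fires
		case <-timeout:
			return
		}
	}
}
```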

Deploy avalanche in a k8s cluster

Hi,

I have deployed Avalanche in a k8s cluster with only 1 replica. The Prometheus, Alertmanager, and node-exporter pods reside in the same cluster. I want to see the CPU and memory consumption with the current metric and series settings. What should I do next?
#create deployment avalanche
apiVersion: apps/v1
kind: Deployment
metadata:
  name: avalanche
  namespace: monitoring
  labels:
    name: avalanche
spec:
  selector:
    matchLabels:
      app: avalanche
  replicas: 1 # tells deployment to run 1 pod matching the template
  template:
    metadata:
      labels:
        app: avalanche
    spec:
      containers:
      - name: pg-avalanche
        image: quay.io/freshtracks.io/avalanche:latest
        args:
        - "--metric-count=1000"
        - "--series-count=50"
        - "--port=9001"
        ports:
        - containerPort: 9001
---
#create service avalanche-svc
apiVersion: v1
kind: Service
metadata:
  name: avalanche-svc
  namespace: monitoring
  labels:
    app: avalanche
spec:
  ports:
  # the port that this service should serve on
  - port: 9001
    targetPort: 9001
    nodePort: 32555
    name: http-avalanche
  type: NodePort
  # label keys and values that must match in order to receive traffic for this service
  selector:
    app: avalanche

ARM support

The public docker image quay.io/freshtracks.io/avalanche lacks an ARM version.
standard_init_linux.go:228: exec user process caused: exec format error
I just built the project from master on an ARM machine. Works flawlessly.

exit gracefully if not enough memory

When I try to generate 100k metrics on a 1 GB VM, avalanche hangs and dies.
Also, the log messages are delayed by ~1 min, i.e. they don't really match the printed timestamps.

./cmd --metric-count=100000 --label-count=10 --series-count=10 --value-interval=10 --series-interval=36000 --metric-interval=36000 --port=9001
Serving ur metrics at localhost:9001/metrics
2021-09-30 21:35:44.801681073 +0000 UTC m=+30.791942268: refreshing metric values
2021-09-30 21:35:54.83536445 +0000 UTC m=+40.825625685: refreshing metric values
Killed
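A graceful exit could be as simple as a pre-flight sanity check before allocating anything. A hedged sketch (the threshold here is an arbitrary assumption, not a measured limit):

```go
package main

import "fmt"

// totalSeries is the number of series the requested flags would generate.
func totalSeries(metricCount, seriesCount int) int {
	return metricCount * seriesCount
}

func main() {
	// Hypothetical cap; in practice this would be tuned to available memory.
	const maxSeries = 500_000
	n := totalSeries(100000, 10)
	fmt.Println(n)
	if n > maxSeries {
		fmt.Println("error: refusing to generate", n, "series; lower --metric-count/--series-count")
	}
}
```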

Have timeouts as configurable option

Currently the client does not have a configurable timeout; it is hard-coded to time.Minute.

It might make sense for some use cases to make this value configurable.

Implemented in #61

k8s deployment

I'm getting an error after deployment: Prometheus couldn't use the endpoint...

  /etc/prometheus/config_out $ wget http://avalanche-svc.avalanche.svc.cluster.local:9001
  Connecting to avalanche-svc.avalanche.svc.cluster.local:9001 (10.244.0.97:9001)
  wget: server returned error: HTTP/1.1 404 Not Found

However, looking at the pod logs, it seems to be generating metrics:

  jerry@Jaes-MacBook-Pro performance % kubectl logs  avalanche-6cf47d949-dxq4n -n avalanche
  Serving ur metrics at localhost:9001/metrics
  2021-12-03 01:04:45.458781499 +0000 UTC m=+30.506091744: refreshing metric values
  2021-12-03 01:05:15.455382438 +0000 UTC m=+60.502692674: refreshing metric values
  2021-12-03 01:05:15.455389516 +0000 UTC m=+60.502699712: refreshing series cycle
  2021-12-03 01:05:45.455325735 +0000 UTC m=+90.502636129: refreshing metric values
  2021-12-03 01:06:15.455331837 +0000 UTC m=+120.502642080: refreshing metric values

This is my scrape config in Prometheus:

    - job_name: avalanche
      scrape_interval: 1m
      scrape_timeout: 10s
      metrics_path: /metrics
      scheme: http
      static_configs:
        - targets:
          - http://avalanche-svc.avalanche.svc.cluster.local:9001
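Two observations, offered tentatively: the 404 from wget on / is expected, since metrics are served at /metrics; and static_configs targets take host:port only, without a scheme (the scheme: http field already supplies it). The target line would then read:

```yaml
static_configs:
  - targets:
      - avalanche-svc.avalanche.svc.cluster.local:9001
```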

This is my deployment file.

        #create namespace avalanche
        apiVersion: v1
        kind: Namespace
        metadata:
          name: avalanche
        spec:
          finalizers:
          - kubernetes
        #create deployment avalanche
        apiVersion: apps/v1 
        kind: Deployment
        metadata:
          name: avalanche
          namespace: avalanche
          labels:
            name: avalanche
        spec:
          selector:
            matchLabels:
              app: avalanche
          replicas: 1 # tells deployment to run 1 pods matching the template
          template:
            metadata:
              labels:
                app: avalanche
            spec:
              containers:
              - name: pg-avalanche
                image: quay.io/freshtracks.io/avalanche:latest
                args:
                - "--metric-count=1000"
                - "--series-count=50"
                - "--label-count=10"
                - "--port=9001"
                ports:
                - containerPort: 9001
        ---
        #create service avalanche-svc
        apiVersion: v1
        kind: Service
        metadata:
          name: avalanche-svc
          namespace: avalanche
          labels:
            app: avalanche
        spec:
          ports:
            # the port that this service should serve on
            - port: 9001
              targetPort: 9001
              name: http-avalanchea
          type: ClusterIP
          clusterIP: None
          # label keys and values that must match in order to receive traffic for this service
          selector:
            app: avalanche
