
k-bench's Issues

Unable to run k-bench against GKE clusters

Running into the following issue while running k-bench against GKE:
```
tceuser@tkg-cli-client:~/k-bench$ ./run.sh -r gke_notap -t all
Running test command_in_container_predicate and results redirected to "./results_gke_notap_15-Dec-2021-08-58-54-am/command_in_container_predicate"
Starting benchmark, writing logs to results_gke_notap_15-Dec-2021-08-58-54-am/command_in_container_predicate/kbench.log...
Running workload, please check kbench log for details...
panic: no Auth Provider found for name "gcp"

goroutine 1 [running]:
k-bench/util.Run(0xc000206800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/go/src/k-bench/util/testdriver.go:164 +0x22f1
main.main()
/root/go/src/k-bench/cmd/kbench.go:193 +0xe76
Running test command_outside_container_and_resource_predicate and results redirected to "./results_gke_notap_15-Dec-2021-08-58-54-am/command_outside_container_and_resource_predicate"
Starting benchmark, writing logs to results_gke_notap_15-Dec-2021-08-58-54-am/command_outside_container_and_resource_predicate/kbench.log...
Running workload, please check kbench log for details...
panic: no Auth Provider found for name "gcp"

goroutine 1 [running]:
k-bench/util.Run(0xc00023a800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/go/src/k-bench/util/testdriver.go:164 +0x22f1
main.main()
/root/go/src/k-bench/cmd/kbench.go:193 +0xe76
Running test cp_heavy_12client and results redirected to "./results_gke_notap_15-Dec-2021-08-58-54-am/cp_heavy_12client"
Starting benchmark, writing logs to results_gke_notap_15-Dec-2021-08-58-54-am/cp_heavy_12client/kbench.log...
Running workload, please check kbench log for details...
panic: no Auth Provider found for name "gcp"

goroutine 1 [running]:
k-bench/util.Run(0xc0001f2800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
/root/go/src/k-bench/util/testdriver.go:164 +0x22f1
main.main()
/root/go/src/k-bench/cmd/kbench.go:193 +0xe76
```
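
For what it's worth, this panic originates in client-go rather than in k-bench's own logic: the "gcp" auth provider is only registered when its plugin package is blank-imported. A likely fix, assuming the client-go version k-bench builds against still bundles the plugin (it was removed in newer releases), is an import along these lines in kbench.go, followed by a rebuild:

```go
import (
	// Blank-importing the plugin registers the "gcp" auth provider that
	// gcloud-generated GKE kubeconfigs reference; without it, client-go
	// panics with: no Auth Provider found for name "gcp".
	_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"

	// Alternatively, register every bundled provider (gcp, azure, oidc):
	// _ "k8s.io/client-go/plugin/pkg/client/auth"
)
```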

Running k-bench against an OpenShift 4.6 cluster keeps printing `Unauthorized` errors

Please advise how to solve this issue.

I also applied PR #28 so that the KUBECONFIG variable is recognized.

terminal output

```
[root@node9 ~]# export KUBECONFIG=/root/ocp/46/install_dir/auth/kubeconfig
[root@node9 ~]# cd k-bench
[root@node9 k-bench]# ./run.sh -r "kbench-dp-fio"  -t "dp_fio" -o "./results"
Running test dp_fio and results redirected to "./results/results_kbench-dp-fio_20-Oct-2021-04-44-39-pm/dp_fio"
I1020 16:44:41.287582  154604 request.go:645] Throttling request took 1.192939072s, request: GET:https://api.ssic.openshift.smc:6443/apis/cdi.kubevirt.io/v1alpha1?timeout=32s
Error from server (NotFound): error when creating "./config/dp_fio/fio_pvc.yaml": namespaces "kbench-pod-namespace" not found
Starting benchmark, writing logs to results/results_kbench-dp-fio_20-Oct-2021-04-44-39-pm/dp_fio/kbench.log...
Running workload, please check kbench log for details...
E1020 16:44:43.385243  154654 reflector.go:156] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:108: Failed to list *v1.Pod: Unauthorized
E1020 16:44:44.390633  154654 reflector.go:156] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:108: Failed to list *v1.Pod: Unauthorized
E1020 16:44:45.397357  154654 reflector.go:156] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:108: Failed to list *v1.Pod: Unauthorized
E1020 16:44:46.404606  154654 reflector.go:156] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:108: Failed to list *v1.Pod: Unauthorized
E1020 16:44:47.411525  154654 reflector.go:156] pkg/mod/k8s.io/client-go@…/tools/cache/reflector.go:108: Failed to list *v1.Pod: Unauthorized
^C^C
[root@node9 k-bench]#
```

kbench.log

time="2021-10-20T16:44:43.363" level=info msg="Starting kbench..."
time="2021-10-20T16:44:43.365" level=info msg="Created a new Pod manager."
time="2021-10-20T16:44:43.379" level=warning msg="Fail to create namespace kbench-pod-namespace, Unauthorized"
time="2021-10-20T16:44:43.379" level=info msg="Performing pod actions in operation 0"
time="2021-10-20T16:44:43.379" level=info msg="Waiting all threads to finish on the current operation"
time="2021-10-20T16:44:43.390" level=warning msg="Fail to create namespace kbench-pod-namespace, Unauthorized"
time="2021-10-20T16:44:43.395" level=error msg=Unauthorized
time="2021-10-20T16:44:43.395" level=info msg="Sleep 100000 mili-seconds after CREATE action"
time="2021-10-20T16:44:47.436" level=info msg="Terminating the run after receiving SIGTERM signal."

Add flags to scale load

It would be valuable to have functionality that lets the user increase or decrease the number of resources created during a k-bench run. One could then scale the number of Pods, Deployments, and Services to observe the effect on performance.
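
For reference, the workload config already exposes a per-resource `Count` field (visible in the `Deployments` example in the next issue), so a scaling flag could plausibly just override it. A hypothetical fragment, assuming `Count` and `Spec` behave the same way for a `Pods` workload as they do for `Deployments`:

```json
{
  "Pods": {
    "Actions": [
      {
        "Act": "CREATE",
        "Spec": {
          "ImagePullPolicy": "IfNotPresent",
          "Image": "k8s.gcr.io/pause:3.1"
        }
      },
      { "Act": "DELETE" }
    ],
    "SleepTimes": [ 30000 ],
    "Count": 100
  }
}
```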

0 "NumReplicas" in Deployments causes the wait to check until benchmark timeout

If a "Deployments" workload is defined without any "NumReplicas", the benchmark spins until timeout. Example workload:

  "BlockingLevel": "operation",
  "Timeout": 540000,
  "CheckingInterval": 3000,
  "Cleanup": false,
  "Operations": [
    {
      "Deployments": {
        "Actions": [
          {
            "Act": "CREATE",
            "Spec": {
              "ImagePullPolicy": "IfNotPresent",
              "Image": "k8s.gcr.io/pause:3.1"
            }
          },
          {
            "Act": "DELETE"
          }
        ],
        "SleepTimes": [
          30000
        ],
        "Count": 1
      }
    }
  ]
}
```

I would suggest defaulting it to 1, as Kubernetes itself does.
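
A minimal sketch of the suggested defaulting, with illustrative names rather than k-bench's actual fields:

```go
// defaultReplicas falls back to 1 replica, matching the Kubernetes default
// for a Deployment's .spec.replicas, when the workload config omits
// NumReplicas (a missing JSON field decodes to 0 in Go).
func defaultReplicas(numReplicas int32) int32 {
	if numReplicas <= 0 {
		return 1
	}
	return numReplicas
}
```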

Use KUBECONFIG environment variable if it exists

What would you like to be added:
I want K-bench to recognize the KUBECONFIG environment variable that I set.

Why is this needed:
The k-bench README tells the user to run `kubectl get nodes` to verify which cluster they are currently operating on. While this is a good check, it can be misleading because the code itself does not handle the KUBECONFIG variable.

Quoting from the Kubernetes docs:

If the KUBECONFIG environment variable doesn't exist, kubectl uses the default kubeconfig file, $HOME/.kube/config.

If the KUBECONFIG environment variable does exist, kubectl uses an effective configuration that is the result of merging the files listed in the KUBECONFIG environment variable.

This small change will reduce the chances of running the tests on the wrong cluster while not introducing breaking changes.
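
A minimal sketch of the change, assuming k-bench builds its clientset from a rest.Config (the package name is hypothetical): client-go's clientcmd default loading rules already implement exactly the kubectl semantics quoted above.

```go
package util // hypothetical placement inside k-bench

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func newClient() (*kubernetes.Clientset, error) {
	// NewDefaultClientConfigLoadingRules honors KUBECONFIG (including its
	// merge semantics) and falls back to $HOME/.kube/config, like kubectl.
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	cfg, err := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		rules, &clientcmd.ConfigOverrides{}).ClientConfig()
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(cfg)
}
```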

Support for the WATCH act

First, awesome project! 👍🏻

I saw from the configuration file that k-bench can define the actions for the benchmark. In most of the examples out there, CREATE, LIST, GET, DELETE, and UPDATE are used; I'm wondering whether the WATCH one is also supported.

The reason is that I'm using an etcd shim and need to be sure that watch events, which have to be faked since the datastore does not support them natively, perform as expected.

I tried to take a look in the code base and wasn't able to find anything so far.
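
I couldn't confirm WATCH support either, but for discussion's sake, here is a sketch of what such an act could look like on top of client-go (the package name, namespace, and measurement are illustrative; recent client-go versions take a context):

```go
package actions // hypothetical placement inside k-bench

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchPods sketches a hypothetical WATCH act; client would come from
// k-bench's existing clientset setup.
func watchPods(client kubernetes.Interface, ns string) error {
	w, err := client.CoreV1().Pods(ns).Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() {
		// A benchmark could timestamp each event here to measure how quickly
		// the (possibly shimmed) datastore delivers watch events.
		fmt.Printf("event: %s %T\n", ev.Type, ev.Object)
	}
	return nil
}
```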

Convert logs to CSV

I think there's value in adding something that converts kbench.log to CSV. It would make managing and analyzing the data in a spreadsheet easier.
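
A minimal sketch of such a converter, assuming the logrus-style `time=... level=... msg=...` layout seen throughout the logs in this tracker:

```go
package main

import (
	"bufio"
	"encoding/csv"
	"fmt"
	"os"
	"regexp"
)

// Matches lines like:
//   time="2021-10-20T16:44:43.363" level=info msg="Starting kbench..."
// (msg is sometimes unquoted, e.g. msg=Unauthorized).
var line = regexp.MustCompile(`^time="([^"]+)" level=(\S+) msg="?(.*?)"?$`)

func main() {
	in := bufio.NewScanner(os.Stdin)
	out := csv.NewWriter(os.Stdout)
	defer out.Flush()
	out.Write([]string{"time", "level", "msg"})
	for in.Scan() {
		if m := line.FindStringSubmatch(in.Text()); m != nil {
			out.Write(m[1:]) // time, level, msg
		}
	}
	if err := in.Err(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Usage would be along the lines of `go run tocsv.go < kbench.log > kbench.csv`.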

Running kbench on aarch64 VM

Tried running this on some aarch64 compute instances and there doesn't seem to be support. If there is, please point me in the right direction. Thank you!
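
Untested thought: since k-bench is plain Go, cross-compiling for arm64 may be enough, as long as the container images the workloads deploy also have arm64 variants (the `./cmd` path here is inferred from the stack traces elsewhere in this tracker):

```sh
# Build an arm64 Linux binary of k-bench from the repo root.
GOOS=linux GOARCH=arm64 go build -o kbench ./cmd
```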

Panic due to concurrent map read and map write

K-bench had a fatal error and panicked due to concurrent map read and map write. Adding the trace log.

time="2023-04-07T04:44:19.124" level=info msg="Updated ProgressDeadlineSeconds for deployments kbench-deployment-oid-0-tid-1568"
time="2023-04-07T04:44:19.124" level=info msg="Sleep 9000 mili-seconds after UPDATE action"
time="2023-04-07T04:44:19.124" level=info msg="Updated ProgressDeadlineSeconds for deployments kbench-deployment-oid-0-tid-819"
time="2023-04-07T04:44:19.124" level=info msg="Sleep 9000 mili-seconds after UPDATE action"
time="2023-04-07T04:44:28.125" level=info msg="All operations completed."
fatal error: concurrent map read and map write

goroutine 1 [running]:
runtime.throw({0x11d3617?, 0x0?})
	/usr/lib/golang/src/runtime/panic.go:992 +0x71 fp=0xc0021096c0 sp=0xc002109690 pc=0x43a491
runtime.mapaccess2_faststr(0xc002109b88?, 0xd8f55ce171?, {0xc03510b5c0, 0x31})
	/usr/lib/golang/src/runtime/map_faststr.go:117 +0x3d4 fp=0xc002109728 sp=0xc0021096c0 pc=0x419214
k-bench/manager.(*PodManager).CalculateStats(0xc000488000)
	/root/go/src/k-bench/manager/pod_manager.go:1173 +0xef3 fp=0xc002109d08 sp=0xc002109728 pc=0xf75c93
k-bench/manager.(*DeploymentManager).CalculateStats(0xc0001eecf8?)
	/root/go/src/k-bench/manager/deployment_manager.go:547 +0x1d fp=0xc002109d20 sp=0xc002109d08 pc=0xf63a3d
k-bench/util.Finalize()
	/root/go/src/k-bench/util/testdriver.go:323 +0x1c6 fp=0xc002109f20 sp=0xc002109d20 pc=0xf9e966
k-bench/util.Run(_, {{{0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, 0x0}, {0x0, ...}, ...}, ...}, ...)
	/root/go/src/k-bench/util/testdriver.go:297 +0xfdd fp=0xc00210f670 sp=0xc002109f20 pc=0xf9dbdd
main.main()
	/root/go/src/k-bench/cmd/kbench.go:195 +0xb6e fp=0xc00210ff80 sp=0xc00210f670 pc=0xfb740e
runtime.main()
	/usr/lib/golang/src/runtime/proc.go:250 +0x212 fp=0xc00210ffe0 sp=0xc00210ff80 pc=0x43cc72
runtime.goexit()
	/usr/lib/golang/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc00210ffe8 sp=0xc00210ffe0 pc=0x46a1c1

goroutine 35 [chan receive]:
k8s.io/klog.(*loggingT).flushDaemon(0x0?)
	/root/go/pkg/mod/k8s.io/klog@…/klog.go:1010 +0x6a
created by k8s.io/klog.init.0
	/root/go/pkg/mod/k8s.io/klog@…/klog.go:411 +0xef

goroutine 3 [syscall, 20 minutes]:
os/signal.signal_recv()
	/usr/lib/golang/src/runtime/sigqueue.go:151 +0x2f
os/signal.loop()
	/usr/lib/golang/src/os/signal/signal_unix.go:23 +0x19
created by os/signal.Notify.func1.1
	/usr/lib/golang/src/os/signal/signal.go:151 +0x2a
```
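
The trace appears to show `PodManager.CalculateStats` reading a pod map while worker goroutines are still writing to it. A sketch of the usual fix pattern, with illustrative names rather than k-bench's actual fields:

```go
package manager // illustrative

import (
	"sync"
	"time"
)

// podStats guards its map with a sync.RWMutex so a stats reader can run
// concurrently with goroutines that are still recording latencies.
type podStats struct {
	mu        sync.RWMutex
	latencies map[string]time.Duration
}

func (s *podStats) record(pod string, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.latencies[pod] = d
}

func (s *podStats) calculate() time.Duration {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if len(s.latencies) == 0 {
		return 0
	}
	var sum time.Duration
	for _, d := range s.latencies {
		sum += d
	}
	return sum / time.Duration(len(s.latencies))
}
```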

K-Bench Benchmarking and Server Side Metrics

Good evening
First of all congratulations on the awesome tool.
I am a master's student in Computer Engineering at the University of Coimbra. Together with a colleague and our supervisors, I am working on a scientific paper that uses K-Bench for benchmarking. We ran different tests (cp_heavy_12client, cp_heavy_8client, cp_light_4client, and cp_light_1client), and with none of them did we manage to retrieve server-side metrics such as image pulling latency. Is this supposed to happen, or should we use a flag other than -benchconfig to run the tests?
Thank you for your attention.

Is this project still maintained?

TL;DR: I'm wondering if this project is still supported and maintained.


We recently used k-bench to run some benchmark tests against Kubernetes clusters, but we are now a little worried that the project might be unmaintained: no PRs have been merged and there has been no issue discussion for a long time, and there has not been a new release. The maintainers seem to be no longer active on GitHub either: @ganesank-git @yonglipa @ganesank

I think looking for new maintainers to help out with the workload would help. This is a pretty cool project that I hope will continue to improve, and we'd like to help maintain it!

What is Pod creation average latency?

In the log file I'm getting a line like

`time="2022-04-27T13:42:53.644" level=info msg="Pod creation average latency: 1.0219923 "`

Pod creation average latency is not explained in the README. I think it is entirely different from Pod creation latency (server).

Release k-bench version or publish roadmap

Hi Guys,

Is it possible to release a version of k-bench at this stage? Even if it's alpha?

Also, is it possible to publish a roadmap along with a feature list? Possibly with target dates?

This will help the community gauge the maturity of the tool and its future plans.

Thanks!

Dataplane Tests Error in container command

It looks as though the commands used to start iPerf in the containers are failing?

time="2023-04-21T12:52:32.366" level=info msg="Run: Container netperfclientcontainer found for pod kbench-pod-oid-2-tid-0"
time="2023-04-21T12:52:43.210" level=info msg="Container netperfclientcontainer on pod kbench-pod-oid-2-tid-0, Run out:  err: mkdir: cannot create directory '/tmp/perfoutput': File exists\n"
time="2023-04-21T12:52:43.210" level=info msg="Sleep 10000 mili-seconds after RUN action"
time="2023-04-21T12:52:53.222" level=info msg="One operation completed. Continue to run the next..."
time="2023-04-21T12:52:53.222" level=info msg="Performing pod actions in operation 6"
time="2023-04-21T12:52:53.222" level=info msg="Waiting all threads to finish on the current operation"
time="2023-04-21T12:52:53.445" level=info msg="Run: Container netperfservercontainer found for pod kbench-pod-oid-1-tid-0"
time="2023-04-21T12:53:44.875" level=info msg="Container netperfservercontainer on pod kbench-pod-oid-1-tid-0, Run out:  err: mkdir: cannot create directory '/tmp/perfoutput': File exists\n"
time="2023-04-21T12:53:44.876" level=info msg="Sleep 10000 mili-seconds after RUN action"

As a result, we are not getting any network I/O results from `./run.sh -t "dp_netperf_internode"`, just the generic pod start/stop stats.

---update---

I can get these tests to run by deploying manually in the same environment and just running the test commands in sequence.

If the dataplane tests do run, where would I expect to see the results? I can't find anything that looks like what I would expect for dataplane throughput and latency.
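
The mkdir failure itself looks easy to make idempotent: if the container command is what the logs suggest, switching to `-p` would stop the error when `/tmp/perfoutput` survives from a previous operation (illustrative; the actual command lives in k-bench's workload config):

```sh
# mkdir -p succeeds even when the directory already exists.
mkdir -p /tmp/perfoutput
```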
