cloud-bulldozer / e2e-benchmarking
Performance Tests for end Platforms
License: Apache License 2.0
When deploying cluster logging for the first time, it sometimes results in an error about the missing ClusterLogging and ClusterLogForwarder kinds.
Mon 29 Mar 2021 10:49:00 AM UTC: Checking if oc client is installed
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.6.10 True False 74d Cluster version is 4.6.10
Mon 29 Mar 2021 10:49:00 AM UTC: Deteting openshift-logging/openshift-operators-redhat namespaces if exists
Mon 29 Mar 2021 10:49:01 AM UTC: Installing the necessary objects for setting up elastic and logging operators and creating a cluster logging instance
Mon 29 Mar 2021 10:49:01 AM UTC: Creating cluster logging with custom elasticsearch backend
namespace/openshift-operators-redhat created
namespace/openshift-logging created
operatorgroup.operators.coreos.com/openshift-operators-redhat created
operatorgroup.operators.coreos.com/cluster-logging created
subscription.operators.coreos.com/cluster-logging created
unable to recognize "STDIN": no matches for kind "ClusterLogging" in version "logging.openshift.io/v1"
unable to recognize "STDIN": no matches for kind "ClusterLogForwarder" in version "logging.openshift.io/v1"
I presume this is due to applying the CRs too quickly after applying the CRDs.
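One way to avoid this race would be to wait until the CRD is established before applying the CRs. A minimal sketch, assuming `oc wait` is available and that the CRD name below is the one the operator registers:

```shell
# Retry a command until it succeeds or the attempt limit is reached.
retry_until() {
  local attempts=$1; shift
  local delay=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Against a real cluster (CRD name is an assumption), e.g.:
#   retry_until 30 5 oc wait --for=condition=Established \
#     crd/clusterloggings.logging.openshift.io --timeout=5s
```

The same helper could gate the ClusterLogForwarder apply as well, so both CRs only go in once their CRDs are accepted by the API server.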
Some of the links used in the e2e tests and common files still point to ripsaw in their GitHub URLs. While this isn't hurting anything from a functional standpoint, we should update the stale links for readability.
When a test such as uperf's host network test times out, I expect the script to exit with a failure code; instead it returns 0, indicating a successful test run.
Example:
[2021-12-01, 05:25:28 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: BENCHMARK UUID STATE
[2021-12-01, 05:25:28 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 Not Assigned Yet
[2021-12-01, 05:25:37 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317
[2021-12-01, 05:25:39 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Building
[2021-12-01, 05:25:44 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Starting Servers
[2021-12-01, 05:26:27 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Starting Clients
[2021-12-01, 05:26:38 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Waiting for Clients
[2021-12-01, 05:27:00 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Clients Running
[2021-12-01, 05:27:08 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Set Running
[2021-12-01, 07:14:43 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Run Next Set
[2021-12-01, 07:14:50 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Running
[2021-12-01, 07:14:59 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Cleanup
[2021-12-01, 07:15:11 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 18a763ca-0e2f-4a97-a845-b9b6713dd317 Complete
[2021-12-01, 07:15:12 EST] {subprocess.py:89} INFO - ripsaw-cli:ripsaw.clients.k8s:INFO :: uperf-benchmark-hostnet-network-1 with uuid 18a763ca-0e2f-4a97-a845-b9b6713dd317 has reached the desired state Complete
In the example above, the gap between "Set Running" and "Run Next Set" is roughly 2 hours, which is the default timeout interval of the job.
This leaves us with CI jobs that complete and "pass" even though they should return an error code.
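A polling wrapper along these lines could propagate the timeout as a failure instead of exiting 0 (the function name and the `oc` invocation in the comment are assumptions about how the benchmark state is queried, not the repo's actual code):

```shell
# Poll a state-reporting command until it prints "Complete", "Failed",
# or the timeout elapses; return non-zero on failure or timeout so the
# caller (and CI) sees a failing exit code.
wait_for_state() {
  local timeout=$1 interval=$2; shift 2
  local elapsed=0 state
  while ((elapsed < timeout)); do
    state=$("$@")
    if [[ $state == "Complete" ]]; then
      return 0
    elif [[ $state == "Failed" ]]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Benchmark timed out after ${timeout}s" >&2
  return 1
}

# Against a real cluster, something like:
#   wait_for_state 7200 30 oc get benchmark uperf-benchmark-hostnet-network \
#     -n benchmark-operator -o jsonpath='{.status.state}' || exit 1
```

The key point is the trailing `|| exit 1`: whatever mechanism detects the timeout has to surface it as the script's exit code.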
After the router test execution finished OK, uploading to ES failed with a timeout, invalidating all the tests:
[2021-10-25 21:16:38,830] {subprocess.py:78} INFO - Mon Oct 25 21:16:38 UTC 2021 Testing all routes before triggering the workload
[2021-10-25 21:24:07,367] {subprocess.py:78} INFO - Mon Oct 25 21:24:07 UTC 2021 Generating config for termination http with 1 clients 0 keep alive requests and path /1024.html
[2021-10-25 21:24:08,348] {subprocess.py:78} INFO - Mon Oct 25 21:24:08 UTC 2021 Copying mb config http-scale-http.json to pod http-scale-client-5795dcd5cf-nd4w8
[2021-10-25 21:24:10,000] {subprocess.py:78} INFO - Mon Oct 25 21:24:10 UTC 2021 Executing sample 1/2 using termination http with 1 clients and 0 keepalive requests
[2021-10-25 21:24:10,283] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - {
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "requests_per_second": 94798,
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "avg_latency": 5259,
[2021-10-25 21:25:29,192] {subprocess.py:78} INFO - "latency_95pctl": 7364,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "latency_99pctl": 9336,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "sample": "1",
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "keepalive": 0,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - "200": 5687916
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - }
[2021-10-25 21:25:29,193] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:25:29,239] {subprocess.py:78} INFO - Mon Oct 25 21:25:29 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:26:29,241] {subprocess.py:78} INFO - Mon Oct 25 21:26:29 UTC 2021 Executing sample 2/2 using termination http with 1 clients and 0 keepalive requests
[2021-10-25 21:26:29,612] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - {
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "requests_per_second": 96280,
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "avg_latency": 5172,
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "latency_95pctl": 7214,
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "latency_99pctl": 9203,
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "sample": "2",
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:27:48,844] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - "keepalive": 0,
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - "200": 5776842
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - }
[2021-10-25 21:27:48,845] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:27:48,889] {subprocess.py:78} INFO - Mon Oct 25 21:27:48 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:28:48,892] {subprocess.py:78} INFO - Mon Oct 25 21:28:48 UTC 2021 Generating config for termination http with 1 clients 1 keep alive requests and path /1024.html
[2021-10-25 21:28:49,569] {subprocess.py:78} INFO - Mon Oct 25 21:28:49 UTC 2021 Copying mb config http-scale-http.json to pod http-scale-client-5795dcd5cf-nd4w8
[2021-10-25 21:28:51,231] {subprocess.py:78} INFO - Mon Oct 25 21:28:51 UTC 2021 Executing sample 1/2 using termination http with 1 clients and 1 keepalive requests
[2021-10-25 21:28:51,514] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - {
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "requests_per_second": 7520,
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "avg_latency": 66310,
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "latency_95pctl": 112284,
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "latency_99pctl": 147764,
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "sample": "1",
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:29:54,914] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - "keepalive": 1,
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - "200": 451232
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - }
[2021-10-25 21:29:54,915] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:29:54,958] {subprocess.py:78} INFO - Mon Oct 25 21:29:54 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:30:54,960] {subprocess.py:78} INFO - Mon Oct 25 21:30:54 UTC 2021 Executing sample 2/2 using termination http with 1 clients and 1 keepalive requests
[2021-10-25 21:30:55,269] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:31:58,903] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:31:58,903] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - {
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "requests_per_second": 8729,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "avg_latency": 57239,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "latency_95pctl": 93251,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "latency_99pctl": 123224,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "sample": "2",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "keepalive": 1,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - "200": 523795
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - }
[2021-10-25 21:31:58,904] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:31:58,945] {subprocess.py:78} INFO - Mon Oct 25 21:31:58 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:32:58,948] {subprocess.py:78} INFO - Mon Oct 25 21:32:58 UTC 2021 Generating config for termination http with 1 clients 50 keep alive requests and path /1024.html
[2021-10-25 21:32:59,583] {subprocess.py:78} INFO - Mon Oct 25 21:32:59 UTC 2021 Copying mb config http-scale-http.json to pod http-scale-client-5795dcd5cf-nd4w8
[2021-10-25 21:33:01,223] {subprocess.py:78} INFO - Mon Oct 25 21:33:01 UTC 2021 Executing sample 1/2 using termination http with 1 clients and 50 keepalive requests
[2021-10-25 21:33:01,514] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - {
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:34:16,725] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "requests_per_second": 75027,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "avg_latency": 6622,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "latency_95pctl": 11104,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "latency_99pctl": 15192,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "sample": "1",
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "keepalive": 50,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - "200": 4501656
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - }
[2021-10-25 21:34:16,726] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:34:16,772] {subprocess.py:78} INFO - Mon Oct 25 21:34:16 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:35:16,774] {subprocess.py:78} INFO - Mon Oct 25 21:35:16 UTC 2021 Executing sample 2/2 using termination http with 1 clients and 50 keepalive requests
[2021-10-25 21:35:18,405] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:36:34,282] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:36:34,282] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - {
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "requests_per_second": 75339,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "avg_latency": 6589,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "latency_95pctl": 11073,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "latency_99pctl": 15216,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "sample": "2",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "conn_per_targetroute": 1,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "keepalive": 50,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - "200": 4520354
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - }
[2021-10-25 21:36:34,283] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:36:34,329] {subprocess.py:78} INFO - Mon Oct 25 21:36:34 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:37:34,332] {subprocess.py:78} INFO - Mon Oct 25 21:37:34 UTC 2021 Generating config for termination http with 20 clients 0 keep alive requests and path /1024.html
[2021-10-25 21:37:34,959] {subprocess.py:78} INFO - Mon Oct 25 21:37:34 UTC 2021 Copying mb config http-scale-http.json to pod http-scale-client-5795dcd5cf-nd4w8
[2021-10-25 21:37:36,657] {subprocess.py:78} INFO - Mon Oct 25 21:37:36 UTC 2021 Executing sample 1/2 using termination http with 20 clients and 0 keepalive requests
[2021-10-25 21:37:36,941] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - {
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - "requests_per_second": 11818,
[2021-10-25 21:38:47,378] {subprocess.py:78} INFO - "avg_latency": 62146082886,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "latency_95pctl": 2138190,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "latency_99pctl": 5072470,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "sample": "1",
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "conn_per_targetroute": 20,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "keepalive": 0,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "0": 80286,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "200": 709087,
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - "504": 1
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - }
[2021-10-25 21:38:47,379] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:38:47,418] {subprocess.py:78} INFO - Mon Oct 25 21:38:47 UTC 2021 Sleeping for 60s before next test
[2021-10-25 21:39:47,420] {subprocess.py:78} INFO - Mon Oct 25 21:39:47 UTC 2021 Executing sample 2/2 using termination http with 20 clients and 0 keepalive requests
[2021-10-25 21:39:47,718] {subprocess.py:78} INFO - Unable to use a TTY - input is not a terminal or the right kind of file
[2021-10-25 21:41:05,990] {subprocess.py:78} INFO - Executing 'mb -i /tmp/http-scale-http.json -d 60 -o /tmp/results.csv'
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - Workload finished, results:
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - {
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "termination": "http",
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "test_type": "http",
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "uuid": "fa9cff5b-ec25-4669-a93c-de06b3806aa3",
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "requests_per_second": 32381,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "avg_latency": 1293305100412,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "latency_95pctl": 668422,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "latency_99pctl": 3339297,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "host_network": "true",
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "sample": "2",
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "runtime": 60,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "routes": 500,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "conn_per_targetroute": 20,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "keepalive": 0,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "tls_reuse": true,
[2021-10-25 21:41:05,991] {subprocess.py:78} INFO - "number_of_routers": "2",
[2021-10-25 21:41:05,992] {subprocess.py:78} INFO - "0": 80059,
[2021-10-25 21:41:05,992] {subprocess.py:78} INFO - "200": 1942910,
[2021-10-25 21:41:05,992] {subprocess.py:78} INFO - "408": 1
[2021-10-25 21:41:05,992] {subprocess.py:78} INFO - }
[2021-10-25 21:41:05,992] {subprocess.py:78} INFO - Indexing documents in router-test-results
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - Traceback (most recent call last):
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - six.raise_from(e, None)
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "<string>", line 3, in raise_from
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - httplib_response = conn.getresponse()
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib64/python3.6/http/client.py", line 1346, in getresponse
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - response.begin()
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib64/python3.6/http/client.py", line 307, in begin
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - version, status, reason = self._read_status()
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib64/python3.6/http/client.py", line 268, in _read_status
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - File "/usr/lib64/python3.6/socket.py", line 586, in readinto
[2021-10-25 21:41:05,993] {subprocess.py:78} INFO - return self._sock.recv_into(b)
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - socket.timeout: timed out
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO -
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - During handling of the above exception, another exception occurred:
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO -
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - Traceback (most recent call last):
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 250, in perform_request
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - method, url, body, retries=Retry(False), headers=request_headers, **kw
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - _stacktrace=sys.exc_info()[2])
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/util/retry.py", line 344, in increment
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - raise six.reraise(type(error), error, _stacktrace)
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/packages/six.py", line 693, in reraise
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - raise value
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - chunked=chunked)
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 386, in _make_request
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - File "/usr/lib/python3.6/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
[2021-10-25 21:41:05,994] {subprocess.py:78} INFO - urllib3.exceptions.ReadTimeoutError: HTTPConnectionPool(host='perf-results-elastic.apps.keith-cluster.perfscale.devcluster.openshift.com', port=80): Read timed out. (read timeout=10)
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO -
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - During handling of the above exception, another exception occurred:
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO -
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - Traceback (most recent call last):
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/workload/workload.py", line 92, in <module>
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - exit(main())
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/workload/workload.py", line 88, in main
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - index_result(payload)
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/workload/workload.py", line 23, in index_result
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - es.index(index=es_index, body=payload)
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/utils.py", line 152, in _wrapped
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - return func(*args, params=params, headers=headers, **kwargs)
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/client/__init__.py", line 402, in index
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - body=body,
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 415, in perform_request
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - raise e
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/transport.py", line 388, in perform_request
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - timeout=timeout,
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - File "/usr/local/lib/python3.6/site-packages/elasticsearch/connection/http_urllib3.py", line 261, in perform_request
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - raise ConnectionTimeout("TIMEOUT", str(e), e)
[2021-10-25 21:41:05,995] {subprocess.py:78} INFO - elasticsearch.exceptions.ConnectionTimeout: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='perf-results-elastic.apps.keith-cluster.perfscale.devcluster.openshift.com', port=80): Read timed out. (read timeout=10))
[2021-10-25 21:41:06,029] {subprocess.py:78} INFO - command terminated with exit code 1
[2021-10-25 21:41:06,032] {subprocess.py:78} INFO - fa9cff5b-ec25-4669-a93c-de06b3806aa3
[2021-10-25 21:41:06,033] {subprocess.py:82} INFO - Command exited with return code 1
[2021-10-25 21:41:06,055] {taskinstance.py:1462} ERROR - Task failed with exception
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1164, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1282, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1312, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/bash.py", line 176, in execute
raise AirflowException('Bash command failed. The command returned a non-zero exit code.')
airflow.exceptions.AirflowException: Bash command failed. The command returned a non-zero exit code.
[2021-10-25 21:41:06,057] {taskinstance.py:1505} INFO - Marking task as UP_FOR_RETRY. dag_id=4.8_rosa_default, task_id=router, execution_date=20211025T080342, start_date=20211025T211213, end_date=20211025T214106
[2021-10-25 21:41:06,090] {local_task_job.py:151} INFO - Task exited with return code 1
[2021-10-25 21:41:06,115] {local_task_job.py:261} INFO - 0 downstream tasks scheduled from follow-on schedule check
The uperf result doc that gets created in Google does not explicitly say which test it is for. When searching through previous tests, this makes it very difficult to quickly determine which test was which. Simply adding the test name to the email subject or spreadsheet title would be extremely helpful.
For example, instead of
Subject: Uperf-Test-Results-2021-03-02-16.41.14 - Invitation to edit
use
Subject: Uperf-Test-Results-2021-03-02-16.41.14 - pod-to-pod - Invitation to edit
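Building the suffixed title could be as simple as interpolating the test name into the existing subject string. A minimal sketch, where `TEST_NAME` and the timestamp format are illustrative, not the repo's actual variables:

```shell
# Append the workload/test name to the generated sheet title so it is
# identifiable in search; falls back to "pod-to-pod" for this example.
TEST_NAME=${TEST_NAME:-pod-to-pod}
SUBJECT="Uperf-Test-Results-$(date +%Y-%m-%d-%H.%M.%S) - ${TEST_NAME}"
echo "$SUBJECT"
```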
In clusters accessible only via a proxy, the e2e-benchmark tests using the ripsaw CLI fail even though oc works. The client needs to be updated to configure kubernetes.client with the proxy. Something like this, I think:
import os
from kubernetes import client, config

config.load_kube_config(config_file=kubeconfig_path)
proxy_url = os.getenv('http_proxy')
if proxy_url:
    # Propagate the proxy to the default client configuration
    configuration = client.Configuration.get_default_copy()
    configuration.proxy = proxy_url
    client.Configuration.set_default(configuration)
self.api_client = client.ApiClient()
I will try to put up a PR if I can figure out how to (easily) test ripsaw CLI changes locally.
The code assumes the latency is always in usec, but it could be in ms or sec. Earlier the latency was reported in ms, but now it is in us.
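Normalizing to a single unit before indexing would avoid this ambiguity. A minimal sketch (the function name is illustrative; the real fix would need to read the unit the benchmark actually reports):

```shell
# Convert an integer latency value with a unit suffix to microseconds.
to_usec() {
  local value=$1 unit=$2
  case $unit in
    us|usec) echo "$value" ;;
    ms|msec) echo $((value * 1000)) ;;
    s|sec)   echo $((value * 1000000)) ;;
    *) echo "unknown latency unit: $unit" >&2; return 1 ;;
  esac
}
```

For example, `to_usec 5 ms` yields `5000`, so documents indexed from older ms-based runs and newer us-based runs stay comparable.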
router-perf-v2 failed because the endpoints cannot be reached.
Mon Dec 13 09:28:58 AM UTC 2021 Testing all routes before triggering the workload
curl: (52) Empty reply from server
Manually accessing the endpoints also failed:
% oc get route -n http-scale-http --no-headers -o custom-columns="route:.spec.host" | grep 499
http-perf-499-http-scale-http.apps.qili-aws-ovn.qe.devcluster.openshift.com
% oc get pod -n http-scale-http | grep 499
http-perf-499 1/1 Running 0 57m
% curl --retry 3 --connect-timeout 5 -sSk http://http-perf-499-http-scale-http.apps.qili-aws-ovn.qe.devcluster.openshift.com
curl: (52) Empty reply from server
Output of kube-burner
version
kube-burner-0.9.1
Describe the bug
When running with oc client versions above 1.19, errors like #178 appear.
To Reproduce
Expected behavior
The test should run without error
Screenshots or output
An error seen in #178
Error: unknown flag: --type
See 'kubectl set --help' for usage.
Additional context
Note: This is a proposal and subject to change
Currently this repo is mainly a collection of bash scripts that orchestrate templating out Kubernetes manifests for benchmark-operator CRs. While this has worked for us in the past, I think we're getting to a point where we have to look at how this project should mature and how we can make it more consumable and reliable than it is currently.
Current Problems:
I think most of these problems can be remedied by turning this project into a CLI tool for which we release versioned artifacts. IMO this would best be done by moving this project to Python and publishing pip packages for versions that pass CI.
An ideal usage of this package could be something like this:
pip install openshift-benchmarks
# install benchmark-operator
openshift-benchmarks install-operator -n my-ripsaw
# show all workloads
openshift-benchmarks workloads list
# shows global configs like es and other commands
openshift-benchmarks -h
# shows uperf specific config
openshift-benchmarks uperf -h
# run uperf workload
openshift-benchmarks uperf run --networkpolicy true
# uninstall operator
openshift-benchmarks destroy-operator
It would also be nice to use this as a library in other Python code if possible. This would be great for using it in Airflow but isn't a hard requirement.
I've noticed that some of the logic from this old PR to the benchmark-operator, cloud-bulldozer/benchmark-operator#437, could be used as a starting point.
The etcd-perf test runs fio to capture the latency/fsync metrics on the disk. Currently the pod gets scheduled on one of the worker nodes, which might not use the same disk as the master nodes; we need to make sure the test pod is scheduled on one of the master nodes so that it hits the storage used by etcd for reads and writes.
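A minimal sketch of the scheduling constraints that would achieve this, assuming a plain pod spec is being templated (the pod name and fio image here are illustrative, not the real benchmark CR):

```shell
# Hedged sketch: pin the fio pod to a master node and tolerate the
# master taint. Pod name and image are illustrative placeholders.
cat > etcd-perf-pod.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: etcd-perf
spec:
  nodeSelector:
    node-role.kubernetes.io/master: ""
  tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule
  containers:
  - name: fio
    image: quay.io/example/fio:latest
EOF
echo "wrote $(wc -l < etcd-perf-pod.yml) lines"
```

The toleration is required in addition to the nodeSelector, since master nodes normally carry a NoSchedule taint.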
We are blindly removing the openshift-operators-redhat namespace during install/cleanup. This is bad practice, as it may be used by other RH operators and could cause future problems. We should remove this namespace more selectively or, even better, only remove what we ourselves added.
Line reference:
Failed execution of router on airflow:
[2021-11-18, 04:04:55 CET] {subprocess.py:89} INFO - touchstone_compare --database elasticsearch -url http://elastic:62cuyJA229jfFl604nUC54TV@perf-results-elastic.apps.keith-cluster.perfscale.devcluster.openshift.com:80 -u 73571683-b560-4093-a63e-bee5ead321a0 --config /home/airflow/workspace/e2e-benchmarking/workloads/router-perf-v2/mb-touchstone.json -o csv --output-file /home/airflow/workspace/e2e-benchmarking/workloads/router-perf-v2/ingress-performance.csv --rc 0
[2021-11-18, 04:04:58 CET] {subprocess.py:89} INFO - Thu Nov 18 03:04:58 UTC 2021 Installing requirements to generate spreadsheet
[2021-11-18, 04:05:03 CET] {subprocess.py:89} INFO - WARNING: You are using pip version 21.1.1; however, version 21.3.1 is available.
[2021-11-18, 04:05:03 CET] {subprocess.py:89} INFO - You should consider upgrading via the '/tmp/tmp.FcwXikxljr/bin/python -m pip install --upgrade pip' command.
[2021-11-18, 04:05:03 CET] {subprocess.py:89} INFO - ../../utils/common.sh: line 81: ./csv_gen.py: No such file or directory
[2021-11-18, 04:05:03 CET] {subprocess.py:89} INFO - 73571683-b560-4093-a63e-bee5ead321a0
[2021-11-18, 04:05:03 CET] {subprocess.py:93} INFO - Command exited with return code 127
https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/utils/common.sh#L81
The current pod-density version creates a specified number of sleep pods in a namespace; we also need a heavy version, similar to node-density-heavy, that creates heavy applications. We can leverage node-density-heavy today by configuring the node and pod counts, but it also creates services, which is a limitation for this test: there can't be more than 5000 services per namespace due to the ARG_MAX limitation on the host, and pod-density would need to create >= 25000 pods to validate and push the cluster maximums.
I think we could print some of the workload logs (not the benchmark-operator ones, but those from the actual pods of the workload) when a benchmark fails. We're currently blind when that happens, which sometimes means re-running it manually and hence wasted time.
A good place to add this feature could be the run_benchmark function:
e2e-benchmarking/utils/benchmark-operator.sh
Lines 53 to 59 in fc86aa5
While running https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/run_maxservices_test_fromgit.sh and https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/run_maxnamespaces_test_fromgit.sh, job iterations are set to 0 even after setting the value of TEST_JOB_ITERATIONS in the script files.
I was able to trace the issue to a missing export keyword before TEST_JOB_ITERATIONS. After including the export keyword in the script file, the issue is resolved.
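The failure mode is easy to reproduce in isolation: a plain shell assignment is invisible to child processes (such as kube-burner), while an exported one is inherited. A minimal sketch:

```shell
# Without export, the variable stays local to the current shell; a
# child process (a subordinate bash here, standing in for kube-burner)
# only sees the fallback default of 0.
TEST_JOB_ITERATIONS=500
unexported=$(bash -c 'echo "${TEST_JOB_ITERATIONS:-0}"')

# With export, the child process inherits the value.
export TEST_JOB_ITERATIONS=500
exported=$(bash -c 'echo "${TEST_JOB_ITERATIONS:-0}"')

echo "without export: $unexported, with export: $exported"
# → without export: 0, with export: 500
```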
Sometimes, when large numbers of iterations of the cluster-density test are run, {{rand 4}} or {{rand 10}} produces a purely numeric string and the ConfigMap cannot be created properly. See the output below.
09-02 01:18:39.880 time="2021-09-02 05:18:39" level=error msg="Error creating object: ConfigMap in version "v1" cannot be handled as a ConfigMap: v1.ConfigMap.Data: ReadString: expects " or n, but found 9, error found in #10 byte of ...|:{"key1":9764,"key2"|..., bigger context ...|{"apiVersion":"v1","data":{"key1":9764,"key2":"sQxmxGVmZV"},"kind":"ConfigMap","metad|..."
09-02 01:18:39.880 time="2021-09-02 05:18:39" level=error msg="Retrying object creation"
09-02 01:18:40.529 time="2021-09-02 05:18:40" level=error msg="Error creating object: ConfigMap in version "v1" cannot be handled as a ConfigMap: v1.ConfigMap.Data: ReadString: expects " or n, but found 9, error found in #10 byte of ...|:{"key1":9764,"key2"|..., bigger context ...|{"apiVersion":"v1","data":{"key1":9764,"key2":"sQxmxGVmZV"},"kind":"ConfigMap","metad|..."
09-02 01:18:40.529 time="2021-09-02 05:18:40" level=error msg="Retrying object creation"
09-02 01:18:43.869 time="2021-09-02 05:18:43" level=error msg="Error creating object: ConfigMap in version "v1" cannot be handled as a ConfigMap: v1.ConfigMap.Data: ReadString: expects " or n, but found 9, error found in #10 byte of ...|:{"key1":9764,"key2"|..., bigger context ...|{"apiVersion":"v1","data":{"key1":9764,"key2":"sQxmxGVmZV"},"kind":"ConfigMap","metad|..."
I think a quick fix for this is to add quotes around "{{rand 4}}" so that if the value isn't a string it is passed as one and the ConfigMap can be created.
Testing out a fix in: https://github.com/paigerube14/e2e-benchmarking/tree/quotes
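The shape of the fix, sketched on a hypothetical kube-burner ConfigMap template (the real template lives in the cluster-density config), is to quote the rand calls so the rendered value is always a YAML string:

```shell
# Hedged sketch: quoting {{rand 4}} so an all-digit result is still
# rendered as a string. The template below is illustrative only.
cat > configmap.yml.tmpl <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: configmap-{{.Replica}}
data:
  key1: "{{rand 4}}"
  key2: "{{rand 10}}"
EOF
# Both data values are now quoted in the template.
grep -c '"{{rand' configmap.yml.tmpl
```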
OpenShift needs a standardized tool that can be used to synthesize actual workloads for the cluster based on the node count.
So for example I would expect something like this as the user experience.
./benchmark --workers 3
This would run what OCP considers the high end of a workload for 3 nodes. The expectation is that, provided the hardware requirements for the cluster are met, the test will pass. If the test fails, the cluster-admin should be able to go to the console and review the alerts on the cluster; those alerts should give the admin direct resolution steps for the performance problems, e.g. disks too slow.
With the recent enhancement - cloud-bulldozer/benchmark-operator#323 - we will be able to enable Cerberus by passing a URL to Ripsaw. The workloads in plow need to be modified to enable it and act accordingly.
The current upgrade tests at https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/upgrade-perf/run_upgrade_fromgit.sh seem too simplistic. I would like to see this improved based on the recent upgrade testing we did for a customer. It is important to load up the cluster, and here are some suggested improvements to the script.
Some sample YAMLs:
apiVersion: apps/v1
kind: Deployment
metadata:
name: sampleapp
spec:
replicas: 300
selector:
matchLabels:
app: sample
template:
metadata:
labels:
app: sample
spec:
containers:
- name: app
image: quay.io/smalleni/sampleapp:latest
readinessProbe:
httpGet:
path: /
port: 8080
initialDelaySeconds: 3
ports:
- containerPort: 8080
protocol: TCP
resources:
requests:
cpu: "1"
limits:
cpu: "1"
nodeSelector:
app: "true"
apiVersion: v1
kind: Service
metadata:
name: samplesvc
spec:
selector:
app: sample
ports:
- port: 80
targetPort: 8080
---
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
name: except
spec:
podSelector:
matchLabels:
app: sample
ingress:
- from:
- ipBlock:
cidr: 10.128.0.0/14
except:
- "10.130.36.0/23"
- "10.130.12.0/23"
- "10.128.18.0/23"
- "10.131.10.0/23"
- "10.131.22.0/23"
- "10.128.24.0/23"
- "10.128.14.0/23"
e2e-benchmarking/workloads/network-perf/common.sh
Lines 235 to 242 in de3a11a
[2021-10-15 09:14:34,406] {subprocess.py:78} INFO - + sleep 60
[2021-10-15 09:15:34,407] {subprocess.py:78} INFO - + for i in $(seq 1 $_timeout)
[2021-10-15 09:15:34,408] {subprocess.py:78} INFO - ++ oc get nodes --no-headers -l 'node-role.kubernetes.io/worker,node-role.kubernetes.io/master!=,node-role.kubernetes.io/infra!=,node-role.kubernetes.io/workload!=' --ignore-not-found
[2021-10-15 09:15:34,408] {subprocess.py:78} INFO - ++ grep -v NAME
[2021-10-15 09:15:34,408] {subprocess.py:78} INFO - ++ wc -l
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + current_workers=18
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + echo 'Current worker count: 18'
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - Current worker count: 18
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + echo 'Desired worker count: 3'
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - Desired worker count: 3
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + oc describe -n benchmark-operator benchmarks/scale
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + grep State
[2021-10-15 09:15:34,880] {subprocess.py:78} INFO - + grep Complete
[2021-10-15 09:15:35,337] {subprocess.py:78} INFO - State: Complete
[2021-10-15 09:15:35,337] {subprocess.py:78} INFO - + '[' 0 -eq 0 ']'
[2021-10-15 09:15:35,337] {subprocess.py:78} INFO - + '[' 18 -eq 3 ']'
[2021-10-15 09:15:35,337] {subprocess.py:78} INFO - + echo 'Scaling completed but desired worker count is not equal to current worker count!'
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - Scaling completed but desired worker count is not equal to current worker count!
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - + break
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - + '[' 1 == 1 ']'
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - + echo 'Scaling failed'
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - Scaling failed
[2021-10-15 09:15:35,338] {subprocess.py:78} INFO - + exit 1
When running the router-v2 test by hand with comparisons enabled, the following error is seen from touchstone:
+ touchstone_compare --database elasticsearch -url https://MY_ES_SERVER -u null -o yaml --config config/mb.json --tolerancy-rules tolerancy-configs/mb.yaml
+ grep -v ERROR
+ tee compare_output_9.yaml
2021-09-09 15:45:12,746 - touchstone - ERROR - Error: Issue capturing results from elasticsearch using config {'filter': {'test_type': 'http'}, 'buckets': ['routes', 'conn_per_targetroute', 'keepalive'], 'aggregations': {'requests_per_second': ['avg'], 'latency_95pctl': ['avg']}}
2021-09-09 15:45:12,763 - touchstone - ERROR - Error: Issue capturing results from elasticsearch using config {'filter': {'test_type': 'edge'}, 'buckets': ['routes', 'conn_per_targetroute', 'keepalive'], 'aggregations': {'requests_per_second': ['avg'], 'latency_95pctl': ['avg']}}
2021-09-09 15:45:12,780 - touchstone - ERROR - Error: Issue capturing results from elasticsearch using config {'filter': {'test_type': 'passthrough'}, 'buckets': ['routes', 'conn_per_targetroute', 'keepalive'], 'aggregations': {'requests_per_second': ['avg'], 'latency_95pctl': ['avg']}}
2021-09-09 15:45:12,796 - touchstone - ERROR - Error: Issue capturing results from elasticsearch using config {'filter': {'test_type': 'reencrypt'}, 'buckets': ['routes', 'conn_per_targetroute', 'keepalive'], 'aggregations': {'requests_per_second': ['avg'], 'latency_95pctl': ['avg']}}
2021-09-09 15:45:12,813 - touchstone - ERROR - Error: Issue capturing results from elasticsearch using config {'filter': {'test_type': 'mix'}, 'buckets': ['routes', 'conn_per_targetroute', 'keepalive'], 'aggregations': {'requests_per_second': ['avg'], 'latency_95pctl': ['avg']}}
2021-09-09 15:45:12,813 - touchstone - ERROR - Key test_type key not found in current dict level: []
{}
Looking at the touchstone_compare line, the UUID passed is null. However, at the top of the output you can see the UUID is set:
09-09-2021T12:23:01 Small scale scenario detected: #workers < 24
09-09-2021T12:23:01 Deploying benchmark infrastructure
time="2021-09-09 12:23:01" level=info msg="Setting log level to info"
time="2021-09-09 12:23:01" level=info msg="🔥 Starting kube-burner with UUID 5f69ab80-8a40-4b00-80a6-54ee61e47018"
Example env.sh file sourced
# General
export KUBECONFIG=/root/gcp/gcp_kube
export UUID=$(uuidgen)
# ES configuration
export ES_SERVER=MY_ES_SERVER
export ES_INDEX=${ES_INDEX:-router-test-results}
export ES_SERVER_BASELINE=MY_ES_BASELINE
# Gold comparison
COMPARE_WITH_GOLD=true
ES_GOLD=${ES_GOLD:-${ES_SERVER}}
GOLD_SDN=${GOLD_SDN:-openshiftsdn}
GOLD_OCP_VERSION=4.8
# Environment setup
NUM_NODES=$(oc get node -l node-role.kubernetes.io/worker --no-headers | grep -cw Ready)
ENGINE=${ENGINE:-podman}
KUBE_BURNER_RELEASE_URL=${KUBE_BURNER_RELEASE_URL:-https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.11/kube-burner-0.11-Linux-x86_64.tar.gz}
KUBE_BURNER_IMAGE=quay.io/cloud-bulldozer/kube-burner:latest
TERMINATIONS=${TERMINATIONS:-"http edge passthrough reencrypt mix"}
INFRA_TEMPLATE=http-perf.yml.tmpl
INFRA_CONFIG=http-perf.yml
export SERVICE_TYPE=${SERVICE_TYPE:-NodePort}
export NUMBER_OF_ROUTERS=${NUMBER_OF_ROUTERS:-2}
export HOST_NETWORK=${HOST_NETWORK:-true}
export NODE_SELECTOR=${NODE_SELECTOR:-'{node-role.kubernetes.io/workload: }'}
# Benchmark configuration
RUNTIME=${RUNTIME:-60}
TLS_REUSE=${TLS_REUSE:-true}
URL_PATH=${URL_PATH:-/1024.html}
SAMPLES=${SAMPLES:-2}
QUIET_PERIOD=${QUIET_PERIOD:-60s}
KEEPALIVE_REQUESTS=${KEEPALIVE_REQUESTS:-"0 1 50"}
# Comparison and csv generation
THROUGHPUT_TOLERANCE=${THROUGHPUT_TOLERANCE:-5}
LATENCY_TOLERANCE=${LATENCY_TOLERANCE:-5}
PREFIX=${PREFIX:-$(oc get clusterversion version -o jsonpath="{.status.desired.version}")}
LARGE_SCALE_THRESHOLD=${LARGE_SCALE_THRESHOLD:-24}
METADATA_COLLECTION=${METADATA_COLLECTION:-true}
SMALL_SCALE_BASELINE_UUID=29d520a2-039a-4a1e-b139-83fe2e63fda1
LARGE_SCALE_BASELINE_UUID=9df8255d-2038-42ed-869d-f748f671da07
GSHEET_KEY_LOCATION=/root/gcp/gsheet
EMAIL_ID_FOR_RESULTS_SHEET="[email protected]"
Hi.
The README says that I need to install Python requirements, but there is no requirements.txt file.
Do I still need to install requirements?
[2021-11-29, 22:56:37 EST] {subprocess.py:89} INFO - Google Spreadsheet link -> https://docs.google.com/spreadsheets/d/1whYNQ1tjYoQdYGGGSod2-O1XQbAO5EK7GhnIsmvICqw
[2021-11-29, 22:56:37 EST] {subprocess.py:89} INFO - Tue Nov 30 03:56:37 UTC 2021 Removing touchstone
[2021-11-29, 22:56:37 EST] {subprocess.py:89} INFO - ../../utils/compare.sh: line 14: deactivate: command not found
When testing some clusters we may not have easy access to a kubeconfig (think some managed services). Allowing the option to provide a login API address, user, and password would increase the usability of the e2e testing framework. Without this ability we will be unable to add these platforms to our pipeline testing.
We currently index the scale-up and upgrade timings to Elasticsearch; it might be useful to also display them on stdout at the end of the job run for cases where 1) we want to quickly take a look at how long it took, or 2) we don't use Elasticsearch - it defaults to a public instance, but the user might not know about it.
Thoughts?
The wait_for_benchmark function in the kube-burner common.sh does not have any timeout, so it could run forever. We have an environment variable that gets set at the top of the file (JOB_TIMEOUT) but never actually use it in the script.
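A sketch of how JOB_TIMEOUT could be wired in; benchmark_complete is a stand-in for the real oc status check inside wait_for_benchmark:

```shell
# Hedged sketch: bound the polling loop with the already-defined
# JOB_TIMEOUT variable instead of waiting forever.
benchmark_complete() { false; }  # stand-in for the oc benchmark status check

wait_for_benchmark() {
  local timeout=${JOB_TIMEOUT:-7200} elapsed=0 interval=1
  until benchmark_complete; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "Benchmark timed out after ${timeout}s"
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# With the stub condition never completing, a 2s timeout fires and the
# non-zero exit code reaches the caller.
JOB_TIMEOUT=2 wait_for_benchmark || echo "failure propagated to caller"
```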
While running uperf service IP testing on 100-node clusters across AWS, Azure and GCP, we saw a consistent pattern of errors like the following:
18:25:41 2021-05-06T18:25:28Z - INFO - MainProcess - wrapper_factory: identified uperf as the benchmark wrapper
18:25:41 2021-05-06T18:25:28Z - INFO - MainProcess - trigger_uperf: Starting sample 1 out of 3
18:25:41 2021-05-06T18:25:28Z - ERROR - MainProcess - trigger_uperf: UPerf failed to execute, trying one more time..
18:25:41 2021-05-06T18:25:28Z - ERROR - MainProcess - trigger_uperf: stdout: Error getting SSL CTX:1
18:25:41 Allocating shared memory of size 156624 bytes
18:25:41 Error connecting to 172.30.46.82
18:25:41
18:25:41 ** TCP: Cannot connect to 172.30.46.82:20000 Connection refused
18:25:41 2021-05-06T18:25:28Z - ERROR - MainProcess - trigger_uperf: stderr:
18:25:41 2021-05-06T18:25:28Z - CRITICAL - MainProcess - trigger_uperf: UPerf failed to execute a second time, stopping...
18:25:41 2021-05-06T18:25:28Z - CRITICAL - MainProcess - trigger_uperf: stdout: Error getting SSL CTX:1
18:25:41 Allocating shared memory of size 156624 bytes
18:25:41 Error connecting to 172.30.46.82
18:25:41
18:25:41 ** TCP: Cannot connect to 172.30.46.82:20000 Connection refused
Talking to @jtaleric, @dry923 and @mohit-sheth, they suggested this is a potential bug and to file it as an issue so it can be addressed.
The same error is seen with METADATA_COLLECTION=false or METADATA_COLLECTION=true; no other parameters are being passed except ES_SERVER, which is our own server and works well.
CC : @mffiedler
Today, we pick two worker nodes to pin the uperf client and server pods to.
We should look at pinning the server/client pods either on worker nodes in the same availability zone or across different availability zones, so that we can get more consistent results.
Initially, we can start with the same availability zone.
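The selection logic could look like the following sketch: group workers by their topology.kubernetes.io/zone label and pick the first two that share a zone. The "zone node" list here is illustrative; in the script it would come from an oc get nodes jsonpath over the zone label.

```shell
# Hedged sketch: find two workers in the same availability zone.
# Illustrative input in "zone node" format.
nodes="us-east-1a worker-0
us-east-1b worker-1
us-east-1a worker-2"

# Sort by zone, then emit the first pair of nodes sharing a zone.
pair=$(echo "$nodes" | sort |
  awk '$1 == prev {print first, $2; exit} {prev = $1; first = $2}')
echo "$pair"  # → worker-0 worker-2
```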
During an e2e test we assume the kubectl, oc, etc. packages are at a suitable version. This can lead to errors and time wasted debugging and fixing. We should add a common step to ensure we are at the correct package versions for the current implementation.
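A minimal pre-flight sketch: the helper compares dotted version strings, while the commented oc invocation and the minimum version are assumptions for illustration.

```shell
# Hedged sketch: a shared helper the common setup step could use to
# verify client tooling before a run.
version_ge() {
  # true when $1 >= $2, comparing dotted version strings with sort -V
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# In common.sh this could gate the run, e.g. (illustrative):
#   current=$(oc version --client | awk '/Client Version/ {print $3}')
#   version_ge "$current" "4.6.0" || { echo "oc too old"; exit 1; }

version_ge "1.20.0" "1.19.0" && echo "1.20.0 >= 1.19.0"
version_ge "1.18.0" "1.19.0" || echo "1.18.0 < 1.19.0"
```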
Clusters might not have the workload node, for example the cluster used by the e2e CI. We need to expose the parameters which enables/disables running the test orchestrator from the workload node:
export WORKLOAD_JOB_NODE_SELECTOR=<true/false>
export WORKLOAD_JOB_TAINT=<true/false>
Also, it looks like the comparison runs even when COMPARE is set to false; it should run only when COMPARE=true.
When running the network test (uperf) on a GCP cluster, the following error is hit:
2021-09-15T14:33:25Z - INFO - MainProcess - process: Collecting 3 samples of command ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1']
2021-09-15T14:33:26Z - WARNING - MainProcess - process: Got bad return code from command: ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1'].
2021-09-15T14:33:28Z - WARNING - MainProcess - process: Got bad return code from command: ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1'].
2021-09-15T14:33:29Z - WARNING - MainProcess - process: Got bad return code from command: ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1'].
2021-09-15T14:33:29Z - CRITICAL - MainProcess - process: After 3 attempts, unable to run command: ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1']
2021-09-15T14:33:29Z - WARNING - MainProcess - process: Sample 1 has failed state for command ['uperf', '-v', '-a', '-R', '-i', '1', '-m', '/tmp/uperf-test/uperf-stream-tcp-16384-16384-1']
2021-09-15T14:33:29Z - CRITICAL - MainProcess - uperf: Uperf failed to run! Got results: ProcessSample(expected_rc=0, success=False, attempts=3, timeout=None, failed=[ProcessRun(rc=-11, stdout='Error getting SSL CTX:1\nAllocating shared memory of size 156624 bytes\nCompleted handshake phase 1\nStarting handshake phase 2\nHandshake phase 2 with 10.0.128.3\n Done preprocessing accepts\n Sent handshake header\n Sending workorder\n Sent workorder\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\nTX worklist success Sent workorder\nHandshake phase 2 with 10.0.128.3 done\nCompleted handshake phase 2\nStarting 1 threads running profile:stream-tcp-16384-16384-1 ... 0.00 seconds\n', stderr='', time_seconds=1.363361, hit_timeout=False), ProcessRun(rc=-11, stdout='Error getting SSL CTX:1\nAllocating shared memory of size 156624 bytes\nCompleted handshake phase 1\nStarting handshake phase 2\nHandshake phase 2 with 10.0.128.3\n Done preprocessing accepts\n Sent handshake header\n Sending workorder\n Sent workorder\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\nTX worklist success Sent workorder\nHandshake phase 2 with 10.0.128.3 done\nCompleted handshake phase 2\nStarting 1 threads running profile:stream-tcp-16384-16384-1 ... 0.00 seconds\n', stderr='', time_seconds=1.401737, hit_timeout=False), ProcessRun(rc=-11, stdout='Error getting SSL CTX:1\nAllocating shared memory of size 156624 bytes\nCompleted handshake phase 1\nStarting handshake phase 2\nHandshake phase 2 with 10.0.128.3\n Done preprocessing accepts\n Sent handshake header\n Sending workorder\n Sent workorder\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\n Sent transaction\n Sent flowop\nTX worklist success Sent workorder\nHandshake phase 2 with 10.0.128.3 done\nCompleted handshake phase 2\nStarting 1 threads running profile:stream-tcp-16384-16384-1 ... 
0.00 seconds\n', stderr='', time_seconds=1.323687, hit_timeout=False)], successful=None)
OpenShift cluster upgrades use the channels defined in Cincinnati - https://github.com/openshift/cincinnati-graph-data/tree/master/channels - to determine whether any upgrades are available. The channel is set to stable-4.x or candidate-4.x, and neither of them tracks the nightly OCP builds, since those are not GA.
Problem:
The target cluster has a single worker node and only one router. With NUMBER_OF_ROUTERS=1 and NODE_SELECTOR={node-role.kubernetes.io/worker: }, router-perf-v2 can't work.
Error logs:
08-23 16:30:30.977 23-08-2021T08:30:30 Scaling number of routers to 1
08-23 16:30:31.243 deployment.apps/router-default scaled
08-23 16:30:31.526 Waiting for deployment "router-default" rollout to finish: 1 old replicas are pending termination...
08-23 16:40:40.639 error: deployment "router-default" exceeded its progress deadline
Analysis of the cause:
After running this line of code, a new replica set, router-default-d9888dff8, is created to roll the change out to a new pod.
% oc get pods -n openshift-ingress
NAME READY STATUS RESTARTS AGE
router-default-5844bb8f66-jhxph 1/1 Running 0 3h27m
router-default-d9888dff8-pb4kg 0/1 Pending 0 11m
But because of the anti-affinity rule, the new pod cannot be scheduled.
% oc describe pod router-default-d9888dff8-pb4kg -n openshift-ingress
...
Controlled By: ReplicaSet/router-default-d9888dff8
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18s default-scheduler 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules.
Then when the following code is run
e2e-benchmarking/workloads/router-perf-v2/common.sh
Lines 63 to 64 in 9da00a2
the following error happens:
% oc scale --replicas=1 -n openshift-ingress deploy/router-default
deployment.apps/router-default scaled
% oc rollout status -n openshift-ingress deploy/router-default
error: deployment "router-default" exceeded its progress deadline
The scale-up tries to operate on the replica set router-default-d9888dff8, which is not READY.
% oc describe -n openshift-ingress deploy/router-default
...
OldReplicaSets: router-default-5844bb8f66 (1/1 replicas created), router-default-d9888dff8 (1/1 replicas created)
...
Normal ScalingReplicaSet 22m (x5 over 128m) deployment-controller Scaled up replica set router-default-d9888dff8 to 1
% oc get rs -n openshift-ingress
NAME DESIRED CURRENT READY AGE
router-default-5844bb8f66 1 1 1 3h44m
router-default-d9888dff8 1 1 0 134m
Proposal:
To make router-perf-v2 work on a single-worker-node cluster, one proposal is to add logic so that when NUMBER_OF_ROUTERS is set to -1, the tune_liveness_probe and enable_ingress_operator functions are disabled.
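The proposal amounts to a guard like the following sketch; the two functions here are echo stand-ins for the real ones in router-perf-v2/common.sh:

```shell
# Hedged sketch of the proposed guard. The functions are placeholders
# for the real implementations.
tune_liveness_probe() { echo "tuning liveness probes"; }
enable_ingress_operator() { echo "re-enabling ingress operator"; }

NUMBER_OF_ROUTERS=${NUMBER_OF_ROUTERS:--1}
if [ "$NUMBER_OF_ROUTERS" -ne -1 ]; then
  tune_liveness_probe
  enable_ingress_operator
else
  echo "NUMBER_OF_ROUTERS=-1: leaving the router deployment untouched"
fi
```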
Following cloud-bulldozer/benchmark-operator#315, which renames a few metadata-related parameters in the CR (e.g. metadata_privileged is now metadata.privileged), the current scripts will also need to be updated.
The router test has been periodically failing with the following error:
...
Indexing documents in router-test-results
10-09-2021T18:56:46 Sleeping for 6s before next test
10-09-2021T18:56:52 Generating config for termination passthrough with 200 clients 0 keep alive requests and path /1024.html
10-09-2021T18:56:53 Copying mb config http-scale-passthrough.json to pod http-scale-client-6fc5db9645-9lpld
10-09-2021T18:56:55 Executing sample 1/1 using termination passthrough with 200 clients and 0 keepalive requests
Executing 'mb -i /tmp/http-scale-passthrough.json -d 1 -o /tmp/results.csv'
Traceback (most recent call last):
File "/usr/lib64/python3.6/subprocess.py", line 425, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.6/subprocess.py", line 1535, in _communicate
self._check_timeout(endtime, orig_timeout)
File "/usr/lib64/python3.6/subprocess.py", line 891, in _check_timeout
raise TimeoutExpired(self.args, orig_timeout)
subprocess.TimeoutExpired: Command 'mb -i /tmp/http-scale-passthrough.json -d 1 -o /tmp/results.csv' timed out after 5 seconds
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workload/workload.py", line 92, in <module>
exit(main())
File "/workload/workload.py", line 66, in main
result_codes, p95_latency, p99_latency, avg_latency = run_mb(args.mb_config, args.runtime, args.output)
File "/workload/workload.py", line 35, in run_mb
timeout=int(runtime) * 5)
File "/usr/lib64/python3.6/subprocess.py", line 430, in run
stderr=stderr)
subprocess.TimeoutExpired: Command 'mb -i /tmp/http-scale-passthrough.json -d 1 -o /tmp/results.csv' timed out after 5 seconds
command terminated with exit code 1
Example env.sh
# General
export KUBECONFIG=/root/gcp/gcp_kube_3
export UUID=$(uuidgen)
# ES configuration
export ES_SERVER=ES_SERVER
export ES_INDEX=${ES_INDEX:-router-test-results}
export ES_SERVER_BASELINE=ES_SERVER_BASELINE
# Gold comparison
COMPARE_WITH_GOLD="false"
ES_GOLD=${ES_GOLD:-${ES_SERVER}}
GOLD_SDN=${GOLD_SDN:-openshiftsdn}
GOLD_OCP_VERSION=4.8
# Environment setup
NUM_NODES=$(oc get node -l node-role.kubernetes.io/worker --no-headers | grep -cw Ready)
ENGINE=${ENGINE:-podman}
KUBE_BURNER_RELEASE_URL=${KUBE_BURNER_RELEASE_URL:-https://github.com/cloud-bulldozer/kube-burner/releases/download/v0.11/kube-burner-0.11-Linux-x86_64.tar.gz}
KUBE_BURNER_IMAGE=quay.io/cloud-bulldozer/kube-burner:latest
TERMINATIONS=${TERMINATIONS:-"http edge passthrough reencrypt mix"}
INFRA_TEMPLATE=http-perf.yml.tmpl
INFRA_CONFIG=http-perf.yml
export SERVICE_TYPE=${SERVICE_TYPE:-NodePort}
export NUMBER_OF_ROUTERS=${NUMBER_OF_ROUTERS:-1}
#export NUMBER_OF_ROUTERS=${NUMBER_OF_ROUTERS:-2}
export HOST_NETWORK=${HOST_NETWORK:-true}
export NODE_SELECTOR=${NODE_SELECTOR:-'{node-role.kubernetes.io/workload: }'}
# Benchmark configuration
#RUNTIME=${RUNTIME:-60}
#TLS_REUSE=${TLS_REUSE:-true}
#URL_PATH=${URL_PATH:-/1024.html}
#SAMPLES=${SAMPLES:-2}
#QUIET_PERIOD=${QUIET_PERIOD:-60s}
#KEEPALIVE_REQUESTS=${KEEPALIVE_REQUESTS:-"0 1 50"}
# Benchmark configuration
RUNTIME=${RUNTIME:-1}
TLS_REUSE=${TLS_REUSE:-true}
URL_PATH=${URL_PATH:-/1024.html}
SAMPLES=${SAMPLES:-1}
QUIET_PERIOD=${QUIET_PERIOD:-6s}
KEEPALIVE_REQUESTS=${KEEPALIVE_REQUESTS:-"0"}
# Comparison and csv generation
THROUGHPUT_TOLERANCE=${THROUGHPUT_TOLERANCE:-5}
LATENCY_TOLERANCE=${LATENCY_TOLERANCE:-5}
PREFIX=${PREFIX:-$(oc get clusterversion version -o jsonpath="{.status.desired.version}")}
LARGE_SCALE_THRESHOLD=${LARGE_SCALE_THRESHOLD:-24}
METADATA_COLLECTION=${METADATA_COLLECTION:-true}
SMALL_SCALE_BASELINE_UUID="29d520a2-039a-4a1e-b139-83fe2e63fda1"
LARGE_SCALE_BASELINE_UUID="9df8255d-2038-42ed-869d-f748f671da07"
GSHEET_KEY_LOCATION=/root/gcp/gsheet
EMAIL_ID_FOR_RESULTS_SHEET="[email protected]"
We need to add support in the workloads for passing the location of the kubeconfig used to access the cluster, where not already supported, and document the option for all the workloads, given that we now support running multiple clusters from the same jump host, meaning the kubeconfig will not be in the default location - $HOME/.kube/config.
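The minimal shape of this, as a sketch each workload script could share near the top, is an environment override with the conventional fallback:

```shell
# Hedged sketch: honor an externally supplied kubeconfig, falling back
# to the default location only when KUBECONFIG is unset.
export KUBECONFIG=${KUBECONFIG:-$HOME/.kube/config}
echo "Using kubeconfig: $KUBECONFIG"
```

A caller running several clusters from one jump host would then simply set KUBECONFIG per cluster before invoking the workload script.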
Currently the upgrade test only captures the pass/fail status based on the cluster being able to upgrade or not. We need to capture metrics including the ones we monitor manually to determine the cluster stability and index them long term similar to cluster density runs to be able to analyze the state of the cluster during/after the upgrade. This is especially useful in CI runs.
The kube-burner binary can be leveraged to run just the indexing: given a metrics profile, it can capture the metrics and index them in ES, to eventually be visualized in Grafana. We can start with the same aggregated metrics profile that the cluster-density test uses.
We need a CI since these scripts are becoming more critical.
Hi @rsevilla87, my team is planning to run these in a Jenkins agent that runs as a pod on OpenShift. I have not previously tried to run docker/podman within an OpenShift pod. Do you happen to know a way to do so?
I believe this script was designed with the assumption that it would be run from a jump host machine in the scale lab, not from a container/pod.
Alternatively, I was thinking of using the kube-burner binary directly within the Jenkins agent pod. Would you be able to accept that as a proposed change if it works for us?
Thanks.
When running the router-v2 test with oc versions above 1.19, the following error is thrown from this line of code.
Common.sh file:
log "Adding workload.py to the client pod"
oc set volumes -n http-scale-client deploy/http-scale-client --type=configmap --mount-path=/workload --configmap-name=workload --add
Error:
Error: unknown flag: --type
See 'kubectl set --help' for usage.
OC version that failed:
# ./oc_1_20 version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.0", GitCommit:"af46c47ce925f4c4ad5cc8d1fca46c7b77d13b38", GitTreeState:"clean", BuildDate:"2020-12-08T17:59:43Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.0+2f3101c", GitCommit:"2f3101cb663d0cb102ccb9730b63753604f6d29b", GitTreeState:"clean", BuildDate:"2021-02-26T13:55:24Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"linux/amd64"}
OC version that worked:
# oc version
Client Version: 4.6.0-202103060018.p0-aaa9ca3
Server Version: 4.6.21
Kubernetes Version: v1.19.0+2f3101c
workload and csv_gen use some uncommon pre-reqs. Provide and document an easy way to install them to avoid a nasty surprise with csv_gen failing after a long run.
When running the router v2 test with comparisons enabled, a warning is presented after all the tests have run stating that rsync is not available in the container. It is unclear whether this is actually a problem or not.
example error:
Indexing documents in router-test-results
10-09-2021T18:07:55 Sleeping for 6s before next test
10-09-2021T18:08:01 Enabling cluster version and ingress operators
deployment.apps/cluster-version-operator scaled
deployment.apps/ingress-operator scaled
WARNING: cannot use rsync: rsync not available in container
results.csv
10-09-2021T18:08:05 delete tuned profile for node labeled with node-role.kubernetes.io/workload
tuned.tuned.openshift.io "openshift-ingress-performance" deleted
10-09-2021T18:08:06 Deleting infrastructure
@mohit-sheth Should we use http://github.com/openshift-scale/workloads instead?
Error logs:
[2021-11-01, 20:59:19 EDT] {subprocess.py:89} INFO - touchstone_compare --database elasticsearch -url http://elastic:62cuyJA229jfFl604nUC54TV@perf-results-elastic.apps.keith-cluster.perfscale.devcluster.openshift.com:80 -u cee41274-53ea-45f2-adb9-5c9002695df9 --config /home/airflow/workspace/e2e-benchmarking/workloads/router-perf-v2/mb-touchstone.json -o csv --tolerancy-rules /home/airflow/workspace/e2e-benchmarking/workloads/router-perf-v2/mb-tolerancy-rules.yaml --output-file /home/airflow/workspace/e2e-benchmarking/workloads/router-perf-v2/ingress-performance.csv --rc 0
[2021-11-01, 20:59:19 EDT] {subprocess.py:89} INFO - 2021-11-01, 20:59:19 EDT - touchstone - CRITICAL - At least two uuids are required when tolerancy-rules flag is passed
The latest PRs merged into e2e and touchstone are causing this behaviour.
#244
cloud-bulldozer/benchmark-comparison#54
The problem lies in this if-else block in env.sh: tolerance_rules is being set even when it is not explicitly set by the user, because a default is picked up (e2e-benchmarking/utils/compare.sh, line 32 in e37b211).
A possible solution is getting rid of this else block.
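One possible shape for that fix (a sketch of the idea only, not the actual compare.sh code; variable names are illustrative) is to append `--tolerancy-rules` only when more than one UUID is being compared, since touchstone rejects the flag otherwise:

```shell
#!/bin/sh
# Sketch: only pass --tolerancy-rules to touchstone_compare when there
# are actually two or more UUIDs to compare against each other.
# Variable names are illustrative, not necessarily those in compare.sh.
build_flags() {
  uuids="$1" tolerancy_rules="$2"
  flags="-u $uuids"
  case "$uuids" in
    *,*)  # comma-separated list => more than one UUID
      [ -n "$tolerancy_rules" ] && flags="$flags --tolerancy-rules $tolerancy_rules" ;;
  esac
  echo "$flags"
}
```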
I was using the following benchmark definition in Airflow for comparison:
{
"name": "host_network",
"workload": "network-perf",
"command": "./run_hostnetwork_network_test_fromgit.sh test_cloud",
"env": {
"COMPARE": "true",
"COMPARE_WITH_GOLD": "false",
"BASELINE_CLOUD_NAME": "aws",
"BASELINE_HOSTNET_UUID": "1057b072-ae18-5584-9937-bfec75f407e2",
"EMAIL_ID_FOR_RESULTS_SHEET": "[email protected]",
"GSHEET_KEY_LOCATION": "/tmp/key.json"
}
},
When doing so, I am not getting a correct comparison, because the if statement causes the flow to go into the else part, where es_server_baseline is not respected. Ref: e2e-benchmarking/utils/touchstone-compare/run_compare.sh, lines 22 to 27 in 64574a0.
I am not familiar with comparing against the GOLD results, and @mohit-sheth asked me to use a baseline instead. I believe I am setting the variables correctly, but the actual comparison is not being executed correctly.
touchstone_compare --database elasticsearch -url 'https://search-ocp-qe<redacted>.us-east-1.es.amazonaws.com:443' -u 1057b072-ae18-5584-9937-bfec75f407e2 -o yaml --config config/uperf.json --tolerancy-rules tolerancy-configs/uperf.yaml
Although the uperf script passed two UUIDs, `1057b072-ae18-5584-9937-bfec75f407e2,6bb5d96e-c483-56ee-8859-586dc31cc547`, only one was used.
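If the else branch ignores es_server_baseline, one possible shape of the fix (a sketch only; the variable names follow the issue, and whether touchstone accepts multiple `-url` values this way is an assumption) is to choose the server list based on how many UUIDs are being compared:

```shell
#!/bin/sh
# Sketch: when the baseline UUID lives in a different Elasticsearch
# instance, pass one server per UUID instead of reusing ES_SERVER for
# both. Whether touchstone_compare accepts a space-separated list of
# servers in this form is an assumption.
pick_servers() {
  es_server="$1" es_server_baseline="$2" num_uuids="$3"
  if [ "$num_uuids" -gt 1 ] && [ -n "$es_server_baseline" ]; then
    echo "$es_server_baseline $es_server"
  else
    echo "$es_server"
  fi
}
```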
Seems like there's an error while deleting the resources created by some of the kube-burner benchmarks.
12:54:03 Tue 25 Jan 2022 11:54:03 AM UTC Removing node-density=enabled label from worker nodes
12:54:03 node/ip-10-0-133-30.us-west-2.compute.internal labeled
12:54:03 namespace "15602773-ce4c-478f-9d4e-a91dc2d6a111" deleted
12:54:32 error: You must provide one or more resources by argument or filename.
12:54:32 Example resource specifications include:
12:54:32 '-f rsrc.yaml'
12:54:32 '--filename=rsrc.json'
12:54:32 '<resource> <name>'
12:54:32 '<resource>'
cc: @amitsagtani97
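A defensive pattern that avoids the "You must provide one or more resources" error (a sketch of the general idea, not the project's actual cleanup code) is to check whether the resource list is empty before handing it to `oc delete`:

```shell
#!/bin/sh
# Sketch: only call `oc delete` when there is actually something left to
# delete; passing an empty list makes oc exit with the error seen above.
safe_delete() {
  resources="$1"   # e.g. output of: oc get ns -l <benchmark-label> -o name
  if [ -z "$resources" ]; then
    echo "Nothing left to delete, skipping"
    return 0
  fi
  for r in $resources; do
    oc delete "$r"
  done
}
```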