openshift-qe / ocp-qe-perfscale-ci
OpenShift QE PerfScale CI
License: Apache License 2.0
Looks like another issue we hit because of an http_proxy/https_proxy failure or missing Python packages.
Error reported:
10-20 15:44:58.629 2022-10-20 15:44:58,403 [ERROR] Failed to get the metrics: HTTPSConnectionPool(host='prometheus-k8s-openshift-monitoring.apps.scaleci12-20928.qe.devcluster.openshift.com', port=443): Max retries exceeded with url: /api/v1/query?query=ALERTS%7Balertname%3D%22etcdHighNumberOfLeaderChanges%22%2C+severity%3D%22warning%22%7D (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1129)')))
See failures for a private cluster:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/scale-nightly-regression/486/
Noticed that it failed for a non-private cluster too:
https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/scale-nightly-regression/491/
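A minimal workaround sketch, assuming the failure comes from the cluster-wide proxy intercepting the Prometheus route (the route and service-account names are the standard openshift-monitoring ones, but the NO_PROXY approach itself is an assumption, not the pipeline's current logic):
# Exclude the Prometheus route from the proxy before the metrics script runs.
PROM_ROUTE=$(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.host}')
export NO_PROXY="${NO_PROXY},${PROM_ROUTE}"
export no_proxy="${no_proxy},${PROM_ROUTE}"
# Quick connectivity check with an in-cluster bearer token
# (oc create token needs 4.11+; older releases had oc sa get-token).
TOKEN=$(oc create token prometheus-k8s -n openshift-monitoring 2>/dev/null \
  || oc sa get-token prometheus-k8s -n openshift-monitoring)
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  "https://${PROM_ROUTE}/api/v1/query" --data-urlencode 'query=up' | head -c 200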
30 minutes of waiting time is not enough for large clusters (e.g. 50 nodes, each needing ~4 minutes to reboot on Azure, adds up to more than 180 minutes).
Suggestion: reduce the wait to 15 minutes but add additional verification.
Store the name of the node that has 'NotReady|SchedulingDisabled' status, and in the next iteration, if the node is different, reset wait_num.
This way there is no time limit for large clusters during reboots, but if something goes wrong, the next step will be executed much earlier. A sketch of the proposed loop is below.
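A minimal sketch of that check, assuming a 60-second poll interval (wait_num comes from the suggestion above; everything else is illustrative):
wait_num=0
last_bad_node=""
while [ "$wait_num" -lt 15 ]; do
  bad_node=$(oc get nodes --no-headers | grep -E 'NotReady|SchedulingDisabled' | head -1 | awk '{print $1}')
  [ -z "$bad_node" ] && break                # all nodes Ready, stop waiting
  if [ "$bad_node" != "$last_bad_node" ]; then
    wait_num=0                               # a different node is rebooting: progress, reset the counter
    last_bad_node="$bad_node"
  fi
  wait_num=$((wait_num + 1))
  sleep 60                                   # 15 iterations ~= the proposed 15-minute window
done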
On GCP I scaled workers to 3 and installed INFRA_WORKLOAD_INSTALL, then scaled the cluster again to 120 nodes.
All machinesets were scaled, including the infra and workload ones:
oc get machinesets -A
NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE
openshift-machine-api infra-qili-gcp-kn95ma 15 15 15 15 50m
openshift-machine-api infra-qili-gcp-kn95mb 15 15 11 11 50m
openshift-machine-api infra-qili-gcp-kn95mc 15 15 1 1 50m
openshift-machine-api qili-gcp-kn95m-worker-a 15 15 3 3 5h51m
openshift-machine-api qili-gcp-kn95m-worker-b 15 15 5h51m
openshift-machine-api qili-gcp-kn95m-worker-c 15 15 5h51m
openshift-machine-api qili-gcp-kn95m-worker-f 15 15 5h51m
openshift-machine-api workload-qili-gcp-kn95m 15 15 1 1 50m
#147 fixed this issue and the fix worked on Azure. However, I found there are no infra or workload role labels on the GCP machinesets:
% oc get --no-headers machinesets -A --show-labels
openshift-machine-api infra-qili-gcp-kn95ma 1 1 1 1 147m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api infra-qili-gcp-kn95mb 1 1 1 1 147m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api infra-qili-gcp-kn95mc 1 1 1 1 147m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api qili-gcp-kn95m-worker-a 15 15 8 8 7h29m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api qili-gcp-kn95m-worker-b 15 15 7h29m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api qili-gcp-kn95m-worker-c 15 15 7h29m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api qili-gcp-kn95m-worker-f 15 15 7h29m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
openshift-machine-api workload-qili-gcp-kn95m 1 1 1 1 147m machine.openshift.io/cluster-api-cluster=qili-gcp-kn95m
So the selector from #147 still matches all machinesets on GCP:
oc get --no-headers machinesets -A -l machine.openshift.io/cluster-api-machine-role!=infra,machine.openshift.io/cluster-api-machine-role!=workload | awk '{print $2}'
infra-qili-gcp-kn95ma
infra-qili-gcp-kn95mb
infra-qili-gcp-kn95mc
qili-gcp-kn95m-worker-a
qili-gcp-kn95m-worker-b
qili-gcp-kn95m-worker-c
qili-gcp-kn95m-worker-f
workload-qili-gcp-kn95m
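A hypothetical fix sketch: add the missing role labels on GCP so the #147 selector can exclude the infra and workload machinesets (the machineset names are taken from the output above and will differ per cluster):
for ms in infra-qili-gcp-kn95ma infra-qili-gcp-kn95mb infra-qili-gcp-kn95mc; do
  oc label machineset "$ms" -n openshift-machine-api \
    machine.openshift.io/cluster-api-machine-role=infra --overwrite
done
oc label machineset workload-qili-gcp-kn95m -n openshift-machine-api \
  machine.openshift.io/cluster-api-machine-role=workload --overwrite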
If the pod latency is not within 5s, the output passed to write-scale-ci-results contains parentheses, which currently breaks the script. We need to escape those characters.
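A minimal escaping sketch (the RESULT_MSG value is a made-up example; the sed substitution is the point):
RESULT_MSG='podLatency: P99 7.2s (exceeds 5s threshold)'   # hypothetical example string
ESCAPED_MSG=$(printf '%s' "$RESULT_MSG" | sed 's/[()]/\\&/g')
echo "$ESCAPED_MSG"   # -> podLatency: P99 7.2s \(exceeds 5s threshold\)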
I've seen this happen many times when I attempt to scale with the job.
Example:
06-23 15:11:28.146 service/dittybopper created
06-23 15:11:28.399 route.route.openshift.io/dittybopper created
06-23 15:11:28.399 Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (containers "dittybopper", "dittybopper-syncer" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (containers "dittybopper", "dittybopper-syncer" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or containers "dittybopper", "dittybopper-syncer" must set securityContext.runAsNonRoot=true), seccompProfile (pod or containers "dittybopper", "dittybopper-syncer" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
06-23 15:11:28.399 deployment.apps/dittybopper created
06-23 15:11:28.655 configmap/sc-ocp-prom created
06-23 15:11:28.655 configmap/sc-grafana-config created
06-23 15:11:28.655
06-23 15:11:28.655 Waiting for dittybopper deployment to be available...
06-23 15:12:36.285 error: timed out waiting for the condition on deployments/dittybopper
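When the wait times out like this, dumping the deployment state would help with debugging (a sketch; the dittybopper namespace is an assumption based on its default deploy):
oc describe deployment/dittybopper -n dittybopper
oc get pods -n dittybopper -o wide
oc get events -n dittybopper --sort-by=.lastTimestamp | tail -20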
From this job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/cluster-post-config/432/console, I saw vsphere is not recognized, so it must be using the default monitoring-config.yaml, which has a volumeClaimTemplate configured. The monitoring pods therefore tried to bind a PVC that can never be provisioned, because we don't pass OPENSHIFT_PROMETHEUS_STORAGE_CLASS and OPENSHIFT_ALERTMANAGER_STORAGE_CLASS to the template as env vars for vsphere.
06-13 10:49:26.304 ++ find /home/jenkins/ws/workspace/h-pipeline_cluster-post-config_5/flexy-artifacts/workdir/install-dir/
06-13 10:49:26.304 ++ grep vsphere -c
06-13 10:49:26.304 + [[ 0 > 0 ]]
06-13 10:49:26.304 + envsubst
06-13 10:49:26.304 + oc apply -f -
06-13 10:49:29.561 configmap/cluster-monitoring-config created
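A sketch of a more robust platform check: the infrastructure API reports the platform directly, so grepping the install artifacts isn't needed (the config file names below are assumptions):
PLATFORM=$(oc get infrastructure cluster -o jsonpath='{.status.platformStatus.type}')
if [ "$PLATFORM" = "VSphere" ]; then
  # apply a monitoring config without the volumeClaimTemplate / storage class
  oc apply -f monitoring-config-no-pvc.yaml
else
  envsubst < monitoring-config.yaml | oc apply -f -
fi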
This issue caused monitoring to fail to move to the infra machinesets.
NAME READY AGE
statefulset.apps/alertmanager-main 0/2 147m
statefulset.apps/prometheus-k8s 0/2 147m
Describing the two statefulsets:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 11m (x39 over 148m) statefulset-controller create Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: failed to create PVC alertmanager-main-db-alertmanager-main-0: PersistentVolumeClaim "alertmanager-main-db-alertmanager-main-0" is invalid: spec.resources[storage]: Invalid value: "0": must be greater than zero
Warning FailedCreate 118s (x41 over 148m) statefulset-controller create Claim alertmanager-main-db-alertmanager-main-0 for Pod alertmanager-main-0 in StatefulSet alertmanager-main failed error: PersistentVolumeClaim "alertmanager-main-db-alertmanager-main-0" is invalid: spec.resources[storage]: Invalid value: "0": must be greater than zero
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreate 13m (x39 over 149m) statefulset-controller create Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: failed to create PVC prometheus-k8s-db-prometheus-k8s-0: PersistentVolumeClaim "prometheus-k8s-db-prometheus-k8s-0" is invalid: spec.resources[storage]: Invalid value: "0": must be greater than zero
Warning FailedCreate 3m10s (x41 over 149m) statefulset-controller create Claim prometheus-k8s-db-prometheus-k8s-0 for Pod prometheus-k8s-0 in StatefulSet prometheus-k8s failed error: PersistentVolumeClaim "prometheus-k8s-db-prometheus-k8s-0" is invalid: spec.resources[storage]: Invalid value: "0": must be greater than zero
Some teams using our workloads might have their own dittybopper installation on their cluster, so we need to make the dittybopper installation optional.
If we do install dittybopper, we need to be able to pass the GitHub URL and branch we want to install from, for more flexibility.
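A sketch of how the gating and parameters could look (the variable names, default repo URL, and deploy script path are assumptions):
INSTALL_DITTYBOPPER=${INSTALL_DITTYBOPPER:-true}
DITTYBOPPER_REPO=${DITTYBOPPER_REPO:-https://github.com/cloud-bulldozer/performance-dashboards.git}
DITTYBOPPER_BRANCH=${DITTYBOPPER_BRANCH:-master}
if [ "$INSTALL_DITTYBOPPER" = "true" ]; then
  git clone --branch "$DITTYBOPPER_BRANCH" "$DITTYBOPPER_REPO" performance-dashboards
  cd performance-dashboards/dittybopper && ./deploy.sh && cd -
fi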
To run the netobserv-perf automation on non-AWS clouds (or on 4.12 where the gp2 storageClass no longer exists), we should let the user pass the storageClassName for the LokiStack CRD. It can default to gp2.
cc: @nathan-weinberg
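A sketch of passing the parameter through (the variable name, CR name, and namespace are assumptions; spec.storageClassName should be the field the LokiStack CRD takes, but verify against the installed CRD):
LOKISTACK_STORAGE_CLASS=${LOKISTACK_STORAGE_CLASS:-gp2}   # default matches current behavior
oc patch lokistack lokistack -n netobserv --type merge \
  -p "{\"spec\":{\"storageClassName\":\"${LOKISTACK_STORAGE_CLASS}\"}}"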
During the EUS upgrade, the script properly patches the machine config pools to update, but it doesn't wait for all the machines to actually update. There should be a wait step after the MCP patches.
06-09 13:37:13.513 Kube-apiserver is done progressing
06-09 13:37:13.513 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Abnormal co details~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
06-09 13:37:13.513
06-09 13:37:13.513
06-09 13:37:15.463 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
06-09 13:37:15.464
06-09 13:37:15.464
06-09 13:37:15.464 post check passed without err.
06-09 13:37:15.464
06-09 13:37:16.385 output machineconfigpool.machineconfiguration.openshift.io/worker patched
06-09 13:37:16.385
06-09 13:37:28.561 [Pipeline] }
06-09 13:37:28.565 [Pipeline] // script
06-09 13:37:28.569 [Pipeline] script
06-09 13:37:28.571 [Pipeline] {
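A minimal sketch of such a wait step (the 90-minute timeout is an assumption):
oc wait mcp/worker --for=condition=Updated=True --timeout=90m
# or poll the counts until they converge:
oc get mcp worker -o jsonpath='{.status.updatedMachineCount}/{.status.machineCount}'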
Scale-up only supports clusters with 3 machinesets (ocp-qe-perfscale-ci/Jenkinsfile, line 58 in ff917f0). An Azure cluster, for example, can have a single machineset:
% oc get --no-headers machinesets -A
openshift-machine-api qili-48-zaure-rqzx4-worker-northcentralus 34 34 33 33 4h27m
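A sketch of distributing the desired total across however many worker machinesets exist, instead of assuming exactly three (WORKER_COUNT and the even-split policy are assumptions):
TOTAL=${WORKER_COUNT:-120}
mapfile -t SETS < <(oc get machinesets -n openshift-machine-api --no-headers | grep worker | awk '{print $1}')
N=${#SETS[@]}
for i in "${!SETS[@]}"; do
  # give the first TOTAL%N machinesets one extra replica so the sum is exact
  replicas=$(( TOTAL / N + (i < TOTAL % N ? 1 : 0) ))
  oc scale machineset "${SETS[$i]}" -n openshift-machine-api --replicas="$replicas"
done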
This is similar to #61, but to perform a health check after cluster creation and before tests are executed.
Similar to the upgrade CI, we need to be able to run lots of jobs and log issues without the cluster being around.
It would be helpful to print logs, and maybe a must-gather in certain cases, to be able to properly open bugs.
Some thoughts:
After a failed cluster-workers-scaling job, there is no additional information about the not-ready machinesets or nodes. We need to add the following (a gather sketch follows below):
oc describe machineset
and
oc describe node
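A diagnostics-gathering sketch along those lines (the desired-vs-ready comparison and the must-gather step are assumptions, not the pipeline's actual logic):
# describe machinesets whose ready count doesn't match desired
for ms in $(oc get machinesets -n openshift-machine-api \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{" "}{.status.readyReplicas}{"\n"}{end}' \
    | awk '$2 != $3 {print $1}'); do
  oc describe machineset "$ms" -n openshift-machine-api
done
# describe nodes that are not plainly Ready
for node in $(oc get nodes --no-headers | grep -Ev ' Ready ' | awk '{print $1}'); do
  oc describe node "$node"
done
oc adm must-gather --dest-dir=./must-gather   # optional, for attaching to bugs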
Currently an error gets thrown when the install fails and destroy is called automatically. It seems the build number is not being found properly:
02-22 15:24:15.611 java.lang.NumberFormatException: For input string: ""
02-22 15:24:15.611 at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
02-22 15:24:15.611 at java.lang.Integer.parseInt(Integer.java:592)
02-22 15:24:15.611 at java.lang.Integer.parseInt(Integer.java:615)
02-22 15:24:15.611 at hudson.plugins.copyartifact.SpecificBuildSelector.getBuild(SpecificBuildSelector.java:70)
02-22 15:24:15.611 at hudson.plugins.copyartifact.CopyArtifact.perform(CopyArtifact.java:454)
02-22 15:24:15.611 at jenkins.tasks.SimpleBuildStep.perform(SimpleBuildStep.java:123)
02-22 15:24:15.611 at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:100)
02-22 15:24:15.611 at org.jenkinsci.plugins.workflow.steps.CoreStep$Execution.run(CoreStep.java:70)
02-22 15:24:15.611 at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
02-22 15:24:15.611 at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
02-22 15:24:15.611 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
02-22 15:24:15.611 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
02-22 15:24:15.611 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
02-22 15:24:15.611 at java.lang.Thread.run(Thread.java:750)
02-22 15:24:15.622 Finished: FAILURE
Flow: when my cluster name is, e.g., skordas, the first machinesets in the list are the infra ones:
NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE
openshift-machine-api infra-us-east-2a 3 0 22m
openshift-machine-api infra-us-east-2b 0 0 22m
openshift-machine-api infra-us-east-2c 0 0 22m
openshift-machine-api skordas-511b-fjt5x-worker-us-east-2a 0 40 40 40 3h40m
openshift-machine-api skordas-511b-fjt5x-worker-us-east-2b 0 40 40 40 3h40m
openshift-machine-api skordas-511b-fjt5x-worker-us-east-2c 0 40 40 40 3h40m
openshift-machine-api workload-us-east-2a 0 1 1 1 22m
so running a build to scale down to 3 nodes sets 3 replicas on the infra machineset, and the rest go to 0.
Currently we write a timestamp to the Scale CI/Upgrade sheet and the cron regression output, but we do not specify which timezone the tests ran in. This can be confusing at times.
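A minimal fix sketch: pin the recorded timestamps to UTC and say so in the string (the exact format is an assumption):
date -u '+%Y-%m-%d %H:%M:%S UTC'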
@mffiedler It looks like the network-perf tests were rewritten to all use the run.sh script with different env variables setting which test to run.
See run in jenkins for failure....e2e-benchmarking-multibranch-pipeline/job/network-perf-pod-network-test/154/console
Have you run/seen this? I am not super familiar with these tests, so I wanted to check with you whether the below sounds correct.
I'm guessing the following:
WORKLOAD=pod2pod is for branch network-perf-pod-network-test
WORKLOAD=hostnet is for network-perf-hostnetwork-network-test
WORKLOAD=pod2svc is for network-perf-serviceip-network-test
I see a couple more options in the documentation around setting network policy to true. Should we add that as a parameter for each of the tests, to cover each of the scenarios listed?
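If the guesses above are right, the invocations would look roughly like this (the NETWORK_POLICY flag name is an assumption taken from the docs mentioned, not verified):
WORKLOAD=pod2pod ./run.sh        # network-perf-pod-network-test
WORKLOAD=hostnet ./run.sh        # network-perf-hostnetwork-network-test
WORKLOAD=pod2svc ./run.sh        # network-perf-serviceip-network-test
WORKLOAD=pod2pod NETWORK_POLICY=true ./run.sh   # network-policy variant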
Links to independent spreadsheets in the Scale CI/Upgrade Results spreadsheet tend to take a format similar to the following:
https://docs.google.com/spreadsheets/d/14y9JA__itZptyC5w7nFBX6d8IRZb37Zz9I1kEZckQkQ ***************
The link itself is actually correct - I think this might stem from how the result is being parsed from the Jenkins log.
In the "generate jobs in gsheet" script get_periodic_jobs.py, rows for both "rosa" and "rosa_hcp" appear to be the same.
Those that are "rosa_hcp" should probably have "Cloud Type" = "rosa_hcp", or something similar.
There is no longer a storage-perf folder under workloads in e2e-benchmarking; we need to validate that it was not just moved. If it was removed, we should remove the storage-perf branch:
..../scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/storage-perf/22/console
https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads
I've seen several cases of the kube-burner job creating a new Gsheet for runs even when there is no data populated in them. Not an urgent issue, but it seems wasteful.
Example case: https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/scale-ci/job/e2e-benchmarking-multibranch-pipeline/job/kube-burner/586
The spreadsheet: https://docs.google.com/spreadsheets/d/16A-DNhYSuTmr_QnjbW2gEkd8T8W1_rwF4Mwha8ZYZMk/edit?usp=sharing
When a 120-node cluster is loaded with projects, scaling it down to 3 nodes with the same number of pods can hit cluster maximums.
My proposal is to move benchmark-cleaner before cluster-workers-scaling.
Some users might want a specific number of namespaces and a set number of pods per namespace. It would be helpful to add an option to pass your own custom kube-burner config file.
This is already set up in e2e-benchmarking; we just need to add the possibility into Jenkins:
https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#launching-custom-workloads
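Per the e2e-benchmarking docs linked above, launching a custom workload could be wired up roughly like this (the exact variable names should be checked against that README; treat these as assumptions):
export WORKLOAD=custom
export CONFIG_FILE=./my-kube-burner-config.yaml   # user-supplied config, hypothetical path
./run.sh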
We want to add an extra check after certain benchmarks are run, to verify the cluster is in a decent state.
In many of the runs I've seen recently, the benchmark finishes but some of the nodes go NotReady. In this case I feel we should fail the test, but currently it passes.
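A minimal health-gate sketch (failing on any node that isn't Ready; the exit convention is an assumption):
if oc get nodes --no-headers | grep -Eq 'NotReady|SchedulingDisabled'; then
  echo "Cluster unhealthy after benchmark; failing the job."
  oc get nodes
  exit 1
fi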
% oc get machineset -A
NAMESPACE NAME DESIRED CURRENT READY AVAILABLE AGE
openshift-machine-api infra-northcentralus2 1 1 3h5m
openshift-machine-api infra-northcentralus3 1 1 3h5m
openshift-machine-api infra-qili-preserve-az0516-sr44j1 1 1 3h5m
openshift-machine-api qili-preserve-az0516-sr44j-worker-northcentralus 3 3 3 3 3h55m
openshift-machine-api workload-qili-preserve-az0516-sr44j 1 1 3h5m
% oc get machines -A | grep infra
openshift-machine-api infra-northcentralus2-82z2h Failed 3h8m
openshift-machine-api infra-northcentralus3-2klrt Failed 3h8m
openshift-machine-api infra-qili-preserve-az0516-sr44j1-lbhxq Failed 3h8m
Describing the machine shows creation failed with "Please make sure that the referenced resource exists, and that both resources are in the same region":
Error Message: failed to reconcile machine "infra-northcentralus2-82z2h": network.InterfacesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidResourceReference" Message="Resource /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/qili-preserve-az0516-sr44j-rg/providers/Microsoft.Network/virtualNetworks/qili-preserve-az0516-sr44j-vnet/subnets/qili-preserve-az0516-sr44j-worker-subnet referenced by resource /subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/qili-preserve-az0516-sr44j-rg/providers/Microsoft.Network/networkInterfaces/infra-northcentralus2-82z2h-nic was not found. Please make sure that the referenced resource exists, and that both resources are in the same region." Details=[]
Checking the infra machineset yaml, the location is centralus:
% oc get machinesets/infra-northcentralus2 -n openshift-machine-api -o yaml
...
spec:
replicas: 1
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: qili-preserve-az0516-sr44j
machine.openshift.io/cluster-api-machineset: infra-northcentralus2
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: qili-preserve-az0516-sr44j
machine.openshift.io/cluster-api-machine-role: infra
machine.openshift.io/cluster-api-machine-type: infra
machine.openshift.io/cluster-api-machineset: infra-northcentralus2
spec:
lifecycleHooks: {}
metadata:
labels:
node-role.kubernetes.io/infra: ""
providerSpec:
value:
apiVersion: azureproviderconfig.openshift.io/v1beta1
credentialsSecret:
name: azure-cloud-credentials
namespace: openshift-machine-api
image:
offer: ""
publisher: ""
resourceID: /resourceGroups/qili-preserve-az0516-sr44j-rg/providers/Microsoft.Compute/images/qili-preserve-az0516-sr44j
sku: ""
version: ""
kind: AzureMachineProviderSpec
location: centralus
managedIdentity: qili-preserve-az0516-sr44j-identity
metadata:
creationTimestamp: null
osDisk:
diskSettings: {}
diskSizeGB: 128
managedDisk:
storageAccountType: Premium_LRS
osType: Linux
publicIP: false
resourceGroup: qili-preserve-az0516-sr44j-rg
subnet: qili-preserve-az0516-sr44j-worker-subnet
userDataSecret:
name: worker-user-data
vmSize: Standard_D48s_v3
vnet: qili-preserve-az0516-sr44j-vnet
zone: "2"
Checking the code: ocp-qe-perfscale-ci/Jenkinsfile, line 297 in 8f3eb87.
But the worker machineset is actually in 'northcentralus':
% oc get machineset/qili-preserve-az0516-sr44j-worker-northcentralus -n openshift-machine-api -o yaml
...
spec:
replicas: 3
selector:
matchLabels:
machine.openshift.io/cluster-api-cluster: qili-preserve-az0516-sr44j
machine.openshift.io/cluster-api-machineset: qili-preserve-az0516-sr44j-worker-northcentralus
template:
metadata:
labels:
machine.openshift.io/cluster-api-cluster: qili-preserve-az0516-sr44j
machine.openshift.io/cluster-api-machine-role: worker
machine.openshift.io/cluster-api-machine-type: worker
machine.openshift.io/cluster-api-machineset: qili-preserve-az0516-sr44j-worker-northcentralus
spec:
lifecycleHooks: {}
metadata: {}
providerSpec:
value:
acceleratedNetworking: true
apiVersion: machine.openshift.io/v1beta1
credentialsSecret:
name: azure-cloud-credentials
namespace: openshift-machine-api
image:
offer: ""
publisher: ""
resourceID: /resourceGroups/qili-preserve-az0516-sr44j-rg/providers/Microsoft.Compute/images/qili-preserve-az0516-sr44j-gen2
sku: ""
version: ""
kind: AzureMachineProviderSpec
location: northcentralus
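A fix sketch: derive the location from an existing worker machineset instead of hard-coding it, so the infra and workload machinesets land in the right region (variable names are assumptions):
WORKER_MS=$(oc get machinesets -n openshift-machine-api --no-headers | grep worker | head -1 | awk '{print $1}')
export LOCATION=$(oc get machineset "$WORKER_MS" -n openshift-machine-api \
  -o jsonpath='{.spec.template.spec.providerSpec.value.location}')
echo "Using location: $LOCATION"   # would print northcentralus for the cluster above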
While trying to add infra and workload nodes to my GCP cluster, I am hitting an issue: the network name does not include the trailing random suffix that my cluster name has, so the lookup finds nothing.
On this line:
export NETWORK_NAME=$(gcloud compute networks list | grep $CLUSTER_NAME | awk '{print $1}')
Cluster name: <shortened_cluster_name>-5snsb
The only network found is the one below, so the variable is not being set properly:
<shortened_cluster_name>-network CUSTOM REGIONAL
I'm working on some sort of workaround. I'm not sure if it's a length thing, but I have a second cluster whose cluster name and network name match, and it works fine.
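A tolerant-lookup sketch: try the cluster's infra ID first, then fall back to the cluster name with the random suffix stripped (the fallback policy is an assumption):
INFRA_ID=$(oc get infrastructure cluster -o jsonpath='{.status.infrastructureName}')
SHORT_NAME=${CLUSTER_NAME%-*}   # drop the trailing random suffix, e.g. -5snsb
export NETWORK_NAME=$(gcloud compute networks list --format='value(name)' \
  | grep -E "^(${INFRA_ID}|${SHORT_NAME})" | head -1)
echo "NETWORK_NAME=${NETWORK_NAME}"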
We need to be able to add workload and infra nodes to clusters created on the Alibaba and IBM cloud types.
We will need to add new yaml files for infra and workload nodes in cluster-post-config.
We will also need to add a call from the cluster-workers-scaling branch for each of these, to pass the proper parameters.