openshift / svt

License: Apache License 2.0

Shell 59.21%, Python 36.22%, DIGITAL Command Language 0.13%, 1C Enterprise 0.22%, Go 0.28%, Awk 0.65%, Makefile 0.03%, Groovy 0.11%, Dockerfile 1.96%, Jinja 1.20%

svt's Introduction

OpenShift, Kubernetes and Docker: Performance, Scalability and Capacity Planning Research by Red Hat

OpenShift v3 Scaling, Performance and Capacity Planning Whitepaper

This repository details the approach, process and procedures used by engineering teams at Red Hat to analyze and improve the performance and scalability of integrated platform and infrastructure stacks. It shares results, best practices and reference architectures for the Kubernetes- and Docker-based OpenShift v3 Platform-as-a-Service, as well as the Red Hat Atomic technologies.

Unsurprisingly, performance analysis and tuning in the container and container-orchestration space has tremendous overlap with previous-generation approaches to distributed computing. Performance still boils down to identifying and resolving bottlenecks, maintaining data and compute locality, and applying scale-out design best practices hard-won over decades of grid and high-performance computing research.

Further tests quantify application performance when running in a container hosted by OpenShift, as well as measure reliability over time, searching for things like memory leaks.

IMPORTANT

While the tests in this repository are used by the Red Hat team to measure performance, these are not supported in any OpenShift environments, and Red Hat support services cannot provide assistance with any problems.

How this repository is organized:

The hierarchy of this repository is as follows:

.
├── application_performance: JMeter-based performance testing of applications hosted on OpenShift.
├── applications_scalability: Performance and scalability testing of the OpenShift web UI.
├── conformance: Wrappers to run a subset of e2e/conformance tests in an SVT environment (work in progress).
├── docs: Documentation that can help with SVT testing.
├── image_provisioner: Ansible playbooks for building AMI and qcow2 images with OpenShift RPMs and Docker images baked in.
├── networking: Performance tests for the OpenShift SDN and kube-proxy.
├── openshift_performance: Performance tests for container build parallelism, projects and persistent storage (EBS, Ceph, Gluster and NFS).
├── openshift_scalability: Home of the infamous "cluster-loader"; details in openshift_scalability/README.md.
└── reliability: Tests that run over long periods of time (weeks), cycling object quantities up and down.

Dockerfiles and Dependencies

Certain tests use the Quickstarts from the openshift/origin repository. Ensure that they are available in your OpenShift project environment before using any of the tests:

https://github.com/openshift/origin/tree/master/examples/quickstarts

Also ensure that the requisite image streams are available in your OpenShift project environment:

https://github.com/openshift/origin/tree/master/examples/image-streams

Also, for disconnected installations, the following Dockerfiles should be located somewhere on the target installation machines:

./reliability/Dockerfile
./openshift_scalability/content/centos-stress/Dockerfile
./openshift_scalability/content/logtest/Dockerfile
./dockerfiles/Dockerfile
./storage/fio/Dockerfile
./networking/synthetic/stac-s2i-builder-image/Dockerfile
./networking/synthetic/uperf/Dockerfile

Feedback and Issues

Feedback, issues and pull requests are happily accepted; feel free to submit them.

svt's People

Contributors

akrzos, anpingli, chadcrum, chaitanyaenr, ekuric, hongkailiu, hroyrh, jeremyeder, jhadvig, jmencak, liqcui, mbruzek, mffiedler, mrsiano, ofthecurerh, paigerube14, qiliredhat, rflorenc, rh-ematysek, rpattath, schituku, shahsahil264, shrivaibavi, sjug, skordas, smalleni, svetsa-rh, vikaschoudhary16, vikaslaad, wabouhamad


svt's Issues

Evaluate current test run parameters

Now that there are two releases' worth of data, experience and lessons learned using cluster-loader for vertical and horizontal testing, we should update or modify the tests as appropriate. Possible areas:

  • project composition
  • new scale targets from online, if available
  • tradeoffs between test execution time and success - don't overwhelm the cluster.
  • template contents
  • other?

cc: @timothysc @jeremyeder

Cleaning policy

Copied from old repo

Currently the cluster loader tries to do step-wise cleaning of the cluster, but I would argue that this is not a realistic use case. Users may make changes independently, but these would often happen in an uncontrolled, random fashion. On the other hand, operators are likely to do bulk cleaning operations; a case in point would be removal after the 30-day evaluation period.

The purpose of this issue is to hash out the concrete use cases so we can have the cluster-loader reflect them appropriately.

Kitchen Sink: Mark all JSON template triggers in quickstarts as automatic: true

The automatic: false flag in the deployment triggers of the quickstart templates was not honored until 3.4, so when the cluster loader created apps as part of the kitchen sink tests it still rolled out the deployments marked as false. Now that the flag is fixed, those apps are no longer deployed automatically, and the services have to be rolled out manually via "oc rollout latest" after the cluster loader script finishes.
Change all flags to automatic: true so that deployments happen automatically and no manual intervention is needed.

https://github.com/openshift/origin/blob/master/UPGRADE.md#origin-14x--ose-34x
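
A minimal sketch of the change as a throwaway script; the quickstart path and the trigger layout (spec.triggers[].imageChangeParams.automatic) are assumptions based on the repository layout and standard DeploymentConfig templates, not a committed implementation:

import glob
import json

# Flip every deployment trigger that is explicitly marked automatic: false.
# The glob path below is an assumption about where the quickstart templates live.
for path in glob.glob('openshift_scalability/content/quickstarts/*.json'):
    with open(path) as f:
        template = json.load(f)
    changed = False
    for obj in template.get('objects', []):
        if obj.get('kind') != 'DeploymentConfig':
            continue
        for trigger in obj.get('spec', {}).get('triggers', []):
            params = trigger.get('imageChangeParams')
            if params and params.get('automatic') is False:
                params['automatic'] = True
                changed = True
    if changed:
        with open(path, 'w') as f:
            json.dump(template, f, indent=2)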

aws-cli wrapper

For billing purposes, we need to enforce some uniformity around creating AWS instances. To do this, I propose wrapping the AWS CLI utility with our own script that we'll carry in svt.

A typical invocation is:

$ aws ec2 run-instances --image-id $IMAGE_ID --count $COUNT --instance-type $INSTANCE_TYPE --key-name $SSH_KEY_NAME --subnet-id $SUBNET --security-group-ids $SGID

The script capabilities should allow for customization of the following (a rough sketch of such a wrapper appears after the lists below):

  • Image ID, defaulting to our latest AMI; we will update the script in svt whenever the AMI is updated. This way everyone always uses the latest unless they choose otherwise.
  • Count, instance count is variable of course. No default, fail if not specified.
  • Instance Type, no default, fail if not specified.
  • key-name should be hard-coded
  • subnet should be hard-coded
  • security group should be hard-coded
  • disk image size, I think we want to offer this
  • storage type, gp2/io1

Instance tags:

  • group, for billing purposes we'd like to track which group created the instance (right now this would be qe or perf). No default, fail if not specified
  • username, the aws account username who created the instance

Few other thoughts:

  • "hard-coded" could mean it comes from a config file, too I guess...up to you.
  • If anyone ever needs to customize things the script doesn't provide, they can always use the CLI tool manually.
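
A rough sketch of what such a wrapper could look like; the defaults, tag keys, device name and the AMI/key/subnet/security-group placeholders below are illustrative assumptions, not decisions:

import argparse
import getpass
import subprocess

DEFAULT_IMAGE_ID = 'ami-REPLACEME'   # placeholder; bump whenever a new AMI is published
KEY_NAME = 'svt-key'                 # placeholders for the "hard-coded" values (or read them from a config file)
SUBNET_ID = 'subnet-REPLACEME'
SECURITY_GROUP_ID = 'sg-REPLACEME'

parser = argparse.ArgumentParser(description='Thin wrapper around "aws ec2 run-instances"')
parser.add_argument('--image-id', default=DEFAULT_IMAGE_ID)
parser.add_argument('--count', required=True)
parser.add_argument('--instance-type', required=True)
parser.add_argument('--group', required=True, choices=['qe', 'perf'])
parser.add_argument('--volume-size', default='35')
parser.add_argument('--volume-type', default='gp2', choices=['gp2', 'io1'])
args = parser.parse_args()

# Tag the instance with the billing group and the local username of whoever ran the script.
# --tag-specifications needs a reasonably recent aws CLI; older versions would call create-tags afterwards.
tags = 'ResourceType=instance,Tags=[{Key=group,Value=%s},{Key=username,Value=%s}]' % (
    args.group, getpass.getuser())

subprocess.check_call([
    'aws', 'ec2', 'run-instances',
    '--image-id', args.image_id,
    '--count', args.count,
    '--instance-type', args.instance_type,
    '--key-name', KEY_NAME,
    '--subnet-id', SUBNET_ID,
    '--security-group-ids', SECURITY_GROUP_ID,
    '--tag-specifications', tags,
    '--block-device-mappings',
    'DeviceName=/dev/sda1,Ebs={VolumeSize=%s,VolumeType=%s}' % (args.volume_size, args.volume_type),
])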

Anything I missed?

/cc @sjug

customer feedback on network test

Comments from customer using the svt/networking tests:

I also tried to run Jeremy's network benchmark (svt), but we faced some difficulties:

  • uperf is automatically packaged inside Docker images by the Ansible playbook, but the Docker images are built on the nodes themselves, and that failed because, on our R-Box setup, the nodes don't have access to the Internet. So I had to prepare the Docker image elsewhere, push it to our own Docker hub, and comment out the Docker image creation in the Ansible playbook.

  • The tests require "pbench [to be] installed and configured on all hosts". This means installing a new repo, which was made difficult because our nodes don't have access to the Internet and the repositories to add are not the "standard" rhel-7 ones that we are already mirroring on our Artifactory (which is the only source of RPMs accessible from our nodes).

  • The "Patch pbench-uperf" task (here: https://github.com/openshift/svt/blob/master/networking/synthetic/pod-ip-test-setup.yaml#L93) was failing. I looked at the file to be patched and couldn't find anything similar to the hunks of the patch, so I tried to just skip that task. When the "Run pbench-uperf for TCP benchmarks" task was finally launched, it never ended. I looked on the master where pbench-uperf was launched and it appeared that it tried to contact IPs on the overlay network. The command that was launched was "python network-test.py podIP --master --node --pods 8" in order to measure between two pods, but it seems that even this scenario requires the master to be able to reach the overlay network. This was unfortunately not the case on our current R-Box setup, where the masters have not yet been made (non-schedulable) nodes.

@vikaschoudhary16 @sjug

Add a window-scaling functionality to cluster loader.

While the current stepsize/pause/delay options in the tuningsets sections of cluster-loader templates can be useful to rate-limit and mitigate issues like:

Error syncing pod, skipping: failed to "StartContainer" for "POD"  with RunContainerError: "runContainer: operation timeout: context deadline exceeded"

I've found that simply keeping only a certain number of pods outside of the Running state mitigates this issue automatically, without having to estimate tuningset values to avoid it.

Therefore I propose to add a "queue-depth" field/functionality in the tuningsets section for cluster-loader templates, which would block the creation of new pods if the queue depth is greater than a given number.
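
A minimal sketch of the idea, polling pod status with oc; the function names, namespace handling and poll interval are illustrative, not the proposed cluster-loader implementation:

import json
import subprocess
import time

def pods_not_running(namespace):
    # Count pods in the namespace that have not reached the Running (or Succeeded) phase yet.
    out = subprocess.check_output(['oc', 'get', 'pods', '-n', namespace, '-o', 'json'])
    pods = json.loads(out)['items']
    return sum(1 for p in pods if p['status'].get('phase') not in ('Running', 'Succeeded'))

def wait_for_queue(namespace, queue_depth, poll_interval=5):
    # Block before creating the next pod while more than queue_depth pods are still pending/starting.
    while pods_not_running(namespace) > queue_depth:
        time.sleep(poll_interval)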

Adding a link to a thread:
http://post-office.corp.redhat.com/archives/openshift-sme/2016-November/msg00301.html

@jeremyeder @sjug

Docker build issue with Network tests

Network tests no longer work on the master node; the docker build fails because of the OpenShift firewall. The tests need to be run from a separate host.

Implement ec2 exponential back-off for cluster-loader

From old repo:

ekuric commented on Apr 5
The Amazon API will refuse to serve requests if the API rate limit is exceeded; exponential back-off needs to be implemented in all functions that interact with the Amazon API
-> http://docs.aws.amazon.com/general/latest/gr/api-retries.html
@ofthecurerh

ofthecurerh commented on Apr 12
The boto3 library handles exponential backoff and retries, we just need to properly handle the exception.

Example snippet:

def create_volume(self, availability_zone, **kwargs):
    try:
        volume = self.resource.create_volume(
            DryRun=False,
            AvailabilityZone=availability_zone,
            **kwargs)
    except botocore.exceptions.ClientError as err:
        logging.warn('Unexpected Error: %s', err.response['Error']['Code'])
    else:
        return volume
@ekuric

ekuric commented on Apr 14
@ofthecurerh thx, I will check to add this to create/delete scripts involving boto3.
@ekuric

ekuric commented 28 days ago
I have this working in my tests, will create a PR for it.

cluster_loader: template cleaning currently broken

From old repo:

The clean_templates() method has gotten into a bad state somehow. globalvars is not passed to it and there are other undefined variables (e.g. templatefile). I have disabled template cleaning until this can be addressed.

Cluster loader: storage support creates PVCs in default namespace

Something may have changed in OCP since storage support was added to cluster_loader, but currently the tool creates the PVCs in the default namespace.

Adding a namespace to the pvc template and giving it the value of the current namespace where the pod is being created fixes the problem. Preparing a PR for this.
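
A minimal sketch of the fix, assuming the PVC definition is loaded as a dict before being handed to oc create; the function name is illustrative:

import json

def load_pvc_with_namespace(pvc_template_path, namespace):
    # Give the PVC the same namespace as the pod being created, so it no longer
    # lands in the "default" namespace.
    with open(pvc_template_path) as f:
        pvc = json.load(f)
    pvc.setdefault('metadata', {})['namespace'] = namespace
    return pvc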

/cc: @ekuric

masterVertical.sh and pyconfigMasterVirtScale.yaml behaviour

@timothysc
I've run ./masterVertical.sh in svt and got this:

In project clusterproject3 on server https://ip-172-31-41-250.us-west-2.compute.internal:8443

route/route0 not accepted: HostAlreadyClaimed (svc/service0)
  dc/deploymentconfig0 deploys docker.io/openshift/hello-openshift:latest 
    deployment #1 deployed about an hour ago - 1 pod

route/route1 not accepted: HostAlreadyClaimed (svc/service1)
  dc/deploymentconfig1 deploys docker.io/openshift/hello-openshift:latest 
    deployment #1 deployed about an hour ago - 1 pod

svc/service2v0 - 172.30.252.181:80 -> 8080
  dc/deploymentconfig2v0 deploys docker.io/openshift/hello-openshift:latest 
    deployment #1 deployed about an hour ago - 2 pods

bc/buildconfig0 source builds git://github.com/tiwillia/hello-openshift-example.git on istag/imagestream0:latest (from bc/buildconfig0)
  -> istag/imagestream0:latest
  build #1 failed 59 minutes ago

bc/buildconfig1 source builds git://github.com/tiwillia/hello-openshift-example.git on istag/imagestream1:latest (from bc/buildconfig1)
  -> istag/imagestream1:latest
  build/build1-1 failed about an hour ago (can't push to image)

bc/buildconfig2 source builds git://github.com/tiwillia/hello-openshift-example.git on istag/imagestream2:latest (from bc/buildconfig2)
  -> istag/imagestream2:latest
  build/build2-1 failed about an hour ago (can't push to image)

Errors:
  * bc/buildconfig1 is pushing to istag/imagestream1:latest, but the image stream for that tag does not exist.
  * bc/buildconfig2 is pushing to istag/imagestream2:latest, but the image stream for that tag does not exist.
  * build/build1-1 has failed.
  * build/build2-1 has failed.
  * route/route0 was not accepted by router "router": a route in another namespace holds www.example0.com and is older than route0 (HostAlreadyClaimed)
  * route/route1 was not accepted by router "router": a route in another namespace holds www.example1.com and is older than route1 (HostAlreadyClaimed)
  * route/route2 was not accepted by router "router": a route in another namespace holds www.example2.com and is older than route2 (HostAlreadyClaimed)

7 errors and 6 warnings identified, use 'oc status -v' to see details.

Tested on two clusters, same results. Not sure this is the intended behaviour.

Bring network tests up to date with OCP, Ansible, pbench, etc

  • Issue #87 - use the Ansible 2.0 API
  • remove pbench-uperf patching
  • update playbooks for latest OCP template processing syntax
  • respect new pbench-uperf port range
  • investigate svc-to-svc problem
  • investigate output data issue (possible pbench-uperf bug)
  • require user to explicitly provide public ssh key for pods to use
  • ??? more to be added as discovered

add support for ebs dynamic storage allocations

From the old repo

OpenShift supports dynamic storage allocation. We now have support for EBS in cluster loader, but it performs the following create steps:
ebs -> pv -> pvc

With dynamic storage allocation this is reduced to only the pvc step.
@ekuric

ekuric commented 14 days ago
I will send PR for #111
@ofthecurerh

ofthecurerh commented 14 days ago • edited
Is a PR required? This should already work by using a template.

Edit: Link to example: https://github.com/ofthecurerh/svt/blob/master-vert/openshift_scalability/content/quickstarts/cakephp-mysql.json#L113
@ekuric

ekuric commented 14 days ago • edited
@ofthecurerh we still need to update pvc.json dynamically with the pvc claim name, i.e. we still need to create the pvc. So I will be removing the ebs/pv steps, but, unless I am wrong, the pvc step needs to stay
@ofthecurerh

ofthecurerh commented 14 days ago
@ekuric I'm saying that instead of creating a separate function to handle pvc, we should be using the already defined template function. What you're doing in the ebs_create function to replace the values in the pvc json is what templates are designed to do.
@ekuric

ekuric commented 14 days ago • edited
@ofthecurerh show me the code... which template function do you mean, or a link to it?
@ofthecurerh

ofthecurerh commented 14 days ago
@ekuric The cluster-loader can process/create templates; it's nearly equivalent to oc process -f template.json -v SOME_VAR=foo | oc create -f -.

This is the cluster-loader config I'm using: https://github.com/ofthecurerh/svt/blob/master-vert/openshift_scalability/config/master-vert.yaml

And here's an example of a template: https://github.com/ofthecurerh/svt/blob/master-vert/openshift_scalability/content/quickstarts/cakephp-mysql.json

Some pods end up in Error state after using some quickstart templates with OCP 3.4

All quickstart templates work fine on 3.2.x and 3.3.x, however:

$ oc version
oc v3.4.0.9
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-52-249.us-west-2.compute.internal:8443
openshift v3.4.0.9
kubernetes v1.4.0+776c994

$ oc get pods --all-namespaces
NAMESPACE            NAME                                  READY     STATUS      RESTARTS   AGE
cake                 cakephp-mysql-example-1-9a53a         1/1       Running     0          1h
cake                 cakephp-mysql-example-1-build         0/1       Completed   0          1h
cake                 mysql-1-vwo17                         1/1       Running     0          1h
cakephp-mysql0       cakephp-mysql-example-1-build         0/1       Completed   0          1h
cakephp-mysql0       cakephp-mysql-example-1-deploy        1/1       Running     0          1h
cakephp-mysql0       cakephp-mysql-example-1-hook-pre      0/1       Error       26         1h
dancer-mysql0        dancer-mysql-example-1-build          0/1       Completed   0          1h
dancer-mysql0        dancer-mysql-example-1-deploy         0/1       Error       0          1h
default              docker-registry-2-cozxh               1/1       Running     3          1d
default              registry-console-1-dpsxr              1/1       Running     3          1d
default              router-1-9cm9t                        1/1       Running     3          1d
django-postgresql0   django-psql-example-1-build           0/1       Completed   0          1h
django-postgresql0   django-psql-example-1-deploy          0/1       Error       0          1h
eap64-mysql0         eap-app-1-7pgrz                       1/1       Running     0          1h
eap64-mysql0         eap-app-1-build                       0/1       Completed   0          1h
eap64-mysql0         eap-app-mysql-1-pnaio                 1/1       Running     0          1h
nodejs-mongodb0      nodejs-mongodb-example-1-build        0/1       Completed   0          1h
nodejs-mongodb0      nodejs-mongodb-example-1-yizup        1/1       Running     0          1h
rails-postgresql0    rails-postgresql-example-1-build      0/1       Completed   0          1h
rails-postgresql0    rails-postgresql-example-1-deploy     0/1       Error       0          1h
rails-postgresql0    rails-postgresql-example-1-hook-pre   0/1       Error       0          1h
tomcat8-mongodb0     jws-app-1-5juwl                       1/1       Running     0          1h
tomcat8-mongodb0     jws-app-1-build                       0/1       Completed   0          1h
tomcat8-mongodb0     jws-app-mongodb-1-96aq0               1/1       Running     0          1h

$ oc describe pod cakephp-mysql-example-1-hook-pre -n cakephp-mysql0 
...
  FirstSeen LastSeen    Count   From                            SubobjectPath           Type        Reason      Message
  --------- --------    -----   ----                            -------------           --------    ------      -------
  1h        1h      1   {default-scheduler }                                    Normal      Scheduled   Successfully assigned cakephp-mysql-example-1-hook-pre to ip-172-31-52-249.us-west-2.compute.internal
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Pulling     pulling image "172.30.142.13:5000/cakephp-mysql0/cakephp-mysql-example@sha256:d4f00ca474587c79d94962d98a36b501bc498c066b982cbbf28d477c3f5562c9"
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Pulled      Successfully pulled image "172.30.142.13:5000/cakephp-mysql0/cakephp-mysql-example@sha256:d4f00ca474587c79d94962d98a36b501bc498c066b982cbbf28d477c3f5562c9"
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Created     Created container with docker id ae745fb761d9; Security:[seccomp=unconfined]
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Started     Started container with docker id ae745fb761d9
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Created     Created container with docker id 2631426ece46; Security:[seccomp=unconfined]
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal      Started     Started container with docker id 2631426ece46
  1h        1h      1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning     FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 10s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id f0ff58cb4e37; Security:[seccomp=unconfined]
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id f0ff58cb4e37
  1h    1h  2   {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 20s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id 3b4d3d4c809c; Security:[seccomp=unconfined]
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id 3b4d3d4c809c
  1h    1h  4   {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 40s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id 2b51069e4451; Security:[seccomp=unconfined]
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id 2b51069e4451
  1h    1h  7   {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 1m20s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id 9af9c059d42e
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id 9af9c059d42e; Security:[seccomp=unconfined]
  1h    1h  13  {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id aad9452568ca; Security:[seccomp=unconfined]
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id aad9452568ca
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id e67f62813644
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id e67f62813644; Security:[seccomp=unconfined]
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     Started container with docker id b20cded25344
  1h    1h  1   {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     Created container with docker id b20cded25344; Security:[seccomp=unconfined]
  1h    4m  27  {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Pulled      Container image "172.30.142.13:5000/cakephp-mysql0/cakephp-mysql-example@sha256:d4f00ca474587c79d94962d98a36b501bc498c066b982cbbf28d477c3f5562c9" already present on machine
  1h    4m  19  {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Started     (events with common reason combined)
  1h    4m  19  {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Normal  Created     (events with common reason combined)
  1h    7s  503 {kubelet ip-172-31-52-249.us-west-2.compute.internal}                   Warning FailedSync  Error syncing pod, skipping: failed to "StartContainer" for "lifecycle" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=lifecycle pod=cakephp-mysql-example-1-hook-pre_cakephp-mysql0(5810a073-913e-11e6-84c9-02579073c581)"

  1h    7s  530 {kubelet ip-172-31-52-249.us-west-2.compute.internal}   spec.containers{lifecycle}  Warning BackOff Back-off restarting failed docker container

'-hook-pre' seems to be the common denominator of the failed pods. Looking into templates now.

cluster-loader users not visible on oc get

Copied from old repository

Users created by cluster-loader are not visible when we do "oc get users". The reason is that oauth tokens are only generated when a particular user logs in. We need to add another step to user creation where all the users log in after creation.
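
A rough sketch of that extra step, assuming the generated users share a known password; the user names, password and server URL are placeholders:

import subprocess

def login_users(users, password, server):
    # Log each generated user in once so an oauth token is created and the user
    # shows up in "oc get users".
    for user in users:
        subprocess.check_call(['oc', 'login', server, '-u', user, '-p', password,
                               '--insecure-skip-tls-verify=true'])
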
@hroyrh

hroyrh commented 23 days ago
@ofthecurerh @jeremyeder @timothysc @mffiedler I would like to hear your thoughts on whether we should use "oc login" and then create all the objects for that user, or instead use the "--token" parameter to assign each object to a user. Implementation-wise, the latter option will be easier as we won't need to change much. The reason I am considering the "oc login" option is that, in my opinion, it is closer to realistic customer setups, and so I thought it might be useful from the perspective of running a test. Please share your thoughts on which option we should go for, or whether we should go for both.
@jeremyeder

jeremyeder commented 23 days ago
Normally I do favor doing things exactly how the product is intended to be used by customers, to get the most coverage. But I think we should go with --token. I think it's close enough, and we're testing the login path as well as user density/objects this way as well.

@timothysc

timothysc commented 23 days ago
Just for completeness I would vote doing oc login unless that breaks too many conventions in the test.
@ofthecurerh

ofthecurerh commented 16 days ago
@hroyrh There are a few different things we are talking about here.

  • How do we create users?
  • Do we create project resources as a specific user or as a single superuser? (There should probably be an option to do it either way.)

Creating users can either be done via oc login or by using a user and rolebinding template via oc create. The oc login method has the benefit of rolling this process up into a single command (while also creating an API token). I prefer the oc create method because it allows us more flexibility and control over how the API objects are defined. This method also simplifies the cluster-loader code base by using a single function to create resources rather than having a specialized one for each kind of object.

My suggestion to use the --token option instead of --kubeconfig was aimed at simplifying the process of creating project resources as multiple users. Currently cluster-loader creates objects as the system:admin by copying the admin.kubeconfig file. We'd have to refactor that part to create new kubeconfigs via oc adm plus manage mapping users to their kubeconfig filepath. Additionally when creating a user via oc login IIRC it only creates an API token.

@jeremyeder I'm not advocating we use the product in any non-standard way. The product in this case is OpenShift, not the oc tool. IMO anything that returns a 200 response is valid and is how the product was intended to be used.

Networking tests loopback gets stuck trying to register pbench on nodes

From old repo:

vikaslaad commented 22 days ago
In the networking tests there is a bug: when loopback is run, we need to comment out 3 lines in the inventory file for the pbench registration task. Somehow that if-condition is not working when node names are not passed.
@ofthecurerh

ofthecurerh commented 16 days ago
@vikaslaad could you provide more detail on how you're using it?

There are a few known issues that @schituku and I have run into and that I'm working on a PR for. I'll turn these into actual issues to document what is being worked on.

cluster_loader: Add option to delete projects if they already exist.

A more common scenario than the need for the cleaning option (which we should consider removing altogether) is the case where the user runs cluster_loader multiple times against the same yaml file. The artifacts created by the previous run must be deleted manually (or cluster_loader must be wrapped).

Add a -x option to attempt project deletion if the project already exists. If it already exists and -x is not given, print an error message suggesting the option be used.
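
A minimal sketch of what the check could look like; the option plumbing is omitted and the names are illustrative:

import subprocess

def ensure_project(name, delete_existing=False):
    # delete_existing corresponds to the proposed -x flag.
    exists = subprocess.call(['oc', 'get', 'project', name]) == 0
    if exists and not delete_existing:
        raise SystemExit('Project %s already exists; re-run with -x to delete it first.' % name)
    if exists:
        # Note: project deletion is asynchronous, so a real implementation would
        # wait for the project to disappear before recreating it.
        subprocess.check_call(['oc', 'delete', 'project', name])
    subprocess.check_call(['oc', 'new-project', name])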

Keep processed template in memory instead of writing a temp file to disk

Copied from old repo

The current process for creating a template is to:

  1. Run "oc process -f template.json"
  2. Capture stdout and deserialize to a json object
  3. Serialize the json object and write it to a file on disk

I suggest we skip writing to disk and keep it in memory, but as the raw output instead of a deserialized json object.

An example of this, but not necessarily the actual implementation:
processed = subprocess.Popen(['oc', 'process', '-f', 'template.json'], stdout=subprocess.PIPE)
subprocess.Popen(['oc', 'create', '-f', '-'], stdin=processed.stdout, stdout=subprocess.PIPE)

cluster_loader: pod creation fails when quotas used

From old repo:

Pod creation with quotas is currently broken in cluster_loader using the default quota-default.json file.

Quota creation succeeds but pod creation fails:

Error from server: error when creating "/tmp/tmpSfS4m6": pods "hellopods0" is forbidden: Failed quota: default: must specify cpu,memory

Workaround: remove the quota statement from projects if you don't need it.
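
Beyond the workaround, the underlying fix is for the pod definition to carry cpu/memory values; a minimal sketch, assuming the pod spec is handled as a dict (the default values are illustrative):

def add_default_resources(pod, cpu='100m', memory='128Mi'):
    # Give every container explicit cpu/memory limits so the pod is admitted
    # under a quota that requires those values to be specified.
    for container in pod['spec']['containers']:
        container.setdefault('resources', {})['limits'] = {'cpu': cpu, 'memory': memory}
    return pod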

Single python library for oc/kubectl command invocation

Copied from old repo

Instead of using the cluster-loader for all oc|kubectl actions, there should be a single library to interface with those commands.

This would allow us the flexibility to script out whatever test scenario we need without being limited by the functionality of the cluster-loader.

The cluster-loader could make use of this library too.
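
A minimal sketch of what such a library's entry point could look like; the function name and namespace handling are illustrative:

import subprocess

def oc(*args, **kwargs):
    # Thin wrapper around the oc binary; test scripts (and cluster-loader itself)
    # would call this instead of shelling out on their own.
    cmd = ['oc'] + list(args)
    if kwargs.get('namespace'):
        cmd += ['-n', kwargs['namespace']]
    return subprocess.check_output(cmd)

Callers would then write, for example, oc('get', 'pods', namespace='default') instead of building subprocess calls by hand.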
