kubernetes-sigs / kubetest2 Goto Github PK

View Code? Open in Web Editor NEW

325.0 10.0 102.0 995 KB

Kubetest2 is the framework for launching and running end-to-end tests on Kubernetes.

License: Apache License 2.0

Go 87.63% Makefile 1.37% Shell 7.98% Python 3.02%

k8s-sig-testing

kubetest2's Introduction

kubetest2

Kubetest2 is a framework for deploying Kubernetes clusters and running end-to-end tests against them.

It is intended to be the next significant iteration of kubetest

Concepts

kubetest2 is effectively split into three independent executables:

kubetest2: discovers and invokes deployers and testers in PATH
kubetest2-DEPLOYER: manages the lifecycle of a Kubernetes cluster
kubetest2-tester-TESTER: tests a Kubernetes cluster

The intent behind this design is:

minimize coupling between deployers and testers
encourage implementation of new deployers and testers out-of-tree
keep dependencies / surface area of kubetest2 small

We provide reference implementations but all all new implementations should be external implementations

Installation

To install kubetest2 and all reference deployers and testers: go install sigs.k8s.io/kubetest2/...@latest

To install a specific deployer: go install sigs.k8s.io/kubetest2/kubetest2-DEPLOYER@latest (DEPLOYER can be gce, gke, etc.)

To install a sepcific tester: go install sigs.k8s.io/kubetest2/kubetest2-tester-TESTER@latest (TESTER can be ginkgo, exec, etc.)

Usage

General usage is of the form:

kubetest2 <deployer> [Flags] [DeployerFlags] -- [TesterArgs]

Example: list all flags for the noop deployer and ginkgo tester

kubetest2 noop --test=ginkgo --help

Example: deploy a cluster using a local checkout of kubernetes/kubernetes, run Conformance tests

kubetest2 gce -v 2 \
  --repo-root $KK_REPO_ROOT \
  --gcp-project $YOUR_GCP_PROJECT \
  --legacy-mode \
  --build \
  --up \
  --down \
  --test=ginkgo \
  -- \
  --focus-regex='\[Conformance\]'

Reference Implementations

See individual READMEs for more information

Deployers

kubetest2-gce - use scripts in kubernetes/cloud-provider-gcp or kubernetes/kubernetes
kubetest2-gke - use gcloud containers
kubetest2-kind - use kind
kubetest2-noop - do nothing (to use a pre-existing cluster)

Testers

kubetest2-tester-clusterloader2 - use clusterloader2
kubetest2-tester-exec - exec a given command with the given args / flags
kubetest2-tester-ginkgo - runs e2e tests from kubernetes/kubernetes
kubetest2-tester-node - runs node e2e tests from kubernetes/kubernetes

External Implementations

Deployers

Testers

kubetest2-tester-kops

Support

This project is currently unversioned and unreleased. We make a best-effort attempt to enforce the following:

kubetest2 and its reference implementations must work with the in-development version of kubernetes and all currently supported kubernetes releases
- e.g. no generics until older supported kubernetes version supports generics
- e.g. ginkgo tester must work with both ginkgo v1 and ginkgo v2
changes to the following testers must not break jobs in the kubernetes project
- kubetest2-tester-exec
- kubetest2-tester-ginkgo

Contact

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

kubetest2's People

Contributors

Stargazers

Watchers

Forkers

amwat cofyc michaelmdresser chizhg albertomilan chrislovecnm bentheelder mkumatag devguyio saschagrunert boluisa devopstoday11 alinadonisa benmoss quentin-cha liggitt figo kubernetescheatsheet joshua-bone bobbypage rifelpet mauriciopoppe justinsb satva rajalakshmi-girish jasonladuke0311 isabella232 evalle supriya-premkumar yjuns sfowl rudeigerc dprotaso arxiv-research shwethakumbla anfernee libaoan xinydev guirish03 ipochi piyushgupta1551 jiahuif richackard dims namanl2001 ruquanzhao ekmixon gjkim42 frankfanslc nckturner lzhecheng olemarkus mainred zarvd zhengzheyang pseudomuto ciphertron awels hoskeri ttpp123 jprzychodzen manjitsin leonardpahlke ganjmonk nikhita aojea palnabarun azylinski juan-lee priyankasaggu11929 mushuee spiffxp cpanato mattcary maaobit ravish165 chazzrobbz mortent jsafrane peterbean410 beanopensource samruddhikhandale aramase ameukam aitchjoe cartermckinnon rjsadow iq-scm chendave borg-land mboersma haojue rodrigodelmonte hakuna-matatah kishen-v errordeveloper alifib susmani2026 rarchk jbtk

kubetest2's Issues

Node Tester Improvements

Tracker issue for improvements to the node tester to enable migration of CI jobs to kubetest2

xref: kubernetes/enhancements#2464

Error installing kubetest2

From the instructions to install kubetest2 I am running:

GO111MODULE=on go get sigs.k8s.io/kubetest2/...@latest

when I get error

go get: can't request version "latest" of pattern "sigs.k8s.io/kubetest2/..." that includes the main module (sigs.k8s.io/kubetest2)

what am I missing?

Support Version Marker Publishing

I'm migrating a few remaining kOps e2e jobs to kubetest2 and I'm facing a kubetest2 limitation that I'd like to address upstream. We have a pipeline of jobs using kubetest 1 that publishes version markers upon tests passing. These markers indicate a version of kOps to use in other kubetest2 jobs.

I'm hoping to have kubetest2-kops publish version markers but I'm facing the following challenges:

The Deployer interface doesn't know whether the tests passed or failed. The deployer knows the exact version to publish in a version marker, but it can't determine whether or not to publish the marker.
The Tester interface doesn't know the version to publish as a marker.

This likely relates to my #87 but I'm wondering if it would simpler for the Deployer to know whether the tester succeeded or failed. We could add an extension like the other DeployerWith interfaces, perhaps something like:

type DeployerWithTesterResults interface {
	OnSuccess() error
	OnFailure() error
}

I'm happy to open a PR if an implementation is agreed upon.

For reference, here is one of the remaining kOps jobs that uses this kubetest 1 functionality.

/cc @justinsb

Add a version command

kubetest2 --version should output the tag/commit that it's built from.

This is especially important since currently kubetest2 is used in a lot of place by go getting (mostly latest)
so knowing which exact binary(commit) was used is useful.
This will be further useful if it gets surfaced in testgrid metadata for any kubetest2 based job.

Additionally, this should be done for all the in-tree deployers and testers.

The version will also reflect the release once we move to tagged releases.
related: #17

A typical useful pattern is to inject and set package variables at build time through linker flags.
https://blog.alexellis.io/inject-build-time-vars-golang/

A good example can be found in the kind repo

https://github.com/kubernetes-sigs/kind/blob/fbf6cdcf5240c6b994657c9a4f85657ee0591331/Makefile#L57

https://github.com/kubernetes-sigs/kind/blob/fbf6cdcf5240c6b994657c9a4f85657ee0591331/pkg/cmd/kind/version/version.go

xref: kubernetes/enhancements#2464

/help
/good-first-issue

Add E2E tests for kubetest2 gke deployer

Since there has been active development on kubetest2 gke deployer, which has been used by multiple projects, it'll be good if we can have some e2e test coverage for it.

Document kubetest2 deployers

ref: #193 (comment)

Document:

we don't want any more in-tree kubetest2 deployers or testers (we may move some out eventually?)
well known out-of-tree kubetest2 deployers
- kubetest2-aks: https://github.com/kubernetes-sigs/cloud-provider-azure/tree/master/kubetest2-aks
- kubetest2-kops: https://github.com/kubernetes/kops/tree/master/tests/e2e/kubetest2-kops

/kind documentation
/sig testing
/priority important-longterm

Unable to go install kubetest2 with go 1.17

When using go install with go 1.17, dependencies that require go1.18 or above get pulled in

$ go version
go version go1.17.13 darwin/amd64
$ go install sigs.k8s.io/kubetest2/...@latest
...
# k8s.io/apimachinery/pkg/util/validation/field
go/pkg/mod/k8s.io/[email protected]/pkg/util/validation/field/errors.go:69:33: undefined: reflect.Pointer
note: module requires Go 1.19
# k8s.io/test-infra/prow/config/secret
...
go/pkg/mod/k8s.io/[email protected]/prow/config/secret/secret.go:34:32: too many errors
note: module requires Go 1.18

Note this doesn't happen with go 1.16 (used by kubernetes v1.22)

/kind bug
/sig testing
/priority important-soon

[GCE] support reusing existing ssh keys

/cc @spiffxp
xref: kubernetes/k8s.io#1751 (comment)

xref: kubernetes/enhancements#2464

Decouple RunDir from Artifacts

A tracking issue for: #91 (review)

RunDir: should be used for storing all files in general specific to a single run of kubetest2
Artifacts: should only be used intentionally for storing files which (in CI) get automatically uploaded to GCS.

also helps with: #97

/cc @justinsb @mkumatag

The logical separation was added in: #91
But currently RunDir (by default) is a subdirectory under artifacts, so there's no actual separation yet.

[GKE] Do not print the error log if GKE cluster creation fails but gets retried

If the GKE cluster creation fails because of some intermittent errors like GCE_STOCKOUT, the cluster creation will be retried in the next region/zone if multiple regions/zones are specified via the kubetest2-gke flag. In most cases the cluster creation will eventually succeed, but currently the error will be printed in the log which is confusing to the users, and it gets worse since the ERROR line is highlighted by Spyglass:

ERROR: (gcloud.beta.container.clusters.create) Operation [<Operation
 clusterConditions: [<StatusCondition
 canonicalCode: CanonicalCodeValueValuesEnum(UNAVAILABLE, 15)
 code: CodeValueValuesEnum(GCE_STOCKOUT, 1)
 message: "Instance 'gke-prow-test2-default-pool-4158b499-m5q4' creation failed: The zone 'projects/xxx/zones/us-central1-d' does not have enough resources available to fulfill the request.  '(resource type:compute)'.">, <StatusCondition
 canonicalCode: CanonicalCodeValueValuesEnum(UNAVAILABLE, 15)

One way to mitigate is do not print out the error if it gets retried. To do this, we need to make sure:

gcloud is printing out the log correctly to stdout and stderr, and we should only swallow the stderr
We still need to print out the error if it's a unretryable error, or it's the last region/zone that can be tried
(optional) ideally we should also print out the error regex that is matched, for ease of debugging

/assign

Should have a noop deployer

Currently kubetest2 requires a deployer entrypoint.
One common use case is to run tests on an existing cluster, for which the deployer is mostly a noop.
kubetest has something similar: https://github.com/kubernetes/test-infra/blob/master/kubetest/none.go

Unable to install kubetest2

I tried to install kubetest2 following the README.md. And I ran into issues with github.com/containers/image/v5/manifest

GO111MODULE=on go get sigs.k8s.io/kubetest2/...@latest                                                   

go: downloading sigs.k8s.io/kubetest2 v0.0.0-20210322183049-358715462ab9
go: downloading github.com/octago/sflags v0.2.0
go: downloading k8s.io/release v0.7.1-0.20210204090829-09fb5e3883b8
go: downloading k8s.io/test-infra v0.0.0-20200617221206-ea73eaeab7ff
go: downloading golang.org/x/sys v0.0.0-20201112073958-5cba982894dd
go: downloading cloud.google.com/go v0.51.0
go: downloading github.com/go-git/go-git/v5 v5.2.0
go: downloading github.com/containers/image/v5 v5.9.0
go: downloading github.com/shirou/gopsutil v0.0.0-20190901111213-e4ec7b275ada
go: downloading github.com/shirou/gopsutil/v3 v3.20.12
go: downloading golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9
go: downloading golang.org/x/oauth2 v0.0.0-20191202225959-858c2ad4c8b6
go: downloading github.com/google/go-github/v33 v33.0.0
go: downloading github.com/containers/libtrust v0.0.0-20190913040956-14b96171aa3b
go: downloading github.com/containers/ocicrypt v1.0.3
go: downloading github.com/docker/docker v1.4.2-0.20200309214505-aa6a9891b09c
go: downloading cloud.google.com/go/storage v1.0.0
go: downloading golang.org/x/net v0.0.0-20201110031124-69a78807bb2b
go: downloading github.com/klauspost/compress v1.11.3
go: downloading github.com/klauspost/pgzip v1.2.5
go: downloading github.com/ulikunitz/xz v0.5.8
go: downloading github.com/StackExchange/wmi v0.0.0-20190523213315-cbe66965904d
go: downloading google.golang.org/api v0.15.1
go: downloading google.golang.org/genproto v0.0.0-20200526211855-cb27e3aa2013
go: downloading golang.org/x/exp v0.0.0-20191227195350-da58074b4299
go: downloading golang.org/x/tools v0.0.0-20200616133436-c1934b75d054
go: downloading go.opencensus.io v0.22.2
go: downloading github.com/go-ole/go-ole v1.2.4
go: downloading github.com/golang/groupcache v0.0.0-20191227052852-215e87163ea7
# github.com/containers/image/v5/manifest
../../../pkg/mod/github.com/containers/image/[email protected]/manifest/oci.go:44:234: undefined: v1.MediaTypeImageLayerNonDistributableZstd
../../../pkg/mod/github.com/containers/image/[email protected]/manifest/oci.go:44: undefined: v1.MediaTypeImageLayerZstd
../../../pkg/mod/github.com/containers/image/[email protected]/manifest/oci.go:101:28: undefined: v1.MediaTypeImageLayerNonDistributableZstd
../../../pkg/mod/github.com/containers/image/[email protected]/manifest/oci.go:106:28: undefined: v1.MediaTypeImageLayerZstd

[GKE] cluster is ready

How do I ensure that the cluster is up and running properly?

EKS deployer

Hey guys, is there any plan on supporting an EKS deployer?

I looked at kubetest 1 in the past as a way test a standard runtime I put on my EKS clusters but have since not kept up with the project and just stumbled across this today.

Support deployers passing additional information to testers

Currently a tester is invoked with only arguments provided from the kubetest2 invocation (kubetest2 ... -- --tester-args-here) along with some environment variables. The deployer can optionally provide a kubeconfig file to the tester via an interface:

kubetest2/pkg/app/app.go

Lines 128 to 133 in d56fa28

    
           if dWithKubeconfig, ok := d.(types.DeployerWithKubeconfig); ok { 
        
           	if kconfig, err := dWithKubeconfig.Kubeconfig(); err == nil { 
        
           		envsForTester = append(envsForTester, fmt.Sprintf("%s=%s", "KUBECONFIG", kconfig)) 
        
           	} 
        
           }

As I build kubetest2 support for Kops I'm noticing that certain testers have flags whose values are only known by the deployer rather than the process invoking kubetest2.

For example a kubectl test relies on a Host flag, but the cluster's apiserver host name isn't known at the time that kubetest2 is invoked, so appending it to the end of the kubetest2 command isn't possible. Many of the cloudConfig flags in test_context.go have values that aren't known until the deployer has deployed the cluster too.

It would be nice to have a way for a deployer to provide a set of arguments and/or environment variables to the tester. One idea is to add a DeployerWithTesterArgs interface similar to the DeployerWithKubeconfig interface:

type DeployerWithTesterArgs interface {
	Deployer

	TesterArgs() []string
}

One concern I have with that is the deployer would need to know which tester is being used in order to provide it the necessary arguments. The tester interface doesn't provide any sort of identifier that we could pass into the new interface's function. We could add a Name to Tester:

type DeployerWithTesterArgs interface {
	Deployer

	TesterArgs(name string) []string
}

type Tester struct {
	Name string
	TesterPath string
	TesterArgs []string
}

but this increasingly tight coupling feels contradictory to kubetest2's objectives.

Any thoughts or ideas on how to provide testers with additional info they need from deployers? I'm happy to implement something if we reach a consensus.

User Documentation

There's no user documentation for kubetest2. I had to dig through source code to see what environment variables are passed to the test. Please add documentation for all environment variables that the test has available.

https://github.com/kubernetes-sigs/kubetest2/blob/master/pkg/app/app.go#L129

kubetest2-ginko tester overwrites local e2e.test

Expected behavior: kubetest2 builds and uses a local e2e.test
Actual behavior: kubetest2 builds a local e2e.test, but uses a different one.

I'm trying to test out a new e2e.test config flag (kubernetes/kubernetes#111481). I'm running the following kubetest2 command, inspired by the cloud-provider-gcp prow job:

GOPATH=/clank/go/src kubetest2 gce -v 6  --gcp-project mattcary-e2e-test --legacy-mode --repo-root /clank/go/src/k8s.io/kubernetes --build --up --down --test=ginkgo --node-size n1-standard-4 --master-size n1-standard-8 -- --parallel=30 --test-args='--minStartupPods=8 --ginkgo.flakeAttempts=3 --enabled-volume-drivers=gcepd' --skip-regex='\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\]' --focus-regex='\[Driver:.gcepd\]'

This is using kubetest2 at commit 2aac35a from 13 July.

The command above correctly builds e2e.test and puts it into a local runfiles. While the gce cluster is coming up, I can confirm that the e2e.test has my new flag.

However, somewhere around here in the logs it's replaced by some version pulled down from somewhere, instead of my local build:

I0728 15:11:35.330046  345616 ginkgo.go:121] Using kubeconfig at /clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/kubetest2-kubeconfig
I0728 15:11:37.148140  345616 package.go:199] Found existing tar at /usr/local/google/home/mattcary/.cache/kubernetes-test-linux-amd64.tar.gz
I0728 15:11:39.380846  345616 package.go:202] Validated hash for existing tar at /usr/local/google/home/mattcary/.cache/kubernetes-test-linux-amd64.tar.gz
I0728 15:11:41.857031  345616 package.go:165] Found existing kubectl at /clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/kubectl
W0728 15:11:43.508631  345616 package.go:171] sha256 does not match
Copying gs://kubernetes-release/release/v1.25.0-alpha.3/bin/linux/amd64/kubectl...
\ [1 files][ 42.0 MiB/ 42.0 MiB]                                                
Operation completed over 1 objects/42.0 MiB.                                     
I0728 15:11:46.098135  345616 ginkgo.go:91] Running ginkgo test as /clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/ginkgo [--nodes=30 /clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/e2e.test -- --kubeconfig=/clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/kubetest2-kubeconfig --kubectl-path=/clank/go/src/k8s.io/kubernetes/_rundir/af0111ff-bacd-4b37-92c3-e5516e3ff27e/kubectl --ginkgo.flakeAttempts=1 --ginkgo.skip=\[Slow\]|\[Serial\]|\[Disruptive\]|\[Flaky\]|\[Feature:.+\] --ginkgo.focus=\[Driver:.gcepd\] --report-dir=/clank/go/src/k8s.io/kubernetes/_artifacts --test.timeout=24h --minStartupPods=8 --ginkgo.flakeAttempts=3 --enabled-volume-drivers=gcepd]

That is, at the log message at 15:11:35 the correct e2e.test is in _rundir/af0111fff*, and by the log message at 15:11:26 its been replaced by a different one.

I've looked through the kubetest2 code and wonder if the problem is here: https://github.com/kubernetes-sigs/kubetest2/blob/master/pkg/testers/ginkgo/package.go#L105, although I'm not really sure what's going on.

kubetest2 should handle signals

When a test job times out, Prow sends a signal to the test process to allow it to clean up before it's terminated.

Ideally, kubetest2 would handle this signal, canceling any currently-running subprocess and running any cleanup that's necessary (e.g. tearing down resources if --down was provided or returning resources to boskos), but it appears this is not yet implemented.

[Proposal] gke deployer supports retrying when cluster creation failures happen

Background

kubetest2 gke deployer is now used by a few projects to create GKE clusters in the CI environment for test automation, and we've seen a few different errors which caused the cluster creation to be failed.

The most common errors we've seen are:

The zone xxx does not have enough resources available to fulfill the request. It corresponds to the GCE_STOCKOUT error. This error happens when there are not enough machines/resources in that data center to provision the requested nodes, and there is no way to predict the error since the stock in each region/zone is always changing dynamically.
All cluster resources were brought up, but: component "kube-apiserver" from endpoint xxx is unhealthy. It is a legitimate cluster bring up issue(?) as discussed in #85 (comment), and the reason for it is unknown. But usually when it happened, the similar cluster creation requests would have a large chance (~50%) to get the same error.
only x nodes out of y have registered; this is likely due to Nodes failing to start correctly. is similar as the 2nd error, and the reason for it is unknown either.

When these errors happen, usually they will have a wide impact - multiple PRs will be blocked since the presubmit test jobs fail, and simply rerunning the jobs won't fix them, hence directly impact the developers' productivity.

Solution

We've seen these errors in multiple projects where we use kubetest2 gke deployer, so we would prefer a solution in kubetest2 gke deployer rather than each project having its own workaround. Since these errors are not predictable and preventable, a mechanism to retry the cluster creation when these errors happen is proposed.

Details

These three errors happen during different stages of GKE cluster creation. The GCE_STOCKOUT error happens before the nodes are created, and the other two errors happen after the nodes are created.

For the GCE_STOCKOUT error, since we cannot predict when the region/zone will have enough resource to fulfill the request, the best strategy would be retrying the cluster creation request in a different region/zone. For the other two errors, since the cluster is already created, retrying with the same request will result in a duplicate cluster name error, so for ease of implementation and efficiency (deleting the broken cluster first will take some time), it's also more ideal to retry with a different region/zone.

Updates to the `Up` function

To support retrying in Up, two new flags --backup-regions and --backup-zones can be added. When cluster creation fails, the error messages can be checked to match with the error patterns that need to be retried. If there is a match, the cluster creation request will be retried in the next backup region/zone.

Updates to the `Down` function

To ensure proper cleanups, a map[string][]string data structure needs to be added to track which clusters have been created in each region/zone during the Up stage. And the Down function will iterate the map to delete all these clusters.

Multiple clusters in different regions/zones

Currently for multiple clusters creation, they are supposed to be created in the same region/zone since --region and --zone are only accepting a single string. In the future if we need the clusters to be created in different regions/zones, we can change --region and --zone to accept a list of strings, and the --backup-regions/zones can be changed to a list of lists correspondingly.

Other concerns

For multiple cluster creation scenario, there is a debate on whether we should only retry the failed cluster creation requests or all the requests. Under the current gke deployer implementation, since we always want the clusters to be created in the same region/zone, and we can ensure proper cleanups with the planned updates to Down, it's more preferred to retry all the cluster creation requests. In the future when we support creating them in different regions/zones, this logic be changed to only retry on the failed requests.

The error message in the JUnit xml result is not informative

When we use kubetest2 to run a test workflow, for each phase, it will add a test in the JUnit xml result if it fails, and include the error message, the main reason for adding this is for integration with Testgrid and Spyglass.

However, the error message for the test is something like exit status 255 and does not contain any information in detail, for example https://prow.knative.dev/view/gs/knative-prow/logs/ci-knative-client-continuous/1308147648712151041.

Two issues related to this:

Other phases like Up and Down also have this issue, it would be nice if some detailed information can be added in the error message.
Since most of the test flows will generate the JUnit xml files themselves (lots of the projects use gotestsum), is adding the Test phase into the JUnit result a bit noisy? (It is already causing a bit confusion around the Knative community).

/kind bug

kubetest2-gke deployer should accept major.minor version

kubetest2/kubetest2-gke/deployer/up.go

Line 277 in 96ab8c9

re, err := regexp.Compile(`(\d)\.(\d)+\.(\d)*(.*)`)

This is too strict. 1.18 is a valid cluster version.

/cc @chizhg @joshua-bone

Don't extract e2e.test and ginkgo binary into artifacts.BaseDir()

These two binaries are extracted under the Artifacts folder and collected post-run and this end up utilizing a lot of space in the GCS which is not required as nothing else is captured, hence it will be good if we can put them into some temp folder and trigger from there

Kubetest2 should support ginkgov2 for 1.25+ and ginkgov1 for 1.24-

FYI this causes problems when testing older k8s versions, see #204

Originally posted by @rifelpet in #200 (comment)

[GKE/EXE] exe tester does not accept multiple arguments

Hi all

I am getting an intermittent error running this

kubetest2 gke --cluster-name kubetest --project chris-love-operator-playground --zone us-central1-a --ignore-gcp-ssh-key --up --down --test=exec -- bazel test --stamp //e2e/... --test_arg=--pvc=true

The error is

I0317 14:01:41.262479 1782430 request.go:655] Throttling request took 1.046899937s, request: GET:https://34.68.57.200/apis/apps/v1?timeout=32s
panic: failed to fetch preferred server resource: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request

I am thinking that the GKE cluster is not up fully?

Unsafe concurrent use of bytes.Buffer

https://github.com/kubernetes-sigs/kubetest2/blob/master/pkg/process/junitexec.go#L59-L61 uses the same bytes.Buffer in two output streams, which means it can be written to concurrently. That is not threadsafe.

This panics randomly with changes made to bytes.Buffer growing in go1.19.

The code should be changes to make use of the buffer threadsafe or use two separate buffers

ginkgo - Support downloading kubernetes client tar

Currently the gingko tester supports downloading the e2e test package tar from a GCS bucket to use a specific version of the test package.

For test runs in which k/k is not being built it would be useful to support the same functionality for the kubernetes client package. This way we can align kubectl + kubernetes + e2e.test versions.

I'm implementing a kubetest2 deployer for kops and noticed this discrepancy because the prow image our jobs use has a 1.20 kubectl in its PATH and a text change in 1.20 is causing the 1.19 e2e tests to fail. The original kubetest downloaded kubectl via get-kube.sh and I think it would be useful to support similar functionality in kubetest2.

Running kubetest2-gke --down separately, does not delete clusters create by --up previously

Hi all

I am running kubetest2 gke --down --cluster-name bazel-test --project chris-love-operator-playground --zone us-central1-a --ignore-gcp-ssh-key and the gke cluster is not getting deleted.

kubetest2 gke --up --cluster-name bazel-test --project chris-love-operator-playground --zone us-central1-a --ignore-gcp-ssh-key created the cluster

Remove dependence on boskos and test-infra

Filing this issue to remove dependency of kubetest2 on boskos and in turn test-infra.

Thoughts from @BenTheElder on #207 (comment):

The best answer is probably copying in our own boskos client like @munnerz did in test-infra, boskos client is pretty simple and stable, and the API is not really changing.

Can we remove the hardcoded clusterloader2 folder name appended to $ARTIFACTS in --report-dir ?

Current situation -

when a perf test is executed using kubetest2-tester-clusterloader2 via a prow job, the results of the test are created in $artifacts/clusterloader2 which cannot be read by perfdash (perfdash looks for results inside the artifacts folder only)
Upon reading further, it seems that the concatenation of clusterloader2 to $artifacts is hardcoded and cannot be bypassed in any way and --report-dir is not longer a valid argument for the clusterloader2 tester.
Refer -

kubetest2/pkg/testers/clusterloader2/cl2.go

Line 81 in f4c453f

"--report-dir=" + filepath.Join(os.Getenv("ARTIFACTS"), "clusterloader2"),

Why is this required?
We are trying to execute these perf tests on ppc64le arch Cloud VMs and have perfdash read the test result data.
With the mandatory clusterloader2 folder being created inside the $arifatcs folder, perfdash is no longer able to read test data.

I understand this may have been done originally with a purpose but wanted to check if we can make this concatenation optional so that someone can bypass the clusterloader folder creation inside $ARTIFACTS folder if and when required?

Note: I could not find a way to bypass this but please let me know if there is any way to do so in the existing code.

Unable to run kubetest2 <deployer> --test more than once from same path

Summary: We are unable to use kubetest2 to test kubernetes more than once from a same path.

metadata.json generated by kubetest2 command is objecting to run kubetest2 second time from same path stating the following error:
Example using gingko tester:

root@kubetest2:/workspace# kubetest2 tf --test=ginkgo -- --flake-attempts 2 --focus-regex='ariable Expansion should verify that a failing subpath expansion can be modified during the lifecycle of a container'
F1210 11:00:28.745660   14208 ginkgo.go:205] failed to run ginkgo tester: key tester-version already exists in the metadata
Error: exit status 255
root@kubetest2:/workspace#

Expected result: Should be able to run kubetest2 any number of times from a given path. There should ether be a flag to ignore metadata.json or overwrite the metadata with latest kubetest2 run

In order to run k8s conformance tests by following option B at https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md#running-conformance-tests-with-kubetest, we are required to trigger kubetest2 command twice.

failed to run ginkgo tester: key tester-version already exists in the metadata
Error: exit status 255

is the error thrown when run from same path.

Note: This started happening after the below commit to fail if key is already present in metadata.json.
ac36e32
Expect this check to be more liberal https://github.com/kubernetes-sigs/kubetest2/blob/master/pkg/metadata/metadata.go#L49

Sample job that threw this error: https://prow.ppc64le-cloud.org/view/gs/ppc64le-kubernetes/logs/test-periodic-kubernetes-conformance-test-ppc64le/1468899640811196416

Implement minikube deployer

Would be great to have a minikube deployer, as an alternative to kind for local development.

/help

Add Cluster API deployer

Breaking out of #45 into a new issue

Creating a Cluster API deployer for kubetest would present a number of benefits:

Being able to run kubetest across any Cluster API infrastructure implementation.
Allows for minimal infrastructure specific code in this repo
Expands testing possibilities (e.g. routable ipv6 in AWS)

At a high-level, the deployer would need to do things along the following lines:

Start a kind cluster, or re-use one with a known good Kubernetes version
Allow user of kubetest to pass in as input: Cluster API versions, infrastructure provider, cluster sizing, etc...
Install Cluster API components
Deploy cluster and wait
Pass kubeconfig to kubetest to run

Cluster API currently has its own test framework, which is importable in library form to do e2e testing, bits of which can be leveraged for testing.

Cluster API providers such as that for AWS also additionally have an inverted workflow in order to run kubetest against main branches of Cluster API itself for conformance.

[GCE] add support for extracting pre-existing builds from GCS

/cc @spiffxp

xref: kubernetes/test-infra#21180 (review)
xref: kubernetes/enhancements#2464

Shadow canary job for pull-kubernetes-node-e2e

Issue to track improvements needed to migrate one of the kubernetes presubmit-blocking jobs as part of #164 , kubernetes/enhancements#2464

Testgrid: https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-node-e2e

Existing kubetest prowjob definition: https://github.com/kubernetes/test-infra/blob/3c87dfedd57e2dcad764a0d7fbb4343c7ca02a24/config/jobs/kubernetes/sig-node/sig-node-presubmit.yaml#L43-L88
(good to use the podutil one as the base, as we want to migrate away from bootstrap to podutil as well kubernetes/test-infra#20760)

example kubetest2 canary job (for a different gce job): https://github.com/kubernetes/test-infra/blob/3c87dfedd57e2dcad764a0d7fbb4343c7ca02a24/config/jobs/kubernetes/sig-cloud-provider/gcp/gcp-gce.yaml#L103-L147

Determine what subset of node e2e tests we run as part of that job
Add flags to the node tester to support the subset of parameters we invoke as part of this job
Create an equivalent kubetest2 invocation using the noop deployer and node tester to be used as part of the new job

cc @dims @Namanl2001

eks-tester pulling in kubernetes/kubernetes makes the main module hard to depend on

kubetest2 is going to be a very difficult dependency to work with when the EKS provider pulls in kubernetes/kubernetes at a particular version. kubernetes/test-infra#17869 (comment)

one way or another this should be removed from the main module, which should NOT depend on kubernetes/kubernetes

/assign @amwat

Support GCE PD testing

kubetest2 does not plumb env vars to its dependent processes as described here: https://github.com/kubernetes-sigs/kubetest2/blob/master/kubetest2-gce/deployer/common.go#L89.

From the comment that seems mostly an aesthetic decision, and a desire to understand the API between kubetest2 and the tests it runs rather than opening the door to promulgating a whole bunch of legacy crufty decisions.

This is currently causing problems due to kubernetes/kubernetes#109541, which gates GCE PD testing on an environment variable due to CSI migration (which implements the storage side of the cloud provider extraction project).

The result has been that GCE PD testing is not possible or is very difficult, see kubernetes/test-infra#26890.

Since enabling env var support in kubetest may be a long discussion, I'll start by adding a flag for GCE PD support. But this issue may be a good place for revisiting env var support in kubetest2, as that might also be needed if cluster creation needs more configuration (with cloud provider tests moving out of k/k, a lot of things that just worked by default may now need additional configuration).

Implement AKS deployer

Would be great to have an AKS deployer available in kubetest2. That will help us integrate with e2e-framework as we were discussing here

/help

bug: `make quick-verify` fail on local test

When try to test kubetest2 on kind and run make quick-verify, the problem come with

kubetest2 kind --up --down --test=exec -- kubectl get all -A
I0607 15:39:30.357663    6091 app.go:59] RunDir for this run: "/home/shuyang/go/src/sig.k8s.io/kubetest2/_artifacts/41593815-2b32-49b1-a611-7d59891e4d15"
I0607 15:39:30.358923    6091 app.go:90] ID for this run: "41593815-2b32-49b1-a611-7d59891e4d15"
I0607 15:39:30.358952    6091 up.go:62] Up(): creating kind cluster...
Creating cluster "" ...
ERROR: failed to create cluster: '' is not a valid cluster name, cluster names must match `^[a-zA-Z0-9_.-]+$`
I0607 15:39:30.439751    6091 down.go:32] Down(): deleting kind cluster...
Deleting cluster "" ...
Error: exit status 1
Makefile:82: recipe for target 'quick-verify' failed
make: *** [quick-verify] Error 1

Find from here, that somewhat kind name is set to empty

kubetest2/kubetest2-kind/deployer/deployer.go

Lines 49 to 61 in 1eb0d61

    
           type deployer struct { 
        
           	// generic parts 
        
           	commonOptions types.Options 
        
           	// kind specific details 
        
           	NodeImage      string `flag:"image-name" desc:"the image name to use for build and up"` 
        
           	ClusterName    string `flag:"cluster-name" desc:"the kind cluster --name"` 
        
           	BuildType      string `desc:"--type for kind build node-image"` 
        
           	ConfigPath     string `flag:"config" desc:"--config for kind create cluster"` 
        
           	KubeconfigPath string `flag:"kubeconfig" desc:"--kubeconfig flag for kind create cluster"` 
        
           	KubeRoot       string `desc:"--kube-root for kind build node-image"` 
        
           	logsDir string 
        
           }

So I need to run kubetest2 kind --up --cluster-name "name" --down --test=exec -- kubectl get all -A to fix the problem.

port this repo to gimme

kubernetes-sigs/kind#1769

cc @amwat @michaelmdresser

Kubetest2 doesn not down on ginkgo failures

Previously, kubetest2 would have a defer shouldDown() that ensured clusters were always cleaned up after run regardless of outcome given that --down flag was supplied.

d9ca6c8 seems to have removed that defer and instead only ensures down is executed on certain signals. If ginkgo returns an error, shouldDown will never be run.

The effect of this is rather massive resource leakage in the kOps AWS account.

We'd like to understand if this is intended behavior or if this is a bug.

/cc @ShwethaKumbla

kubetest2-openshift - proposed new feature

Hi all

I am working with Coachroach Labs and their operator, and we need to spin up an open shift cluster for e2e testing. We are using kubetest2 for both kind and gke currently and it is working great!

Is anyone opposed to having a kubetest2 deployer that wraps the OpenShift installer binary? I will not vendor any OpenShift specific dependencies.

How to set the number of worker nodes while creating a test cluster with kind deployer?

Hi, I tried to run some e2e tests with the kind deployer using a command like this:

kubetest2 kind --build --up --test=ginkgo --cluster-name new-test -- --focus-regex="\[Feature:Performance\]"

And I get an error:

Failure [0.012 seconds]
[BeforeSuite] BeforeSuite 
test/e2e/e2e.go:77

  Mar 24 16:30:10.649: Unexpected error:
      <*errors.errorString | 0xc002da2280>: {
          s: "there are currently no ready, schedulable nodes in the cluster",
      }
      there are currently no ready, schedulable nodes in the cluster
  occurred

I inspected the only node created and it has a taint:

Name:               new-test-control-plane
Roles:              control-plane
...
Taints:             node-role.kubernetes.io/control-plane:NoSchedule

Although the tests will start if I remove the taint manually, but is there a better way to increase the number of worker nodes? I can't find it in the documentation. Please advise.

Ginkgo v2 flag changes break testing older k8s versions

kOps uses kubetest2 to test all supported k8s versions. The flag changes from upgrading to ginkgo v2 (#200) impact all usage of kubetest2. Because ginkgo was only upgraded to v2 in k/k's master branch, older k8s versions still use ginkgo v1. This inconsistency results in failures to test older k8s versions.

example with k8s 1.22 from kubernetes/kops#14061

	 -------------------------------------------------------------------
	|                                                                   |
	|  Ginkgo timed out waiting for all parallel nodes to report back!  |
	|                                                                   |
	 -------------------------------------------------------------------

 e2e timed out. path: ./_rundir/d40e3fa1-0f35-11ed-80a5-ea2e797f9e54
[1] flag provided but not defined: -ginkgo.flake-attempts
[1] Usage of /home/prow/go/src/k8s.io/kops/_rundir/d40e3fa1-0f35-11ed-80a5-ea2e797f9e54/e2e.test: 
...

Can we make kubetest2's ginkgo flag changes conditional based on the k8s version being tested? Or what is the expected path forward to use the latest kubetest2 while testing older supported k8s versions?

Should binaries have there own repo or how do we manage deps better?

I think we have an interesting problem here with dependencies. I am wondering if we have binaries like say hypothetically kubetest2-kops in its own repo or hosted in the kops repo. Dependencies are nontrial to manage with the k8s projects, which we already saw with us removing the kubetest2-eks.

Thoughts?

--use-built-binaries for tests doesn't work with kind deployer

Setting the --use-built-binaries flag for running tests doesn't seem to be working when using the kind deployer because the kind node image builder doesn't bundle the e2e.test binary.

I wasn't sure if this was expected behavior but being able to run a single command to build a kind node image, start the cluster, and then run tests would be useful:

kubetest2 kind --build --test=ginkgo --up --kube-root=$(pwd) --config=./kind-config.yaml -- --focus-regex=<regex> --use-built-binaries

set SOURCE_DATE_EPOCH from repo when building

for reproducible builds kubernetes/test-infra#18706 (comment)

kubetest2 should do this internally based on the repo root.

Documentation and Examples

If these exist, please close this issue, but I am having a rough time finding documentation and examples. This project looks AWESOME, but I would love to use it, and I am uncertain how to. I have figured out that I can do this:

kubetest2 kind --up --down --test=exec -- kubectl get all -A

But what is the best pattern to use inside of a go unit test? Is this the pattern?

kubetest2 kind --up

Run my test code. And then

kubetest2 kind --down

Rather than running it directly, should I vendor kube2test and use it programmatically?

Thanks in advance!

[GCE] [Failing Test] conformance nodePort tests

https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master%20-%20kubetest2

xref: kubernetes/enhancements#2464

After moving to config-test.sh in #96 the default GCE network name changed from default to e2e-test-${USER}
so changes initially added in #34 are not taking effect.

/cc @BenTheElder @spiffxp

testsuite name does not show up in spyglass JUnit lens

The testsuite names shows up in testgrid https://testgrid.k8s.io/conformance-all#Conformance%20-%20GCE%20-%20master%20-%20kubetest2

but not in the spyglass lens
https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-gce-conformance-latest-kubetest2/1368978837181829120

/cc @chizhg @spiffxp

xref: kubernetes/enhancements#2464, kubernetes/test-infra#21180 (comment) ,

Release

Do we have a release or beta release cut of this? Where can I download it from?

	if dWithKubeconfig, ok := d.(types.DeployerWithKubeconfig); ok {
	if kconfig, err := dWithKubeconfig.Kubeconfig(); err == nil {
	envsForTester = append(envsForTester, fmt.Sprintf("%s=%s", "KUBECONFIG", kconfig))
	}

	}

	type deployer struct {
	// generic parts
	commonOptions types.Options
	// kind specific details
	NodeImage string `flag:"image-name" desc:"the image name to use for build and up"`
	ClusterName string `flag:"cluster-name" desc:"the kind cluster --name"`
	BuildType string `desc:"--type for kind build node-image"`
	ConfigPath string `flag:"config" desc:"--config for kind create cluster"`
	KubeconfigPath string `flag:"kubeconfig" desc:"--kubeconfig flag for kind create cluster"`
	KubeRoot string `desc:"--kube-root for kind build node-image"`

	logsDir string
	}

kubernetes-sigs / kubetest2 Goto Github PK

kubetest2's Introduction

kubetest2

Concepts

Installation

Usage

Reference Implementations

External Implementations

Support

Contact

Code of conduct

kubetest2's People

Contributors

Stargazers

Watchers

Forkers

kubetest2's Issues

Background

Solution

Details

Updates to the Up function

Updates to the Down function

Multiple clusters in different regions/zones

Other concerns

Recommend Projects

Recommend Topics

Recommend Org

Updates to the `Up` function

Updates to the `Down` function