kubernetes / test-infra

Test infrastructure for the Kubernetes project.

License: Apache License 2.0

Shell 10.12% Python 25.28% CSS 0.25% Makefile 1.57% HTML 0.99% JavaScript 2.01% Go 52.19% Dockerfile 1.54% TypeScript 0.54% Jsonnet 2.71% HCL 0.13% Jinja 0.35% Smarty 2.32%
k8s-sig-testing

test-infra's Issues

test-history/gen_json script should not depend on accessing Jenkins

It would be useful if, like the munger, the test-history scripts depended solely on GCS buckets as input. This would allow federating test results on the dashboard, not just in PR statuses, via the familiar GCS bucket format.

Right now, the Jenkins server is queried to list job names, their builds (with status), and their timestamps. These can be replaced with, respectively, a config file mapping job names to GCS paths, reading build numbers from the bucket, and parsing started.json + finished.json.
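
A rough sketch of the direction, assuming a simple config file that maps job names to GCS paths and the existing started.json/finished.json layout (the bucket path, config format, and build number 1234 below are illustrative, not a final format):

    # Illustrative config entry: map job name -> GCS path, e.g.
    #   {"kubernetes-e2e-gce": "gs://kubernetes-jenkins/logs/kubernetes-e2e-gce"}
    job_path=gs://kubernetes-jenkins/logs/kubernetes-e2e-gce
    # Build numbers come from listing the bucket instead of asking Jenkins...
    gsutil ls "${job_path}/" | grep -o '[0-9]\+/$' | tr -d /
    # ...and status/timestamps come from the uploaded result files.
    gsutil cat "${job_path}/1234/started.json"    # {"timestamp": ...}
    gsutil cat "${job_path}/1234/finished.json"   # {"result": "SUCCESS", ...}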

add Jenkins metadata to GCE VMs

When trying to clean up old VMs or other resources, I'm often left wondering "where did this even come from?".

We could probably add metadata describing the Jenkins job and build number that spawned the VM, as well as the PR# on PR Jenkins. There's even an add-instance-metadata function in cluster/gce/util.sh we can use.
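
For example (a sketch only; whether we go through add-instance-metadata in cluster/gce/util.sh or call gcloud directly, and the exact key names, are open questions):

    # JOB_NAME and BUILD_NUMBER are the standard Jenkins environment variables;
    # ghprbPullId is set by the GitHub PR builder plugin on PR Jenkins.
    gcloud compute instances add-metadata "${instance}" --zone "${ZONE}" \
        --metadata "jenkins-job=${JOB_NAME},jenkins-build=${BUILD_NUMBER},jenkins-pr=${ghprbPullId:-none}"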

Federation e2e failure: wrong ci build version

From the kubernetes-e2e-gce-federation job logs:

++ gsutil cat gs://kubernetes-release/ci/latest.txt
+ build_version=v1.3.0-beta.0
+ echo 'Using published version ci/v1.3.0-beta.0 (from ci/latest)'
+ fetch_tars_from_gcs ci v1.3.0-beta.0
+ local -r bucket=ci
+ local -r build_version=v1.3.0-beta.0

This is not the build_version that kubernetes-federation-build is pushing, which naturally causes the downstream kubernetes-e2e-gce-federation job to pull the wrong tarballs and fail.

I don't yet understand why this issue took so long to pop up, as the federation stuff has been merged for weeks and this started happening a few days ago.

Auto-file issues for all broken tests

We've had tests and even entire test suites broken for days, weeks, even months and nobody noticed. @lavalamp suggested that we could auto-file issues for all broken tests, as we do for flaky tests. That seems like a good idea to me.

federation e2e gce automated tests on Jenkins fail consistently with token auth attempt failed with status: 403 Forbidden

+++ [0829 21:48:48] Pushing gcr.io/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube:v1.4.0-alpha.3.197_ef82f394a9e1ba
-> GCR repository detected. Using gcloud
@nikhiljindal I think you know about this, but just so we don't lose track of it, here's an issue to track it.

See kubernetes/kubernetes#31655 (comment) for an example...

@k8s-bot federation gce e2e test this

The push refers to a repository [gcr.io/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube](len: 1)
6864c6906300: Preparing
Post https://gcr.io/v2/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube/blobs/uploads/: token auth attempt for registry: https://gcr.io/v2/token?account=oauth2accesstoken&scope=repository%3Ak8s-jkns-pr-bldr-e2e-gce-fdrtn%2Fhyperkube%3Apush%2Cpull&service=gcr.io request failed with status: 403 Forbidden
!!! Error in ./build/../build/../federation/cluster/common.sh:321
'gcloud docker push "${docker_image_tag}"' exited with status 1
Call stack:
1: ./build/../build/../federation/cluster/common.sh:321 push-federation-images(...)
2: ./build/push-federation-images.sh:29 main(...)
Exiting with status 1
Build step 'Execute shell' marked build as failure
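
If this turns out to be missing push permissions on the PR builder project, one possible remedy (a sketch only; the service-account address is a placeholder, and it assumes gcr.io/k8s-jkns-pr-bldr-e2e-gce-fdrtn is backed by the usual artifacts.<project>.appspot.com bucket):

    # Grant the account doing the push write access to the GCS bucket that
    # backs the project's gcr.io registry.
    gsutil acl ch \
        -u pr-builder@k8s-jkns-pr-bldr-e2e-gce-fdrtn.iam.gserviceaccount.com:W \
        gs://artifacts.k8s-jkns-pr-bldr-e2e-gce-fdrtn.appspot.com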

metadata cache server curl check doesn't work

The curl check in the metadata cache control script doesn't work, as curl will fail over to the real metadata server:

$ curl -v http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip
* About to connect() to metadata.google.internal port 80 (#0)
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 169.254.169.254...
* connected
* Connected to metadata.google.internal (169.254.169.254) port 80 (#0)
> GET /computeMetadata/v1/instance/network-interfaces/0/ip HTTP/1.1
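
The check likely needs to pin the address it is probing, so a refused connection fails the check instead of falling through to 169.254.169.254. For instance (the cache address below is just the VM IP from the log above, purely illustrative):

    # Resolve metadata.google.internal only to the cache's address, so curl
    # cannot fall back to the real metadata server when the cache is down.
    curl -sf --resolve metadata.google.internal:80:10.240.0.2 \
        http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip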

Cannot merge PR

My PR #105 cannot be merged because of some problems with the CLA check. Although I work at Google, the bot added the CLA:NO label, and manually modifying the labels didn't make my PR mergeable.

CC @gmarek

Cross-link gubernator pages

Pages should be discoverable through browsing.

  • / to /pr
  • /pr/1345 to /pr/user
  • /build/$PR_LOGS/... to /pr/user
  • /pr/user to /pr/123? Currently links to GitHub directly, but we have a better way to visualize the test results.

404 on getting dockerized-e2e-runner.sh in all kubernetes builds

++ curl -fsS --retry 3 https://raw.githubusercontent.com/kubernetes/kubernetes/test-infra/jenkins/dockerized-e2e-runner.sh
curl: (22) The requested URL returned error: 404

The correct link should presumably be https://raw.githubusercontent.com/kubernetes/test-infra/master/jenkins/dockerized-e2e-runner.sh; the failing URL points at a nonexistent test-infra branch of kubernetes/kubernetes instead of the master branch of the test-infra repo.

/cc @k8s-oncall

Federation e2e tests failing: pulling ci tarball from wrong bucket.

From kubernetes-e2e-gce-federation logs:

+ local -r bucket=kubernetes-release-dev
++ gsutil cat gs://kubernetes-release-dev/ci/latest.txt
+ build_version=v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ echo 'Using published version kubernetes-release-dev/v1.4.0-alpha.0.1035+d30fd0cb0c23ab (from ci/latest)'
+ fetch_tars_from_gcs gs://kubernetes-release-dev/ci v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ local -r gspath=gs://kubernetes-release-dev/ci
+ local -r build_version=v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ echo 'Pulling binaries from GCS; using server version gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab.'
+ gsutil -mq cp gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab/kubernetes.tar.gz gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab/kubernetes-test.tar.gz .
Using published version kubernetes-release-dev/v1.4.0-alpha.0.1035+d30fd0cb0c23ab (from ci/latest)
Pulling binaries from GCS; using server version gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab.

It has pulled a tarball published by kubernetes-build, not kubernetes-federation-build.

That later causes this error:

FATAL: tagfile /workspace/kubernetes/hack/e2e-internal/../../cluster/../cluster/gce/../../cluster/gce/../../cluster/../federation/manifests/federated-image.tag does not exist. Make sure that you have run build/push-federation-images.sh

I've fixed this error once before ( #146 ) by having kubernetes-federation-build and kubernetes-e2e-gce-federation use an entirely separate ci bucket, so something must have changed since that last PR was merged.
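
If we return to the separate-bucket approach, the fetch step has to read that bucket's own latest marker instead of kubernetes-release-dev; roughly (the bucket name here is illustrative):

    # Hypothetical: pull from the bucket kubernetes-federation-build publishes
    # to, not the shared CI bucket.
    FEDERATION_GCS=gs://kubernetes-federation-release-dev/ci
    build_version=$(gsutil cat "${FEDERATION_GCS}/latest.txt")
    gsutil -mq cp \
        "${FEDERATION_GCS}/${build_version}/kubernetes.tar.gz" \
        "${FEDERATION_GCS}/${build_version}/kubernetes-test.tar.gz" .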

/cc @quinton-hoole @nikhiljindal @ixdy @spxtr

cluster logs not collected from dockerized e2e on timeout

The kubernetes tarball is extracted inside the container in dockerized e2e, which gives us kubernetes/cluster/log-dump.sh. On timeout, we try to call log-dump.sh, but do so outside the container, so it's no longer available.

We should probably move the timeout handling inside the dockerized e2e container.
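
One possible shape for that (a sketch, assuming the runner inside the image is where the limit gets enforced; the variable names and paths are illustrative):

    # Enforce the timeout inside the container, where the extracted
    # kubernetes/cluster/log-dump.sh still exists, and dump logs on expiry.
    if ! timeout -k 5m "${E2E_TIMEOUT_MINUTES}m" go run ./hack/e2e.go -v --test; then
        ./cluster/log-dump.sh "${WORKSPACE}/_artifacts" || true
    fi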

investigate docker-in-docker brokenness with kubekins-test and docker 1.11.1

As part of the Jenkins VM rebuild today, some nodes were upgraded to docker 1.11.1, instead of the 1.9.1 we'd been using before.

It seems that this causes problems for docker-in-docker in our kubekins-test image:

Verifying ./hack/../hack/verify-api-reference-docs.sh
Note: This assumes that swagger spec has been updated. Please run hack/update-swagger-spec.sh to ensure that.
Generating api reference docs at /go/src/k8s.io/kubernetes/_output/generated_html
Reading swagger spec from: /var/lib/jenkins/workspace/kubernetes-pull-test-unit-integration@2/api/swagger-spec/
docker: error while loading shared libraries: libltdl.so.7: cannot open shared object file: No such file or directory
!!! Error in ./hack/update-api-reference-docs.sh:71
  'docker run ${user_flags} --rm -v "${TMP_IN_HOST}":/output:z -v "${SWAGGER_PATH}":/swagger-source:z gcr.io/google_containers/gen-swagger-docs:v5 "${SWAGGER_JSON_NAME}" "${REGISTER_FILE_URL}"' exited with status 127
Call stack:
  1: ./hack/update-api-reference-docs.sh:71 main(...)
Exiting with status 1
!!! Error in ./hack/../hack/verify-api-reference-docs.sh:34
  '"./hack/update-api-reference-docs.sh" "${OUTPUT_DIR}"' exited with status 1
Call stack:
  1: ./hack/../hack/verify-api-reference-docs.sh:34 main(...)
Exiting with status 1
FAILED   ./hack/../hack/verify-api-reference-docs.sh    1s

@bprashanth
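
The missing libltdl.so.7 suggests the docker 1.11 client we now run inside kubekins-test is dynamically linked against libltdl (the 1.9.1 binary didn't need it). A minimal sketch of a fix, assuming the image is Debian/Ubuntu-based:

    # In the kubekins-test image build: install the shared library the
    # docker 1.11 client appears to require.
    apt-get update && \
        apt-get install -y --no-install-recommends libltdl7 && \
        rm -rf /var/lib/apt/lists/*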

somehow indicate which JUnit file a test failure came from

Motivating example: unit/integration test runs like https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/27600/kubernetes-pull-test-unit-integration/32233 have multiple JUnit files, with associated "verbose output" text files that can help debug test failures.

Rather than searching through each file, it'd be nice to know which one to go to for the verbose output. (Maybe even link directly to that file if it exists? May be getting too specific though.)

move federated test result config somewhere more prominent and make everything use it

jenkins/test-history/buckets.json is sort-of the source of truth for which buckets we care about, except that there is also configuration in gubernator/main.py, jenkins/test-history/gen_json.py, the submit queue, and testgrid. (And maybe other places, who knows.)

It'd be nice if we moved the configuration somewhere more prominent (maybe even top-level?) and then got all of our tooling using it.

It should also be well-documented.

(It'd be a good idea to add owners for each of the various builds at that time, too.)
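
Purely to illustrate the direction (the key layout assumed for buckets.json below is a guess, not its actual schema): once there's a single well-known config, tools can iterate over it instead of hard-coding their own bucket lists, e.g.:

    # Hypothetical consumer of a promoted, top-level buckets config, assuming
    # an object keyed by gs:// paths.
    jq -r 'keys[]' buckets.json | while read -r bucket; do
        gsutil ls "${bucket}" | head -n 3
    done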

[gubernator] FR: expand skipped lines

Feature request: expand skipped lines in gubernator logs.

E.g.

stderr: fatal: reference is not a tree: e5c3111e8dcb432df435dab96d7a19641adf0562

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1719)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:63)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:1984)
... skipping 9 lines ...
    at java.lang.Thread.run(Thread.java:745)
[xUnit] [INFO] - Starting to record.

Make the "... skipping 9 lines ..." marker clickable, and expand the lines in place (e.g. unhide a hidden block).

/cc @mnshaw @rmmh

Add a build job for kops Docker images

I'd like to add a build job to pump out kops builds, so I can start using it for AWS bring-up on Jenkins as well. I recently pushed a PR to that repo to build an easy container for kops (just to avoid figuring out exactly how to package/release it just yet), but then we need to figure out how to push builds somewhere. This isn't hard, but right now gcr.io/google-containers is locked down, so a build job can't actually push there.

So here's a suggested route, written up as an issue since about half of this isn't covered by code approvals:

  • Create a kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com service account.
  • Give kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com rights just to push to the gcr.io bucket for the kubernetes-jenkins project itself, i.e. gcr.io/kubernetes-jenkins
  • Use that in a new job to build/push kops.
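
Concretely, the first two bullets could look something like this (the exact IAM mechanism is negotiable; this assumes gcr.io/kubernetes-jenkins is backed by the standard artifacts.<project>.appspot.com bucket):

    # Create the builder service account in the kubernetes-jenkins project...
    gcloud iam service-accounts create kubekins-image-builder \
        --project kubernetes-jenkins --display-name "kubekins image builder"
    # ...and grant it write access only to that project's GCR backing bucket.
    gsutil acl ch \
        -u kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com:W \
        gs://artifacts.kubernetes-jenkins.appspot.com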

I did consider a couple of alternate routes:

  1. Giving kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com rights to google-containers. Rejected because this gives anyone with https://github.com/kubernetes/test-infra or Jenkins access an easy way to trash a production bucket.
  2. Creating another project. I'm mostly indifferent to naming, so if someone wants CI docker pushes to go somewhere else, find a project name that's not taken and we can work on that.

cc @kubernetes/test-infra-maintainers @justinsb

Record/display cluster vital statistics at a glance for each run

Request: After a cluster has been brought up, record:

  • the actual cluster version the cluster thinks it's running (not the version we attempted to launch; these can sometimes differ if there's a bug/misconfiguration in a GKE test, for instance)
  • the docker version
  • the kernel uname string of the nodes
  • ... etc.

and be able to show those at a glance. I suspect a lot of this could be done with log post-processing, but some of it is difficult to find at all.
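
Most of this is queryable right after bring-up, so the job could simply snapshot it into the run's artifacts; a sketch (the output path is illustrative, the field names are the standard node status.nodeInfo fields):

    # Record server/client versions plus per-node kubelet, container runtime
    # and kernel versions alongside the other run artifacts.
    kubectl version > "${ARTIFACTS}/cluster-versions.txt"
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kubeletVersion}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}' \
        >> "${ARTIFACTS}/cluster-versions.txt"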

cc @cjcullen
