kubernetes / test-infra

Test infrastructure for the Kubernetes project.

License: Apache License 2.0

Shell 10.37% Python 25.13% CSS 0.25% Makefile 1.56% HTML 0.99% JavaScript 2.00% Go 51.93% Dockerfile 1.53% TypeScript 0.54% Jsonnet 2.70% HCL 0.13% Jinja 0.35% Smarty 2.53%
k8s-sig-testing

test-infra's Introduction

test-infra

This repository contains tools and configuration files for the testing and automation needs of the Kubernetes project.

Our architecture diagram provides an overview of how the different tools and services interact (see #13063).

CI Job Management

Kubernetes uses a Prow instance at prow.k8s.io to handle CI and automation for the entire project. Everyone can participate in a self-service, PR-based workflow, where changes are automatically deployed after they have been reviewed. All job configs are located in config/jobs.

Dashboards

Test Result Dashboards

Job and PR Dashboards

Other Tools

  • boskos manages pools of resources; our CI leases GCP projects from these pools
  • experiment is a catchall directory for one-shot tools or scripts
  • gcsweb is a UI we use to display test artifacts stored in public GCS buckets
  • ghproxy is a GitHub-aware reverse proxy cache to help keep our GitHub API token usage within rate limits
  • gopherage is a tool for manipulating Go coverage files
  • greenhouse is a shared bazel cache we use to speed up build and test presubmit jobs
  • label_sync creates, updates, and migrates GitHub labels across orgs and repos based on a labels.yaml file
  • kettle extracts test results from GCS and puts them into BigQuery
  • kubetest is how our CI creates Kubernetes clusters and runs e2e tests against them
  • maintenance/migratestatus is used to migrate or retire GitHub status contexts on PRs across orgs and repos
  • metrics runs queries against BigQuery to generate metrics based on test results
  • robots/commenter is used by some of our jobs to comment on GitHub issues

Contributing

Please see CONTRIBUTING.md

test-infra's People

Contributors

0xmichalis, alvaroaleman, andyzhangx, bentheelder, cblecker, chaodaig, cjwagner, cpanato, dims, eparis, fejta, gmarek, hakman, ixdy, justaugustus, justinsb, k8s-ci-robot, katharine, krzyzacy, listx, mpherman2, nikhita, openshift-bot, rifelpet, shyamjvs, spiffxp, spxtr, stevekuznetsov, wojtek-t, zmerlynn

test-infra's Issues

metadata cache server curl check doesn't work

The curl check in the metadata cache control script doesn't work, as curl will fail over to the real metadata server:

$ curl -v http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip
* About to connect() to metadata.google.internal port 80 (#0)
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 10.240.0.2...
* Connection refused
*   Trying 169.254.169.254...
* connected
* Connected to metadata.google.internal (169.254.169.254) port 80 (#0)
> GET /computeMetadata/v1/instance/network-interfaces/0/ip HTTP/1.1
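
A more robust check would probe the cache address directly, so a refused connection isn't masked by curl failing over to the real metadata server at 169.254.169.254. A minimal sketch, assuming the cache is what answers on 10.240.0.2 (the address from the trace above):

if curl -fsS --max-time 5 -H 'Metadata-Flavor: Google' \
    'http://10.240.0.2/computeMetadata/v1/instance/network-interfaces/0/ip' > /dev/null; then
  echo "metadata cache is serving"
else
  echo "metadata cache is NOT serving" >&2
  exit 1
fi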

Auto-file issues for all broken tests

We've had tests and even entire test suites broken for days, weeks, even months and nobody noticed. @lavalamp suggested that we could auto-file issues for all broken tests, as we do for flaky tests. That seems like a good idea to me.

[gubernator] FR: expand skipped lines

Feature request: expand skipped lines in gubernator logs.

E.g.

stderr: fatal: reference is not a tree: e5c3111e8dcb432df435dab96d7a19641adf0562

    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.launchCommandIn(CliGitAPIImpl.java:1719)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl.access$500(CliGitAPIImpl.java:63)
    at org.jenkinsci.plugins.gitclient.CliGitAPIImpl$9.execute(CliGitAPIImpl.java:1984)
... skipping 9 lines ...
    at java.lang.Thread.run(Thread.java:745)
[xUnit] [INFO] - Starting to record.

Make the "... skipping 9 lines ..." marker clickable, and expand the lines in place (e.g. unhide a hidden block).

/cc @mnshaw @rmmh

somehow indicate which JUnit file a test failure came from

Motivating example: unit/integration test runs like https://k8s-gubernator.appspot.com/build/kubernetes-jenkins/pr-logs/pull/27600/kubernetes-pull-test-unit-integration/32233 have multiple JUnit files, with associated "verbose output" text files that can help debug test failures.

Rather than searching through each file, it'd be nice to know which one to go to for the verbose output. (Maybe even link directly to that file if it exists? May be getting too specific though.)

investigate docker-in-docker brokenness with kubekins-test and docker 1.11.1

As part of the Jenkins VM rebuild today, some nodes were upgraded to docker 1.11.1, instead of 1.9.1, as we'd been using before.

It seems that this causes problems for docker-in-docker in our kubekins-test image:

Verifying ./hack/../hack/verify-api-reference-docs.sh
Note: This assumes that swagger spec has been updated. Please run hack/update-swagger-spec.sh to ensure that.
Generating api reference docs at /go/src/k8s.io/kubernetes/_output/generated_html
Reading swagger spec from: /var/lib/jenkins/workspace/kubernetes-pull-test-unit-integration@2/api/swagger-spec/
docker: error while loading shared libraries: libltdl.so.7: cannot open shared object file: No such file or directory
!!! Error in ./hack/update-api-reference-docs.sh:71
  'docker run ${user_flags} --rm -v "${TMP_IN_HOST}":/output:z -v "${SWAGGER_PATH}":/swagger-source:z gcr.io/google_containers/gen-swagger-docs:v5 "${SWAGGER_JSON_NAME}" "${REGISTER_FILE_URL}"' exited with status 127
Call stack:
  1: ./hack/update-api-reference-docs.sh:71 main(...)
Exiting with status 1
!!! Error in ./hack/../hack/verify-api-reference-docs.sh:34
  '"./hack/update-api-reference-docs.sh" "${OUTPUT_DIR}"' exited with status 1
Call stack:
  1: ./hack/../hack/verify-api-reference-docs.sh:34 main(...)
Exiting with status 1
FAILED   ./hack/../hack/verify-api-reference-docs.sh    1s

@bprashanth
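
For reference, one likely fix, sketched here on the assumption that the kubekins-test image is Debian/Ubuntu based: the docker 1.11 client is dynamically linked against libltdl, so the library has to be present inside the image.

# hypothetical addition to the kubekins-test image build
apt-get update && apt-get install -y --no-install-recommends libltdl7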

Cannot merge PR

My PR #105 cannot be merged because of problems with the CLA check. Even though I work at Google, the bot added the CLA:NO label, and manually modifying the labels didn't make my PR mergeable.

CC @gmarek

federation e2e gce automated tests on Jenkins fail consistently with token auth attempt failed with status: 403 Forbidden

+++ [0829 21:48:48] Pushing gcr.io/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube:v1.4.0-alpha.3.197_ef82f394a9e1ba
-> GCR repository detected. Using gcloud
@nikhiljindal I think you know about this, but just so we don't lose track of it, here's an issue to track it.

See kubernetes/kubernetes#31655 (comment) for an example...

@k8s-bot federation gce e2e test this

The push refers to a repository [gcr.io/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube](len: 1)
6864c6906300: Preparing
Post https://gcr.io/v2/k8s-jkns-pr-bldr-e2e-gce-fdrtn/hyperkube/blobs/uploads/: token auth attempt for registry: https://gcr.io/v2/token?account=oauth2accesstoken&scope=repository%3Ak8s-jkns-pr-bldr-e2e-gce-fdrtn%2Fhyperkube%3Apush%2Cpull&service=gcr.io request failed with status: 403 Forbidden
!!! Error in ./build/../build/../federation/cluster/common.sh:321
'gcloud docker push "${docker_image_tag}"' exited with status 1
Call stack:
1: ./build/../build/../federation/cluster/common.sh:321 push-federation-images(...)
2: ./build/push-federation-images.sh:29 main(...)
Exiting with status 1
Build step 'Execute shell' marked build as failure
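
A 403 on push usually means the account doing the push lacks write access to the registry's backing bucket. A hedged remediation sketch (the service account below is a placeholder, not taken from this job; GCR images for a project live in gs://artifacts.<project>.appspot.com):

gsutil acl ch -u builder@k8s-jkns-pr-bldr-e2e-gce-fdrtn.iam.gserviceaccount.com:WRITER \
  gs://artifacts.k8s-jkns-pr-bldr-e2e-gce-fdrtn.appspot.com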

Record/display cluster vital statistics at a glance for each run

Request: After a cluster has been brought up, record:

  • the actual cluster version the cluster thinks it's running (not the version we attempted to launch; these can sometimes differ if there's a bug or misconfiguration in a GKE test, for instance)
  • the docker version
  • the kernel uname string of the nodes
  • ... etc.

and be able to show those at a glance. I suspect a lot of this could be done with log post-processing, but some of it is difficult to find at all.
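
A rough sketch of collecting some of this right after bring-up (the node variable is a placeholder; this is not an existing script):

kubectl version --short                                              # version the cluster actually reports
gcloud compute ssh "${NODE}" --command 'docker --version; uname -r'  # per-node docker version and kernel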

cc @cjcullen

move federated test result config somewhere more prominent and make everything use it

jenkins/test-history/buckets.json is sort of the source of truth for which buckets we care about, except that there is also configuration in gubernator/main.py, jenkins/test-history/gen_json.py, the submit queue, and testgrid. (And maybe other places, who knows.)

It'd be nice if we moved the configuration somewhere more prominent (maybe even top-level?) and then got all of our tooling to use it.

It should also be well-documented.

(It'd be a good idea to add owners for each of the various builds at that time, too.)

Federation e2e tests failing: pulling ci tarball from wrong bucket.

From kubernetes-e2e-gce-federation logs:

+ local -r bucket=kubernetes-release-dev
++ gsutil cat gs://kubernetes-release-dev/ci/latest.txt
+ build_version=v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ echo 'Using published version kubernetes-release-dev/v1.4.0-alpha.0.1035+d30fd0cb0c23ab (from ci/latest)'
+ fetch_tars_from_gcs gs://kubernetes-release-dev/ci v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ local -r gspath=gs://kubernetes-release-dev/ci
+ local -r build_version=v1.4.0-alpha.0.1035+d30fd0cb0c23ab
+ echo 'Pulling binaries from GCS; using server version gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab.'
+ gsutil -mq cp gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab/kubernetes.tar.gz gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab/kubernetes-test.tar.gz .
Using published version kubernetes-release-dev/v1.4.0-alpha.0.1035+d30fd0cb0c23ab (from ci/latest)
Pulling binaries from GCS; using server version gs://kubernetes-release-dev/ci/v1.4.0-alpha.0.1035+d30fd0cb0c23ab.

It has pulled a tarball published by kubernetes-build, not kubernetes-federation-build.

That later causes this error:

FATAL: tagfile /workspace/kubernetes/hack/e2e-internal/../../cluster/../cluster/gce/../../cluster/gce/../../cluster/../federation/manifests/federated-image.tag does not exist. Make sure that you have run build/push-federation-images.sh

I've fixed this error once before (#146) by having kubernetes-federation-build and kubernetes-e2e-gce-federation use an entirely separate ci bucket, so something must have changed since that PR was merged.

/cc @quinton-hoole @nikhiljindal @ixdy @spxtr
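
A quick way to confirm the mismatch (the second bucket name is a placeholder for whatever bucket kubernetes-federation-build actually publishes to):

gsutil cat gs://kubernetes-release-dev/ci/latest.txt             # what the e2e job is pulling today
gsutil cat gs://kubernetes-federation-release-dev/ci/latest.txt  # hypothetical federation bucket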

404 on getting dockerized-e2e-runner.sh in all kubernetes builds

++ curl -fsS --retry 3 https://raw.githubusercontent.com/kubernetes/kubernetes/test-infra/jenkins/dockerized-e2e-runner.sh
curl: (22) The requested URL returned error: 404

The correct link is probably https://raw.githubusercontent.com/kubernetes/test-infra/master/jenkins/dockerized-e2e-runner.sh
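
If that is the right location, the fetch becomes:

curl -fsS --retry 3 https://raw.githubusercontent.com/kubernetes/test-infra/master/jenkins/dockerized-e2e-runner.sh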

/cc @k8s-oncall

Cross-link gubernator pages

Pages should be discoverable through browsing.

  • / to /pr
  • /pr/1345 to /pr/user
  • /build/$PR_LOGS/... to /pr/user
  • /pr/user to /pr/123? (Currently links to GitHub directly, but we have a better way to visualize the test results.)

cluster logs not collected from dockerized e2e on timeout

The kubernetes tarball is extracted inside the container in dockerized e2e, which gives us kubernetes/cluster/log-dump.sh. On timeout, we try to call log-dump.sh, but do so outside the container, so it's no longer available.

We should probably move the timeout handling inside the dockerized e2e container.
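
As a stopgap, the timeout handler could also exec log collection inside the still-running container. A sketch, where the container name variable is a placeholder for however the runner tracks it (log-dump.sh's output-directory argument is omitted here):

docker exec "${E2E_CONTAINER}" bash -c 'cd kubernetes && ./cluster/log-dump.sh'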

add Jenkins metadata to GCE VMs

When trying to clean up old VMs or other resources, I'm often left wondering "where did this even come from?".

We could probably add metadata describing the Jenkins job and build number that spawned the VM, as well as the PR# on PR Jenkins. There's even an add-instance-metadata function in cluster/gce/util.sh we can use.
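
Something along these lines, assuming the standard Jenkins environment variables are available (the instance variable is a placeholder; add-instance-metadata from cluster/gce/util.sh could be used instead of calling gcloud directly):

gcloud compute instances add-metadata "${INSTANCE}" \
  --metadata "jenkins-job=${JOB_NAME},jenkins-build=${BUILD_NUMBER}"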

Add a build job for kops Docker images

I'd like to add a build job to pump out kops builds, so I can start using it for AWS bring-up on Jenkins as well. I recently pushed a PR to that repo to build an easy container for kops (just to avoid figuring out exactly how to package/release it just yet), but then we need to figure out how to push builds somewhere. This isn't hard, but right now gcr.io/google-containers is locked down, so a build job can't actually push there.

So here's a suggested route; I'm filing an issue since about half of this isn't about code approvals:

  • Create a kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com service account.
  • Give kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com rights just to push to the gcr.io bucket for the kubernetes-jenkins project itself, i.e. gcr.io/kubernetes-jenkins
  • Use that in a new job to build/push kops.

I did consider a couple of alternate routes:

  1. Giving kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com rights to google-containers. Rejected because this gives anyone with access to https://github.com/kubernetes/test-infra or Jenkins an easy way to trash a production bucket.
  2. Creating another project. I'm mostly indifferent to naming, so if someone wants CI docker pushes to go somewhere else, find a project name that's not taken and we can work on that.

cc @kubernetes/test-infra-maintainers @justinsb
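
A rough sketch of the suggested route in gcloud/gsutil terms (flags may need adjusting; the bucket name follows the usual artifacts.<project>.appspot.com convention for GCR):

gcloud iam service-accounts create kubekins-image-builder \
  --project kubernetes-jenkins --display-name "kubekins image builder"
gsutil acl ch -u kubekins-image-builder@kubernetes-jenkins.iam.gserviceaccount.com:WRITER \
  gs://artifacts.kubernetes-jenkins.appspot.com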

test-history/gen_json script should not depend on accessing Jenkins

It would be useful if, like the munger, the test-history scripts depended solely on GCS buckets as input. This would allow federating tests results on the dashboard, not just in PR statuses, via the familiar GCS bucket format.

Right now, accessing the Jenkins server is used to list job names, their builds (with status), and the timestamp. These can be replaced with, respectively, a config file listing job name -> GCS path mappings, reading build numbers from the bucket, and parsing started.json + finished.json.
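
A sketch of the GCS-only path (the bucket path is illustrative and would come from the proposed job name -> GCS path config file):

job_path="gs://kubernetes-jenkins/logs/kubernetes-e2e-gce"
for build in $(gsutil ls "${job_path}/"); do
  gsutil cat "${build}finished.json"   # has the timestamp and pass/fail result
done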

Federation e2e failure: wrong ci build version

From the kubernetes-e2e-gce-federation job logs:

++ gsutil cat gs://kubernetes-release/ci/latest.txt
+ build_version=v1.3.0-beta.0
+ echo 'Using published version ci/v1.3.0-beta.0 (from ci/latest)'
+ fetch_tars_from_gcs ci v1.3.0-beta.0
+ local -r bucket=ci
+ local -r build_version=v1.3.0-beta.0

This is not the build_version that kubernetes-federation-build is pushing, which naturally causes the downstream kubernetes-e2e-gce-federation job to pull the wrong tarballs and fail.

I don't yet understand why this issue took so long to pop up, as the federation stuff has been merged for weeks and this started happening a few days ago.
