groundnuty / k8s-wait-for

A simple script that allows waiting for a k8s service, job or pods to enter a desired state

License: MIT License

Languages: Makefile 13.99%, Shell 80.16%, Dockerfile 5.85%
Topics: helm, kubernetes, dependencies, synchronization, init-containers, chart

k8s-wait-for's Introduction


k8s-wait-for

Despite infrequent commits, this tool is still actively used and works stably! Pull requests are most welcome!

Important: For Kubernetes versions <= 1.23, use k8s-wait-for versions 1.*, see here.

A simple script that allows waiting for a k8s service, job or pods to enter the desired state.

Running

You can start simple. Run it on your cluster in a namespace where you already have something deployed:

kubectl run k8s-wait-for --rm -it --image ghcr.io/groundnuty/k8s-wait-for:v1.6 --restart Never --command /bin/sh

Read --help and play with it!

/ > wait_for.sh -h
This script waits until a job, pod or service enters a ready state.

wait_for.sh job [<job name> | -l<kubectl selector>]
wait_for.sh pod [<pod name> | -l<kubectl selector>]
wait_for.sh service [<service name> | -l<kubectl selector>]

Examples:
Wait for all pods with the following label to enter the 'Ready' state:
wait_for.sh pod -lapp=develop-volume-gluster-krakow

Wait for all selected pods to enter the 'Ready' state:
wait_for.sh pod -l"release in (develop), chart notin (cross-support-job-3p)"

Wait for all pods with the following label to enter the 'Ready' or 'Error' state:
wait_for.sh pod-we -lapp=develop-volume-gluster-krakow

Wait for at least one pod to enter the 'Ready' state, even when the other ones are in 'Error' state:
wait_for.sh pod-wr -lapp=develop-volume-gluster-krakow

Wait for all the pods in that job to have a 'Succeeded' state:
wait_for.sh job develop-volume-s3-krakow-init

Wait for all the pods in that job to have a 'Succeeded' or 'Failed' state:
wait_for.sh job-we develop-volume-s3-krakow-init

Wait for at least one pod in that job to reach the 'Succeeded' state, ignoring any 'Failed' ones:
wait_for.sh job-wr develop-volume-s3-krakow-init

Example

A more complex Kubernetes manifest (generated by Helm): this StatefulSet waits for one job to finish and for two sets of pods to become ready before its main container starts.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: develop-oneprovider-krakow
  labels:
    app: develop-oneprovider-krakow
    chart: oneprovider-krakow
    release: develop
    heritage: Tiller
    component: oneprovider
  annotations:
    version: "0.2.17"
spec:
  selector:
    matchLabels:
      app: develop-oneprovider-krakow
      chart: oneprovider-krakow
      release: develop
      heritage: Tiller
      component: "oneprovider"
  serviceName: develop-oneprovider-krakow
  template:
    metadata:
      labels:
        app: develop-oneprovider-krakow
        chart: oneprovider-krakow
        release: develop
        heritage: Tiller
        component: "oneprovider"
      annotations:
        version: "0.2.17"
    spec:
      initContainers:
        - name: wait-for-onezone
          image: ghcr.io/groundnuty/k8s-wait-for:v1.6
          imagePullPolicy: Always
          args:
            - "job"
            - "develop-onezone-ready-check"
        - name: wait-for-volume-ceph
          image: ghcr.io/groundnuty/k8s-wait-for:v1.6
          imagePullPolicy: Always
          args:
            - "pod"
            - "-lapp=develop-volume-ceph-krakow"
        - name: wait-for-volume-gluster
          image: ghcr.io/groundnuty/k8s-wait-for:v1.6
          imagePullPolicy: Always
          args:
            - "pod"
            - "-lapp=develop-volume-gluster-krakow"
      containers:
      - name: oneprovider
        image: docker.onedata.org/oneprovider:ID-a3a9ff0d78
        imagePullPolicy: Always

Complex deployment use case

This container is used extensively in deployments of the Onedata system (onedata/charts) to specify dependencies. It leverages Kubernetes init containers, providing:

- a detailed event log in `kubectl describe <pod>` showing which init container the pod is currently waiting on,
- a concise view in `kubectl get pods` output, where init container progress is shown as `Init:<ready>/<total>`.

Example output from a deployment run of ~16 pods with dependencies, just after deployment:

NAME                                                   READY     STATUS              RESTARTS   AGE
develop-cross-support-job-3p-krk-3-lis-c-b4nv1         0/1       Init:0/1            0          11s
develop-cross-support-job-3p-krk-3-par-c-lis-n-z7x6w   0/1       Init:0/1            0          11s
develop-cross-support-job-3p-krk-3-x9719               0/1       Init:0/1            0          11s
develop-cross-support-job-3p-krk-g-par-3-ztvz0         0/1       Init:0/1            0          11s
develop-cross-support-job-3p-krk-g-v5lf2               0/1       Init:0/1            0          11s
develop-cross-support-job-3p-krk-n-par-3-pnbcm         0/1       Init:0/1            0          11s
develop-cross-support-job-3p-lis-3-cpj3f               0/1       Init:0/1            0          11s
develop-cross-support-job-3p-par-n-8zdt2               0/1       Init:0/1            0          11s
develop-cross-support-job-3p-par-n-lis-c-kqdf0         0/1       Init:0/1            0          11s
develop-oneclient-krakow-2773392814-wc1dv              0/1       Init:0/3            0          11s
develop-oneclient-lisbon-3267879054-2v6cg              0/1       Init:0/3            0          9s
develop-oneclient-paris-2076479302-f6hh9               0/1       Init:0/3            0          9s
develop-onedata-cli-krakow-1801798075-b5wpj            0/1       Init:0/1            0          11s
develop-onedata-cli-lisbon-139116355-fwtjv             0/1       Init:0/1            0          10s
develop-onedata-cli-paris-2662312307-9z9l1             0/1       Init:0/1            0          11s
develop-oneprovider-krakow-3634465102-tftc6            0/1       Pending             0          10s
develop-oneprovider-lisbon-3034775369-8n31x            0/1       Init:0/3            0          8s
develop-oneprovider-paris-3034358951-19mhf             0/1       Init:0/3            0          10s
develop-onezone-304145816-dmxn1                        0/1       ContainerCreating   0          11s
develop-volume-ceph-krakow-479580114-mkd1d             0/1       ContainerCreating   0          11s
develop-volume-ceph-lisbon-1249181958-1f0mt            0/1       ContainerCreating   0          9s
develop-volume-ceph-paris-400443052-dc347              0/1       ContainerCreating   0          9s
develop-volume-gluster-krakow-761992225-sj06m          0/1       Running             0          11s
develop-volume-gluster-lisbon-3947152141-jlmvb         0/1       Running             0          8s
develop-volume-gluster-paris-3588749681-9bnw8          0/1       ContainerCreating   0          11s
develop-volume-nfs-krakow-2528947555-6mxzt             1/1       Running             0          10s
develop-volume-nfs-lisbon-3473018547-7nljf             0/1       ContainerCreating   0          11s
develop-volume-nfs-paris-2956540513-4bdzt              0/1       ContainerCreating   0          11s
develop-volume-s3-krakow-23786741-pdxtj                0/1       Running             0          9s
develop-volume-s3-krakow-init-gqmmp                    0/1       Init:0/1            0          11s
develop-volume-s3-lisbon-3912793669-d4xh5              0/1       Running             0          10s
develop-volume-s3-lisbon-init-mq9nk                    0/1       Init:0/1            0          11s
develop-volume-s3-paris-124394749-qwt18                0/1       Running             0          8s
develop-volume-s3-paris-init-jb4k3                     0/1       Init:0/1            0          11s

One minute later, you can see the changes in the STATUS column:

develop-cross-support-job-3p-krk-3-lis-c-b4nv1         0/1       Init:0/1          0          1m
develop-cross-support-job-3p-krk-3-par-c-lis-n-z7x6w   0/1       Init:0/1          0          1m
develop-cross-support-job-3p-krk-3-x9719               0/1       Init:0/1          0          1m
develop-cross-support-job-3p-krk-g-par-3-ztvz0         0/1       Init:0/1          0          1m
develop-cross-support-job-3p-krk-g-v5lf2               0/1       Init:0/1          0          1m
develop-cross-support-job-3p-krk-n-par-3-pnbcm         0/1       Init:0/1          0          1m
develop-cross-support-job-3p-lis-3-cpj3f               0/1       Init:0/1          0          1m
develop-cross-support-job-3p-par-n-8zdt2               0/1       Init:0/1          0          1m
develop-cross-support-job-3p-par-n-lis-c-kqdf0         0/1       Init:0/1          0          1m
develop-oneclient-krakow-2773392814-wc1dv              0/1       Init:0/3          0          1m
develop-oneclient-lisbon-3267879054-2v6cg              0/1       Init:0/3          0          58s
develop-oneclient-paris-2076479302-f6hh9               0/1       Init:0/3          0          58s
develop-onedata-cli-krakow-1801798075-b5wpj            0/1       Init:0/1          0          1m
develop-onedata-cli-lisbon-139116355-fwtjv             0/1       Init:0/1          0          59s
develop-onedata-cli-paris-2662312307-9z9l1             0/1       Init:0/1          0          1m
develop-oneprovider-krakow-3634465102-tftc6            0/1       Init:1/3          0          59s
develop-oneprovider-lisbon-3034775369-8n31x            0/1       Init:2/3          0          57s
develop-oneprovider-paris-3034358951-19mhf             0/1       PodInitializing   0          59s
develop-onezone-304145816-dmxn1                        0/1       Running           0          1m
develop-volume-ceph-krakow-479580114-mkd1d             1/1       Running           0          1m
develop-volume-ceph-lisbon-1249181958-1f0mt            1/1       Running           0          58s
develop-volume-ceph-paris-400443052-dc347              1/1       Running           0          58s
develop-volume-gluster-krakow-761992225-sj06m          1/1       Running           0          1m
develop-volume-gluster-lisbon-3947152141-jlmvb         1/1       Running           0          57s
develop-volume-gluster-paris-3588749681-9bnw8          1/1       Running           0          1m
develop-volume-nfs-krakow-2528947555-6mxzt             1/1       Running           0          59s
develop-volume-nfs-lisbon-3473018547-7nljf             1/1       Running           0          1m
develop-volume-nfs-paris-2956540513-4bdzt              1/1       Running           0          1m
develop-volume-s3-krakow-23786741-pdxtj                1/1       Running           0          58s
develop-volume-s3-lisbon-3912793669-d4xh5              1/1       Running           0          59s
develop-volume-s3-paris-124394749-qwt18                1/1       Running           0          57s

Troubleshooting

Verify that you can access the Kubernetes API from within the k8s-wait-for container by running kubectl get services. If you get a permissions error like

Error from server (Forbidden): services is forbidden: User "system:serviceaccount:default:default" cannot list resource "services" in API group "" in the namespace "default"

the pod lacks the permissions needed to perform the kubectl get query. To fix this, follow the instructions for the 'pod-reader' Role and ClusterRole here.

Alternatively, use the following commands, which add services and deployments to the pods covered in those examples:

kubectl create role pod-reader --verb=get --verb=list --verb=watch --resource=pods,services,deployments

kubectl create rolebinding default-pod-reader --role=pod-reader --serviceaccount=default:default --namespace=default
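For reference, a declarative equivalent of those two commands might look like the following sketch. The namespace, service account and resource list are assumptions to adapt to your setup; if you wait for jobs, you also need access to jobs in the batch API group:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
  # core API group: pods and services
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]
  # apps API group: deployments
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: default-pod-reader
  namespace: default
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io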

An extensive discussion on the problem of granting necessary permissions and a number of example solutions can be found here.

Make sure the service account token is mounted. The error "The connection to the server localhost:8080 was refused - did you specify the right host or port?" may indicate that the service account token is not mounted into the pod. Double-check that your service account and pod spec set automountServiceAccountToken: true. If the token is mounted, you should see files under /var/run/secrets/kubernetes.io/serviceaccount; otherwise /var/run/secrets/kubernetes.io may not exist at all.
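As a rough sketch (pod name, job name and image tag are illustrative), a pod spec with the token explicitly mounted could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: wait-example
spec:
  serviceAccountName: default
  automountServiceAccountToken: true   # token appears under /var/run/secrets/kubernetes.io/serviceaccount
  initContainers:
    - name: wait-for-job
      image: ghcr.io/groundnuty/k8s-wait-for:v1.6
      args: ["job", "my-job"]
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "sleep 3600"]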

k8s-wait-for's People

Contributors

alexsteeel, c-simpson, eggplants, groundnuty, jellonek, kostegit, lautis, michalschott, moreinhardt, onpaj, pzankov, rally25rs, sergeyshaykhullin, sgreene570, theophileds, tico24, velichkostoev, villesau, yss14


k8s-wait-for's Issues

Multiple CVEs reported by Aqua Scan

Aqua Scan reports numerous CVEs in the cURL binary.

Since we use this image in a production environment, every CVE triggers a critical security incident (via an automated job that periodically checks all clusters), which is a pain in itself. An easy fix would be to bump the cURL version.

retried job waited forever

If the job first fails and is retried by Kubernetes, k8s-wait-for is not able to recover from it. It waits for the job forever, even after the retry has succeeded.

Namespaces other than default

The script does not seem to find pods or services in other namespaces.
Adding --all-namespaces to the kubectl commands in the script solves the issue.

If the job is already deleted by ttl

We have run into a problem.
We use k8s-wait-for in initContainers in our workload to wait for database migrations to complete, and we have an HPA running.
If the ttlSecondsAfterFinished on the Job expires and the pod is removed, then the new workload pods fail the job check.

What options do we have to solve the problem?
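For context, ttlSecondsAfterFinished lives on the Job spec; once it expires, the Job object and its pods are garbage-collected, so there is nothing left for the init container to query. A minimal illustration (name, image and value are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
spec:
  ttlSecondsAfterFinished: 600   # the Job disappears 10 minutes after it finishes
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: busybox
          command: ["sh", "-c", "echo running migrations"]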

Add option

Hi,

It would be very nice to add an option to wait until all pods in a specific namespace are running or failed, something like:

wait_for.sh namespace my-namespace

Tx,
Roee.

Pulling image is denied

❯ docker pull ghcr.io/groundnuty/k8s-wait-for:v2.0
Error response from daemon: Head "https://ghcr.io/v2/groundnuty/k8s-wait-for/manifests/v2.0": denied: denied

Were the permissions for this changed recently?

Fix security vulnerabilities

Currently the image groundnuty/k8s-wait-for:no-root-v2.0 has several security vulnerabilities.

Running the command docker scout cves groundnuty/k8s-wait-for:no-root-v2.0 lists all of these.
Here is the summary at the end:

67 vulnerabilities found in 8 packages
  LOW       5
  MEDIUM    30
  HIGH      29
  CRITICAL  3

The Trivy scan for this repo has been failing for some time too:
https://github.com/groundnuty/k8s-wait-for/actions/workflows/trivy.yml 💥

I have not looked into this in depth, but maybe the old Alpine base image is part of the problem?
FROM alpine:3.16.2

Wait for job still waiting if one of the pods failed

Hi @groundnuty ,

Your wait for service is amazing, but I have a feature request that maybe you can help with.

Issue:
I have a use case where sometimes (depending on server resources) my job creates a pod that fails, but thanks to backoffLimit it then creates a second pod that runs successfully.
The problem is that waiting for a job waits for all of its pods to succeed, not just the last pod of the job.

Expected behavior:
Add the ability to wait for a job where only the last pod needs to succeed, rather than all of the pods.

How to reproduce (see the Job sketch below):

  1. Create a job with backoffLimit: 1
  2. Create a pod that waits for the job above
  3. Fail the first pod of the job
  4. Check the wait-for pod's logs
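A minimal Job sketch matching the reproduction steps (name, image and the failure trigger are illustrative):

apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-migration
spec:
  backoffLimit: 1           # allow one retry after the first pod fails
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: busybox
          # replace with a command that fails on the first run and succeeds on the retry
          command: ["sh", "-c", "exit 0"]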

[Question] Update alpine version

Hi, Thanks for this helpful code! 🚀

We are very interested in using this utility. Still, the current version uses a base image that contains some vulnerabilities classified as critical, which makes it impossible to use in some projects because of security policies.

I would like to know if it is possible to update the Alpine base image version from 3.12.1 to 3.12.19 or the latest Alpine version.

thanks,

Wait for pod to be deleted

With kubectl this can be done with something like: kubectl wait --for=delete pod --selector=<label>=<value>
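For example (the selector and timeout are placeholders), the following blocks until all matching pods are gone:

kubectl wait --for=delete pod --selector=app=my-app --timeout=120s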

Unable to find "/v1, Resource=pods" invalid label key "-ltier"

I'm getting this error:

No resources found.
FalseError from server (BadRequest): Unable to find "/v1, Resource=pods" that match label selector "app=cloud-services,-ltier=postgres", field selector "": unable to parse requirement: invalid label key "-ltier": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')

This is on container image tag v1.2-5-g92c083e and the current script in master.


This appears to be caused by

    get_service_state_selectors=$(kubectl get service "$get_service_state_name" $KUBECTL_ARGS -ojson 2>&1 | jq -cr 'if . | has("items") then .items[]
else . end | [ .spec.selector | to_entries[] | "-l\(.key)=\(.value)" ] | join(",") ')

when the service has more than 1 selector

        "selector": {
            "app": "cloud-services",
            "tier": "postgres"

The to_entries[] | "-l\(.key)=\(.value)" ] | join(",") appears to be incorrect and would make

-lapp=cloud-services,-ltier=postgres

when it should not have the additional -l. It should be

-lapp=cloud-services,tier=postgres

The code below that is:

    for get_service_state_selector in $get_service_state_selectors ; do
        get_service_state_selector=$(echo "$get_service_state_selector" | tr ',' ' ')
        get_service_state_state=$(get_pod_state "$get_service_state_selectors")
        get_service_state_states="${get_service_state_states}${get_service_state_state}" ;
    done
    echo "$get_service_state_states"

which looks like it would then replace that comma with a space to make

-lapp=cloud-services -ltier=postgres

except for two issues: 1) get_service_state_selector is never used, only assigned to, and 2) I don't think the kubectl command will accept multiple -l arguments anyway.


I think this could be fixed by doing:

get_service_state() {
    get_service_state_name="$1"
    get_service_state_selectors=$(kubectl get service "$get_service_state_name" $KUBECTL_ARGS -ojson 2>&1 | jq -cr 'if . | has("items") then .items[]
else . end | [ .spec.selector | to_entries[] | "\(.key)=\(.value)" ] | join(",") ')
    get_service_state_states=""
    for get_service_state_selector in $get_service_state_selectors ; do
        get_service_state_state=$(get_pod_state "-l$get_service_state_selectors")
        get_service_state_states="${get_service_state_states}${get_service_state_state}" ;
    done
    echo "$get_service_state_states"
}

But I'm not sure of all the cases that this has to handle.
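A standalone way to sanity-check the corrected jq expression against the selector from this issue, without needing a cluster (the literal JSON just mimics the relevant part of the service object):

echo '{"spec":{"selector":{"app":"cloud-services","tier":"postgres"}}}' \
  | jq -cr '[ .spec.selector | to_entries[] | "\(.key)=\(.value)" ] | join(",")'
# prints: app=cloud-services,tier=postgres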

Example in README does not work with kubectl 1.22

Hi. I'm new to this tool and to k8s in general and I had some problems getting started.

The example in the README does not seem to be working with later versions of kubectl:

$ kubectl run --generator=run-pod/v1 k8s-wait-for --rm -it --image groundnuty/k8s-wait-for:v1.3 --restart Never --command /bin/sh
Error: unknown flag: --generator
See 'kubectl run --help' for usage.
$ kubectl version --client
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:41:28Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}

The release notes indicate that the --generator flag was deprecated and then removed in 1.21:

https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.21.md

Old version in README examples leads to errors

I got wrecked by #60 because I was using an old image (v1.6). It took me a while to realize that I just copied the example in the README, but the latest was at v2.0.

Maybe it'd be helpful to update the README, or set up something to update the README when a new package is released?

Or maybe even better: since this package is only compatible with specific k8s versions, wait_for.sh could detect this and throw?

returns 0 (success) when service does not exist

I stumbled across this issue when I was not passing a --namespace ... to the script args...

My init pods were reporting success

    state:
      terminated:
        containerID: docker://754eff26e250a9cab94d6a5f49e118bb8975fe93807bc05cbae4bcec6104f722
        exitCode: 0
        finishedAt: "2019-11-14T20:39:09Z"
        reason: Completed
        startedAt: "2019-11-14T20:39:08Z"

and the logs from the init container contained

parse error: Invalid numeric literal at line 1, column 6
service qa-postgres  is ready.

In reality, it was querying a service that does not exist. The output from the kubectl command would have been:

$ kubectl get service "qa-postgres" -ojson
Error from server (NotFound): services "qa-postgres" not found

which when piped to jq results in:

$ kubectl get service "qa-postgres" -ojson 2>&1 |  jq -cr
parse error: Invalid numeric literal at line 1, column 6

however that failure (as seen in the pod log output) results in the sh script returning 0 (success)

It should cause the init pod to fail.


The above is true whenever kubectl outputs non-JSON.

/usr/local/bin # kubectl get pods -n qa
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:qa:default" cannot list resource "pods" in API group "" in the namespace "qa"

/usr/local/bin # wait_for.sh service qa-postgress -n qa && echo "*** returns success"
parse error: Invalid numeric literal at line 1, column 6
[2019-11-14 23:12:49] service qa-postgress is ready.
*** returns success
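One defensive pattern (purely illustrative, not the script's actual code) is to check both the kubectl exit status and that the output parses as JSON before drawing any conclusion:

out=$(kubectl get service "qa-postgres" -o json 2>&1)
if [ $? -ne 0 ] || ! printf '%s' "$out" | jq -e . >/dev/null 2>&1; then
    echo "error: could not read service qa-postgres: $out" >&2
    exit 1
fi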

wait-for.sh job never finishes if the job fails (even if it re-runs and completed)

I've got a job that normally succeeds, and we don't see any problems with k8s-wait-for. However, I'm observing a case where the job will sometimes fail and be retried with a new pod. The job's status ends up like this:

status:
  completionTime: "2022-03-03T01:25:56Z"
  conditions:
  - lastProbeTime: "2022-03-03T01:25:56Z"
    lastTransitionTime: "2022-03-03T01:25:56Z"
    status: "True"
    type: Complete
  failed: 1
  startTime: "2022-03-03T01:25:14Z"
  succeeded: 1

You can see it completed but has one failure. This is fine, because the job only needs to succeed once. However, k8s-wait-for currently does not consider this OK and continues to spin forever.
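One way to check only the Job's Complete condition, ignoring the failed-pod count (illustrative; my-job is a placeholder name):

kubectl get job my-job -o jsonpath='{.status.conditions[?(@.type=="Complete")].status}'
# prints "True" once the Job has completed, even if some earlier pods failed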

The wait completed when the job was not yet done

We are running a k8s-wait-for init container on AWS EKS. The Kubernetes server version is:
"serverVersion": {
"major": "1",
"minor": "23+",
"gitVersion": "v1.23.17-eks-0a21954",
"gitCommit": "cd5c12c51b0899612375453f7a7c2e7b6563f5e9",
"gitTreeState": "clean",
"buildDate": "2023-04-15T00:32:27Z",
"goVersion": "go1.19.6",
"compiler": "gc",
"platform": "linux/amd64"
}

But still, the outcome of the describe job command is
...
Completion Mode: NonIndexed
Start Time: Mon, 26 Jun 2023 10:55:00 +0000
Pods Statuses: 1 Active / 0 Succeeded / 0 Failed
Pod Template:
...

which means that the script will silently fail to wait for the job.

#63 will solve the immediate problem, but it may be a good idea to be a bit more defensive and exit with an error when the first sed command yields an empty string.
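A rough sketch of that defensive check (not the project's code; it just extracts the "Pods Statuses" line shown above and fails when nothing can be parsed):

job_name="my-job"   # placeholder
statuses=$(kubectl describe job "$job_name" 2>/dev/null \
  | sed -nE 's/^Pods Statuses:[[:space:]]+(.*)$/\1/p')
if [ -z "$statuses" ]; then
    echo "error: could not determine pod statuses for job '$job_name'" >&2
    exit 1
fi
echo "Pods Statuses: $statuses"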

Fails to run on GKE because of permission issue

The container fails on Error from server (Forbidden): jobs.batch "init-data-job" is forbidden: User "system:serviceaccount:default:default" cannot get jobs.batch in the namespace "default": Unknown user "system:serviceaccount:default:default"

Waiting for <foo> log output does not show timestamp

Here are some sample logs when using k8s-wait-for:v1.5.1 to wait for a k8s job to complete:

Waiting for job my-job...
Waiting for job my-job...
Waiting for job my-job..
[2021-09-27 14:02:52] job my-job is ready.

It would be awesome if every intermediate log line leading up to the "is ready" line also included a timestamp, so someone watching the logs in real time could tell how long the pod has been waiting.

Code xrefs

https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L216
https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L207
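A minimal sketch of what timestamped progress output could look like (illustrative shell, not the project's implementation):

log_waiting() {
    # prefix every progress line with the same timestamp format used by the final "is ready" line
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] Waiting for $1..."
}
log_waiting "job my-job"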

How to debug?

Hi,

How can I debug an initContainer running k8s-wait-for?
I only see something like this:

Waiting for pod -lapp=xxxx...
Waiting for pod -lapp=xxxx...
Waiting for pod -lapp=xxxx...

Is there any debug switch?

Wait for job does not work as expected

I was expecting this app to wait until a job completed successfully, but it only waited for the job to be ready. Am I misunderstanding something?

This is a portion of my deployment resource and I have verified that my job runs to completion and exits with a status code of 0.

initContainers:
  - name: data-migration-init
    image: 'groundnuty/k8s-wait-for:v1.7'
    args:
      - job
      - my-data-migration-job

cannot get resource "jobs" in API group "batch" in the namespace "default"

This seems similar to #39; however, that issue provides no solution.

I am running k8s-wait-for as an init container in a deployment, waiting for a job to complete.

The container starts fine, but then exits with:

Error from server (Forbidden): jobs.batch "app-migration-a4c7915a7495153bcae396e7bd9e3d66c" is forbidden: User "system:serviceaccount:default:default" cannot get resource "jobs" in API group "batch" in the namespace "default"

It seems to me like the container with wait-for is lacking the required permissions to access the API with kubectl.

However, I was unable to find any information in the documentation on how to set up proper credentials.

Throttling warning causes incorrect behavior

I was wondering why my wait was never finishing, and while debugging in the container I saw that my API server was returning a warning message on top of the pod info:

I0624 10:31:01.428761     375 request.go:668] Waited for 1.181153768s due to client-side throttling, not priority and fairness, request: GET:https://<server>/apis/rbac.authorization.k8s.io/v1?timeout=32s
NAME                            READY   STATUS    RESTARTS   AGE
<pod>   1/1     Running   0          173m
<pod>   1/1     Running   0          173m
<pod>   1/1     Running   0          173m

the script seems to be parsing the first line and ends up not detecting the pod status.

I might have some time to prepare a quick fix but for now I will leave an issue report here.
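As a possible workaround sketch (an assumption about the root cause, not a confirmed fix): the throttling notice is a client-go log line written to stderr, so dropping stderr when listing pods keeps it out of the parsed output (the selector is a placeholder):

kubectl get pods -lapp=my-app --no-headers 2>/dev/null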


wait_for.sh job accepts failed jobs

The wait-for script cannot distinguish failed jobs when waiting, unless there are more than 10 failed pods when describing the job.

$kubectl get jobs,pods
NAME           COMPLETIONS   DURATION   AGE
job.batch/pi   0/1           22h        22h

NAME           READY   STATUS   RESTARTS   AGE
pod/pi-j64gc   0/1     Error    0          22h

When using the script without the -we flag:

# wait_for.sh job pi
[2020-07-03 08:11:50] job pi is ready.

The error is because of the regex at https://github.com/groundnuty/k8s-wait-for/blob/master/wait_for.sh#L172
It should be changed to

sed_reg='-e s/^[1-9][[:digit:]]*:[[:digit:]]+:[[:digit:]]+$/1/p -e s/^0:[[:digit:]]+:[1-9][[:digit:]]*$/1/p'

The final [1-9][[:digit:]]+ in the second regex only matches counts with more than one digit, because of the + sign. The regex above changes the + to a * on the second part, allowing it to be empty, and thus matching 0:0:1 as well as 0:0:10.

After the fix:

/ #  wait_for.sh job pi
Waiting for job pi...
Waiting for job pi...

/ #  wait_for.sh job-we pi
[2020-07-03 08:59:20] job pi is ready.
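A quick standalone check of the corrected expression with GNU or BSD sed (no cluster needed); with the * in place, both a single-digit and a multi-digit count now match the second pattern and are rewritten to 1:

for counts in 0:0:1 0:0:10; do
  echo "$counts" | sed -nE \
    -e 's/^[1-9][[:digit:]]*:[[:digit:]]+:[[:digit:]]+$/1/p' \
    -e 's/^0:[[:digit:]]+:[1-9][[:digit:]]*$/1/p'
done
# prints "1" twice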
