lwolf / kube-cleanup-operator Goto Github PK

Kubernetes Operator to automatically delete completed Jobs and their Pods

License: MIT License

Makefile 4.34% Go 91.81% Dockerfile 1.10% Mustache 2.75%

kube-cleanup-operator's Introduction

Kubernetes cleanup operator

Kubernetes Controller to automatically delete completed Jobs and Pods. Controller listens for changes in Pods and Jobs and acts accordingly with config arguments.

Some common use-case scenarios:

Delete Jobs and their pods after their completion
Delete Pods stuck in a Pending state
Delete Pods in Evicted state
Delete orphaned Pods (Pods without an owner in non-running state)

flag name	pod	job
delete-successful-after	delete after specified period if owned by the job	delete after specified period
delete-failed-after	delete after specified period if owned by the job	delete after specified period
delete-orphaned-pods-after	delete after specified period (any completion status)	N/A
delete-evicted-pods-after	delete on discovery	N/A
delete-pending-pods-after	delete after specified period	N/A

Helm chart

Chart is available to install from https://charts.lwolf.org/ (https://github.com/lwolf/kube-charts)

$ helm repo add lwolf-charts http://charts.lwolf.org
"lwolf-charts" has been added to your repositories
$ helm search kube-cleanup
NAME                              	CHART VERSION	APP VERSION	DESCRIPTION
lwolf-charts/kube-cleanup-operator	1.0.0        	v0.8.1     	Kubernetes Operator to automatically delete completed Job...

Usage

# remember to change namespace in RBAC manifests for monitoring namespaces other than "default"

kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment/rbac.yaml

# create deployment
kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment/deployment.yaml


kubectl logs -f $(kubectl get pods --namespace default -l "run=cleanup-operator" -o jsonpath="{.items[0].metadata.name}")

# Use simple job to test it
kubectl create -f https://k8s.io/examples/controllers/job.yaml

Docker images

docker pull quay.io/lwolf/kube-cleanup-operator

or you can build it yourself as follows:

$ docker build .

Development

$ make install_deps
$ make build
$ ./bin/kube-cleanup-operator -run-outside-cluster -dry-run=true

Usage

Pre v0.7.0

    $ ./bin/kube-cleanup-operator --help
    Usage of ./bin/kube-cleanup-operator:
      -namespace string
            Watch only this namespace (omit to operate clusterwide)
      -run-outside-cluster
            Set this flag when running outside of the cluster.
      -keep-successful
            the number of hours to keep a successful job
            -1 - forever 
            0  - never (default)
            >0 - number of hours
      -keep-failures
            the number of hours to keep a failed job
            -1 - forever (default)
            0  - never
            >0 - number of hours
      -keep-pending
            the number of hours to keep a pending job
            -1 - forever (default)
            0  - forever
            >0 - number of hours
      -dry-run
            Perform dry run, print only

After v0.7.0

Usage of ./bin/kube-cleanup-operator:
  -delete-evicted-pods-after duration
        Delete pods in evicted state (golang duration format, e.g 5m), 0 - never delete (default 15m0s)
  -delete-failed-after duration
        Delete jobs and pods in failed state after X duration (golang duration format, e.g 5m), 0 - never delete
  -delete-orphaned-pods-after duration
        Delete orphaned pods. Pods without an owner in non-running state (golang duration format, e.g 5m), 0 - never delete (default 1h0m0s)
  -delete-pending-pods-after duration
        Delete pods in pending state after X duration (golang duration format, e.g 5m), 0 - never delete
  -delete-successful-after duration
        Delete jobs and pods in successful state after X duration (golang duration format, e.g 5m), 0 - never delete (default 15m0s)
  -dry-run
        Print only, do not delete anything.
  -ignore-owned-by-cronjobs
        [EXPERIMENTAL] Do not cleanup pods and jobs created by cronjobs
  -keep-failures int
        Number of hours to keep failed jobs, -1 - forever (default) 0 - never, >0 number of hours (default -1)
  -keep-pending int
        Number of hours to keep pending jobs, -1 - forever (default) >0 number of hours (default -1)
  -keep-successful int
        Number of hours to keep successful jobs, -1 - forever, 0 - never (default), >0 number of hours
  -legacy-mode true
        Legacy mode: true - use old `keep-*` flags, `false` - enable new `delete-*-after` flags (default true)
  -listen-addr string
        Address to expose metrics. (default "0.0.0.0:7000")
  -namespace string
        Limit scope to a single namespace
  -run-outside-cluster
        Set this flag when running outside of the cluster.
  -label-selector
        Delete only jobs and pods that meet label selector requirements. #See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

Optional parameters

DISCLAIMER: These parameters are not supported on this project since they are implemented by the underlying libraries. Any malfunction regarding the use them is not covered by this GitHub repository. They are included in this documentation since the debugging process is simplified.

-alsologtostderr
  log to standard error as well as files
-log_backtrace_at value
  when logging hits line file:N, emit a stack trace
-log_dir string
  If non-empty, write log files in this directory
-logtostderr
  log to standard error instead of files
-vmodule value
  comma-separated list of pattern=N settings for file-filtered logging

kube-cleanup-operator's People

Stargazers

Watchers

Forkers

imraghava jimmy-chuang ngocngv dvasilen lphiri aland-zhang perflytics doubleswirve mailtruck mangoweb kamarajuprathi albertoclarit nmnellis okumin nnasretdinov kurkop hamedhsn airhorns haghabozorgi wilbeibi jiphex aledbf gl0wa gtseres sears-harlem125 yimingli1994 roserocket rewardinsight tripledogdare gravitational sortigoza staranto gabrielnicolasavellaneda leongett rverma-nikiai rschumann gioamato devorbitus supernova106 srhegdegadikai caruccio justinrlee vincentmei1734 safanaj erharb virajs gkrizek thomasdupas freeman-virtual-helium evanohq artem-zinnatullin jdbaldry mrjackdavis sergeylanzman ashuparekh getoutreach dmilan77 freakst sherry-ummen divya063 zhu733756 meantrix lefthander igorbelitei haarchri atulsudhalkar ivaanko sindrig rocketraccoon talbright shaderecker fgeorgeanybox alyragab bahaeddine iomarcovalente tdongsi integratepl toughlama mhkyle sankalpverma fr4nkb3rt shinomineko chiragkyal ashutoshnirkhe intelius waynejohnn adelowo rushikch penguintoast guoyiang syedsalman3753 adrianohf reddymh zentavr iq-scm jiayu997 winrouter ghub-user mrvarmazyar btoll

kube-cleanup-operator's Issues

helm chart

Is there the official helm chart for this operator ?

add support for OIDC auth provider

We use Google's OpenID connect to validate tokens, need to add support in order to run outside the cluster

Request for deleting pods stuck terminating

Would it be possible to add the feature to delete pods that are stuck in a terminating state?

Thanks!

cleanup also pods completed

hello, is it possible to get more function that we can also cleanup pods wich are completed for configurable time ?

K8s alpha feature TTLAfterFinished is a good alternative

This is not really an issue, more of a helpful comment.

Thank you for a great project - it's been very helpful!

Issues:
We've been using this controller for a while, and it's been great for the most part.
We had issues with pods that would complete due to other reasons than normal completion, such as OOMKilled. When that happens, the job would schedule a replacement pod immediately - which is as we want it. But because the cleanup-operator would react on the pod going into to the status of OOMKilled, it would go in and delete it and it's parent job, which would leave the replacement pod without a reference to a job and therefore never become a candidate for cleanup ever again.
Agreed, pod's shouldn't die from OOM issues often, but it so happens that we had that happen a lot and therefore a lot of orphaned pods just filling up.

The new solution:
However, we recently tried the TTLAfterFinished alpha feature and it works absolutely amazing and has taken over the previous dependency we had on this controller.

Who is it useful for?
It's not possible to use for those running managed kubernetes, as it requires you to enable the feature flag in your cluster, but for those running their own clusters, it's a really good solution.

Does not remove jobs/pods that were created before Cleanup Operator was running

I have clusters with lots of completed jobs across several namespaces. When I run the cleanup-operator, I expected it to remove the old jobs that were already on the cluster. Instead it doesn't and only removes jobs that were ran and finished after cleanup-operator was running.

Is this expected behavior?

Keep parameters in minutes

invalid value "0.1" for flag -keep-successful: parse error
invalid value "0,1" for flag -keep-successful: parse error

I need these jobs for only ~10 minutes. Is it possible?

Resource requests and limits should be defined in the deployment manifest

The Kubernetes scheduler will be able to make better decisions on scheduling the kube-cleanup-operator if resource requests and limits were defined for CPU and memory. This needs to be done in the deployment.yaml. I would be happy to add those if you could provide some intuition around how many resources the operator needs.

Requested memory for deployment manifest is not high enough for large clusters

When running this deployment with the given manifest (https://github.com/lwolf/kube-cleanup-operator/blob/master/deploy/deployment.yaml) on larger clusters, this runs out of memory:

State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

My cluster that this failed on had ~3500 pods.

Bumping up to 100Mi seemed to work for me.

I'd recommend either changing the manifest, or leaving a comment by the resources that mentions the memory limit should be bumped up on larger clusters.

Add support for annotation-based customizations

There is a use case when jobs/pods need to be "cleaned up" after a different timeout.
It should be possible to support it by considering annotations on pods/jobs before making a "clean up" decision.

#71 (comment)

Expose cleanup Operator Metrics /metrics to be monitored by prometheus

To be able to monitor Cleanup Operator by Prometheus, we need to expose its metrics.

Problems running helm

Hi,
I am a bit confused, so the issue might be above the keyboard. I do need the namespace support so would like to use the helm cleanup operator. It runs great with kubectl apply with the rbac and deployment yaml files.
But when I install the helm chart from the helm repo with "helm install cleanup lwolf-charts/kube-cleanup-operator"
I get:
W1102 05:21:49.379093 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work. 2021/11/02 05:21:49 Controller started... 2021/11/02 05:21:49 Listening at 0.0.0.0:7000 2021/11/02 05:21:49 Listening for changes... E1102 05:21:49.588274 1 reflector.go:178] github.com/lwolf/kube-cleanup-operator/pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:default:cleanup-kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope

I ALSO tried helm install from cloned git repo, then the error is "1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector." same happened even if I added affinity to the node tags.

Thanks

Pod xxxx was not created by a job, ignoring.

He Sergey,
i always get the message:

Pod xxxx was not created by a job, ignoring.

We running kubernetes 1.9.6 on google container engine.
Our job manifest f.e.:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: behat
    chart: behat-1.0
.....

a describe of a succeeded pod

Name:           behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32-8qxpp
Namespace:      default
Labels:         app=behat
                controller-uid=52ae430b-3bd1-11e8-9134-42010a840081
                job-name=behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32
.....
Status:         Succeeded
Controlled By:  Job/behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32
......

    State:          Terminated
      Reason:       Completed

do you have any clues why the problem here occurs?

Best,
Marcel

Environment variable --keep-successful doesn't work

I'm testing this project with Kubernetes V. 1.10.6-gke.2, when I use --keep-successful=0 works, but, when I use --keepsccessful=1 only works in the first run.

Note: I change this variable edited the deployemnt with kubectl edit deploy cleanup-operator and adding kubectl edit deploy cleanup-operator to args.

Add more consistency to command line flags

Just a short suggestion: Almost every command line flag of the operator is written out completely - only the listen-addr is abbreviated. For more consistency i would suggest to also define it fully written out, e.g. listen-address.

Migrate from quay.io to github repository

Remove restartCount check

Hi,
Currently pods with restart count > 0 will not be handled.

I have few issues with this check:

It is only for the first container in the pod, but this can be easily checked.
I am not sure restart count is relevant, now that you can keep success/failed pods/jobs
As is, it causes index out of range when a new job is added and container was not started yet.

So I suggest to remove this check.

Pending pods don't get cleaned up

Pods in the pending state get never cleaned up.

How to reproduce:

Configure kube-cleanup-operator to clean up pending pods:

...
-delete-pending-pods-after=3m
...

Launch a pod that will never get scheduled (tested with a job and with a standalone pod).
For instance:

...
nodeSelector:
  feature: dummy
...

Wait until the retention is due.

Expected: the pending pod is deleted
Actual result: the pod is not deleted

Current configuration as logged:

2020/06/23 16:12:31 Provided options: 
	namespace: test-cleanup
	dry-run: false
	delete-successful-after: 3m0s
	delete-failed-after: 3m0s
	delete-pending-after: 3m0s
	delete-orphaned-after: 3m0s
	delete-evicted-after: 3m0s

	legacy-mode: false
	keep-successful: 0
	keep-failures: -1
	keep-pending: -1

move docker images away from docker

Clean up orphaned jobs

When the cluster scales down, some pods are deleted because their nodes are deleted. If the jobs controlling those pods have completed, they stick around and never get deleted. As a result, kubectl commands and the dashboard UI slow down over time.

This operator should include those “orphaned” jobs when it does its periodic cleanup.

Pulling from helm chart kube-cleanup-operator-1.0.3.tgz file not found

We are getting error "Error: invocation of kubernetes:helm:template returned an error: failed to generate YAML for specified Helm chart: failed to pull chart: failed to fetch http://charts.lwolf.org/kube-cleanup-operator-1.0.3.tgz : 404 Not Found". When we check in the helm search repo kube-cleanup its exist (attached images). While running pulumi preview we are getting error (attached images).
Could you please look into this issue.

Thanks,
Ankita

can't install helm chart by using absolute URL

Hi @lwolf

I am able to fetch/install helm chart by adding the repository, followed by helm fetch like this,

$ helm repo add test https://charts.lwolf.org/
"test" has been added to your repositories
$ helm repo update
$ helm fetch test/kube-cleanup-operator --version 1.0.1

But if I try to do by using absolute URL (either the chart repo one or github project one), it doesn't work -

$ helm fetch https://charts.lwolf.org/kube-cleanup-operator-1.0.0.tar.gz
Error: no cached repo found. (try 'helm repo update'): open C:\Users\ASHUTO~1.NIR\AppData\Local\Temp\helm\repository\stable-index.yaml: The system cannot find the file specified.

$ helm fetch https://github.com/lwolf/kube-charts/kube-cleanup-operator-1.0.0.tar.gz
Error: no cached repo found. (try 'helm repo update'): open C:\Users\ASHUTO~1.NIR\AppData\Local\Temp\helm\repository\stable-index.yaml: The system cannot find the file specified.

I need absolute URL to work, as our automation depends on that. Am I missing something or the URL is incorrect ?
Thanks for your help in advance!!

fix CVE-2023-39325: Update package golang.org/x/net to the 0.17.0 version

https://nvd.nist.gov/vuln/detail/CVE-2023-39325

x/net/http needs updating

Need support for priorityClass and containerSecurityContext

The kube-cleanup-operator chart shall support providing container level securityContext and priorityClass for the pod.

I can raise a PR for this if you are good @lwolf :)

Replace travis-ci with github-actions

PR Submission

Hi Team,

I have done code changes for the below tasks.

Pod(s) which are stuck in Terminating state and requires graceful delete based on age/time
Pod(s) which are in Error/ContainerStatusUnknown/OOMKilled/Terminated/Completed(Sometimes running pod changes to completed due to node re-creation/preemptive nodes ) based on age/time

can I raise the PR for the same?

Thanks,
Raj

be able to specify different successful/failed ttl for each pod/job

Instead of using the global ttl configured in cleanup-operator, I would like to specify ttl more granual. Say, using the labels or annotations set ttl=1h on successful completion for job A, and specify ttl=10m for failed completion for job B.

I would like to do this using the labels kube-cleanup-operator/ttl-success: 1h and kube-cleanup-operator/ttl-fail: 10m.

Why is it important ?
For some releases it is very important to read and analyze logs before removing pods or jobs, but for other is it not so important. This is why in many cases you need to specify different ttl for each job, or use the default value from the cleanup-operator.

RBAC issue

E0529 09:40:00.113780 1 reflector.go:178] github.com/lwolf/kube-cleanup-operator/pkg/controller/controller.go:143: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:default:cleanup-operator" cannot list resource "jobs" in API group "batch" at the cluster scope

pls fix it in the deployment RBAC file:
`apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cleanup-operator
rules:

apiGroups:
- ""
  resources:
- pods
  verbs:
- get
- list
- watch
- delete
apiGroups: ["batch", "extensions"]
resources:
- jobs
  verbs:
- get
- list
- watch
- delete
  `

log the status of a job/pod being removed

The log statements are not so informative. Sometimes it is not clear why the pod or a job was removed. Was it successfully completed or there was a failure ? What was the ttl of a job/pod ? Would be very helpful to have this information in log statements.

pushing the docker to quay.io

I want to add a release to the Makefile to push this docker to quay.io
Any objections?

Deploy in several namespaces

Hi,

Thanks for putting the effort in making this operator. It comes very handy. I have a question:

When I create the deployment kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment.yaml it's using the default namespace.

It is ok to affirm that I have to deploy the kube-cleanup-operatorin EACH namespace I have inside my cluster, right? Or there is a way to deploy the operator ONCE for ALL namespaces I have?

Thanks,

GKE cluster autoscaler scale down issue

Great work ;)

I'm experimenting an issue whith autoscaler on GKE : autoscaler does not scale down if the cleanup-operator is running. When I delete it, autoscaler scales down quickly.

Env
GKE, kubernetes 1.10.11-gke.1, pool with autoscaling activated

I'm testing autoscaler with some empty deployment requiring resources :

apiVersion: v1
kind: Namespace
metadata:
  name: test-autoscale
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: test-autoscale
  namespace: test-autoscale
spec:
  selector:
    matchLabels:
      app: test-autoscale
  replicas: 3 # tells deployment to run 1 pods matching the template
  template:
    metadata:
      labels:
        app: test-autoscale
    spec:
      containers:
      - name: test-autoscale
        image: nginx
        # Resources limits
        resources:
          requests:
            cpu: 500m

depending on replicas count and requested cpu (and the compute instance type in the pool) autoscaler will scale up, creating new nodes.

Then I delete the deployment. Autoscaler should scale down by deleting some nodes. When cleanup-operator is running it does not.

Be careful : cluster autoscaler will scale down 10 minutes later, so it is useful to test the status with the following command (which is updated every minute I think)
kubectl describe -n kube-system configmap cluster-autoscaler-status

You will see

ScaleDown:   NoCandidates (candidates=0)

When I delete the cleanup-operator, it needs less than one minute to get

ScaleDown:   CandidatesPresent (candidates=1)

Then 10 minutes later the node is drained / deleted

I tried to use the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true" on the cleanup-operator but unsuccessfully (spec.template.metadata.annotations:)

Any idea why this cleanup-operator is blocking scaledown and how to fix it ?

validation error

MacBook-Pro:kube-cleanup-operator itru$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml
error: error validating "https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml": error validating data: found invalid field backoffLimit for v1.JobSpec; if you choose to ignore these errors, turn validation off with --validate=false

after turning off this creating job successfully
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml --validate=false
job "pi" created

Multi namespace

Hi,

Is there any way I can specify multiple namespaces where I want to have my jobs/pods cleaned? I dont want all my jobs to be cleaned cluster-wide.

  - args:
    - --namespace=ns1,ns2,ns3

Regards

Errors after helm install, "Failed to list *v1.Pod: pods is forbidden"

Just recently installed your helm chart, pretty much as is but had to add a nodeSelector because we also have windows nodes. Other than that I didn't change any values. I'm getting the following errors:

kubectl logs -f kube-cleanup-operator-6c4747d7cb-6bdrn
2023/09/29 18:02:40 Starting the application. Version: , CommitTime:
2023/09/29 18:02:40 Provided options:
	namespace:
	dry-run: false
	delete-successful-after: 15m0s
	delete-failed-after: 0s
	delete-pending-after: 0s
	delete-orphaned-after: 1h0m0s
	delete-evicted-after: 15m0s
	ignore-owned-by-cronjobs: false

	legacy-mode: true
	keep-successful: 0
	keep-failures: -1
	keep-pending: -1
	label-selector:

2023/09/29 18:02:40
!!! DEPRECATION WARNING !!!
	 Operator is running in `legacy` mode. Using old format of arguments. Please change the settings.
	`keep-successful` is deprecated, use `delete-successful-after` instead
	`keep-failures` is deprecated, use `delete-failed-after` instead
	`keep-pending` is deprecated, use `delete-pending-after` instead
 These fields are going to be removed in the next version

W0929 18:02:40.798716       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023/09/29 18:02:40 Controller started...
2023/09/29 18:02:40 Listening at 0.0.0.0:7000
2023/09/29 18:02:41 Listening for changes...
E0929 18:02:41.835542       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:43.121566       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:46.228673       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:51.557849       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:03:00.762475       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:03:19.001154       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:04:02.187343       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope

Couple of questions:

What is missing that's not allowing it to list pods?
Why is the default to run in legacy mode if it's deprecated?
How do I tell it to monitor all namespaces, it's not clear from the docs?

Allow delay in job cleanup

As it stands, the job gets cleaned up immediately upon completion.

I would like to place a configurable delay on this, for instance 10 minute delay would allow a script watching for job completion to see it's job complete. A 1 hour delay might allow a manually initiated job to be seen by the issuer.

compatibility with k8s 1.9+

The kubernetes.io/created-by annotation is no longer added to controller-created objects. Use the metadata.ownerReferences item with controller set to true to determine which controller, if any, owns an object.

Add support for new type of metadata

add multiarch docker build

Add support for multiarch docker manifests

Cleanup Operator tries to remove pod twice

I'm following your example in the README. I can get cleanup-operator running just fine, but I'm seeing a weird problem where it seems like it's trying to remove the job and pod twice.

After the cleanup-operator was running, I simply ran:

kubectl create -f https://k8s.io/examples/controllers/job.yaml

After it completes, I see this in the log:

2019/08/30 15:08:18 Controller started...
2019/08/30 15:08:18 Listening for changes...

2019/08/30 15:40:44 Deleting pod 'pi-xrm7p'
2019/08/30 15:40:44 Deleting job 'pi'
2019/08/30 15:40:44 Deleting pod 'pi-xrm7p'
2019/08/30 15:40:44 failed to delete job pi: pods "pi-xrm7p" not found
2019/08/30 15:40:44 Deleting job 'pi'

I can confirm there is only 1 job and 1 pod so I have no idea why it would be trying twice like that.

I'm running on AWS EKS with Kube 1.12. Thanks!

helmchart not work

E1227 08:28:34.712122 1 reflector.go:178] pkg/controller/controller.go:154: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:default:cleanup-operator" cannot list resource "pods" in API group "" in the namespace "default"

problem using flags

hey guys,
i'm trying to understand how to use the flags:
'delete-successful-after' or any other flags.
where i should add it in the kubernetes deployment manifest as flag to setup custom value?

Thanks a lot !

404 job.yml

The example job is giving me 404:

https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml

kube-cleanup-operator does not work on GKE

I followed the steps in the readme file, but faced an issue when running the binary:

make install_deps
make build
./bin/kube-cleanup-operator --help --> successful
./bin/kube-cleanup-operator --run-outside-cluster --namespace=default -dry-run --> error

The error is the following:

panic: No Auth Provider found for name "gcp"

goroutine 1 [running]:
main.main()
	/Users/george/go/src/github.com/lwolf/kube-cleanup-operator/cmd/main.go:42 +0x8e2

I have managed to overcome this issue by replacing the following line in main.go:

_ "k8s.io/client-go/plugin/pkg/client/auth/oidc"

_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"

Am I doing something wrong?

Jobs are not getting deleted

Brief

kube-clean-up deletes pods for jobs but unable to delete job itself.
Logs shows:

[timestamp] Deleting job '<job_name>'
[timestamp] Deleting pod '<pod_name>'

Expected Behavior

Delete job and pod both

More context

Managed Kubernets: Yes (EKS on AWS)
Kubernetes Version: 1.15

Is there an option to delete pods like these as well?

I noticed the switch case here:

https://github.com/lwolf/kube-cleanup-operator/blob/master/pkg/controller/controller.go#L104

Would adding any of the other statuses (e.g., PodUnknown) allow for "Error" pods to be deleted?

https://github.com/kubernetes/api/blob/dc0dd48d5a5cae9f8736bb0643cfe6052e450f1b/core/v1/types.go#L2374

Apologies if this is out of scope for this project. Would greatly appreciate if you had any recommendations on how to delete "Error" pods (and maybe I can try implementing in my fork). Thanks again!

lwolf / kube-cleanup-operator Goto Github PK

kube-cleanup-operator's Introduction

Kubernetes cleanup operator

Helm chart

Usage

Docker images

Development

Usage

Optional parameters

kube-cleanup-operator's People

Stargazers

Watchers

Forkers

kube-cleanup-operator's Issues

Brief

Expected Behavior

More context

Recommend Projects

Recommend Topics

Recommend Org