lwolf / kube-cleanup-operator

Kubernetes Operator to automatically delete completed Jobs and their Pods

License: MIT License


kube-cleanup-operator's Introduction

Kubernetes cleanup operator


Kubernetes controller that automatically deletes completed Jobs and Pods. The controller watches for changes to Pods and Jobs and acts according to its configuration flags.

Some common use-case scenarios:

  • Delete Jobs and their pods after their completion
  • Delete Pods stuck in a Pending state
  • Delete Pods in Evicted state
  • Delete orphaned Pods (Pods without an owner in non-running state)
flag name                  | pod                                                    | job
---------------------------|--------------------------------------------------------|------------------------------
delete-successful-after    | delete after specified period if owned by the job      | delete after specified period
delete-failed-after        | delete after specified period if owned by the job      | delete after specified period
delete-orphaned-pods-after | delete after specified period (any completion status)  | N/A
delete-evicted-pods-after  | delete on discovery                                    | N/A
delete-pending-pods-after  | delete after specified period                          | N/A
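
These flags are set as container arguments in the operator's Deployment manifest. A minimal sketch (the retention values below are illustrative, not project defaults):

# Container spec excerpt for the cleanup-operator Deployment (illustrative values)
spec:
  template:
    spec:
      containers:
      - name: cleanup-operator
        image: quay.io/lwolf/kube-cleanup-operator
        args:
        - --delete-successful-after=15m
        - --delete-failed-after=1h
        - --delete-pending-pods-after=30m
        - --delete-evicted-pods-after=15m
        - --delete-orphaned-pods-after=1h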

Helm chart

The chart is available from https://charts.lwolf.org/ (source: https://github.com/lwolf/kube-charts).

$ helm repo add lwolf-charts http://charts.lwolf.org
"lwolf-charts" has been added to your repositories
$ helm search kube-cleanup
NAME                              	CHART VERSION	APP VERSION	DESCRIPTION
lwolf-charts/kube-cleanup-operator	1.0.0        	v0.8.1     	Kubernetes Operator to automatically delete completed Job...

Usage

(demo screencast)

# remember to change namespace in RBAC manifests for monitoring namespaces other than "default"

kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment/rbac.yaml

# create deployment
kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment/deployment.yaml


kubectl logs -f $(kubectl get pods --namespace default -l "run=cleanup-operator" -o jsonpath="{.items[0].metadata.name}")

# Use simple job to test it
kubectl create -f https://k8s.io/examples/controllers/job.yaml

Docker images

docker pull quay.io/lwolf/kube-cleanup-operator

or you can build it yourself as follows:

$ docker build .

Development

$ make install_deps
$ make build
$ ./bin/kube-cleanup-operator -run-outside-cluster -dry-run=true

Usage

Pre v0.7.0

    $ ./bin/kube-cleanup-operator --help
    Usage of ./bin/kube-cleanup-operator:
      -namespace string
            Watch only this namespace (omit to operate clusterwide)
      -run-outside-cluster
            Set this flag when running outside of the cluster.
      -keep-successful
            the number of hours to keep a successful job
            -1 - forever 
            0  - never (default)
            >0 - number of hours
      -keep-failures
            the number of hours to keep a failed job
            -1 - forever (default)
            0  - never
            >0 - number of hours
      -keep-pending
            the number of hours to keep a pending job
            -1 - forever (default)
            0  - forever
            >0 - number of hours
      -dry-run
            Perform dry run, print only

After v0.7.0

Usage of ./bin/kube-cleanup-operator:
  -delete-evicted-pods-after duration
        Delete pods in evicted state (golang duration format, e.g 5m), 0 - never delete (default 15m0s)
  -delete-failed-after duration
        Delete jobs and pods in failed state after X duration (golang duration format, e.g 5m), 0 - never delete
  -delete-orphaned-pods-after duration
        Delete orphaned pods. Pods without an owner in non-running state (golang duration format, e.g 5m), 0 - never delete (default 1h0m0s)
  -delete-pending-pods-after duration
        Delete pods in pending state after X duration (golang duration format, e.g 5m), 0 - never delete
  -delete-successful-after duration
        Delete jobs and pods in successful state after X duration (golang duration format, e.g 5m), 0 - never delete (default 15m0s)
  -dry-run
        Print only, do not delete anything.
  -ignore-owned-by-cronjobs
        [EXPERIMENTAL] Do not cleanup pods and jobs created by cronjobs
  -keep-failures int
        Number of hours to keep failed jobs, -1 - forever (default) 0 - never, >0 number of hours (default -1)
  -keep-pending int
        Number of hours to keep pending jobs, -1 - forever (default) >0 number of hours (default -1)
  -keep-successful int
        Number of hours to keep successful jobs, -1 - forever, 0 - never (default), >0 number of hours
  -legacy-mode true
        Legacy mode: true - use old `keep-*` flags, `false` - enable new `delete-*-after` flags (default true)
  -listen-addr string
        Address to expose metrics. (default "0.0.0.0:7000")
  -namespace string
        Limit scope to a single namespace
  -run-outside-cluster
        Set this flag when running outside of the cluster.
  -label-selector
        Delete only jobs and pods that meet label selector requirements. #See https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/
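
For example, -label-selector uses standard Kubernetes label-selector syntax; a sketch of restricting cleanup to explicitly opted-in resources (the label key and value below are illustrative):

# Only Jobs/Pods labeled cleanup=enabled are considered for deletion (illustrative)
args:
- --delete-successful-after=15m
- --label-selector=cleanup=enabled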

Optional parameters

DISCLAIMER: These parameters are not supported by this project; they are implemented by the underlying libraries. Any malfunction arising from their use is not covered by this repository. They are documented here only because they simplify debugging.

-alsologtostderr
  log to standard error as well as files
-log_backtrace_at value
  when logging hits line file:N, emit a stack trace
-log_dir string
  If non-empty, write log files in this directory
-logtostderr
  log to standard error instead of files
-vmodule value
  comma-separated list of pattern=N settings for file-filtered logging


kube-cleanup-operator's Issues

helm chart

Is there an official Helm chart for this operator?

cleanup also pods completed

Hello, would it be possible to add a feature so that we can also clean up Pods that have been Completed for a configurable amount of time?

K8s alpha feature TTLAfterFinished is a good alternative

This is not really an issue, more of a helpful comment.

Thank you for a great project - it's been very helpful!

Issues:
We've been using this controller for a while, and it's been great for the most part.
We had issues with pods that would complete for reasons other than normal completion, such as OOMKilled. When that happens, the job schedules a replacement pod immediately, which is what we want. But because the cleanup-operator reacts to the pod entering the OOMKilled state, it deletes the pod and its parent job, which leaves the replacement pod without a reference to a job, so it never becomes a candidate for cleanup again.
Agreed, pods shouldn't die from OOM issues often, but it happened to us a lot, so orphaned pods kept piling up.

The new solution:
However, we recently tried the TTLAfterFinished alpha feature, and it works amazingly well; it has replaced our previous dependency on this controller.

Who is it useful for?
It's not usable by those running managed Kubernetes, as it requires enabling a feature gate on the cluster, but for those running their own clusters it's a really good solution.
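
For reference, TTL-based cleanup is configured on the Job itself via spec.ttlSecondsAfterFinished; a minimal sketch (requires the TTLAfterFinished feature gate on clusters where it is not enabled by default; the job mirrors the k8s.io pi example):

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  ttlSecondsAfterFinished: 600   # delete the Job and its Pods 10 minutes after it finishes
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never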

Keep parameters in minutes

invalid value "0.1" for flag -keep-successful: parse error
invalid value "0,1" for flag -keep-successful: parse error

I need these jobs kept for only ~10 minutes. Is that possible?
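
Note: the newer delete-*-after flags accept Go duration strings, so sub-hour retention can be expressed directly; a sketch (value illustrative):

args:
- --delete-successful-after=10m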

Requested memory for deployment manifest is not high enough for large clusters

When running this deployment with the given manifest (https://github.com/lwolf/kube-cleanup-operator/blob/master/deploy/deployment.yaml) on larger clusters, this runs out of memory:

State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137

My cluster that this failed on had ~3500 pods.

Bumping up to 100Mi seemed to work for me.

I'd recommend either changing the manifest, or leaving a comment by the resources that mentions the memory limit should be bumped up on larger clusters.
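
For reference, the workaround described above corresponds to raising the container's memory settings in the Deployment, roughly like this (values illustrative, based on the 100Mi mentioned above):

# resources excerpt for the operator container (illustrative)
resources:
  requests:
    memory: 100Mi
  limits:
    memory: 100Mi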

Problems running helm

Hi,
I am a bit confused, so the issue might be above the keyboard. I need the namespace support, so I would like to use the Helm chart for the cleanup operator. It runs great with kubectl apply using the rbac and deployment YAML files.
But when I install the Helm chart from the Helm repo with "helm install cleanup lwolf-charts/kube-cleanup-operator"
I get:
W1102 05:21:49.379093 1 client_config.go:552] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2021/11/02 05:21:49 Controller started...
2021/11/02 05:21:49 Listening at 0.0.0.0:7000
2021/11/02 05:21:49 Listening for changes...
E1102 05:21:49.588274 1 reflector.go:178] github.com/lwolf/kube-cleanup-operator/pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:default:cleanup-kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope

I also tried helm install from the cloned git repo; then the error is "1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector." The same happened even when I added affinity for the node labels.

Thanks

Pod xxxx was not created by a job, ignoring.

Hey Sergey,
I always get the message:

Pod xxxx was not created by a job, ignoring.

We're running Kubernetes 1.9.6 on Google Container Engine.
Our job manifest, for example:

apiVersion: batch/v1
kind: Job
metadata:
  labels:
    app: behat
    chart: behat-1.0
.....

A describe of a succeeded pod:

Name:           behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32-8qxpp
Namespace:      default
Labels:         app=behat
                controller-uid=52ae430b-3bd1-11e8-9134-42010a840081
                job-name=behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32
.....
Status:         Succeeded
Controlled By:  Job/behat-d6048dbd-815e-4116-8bf1-71b2ae45c22b-32
......

    State:          Terminated
      Reason:       Completed

Do you have any clue why this problem occurs?

Best,
Marcel

Environment variable --keep-successful doesn't work

I'm testing this project with Kubernetes v1.10.6-gke.2. When I use --keep-successful=0 it works, but when I use --keep-successful=1 it only works on the first run.

Note: I changed this value by editing the deployment with kubectl edit deploy cleanup-operator and adding the flag to the args.

Add more consistency to command line flags

Just a short suggestion: almost every command line flag of the operator is written out in full; only listen-addr is abbreviated. For consistency, I would suggest writing it out fully as well, e.g. listen-address.

Remove restartCount check

Hi,
Currently pods with restart count > 0 will not be handled.

I have few issues with this check:

  1. It only checks the first container in the pod, though this could easily be extended.
  2. I am not sure the restart count is still relevant, now that you can keep successful/failed pods/jobs.
  3. As is, it causes an index-out-of-range panic when a new job is added and the container has not started yet.

So I suggest removing this check.

Pending pods don't get cleaned up

Pods in the Pending state never get cleaned up.

How to reproduce:

  1. Configure kube-cleanup-operator to clean up pending pods:
...
-delete-pending-pods-after=3m
...
  2. Launch a pod that will never get scheduled (tested with a job and with a standalone pod).
    For instance:
...
nodeSelector:
  feature: dummy
...
  3. Wait until the retention is due.

Expected: the pending pod is deleted
Actual result: the pod is not deleted

Current configuration as logged:

2020/06/23 16:12:31 Provided options: 
	namespace: test-cleanup
	dry-run: false
	delete-successful-after: 3m0s
	delete-failed-after: 3m0s
	delete-pending-after: 3m0s
	delete-orphaned-after: 3m0s
	delete-evicted-after: 3m0s

	legacy-mode: false
	keep-successful: 0
	keep-failures: -1
	keep-pending: -1

Clean up orphaned jobs

When the cluster scales down, some pods are deleted because their nodes are deleted. If the jobs controlling those pods have completed, they stick around and never get deleted. As a result, kubectl commands and the dashboard UI slow down over time.

This operator should include those “orphaned” jobs when it does its periodic cleanup.

Pulling from helm chart kube-cleanup-operator-1.0.3.tgz file not found

We are getting the error "Error: invocation of kubernetes:helm:template returned an error: failed to generate YAML for specified Helm chart: failed to pull chart: failed to fetch http://charts.lwolf.org/kube-cleanup-operator-1.0.3.tgz : 404 Not Found". When we check with helm search repo kube-cleanup, the chart exists (see attached images), but while running pulumi preview we get the error (see attached images).
Could you please look into this issue?


Thanks,
Ankita

can't install helm chart by using absolute URL

Hi @lwolf

I am able to fetch/install the Helm chart by adding the repository and then running helm fetch, like this:

$ helm repo add test https://charts.lwolf.org/
"test" has been added to your repositories
$ helm repo update
$ helm fetch test/kube-cleanup-operator --version 1.0.1

But if I try to do it using an absolute URL (either the chart repo one or the GitHub project one), it doesn't work:

$ helm fetch https://charts.lwolf.org/kube-cleanup-operator-1.0.0.tar.gz
Error: no cached repo found. (try 'helm repo update'): open C:\Users\ASHUTO~1.NIR\AppData\Local\Temp\helm\repository\stable-index.yaml: The system cannot find the file specified.

$ helm fetch https://github.com/lwolf/kube-charts/kube-cleanup-operator-1.0.0.tar.gz
Error: no cached repo found. (try 'helm repo update'): open C:\Users\ASHUTO~1.NIR\AppData\Local\Temp\helm\repository\stable-index.yaml: The system cannot find the file specified.

I need the absolute URL to work, as our automation depends on it. Am I missing something, or is the URL incorrect?
Thanks for your help in advance!!

PR Submission

Hi Team,

I have made code changes for the tasks below.

  1. Pod(s) that are stuck in Terminating state and require graceful deletion based on age/time
  2. Pod(s) that are in Error/ContainerStatusUnknown/OOMKilled/Terminated/Completed state (sometimes a running pod changes to Completed due to node re-creation/preemptible nodes), based on age/time

Can I raise a PR for these changes?

Thanks,
Raj

be able to specify different successful/failed ttl for each pod/job

Instead of using the global TTL configured in cleanup-operator, I would like to specify the TTL more granularly. Say, using labels or annotations, set ttl=1h for successful completion of job A, and ttl=10m for failed completion of job B.

I would like to do this using the labels kube-cleanup-operator/ttl-success: 1h and kube-cleanup-operator/ttl-fail: 10m.

Why is it important?
For some releases it is very important to read and analyze the logs before removing pods or jobs, but for others it is not so important. This is why in many cases you need to specify a different TTL for each job, or fall back to the default value from the cleanup-operator.
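
Under the proposed scheme (not an implemented feature; the label keys and values are those suggested above), a Job might be labeled like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: job-a
  labels:
    kube-cleanup-operator/ttl-success: 1h   # proposed: keep for 1h after successful completion
    kube-cleanup-operator/ttl-fail: 10m     # proposed: keep for 10m after failed completion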

RBAC issue

E0529 09:40:00.113780 1 reflector.go:178] github.com/lwolf/kube-cleanup-operator/pkg/controller/controller.go:143: Failed to list *v1.Job: jobs.batch is forbidden: User "system:serviceaccount:default:cleanup-operator" cannot list resource "jobs" in API group "batch" at the cluster scope

Please fix it in the deployment RBAC file:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cleanup-operator
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
  - watch
  - delete
- apiGroups: ["batch", "extensions"]
  resources:
  - jobs
  verbs:
  - get
  - list
  - watch
  - delete

log the status of a job/pod being removed

The log statements are not very informative. Sometimes it is not clear why a pod or a job was removed. Did it complete successfully, or was there a failure? What was the TTL of the job/pod? It would be very helpful to have this information in the log statements.

Deploy in several namespaces

Hi,

Thanks for putting in the effort to make this operator. It comes in very handy. I have a question:

  • When I create the deployment kubectl create -f https://raw.githubusercontent.com/lwolf/kube-cleanup-operator/master/deploy/deployment.yaml it's using the default namespace.

Is it correct that I have to deploy the kube-cleanup-operator in EACH namespace of my cluster, or is there a way to deploy the operator ONCE for ALL namespaces?

Thanks,

GKE cluster autoscaler scale down issue

Great work ;)

I'm experiencing an issue with the autoscaler on GKE: the autoscaler does not scale down while the cleanup-operator is running. When I delete it, the autoscaler scales down quickly.

Env
GKE, kubernetes 1.10.11-gke.1, pool with autoscaling activated

I'm testing the autoscaler with an empty deployment requesting resources:

apiVersion: v1
kind: Namespace
metadata:
  name: test-autoscale
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: test-autoscale
  namespace: test-autoscale
spec:
  selector:
    matchLabels:
      app: test-autoscale
  replicas: 3 # tells deployment to run 3 pods matching the template
  template:
    metadata:
      labels:
        app: test-autoscale
    spec:
      containers:
      - name: test-autoscale
        image: nginx
        # Resources limits
        resources:
          requests:
            cpu: 500m

Depending on the replica count and requested CPU (and the compute instance type in the pool), the autoscaler will scale up, creating new nodes.

Then I delete the deployment. The autoscaler should scale down by deleting some nodes, but when the cleanup-operator is running it does not.

Be careful: the cluster autoscaler only scales down 10 minutes later, so it is useful to check the status with the following command (which is updated every minute, I think):
kubectl describe -n kube-system configmap cluster-autoscaler-status

You will see

ScaleDown:   NoCandidates (candidates=0)

When I delete the cleanup-operator, it takes less than a minute to get

ScaleDown:   CandidatesPresent (candidates=1)

Then 10 minutes later the node is drained / deleted

I tried to use the annotation cluster-autoscaler.kubernetes.io/safe-to-evict: "true" on the cleanup-operator (under spec.template.metadata.annotations), but without success.
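
For reference, the annotation placement attempted above would look roughly like this in the operator's Deployment (a sketch):

# Deployment excerpt: annotate the operator's Pods as safe to evict
spec:
  template:
    metadata:
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "true"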

Any idea why the cleanup-operator is blocking scale-down, and how to fix it?

validation error

MacBook-Pro:kube-cleanup-operator itru$ kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml
error: error validating "https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml": error validating data: found invalid field backoffLimit for v1.JobSpec; if you choose to ignore these errors, turn validation off with --validate=false

After turning validation off, the job is created successfully:
kubectl create -f https://raw.githubusercontent.com/kubernetes/kubernetes.github.io/master/docs/concepts/workloads/controllers/job.yaml --validate=false
job "pi" created

Multi namespace

Hi,

Is there any way I can specify multiple namespaces where I want my jobs/pods cleaned up? I don't want all my jobs to be cleaned cluster-wide.

  - args:
    - --namespace=ns1,ns2,ns3

Regards

Errors after helm install, "Failed to list *v1.Pod: pods is forbidden"

I just recently installed your Helm chart, pretty much as-is, but I had to add a nodeSelector because we also have Windows nodes. Other than that I didn't change any values. I'm getting the following errors:

kubectl logs -f kube-cleanup-operator-6c4747d7cb-6bdrn
2023/09/29 18:02:40 Starting the application. Version: , CommitTime:
2023/09/29 18:02:40 Provided options:
	namespace:
	dry-run: false
	delete-successful-after: 15m0s
	delete-failed-after: 0s
	delete-pending-after: 0s
	delete-orphaned-after: 1h0m0s
	delete-evicted-after: 15m0s
	ignore-owned-by-cronjobs: false

	legacy-mode: true
	keep-successful: 0
	keep-failures: -1
	keep-pending: -1
	label-selector:

2023/09/29 18:02:40
!!! DEPRECATION WARNING !!!
	 Operator is running in `legacy` mode. Using old format of arguments. Please change the settings.
	`keep-successful` is deprecated, use `delete-successful-after` instead
	`keep-failures` is deprecated, use `delete-failed-after` instead
	`keep-pending` is deprecated, use `delete-pending-after` instead
 These fields are going to be removed in the next version

W0929 18:02:40.798716       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2023/09/29 18:02:40 Controller started...
2023/09/29 18:02:40 Listening at 0.0.0.0:7000
2023/09/29 18:02:41 Listening for changes...
E0929 18:02:41.835542       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:43.121566       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:46.228673       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:02:51.557849       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:03:00.762475       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:03:19.001154       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope
E0929 18:04:02.187343       1 reflector.go:178] pkg/controller/controller_legacy.go:135: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:kube-cleanup:kube-cleanup-operator" cannot list resource "pods" in API group "" at the cluster scope

Couple of questions:

  1. What is missing that's not allowing it to list pods?
  2. Why is the default to run in legacy mode if it's deprecated?
  3. How do I tell it to monitor all namespaces? It's not clear from the docs.

Allow delay in job cleanup

As it stands, the job gets cleaned up immediately upon completion.

I would like to place a configurable delay on this; for instance, a 10-minute delay would allow a script watching for job completion to see its job complete. A 1-hour delay might allow a manually initiated job to be seen by the issuer.

compatibility with k8s 1.9+

The kubernetes.io/created-by annotation is no longer added to controller-created objects. Use the metadata.ownerReferences item with controller set to true to determine which controller, if any, owns an object.

Add support for the new type of metadata.
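
For reference, on Kubernetes 1.9+ a Pod created by a Job carries owner metadata roughly like this (the name is illustrative; the uid is set by the API server):

metadata:
  ownerReferences:
  - apiVersion: batch/v1
    kind: Job
    name: pi
    uid: ...                    # set by the API server
    controller: true
    blockOwnerDeletion: true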

Cleanup Operator tries to remove pod twice

I'm following your example in the README. I can get cleanup-operator running just fine, but I'm seeing a weird problem where it seems like it's trying to remove the job and pod twice.

After the cleanup-operator was running, I simply ran:

kubectl create -f https://k8s.io/examples/controllers/job.yaml

After it completes, I see this in the log:

2019/08/30 15:08:18 Controller started...
2019/08/30 15:08:18 Listening for changes...

2019/08/30 15:40:44 Deleting pod 'pi-xrm7p'
2019/08/30 15:40:44 Deleting job 'pi'
2019/08/30 15:40:44 Deleting pod 'pi-xrm7p'
2019/08/30 15:40:44 failed to delete job pi: pods "pi-xrm7p" not found
2019/08/30 15:40:44 Deleting job 'pi'

I can confirm there is only 1 job and 1 pod so I have no idea why it would be trying twice like that.

I'm running on AWS EKS with Kube 1.12. Thanks!

Helm chart does not work

E1227 08:28:34.712122 1 reflector.go:178] pkg/controller/controller.go:154: Failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:default:cleanup-operator" cannot list resource "pods" in API group "" in the namespace "default"

problem using flags

hey guys,
I'm trying to understand how to use the flags ('delete-successful-after' or any of the others).
Where should I add them as flags in the Kubernetes deployment manifest to set a custom value?

Thanks a lot !
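
In case it helps, the flags are set as container args in the operator's Deployment manifest; a minimal sketch (flag value illustrative):

spec:
  template:
    spec:
      containers:
      - name: cleanup-operator
        args:
        - --delete-successful-after=1h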

kube-cleanup-operator does not work on GKE

I followed the steps in the readme file, but faced an issue when running the binary:

make install_deps
make build
./bin/kube-cleanup-operator --help --> successful
./bin/kube-cleanup-operator --run-outside-cluster --namespace=default -dry-run --> error

The error is the following:

panic: No Auth Provider found for name "gcp"

goroutine 1 [running]:
main.main()
	/Users/george/go/src/github.com/lwolf/kube-cleanup-operator/cmd/main.go:42 +0x8e2

I have managed to overcome this issue by replacing the following line in main.go:

_ "k8s.io/client-go/plugin/pkg/client/auth/oidc"

to

_ "k8s.io/client-go/plugin/pkg/client/auth/gcp"

Am I doing something wrong?

Jobs are not getting deleted

Brief

kube-cleanup-operator deletes the pods of jobs but is unable to delete the job itself.
The logs show:

[timestamp] Deleting job '<job_name>'
[timestamp] Deleting pod '<pod_name>'

Expected Behavior

Delete job and pod both

More context

Managed Kubernetes: Yes (EKS on AWS)
Kubernetes Version: 1.15

ignore namespaces

I would like the ability to ignore namespaces rather than limiting the scope to a certain namespace, and to be able to take a list as well. There could be many teams that do not want their jobs touched, so this would be a value add.

be able to configure operator to remove only jobs

I would like to be able to configure the operator to remove only completed jobs and to skip removing pods (any pods). Pods that belong to jobs will be removed automatically by the job controller.

Cleaning up pods with "Error" status

Hello -- I just deployed this tool yesterday and so far it's working pretty well. Thanks for creating this!

I did notice, however, that pods with a status of "Error" seem to stick around:

(screenshot: pods remaining in Error status)

Is there an option to delete pods like these as well?

I noticed the switch case here:

https://github.com/lwolf/kube-cleanup-operator/blob/master/pkg/controller/controller.go#L104

Would adding any of the other statuses (e.g., PodUnknown) allow for "Error" pods to be deleted?

https://github.com/kubernetes/api/blob/dc0dd48d5a5cae9f8736bb0643cfe6052e450f1b/core/v1/types.go#L2374

Apologies if this is out of scope for this project. I would greatly appreciate any recommendations on how to delete "Error" pods (and maybe I can try implementing it in my fork). Thanks again!
