airwallex / k8s-pod-restart-info-collector

Automated troubleshooting of Kubernetes Pods issues. Collect K8s pod restart reasons, logs, and events automatically.

Dockerfile 0.74% Shell 15.80% Go 71.83% Mustache 11.63%
automation collector golang k8s kubernetes kubernetes-controller monitoring pods restart troubleshooting

k8s-pod-restart-info-collector's People

Contributors

able8, awx-devops-admin, jan-brychta

k8s-pod-restart-info-collector's Issues

Sending a PR with a new feature.

Hey folks,

I've modified the code a bit for our use case (a fairly large gaming company). How should I go about sending the changes back to you so that everything stays in sync?

Motivation -
Some companies would prefer to send Slack alerts only for specific applications. For example, I may only be interested in failing pods that belong to critical applications, the ones for which we send "on-call alerts". Everything else can be ignored. There is no option to do that right now.

What's done?
In the Helm values.yaml, users can now supply the labels they want monitored. A new function, "NewControllerWithLabels", does everything "NewController" does, except that it only sends a message to Slack if the restarting pod has that label key on it.

This bypasses the "ignoredNamespace" and "ignoredPod" functions and relies only on the label key supplied in values.yaml.

This way, users can either:

  1. Use the Helm chart as it is built right now, with the additional option to ignore namespaces / pods
  2. Or simply supply the label keys they want to alert on (which disables the ignored namespaces / pods features).

I am still testing it in our environment. I am not sure how to proceed: should I send the code back as a PR, and is there somebody I can review it with?
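
The label-key check described in this issue could look roughly like the sketch below. It is not the code from the proposed PR: the function name `shouldAlert` and the label key `oncall-alert` are hypothetical, and only the presence of the key is tested, as the issue describes.

```go
package main

import "fmt"

// shouldAlert reports whether a pod carries at least one of the watched
// label keys. Only the key is checked, not its value. An empty watch list
// matches nothing. (Hypothetical helper, not part of the project.)
func shouldAlert(podLabels map[string]string, watchedKeys []string) bool {
	for _, key := range watchedKeys {
		if _, ok := podLabels[key]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical label key, standing in for a value read from values.yaml.
	watched := []string{"oncall-alert"}

	critical := map[string]string{"app": "payments", "oncall-alert": "true"}
	batch := map[string]string{"app": "nightly-report"}

	fmt.Println("critical pod alerts:", shouldAlert(critical, watched))
	fmt.Println("batch pod alerts:", shouldAlert(batch, watched))
}
```

Checking only the key keeps the configuration short; matching on key and value would be a small extension if finer control is needed.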

Add ability to define containerSecurityContext

Hi,

I run a cluster that has a policy engine on it that forbids insecure pods/containers.

Currently there is a way to define a pod security context, but not a container security context.

Can we add this in please? It just needs to be a new line in the container spec.

This is what I require:

podSecurityContext:
  runAsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 2000
  seccompProfile:
    type: RuntimeDefault

containerSecurityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  seccompProfile:
    type: RuntimeDefault
  capabilities:
    drop: ["ALL"]

Send alerts to multiple Slack channels with 'modern' (not legacy) webhooks

Hi,
At the moment, it is possible to override the Slack channel via a label/annotation in your application.

Only legacy Slack webhooks can post to multiple channels through one webhook URL via the channel parameter.

Slack apps and "modern" incoming webhooks have a specific URL mapped to a specific channel, so there is no way to override the channel.

Would it be difficult to add some kind of option (mapping might be the better word) that maps the annotation (or Slack channel)
alert-slack-channel: "your-slack-channel-name" to a specific webhook URL in the config?

Something like:

WebhookURLMapping:
  default: https://hooks.foo.bar/xxxxxxx
  my_second_channel: https://hooks.foo.bar/yyyyyyyy

I understand that I can use legacy webhooks for now, but they will eventually stop working.
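
A lookup for the proposed mapping could be sketched as follows. This is only an illustration of the request, not existing project behaviour: `resolveWebhook` and the `default` fallback key are assumptions, and the URLs are the placeholders from the example above.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultKey is the assumed name of the fallback entry in the mapping.
const defaultKey = "default"

// resolveWebhook returns the webhook URL configured for the given channel.
// If the channel has no entry of its own, the "default" entry is used.
// An error is returned when neither exists. (Hypothetical helper.)
func resolveWebhook(mapping map[string]string, channel string) (string, error) {
	if channel != "" {
		if url, ok := mapping[channel]; ok {
			return url, nil
		}
	}
	if url, ok := mapping[defaultKey]; ok {
		return url, nil
	}
	return "", errors.New("no webhook URL for channel and no default configured")
}

func main() {
	// Placeholder URLs taken from the WebhookURLMapping example in the issue.
	mapping := map[string]string{
		"default":           "https://hooks.foo.bar/xxxxxxx",
		"my_second_channel": "https://hooks.foo.bar/yyyyyyyy",
	}

	for _, channel := range []string{"my_second_channel", "unknown-channel", ""} {
		url, err := resolveWebhook(mapping, channel)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("channel %q -> %s\n", channel, url)
	}
}
```

Falling back to a default entry means pods without the annotation, or with an unmapped channel name, still produce an alert somewhere.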

Sending to Slack channel failed with failed to post webhook: Post "https://hooks.slack.com/services/***": x509: certificate is valid for *.github.com...

After installation completed, I ran into this problem during testing.

Sending to Slack channel failed with failed to post webhook: Post "https://hooks.slack.com/services/***": x509: certificate is valid for *.github.com...

From what I have read, it may be a time zone problem. However, there is no way to change the time zone in the devopsairwallex/k8s-pod-restart-info-collector container: I don't have permission to enter the container and I can't install any tools. Is there any other solution, or is there an image that allows sudo?

I0102 03:32:06.934618       1 controller.go:69] Ignore: metallb-system/speaker-6mwfj restartCount: 7714 > 30
I0102 03:32:12.356664       1 controller.go:64] Update: metallb-system/speaker-6mwfj
┌──[[email protected]]-[~/ansible/hook]
└─$date
Tue Jan  3 11:00:37 CST 2023
/ # date -s "22:12:00"
date: can't set date: Operation not permitted
Tue Jan  3 22:12:00 UTC 2023
/ # date
Tue Jan  3 03:12:24 UTC 2023
/ #
/ # apk add -U tzdata
fetch https://mirrors.aliyun.com/alpine/v3.15/main/x86_64/APKINDEX.tar.gz
SSL certificate subject doesn't match host mirrors.aliyun.com
ERROR: https://mirrors.aliyun.com/alpine/v3.15/main: Permission denied
.....

Google Chat integration

Hi Team,
I want to integrate alerts with Google Chat instead of Slack.
Slack is not used in my organisation, so I would like a solution for Google Chat integration.
Thanks in advance,
Vishal

[Feature Request] Ignore a set of namespaces and specific pods

Thanks for the great tool!

In my case, some system pods from DaemonSets restart as expected while the Node is still being initialized.
It would be useful to have a way to ignore a set of namespaces or, even better, specific pods via a label selector.

Release newest version

Heya - quick one - can we have a release of the latest master please?

I want to use the regex functionality in watchNamespaces, but it requires a new version.

Thanks!

Add backticks to format slack message nicely

Hello! Can we add backticks to the sent message? It would help a lot with readability and copy-paste.

Namely, these fields:
image

so it should be

cluster: `cluster name`, pod: `pod name`
rather than
cluster: cluster name, pod: pod name

[Feature Request] Support Existing Secret

Within the values.yaml I would like to be able to reference a pre-existing secret to define my variables (I set them via a secrets manager for encryption/gitops). Something like how Grafana allows it:

existingSecret: "kube-prometheus-stack-grafana"

Helm Chart Repo Created

Thanks for making this open-source, nice work!

I didn't see this available as a Helm chart repository, so I've added it to my repository in case it helps someone:

https://github.com/reefland/helm-charts/tree/main/charts/apps/pod-restart-info-collector

helm repo add reefland https://reefland.github.io/helm-charts
helm repo update
helm install pod-restart-info-collector reefland/k8s-pod-restart-info-collector

I also created an ArgoCD application deployment, which references the above Helm Repository:
https://github.com/reefland/ansible-k3s-argocd-renovate/tree/master/_extra_apps/pod-restart-info-collector


The Slack WebHook works perfectly with other Slack compatible services. I used it with Mattermost:

image

Detailed Logs in the Alert are really handy:
image

watchNamespaces instead of ignoredNamespaces

Hi
I think watchNamespaces would be more usable than ignoredNamespaces. With an ignore list, every time we add or drop a namespace we have to update the collector's config. Could you add a mode in which the collector watches only the listed namespaces?
Or a mode in which the collector works without a ClusterRole, in a single namespace only.
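
A watch list of this kind could be checked along the lines of the sketch below. It does not show how the project implements watchNamespaces: the helper names are hypothetical, the patterns are treated as regular expressions that must match the whole namespace name, and an empty list is assumed to mean "watch everything".

```go
package main

import (
	"fmt"
	"regexp"
)

// compilePatterns compiles each namespace pattern and anchors it so that
// it must match the whole namespace name. (Hypothetical helper; the
// anchoring is an assumption of this sketch.)
func compilePatterns(patterns []string) ([]*regexp.Regexp, error) {
	compiled := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		// Validate the pattern on its own before wrapping it.
		if _, err := regexp.Compile(p); err != nil {
			return nil, fmt.Errorf("invalid namespace pattern %q: %w", p, err)
		}
		re, err := regexp.Compile("^(?:" + p + ")$")
		if err != nil {
			return nil, fmt.Errorf("invalid namespace pattern %q: %w", p, err)
		}
		compiled = append(compiled, re)
	}
	return compiled, nil
}

// isWatched reports whether the namespace matches any pattern.
// Assumption: an empty pattern list means every namespace is watched.
func isWatched(namespace string, patterns []*regexp.Regexp) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, re := range patterns {
		if re.MatchString(namespace) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical watch list.
	patterns, err := compilePatterns([]string{"prod-.*", "payments"})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	for _, ns := range []string{"prod-eu", "payments", "payments-test", "kube-system"} {
		fmt.Printf("%s watched: %v\n", ns, isWatched(ns, patterns))
	}
}
```

Compiling the patterns once at startup surfaces configuration mistakes early and avoids recompiling on every pod update.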

[BUG] container resource specs showing wrong values

If a pod has multiple containers:

  1. we loop over the container statuses
  2. then we update the container specs using container.spec
  3. this is wrong because the container statuses and the containers in container.spec do not always correspond one-to-one
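
One way to avoid the mismatch described above is to look up each status's spec by container name rather than by position. The sketch below uses simplified stand-in types, not the real Kubernetes API structs, and `specFor` is a hypothetical helper; it relies on the fact that a container status carries the same name as its container in the pod spec.

```go
package main

import "fmt"

// Container is a simplified stand-in for a container entry in a pod spec.
type Container struct {
	Name        string
	MemoryLimit string
}

// ContainerStatus is a simplified stand-in for a container status entry.
type ContainerStatus struct {
	Name         string
	RestartCount int32
}

// specFor finds the spec whose name matches the status, instead of
// assuming that statuses and specs share the same index.
func specFor(status ContainerStatus, specs []Container) (Container, bool) {
	for _, c := range specs {
		if c.Name == status.Name {
			return c, true
		}
	}
	return Container{}, false
}

func main() {
	specs := []Container{
		{Name: "app", MemoryLimit: "512Mi"},
		{Name: "sidecar", MemoryLimit: "64Mi"},
	}
	// Statuses deliberately listed in a different order than the specs.
	statuses := []ContainerStatus{
		{Name: "sidecar", RestartCount: 3},
		{Name: "app", RestartCount: 0},
	}

	for _, st := range statuses {
		spec, ok := specFor(st, specs)
		if !ok {
			fmt.Printf("%s: no matching spec\n", st.Name)
			continue
		}
		fmt.Printf("%s: restarts=%d memoryLimit=%s\n", st.Name, st.RestartCount, spec.MemoryLimit)
	}
}
```

For pods with many containers, building a map from name to spec once would avoid the repeated linear scan.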
