alert-manager's People

Contributors

carlyjiang, ccpeng, dependabot[bot], lilwan, mnkg561, phanipadala, ravihari, sasagarw, tekenstam

alert-manager's Issues

Should Alert Name be unique in Wavefront?

Is this a BUG REPORT or FEATURE REQUEST?:
Question

What happened:
Research whether the alert name must be unique in Wavefront. If it must, we need to construct the Wavefront alert name inside the code, possibly allowing a pattern such as prefix.namespace_name.alertsconfig_name.wavefrontalert_name, to make sure it is unique.
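
If uniqueness turns out to be required, the name could be assembled in code along the lines of the pattern above. This is only a sketch (the helper name, separator, and ordering are assumptions, not the project's actual code), and a real implementation would also have to respect Wavefront's alert-name length limits.

package controllers

import "fmt"

// uniqueAlertName builds a Wavefront alert name from the pattern suggested
// above: prefix.namespace_name.alertsconfig_name.wavefrontalert_name.
func uniqueAlertName(prefix, namespace, alertsConfigName, wavefrontAlertName string) string {
    return fmt.Sprintf("%s.%s.%s.%s", prefix, namespace, alertsConfigName, wavefrontAlertName)
}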

What you expected to happen:
Alerts should not overwrite each other just because they share the same name.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Alertsconfig status error description is not reset after correction

Is this a BUG REPORT or FEATURE REQUEST?:
bug

What happened:
Alertsconfig status error description is not reset after correcting the alert

What you expected to happen:
Alertsconfig status error description should be reset after correcting the alert

How to reproduce it (as minimally and precisely as possible):

  1. Create an alertsconfig
  2. Update the alertsconfig with an error case, for example change the severity from warn to warning
  3. Check the alertsconfig status: the state is Error and errorDescription has the details
  4. Fix the error case
  5. The alertsconfig status state is Ready, but the errorDescription is not cleaned up

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):
I have done a couple of things to debug this issue:

  1. I added the following code to reset the error description after here:
    alertStatus.ErrorDescription = ""
  2. The function should update the alertStatus, which it did for other fields (e.g. state, requestChecksum) but not ErrorDescription. I deployed my change and ran the test, but ErrorDescription was not cleaned up.
  3. So I was thinking maybe the field was not updated (the DB needs some time to reflect the change for that field) and it got overwritten again inside the function.
  4. To verify that, I changed my code to alertStatus.ErrorDescription = "good", and the status got updated with the expected value "good". That means the update succeeds when we change to a different string, but it does not work for an empty string.
  5. The workaround I applied was to remove the omitempty tag from the ErrorDescription field, and it worked.
  6. We should debug this further; it worked for an integer (retryCount), so it could be something related to string behavior (see the sketch after this list).
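
A minimal, runnable sketch of the suspected root cause: with omitempty, an empty string is dropped from the serialized status, so a merge patch never tells the API server to clear the old value, while a non-empty string like "good" survives serialization. Removing omitempty (the workaround in item 5) makes the empty value explicit. The structs below are stand-ins, not the project's actual types.

package main

import (
    "encoding/json"
    "fmt"
)

// Stand-ins for the status type, with and without omitempty on ErrorDescription.
type withOmitempty struct {
    State            string `json:"state"`
    ErrorDescription string `json:"errorDescription,omitempty"`
}

type withoutOmitempty struct {
    State            string `json:"state"`
    ErrorDescription string `json:"errorDescription"`
}

func main() {
    a, _ := json.Marshal(withOmitempty{State: "Ready", ErrorDescription: ""})
    b, _ := json.Marshal(withoutOmitempty{State: "Ready", ErrorDescription: ""})
    fmt.Println(string(a)) // {"state":"Ready"} -- the empty field is dropped, so a merge patch cannot clear it
    fmt.Println(string(b)) // {"state":"Ready","errorDescription":""} -- the empty value is sent and clears the field
}
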
- controller logs:

$ kubectl logs

controller stuck when applying wavefront alert cr

Is this a BUG REPORT or FEATURE REQUEST?:
bug

What happened:
When I tried to apply/create a new wavefront alert CR, the wavefront alert controller got stuck at the following phase:

2021-08-05T22:27:11.438Z	INFO	controllers.wavefrontalert_controller.Reconcile	Start of the request	{"request_id": "5d1edbf5-f5f6-4a2c-a5c9-8de25d829fb1", "wavefrontalert_cr": "alert-manager-system/wavefrontalert-sample-3"}
2021-08-05T22:27:11.439Z	INFO	controllers.wavefrontalert_controller.Reconcile	New wavefront alert resource. Adding the finalizer	{"request_id": "5d1edbf5-f5f6-4a2c-a5c9-8de25d829fb1", "wavefrontalert_cr": "alert-manager-system/wavefrontalert-sample-3", "finalizer": "wavefrontalert.finalizers.alertmanager.keikoproj.io"}



Restarting the controller resolves the issue.

What you expected to happen:
The wavefront alert controller should not be stuck

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Minutes and resolveAfterMinutes data type

Is this a BUG REPORT or FEATURE REQUEST?:
Feature

What happened:
Minutes and resolveAfterMinutes are declared as int since Wavefront allows only integer values. This is fine for single alert creation, but if users want to parameterize these fields with a Go template, kubectl or the OpenAPI validation will throw an error, since a Go template expression such as {{ .foo }} is a string and not an integer.

Should we allow Minutes and resolveAfterMinutes as a custom StringOrInt data type? Maybe; depending on the requirements we can figure it out.
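
One hedged option, assuming the project can lean on apimachinery's existing IntOrString type (which marshals either form): declare the fields as IntOrString so a template placeholder passes validation, then convert to an int after substitution. The field names below mirror the spec shown in other issues, but this is only a sketch, not the project's actual types.

package v1alpha1

import "k8s.io/apimachinery/pkg/util/intstr"

// Sketch only: with IntOrString, both minutes: 50 and a templated string value
// such as minutes: "{{ .minutes }}" can be accepted; the controller converts
// the value to an int after template execution.
type wavefrontAlertSpecSketch struct {
    Minutes             intstr.IntOrString `json:"minutes"`
    ResolveAfterMinutes intstr.IntOrString `json:"resolveAfterMinutes"`
}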

What you expected to happen:
Minutes and resolveAfterMinutes can also be parameterized using a Go template, with AlertsConfig supplying the data.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Add support for threshold alert type in WaveFront

Is this a BUG REPORT or FEATURE REQUEST?:
FEATURE REQUEST

What happened:

What you expected to happen:
Add support for threshold alert type in WaveFront

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Support parallelism in alert controller for alert creation

Is this a BUG REPORT or FEATURE REQUEST?:
FEATURE

What happened:
Currently the alert controller reconciles alerts one at a time, which is very time consuming when creating a large number of alerts.

What you expected to happen:
Parallelism can be added to improve the performance of the controller and reduce the waiting time.
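
controller-runtime already supports concurrent workers per controller, so one low-effort option is raising MaxConcurrentReconciles when the controller is registered. The reconciler type, stub Reconcile, and import path for the API package below are assumptions about this repo's layout; only the controller-runtime calls are the real API.

package controllers

import (
    "context"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/controller"

    alertmanagerv1alpha1 "github.com/keikoproj/alert-manager/api/v1alpha1"
)

// WavefrontAlertReconciler is a stand-in for the project's reconciler type.
type WavefrontAlertReconciler struct{}

func (r *WavefrontAlertReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    return ctrl.Result{}, nil // the real reconcile logic lives in the project
}

// Sketch: register the controller with up to 5 concurrent workers; each worker
// still reconciles one WavefrontAlert at a time, but different CRs are
// processed in parallel.
func (r *WavefrontAlertReconciler) SetupWithManager(mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&alertmanagerv1alpha1.WavefrontAlert{}).
        WithOptions(controller.Options{MaxConcurrentReconciles: 5}). // value is illustrative
        Complete(r)
}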

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Support Global Variables in AlertsConfig

Is this a BUG REPORT or FEATURE REQUEST?:
Feature request

What happened:
We might have to support global variables in AlertsConfig, since there may be variables that are needed in every config section (for example: cluster name, BU, region, etc.). This raises an interesting challenge: if a global variable changes we should update every alert, even though some configs might not use global variables at all.

We can start by updating all alerts whenever a global variable changes, and eventually figure out a way to exclude alerts whose config sections do not use any global variables.

What you expected to happen:
Users can use the global variables section to provide a value that is needed across many configs, instead of repeating the value in each and every config section.
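
A minimal sketch of the substitution order this implies, assuming per-config params should override globals; the function name and map shapes are illustrative, based on the plain string params shown in the AlertsConfig examples elsewhere in these issues.

package controllers

// mergeParams starts from the global variables and lets each config section's
// own params override them before template substitution.
func mergeParams(globalParams, configParams map[string]string) map[string]string {
    merged := make(map[string]string, len(globalParams)+len(configParams))
    for k, v := range globalParams {
        merged[k] = v
    }
    for k, v := range configParams {
        merged[k] = v // a per-config value wins over the global one
    }
    return merged
}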

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Provide an option for Alert check frequency

Is this a BUG REPORT or FEATURE REQUEST?:

Feature Request

What happened:

By default, every alert created in Wavefront checks its condition at a 1-minute frequency, but there may be situations where users want flexibility for low-priority alerts, such as checking every 5 minutes instead of every minute.

What you expected to happen:
Users should have the flexibility to provide the check frequency, while the default remains 1 minute.
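
A hedged sketch of what the spec addition could look like. The field name checkFrequencyMinutes, its mapping onto the Wavefront API, and the defaulting helper are all assumptions, not the project's actual design.

package v1alpha1

// Sketch only: an optional per-alert check frequency, defaulting to the
// current 1-minute behaviour when the field is left unset.
type alertFrequencySketch struct {
    // +optional
    CheckFrequencyMinutes int `json:"checkFrequencyMinutes,omitempty"`
}

// checkFrequencyOrDefault returns 1 (today's default) when the user did not
// ask for a different frequency.
func checkFrequencyOrDefault(s alertFrequencySketch) int {
    if s.CheckFrequencyMinutes <= 0 {
        return 1
    }
    return s.CheckFrequencyMinutes
}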

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Alerts Count in Alerts Config status

Is this a BUG REPORT or FEATURE REQUEST?:
Feature

What happened:
Figure out a way to track the alerts count in the AlertsConfig status. This should indicate the total number of alerts created by a particular AlertsConfig.

What you expected to happen:
AlertConfig status should show the total count of alerts

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:
Should we report the total count or only the alerts that were created successfully?
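
A small sketch of how both numbers could be derived from the existing alertStatus map, so the project can decide whether alertsCount means all alerts or only Ready ones. The per-alert status type here is a stand-in for the real one.

package controllers

// perAlertStatusSketch stands in for the real per-alert status entry.
type perAlertStatusSketch struct {
    State string
}

// alertCounts returns the total number of alerts tracked for an AlertsConfig
// and how many of them are currently Ready.
func alertCounts(statuses map[string]perAlertStatusSketch) (total, ready int) {
    for _, s := range statuses {
        total++
        if s.State == "Ready" {
            ready++
        }
    }
    return total, ready
}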

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Update go version to 1.18

Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request.

What happened:
New MacBooks with the M1 chip break with Go 1.15 and 1.16.

What you expected to happen:
Updating the Go version to 1.18 will avoid many of these problems.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Unable to patch alertsconfig status due to status.retryCount: Required value

Is this a BUG REPORT or FEATURE REQUEST?:
bug

What happened:
When we first create an AlertsConfig in production with the latest controller version, we see this error:

2021-10-14T22:50:48.364Z	ERROR	controllers.common.common.PatchStatus	Unable to patch the status	{"request_id": "e74683ff-8689-44ec-8e99-9e452803d67f", "status": "Ready", "error": "AlertsConfig.alertmanager.keikoproj.io \"ip-iksm-ppd-usw2-k8s.alertsconfig\" is invalid: status.retryCount: Required value"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
github.com/keikoproj/alert-manager/controllers/common.(*Client).PatchStatus
	/workspace/controllers/common/common.go:111
github.com/keikoproj/alert-manager/controllers/common.(*Client).PatchWfAlertAndAlertsConfigStatus
	/workspace/controllers/common/common.go:213
github.com/keikoproj/alert-manager/controllers.(*AlertsConfigReconciler).Reconcile
	/workspace/controllers/alertsconfig_controller.go:175
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
I1014 22:50:48.364420       1 event.go:282] Event(v1.ObjectReference{Kind:"AlertsConfig", Namespace:"dev-iksmanager-usw2-prd", Name:"ip-iksm-ppd-usw2-k8s.alertsconfig", UID:"e2afc5c9-b4fe-45c0-9529-06e95d5b348a", APIVersion:"alertmanager.keikoproj.io/v1alpha1", ResourceVersion:"514478316", FieldPath:""}): type: 'Warning' reason: 'Error' Unable to patch status due to error AlertsConfig.alertmanager.keikoproj.io "ip-iksm-ppd-usw2-k8s.alertsconfig" is invalid: status.retryCount: Required value

Alerts were created successfully, but the AlertsConfig status was not updated due to the above error.

What you expected to happen:
We should not see this error while creating a new alert.
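
The error indicates the generated CRD schema treats status.retryCount as a required field, so a status patch that omits it is rejected. One possible fix, as an assumption rather than the project's actual change, is to mark the field optional so controller-gen stops requiring it, while keeping the plain json tag so an explicit 0 can still be written.

package v1alpha1

// Sketch only: the +optional marker removes retryCount from the CRD's required
// list; leaving out omitempty keeps an explicit 0 in patches (see the related
// ErrorDescription issue above for why omitempty can be a trap here).
type alertsConfigStatusSketch struct {
    // +optional
    RetryCount int `json:"retryCount"`
}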

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Abstract status update to be reusable

Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request

What happened:
Wavefront Alert controller skeleton implementation

What you expected to happen:

  1. Add event support for all the error conditions (Update Status); see the status-patch sketch after this list
  2. Handle requeue in case of error
  3. Get K8s Secret for Wavefront API calls
  4. Support Go Template execution
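
A minimal sketch of a reusable status-patch helper on top of controller-runtime's status writer. The function name and signature are illustrative, not the actual common.Client API referenced in the stack traces elsewhere in these issues.

package common

import (
    "context"

    "sigs.k8s.io/controller-runtime/pkg/client"
)

// patchStatus applies a merge patch to the status subresource only: callers
// deep-copy the object, mutate its status, and pass both versions in. Any CRD
// (WavefrontAlert or AlertsConfig) can reuse it, which also gives one place to
// emit events and decide on requeue behaviour.
func patchStatus(ctx context.Context, c client.Client, original, modified client.Object) error {
    return c.Status().Patch(ctx, modified, client.MergeFrom(original))
}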

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Update all the alerts if there is a change in Non-exportedParams section

Is this a BUG REPORT or FEATURE REQUEST?:
Feature request

What happened:
There could be scenarios where a user updates the WavefrontAlert CR in a way that is not related to exportedParams, which means the change needs to be applied to all the Wavefront alerts created from this template.

For example:

apiVersion: alertmanager.keikoproj.io/v1alpha1
kind: WavefrontAlert
metadata:
  name: wavefrontalert-sample2
spec:
  # Add fields here
  alertType: CLASSIC
  alertName: test-alert3
  condition: avg(ts({{ .foo}}))
  displayExpression: ts(status.health)
  minutes: 50
  resolveAfterMinutes: 5
  severity: "{{.bar}}"
  exportedParams:
    - foo
    - bar
  tags:
    - test-alert
    - something-weird

If there is a change in displayExpression, that change should be applied to all the Wavefront alerts.

What you expected to happen:
If there is a change in the non-exportedParams section, the change must be applied to all the Wavefront alerts, since nothing can be done from the AlertsConfig side to pick up that change.
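
One hedged way to detect such a change (a sketch; the controller already stores a lastChangeChecksum per alert, as the status snippets in other issues show): hash the whole spec, compare it against the stored value, and when it differs requeue every alert created from this template.

package controllers

import (
    "crypto/md5"
    "encoding/hex"
    "encoding/json"
)

// specChecksum hashes the full WavefrontAlert spec, so a change to any
// non-exportedParams field (for example displayExpression) yields a new
// checksum that no stored alert will match, triggering an update for all of
// them.
func specChecksum(spec interface{}) (string, error) {
    raw, err := json.Marshal(spec)
    if err != nil {
        return "", err
    }
    sum := md5.Sum(raw)
    return hex.EncodeToString(sum[:]), nil
}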

How to reproduce it (as minimally and precisely as possible):
Change the displayExpression in the above CR and observe that the change is not reflected in any of the alerts on the Wavefront side.

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Update WavefrontAlert status when AlertConfig creates an alert

Is this a BUG REPORT or FEATURE REQUEST?:
Feature

What happened:
If an alert is parameterized using a Go template in the WavefrontAlert spec, the controller does not proceed further; instead it sets a status indicating that the template is ReadyTobeUsed. When an AlertsConfig substitutes values to create the alerts, we update the Wavefront alert id and link in the AlertsConfig status. However, if the WavefrontAlert spec changes without changing exportedParams, we may need to update all the alerts that use this spec. So we need to include the AlertsConfig CR information in the WavefrontAlert status to link those configs.

What you expected to happen:
The WavefrontAlert status should reflect the associated AlertsConfigs.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Add LastUpdatedTimeStamp and WavefrontAlert generation as a version

Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request

What happened:
When we make any change to a WavefrontAlert or AlertsConfig, there is no easy way to see whether the new change has been applied unless we follow the Wavefront link, since none of the status fields change (alertID, link, etc.). This can make it confusing to know whether a particular change has been applied or not.

There are a couple of things we can do about it.

  1. Add LastUpdatedTimeStamp as a new field in type AlertStatus struct
  2. Add the WavefrontAlert generation as one more field in type AssociatedAlert struct

Make sure to update these fields whenever we update the status in the successful scenario (and maybe in the error scenario too?).
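
A sketch of the two proposed fields. The surrounding field names come from the status examples in other issues, but the exact types and placement here are assumptions.

package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Sketch of AlertStatus with the proposed timestamp; existing fields such as
// id, link, state, and lastChangeChecksum are omitted for brevity.
type alertStatusSketch struct {
    LastUpdatedTimestamp metav1.Time `json:"lastUpdatedTimestamp,omitempty"`
}

// Sketch of AssociatedAlert carrying the WavefrontAlert generation, acting as
// a version marker for which template revision produced this alert.
type associatedAlertSketch struct {
    CR         string `json:"CR,omitempty"`
    Generation int64  `json:"generation,omitempty"`
}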

What you expected to happen:
By looking at the status, users should be able to see which WavefrontAlert template was last applied and when the last change happened.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Standalone alert shouldn't be allowed to use in AlertsConfig

Is this a BUG REPORT or FEATURE REQUEST?:
Bug

What happened:
As part of the demo/testing we referenced a standalone alert in an AlertsConfig and it created a Wavefront alert. The expectation is that it should error out if the WavefrontAlert does not have any exportedParams fields; otherwise it is just a duplicate, since nothing differs from the original alert already created by the standalone WavefrontAlert.

What you expected to happen:
It should error out, stating that there is no exportedParams field in the WavefrontAlert.
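
A minimal validation sketch, assuming the AlertsConfig reconciler can inspect the referenced WavefrontAlert before creating anything; the function name and error text are illustrative.

package controllers

import "fmt"

// validateTemplate rejects a WavefrontAlert that exposes no exportedParams:
// using it from an AlertsConfig could only produce a duplicate of the
// standalone alert.
func validateTemplate(wavefrontAlertName string, exportedParams []string) error {
    if len(exportedParams) == 0 {
        return fmt.Errorf("WavefrontAlert %q has no exportedParams and cannot be used from an AlertsConfig", wavefrontAlertName)
    }
    return nil
}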

How to reproduce it (as minimally and precisely as possible):

  1. Create a standalone Wavefront alert
  2. Reference it in one of the AlertsConfigs

This results in 2 alerts in Wavefront without any differences (duplicates).

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Handle wavefrontalert deletion when it is associated with alertsconfig

Is this a BUG REPORT or FEATURE REQUEST?:
feature

What happened:
We can have a WavefrontAlert with exported params associated with multiple AlertsConfigs. If we then delete this WavefrontAlert, the current code deletes the individual alerts in Wavefront but does not update the AlertsConfigs.

What you expected to happen:
When the WavefrontAlert template is deleted, we should update the associated AlertsConfigs as well.

How to reproduce it (as minimally and precisely as possible):

  1. Create a wavefrontalert template with exported params and associate it with an alertsconfig.
  2. Delete this wavefrontalert.
  3. The wavefront alert is deleted from Wavefront.
  4. The alertsconfig still has it in the status.

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Handle individual alert deletion in AlertsConfig

Is this a BUG REPORT or FEATURE REQUEST?:
Feature

What happened:
When one of the config sections in an AlertsConfig is removed, make sure to delete that alert in Wavefront and update the status in the AlertsConfig as well as the WavefrontAlert.

What you expected to happen:
Individual alerts must be cleaned up if their config section is removed in the latest spec.
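
A sketch of one way to find the alerts to delete, assuming the AlertsConfig status keeps a map keyed by alert name (as the alertStatus snippets in other issues show) and the spec lists the desired alert names.

package controllers

// removedAlerts returns the alert names that are tracked in status but no
// longer present in the spec; those are the alerts to delete in Wavefront
// before pruning them from both statuses.
func removedAlerts(specAlertNames []string, statusAlertIDs map[string]string) []string {
    desired := make(map[string]bool, len(specAlertNames))
    for _, name := range specAlertNames {
        desired[name] = true
    }
    var toDelete []string
    for name := range statusAlertIDs {
        if !desired[name] {
            toDelete = append(toDelete, name)
        }
    }
    return toDelete
}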

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Support default values in Wavefront Alert for exportedParams

Is this a BUG REPORT or FEATURE REQUEST?:
Feature Request

What happened:
There might be situations where an exportedParams value is common to most of the configs (say 95 out of 100 alert instances); it would be tedious to provide that value in each and every cluster config. Instead, support default values in the WavefrontAlert CR so that, in the example above, the 95 cluster configs do not have to provide that value.

What you expected to happen:
An AlertsConfig does not have to provide a value when a default is provided in the parent WavefrontAlert.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Alerts in WavefrontAlert status should be in Map instead of list

Is this a BUG REPORT or FEATURE REQUEST?:
Feature

What happened:
For faster O(1) access, we should use a map in the status instead of a list. This is already implemented in the AlertsConfig status.
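
A sketch of the shape change, keyed by alert name to match the AlertsConfig status; the entry type is a stand-in for the real one.

package v1alpha1

// statusEntrySketch stands in for the real per-alert status entry.
type statusEntrySketch struct {
    ID    string `json:"id,omitempty"`
    Link  string `json:"link,omitempty"`
    State string `json:"state,omitempty"`
}

// Before (sketch): finding an alert means scanning the slice.
type wavefrontAlertStatusListSketch struct {
    Alerts []statusEntrySketch `json:"alerts,omitempty"`
}

// After (sketch): lookups by alert name are O(1), mirroring AlertsConfig.
type wavefrontAlertStatusMapSketch struct {
    Alerts map[string]statusEntrySketch `json:"alerts,omitempty"`
}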

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Events are not created when alert manager is unable to create the alert due to error

Is this a BUG REPORT or FEATURE REQUEST?:
Bug

What happened:
When an invalid value is passed as a param (e.g. a severity value of "warning" instead of "warn"), the controller does not create the Wavefront alert and only shows the error in the controller logs. Currently we do not see events generated for the error.

What you expected to happen:
We should see the error message in the events when we run kubectl get events.
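
A minimal sketch of emitting a Warning event from the reconciler with client-go's EventRecorder; only the recorder call is the real API, while the helper name, reason string, and where the recorder lives are assumptions.

package controllers

import (
    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/tools/record"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

// recordAlertError publishes the failure as a Warning event on the CR, so it
// shows up in `kubectl get events` and `kubectl describe`, alongside the
// existing controller log line.
func recordAlertError(recorder record.EventRecorder, obj client.Object, err error) {
    recorder.Event(obj, corev1.EventTypeWarning, "AlertCreationFailed", err.Error())
}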

How to reproduce it (as minimally and precisely as possible):
In the AlertsConfig, pass an invalid severity value such as "warning" and try to apply it.

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Alertsconfig status state doesn't change from "Error" to "Ready" when corrected the invalid severity param value

Is this a BUG REPORT or FEATURE REQUEST?:
Bug

What happened:
When an invalid value is passed as a param (e.g. a severity value of "warning" instead of "warn"), the CR status is properly updated with the error information as expected. But when we then send the valid value "warn", the controller does not pick up the request, since the checksum is not considered modified compared to the previous invalid request.

What you expected to happen:
The state should change to "Ready" when a request with a valid severity value follows one with an invalid value.
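
A sketch of one possible guard, assuming the reconciler currently skips work when the spec checksum matches the stored one: an Error state should force another attempt even if the checksum bookkeeping says nothing changed.

package controllers

// shouldReconcile skips work only when the spec is unchanged AND the previous
// attempt succeeded; a stored Error state always triggers reprocessing.
func shouldReconcile(specChecksum, storedChecksum, state string) bool {
    if state == "Error" {
        return true
    }
    return specChecksum != storedChecksum
}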

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Creating multiple alerts for one alertsconfig errors on the first attempt

Is this a BUG REPORT or FEATURE REQUEST?:
bug

What happened:
I tried to create two alerts for one alertsconfig

apiVersion: alertmanager.keikoproj.io/v1alpha1
kind: AlertsConfig
metadata:
  name: cluster-test-1
spec:
  # Add fields here
    globalGVK:
      group: alertmanager.keikoproj.io
      version: v1alpha1
      kind: WavefrontAlert
    alerts:
      - alertName: wavefrontalert-pod-restart-sample2
        params:
          env: preprod
          count: "10"
          severity: warn
      - alertName: wavefrontalert-pod-restart-sample3
        params:
          env: preprod
          count: "20"
          severity: warn

The wavefrontalerts are installed:

MTVL16092f2af:Downloads lwan3$ k get wavefrontalerts
NAME                                 AGE
wavefrontalert-pod-restart-sample2   132m
wavefrontalert-pod-restart-sample3   132m

The first alert was created successfully, but the second failed with the following error:

2021-08-20T20:50:59.598Z	ERROR	controllers.alertsconfig_controller.PatchIndividualAlertsConfigError	error occured in alerts config for alert name	{"request_id": "e8708100-6f3e-4f5e-a324-a0eff6020973", "alertsConfig_cr": "cluster-test-1", "namespace": "alert-manager-system", "alertName": "wavefrontalert-pod-restart-sample3", "error": "server returned 400 Bad Request\n{\"status\":{\"result\":\"ERROR\",\"message\":\"Alert query has taken too long to execute. Please refine your query for faster execution time, for example by including fewer series.\",\"code\":400}}\n"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:132
github.com/keikoproj/alert-manager/controllers.(*AlertsConfigReconciler).PatchIndividualAlertsConfigError
	/workspace/controllers/alertsconfig_controller.go:239
github.com/keikoproj/alert-manager/controllers.(*AlertsConfigReconciler).Reconcile
	/workspace/controllers/alertsconfig_controller.go:177
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:235
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:198
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133
k8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:185
k8s.io/apimachinery/pkg/util/wait.UntilWithContext
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:99
I0820 20:50:59.598445       1 event.go:282] Event(v1.ObjectReference{Kind:"AlertsConfig", Namespace:"alert-manager-system", Name:"cluster-test-1", UID:"d0a9308d-5546-4ac0-9ba4-0b2989a55e48", APIVersion:"alertmanager.keikoproj.io/v1alpha1", ResourceVersion:"8723375", FieldPath:""}): type: 'Warning' reason: 'server returned 400 Bad Request
{"status":{"result":"ERROR","message":"Alert query has taken too long to execute. Please refine your query for faster execution time, for example by including fewer series.","code":400}}
' unable to create the alert

AlertsConfig status:

status:
  alertStatus:
    wavefrontalert-pod-restart-sample2:
      alertName: pod-restart-2
      associatedAlert:
        CR: wavefrontalert-pod-restart-sample2
      id: "1629492595288"
      lastChangeChecksum: 9b6bf561cdaee1ae709a076250e51285
      link: https://intuit.wavefront.com/alerts/1629492595288
      state: Ready
    wavefrontalert-pod-restart-sample3:
      alertName: ""
      associatedAlert: {}
      id: ""
      state: Error
  alertsCount: 0
  retryCount: 1
  state: Error

Retry worked:

status:
  alertStatus:
    wavefrontalert-pod-restart-sample2:
      alertName: pod-restart-2
      associatedAlert:
        CR: wavefrontalert-pod-restart-sample2
      id: "1629492595288"
      lastChangeChecksum: 9b6bf561cdaee1ae709a076250e51285
      link: https://intuit.wavefront.com/alerts/1629492595288
      state: Ready
    wavefrontalert-pod-restart-sample3:
      alertName: pod-restart-3
      associatedAlert:
        CR: wavefrontalert-pod-restart-sample3
      id: "1629492786383"
      lastChangeChecksum: ecf7c719852a172da42509e448fee7b7
      link: https://intuit.wavefront.com/alerts/1629492786383
      state: Ready
  alertsCount: 0
  retryCount: 0
  state: Ready

What you expected to happen:
The alerts should be created successfully on the first attempt.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Fix resources names inside `config/` to create correct resources during `make deploy`

Is this a BUG REPORT or FEATURE REQUEST?:
This is a bug.

What happened:
A few resources such as the ConfigMap and Role referenced inside kustomize already have alert-manager- as a prefix, but kustomize additionally prefixes the names with alert-manager-, so the prefix gets attached twice during deployment. For example, one ConfigMap is named:
alert-manager-alert-manager-configmap

What you expected to happen:
It should create a resource named alert-manager-configmap.

How to reproduce it (as minimally and precisely as possible):
100% reproducible. Just run make deploy and check the ConfigMap name.

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs

Alerts Config Status: RetryCount is not set to 0 after resolving the errors

Is this a BUG REPORT or FEATURE REQUEST?:
Bug

What happened:
We had an invalid alert value in the spec, which resulted in an error state in the AlertsConfig status. After correcting the value, RetryCount did not reset to 0.

What you expected to happen:
RetryCount should reset to 0.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:

  • alert-manager version
  • Kubernetes version :
$ kubectl version -o yaml

Other debugging information (if applicable):

- controller logs:

$ kubectl logs
