autoalarm's Issues

Alarm not creating on ALB

When I set any of the following tags on an ALB, an alarm should be created with the default values, as described in the README.md.

autoalarm:alb-4xx-count-anomaly = //
autoalarm:alb-4xx-count-anomaly = /
autoalarm:alb-4xx-count-anomaly =
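The expected behavior can be sketched as a parser that falls back to defaults for every empty field. This is only an illustration: the `AnomalyTagConfig` shape, the `parseAnomalyTag` name, the field order (statistic/period/evaluation periods, inferred from values such as `Sum/300/2` elsewhere in these issues), and the default values are all assumptions, not the project's actual code.

```typescript
// Hypothetical shape of a parsed anomaly tag. The field order is an assumption.
interface AnomalyTagConfig {
  statistic: string;
  period: number;
  evaluationPeriods: number;
}

// Placeholder defaults for illustration only; the real defaults live in the README.
const ALB_4XX_ANOMALY_DEFAULTS: AnomalyTagConfig = {
  statistic: 'Sum',
  period: 300,
  evaluationPeriods: 2,
};

// Parse a slash-delimited tag value, using the default for any field that is
// missing, empty, or not a positive number.
function parseAnomalyTag(
  value: string,
  defaults: AnomalyTagConfig,
): AnomalyTagConfig {
  const parts = value.split('/').map((p) => p.trim());
  const num = (raw: string | undefined, fallback: number): number => {
    if (raw === undefined || raw === '') return fallback;
    const n = Number(raw);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };
  return {
    statistic: parts[0] ? parts[0] : defaults.statistic,
    period: num(parts[1], defaults.period),
    evaluationPeriods: num(parts[2], defaults.evaluationPeriods),
  };
}
```

Under these assumptions, the three tag values above (`//`, `/`, and the empty string) all resolve to the same default configuration.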

Network load balancer support

Presently AutoAlarm only supports application load balancers. Please add network load balancer support.

The following tags should be supported:

autoalarm:nlb-peak-packets-per-second = -/-/300/2/Sum (not created by default)
autoalarm:nlb-peak-packets-per-second-anomaly = Sum/300/2 (not created by default)

autoalarm:nlb-tcp-elb-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:nlb-tcp-elb-reset-count-anomaly = Sum/300/2 (is created by default)

autoalarm:processed-packets = -/-/300/2/Sum (not created by default)
autoalarm:processed-packets-anomaly = Sum/300/2 (is created by default)

autoalarm:newflowcount = -/-/300/2/Sum (not created by default)
autoalarm:newflowcount-anomaly = Sum/300/2 (not created by default)

autoalarm:consumed-tcus = -/-/300/2/Sum (not created by default)
autoalarm:consumed-tcus-anomaly = Sum/300/2 (not created by default)

autoalarm:tcp-target-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:tcp-target-reset-count-anomaly = Sum/300/2 (is created by default)

autoalarm:rejected-flow-count = -/-/300/2/Sum (not created by default)
autoalarm:rejected-flow-count-anomaly = Sum/300/2 (not created by default)

autoalarm:consumed-lcus = -/-/300/2/Sum (not created by default)
autoalarm:consumed-lcus-anomaly = Sum/300/2 (not created by default)

autoalarm:tcp-client-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:tcp-client-reset-count-anomaly = Sum/300/2 (is created by default)

autoalarm:processed-bytes = -/-/300/2/Sum (not created by default)
autoalarm:processed-bytes-anomaly = Sum/300/2 (not created by default)

autoalarm:port-allocation-error-count = -/1/300/1/Sum (created by default)
autoalarm:port-allocation-error-count-anomaly = Sum/300/2 (not created by default)

autoalarm:active-flow-count = -/-/300/2/Sum (not created by default)
autoalarm:active-flow-count-anomaly = Sum/300/2 (not created by default)
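A minimal sketch of how the five-field static threshold values above might be parsed. It assumes the fields are warning threshold, critical threshold, period, evaluation periods, and statistic, in that order, and that `-` means "no threshold at this severity". Neither assumption is confirmed in this issue, and `StaticTagConfig` and `parseStaticTag` are hypothetical names.

```typescript
// Hypothetical parsed form of a static threshold tag.
// A null threshold means no alarm is created at that severity.
interface StaticTagConfig {
  warningThreshold: number | null;
  criticalThreshold: number | null;
  period: number;
  evaluationPeriods: number;
  statistic: string;
}

function parseStaticTag(value: string): StaticTagConfig {
  const parts = value.split('/').map((p) => p.trim());
  if (parts.length !== 5) {
    throw new Error(`Expected 5 fields, got ${parts.length}: "${value}"`);
  }
  const [warn = '', crit = '', period = '', evals = '', statistic = ''] = parts;

  // Reject empty or non-numeric fields instead of silently coercing them.
  const toNumber = (raw: string, field: string): number => {
    const n = Number(raw);
    if (raw === '' || !Number.isFinite(n)) {
      throw new Error(`Invalid ${field}: "${raw}"`);
    }
    return n;
  };
  // Assumption: "-" disables the threshold at that severity.
  const toThreshold = (raw: string, field: string): number | null =>
    raw === '-' ? null : toNumber(raw, field);

  if (statistic === '') throw new Error('Missing statistic');

  return {
    warningThreshold: toThreshold(warn, 'warning threshold'),
    criticalThreshold: toThreshold(crit, 'critical threshold'),
    period: toNumber(period, 'period'),
    evaluationPeriods: toNumber(evals, 'evaluation periods'),
    statistic,
  };
}
```

With this reading, `-/1/300/1/Sum` yields a critical alarm at a threshold of 1 and no warning alarm, while `-/-/300/2/Sum` yields no static alarm at either severity.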

Please work with @darenmcgill to verify these are reasonable defaults.

Once the code is complete, please present it in a design review with @darenmcgill and @erikrj before release to main.

Dependencies in root package.json

You have added dependencies to the root package.json that belong in handlers/package.json. These need to be moved so they are not inherited by the CDK portion of the project. Please also ensure all of these dependencies are actually used, and remove any that are not.

"@aws-sdk/credential-provider-node": "^3.600.0",
"@aws-sdk/protocol-http": "^3.374.0",
"@aws-sdk/signature-v4": "^3.374.0",
"axios": "^1.7.2"

Implement EC2 StatusCheckFailed Alarm

When an EC2 instance is created and is not tagged with autoalarm:disabled = true, or when an EC2 instance has autoalarm:disabled removed or changed to "false", an alarm should be created to monitor the CloudWatch metric StatusCheckFailed.

When an EC2 instance is terminated, or when autoalarm:disabled is changed to true, the alarm created above should be deleted.
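The two rules above reduce to a small decision function. The sketch below is one possible shape under stated assumptions: `InstanceSnapshot`, `decideAction`, `buildStatusCheckAlarm`, the alarm name pattern, and the threshold, period, and statistic values are all hypothetical. The returned object only mirrors the field names of the CloudWatch PutMetricAlarm request; it does not call AWS.

```typescript
type AlarmAction = 'create' | 'delete';

// Hypothetical, simplified view of an instance at the time of an event.
interface InstanceSnapshot {
  instanceId: string;
  terminated: boolean;
  tags: Record<string, string>;
}

// A missing autoalarm:disabled tag is treated the same as "false".
function isDisabled(tags: Record<string, string>): boolean {
  const raw = tags['autoalarm:disabled'] ?? 'false';
  return raw.trim().toLowerCase() === 'true';
}

// Delete on termination or when disabled; otherwise create.
function decideAction(instance: InstanceSnapshot): AlarmAction {
  return instance.terminated || isDisabled(instance.tags) ? 'delete' : 'create';
}

// Plain object using PutMetricAlarm field names.
// The name pattern and numeric values are placeholders, not project defaults.
function buildStatusCheckAlarm(instanceId: string) {
  return {
    AlarmName: `AutoAlarm-EC2-${instanceId}-StatusCheckFailed`,
    Namespace: 'AWS/EC2',
    MetricName: 'StatusCheckFailed',
    Dimensions: [{ Name: 'InstanceId', Value: instanceId }],
    Statistic: 'Maximum',
    Period: 300,
    EvaluationPeriods: 1,
    Threshold: 1,
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  };
}
```

Keeping the decision separate from the AWS calls makes each use case above testable without touching CloudWatch.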

Do not worry where the alarm signal goes right now. Simply get alarm creation and deletion working based on the use cases above.

Implement RDS alarms for MySQL

We have databases that use the truemark/rds-mysql/aws module, but there is no MySQL alarms module to provide alerting for them.

Create alarms module for DynamoDB

There is a budding problem where full table scans are happening in DynamoDB tables. We wanted to block them somehow. Per , we can't disable full table scans, because the GUI requires them. After much discussion, we decided that a CloudWatch alarm on spikes in the ConsumedReadCapacityUnits metric is the best approach, and that the data systems team should be made aware.
Review this list of DynamoDB metrics and create a flexible module or library, using CDK, that can be applied to all DynamoDB tables.
Observe utilization and performance, and identify any opportunities to construct custom metrics. (Creating them is not expected in this ticket; that would be separate work, after the proposed custom metrics are reviewed.)

Centralized Alarm Configurations

We want a single place to add new alarms and to modify alarm creation options. To this end, alarm-config.mts was created and pushed to the develop branch. The following must be done to complete the conversion:

  1. Modify all modules so the alarm configurations are in MetricAlarmConfigs.
  2. Modify how Anomaly alarms work since they will follow the same conventions as normal metric alarms going forward.
  3. Add additional unit tests to alarm-config.test.mts.
  4. Update the README accordingly.
  5. Update the CHANGELOG.md entry for the upcoming release.
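To illustrate the idea behind steps 1 and 2, here is a hedged sketch of a centralized registry in which static and anomaly alarms share one structure. The actual shape of `MetricAlarmConfigs` in alarm-config.mts is not shown in this issue, so the `MetricAlarmConfig` fields and the `alarmsToCreate` helper below are hypothetical. The two sample entries reuse the port-allocation-error-count defaults from the network load balancer issue above.

```typescript
// Hypothetical entry shape: anomaly alarms use the same structure as static ones.
interface MetricAlarmConfig {
  tagKey: string;            // tag name without the "autoalarm:" prefix
  metricNamespace: string;
  metricName: string;
  anomaly: boolean;
  defaultCreate: boolean;    // create even when the tag is absent
  defaults: string;          // default tag value, in the tag's own format
}

// Single place to add or change alarms, keyed by service (illustrative data).
const MetricAlarmConfigs: Record<string, MetricAlarmConfig[]> = {
  NLB: [
    {
      tagKey: 'port-allocation-error-count',
      metricNamespace: 'AWS/NetworkELB',
      metricName: 'PortAllocationErrorCount',
      anomaly: false,
      defaultCreate: true,
      defaults: '-/1/300/1/Sum',
    },
    {
      tagKey: 'port-allocation-error-count-anomaly',
      metricNamespace: 'AWS/NetworkELB',
      metricName: 'PortAllocationErrorCount',
      anomaly: true,
      defaultCreate: false,
      defaults: 'Sum/300/2',
    },
  ],
};

// Select the alarms to manage: those created by default plus any explicitly tagged.
function alarmsToCreate(
  service: string,
  tags: Record<string, string>,
): MetricAlarmConfig[] {
  const configs = MetricAlarmConfigs[service] ?? [];
  return configs.filter(
    (c) => c.defaultCreate || `autoalarm:${c.tagKey}` in tags,
  );
}
```

With a layout like this, adding an alarm means adding one entry, and each module only needs to look up its own service key.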

Your PR must have approval from the team leads who own the alarms. I recommend you issue a single PR per module as you clean things up.

Be aware that this change breaks backward compatibility for anomaly alarms. Before these changes are merged, you are also required to present an upgrade plan that keeps existing alarms from breaking, along with a timeline for how the change will be rolled out once released.
