truemark / autoalarm
Tag based alarm generation automation
License: BSD 3-Clause "New" or "Revised" License
Extended statistic values passed in anomaly tags are not validated. Invalid values can cause calls to the CloudWatch client to fail, so anomaly alarms are not created or updated.
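One way to catch bad values before they reach CloudWatch is a small validation step. The sketch below is illustrative only: `isValidStatistic` is a hypothetical helper, not AutoAlarm's code, and it deliberately accepts only the standard statistics, `IQM`, and short-form extended statistics such as `p99` or `tm90`. Range forms such as `TM(10%:90%)` are rejected here, and the case-insensitive match and decimal limit are assumptions.

```typescript
// Standard CloudWatch statistics (case-sensitive names).
const STANDARD_STATISTICS = new Set(['SampleCount', 'Average', 'Sum', 'Minimum', 'Maximum']);

// Short-form extended statistics: a prefix followed by a number from 0 to 100,
// e.g. p99, p99.9, tm90, wm98, tc90, ts90.
// Assumption: range forms like TM(10%:90%) or PR(:300) are out of scope for this sketch.
const SHORT_EXTENDED = /^(p|tm|wm|tc|ts)(100|\d{1,2}(\.\d{1,10})?)$/i;

// Hypothetical helper: returns true when the value looks like a statistic
// that this sketch knows how to recognize.
function isValidStatistic(value: string): boolean {
  const stat = value.trim();
  if (STANDARD_STATISTICS.has(stat)) return true;
  if (stat.toUpperCase() === 'IQM') return true;
  return SHORT_EXTENDED.test(stat);
}
```

A caller could fall back to the documented default, or log and skip the alarm, whenever `isValidStatistic` returns false.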
When I set any of the following tags on an ALB, I should get an alarm created with the default values as per the README.md.
autoalarm:alb-4xx-count-anomaly = //
autoalarm:alb-4xx-count-anomaly = /
autoalarm:alb-4xx-count-anomaly =
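The expected behavior for empty or partially empty tag values can be sketched as a parser that fills every missing field from defaults. This is a minimal illustration, assuming the anomaly tag format is `Statistic/Period/EvaluationPeriods` as in values like `Sum/300/2`. The names `parseAnomalyTag`, `toPositiveInt`, and `AnomalyAlarmOptions` are hypothetical, and the defaults shown in the usage are placeholders rather than the README's values.

```typescript
interface AnomalyAlarmOptions {
  statistic: string;
  period: number;
  evaluationPeriods: number;
}

// Returns a positive integer parsed from raw, or the fallback when raw is
// missing, empty, or not a positive integer.
function toPositiveInt(raw: string | undefined, fallback: number): number {
  if (!raw) return fallback;
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Hypothetical parser: splits on "/" and substitutes the default for any
// field that is absent or blank, so "", "/" and "//" all yield the defaults.
function parseAnomalyTag(
  value: string | undefined,
  defaults: AnomalyAlarmOptions,
): AnomalyAlarmOptions {
  const [stat, period, evals] = (value ?? '').split('/').map((p) => p.trim());
  return {
    statistic: stat ? stat : defaults.statistic,
    period: toPositiveInt(period, defaults.period),
    evaluationPeriods: toPositiveInt(evals, defaults.evaluationPeriods),
  };
}

// Placeholder defaults for illustration only.
const exampleDefaults: AnomalyAlarmOptions = {statistic: 'Sum', period: 300, evaluationPeriods: 2};
const parsed = ['//', '/', ''].map((v) => parseAnomalyTag(v, exampleDefaults));
```

In this sketch a partially filled value such as `p90/60/` keeps the supplied fields and only defaults the missing evaluation periods.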
Presently AutoAlarm only supports application load balancers. Please add network load balancer support.
The following tags should be supported:
autoalarm:nlb-peak-packets-per-second = -/-/300/2/Sum (not created by default)
autoalarm:nlb-peak-packets-per-second-anomaly = Sum/300/2 (not created by default)
autoalarm:nlb-tcp-elb-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:nlb-tcp-elb-reset-count-anomaly = Sum/300/2 (is created by default)
autoalarm:processed-packets = -/-/300/2/Sum (not created by default)
autoalarm:processed-packets-anomaly = Sum/300/2 (is created by default)
autoalarm:newflowcount = -/-/300/2/Sum (not created by default)
autoalarm:newflowcount-anomaly = Sum/300/2 (not created by default)
autoalarm:consumed-tcus = -/-/300/2/Sum (not created by default)
autoalarm:consumed-tcus-anomaly = Sum/300/2 (not created by default)
autoalarm:tc-target-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:tc-target-reset-count-anomaly = Sum/300/2 (is created by default)
autoalarm:rejected-flow-count = -/-/300/2/Sum (not created by default)
autoalarm:rejected-flow-count-anomaly = Sum/300/2 (not created by default)
autoalarm:consumed-lcus = -/-/300/2/Sum (not created by default)
autoalarm:consumed-lcus-anomaly = Sum/300/2 (not created by default)
autoalarm:tcp-client-reset-count = -/-/300/2/Sum (not created by default)
autoalarm:tcp-client-reset-count-anomaly = Sum/300/2 (is created by default)
autoalarm:processed-bytes = -/-/300/2/Sum (not created by default)
autoalarm:processed-bytes-anomaly = Sum/300/2 (not created by default)
autoalarm:port-allocation-error-count = -/1/300/1/Sum (is created by default)
autoalarm:port-allocation-error-count-anomaly = Sum/300/2 (not created by default)
autoalarm:active-flow-count = -/-/300/2/Sum (not created by default)
autoalarm:active-flow-count-anomaly = Sum/300/2 (not created by default)
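To make the five-field values above concrete, here is a minimal parsing sketch. It assumes the fields are, in order, warning threshold, critical threshold, period in seconds, evaluation periods, and statistic, with `-` meaning "no threshold". That reading is inferred from the shape of the tags and is not confirmed by this issue. `parseStaticTag`, `parseThreshold`, and `StaticAlarmOptions` are hypothetical names.

```typescript
interface StaticAlarmOptions {
  warningThreshold: number | null;
  criticalThreshold: number | null;
  period: number;
  evaluationPeriods: number;
  statistic: string;
}

// "-" or an empty field means the threshold is not set (assumed convention).
function parseThreshold(raw: string): number | null {
  if (raw === '-' || raw === '') return null;
  const n = Number(raw);
  if (Number.isNaN(n)) throw new Error(`Invalid threshold: ${raw}`);
  return n;
}

// Parses a positive integer field or throws with the field name.
function parsePositiveInt(raw: string, field: string): number {
  const n = Number(raw);
  if (raw === '' || !Number.isInteger(n) || n <= 0) throw new Error(`Invalid ${field}: ${raw}`);
  return n;
}

// Hypothetical parser for values such as "-/1/300/1/Sum".
function parseStaticTag(value: string): StaticAlarmOptions {
  const parts = value.split('/').map((p) => p.trim());
  if (parts.length !== 5) {
    throw new Error(`Expected 5 fields, got ${parts.length}: "${value}"`);
  }
  const [warning, critical, period, evals, statistic] = parts;
  return {
    warningThreshold: parseThreshold(warning),
    criticalThreshold: parseThreshold(critical),
    period: parsePositiveInt(period, 'period'),
    evaluationPeriods: parsePositiveInt(evals, 'evaluation periods'),
    statistic,
  };
}
```

Under these assumptions, a value with both thresholds set to `-` describes a tag that is understood but produces no static alarm until a threshold is supplied.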
Please work with @darenmcgill to verify these are reasonable defaults.
When the code is complete, please present it in design review with @darenmcgill and @erikrj before release to main.
You have added dependencies in the root package.json which belong in the handlers/package.json. These need to be moved to avoid inheritance into the CDK portion of the project. Please also ensure all these dependencies are actually used and get rid of any that are not.
"@aws-sdk/credential-provider-node": "^3.600.0",
"@aws-sdk/protocol-http": "^3.374.0",
"@aws-sdk/signature-v4": "^3.374.0",
"axios": "^1.7.2"
When an EC2 instance is created and is not tagged with autoalarm:disabled = true, or when an EC2 instance has autoalarm:disabled removed or changed to "false", an alarm should be created to monitor the CloudWatch metric StatusCheckFailed.
When an EC2 instance is terminated or when autoalarm:disabled is changed to true the alarm previously created above should be deleted.
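The create and delete rules above can be sketched as a pure decision function plus a builder for the alarm parameters. This is an illustration under stated assumptions: `decideAlarmAction` and `statusCheckAlarmInput` are hypothetical helpers, the alarm name format, period, and evaluation periods are placeholders, and the sketch treats any tag value other than "true" as not disabled. The returned object uses field names from the CloudWatch PutMetricAlarm API but is not sent anywhere here.

```typescript
type InstanceState = 'running' | 'terminated';
type AlarmAction = 'create' | 'delete';

// Hypothetical decision helper: delete on termination or when the tag is
// "true", otherwise create (a missing tag is treated as not disabled).
function decideAlarmAction(state: InstanceState, tags: Record<string, string>): AlarmAction {
  if (state === 'terminated') return 'delete';
  const disabled = (tags['autoalarm:disabled'] ?? 'false').trim().toLowerCase();
  return disabled === 'true' ? 'delete' : 'create';
}

// Builds PutMetricAlarm-style parameters for StatusCheckFailed.
// The name format, period, and evaluation periods are placeholder choices.
function statusCheckAlarmInput(instanceId: string) {
  return {
    AlarmName: `AutoAlarm-EC2-${instanceId}-StatusCheckFailed`,
    Namespace: 'AWS/EC2',
    MetricName: 'StatusCheckFailed',
    Dimensions: [{Name: 'InstanceId', Value: instanceId}],
    Statistic: 'Maximum',
    Period: 60,
    EvaluationPeriods: 2,
    Threshold: 1,
    ComparisonOperator: 'GreaterThanOrEqualToThreshold',
  };
}
```

Keeping the decision separate from the AWS calls makes the tag logic easy to unit test before wiring it to instance state-change and tag-change events.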
Do not worry where the alarm signal goes right now. Simply get alarm creation and deletion working based on the use cases above.
We have databases that are using the truemark/rds-mysql/aws module but do not have a mysql alarms module for alerting.
There is a budding problem: full table scans are happening in DynamoDB tables. We wanted to somehow block them. Per , we can't disable full table scans, because the GUI requires them. After much discussion, we decided that a CloudWatch alarm on spikes in the ConsumedReadCapacityUnits metric is the best approach, and that the data systems team should be made aware.
Review this list of DynamoDB metrics and create a flexible module or library to apply to all DynamoDB tables, using CDK.
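As a starting point for such a library, the sketch below generates one alarm definition per table as plain objects, which a CDK construct or an SDK call could then consume. It is a simplification: it uses a static threshold on ConsumedReadCapacityUnits rather than true spike detection, and `readCapacityAlarms`, the option names, and the alarm name format are hypothetical. The threshold is left to the caller because a reasonable value depends on each table.

```typescript
interface ReadCapacityAlarmOptions {
  threshold: number; // consumed read capacity units per period; caller-supplied
  period?: number; // seconds, placeholder default of 300
  evaluationPeriods?: number; // placeholder default of 1
}

// Hypothetical helper: returns one PutMetricAlarm-style definition per table.
function readCapacityAlarms(tableNames: string[], options: ReadCapacityAlarmOptions) {
  const period = options.period ?? 300;
  const evaluationPeriods = options.evaluationPeriods ?? 1;
  return tableNames.map((tableName) => ({
    AlarmName: `DynamoDB-${tableName}-ConsumedReadCapacityUnits`,
    Namespace: 'AWS/DynamoDB',
    MetricName: 'ConsumedReadCapacityUnits',
    Dimensions: [{Name: 'TableName', Value: tableName}],
    Statistic: 'Sum',
    Period: period,
    EvaluationPeriods: evaluationPeriods,
    Threshold: options.threshold,
    ComparisonOperator: 'GreaterThanThreshold',
  }));
}

// Example with made-up table names and a made-up threshold.
const exampleAlarms = readCapacityAlarms(['orders', 'customers'], {threshold: 10000});
```

Generating definitions from a table list keeps the per-table configuration in one place, which fits the goal of applying the same alarms to all tables.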
Observe utilization and performance, and identify any opportunities to construct custom metrics. (Creation is not expected in this ticket, that would be separate, after review of proposed custom metrics.)
It is not clear from the README.md which tags can be used with Prometheus and which tags are CloudWatch only. Additionally, when a tag may be used with Prometheus, it's unclear whether the default values change. For example, I'm not certain whether Prometheus supports the same extended metrics that CloudWatch supports.
We want a single place to modify when adding new alarms or changing alarm creation options. To this end, alarm-config.mts was created and pushed to the develop branch. The following needs to be done to complete the conversion:
Your PR must have approval from the team leads who own the alarms. I recommend you issue a single PR per module as you clean things up.
Be aware this change breaks backward compatibility for anomaly alarms. Before these changes are merged, you are also required to present an upgrade plan that prevents alarms from breaking, along with a timeline for how this will be rolled out once released.
Add ReAlarm lambda to AutoAlarm CDK Project
Create a log metric filter alarm to watch for the errors described in this blog post.
Implement this alarm on all dbs where a read replica is implemented.
This either needs to be done in autoalarm or possibly in database-collector.