
amazon-cloudwatch-agent-test's Introduction

Amazon CloudWatch Agent Test

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

Dev Manuals

  1. Learning Testing Workflow

amazon-cloudwatch-agent-test's People

Contributors

adam-mateen, amazon-auto, bhanuba, bryce-carey, chadpatel, chenalee, dchappa, dependabot[bot], jaypolanco, jefchien, khanhntd, klwntsingh, lisguo, mitali-salvi, movence, musa-asad, nathalapooja, okankoamz, paramadon, saxypandabear, sethamazon, sky333999, taohungyang, williazz, ymtaye, zhihonl

amazon-cloudwatch-agent-test's Issues

[Discussion] Should metrics be iteratively tested in combinations instead of from a single JSON with all metrics enabled?

Issue raised by @adam-mateen in PR #67

Instead of just testing a single JSON config with ALL metrics enabled, it would be better to iteratively test combinations of metrics.

  • only configure metric A, and verify only A is reported with recent timestamp. B, C, D, ... should not be reported.
  • only configure metric B, and verify only B is reported with recent timestamp.
  • .... repeat for metrics A-Z.
  • Then configure metric A and B, and verify only those 2 get reported.

I realize the number of combinations will be too large and make test time a pain, so please think of ways to reduce the combinations. One option: iterate current_count from 0 to total_num_metrics, and then randomly select current_count metrics at each step.

Collectd tests fail due to a connection refused

Description of the issue

  • Collectd tests are currently failing with the following error:
2023/05/07 06:10:11 CollectD test group failed due to: %!w(*fmt.wrapError=&{Failed to complete setup after agent run due to: write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused 0xc00026f400})

Analysis

  • Adding a small sleep after the first flush seemingly fixes the issue, but the root cause analysis is still pending: it is not yet understood why the first flush call does not receive a connection refused but the subsequent one does. See #210 for the change.
  • I also added logging to print the agent status and the agent logs once before the initial flush and again before the subsequent flush.
=== RUN   TestMetricValueBenchmarkSuite
>>>> Starting MetricBenchmarkTestSuite
=== RUN   TestMetricValueBenchmarkSuite/TestAllInSuite
2023/05/07 06:10:10 Executing subset of plugin tests: [collectd]
2023/05/07 06:10:10 Running CollectD
2023/05/07 06:10:10 Starting agent using agent config file agent_configs/collectd_config.json
2023/05/07 06:10:10 Copy File agent_configs/collectd_config.json to /opt/aws/amazon-cloudwatch-agent/bin/config.json
2023/05/07 06:10:10 File agent_configs/collectd_config.json abs path /home/ec2-user/amazon-cloudwatch-agent-test/test/metric_value_benchmark/agent_configs/collectd_config.json
2023/05/07 06:10:10 File : agent_configs/collectd_config.json copied to : /opt/aws/amazon-cloudwatch-agent/bin/config.json
2023/05/07 06:10:10 Agent has started
/////////////////////////////////////////////////////////////////////////
2023/05/07 06:10:10 First flush
2023/05/07 06:10:10 Agent status:{
  "status": "running",
  "starttime": "2023-05-07T06:10:09+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247354.0b251981"
}
May 07 06:10:10 ip-172-31-29-254.ec2.internal systemd[1]: Started Amazon CloudWatch Agent.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: I! Detecting run_as_user...
/////////////////////////////////////////////////////////////////////////
2023/05/07 06:10:10 Next flush
2023/05/07 06:10:11 Collectd here8. write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused
2023/05/07 06:10:11 Agent status:{
  "status": "running",
  "starttime": "2023-05-07T06:10:09+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247354.0b251981"
}
May 07 06:10:10 ip-172-31-29-254.ec2.internal systemd[1]: Started Amazon CloudWatch Agent.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json does not exist or cannot read. Skipping it.
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: I! Detecting run_as_user...
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! CWAGENT_LOG_LEVEL is set to "DEBUG"
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Starting AmazonCloudWatchAgent 1.247354.0b251981
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! AWS SDK log level not set
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded inputs: socket_listener
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded aggregators:
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded processors: ec2tagger
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Loaded outputs: cloudwatch
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! Tags enabled: host=ip-172-31-29-254.ec2.internal
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [agent] Config: Interval:15s, Quiet:false, Hostname:"ip-172-31-29-254.ec2.internal", Flush Interval:1s
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Initializing plugins
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [logagent] starting
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: Check ec2 metadata
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [processors.ec2tagger] ec2tagger: EC2 tagger has started, finished initial retrieval of tags and Volumes
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Connecting outputs
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Attempting connection to [outputs.cloudwatch]
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! Successfully created credential sessions
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! cloudwatch: get unique roll up list []
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Successfully connected to outputs.cloudwatch
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z D! [agent] Starting service inputs
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! [inputs.socket_listener] Listening on udp://127.0.0.1:25826
May 07 06:10:10 ip-172-31-29-254.ec2.internal start-amazon-cloudwatch-agent[25301]: 2023-05-07T06:10:10Z I! cloudwatch: publish with ForceFlushInterval: 5s, Publish Jitter: 3s
2023/05/07 06:10:11 CollectD test group failed due to: %!w(*fmt.wrapError=&{Failed to complete setup after agent run due to: write udp 127.0.0.1:44705->127.0.0.1:25826: write: connection refused 0xc00026f400})
    metrics_value_benchmark_test.go:103:
        	Error Trace:	/home/ec2-user/amazon-cloudwatch-agent-test/test/metric_value_benchmark/metrics_value_benchmark_test.go:103
        	Error:      	Not equal:
        	            	expected: "Successful"
        	            	actual  : "Failed"

        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -1,2 +1,2 @@
        	            	-(status.TestStatus) (len=10) "Successful"
        	            	+(status.TestStatus) (len=6) "Failed"

        	Test:       	TestMetricValueBenchmarkSuite/TestAllInSuite
        	Messages:   	Metric Benchmark Test Suite Failed
2023/05/07 06:10:11 >>>>>>>>>>>>>><<<<<<<<<<<<<<
2023/05/07 06:10:11 >>>>>>>>>>>>>>Failed<<<<<<<<<<<<<<
2023/05/07 06:10:11 ==============CollectD==============
2023/05/07 06:10:11 ==============Failed==============
Starting Agent   Failed
2023/05/07 06:10:11 ==============================
2023/05/07 06:10:11 >>>>>>>>>>>>>>><<<<<<<<<<<<<<<
>>>> Finished MetricBenchmarkTestSuite
--- FAIL: TestMetricValueBenchmarkSuite (1.32s)
    --- FAIL: TestMetricValueBenchmarkSuite/TestAllInSuite (1.32s)
FAIL
FAIL	github.com/aws/amazon-cloudwatch-agent-test/test/metric_value_benchmark	1.330s
FAIL
  • As can be seen in the logs above, the agent reports a running status in both cases, but two things stand out:
  1. For the initial flush, the agent logs indicate it was still at a very early stage, before any of the plugins were even initialized. Assuming the agent wasn't listening on the UDP port yet, why did the flush not fail?
  2. For the subsequent flush, the agent logs indicate it was listening on the UDP port. So why did the flush fail this time with a connection refused?

Intermittent test failure trying to read logs from CloudWatch

Copied from aws/amazon-cloudwatch-agent#552 since the integ tests are in this repo now.

null_resource.integration_test (remote-exec):     cwl_util.go:62: Error occurred while getting log events: operation error CloudWatch Logs: GetLogEvents, https response error StatusCode: 400, RequestID: 9068b214-19ed-4ee9-beea-951cc1cdf4b7, ResourceNotFoundException: The specified log group does not exist.

This error is sometimes seen in the CloudWatch Logs integration tests.

We should either:

  • Poll for the existence of the log group before trying to get logs
  • Retry the GetLogEvents call on ResourceNotFoundException

Update docs to include EKS cluster creation permissions

Error: creating EKS Cluster (cwagent-eks-integ-742c94a460e0d907): AccessDeniedException: User: [ARN] is not authorized to perform: eks:CreateCluster on resource: arn:aws:eks:us-west-2:***:cluster/cwagent-eks-integ-742c94a460e0d907

The public docs say to attach the AmazonEKSClusterPolicy, but I didn't see that you must also create an EKS cluster IAM role: https://docs.aws.amazon.com/eks/latest/userguide/service_IAM_role.html#create-service-role

I'm not sure if we really care to create a whole new role, and it sucks that there isn't a managed policy for this. We should include the necessary permissions in the existing setup documentation.

Test package does not build on windows

go test -run=NO_MATCH ./...
go: downloading golang.org/x/sys v0.2.0
go: downloading github.com/mitchellh/mapstructure v1.5.0
go: downloading github.com/aws/aws-sdk-go-v2 v1.17.3
go: downloading github.com/aws/aws-sdk-go-v2/config v1.18.10
go: downloading github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.11.49
go: downloading github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.12.21
go: downloading github.com/aws/aws-sdk-go-v2/service/s3 v1.30.1
go: downloading github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.21.6
go: downloading github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs v1.15.20
go: downloading github.com/aws/aws-sdk-go-v2/service/ec2 v1.77.0
go: downloading github.com/aws/aws-sdk-go-v2/service/ecs v1.19.1
go: downloading github.com/aws/aws-sdk-go-v2/service/ssm v1.33.0
go: downloading github.com/stretchr/testify v1.8.1
go: downloading github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.10.0
go: downloading github.com/aws/aws-sdk-go-v2/service/dynamodb v1.17.1
go: downloading github.com/google/uuid v1.3.0
go: downloading github.com/aws/smithy-go v1.13.5
go: downloading github.com/aws/aws-sdk-go-v2/credentials v1.13.10
go: downloading github.com/aws/aws-sdk-go-v2/internal/ini v1.3.28
go: downloading github.com/aws/aws-sdk-go-v2/service/sso v1.12.0
go: downloading github.com/aws/aws-sdk-go-v2/service/ssooidc v1.14.0
go: downloading github.com/aws/aws-sdk-go-v2/service/sts v1.18.2
go: downloading github.com/aws/aws-sdk-go-v2/internal/configsources v1.1.27
go: downloading github.com/jmespath/go-jmespath v0.4.0
go: downloading github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.4.10
go: downloading github.com/aws/aws-sdk-go-v2/internal/v4a v1.0.18
go: downloading github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.9.11
go: downloading github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.1.22
go: downloading github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.9.21
go: downloading github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.13.21
go: downloading github.com/davecgh/go-spew v1.1.1
go: downloading github.com/pmezard/go-difflib v1.0.0
go: downloading gopkg.in/yaml.v3 v3.0.1
go: downloading github.com/aws/aws-sdk-go-v2/service/dynamodbstreams v1.13.20
go: downloading github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.7.17
go: downloading github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.4.21
?   	github.com/aws/amazon-cloudwatch-agent-test/environment	[no test files]
?   	github.com/aws/amazon-cloudwatch-agent-test/environment/computetype	[no test files]
?   	github.com/aws/amazon-cloudwatch-agent-test/environment/ecsdeploymenttype	[no test files]
?   	github.com/aws/amazon-cloudwatch-agent-test/environment/ecslaunchtype	[no test files]
# github.com/aws/amazon-cloudwatch-agent-test/filesystem
Error: filesystem\window_permission.go:22:16: undefined: Acl
Error: filesystem\window_permission.go:23:9: undefined: GetNamedSecurityInfo
Error: filesystem\window_permission.go:24:3: undefined: SE_FILE_OBJECT
Error: filesystem\window_permission.go:25:3: undefined: DACL_SECURITY_INFORMATION
Error: filesystem\window_permission.go:36:18: undefined: AclSizeInformation
Error: filesystem\window_permission.go:37:8: undefined: GetAclInformation
Error: filesystem\window_permission.go:37:50: undefined: AclSizeInformationEnum
Error: filesystem\window_permission.go:77:13: undefined: AccessAllowedAce
Error: filesystem\window_permission.go:78:13: undefined: GetAce
Error: filesystem\window_permission.go:86:22: undefined: ACCESS_DENIED_ACE_TYPE
Error: filesystem\window_permission.go:86:22: too many errors
# github.com/aws/amazon-cloudwatch-agent-test/test/collection_interval [github.com/aws/amazon-cloudwatch-agent-test/test/collection_interval.test]
Error: test\collection_interval\collection_interval_test.go:77:48: undefined: common.ConfigOutputPath
Error: test\collection_interval\collection_interval_test.go:84:31: undefined: common.Host
Error: test\collection_interval\collection_interval_test.go:95:30: undefined: common.ConfigOutputPath
Error: test\collection_interval\collection_interval_test.go:99:61: undefined: common.Namespace
# github.com/aws/amazon-cloudwatch-agent-test/test/ca_bundle [github.com/aws/amazon-cloudwatch-agent-test/test/ca_bundle.test]
Error: test\ca_bundle\ca_bundle_test.go:73:11: undefined: common.ReplaceLocalStackHostName
Error: test\ca_bundle\ca_bundle_test.go:80:21: undefined: common.ReadAgentOutput
FAIL	github.com/aws/amazon-cloudwatch-agent-test/test/ca_bundle [build failed]
FAIL	github.com/aws/amazon-cloudwatch-agent-test/test/collection_interval [build failed]
ok  	github.com/aws/amazon-cloudwatch-agent-test/test/ecs/ecs_metadata	0.032s [no tests to run]
FAIL	github.com/aws/amazon-cloudwatch-agent-test/test/nvidia_gpu [build failed]
ok  	github.com/aws/amazon-cloudwatch-agent-test/test/sanity	0.059s [no tests to run]
mingw32-make: *** [makefile:14: compile] Error 2
Error: Process completed with exit code 1.

Create integration test for renaming Windows system metric

The logic for deriving the default name for Windows metrics differs from Unix. Rather than using a _ to separate, on Windows the agent uses a space. For example, instead of cpu_time_active, for the CPU plugin, the agent would hypothetically decorate the metric as cpu time_active. A bug was identified where we weren't properly renaming the metric when the prefix is omitted in the rename configuration, e.g.: "rename": "time_active" on Linux. The integration test that checks this on Linux is failing in our staging repo. Right now, nothing would have caught this on Windows. We need to add an integration test for that edge case.
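
The naming difference described above can be illustrated with a small sketch. The function `defaultMetricName` is hypothetical and only mirrors the separator rule stated in this issue; it is not the agent's actual implementation, and it deliberately leaves out the rename handling, since the expected renamed value is not spelled out here.

```go
package main

import "fmt"

// defaultMetricName joins a plugin name and a field name with the separator
// this issue describes: a space on Windows, an underscore elsewhere.
// (Hypothetical illustration, not the agent's code.)
func defaultMetricName(goos, plugin, field string) string {
	sep := "_"
	if goos == "windows" {
		sep = " "
	}
	return plugin + sep + field
}

func main() {
	for _, goos := range []string{"linux", "windows"} {
		fmt.Printf("%s: %q\n", goos, defaultMetricName(goos, "cpu", "time_active"))
	}
}
```

A Windows integration test would need to look up the metric under the space-separated name, which is why a Linux-only check cannot cover this case.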

Refactor test code structure

Currently, the test code layout is messy, with so many convoluted (and possibly circular) dependency directions for code execution. We should spend some time to clean up the codebase to make it easier to follow. My biggest concern is around the metric_value_benchmark test directory and everything in there. This is the biggest headache at the moment.

Add build checks in CI

There aren't any gates to ensure that the integration test code here actually builds, either in raised PRs or on merges to main. We should add GitHub Actions workflows on push to main and on PR submissions to make sure that the code compiles.

This will likely involve adding a build target in the Makefile to run go build on each of the directories in the repo to ensure that they build successfully.
