Reduce unit test runtime #2056

Open
jefchien wants to merge 4 commits into main from optimize-tests

@jefchien

@jefchien jefchien commented Mar 17, 2026

Description of the issue

The CloudWatch Agent unit tests can be optimized further: some packages currently take 30 to 180 seconds to run. An initial deep dive shows that the tests use sleep calls to avoid race conditions; reducing those sleeps should make the unit tests run faster.

Current tests take ~3m20s (removed -failfast due to broken unit test)

The output below omits packages that ran in under 2s.

% go clean -testcache && time CGO_ENABLED=0 go test -timeout 15m -coverprofile coverage.txt ./...
ok      github.com/aws/amazon-cloudwatch-agent/extension/entitystore    29.550s coverage: 84.9% of statements
ok      github.com/aws/amazon-cloudwatch-agent/internal/publisher       9.134s  coverage: 92.5% of statements
ok      github.com/aws/amazon-cloudwatch-agent/internal/retryer 3.817s  coverage: 100.0% of statements
ok      github.com/aws/amazon-cloudwatch-agent/internal/tls     6.284s  coverage: 69.0% of statements
?       github.com/aws/amazon-cloudwatch-agent/plugins  [no test files]
panic: sync: negative WaitGroup counter

goroutine 247 [running]:
sync.(*WaitGroup).Add(0xc000012740, 0xffffffffffffffff)
        /home/chienjef/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.7.linux-amd64/src/sync/waitgroup.go:118 +0x23a
sync.(*WaitGroup).Done(...)
        /home/chienjef/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.25.7.linux-amd64/src/sync/waitgroup.go:156
github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile.TestLogFileMultiLogsReadingWithBlacklist.func3({0x12cc400, 0xc0006083c0})
        /workplace/chienjef/amazon-cloudwatch-agent/plugins/inputs/logfile/logfile_test.go:1341 +0xcd
github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile.(*tailerSrc).publishEvent(0xc00031c480, {{0xc000300240, 0x33, 0x40}, 0x0, 0x0}, {0x10?, 0x1?, 0x1?})
        /workplace/chienjef/amazon-cloudwatch-agent/plugins/inputs/logfile/tailersrc.go:320 +0x36b
github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile.(*tailerSrc).runTail(0xc00031c480)
        /workplace/chienjef/amazon-cloudwatch-agent/plugins/inputs/logfile/tailersrc.go:253 +0x6f1
created by github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile.(*tailerSrc).SetOutput.func1 in goroutine 174
        /workplace/chienjef/amazon-cloudwatch-agent/plugins/inputs/logfile/tailersrc.go:141 +0x65
FAIL    github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile   33.012s
ok      github.com/aws/amazon-cloudwatch-agent/plugins/outputs/cloudwatch       173.619s        coverage: 85.2% of statements
ok      github.com/aws/amazon-cloudwatch-agent/plugins/outputs/cloudwatchlogs/internal/pusher   62.455s coverage: 96.9% of statements
ok      github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals 5.062s  coverage: 78.6% of statements
ok      github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol     54.038s coverage: 95.5% of statements
ok      github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger     8.020s  coverage: 88.5% of statements
ok      github.com/aws/amazon-cloudwatch-agent/receiver/adapter 4.374s  coverage: 85.4% of statements
ok      github.com/aws/amazon-cloudwatch-agent/tool/util        16.030s coverage: 61.3% of statements
FAIL
CGO_ENABLED=0 go test -timeout 15m -coverprofile coverage.txt ./...  377.34s user 64.18s system 221% cpu 3:19.04 total

Description of changes

Reduces the test runtime to ~1m40s. Some non-test code changes were made in service of the optimization or to prevent a data race/leak.

Non-Test changes

Bug fixes

  • certWatcher.go: Add sync.WaitGroup so Start() waits for Watch() goroutine to exit before returning
  • metrics_limiter.go: Replace default: time.Sleep busy-waits with proper ticker.C selects, add defer ticker.Stop() to both GC and rotation goroutines

Refactors

  • ec2tagger.go: Move BackoffSleepArray and defaultRefreshInterval from package-level vars to struct fields with getter methods
  • cloudwatch.go: Extract backoffDuration() from backoffSleep() for testability
  • target.go: Extract timing params into struct fields with newTargetManagerWithTiming constructor

Test changes

Bug fixes

  • logfile_test.go: Guard wg.Done() with sync.Once so it runs only once even when the callback fires multiple times, preventing the negative WaitGroup counter panic
  • logfile_test.go: Fix blacklist test comparing raw filename instead of generateLogGroupName() result
  • logthrottle_test.go: Fix data race on testLogger slices by adding sync.Mutex
  • entitystore tests (extension_test.go, ec2Info_test.go): Fix data race on logger buffer by introducing syncBuffer (thread-safe bytes.Buffer)
  • tailersrc_test.go: Fix data race on event counter by using atomic.Int32
  • certWatcher_test.go: Fix data race on callback flag by using atomic.Bool
  • ec2tagger_test.go: Fix data race from package-level variable mutation by using struct field assignment
  • certWatcher_test.go: Fix test teardown ordering — use t.Cleanup with sync.WaitGroup to ensure Start() goroutine exits before t.TempDir() removes cert files
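Two of the fixes above can be sketched together: a mutex-guarded wrapper around `bytes.Buffer`, and a callback that signals a `sync.WaitGroup` through `sync.Once` so that repeated invocations cannot drive the counter negative. This is a minimal sketch; the `syncBuffer` name is from the PR, but its fields and methods here, and the `fireCallback` helper, are assumptions rather than the PR's code.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// syncBuffer is a sketch of a thread-safe bytes.Buffer: every access takes
// the mutex, so concurrent writers and readers do not race.
type syncBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (b *syncBuffer) Write(p []byte) (int, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.Write(p)
}

func (b *syncBuffer) String() string {
	b.mu.Lock()
	defer b.mu.Unlock()
	return b.buf.String()
}

// fireCallback (hypothetical) invokes one callback from n goroutines. The
// callback may run many times, but once.Do ensures first.Done runs exactly
// once, which avoids "sync: negative WaitGroup counter".
func fireCallback(n int) *syncBuffer {
	var (
		out   syncBuffer
		first sync.WaitGroup // released by the first callback invocation
		all   sync.WaitGroup // tracks every goroutine
		once  sync.Once
	)
	first.Add(1)
	callback := func() {
		out.Write([]byte("x"))
		once.Do(first.Done)
	}
	all.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer all.Done()
			callback()
		}()
	}
	first.Wait()
	all.Wait()
	return &out
}

func main() {
	fmt.Println(len(fireCallback(8).String())) // prints 8
}
```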

General Improvements

  • Replace time.Sleep with require.Eventually for faster, more reliable synchronization
  • Add t.Parallel() throughout where safe (cloudwatch, pusher, cardinalitycontrol, entitystore, publisher, ec2tagger, certWatcher, processor)
  • Rewrite TestBackoffRetries to call backoffDuration() directly instead of backoffSleep(), eliminating ~45s of real sleeping
  • Reduce TestPublish workload from 250K to 100K metrics
  • Reduce TestProcessMetricsWithConcurrency random sleep range from 0–4900ms to 0–9ms
  • Reduce TestBackupConfigFile per-iteration sleep from 1s to 10ms (16 iterations: 16s → 160ms)
  • Replace TestMain + shared testdata with per-test t.TempDir() in certWatcher tests
  • Reduce oversized aggregation intervals, polling sleeps, and workload sizes where original values were unnecessarily large

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

% go clean -testcache && time make test
CGO_ENABLED=0 go test -timeout 15m -coverprofile coverage.txt -failfast ./...
ok  	github.com/aws/amazon-cloudwatch-agent/extension/entitystore	0.732s	coverage: 84.9% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/internal/publisher	2.626s	coverage: 92.5% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/internal/retryer	0.344s	coverage: 91.1% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/internal/tls	3.794s	coverage: 70.5% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/inputs/logfile	28.395s	coverage: 74.1% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/outputs/cloudwatch	17.833s	coverage: 85.4% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/outputs/cloudwatchlogs/internal/pusher	3.246s	coverage: 96.9% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals	0.527s	coverage: 78.6% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/processors/awsapplicationsignals/internal/cardinalitycontrol	1.255s	coverage: 95.5% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/plugins/processors/ec2tagger	0.392s	coverage: 88.7% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/receiver/adapter	0.486s	coverage: 85.4% of statements
ok  	github.com/aws/amazon-cloudwatch-agent/tool/util	0.197s	coverage: 61.3% of statements
make test  385.74s user 59.13s system 460% cpu 1:36.55 total

Requirements

Before committing your code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

PR Build: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/23221866759
Build Test Artifacts: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/23216131141
Integration Tests: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/23222042708

@jefchien jefchien requested a review from a team as a code owner March 17, 2026 23:46
@jefchien jefchien changed the title Improve unit test time. Reduce unit test runtime Mar 17, 2026
@jefchien jefchien added the ready for testing label Mar 17, 2026
@jefchien jefchien changed the base branch from opsathon to main March 20, 2026 23:00
@okankoAMZ

Do we need to run the Go race detector on the unit tests? I want to make sure there are no race conditions. Since these are unit tests, I doubt we need to manage the parallelism, but I just want to make sure.

okankoAMZ previously approved these changes Mar 24, 2026
@github-actions

github-actions bot commented Apr 1, 2026

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Apr 1, 2026
@github-actions github-actions bot removed the Stale label Apr 2, 2026
@github-actions

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label Apr 10, 2026
# Conflicts:
#	extension/entitystore/ec2Info_test.go