feat(e2e): dynamically fetch VM extension version with caching, timeouts, and fallback#8064
Open
feat(e2e): dynamically fetch VM extension version with caching, timeouts, and fallback#8064
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the e2e VMSS model generation to dynamically resolve the latest Compute.AKS.Linux.AKSNode VM extension version via the Azure API and memoize the result to avoid repeated lookups.
Changes:
- Add an e2e cache key + cached wrapper for
GetLatestVMExtensionImageVersion. - Update
createVMExtensionLinuxAKSNodeto accept acontext.Contextand use the cached Azure lookup instead of a hardcoded version. - Optimize
GetLatestVMExtensionImageVersionto request only the top result usingTop+Orderby.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| e2e/test_helpers.go | Switch AKSNode extension creation to use cached latest-version lookup; update one call site for new signature. |
| e2e/scenario_test.go | Update scenario to pass a context into createVMExtensionLinuxAKSNode. |
| e2e/scenario_gpu_managed_experience_test.go | Update multiple GPU scenarios to pass a context into createVMExtensionLinuxAKSNode. |
| e2e/config/azure.go | Change extension version lookup to a server-side “top 1, order by name desc” query. |
| e2e/cache.go | Add cached wrapper + cache-key struct for VM extension latest-version lookups. |
8982d50 to
a9e9293
Compare
a9e9293 to
7400daf
Compare
b7537c1 to
fb410cb
Compare
Replace hardcoded VM extension version "1.406" with a cached Azure API call that fetches the latest version at runtime. Add a caching layer via CachedGetLatestVMExtensionImageVersion to avoid redundant API calls across tests. Update all callers to pass context for proper propagation. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
…ookup Replace `sort.Slice` + last-element pattern with `slices.MaxFunc` in `GetLatestVMExtensionImageVersion` for `O(n)` instead of `O(n log n)`. Rename the less method to `cmp` using `cmp.Compare` from stdlib, and add a nil name guard before accessing the result. Move the nil name check in `parseVersion` to an early return before splitting. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Replace the observation-only Test_CreateVMExtensionLinuxAKSNode_Timing from the GPU scenario file with proper unit tests in cache_test.go that assert caching correctness, warm call performance, per-key isolation, error caching, and struct key support. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Prevent the ListVersions Azure API call from blocking indefinitely when the API hangs, which would cause the global 90-minute test timeout to panic and kill all parallel tests. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
…ilure Instead of failing the test when the Azure API call to discover the latest VM extension version fails, fall back to a known-good hardcoded version (1.406) and log a warning. This prevents transient API issues like network errors or throttling from causing test failures. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
…ison Extract GetLatestVMExtensionImageVersion logic into a testable unexported function accepting a vmExtensionImageVersionLister interface. Add 26 table-driven tests covering parseVersion, vmExtensionVersion.cmp, and getLatestVMExtensionImageVersion including edge cases for nil names, malformed versions, API errors, and empty responses. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
fmt.Sprintf does not support the error-wrapping directive %w, which caused a build failure. The %w verb is only valid in fmt.Errorf. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
Replace fragile wall-clock timing assertions with deterministic checks for the createVMExtensionLinuxAKSNode cache test. The previous 10s/1s thresholds tested CI/network performance rather than caching logic. - Add testing.Short() skip guard for integration test - Add nil/empty checks on returned extensions and properties - Replace assert with require for fail-fast behavior - Keep cache consistency check and duration logging for diagnostics - Remove unused assert import Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
The nil guard in Logf fell through to t.Helper() and t.Logf() when the context had no testing.TB, causing a nil-pointer dereference panic. Add the missing return to match the sibling Log function's pattern. Signed-off-by: Suraj Deshmukh <suraj.deshmukh@microsoft.com>
fb410cb to
7872d35
Compare
Member
Author
➜ cd e2e
➜ go test -run Test_CreateVMExtensionLinuxAKSNode_Timing -v -count=1
...
=== RUN Test_CreateVMExtensionLinuxAKSNode_Timing
scenario_gpu_managed_experience_test.go:619: First call duration: 6.699633333s
scenario_gpu_managed_experience_test.go:627: Second call duration: 17.875µs
--- PASS: Test_CreateVMExtensionLinuxAKSNode_Timing (6.70s)
PASS
ok github.com/Azure/agentbaker/e2e 7.547s |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What this PR does / why we need it:
Replaces the hardcoded VM extension version
"1.406"increateVMExtensionLinuxAKSNodewith a dynamic Azure API lookup that fetches the latest version at runtime, cached to avoid redundant API calls across parallel tests.Dynamic version discovery
CachedGetLatestVMExtensionImageVersionine2e/cache.gowith aGetLatestExtensionVersionRequeststruct key supportingLocation,ExtType, andPublishercreateVMExtensionLinuxAKSNodeine2e/test_helpers.goto accept acontext.Context, resolve location fromconfig.Config.DefaultLocation, and call the cached version lookupe2e/scenario_gpu_managed_experience_test.goande2e/scenario_test.goto passt.Context()Robustness
context.WithTimeoutincreateVMExtensionLinuxAKSNodeto preventListVersionsfrom blocking indefinitely and consuming the global 90-minute test timeout"1.406"with a warning log when the Azure API call fails (timeout, throttling, network error), instead of failing the testtoolkit.Logf: add missingreturnafter the nil guard'slog.Printffallback, matching the siblingLogfunction's pattern — without this, anytoolkit.Logfcall from a context withouttesting.TB(e.g.t.Context()) dereferences a niltRefactoring
VMExtenstionVersion→vmExtensionVersion(unexported, fixing the typo)sort.Slice+ last-element withslices.MaxFuncfor O(n) version lookupless→cmpusingcmp.Comparefrom stdlib, returning-1/0/1parseVersionas an early return, and on theslices.MaxFuncresult inGetLatestVMExtensionImageVersiongetLatestVMExtensionImageVersionfunction behind avmExtensionImageVersionListerinterface%w→%vinfmt.Sprintfpanic call ine2e/config/config.go(%wis only valid infmt.Errorf)Tests
e2e/config/azure_vmext_test.go— 26 table-driven unit tests:Test_parseVersion: 10 cases (three-part, two-part, single-part, nil name, non-numeric, partially numeric, empty string, extra parts, large numbers, leading zeros)Test_vmExtensionVersion_cmp: 9 cases (equal, major/minor/patch comparisons, zero values)Test_getLatestVMExtensionImageVersion: 7 cases (multiple versions, single version, two-part versions, API errors, empty list, nil names, mixed valid/malformed)e2e/cache_test.go— 5 unit tests forcachedFunc: consistent results, warm-vs-cold performance, per-key isolation, error caching, struct key supporte2e/scenario_gpu_managed_experience_test.go—Test_CreateVMExtensionLinuxAKSNode_Timingintegration test withtesting.Short()skip guard and deterministic cache consistency assertionsTesting Done