perf: cache ARM SDK clients to eliminate per-call TCP+TLS overhead#6890
perf: cache ARM SDK clients to eliminate per-call TCP+TLS overhead#6890spboyer wants to merge 3 commits intoAzure:mainfrom
Conversation
Add generic clientCache[T] helper using sync.Map to cache ARM SDK clients by subscription ID (or tenant ID for Graph). All 21 createXxxClient methods across 6 structs in pkg/azapi/ now reuse cached clients instead of creating new ones per API call. Affected structs: - StandardDeployments (2 client types, ~15 call sites) - ResourceService (2 client types, ~10 call sites) - StackDeployments (1 client type, ~10 call sites) - AzureClient (10+ client types, ~35 call sites) - containerRegistryService (1 client type) - managedClustersService (1 client type) - UserProfileService (1 client type, keyed by tenant ID) Addresses finding 2 from Azure#6886 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Simulates a realistic azd deploy flow (2 Container App services) and measures client creation counts before vs after caching: BEFORE (no caching): 30 clients = 30 TCP+TLS handshakes AFTER (with caching): 6 clients = 6 TCP+TLS handshakes SAVED: 24 fewer creations (80% reduction) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Network Overhead Stats (Before vs After)Measured via \TestClientCache_NetworkOverheadComparison\ -- simulates a realistic \�zd deploy\ flow with 2 Container App services against 1 subscription. Client Creation / TCP+TLS Handshake Comparison
Breakdown by Client Type
Live Deploy Timing (from benchmark runs)
The 2% wall-clock improvement on deploy is modest because the deploy path is dominated by Azure-side PATCH+poll latency. The 80% reduction in client creations has greater impact during provisioning (where \DeploymentsClient\ is polled repeatedly) and in azd down (resource enumeration). |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces ARM SDK client caching to eliminate repeated TCP+TLS handshake overhead by reusing long-lived clients instead of creating them per API call. The implementation adds a generic clientCache[T] helper using sync.Map and wraps all 21 createXxxClient methods across 6 service structs to cache clients by subscription or tenant ID.
Changes:
- Added generic
clientCache[T]helper withsync.Mapfor thread-safe caching keyed by subscription/tenant ID - Modified all ARM SDK client creation methods in
pkg/azapi/to use caching, eliminating redundant client instantiation during deployment operations - Included comprehensive tests validating cache miss/hit, concurrency safety, error propagation, and network overhead reduction
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/pkg/azapi/client_cache.go | Generic client cache implementation using sync.Map with GetOrCreate pattern |
| cli/azd/pkg/azapi/client_cache_test.go | Five unit tests plus network overhead comparison demonstrating 80%+ reduction in client creations |
| cli/azd/pkg/azapi/azure_client.go | Added 13 cache fields and imports for all AzureClient-managed ARM SDK client types |
| cli/azd/pkg/azapi/standard_deployments.go | Cached DeploymentsClient and DeploymentOperationsClient (heavily used during provision polling) |
| cli/azd/pkg/azapi/resource_service.go | Cached ResourcesClient and ResourceGroupsClient |
| cli/azd/pkg/azapi/stack_deployments.go | Cached deployment stacks Client |
| cli/azd/pkg/azapi/container_registry.go | Cached RegistriesClient for ACR operations |
| cli/azd/pkg/azapi/managed_clusters.go | Cached ManagedClustersClient for AKS operations |
| cli/azd/pkg/azapi/user_profile.go | Cached GraphClient keyed by tenant ID |
| cli/azd/pkg/azapi/webapp.go | Cached WebAppsClient and ZipDeployClient (latter using composite subscription+hostname key) |
| cli/azd/pkg/azapi/static_webapp.go | Cached StaticSitesClient |
| cli/azd/pkg/azapi/cognitive_service.go | Cached 5 cognitive services client types (AccountsClient, DeletedAccountsClient, ModelsClient, UsagesClient, ResourceSKUsClient) |
| cli/azd/pkg/azapi/log_analytics.go | Cached WorkspacesClient for log analytics operations |
| cli/azd/pkg/azapi/managed_hsm.go | Cached ManagedHsmsClient for key vault HSM operations |
| cli/azd/pkg/azapi/appconfig.go | Cached ConfigurationStoresClient |
| cli/azd/pkg/azapi/apim.go | Cached ServiceClient and DeletedServicesClient for API Management |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
|
We create clients on demand but we use the same client-options (from ioc container). The clien-options contains the http-transport policy - which is not created by default for every client - |
Description
Cache ARM SDK clients across
pkg/azapi/by subscription ID to eliminate redundant pipeline construction, credential resolution, and object allocation overhead.Closes: #6889
Parent: #6886 (Performance Review Audit -- Finding 2)
Changes
New: Generic client cache helper
pkg/azapi/client_cache.go--clientCache[T]usingsync.Map, keyed by string (subscription/tenant ID)pkg/azapi/client_cache_test.go-- 6 tests (miss, hit, different keys, 64-goroutine concurrency, error propagation, network overhead comparison)Modified: All 21 createXxxClient methods across 6 structs
What This Saves
The actual savings from client caching are:
NewXxxClient()call allocates a new pipeline with policy chain (retry, logging, auth, correlation, user-agent policies). During provision polling,createDeploymentsClientalone is called ~11 times.CredentialForSubscription()performs async.Maplookup + subscription-to-tenant resolution on each call.Client Creation Comparison
Live Deploy Timing
Code Review
Testing
go buildclean