Add Azure IMDS Support via Multi-Cloud Metadata Provider#1991
Add Azure IMDS Support via Multi-Cloud Metadata Provider#1991Paramadon wants to merge 14 commits intofeature-multi-cloudfrom
Conversation
726d806 to
251ef82
Compare
af1485e to
ac45e79
Compare
| // Initialize global cloud metadata provider early (non-blocking with timeout) | ||
| // Covers all agent modes (logs-only and OTEL) | ||
| log.Println("I! [agent] Initializing cloud metadata provider...") | ||
| initCtx, cancel := context.WithTimeout(ctx, 5*time.Second) |
There was a problem hiding this comment.
should this magic number be defined elsewhere? Should it be configurable?
looks async so its probably fine but can it slow down initialization? What if it fails?
There was a problem hiding this comment.
some features may be limited, I guess that means we can't get things from Azure or AWS IMDS
Conceptually are we supporting a whole new modality, supporting the scenario where the cloud metadata provider cannot be initialized?
Not sure the answer will keep thinking as I review
1a4b2ce to
620362c
Compare
Rename unused 'r' parameter to '_' in TestProvider_Refresh_Timeout to satisfy revive linter.
The return statement in InitGlobalProvider was reading globalErr without holding the lock, causing a race with concurrent readers.
sync.Once cannot be safely reset while concurrent Do() calls may be in progress. Replace with atomic uint32 flag and double-checked locking pattern, which allows safe reset from tests without racing.
- Reset global provider in TestTranslator to ensure test uses mock metadata - Update placeholderUtil tests to use SetGlobalProviderForTest instead of relying on legacy fallback path which doesn't work on Azure - Skip TestGetMetadataInfo_FallbackToLegacy on Azure since azure.IsAzure() takes precedence over the legacy fallback path
The test was resetting the global provider but then calling GetMetadataInfo which falls through to the Azure path on Azure CI runners. Now we set the mock provider first so GetMetadataInfo uses it instead of Azure IMDS.
30d6b51 to
f970232
Compare
| metadata ec2metadataprovider.MetadataProvider | ||
|
|
||
| // Cached metadata (fetched once at initialization) | ||
| mu sync.RWMutex |
There was a problem hiding this comment.
If it's only fetched once, consider using sync.Once, so we don't have to lock and unlock on each Get* call.
There was a problem hiding this comment.
mutex is needed because Refresh() can update the cached values. While we fetch once at initialization, the interface supports refreshing (used by ec2tagger for periodic updates). The lock protects against concurrent reads during refresh.
| // IsAzure detects if running on Azure using multiple methods: | ||
| // IsAzure detects if running on Azure. | ||
| // Detection order: | ||
| // 1. DMI sys_vendor check | ||
| // 2. DMI chassis asset tag check (Azure-specific) | ||
| // 3. IMDS probe as fallback (for containers without DMI access) | ||
| func IsAzure() bool { | ||
| // 1. Check sys_vendor for Microsoft | ||
| if data, err := os.ReadFile(DMISysVendorPath); err == nil { | ||
| if strings.Contains(strings.TrimSpace(string(data)), microsoftCorporation) { | ||
| return true | ||
| } | ||
| } | ||
|
|
||
| // 3. Check chassis asset tag (Azure-specific identifier) | ||
| if data, err := os.ReadFile(DMIChassisAssetPath); err == nil { | ||
| if strings.TrimSpace(string(data)) == azureChassisAssetTag { | ||
| return true | ||
| } | ||
| } | ||
|
|
||
| // 3. IMDS probe fallback (for containers without DMI) | ||
| if probeAzureIMDS() { | ||
| return true | ||
| } | ||
|
|
||
| return false | ||
| } |
There was a problem hiding this comment.
I wonder if this is over-engineered. Why do we need all of these checks? The OTEL resourcedetection processor (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.145.0/internal/metadataproviders/azure/metadata.go) is able to detect if it's azure just by checking the IMDS endpoint.
There was a problem hiding this comment.
Agree. updated to just check IMDS. I think this was from the original PR with that DMI is supposed to be much quicker than IMDS.
| // GetImageID returns a composite image identifier | ||
| // Azure doesn't have a single image ID like AWS AMI | ||
| // We return the VM ID as identifier | ||
| func (p *Provider) GetImageID() string { |
There was a problem hiding this comment.
Isn't this misleading? Why not just not support it similar to how the the AWS provider doesn't support ResourceGroup.
There was a problem hiding this comment.
You are right. removed the comments. both providers support refresh
|
This PR was marked stale due to lack of activity. |
|
Closing in favor of #2032 |
Add Azure IMDS Support via Multi-Cloud Metadata Provider
Problem
CloudWatch Agent currently only supports AWS metadata via EC2 IMDS. To enable Azure deployments, we need a way to fetch instance metadata (instance ID, region, subscription ID, private IP, etc.) from Azure IMDS while maintaining the existing AWS functionality.
Solution
Introduce a cloud metadata provider interface with AWS and Azure implementations. The provider auto-detects the cloud environment at agent startup and provides a unified API for fetching instance metadata.
Architecture
Key Design Decisions:
Changes
New Cloud Metadata Provider (
internal/cloudmetadata/)Providerinterface with methods for instance ID, region, account ID, hostname, private IPInitGlobalProvider()andGetGlobalProvider()AWS Provider (
internal/cloudmetadata/aws/)ec2util.GetEC2UtilSingleton()Azure Provider (
internal/cloudmetadata/azure/)/metadata/instance/compute)/metadata/instance/network)Agent Integration
cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.goConfig Placeholder Support
placeholderUtil.gouses provider for${aws:...}and${azure:...}placeholdersTesting
Unit Tests
Manual Verification