Add Azure IMDS Support via Multi-Cloud Metadata Provider#1991

Closed
Paramadon wants to merge 14 commits into feature-multi-cloud from paramadon/multi-cloud-imds

@Paramadon Paramadon commented Jan 21, 2026

Add Azure IMDS Support via Multi-Cloud Metadata Provider

Problem

CloudWatch Agent currently only supports AWS metadata via EC2 IMDS. To enable Azure deployments, we need a way to fetch instance metadata (instance ID, region, subscription ID, private IP, etc.) from Azure IMDS while maintaining the existing AWS functionality.

Solution

Introduce a cloud metadata provider interface with AWS and Azure implementations. The provider auto-detects the cloud environment at agent startup and provides a unified API for fetching instance metadata.

Architecture

                         ┌─────────────────────────────────────┐
                         │         Agent Startup               │
                         │  (amazon-cloudwatch-agent.go)       │
                         │                                     │
                         │  cloudmetadata.InitGlobalProvider() │
                         └─────────────────┬───────────────────┘
                                           │
                                           ▼
                         ┌─────────────────────────────────────┐
                         │         Global Singleton            │
                         │           (global.go)               │
                         │                                     │
                         │  • sync.Once initialization         │
                         │  • GetGlobalProvider()              │
                         │  • GetGlobalProviderOrNil()         │
                         └─────────────────┬───────────────────┘
                                           │
                                           ▼
                         ┌─────────────────────────────────────┐
                         │            Factory                  │
                         │          (factory.go)               │
                         │                                     │
                         │  DetectCloudProvider():             │
                         │    1. Check Azure DMI/SMBIOS        │
                         │    2. Check AWS (ec2util)           │
                         │    3. Return Unknown                │
                         └─────────────────┬───────────────────┘
                                           │
                      ┌────────────────────┴────────────────────┐
                      │                                         │
                      ▼                                         ▼
       ┌──────────────────────────┐          ┌──────────────────────────┐
       │      AWS Provider        │          │     Azure Provider       │
       │   (aws/provider.go)      │          │   (azure/provider.go)    │
       │                          │          │                          │
       │  Wraps ec2util singleton │          │  Calls Azure IMDS:       │
       │  • GetInstanceID()       │          │  • /metadata/instance/   │
       │  • GetRegion()           │          │    compute               │
       │  • GetAccountID()        │          │  • /metadata/instance/   │
       │  • GetPrivateIP()        │          │    network               │
       │  • GetHostname()         │          │                          │
       └────────────┬─────────────┘          └────────────┬─────────────┘
                    │                                     │
                    ▼                                     ▼
       ┌──────────────────────────┐          ┌──────────────────────────┐
       │       EC2 IMDS           │          │       Azure IMDS         │
       │   169.254.169.254        │          │   169.254.169.254        │
       └──────────────────────────┘          └──────────────────────────┘
                     ┌─────────────────────────────────────┐
                     │       Config Translation            │
                     │      (placeholderUtil.go)           │
                     │                                     │
                     │  GetGlobalProviderOrNil() to        │
                     │  resolve ${aws:...} ${azure:...}    │
                     │  placeholders                       │
                     └─────────────────────────────────────┘

Key Design Decisions:

Decision               Rationale
---------------------  ----------------------------------------
Provider interface     Unified API for AWS and Azure metadata
Azure detection first  DMI check is fast and local (no network)
Singleton pattern      One-time initialization at agent startup
Graceful degradation   Agent continues if metadata unavailable
Backward compatible    AWS code paths unchanged

Changes

New Cloud Metadata Provider (internal/cloudmetadata/)

  • Provider interface with methods for instance ID, region, account ID, hostname, private IP
  • Singleton management via InitGlobalProvider() and GetGlobalProvider()
  • Factory with cloud detection (DMI-based for Azure, ec2util for AWS)
  • Mock provider for testing
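As an illustration of the bullets above, here is a minimal sketch of what such an interface, a mock, and the detection order could look like. The getter names come from the architecture diagram; the `CloudProvider` constants, the `MockProvider` fields, and the `detectCloudProvider` signature (which takes the two checks as functions) are assumptions made for this example, not the PR's actual code.

```go
package main

import "fmt"

// CloudProvider identifies the detected cloud (names are illustrative).
type CloudProvider int

const (
	CloudProviderUnknown CloudProvider = iota
	CloudProviderAWS
	CloudProviderAzure
)

// String makes the detected provider readable in logs.
func (c CloudProvider) String() string {
	switch c {
	case CloudProviderAWS:
		return "AWS"
	case CloudProviderAzure:
		return "Azure"
	default:
		return "Unknown"
	}
}

// Provider is a sketch of the unified metadata API described in the PR.
type Provider interface {
	GetInstanceID() string
	GetRegion() string
	GetAccountID() string
	GetHostname() string
	GetPrivateIP() string
}

// MockProvider returns fixed values, which is useful in unit tests.
type MockProvider struct {
	InstanceID, Region, AccountID, Hostname, PrivateIP string
}

func (m *MockProvider) GetInstanceID() string { return m.InstanceID }
func (m *MockProvider) GetRegion() string     { return m.Region }
func (m *MockProvider) GetAccountID() string  { return m.AccountID }
func (m *MockProvider) GetHostname() string   { return m.Hostname }
func (m *MockProvider) GetPrivateIP() string  { return m.PrivateIP }

// detectCloudProvider applies the checks in the order given in the PR
// description: Azure first, then AWS, otherwise Unknown.
func detectCloudProvider(isAzure, isAWS func() bool) CloudProvider {
	switch {
	case isAzure():
		return CloudProviderAzure
	case isAWS():
		return CloudProviderAWS
	default:
		return CloudProviderUnknown
	}
}

func main() {
	var p Provider = &MockProvider{InstanceID: "mock-instance", Region: "us-west-2"}
	fmt.Println(p.GetInstanceID(), p.GetRegion())

	detected := detectCloudProvider(
		func() bool { return false }, // pretend the Azure check failed
		func() bool { return true },  // pretend the AWS check succeeded
	)
	fmt.Println("detected:", detected)
}
```

Passing the checks in as functions keeps the ordering logic testable without touching DMI files or the network.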

AWS Provider (internal/cloudmetadata/aws/)

  • Wraps existing ec2util.GetEC2UtilSingleton()
  • No changes to AWS metadata fetching logic

Azure Provider (internal/cloudmetadata/azure/)

  • Fetches compute metadata from Azure IMDS (/metadata/instance/compute)
  • Fetches network metadata for private IP (/metadata/instance/network)
  • Returns subscription ID, location, VM ID, VM size, resource group, etc.
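For readers unfamiliar with Azure IMDS, the following sketch shows one way to request the compute document and decode a few of the fields listed above. It is not the PR's implementation: the struct, the function names, and the `api-version` value are assumptions for this example, and a local `httptest` server stands in for the real endpoint so the snippet runs anywhere. Azure IMDS requires the `Metadata: true` request header.

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// computeMetadata holds a subset of the IMDS compute document.
type computeMetadata struct {
	SubscriptionID string `json:"subscriptionId"`
	Location       string `json:"location"`
	VMID           string `json:"vmId"`
	VMSize         string `json:"vmSize"`
	ResourceGroup  string `json:"resourceGroupName"`
}

// fetchCompute requests /metadata/instance/compute from baseURL.
// On a real VM, baseURL would be http://169.254.169.254.
func fetchCompute(ctx context.Context, client *http.Client, baseURL string) (*computeMetadata, error) {
	// The api-version below is an assumed value for illustration.
	url := baseURL + "/metadata/instance/compute?api-version=2021-02-01&format=json"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata", "true") // required by Azure IMDS

	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("IMDS returned status %d", resp.StatusCode)
	}

	var md computeMetadata
	if err := json.NewDecoder(resp.Body).Decode(&md); err != nil {
		return nil, err
	}
	return &md, nil
}

// demoFetch runs fetchCompute against a fake IMDS server with made-up values.
func demoFetch() (*computeMetadata, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Metadata") != "true" || r.URL.Path != "/metadata/instance/compute" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, `{"subscriptionId":"sub-123","location":"eastus2","vmId":"vm-abc","vmSize":"Standard_D2s_v3","resourceGroupName":"rg-demo"}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return fetchCompute(ctx, srv.Client(), srv.URL)
}

func main() {
	md, err := demoFetch()
	if err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Printf("%+v\n", *md)
}
```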

Agent Integration

  • Provider initialized in cmd/amazon-cloudwatch-agent/amazon-cloudwatch-agent.go
  • Logs warning but continues if initialization fails
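The "warn and continue" behaviour can be sketched as below. This is an assumption-laden illustration rather than the agent's code: `defaultInitTimeout`, `initWithTimeout`, and the stand-in init functions are hypothetical names, and the named constant simply shows one way to avoid an inline duration at the call site.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
	"time"
)

// defaultInitTimeout is a hypothetical named constant for the startup budget.
const defaultInitTimeout = 5 * time.Second

// demoTimeout keeps the example fast; it is not a recommended value.
const demoTimeout = 50 * time.Millisecond

// initFunc stands in for whatever initializes the metadata provider.
type initFunc func(ctx context.Context) error

// initWithTimeout runs initFn under a deadline. It reports whether the
// provider is usable and never returns an error, so startup can continue.
func initWithTimeout(parent context.Context, timeout time.Duration, initFn initFunc) bool {
	ctx, cancel := context.WithTimeout(parent, timeout)
	defer cancel()

	done := make(chan error, 1) // buffered so the goroutine never blocks
	go func() { done <- initFn(ctx) }()

	select {
	case err := <-done:
		if err != nil {
			log.Printf("W! [agent] cloud metadata provider unavailable: %v", err)
			return false
		}
		return true
	case <-ctx.Done():
		log.Printf("W! [agent] cloud metadata provider init timed out: %v", ctx.Err())
		return false
	}
}

// Stand-in init functions covering success, failure, and a hang.
func fastInit(ctx context.Context) error    { return nil }
func failingInit(ctx context.Context) error { return errors.New("IMDS unreachable") }
func slowInit(ctx context.Context) error    { <-ctx.Done(); return ctx.Err() }

// runDemo applies the short demo timeout to one of the stand-ins.
func runDemo(initFn initFunc) bool {
	return initWithTimeout(context.Background(), demoTimeout, initFn)
}

func main() {
	fmt.Println("fast init ok:", runDemo(fastInit))
	fmt.Println("failing init ok:", runDemo(failingInit))
	fmt.Println("slow init ok:", runDemo(slowInit))
	fmt.Println("agent keeps starting; production budget would be", defaultInitTimeout)
}
```

Because the caller only receives a boolean, every consumer of the provider still has to handle the "no provider" case explicitly.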

Config Placeholder Support

  • placeholderUtil.go uses provider for ${aws:...} and ${azure:...} placeholders
  • Falls back to existing code if provider unavailable
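A rough sketch of provider-backed placeholder resolution with a fallback follows. The placeholder keys (`InstanceId`, `Region`), the `metadataSource` interface, and the `legacy` callback are all hypothetical; the PR text only states that `${aws:...}` and `${azure:...}` placeholders are resolved through the provider and that the existing code is used when no provider is available.

```go
package main

import (
	"fmt"
	"strings"
)

// metadataSource is a trimmed-down, hypothetical view of the provider.
type metadataSource interface {
	GetInstanceID() string
	GetRegion() string
}

// staticSource is a fixed-value source used for the demo.
type staticSource struct{ instanceID, region string }

func (s staticSource) GetInstanceID() string { return s.instanceID }
func (s staticSource) GetRegion() string     { return s.region }

// resolvePlaceholders substitutes a few illustrative placeholders.
// When src is nil it defers to the legacy resolver, mirroring the
// "falls back to existing code" behaviour described in the PR.
func resolvePlaceholders(input string, src metadataSource, legacy func(string) string) string {
	if src == nil {
		return legacy(input)
	}
	// The key names below are assumptions made for this sketch.
	replacer := strings.NewReplacer(
		"${aws:InstanceId}", src.GetInstanceID(),
		"${aws:Region}", src.GetRegion(),
		"${azure:InstanceId}", src.GetInstanceID(),
		"${azure:Region}", src.GetRegion(),
	)
	return replacer.Replace(input)
}

func main() {
	src := staticSource{instanceID: "vm-1", region: "eastus2"}
	legacy := func(s string) string { return "legacy:" + s }

	fmt.Println(resolvePlaceholders("logs-${azure:InstanceId}-${azure:Region}", src, legacy))
	fmt.Println(resolvePlaceholders("logs-${aws:InstanceId}", nil, legacy))
}
```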

Testing

Unit Tests

  • 40+ tests covering provider interface, singleton, AWS provider, Azure provider
  • Race detection clean

Manual Verification

  • AWS EC2 (us-west-2): Provider detects AWS, metadata fetched correctly
  • Azure VM (eastus2): Provider detects Azure, IMDS metadata fetched correctly

@Paramadon Paramadon force-pushed the paramadon/multi-cloud-imds branch 2 times, most recently from 726d806 to 251ef82 January 22, 2026 20:33
@Paramadon Paramadon changed the title from "Implmenting Azure IMDS changes on the agent" to "Add Azure IMDS Support via Multi-Cloud Metadata Provider" Jan 22, 2026
@Paramadon Paramadon force-pushed the paramadon/multi-cloud-imds branch 8 times, most recently from af1485e to ac45e79 January 22, 2026 21:46
@Paramadon Paramadon marked this pull request as ready for review January 23, 2026 20:42
@Paramadon Paramadon requested a review from a team as a code owner January 23, 2026 20:42
@Paramadon Paramadon added the "ready for testing" label (indicates this PR is ready for integration tests to run) Jan 23, 2026
@Paramadon Paramadon marked this pull request as draft January 23, 2026 21:31
@Paramadon Paramadon marked this pull request as ready for review January 23, 2026 21:31
// Initialize global cloud metadata provider early (non-blocking with timeout)
// Covers all agent modes (logs-only and OTEL)
log.Println("I! [agent] Initializing cloud metadata provider...")
initCtx, cancel := context.WithTimeout(ctx, 5*time.Second)

Should this magic number be defined elsewhere? Should it be configurable?

It looks async, so it's probably fine, but can it slow down initialization? What if it fails?

some features may be limited, I guess that means we can't get things from Azure or AWS IMDS

Conceptually, are we supporting a whole new modality: the scenario where the cloud metadata provider cannot be initialized?

Not sure of the answer; will keep thinking as I review.

@movence movence changed the base branch from main to feature-multi-cloud February 2, 2026 14:09
@movence movence force-pushed the feature-multi-cloud branch from 1a4b2ce to 620362c February 11, 2026 14:56
Paramadon and others added 10 commits February 11, 2026 15:03

Rename unused 'r' parameter to '_' in TestProvider_Refresh_Timeout
to satisfy revive linter.

The return statement in InitGlobalProvider was reading globalErr
without holding the lock, causing a race with concurrent readers.

sync.Once cannot be safely reset while concurrent Do() calls may be
in progress. Replace with atomic uint32 flag and double-checked locking
pattern, which allows safe reset from tests without racing.

- Reset global provider in TestTranslator to ensure test uses mock metadata
- Update placeholderUtil tests to use SetGlobalProviderForTest instead of
  relying on legacy fallback path which doesn't work on Azure
- Skip TestGetMetadataInfo_FallbackToLegacy on Azure since azure.IsAzure()
  takes precedence over the legacy fallback path

The test was resetting the global provider but then calling GetMetadataInfo
which falls through to the Azure path on Azure CI runners. Now we set the
mock provider first so GetMetadataInfo uses it instead of Azure IMDS.
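The commit message about replacing sync.Once describes an atomic flag combined with double-checked locking. A minimal sketch of that pattern, under the assumption that a single mutex guards both the provider and its error, could look like this; all identifiers here (`initGlobal`, `resetForTest`, `getGlobal`) are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type provider struct{ name string }

var (
	initialized    uint32     // 1 once initialization has completed
	mu             sync.Mutex // guards globalProvider and globalErr
	globalProvider *provider
	globalErr      error
)

// initGlobal runs create at most once until resetForTest is called.
func initGlobal(create func() (*provider, error)) error {
	if atomic.LoadUint32(&initialized) == 0 { // first check, without the lock
		mu.Lock()
		if atomic.LoadUint32(&initialized) == 0 { // second check, under the lock
			globalProvider, globalErr = create()
			atomic.StoreUint32(&initialized, 1)
		}
		mu.Unlock()
	}
	// Read the error under the lock so concurrent resets cannot race with it.
	mu.Lock()
	defer mu.Unlock()
	return globalErr
}

// getGlobal returns the current provider, or nil if none is set.
func getGlobal() *provider {
	mu.Lock()
	defer mu.Unlock()
	return globalProvider
}

// resetForTest clears the state. Unlike resetting a sync.Once value,
// this is safe while other goroutines call initGlobal.
func resetForTest() {
	mu.Lock()
	defer mu.Unlock()
	globalProvider, globalErr = nil, nil
	atomic.StoreUint32(&initialized, 0)
}

func main() {
	calls := 0
	create := func() (*provider, error) {
		calls++ // only runs while mu is held
		return &provider{name: "mock"}, nil
	}

	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = initGlobal(create)
		}()
	}
	wg.Wait()
	fmt.Println("create calls:", calls, "provider:", getGlobal().name)
}
```

The unlocked first check keeps the common already-initialized path cheap, while the second check under the lock prevents two goroutines from both running `create`.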
@movence movence force-pushed the paramadon/multi-cloud-imds branch from 30d6b51 to f970232 February 11, 2026 15:47
metadata ec2metadataprovider.MetadataProvider

// Cached metadata (fetched once at initialization)
mu sync.RWMutex

If it's only fetched once, consider using sync.Once, so we don't have to lock and unlock on each Get* call.

The mutex is needed because Refresh() can update the cached values. While we fetch once at initialization, the interface supports refreshing (used by ec2tagger for periodic updates). The lock protects readers from seeing a partial update while a refresh is in progress.
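To make the trade-off in this thread concrete, here is a small sketch of a cache that is populated at construction and can later be refreshed. With a refresh path, `sync.Once` alone is not enough, so reads take a read lock and `Refresh` takes the write lock. The type and its `fetch` callback are hypothetical stand-ins for the AWS provider.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedProvider caches one metadata value and allows later refreshes.
type cachedProvider struct {
	mu         sync.RWMutex
	instanceID string
	fetch      func() (string, error) // stand-in for an IMDS call
}

// newCachedProvider fetches once up front, as described in the thread.
func newCachedProvider(fetch func() (string, error)) (*cachedProvider, error) {
	p := &cachedProvider{fetch: fetch}
	if err := p.Refresh(); err != nil {
		return nil, err
	}
	return p, nil
}

// GetInstanceID takes only a read lock, so concurrent readers do not
// block each other.
func (p *cachedProvider) GetInstanceID() string {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.instanceID
}

// Refresh fetches outside the lock and holds the write lock only for
// the assignment, keeping readers blocked as briefly as possible.
func (p *cachedProvider) Refresh() error {
	id, err := p.fetch()
	if err != nil {
		return err // keep the previously cached value
	}
	p.mu.Lock()
	p.instanceID = id
	p.mu.Unlock()
	return nil
}

func main() {
	n := 0
	p, err := newCachedProvider(func() (string, error) {
		n++
		return fmt.Sprintf("instance-v%d", n), nil
	})
	if err != nil {
		fmt.Println("init failed:", err)
		return
	}
	fmt.Println(p.GetInstanceID())
	_ = p.Refresh()
	fmt.Println(p.GetInstanceID())
}
```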

Comment on lines +144 to +171
// IsAzure detects if running on Azure.
// Detection order:
// 1. DMI sys_vendor check
// 2. DMI chassis asset tag check (Azure-specific)
// 3. IMDS probe as fallback (for containers without DMI access)
func IsAzure() bool {
	// 1. Check sys_vendor for Microsoft
	if data, err := os.ReadFile(DMISysVendorPath); err == nil {
		if strings.Contains(strings.TrimSpace(string(data)), microsoftCorporation) {
			return true
		}
	}

	// 2. Check chassis asset tag (Azure-specific identifier)
	if data, err := os.ReadFile(DMIChassisAssetPath); err == nil {
		if strings.TrimSpace(string(data)) == azureChassisAssetTag {
			return true
		}
	}

	// 3. IMDS probe fallback (for containers without DMI)
	if probeAzureIMDS() {
		return true
	}

	return false
}

I wonder if this is over-engineered. Why do we need all of these checks? The OTEL resourcedetection processor (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/v0.145.0/internal/metadataproviders/azure/metadata.go) is able to detect if it's azure just by checking the IMDS endpoint.

Agreed. Updated to just check IMDS. I think the DMI checks came from the original PR, on the premise that DMI is supposed to be much quicker than IMDS.
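An IMDS-only detection, as agreed in this thread, might be sketched as follows. This is not the updated PR code: `probeIMDS`, `probeTimeout`, the probed path, and the `api-version` value are assumptions, and a local `httptest` server replaces the real endpoint so the example is runnable off-cloud.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeTimeout is a hypothetical budget; a short value keeps startup
// fast on hosts where no metadata service answers.
const probeTimeout = 1 * time.Second

// probeIMDS reports whether an Azure-style IMDS answers at baseURL.
// Any error, timeout, or non-200 status counts as "not Azure".
func probeIMDS(ctx context.Context, client *http.Client, baseURL string) bool {
	ctx, cancel := context.WithTimeout(ctx, probeTimeout)
	defer cancel()

	// The api-version is an assumed value for illustration.
	url := baseURL + "/metadata/instance?api-version=2021-02-01"
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return false
	}
	req.Header.Set("Metadata", "true") // required by Azure IMDS

	resp, err := client.Do(req)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// probeFake probes a local server that answers with the given status.
func probeFake(status int) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Metadata") != "true" {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		w.WriteHeader(status)
	}))
	defer srv.Close()
	return probeIMDS(context.Background(), srv.Client(), srv.URL)
}

// probeUnreachable probes an address whose server has already shut down.
func probeUnreachable() bool {
	srv := httptest.NewServer(http.NotFoundHandler())
	url := srv.URL
	srv.Close()
	return probeIMDS(context.Background(), http.DefaultClient, url)
}

func main() {
	fmt.Println("IMDS answers 200:", probeFake(http.StatusOK))
	fmt.Println("IMDS answers 404:", probeFake(http.StatusNotFound))
	fmt.Println("no IMDS listening:", probeUnreachable())
}
```

The cost of dropping the DMI checks is one network round trip, bounded by the probe timeout, on hosts that are not Azure.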

// GetImageID returns a composite image identifier
// Azure doesn't have a single image ID like AWS AMI
// We return the VM ID as identifier
func (p *Provider) GetImageID() string {

Isn't this misleading? Why not just not support it, similar to how the AWS provider doesn't support ResourceGroup?

You are right. Removed the comments. Both providers support refresh.

@github-actions

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added Stale and removed Stale labels Feb 20, 2026

movence commented Feb 23, 2026

Closing in favor of #2032

@movence movence closed this Feb 23, 2026

4 participants