
Proposal: Extension Framework Improvements from Real-World Extension Development #6853

@jongio

Summary

After building 5 production Azure Developer CLI extensions (azd-app, azd-exec, azd-copilot, azd-rest) plus a shared common library (azd-core), I've identified significant framework gaps where extension authors must build substantial infrastructure themselves. This proposal identifies concrete functionality that could be contributed back to the official extension framework to benefit all extension developers.

Methodology

Detailed multi-model code review (Opus 4.6 + Codex 5.3) across all 6 repositories, analyzing:

  • The official extension framework (gRPC server, extension loading, middleware, azdext SDK)
  • azd-core shared library (30+ packages of reusable infrastructure)
  • All 5 extensions' patterns, pain points, and workarounds

Key Findings

  • azd-core provides ~30 packages of reusable infrastructure that every extension ends up needing
  • All 5 extensions duplicate global flag registration, trace context setup, and command scaffolding
  • 4/5 extensions implement MCP servers with identical rate limiting, argument parsing, and security patterns
  • ~500-800 lines of boilerplate per extension could be eliminated with framework improvements
  • Estimated ~2,500-4,000 total lines eliminated across the ecosystem

P0: CRITICAL — Extension SDK Base

P0-1: Extension Base Command Builder

Problem: Every extension must independently redeclare azd's global flags (--debug, --no-prompt, --cwd, --environment, --trace-log-*) and manually extract OpenTelemetry trace context from environment variables. This is 30-50 lines of identical boilerplate per extension.

Evidence — identical flag registration in every extension:

Evidence — identical trace context extraction in every extension:

Proposed API:

rootCmd := azdext.NewExtensionRootCommand("my-extension", "1.0.0", func(ctx *azdext.ExtensionContext) {
    // ctx.Debug, ctx.NoPrompt, ctx.Cwd, ctx.Environment already parsed
    // ctx.Context() already has trace context + access token injected
})

P0-2: Global Flags Propagation via Environment Variables

Problem: When azd spawns extension processes, it only passes 4 environment variables: AZD_SERVER, AZD_ACCESS_TOKEN, FORCE_COLOR, COLUMNS. Critically, it does NOT pass the parsed global flags --debug, --no-prompt, --cwd, or -e/--environment. Extensions must reparse os.Args or guess from the environment.

Evidence — the framework's limited env var propagation:

Proposal: Export parsed global flags as environment variables when spawning extensions:

  • AZD_DEBUG=1 (from --debug)
  • AZD_NO_PROMPT=1 (from --no-prompt)
  • AZD_CWD=/path (from --cwd)
  • AZD_ENVIRONMENT=prod (from -e)
  • AZD_TRACE_LOG_FILE=/path (from --trace-log-file)

This requires changes to pkg/extensions/runner.go and cmd/middleware/extensions.go.


P0-3: Default ServiceTargetProvider Base Implementation

Problem: The ServiceTargetProvider interface requires implementing 6+ methods. Extensions like azd-app that only need a "local" service target must stub out all methods with no-op implementations, adding ~40 lines of boilerplate.

Evidence — the interface definition requiring all methods:

Evidence — stub implementations in azd-app:

Proposed API:

type LocalProvider struct {
    azdext.BaseServiceTargetProvider  // Embed for no-op defaults
}
// Only override Deploy() and ConfiguredEnvironment() — everything else inherits defaults

P0-4: Standard Extension Command Scaffolding

Problem: Every extension implements near-identical listen, metadata, version, and mcp serve subcommands. These are pure boilerplate — the logic is the same across all extensions, only the extension ID and root command differ.

Evidence — identical listen commands across all 4 extensions:

Evidence — identical metadata commands across all 4 extensions:

Evidence — identical version commands across all 4 extensions:

Proposed API:

rootCmd.AddCommand(
    azdext.NewListenCommand(azdClient, hostConfigurator),
    azdext.NewMetadataCommand("1.0", extensionId, rootCmdProvider),
    azdext.NewVersionCommand(extensionId, version, &outputFormat),
    azdext.NewMCPServeCommand(mcpServerConfigurator),
)

P0: CRITICAL — MCP Server Framework

P0-5: MCP Server Builder with Middleware

Problem: 4/5 extensions implement MCP servers. Each independently builds rate limiting with near-identical token bucket patterns. There is no framework-level middleware for rate limiting, path validation, or security — every extension re-invents these from scratch.

Evidence — rate limiter defined identically in 3 extensions + azd-core:

Evidence — manual rate limit checks in every MCP tool handler:

Proposed API:

mcpServer := azdext.NewMCPServerBuilder("my-extension", "1.0.0").
    WithRateLimit(60, 1.0).                           // Applied to all tools automatically
    WithPathValidation(projectDir).                    // Auto-validate file path params
    WithSecurityPolicy(azdext.DefaultSecurityPolicy).  // Block metadata endpoints, etc.
    AddTool("exec_script", handler, azdext.ToolOptions{
        Description: "Execute a script",
        Destructive: true,
        Params: map[string]azdext.Param{
            "script_path": {Type: "string", Required: true, Description: "Path to script"},
        },
    }).
    Build()
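To make the duplication concrete, here is a sketch of the token-bucket pattern the rate limiters reimplement. This is not azd-core's actual type, only an illustration of the shape each extension hand-rolls and that `WithRateLimit` would absorb:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBucket is a minimal sketch of the rate-limiting pattern the proposal
// says 3 extensions plus azd-core each reimplement.
type TokenBucket struct {
	mu         sync.Mutex
	tokens     float64
	capacity   float64
	refillRate float64 // tokens per second
	last       time.Time
}

func NewTokenBucket(capacity, refillRate float64) *TokenBucket {
	return &TokenBucket{tokens: capacity, capacity: capacity, refillRate: refillRate, last: time.Now()}
}

// Allow refills based on elapsed time, then spends one token if available.
func (b *TokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	b.tokens += now.Sub(b.last).Seconds() * b.refillRate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	bucket := NewTokenBucket(2, 1.0) // burst of 2, refilling 1 token/sec
	fmt.Println(bucket.Allow(), bucket.Allow(), bucket.Allow())
}
```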

P0-6: Typed MCP Argument Parsing

Problem: Every MCP extension manually extracts arguments from mcp.CallToolRequest using untyped map[string]interface{} with verbose type assertions. This pattern is duplicated across all MCP-capable extensions and azd-core.

Evidence — identical argument parsing code duplicated:

What's missing: RequireString (error if absent), OptionalBool, OptionalInt, OptionalFloat helpers.

Proposed API:

args := azdext.ParseToolArgs(request)
path, err := args.RequireString("script_path")    // Returns error if missing
shell, _ := args.OptionalString("shell", "bash")  // Returns default if missing
timeout, _ := args.OptionalInt("timeout", 30)
verbose, _ := args.OptionalBool("verbose", false)
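A sketch of what the typed accessors could look like internally, assuming the proposed method names (the real SDK types may differ; `ToolArgs` here is a plain map standing in for the arguments of `mcp.CallToolRequest`):

```go
package main

import "fmt"

// ToolArgs wraps the untyped argument map with typed accessors.
type ToolArgs map[string]interface{}

// RequireString errors if the key is absent or not a string.
func (a ToolArgs) RequireString(key string) (string, error) {
	v, ok := a[key]
	if !ok {
		return "", fmt.Errorf("missing required argument %q", key)
	}
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("argument %q is not a string", key)
	}
	return s, nil
}

// OptionalString returns the default when the key is absent.
func (a ToolArgs) OptionalString(key, def string) (string, error) {
	v, ok := a[key]
	if !ok {
		return def, nil
	}
	s, ok := v.(string)
	if !ok {
		return def, fmt.Errorf("argument %q is not a string", key)
	}
	return s, nil
}

// OptionalInt accepts float64 because JSON numbers decode as float64.
func (a ToolArgs) OptionalInt(key string, def int) (int, error) {
	v, ok := a[key]
	if !ok {
		return def, nil
	}
	f, ok := v.(float64)
	if !ok {
		return def, fmt.Errorf("argument %q is not a number", key)
	}
	return int(f), nil
}

func main() {
	args := ToolArgs{"script_path": "./run.sh", "timeout": float64(45)}
	path, _ := args.RequireString("script_path")
	timeout, _ := args.OptionalInt("timeout", 30)
	shell, _ := args.OptionalString("shell", "bash")
	fmt.Println(path, timeout, shell)
}
```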

P0-7: MCP Result Marshaling Helpers

Problem: Each MCP extension builds its own result marshaling helpers to convert Go structs/strings into mcp.CallToolResult. This is 20-30 lines of JSON marshaling boilerplate per extension.

Evidence — custom marshaling helpers:

Proposed API:

return azdext.MCPTextResult("Operation completed: %s", name)
return azdext.MCPJSONResult(structuredData)   // Auto JSON marshal
return azdext.MCPErrorResult("Invalid input: %v", err)
return azdext.MCPResourceResult([]mcp.ResourceContents{...})
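The marshaling boilerplate being eliminated is small but repetitive; a sketch of the text and JSON variants, with plain strings standing in for `*mcp.CallToolResult` so the pattern is visible without the mcp-go dependency:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TextResult formats a human-readable tool result.
func TextResult(format string, args ...interface{}) string {
	return fmt.Sprintf(format, args...)
}

// JSONResult marshals structured data into an indented JSON tool result.
func JSONResult(v interface{}) (string, error) {
	b, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	fmt.Println(TextResult("Operation completed: %s", "deploy"))
	out, _ := JSONResult(map[string]string{"status": "ok"})
	fmt.Println(out)
}
```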

P0-8: MCP Security Middleware

Problem: Extensions that expose MCP tools for HTTP calls or file access must independently implement SSRF protection (blocking cloud metadata endpoints, private CIDRs), header redaction, and path validation. azd-rest hardcodes its own blocklists; azd-app repeats 6-step path validation in every resource handler.

Evidence — hardcoded security blocklists in azd-rest:

Evidence — path validation in azd-core security package:

Proposed API:

policy := azdext.NewMCPSecurityPolicy().
    BlockMetadataEndpoints().          // 169.254.169.254, fd00:ec2::254, etc.
    BlockPrivateNetworks().            // RFC 1918/5737 CIDRs
    RequireHTTPS().                    // Except localhost
    RedactHeaders("Authorization", "X-Api-Key").
    ValidatePathsWithinBase(projectDir)

server := azdext.NewMCPServerBuilder(...).
    WithSecurityPolicy(policy).
    Build()

P1: HIGH — Authentication & Token Management

P1-1: Framework Token Provider

Problem: Extensions that call Azure APIs need a thread-safe, cached token provider. Without framework support, each extension implements its own sync.Mutex + singleton caching pattern. This is error-prone and duplicated.

Evidence — manual token singleton in azd-rest:

Evidence — production-grade implementation in azd-core:

Proposal: Add shared token provider to the extension SDK or expose via gRPC Auth service:

// Option A: Standalone helper in SDK
provider := azdext.NewAzureTokenProvider()  // Cached, thread-safe
token, err := provider.GetToken(ctx, "https://management.azure.com/.default")

// Option B: Via gRPC service
token, err := client.Auth().GetToken(ctx, scope)

P1-2: URL-to-Scope Detection

Problem: Extensions making Azure API calls need to determine the correct OAuth scope for a given URL. azd-core maps 20+ Azure service URLs to their scopes, but this isn't available in the framework.

Evidence — comprehensive scope mapping in azd-core:

  • azd-core/auth/scope.go L9-77: DetectScope() with:
    • Exact matches (L24-29): management.azure.com, graph.microsoft.com, api.loganalytics.io, dev.azure.com
    • Suffix matches (L50-68): vault.azure.net, blob.core.windows.net, dfs.core.windows.net, database.windows.net, search.windows.net, cognitiveservices.azure.com, openai.azure.com, etc. (15+ services)
    • Special cases: visualstudio.com (L35-36), kusto.windows.net (L39-40), servicebus.windows.net with path-based detection (L43-48)

Proposed API:

scope, err := azdext.DetectAzureScope("https://myvault.vault.azure.net/secrets/...")
// Returns: "https://vault.azure.net/.default"

scope, err := azdext.DetectAzureScope("https://management.azure.com/subscriptions/...")
// Returns: "https://management.azure.com/.default"

P1-3: Framework HTTP Client with Resilience

Problem: Extensions making HTTP calls need retry logic with exponential backoff, response size limits, and TLS configuration. Without framework support, each extension either uses raw net/http or depends on azd-core's HTTP client.

Evidence — production-grade HTTP client in azd-core:

Evidence — azd-rest re-exports the entire client:

Proposed API:

client := azdext.NewHTTPClient(azdext.HTTPClientOptions{
    TokenProvider: tokenProvider,
    Retry:         3,
    Timeout:       30 * time.Second,
    Paginate:      true,
    MaxResponseSize: 100 * 1024 * 1024,  // 100MB
})
resp, err := client.Execute(ctx, azdext.HTTPRequest{Method: "GET", URL: url})

P1-4: Pagination Support

Problem: Azure APIs use 3 different pagination formats. Each extension that paginates must understand all 3 — or miss data.

Evidence — pagination logic handling 3 formats:

Proposed API:

pages := azdext.NewPaginator(client, initialURL)
var allItems []Item
for pages.Next(ctx) {
    items := pages.Current()
    allItems = append(allItems, items...)
}

P1-5: Key Vault Resolution via gRPC

Problem: Extensions that run scripts or manage environments need to resolve Azure Key Vault references embedded in environment variables. This requires complex parsing of 3 reference formats, thread-safe per-vault client caching, and credential management.

Evidence — Key Vault resolver with 3 pattern formats in azd-core:

Evidence — azd-exec consuming Key Vault resolution:

Proposed API — add to EnvironmentService gRPC:

resp, err := client.Environment().ResolveValues(ctx, &azdext.ResolveValuesRequest{
    EnvironmentName: "prod",
    ResolveKeyVault: true,  // Resolve all @Microsoft.KeyVault and akvs:// references
})
// Returns: resolved env vars with Key Vault secrets inline + warnings for failures

P1-6: Extension Configuration Helpers

Problem: Extensions need typed configuration loading from ~/.azd/config.json with schema validation and defaults. Each extension builds its own config loader.

Evidence — custom config loading in azd-app:

Proposed API:

type MyExtConfig struct {
    MaxRetries int    `json:"maxRetries" default:"3"`
    Shell      string `json:"shell" default:"bash"`
}
config, err := azdext.LoadExtensionConfig[MyExtConfig](ctx, client)

P2: MEDIUM — CLI Output & Logging

P2-1: Standard Output Helpers

Problem: Extensions produce inconsistent output. Some use colored text with ANSI codes, others plain text. No standard for JSON-mode output or structured tables. Users experience different formatting across extensions.

Evidence — comprehensive output library in azd-core:

Evidence — usage across all extensions:

Proposed API:

out := azdext.NewOutput(outputFormat)  // "default" or "json"
out.Success("Deployed %s to %s", service, host)
out.Warning("Deprecated feature: %s", name)
out.Table([]string{"Service", "Status"}, rows)
out.JSON(structuredData)  // Only outputs in JSON mode

P2-2: Structured Logging

Problem: Each extension sets up its own structured logging with debug mode detection. The pattern is identical but not provided by the framework.

Evidence — logging setup in azd-core:

Proposed API:

// Auto-configured from AZD_DEBUG env var
logger := azdext.NewLogger("my-extension")
logger.Debug("Processing request", "url", url, "method", method)
logger.Info("Operation completed", "duration", elapsed)

P2: MEDIUM — Security Utilities

P2-3: Security Validation Package

Problem: Extensions handling user input need path traversal prevention, service name validation, script name sanitization, and container environment detection. Each extension must discover and import these from azd-core rather than getting them from the framework.

Evidence — security functions in azd-core:

Proposed API:

err := azdext.Security.ValidatePath(userPath)
err := azdext.Security.ValidateServiceName(name)
sanitized := azdext.Security.SanitizeScriptName(script)
isContainer := azdext.Security.IsContainerEnvironment()
err := azdext.Security.ValidateURL(url, azdext.RequireHTTPS)
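Path traversal prevention is the subtlest of these; a sketch of one reasonable implementation (the function name is hypothetical): resolve both paths to absolute, clean form, then require the candidate to remain under the base directory:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ValidatePathWithinBase rejects candidates that escape the base directory,
// including via ".." segments that survive cleaning.
func ValidatePathWithinBase(base, candidate string) error {
	absBase, err := filepath.Abs(base)
	if err != nil {
		return err
	}
	absCandidate, err := filepath.Abs(filepath.Join(absBase, candidate))
	if err != nil {
		return err
	}
	rel, err := filepath.Rel(absBase, absCandidate)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("path %q escapes base directory", candidate)
	}
	return nil
}

func main() {
	fmt.Println(ValidatePathWithinBase("/project", "scripts/run.sh")) // allowed
	fmt.Println(ValidatePathWithinBase("/project", "../etc/passwd"))  // rejected
}
```

Note that this only validates the lexical path; a hardened version would also resolve symlinks (e.g. via `filepath.EvalSymlinks`) before comparing.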

P2-4: SSRF Protection

Problem: MCP tools that make HTTP requests on behalf of AI models are particularly vulnerable to SSRF attacks. Extensions must independently implement blocklists for cloud metadata endpoints, private network CIDRs, and URL validation with DNS resolution. This is complex, security-critical code that shouldn't be duplicated.

Evidence — SSRF protection in azd-rest (hardcoded per-extension):

Proposed API:

validator := azdext.NewSSRFValidator()
validator.BlockMetadataEndpoints()   // Cloud provider metadata (AWS, Azure, GCP)
validator.BlockPrivateNetworks()     // RFC 1918 + link-local + loopback
if err := validator.Check(url); err != nil {
    return err  // "blocked: URL resolves to private network"
}
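The essential check can be sketched with the standard library: resolve the host, then reject loopback, link-local (which covers the 169.254.169.254 metadata endpoint), and RFC 1918 private addresses. A production validator would also pin the resolved IP for the actual request to defend against DNS rebinding:

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// checkURL resolves the URL's host and blocks addresses that should never
// be reachable from an AI-driven HTTP tool.
func checkURL(rawURL string) error {
	u, err := url.Parse(rawURL)
	if err != nil {
		return err
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return err
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
			return fmt.Errorf("blocked: %s resolves to private network (%s)", u.Hostname(), ip)
		}
	}
	return nil
}

func main() {
	if err := checkURL("http://169.254.169.254/latest/meta-data/"); err != nil {
		fmt.Println(err)
	}
}
```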

P3: LOWER — Process, Shell & File Utilities

P3-1: Shell Detection & Execution

Problem: Extensions that execute scripts need to detect the appropriate shell from file extensions and shebangs, then build the correct command arguments for each shell (bash -c, cmd /C, powershell -Command, etc.). azd-exec has TWO separate implementations of shell argument building — one for CLI, one for MCP — that should be unified.

Evidence — shell detection and constants in azd-core:

Evidence — duplicated shell argument builders in azd-exec:

  • azd-exec/commands/mcp.go L411-436: buildShellArgs() for MCP (handles cmd, powershell/pwsh, bash/sh/zsh)
  • azd-exec/executor/command_builder.go: buildCommand() for CLI (same logic, different function)

Proposed API:

shell := azdext.DetectShell("script.sh")  // Returns "bash" (from extension + shebang)
cmd := azdext.BuildShellCommand(shell, scriptPath, isInline, args)  // Unified builder
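A sketch of both halves, extension-plus-shebang detection and the unified argument builder; the mapping and fallbacks here are assumptions, not azd-core's exact rules:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"strings"
)

// DetectShell picks a shell from the file extension, refining .sh scripts
// via their shebang line when the file is readable.
func DetectShell(scriptPath string) string {
	switch strings.ToLower(filepath.Ext(scriptPath)) {
	case ".ps1":
		return "pwsh"
	case ".cmd", ".bat":
		return "cmd"
	case ".sh":
		if f, err := os.Open(scriptPath); err == nil {
			defer f.Close()
			if line, err := bufio.NewReader(f).ReadString('\n'); err == nil && strings.HasPrefix(line, "#!") {
				if strings.Contains(line, "zsh") {
					return "zsh"
				}
			}
		}
		return "bash"
	}
	if runtime.GOOS == "windows" {
		return "cmd"
	}
	return "bash"
}

// BuildShellArgs builds the invocation each shell expects — the logic
// azd-exec currently duplicates between its CLI and MCP paths.
func BuildShellArgs(shell, scriptPath string) []string {
	switch shell {
	case "cmd":
		return []string{"cmd", "/C", scriptPath}
	case "powershell", "pwsh":
		return []string{shell, "-Command", scriptPath}
	default:
		return []string{shell, scriptPath}
	}
}

func main() {
	fmt.Println(DetectShell("deploy.ps1"), DetectShell("build.cmd"))
	fmt.Println(BuildShellArgs("cmd", "build.cmd"))
}
```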

P3-2: Atomic File Operations

Problem: Extensions writing config files risk corruption from partial writes (crash mid-write, concurrent writers). azd-core provides atomic write operations using the temp-file-then-rename pattern, but this isn't available from the framework.

Evidence — atomic file operations in azd-core:

Proposed API:

err := azdext.AtomicWriteJSON("config.json", data)       // JSON marshal + atomic write
err := azdext.AtomicWriteFile("script.sh", content, 0755) // Raw bytes + atomic write
var data Config
err := azdext.ReadJSON("config.json", &data)              // Handle missing files gracefully

P3-3: Tool Discovery & PATH Management

Problem: Extensions that integrate with external tools (node, python, docker, etc.) need to find executables in PATH and across common system installation directories, and provide helpful install suggestions when tools are missing.

Evidence — tool discovery in azd-core:

Proposed API:

path := azdext.FindTool("node")                              // Find in PATH + common dirs
suggestion := azdext.GetInstallSuggestion("python")          // "Install python from https://..."

P3-4: Interactive TUI Support

Problem: Extensions that need to launch interactive terminal applications (like GitHub Copilot CLI) face complex platform-specific challenges with TTY detection. When azd captures stdio for gRPC communication, child processes can't detect a TTY. azd-copilot implements 70+ lines of platform-specific hacking to work around this.

Evidence — platform-specific TTY hacking in azd-copilot:

Proposed API:

err := azdext.LaunchInteractive(executable, args, azdext.WithTTY())
// Handles Windows SetStdHandle, Unix /dev/tty, macOS, and Codespaces environments

P3-5: Cross-Platform Process Detection

Problem: Extensions monitoring services need to check if a process is still running. On Windows, stale PIDs are a real problem — a PID may be reused by a different process, so simple os.FindProcess isn't reliable. azd-core uses gopsutil for accurate cross-platform detection.

Evidence — process detection in azd-core:

Proposed API:

running, err := azdext.IsProcessRunning(pid)  // Handles Windows stale PIDs correctly

Scope Boundaries

  • ✅ All proposals are additive — no breaking changes to existing framework
  • ✅ Existing extensions continue to work unchanged
  • ❌ NOT proposing moving the extensions themselves into this repo
  • ❌ NOT proposing changing the gRPC protocol
  • ❌ NOT proposing replacing mark3labs/mcp-go — we wrap it with middleware

Impact Estimate

  • Proposals: 23 specific items across 4 priority tiers
  • Boilerplate eliminated per extension: ~500-800 lines
  • Total across ecosystem (5 extensions): ~2,500-4,000 lines
  • Extensions affected: all current and future extensions

References

  • azd-core — Shared utility library with 30+ packages
  • azd-app — Service orchestration extension
  • azd-exec — Script execution extension
  • azd-copilot — AI/Copilot integration extension
  • azd-rest — REST API client extension
