Summary
The azure.ai.agents extension's leadership metrics (onboarding success rate, attempts-to-success, time-to-success, top errors, and the cross-client funnel) are computed by the Foundry Growth data-science team from a fixed set of telemetry fields emitted by azd core. There is currently nothing that fails CI when one of those fields is renamed, removed, or retyped, so a breaking change ships silently and the downstream KQL quietly returns wrong numbers.
This is not hypothetical -- it already happened. The original onboarding.kql filtered successful deploys with tobool(Props['cmd.exit.success']) == true, but cmd.exit.success does not exist in azd telemetry. Across 365 days of cmd.up + cmd.deploy events, 0 rows matched, so the success funnel was effectively reporting noise. Success is actually signaled by the top-level Success boolean / ResultCode. A coverage test would have caught a field-name drift like this before release.
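A minimal sketch of why the original filter failed silently rather than loudly, written in Go as an analogy to the KQL behaviour. The event type, the helper names, and the three sample rows are all made up for illustration; they are not azd's real telemetry schema. The point is only that a lookup on a key that was never emitted yields an empty value instead of an error, so the filter matches nothing.

```go
package main

import "fmt"

// event is a hypothetical, simplified telemetry row: a top-level Success flag
// plus a free-form property bag.
type event struct {
	Name    string
	Success bool
	Props   map[string]string
}

// countByProp mimics the original filter: it counts rows whose property `key`
// equals "true". A missing key yields "" and is skipped without any error.
func countByProp(events []event, key string) int {
	n := 0
	for _, e := range events {
		if e.Props[key] == "true" {
			n++
		}
	}
	return n
}

// countBySuccess counts rows using the top-level Success flag instead.
func countBySuccess(events []event) int {
	n := 0
	for _, e := range events {
		if e.Success {
			n++
		}
	}
	return n
}

func main() {
	// Made-up sample rows; none of them carries a cmd.exit.success property.
	events := []event{
		{Name: "cmd.up", Success: true, Props: map[string]string{"cmd.entry": "cmd.up"}},
		{Name: "cmd.deploy", Success: false, Props: map[string]string{"cmd.entry": "cmd.deploy"}},
		{Name: "cmd.deploy", Success: true, Props: map[string]string{"cmd.entry": "cmd.deploy"}},
	}
	fmt.Println(countByProp(events, "cmd.exit.success")) // prints 0
	fmt.Println(countBySuccess(events))                  // prints 2
}
```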
The field set the data contract depends on
These azd-defined attribute keys are the brittle surface. All are declared in cli/azd/internal/tracing/fields/fields.go:
Command-grain (root command span):
- CmdEntry -> cmd.entry
- ProjectServiceHostsKey -> project.service.hosts
- ProjectServiceTargetsKey -> project.service.targets
- DevDeviceIdKey -> machine.devdeviceid
- SubscriptionIdKey -> ad.subscription.id
- ProjectNameKey -> project.name
Step-grain (exegraph.step child span):
- ExeGraphStepNameKey -> exegraph.step.name
- ExeGraphStepTagsKey -> exegraph.step.tags
Span status (out of scope for the field-constant test, listed for completeness): the top-level Success boolean and ResultCode come from the OTEL span status mapping, not from the fields package, so they are comparatively stable. The azd-defined keys above are the part that needs a guard.
Proposed guard
Extend the existing contract test at cli/azd/cmd/telemetry_coverage_test.go -- it already follows this exact pattern in TestTelemetryFieldConstants ("if a field constant is removed or renamed, this test will fail, catching regressions in the telemetry schema"). Add a focused subtest (e.g. AgentDataContractFields) that asserts each constant above resolves to its exact attribute key string, for example:
require.Equal(t, "cmd.entry", string(fields.CmdEntry.Key))
require.Equal(t, "project.service.hosts", string(fields.ProjectServiceHostsKey.Key))
require.Equal(t, "project.service.targets", string(fields.ProjectServiceTargetsKey.Key))
require.Equal(t, "machine.devdeviceid", string(fields.DevDeviceIdKey.Key))
require.Equal(t, "ad.subscription.id", string(fields.SubscriptionIdKey.Key))
require.Equal(t, "project.name", string(fields.ProjectNameKey.Key))
require.Equal(t, "exegraph.step.name", string(fields.ExeGraphStepNameKey.Key))
require.Equal(t, "exegraph.step.tags", string(fields.ExeGraphStepTagsKey.Key))
A renamed constant breaks the build; a changed key string breaks the assertion. Either way the change is caught in CI before it reaches customers and the DS queries.
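The same check can also be expressed as a table, which keeps the locked fields in one place and reports every drifted key in a single run. The sketch below is self-contained so it compiles on its own: attributeKey and the variables are stand-ins for the real definitions in the fields package, and contractEntry, agentDataContract, and contractViolations are hypothetical names, not existing azd helpers. In the actual subtest the entries would reference fields.CmdEntry and friends directly, and the comparison would be a require.Equal as shown above.

```go
package main

import "fmt"

// attributeKey is a stand-in for the azd field type; the assertions in this
// issue only rely on it exposing a Key.
type attributeKey struct{ Key string }

// Stand-ins for the constants in cli/azd/internal/tracing/fields/fields.go.
var (
	CmdEntry                 = attributeKey{Key: "cmd.entry"}
	ProjectServiceHostsKey   = attributeKey{Key: "project.service.hosts"}
	ProjectServiceTargetsKey = attributeKey{Key: "project.service.targets"}
	DevDeviceIdKey           = attributeKey{Key: "machine.devdeviceid"}
	SubscriptionIdKey        = attributeKey{Key: "ad.subscription.id"}
	ProjectNameKey           = attributeKey{Key: "project.name"}
	ExeGraphStepNameKey      = attributeKey{Key: "exegraph.step.name"}
	ExeGraphStepTagsKey      = attributeKey{Key: "exegraph.step.tags"}
)

// contractEntry pairs a constant with the key string the DS queries expect.
type contractEntry struct {
	name string
	got  attributeKey
	want string
}

// agentDataContract lists every field locked by the azure.ai.agents data contract.
func agentDataContract() []contractEntry {
	return []contractEntry{
		{"CmdEntry", CmdEntry, "cmd.entry"},
		{"ProjectServiceHostsKey", ProjectServiceHostsKey, "project.service.hosts"},
		{"ProjectServiceTargetsKey", ProjectServiceTargetsKey, "project.service.targets"},
		{"DevDeviceIdKey", DevDeviceIdKey, "machine.devdeviceid"},
		{"SubscriptionIdKey", SubscriptionIdKey, "ad.subscription.id"},
		{"ProjectNameKey", ProjectNameKey, "project.name"},
		{"ExeGraphStepNameKey", ExeGraphStepNameKey, "exegraph.step.name"},
		{"ExeGraphStepTagsKey", ExeGraphStepTagsKey, "exegraph.step.tags"},
	}
}

// contractViolations returns one message per field whose key string drifted.
func contractViolations(entries []contractEntry) []string {
	var out []string
	for _, e := range entries {
		if e.got.Key != e.want {
			out = append(out, fmt.Sprintf("%s: got %q, want %q", e.name, e.got.Key, e.want))
		}
	}
	return out
}

func main() {
	violations := contractViolations(agentDataContract())
	for _, msg := range violations {
		fmt.Println("contract violation:", msg)
	}
	if len(violations) == 0 {
		fmt.Println("all agent data-contract fields match")
	}
}
```

Because each entry names the constant explicitly, removing or renaming a constant still fails at compile time, while a changed key string shows up as a listed violation.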
Acceptance criteria
- A test in cli/azd/cmd fails if any of the listed constants are renamed/removed or if their attribute key strings change.
- The test references the azure.ai.agents data contract so a future editor understands why these specific fields are locked.
References
- cli/azd/internal/tracing/fields/fields.go
- cli/azd/cmd/telemetry_coverage_test.go (TestTelemetryFieldConstants)