Summary
Setting OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental as an environment variable on AgentCore runtimes that use the Strands Agents SDK causes all builtin evaluators to fail with:
error.type: AgentSpanMappingException
error.message: Failed to parse user_query from agent-span with spanId: <id> and scope: strands.telemetry.tracer
Environment
- Region: us-east-1
- Runtimes affected: 16 runtimes using the Strands Agents SDK
- ADOT auto-instrumentor: telemetry.auto.version: 0.17.1-aws
- Evaluators affected: All builtin evaluators (Helpfulness, Correctness, GoalSuccessRate, Coherence, Faithfulness, InstructionFollowing, ToolSelectionAccuracy, ToolParameterAccuracy)
Timeline
| Time | Event |
|---|---|
| 2026-05-29 12:40 | Last successful eval scores (ToolSelectionAccuracy = 1.0) |
| ~12:45–13:00 | OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental applied via update_agent_runtime |
| 2026-05-29 13:29 | First AgentSpanMappingException; 100% failure rate on all evaluators |
Root Cause
The env var changes how Strands serializes message content in its OTEL log records (scope: strands.telemetry.tracer). The evaluator parses user_query from these log records, but it only accepts the shape the content had before the env var was set.
Before (working): body.input.messages[].content is a dict whose inner "content" key holds a JSON-encoded list of text blocks:
{
"role": "user",
"content": {
"content": "[{\"text\": \"Your actual prompt here...\"}]"
}
}
After (broken): body.input.messages[].content is a raw string holding a JSON-encoded list of messages in GenAI semconv format (role plus parts):
{
"role": "user",
"content": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"Your actual prompt here...\"}]}]"
}
The evaluator's span parser expects the dict-with-nested-content-key structure and throws AgentSpanMappingException when it encounters the flattened string format.
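To make the first option under Expected Behavior concrete, here is a minimal sketch of a parser that accepts both message shapes shown above. It assumes these two shapes are the only ones in play; extract_user_query is a hypothetical helper, not the evaluator's actual parser, whose internals are not visible here. Joining multiple text blocks with newlines is also an assumption.

```python
import json
from typing import Any


def extract_user_query(message: dict[str, Any]) -> str:
    """Return the user text from one body.input.messages[] entry.

    Hypothetical helper: accepts both shapes seen in the Strands log records.
    Multiple text blocks are joined with newlines (an assumption; the
    evaluator's real joining rule is unknown).
    """
    content = message.get("content")

    # Before (working): a dict whose "content" key holds a JSON-encoded
    # list of {"text": ...} blocks.
    if isinstance(content, dict):
        blocks = json.loads(content["content"])
        return "\n".join(b["text"] for b in blocks if "text" in b)

    # After (gen_ai_latest_experimental): a JSON-encoded list of GenAI
    # semconv messages, each carrying {"type": "text", "content": ...} parts.
    if isinstance(content, str):
        messages = json.loads(content)
        return "\n".join(
            part["content"]
            for msg in messages
            if msg.get("role") == "user"
            for part in msg.get("parts", [])
            if part.get("type") == "text"
        )

    # Neither shape: fail loudly instead of guessing.
    raise ValueError(f"unrecognized content shape: {type(content).__name__}")


legacy = {"role": "user", "content": {"content": '[{"text": "Your actual prompt here..."}]'}}
semconv = {
    "role": "user",
    "content": '[{"role": "user", "parts": [{"type": "text", "content": "Your actual prompt here..."}]}]',
}
print(extract_user_query(legacy))   # Your actual prompt here...
print(extract_user_query(semconv))  # Your actual prompt here...
```

Branching on the Python type of content keeps the two formats unambiguous. A dict can only be the legacy wrapper, and a string can only be the semconv encoding.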
Important Notes
- The Strands SDK itself does NOT read this env var for attribute naming; it hardcodes its own attribute names. Even so, setting the env var changes how message content is serialized within the Strands log records.
- The gen_ai.user.message events from the ADOT botocore instrumentor are unaffected; they remain identical in both modes. Only the Strands-emitted log records change.
- All evaluators fail identically — the issue is in span parsing, not evaluation logic.
Reproduction Steps
- Create an AgentCore runtime with Strands Agents SDK
- Configure online evaluations with any builtin evaluator
- Invoke the agent — observe successful evaluation scores
- Set OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental via update_agent_runtime
- Invoke the agent again and observe AgentSpanMappingException on all evaluators
Expected Behavior
One of the following should hold:
- AgentCore's builtin evaluators support both message content formats (dict wrapper and GenAI semconv string), OR
- AgentCore documents that OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental is incompatible with online evaluations, OR
- The ADOT layer does not alter the Strands log record serialization format based on this env var
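Until one of these options is in place, a deployment-time guard can flag the setting before it reaches a runtime with online evaluations. This is a minimal sketch, assuming the runtime's environment variables are available as a plain string map. The function name opts_into_latest_genai is hypothetical. Per the OpenTelemetry conventions, OTEL_SEMCONV_STABILITY_OPT_IN holds a comma-separated list, so the check matches individual tokens rather than the whole string.

```python
import os
from collections.abc import Mapping

OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"
INCOMPATIBLE_VALUE = "gen_ai_latest_experimental"


def opts_into_latest_genai(env: Mapping[str, str]) -> bool:
    """Return True if the env var map enables gen_ai_latest_experimental.

    OTEL_SEMCONV_STABILITY_OPT_IN is a comma-separated list, so the value
    is split and each token trimmed before matching.
    """
    raw = env.get(OPT_IN_VAR, "")
    return INCOMPATIBLE_VALUE in {token.strip() for token in raw.split(",")}


if __name__ == "__main__":
    # Example: check the current process environment before deploying.
    if opts_into_latest_genai(os.environ):
        print(f"warning: {OPT_IN_VAR} includes {INCOMPATIBLE_VALUE}; "
              "AgentCore online evaluations are reported to fail with this setting")
```

The same function can be run over each runtime's variable map to audit a fleet such as the 16 runtimes listed above.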
Workaround
Remove OTEL_SEMCONV_STABILITY_OPT_IN from runtime environment variables entirely. Evaluations resume working immediately on the next agent invocation.
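When applying the workaround across several runtimes, the updated variable map still has to carry every other variable the runtime needs. This is a minimal sketch, assuming you hold each runtime's current variables as a dict and send the result back as the complete environmentVariables map through update_agent_runtime. It also assumes that this update replaces the map instead of merging into it, which should be verified against the bedrock-agentcore-control API reference. The helper without_semconv_opt_in and the LOG_LEVEL variable are hypothetical.

```python
from collections.abc import Mapping

OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"


def without_semconv_opt_in(env: Mapping[str, str]) -> dict[str, str]:
    """Return a copy of a runtime's env var map with the opt-in key removed.

    The key is dropped outright rather than set to "", matching the
    workaround of removing it entirely. All other variables are kept,
    assuming the result is sent back as the runtime's full variable map.
    """
    return {key: value for key, value in env.items() if key != OPT_IN_VAR}


# LOG_LEVEL is a hypothetical example of another variable that must survive.
current = {"LOG_LEVEL": "INFO", OPT_IN_VAR: "gen_ai_latest_experimental"}
print(without_semconv_opt_in(current))  # {'LOG_LEVEL': 'INFO'}
```

The helper returns a new dict and leaves the input untouched, so the original map remains available if the change needs to be rolled back.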