
OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental breaks all builtin online evaluators (AgentSpanMappingException) #1427

@tycenjmccann

Description

Summary

Setting OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental as an environment variable on AgentCore runtimes that use the Strands Agents SDK causes every builtin online evaluator to fail with:

error.type: AgentSpanMappingException
error.message: Failed to parse user_query from agent-span with spanId: <id> and scope: strands.telemetry.tracer

Environment

  • Region: us-east-1
  • Runtimes affected: 16 runtimes using Strands Agents SDK
  • ADOT auto-instrumentor: telemetry.auto.version: 0.17.1-aws
  • Evaluators affected: All builtin evaluators (Helpfulness, Correctness, GoalSuccessRate, Coherence, Faithfulness, InstructionFollowing, ToolSelectionAccuracy, ToolParameterAccuracy)

Timeline

| Time | Event |
|---|---|
| 2026-05-29 12:40 | Last successful eval scores (ToolSelectionAccuracy = 1.0) |
| ~12:45–13:00 | OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental applied via update_agent_runtime |
| 2026-05-29 13:29 | First AgentSpanMappingException; 100% failure rate on all evaluators |

Root Cause

The env var changes how Strands serializes message content in its OTEL log records (scope: strands.telemetry.tracer). The evaluator parses user_query from these log records, but it only recognizes the original content shape.

Before (working): body.input.messages[].content is a dict:

{
  "role": "user",
  "content": {
    "content": "[{\"text\": \"Your actual prompt here...\"}]"
  }
}

After (broken): body.input.messages[].content is a raw JSON string in the GenAI semantic conventions message format:

{
  "role": "user",
  "content": "[{\"role\": \"user\", \"parts\": [{\"type\": \"text\", \"content\": \"Your actual prompt here...\"}]}]"
}

The evaluator's span parser expects content to be a dict with a nested content key, and throws AgentSpanMappingException when it instead encounters the flattened JSON string.
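To check which of the two shapes a runtime is currently emitting, a small classifier over a single body.input.messages[] entry is enough. This is a minimal diagnostic sketch, assuming each entry has already been decoded into a Python dict like the samples above; the function name classify_message_content and its return labels are hypothetical and are not part of AgentCore, ADOT, or Strands.

```python
import json
from typing import Any


def classify_message_content(message: dict[str, Any]) -> str:
    """Label a body.input.messages[] entry by which content shape it uses.

    Returns "legacy_dict", "genai_semconv_string", or "unknown".
    Hypothetical diagnostic helper based only on the two samples in this issue.
    """
    content = message.get("content")

    # Before (working): content is a dict wrapping a JSON string under "content".
    if isinstance(content, dict) and isinstance(content.get("content"), str):
        return "legacy_dict"

    # After (broken): content is itself a JSON string holding messages with "parts".
    if isinstance(content, str):
        try:
            decoded = json.loads(content)
        except json.JSONDecodeError:
            return "unknown"
        if (
            isinstance(decoded, list)
            and decoded
            and all(isinstance(m, dict) and "parts" in m for m in decoded)
        ):
            return "genai_semconv_string"

    return "unknown"
```

Applied to the two samples above, it returns "legacy_dict" for the Before message and "genai_semconv_string" for the After message.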

Important Notes

  1. Strands SDK itself does NOT read this env var; it hardcodes its own attribute names. Even so, setting the env var changes how message content is serialized within the Strands log records.
  2. The gen_ai.user.message events emitted by the ADOT botocore instrumentor are unaffected and remain identical in both modes. Only the Strands-emitted log records change.
  3. All evaluators fail identically, so the issue lies in span parsing, not in evaluation logic.

Reproduction Steps

  1. Create an AgentCore runtime with Strands Agents SDK
  2. Configure online evaluations with any builtin evaluator
  3. Invoke the agent — observe successful evaluation scores
  4. Set OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental via update_agent_runtime
  5. Invoke the agent again — observe AgentSpanMappingException on all evaluators

Expected Behavior

One of the following should hold:

  • AgentCore's builtin evaluators support both message content formats (dict wrapper and GenAI semconv string), OR
  • AgentCore documents that OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental is incompatible with online evaluations, OR
  • The ADOT layer does not alter the Strands log record serialization format based on this env var.
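As an illustration of the first option, a tolerant extractor could pull the user's prompt text out of either content shape. This is a hypothetical sketch, not AgentCore's actual parser: the name extract_user_text is made up, and it assumes the legacy inner string is a JSON list of {"text": ...} blocks and the semconv string is a JSON list of messages whose text sits in parts entries with type "text", exactly as in the samples in this issue.

```python
import json
from typing import Any


def extract_user_text(message: dict[str, Any]) -> str:
    """Return the user prompt text from either content shape seen in this issue.

    Hypothetical helper: the shapes handled here come from the issue's samples,
    not from a published AgentCore schema.
    """
    content = message.get("content")

    # Legacy shape: {"content": "<JSON list of text blocks>"}
    if isinstance(content, dict) and isinstance(content.get("content"), str):
        blocks = json.loads(content["content"])
        return "\n".join(
            block["text"]
            for block in blocks
            if isinstance(block, dict) and "text" in block
        )

    # GenAI semconv shape: JSON string of [{"role": ..., "parts": [{"type": "text", "content": ...}]}]
    if isinstance(content, str):
        messages = json.loads(content)
        return "\n".join(
            part["content"]
            for msg in messages
            if msg.get("role") == "user"
            for part in msg.get("parts", [])
            if part.get("type") == "text"
        )

    # Anything else is rejected instead of being scored on an empty prompt.
    raise ValueError(f"unrecognized content shape: {type(content).__name__}")
```

Raising on an unknown shape, rather than returning an empty string, keeps a future third format from being silently evaluated against a blank user_query.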

Workaround

Remove OTEL_SEMCONV_STABILITY_OPT_IN from runtime environment variables entirely. Evaluations resume working immediately on the next agent invocation.
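If runtime environment variables are kept as a dictionary in a deployment script, the workaround amounts to dropping that key before the map is passed to update_agent_runtime. The sketch below assumes that setup; the helper names are hypothetical, and the update_agent_runtime call itself is deliberately left out because its parameters are not shown in this issue. The check splits on commas because OTEL_SEMCONV_STABILITY_OPT_IN accepts a comma-separated list of opt-in values.

```python
OPT_IN_VAR = "OTEL_SEMCONV_STABILITY_OPT_IN"
GEN_AI_VALUE = "gen_ai_latest_experimental"


def requests_gen_ai_opt_in(env: dict[str, str]) -> bool:
    """True if the env map opts into the experimental GenAI conventions."""
    raw = env.get(OPT_IN_VAR, "")
    # The variable holds a comma-separated list of opt-in values.
    values = {v.strip() for v in raw.split(",") if v.strip()}
    return GEN_AI_VALUE in values


def apply_workaround(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of the env map with the opt-in variable removed entirely."""
    return {k: v for k, v in env.items() if k != OPT_IN_VAR}


# Example map; LOG_LEVEL is a hypothetical unrelated variable that should survive.
env = {OPT_IN_VAR: GEN_AI_VALUE, "LOG_LEVEL": "INFO"}
if requests_gen_ai_opt_in(env):
    env = apply_workaround(env)
print(env)  # {'LOG_LEVEL': 'INFO'}
```

The helper removes the whole variable, matching the workaround above, rather than stripping only the gen_ai token; whether other opt-in values are safe alongside online evaluations is not covered by this report. If the update call replaces the entire environment map (worth verifying), send the full filtered map rather than only the changed key.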
