Ask
AgentCore Runtime should expose a first-class session trace API — e.g. GetSessionTrace(sessionId, runtimeArn) — that returns the ordered, structured trace of an agent session: every model call (prompt, completion, token counts, latency), every tool call (name, full args, full result, duration, status), and lifecycle events. As structured JSON. Without requiring opt-in to Memory, without parsing runtime stdout, and without merging multiple log groups.
Why
I'm building a per-session observability UI for a fleet of Strands-based AgentCore Runtime agents. To render a timeline showing what tools the agent called, what args it passed, and what each tool returned, I have to combine two CloudWatch data sources via substring matching on session ID. The data already exists in the platform; customers shouldn't have to glue it together.
What's available today and why none of it works alone
| Source | Has | Doesn't have |
| --- | --- | --- |
| aws/spans (Transaction Search OTEL log group) | Span name, kind, duration, tool name, tool status, gen_ai.tool.call.id | Tool args, tool result, model prompt/response (Transaction Search truncates/drops large attribute payloads) |
| Runtime stdout /aws/bedrock-agentcore/runtimes/<runtime>-<id>-DEFAULT | Full conversation: tool args + results, model I/O, embedded as JSON-strings under body.input.messages[].content | No span/timing structure |
| AgentCore Memory list_events | Conversation turns if the agent calls memory_create_event | Tool args/results unless the agent explicitly logs them as events |
| Bedrock model invocation logs | Prompt + completion per call, includes tool_use blocks | tool_result blocks come in the next invocation's prompt; per-invocation, not per-session |
Concrete repro that the data isn't in spans
fields name, attributes.gen_ai.tool.name, attributes.gen_ai.tool.call.id,
attributes.gen_ai.tool.arguments, attributes.gen_ai.tool.response
| filter name like "execute_tool"
| limit 5
tool.name and tool.call.id are populated. tool.arguments and tool.response are not — even though Strands emits them, Transaction Search drops them. So spans alone cannot answer "what did this tool call do."
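For anyone who wants to reproduce this outside the console, here is a minimal sketch of running the same query programmatically. It assumes a boto3 CloudWatch Logs client is passed in (`start_query` and `get_query_results` are the Logs Insights calls it relies on). The helper names `run_insights_query` and `missing_payload_fields`, the polling interval, and the time window are illustrative choices, not part of any AgentCore API.

```python
import time

SPAN_QUERY = """
fields name, attributes.gen_ai.tool.name, attributes.gen_ai.tool.call.id,
       attributes.gen_ai.tool.arguments, attributes.gen_ai.tool.response
| filter name like "execute_tool"
| limit 5
"""

TERMINAL_STATUSES = {"Complete", "Failed", "Cancelled", "Timeout"}
PAYLOAD_FIELDS = [
    "attributes.gen_ai.tool.arguments",
    "attributes.gen_ai.tool.response",
]


def run_insights_query(logs, log_group, query, start, end, poll_seconds=1.0):
    """Run a Logs Insights query and return rows as {field: value} dicts.

    `logs` is assumed to be a boto3 CloudWatch Logs client (or a stub with
    the same two methods); `start` and `end` are epoch seconds.
    """
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=start,
        endTime=end,
        queryString=query,
    )["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    return [{cell["field"]: cell["value"] for cell in row}
            for row in response["results"]]


def missing_payload_fields(rows):
    """Return the payload attributes that no returned span carries."""
    return [f for f in PAYLOAD_FIELDS if not any(row.get(f) for row in rows)]
```

With a real client this would be called as `run_insights_query(boto3.client("logs"), "aws/spans", SPAN_QUERY, start, end)`. Given the behaviour described above, `missing_payload_fields` on the result would be expected to list both payload attributes.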
What I have to do today
- Query aws/spans filtered on attributes.session.id for span structure (name, duration, toolCallId).
- Query the runtime log group for the same session, filter for tool_call, parse a JSON-string nested inside another JSON object (body.input.messages[].content decodes to [{role, parts:[{type:"tool_call"|"tool_call_response", id, arguments|response}]}]).
- Join the two on toolCallId.
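The decode-and-join steps can be sketched roughly as follows. This assumes the runtime log record has already been parsed into a dict with the shape described above, and that each span has been reduced to a dict with `name`, `durationMs`, and `toolCallId` keys. Those span keys and the function names are illustrative assumptions, not a documented format.

```python
import json


def extract_tool_io(log_record):
    """Collect tool args/results keyed by tool call id from one log record.

    Assumes body.input.messages[].content is a JSON-encoded string that
    decodes to a list of {role, parts: [...]} objects.
    """
    io_by_id = {}
    messages = log_record.get("body", {}).get("input", {}).get("messages", [])
    for message in messages:
        content = message.get("content")
        if not isinstance(content, str):
            continue  # some other (structured) shape; not handled here
        try:
            decoded = json.loads(content)
        except json.JSONDecodeError:
            continue  # plain text, not an embedded JSON document
        if not isinstance(decoded, list):
            continue
        for inner in decoded:
            if not isinstance(inner, dict):
                continue
            for part in inner.get("parts", []):
                kind = part.get("type")
                if kind == "tool_call":
                    entry = io_by_id.setdefault(part.get("id"), {})
                    entry["args"] = part.get("arguments")
                elif kind == "tool_call_response":
                    entry = io_by_id.setdefault(part.get("id"), {})
                    entry["result"] = part.get("response")
    return io_by_id


def join_spans_with_io(spans, io_by_id):
    """Attach args/result to each span by toolCallId; unmatched spans get None."""
    timeline = []
    for span in spans:
        io = io_by_id.get(span.get("toolCallId"), {})
        timeline.append(
            {**span, "args": io.get("args"), "result": io.get("result")}
        )
    return timeline
```

Every `continue` in the decoder is a place where a shape change produces silently empty output rather than an error.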
This is fragile in specific, reproducible ways:
- Undocumented payload shape that changes under you. Setting OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental silently changed runtime stdout shape: toolUse/toolResult blocks in body.output.messages[].content[] became tool_call/tool_call_response parts in body.input.messages[].content (now a JSON-encoded string instead of a structured array). My parser returned zero matches until rewritten. Neither shape is documented.
- Undocumented log group naming. /aws/bedrock-agentcore/runtimes/<runtime>-<suffix>-DEFAULT is a contract I discover at runtime via DescribeLogGroups.
- No foreign key. Session ID matching across log groups is substring-based — the only link is a string the agent has to remember to embed in both places.
- Account-wide opt-in. Transaction Search must be enabled at the account level. Any customer without it gets an empty timeline.
Why the obvious workarounds don't solve it
- "Use AgentCore Memory list_events." Memory is a separate product the agent has to opt into and explicitly write events to. Strands tool calls aren't auto-persisted. Forcing every team that wants observability to adopt Memory is a strange coupling.
- "Use Bedrock model invocation logging." Captures the model's outgoing tool_use blocks. The tool_result block from the actual tool only appears in the next invocation's prompt, embedded in messages history. You'd have to scan every invocation to reassemble one tool round-trip: tool name in one entry, result in the next. Not a session-shaped API.
- "Export OTEL to your own backend with larger attribute limits." Sidesteps the indexer truncation but pushes the merge-and-parse problem onto every customer's infra. Defeats the purpose of Transaction Search.
What "right" looks like
{
  "sessionId": "...",
  "runtimeArn": "...",
  "events": [
    {
      "type": "model_call",
      "timestamp": "...",
      "durationMs": 2800,
      "modelId": "...",
      "promptTokens": 1234,
      "completionTokens": 567,
      "messages": [...]
    },
    {
      "type": "tool_call",
      "timestamp": "...",
      "durationMs": 58,
      "toolName": "load_blueprint",
      "toolCallId": "tooluse_...",
      "args": { "blueprint_name": "requirements-analyst" },
      "result": "...",
      "status": "success"
    },
    { "type": "model_call", "...": "..." }
  ]
}
The platform already has this data — model logs have prompt/completion, runtime stdout has tool I/O, OTEL has timing. Customers shouldn't be gluing it together with substring matches against undocumented log shapes.
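To show what this would buy on the consumer side, here is a sketch of the timeline rendering against the response shape proposed above. The API itself does not exist. The field names are the ones from the example JSON, and `render_timeline` is a hypothetical helper in my UI, so treat this purely as an illustration of the request.

```python
import json


def render_timeline(trace):
    """Turn a session trace (hypothetical shape proposed above) into text rows."""
    rows = []
    for event in trace.get("events", []):
        kind = event.get("type", "unknown")
        duration = event.get("durationMs", 0)
        if kind == "tool_call":
            args = json.dumps(event.get("args", {}), sort_keys=True)
            detail = f"{event.get('toolName')} {args} -> {event.get('status')}"
        elif kind == "model_call":
            detail = (f"{event.get('modelId')} "
                      f"{event.get('promptTokens')} in / "
                      f"{event.get('completionTokens')} out")
        else:
            detail = ""  # lifecycle events: just show the type
        rows.append(f"{duration:>6} ms  {kind:<10}  {detail}".rstrip())
    return rows
```

One loop over one ordered list: no log group discovery, no nested JSON-string decoding, no join on toolCallId.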
Environment
- Region: us-east-1
- Runtime: AgentCore Runtime (Strands-based agents)
- Telemetry: OTEL → CloudWatch Transaction Search
- Affected agent flag: OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental