Follow-up from PR #52 — developer productivity investment identified during PR #52's L4 spike. Not a runtime bug; a test-tooling gap.
Functional description
PR #52 shipped the --trace upload / download round-trip:
- Upload: agent/src/telemetry.py + agent/src/pipeline.py (agent writes to S3).
- Download: cli/src/commands/trace.ts + cdk/src/handlers/get-trace-url.ts (backend signs a URL; CLI downloads and gunzips).
- Contract pinning: agent/tests/test_trace_key_contract.py is a cross-language contract test that pins the literal key shape traces/<user_id>/<task_id>.jsonl.gz across orchestrator, agent, handler, and CDK construct.
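
For orientation, the pinned key shape can be expressed as a builder plus a strict parser. This is only an illustrative sketch: the names trace_key, parse_trace_key, and TRACE_KEY_PATTERN are hypothetical and are not taken from the repo; only the shape traces/<user_id>/<task_id>.jsonl.gz comes from the contract test.

```python
import re

# Hypothetical names; only the key shape itself comes from the contract test.
TRACE_KEY_PATTERN = re.compile(
    r"^traces/(?P<user_id>[^/]+)/(?P<task_id>[^/]+)\.jsonl\.gz$"
)


def trace_key(user_id: str, task_id: str) -> str:
    """Build an object key in the shape traces/<user_id>/<task_id>.jsonl.gz."""
    return f"traces/{user_id}/{task_id}.jsonl.gz"


def parse_trace_key(key: str) -> tuple[str, str]:
    """Return (user_id, task_id), or raise ValueError if the shape has drifted."""
    match = TRACE_KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected trace key shape: {key!r}")
    return match["user_id"], match["task_id"]
```

A round-trip harness could build the key on the write side and parse it on the read side, so a drift in either direction fails loudly.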
The gap: that contract test is a strong drift detector for the key shape but does NOT exercise the actual round-trip (agent writes → handler presigns → CLI downloads + gunzips). If a developer accidentally breaks the interaction between these three layers, the only way to catch it is to deploy to AWS and try it by hand.
Full CDK-stack E2E isn't feasible locally — Bedrock AgentCore has no LocalStack / moto equivalent. A moto-backed data-plane harness is the pragmatic middle-ground: fake AWS running in-process, exercising real code paths in under 10 seconds.
Who cares: contributors and maintainers. No user-facing impact. This is a developer-velocity investment.
Technical plan
Create a new integration test module:
File: agent/tests/integration/test_trace_roundtrip.py
Fixtures:
- moto-backed S3 bucket matching TraceArtifactsBucket properties (7-day lifecycle, SSE, blocked public access).
- moto-backed DynamoDB TaskTable with the real GSI schema.
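
A minimal sketch of the bucket half of these fixtures follows, assuming a boto3-style S3 client is passed in. The function name create_trace_bucket, the rule ID, the empty lifecycle prefix, and the choice of AES256 (SSE-S3) are assumptions; the issue only states "7-day lifecycle, SSE, blocked public access", so the real values should be copied from the CDK construct. The TaskTable fixture is left out because its GSI schema is not spelled out here.

```python
from typing import Any

# Assumed values: the rule ID, bucket-wide prefix, and AES256 are guesses.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "expire-traces",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "Expiration": {"Days": 7},
        }
    ]
}
ENCRYPTION = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def create_trace_bucket(s3: Any, bucket: str) -> None:
    """Create and configure the bucket through any boto3-compatible S3 client."""
    s3.create_bucket(Bucket=bucket)
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=LIFECYCLE
    )
    s3.put_bucket_encryption(
        Bucket=bucket, ServerSideEncryptionConfiguration=ENCRYPTION
    )
    s3.put_public_access_block(
        Bucket=bucket, PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK
    )
```

In a pytest fixture this would presumably be called inside moto's mock_aws() context (moto 5; earlier releases used per-service decorators such as mock_s3) with boto3.client("s3", region_name="us-east-1"), since create_bucket without a LocationConstraint only works in that region.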
Test flow:
- Seed a fake TaskRecord with trace=true, user_id, task_id.
- Invoke upload_trace_to_s3 with a fabricated trajectory.
- Directly invoke the get-trace-url handler logic: either port its core to Python, or drive it via a lightweight shim that reads trace_s3_uri from the mocked TaskTable and signs a URL.
- Use the CLI's download path (subprocess bgagent trace download against a mocked API Gateway, or direct-invoke the CLI command module) to verify that the payload gunzips cleanly and the JSONL reconstitutes.
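
The shim option and the final decode check could look roughly like the sketch below. It is written under stated assumptions: the function names are hypothetical, the table lookup assumes a key of {"task_id": ...} (the real TaskTable schema may differ), and the trajectory is assumed to be one JSON object per line inside the gzip payload. The table and s3 arguments are expected to behave like a boto3 DynamoDB Table resource and an S3 client.

```python
import gzip
import json
from typing import Any


def encode_trajectory(events: list[dict]) -> bytes:
    """Serialize events as gzip-compressed JSONL (one JSON object per line)."""
    lines = "".join(json.dumps(event) + "\n" for event in events)
    return gzip.compress(lines.encode("utf-8"))


def decode_trajectory(blob: bytes) -> list[dict]:
    """Inverse of encode_trajectory: gunzip, then parse each non-empty line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split s3://bucket/key into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"S3 URI is missing a bucket or key: {uri!r}")
    return bucket, key


def presign_trace_url(table: Any, s3: Any, task_id: str, expires_in: int = 300) -> str:
    """Shim for the handler: read trace_s3_uri from the table, presign a GET.

    Assumes the table key is {"task_id": ...}; adjust to the real schema.
    """
    item = table.get_item(Key={"task_id": task_id}).get("Item")
    if not item or "trace_s3_uri" not in item:
        raise LookupError(f"no trace recorded for task {task_id!r}")
    bucket, key = split_s3_uri(item["trace_s3_uri"])
    return s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires_in
    )
```

The test would then assert that decode_trajectory applied to the downloaded bytes equals the fabricated trajectory that was uploaded. A Python shim keeps the harness in one process, but it exercises a reimplementation rather than the TypeScript handler, which is worth noting in the README.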
Runtime: mise //agent:test:integration or pytest -m integration.
Estimated effort: 1-2 days for a minimal version. This is greenfield test scaffolding — no existing LocalStack / moto infrastructure in the repo today.
Acceptance criteria
- One end-to-end test exercising write → read → CLI decode.
- Runs in <10 seconds.
- CI wiring via explicit opt-in marker (e.g. pytest -m integration) so it doesn't slow the default test loop.
- agent/tests/integration/README.md explaining how to run locally and what the harness does / doesn't cover.
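
One way the opt-in marker might be wired is a pair of conftest.py hooks, sketched below. The file location and the marker description are assumptions. The hooks register an integration marker and deselect marked tests whenever no -m expression is given, so plain pytest stays fast while pytest -m integration runs the harness.

```python
# conftest.py (hypothetical location: agent/tests/conftest.py)


def pytest_configure(config):
    """Register the marker so pytest does not warn about an unknown mark."""
    config.addinivalue_line(
        "markers", "integration: moto-backed round-trip tests (opt-in)"
    )


def pytest_collection_modifyitems(config, items):
    """Deselect integration tests unless the caller passed a -m expression."""
    if config.getoption("markexpr"):
        return  # explicit marker expression given; let pytest apply it
    deselected = [i for i in items if i.get_closest_marker("integration")]
    if deselected:
        config.hook.pytest_deselected(items=deselected)
        items[:] = [i for i in items if not i.get_closest_marker("integration")]
```

An alternative with the same effect is a default -m "not integration" in the pytest addopts setting, which a -m given on the command line overrides.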
Out of scope
- CDK synth / deploy in tests (too heavy; AgentCore not mockable).
- Cognito auth flow (already covered by unit tests in CDK + CLI).
- Cross-user authorization tests (covered by handler unit tests).
References
- agent/src/telemetry.py: trace upload
- cli/src/commands/trace.ts: CLI download
- cdk/src/handlers/get-trace-url.ts: backend URL signing
- agent/tests/test_trace_key_contract.py: existing contract test (this issue complements, not replaces)