Merged
88 commits
359e0e8
docs(design): add interactive agents design document
scoropeza Apr 16, 2026
4dbc894
docs(implementation): add Phase 1a implementation prompt
scoropeza Apr 16, 2026
a1ea93c
docs(implementation): add subagent and design-integrity rules to Phas…
scoropeza Apr 16, 2026
a505de2
feat(agent): add ProgressWriter for task progress events
scoropeza Apr 17, 2026
c87aa68
feat(cli): add bgagent watch command
scoropeza Apr 17, 2026
4e98ef5
feat(agent): add DynamoDB Local testing infrastructure
scoropeza Apr 17, 2026
809afce
docs: add interactive agents Phase 1a design and guides
scoropeza Apr 17, 2026
dbcd60a
docs(backlog): known gaps for future iteration
scoropeza Apr 17, 2026
4d5d862
docs(design): mark HITL section 9.3 as pending rev 4 (Cedar-driven)
scoropeza Apr 17, 2026
8257438
feat(agent): port ProgressWriter into agent/src/ module layout
Apr 17, 2026
cb257f1
docs(design): Phase 1b decisions resolved (rev 4) + planning diagrams
Apr 17, 2026
9cf73ae
feat(cdk): Phase 1b Step 1 — two-runtime split + lifecycle + DDB Streams
Apr 17, 2026
4b041b8
fix(cdk): keep Runtime construct id stable to avoid CFN replacement
Apr 17, 2026
7a5db56
feat(agent): Phase 1b Step 2 — SSEAdapter module (real-time sibling o…
Apr 17, 2026
0e958b9
docs(design): lock endpoint choice — content-type negotiated /invocat…
Apr 18, 2026
a55fa60
feat(agent): Phase 1b Step 3 — wire SSEAdapter into pipeline/runner/s…
Apr 18, 2026
2e95b9f
feat(cdk,cli): Phase 1b Step 4 — get-task-events ?after=<event_id> ca…
Apr 20, 2026
36ee9ea
feat(cli): Phase 1b Step 5 — SSE client wrapper (hybrid: @ag-ui/core …
Apr 20, 2026
5a6d994
feat(cli): Phase 1b Step 6 — bgagent watch SSE-first with polling fal…
Apr 20, 2026
30066b7
fix(cli): use ID token for REST, access token for AgentCore SSE
Apr 21, 2026
60fcbc8
fix(agent,cli): Phase 1b SSE path bring-up — 3 real bugs + debug infra
Apr 21, 2026
c11a78a
docs(design): rev 5 — Branch A execution-location model + streaming r…
Apr 21, 2026
d17fbd8
feat(agent): Phase 1b rev-5 Part 2 — attach-don't-spawn + HealthyBusy…
Apr 21, 2026
5b3b587
feat(cdk,cli): Phase 1b rev-5 Part 1 (WIP) — execution_mode + watch e…
Apr 21, 2026
b9d6736
feat(agent,cdk,cli): Phase 1b rev-5 final — bgagent run + RUN_ELSEWHE…
Apr 21, 2026
7279bb8
fix(agent,cdk,cli): Phase 1b rev-5 E2E bring-up — 3 real bugs
Apr 21, 2026
6431e4b
fix(agent,cdk,cli,docs): Phase 1b rev-5 pre-push hardening (P0s + key…
Apr 21, 2026
e579bd4
feat(cdk,docs): stranded-task reconciler (P0-c) + MAX_CONCURRENT_TASK…
Apr 22, 2026
23fdf00
fix(agent,cdk,cli): rev-5 Round 1 — correctness (P1-3, P1-1, OBS-4)
Apr 22, 2026
c887938
fix(agent,cli): rev-5 Round 2 — error surfacing (P1-2, P1-5)
Apr 22, 2026
12a39e2
feat(agent,cdk): rev-5 Round 3 — observability (OBS-1/2/3, P1-4)
Apr 22, 2026
726a5a3
refactor(agent): rev-5 Round 4a — encapsulation (TDA-1, TDA-2, TDA-6)
Apr 22, 2026
fe87b59
refactor(agent,cdk,cli): rev-5 Round 4b — shared types (TDA-3, TDA-4,…
Apr 22, 2026
55db38a
feat(agent,cdk,cli): rev-5 Round 5 — design alignment (POLL-1, DATA-1)
Apr 22, 2026
284cebf
docs(design): rev-5 Round 6 — followups doc reflects all landed rounds
Apr 22, 2026
036f749
docs(cdk): link upstream issue for AssetImage.bind double-attach (CDK-1)
Apr 22, 2026
053573a
chore(docs): v3 diagram viewport adjustments after local inspection
Apr 22, 2026
390e0f6
feat(cdk): fan-out plane Lambda — closes last strict Phase 1b item (§…
Apr 22, 2026
343f81a
docs(diagram): v4 reframe nudge as Phase 2 REST feature on both runti…
Apr 22, 2026
0752775
feat(nudge): Phase 2 interactive nudge — mid-run steering via REST + …
Apr 23, 2026
ba80a46
docs(diagram): v5 consolidated — current state + Phase 1c/2.5/3 propo…
Apr 23, 2026
d495311
docs(phase3): Cedar-HITL detailed design + 12-page draw.io companion
Apr 24, 2026
1353ebe
docs(phase3): rewrite Cedar-HITL design — integrate review findings (…
Apr 28, 2026
2e5d62b
docs(cleanup): consolidate draw.io files under docs/diagrams/
Apr 28, 2026
45c254e
docs(phase3): ship Cedar-HITL as standard functionality — remove feat…
Apr 28, 2026
11caa17
fix(cancel): cancel now actually stops the running agent (no PR on ca…
Apr 29, 2026
7b3fcae
fix(agent): maintain status_created_at on RUNNING/terminal transitions
Apr 29, 2026
ed1ad31
Merge branch 'main' into feature/interactive-background-agents
krokoko Apr 29, 2026
af46d3d
chore(lint): fix CI lint + type-check failures; drop stale implementa…
Apr 29, 2026
7b3b5b1
docs(design): retire two-runtime split; adopt async-only architecture…
Apr 30, 2026
cb60de2
refactor(cdk): collapse to single AgentCore runtime; unify stranded t…
Apr 30, 2026
62728dc
refactor(agent): delete SSE machinery; sync-only /invocations path
Apr 30, 2026
d3bdc37
refactor(cli): delete SSE client and bgagent run; fix polling cursor bug
Apr 30, 2026
4c642b9
feat(agent): combined-turn nudge acknowledgment (AD-5) + defer hydration
Apr 30, 2026
9bb3b81
feat(cli): bgagent status deterministic templated snapshot (AD-6)
Apr 30, 2026
bf16b89
feat(cli): adaptive watch polling + retry + AbortSignal (design §5.3)
Apr 30, 2026
93f6d3b
feat(cdk): FanOutConsumer per-channel routing with defaults (§6.2, AD-4)
Apr 30, 2026
3f3ebaa
feat(cdk): GitHub edit-in-place dispatcher with If-Match ETag (§6.4)
May 1, 2026
041d95f
feat(trace): --trace flag raises progress writer preview cap (§10.1, …
May 1, 2026
f489abd
feat(trace): S3 trajectory upload + bgagent trace download (§10.1, K2…
May 1, 2026
7a888a9
chore(L1): final-sweep hygiene pass
May 3, 2026
3acd217
test(L2): backfill regression guards from K2 final-review analysis
May 3, 2026
8b00fc6
feat(L3): production hardening of trace/watch/github paths
May 3, 2026
b1b9cba
feat(L4): self-healing trace_s3_uri + --force overwrite guard
May 3, 2026
7e16b9e
fix(trace): raise get-trace-url timeout + memory for SDK cold-start
May 4, 2026
4ddede9
fix(trace): drop ContentEncoding=gzip on S3 upload
May 4, 2026
f65bf4b
fix(cli): render trace_s3_uri in bgagent status snapshot
May 4, 2026
9fe704e
fix(fanout): handle DDB string numerics and unwrap agent_milestone fo…
May 4, 2026
e88f060
fix(cli): harden bgagent watch snapshot retries and terminal exit
May 4, 2026
c09bfd7
refactor(shared): hoist DDB numeric coercion to shared helper + close…
May 4, 2026
2c2eda0
fix(fanout): drop If-Match from GitHub PATCH (endpoint rejects condit…
May 4, 2026
ccf38c3
fix(cli): bgagent watch terminal message includes task_id and error c…
May 4, 2026
1c87094
fix(cdk): lower task-input-guardrail PROMPT_ATTACK inputStrength HIGH…
May 4, 2026
78f6cda
feat(cli): bgagent status snapshot surfaces Type and Reason for user …
May 4, 2026
dbcf203
fix(fanout): classify agent_status=error_max_turns / error_max_budget…
May 4, 2026
a314e59
refactor(cli): unify bgagent status formatter — --wait is a pure bloc…
May 4, 2026
a3c7c96
fix(api): surface channel_source on TaskDetail responses
May 4, 2026
402087a
feat(cli): surface Channel and Description in bgagent status snapshot
May 4, 2026
388da13
docs(diagrams): regenerate interactive-agents-phases.drawio for rev-6…
May 4, 2026
c779016
chore(cdk): revert agent-plugins blueprint to krokoko/agent-plugins
May 4, 2026
1b8ffa6
fix(fanout): partial-batch retry + github-comment defense-in-depth (k…
May 5, 2026
d6ad9a5
fix(types): tighten TaskDetail numeric coercion + ChannelSource liter…
May 5, 2026
9e6c23f
fix(agent): cancel hook short-circuits nudge consumption (krokoko rev…
May 5, 2026
db55bfa
fix(agent): progress writer error classification + shared circuit bre…
May 5, 2026
331e283
fix(reconciler,agent): reconciler alarming + TaskConfig trace-without…
May 5, 2026
d8a98d6
hotfix(fanout): drop unused _context/_callback args — nodejs24 reject…
May 5, 2026
7e170da
docs(diagrams): updated formatting of edges
May 5, 2026
6817869
fix(lint): address CI lint + typecheck failures from krokoko review-f…
May 5, 2026
5 changes: 5 additions & 0 deletions .gitignore


1 change: 1 addition & 0 deletions AGENTS.md
@@ -20,6 +20,7 @@ Use this routing before editing so the right package and tests get updated:
| Shared API request/response shapes | `cdk/src/handlers/shared/types.ts` | **`cli/src/types.ts`** (must stay in sync) |
| `bgagent` CLI commands and HTTP client | `cli/src/`, `cli/test/` | `cli/src/types.ts` if API types change |
| Agent runtime (clone, tools, prompts, container) | `agent/src/` (`pipeline.py`, `runner.py`, `config.py`, `hooks.py`, `policy.py`, `prompts/`, Dockerfile, etc.) | `agent/tests/`, `agent/README.md` for env/PAT |
| Agent progress events (written to `TaskEventsTable` from the MicroVM; read by `bgagent watch`) | `agent/src/progress_writer.py`, `agent/src/pipeline.py` and `agent/src/runner.py` (integration points) | `agent/tests/test_progress_writer.py`; `cli/src/commands/watch.ts` for the consumer side |
| User-facing or design prose | `docs/guides/`, `docs/design/` | Run **`mise //docs:sync`** or **`mise //docs:build`** (do not edit `docs/src/content/docs/` by hand) |
| Monorepo tasks, CI glue | Root `mise.toml`, `scripts/`, `.github/workflows/` | — |

1 change: 1 addition & 0 deletions agent/README.md
@@ -98,6 +98,7 @@ The `run.sh` script overrides the container's default CMD to run `python /app/sr
| `MAX_BUDGET_USD` | No | | **Local batch only** (shell env when running `entrypoint.py` directly). Range 0.01–100; agent stops when the budget is reached. For deployed AgentCore **server** mode and production tasks, set **`max_budget_usd`** on task creation (REST API, CLI `--max-budget`, or Blueprint default); the orchestrator sends it in the `/invocations` JSON body — server mode does not read `MAX_BUDGET_USD` from the environment. |
| `DRY_RUN` | No | | Set to `1` to validate config and print the prompt without running the agent |
| `ANTHROPIC_DEFAULT_HAIKU_MODEL` | No | `anthropic.claude-haiku-4-5-20251001-v1:0` | Bedrock model ID for the pre-flight safety check (see below) |
| `NUDGES_TABLE_NAME` | No | | **Phase 2.** DynamoDB table for mid-task user nudges (`<user_nudge>` XML blocks injected between turns). If unset, the agent runs without nudge support — `nudge_reader.read_pending()` returns `[]` and logs a WARN once. Set automatically by the CDK stack on both AgentCore runtimes. |

**Pre-flight check model**: Claude Code runs a quick safety verification using a small Haiku model before executing each tool command. On Bedrock, the default Haiku model ID may not be enabled in your account, causing the check to time out with *"Pre-flight check is taking longer than expected"* warnings. The agent sets `ANTHROPIC_DEFAULT_HAIKU_MODEL` to a known-available Bedrock Haiku model ID to avoid this. If you see pre-flight timeout warnings, verify that this model is enabled in your Bedrock model access settings.

24 changes: 24 additions & 0 deletions agent/docker-compose.yml
@@ -0,0 +1,24 @@
# Local development services for agent testing.
#
# Usage:
# docker compose up -d # Start DynamoDB Local
# docker compose down # Stop and clean up
#
# The agent container (run via run.sh --local-events) connects to
# the "agent-local" network to reach DynamoDB Local at
# http://dynamodb-local:8000.

services:
  dynamodb-local:
    image: amazon/dynamodb-local:latest
    container_name: dynamodb-local
    ports:
      - "8000:8000"
    command: ["-jar", "DynamoDBLocal.jar", "-inMemory", "-sharedDb"]
    networks:
      - agent-local

networks:
  agent-local:
    name: agent-local
    driver: bridge
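
The compose file exposes DynamoDB Local at `http://dynamodb-local:8000` inside the `agent-local` network and at `http://localhost:8000` on the host. A rough sketch of how client settings might be derived from the environment is shown here. The helper name and the `us-east-1` fallback are assumptions (the fallback mirrors the region used by the local scripts in this PR), and the diff does not show which SDK call the agent actually makes.

```python
from typing import Mapping


def dynamodb_client_kwargs(env: Mapping[str, str]) -> dict:
    """Build keyword arguments for a DynamoDB client from environment values."""
    # Assumed fallback region; DynamoDB Local started with -sharedDb uses one
    # database regardless of the region or credentials the client presents.
    kwargs = {"region_name": env.get("AWS_REGION") or "us-east-1"}
    endpoint = env.get("AWS_ENDPOINT_URL_DYNAMODB", "")
    if endpoint:
        # Local testing: point the client at DynamoDB Local instead of AWS.
        kwargs["endpoint_url"] = endpoint
    return kwargs
```

If the agent uses boto3 (an assumption), the result could be passed as `boto3.client("dynamodb", **dynamodb_client_kwargs(os.environ))`. With the variable unset, no `endpoint_url` is added and the client resolves the regular AWS endpoint.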
35 changes: 35 additions & 0 deletions agent/mise.toml
@@ -72,3 +72,38 @@ run = [
{ task = "security:bandit" },
{ task = "security:image" },
]

# LOCAL DEVELOPMENT (DynamoDB Local for progress events)

[tasks."local:up"]
description = "Start DynamoDB Local and create tables for local agent testing"
run = [
  "docker compose up -d",
  "bash scripts/create-local-tables.sh",
]

[tasks."local:down"]
description = "Stop DynamoDB Local (all data is ephemeral)"
run = "docker compose down"

[tasks."local:events"]
description = "Query progress events from DynamoDB Local"
run = """
aws dynamodb scan \
  --table-name TaskEventsTable \
  --endpoint-url http://localhost:8000 \
  --region us-east-1 \
  --no-cli-pager \
  --output table 2>/dev/null || echo "No events found (is DynamoDB Local running?)"
"""

[tasks."local:events:json"]
description = "Query progress events from DynamoDB Local (JSON)"
run = """
aws dynamodb scan \
  --table-name TaskEventsTable \
  --endpoint-url http://localhost:8000 \
  --region us-east-1 \
  --no-cli-pager \
  --output json 2>/dev/null || echo "{}"
"""
35 changes: 34 additions & 1 deletion agent/run.sh
@@ -8,12 +8,16 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# ---------------------------------------------------------------------------
usage() {
cat <<'EOF'
Usage: ./agent/run.sh [--server] [--local-events] <owner/repo> [args...]

Modes:
(default) Local batch mode — runs the agent, then exits
--server Server mode — starts FastAPI on port 8080 (/invocations + /ping)

Flags:
--local-events Connect to DynamoDB Local (port 8000) for progress events.
Requires: docker compose up -d && ./agent/scripts/create-local-tables.sh

The second argument (after flags) is auto-detected:
- If numeric, treated as a GitHub issue number
- Otherwise, treated as a task description
@@ -47,6 +51,9 @@ Examples:
# Local mode — dry run (print prompt, don't invoke agent)
DRY_RUN=1 ./agent/run.sh "myorg/myrepo" 42

# Local mode with progress events to DynamoDB Local
./agent/run.sh --local-events "myorg/myrepo" 42

# Server mode — start FastAPI, then invoke via curl
./agent/run.sh --server "myorg/myrepo"
curl http://localhost:8080/ping
@@ -61,13 +68,18 @@ EOF
# Parse flags
# ---------------------------------------------------------------------------
MODE="local"
LOCAL_EVENTS=false

while [[ $# -gt 0 ]]; do
  case "$1" in
    --server)
      MODE="server"
      shift
      ;;
    --local-events)
      LOCAL_EVENTS=true
      shift
      ;;
    --help|-h)
      usage
      ;;
@@ -206,6 +218,24 @@ DOCKER_ARGS=(
[[ -n "${MAX_TURNS:-}" ]] && DOCKER_ARGS+=(-e "MAX_TURNS=${MAX_TURNS}")
[[ -n "${MAX_BUDGET_USD:-}" ]] && DOCKER_ARGS+=(-e "MAX_BUDGET_USD=${MAX_BUDGET_USD}")

# Local events mode: connect to DynamoDB Local via the agent-local network
if [[ "$LOCAL_EVENTS" == true ]]; then
  # Verify the DynamoDB Local container is actually running. A bare
  # `docker inspect` also succeeds for a stopped container, so check its state.
  if [[ "$(docker inspect -f '{{.State.Running}}' dynamodb-local 2>/dev/null)" != "true" ]]; then
    echo "ERROR: DynamoDB Local is not running." >&2
    echo " Start it with: cd agent && docker compose up -d" >&2
    echo " Create tables: ./agent/scripts/create-local-tables.sh" >&2
    exit 1
  fi
  DOCKER_ARGS+=(
    --network agent-local
    -e "TASK_EVENTS_TABLE_NAME=TaskEventsTable"
    -e "TASK_TABLE_NAME=TaskTable"
    -e "AWS_ENDPOINT_URL_DYNAMODB=http://dynamodb-local:8000"
  )
  echo " Events: DynamoDB Local (http://localhost:8000)"
fi

# Server mode: expose port 8080
if [[ "$MODE" == "server" ]]; then
DOCKER_ARGS+=(-p 8080:8080)
@@ -236,6 +266,9 @@ echo "Monitor in another terminal:"
echo " docker logs -f ${CONTAINER_NAME} # live output"
echo " docker stats ${CONTAINER_NAME} # CPU, memory, disk I/O"
echo " docker exec ${CONTAINER_NAME} du -sh /workspace # disk usage"
if [[ "$LOCAL_EVENTS" == true ]]; then
  echo " mise run local:events # query progress events"
fi
echo ""

if [[ "$MODE" == "server" ]]; then
62 changes: 62 additions & 0 deletions agent/scripts/create-local-tables.sh
@@ -0,0 +1,62 @@
#!/usr/bin/env bash
# Create DynamoDB tables in DynamoDB Local for local agent testing.
#
# Prerequisites:
# docker compose up -d (starts DynamoDB Local on port 8000)
# AWS CLI installed
#
# Usage:
# ./agent/scripts/create-local-tables.sh

set -euo pipefail

ENDPOINT="http://localhost:8000"
REGION="us-east-1"

# Common args for all commands
DDB_ARGS=(--endpoint-url "$ENDPOINT" --region "$REGION" --no-cli-pager)

echo "Creating local DynamoDB tables..."

# ---------------------------------------------------------------------------
# TaskEventsTable — matches cdk/src/constructs/task-events-table.ts
# PK: task_id (S), SK: event_id (S, ULID)
# TTL: ttl
# ---------------------------------------------------------------------------
if aws dynamodb describe-table --table-name TaskEventsTable "${DDB_ARGS[@]}" >/dev/null 2>&1; then
  echo " TaskEventsTable already exists — skipping"
else
  aws dynamodb create-table \
    --table-name TaskEventsTable \
    --attribute-definitions \
      AttributeName=task_id,AttributeType=S \
      AttributeName=event_id,AttributeType=S \
    --key-schema \
      AttributeName=task_id,KeyType=HASH \
      AttributeName=event_id,KeyType=RANGE \
    --billing-mode PAY_PER_REQUEST \
    "${DDB_ARGS[@]}" >/dev/null
  echo " TaskEventsTable created"
fi

# ---------------------------------------------------------------------------
# TaskTable — matches cdk/src/constructs/task-table.ts
# PK: task_id (S)
# TTL: ttl
# GSIs omitted (not needed for local agent testing)
# ---------------------------------------------------------------------------
if aws dynamodb describe-table --table-name TaskTable "${DDB_ARGS[@]}" >/dev/null 2>&1; then
  echo " TaskTable already exists — skipping"
else
  aws dynamodb create-table \
    --table-name TaskTable \
    --attribute-definitions \
      AttributeName=task_id,AttributeType=S \
    --key-schema \
      AttributeName=task_id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST \
    "${DDB_ARGS[@]}" >/dev/null
  echo " TaskTable created"
fi

echo "Done. Tables available at $ENDPOINT"
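
Once these tables exist and an agent run has written events, `mise run local:events:json` prints a raw `aws dynamodb scan` response in DynamoDB's typed JSON (`{"S": ...}`, `{"N": ...}`). The sketch below is one way to flatten that output for reading. The function names are hypothetical, and only `task_id`, `event_id`, and `ttl` are attribute names taken from the script above; any other attribute in a real event is unknown here.

```python
import json
from typing import Any


def plain_value(attr: dict) -> Any:
    """Convert one typed DynamoDB-JSON attribute into a plain Python value."""
    (type_tag, raw), = attr.items()  # each attribute has exactly one type tag
    if type_tag == "N":
        # Numbers arrive as strings; prefer int, fall back to float.
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: plain_value(value) for key, value in raw.items()}
    if type_tag == "L":
        return [plain_value(value) for value in raw]
    return raw  # S and BOOL are already plain; sets and binary pass through


def events_from_scan(scan_json: str) -> list:
    """Decode a scan response and order events by (task_id, event_id)."""
    doc = json.loads(scan_json)
    # The mise task prints "{}" when the scan fails, so "Items" may be absent.
    items = [
        {key: plain_value(value) for key, value in item.items()}
        for item in doc.get("Items", [])
    ]
    return sorted(items, key=lambda e: (e.get("task_id", ""), e.get("event_id", "")))
```

Sorting on `event_id` relies on the sort key being a ULID, as the table comment states: ULIDs compare lexicographically in creation order, whereas a scan makes no promise about the order in which items come back.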
12 changes: 12 additions & 0 deletions agent/src/config.py
@@ -52,6 +52,8 @@ def build_config(
    task_type: str = "new_task",
    branch_name: str = "",
    pr_number: str = "",
    trace: bool = False,
    user_id: str = "",
) -> TaskConfig:
    """Build and validate configuration from explicit parameters.

@@ -102,6 +104,8 @@ def build_config(
        branch_name=branch_name,
        pr_number=pr_number,
        task_id=task_id or uuid.uuid4().hex[:12],
        trace=trace,
        user_id=user_id,
    )


@@ -118,6 +122,14 @@ def get_config() -> TaskConfig:
            max_budget_usd=float(os.environ.get("MAX_BUDGET_USD", "0")) or None,
            aws_region=os.environ.get("AWS_REGION", ""),
            dry_run=os.environ.get("DRY_RUN", "").lower() in ("1", "true", "yes"),
            # Local-batch ``--trace`` parity (design §10.1). Without
            # these env vars a developer running the agent outside
            # AgentCore could never exercise the trace path. Both are
            # opt-in; empty ``USER_ID`` with ``TRACE=1`` logs a skip
            # warning (see ``pipeline.run_task``) rather than writing
            # an unreachable ``traces//`` key.
            trace=os.environ.get("TRACE", "").lower() in ("1", "true", "yes"),
            user_id=os.environ.get("USER_ID", ""),
        )
    except ValueError as e:
        print(f"ERROR: {e}", file=sys.stderr)
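
The comment in this hunk describes two opt-in environment variables and a guard: `TRACE=1` with an empty `USER_ID` should warn and skip rather than write under an unreachable key. A small standalone sketch of that decision is given below. The `TraceSettings` type, its `upload_enabled` field, and the logger name are hypothetical; the real guard lives in `pipeline.run_task`, which this diff does not show.

```python
import logging
from typing import Mapping, NamedTuple

logger = logging.getLogger("trace_settings_sketch")  # hypothetical logger name

TRUTHY = ("1", "true", "yes")  # same accepted values as get_config() in the hunk


class TraceSettings(NamedTuple):
    trace: bool
    user_id: str
    upload_enabled: bool  # hypothetical: whether a trace upload would be attempted


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an environment value as a boolean, case-insensitively."""
    return env.get(name, "").lower() in TRUTHY


def trace_settings(env: Mapping[str, str]) -> TraceSettings:
    trace = env_flag(env, "TRACE")
    user_id = env.get("USER_ID", "")
    if trace and not user_id:
        # Without a user id the object key would have an empty path segment.
        logger.warning("TRACE is set but USER_ID is empty; skipping trace upload")
    return TraceSettings(trace, user_id, trace and bool(user_id))
```

Values outside the accepted set, such as `TRACE=on`, are treated as false, which matches the membership test used for `DRY_RUN` and `TRACE` in `get_config()`.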