
RFC: GitHub Actions CI/CD with protected deployment environment and approval gates #73

RFC: Automated Deployment Pipeline with Protected Environments

Status: Draft (revised per feedback)

Author: @scottschreckengaust

Related: #70 (context stack names), #72 (ephemeral cleanup)


Summary

Establish a GitHub Actions deployment pipeline that:

  1. Builds and synthesizes CDK once per variant in build.yml
  2. Stores cdk-<variant>.out as immutable deployment artifacts (synth once, deploy exact artifact)
  3. Gates all deployments behind a protected GitHub environment (deploy) requiring manual approval — triggered by deploy label (with optional variant qualifiers)
  4. Deploys to AWS using OIDC federation assuming CDK bootstrap roles (no long-lived credentials)
  5. Names stacks main-<variant>-prd for production and uses ephemeral names (pr-*, commit-*) for PRs and branches
  6. Creates a GitHub Release on successful deployment from main (drafted, then published) that tags main and attaches the cdk-*.out artifacts
  7. Cleans up only stacks tagged github=true (not matched by description), gated behind approval with cancel-in-progress concurrency

Decisions (from discussion)

| Question | Decision |
| --- | --- |
| PR deployments | Opt-in via deploy label (with optional variant qualifiers) |
| Synth strategy | Once in build.yml, deploy the exact artifact; no re-synth |
| Cleanup approval | Always manually gated; later runs cancel prior pending requests |
| Cost gate | No; resource review in approval is sufficient |
| Permissions boundary | Yes; use CDK bootstrap roles (deploy, lookup, file-publishing, image-publishing) |
| main deploy approval | Always require; never skip, even after PR merge |
| Variant selection | Label-driven: deploy = default only, deploy:ecs, deploy:eks, deploy:* = all |
| Baselines | Per-variant against main-<variant>-prd; stored as release artifacts |

Design

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│  GitHub Actions                                                      │
│                                                                      │
│  build.yml (CI) — every push/PR                                      │
│  ├─ steps: install → compile → test → lint → synth (once per variant)│
│  ├─ matrix: variants resolved from labels                            │
│  ├─ artifact: cdk-<variant>.out (immutable, uploaded per leg)        │
│  └─ output: stack_name, is_protected, variant                        │
│                                                                      │
│  deploy.yml (CD) — on `deploy` label OR main merge                   │
│  ├─ trigger: label added + build success, OR push to main            │
│  ├─ environment: "deploy" (ALWAYS requires approval, no bypass)      │
│  ├─ steps:                                                           │
│  │   ├─ download cdk-<variant>.out artifact (exact build output)     │
│  │   ├─ configure-aws-credentials (OIDC → CDK bootstrap roles)      │
│  │   ├─ baseline-diff (compare vs last release baseline)             │
│  │   ├─ post diff summary to deployment log                          │
│  │   ├─ cdk deploy --app cdk-<variant>.out --require-approval never  │
│  │   └─ on success: draft release → tag → attach artifacts → publish │
│  └─ concurrency: one deploy at a time per stack                      │
│                                                                      │
│  cleanup.yml                                                         │
│  ├─ trigger: schedule (every 4h) + workflow_dispatch                 │
│  ├─ environment: "deploy" (ALWAYS requires approval)                 │
│  ├─ concurrency: cancel-in-progress (later runs cancel prior)        │
│  └─ steps: find stacks tagged `github` → force-detach ENIs → delete  │
└─────────────────────────────────────────────────────────────────────┘
         │
         │ OIDC (aws-actions/configure-aws-credentials)
         │ role-to-assume: CDK deploy role
         ▼
┌─────────────────────────────────────────────────────────────────────┐
│  AWS Account                                                         │
│  ├─ IAM OIDC Provider (token.actions.githubusercontent.com)          │
│  ├─ CDK Bootstrap Roles (permissions boundary):                      │
│  │   ├─ cdk-hnb659fds-deploy-role-*                                  │
│  │   ├─ cdk-hnb659fds-lookup-role-*                                  │
│  │   ├─ cdk-hnb659fds-file-publishing-role-*                         │
│  │   └─ cdk-hnb659fds-image-publishing-role-*                        │
│  ├─ CloudFormation Stacks (tagged: github=true)                      │
│  │   ├─ main-agentcore-prd (protected, terminationProtection=true)   │
│  │   ├─ main-ecs-prd (protected, terminationProtection=true)         │
│  │   ├─ pr-42-abc1234-agentcore (ephemeral, tagged)                  │
│  │   └─ commit-abc1234-ecs (ephemeral, tagged)                       │
│  └─ CDK Bootstrap (cdk-toolkit stack)                                │
└─────────────────────────────────────────────────────────────────────┘

Label-Driven Variant Selection

Labels

| Label | Variants deployed | Use case |
| --- | --- | --- |
| deploy | agentcore (default variant only) | Standard deployment |
| deploy:ecs | ecs only | Test ECS variant |
| deploy:eks | eks only | Test EKS variant |
| deploy:ecs + deploy:eks | ecs and eks | Test multiple non-default |
| deploy:* | All variants (agentcore + ecs + eks) | Full matrix deployment |
| No deploy* label | Nothing deployed | Default (CI only) |

Resolution logic

- name: Resolve variants from labels
  id: variants
  env:
    # Pass labels via the environment so label text is never interpolated into the script
    LABELS: ${{ toJson(github.event.pull_request.labels.*.name) }}
  run: |
    if echo "$LABELS" | jq -e 'index("deploy:*")' > /dev/null; then
      # deploy:* = all variants
      echo 'matrix=["agentcore","ecs","eks"]' >> "$GITHUB_OUTPUT"
    elif echo "$LABELS" | jq -e '[.[] | select(startswith("deploy:"))] | length > 0' > /dev/null; then
      # Specific variant labels; -c keeps the JSON on one line, as GITHUB_OUTPUT requires
      VARIANTS=$(echo "$LABELS" | jq -c '[.[] | select(startswith("deploy:")) | ltrimstr("deploy:")]')
      echo "matrix=$VARIANTS" >> "$GITHUB_OUTPUT"
    elif echo "$LABELS" | jq -e 'index("deploy")' > /dev/null; then
      # Plain "deploy" = default variant only
      echo 'matrix=["agentcore"]' >> "$GITHUB_OUTPUT"
    else
      echo 'matrix=[]' >> "$GITHUB_OUTPUT"
    fi
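
The same precedence (deploy:* first, then specific deploy:<variant> labels, then plain deploy) can be sketched as a pure function, which makes the rules easy to unit test outside a workflow. This is an illustrative sketch only: `ALL_VARIANTS`, `DEFAULT_VARIANT`, and `resolveVariants` are hypothetical names, not part of the proposed workflows.

```typescript
// Hypothetical helper mirroring the shell resolution logic above.
const ALL_VARIANTS: readonly string[] = ["agentcore", "ecs", "eks"];
const DEFAULT_VARIANT = "agentcore";
const PREFIX = "deploy:";

function resolveVariants(labels: readonly string[]): string[] {
  // deploy:* selects every variant and wins over any other label.
  if (labels.indexOf("deploy:*") !== -1) {
    return ALL_VARIANTS.slice();
  }
  // Specific deploy:<variant> labels select exactly those variants.
  const specific = labels
    .filter((label) => label.indexOf(PREFIX) === 0)
    .map((label) => label.slice(PREFIX.length));
  if (specific.length > 0) {
    return specific;
  }
  // A plain "deploy" label selects the default variant only.
  if (labels.indexOf("deploy") !== -1) {
    return [DEFAULT_VARIANT];
  }
  // No deploy* label: nothing is deployed.
  return [];
}
```

Like the shell version, this passes unknown suffixes (for example deploy:foo) straight through; a real implementation might want to reject them before they reach the build matrix.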

Release Flow

Successful deployments from main produce GitHub Releases:

main merge
  → build.yml (synth once per variant in matrix)
  → upload artifacts: cdk-agentcore.out, cdk-ecs.out, ...
  → deploy.yml (approval gate — downloads exact artifacts)
  → successful deployment
  → Draft Release created:
      Tag: v<date>-<short-sha> (e.g. v2026.05.11-abc1234)
      Assets:
        - cdk-agentcore.out.tar.gz
        - cdk-ecs.out.tar.gz
        - agentcore.resource-types.json (baseline)
        - ecs.resource-types.json (baseline)
  → Publish Release

Baselines live in releases, not in the repo. The diff step downloads the baseline from the latest published release for that variant:

- name: Download baseline from latest release
  env:
    GH_TOKEN: ${{ github.token }}  # the gh CLI needs a token when run in Actions
  run: |
    LATEST=$(gh release view --json tagName -q .tagName 2>/dev/null || echo "")
    if [[ -n "$LATEST" ]]; then
      gh release download "$LATEST" \
        --pattern "${{ matrix.variant }}.resource-types.json" \
        --dir /tmp/baseline/ || true
    fi

This means:

  • No baseline commits polluting the repo history
  • Baselines are immutable (tied to a release tag)
  • First deploy (no prior release) has no baseline → everything shows as "new" (correct)
  • Rollback = re-deploy from a prior release's cdk-*.out artifact
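
The tag and asset names in the release flow above can be derived mechanically. A minimal sketch, assuming the date is taken in UTC and the short SHA is the first seven characters of the commit SHA (the RFC only fixes the v<date>-<short-sha> shape); `releaseTag` and `releaseAssets` are hypothetical helper names.

```typescript
// Builds a tag such as v2026.05.11-abc1234.
// Assumption: UTC date and a 7-character short SHA.
function releaseTag(date: Date, sha: string): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  const day =
    date.getUTCFullYear() + "." + pad(date.getUTCMonth() + 1) + "." + pad(date.getUTCDate());
  return "v" + day + "-" + sha.slice(0, 7);
}

// Lists the release assets for the deployed variants: one cloud assembly
// tarball and one resource-type baseline per variant.
function releaseAssets(variants: readonly string[]): string[] {
  const tarballs = variants.map((v) => "cdk-" + v + ".out.tar.gz");
  const baselines = variants.map((v) => v + ".resource-types.json");
  return tarballs.concat(baselines);
}
```

Keeping the asset names a pure function of the variant is what lets the baseline download step find the right file by pattern alone.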

Synth-Once, Deploy-Exact Artifact

Each cdk-<variant>.out cloud assembly is synthesized exactly once, in build.yml. deploy.yml never re-synthesizes; it downloads and deploys that exact artifact:

# build.yml
- name: CDK Synth
  run: |
    npx cdk synth -c computeVariant=${{ matrix.variant }} \
      -c stackName=${{ steps.naming.outputs.stack_name }} \
      --output cdk-${{ matrix.variant }}.out

- uses: actions/upload-artifact@v4
  with:
    name: cdk-${{ matrix.variant }}-out
    path: cdk-${{ matrix.variant }}.out/

# deploy.yml (no synth — uses exact artifact from build)
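- uses: actions/download-artifact@v4
  with:
    name: cdk-${{ matrix.variant }}-out
    path: cdk-${{ matrix.variant }}.out/
    # Artifacts from another workflow run need run-id and a token.
    # Assumes deploy.yml is triggered by workflow_run on build.yml.
    run-id: ${{ github.event.workflow_run.id }}
    github-token: ${{ github.token }}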

- name: Deploy
  run: npx cdk deploy --app cdk-${{ matrix.variant }}.out --all --require-approval never

This guarantees what was tested in CI is exactly what gets deployed — no new Date() drift, no env var differences, no CDK version skew.


Permissions: CDK Bootstrap Role Assumption

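The GitHub OIDC role needs permission to assume the CDK bootstrap roles, plus the few EC2 and CloudFormation actions the cleanup workflow calls directly. Deployment permissions stay on the bootstrap roles, which is the approach CDK recommends: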

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": [
        "arn:aws:iam::ACCOUNT:role/cdk-hnb659fds-deploy-role-*",
        "arn:aws:iam::ACCOUNT:role/cdk-hnb659fds-lookup-role-*",
        "arn:aws:iam::ACCOUNT:role/cdk-hnb659fds-file-publishing-role-*",
        "arn:aws:iam::ACCOUNT:role/cdk-hnb659fds-image-publishing-role-*"
      ]
    },
    {
      "Sid": "CleanupENIs",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeNetworkInterfaces",
        "ec2:DetachNetworkInterface",
        "ec2:DeleteNetworkInterface",
        "cloudformation:ListStacks",
        "cloudformation:DescribeStacks",
        "cloudformation:DeleteStack",
        "cloudformation:ListStackResources"
      ],
      "Resource": "*"
    }
  ]
}

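Trust policy (OIDC). The sub pattern below matches any workflow run in the repository. Jobs that run in the deploy environment present the subject repo:aws-samples/sample-autonomous-cloud-coding-agents:environment:deploy, so the pattern could be narrowed to that value if only approved jobs should be able to assume the role: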

{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::ACCOUNT:oidc-provider/token.actions.githubusercontent.com"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
      },
      "StringLike": {
        "token.actions.githubusercontent.com:sub": "repo:aws-samples/sample-autonomous-cloud-coding-agents:*"
      }
    }
  }]
}
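
To see which workflow runs a given sub pattern admits, the StringLike condition can be approximated locally. This is a sketch that treats "*" as any run of characters and "?" as exactly one, which approximates IAM's wildcard matching rather than reimplementing it; `stringLike` is a hypothetical helper. The subject formats in the usage note are the ones GitHub issues for pull request, branch, and environment jobs.

```typescript
// Approximates IAM StringLike: "*" matches any run of characters, "?" exactly one.
function stringLike(pattern: string, value: string): boolean {
  // Escape regex metacharacters, leaving * and ? for the wildcard pass.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  const source = escaped.replace(/\*/g, "[\\s\\S]*").replace(/\?/g, "[\\s\\S]");
  return new RegExp("^" + source + "$").test(value);
}

const REPO = "repo:aws-samples/sample-autonomous-cloud-coding-agents";

// Subject claims as issued for different job contexts.
const subjects = {
  pullRequest: REPO + ":pull_request",
  mainBranch: REPO + ":ref:refs/heads/main",
  deployEnvironment: REPO + ":environment:deploy",
};
```

With the pattern from the policy (`REPO + ":*"`), all three subjects match. With `REPO + ":environment:deploy"`, only the job running in the deploy environment matches.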

Stack Naming and Tagging

| Git ref | Label | Stack name | Protected | Tag: github |
| --- | --- | --- | --- | --- |
| main | (auto) | main-agentcore-prd | true | true |
| main | deploy:ecs | main-ecs-prd | true | true |
| PR #42 | deploy | pr-42-abc1234-agentcore | false | true |
| PR #42 | deploy:ecs | pr-42-abc1234-ecs | false | true |
| Branch push | deploy | commit-abc1234-agentcore | false | true |

All stacks deployed via this pipeline get tagged:

Tags.of(stack).add('github', 'true');
Tags.of(stack).add('variant', computeVariant);
Tags.of(stack).add('ref', gitRef);
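
The naming rules in the table above can be captured in one function so that build.yml and the CDK app agree on the result. A minimal sketch; the `DeployContext` shape and the `stackName` and `isProtected` helpers are hypothetical and not part of the existing code.

```typescript
// Hypothetical description of what is being deployed.
interface DeployContext {
  ref: "main" | "pr" | "branch";
  variant: string;
  shortSha: string;
  prNumber?: number; // required when ref is "pr"
}

// Derives the stack name following the table above.
function stackName(ctx: DeployContext): string {
  if (ctx.ref === "main") {
    return `main-${ctx.variant}-prd`;
  }
  if (ctx.ref === "pr") {
    if (ctx.prNumber === undefined) {
      throw new Error("prNumber is required for PR deployments");
    }
    return `pr-${ctx.prNumber}-${ctx.shortSha}-${ctx.variant}`;
  }
  return `commit-${ctx.shortSha}-${ctx.variant}`;
}

// Only main stacks are protected (terminationProtection=true).
function isProtected(ctx: DeployContext): boolean {
  return ctx.ref === "main";
}
```

For example, `stackName({ ref: "pr", variant: "ecs", shortSha: "abc1234", prNumber: 42 })` yields `pr-42-abc1234-ecs`, matching the table.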

GitHub Environment: deploy

| Setting | Value | Rationale |
| --- | --- | --- |
| Required reviewers | ≥1 reviewer, NOT the actor who triggered | Prevents self-approval |
| Wait timer | 0 | Manual approval is the gate |
| Deployment branches | All branches | Allow PR deploys via label |
| Allow administrators to bypass | No | No bypass for anyone |
| Prevent self-review | Yes | Enforce separation of duties |

Environment secrets:

| Secret | Value |
| --- | --- |
| AWS_ROLE_ARN | arn:aws:iam::ACCOUNT:role/GitHubActionsCDKRole |
| AWS_REGION | us-east-1 |

Cleanup Workflow

name: Cleanup Ephemeral Stacks
on:
  schedule:
    - cron: '0 */4 * * *'
  workflow_dispatch:
    inputs:
      max_age_hours:
        description: 'Max age in hours (0 = all non-protected)'
        default: '0'
      dry_run:
        description: 'Dry run mode'
        type: boolean
        default: true

concurrency:
  group: cleanup-ephemeral
  cancel-in-progress: true  # later runs cancel prior pending requests

jobs:
  cleanup:
    runs-on: ubuntu-latest
    environment: deploy  # ALWAYS requires approval
    permissions:
      id-token: write
      contents: read

    steps:
      - uses: actions/checkout@v4

      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ${{ secrets.AWS_REGION }}

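      - name: Run cleanup
        env:
          MAX_AGE_HOURS: ${{ inputs.max_age_hours || '0' }}
          # Assumes the script reads DRY_RUN; scheduled runs have no inputs and fall back to 'false'
          DRY_RUN: ${{ inputs.dry_run || 'false' }}
        run: ./scripts/cleanup-ephemeral-stacks.sh --tag github=true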
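
The selection rule the cleanup script is expected to apply (tagged github=true, not protected, older than the maximum age, with 0 meaning "all non-protected") can be sketched as a filter. The internals of cleanup-ephemeral-stacks.sh are not shown in this RFC, so this is only an assumed model of its behavior; `StackSummary` and `selectForCleanup` are hypothetical names.

```typescript
// Hypothetical, simplified view of a CloudFormation stack.
interface StackSummary {
  name: string;
  tags: { [key: string]: string };
  terminationProtection: boolean;
  createdAt: Date;
}

// Returns the names of stacks eligible for deletion.
// maxAgeHours = 0 selects every tagged, non-protected stack.
function selectForCleanup(
  stacks: readonly StackSummary[],
  maxAgeHours: number,
  now: Date
): string[] {
  const cutoffMs = now.getTime() - maxAgeHours * 3600 * 1000;
  return stacks
    .filter((s) => s.tags["github"] === "true") // only pipeline-deployed stacks
    .filter((s) => !s.terminationProtection) // never touch main-*-prd
    .filter((s) => maxAgeHours === 0 || s.createdAt.getTime() <= cutoffMs)
    .map((s) => s.name);
}
```

Filtering on termination protection as well as the tag gives two independent guards: a protected production stack is skipped even though it also carries github=true.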

Resource Baseline and Diff (via Releases)

Diff output example (shown to approver in Step Summary)

## ⚠️ New AWS Resource Types (agentcore variant)

The following resource types are NEW compared to latest release v2026.05.10-fa647ca:

  + AWS::EKS::Cluster
  + AWS::EKS::Nodegroup
  + AWS::IAM::OpenIDConnectProvider

Approver action: Verify cost model, quotas, security posture, and cleanup behavior.

## Resource count: 47 → 50 (+3)
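
A summary like the one above can be computed from the synthesized template and the downloaded baseline. The sketch below assumes the baseline file is a JSON map from resource type to count; the RFC does not fix the format of <variant>.resource-types.json, so `TypeCounts`, `countResourceTypes`, and `diffTypes` are all hypothetical.

```typescript
// Assumed baseline format: resource type -> number of resources of that type.
type TypeCounts = { [resourceType: string]: number };

// Minimal shape of a CloudFormation template for this purpose.
interface Template {
  Resources?: { [logicalId: string]: { Type: string } };
}

// Counts resources per type in a synthesized template.
function countResourceTypes(template: Template): TypeCounts {
  const counts: TypeCounts = {};
  const resources = template.Resources || {};
  Object.keys(resources).forEach((id) => {
    const resource = resources[id];
    if (!resource) return;
    counts[resource.Type] = (counts[resource.Type] || 0) + 1;
  });
  return counts;
}

const has = (c: TypeCounts, t: string): boolean =>
  Object.prototype.hasOwnProperty.call(c, t);

const total = (c: TypeCounts): number =>
  Object.keys(c).reduce((sum, t) => sum + (c[t] || 0), 0);

// Compares current counts against the baseline from the latest release.
function diffTypes(baseline: TypeCounts, current: TypeCounts) {
  return {
    added: Object.keys(current).filter((t) => !has(baseline, t)).sort(),
    removed: Object.keys(baseline).filter((t) => !has(current, t)).sort(),
    before: total(baseline),
    after: total(current),
  };
}
```

With an empty baseline (first deploy, no prior release), every type lands in `added`, which matches the "everything shows as new" behavior described earlier.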

Approval Gate: What Reviewers Should Check

The deployment summary provides:

  1. Resource type diff from baseline (new/removed services)
  2. Full cdk diff (property-level changes from the synthesized artifact)
  3. Variant and stack name being deployed
  4. Labels that triggered the deployment

Per new resource type, verify:

| Check | How |
| --- | --- |
| Cost model | AWS Pricing / awspricing MCP |
| Service quotas | aws service-quotas list-service-quotas --service-code <code> |
| Security posture | Public endpoints? VPC-only? Encryption at rest? |
| IAM blast radius | What * permissions does CDK grant? |
| Cleanup behavior | RemovalPolicy.DESTROY? Orphan risk? |
| Regional availability | Available in target region? |

Implementation Plan

Phase 1: Foundation

  • Create GitHub environment deploy (required reviewers, prevent self-review, no admin bypass)
  • Set up AWS OIDC provider
  • Create GitHub Actions role with sts:AssumeRole to CDK bootstrap roles + ENI cleanup
  • CDK bootstrap the target account

Phase 2: Build pipeline

  • Update build.yml to synth once per variant, upload cdk-<variant>.out artifacts
  • Add variant matrix resolved from deploy / deploy:<variant> / deploy:* labels
  • Add github + variant + ref tags to all stacks in CDK

Phase 3: Deploy pipeline

  • Create deploy.yml — downloads exact artifact, never re-synths
  • Implement OIDC → CDK bootstrap role assumption
  • Add baseline diff step (download from latest release)
  • Add cdk diff output to step summary
  • Implement release flow (deploy → draft release → tag → attach artifacts → publish)

Phase 4: Cleanup

  • Update cleanup-ephemeral-stacks.sh to target by tag (github=true)
  • Create cleanup.yml with approval gate and cancel-in-progress
  • Schedule every 4h

Phase 5: Observability

  • CloudWatch alarms (stack count, ENI leaks, cost)
  • Document approval checklist in CONTRIBUTING.md

Security Considerations

  • No long-lived credentials: OIDC only → assumes CDK bootstrap roles
  • Permissions boundary: the GitHub role can only assume the 4 CDK bootstrap roles, plus the EC2 ENI and CloudFormation list/describe/delete actions used by cleanup
  • No self-approval: Enforced at GitHub environment level
  • No admin bypass: Even org owners must get approval
  • Audit trail: GitHub deployment history + CloudTrail
  • Tag-based targeting: Cleanup only touches github=true tagged stacks
  • Termination protection: main-*-prd stacks cannot be accidentally deleted
  • Artifact integrity: What CI tested is exactly what gets deployed (no re-synth)
