
Add automated PR documentation impact analysis workflow #6924

@jongio

Summary

Build a GitHub Actions workflow with a custom TypeScript action that automatically analyzes PR diffs and identifies which documentation needs updating -- both in-repo (Azure/azure-dev) and in the external docs repo (MicrosoftDocs/azure-dev-docs-pr).

Implementation: PR #6927

Flow

flowchart TD
    A["PR Event: opened/synchronize/closed"] --> B{Event Type?}
    B -->|opened / synchronize| C["Fetch PR Diff via API"]
    B -->|closed + merged| SKIP["Skip: PRs already exist"]
    B -->|closed + not merged| Z["Close doc PRs, clean branches"]

    D["Manual Trigger"] --> E{Mode?}
    E -->|single| C
    E -->|all_open| F["Enumerate open PRs"]
    E -->|list| G["Parse PR numbers"]
    F --> C
    G --> C

    C --> H["Classify changes"]
    H --> I["Build docs inventory"]
    I --> J["AI Analysis via GPT-4o"]
    J --> K{Docs impacted?}

    K -->|No| L["Post: no doc changes needed"]
    K -->|Yes| M["Generate doc proposals"]

    M --> N{"In-repo docs?"}
    N -->|Yes| O["Branch: docs/pr-N in azure-dev"]
    O --> P["Create/update PR"]
    N -->|No| Q{"External docs?"}

    P --> Q
    Q -->|Yes| R["Mint token via OIDC"]
    R --> R2["Branch: docs/pr-N in docs repo"]
    R2 --> S["Create/update docs PR"]
    Q -->|No| T["Update tracking comment"]

    S --> T
    L --> U["Done"]
    T --> U
    Z --> U
    SKIP --> U
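The event routing at the top of the flow can be expressed as a small pure function. This is a minimal sketch, not the action's actual code: the names PullRequestEvent, Plan and planForEvent are hypothetical. The sketch also assumes that reopened is handled like opened, based on the trigger list, even though the diagram shows only opened and synchronize.

```typescript
// Hypothetical types; the real action may model events differently.
type PullRequestAction = "opened" | "synchronize" | "reopened" | "closed";

interface PullRequestEvent {
  action: PullRequestAction;
  merged: boolean; // only meaningful when action === "closed"
}

type Plan =
  | { kind: "analyze" } // fetch diff via API, classify, run AI analysis
  | { kind: "skip" } // merged: companion doc PRs already exist
  | { kind: "cleanup" }; // closed without merge: close doc PRs, delete branches

function planForEvent(event: PullRequestEvent): Plan {
  switch (event.action) {
    case "opened":
    case "synchronize":
    case "reopened": // assumption: treated the same as opened
      return { kind: "analyze" };
    case "closed":
      return event.merged ? { kind: "skip" } : { kind: "cleanup" };
    default: {
      // Exhaustiveness check: a new, unhandled action becomes a compile error.
      const unhandled: never = event.action;
      throw new Error(`Unhandled action: ${String(unhandled)}`);
    }
  }
}
```

Keeping this decision separate from the API calls means it can be unit-tested without network access or tokens.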

Security Architecture

flowchart LR
    subgraph "Fork PR Security"
        FP["Fork PR"] --> PRT["PR target trigger"]
        PRT --> MAIN["Runs from main"]
        MAIN --> SAFE["Fork can't modify workflow"]
    end

    subgraph "OIDC + Key Vault Signing"
        OIDC["OIDC Token"] --> AZ["azure/login"]
        AZ --> KV["Key Vault Sign"]
        KV --> JWT["Signed JWT"]
        JWT --> TOKEN["Install Token"]
        TOKEN --> WRITE["Write to docs repo"]
    end

    subgraph "Data Flow (API only)"
        API["GitHub REST API"] --> DIFF["Read PR diff"]
        API --> DOCS["Read doc inventory"]
        API --> NEVER["NEVER checkout or execute PR code"]
    end

Problem

When code changes land in Azure/azure-dev, documentation in two locations may need updating:

  1. In-repo docs -- markdown files within Azure/azure-dev (e.g., cli/azd/docs/ and READMEs)
  2. External docs -- MicrosoftDocs/azure-dev-docs-pr (the public-facing Learn documentation)

There is no automated system to detect which docs are impacted by a code PR, propose updates, or track the relationship between code PRs and doc PRs.

Proposed Solution

Workflow Triggers

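  • pull_request_target: [opened, synchronize, reopened, closed] targeting main -- uses pull_request_target instead of pull_request so the workflow definition always runs from main, where a fork PR cannot modify it to exfiltrate secrets; PR content is only read through the API, never checked out or executed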
  • workflow_dispatch for manual/batch runs (single PR, all open PRs, or a list)
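The three manual modes could be resolved from the workflow_dispatch inputs roughly as follows. The input format (a comma- or space-separated string, optionally with # prefixes) and the names resolveDispatch and parsePrNumbers are assumptions for illustration. In all_open mode the PR list would be filled in later by enumerating open PRs through the API.

```typescript
// Hypothetical input handling; the real workflow_dispatch inputs may differ.
type DispatchMode = "single" | "all_open" | "list";

interface DispatchTarget {
  mode: DispatchMode;
  prNumbers: number[]; // empty for all_open; filled later by listing open PRs
}

// Accepts "6924", "#6924", "6924, 6927" or "6924 6927"; rejects anything else.
function parsePrNumbers(raw: string): number[] {
  const numbers = raw
    .split(/[\s,]+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const digits = token.replace(/^#/, "");
      if (!/^[1-9]\d*$/.test(digits)) {
        throw new Error(`Invalid PR number: ${token}`);
      }
      return Number(digits);
    });
  return Array.from(new Set(numbers)); // drop duplicates, keep first-seen order
}

function resolveDispatch(mode: string, input: string): DispatchTarget {
  switch (mode) {
    case "single": {
      const prNumbers = parsePrNumbers(input);
      if (prNumbers.length !== 1) throw new Error("single mode expects exactly one PR number");
      return { mode: "single", prNumbers };
    }
    case "all_open":
      return { mode: "all_open", prNumbers: [] };
    case "list": {
      const prNumbers = parsePrNumbers(input);
      if (prNumbers.length === 0) throw new Error("list mode expects at least one PR number");
      return { mode: "list", prNumbers };
    }
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}
```

Rejecting malformed input up front keeps a typo in a manual run from silently analyzing the wrong PR.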

Authentication

| Layer | Method | Purpose |
|---|---|---|
| In-repo operations | GITHUB_TOKEN | Read PR diff, create doc PRs in azure-dev, post comments |
| Azure login | OIDC federated credentials | azure/login@v2 exchanges GitHub OIDC token for Azure access |
| JWT signing | Azure Key Vault | az keyvault key sign signs a GitHub App JWT (RSA key is non-exportable) |
| Cross-repo writes | GitHub App installation token | Short-lived token scoped to MicrosoftDocs/azure-dev-docs-pr |

Key security properties:

  • No secrets stored in GitHub -- OIDC is fully keyless (federated credential binding)
  • Private key never leaves Key Vault -- signing happens server-side via az keyvault key sign
  • Short-lived tokens -- GitHub App installation tokens expire in 1 hour
  • Scoped access -- token only grants access to repos where the App is installed
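The JWT-signing step can be sketched as follows, assuming the signer is injected as a callback. A GitHub App JWT is RS256-signed and carries iat, exp (at most 10 minutes ahead) and iss claims. The sketch computes the SHA-256 digest locally and hands only that digest to the signer, which is the role Key Vault plays in this design. buildAppJwt and DigestSigner are hypothetical names; the actual eng/common/actions/login-to-github implementation may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical signer interface. In the workflow this role is played by
// Azure Key Vault (RS256 over a SHA-256 digest), so the private key never
// reaches the runner; only the 32-byte digest is sent out for signing.
type DigestSigner = (sha256Digest: Buffer) => Buffer;

function base64url(data: string | Buffer): string {
  const bytes = typeof data === "string" ? Buffer.from(data, "utf8") : data;
  return bytes.toString("base64url");
}

function buildAppJwt(appId: string, nowSeconds: number, signDigest: DigestSigner): string {
  const header = { alg: "RS256", typ: "JWT" };
  const payload = {
    iat: nowSeconds - 60, // backdate to tolerate clock drift
    exp: nowSeconds + 9 * 60, // GitHub rejects exp more than 10 minutes ahead
    iss: appId, // GitHub App ID
  };
  const signingInput = `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(payload))}`;
  const digest = createHash("sha256").update(signingInput).digest();
  const signature = signDigest(digest);
  return `${signingInput}.${base64url(signature)}`;
}
```

The resulting JWT is then exchanged for an installation token with a separate REST call (POST /app/installations/{installation_id}/access_tokens), which is not shown here.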

Core Behavior

  1. Diff analysis -- Extract and classify PR changes (API, behavior, config, feature, deprecation, bug fix)
  2. Doc inventory -- Build a manifest of all docs in both repos (via git.getTree + git.getBlob for efficiency, with sanitizeText() on all extracted content)
  3. AI-powered impact mapping -- Use the GitHub Models API (openai/gpt-4o) to determine which docs are impacted, with comprehensive output validation (repo format regex, path traversal blocking, unknown repo rejection, impact count cap)
  4. Companion doc PRs -- Create/update PRs in both repos with branch naming docs/pr-{source-pr-number}
  5. Tracking comment -- Maintain a comment on the source PR linking to all companion doc PRs (with author verification to prevent spoofing and multi-layer markdown injection prevention)
  6. Cleanup -- Auto-close companion doc PRs when the source PR is closed without merge
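Items 4 and 5 rely on a deterministic branch name and on finding the existing tracking comment without trusting look-alikes. A minimal sketch, assuming the tracking comment carries a hidden HTML marker and is authored by the workflow's bot account (github-actions[bot] when posting with GITHUB_TOKEN). The marker string and the function names are hypothetical.

```typescript
// Hypothetical marker and types; the real action may use a different marker.
const TRACKING_MARKER = "<!-- doc-monitor:tracking -->";

interface IssueComment {
  id: number;
  body: string;
  user: { login: string; type: string } | null; // null for deleted accounts
}

interface DocPrLink {
  repo: string; // e.g. "MicrosoftDocs/azure-dev-docs-pr"
  number: number;
  url: string;
}

// Deterministic 1:1 mapping from source PR to companion doc branch.
function docBranchName(sourcePr: number): string {
  return `docs/pr-${sourcePr}`;
}

// Anyone can paste the marker into a comment, so the author must also match
// the account the workflow posts as (e.g. "github-actions[bot]").
function findTrackingComment(comments: IssueComment[], botLogin: string): IssueComment | undefined {
  return comments.find(
    (c) =>
      c.user !== null &&
      c.user.login === botLogin &&
      c.user.type === "Bot" &&
      c.body.includes(TRACKING_MARKER),
  );
}

function renderTrackingBody(links: DocPrLink[]): string {
  const lines = links.map((l) => `- ${l.repo}#${l.number}: ${l.url}`);
  return [TRACKING_MARKER, "Companion documentation PRs:", ...lines].join("\n");
}
```

Checking the author as well as the marker means a user who copies the marker into their own comment cannot get the workflow to edit or trust it.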

Key Design Decisions

  • AI backend: GitHub Models API with openai/gpt-4o
  • Branch naming: docs/pr-{N} for deterministic 1:1 mapping
  • Rebase-aware: Respects human edits on doc PRs (never force-pushes)
  • Auth: OIDC + Key Vault signing (no secrets stored in GitHub, private key never on runner)
  • Trigger: pull_request_target -- workflow code runs from main, preventing fork secret exfiltration
  • Graceful degradation: Without a cross-repo token, the action still scans docs and reports impacts (it just can't create PRs)
  • Architecture: 12 focused source modules, all under 200 lines
  • Injection prevention: 5-layer defense -- sanitizePlainText() on AI output, sanitizeText() on doc manifest input, escapeTableCell() on tracking comments (strips HTML/markdown), sanitizeForMarkdown() on PR bodies, output length caps (MAX_REASON=200, MAX_SUMMARY=500, MAX_IMPACTS=15)
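The AI output validation from item 3 of Core Behavior, combined with the length caps above, might look roughly like this. It is a sketch under assumptions: toPlainText and isSafeDocPath are hypothetical stand-ins for the sanitizers named above, the repo allowlist holds just the two repos this issue names, and the model's response shape ({ summary, impacts: [{ repo, path, reason }] }) is assumed.

```typescript
// Caps as listed above; everything else here is an illustrative assumption.
const MAX_REASON = 200;
const MAX_SUMMARY = 500;
const MAX_IMPACTS = 15;

const KNOWN_REPOS = new Set(["Azure/azure-dev", "MicrosoftDocs/azure-dev-docs-pr"]);
const REPO_FORMAT = /^[A-Za-z0-9_.-]+\/[A-Za-z0-9_.-]+$/;

interface DocImpact {
  repo: string;
  path: string;
  reason: string;
}

interface Analysis {
  summary: string;
  impacts: DocImpact[];
}

// Strip HTML tags and markdown control characters, collapse whitespace, cap length.
function toPlainText(value: string, maxLength: number): string {
  return value
    .replace(/<[^>]*>/g, "")
    .replace(/[`*_[\]()#|<>!~]/g, "")
    .replace(/\s+/g, " ")
    .trim()
    .slice(0, maxLength);
}

// Repo-relative path with no absolute root, backslashes, or "." / ".." segments.
function isSafeDocPath(path: string): boolean {
  if (path.length === 0 || path.startsWith("/") || path.includes("\\")) return false;
  return path.split("/").every((seg) => seg !== "" && seg !== "." && seg !== "..");
}

function validateAnalysis(raw: unknown): Analysis {
  const obj = (typeof raw === "object" && raw !== null ? raw : {}) as {
    summary?: unknown;
    impacts?: unknown;
  };
  const summary = typeof obj.summary === "string" ? toPlainText(obj.summary, MAX_SUMMARY) : "";
  const impacts: DocImpact[] = [];
  const candidates: unknown[] = Array.isArray(obj.impacts) ? obj.impacts : [];
  for (const item of candidates) {
    if (impacts.length >= MAX_IMPACTS) break; // impact count cap
    if (typeof item !== "object" || item === null) continue;
    const { repo, path, reason } = item as { repo?: unknown; path?: unknown; reason?: unknown };
    if (typeof repo !== "string" || !REPO_FORMAT.test(repo)) continue; // repo format check
    if (!KNOWN_REPOS.has(repo)) continue; // unknown repo rejection
    if (typeof path !== "string" || !isSafeDocPath(path)) continue; // path traversal blocking
    if (typeof reason !== "string") continue;
    impacts.push({ repo, path, reason: toPlainText(reason, MAX_REASON) });
  }
  return { summary, impacts };
}
```

Treating the model response as untrusted input, and dropping rather than repairing bad entries, keeps a prompt-injected diff from steering writes outside the two known repos.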

Infrastructure (managed by EngSys)

| Component | Value | Purpose |
|---|---|---|
| GitHub Environment | AzureSDKEngKeyVault | OIDC federated credential binding |
| Azure Key Vault | azuresdkengkeyvault | Hosts the non-exportable RSA signing key |
| Key Vault Key | azure-sdk-automation | RSA key used to sign GitHub App JWTs |
| GitHub App ID | 1086291 | Azure SDK Automation GitHub App |

Tasks

  • Scaffold the action project
  • Implement diff extraction and change classification
  • Implement docs inventory builder (both repos)
  • Implement GitHub Models AI integration for analysis
  • Implement PR manager (create/update branches, PRs, rebase logic)
  • Implement tracking comment manager
  • Implement main entry point with event handling and manual mode
  • Create workflow definition (doc-monitor.yml)
  • Integrate OIDC + Key Vault signing (eng/common/actions/login-to-github)
  • MQ code review -- 12 findings fixed (security, logic, performance, type safety)
  • Red team security assessment and hardening (11 findings: 7 code-fixed, 3 admin-tracked, 1 low-risk accepted)
  • Add unit tests for pure functions
  • End-to-end validation (requires EngSys infrastructure setup)
