diff --git a/Formula/devflow.rb b/Formula/devflow.rb index d4fcc11..4fa7e46 100644 --- a/Formula/devflow.rb +++ b/Formula/devflow.rb @@ -40,6 +40,12 @@ def caveats Optional tools: brew install agent-deck worktrunk + + Optional CLI for clean web-page fetching in /devflow:review-document + (and the personal defuddle skill if installed): + npm install -g defuddle + Without defuddle, web URLs fall back to WebFetch — works fine, just + noisier output. EOS end diff --git a/README.md b/README.md index b70a4d5..f6c5427 100644 --- a/README.md +++ b/README.md @@ -137,7 +137,7 @@ git clone https://github.com/AndreJorgeLopes/devflow.git ~/dev/devflow cd ~/dev/devflow && make link ``` -**Prerequisites:** git, tmux, Homebrew (macOS). Recommended: Docker CLI + runtime, Claude Code or OpenCode, uv. +**Prerequisites:** git, tmux, Homebrew (macOS). Recommended: Docker CLI + runtime, Claude Code or OpenCode, uv. Optional: [defuddle](https://github.com/kepano/defuddle) (`npm install -g defuddle`) for clean web-page fetching in `/devflow:review-document`. Without it, web URLs fall back to `WebFetch`. ### First Run @@ -203,7 +203,7 @@ SKILLS ## Skills & Commands -`devflow init` installs 18 slash commands. Type `/devflow:` in Claude Code to see them all. +`devflow init` installs 19 slash commands. Type `/devflow:` in Claude Code to see them all. | Skill | Layer | What It Does | |-------|:-----:|-------------| @@ -211,6 +211,8 @@ SKILLS | `/devflow:finish-feature` | Review + Memory | Verify, commit, create PR/MR, retain learnings, cleanup | | `/devflow:create-pr` | Review + Memory | Self-review + code checks + PR creation pipeline | | `/devflow:pre-push-check` | Review + Process | Full pre-push review against check rules + CLAUDE.md | +| `/devflow:review` | Review | Multi-perspective code review of a PR/MR or local diff | +| `/devflow:review-document` | Review | Multi-perspective prose review (KB/RFC/spike/runbook/PRD) on Google Docs, Confluence, local files, or URLs | | `/devflow:spec-feature` | Memory + Process | Architecture recall + spec doc + task breakdown | | `/devflow:architecture-decision` | Memory + Process | Document ADR, retain in Hindsight, update CLAUDE.md | | `/devflow:best-roi-task` | Process | Find highest ROI task in a Jira Epic | diff --git a/devflow-plugin/skills/review-document/AGENTS.md b/devflow-plugin/skills/review-document/AGENTS.md new file mode 100644 index 0000000..5c21751 --- /dev/null +++ b/devflow-plugin/skills/review-document/AGENTS.md @@ -0,0 +1,96 @@ +# Review agents — definitions and activation rules + +Loaded by `SKILL.md` Phase 2 in Thorough mode. Each agent receives the context packet (cleaned doc text + `DOC_TYPE` + `ANCHOR_TYPE` + linked tickets + external sources + Hindsight memories + `EXISTING_COMMENTS` + relevant code excerpts for technical doc types). + +In Thorough mode, dispatch all active agents **in a single message with multiple `Agent` tool uses** so they run in parallel. Each agent returns findings as structured JSON (see "Finding format" below). + +## Subagent selection — prefer OMC specialists when available + +If `oh-my-claudecode` is installed (detect via `omc --version` succeeding OR the presence of `/oh-my-claudecode:*` slash commands), use the specialized OMC agent for each role. Generic-purpose fallback otherwise. + +| Role | OMC specialist | Generic fallback | +|---|---|---| +| Correctness + structure + completeness | `critic` (Opus) | `general-purpose` | +| Prose / tone / clarity / audience-fit | `writer` (Haiku) | `general-purpose` | +| External claim verification (vendor / API / regulatory docs) | `document-specialist` | `general-purpose` | +| Code / runtime claim verification | `verifier` | `general-purpose` | +| Architecture / system-layering claim cross-check | `architect` (Opus, read-only) | `general-purpose` | +| Security / PII / auth claim audit | `security-reviewer` | `general-purpose` | +| Visual / UI screenshot validation | `visual-verdict` skill (invoked SEQUENTIALLY after the parallel agent batch returns — it is a skill, not a subagent dispatchable via `Task`) | n/a | + +Fallback is a real fallback — never block the review on OMC absence. Findings from `general-purpose` are still high quality, just less role-specialized. + +## Always-on agents per `DOC_TYPE` + +| `DOC_TYPE` | Always-on (3 agents) | +|---|---| +| `kb` (customer-facing knowledge base) | `critic` + `writer` + `document-specialist` | +| `spike` / `rfc` | `critic` + `verifier` + `architect` | +| `prd` | `critic` + `writer` + `verifier` | +| `runbook` | `critic` + `verifier` + `security-reviewer` | +| `design` (UI prose) | `critic` + `writer` (always-on subagents); also run `visual-verdict` skill sequentially after the batch returns if screenshots are present | +| `generic` | `critic` + `writer` + `document-specialist` | + +## Conditional agents (activate when signal triggers) + +| Agent | Activate when | +|---|---| +| `security-reviewer` | Doc mentions auth / tokens / PII / SSO / encryption / GDPR / customer data export / DSAR | +| `architect` | Doc claims behaviour of ≥2 services OR includes a system-layering diagram | +| `visual-verdict` skill (sequential, not a parallel subagent) | Doc references screenshots, UI mocks, or embedded images | +| `verifier` | Doc claims runtime behaviour of code that exists in the current workspace | +| `document-specialist` | Doc cites external vendor / API / regulatory sources (URLs in body) | + +Conditional agents are dispatched in parallel with the always-on set. + +## Agent prompts per role + +Each agent receives the context packet plus a role-specific prompt. Adapt these per invocation — they are starting points, not rigid templates. + +### Critic — correctness + structure + completeness +Review for: factual contradictions inside the doc, missing sections a reader would reasonably ask, structural flow, ambiguity, audience-fit. Output severity-tagged findings with anchor + verbatim quote + one-sentence why + concrete fix. **Do NOT flag styling.** Watch for: terminology drift (same concept named two ways), self-contradicting sentences, sections that contradict each other, missing edge cases. + +### Writer — prose / tone / clarity +Review for: tone consistency across the doc, reading level vs declared audience, jargon density, passive voice that obscures who-does-what, awkward sentences, unclear CTAs. Output 5–10 specific `quote → rewrite` pairs. **Do NOT flag substance** — that is the critic's job. Limit findings to where rewriting the sentence concretely improves comprehension. + +### Document-specialist — external claim verification +Verify factual claims about external systems (vendor APIs, regulations, third-party SDKs) against authoritative upstream sources. Use the `defuddle` skill for clean web fetches. Time-box: ≤8 fetches, ≤15 minutes total. **Authoritative sources only** — vendor's own docs (developers.facebook.com, business.whatsapp.com, twilio.com/docs, etc.). Secondary BSP / SDK provider docs (Infobip, Vonage, etc.) acceptable as confirmation, never primary. Reject random blogs / Stack Overflow / tutorials as evidence. Output a verification table: `Claim | Doc quote | Source URL | Verdict (confirmed / partial / not-found / contradicted) | Notes`. End with `"claims to verify in-person with PM"` — flag claims that need internal-team confirmation (e.g. "is this an Aircall product decision or a Meta restriction?"). + +### Verifier — code / runtime claim cross-check +For each doc claim about how the code behaves: `Grep` the workspace, `Read` the relevant file, confirm or contradict. Output: `Claim | file:line | Verdict | Quote of actual code`. + +### Architect — architecture / system-layering claim cross-check +Read the doc's "how it works" / data-flow / sequence-diagram sections. Cross-check against actual service code in the workspace. Surface mismatches between what the doc describes and what the code does. Read-only — do not propose code changes, only flag doc/code drift. + +### Security-reviewer — PII / auth / secrets / GDPR +Audit the doc for: secrets in examples (API keys, tokens, real customer phone numbers), customer-data exposure in code samples, missing GDPR / data-residency notes, OWASP-relevant claims (auth flow descriptions that contradict reality), DSAR-handling claims. Output severity-tagged findings. + +## Finding format (JSON returned by every agent) + +```json +[{ + "agent": "critic", + "severity": "critical|important|suggestion", + "confidence": 85, + "category": "factual|consistency|completeness|clarity|tone|security|verification|missing-section", + "anchor_type": "file_line|section_quote", + "file": "/abs/path/to/doc.md", + "line": 42, + "section": "What is X?", + "quote": "verbatim text snippet from the doc", + "title": "Short description (the bionic-reading key term)", + "detail": "Full explanation grounded in the cleaned doc text and any verified source", + "suggestion": "Concrete fix — inline rewrite verbatim, or one-sentence instruction", + "external_source_url": "https://... (REQUIRED for document-specialist findings; null otherwise)" +}] +``` + +**`anchor_type` semantics:** +- `file_line` — local file. Populate `file` + `line`, omit `section` (or include for context). +- `section_quote` — hosted doc. Populate `section` + `quote` (verbatim snippet user can Ctrl+F). Omit `file` + `line`. + +The skill orchestrator sets the global `ANCHOR_TYPE` in Phase 0c and passes it to each agent. Agents emit findings in the matching shape; the orchestrator validates before output. + +## Recurring-pattern handling + +If an agent finds the SAME issue class at >3 anchors (e.g. passive voice across 6 paragraphs, vendor-name-capitalization drift across 5 sections), it MUST collapse to one finding with `category: "recurring"` and list the affected anchors in `detail`. Per-instance flagging produces noise. diff --git a/devflow-plugin/skills/review-document/SKILL.md b/devflow-plugin/skills/review-document/SKILL.md new file mode 100644 index 0000000..e89ac7b --- /dev/null +++ b/devflow-plugin/skills/review-document/SKILL.md @@ -0,0 +1,166 @@ +--- +name: review-document +description: Use when reviewing a prose document — KB article, RFC, spike, runbook, PRD, design doc, knowledge-base page — hosted on Google Docs, Confluence, a local file path, or an arbitrary URL. Checks correctness, internal consistency, audience-fit, prose clarity, and external-claim verification; cross-checks against existing platform comments to avoid re-flagging; returns severity-tagged findings with anchor + quote + concrete fix. Use when asked to "review this doc / KB / RFC / spike / runbook / PRD" and the target is prose, not a code diff. Counterpart to /devflow:write-spike. NOT for code diffs — use /devflow:review for those. +--- + +# /devflow:review-document — Multi-perspective prose document review + +You are a thorough, multi-perspective document reviewer. Counterpart to `/devflow:write-spike`. Reviews prose docs on any platform with deep context gathering and parallel review agents. Sibling to `/devflow:review` (which reviews code diffs). + +## Scope guardrails (read before doing anything) + +1. **NEVER flag pure styling issues.** Broken markdown table rows, header-level inconsistencies, missing alignment chars, font weight, colour, whitespace — none. Only substance: factual correctness, internal consistency, completeness, prose clarity, audience-fit, external-claim verification. Renderer quirks belong in a linter, not a doc review. +2. **NEVER re-flag findings already raised in existing platform comments.** Phase 1c mirrors `/devflow:review` Phase 1e — read all comments first, cross-check every finding. Re-flagging a colleague's existing comment erodes trust. +3. **NEVER trust the raw fetched text without handling source-format quirks** (Google Docs suggestion mode, Confluence change-tracking spans, Word `.docx` track-changes all leak phantom text). Always fetch the clean post-suggestion view — see Phase 0b for the authoritative recipe. + +## Input + +`$ARGUMENTS` may contain any combination of: + +- **Google Doc URL or ID** — `https://docs.google.com/document/d//...` or just the 44-char `` +- **Confluence page URL or ID** — `https://.atlassian.net/wiki/spaces/.../pages/` or the bare ID +- **Local file path** — `.md` / `.txt` / `.adoc` / `.rst` (absolute or `~`-relative) +- **Arbitrary URL** — blog, Medium, Substack, Notion-public, etc. Fetched via the `defuddle` skill (preferred) or `WebFetch` fallback +- **Nothing** — fall back to the most-recently-edited prose file in the current working directory +- **Optional doc-type hint** — `--type kb|spike|rfc|runbook|prd|design|generic` (auto-detected via Phase 0d if omitted) +- **Optional `quick` keyword** — single-pass review, skip subagents + +Parse to determine `SOURCE_TYPE`, `URL_OR_PATH`, `DOC_TYPE`, `MODE` (`quick` | `thorough`, default `thorough`). + +## Phase 0 — Source detection + clean fetch + anchor selection + +### 0a. Identify source platform + +| Input shape | Platform | Fetch tool | +|---|---|---| +| `docs.google.com/document/d/` or a bare 44-char base64-ish ID | Google Doc | Drive MCP (`mcp__bf06f3e8-*`) | +| `.atlassian.net/wiki/.../pages/` or bare numeric ID | Confluence | Atlassian MCP (`mcp__5ebcd1ed-*`) | +| Path starting `/`, `.`, `~` | Local file | `Read` | +| Any other `http(s)://` URL | Web | `defuddle` skill (preferred), `WebFetch` fallback | + +### 0b. Fetch CLEAN text (mandatory — strip suggestion / track-change artifacts) + +| Platform | Primary call | Strikethrough / track-change handling | +|---|---|---| +| Google Doc | `mcp__bf06f3e8-*__download_file_content` with `exportMimeType: "text/plain"` — strips strikethrough at export time AND embeds inline comments as `[a]/[b]/[c]` markers. Avoid `read_file_content` for review (concatenates strikethrough + replacement → phantom typos like `"version 1Phase 1"` when live text is `"Phase 1"`). | Use the export path. | +| Confluence | `mcp__5ebcd1ed-*__getConfluencePage` with `bodyFormat=storage` | Strip `` spans before passing to agents | +| Local file | `Read` | None (assume final) | +| Web (defuddle preferred) | invoke the `defuddle` skill | None (defuddle returns clean rendered text) | +| Web (fallback) | `WebFetch` | None | + +**Defuddle is a soft dependency** of devflow. If `defuddle` skill / CLI is absent, fall back to `WebFetch` and warn the user in the output's Context section: `"Web URL fetched via WebFetch (defuddle not installed — see devflow README for install steps)"`. Never block on missing defuddle. + +### 0c. Auto-pick anchor format per source + +| Source | Anchor format used in output | +|---|---| +| Local file | `:` | +| Hosted (Google Doc / Confluence / arbitrary URL) | `§"

" → ""` (so the user can Ctrl+F to the location in the platform UI) | + +Skill orchestrator sets `ANCHOR_TYPE` once and passes it to all agents so their findings are emitted in the right shape. + +### 0d. Detect `DOC_TYPE` if not explicit + +Apply in order: +1. `--type` hint → use it +2. Filename / URL slug contains `spike` / `rfc` / `runbook` / `prd` / `design` / `kb` → that type +3. Top-of-doc frontmatter `type:` field → use it +4. Audience signal in first ~500 chars: "customers" / "agents" / "admins" / "support team" → `kb`; "engineers" / "architecture" / "service" / "system" → `spike` or `rfc`; step-by-step ops → `runbook`; user-stories / "as a … I want" → `prd` +5. Default → `generic` + +`DOC_TYPE` drives which agents activate in Phase 2. + +## Phase 1 — Context + comments + memory + +### 1a. Linked source-of-truth pull + +- Extract Jira-style task IDs from doc body (`[A-Z][A-Z0-9]+-\d+`). For each hit, fetch via `mcp__5ebcd1ed-*__getJiraIssue` (cap 3 tickets). +- Extract URLs from doc body. Flag any pointing at code repos / merge requests / external vendor docs as cross-check candidates for Phase 2 agents. +- For technical doc types (`spike` / `rfc` / `runbook`): identify referenced code paths in the doc → `Read` them so the `verifier` / `architect` agents can ground their findings. + +### 1b. Hindsight recall + +- Call `mcp__hindsight__recall` with `(doc title + extracted feature keywords)`. Cap top 5 hits. +- Skip silently if Hindsight MCP not configured. + +### 1c. Existing platform comments — MANDATORY cross-check input + +| Platform | Comment fetch | Notes | +|---|---|---| +| Confluence | `mcp__5ebcd1ed-*__getConfluencePageFooterComments` + `getConfluencePageInlineComments` (both paginated fully) | Inline comments include `anchor_text` (the quoted snippet they attach to) → perfect for cross-check | +| Google Doc | `mcp__bf06f3e8-*__download_file_content` with `exportMimeType: "text/plain"` embeds inline comments as `[a]`, `[b]`, `[c]` markers at the anchor position AND lists the comment bodies in `[a]author-text` / `[b]author-text` blocks AT THE END of the exported text. Parse those after the last article paragraph. The anchor for finding `[a]` is the markdown text immediately preceding the marker. | +| Local file | None possible (no platform) — skip silently | +| Arbitrary URL | None possible — skip silently | + +For each fetched comment, extract `(author, body, anchor_text_if_inline, resolved_state if available)`. Store as `EXISTING_COMMENTS`. Phase 3b uses this to mark every finding NEW vs RAISED-*. + +**Google Docs caveat:** the `text/plain` export does NOT carry `resolved_state` for inline `[a]/[b]/[c]` comments. Treat all extracted Google Doc comments as `RAISED-OPEN` by default. Only downgrade to `RAISED-RESOLVED-FIXED` if you can see in the cleaned body that the issue is no longer present (manual heuristic). The Confluence path does carry true `resolved_state`. + +### 1d. Cleaned-text sanity check + +If the Phase 0b clean fetch differs from the raw fetch by >10% in length, warn the user and ask whether to proceed (heavy track-changes can legitimately strip a lot — confirm intent). + +## Phase 2 — Review agents + +Agent set depends on `DOC_TYPE` and `MODE`. See `AGENTS.md` (sibling file) for full definitions, prompts, and activation rules. + +### Quick Mode + +Single in-context pass covering correctness + clarity + a sanity-check of factual claims. Skip subagents entirely. Skip to Phase 4. + +### Thorough Mode + +**Read `AGENTS.md` for the full agent set per doc type.** Dispatch all active agents in ONE message with multiple `Task` tool calls (each with `subagent_type: ` per the AGENTS.md role table — e.g. `subagent_type: critic`, `subagent_type: writer`) so they run in parallel. Each agent receives the context packet: + +- Cleaned doc text (Phase 0b output) +- `DOC_TYPE`, `ANCHOR_TYPE`, audience signal +- Linked tickets + external sources (Phase 1a) +- Hindsight memories (Phase 1b) +- `EXISTING_COMMENTS` (Phase 1c) — for the agent to avoid duplicating where possible +- Relevant code excerpts for technical doc types (Phase 1a tail) + +## Phase 3 — Severity scoring + dedup + cross-check + +### 3a. Confidence scoring +Apply the rubric in `TEMPLATES.md` (`0–100`, drop below 50). Cap any "recurring pattern" finding (e.g. passive voice repeated across the doc) at one entry — not one per occurrence. + +### 3b. Cross-check against `EXISTING_COMMENTS` +Mark every surviving finding as one of: + +- **NEW** — no prior comment overlaps anchor + issue class. Include normally. +- **RAISED-OPEN** — comment covers this anchor + issue class, unresolved. Include with tag `(also raised by @ — open)`. +- **RAISED-RESOLVED-FIXED** — comment marked resolved AND latest doc text confirms fix landed. **Move to 🟢 Strengths section under "Already addressed in review comments". Do NOT re-flag.** +- **RAISED-RESOLVED-NOT-FIXED** — comment marked resolved by author but issue still present in current text. Re-flag with tag `(thread marked resolved but issue still present)`. High signal. + +### 3c. Cross-agent dedup +Same anchor + same issue class flagged by multiple agents → keep highest-confidence version, append `(N/ agents)`. + +## Phase 4 — Output + +**Read `TEMPLATES.md` (sibling) for the structured Markdown output template.** Format mirrors `/devflow:review`: emoji severity (🔴 🟠 🟡 🟢 🔵), bold key terms, ≤3 lines per finding in 🔴/🟠, one-line for 🟡, TL;DR block with severity counts + top-3 fixes + one-line verdict. + +Differences from `/devflow:review`: +- Anchor auto-picks per Phase 0c (`file:line` for local, `§heading + quote` for hosted) — do not emit `file:line` for a Google Doc, it is useless. +- Verdict labels: `✅ APPROVED` / `⚠️ NEEDS FIXES` / `❓ NEEDS DISCUSSION` (no merge-state semantics). +- No 🟠 "external-unmodified" category (no diff-scope concept for prose). +- Code-suggestion fenced blocks are replaced with inline prose rewrites quoted verbatim. + +## Phase 5 — Retain learnings + +If any factual nuance, gotcha, or convention surfaced during review (e.g. `"Meta BSUID docs: phone number only hides after 30 days of no interaction"`), call `mcp__hindsight__retain` to persist. Tag by `(doc-type, topic, source)`. Skip silently if Hindsight unavailable. + +## Phase 6 — Posting comments back to the platform + +**Not in v1.** Output is chat-only by default (per the user's `default-to-draft` rule). If the user later asks "draft these as Confluence inline comments" or similar, that is a separate follow-up — do not auto-post. + +## Rationalizations — STOP + +| Thought | Counter | +|---|---| +| "Skip comment fetch, doc looks new" | New docs often have half-resolved threads. Fetch (Phase 1c). | +| "Skip the strikethrough strip" | Produces phantom typos. Always strip (Phase 0b). | +| "Markdown render is broken" | Out of scope (guardrail #1). Drop. | +| "Critic alone is enough" | Writer catches tone, document-specialist catches factual drift. Run the full set per doc type. | +| "Vendor claim looks standard" | Vendor docs drift. Verify. | + +$ARGUMENTS diff --git a/devflow-plugin/skills/review-document/TEMPLATES.md b/devflow-plugin/skills/review-document/TEMPLATES.md new file mode 100644 index 0000000..94d5f00 --- /dev/null +++ b/devflow-plugin/skills/review-document/TEMPLATES.md @@ -0,0 +1,142 @@ +# Review templates — scoring + output + +Bundle file referenced by `SKILL.md` Phase 3 (scoring + cross-check) and Phase 4 (visual output). + +## Phase 3 — Confidence scoring rubric + +Score every finding 0–100. Drop anything below 50. + +| Score | Meaning | Section in output | +|---|---|---| +| 0–49 | Linter territory / styling / false positive / pre-existing minor drift | Drop | +| 50–74 | Real but minor, low-impact, or recurring-but-already-flagged-once | 🟡 **Suggestion** | +| 75–89 | Likely real, impacts comprehension or correctness for the declared audience | 🟠 **Important** | +| 90–100 | Confirmed real, critical, must-fix before publish (self-contradiction, factual error, missing-from-Meta-docs claim) | 🔴 **Critical** | + +### Recurring-pattern cap +Same issue class at >3 anchors → collapse to one finding with `category: "recurring"` and list affected anchors in `detail`. Do not emit per-instance findings for recurring patterns. + +### Cross-agent dedup +Same anchor + same issue class flagged by multiple agents → keep the highest-confidence version, append `(N/ agents)` to the title. + +### Cross-check status (mandatory on every finding — set in Phase 3b) + +| Status | Meaning | Output treatment | +|---|---|---| +| **NEW** | No prior comment overlaps anchor + issue class | Include in 🔴 / 🟠 / 🟡 section as normal | +| **RAISED-OPEN** | Existing comment covers this; still unresolved | Include in section + line tag `(also raised by @ — open)` | +| **RAISED-RESOLVED-FIXED** | Existing comment covers this AND latest doc text confirms fix | **DO NOT re-flag.** Move to 🟢 Strengths under "Already addressed in review comments" with comment author | +| **RAISED-RESOLVED-NOT-FIXED** | Resolved by author but issue still present in current text | Include in section + tag `(thread marked resolved but issue still present)` — high reviewer-trust value | + +Re-flagging closed comments erodes trust. The cross-check is the single biggest source of review-noise reduction. + +--- + +## Phase 4 — Visual output template + +Compact, emoji-anchored, ADHD-friendly. ~3 lines per finding in 🔴/🟠, one line in 🟡. Bold the **key term** in each finding title (bionic-reading anchor). TL;DR block at the bottom with severity counts + top-3 fixes + one-line verdict. + +### Emoji legend (used at every section header AND every finding bullet) + +| Emoji | Means | +|---|---| +| 🔴 | Critical (must fix before publish) | +| 🟠 | Important (should fix) | +| 🟡 | Suggestion (consider) | +| 🟢 | Strength / already-addressed-in-comments | +| 🔵 | Info / context / TL;DR | + +### Template + +``` +# 🔵 Doc Review: + +**Source:** • **Type:** • **Mode:** • **Length:** +**Verdict:** ✅ APPROVED / ⚠️ NEEDS FIXES / ❓ NEEDS DISCUSSION + +## 🔵 Context used +- 🎯 **Initiative:** +- 📚 **Audience:** +- 🔍 **Verified vs external sources:** +- 💬 **Existing platform comments:** +- 🧠 **Hindsight memory:** +- 🤖 **Agents run:** +- 🧹 **Cleanup:** <"stripped N strikethrough spans" / "raw fetch used"> + +## 🟢 Strengths + +- 🟢 +- 🟢 **Already addressed in review comments:** + resolved> (no re-flag) + +## 🔴 Critical (must fix before publish) + +### 🔴 — **** +**Conf:** /100 • **Agent(s):** • **Status:** NEW / RAISED-OPEN @ +Quote: `""` +**Why:** +**Fix:** +\`\`\` + +\`\`\` + +## 🟠 Important (should fix) + +### 🟠 — **** +**Conf:** /100 • **Status:** +Quote: `""` +**Why:** +**Fix:** + +## 🟡 Suggestions + +- 🟡 — ****: `""` → `""` (why: ) + +--- + +## 🔵 Missing sections (add before publish) + +1. **
** — + +## 🔵 TL;DR + +> 🔴 ** critical** · 🟠 ** important** · 🟡 ** suggestions** · 🟢 ** already-addressed** +> +> **Top 3 to fix:** +> 1. **** — `` +> 2. **** — `` +> 3. **** — `` +> +> **Verdict:** +``` + +### Anchor format rules (auto-picked in Phase 0c) + +- **Local file** → `:` (e.g. `/Users/foo/docs/spike.md:42`) +- **Hosted doc** (Google Doc / Confluence / arbitrary URL) → `§"

" → ""` so the user can Ctrl+F to the location in the platform UI + +### Rules for visual output + +- **Skip empty sections.** If no critical findings, drop the 🔴 Critical heading entirely. Do NOT print "None". +- **Bold the key term** in each finding title (the noun phrase that names the issue class — "self-contradiction", "missing 30-day rule", "phantom UI string"). Bionic-reading anchor. +- **3 lines max per finding** in 🔴/🟠. One line for 🟡. +- **Cross-check status is mandatory** on every finding (NEW / RAISED-OPEN / RAISED-RESOLVED-FIXED / RAISED-RESOLVED-NOT-FIXED). RAISED-RESOLVED-FIXED items go to 🟢, never appear in 🔴/🟠/🟡. +- **Anchor format must match `ANCHOR_TYPE` set in Phase 0c.** Never emit `file:line` for a Google Doc / Confluence page. +- **TL;DR mandatory.** Severity counts + top-3 + one-line verdict. +- **Quick mode** drops the per-finding `Conf:` line and `Fix:` fenced blocks; consolidates into one paragraph per severity. + +### Rules for the prose-rewrite Fix block + +For prose findings, the `Fix:` block contains the rewritten sentence verbatim (NOT a code-suggestion `suggestion` fence — that is for diff inline-comments which we do not emit). The user copies the rewrite into the source platform manually. + +Example: +``` +**Fix:** +\`\`\` +No. Aircall does not currently support WhatsApp voice calls. Separately, traditional outbound phone calls require a phone number, which is unavailable for username-only contacts. +\`\`\` +``` + +For one-line suggestions, inline the rewrite directly: +``` +**Fix:** Drop the duplicate "Aircall" — change `"in the Aircallin Aircall workspace"` to `"in Aircall Workspace"`. +``` diff --git a/install.sh b/install.sh index 4dde5a5..bc7cdd1 100755 --- a/install.sh +++ b/install.sh @@ -55,3 +55,12 @@ echo "" echo " devflow init # Initialize for the current directory" echo " devflow init ~/myapp # Initialize for a specific project" echo "" + +# --- Optional soft-dep check: defuddle ---------------------------------------- +if ! command -v defuddle >/dev/null 2>&1; then + warn "Optional: defuddle CLI not found." + echo " /devflow:review-document uses defuddle for clean web-page fetches." + echo " Without it, web URLs fall back to WebFetch (still works, just noisier)." + echo " Install with: npm install -g defuddle" + echo "" +fi diff --git a/skills/registry.json b/skills/registry.json index de13608..a70bb9c 100644 --- a/skills/registry.json +++ b/skills/registry.json @@ -40,6 +40,12 @@ "category": "code-review", "layer": 3 }, + { + "name": "review-document", + "description": "Multi-perspective prose document review (KB / RFC / spike / runbook / PRD / design / KB page) on Google Docs, Confluence, local files, or arbitrary URLs — parallel agents, comment cross-check, severity scoring, chat-only findings. Counterpart to write-spike.", + "category": "code-review", + "layer": 3 + }, { "name": "new-feature", "description": "Start a new feature with worktree, memory recall, and workspace setup", diff --git a/skills/review-document/AGENTS.md b/skills/review-document/AGENTS.md new file mode 100644 index 0000000..5c21751 --- /dev/null +++ b/skills/review-document/AGENTS.md @@ -0,0 +1,96 @@ +# Review agents — definitions and activation rules + +Loaded by `SKILL.md` Phase 2 in Thorough mode. Each agent receives the context packet (cleaned doc text + `DOC_TYPE` + `ANCHOR_TYPE` + linked tickets + external sources + Hindsight memories + `EXISTING_COMMENTS` + relevant code excerpts for technical doc types). + +In Thorough mode, dispatch all active agents **in a single message with multiple `Agent` tool uses** so they run in parallel. Each agent returns findings as structured JSON (see "Finding format" below). + +## Subagent selection — prefer OMC specialists when available + +If `oh-my-claudecode` is installed (detect via `omc --version` succeeding OR the presence of `/oh-my-claudecode:*` slash commands), use the specialized OMC agent for each role. Generic-purpose fallback otherwise. + +| Role | OMC specialist | Generic fallback | +|---|---|---| +| Correctness + structure + completeness | `critic` (Opus) | `general-purpose` | +| Prose / tone / clarity / audience-fit | `writer` (Haiku) | `general-purpose` | +| External claim verification (vendor / API / regulatory docs) | `document-specialist` | `general-purpose` | +| Code / runtime claim verification | `verifier` | `general-purpose` | +| Architecture / system-layering claim cross-check | `architect` (Opus, read-only) | `general-purpose` | +| Security / PII / auth claim audit | `security-reviewer` | `general-purpose` | +| Visual / UI screenshot validation | `visual-verdict` skill (invoked SEQUENTIALLY after the parallel agent batch returns — it is a skill, not a subagent dispatchable via `Task`) | n/a | + +Fallback is a real fallback — never block the review on OMC absence. Findings from `general-purpose` are still high quality, just less role-specialized. + +## Always-on agents per `DOC_TYPE` + +| `DOC_TYPE` | Always-on (3 agents) | +|---|---| +| `kb` (customer-facing knowledge base) | `critic` + `writer` + `document-specialist` | +| `spike` / `rfc` | `critic` + `verifier` + `architect` | +| `prd` | `critic` + `writer` + `verifier` | +| `runbook` | `critic` + `verifier` + `security-reviewer` | +| `design` (UI prose) | `critic` + `writer` (always-on subagents); also run `visual-verdict` skill sequentially after the batch returns if screenshots are present | +| `generic` | `critic` + `writer` + `document-specialist` | + +## Conditional agents (activate when signal triggers) + +| Agent | Activate when | +|---|---| +| `security-reviewer` | Doc mentions auth / tokens / PII / SSO / encryption / GDPR / customer data export / DSAR | +| `architect` | Doc claims behaviour of ≥2 services OR includes a system-layering diagram | +| `visual-verdict` skill (sequential, not a parallel subagent) | Doc references screenshots, UI mocks, or embedded images | +| `verifier` | Doc claims runtime behaviour of code that exists in the current workspace | +| `document-specialist` | Doc cites external vendor / API / regulatory sources (URLs in body) | + +Conditional agents are dispatched in parallel with the always-on set. + +## Agent prompts per role + +Each agent receives the context packet plus a role-specific prompt. Adapt these per invocation — they are starting points, not rigid templates. + +### Critic — correctness + structure + completeness +Review for: factual contradictions inside the doc, missing sections a reader would reasonably ask, structural flow, ambiguity, audience-fit. Output severity-tagged findings with anchor + verbatim quote + one-sentence why + concrete fix. **Do NOT flag styling.** Watch for: terminology drift (same concept named two ways), self-contradicting sentences, sections that contradict each other, missing edge cases. + +### Writer — prose / tone / clarity +Review for: tone consistency across the doc, reading level vs declared audience, jargon density, passive voice that obscures who-does-what, awkward sentences, unclear CTAs. Output 5–10 specific `quote → rewrite` pairs. **Do NOT flag substance** — that is the critic's job. Limit findings to where rewriting the sentence concretely improves comprehension. + +### Document-specialist — external claim verification +Verify factual claims about external systems (vendor APIs, regulations, third-party SDKs) against authoritative upstream sources. Use the `defuddle` skill for clean web fetches. Time-box: ≤8 fetches, ≤15 minutes total. **Authoritative sources only** — vendor's own docs (developers.facebook.com, business.whatsapp.com, twilio.com/docs, etc.). Secondary BSP / SDK provider docs (Infobip, Vonage, etc.) acceptable as confirmation, never primary. Reject random blogs / Stack Overflow / tutorials as evidence. Output a verification table: `Claim | Doc quote | Source URL | Verdict (confirmed / partial / not-found / contradicted) | Notes`. End with `"claims to verify in-person with PM"` — flag claims that need internal-team confirmation (e.g. "is this an Aircall product decision or a Meta restriction?"). + +### Verifier — code / runtime claim cross-check +For each doc claim about how the code behaves: `Grep` the workspace, `Read` the relevant file, confirm or contradict. Output: `Claim | file:line | Verdict | Quote of actual code`. + +### Architect — architecture / system-layering claim cross-check +Read the doc's "how it works" / data-flow / sequence-diagram sections. Cross-check against actual service code in the workspace. Surface mismatches between what the doc describes and what the code does. Read-only — do not propose code changes, only flag doc/code drift. + +### Security-reviewer — PII / auth / secrets / GDPR +Audit the doc for: secrets in examples (API keys, tokens, real customer phone numbers), customer-data exposure in code samples, missing GDPR / data-residency notes, OWASP-relevant claims (auth flow descriptions that contradict reality), DSAR-handling claims. Output severity-tagged findings. + +## Finding format (JSON returned by every agent) + +```json +[{ + "agent": "critic", + "severity": "critical|important|suggestion", + "confidence": 85, + "category": "factual|consistency|completeness|clarity|tone|security|verification|missing-section", + "anchor_type": "file_line|section_quote", + "file": "/abs/path/to/doc.md", + "line": 42, + "section": "What is X?", + "quote": "verbatim text snippet from the doc", + "title": "Short description (the bionic-reading key term)", + "detail": "Full explanation grounded in the cleaned doc text and any verified source", + "suggestion": "Concrete fix — inline rewrite verbatim, or one-sentence instruction", + "external_source_url": "https://... (REQUIRED for document-specialist findings; null otherwise)" +}] +``` + +**`anchor_type` semantics:** +- `file_line` — local file. Populate `file` + `line`, omit `section` (or include for context). +- `section_quote` — hosted doc. Populate `section` + `quote` (verbatim snippet user can Ctrl+F). Omit `file` + `line`. + +The skill orchestrator sets the global `ANCHOR_TYPE` in Phase 0c and passes it to each agent. Agents emit findings in the matching shape; the orchestrator validates before output. + +## Recurring-pattern handling + +If an agent finds the SAME issue class at >3 anchors (e.g. passive voice across 6 paragraphs, vendor-name-capitalization drift across 5 sections), it MUST collapse to one finding with `category: "recurring"` and list the affected anchors in `detail`. Per-instance flagging produces noise. diff --git a/skills/review-document/SKILL.md b/skills/review-document/SKILL.md new file mode 100644 index 0000000..e89ac7b --- /dev/null +++ b/skills/review-document/SKILL.md @@ -0,0 +1,166 @@ +--- +name: review-document +description: Use when reviewing a prose document — KB article, RFC, spike, runbook, PRD, design doc, knowledge-base page — hosted on Google Docs, Confluence, a local file path, or an arbitrary URL. Checks correctness, internal consistency, audience-fit, prose clarity, and external-claim verification; cross-checks against existing platform comments to avoid re-flagging; returns severity-tagged findings with anchor + quote + concrete fix. Use when asked to "review this doc / KB / RFC / spike / runbook / PRD" and the target is prose, not a code diff. Counterpart to /devflow:write-spike. NOT for code diffs — use /devflow:review for those. +--- + +# /devflow:review-document — Multi-perspective prose document review + +You are a thorough, multi-perspective document reviewer. Counterpart to `/devflow:write-spike`. Reviews prose docs on any platform with deep context gathering and parallel review agents. Sibling to `/devflow:review` (which reviews code diffs). + +## Scope guardrails (read before doing anything) + +1. **NEVER flag pure styling issues.** Broken markdown table rows, header-level inconsistencies, missing alignment chars, font weight, colour, whitespace — none. Only substance: factual correctness, internal consistency, completeness, prose clarity, audience-fit, external-claim verification. Renderer quirks belong in a linter, not a doc review. +2. **NEVER re-flag findings already raised in existing platform comments.** Phase 1c mirrors `/devflow:review` Phase 1e — read all comments first, cross-check every finding. Re-flagging a colleague's existing comment erodes trust. +3. **NEVER trust the raw fetched text without handling source-format quirks** (Google Docs suggestion mode, Confluence change-tracking spans, Word `.docx` track-changes all leak phantom text). Always fetch the clean post-suggestion view — see Phase 0b for the authoritative recipe. + +## Input + +`$ARGUMENTS` may contain any combination of: + +- **Google Doc URL or ID** — `https://docs.google.com/document/d//...` or just the 44-char `` +- **Confluence page URL or ID** — `https://.atlassian.net/wiki/spaces/.../pages/` or the bare ID +- **Local file path** — `.md` / `.txt` / `.adoc` / `.rst` (absolute or `~`-relative) +- **Arbitrary URL** — blog, Medium, Substack, Notion-public, etc. Fetched via the `defuddle` skill (preferred) or `WebFetch` fallback +- **Nothing** — fall back to the most-recently-edited prose file in the current working directory +- **Optional doc-type hint** — `--type kb|spike|rfc|runbook|prd|design|generic` (auto-detected via Phase 0d if omitted) +- **Optional `quick` keyword** — single-pass review, skip subagents + +Parse to determine `SOURCE_TYPE`, `URL_OR_PATH`, `DOC_TYPE`, `MODE` (`quick` | `thorough`, default `thorough`). + +## Phase 0 — Source detection + clean fetch + anchor selection + +### 0a. Identify source platform + +| Input shape | Platform | Fetch tool | +|---|---|---| +| `docs.google.com/document/d/` or a bare 44-char base64-ish ID | Google Doc | Drive MCP (`mcp__bf06f3e8-*`) | +| `.atlassian.net/wiki/.../pages/` or bare numeric ID | Confluence | Atlassian MCP (`mcp__5ebcd1ed-*`) | +| Path starting `/`, `.`, `~` | Local file | `Read` | +| Any other `http(s)://` URL | Web | `defuddle` skill (preferred), `WebFetch` fallback | + +### 0b. Fetch CLEAN text (mandatory — strip suggestion / track-change artifacts) + +| Platform | Primary call | Strikethrough / track-change handling | +|---|---|---| +| Google Doc | `mcp__bf06f3e8-*__download_file_content` with `exportMimeType: "text/plain"` — strips strikethrough at export time AND embeds inline comments as `[a]/[b]/[c]` markers. Avoid `read_file_content` for review (concatenates strikethrough + replacement → phantom typos like `"version 1Phase 1"` when live text is `"Phase 1"`). | Use the export path. | +| Confluence | `mcp__5ebcd1ed-*__getConfluencePage` with `bodyFormat=storage` | Strip `` spans before passing to agents | +| Local file | `Read` | None (assume final) | +| Web (defuddle preferred) | invoke the `defuddle` skill | None (defuddle returns clean rendered text) | +| Web (fallback) | `WebFetch` | None | + +**Defuddle is a soft dependency** of devflow. If `defuddle` skill / CLI is absent, fall back to `WebFetch` and warn the user in the output's Context section: `"Web URL fetched via WebFetch (defuddle not installed — see devflow README for install steps)"`. Never block on missing defuddle. + +### 0c. Auto-pick anchor format per source + +| Source | Anchor format used in output | +|---|---| +| Local file | `:` | +| Hosted (Google Doc / Confluence / arbitrary URL) | `§"

" → ""` (so the user can Ctrl+F to the location in the platform UI) | + +Skill orchestrator sets `ANCHOR_TYPE` once and passes it to all agents so their findings are emitted in the right shape. + +### 0d. Detect `DOC_TYPE` if not explicit + +Apply in order: +1. `--type` hint → use it +2. Filename / URL slug contains `spike` / `rfc` / `runbook` / `prd` / `design` / `kb` → that type +3. Top-of-doc frontmatter `type:` field → use it +4. Audience signal in first ~500 chars: "customers" / "agents" / "admins" / "support team" → `kb`; "engineers" / "architecture" / "service" / "system" → `spike` or `rfc`; step-by-step ops → `runbook`; user-stories / "as a … I want" → `prd` +5. Default → `generic` + +`DOC_TYPE` drives which agents activate in Phase 2. + +## Phase 1 — Context + comments + memory + +### 1a. Linked source-of-truth pull + +- Extract Jira-style task IDs from doc body (`[A-Z][A-Z0-9]+-\d+`). For each hit, fetch via `mcp__5ebcd1ed-*__getJiraIssue` (cap 3 tickets). +- Extract URLs from doc body. Flag any pointing at code repos / merge requests / external vendor docs as cross-check candidates for Phase 2 agents. +- For technical doc types (`spike` / `rfc` / `runbook`): identify referenced code paths in the doc → `Read` them so the `verifier` / `architect` agents can ground their findings. + +### 1b. Hindsight recall + +- Call `mcp__hindsight__recall` with `(doc title + extracted feature keywords)`. Cap top 5 hits. +- Skip silently if Hindsight MCP not configured. + +### 1c. Existing platform comments — MANDATORY cross-check input + +| Platform | Comment fetch | Notes | +|---|---|---| +| Confluence | `mcp__5ebcd1ed-*__getConfluencePageFooterComments` + `getConfluencePageInlineComments` (both paginated fully) | Inline comments include `anchor_text` (the quoted snippet they attach to) → perfect for cross-check | +| Google Doc | `mcp__bf06f3e8-*__download_file_content` with `exportMimeType: "text/plain"` embeds inline comments as `[a]`, `[b]`, `[c]` markers at the anchor position AND lists the comment bodies in `[a]author-text` / `[b]author-text` blocks AT THE END of the exported text. Parse those after the last article paragraph. The anchor for finding `[a]` is the markdown text immediately preceding the marker. | +| Local file | None possible (no platform) — skip silently | +| Arbitrary URL | None possible — skip silently | + +For each fetched comment, extract `(author, body, anchor_text_if_inline, resolved_state if available)`. Store as `EXISTING_COMMENTS`. Phase 3b uses this to mark every finding NEW vs RAISED-*. + +**Google Docs caveat:** the `text/plain` export does NOT carry `resolved_state` for inline `[a]/[b]/[c]` comments. Treat all extracted Google Doc comments as `RAISED-OPEN` by default. Only downgrade to `RAISED-RESOLVED-FIXED` if you can see in the cleaned body that the issue is no longer present (manual heuristic). The Confluence path does carry true `resolved_state`. + +### 1d. Cleaned-text sanity check + +If the Phase 0b clean fetch differs from the raw fetch by >10% in length, warn the user and ask whether to proceed (heavy track-changes can legitimately strip a lot — confirm intent). + +## Phase 2 — Review agents + +Agent set depends on `DOC_TYPE` and `MODE`. See `AGENTS.md` (sibling file) for full definitions, prompts, and activation rules. + +### Quick Mode + +Single in-context pass covering correctness + clarity + a sanity-check of factual claims. Skip subagents entirely. Skip to Phase 4. + +### Thorough Mode + +**Read `AGENTS.md` for the full agent set per doc type.** Dispatch all active agents in ONE message with multiple `Task` tool calls (each with `subagent_type: ` per the AGENTS.md role table — e.g. `subagent_type: critic`, `subagent_type: writer`) so they run in parallel. Each agent receives the context packet: + +- Cleaned doc text (Phase 0b output) +- `DOC_TYPE`, `ANCHOR_TYPE`, audience signal +- Linked tickets + external sources (Phase 1a) +- Hindsight memories (Phase 1b) +- `EXISTING_COMMENTS` (Phase 1c) — for the agent to avoid duplicating where possible +- Relevant code excerpts for technical doc types (Phase 1a tail) + +## Phase 3 — Severity scoring + dedup + cross-check + +### 3a. Confidence scoring +Apply the rubric in `TEMPLATES.md` (`0–100`, drop below 50). Cap any "recurring pattern" finding (e.g. passive voice repeated across the doc) at one entry — not one per occurrence. + +### 3b. Cross-check against `EXISTING_COMMENTS` +Mark every surviving finding as one of: + +- **NEW** — no prior comment overlaps anchor + issue class. Include normally. +- **RAISED-OPEN** — comment covers this anchor + issue class, unresolved. Include with tag `(also raised by @ — open)`. +- **RAISED-RESOLVED-FIXED** — comment marked resolved AND latest doc text confirms fix landed. **Move to 🟢 Strengths section under "Already addressed in review comments". Do NOT re-flag.** +- **RAISED-RESOLVED-NOT-FIXED** — comment marked resolved by author but issue still present in current text. Re-flag with tag `(thread marked resolved but issue still present)`. High signal. + +### 3c. Cross-agent dedup +Same anchor + same issue class flagged by multiple agents → keep highest-confidence version, append `(N/ agents)`. + +## Phase 4 — Output + +**Read `TEMPLATES.md` (sibling) for the structured Markdown output template.** Format mirrors `/devflow:review`: emoji severity (🔴 🟠 🟡 🟢 🔵), bold key terms, ≤3 lines per finding in 🔴/🟠, one-line for 🟡, TL;DR block with severity counts + top-3 fixes + one-line verdict. + +Differences from `/devflow:review`: +- Anchor auto-picks per Phase 0c (`file:line` for local, `§heading + quote` for hosted) — do not emit `file:line` for a Google Doc, it is useless. +- Verdict labels: `✅ APPROVED` / `⚠️ NEEDS FIXES` / `❓ NEEDS DISCUSSION` (no merge-state semantics). +- No 🟠 "external-unmodified" category (no diff-scope concept for prose). +- Code-suggestion fenced blocks are replaced with inline prose rewrites quoted verbatim. + +## Phase 5 — Retain learnings + +If any factual nuance, gotcha, or convention surfaced during review (e.g. `"Meta BSUID docs: phone number only hides after 30 days of no interaction"`), call `mcp__hindsight__retain` to persist. Tag by `(doc-type, topic, source)`. Skip silently if Hindsight unavailable. + +## Phase 6 — Posting comments back to the platform + +**Not in v1.** Output is chat-only by default (per the user's `default-to-draft` rule). If the user later asks "draft these as Confluence inline comments" or similar, that is a separate follow-up — do not auto-post. + +## Rationalizations — STOP + +| Thought | Counter | +|---|---| +| "Skip comment fetch, doc looks new" | New docs often have half-resolved threads. Fetch (Phase 1c). | +| "Skip the strikethrough strip" | Produces phantom typos. Always strip (Phase 0b). | +| "Markdown render is broken" | Out of scope (guardrail #1). Drop. | +| "Critic alone is enough" | Writer catches tone, document-specialist catches factual drift. Run the full set per doc type. | +| "Vendor claim looks standard" | Vendor docs drift. Verify. | + +$ARGUMENTS diff --git a/skills/review-document/TEMPLATES.md b/skills/review-document/TEMPLATES.md new file mode 100644 index 0000000..94d5f00 --- /dev/null +++ b/skills/review-document/TEMPLATES.md @@ -0,0 +1,142 @@ +# Review templates — scoring + output + +Bundle file referenced by `SKILL.md` Phase 3 (scoring + cross-check) and Phase 4 (visual output). + +## Phase 3 — Confidence scoring rubric + +Score every finding 0–100. Drop anything below 50. + +| Score | Meaning | Section in output | +|---|---|---| +| 0–49 | Linter territory / styling / false positive / pre-existing minor drift | Drop | +| 50–74 | Real but minor, low-impact, or recurring-but-already-flagged-once | 🟡 **Suggestion** | +| 75–89 | Likely real, impacts comprehension or correctness for the declared audience | 🟠 **Important** | +| 90–100 | Confirmed real, critical, must-fix before publish (self-contradiction, factual error, missing-from-Meta-docs claim) | 🔴 **Critical** | + +### Recurring-pattern cap +Same issue class at >3 anchors → collapse to one finding with `category: "recurring"` and list affected anchors in `detail`. Do not emit per-instance findings for recurring patterns. + +### Cross-agent dedup +Same anchor + same issue class flagged by multiple agents → keep the highest-confidence version, append `(N/ agents)` to the title. + +### Cross-check status (mandatory on every finding — set in Phase 3b) + +| Status | Meaning | Output treatment | +|---|---|---| +| **NEW** | No prior comment overlaps anchor + issue class | Include in 🔴 / 🟠 / 🟡 section as normal | +| **RAISED-OPEN** | Existing comment covers this; still unresolved | Include in section + line tag `(also raised by @ — open)` | +| **RAISED-RESOLVED-FIXED** | Existing comment covers this AND latest doc text confirms fix | **DO NOT re-flag.** Move to 🟢 Strengths under "Already addressed in review comments" with comment author | +| **RAISED-RESOLVED-NOT-FIXED** | Resolved by author but issue still present in current text | Include in section + tag `(thread marked resolved but issue still present)` — high reviewer-trust value | + +Re-flagging closed comments erodes trust. The cross-check is the single biggest source of review-noise reduction. + +--- + +## Phase 4 — Visual output template + +Compact, emoji-anchored, ADHD-friendly. ~3 lines per finding in 🔴/🟠, one line in 🟡. Bold the **key term** in each finding title (bionic-reading anchor). TL;DR block at the bottom with severity counts + top-3 fixes + one-line verdict. + +### Emoji legend (used at every section header AND every finding bullet) + +| Emoji | Means | +|---|---| +| 🔴 | Critical (must fix before publish) | +| 🟠 | Important (should fix) | +| 🟡 | Suggestion (consider) | +| 🟢 | Strength / already-addressed-in-comments | +| 🔵 | Info / context / TL;DR | + +### Template + +``` +# 🔵 Doc Review: + +**Source:** • **Type:** • **Mode:** • **Length:** +**Verdict:** ✅ APPROVED / ⚠️ NEEDS FIXES / ❓ NEEDS DISCUSSION + +## 🔵 Context used +- 🎯 **Initiative:** +- 📚 **Audience:** +- 🔍 **Verified vs external sources:** +- 💬 **Existing platform comments:** +- 🧠 **Hindsight memory:** +- 🤖 **Agents run:** +- 🧹 **Cleanup:** <"stripped N strikethrough spans" / "raw fetch used"> + +## 🟢 Strengths + +- 🟢 +- 🟢 **Already addressed in review comments:** + resolved> (no re-flag) + +## 🔴 Critical (must fix before publish) + +### 🔴 — **** +**Conf:** /100 • **Agent(s):** • **Status:** NEW / RAISED-OPEN @ +Quote: `""` +**Why:** +**Fix:** +\`\`\` + +\`\`\` + +## 🟠 Important (should fix) + +### 🟠 — **** +**Conf:** /100 • **Status:** +Quote: `""` +**Why:** +**Fix:** + +## 🟡 Suggestions + +- 🟡 — ****: `""` → `""` (why: ) + +--- + +## 🔵 Missing sections (add before publish) + +1. **
** — + +## 🔵 TL;DR + +> 🔴 ** critical** · 🟠 ** important** · 🟡 ** suggestions** · 🟢 ** already-addressed** +> +> **Top 3 to fix:** +> 1. **** — `` +> 2. **** — `` +> 3. **** — `` +> +> **Verdict:** +``` + +### Anchor format rules (auto-picked in Phase 0c) + +- **Local file** → `:` (e.g. `/Users/foo/docs/spike.md:42`) +- **Hosted doc** (Google Doc / Confluence / arbitrary URL) → `§"

" → ""` so the user can Ctrl+F to the location in the platform UI + +### Rules for visual output + +- **Skip empty sections.** If no critical findings, drop the 🔴 Critical heading entirely. Do NOT print "None". +- **Bold the key term** in each finding title (the noun phrase that names the issue class — "self-contradiction", "missing 30-day rule", "phantom UI string"). Bionic-reading anchor. +- **3 lines max per finding** in 🔴/🟠. One line for 🟡. +- **Cross-check status is mandatory** on every finding (NEW / RAISED-OPEN / RAISED-RESOLVED-FIXED / RAISED-RESOLVED-NOT-FIXED). RAISED-RESOLVED-FIXED items go to 🟢, never appear in 🔴/🟠/🟡. +- **Anchor format must match `ANCHOR_TYPE` set in Phase 0c.** Never emit `file:line` for a Google Doc / Confluence page. +- **TL;DR mandatory.** Severity counts + top-3 + one-line verdict. +- **Quick mode** drops the per-finding `Conf:` line and `Fix:` fenced blocks; consolidates into one paragraph per severity. + +### Rules for the prose-rewrite Fix block + +For prose findings, the `Fix:` block contains the rewritten sentence verbatim (NOT a code-suggestion `suggestion` fence — that is for diff inline-comments which we do not emit). The user copies the rewrite into the source platform manually. + +Example: +``` +**Fix:** +\`\`\` +No. Aircall does not currently support WhatsApp voice calls. Separately, traditional outbound phone calls require a phone number, which is unavailable for username-only contacts. +\`\`\` +``` + +For one-line suggestions, inline the rewrite directly: +``` +**Fix:** Drop the duplicate "Aircall" — change `"in the Aircallin Aircall workspace"` to `"in Aircall Workspace"`. +``` diff --git a/visualizations/architecture/code-review-architecture.md b/visualizations/architecture/code-review-architecture.md index a1ab5e3..f10becb 100644 --- a/visualizations/architecture/code-review-architecture.md +++ b/visualizations/architecture/code-review-architecture.md @@ -212,6 +212,107 @@ flowchart TD --- +## 5. /devflow:review-document — Prose Doc Review + +`/devflow:review-document` is the prose-doc counterpart to `devflow review`. Where `devflow review` reviews **code diffs**, `/devflow:review-document` reviews **prose docs** (KB articles, RFCs, spikes, runbooks, PRDs, design docs). Different inputs, different agent set, different output anchor model. + +```mermaid +%%{init: {'flowchart': {'rankSpacing': 50, 'nodeSpacing': 30, 'diagramPadding': 15}}}%% +flowchart TD + START["/devflow:review-document <source>"] + + subgraph P0 [" Phase 0 — Source detection + clean fetch "] + DETECT{"Source type?"} + GD["Google Doc
Drive MCP
strip strikethrough"] + CF["Confluence page
Atlassian MCP
strip change-tracking"] + LF["Local file
Read tool"] + URL["Arbitrary URL
defuddle (preferred)
WebFetch (fallback)"] + ANCHOR["Auto-pick anchor:
file:line for local
§heading+quote for hosted"] + DETECT -->|"docs.google.com"| GD + DETECT -->|"atlassian.net/wiki"| CF + DETECT -->|"path"| LF + DETECT -->|"https://"| URL + GD --> ANCHOR + CF --> ANCHOR + LF --> ANCHOR + URL --> ANCHOR + end + + subgraph P1 [" Phase 1 — Context + comments "] + TICKETS["Extract Jira IDs
+ fetch via Atlassian MCP"] + MEM["Hindsight recall
(doc title + keywords)"] + COMMENTS["Fetch existing platform comments
(Confluence inline/footer,
Google Doc = manual)"] + SANITY["Cleaned-text sanity check
(warn if >10% stripped)"] + end + + subgraph P2 [" Phase 2 — Parallel agents (per doc type) "] + CRITIC["critic (Opus)
correctness + structure"] + WRITER["writer (Haiku)
prose / tone / clarity"] + DOCSPEC["document-specialist
external claim verification"] + ARCH["architect (opt)
system-claim cross-check"] + VER["verifier (opt)
code-claim cross-check"] + SEC["security-reviewer (opt)
PII / auth / GDPR"] + end + + subgraph P3 [" Phase 3 — Scoring + cross-check "] + SCORE["Confidence 0-100
drop <50
collapse recurring"] + DEDUP["Cross-agent dedup"] + XCHECK["Cross-check vs existing comments:
NEW / RAISED-OPEN /
RAISED-RESOLVED-FIXED / -NOT-FIXED"] + end + + OUTPUT["🔵 Chat-only output
(no platform posting)
emoji severity + anchor + quote + fix + TL;DR"] + + START --> DETECT + ANCHOR --> TICKETS + TICKETS --> MEM --> COMMENTS --> SANITY + SANITY --> CRITIC + SANITY --> WRITER + SANITY --> DOCSPEC + SANITY -.->|"if signal"| ARCH + SANITY -.->|"if signal"| VER + SANITY -.->|"if signal"| SEC + CRITIC --> SCORE + WRITER --> SCORE + DOCSPEC --> SCORE + ARCH -.-> SCORE + VER -.-> SCORE + SEC -.-> SCORE + SCORE --> DEDUP --> XCHECK --> OUTPUT + + classDef reviewStyle fill:#d97706,color:#fff,stroke:#b45309 + classDef cliStyle fill:#374151,color:#fff,stroke:#1f2937 + classDef inputStyle fill:#6b7280,color:#fff,stroke:#4b5563 + classDef agentStyle fill:#7c3aed,color:#fff,stroke:#5b21b6 + classDef condStyle fill:#9ca3af,color:#fff,stroke:#6b7280 + + class START cliStyle + class DETECT,ANCHOR reviewStyle + class GD,CF,LF,URL inputStyle + class TICKETS,MEM,COMMENTS,SANITY inputStyle + class CRITIC,WRITER,DOCSPEC agentStyle + class ARCH,VER,SEC condStyle + class SCORE,DEDUP,XCHECK reviewStyle + class OUTPUT reviewStyle +``` + +### Key differences vs `devflow review` + +| Dimension | `devflow review` (code) | `/devflow:review-document` (prose) | +|---|---|---| +| Input | git diff or PR/MR URL | Google Doc / Confluence / local file / arbitrary URL | +| Fetch | `gh pr diff` / `glab mr diff` / `git diff` | Drive MCP / Atlassian MCP / Read / defuddle | +| Agent set | code-focused (bug scanner, convention, test coverage…) | prose-focused (critic, writer, document-specialist…) | +| Anchor | `file:line` always | auto-pick: `file:line` for local, `§heading+quote` for hosted | +| Existing comments cross-check | yes (GitHub reviews / GitLab discussions, paginated) | yes (Confluence footer + inline; Google Doc unsupported by Drive MCP — manual fallback) | +| Output | code-suggestion fences inline | prose rewrites verbatim | +| Inline comment posting | optional draft pending (GitHub PENDING / GitLab draft_notes) | chat-only, no platform posting in v1 | + +### Soft dependency — defuddle + +For arbitrary-URL inputs, the skill prefers the upstream **defuddle** CLI (`github.com/kepano/defuddle`, `npm install -g defuddle`) for clean rendered text. When defuddle is missing, it falls back to `WebFetch` and surfaces that downgrade in the output's Context section. defuddle is intentionally NOT vendored into devflow — it tracks upstream releases via `npm`. + +--- + ## Notes - **Why multi-CLI?** Continue.dev (`cn check`) was a single point of failure and required a separate npm dependency. The new dispatch layer uses whichever AI CLI is already installed. @@ -219,3 +320,4 @@ flowchart TD - **Structured output** via `--json-schema` is optional — when provided, Claude Code returns machine-parseable results for CI integration. - **Fallback order** is deterministic: env override > claude > opencode. No interactive prompts during dispatch. - **`devflow review`** is distinct from `devflow check` — it's a lighter-weight review that doesn't use `.devflow/checks/` rules. It supports two modes: local diff (against CLAUDE.md conventions) and remote PR/MR review by URL (GitHub `gh` or GitLab `glab`). +- **`/devflow:review-document`** is the sibling for prose docs (KB / RFC / spike / runbook / PRD / design). It uses different agents and a different anchor format because the inputs and audiences are different. See section 5 above. diff --git a/visualizations/architecture/devflow-ecosystem.md b/visualizations/architecture/devflow-ecosystem.md index 5f87a16..d5d8f47 100644 --- a/visualizations/architecture/devflow-ecosystem.md +++ b/visualizations/architecture/devflow-ecosystem.md @@ -263,6 +263,7 @@ graph TD | `devflow check` | Run code review against .devflow/checks/ (Claude Code primary, OpenCode fallback) | L4 | | `devflow review` | Review local diff against CLAUDE.md conventions via Claude Code | L4, L5 | | `devflow review ` | Fetch PR/MR diff (gh/glab) and review via Claude Code | L4 | +| `/devflow:review-document ` | Multi-perspective prose-doc review (KB/RFC/spike/runbook/PRD) on Google Docs, Confluence, local files, or URLs. Soft dep: defuddle CLI for clean web fetches. | L4 | | `devflow web` | Open agent-deck web dashboard (:8420) | L2 | | `devflow conductor` | Manage conductors (start, stop, status) | L2 | | `devflow skills list` | List all 10 skills from registry with install status | L5 |