Skip to content

Test engineer plugin#150

Closed
nthompson-bitwarden wants to merge 13 commits into
mainfrom
test-engineer-plugin
Closed

Test engineer plugin#150
nthompson-bitwarden wants to merge 13 commits into
mainfrom
test-engineer-plugin

Conversation

@nthompson-bitwarden

Copy link
Copy Markdown

🎟️ Tracking

https://bitwarden.atlassian.net/browse/QA-1983

📔 Objective

This branch introduces a new bitwarden-test-engineer plugin to the marketplace -a test-engineering toolkit whose first role is the test-strategist agent.

Given a change (feature, bugfix, refactor, or migration), the agent recommends what to test, at which layer, and why, shaped to each repo's actual test practice. It does test planning only - it does not author, run, or maintain tests.

  • Registration - new entry in .claude-plugin/marketplace.json, plugin.json, CHANGELOG.md, README, and cspell additions.
  • One agent (test-strategist) - classifies inputs (Jira ticket via Atlassian MCP, GitHub PR via gh, test-case CSV, or plain description), fans out subagents to gather evidence, then runs two skills in sequence.
  • Two skills:
    • assessing-test-coverage - backward-looking inventory of what's already tested, buckets tests by layer, cites each as a GitHub permalink, flags gaps, emits an HTML coverage report.
    • analyzing-test-stack - maps each behavior to the cheapest sufficient test layer (unit/integration/E2E) per repo's shape, names concrete tooling, surfaces shape-wrong tests (ice-cream-cone, over-testing), emits an HTML report.
  • Supporting references/scripts - report templates, CSS/style tokens, monorepo-layout and test-layer guidance, severity-risk model, and a build-report.sh script.

Key design principle: each behavior is tested at the cheapest layer that buys the needed confidence, with layer weighting decided per repo (unit-heavy pyramid for server/clients, integration/snapshot trophy for ios, all-E2E for the dedicated private test repo). Atlassian integration is optional with graceful degradation.

@github-actions

github-actions Bot commented Jun 22, 2026

Copy link
Copy Markdown

Plugin Validation Summary — bitwarden-test-engineer (PR #150)

Overall result: ✅ PASS. All three validation passes succeeded with zero critical and zero major issues. Only minor, advisory warnings remain — none blocking.

Validation Tool Result
Plugin structure & manifest plugin-validator (plugin-dev) ✅ Pass
Skill quality skill-reviewer (plugin-dev) ✅ Pass
Security reviewing-claude-config (claude-config-validator) ✅ Pass

Components validated: 1 agent (test-strategist), 2 skills (analyzing-test-stack, assessing-test-coverage), shared references/ + scripts/, manifest, README, and CHANGELOG. Version 1.0.0 is consistent across plugin.json, marketplace.json, and agents/test-strategist.md.

Note: the repo's validate-plugin-structure.sh / validate-marketplace.sh scripts were not run in this environment (require command approval). Their checks — manifest fields, frontmatter, version/name consistency, changelog format — were instead covered by the plugin-validator agent below.


1. Plugin Structure (plugin-validator) — ✅ PASS

  • Manifest (.claude-plugin/plugin.json): valid JSON; name kebab-case; version valid semver 1.0.0; description, author, homepage, repository, keywords well-formed. No invalid/unknown fields.
  • Agent (agents/test-strategist.md): name test-strategist (lowercase/hyphenated, within 3–50); description carries 4 well-formed <example> blocks with Context/user/assistant/<commentary>; model: inherit valid; color: green valid; substantial system prompt; version: 1.0.0 present and consistent.
  • Skills: both SKILL.md files have valid frontmatter, names matching their directories, and scoped allowed-tools. All references/ files resolve.
  • Hooks / MCP: none defined (correct). The plugin consumes bitwarden-atlassian-tools' MCP server via mcp__plugin_bitwarden-atlassian-tools_* tool references, documented as an optional dependency with graceful degradation.
  • Cross-references: all support files resolve (input-sources.md, report-style-tokens.md, report-style.css, report-template-common.md, all skill references/, and scripts/build-report.sh — executable 0755).
  • CHANGELOG.md: Keep a Changelog format, references SemVer, single ## [1.0.0] - 2026-06-15 entry under ### Added, version matches manifest.
  • File organization: README present and comprehensive; no stray files (.DS_Store, node_modules, .env, etc.).

Minor / advisory (no action required):

  • agents/test-strategist.md — grants Write, Task, and scoped Bash(...). Appropriate for an orchestrator that fans out subagents and runs build-report.sh; every Bash entry is tightly scoped (e.g. Bash(gh pr view:*), Bash(${CLAUDE_PLUGIN_ROOT}/scripts/build-report.sh:*)), no blanket Bash. Flagged only so a reviewer confirms Write is intended (it is).
  • agents/test-strategist.md filename — the file is test-strategist.md, not AGENT.md. Auto-discovery accepts any *.md under agents/, so this is valid; however bump-plugin-version.sh targets AGENT.md (uppercase), so manually confirm the agent version field on the next version bump.

2. Skill Review (skill-reviewer) — ✅ PASS (both skills)

Both skills pass the core quality bar. Frontmatter valid; descriptions are third-person/imperative with specific trigger phrases and exemplary mutual disambiguation ("does NOT … for that, use the other skill") preventing the forward/backward-looking skills from cross-triggering. Progressive disclosure is textbook — lean cores, detail offloaded to references/, shared plugin-level references/ reused across both skills. All referenced files and named section anchors resolve. allowed-tools tightly scoped (read-only git/gh + the single build script).

Minor / warnings (should-fix / optional):

  • skills/assessing-test-coverage/references/finding-coverage.md:20,40 — bare relative reference to references/monorepo-layout.md, which actually lives in the sibling analyzing-test-stack skill. A reader following that path from assessing-test-coverage/references/ would not find it. Recommend qualifying it the way assessing-test-coverage/SKILL.md:27 does ("the analyzing-test-stack skill's references/monorepo-layout.md"). This is the only finding with real reader-confusion risk.
  • analyzing-test-stack/SKILL.md:3 (~660 chars) and assessing-test-coverage/SKILL.md:3 (~580 chars) — descriptions exceed the ~500-char guideline. Length is earned by disambiguation; trim or accept as a deliberate trade-off.
  • Body word counts (802 / 712) are below the 1,000-word soft target — acceptable given strong progressive disclosure.
  • Neither skill has an examples/ directory; a sample generated HTML report would aid the model (optional enhancement).

3. Security Review (reviewing-claude-config) — ✅ PASS

  • Secrets: no API keys, tokens, passwords, private keys, or hardcoded credentials in any changed file. (All "token" hits across the plugin refer to CSS style tokens and LLM token cost/pool — benign.)
  • No settings.local.json committed; no settings files in the changeset.
  • scripts/build-report.sh: hardened — set -euo pipefail; all inputs validated (--kind allowlist, --slug regex, --date regex, file-existence checks); CSS path resolved via BASH_SOURCE (portable, no hardcoded absolute paths); CSS spliced via awk line-by-line so no shell escaping touches content. No curl/wget/eval/nc//dev/tcp/base64 -d/rm -rf or other network or destructive operations.
  • Permission scoping: agent tools are least-privilege — all Bash(...) entries are command-prefix-scoped, the build script is referenced via ${CLAUDE_PLUGIN_ROOT} (portable), and MCP access is limited to the specific Atlassian read tools. No dangerous auto-approvals, no overly broad file access.
  • URLs: all HTTPS; no insecure http:// or WS endpoints.

Recommended actions (none blocking)

  1. Should-fix: qualify the monorepo-layout.md references in skills/assessing-test-coverage/references/finding-coverage.md:20,40 to name the sibling skill.
  2. Optional: trim the two skill descriptions toward ~500 chars; add examples/ with a sample report.
  3. Reminder: on the next version bump, manually verify the version field in agents/test-strategist.md (the bump script targets AGENT.md, not this filename).

Verdict: APPROVE. No errors (must-fix) found; remaining items are warnings/optional.

@nthompson-bitwarden nthompson-bitwarden marked this pull request as ready for review June 23, 2026 12:45
@nthompson-bitwarden nthompson-bitwarden requested a review from a team as a code owner June 23, 2026 12:45
@github-actions

github-actions Bot commented Jun 23, 2026

Copy link
Copy Markdown

🤖 Bitwarden Claude Code Review

Overall Assessment: APPROVE

Reviewed the new bitwarden-test-engineer plugin: the test-strategist agent, two skills (assessing-test-coverage, analyzing-test-stack), shared reference docs, the report-style.css visual system, and the build-report.sh assembly script. Traced the build script's argument validation, CSS-splice, and combined-mode region/id-namespacing logic — the only executable code in the PR — and found it sound. Marketplace, plugin.json, README, and CHANGELOG are consistent at 1.0.0, and CI plugin validation reported PASS.

Code Review Details

No blocking findings.

Notes considered and dismissed:

  • The agent declares tools: as a YAML list with scoped Bash(...) patterns (e.g. Bash(gh pr view:*)). This matches the existing bitwarden-code-reviewer agent, which uses the same scoped-Bash form, so it is a supported, established pattern — not a defect.
  • build-report.sh validates --slug and --date against strict regexes before using them in paths, splices CSS via awk (no shell escaping of CSS), and gates combined-mode extraction on <body> to avoid matching element names inside CSS comments. No injection or correctness issue found.

@withinfocus

Copy link
Copy Markdown
Contributor

Is there viability to factoring these in as skills for the existing software engineer? We want to unify test development here.

@nthompson-bitwarden

Copy link
Copy Markdown
Author

@withinfocus

Is there viability to factoring these in as skills for the existing software engineer? We want to unify test development here.

I think that's possible but seems to go against the persona-per-plugin pattern. The two skills included don't necessarily need to be a part of this persona, but I think it fits pretty well. Note that these two skills are explicitly not targeting test development, but planning/strategy.

Software-engineer's description states, "collaborates with QA on testing questions" - so I could see it being a consumer or peer of testing, rather than it's owner. I don't think the reports these skills generate should be limited to software-engineers, either. A product-analyst could potentially make use of them, or any role that doesn't have or need to have the software-engineer plugin installed.

For test development specifically - I would think those Skill definitions would exist alongside each repo.

@nthompson-bitwarden

Copy link
Copy Markdown
Author

closing in favor of smaller iteration #152

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants