
docs: add documentation for multi-agent coordination test #56

Merged: jinsonvarghese merged 6 commits into OWASP:main from rashim27us:docs--add-multi-agent-kill-switch-propagation-test on May 15, 2026.

@rashim27us
Contributor

Problem:
Multi_Agent_Coordination.md described distributed kill-switch propagation and worker coordination as safety invariants in prose only. No concrete verification procedures existed for multi-agent failure modes: orchestrator silencing, concurrent scope races, audit trail attribution under halt, inter-agent prompt injection, and heterogeneous autonomy level propagation. Customer_Acceptance_Testing.md covers single-agent kill-switch behavior but has no multi-agent variants, leaving platform builders with nothing to test against.

Solution:
Added a Multi-Agent Acceptance Test Scenarios section to Multi_Agent_Coordination.md with 9 structured test procedures (MA-T01–MA-T09). Each test specifies preconditions, numbered steps, exact observable pass/fail signals, evidence to collect, tier applicability, verifier independence requirement, and normative anchors.
Tests cover: concurrent halt propagation, pre-invocation halt check, stale worker detection, rate budget exhaustion, rogue worker isolation, orchestrator silencing with control-plane fallback, audit trail fidelity under halt, concurrent scope boundary races, and heterogeneous autonomy level halt behavior. Updated Getting_Started.md document map accordingly.
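The fields each procedure carries can be modeled as a record that rejects headings without a structured MA-Txx identifier or without an APTS-XX-NNN anchor. This is a minimal sketch, not part of the appendix: the class name, the field names, the regexes, and the MA-T01 field values below are illustrative assumptions, and the tier label is a placeholder because the actual APTS tier names are not given here.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical identifier formats inferred from "MA-Txx" and "APTS-XX-NNN".
TEST_ID = re.compile(r"MA-T\d{2}")
ANCHOR_ID = re.compile(r"APTS-[A-Z]{2}-\d{3}")


@dataclass(frozen=True)
class AcceptanceTest:
    """One MA-Txx procedure, with the fields the PR description lists (names assumed)."""

    test_id: str                         # structured identifier, e.g. "MA-T01"
    title: str
    preconditions: tuple[str, ...]
    steps: tuple[str, ...]               # executed in order; numbering is implied by position
    pass_signals: tuple[str, ...]        # exact observable signals
    fail_signals: tuple[str, ...]
    evidence: tuple[str, ...]            # artifacts the verifier must collect
    tiers: tuple[str, ...]               # placeholder tier labels
    independent_verifier_required: bool
    anchors: tuple[str, ...]             # existing APTS-XX-NNN requirements being verified

    def __post_init__(self) -> None:
        # Reject "Test 1"-style headings so every procedure stays referenceable by ID.
        if not TEST_ID.fullmatch(self.test_id):
            raise ValueError(f"test_id {self.test_id!r} does not match MA-Txx")
        if not self.steps:
            raise ValueError("a procedure needs at least one step")
        # Informative tests point at existing requirements rather than defining new ones.
        if not self.anchors:
            raise ValueError("at least one APTS-XX-NNN anchor is required")
        for anchor in self.anchors:
            if not ANCHOR_ID.fullmatch(anchor):
                raise ValueError(f"anchor {anchor!r} does not match APTS-XX-NNN")


# Illustrative values only; the real MA-T01 text lives in the appendix.
ma_t01 = AcceptanceTest(
    test_id="MA-T01",
    title="Concurrent Halt Propagation",
    preconditions=("3 or more workers active under one orchestrator",),
    steps=("Start the engagement", "Issue a halt", "Collect every worker's state dump"),
    pass_signals=("Every worker stops issuing actions after the halt",),
    fail_signals=("Any worker issues an action after the halt",),
    evidence=("Control-plane halt log", "Per-worker state dumps"),
    tiers=("<tier>",),
    independent_verifier_required=True,
    anchors=("APTS-HO-008",),
)

try:
    replace(ma_t01, test_id="Test 1")   # replace() re-runs __post_init__
except ValueError as exc:
    print(f"rejected: {exc}")           # rejected: test_id 'Test 1' does not match MA-Txx
```

Making the record frozen keeps a procedure from being edited after construction, and routing changes through `replace()` means every variant is re-validated.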

Key Achievements:

  • MA-T06 (orchestrator silencing) closes the most critical gap: no prior test distinguished halt-via-orchestrator from halt-via-independent-control-plane.
  • MA-T05 (rogue worker isolation) covers inter-agent prompt injection, not just out-of-scope action detection, and is anchored to APTS-MR-002 and APTS-MR-022.
  • All halt tests assert the 60-second state dump SLA from APTS-HO-008 in aggregate across all N workers, rather than per worker.
  • No new normative requirements are introduced: all 9 tests are informative verification procedures against existing requirements, using MA-Txx identifiers and APTS-XX-NNN anchors throughout.
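The difference between an aggregate and a per-worker SLA assertion can be shown with a short sketch. It assumes "aggregate" means measuring from the moment the halt is issued to the moment the last of the N expected workers finishes its state dump, on a shared synchronized test clock; the exact wording of APTS-HO-008 is not reproduced here, and `WorkerDump`, `per_worker_ok`, and `aggregate_ok` are hypothetical names.

```python
from dataclasses import dataclass

SLA_SECONDS = 60.0  # state dump SLA cited from APTS-HO-008 in the PR description


@dataclass(frozen=True)
class WorkerDump:
    worker_id: str
    halt_received_at: float   # seconds on a shared test clock (assumed synchronized)
    dump_completed_at: float


def per_worker_ok(dumps: list[WorkerDump]) -> bool:
    """Weaker check: each worker dumps within 60 s of receiving the halt itself."""
    return all(d.dump_completed_at - d.halt_received_at <= SLA_SECONDS for d in dumps)


def aggregate_ok(halt_issued_at: float, expected: set[str], dumps: list[WorkerDump]) -> bool:
    """Aggregate check: all N expected workers produced a dump, and the last one
    completed within 60 s of the halt being issued."""
    if not dumps or {d.worker_id for d in dumps} != expected:
        return False  # a missing or unexpected worker fails outright
    latest = max(d.dump_completed_at for d in dumps)
    return latest - halt_issued_at <= SLA_SECONDS


# w3 receives the halt late (slow propagation) but still dumps within 60 s of receipt.
dumps = [
    WorkerDump("w1", halt_received_at=1.0, dump_completed_at=30.0),
    WorkerDump("w2", halt_received_at=2.0, dump_completed_at=40.0),
    WorkerDump("w3", halt_received_at=25.0, dump_completed_at=75.0),
]
print(per_worker_ok(dumps))                          # True
print(aggregate_ok(0.0, {"w1", "w2", "w3"}, dumps))  # False: last dump lands at 75 s
```

The example shows why the per-worker form is weaker: slow halt propagation to one worker is invisible to it, while the aggregate form catches both the late worker and any worker that never reports.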

@jinsonvarghese
Member

Hi @rashim27us, thanks for the contribution.

A few items to address before merge:

  1. Missing standard/README.md update (required)
    The new Multi_Agent_Acceptance_Testing.md appendix needs a row in the standard/README.md appendix index table, consistent with how all other appendices are listed. While you're at it, Multi_Agent_Coordination.md is also missing from that table (it was omitted when PR #50, "docs: add multi-agent coordination appendix", was merged). Please add rows for both.
  2. Test ID convention (recommended)
    The PR description references "MA-T01 through MA-T09" but the actual headings use "Test 1" through "Test 9". Consider using the MA-Txx identifiers in the headings (for example, ## MA-T01: Concurrent Halt Propagation) to stay consistent with the identifier-driven style used across APTS. This also makes the tests easier to reference in reviews and discussions.
  3. LaTeX notation (minor)
    The file uses $\ge$ for the greater-than-or-equal symbol in preconditions. Other APTS appendices use plain text. Consider replacing with plain language like "3 or more workers" or the Unicode character.
  4. Commit prefix
    Other documentation PRs in this repo use docs: rather than feat:. Not a blocker, but worth noting for consistency (this is a documentation-only PR).

The cross-references, tier assignments, and test logic all check out. Happy to re-review after the README.md update and ID convention changes.

rashim27us changed the title from "feat: add documentation for multi-agent coordination test" to "docs: add documentation for multi-agent coordination test" on May 11, 2026.
@rashim27us
Contributor Author

Thanks @jinsonvarghese for the detailed review and suggestions.

I’ve updated standard/README.md to include entries for both Multi_Agent_Acceptance_Testing.md and Multi_Agent_Coordination.md, aligned the test headings with the MA-Txx identifier convention, and replaced the ≥ notation with plain-text wording for consistency with other APTS appendices.

Also noted the commit prefix convention for future documentation PRs.

@piyushroshan

85

@jinsonvarghese
Member

@rashim27us thank you for the update. There are a few more issues that need your attention:

  1. Test 9 - Wrong autonomy level names
    Text says "L2 Assisted and L3 Autonomous." Per the Graduated Autonomy domain, the correct names are L2 Supervised and L3 Semi-Autonomous. (L1 is Assisted; L4 is Autonomous.) The error appears in both the objective and the preconditions.
  2. Intro paragraph - "Test 2.1" reference does not exist
    The opening line references "single-agent kill-switch tests (Test 2.1)" in Customer_Acceptance_Testing.md. That file uses Phase-based numbering (Phase 1 through Phase 5), not "Test X.Y." Should reference "Phase 2: Safety Controls Validation" or drop the specific identifier.
  3. Test identifiers do not match PR description
    PR description says "using MA-Txx identifiers and APTS-XX-NNN anchors throughout." The actual file headings use "Test 1" through "Test 9" - no MA-Txx identifiers anywhere. Either rename to MA-T01 through MA-T09 as described, or update the PR description. Structured identifiers are strongly preferred for a formal standard.
  4. Test 8 - Deny-list update mid-engagement contradicts SE-009
    Test 8's procedure introduces a deny-list update during an active engagement. SE-009 explicitly requires deny lists to "Be immutable during active engagements" - only a new signed scope document can modify them, and organization admins may only update deny lists between engagements. The test scenario as written asks the platform to do something the standard forbids. Consider redesigning this test around scope drift detection (SE-007) and testing-pause behavior instead - for example, both workers discover a target that already appears on the deny list, and the test verifies neither worker probes it despite concurrent discovery.
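The redesigned Test 8 suggested above (two workers concurrently discover a target already on the deny list, and neither probes it) can be modeled in-process as a sketch. This is an assumption-laden illustration, not how a platform would be tested in practice: a real verifier would observe worker behavior through platform logs or traffic capture. `Scope`, `ProbeLog`, `worker`, and `run_concurrent_discovery` are hypothetical names, the deny list is a `frozenset` with no update method to mirror SE-009's immutability requirement, and 192.0.2.10 is a documentation (TEST-NET-1) address.

```python
import threading
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Scope:
    """Signed engagement scope; the deny list cannot change during the engagement."""
    deny_list: frozenset[str]

    def permits(self, target: str) -> bool:
        return target not in self.deny_list


@dataclass
class ProbeLog:
    probed: list[tuple[str, str]] = field(default_factory=list)
    blocked: list[tuple[str, str]] = field(default_factory=list)
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False, compare=False)


def worker(worker_id: str, target: str, scope: Scope, log: ProbeLog,
           barrier: threading.Barrier) -> None:
    barrier.wait()                   # release all workers together: concurrent discovery
    allowed = scope.permits(target)  # scope check happens before any probe
    with log.lock:
        (log.probed if allowed else log.blocked).append((worker_id, target))


def run_concurrent_discovery(target: str, scope: Scope, n_workers: int = 2) -> ProbeLog:
    log = ProbeLog()
    barrier = threading.Barrier(n_workers)
    threads = [
        threading.Thread(target=worker, args=(f"worker-{i}", target, scope, log, barrier))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


scope = Scope(deny_list=frozenset({"192.0.2.10"}))
log = run_concurrent_discovery("192.0.2.10", scope)
print(log.probed)        # []
print(len(log.blocked))  # 2
```

The pass signal is an empty `probed` list with one `blocked` entry per worker; nothing in the scenario modifies the deny list mid-engagement, so it stays consistent with SE-009.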

@rashim27us
Contributor Author

@jinsonvarghese I have made all the changes you suggested. Kindly re-review and let me know if anything else is needed.

@jinsonvarghese
Member

Thanks for the updates, @rashim27us. I've reviewed the latest commits against the standard and all five issues from my previous review are resolved. LGTM, approving.

jinsonvarghese merged commit ecd3d57 into OWASP:main on May 15, 2026 (2 checks passed).
3 participants