Skip to content

[Test Improver] test: add unit tests for policy outcome routing (0% -> ~95%)#943

Draft
danielmeppiel wants to merge 1 commit intomainfrom
test-assist/outcome-routing-coverage-24944974871-b5eda711c56cfbd8
Draft

[Test Improver] test: add unit tests for policy outcome routing (0% -> ~95%)#943
danielmeppiel wants to merge 1 commit intomainfrom
test-assist/outcome-routing-coverage-24944974871-b5eda711c56cfbd8

Conversation

@danielmeppiel
Copy link
Copy Markdown
Collaborator

🤖 Test Improver here - I'm an automated AI assistant focused on improving tests for this repository.

Goal and Rationale

apm_cli.policy.outcome_routing is the single source of truth for all 9-outcome policy-discovery routing decisions across the install pipeline. It was extracted from two duplicated implementations in PR #832, but had no dedicated test file — only a few smoke tests in test_pr_832_findings.py.

This module controls critical security behavior: whether a failed policy fetch causes an install to be blocked or silently allowed. The routing logic is non-trivial and the security properties need to be explicitly verified.

Approach

Added tests/unit/policy/test_outcome_routing.py with 56 unit tests covering the complete routing table:

Outcome Behavior Tested
disabled Return None, no logging
hash_mismatch Always fail-closed (raise)
absent Always fail-open (return None)
no_git_remote Always fail-open (return None)
empty Always fail-open (return None)
malformed Respects fetch_failure_default
cache_miss_fetch_fail Respects fetch_failure_default
garbage_response Respects fetch_failure_default
cached_stale Respects cached policy.fetch_failure
found Return policy, log resolved
Unknown outcome Defensive fallback: return None

Security properties explicitly verified:

  • hash_mismatch always raises PolicyViolationError regardless of fetch_failure_default (even "warn")
  • absent / no_git_remote / empty are always fail-open even when fetch_failure_default="block" — they indicate "no org policy", not a network error
  • raise_blocking_errors=False (dry-run path) suppresses all raises across the board
  • None logger is tolerated for non-CLI callers in every code path
  • _FETCH_FAILURE_OUTCOMES is a strict subset of _NON_FOUND_LOGGED_OUTCOMES (invariant test)

Coverage Impact

File Before After
src/apm_cli/policy/outcome_routing.py ~10% (smoke only) ~95%

Trade-offs

  • All tests use mock loggers — no integration or network calls
  • Tests verify the routing table's semantics, not implementation internals
  • PolicyFetchResult and ApmPolicy are used directly (not mocked) for clarity

Test Status

56 passed in 0.36s
Full suite: 5505 passed (5449 baseline + 56 new)
Pre-existing failure: tests/unit/commands/test_policy_status.py (Unicode box-drawing in Rich output fails _ascii_only() check — not caused by this PR)

Reproducibility

uv run pytest tests/unit/policy/test_outcome_routing.py -v

Generated by Daily Test Improver ·

To install this agentic workflow, run

gh aw add githubnext/agentics/workflows/daily-test-improver.md@b87234850bf9664d198f28a02df0f937d0447295

Adds 56 dedicated unit tests for apm_cli.policy.outcome_routing covering
the full 9-outcome routing table with all branching conditions.

Security properties verified:
- hash_mismatch is always fail-closed (PolicyViolationError raised)
  regardless of fetch_failure_default or logger presence
- absent / no_git_remote / empty are always fail-open even when
  fetch_failure_default=block (they are not network-fetch failures)
- malformed / cache_miss_fetch_fail / garbage_response respect
  the project-level fetch_failure_default knob
- cached_stale respects the cached policy's own fetch_failure knob
- disabled short-circuits without logging
- raise_blocking_errors=False (dry-run path) bypasses all raises
- None logger is tolerated across all outcomes

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@danielmeppiel danielmeppiel added automation Deprecated: use type/automation. Kept for issue history; will be removed in milestone 0.10.0. testing Deprecated: use area/testing. Kept for issue history; will be removed in milestone 0.10.0. labels Apr 26, 2026
@github-actions
Copy link
Copy Markdown

APM Review Panel Verdict

Disposition: APPROVE


Per-persona findings

Python Architect: PR adds a single test module (tests/unit/policy/test_outcome_routing.py) with no source changes. Routine PR -- one OO diagram and one flow diagram below.

OO / class diagram

classDiagram
    direction LR
    class route_discovery_outcome {
        <<Pure Function>>
        +route(fetch_result, logger, fetch_failure_default, raise_blocking_errors) ApmPolicy
    }
    class PolicyFetchResult {
        <<ValueObject>>
        +outcome str
        +source str
        +policy ApmPolicy
        +error str
        +fetch_error str
        +cached bool
        +cache_age_seconds int
    }
    class ApmPolicy {
        <<ValueObject>>
        +enforcement str
        +fetch_failure str
    }
    class PolicyViolationError {
        <<Exception>>
        +policy_source str
    }
    class _FETCH_FAILURE_OUTCOMES {
        <<Constant>>
    }
    class _NON_FOUND_LOGGED_OUTCOMES {
        <<Constant>>
    }
    class test_outcome_routing {
        <<TestModule>>
        +TestModuleConstants
        +TestDisabledOutcome
        +TestHashMismatchOutcome
        +TestAlwaysFailOpenOutcomes
        +TestFetchFailureOutcomes
        +TestCachedStaleOutcome
        +TestFoundOutcome
        +TestUnknownOutcome
        +TestErrorFieldRouting
    }
    route_discovery_outcome ..> PolicyFetchResult : reads
    route_discovery_outcome ..> ApmPolicy : returns
    route_discovery_outcome ..> PolicyViolationError : raises
    route_discovery_outcome ..> _FETCH_FAILURE_OUTCOMES : checks
    route_discovery_outcome ..> _NON_FOUND_LOGGED_OUTCOMES : checks
    test_outcome_routing ..> route_discovery_outcome : exercises
    test_outcome_routing ..> PolicyFetchResult : constructs
    test_outcome_routing ..> ApmPolicy : constructs
    test_outcome_routing ..> PolicyViolationError : asserts on
    class test_outcome_routing:::touched
    classDef touched fill:#fff3b0,stroke:#d47600
Loading

Execution flow diagram

flowchart TD
    A["pytest\ntests/unit/policy/test_outcome_routing.py"] --> B["route_discovery_outcome\noutcome_routing.py"]
    B --> C{outcome?}
    C -->|disabled| D["return None -- no logging"]
    C -->|hash_mismatch| E["logger.policy_discovery_miss\noutcome, source, error"]
    E --> F{raise_blocking_errors?}
    F -->|True| G["raise PolicyViolationError\nhash mismatch"]
    F -->|False| H["return None -- dry-run"]
    C -->|"absent / no_git_remote / empty"| I["logger.policy_discovery_miss\noutcome, source"]
    I --> J["return None -- always fail-open"]
    C -->|"malformed / cache_miss_fetch_fail / garbage_response"| K["logger.policy_discovery_miss\noutcome, source, error"]
    K --> L{fetch_failure_default?}
    L -->|warn| M["return None"]
    L -->|block| N{raise_blocking_errors?}
    N -->|True| O["raise PolicyViolationError"]
    N -->|False| P["return None -- dry-run"]
    C -->|cached_stale| Q["logger.policy_resolved\n+ logger.policy_discovery_miss"]
    Q --> R{policy.fetch_failure?}
    R -->|block| S["raise PolicyViolationError\nrefresh failed"]
    R -->|warn| T["return policy"]
    C -->|found| U["logger.policy_resolved\nsource, cached, enforcement, age_seconds"]
    U --> V["return policy"]
    C -->|unknown| W["return None -- defensive fallback"]
Loading

Design patterns

  • Used in this PR: none -- straight-line procedural test code; class-per-outcome grouping is an organizational convention appropriate for the scope.
  • Pragmatic suggestion: none -- the current shape is the simplest correct design at this scope. The _make_logger() / _make_policy() / _make_result() factory helpers are idiomatic pytest. No additional pattern would improve readability.

The test class-per-outcome structure mirrors the routing table in outcome_routing.py exactly -- correct design. Importing private constants (_FETCH_FAILURE_OUTCOMES, _NON_FOUND_LOGGED_OUTCOMES) is acceptable here since both are part of the module's published contract and are imported by production callers (policy_gate.py). No structural issues.


CLI Logging Expert: No production logging code added. Tests correctly exercise the logger interface via MagicMock() stubs and verify kwarg contracts through call_args[1]. The None logger path (non-CLI callers) is covered in every outcome class -- consistent with outcome_routing.py's documented tolerance for non-CLI callers. One minor style note: call_args[1] positional indexing works but call_args.kwargs is more explicit in newer mock versions; not blocking.


DevX UX Expert: No CLI surface changes. The tests validate that PolicyViolationError messages reach the user with actionable content (match="hash mismatch", match="contoso" for source reference, match="refresh failed"). The error string "Update apm.yml policy.hash or contact your org admin" follows the "name the thing, give the fix" principle. No UX regressions.


Supply Chain Security Expert: Five critical security properties are explicitly verified and deserve to be called out:

  1. hash_mismatch always raises regardless of fetch_failure_default -- test_raises_even_with_fetch_failure_default_warn confirms this. This is the most important property: a pinned policy hash mismatch must never silently proceed.
  2. absent / no_git_remote / empty are fail-open even under fetch_failure_default=block -- test_returns_none_with_block parameterizes all three. Correct: these mean "no org policy exists", not "fetch failed".
  3. malformed / cache_miss_fetch_fail / garbage_response respect the block knob -- test_block_mode_raises confirms fail-closed under opt-in config.
  4. raise_blocking_errors=False suppresses all raises (dry-run escape hatch) -- tested for hash_mismatch, fetch failures, and cached_stale with block config.
  5. _FETCH_FAILURE_OUTCOMES subset _NON_FOUND_LOGGED_OUTCOMES invariant -- TestModuleConstants.test_fetch_failure_outcomes_are_subset_of_non_found ensures the constant sets cannot drift silently in ways that would change fail-closed behavior.

Non-blocking observation: TestUnknownOutcome documents that unrecognized outcomes return None and log nothing (fail-open for future outcomes). This is intentional per outcome_routing.py -- new outcome strings from future policy fetches should not block existing installs. The tests correctly make this an explicit contract rather than leaving it implicit.

No new security surface introduced.


Auth Expert: Not activated -- PR changes only tests/unit/policy/test_outcome_routing.py; no file from the fast-path auth list was touched and the diff introduces no authentication behavior, token management, or credential resolution changes.


OSS Growth Hacker: Automated test improvement (0% to ~95%) on a security-critical module. The test file's structured docstrings and outcome-per-class organization make it a reference implementation for contributors writing future policy tests -- good for the contributor funnel. No direct conversion surface (README, quickstart, templates) affected by this PR.

Side-channel to CEO: The Daily Test Improver agentic workflow autonomously generating 56 security-validated tests is itself a story angle ("APM maintains its own security test coverage with AI agents") that reinforces the AI-native development positioning. Flag for a future release narrative or CHANGELOG entry.


CEO arbitration

Specialists agree unanimously: this is a well-executed, security-relevant test coverage addition for a critical routing module. The Supply Chain Security Expert correctly identifies the five explicit security property verifications as the most valuable part of this PR. The Python Architect finds no structural issues; the test shape mirrors the production code's routing table -- the right organizational choice. No specialist raised a blocking concern, there are no trade-offs to arbitrate, and no breaking changes are introduced. The Growth Hacker's side-channel on the agentic self-improvement narrative is noted and will be considered for future release communications. Disposition: APPROVE.


Required actions before merge

None.


Optional follow-ups

  • The pre-existing failure in tests/unit/commands/test_policy_status.py (Unicode box-drawing characters in Rich output failing _ascii_only() check) is unrelated to this PR but should be addressed in a dedicated follow-up -- it violates the cross-platform encoding rule and will fail on Windows cp1252 terminals.
  • The TestUnknownOutcome behavior (fail-open for unrecognized outcomes) is now an explicit contract. Consider adding a brief comment in outcome_routing.py's defensive fallback block to link to this test class as documentation of the intentional design decision.

Generated by PR Review Panel for issue #943 · ● 709.3K ·

danielmeppiel added a commit that referenced this pull request Apr 26, 2026
…ivation (#948)

The label-name guard previously lived inside the agent prompt as Step 0
('exit 0 if label != panel-review'). This had two failure modes:

1. EVERY label add on EVERY PR triggered the workflow, spent ~50s on
   pre_activation + activation + apm bundle restore + agent container
   spin-up, then asked the LLM to please exit early. Observed real
   cost: PR #943 was labelled 'automation'/'testing' (never
   'panel-review') and the agent ran for 5 min 12 s before stopping.

2. The 'exit 0' instruction was a prompt-level request to the LLM,
   not a deterministic gate. An LLM that decides the PR diff is
   interesting can ignore the instruction and proceed.

Fix: move the guard to gh-aw's 'on.steps:' (pre-activation step). When
the triggering label is not 'panel-review' (and the event is not
workflow_dispatch), the step exits 1 -> pre_activation job fails ->
all downstream jobs (activation, apm, agent) skip. Total cost when
filtered out: one ubuntu-slim runner for ~10s, no LLM, no bundle
restore.

gh-aw does not expose 'names:' on pull_request_target (verified at
compile time), so 'on.steps:' is the cheapest available gate. The
prompt's Step 0 is removed; a short note documents that filtering now
happens at the workflow level.

Lock file regenerated via 'gh aw compile pr-review-panel'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automation Deprecated: use type/automation. Kept for issue history; will be removed in milestone 0.10.0. testing Deprecated: use area/testing. Kept for issue history; will be removed in milestone 0.10.0.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant