Add agent capability tiers, rule units, and sub-agent delegation #81

Open

oshorefueled wants to merge 44 commits into main from feat/sub-agents
Conversation

@oshorefueled
Contributor

@oshorefueled oshorefueled commented Apr 3, 2026

Why

Agent mode previously ran one LLM call per rule with no awareness of
which model to use for different task types, and it had no way to
delegate sub-tasks to specialized agents. This PR adds capability-tier
routing, deterministic pre-computed rule units, merged lint calls, and
sub-agent delegation to make agent evaluation faster and more capable.

What

New agent capabilities:

  • Capability tiers (high-cap / mid-cap / low-cap): each
    provider now exposes per-tier model/deployment config; callers request
    a tier and the resolver picks the configured model with upward fallback
  • Deterministic rule units: file-to-rule mappings are pre-computed
    before the agent loop so each lint tool call only receives the rules
    applicable to the target file — no hallucinated rule sources
  • Merged lint calls: multiple rules for the same file are bundled
    into one structured LLM call instead of N separate calls
  • Sub-agent delegation: a new agent tool in the executor dispatches
    a restricted read-only sub-agent (no lint/report tools); defaults to
    high-cap when no model tier is specified
  • System prompt composition: composeSystemPrompt() assembles
    directive + userInstructions + rule body with consistent double-newline
    separation, used by both evaluators and the lint tool handler
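
The composition described above can be sketched as follows. The function name comes from the PR; the body is an illustrative reconstruction, not the repository code:

```typescript
// Sketch of composeSystemPrompt: directive, optional user instructions, and
// the rule body are joined with double-newline separation; empty or
// whitespace-only segments are dropped so no stray separators appear.
function composeSystemPrompt(
  directive: string,
  userInstructions: string | undefined,
  ruleBody: string,
): string {
  return [directive, userInstructions, ruleBody]
    .filter((segment): segment is string => Boolean(segment && segment.trim()))
    .join("\n\n");
}
```

Sharing one composer between the evaluators and the lint tool handler keeps prompt assembly consistent across both call paths.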

Provider layer:

  • Removed RequestBuilder / EvalContext; runPromptStructured now
    takes explicit (systemPrompt, userMessage, schema) params
  • StructuredPromptContext carries systemDirective and
    userInstructions into the evaluator pipeline
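
A minimal sketch of the simplified boundary, with all shapes beyond the names mentioned above (`runPromptStructured`, `StructuredPromptContext`) assumed for illustration:

```typescript
// Hypothetical shapes for the simplified provider call: with RequestBuilder
// and EvalContext removed, callers pass the three pieces explicitly.
interface StructuredPromptContext {
  systemDirective: string;
  userInstructions?: string;
}

interface SchemaLike<T> {
  parse(value: unknown): T; // e.g. a Zod schema
}

interface LlmProvider {
  runPromptStructured<T>(
    systemPrompt: string,
    userMessage: string,
    schema: SchemaLike<T>,
  ): Promise<T>;
}

// A stub provider showing the call shape; a real provider would send the
// prompts to the model and validate the response against the schema.
const stubProvider: LlmProvider = {
  async runPromptStructured(_system, _user, schema) {
    return schema.parse({ score: 10 });
  },
};
```

With explicit parameters, the call site carries the whole contract; there is no hidden context object to thread through the evaluator pipeline.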

CLI refactors (no behavior change):

  • Shared resolvePresetsDir() fixes dev/dist path divergence in
    validate vs main commands
  • Shared resolveMatchedPromptsForFile() unifies rule matching across
    standard and agent evaluation paths
  • result-routing.ts split into check-routing, judge-routing,
    debug-artifact, and issue-sink modules; IssueSink interface
    isolates all format branching from routing logic
  • Typed NoConfigurationFoundError replaces bare Error for missing
    scan-path config

Correctness fixes:

  • Sub-agents use high-cap by default (was: fell back to default
    provider, ignoring tier config)
  • Unmatched rules excluded from agent score output (was: scored 10.0/10
    even when never applied to any file)
  • buildRuleName filters empty pack segments (was: emitted .RuleId
    with leading dot when pack was empty)
  • Criterion name lookup is case-insensitive (was: missed criterion IDs
    when LLM returned different casing)
  • Score summary keyed by pack-qualified name to prevent cross-pack
    collisions
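
Two of the fixes above can be sketched directly. The function names come from the PR; the bodies are illustrative reconstructions, not the repository code:

```typescript
// buildRuleName: empty pack segments are filtered out, so an empty pack
// yields "RuleId" instead of ".RuleId" with a leading dot.
function buildRuleName(pack: string, ruleId: string): string {
  return [pack, ruleId].filter((segment) => segment.length > 0).join(".");
}

// Criterion lookup: comparison is case-insensitive, so model output like
// "CLARITY" still matches a criterion registered as "clarity".
function findCriterion<T extends { name: string }>(
  criteria: T[],
  name: string,
): T | undefined {
  const needle = name.toLowerCase();
  return criteria.find((c) => c.name.toLowerCase() === needle);
}
```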

Scope

In scope

  • Agent executor: capability tiers, rule units, merged lint, sub-agent
  • Provider layer: RequestBuilder removal, tier-aware factory
  • CLI structural refactors: preset resolver, rule matcher, result-routing
  • All bug fixes listed above

Out of scope

  • Changing rule file formats or frontmatter schema
  • New evaluator types
  • Search provider changes

Behavior impact

  • Agent mode: LLM calls are now merged per file (fewer, larger calls)
    and the agent can spawn read-only sub-agents for document exploration
  • Provider config: tier-specific model/deployment env vars are now
    recognized (e.g., OPENAI_HIGH_CAPABILITY_MODEL, OPENAI_LOW_CAPABILITY_MODEL)
  • Score output: rules that never matched a file are no longer shown
    in the per-rule score summary
  • Rule name format: names with an empty pack now omit the leading dot
    (e.g., RuleId instead of .RuleId)
  • Standard CLI: preset resolution and rule matching behavior are
    unchanged; the refactors are path-equivalent rewrites
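
As an illustration, tier-specific provider config might look like the fragment below. Only `LLM_PROVIDER`, `OPENAI_HIGH_CAPABILITY_MODEL`, `OPENAI_LOW_CAPABILITY_MODEL`, and the `gpt-4.1` value appear in this PR; the default-model variable name, the mid-tier variable name, and the other model values are assumptions following the same pattern:

```dotenv
LLM_PROVIDER=openai
OPENAI_MODEL=gpt-4.1-mini              # provider default (assumed name/value)
OPENAI_HIGH_CAPABILITY_MODEL=gpt-4.1
OPENAI_MID_CAPABILITY_MODEL=gpt-4.1-mini   # assumed name/value
OPENAI_LOW_CAPABILITY_MODEL=gpt-4.1-nano   # assumed value
```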

Risk

  • Merged lint calls change the token usage profile — more tokens per call
    but fewer calls total; output schema is validated against Zod so
    malformed responses still surface as operational errors
  • Sub-agent delegation is opt-in via the agent tool; if the LLM never
    calls it, behavior is identical to before
  • RequestBuilder removal touches all providers; the simplified
    interface is tested via the provider contract test suite
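
One guard implied by the merged-lint risk above can be sketched as follows (all names are assumptions): each finding in a bundled response attributes itself to a rule source, and only findings whose source matches a requested rule survive, so a hallucinated or mistyped source cannot enter the report:

```typescript
// Keep only findings whose ruleSource was actually among the rules sent in
// the merged lint call; unknown sources are dropped rather than aborting
// the whole batch.
interface MergedFinding {
  ruleSource: string;
  message: string;
  severity: "error" | "warning";
}

function filterKnownSources(
  findings: MergedFinding[],
  requestedSources: string[],
): MergedFinding[] {
  const known = new Set(requestedSources);
  return findings.filter((f) => known.has(f.ruleSource));
}
```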

How to test / verify

Checks run

  • npm run test:run — 346 tests, all passing
  • npm run lint — zero errors

Manual verification

  • Set LLM_PROVIDER=openai + OPENAI_HIGH_CAPABILITY_MODEL=gpt-4.1 and
    run agent mode; confirm the agent sub-tool uses gpt-4.1
  • Run against a file that matches only some rules; confirm unmatched
    rules do not appear in score output
  • Run vectorlint validate in a project using dev (tsx) and built
    (dist) modes; confirm preset resolution succeeds in both

Rollback

  • Revert to the commit before feat(agent): add provider capability tiers
  • No schema migrations or persistent state changes — rollback is safe

- add provider-scoped capability-tier env fields and upward-only resolution
- update global and example config templates for tier-specific models
- cover parser and resolver behavior with focused provider tests
- add a capability-aware provider bundle for agent-mode model routing
- split top-level loop and lint execution across resolved capability tiers
- return final agent-loop text and cover the new routing in focused tests
- replace lint input with explicit rules[] rule calls
- add subordinate agent tool schema with capability-tier selection
- register the new tool descriptions and cover the contracts in tests
- extract shared token estimation and group matched rules under a budget
- render matched rule units explicitly in the agent system prompt
- precompute rule units before the agent loop and cover the flow in tests
- add a bundled structured-output schema with per-finding rule attribution
- run one provider request for each multi-rule lint call instead of per-rule loops
- cover bundled routing, prompt isolation, and severity preservation in tests
@coderabbitai

coderabbitai Bot commented Apr 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Introduces provider capability-tier resolution and upward-only fallback, refactors agent execution to group matched rules into token-budgeted "matched rule units", adds sub-agent delegation with bounded read-only tools, and extracts CLI agent/file evaluation and result-routing into focused modules; updates schemas, provider factory, and tests to match the new flows.

Changes

Cohort / File(s) — Summary

  • Configuration & Env (.env.example, CONFIGURATION.md, README.md, src/config/global-config.ts, src/schemas/env-schemas.ts):
    Add per-provider high/mid/low capability env variables and docs; document upward-only fallback order and agent-mode selection semantics.
  • Capability resolution & providers (src/providers/model-capability.ts, src/providers/capability-provider-resolver.ts, src/providers/provider-factory.ts, src/providers/index.ts, src/providers/llm-provider.ts, src/providers/vercel-ai-provider.ts):
    New model-capability tier types and resolver; provider factory split to create providers per model identifier; provider interfaces/signatures adjusted to use systemPrompt/userMessage and include tool-loop text/usage.
  • Agent core & tooling (src/agent/executor.ts, src/agent/prompt-builder.ts, src/agent/types.ts, src/agent/tools-registry.ts, src/agent/tool-handlers.ts):
    Executor switched to matched rule units and delegated tool handlers; lint tool accepts merged rules[]; new agent tool added; tool handlers wired to capability resolver and sub-agent flows.
  • Matched rule grouping & sub-agents (src/agent/rule-units.ts, src/agent/sub-agent.ts, src/agent/lint-prompt.ts, src/agent/findings.ts):
    Add matched-rule-unit grouping with token estimates and budgeted splitting; add a sub-agent runner that invokes provider tool loops with a trimmed read-only toolset; add lint prompt merging utilities and finding construction.
  • CLI refactor & evaluation (src/cli/commands.ts, src/cli/orchestrator.ts, src/cli/types.ts, src/cli/agent-evaluation.ts, src/cli/file-evaluation.ts, src/cli/agent-mode-capability.ts):
    Extract agent-mode and single-file evaluators; wire the capability-provider resolver into the CLI flow; add systemDirective/userInstructions plumbing to evaluation types.
  • Result routing & output (src/cli/result-routing.ts, src/cli/result-routing/*, src/cli/issue-output.ts, src/cli/result-routing/issue-sink.ts, src/prompts/schema.ts):
    Add routing for check/judge results, sinks for multiple output formats, debug-artifact writers, and a merged-check schema including per-finding ruleSource.
  • Evaluator & prompt plumbing (src/evaluators/*, src/prompts/system-prompt.ts, src/boundaries/*):
    Introduce StructuredPromptContext and composeSystemPrompt; evaluator constructors accept structured context; remove RequestBuilder/EvalContext and rework system-prompt composition; move the token-estimate util and re-export it.
  • Utilities & errors (src/utils/token-estimate.ts, src/errors/*, src/boundaries/scan-path-resolver.ts, src/rules/matched-prompts.ts, src/cli/preset-resolution.ts):
    Add a token estimator, a domain-specific NoConfigurationFoundError, resolveMatchedPromptsForFile, and a preset-resolution helper.
  • Tests & package (tests/**, package.json):
    Large test updates/additions covering matched units, sub-agents, capability resolution/fallbacks, routing, and formatters; move @types/micromatch to devDependencies.

Sequence Diagram(s)

sequenceDiagram
  participant CLI
  participant CapabilityResolver as CapabilityResolver
  participant DefaultProvider as DefaultProvider
  participant TierProvider as TierProvider
  participant AgentExecutor as AgentExecutor
  participant ToolHandlers as ToolHandlers
  participant SubAgent as SubAgent
  CLI->>CapabilityResolver: createCapabilityProviderResolver(env)
  CLI->>AgentExecutor: runAgentExecutor(scanPaths, systemDirective, capabilityResolver)
  AgentExecutor->>ToolHandlers: createToolHandlers(defaultProvider, resolveCapabilityProvider, ...)
  ToolHandlers->>TierProvider: resolveCapabilityProvider(requestedTier)
  ToolHandlers->>TierProvider: runAgentToolLoop(lint/agent) 
  ToolHandlers->>SubAgent: runSubAgent(task, tools, provider=tierProvider)
  SubAgent->>TierProvider: runAgentToolLoop(sub-agent systemPrompt)
  SubAgent-->>ToolHandlers: compactResult / usage
  TierProvider-->>ToolHandlers: lint findings / usage
  ToolHandlers-->>AgentExecutor: append findings / nested usage
  AgentExecutor-->>CLI: aggregated findings, usage, operational flags

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • ayo6706

Poem

🐰
I nibble rules into tidy sets,
Pack tokens tight as sunrise nets,
Sub-agents hop with gentle scope,
Each finding bears its source and hope,
Hooray — the linting burrow grows!


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/config/global-config.ts (1)

13-86: ⚠️ Potential issue | 🟡 Minor

Keep the generated config template aligned with .env.example.

These two copy-paste templates have already drifted: this file includes the OpenAI tier block while .env.example still starts at Azure, and the Anthropic default model values differ between the two files. Shipping both as-is will give users conflicting keys/defaults. Please sync them in this PR, or better, generate both from shared provider metadata.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/global-config.ts` around lines 13 - 86,
DEFAULT_GLOBAL_CONFIG_TEMPLATE diverges from .env.example (provider order and
Anthropic model defaults); update the DEFAULT_GLOBAL_CONFIG_TEMPLATE string so
it matches .env.example exactly (remove or reorder the OpenAI tier block so the
file starts at Azure if .env.example does, and make ANTHROPIC_MODEL /
ANTHROPIC_HIGH_CAPABILITY_MODEL / ANTHROPIC_MID_CAPABILITY_MODEL /
ANTHROPIC_LOW_CAPABILITY_MODEL values identical to .env.example). Preferably
refactor to generate DEFAULT_GLOBAL_CONFIG_TEMPLATE from the shared provider
metadata source (or a single canonical constants object) rather than
copy-pasting, and ensure the template string and the canonical source use the
same symbols (e.g., DEFAULT_GLOBAL_CONFIG_TEMPLATE, ANTHROPIC_MODEL) so future
drift is prevented.
🧹 Nitpick comments (7)
src/providers/model-capability.ts (2)

1-2: Decouple capability resolution from provider-factory.

Importing ProviderType from the SDK-heavy factory makes this config-only module transitively pull provider transport code into runtime consumers. Moving ProviderType into a small shared types/constants module would keep this resolver lightweight and avoid that dependency leak.

As per coding guidelines, "Define shared domain constants, enums, or types for core runtime concepts; avoid magic strings in shared runtime code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/model-capability.ts` around lines 1 - 2, The module
model-capability.ts currently imports ProviderType from provider-factory which
drags in heavy SDK/transport code; extract ProviderType into a lightweight
shared types/constants module (e.g., create a new types file exporting the
ProviderType enum/const) and update model-capability.ts to import ProviderType
from that new shared module instead of './provider-factory'; ensure the new
shared module only contains the type/enum (no runtime/provider logic) and adjust
any other files that import ProviderType to use the new module.

89-90: Use a custom config/provider error here too.

Line 90 is another new raw Error path in src/providers. Align it with the repo's error hierarchy so unsupported-provider states surface consistently.

As per coding guidelines, "Prefer the repository's custom error hierarchy over native Error; catch blocks use unknown type and extend existing custom error types before introducing raw exceptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/model-capability.ts` around lines 89 - 90, Replace the raw
throw new Error in the default branch with the repository's custom
unsupported-provider error type (e.g., UnsupportedProviderError or the
appropriate subclass of your existing custom error hierarchy) so unsupported
LLM_PROVIDER values surface consistently; import the custom error at the top of
src/providers/model-capability.ts, construct it with a descriptive message that
includes the actual (envConfig as { LLM_PROVIDER?: string }).LLM_PROVIDER value,
and ensure any upstream catch sites that currently handle unknown are compatible
with the custom error type.
src/providers/provider-factory.ts (1)

28-43: Use the repo error hierarchy for unsupported-provider failures.

Line 41 adds a new raw Error path for configuration failures. Please throw the existing provider/config error type here so setup failures stay classifiable.

As per coding guidelines, "Prefer the repository's custom error hierarchy over native Error; catch blocks use unknown type and extend existing custom error types before introducing raw exceptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/providers/provider-factory.ts` around lines 28 - 43, The switch in
resolveDefaultModelIdentifier currently throws a raw Error for unknown provider;
replace that with the repository's custom provider/config error (e.g.,
UnsupportedProviderError or ProviderConfigError) and include the LLM_PROVIDER
value in the error payload/message so failures remain classifiable; update the
top of the file to import the custom error type, and construct/throw it instead
of new Error(`Unsupported provider type: ...`) inside
resolveDefaultModelIdentifier while preserving the original message content and
context.
docs/plans/2026-04-03-agent-rule-units-and-subordinate-agent.md (1)

521-524: Minor style: adverb repetition.

Line 523 uses "only" twice: "returns compact final text only - the sub-agent only receives". Consider rephrasing for clarity.

✏️ Suggested fix
-- the sub-agent runs synchronously and returns compact final text only
-- the sub-agent only receives `read_file`, `search_files`, `list_directory`, `search_content`
+- the sub-agent runs synchronously and returns compact final text
+- the sub-agent receives only `read_file`, `search_files`, `list_directory`, `search_content`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/plans/2026-04-03-agent-rule-units-and-subordinate-agent.md` around lines
521 - 524, The phrasing repeats the adverb "only" in the two bullet points;
change the wording so it's not redundant—for example, rewrite "the sub-agent
runs synchronously and returns compact final text only" to "the sub-agent runs
synchronously and returns compact final text; it receives only `read_file`,
`search_files`, `list_directory`, and `search_content`" so the two concepts
(return behavior and permitted actions) are clear and the second "only" applies
directly to the list of allowed operations.
tests/agent/rule-units.test.ts (1)

58-91: Consider adding edge case tests for boundary conditions.

The token budget splitting test is well-implemented, but consider adding tests for edge cases:

  • Empty fileRuleMatches array (should return empty array)
  • Single match with budget smaller than its estimated tokens
  • All matches fit under budget (no splitting needed)

These would improve confidence in the boundary handling logic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agent/rule-units.test.ts` around lines 58 - 91, Add three new tests for
buildMatchedRuleUnits covering boundary cases: (1) empty fileRuleMatches array
should return an empty array, (2) a single match whose prompt estimated tokens
exceed the provided budget should still produce one unit containing that single
rule (with estimatedTokens present and numeric), and (3) multiple matches whose
combined estimated tokens are <= budget should produce a single unit containing
all rules. Use the existing test helpers (makePrompt / PromptFile) to construct
prompts and pass the tokenBudget argument to buildMatchedRuleUnits to validate
these behaviors.
docs/superpowers/specs/2026-04-02-agent-rule-units-design.md (1)

281-281: Minor: Consider conciseness improvement.

"all of the delegated context" could be shortened to "the delegated context" without losing meaning.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/superpowers/specs/2026-04-02-agent-rule-units-design.md` at line 281,
Update the sentence in the spec that reads "receive the result, and continue
without carrying all of the delegated context forward in the main conversation"
to the more concise "receive the result, and continue without carrying the
delegated context forward in the main conversation" by removing the word "all";
locate the sentence in the paragraph that begins "The sub-agent `agent` tool is
not limited to exploration" and replace the phrase to keep meaning while
reducing redundancy.
src/agent/executor.ts (1)

536-538: Use a repo error type for the missing-provider guard.

This new entry-point check throws raw Error, which makes provider-resolution failures harder to classify consistently with the rest of the agent path.

As per coding guidelines "Prefer the repository's custom error hierarchy over native Error; catch blocks use unknown type and extend existing custom error types before introducing raw exceptions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/executor.ts` around lines 536 - 538, Replace the raw throw in the
runAgentExecutor missing-provider guard with the repository's custom error type
(e.g., ProviderResolutionError or AgentError): import the appropriate error
class from the project's errors module, then throw new
ProviderResolutionError('runAgentExecutor requires at least one provider.') (or
wrap additional context/metadata as the custom error supports) instead of throw
new Error(...); update the import list to include the chosen error type and
ensure no other code expects a raw Error for this check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.env.example:
- Around line 49-52: Remove the extra empty line before the search provider
section in .env.example so the comment "# Search Provider Configuration"
immediately follows the last variable entry (e.g., after the commented
"BEDROCK_TEMPERATURE" line); this eliminates the dotenv-linter warning by
ensuring there are no consecutive blank lines between environment blocks.

In `@docs/plans/2026-04-03-agent-rule-units-and-subordinate-agent.md`:
- Around line 1-13: The document jumps from the top-level heading "# Agent Rule
Units and Sub-Agent Implementation Plan" directly to "### Task 1", breaking
heading hierarchy; update the task section(s) so they use a second-level heading
(change "### Task 1" to "## Task 1" or insert an H2 before the task list) and
ensure subsequent task headings consistently use H2 (e.g., "## Task 2", etc.) so
the document follows a correct H1→H2→H3 structure and improves navigation and
accessibility.

In `@src/agent/executor.ts`:
- Around line 148-160: The function buildEffectiveRuleBody currently lets
reviewInstruction replace prompt.body which causes tool evaluations to run on
non-canonical text; change it so prompt.body is always included first
(preserving the canonical rule text from PromptFile), then append the
reviewInstruction (e.g., as a supplemental or override note) only if present,
and finally append the context block if provided; update the implementation in
buildEffectiveRuleBody to concatenate prompt.body + optional reviewInstruction
(with clear separation) + optional context so ruleSource attribution remains
accurate.
- Around line 683-695: The loop is recording findings from result.data.findings
without ensuring finding.ruleSource is one of the requested rules
(promptBySource), which lets unrelated/typo sources (e.g., "C") be recorded and
then throw via buildUnknownRuleSourceError after some findings were appended.
Before calling resolvePromptBySource and appendInlineFinding, check whether
finding.ruleSource exists in the promptBySource (or validSources) set; if it is
missing, skip this finding (or collect/report it inline) instead of throwing to
avoid aborting the entire batch after partial appends. Update the loop around
resolvePromptBySource, appendInlineFinding, and buildUnknownRuleSourceError to
only call appendInlineFinding for sources present in promptBySource and handle
unknown sources by continuing rather than throwing.

In `@tests/env-parser.test.ts`:
- Around line 233-266: The test case "parses provider-scoped capability-tier
model fields for Gemini and Bedrock" is incorrectly nested inside the "OpenAI
Configuration" describe; move that entire it(...) block out of the "OpenAI
Configuration" describe and place it in its own describe (e.g.,
describe('Provider-scoped capability-tier models', ...)) or alongside other
provider tests so it runs independently; keep the test body unchanged (it
references parseEnvironment and ProviderType.Gemini /
ProviderType.AmazonBedrock) and adjust surrounding braces/indentation to close
the OpenAI describe before adding the new describe.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3ae15f8-2b25-445f-ba19-dd5d08d43a64

📥 Commits

Reviewing files that changed from the base of the PR and between eff7743 and 2b4166e.

📒 Files selected for processing (35)
  • .env.example
  • CONFIGURATION.md
  • README.md
  • docs/logs/2026-04-03-agent-rule-units-and-subordinate-agent.log.md
  • docs/plans/2026-04-03-agent-rule-units-and-subordinate-agent.md
  • docs/superpowers/specs/2026-04-02-agent-rule-units-design.md
  • src/agent/executor.ts
  • src/agent/prompt-builder.ts
  • src/agent/rule-units.ts
  • src/agent/sub-agent.ts
  • src/agent/tools-registry.ts
  • src/agent/types.ts
  • src/boundaries/user-instruction-loader.ts
  • src/cli/commands.ts
  • src/cli/orchestrator.ts
  • src/cli/types.ts
  • src/config/global-config.ts
  • src/prompts/schema.ts
  • src/providers/capability-provider-bundle.ts
  • src/providers/llm-provider.ts
  • src/providers/model-capability.ts
  • src/providers/provider-factory.ts
  • src/providers/vercel-ai-provider.ts
  • src/schemas/env-schemas.ts
  • src/utils/token-estimate.ts
  • tests/agent/agent-executor.test.ts
  • tests/agent/prompt-builder.test.ts
  • tests/agent/rule-units.test.ts
  • tests/agent/sub-agent.test.ts
  • tests/agent/types-contract.test.ts
  • tests/env-parser.test.ts
  • tests/global-config.test.ts
  • tests/orchestrator-agent-output.test.ts
  • tests/provider-factory.test.ts
  • tests/providers/vercel-ai-provider-agent-loop.test.ts

@oshorefueled
Contributor Author

oshorefueled commented Apr 3, 2026

Agent Capability Tiers, Rule Units, and Sub-Agents

1. Capability Changes

After merge, you can:

  • configure provider-scoped high-capability, mid-capability, and low-capability models or deployment names
  • run agent mode with deterministic matched rule units and bundled multi-rule lint requests
  • delegate bounded read-only workspace analysis to sub-agents through the agent tool

2. Boundary Changes

  • Agent-mode tool contract:
    • lint now takes file plus rules[]
    • each rule item supports ruleSource, optional reviewInstruction, and optional context
    • agent takes task, optional label, and optional capability-tier model
  • Config/env contract:
    • new provider-specific capability-tier fields were added for OpenAI, Azure OpenAI, Anthropic, Gemini, and Bedrock
    • Azure uses deployment-name fields instead of raw model IDs
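The tool contracts above can be sketched as plain TypeScript types with a minimal runtime guard. This is an illustrative assumption, not the repository's actual declarations (the real schemas in src/agent/types.ts are zod-based); LintRuleCall, LintToolInput, AgentToolInput, and validateLintInput are hypothetical names here:

```typescript
type CapabilityTier = 'high-cap' | 'mid-cap' | 'low-cap';

interface LintRuleCall {
  ruleSource: string;          // original rule attribution, preserved in findings
  reviewInstruction?: string;  // optional per-call override
  context?: string;            // optional extra context for this rule
}

interface LintToolInput {
  file: string;                // one file per call
  rules: LintRuleCall[];       // all applicable rules, merged into one LLM request
}

interface AgentToolInput {
  task: string;                // what the sub-agent should analyze
  label?: string;              // optional human-readable tag
  model?: CapabilityTier;      // defaults to high-cap when omitted
}

// Minimal guard in the spirit of the repo's trimmed-non-blank zod checks.
function isNonBlank(value: unknown): value is string {
  return typeof value === 'string' && value.trim().length > 0;
}

function validateLintInput(input: LintToolInput): boolean {
  return (
    isNonBlank(input.file) &&
    input.rules.length > 0 &&
    input.rules.every((rule) => isNonBlank(rule.ruleSource))
  );
}
```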

3. High-Level Flow

  1. Agent mode resolves the workspace targets and loaded rules.
  2. The runtime precomputes deterministic matched rule units under a token budget.
  3. The top-level agent loop runs with the high-capability provider tier.
  4. When the agent calls lint, the runtime reads the file once, builds one bundled prompt from rules[], and sends one structured review request with the mid-capability tier.
  5. Findings are recorded with their original ruleSource and per-rule severity.
  6. When the agent calls agent, the runtime launches a bounded sub-agent restricted to read-only tools and returns a compact result or a compact error.
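Step 4's rule merging can be sketched as below. buildMergedLintPrompt is named after the module described later in this PR (src/agent/lint-prompt.ts), but the body and prompt layout here are assumptions, not the shipped implementation:

```typescript
interface LintRuleCall {
  ruleSource: string;
  ruleBody: string;
  reviewInstruction?: string;
}

// Assemble one bundled prompt covering every rule that matched the file,
// so a single structured LLM request replaces N per-rule calls.
function buildMergedLintPrompt(file: string, rules: LintRuleCall[]): string {
  const sections = rules.map((rule, i) => {
    const header = `## Rule ${i + 1}: ${rule.ruleSource}`;
    const instruction = rule.reviewInstruction
      ? `\nReviewer instruction: ${rule.reviewInstruction}`
      : '';
    return `${header}\n${rule.ruleBody}${instruction}`;
  });
  return [
    `Review the file ${file} against all of the following rules.`,
    ...sections,
    'Attribute every finding to the exact ruleSource heading it violates.',
  ].join('\n\n');
}
```

Keeping the ruleSource heading in each section is what lets findings retain per-rule attribution (invariant 4 below) after a merged call.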

4. Invariants

  • Capability fallback is upward-only within the active provider:
    • low-capability -> mid-capability -> high-capability -> provider default
    • mid-capability -> high-capability -> provider default
  • Bundled linting must preserve original ruleSource attribution for every finding.
  • Sub-agents remain read-only and cannot call lint, recurse into agent, write files, or finalize the main review session.
  • Agent mode stays a review-only system; this change does not introduce edit/write behavior.
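The upward-only fallback invariant can be sketched as a small resolver. TIER_SEARCH_ORDER and resolveCapabilityValue follow the symbol names discussed in the review comments on src/providers/model-capability.ts, but this body is an assumption, not the shipped code:

```typescript
type Tier = 'low-cap' | 'mid-cap' | 'high-cap';

// Upward-only search order within the active provider.
const TIER_SEARCH_ORDER: Record<Tier, Tier[]> = {
  'low-cap': ['low-cap', 'mid-cap', 'high-cap'],
  'mid-cap': ['mid-cap', 'high-cap'],
  'high-cap': ['high-cap'],
};

function resolveCapabilityValue(
  requested: Tier,
  configured: Partial<Record<Tier, string>>,
  defaultValue: string,
): string {
  // Walk upward; blank/whitespace values count as missing.
  for (const tier of TIER_SEARCH_ORDER[requested]) {
    const value = configured[tier];
    if (value && value.trim().length > 0) return value.trim();
  }
  // Fall back to the provider default, failing fast if it is blank too.
  if (defaultValue.trim().length > 0) return defaultValue.trim();
  throw new Error(`No model configured for tier "${requested}" and no default set.`);
}
```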

5. Failure Modes and Signals

  • Misconfigured capability-tier env vars:
    • noticed through env parsing or provider-factory test failures
  • Bad bundled lint payloads or unknown ruleSource entries:
    • fail the whole lint call before review runs
  • Sub-agent execution failure:
    • returned to the main agent as a compact { ok: false, error } payload

6. Verification

Checks run:

  • npm run build
  • npm run lint
  • npm run test:run -- tests/agent tests/providers tests/orchestrator-agent-output.test.ts tests/provider-factory.test.ts tests/env-parser.test.ts tests/global-config.test.ts

Quick manual verification:

  1. Configure one provider with capability-tier env vars.
  2. Run vectorlint <file> --mode agent --output json.
  3. Confirm bundled findings keep per-rule ruleSource.
  4. Confirm delegated agent work returns compact sub-agent output and never exposes write or recursive tools.

7. Rollback

  1. Revert the commits introduced on feat/sub-agents.
  2. Remove any newly added capability-tier env values if they were applied in operator environments.
  3. Re-run the verification commands above to confirm the repository is back on the prior agent-mode behavior.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/executor.ts`:
- Around line 535-538: Replace the native Error thrown when no provider is
configured with the repository's ConfigError: inside runAgentExecutor where
defaultProvider is determined (the block checking if (!defaultProvider)), change
throw new Error(...) to throw new ConfigError('runAgentExecutor requires at
least one provider.'); and add the necessary import for ConfigError at the top
of the file so the symbol resolves.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5854367c-2165-4cdd-92e6-a0ff615a5689

📥 Commits

Reviewing files that changed from the base of the PR and between 2b4166e and 5953769.

📒 Files selected for processing (4)
  • .env.example
  • artifacts/2026-04-03-agent-capability-tiers-rule-units-and-sub-agents.md
  • src/agent/executor.ts
  • tests/agent/agent-executor.test.ts
✅ Files skipped from review due to trivial changes (1)
  • artifacts/2026-04-03-agent-capability-tiers-rule-units-and-sub-agents.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/agent/agent-executor.test.ts

Comment thread src/agent/executor.ts Outdated

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/providers/model-capability.ts`:
- Around line 15-27: The resolver resolveCapabilityValue currently returns
defaultValue without validating it; change resolveCapabilityValue to treat
empty/whitespace strings as missing and fail fast by throwing a descriptive
error when no non-blank value is found across TIER_SEARCH_ORDER[requested] and
the provided defaultValue is blank, so callers (e.g., capability-provider-bundle
and provider-factory) receive a config-time exception rather than a propagated
invalid identifier.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7788eb18-c713-441f-acc5-01702ff467fc

📥 Commits

Reviewing files that changed from the base of the PR and between 5953769 and 18deb3a.

📒 Files selected for processing (5)
  • .env.example
  • artifacts/2026-04-03-agent-capability-tiers-rule-units-and-sub-agents.md
  • src/agent/executor.ts
  • src/providers/model-capability.ts
  • src/providers/provider-factory.ts
✅ Files skipped from review due to trivial changes (2)
  • artifacts/2026-04-03-agent-capability-tiers-rule-units-and-sub-agents.md
  • src/agent/executor.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • .env.example
  • src/providers/provider-factory.ts

Comment thread src/providers/model-capability.ts Outdated
- trim capability-tier identifiers before resolving provider models

- throw ConfigError when no non-blank default or tier value exists

- cover the config-time failure path with a focused provider test
- route uncapped agent-mode work through the default provider and let lint opt into capability tiers per call

- replace bundled lint naming with merged terminology and share the base finding schema across merged responses

- rename the provider bundle to a resolver, shorten capability tier names, and remove the tracked artifact and redundant sub-agent write guard
- Keep orchestrator as a thin composition root for mode dispatch, aggregation, formatter selection, and final reporting

- Extract per-file evaluation, prompt-result processing, violation reporting, and agent-mode flow into dedicated modules

- Preserve runtime behavior while reducing cross-cutting coupling and making future changes easier to test and review
- Rename evaluation flow modules to plain responsibility-based names (agent-evaluation, file-evaluation, result-routing, issue-output)

- Update orchestrator and internal imports to match the new naming convention without changing runtime behavior

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (2)
src/agent/sub-agent.ts (1)

42-48: Missing usage in error response path.

When an error is thrown, any partial usage from the provider is lost. If the provider tracks token usage before failing, this data should be preserved for aggregation.

♻️ Proposed fix to preserve usage on error
   } catch (error) {
     const message = error instanceof Error ? error.message : String(error);
     return {
       ok: false,
       error: message,
+      // Note: usage is not available from thrown errors; if needed,
+      // the provider would need to include it in a custom error type
     };
   }

This is a minor consideration since standard errors don't carry usage data, but worth noting if future provider implementations need to surface partial usage on failure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/sub-agent.ts` around lines 42 - 48, The catch block in sub-agent.ts
currently returns only { ok: false, error: message } and drops any provider
token usage; change the error path in the catch inside the function handling
provider calls (the catch that builds the returned object) to include usage when
present by extracting usage from the thrown error or from the local usage
variable (e.g., check (error as any).usage || usage) and return it as a usage
field alongside ok:false and error so partial token usage is preserved for
aggregation.
src/agent/types.ts (1)

35-47: Consider using TRIMMED_NON_BLANK_STRING for consistency.

The TOP_LEVEL_REFERENCE_SCHEMA and TOP_LEVEL_REPORT_INPUT_SCHEMA use z.string().min(1) without trimming, while the new schemas use TRIMMED_NON_BLANK_STRING. This could allow whitespace-only strings like " " to pass validation.

♻️ Proposed fix for consistency
 export const TOP_LEVEL_REFERENCE_SCHEMA = z.object({
-  file: z.string().min(1),
+  file: TRIMMED_NON_BLANK_STRING,
   startLine: z.number().int().positive(),
   endLine: z.number().int().positive(),
 });

 export const TOP_LEVEL_REPORT_INPUT_SCHEMA = z.object({
   kind: z.literal('top-level'),
-  ruleSource: z.string().min(1),
-  message: z.string().min(1),
+  ruleSource: TRIMMED_NON_BLANK_STRING,
+  message: TRIMMED_NON_BLANK_STRING,
   suggestion: z.string().optional(),
   references: z.array(TOP_LEVEL_REFERENCE_SCHEMA).optional(),
 });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/types.ts` around lines 35 - 47, Replace plain z.string().min(1)
usages with the existing TRIMMED_NON_BLANK_STRING to prevent whitespace-only
values: update TOP_LEVEL_REFERENCE_SCHEMA's file to use
TRIMMED_NON_BLANK_STRING, update TOP_LEVEL_REPORT_INPUT_SCHEMA's ruleSource and
message to use TRIMMED_NON_BLANK_STRING, and make suggestion
TRIMMED_NON_BLANK_STRING.optional() if you want to trim/validate non-blank
optional suggestions; keep references as-is. Ensure you import or reference the
TRIMMED_NON_BLANK_STRING constant where these schemas are defined.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/cli/agent-evaluation.ts`:
- Around line 96-107: In buildAgentRuleScores (used by evaluateFilesInAgentMode)
don't treat matchedFiles.size === 0 as a perfect pass; instead either omit that
rule from results or mark it explicitly as "not evaluated" (e.g., set score to 0
and scoreText to 'N/A') so disabled/out-of-scope prompts aren't shown as
successful; update the same logic where it's repeated (around the other block at
lines ~221-225) and reference normalizeRuleSource, findingsByRule, and
filesByRuleSource when locating the conditional to change.

In `@src/cli/file-evaluation.ts`:
- Around line 248-250: The map key currently uses (p.meta.id || p.filename)
which allows collisions across packs; change the key to the pack-qualified rule
name used elsewhere so entries are unique per pack. Specifically, build the key
by prefixing the rule id or filename with the pack identifier (for example using
p.meta.pack or p.pack if present) and use that value when calling
allScores.set(ruleName, promptResult.scoreEntries) so summaries are namespaced
by pack rather than overwritten by duplicate ids.
- Around line 71-79: Replace the generic Error thrown when frontmatter is
missing criteria with the repository's config error type: catch the condition
inside the evaluatorType !== baseEvaluatorType branch and throw ConfigError (or
the repo's equivalent) instead of new Error, include the promptFile.filename in
the message, and ensure ConfigError is imported where file-evaluation.ts uses
evaluatorType, baseEvaluatorType, meta and promptFile.filename so
handleUnknownError() can classify the validation failure properly.

In `@src/cli/issue-output.ts`:
- Around line 22-31: The buildRuleName function can emit leading dots when
packName is an empty string; update buildRuleName (which takes packName, ruleId,
criterionId) to drop empty or falsy segments before joining so empty packName
(or empty criterionId) doesn't produce malformed names; e.g., filter the parts
array to remove ''/falsy values before returning parts.join('.') so only
non-empty segments are joined.

In `@src/cli/result-routing.ts`:
- Around line 395-403: The lookup for criterionId drops to prompt-level because
meta.criteria is matched case-sensitively; update the match in the
violationsByCriterion loop so criterionId is resolved case-insensitively by
comparing criterionName.toLowerCase() with each c.name.toLowerCase() (guarding
for undefined names) before calling buildRuleName(promptFile.pack, promptId,
criterionId) so criterion-specific rules are used even when the model returns
different case.

In `@src/config/global-config.ts`:
- Around line 18-20: Replace the outdated template comment that begins with "#
In agent mode, the top-level loop uses the default provider, lint uses the
default provider unless a call passes model, and delegated sub-agents do the
same." in src/config/global-config.ts so it reflects the new defaults: top-level
agent work should state it defaults to high-capability, the bundled "lint"
should state it defaults to mid-capability, and delegated "agent" calls should
state they default to high-capability when "model" is omitted; update the
wording accordingly and keep the same comment block location and formatting.

In `@tests/agent/agent-executor.test.ts`:
- Around line 348-395: The test "uses the default provider when the agent tool
omits model" incorrectly asserts that resolveCapabilityProvider should not be
called and thus permits the executor to use the default provider for delegated
agent() calls; update the test to expect the executor will call
resolveCapabilityProvider with the high-capability tier when model is omitted.
Specifically, in the test around runAgentExecutor invocation, change the
resolveCapabilityProvider stub to assert it is called with argument 'high-cap'
(or 'high-capability' if that is the code's canonical string) and return the
high-capability provider, incrementing requestedTierCount accordingly, and
adjust the final expectation to ensure requestedTierCount reflects that
resolveCapabilityProvider('high-cap') was invoked; keep references to
runAgentExecutor and resolveCapabilityProvider so the change is applied to the
right place.
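The empty-segment fix suggested above for buildRuleName in src/cli/issue-output.ts can be sketched as follows — a hypothetical minimal version, not the repository's actual helper:

```typescript
// Drop empty or missing segments before joining, so an empty packName
// (or omitted criterionId) never produces a leading or doubled dot.
function buildRuleName(
  packName: string,
  ruleId: string,
  criterionId?: string,
): string {
  return [packName, ruleId, criterionId]
    .filter((part): part is string => Boolean(part && part.length > 0))
    .join('.');
}
```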

---

Nitpick comments:
In `@src/agent/sub-agent.ts`:
- Around line 42-48: The catch block in sub-agent.ts currently returns only {
ok: false, error: message } and drops any provider token usage; change the error
path in the catch inside the function handling provider calls (the catch that
builds the returned object) to include usage when present by extracting usage
from the thrown error or from the local usage variable (e.g., check (error as
any).usage || usage) and return it as a usage field alongside ok:false and error
so partial token usage is preserved for aggregation.

In `@src/agent/types.ts`:
- Around line 35-47: Replace plain z.string().min(1) usages with the existing
TRIMMED_NON_BLANK_STRING to prevent whitespace-only values: update
TOP_LEVEL_REFERENCE_SCHEMA's file to use TRIMMED_NON_BLANK_STRING, update
TOP_LEVEL_REPORT_INPUT_SCHEMA's ruleSource and message to use
TRIMMED_NON_BLANK_STRING, and make suggestion
TRIMMED_NON_BLANK_STRING.optional() if you want to trim/validate non-blank
optional suggestions; keep references as-is. Ensure you import or reference the
TRIMMED_NON_BLANK_STRING constant where these schemas are defined.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ca0756e3-a098-42fd-8d94-ea709272b01a

📥 Commits

Reviewing files that changed from the base of the PR and between 18deb3a and af86909.

📒 Files selected for processing (22)
  • CONFIGURATION.md
  • README.md
  • src/agent/executor.ts
  • src/agent/sub-agent.ts
  • src/agent/types.ts
  • src/cli/agent-evaluation.ts
  • src/cli/agent-mode-capability.ts
  • src/cli/commands.ts
  • src/cli/file-evaluation.ts
  • src/cli/issue-output.ts
  • src/cli/orchestrator.ts
  • src/cli/result-routing.ts
  • src/cli/types.ts
  • src/config/global-config.ts
  • src/prompts/schema.ts
  • src/providers/capability-provider-resolver.ts
  • src/providers/model-capability.ts
  • tests/agent/agent-executor.test.ts
  • tests/agent/types-contract.test.ts
  • tests/orchestrator-agent-output.test.ts
  • tests/prompt-schema.test.ts
  • tests/provider-factory.test.ts
✅ Files skipped from review due to trivial changes (2)
  • CONFIGURATION.md
  • README.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/cli/types.ts
  • src/cli/commands.ts
  • tests/agent/types-contract.test.ts
  • tests/provider-factory.test.ts
  • src/prompts/schema.ts

Comment thread src/cli/agent-evaluation.ts Outdated
Comment thread src/cli/file-evaluation.ts Outdated
Comment thread src/cli/file-evaluation.ts
Comment thread src/cli/issue-output.ts
Comment thread src/cli/result-routing.ts Outdated
Comment thread src/config/global-config.ts Outdated
Comment on lines +18 to +20
# In agent mode, the top-level loop uses the default provider,
# lint uses the default provider unless a call passes model,
# and delegated sub-agents do the same.


⚠️ Potential issue | 🟡 Minor

Update the agent-mode defaults in this template comment.

This text conflicts with the behavior described in this PR: top-level agent work should default to high-capability, bundled lint should default to mid-capability, and delegated agent calls should default to high-capability when model is omitted. Keeping this as-is can mislead users configuring tiers.

Suggested wording fix
-# In agent mode, the top-level loop uses the default provider,
-# lint uses the default provider unless a call passes model,
-# and delegated sub-agents do the same.
+# In agent mode, the top-level loop defaults to high-capability.
+# Bundled lint calls default to mid-capability unless model is explicitly set.
+# Delegated sub-agents default to high-capability when model is omitted.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# In agent mode, the top-level loop uses the default provider,
# lint uses the default provider unless a call passes model,
# and delegated sub-agents do the same.
# In agent mode, the top-level loop defaults to high-capability.
# Bundled lint calls default to mid-capability unless model is explicitly set.
# Delegated sub-agents default to high-capability when model is omitted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config/global-config.ts` around lines 18 - 20, Replace the outdated
template comment that begins with "# In agent mode, the top-level loop uses the
default provider, lint uses the default provider unless a call passes model, and
delegated sub-agents do the same." in src/config/global-config.ts so it reflects
the new defaults: top-level agent work should state it defaults to
high-capability, the bundled "lint" should state it defaults to mid-capability,
and delegated "agent" calls should state they default to high-capability when
"model" is omitted; update the wording accordingly and keep the same comment
block location and formatting.

Comment thread tests/agent/agent-executor.test.ts Outdated
- Extract resolvePresetsDir() into src/cli/preset-resolution.ts
- Both commands.ts and validate-command.ts now import and call the shared utility
- Utility handles dev mode (src/cli/ → ../../presets) and built mode (dist/cli/ → ../presets) via meta.json detection
- Eliminates path divergence risk between validate and main commands across execution modes
- Centralized rule-matching logic into `src/rules/matched-prompts.ts`
  with a single `resolveMatchedPromptsForFile()` export
- Replaced inline glob/pack/override resolution in `src/cli/file-evaluation.ts`
  with a call to the shared service
- Replaced inline glob/pack/override resolution in `src/agent/executor.ts`
  (`buildFileRuleMatches`) with a call to the shared service
- Eliminates divergence risk between standard and agent evaluation paths:
  both now use identical matching semantics and will stay in sync
- Add NoConfigurationFoundError to the errors module with code NO_CONFIGURATION_FOUND
- Throw NoConfigurationFoundError from ScanPathResolver instead of bare new Error(...)
- Replace string-based isNoConfigurationFoundError guard in executor.ts with instanceof check
- Remove NO_CONFIGURATION_FOUND_PREFIX constant from executor.ts
- Catch NoConfigurationFoundError in file-evaluation.ts and skip files gracefully
- Move printFileHeader call to after resolveMatchedPromptsForFile succeeds
  so orphaned headers are never printed for files with no matching config
- Add tests/file-evaluation.test.ts with a unit test verifying evaluateFile
  returns zeroed metrics (no throw) when the file matches no scanPaths entry
- Assert tokenUsage.totalInputTokens and totalOutputTokens are both 0
  on the zeroed-metrics result
- Add afterEach(() => vi.restoreAllMocks()) to clean up console spies
  between tests
- Extract IssueSink interface + 4 format-specific implementations
  (LineIssueSink, ValeJsonIssueSink, StructuredIssueSink, JsonIssueSink)
  into result-routing/issue-sink.ts; createIssueSink() factory owns all
  format branching so routing logic stays format-agnostic
- Extract check evaluation routing into result-routing/check-routing.ts;
  handles violation grouping by criterion, score calculation, and error
  tracking for check-mode prompts
- Extract judge evaluation routing into result-routing/judge-routing.ts;
  handles per-criterion extraction, score normalization, target validation,
  and EvaluationScore accumulation for judge-mode prompts
- Extract debug artifact writing into result-routing/debug-artifact.ts;
  writeCheckRoutingDebugArtifact and writeJudgeRoutingDebugArtifact are
  now isolated from the routing logic
- result-routing.ts reduced to ~50-line coordinator: creates sink, guards
  on isJudgeResult, delegates to routeCheckResult or routeJudgeResult
- issue-output.ts is now a thin helper: buildRuleName, getViolationFilterResults,
  locateAndReportViolations — all format-agnostic, delegating to IssueSink
- Reduces cognitive complexity of the original monolithic result-routing file;
  format branching is isolated to a single createIssueSink() factory
- Construct IssueSink once per agent evaluation run in evaluateFilesInAgentMode
- Pass sink into reportAgentFinding instead of raw outputFormat/jsonFormatter
- Remove reportIssue() wrapper from issue-output.ts (no remaining call sites)
- Remove toSinkIssue() private helper (only served reportIssue)
- Remove ReportIssueParams interface from types.ts (no longer referenced)
- issue-output.ts is now a thin helper delegating to IssueSink, with no inline format branching
…types

- Change StructuredIssueSink.formatter from private to protected so
  JsonIssueSink can access it via cast instead of redeclaring it
- Remove redundant private readonly jsonFormatter in JsonIssueSink;
  addEvaluationScore now casts this.formatter to JsonFormatter
- Remove unused interfaces from src/cli/types.ts:
  ProcessViolationsParams, ProcessCriterionParams,
  ProcessCriterionResult, ValidationParams
- Drop imports that became orphaned after the removals:
  PromptMeta, PromptCriterionSpec, JudgeResult, Severity, ScoreComponent
- Removed src/errors/validation-errors.ts: confirmed unused via grep (0 import hits outside the file itself)
- Removed src/boundaries/yaml-parser.ts: confirmed unused via grep (0 consumer hits; only reference was re-export in boundaries/index.ts, which is also removed)
- Moved @types/micromatch from dependencies to devDependencies (compile-time types only, not needed at runtime)
- Fixed async-without-await lint error in tests/rule-matching.test.ts introduced by a prior mock
…t params

- Remove RequestBuilder class and EvalContext type; runPromptStructured
  now takes (systemPrompt, userMessage, schema) — callers own prompt
  assembly instead of delegating to an injected builder
- Add src/prompts/system-prompt.ts: composeSystemPrompt() joins
  directive, userInstructions, and rule body with double newlines,
  dropping empty sections
- Add StructuredPromptContext to evaluator-registry; BaseEvaluator
  gains buildSystemPrompt() that calls composeSystemPrompt() so the
  directive and user-instruction context flows into every LLM call
- Pass structuredPromptContext through TechnicalAccuracyEvaluator and
  its nested BaseEvaluator so the claim-extraction path also benefits
- Remove builder injection from provider-factory and
  capability-provider-resolver; VercelAIProvider constructor simplified
- findings.ts: appendInlineFinding locates quoted text, applies the
  violation filter, and appends inline finding events to the session
  store; findingsFromEvents reconstructs AgentFindingRecord list from
  stored events
- lint-prompt.ts: buildMergedLintPrompt assembles a single merged
  rule body from multiple LintRuleCall entries; buildEffectiveRuleBody
  applies per-call reviewInstruction and context overrides
- tool-handlers.ts: createToolHandlers returns all AgentToolHandlers
  (lint, report_finding, read_file, search_files, list_directory,
  search_content, agent, finalize_review) as a single factory so the
  executor stays thin and handlers can be tested independently
- Update agent-executor tests to reflect the extracted module boundaries
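The composeSystemPrompt() behavior described in the commit notes above — join directive, userInstructions, and rule body with double newlines, dropping empty sections — can be sketched as below. The signature is assumed; the real function in src/prompts/system-prompt.ts may accept a different shape:

```typescript
function composeSystemPrompt(
  directive: string,
  userInstructions: string | undefined,
  ruleBody: string,
): string {
  // Trim each section, drop empty ones, and join the rest with a blank line.
  return [directive, userInstructions ?? '', ruleBody]
    .map((section) => section.trim())
    .filter((section) => section.length > 0)
    .join('\n\n');
}
```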

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 6

♻️ Duplicate comments (1)
src/agent/tool-handlers.ts (1)

321-325: ⚠️ Potential issue | 🟠 Major

Route bare agent() calls through the high-capability resolver.

Falling back straight to defaultProvider skips the new capability-tier mapping on agent({ task }) calls when model is omitted. If that injected default ever differs from the configured high-cap provider, delegation will silently run on the wrong model, and the omission-model test will keep locking that behavior in.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tool-handlers.ts` around lines 321 - 325, The handler currently
falls back to defaultProvider when parsed.model is absent, bypassing the
capability-tier resolver; change agentToolHandler so it always uses
resolveCapabilityProvider for determining the provider (i.e., remove the direct
defaultProvider fallback and call resolveCapabilityProvider with the
parsed.model even when it's undefined/omitted) so bare agent({ task }) calls are
routed through the high-capability resolver rather than directly to
defaultProvider; update any related usage of AGENT_TOOL_INPUT_SCHEMA.parse and
the provider selection logic accordingly.
🧹 Nitpick comments (10)
src/cli/preset-resolution.ts (1)

10-21: Extract shared path/meta literals into constants

These literals represent shared runtime concepts ('../presets', '../../presets', 'meta.json'). Defining constants here avoids drift and makes future resolver changes safer.

♻️ Suggested refactor
 export function resolvePresetsDir(dir: string): string {
-  const buildPath = path.resolve(dir, '../presets');
-  if (existsSync(path.join(buildPath, 'meta.json'))) {
+  const BUILD_PRESETS_RELATIVE = '../presets';
+  const DEV_PRESETS_RELATIVE = '../../presets';
+  const PRESETS_META_FILE = 'meta.json';
+
+  const buildPath = path.resolve(dir, BUILD_PRESETS_RELATIVE);
+  if (existsSync(path.join(buildPath, PRESETS_META_FILE))) {
     return buildPath;
   }
 
   // Dev mode fallback: src/cli/ → ../../presets
-  const devPath = path.resolve(dir, '../../presets');
-  if (existsSync(path.join(devPath, 'meta.json'))) {
+  const devPath = path.resolve(dir, DEV_PRESETS_RELATIVE);
+  if (existsSync(path.join(devPath, PRESETS_META_FILE))) {
     return devPath;
   }

As per coding guidelines: "Define shared domain constants, enums, or types for core runtime concepts; avoid magic strings in shared runtime code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/preset-resolution.ts` around lines 10 - 21, Extract the magic strings
into module-level constants (e.g., PRESETS_BUILD_RELATIVE = '../presets',
PRESETS_DEV_RELATIVE = '../../presets', PRESETS_META_FILENAME = 'meta.json') and
replace uses of the literals in the preset-resolution logic (references:
buildPath, devPath and the existsSync(path.join(..., 'meta.json')) checks) to
use those constants; ensure the resolved buildPath and devPath are computed
using the new constants and that the thrown Error message remains descriptive
(you may interpolate the constants instead of raw strings).
tests/preset-resolution.test.ts (1)

47-55: Add an explicit precedence test for dist/cli when both candidates exist

Current coverage validates fallback, but not the “first candidate wins” contract when both dist/presets/meta.json and presets/meta.json are present. A targeted case will prevent future regression in candidate ordering.

🧪 Suggested test addition
+  it('prefers dist/presets when both dist and root presets exist from dist/cli', () => {
+    const root = setupTree({
+      'dist/presets/meta.json': JSON.stringify({ presets: { dist: true } }),
+      'presets/meta.json': JSON.stringify({ presets: { root: true } }),
+    });
+
+    const resolved = resolvePresetsDir(path.join(root, 'dist', 'cli'));
+
+    expect(resolved).toBe(path.join(root, 'dist', 'presets'));
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/preset-resolution.test.ts` around lines 47 - 55, Add a test that
asserts candidate precedence by creating both 'dist/presets/meta.json' and
'presets/meta.json' in the test tree, then call resolvePresetsDir with a
starting path under 'dist/cli' and expect the function to return the
'dist/presets' path (i.e., first candidate wins). Use the same helpers used
elsewhere in the file (setupTree, resolvePresetsDir, path.join) to construct the
tree and assert the expected resolved directory.
tests/provider-factory.test.ts (1)

30-33: Utility helper for test assertions.

The getProviderModelName helper safely navigates the nested provider config structure. Consider adding a brief inline comment documenting why this specific path (config.model.model) is expected, to help future maintainers understand the provider structure.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/provider-factory.test.ts` around lines 30 - 33, The helper
getProviderModelName safely returns provider.config?.model?.model but lacks
context; add a one-line inline comment above the function stating that provider
objects store model identifier under config.model.model (e.g., config holds
provider settings and model is an object with a model string) so future
maintainers understand why the nested path is used; keep the comment concise and
colocated with the getProviderModelName declaration.
src/cli/result-routing/debug-artifact.ts (2)

34-34: Each artifact gets a unique runId.

A new UUID is generated per writeArtifact call, meaning each file in a single CLI run gets a separate runId. If correlation across files in the same run is needed for debugging, consider passing a shared runId from the caller.
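One way to sketch the shared-runId pattern. `buildArtifactEnvelope` is a hypothetical stand-in for the real `writeArtifact`, which persists more fields; the point is only the optional-parameter shape:

```typescript
import { randomUUID } from 'crypto';

interface ArtifactEnvelope {
  runId: string;
  file: string;
}

// Accept an optional shared runId; only mint a fresh UUID when the caller
// does not supply one, so a CLI entrypoint can correlate all files in a run.
function buildArtifactEnvelope(file: string, runId?: string): ArtifactEnvelope {
  return { runId: runId ?? randomUUID(), file };
}
```

A CLI entrypoint would then call `randomUUID()` once per run and pass the same value to every invocation.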

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/result-routing/debug-artifact.ts` at line 34, The current
debug-artifact implementation generates a new runId via randomUUID() inside the
module, so every call to writeArtifact gets a distinct runId; change
writeArtifact (and its callers) to accept an optional runId parameter (or accept
a shared runId from the higher-level CLI entrypoint) and only generate a new
randomUUID() inside writeArtifact when no runId is provided; update references
to the symbol writeArtifact (and any factory/constructor that creates debug
artifacts) to pass a shared runId when correlation across files in a single CLI
run is required.

30-49: Consider using injected logger instead of console.warn.

The coding guidelines recommend routing runtime logging through an injected logger interface. While console.warn is acceptable for debug output, using the logger abstraction would provide consistency and allow operators to control log routing.
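A minimal sketch of the injection pattern, assuming a two-level `Logger` interface; the repository's real `Logger` abstraction may expose more levels or a different shape:

```typescript
// Assumed minimal logger abstraction.
interface Logger {
  warn(message: string): void;
  error(message: string): void;
}

// Route failures through the injected logger instead of console.warn,
// so operators control where diagnostics are written.
function writeArtifactSafely(write: () => void, logger: Logger): void {
  try {
    write();
  } catch (err: unknown) {
    const detail = err instanceof Error ? err.message : String(err);
    logger.warn(`failed to write debug artifact: ${detail}`);
  }
}
```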

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/result-routing/debug-artifact.ts` around lines 30 - 49, The
writeArtifact function currently uses console.warn; change it to use the
project's injected logger abstraction instead: update the writeArtifact
signature in src/cli/result-routing/debug-artifact.ts to accept a Logger (or
obtain the logger via the existing injection helper used elsewhere), then
replace both console.warn calls with logger.warn (or logger.error for the catch
if your logger has levels). Ensure you reference the same Logger interface/type
used elsewhere in the codebase and propagate the new parameter to callers of
writeArtifact (or use the central getLogger helper) so runtime logs are routed
through the logger abstraction.
src/rules/matched-prompts.ts (1)

52-53: Consider injecting ScanPathResolver for testability.

A new ScanPathResolver instance is created on each call. While functionally correct (the resolver is stateless), injecting it as an optional parameter would improve testability and align with the codebase's dependency injection patterns.

♻️ Optional: Accept resolver as parameter
 export function resolveMatchedPromptsForFile(params: {
   filePath: string;
   prompts: PromptFile[];
   scanPaths: FilePatternConfig[];
+  resolver?: ScanPathResolver;
 }): MatchedPromptResolution {
-  const { filePath, prompts, scanPaths } = params;
+  const { filePath, prompts, scanPaths, resolver = new ScanPathResolver() } = params;
   const availablePacks = getAvailablePacks(prompts);

   if (scanPaths.length === 0) {
     return {
       prompts: [...prompts],
       packs: availablePacks,
       overrides: {},
     };
   }

-  const resolver = new ScanPathResolver();
   const resolution = resolver.resolveConfiguration(filePath, scanPaths, availablePacks);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/rules/matched-prompts.ts` around lines 52 - 53, The code instantiates a
new ScanPathResolver inside the function; change the function that contains
"const resolver = new ScanPathResolver();" to accept an optional
ScanPathResolver parameter (e.g., resolver?: ScanPathResolver) and default it to
new ScanPathResolver() when undefined, then replace the direct instantiation
with the parameter and call resolver.resolveConfiguration(filePath, scanPaths,
availablePacks); update any callers to pass a mocked or real resolver in tests
as needed so resolveConfiguration remains injectable for unit tests.
tests/result-routing/judge-routing.test.ts (1)

123-124: Use try/finally for console spy restoration.

If an assertion fails before Line 157, console mocks can leak into later tests.

Safer spy lifecycle pattern
   it('reports missing targets without relying on formatter branches', () => {
     const sink = new RecordingSink();
     const errorSpy = vi.spyOn(console, 'error').mockImplementation(() => {});
     const warnSpy = vi.spyOn(console, 'warn').mockImplementation(() => {});
-
-    const result = routeJudgeResult({
-      promptFile: makePromptFile({
-        criteria: [
-          {
-            id: 'Clarity',
-            name: 'Clarity',
-            weight: 2,
-            target: { regex: '^## Required$', flags: 'm', required: true, suggestion: 'Add ## Required' },
-          },
-        ],
-      }),
-      result: makeJudgeResult(),
-      content: 'No heading here',
-      relFile: 'doc.md',
-      sink,
-    });
-
-    expect(result).toMatchObject({
-      errors: 1,
-      warnings: 0,
-      hadOperationalErrors: false,
-      hadSeverityErrors: true,
-      scoreEntries: [{ id: 'Default.Review.Clarity', scoreText: '0.0/10', score: 0 }],
-    });
-    expect(sink.issues[0]).toMatchObject({
-      summary: 'target not found',
-      ruleName: 'Default.Review.Clarity',
-      severity: Severity.ERROR,
-      suggestion: 'Add ## Required',
-    });
-
-    errorSpy.mockRestore();
-    warnSpy.mockRestore();
+    try {
+      const result = routeJudgeResult({
+        promptFile: makePromptFile({
+          criteria: [
+            {
+              id: 'Clarity',
+              name: 'Clarity',
+              weight: 2,
+              target: { regex: '^## Required$', flags: 'm', required: true, suggestion: 'Add ## Required' },
+            },
+          ],
+        }),
+        result: makeJudgeResult(),
+        content: 'No heading here',
+        relFile: 'doc.md',
+        sink,
+      });
+
+      expect(result).toMatchObject({
+        errors: 1,
+        warnings: 0,
+        hadOperationalErrors: false,
+        hadSeverityErrors: true,
+        scoreEntries: [{ id: 'Default.Review.Clarity', scoreText: '0.0/10', score: 0 }],
+      });
+      expect(sink.issues[0]).toMatchObject({
+        summary: 'target not found',
+        ruleName: 'Default.Review.Clarity',
+        severity: Severity.ERROR,
+        suggestion: 'Add ## Required',
+      });
+    } finally {
+      errorSpy.mockRestore();
+      warnSpy.mockRestore();
+    }
   });

Also applies to: 157-158

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/result-routing/judge-routing.test.ts` around lines 123 - 124, Wrap the
console spy setup (vi.spyOn(console, 'error') and vi.spyOn(console, 'warn')
stored in errorSpy and warnSpy) around the test execution using a try/finally so
the spies are restored in finally (call errorSpy.mockRestore() and
warnSpy.mockRestore()); do the same for the other console spies referenced later
(the ones at the second occurrence) so failures don’t leak mocked console
behavior into subsequent tests.
tests/file-evaluation.test.ts (1)

16-21: Clean up temp directories to avoid test artifact buildup.

Line 17 creates per-test temp dirs that are never removed. Consider deleting them in afterEach.

Suggested cleanup refactor
-import { mkdtempSync, writeFileSync } from 'fs';
+import { mkdtempSync, writeFileSync, rmSync } from 'fs';
@@
-function createTempFile(content: string): string {
+function createTempFile(content: string): { dir: string; filePath: string } {
   const dir = mkdtempSync(path.join(tmpdir(), 'vectorlint-file-eval-'));
   const filePath = path.join(dir, 'input.md');
   writeFileSync(filePath, content);
-  return filePath;
+  return { dir, filePath };
 }
@@
 describe('evaluateFile', () => {
+  const tempDirs: string[] = [];
+
@@
-  afterEach(() => vi.restoreAllMocks());
+  afterEach(() => {
+    vi.restoreAllMocks();
+    for (const dir of tempDirs.splice(0)) {
+      rmSync(dir, { recursive: true, force: true });
+    }
+  });
@@
-    const file = createTempFile('Hello world\n');
+    const { dir, filePath: file } = createTempFile('Hello world\n');
+    tempDirs.push(dir);

Also applies to: 30-30

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/file-evaluation.test.ts` around lines 16 - 21, The tests create
per-test temp dirs in createTempFile (uses mkdtempSync/tmpdir) but never remove
them; update the test file to track created temp directories (e.g., push each
dir returned by createTempFile into an array) and add an afterEach hook that
iterates the tracked dirs and removes them with a recursive/force removal call
(use fs.rmSync or equivalent), then clear the tracker; keep references to
createTempFile and afterEach so the cleanup is clearly tied to the temp-dir
creation.
src/cli/result-routing/judge-routing.ts (1)

64-79: Route these model-contract diagnostics through the injected logger.

This new shared runtime path writes directly to console.error/console.warn, which makes embedding and test control harder than the rest of the runtime. Please thread the existing logger abstraction through this router instead.

As per coding guidelines, "Route runtime logging through an injected logger interface; keep concrete logger implementations behind the abstraction".

Also applies to: 96-99

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/result-routing/judge-routing.ts` around lines 64 - 79, The loops in
judge-routing.ts write diagnostics directly to console.error/console.warn;
instead accept or use the injected logger abstraction (e.g., a Logger parameter
or existing router.logger) and replace console.error(...) with logger.error(...)
and console.warn(...) with logger.warn(...), preserving the same messages and
the hadErrors flag semantics for the missing-criterion check that uses
expectedNormalized/returnedNormalized/expectedOriginalMap, and make the same
replacements for the other occurrence around returnedNames (lines ~96-99).
tests/agent/agent-executor.test.ts (1)

1238-1242: Make the isolation assertion actually cover Rule 2.

These checks only prove Rule 1's reviewInstruction/context appear somewhere in the merged prompt. The test still passes if that text is copied into Rule 2 as well, so it doesn't really verify per-member isolation.
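The strengthened assertions could look roughly like this, expressed as plain checks rather than vitest `expect` calls; the fixture strings are taken from the review comment:

```typescript
// promptBodies[i] is the merged prompt built for rule i.
function assertRuleIsolation(promptBodies: string[]): void {
  // Rule 2's own content must appear in its prompt...
  if (!promptBodies[1].includes('Rule 2') || !promptBodies[1].includes('Check link targets.')) {
    throw new Error('Rule 2 prompt is missing its own instruction');
  }
  // ...and Rule 1's unique context must NOT leak into it.
  if (promptBodies[1].includes('Evidence from docs/glossary.md')) {
    throw new Error('Rule 1 context leaked into Rule 2 prompt');
  }
}
```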

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/agent/agent-executor.test.ts` around lines 1238 - 1242, The test
currently only checks promptBodies[0] and thus doesn't verify per-member
isolation; update the assertions to explicitly verify Rule 2's content in the
prompt body produced for Rule 2 (e.g., check promptBodies[1] or the element
corresponding to Rule 2) by asserting it contains 'Rule 2' and 'Check link
targets.' and also add a negative assertion that promptBodies[1] does not
contain Rule 1's unique text (like 'Required context for this review:\nEvidence
from docs/glossary.md') so each rule's reviewInstruction/context is isolated;
refer to the promptBodies array used in the test to locate where to add these
checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/agent/executor.ts`:
- Around line 377-381: You computed fileRuleMatches/matchedRuleUnits via
buildFileRuleMatches and buildMatchedRuleUnits (using promptBySource and
MATCHED_RULE_UNIT_TOKEN_BUDGET) but the handler and tool-loop still expose all
ruleSources; fix by building an allowedRulesByFile map from fileRuleMatches
(keys = normalized file paths, values = set of normalized ruleSource strings
using normalizeRuleSource) and pass that map into the lint handler and into the
tool-loop prompt generation so only matched sources are advertised; update the
lint(file, rules[]) handler to immediately reject or ignore any (file,
normalizeRuleSource(rule.ruleSource)) pair not present in allowedRulesByFile
(return an error or no-op) so globally validRules cannot be applied to unmatched
files, and apply the same restriction when constructing
matchedRuleUnits/promptBySource usage in the tool-loop.

In `@src/agent/tool-handlers.ts`:
- Around line 206-226: The payload is still persisting the raw parsed.references
instead of the per-iteration, normalized values; create a sanitizedReferences
array by mapping parsed.references (or the default single-item array) to
resolved, canonical entries using resolveTargetForTopLevel(workspaceRoot,
targets, reference.file) and normalized start/end/line values (as you do for
relFile and line) and then pass that sanitizedReferences into store.append's
payload (replace ...(parsed.references ? { references: parsed.references } : {})
with the normalized references object) so persisted events match the emitted
relFile and line values.

In `@src/cli/preset-resolution.ts`:
- Line 21: Replace the raw throw new Error(...) in preset-resolution.ts with the
repository's custom error type used for not-found or resolution failures (import
the appropriate class from the project's errors module, e.g.,
NotFoundError/RepositoryError/AppError) so the thrown exception participates in
the app's error hierarchy; preserve the original message but pass buildPath and
devPath (or attach them as metadata) to the custom error constructor, and update
any imports to pull the custom error class into the file.

In `@src/cli/result-routing/debug-artifact.ts`:
- Around line 7-28: The getModelInfoFromEnv function omits the 'bedrock'
provider so Bedrock model names aren't captured; add a switch case for 'bedrock'
in getModelInfoFromEnv (alongside cases for 'openai', 'anthropic', etc.) and
assign the model name from the appropriate env var (e.g.,
process.env.BEDROCK_MODEL) to the local name variable so the returned object
includes provider and name (and tag) when LLM_PROVIDER === 'bedrock'.

In `@src/cli/result-routing/issue-sink.ts`:
- Around line 46-58: The payload construction in issue-sink.ts is adding a
non‑declared field "fix" into the JsonIssue shape (const payload) which violates
the JsonIssue contract defined in src/output/vale-json-formatter.ts; either
remove the spread that injects issue.fix from the payload (delete or stop using
...(issue.fix !== undefined ? { fix: issue.fix } : {})) to keep output inside
the declared schema, or if "fix" is required, add it end-to-end by updating the
JsonIssue type in src/output/vale-json-formatter.ts and all related formatter
code and tests to include and validate the new fix field. Ensure changes target
the JsonIssue type and the payload construction (payload) and update tests
accordingly if you choose the second path.

---

Duplicate comments:
In `@src/agent/tool-handlers.ts`:
- Around line 321-325: The handler currently falls back to defaultProvider when
parsed.model is absent, bypassing the capability-tier resolver; change
agentToolHandler so it always uses resolveCapabilityProvider for determining the
provider (i.e., remove the direct defaultProvider fallback and call
resolveCapabilityProvider with the parsed.model even when it's
undefined/omitted) so bare agent({ task }) calls are routed through the
high-capability resolver rather than directly to defaultProvider; update any
related usage of AGENT_TOOL_INPUT_SCHEMA.parse and the provider selection logic
accordingly.

---

Nitpick comments:
In `@src/cli/preset-resolution.ts`:
- Around line 10-21: Extract the magic strings into module-level constants
(e.g., PRESETS_BUILD_RELATIVE = '../presets', PRESETS_DEV_RELATIVE =
'../../presets', PRESETS_META_FILENAME = 'meta.json') and replace uses of the
literals in the preset-resolution logic (references: buildPath, devPath and the
existsSync(path.join(..., 'meta.json')) checks) to use those constants; ensure
the resolved buildPath and devPath are computed using the new constants and that
the thrown Error message remains descriptive (you may interpolate the constants
instead of raw strings).

In `@src/cli/result-routing/debug-artifact.ts`:
- Line 34: The current debug-artifact implementation generates a new runId via
randomUUID() inside the module, so every call to writeArtifact gets a distinct
runId; change writeArtifact (and its callers) to accept an optional runId
parameter (or accept a shared runId from the higher-level CLI entrypoint) and
only generate a new randomUUID() inside writeArtifact when no runId is provided;
update references to the symbol writeArtifact (and any factory/constructor that
creates debug artifacts) to pass a shared runId when correlation across files in
a single CLI run is required.
- Around line 30-49: The writeArtifact function currently uses console.warn;
change it to use the project's injected logger abstraction instead: update the
writeArtifact signature in src/cli/result-routing/debug-artifact.ts to accept a
Logger (or obtain the logger via the existing injection helper used elsewhere),
then replace both console.warn calls with logger.warn (or logger.error for the
catch if your logger has levels). Ensure you reference the same Logger
interface/type used elsewhere in the codebase and propagate the new parameter to
callers of writeArtifact (or use the central getLogger helper) so runtime logs
are routed through the logger abstraction.

In `@src/cli/result-routing/judge-routing.ts`:
- Around line 64-79: The loops in judge-routing.ts write diagnostics directly to
console.error/console.warn; instead accept or use the injected logger
abstraction (e.g., a Logger parameter or existing router.logger) and replace
console.error(...) with logger.error(...) and console.warn(...) with
logger.warn(...), preserving the same messages and the hadErrors flag semantics
for the missing-criterion check that uses
expectedNormalized/returnedNormalized/expectedOriginalMap, and make the same
replacements for the other occurrence around returnedNames (lines ~96-99).

In `@src/rules/matched-prompts.ts`:
- Around line 52-53: The code instantiates a new ScanPathResolver inside the
function; change the function that contains "const resolver = new
ScanPathResolver();" to accept an optional ScanPathResolver parameter (e.g.,
resolver?: ScanPathResolver) and default it to new ScanPathResolver() when
undefined, then replace the direct instantiation with the parameter and call
resolver.resolveConfiguration(filePath, scanPaths, availablePacks); update any
callers to pass a mocked or real resolver in tests as needed so
resolveConfiguration remains injectable for unit tests.

In `@tests/agent/agent-executor.test.ts`:
- Around line 1238-1242: The test currently only checks promptBodies[0] and thus
doesn't verify per-member isolation; update the assertions to explicitly verify
Rule 2's content in the prompt body produced for Rule 2 (e.g., check
promptBodies[1] or the element corresponding to Rule 2) by asserting it contains
'Rule 2' and 'Check link targets.' and also add a negative assertion that
promptBodies[1] does not contain Rule 1's unique text (like 'Required context
for this review:\nEvidence from docs/glossary.md') so each rule's
reviewInstruction/context is isolated; refer to the promptBodies array used in
the test to locate where to add these checks.

In `@tests/file-evaluation.test.ts`:
- Around line 16-21: The tests create per-test temp dirs in createTempFile (uses
mkdtempSync/tmpdir) but never remove them; update the test file to track created
temp directories (e.g., push each dir returned by createTempFile into an array)
and add an afterEach hook that iterates the tracked dirs and removes them with a
recursive/force removal call (use fs.rmSync or equivalent), then clear the
tracker; keep references to createTempFile and afterEach so the cleanup is
clearly tied to the temp-dir creation.

In `@tests/preset-resolution.test.ts`:
- Around line 47-55: Add a test that asserts candidate precedence by creating
both 'dist/presets/meta.json' and 'presets/meta.json' in the test tree, then
call resolvePresetsDir with a starting path under 'dist/cli' and expect the
function to return the 'dist/presets' path (i.e., first candidate wins). Use the
same helpers used elsewhere in the file (setupTree, resolvePresetsDir,
path.join) to construct the tree and assert the expected resolved directory.

In `@tests/provider-factory.test.ts`:
- Around line 30-33: The helper getProviderModelName safely returns
provider.config?.model?.model but lacks context; add a one-line inline comment
above the function stating that provider objects store model identifier under
config.model.model (e.g., config holds provider settings and model is an object
with a model string) so future maintainers understand why the nested path is
used; keep the comment concise and colocated with the getProviderModelName
declaration.

In `@tests/result-routing/judge-routing.test.ts`:
- Around line 123-124: Wrap the console spy setup (vi.spyOn(console, 'error')
and vi.spyOn(console, 'warn') stored in errorSpy and warnSpy) around the test
execution using a try/finally so the spies are restored in finally (call
errorSpy.mockRestore() and warnSpy.mockRestore()); do the same for the other
console spies referenced later (the ones at the second occurrence) so failures
don’t leak mocked console behavior into subsequent tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa428ebb-28d2-419c-a121-caf070bd715f

📥 Commits

Reviewing files that changed from the base of the PR and between af86909 and 697f837.

📒 Files selected for processing (45)
  • package.json
  • src/agent/executor.ts
  • src/agent/findings.ts
  • src/agent/lint-prompt.ts
  • src/agent/tool-handlers.ts
  • src/boundaries/index.ts
  • src/boundaries/scan-path-resolver.ts
  • src/boundaries/yaml-parser.ts
  • src/cli/agent-evaluation.ts
  • src/cli/commands.ts
  • src/cli/file-evaluation.ts
  • src/cli/issue-output.ts
  • src/cli/preset-resolution.ts
  • src/cli/result-routing.ts
  • src/cli/result-routing/check-routing.ts
  • src/cli/result-routing/debug-artifact.ts
  • src/cli/result-routing/issue-sink.ts
  • src/cli/result-routing/judge-routing.ts
  • src/cli/types.ts
  • src/cli/validate-command.ts
  • src/errors/index.ts
  • src/errors/validation-errors.ts
  • src/evaluators/accuracy-evaluator.ts
  • src/evaluators/base-evaluator.ts
  • src/evaluators/evaluator-registry.ts
  • src/prompts/system-prompt.ts
  • src/providers/capability-provider-resolver.ts
  • src/providers/index.ts
  • src/providers/llm-provider.ts
  • src/providers/provider-factory.ts
  • src/providers/request-builder.ts
  • src/providers/vercel-ai-provider.ts
  • src/rules/matched-prompts.ts
  • tests/agent/agent-executor.test.ts
  • tests/file-evaluation.test.ts
  • tests/preset-resolution.test.ts
  • tests/provider-factory.test.ts
  • tests/request-builder.test.ts
  • tests/result-routing/check-routing.test.ts
  • tests/result-routing/issue-sink.test.ts
  • tests/result-routing/judge-routing.test.ts
  • tests/rule-matching.test.ts
  • tests/scoring-types.test.ts
  • tests/system-prompt.test.ts
  • tests/vercel-ai-provider.test.ts
💤 Files with no reviewable changes (6)
  • src/boundaries/index.ts
  • src/providers/index.ts
  • src/boundaries/yaml-parser.ts
  • src/providers/request-builder.ts
  • tests/request-builder.test.ts
  • src/errors/validation-errors.ts
✅ Files skipped from review due to trivial changes (3)
  • src/providers/capability-provider-resolver.ts
  • src/cli/issue-output.ts
  • src/cli/agent-evaluation.ts
🚧 Files skipped from review as they are similar to previous changes (4)
  • src/cli/commands.ts
  • src/providers/llm-provider.ts
  • src/providers/provider-factory.ts
  • src/cli/file-evaluation.ts

Comment thread src/agent/executor.ts Outdated
Comment on lines +206 to +226
const references = parsed.references && parsed.references.length > 0
? parsed.references
: [{ file: resolveTargetForTopLevel(workspaceRoot, targets), startLine: 1, endLine: 1 }];

let findingsRecorded = 0;
for (const reference of references) {
const relFile = resolveTargetForTopLevel(workspaceRoot, targets, reference.file);
onFindingRecorded();
findingsRecorded += 1;
await store.append({
eventType: SESSION_EVENT_TYPE.FindingRecordedTopLevel,
payload: {
file: relFile,
line: Math.max(1, Math.trunc(reference.startLine)),
column: 1,
severity: severityFromPrompt(prompt),
ruleId: buildRuleId(prompt),
ruleSource: normalizeRuleSource(parsed.ruleSource),
message: parsed.message,
...(parsed.suggestion ? { suggestion: parsed.suggestion } : {}),
...(parsed.references ? { references: parsed.references } : {}),


⚠️ Potential issue | 🟠 Major

Persist the sanitized references, not parsed.references.

Each reference is resolved through resolveTargetForTopLevel(), but the event payload still writes the raw model-supplied array. That can leave absolute or non-canonical in-repo paths in persisted session events/operator artifacts, and those paths won't match the normalized file you emit for the same finding.

🧹 One way to normalize before persisting
-    const references = parsed.references && parsed.references.length > 0
-      ? parsed.references
-      : [{ file: resolveTargetForTopLevel(workspaceRoot, targets), startLine: 1, endLine: 1 }];
+    const references = (parsed.references && parsed.references.length > 0
+      ? parsed.references
+      : [{ file: resolveTargetForTopLevel(workspaceRoot, targets), startLine: 1, endLine: 1 }]
+    ).map((reference) => ({
+      file: resolveTargetForTopLevel(workspaceRoot, targets, reference.file),
+      startLine: Math.max(1, Math.trunc(reference.startLine)),
+      endLine: Math.max(1, Math.trunc(reference.endLine)),
+    }));
@@
-      const relFile = resolveTargetForTopLevel(workspaceRoot, targets, reference.file);
+      const relFile = reference.file;
@@
-          ...(parsed.references ? { references: parsed.references } : {}),
+          references,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/agent/tool-handlers.ts` around lines 206 - 226, The payload is still
persisting the raw parsed.references instead of the per-iteration, normalized
values; create a sanitizedReferences array by mapping parsed.references (or the
default single-item array) to resolved, canonical entries using
resolveTargetForTopLevel(workspaceRoot, targets, reference.file) and normalized
start/end/line values (as you do for relFile and line) and then pass that
sanitizedReferences into store.append's payload (replace ...(parsed.references ?
{ references: parsed.references } : {}) with the normalized references object)
so persisted events match the emitted relFile and line values.

return devPath;
}

throw new Error(`Could not locate presets directory containing meta.json. Looked in ${buildPath} and ${devPath}`);

⚠️ Potential issue | 🟠 Major

Use repository error types instead of raw Error

Line 21 throws new Error(...), which bypasses the custom error hierarchy used in src/ and weakens typed error handling consistency.

As per coding guidelines: "Prefer the repository's custom error hierarchy over native Error; catch blocks use unknown type and extend existing custom error types before introducing raw exceptions."
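A sketch of what a conforming error type could look like. `PresetResolutionError` is a hypothetical name; the repository's errors module (e.g. `src/errors/index.ts`) may already provide a suitable base class to extend instead:

```typescript
// Hypothetical custom error carrying the searched paths as metadata.
class PresetResolutionError extends Error {
  constructor(
    message: string,
    public readonly searchedPaths: string[],
  ) {
    super(message);
    this.name = 'PresetResolutionError';
  }
}

// Preserves the original message while attaching buildPath/devPath metadata.
function failResolution(buildPath: string, devPath: string): never {
  throw new PresetResolutionError(
    `Could not locate presets directory containing meta.json. Looked in ${buildPath} and ${devPath}`,
    [buildPath, devPath],
  );
}
```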

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/preset-resolution.ts` at line 21, Replace the raw throw new
Error(...) in preset-resolution.ts with the repository's custom error type used
for not-found or resolution failures (import the appropriate class from the
project's errors module, e.g., NotFoundError/RepositoryError/AppError) so the
thrown exception participates in the app's error hierarchy; preserve the
original message but pass buildPath and devPath (or attach them as metadata) to
the custom error constructor, and update any imports to pull the custom error
class into the file.

Comment on lines +7 to +28
function getModelInfoFromEnv(): { provider?: string; name?: string; tag?: string } {
const provider = process.env.LLM_PROVIDER;
let name: string | undefined;

switch (provider) {
case 'openai':
name = process.env.OPENAI_MODEL;
break;
case 'anthropic':
name = process.env.ANTHROPIC_MODEL;
break;
case 'azure-openai':
name = process.env.AZURE_OPENAI_DEPLOYMENT_NAME;
break;
case 'gemini':
name = process.env.GEMINI_MODEL;
break;
}

const tag = [provider, name].filter(Boolean).join('-');
return { ...(provider && { provider }), ...(name && { name }), ...(tag && { tag }) };
}

⚠️ Potential issue | 🟡 Minor

Missing Bedrock provider case in model info extraction.

The switch statement handles openai, anthropic, azure-openai, and gemini, but omits the bedrock provider. When Bedrock is configured, the model name won't be captured in debug artifacts.

🔧 Add Bedrock case
     case 'gemini':
       name = process.env.GEMINI_MODEL;
       break;
+    case 'bedrock':
+      name = process.env.BEDROCK_MODEL;
+      break;
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function getModelInfoFromEnv(): { provider?: string; name?: string; tag?: string } {
const provider = process.env.LLM_PROVIDER;
let name: string | undefined;
switch (provider) {
case 'openai':
name = process.env.OPENAI_MODEL;
break;
case 'anthropic':
name = process.env.ANTHROPIC_MODEL;
break;
case 'azure-openai':
name = process.env.AZURE_OPENAI_DEPLOYMENT_NAME;
break;
case 'gemini':
name = process.env.GEMINI_MODEL;
break;
}
const tag = [provider, name].filter(Boolean).join('-');
return { ...(provider && { provider }), ...(name && { name }), ...(tag && { tag }) };
}
function getModelInfoFromEnv(): { provider?: string; name?: string; tag?: string } {
const provider = process.env.LLM_PROVIDER;
let name: string | undefined;
switch (provider) {
case 'openai':
name = process.env.OPENAI_MODEL;
break;
case 'anthropic':
name = process.env.ANTHROPIC_MODEL;
break;
case 'azure-openai':
name = process.env.AZURE_OPENAI_DEPLOYMENT_NAME;
break;
case 'gemini':
name = process.env.GEMINI_MODEL;
break;
case 'bedrock':
name = process.env.BEDROCK_MODEL;
break;
}
const tag = [provider, name].filter(Boolean).join('-');
return { ...(provider && { provider }), ...(name && { name }), ...(tag && { tag }) };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/result-routing/debug-artifact.ts` around lines 7-28: the
getModelInfoFromEnv function omits the 'bedrock' provider, so Bedrock model
names aren't captured. Add a 'bedrock' case to the switch in getModelInfoFromEnv
(alongside the cases for 'openai', 'anthropic', etc.) and assign the model name
from the appropriate env var (e.g., process.env.BEDROCK_MODEL) to the local
name variable, so the returned object includes provider, name, and tag when
LLM_PROVIDER === 'bedrock'.

Comment on lines +46 to +58
      const payload: JsonIssue = {
        file: issue.file,
        line: issue.line,
        column: issue.column,
        severity: issue.severity,
        message: issue.summary,
        rule: issue.ruleName,
        match: issue.match || '',
        matchLength: issue.match ? issue.match.length : 0,
        ...(issue.suggestion !== undefined ? { suggestion: issue.suggestion } : {}),
        ...(issue.fix !== undefined ? { fix: issue.fix } : {}),
        ...(issue.scoreText !== undefined ? { score: issue.scoreText } : {}),
      };
⚠️ Potential issue | 🟡 Minor

Keep Vale JSON output inside the declared formatter schema.

src/output/vale-json-formatter.ts's JsonIssue shape doesn't define fix, so adding it here can leak a field the Vale JSON formatter contract never promised. Either drop it here or add fix end-to-end in the formatter/type/tests.

✂️ Minimal fix
     const payload: JsonIssue = {
       file: issue.file,
       line: issue.line,
       column: issue.column,
       severity: issue.severity,
       message: issue.summary,
       rule: issue.ruleName,
       match: issue.match || '',
       matchLength: issue.match ? issue.match.length : 0,
       ...(issue.suggestion !== undefined ? { suggestion: issue.suggestion } : {}),
-      ...(issue.fix !== undefined ? { fix: issue.fix } : {}),
       ...(issue.scoreText !== undefined ? { score: issue.scoreText } : {}),
     };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/cli/result-routing/issue-sink.ts` around lines 46-58: the payload
construction adds a non-declared field "fix" to the JsonIssue shape, violating
the JsonIssue contract defined in src/output/vale-json-formatter.ts. Either
remove the spread that injects issue.fix (...(issue.fix !== undefined ? { fix:
issue.fix } : {})) to keep the output inside the declared schema, or, if "fix"
is required, add it end-to-end by updating the JsonIssue type in
src/output/vale-json-formatter.ts and all related formatter code and tests to
include and validate the new field.
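If the end-to-end path is taken, the change is mechanical: declare `fix` as optional on JsonIssue so the existing spread stays inside the contract. A hedged sketch (field list reconstructed from the payload in the comment; the real JsonIssue in vale-json-formatter.ts may differ):

```typescript
// Hypothetical JsonIssue extension: `fix` declared as optional so the
// conditional spread in issue-sink.ts no longer leaks an undeclared field.
interface JsonIssue {
  file: string;
  line: number;
  column: number;
  severity: string;
  message: string;
  rule: string;
  match: string;
  matchLength: number;
  suggestion?: string;
  fix?: string;
  score?: string;
}

// Mirrors the payload construction shown in the comment above.
function toJsonIssue(issue: {
  file: string; line: number; column: number; severity: string;
  summary: string; ruleName: string; match?: string;
  suggestion?: string; fix?: string; scoreText?: string;
}): JsonIssue {
  return {
    file: issue.file,
    line: issue.line,
    column: issue.column,
    severity: issue.severity,
    message: issue.summary,
    rule: issue.ruleName,
    match: issue.match || '',
    matchLength: issue.match ? issue.match.length : 0,
    ...(issue.suggestion !== undefined ? { suggestion: issue.suggestion } : {}),
    ...(issue.fix !== undefined ? { fix: issue.fix } : {}),
    ...(issue.scoreText !== undefined ? { score: issue.scoreText } : {}),
  };
}
```

The conditional spreads keep absent fields out of the serialized output entirely, which matters for consumers that distinguish "missing" from "undefined".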

Comment on lines +189 to +207
if (!got) {
  return {
    errors: 0,
    warnings: 0,
    userScore: 0,
    maxScore,
    hadOperationalErrors,
    hadSeverityErrors,
    scoreEntry: { id: ruleName, scoreText: '-', score: 0.0 },
    scoreComponent: {
      criterion: nameKey,
      rawScore: 0,
      maxScore: 4,
      weightedScore: 0,
      weightedMaxScore: weightNum,
      normalizedScore: 0,
      normalizedMaxScore: 10,
    },
  };
⚠️ Potential issue | 🟠 Major

Don't turn a missing criterion into a real zero score.

validateCriteriaCompleteness() already treats an omitted expected criterion as an operational error, but this branch still returns score: 0.0 plus a zeroed scoreComponent. That makes malformed model output look like a genuine content failure and will drag downstream score aggregates down incorrectly.

Possible direction
   if (!got) {
     return {
       errors: 0,
       warnings: 0,
       userScore: 0,
       maxScore,
-      hadOperationalErrors,
+      hadOperationalErrors: true,
       hadSeverityErrors,
-      scoreEntry: { id: ruleName, scoreText: '-', score: 0.0 },
-      scoreComponent: {
-        criterion: nameKey,
-        rawScore: 0,
-        maxScore: 4,
-        weightedScore: 0,
-        weightedMaxScore: weightNum,
-        normalizedScore: 0,
-        normalizedMaxScore: 10,
-      },
+      scoreEntry: { id: ruleName, scoreText: '-' },
     };
   }
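The reason the '-' placeholder matters: downstream aggregation should skip it rather than average in a zero. A hedged sketch (entry shape assumed, not taken from the PR):

```typescript
interface ScoreEntry { id: string; scoreText: string; score?: number; }

// Average only entries carrying a real numeric score; '-' placeholders
// (operational errors) neither raise nor lower the aggregate.
function aggregateScores(entries: ScoreEntry[]): number | undefined {
  const scored = entries.filter(e => e.score !== undefined);
  if (scored.length === 0) return undefined;
  return scored.reduce((sum, e) => sum + (e.score as number), 0) / scored.length;
}
```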

- agent tool now uses resolveCapabilityProvider('high-cap') when no
  model tier is specified, rather than falling back to defaultProvider
- remove 10.0/10 score for rules that never matched any file —
  unmatched rules are now omitted from agent score output entirely
- make criterion name lookup case-insensitive in check routing
- use ConfigError for missing criteria in specialized evaluators
- pack-qualify allScores key to prevent cross-pack name collisions
- filter empty packName in buildRuleName to prevent leading-dot names
- update global config comment to reflect new sub-agent defaults
- move Gemini/Bedrock env-parser test to its own describe block
@oshorefueled
Contributor Author

Implementation Artifact

1) Outcome and scope

Goal: Ship agent-mode capability tiers, deterministic rule-to-file
mapping, merged lint calls, and sub-agent delegation as a coherent
feature; simultaneously refactor the CLI evaluation pipeline to eliminate
duplicated routing and path logic.

Non-goals: No new evaluator types, no rule file format changes, no
search provider changes.

2) Capability changes

After merge, you can:

  • Configure per-tier models for each provider via new env vars
    ({PROVIDER}_HIGH_CAPABILITY_MODEL, _MID_, _LOW_; Azure uses
    _DEPLOYMENT_NAME suffix). Tier resolution falls back upward:
    low-cap → mid-cap → high-cap → provider default.
  • Pass model: "high-cap" | "mid-cap" | "low-cap" on lint and
    agent tool calls to route that specific call to a different model.
  • Delegate document-exploration tasks to a read-only sub-agent via the
    agent tool; when model is omitted the sub-agent uses high-cap.
  • Expect merged lint calls: all rules applicable to a given file are
    bundled into one structured LLM call (lint tool), not N separate
    calls.
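A minimal sketch of the upward fallback, assuming the env-var naming from the contract table in section 3 (the real resolver also caches provider instances per run):

```typescript
type Tier = 'low-cap' | 'mid-cap' | 'high-cap';

// Upward fallback order for each requested tier.
const FALLBACK: Record<Tier, Tier[]> = {
  'low-cap': ['low-cap', 'mid-cap', 'high-cap'],
  'mid-cap': ['mid-cap', 'high-cap'],
  'high-cap': ['high-cap'],
};

const ENV_SUFFIX: Record<Tier, string> = {
  'low-cap': 'LOW_CAPABILITY_MODEL',
  'mid-cap': 'MID_CAPABILITY_MODEL',
  'high-cap': 'HIGH_CAPABILITY_MODEL',
};

// Resolve a model name for the requested tier, walking the fallback chain
// upward and landing on the provider default when nothing is configured.
function resolveTierModel(
  env: Record<string, string | undefined>,
  providerPrefix: string,          // e.g. 'OPENAI'
  tier: Tier,
  providerDefault: string,
): string {
  for (const t of FALLBACK[tier]) {
    const name = env[`${providerPrefix}_${ENV_SUFFIX[t]}`];
    if (name) return name;
  }
  return providerDefault;
}
```

So a call requesting low-cap with only OPENAI_HIGH_CAPABILITY_MODEL set resolves to the high-cap model, and with nothing set it resolves to the provider default.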

Changed behavior:

  • Rules that never matched a file are excluded from the per-rule score
    output (previously awarded a phantom 10.0/10).
  • buildRuleName now omits the leading dot when the pack name is empty.
  • Criterion name lookup is case-insensitive in check routing.
  • Score summary keys are pack-qualified to prevent cross-pack collision.
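Two of the behavior changes reduce to small helpers; a hedged sketch (buildRuleName is the real function name, findCriterion is a hypothetical stand-in for the check-routing lookup):

```typescript
// Join pack and rule name, omitting the leading dot when pack is empty.
function buildRuleName(packName: string, ruleName: string): string {
  return [packName, ruleName].filter(Boolean).join('.');
}

// Case-insensitive criterion lookup: attribute a violation to the right
// criterion even when the LLM response uses different casing than the YAML.
function findCriterion<T>(byName: Record<string, T>, name: string): T | undefined {
  const key = Object.keys(byName).find(k => k.toLowerCase() === name.toLowerCase());
  return key !== undefined ? byName[key] : undefined;
}
```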

3) Contract changes

New env vars (all optional, all per-provider):

Variable                                                 Providers
OPENAI_HIGH/MID/LOW_CAPABILITY_MODEL                     openai
AZURE_OPENAI_HIGH/MID/LOW_CAPABILITY_DEPLOYMENT_NAME     azure-openai
ANTHROPIC_HIGH/MID/LOW_CAPABILITY_MODEL                  anthropic
GEMINI_HIGH/MID/LOW_CAPABILITY_MODEL                     gemini
BEDROCK_HIGH/MID/LOW_CAPABILITY_MODEL                    amazon-bedrock

Agent tool contracts (unchanged shape, new model? field honored):

  • lint: { file, rules[], model? } — model selects capability tier
  • agent: { task, label?, model? } — model selects tier; defaults
    to high-cap when omitted

Internal interface change (not user-facing):

  • LLMProvider.runPromptStructured now takes
    (systemPrompt, userMessage, schema) — RequestBuilder removed

No migrations. No schema changes to rule YAML files.
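The new signature and the composeSystemPrompt() composition feeding it can be sketched as (Schema is a stand-in for the real Zod types; the double-newline joining follows the PR description):

```typescript
// Stand-in for the real Zod schema types.
type Schema<T> = { parse: (raw: unknown) => T };

// New explicit-parameter signature; RequestBuilder / EvalContext are gone.
interface LLMProvider {
  runPromptStructured<T>(
    systemPrompt: string,
    userMessage: string,
    schema: Schema<T>,
  ): Promise<T>;
}

// Assemble directive + optional user instructions + rule body with
// consistent double-newline separation, skipping absent parts.
function composeSystemPrompt(
  directive: string,
  userInstructions: string | undefined,
  ruleBody: string,
): string {
  return [directive, userInstructions, ruleBody].filter(Boolean).join('\n\n');
}
```

Both evaluators and the lint tool handler can then build the systemPrompt the same way before calling runPromptStructured.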

4) High-level design

Agent evaluation flow:

  1. CLI creates defaultProvider + resolveCapabilityProvider (tier→
    provider resolver with per-run provider cache) via
    createAgentModeCapabilityAccess().
  2. runAgentExecutor pre-computes fileRuleMatches (file × rule pairs)
    and packs them into matchedRuleUnits respecting a token budget.
  3. Top-level agent loop runs on defaultProvider; the agent is given
    merged rule prompts and calls tools to explore and lint files.
  4. lint tool: reads the file once, assembles one merged prompt for all
    applicable rules, routes to resolveCapabilityProvider(model) or
    defaultProvider.
  5. agent tool: dispatches a read-only sub-agent (no lint/report tools)
    on resolveCapabilityProvider(model ?? "high-cap").
  6. Findings flow through the session store and are surfaced via the
    existing IssueSink abstraction in the CLI evaluation pipeline.
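Step 4's rule merging can be sketched as follows (mergeRulesForLint is a real function per the commit log; the section-header format here is an assumption):

```typescript
interface RuleUnit { source: string; content: string; }

// Bundle all rules applicable to one file into a single prompt body,
// so the lint tool issues one structured LLM call instead of N.
function mergeRulesForLint(rules: RuleUnit[]): string {
  return rules
    .map((r, i) => `## Rule ${i + 1}: ${r.source}\n\n${r.content}`)
    .join('\n\n');
}
```

Numbering each rule section lets the structured response attribute findings back to a specific rule source.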

CLI evaluation pipeline (refactored, no behavior change):

  • resolvePresetsDir() centralizes dev/dist path detection for all
    commands.
  • resolveMatchedPromptsForFile() is the single code path for rule
    matching across standard and agent flows.
  • createIssueSink() owns all output-format branching (line / json /
    vale-json / rdjson); routing logic is format-agnostic.

5) Risk and safeguards

Main risks:

  • Merged lint calls send more tokens per call; malformed responses are
    caught by Zod schema validation and surface as operational errors.
  • Sub-agent delegation via agent tool is new; if the LLM calls it
    unexpectedly, it only has read-only file tools.
  • Tier literal renaming (high-capability → high-cap): callers using
    the old literals will get schema validation errors.

Safeguards:

  • tests/agent/types-contract.test.ts pins tool input schemas.
  • tests/agent/agent-executor.test.ts covers provider selection,
    merged-lint bundling, and sub-agent delegation.
  • tests/provider-factory.test.ts covers tier fallback resolution.
  • IssueSink format coverage: existing result-routing tests verify all
    four output formats.

Known limitations:

  • lint tool is not yet constrained to the pre-computed
    fileRuleMatches list; the agent can in principle call lint with
    rule sources outside that set (flagged as a follow-up).

6) How to verify

Happy path checks:

  1. npm run test:run — expect 346/346 passing.
  2. npm run lint — expect zero errors.
  3. Set OPENAI_HIGH_CAPABILITY_MODEL=gpt-4.1 and run agent mode against
    a .md file; confirm sub-agent calls use gpt-4.1.
  4. Run against a file matched by 2 rules; confirm output shows one lint
    call, not two.

Edge case checks:

  1. File matched by zero configured rules → rule omitted from score
    output entirely (no score line).
  2. Criterion name in LLM response uses different casing than YAML →
    violation is still attributed to the correct criterion ID.
  3. lint with model: "mid-cap" → only that call uses the mid-cap
    model; other calls stay on default provider.

Checks actually run:

  • npm run test:run: 346/346 passed
  • npm run lint: zero errors

7) Rollout and rollback

Rollout:

  1. Add tier-specific env vars to your .vectorlint/config.toml if you
    want different models per tier (optional — provider default is used
    otherwise).
  2. Update any prompt configurations or wrapper scripts that passed the
    old tier literals (high-capability, mid-capability,
    low-capability).
  3. Re-run the verification commands above after merge.

Rollback:

  1. Revert to the commit before feat(agent): add provider capability tiers.
  2. Re-run npm run test:run && npm run lint.
  3. No data migrations to reverse.

oshorefueled and others added 16 commits April 6, 2026 11:59
…/rules/

- Rename PromptFile/PromptMeta/PromptCriterionSpec → RuleFile/RuleMeta/RuleCriterionSpec
- Rename .body → .content on the rule file schema
- Move src/prompts/prompt-loader.ts → src/rules/rule-loader.ts
- Move src/prompts/prompt-validator.ts → src/rules/rule-validator.ts
- Rename src/rules/matched-prompts.ts → src/rules/matched-rules.ts
- Rename prompt-schemas.ts → rule-schemas.ts; update schemas/index.ts
- Update loadRuleFile() return key: { prompt } → { rule }
- Update loadRules() return key: { prompts } → { rules }
- Update validatePrompt() → validateRule(), validateAll() param type
- Update resolveMatchedPromptsForFile() → resolveMatchedRulesForFile()
  and MatchedPromptResolution.prompts → MatchedRuleResolution.rules
- Rename BaseEvaluator.this.prompt → this.rule; update accuracy-evaluator
- Rename EvaluationOptions.prompts → rules across CLI and agent layers
- Rename RunAgentExecutorParams.prompts → rules; promptBySource → ruleBySource
- Rename LintRuleCall.prompt → rule field
- Update all import paths across ~18 source files
- Update all test files to import RuleFile/RuleMeta from
  src/rules/rule-loader and src/schemas/rule-schemas
- Rename .body → .content and { prompts } → { rules } in
  all test fixtures and option objects
- Rename validatePrompt → validateRule and
  resolveMatchedPromptsForFile → resolveMatchedRulesForFile
  at call sites
- Rename GateViolationLike → FilterableViolation in
  violation-filter.ts for domain-accurate naming
…builder

- Move MERGED_LINT_REVIEW_INSTRUCTIONS out of lint-prompt.ts into a
  new buildLintSystemPrompt() in prompt-builder.ts; lint-prompt.ts
  now only handles rule-content merging
- Extract inline sub-agent system prompt from sub-agent.ts into
  buildSubAgentSystemPrompt() in prompt-builder.ts
- prompt-builder.ts is now the single file to edit for any agent
  system prompt change
- Alias InlineFindingEventPayload = AgentFindingRecord in findings.ts,
  eliminating a structurally identical duplicate type
- Move estimateTokens out of src/utils/ (top-level bucket) into
  src/agent/utils.ts; inline the one-liner directly in boundaries/
  user-instruction-loader.ts to keep cross-module imports clean;
  delete src/utils/
- Trim README agent mode section: remove internal implementation
  details (rule unit precomputation, capability tier fallback chains)
  that are not user-facing
- Add clarifying comment to makePromptVariant test factory
- Add three coding convention rules to CLAUDE.md: no top-level
  src/utils/, centralized prompt construction, self-documenting names
…counting

- Add src/tokenizer.ts as a focused module exporting resolveTokenizer()
  and estimateTokens(); provider→encoding map mirrors provider-factory.ts
  (anthropic/bedrock→claude, openai/azure/gemini→o200k_base); exhaustive
  switch with no fallback — LLM_PROVIDER is validated at startup so an
  unknown value here is always a bug
- Delete src/agent/utils.ts; the single estimateTokens heuristic moves
  to src/tokenizer.ts with a TokenCounter interface so callers and tests
  stay decoupled from the concrete Tokenizer class
- Thread TokenCounter through buildMatchedRuleUnits and
  estimateMatchedRuleUnitTokens; executor creates tokenizer once and
  passes it down
- user-instruction-loader: drop inline heuristic, import from tokenizer
- Tests: set LLM_PROVIDER=anthropic in beforeAll for test files that
  call runAgentExecutor; rule-units tests use a heuristic mock counter
  to keep budget/content assertions stable
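The TokenCounter/encoding split this commit describes can be sketched as (the ~4-chars-per-token heuristic value is an assumption; the provider→encoding map follows the commit message):

```typescript
type Provider = 'anthropic' | 'bedrock' | 'openai' | 'azure-openai' | 'gemini';

// Interface keeps callers and tests decoupled from the concrete tokenizer.
interface TokenCounter { count(text: string): number; }

// Exhaustive provider→encoding map with no fallback: LLM_PROVIDER is
// validated at startup, so an unknown value here is always a bug.
function encodingFor(provider: Provider): 'claude' | 'o200k_base' {
  switch (provider) {
    case 'anthropic':
    case 'bedrock':
      return 'claude';
    case 'openai':
    case 'azure-openai':
    case 'gemini':
      return 'o200k_base';
  }
}

// Heuristic counter (~4 chars per token) usable as a test double in place
// of a real tokenizer, keeping budget assertions stable across encodings.
const heuristicCounter: TokenCounter = {
  count: (text) => Math.ceil(text.length / 4),
};
```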
- Move Severity enum definition from evaluators/types to rule-schemas (where it belongs as a rule/finding concept)
- Remove the evaluator field from RULE_META_SCHEMA (evaluator abstraction being removed)
- Update RULE_META_SCHEMA comment to remove references to evaluator types
- Severity is now defined in rule-schemas.ts and imported by other modules

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…scorer

- src/prompts/schema.ts: import ReviewType from lint/types; update JudgeResult,
  CheckResult, RawCheckResult discriminant types and isJudgeResult/isCheckResult
  guard comparisons
- src/scoring/scorer.ts: import ReviewType from lint/types; update type field
  assignments in calculateCheckScore, calculateJudgeScore, averageJudgeScores
- src/evaluators/base-evaluator.ts: add ReviewType import; update RawCheckResult
  literal construction (line 232) to use ReviewType.CHECK
- src/evaluators/accuracy-evaluator.ts: add ReviewType import; update RawCheckResult
  literal construction to use ReviewType.CHECK

String values unchanged ('judge'/'check'); purely a nominal rename at the type level.
- Add runLint() — plain async function porting BaseEvaluator logic (judge + check modes, chunking, usage aggregation)
- Add runLintMerged() — extracts LLM-call logic from lintToolHandler for multi-rule merged evaluation
- Export LintRuleCall, MergedLintResult, RunLintParams, RunLintMergedParams types
- Export resolveRuleContent, mergeRulesForLint, buildLintSystemPrompt (moved from agent/lint-prompt.ts and agent/prompt-builder.ts)
- Uses ReviewType (from ./types) and Severity (from ../schemas/rule-schemas) throughout
- computeFilterDecision sourced from ./violation-filter (not evaluators)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… MergedLintResult type

- Remove `defaultSeverity` from `RunLintParams.options` — scoring happens
  downstream in the routing layer (check-routing.ts), not inside
  `runCheckEvaluation`, so the field was unreachable dead code
- Remove `export type { Severity }` re-export — Severity is already
  exported from `src/schemas/rule-schemas.ts`; a second path causes confusion
- Replace inline anonymous `findings` type in `MergedLintResult` with
  `MergedCheckLLMResult["findings"]` to derive from the schema source of truth
- Drop the now-unused `Severity` import
- file-evaluation.ts: replace createEvaluator/evaluate with runLint();
  drop accuracy-evaluator-specific criteria validation block; remove
  ConfigError, Type, and searchProvider from imports/destructuring
- tool-handlers.ts: replace inline composeSystemPrompt + runPromptStructured
  call with runLintMerged(); import LintRuleCall from src/lint instead of
  lint-prompt; remove buildMergedCheckLLMSchema, MergedCheckLLMResult,
  composeSystemPrompt, and buildLintSystemPrompt imports
- prompt-builder.ts: remove buildLintSystemPrompt (moved to src/lint/),
  remove mergeRulesForLint and LintRuleCall imports from lint-prompt
- src/index.ts: remove evaluator bootstrap side-effect import

Bundle size: 215 KB → 201 KB (evaluator registry no longer loaded at startup)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…luators

- Severity: all consumers now import from schemas/rule-schemas
- computeFilterDecision/FilterDecision: all consumers now import from lint/violation-filter
- EvaluationType → ReviewType in 3 test files (import + usages)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Delete src/evaluators/ (base-evaluator, registry, accuracy-evaluator, all support files)
- Delete src/agent/lint-prompt.ts (content already in src/lint/)
- Delete src/debug/violation-filter.ts (dead re-export barrel)
- Fix src/agent/executor.ts: import resolveRuleContent from ../lint (not ./lint-prompt)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove evaluator: base from all 10 preset and .github rule files
- Delete .github/rules/TinyRocket/technical-accuracy.md (accuracy evaluator removed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- orchestrator-filtering: mock runLint from ../lint instead of createEvaluator
- rule-matching: spy on runLint instead of createEvaluator
- scoring-types: rewrite to use runLint; drop TechnicalAccuracyEvaluator block
- prompt-loader-validation: remove evaluator: field from test YAML and assertions
- preset-resolution, agent-executor: bump slow-import test timeouts to 30s

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>