feat: inference-time tool-call reliability — constrained decoding, tool retrieval, pre-exec critic gate

Three independent, flag-gated inference-time reliability features for the agent loop. Each is off by default and implemented as pure functions with unit tests:

1. **Constrained (grammar) decoding for tool calls** (`provider/constrained.ts`): builds a JSON Schema envelope for a valid tool call so a local model (vLLM/LM Studio/llama.cpp) can be forced at the token level to emit a parseable, schema-correct call. This is a deterministic fix for unparseable tool calls and is base-model-agnostic.
2. **Tool retrieval** (`tool/retrieval.ts`): with ~78 tools, sending the full set every turn floods the context and hurts tool selection. Retrieval picks a relevant per-turn subset (an always-on core plus the lexically ranked top-k) and never drops a tool already referenced mid-trajectory. v1 is lexical, so it is dependency-free and deterministic.
3. **Pre-execution critic gate** (`tool/critic.ts`): before a side-effecting tool runs, a pluggable `Verifier` checks the proposed args; on a hard failure the call is denied and the reason is fed back to the model for a retry. The default verifier allows everything, so the gate is a no-op until the caller injects a real verifier.
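
To make the envelope idea in item 1 concrete, here is a minimal sketch of one way such a schema could be built. `ToolSpec` and `buildToolCallEnvelope` are hypothetical names chosen for illustration, not the exports of `provider/constrained.ts`, and the `{ name, arguments }` envelope shape is an assumption. How the schema is handed to a given server is backend-specific and not shown; support for keywords such as `anyOf` and `const` also varies between grammar backends.

```typescript
// Hypothetical shape; the PR's actual types may differ.
interface ToolSpec {
  name: string;
  parameters: Record<string, unknown>; // JSON Schema for this tool's arguments
}

// Build one schema that accepts exactly one tool call:
// { "name": <a known tool>, "arguments": <args valid for that tool> }.
// Each branch pins the name with `const`, so arguments are always
// validated against the schema of the tool that was actually named.
function buildToolCallEnvelope(tools: ToolSpec[]): Record<string, unknown> {
  if (tools.length === 0) {
    throw new Error("at least one tool is required");
  }
  return {
    anyOf: tools.map((tool) => ({
      type: "object",
      properties: {
        name: { const: tool.name },
        arguments: tool.parameters,
      },
      required: ["name", "arguments"],
      additionalProperties: false,
    })),
  };
}

// Illustrative tools only.
const envelope = buildToolCallEnvelope([
  {
    name: "read_file",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
      additionalProperties: false,
    },
  },
  {
    name: "list_dir",
    parameters: {
      type: "object",
      properties: { dir: { type: "string" } },
      required: ["dir"],
      additionalProperties: false,
    },
  },
]);

console.log(JSON.stringify(envelope, null, 2));
```

Pinning each branch to a single tool name keeps the grammar from pairing one tool's name with another tool's arguments, which a flat `enum` of names plus a loose `arguments` object would allow.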

All three are wired into `session/llm.ts` behind their own flags, so each can be toggled independently and stays off by default.
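
As a rough illustration of how flag-gated retrieval and the critic gate could behave, the sketch below uses hypothetical names throughout (`Flags`, `selectTools`, `gate`, `allowAll`); it is not the code in `tool/retrieval.ts`, `tool/critic.ts`, or `session/llm.ts`. The scoring function (count of distinct query tokens found in a tool's name or description) and the choice to skip zero-score tools are assumptions; the source only says that v1 ranking is lexical and deterministic.

```typescript
// Hypothetical types; not the PR's actual API.
interface Tool { name: string; description: string; sideEffecting?: boolean }
interface Flags { toolRetrieval: boolean; criticGate: boolean }
type Verdict = { ok: true } | { ok: false; reason: string };
type Verifier = (tool: Tool, args: unknown) => Verdict;

// Default verifier: allows every call, so the gate is a no-op until replaced.
const allowAll: Verifier = () => ({ ok: true });

const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

// Assumed lexical score: distinct query tokens present in name or description.
function score(query: string, tool: Tool): number {
  const words = new Set(tokenize(`${tool.name} ${tool.description}`));
  return Array.from(new Set(tokenize(query))).filter((w) => words.has(w)).length;
}

// Always keep core tools and tools already used in this trajectory,
// then add the top-k remaining tools by lexical score.
function selectTools(
  all: Tool[], query: string, core: Set<string>, used: Set<string>, k: number, flags: Flags,
): Tool[] {
  if (!flags.toolRetrieval) return all; // flag off: send the full set
  const isPinned = (t: Tool) => core.has(t.name) || used.has(t.name);
  const ranked = all
    .filter((t) => !isPinned(t))
    .map((t) => ({ t, s: score(query, t) }))
    .filter((x) => x.s > 0)
    // Ties broken by name so the result is deterministic.
    .sort((a, b) => b.s - a.s || (a.t.name < b.t.name ? -1 : 1))
    .slice(0, k)
    .map((x) => x.t);
  return [...all.filter(isPinned), ...ranked];
}

// Only side-effecting tools are checked, and only when the flag is on.
function gate(tool: Tool, args: unknown, flags: Flags, verify: Verifier = allowAll): Verdict {
  if (!flags.criticGate || !tool.sideEffecting) return { ok: true };
  return verify(tool, args);
}

// Illustrative data and verifier.
const tools: Tool[] = [
  { name: "read_file", description: "Read a file from disk" },
  { name: "write_file", description: "Write text to a file", sideEffecting: true },
  { name: "web_search", description: "Search the web" },
  { name: "run_shell", description: "Run a shell command", sideEffecting: true },
];
const flags: Flags = { toolRetrieval: true, criticGate: true };

const picked = selectTools(
  tools, "search the web for docs", new Set(["read_file"]), new Set(["run_shell"]), 1, flags,
).map((t) => t.name);

const noRecursiveDelete: Verifier = (_tool, args) =>
  String((args as { command?: unknown } | null)?.command).includes("rm -rf")
    ? { ok: false, reason: "refusing recursive delete; narrow the path and retry" }
    : { ok: true };

const denied = gate(tools[3], { command: "rm -rf /tmp/x" }, flags, noRecursiveDelete);
console.log(picked, denied);
```

In this sketch the denial reason is plain text so it can be returned to the model as the tool result, giving it something specific to correct on the retry.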

feat: inference-time tool-call reliability — constrained decoding, tool retrieval, pre-exec critic gate #857