feat: inference-time tool-call reliability — constrained decoding, tool retrieval, pre-exec critic gate #857

@anandgupta42

Three independent, flag-gated inference-time reliability features for the agent loop. Each is off by default, side-effect-free (pure), and unit-tested:

  1. Constrained (grammar) decoding for tool calls (provider/constrained.ts): builds a JSON-Schema envelope describing a valid tool call, so that a local model (vLLM, LM Studio, llama.cpp) can be forced at the token level to emit a parseable, schema-correct call. This is a deterministic fix for unparseable tool calls and does not depend on the base model.
  2. Tool retrieval (tool/retrieval.ts): with ~78 tools, sending the full set every turn floods the context and hurts tool selection. Retrieval picks a relevant per-turn subset (an always-on core plus the lexically ranked top-k) and never drops a tool that was already referenced mid-trajectory. v1 ranks lexically, which keeps it dependency-free and deterministic.
  3. Pre-execution critic gate (tool/critic.ts): before a side-effecting tool runs, a pluggable Verifier checks the proposed args. On a hard failure the call is denied and the reason is fed back to the model so it can retry. The default verifier allows everything (ungated); the caller injects a real verifier.
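
To make item 1 concrete, here is a minimal sketch of what a JSON-Schema envelope for a tool call could look like. The names `ToolSpec` and `buildToolCallEnvelope` and the `{ name, arguments }` shape are assumptions for illustration, not the actual exports of provider/constrained.ts.

```typescript
// Hypothetical types; the real shapes in provider/constrained.ts may differ.
type JsonSchema = Record<string, unknown>;

interface ToolSpec {
  name: string;
  parameters: JsonSchema; // JSON Schema describing the tool's arguments
}

// Build one envelope schema: the output must be exactly one
// { name, arguments } object, and `arguments` must satisfy the schema
// of the tool that `name` selects.
function buildToolCallEnvelope(tools: ToolSpec[]): JsonSchema {
  if (tools.length === 0) {
    throw new Error("at least one tool is required");
  }
  return {
    oneOf: tools.map((tool) => ({
      type: "object",
      properties: {
        name: { const: tool.name },
        arguments: tool.parameters,
      },
      required: ["name", "arguments"],
      additionalProperties: false,
    })),
  };
}

// Example with two made-up tools.
const envelope = buildToolCallEnvelope([
  {
    name: "read_file",
    parameters: {
      type: "object",
      properties: { path: { type: "string" } },
      required: ["path"],
      additionalProperties: false,
    },
  },
  {
    name: "run_sql",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
      additionalProperties: false,
    },
  },
]);

console.log(JSON.stringify(envelope, null, 2));
```

How the schema reaches the model depends on the server's structured-output option, which is not shown here. Support for keywords such as `oneOf` and `const` may also vary between grammar backends, so a real builder might need a simpler fallback form.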

All three are wired into session/llm.ts behind flags. Each can be toggled independently and is off by default.
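
As a rough sketch of how items 2 and 3 might behave behind their flags: the code below pins core and already-referenced tools, ranks the rest by token overlap with the query, and runs a verifier only for side-effecting tools. Every name here (`Flags`, `selectTools`, `gate`, the tool list, the scoring rule) is hypothetical; the real ranking and wiring in tool/retrieval.ts, tool/critic.ts, and session/llm.ts may differ.

```typescript
// All names below are illustrative assumptions, not the PR's actual exports.
interface Tool { name: string; description: string; sideEffecting: boolean }
interface Flags { toolRetrieval: boolean; criticGate: boolean }
type Verdict = { ok: true } | { ok: false; reason: string };
type Verifier = (toolName: string, args: Record<string, unknown>) => Verdict;

// Default verifier: allows everything (ungated).
const allowAll: Verifier = () => ({ ok: true });

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score = number of distinct query tokens found in the tool's name or description.
function lexicalScore(queryTokens: string[], tool: Tool): number {
  const toolTokens = new Set(tokenize(`${tool.name} ${tool.description}`));
  return queryTokens.filter((t) => toolTokens.has(t)).length;
}

function selectTools(
  all: Tool[],
  query: string,
  opts: { core: string[]; referenced: string[]; k: number; flags: Flags },
): Tool[] {
  if (!opts.flags.toolRetrieval) return all; // flag off: send the full set
  // Core tools and tools already used in this trajectory are never dropped.
  const pinned = new Set([...opts.core, ...opts.referenced]);
  const queryTokens = Array.from(new Set(tokenize(query)));
  const ranked = all
    .filter((t) => !pinned.has(t.name))
    .map((tool) => ({ tool, score: lexicalScore(queryTokens, tool) }))
    .filter((r) => r.score > 0)
    // Ties break on name so the result stays deterministic.
    .sort((a, b) => b.score - a.score || a.tool.name.localeCompare(b.tool.name))
    .slice(0, opts.k)
    .map((r) => r.tool);
  return [...all.filter((t) => pinned.has(t.name)), ...ranked];
}

function gate(
  tool: Tool,
  args: Record<string, unknown>,
  flags: Flags,
  verifier: Verifier = allowAll,
): Verdict {
  // Only side-effecting tools are checked, and only when the flag is on.
  if (!flags.criticGate || !tool.sideEffecting) return { ok: true };
  return verifier(tool.name, args);
}

// Example data (made up).
const runSql: Tool = { name: "run_sql", description: "Execute a SQL query against the warehouse", sideEffecting: true };
const tools: Tool[] = [
  { name: "read_file", description: "Read a file from disk", sideEffecting: false },
  { name: "write_file", description: "Write content to a file", sideEffecting: true },
  runSql,
  { name: "list_tables", description: "List tables in a database schema", sideEffecting: false },
];
const on: Flags = { toolRetrieval: true, criticGate: true };
const off: Flags = { toolRetrieval: false, criticGate: false };

// A caller-injected verifier that rejects DROP statements.
const noDrop: Verifier = (name, args) =>
  name === "run_sql" && typeof args.query === "string" && /\bdrop\b/i.test(args.query)
    ? { ok: false, reason: "DROP statements are not allowed" }
    : { ok: true };

const selected = selectTools(tools, "run a sql query to count rows", {
  core: ["read_file"], referenced: ["write_file"], k: 1, flags: on,
});
console.log(selected.map((t) => t.name));
console.log(gate(runSql, { query: "DROP TABLE users" }, on, noDrop));
```

With both flags off, `selectTools` returns the full tool list and `gate` allows every call, which matches the default-off behaviour described above.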
