Skip to content

feat: per-turn tool retrieval (trim tool-definition context flood)#858

Open
anandgupta42 wants to merge 3 commits into
mainfrom
feat/inference-stack
Open

feat: per-turn tool retrieval (trim tool-definition context flood)#858
anandgupta42 wants to merge 3 commits into
mainfrom
feat/inference-stack

Conversation

@anandgupta42
Copy link
Copy Markdown
Contributor

@anandgupta42 anandgupta42 commented Jun 1, 2026

What does this PR do?

Adds a flag-gated (ALTIMATE_TOOL_RETRIEVAL, default OFF) per-turn tool subset: an always-on core set plus a lexically-ranked top-k of the remaining tools, never dropping a tool already referenced mid-trajectory. This trims the ~78-tool definition flood that hurts tool selection and inflates input tokens. v1 is lexical, dependency-free, and deterministic. Wired into session/llm.ts (marker-wrapped); a no-op when the flag is unset.

Previously this PR also carried constrained decoding and a critic gate. Both were split out so each lands on its own merits:

Validation (measured)

Tool retrieval — A/B, 8 ADE-bench dbt tasks, deepseek-v4-flash, value-graded (output checked for correctness, not just "builds"):

Arm Resolved Input tokens Cost
ALTIMATE_TOOL_RETRIEVAL=1 4/8 238k $0.024
baseline (all ~78 tools) 4/8 474k $0.048

identical resolve-rate (0 tasks differ), −50% input tokens, −49% cost.

Type of change

  • New feature (non-breaking change which adds functionality)

Issue for this PR

Closes #857

How did you verify your code works?

  • 5 unit tests (test/tool/retrieval.test.ts) — selection logic incl. always-on core + referenced-tool retention; all green.
  • Tool-retrieval A/B (above) — value-graded on real dbt tasks: −50% input tokens / −49% cost at identical resolve-rate.
  • tsgo typecheck clean; altimate_change markers in session/llm.ts balanced (7/7); default-off so the non-flagged path is unchanged.

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have added tests that prove my feature works
  • New and existing unit tests pass locally with my changes

…rieval, critic gate)

Three flag-gated, independent, default-off reliability features for the agent loop:

- `provider/constrained.ts` — grammar/JSON-Schema constrained decoding so a local
  model (vLLM/LM Studio/llama.cpp) is forced to emit a parseable, schema-correct
  tool call. Pure (schema in → payload out).
- `tool/retrieval.ts` — per-turn tool subset (always-on core + lexical top-k), never
  dropping a tool referenced mid-trajectory; trims the ~78-tool context flood. v1
  lexical, dependency-free, deterministic.
- `tool/critic.ts` — pre-execution gate for side-effecting tools via a pluggable
  `Verifier`; default allows everything (ungated), a real verifier is injected.

Wired flag-gated into `session/llm.ts` (markers; default off → upstream path unchanged).
18 unit tests; typecheck clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 1, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

Adds deterministic lexical-ranked tool selection to reduce context during inference and wires it conditionally into LLM.stream. The Retrieval module filters large tool sets to a relevant top-k subset based on query-token matches, preserves core tools and in-flight references, and can be toggled off by default via an environment flag.

Changes

Tool Retrieval and LLM Integration

Layer / File(s) Summary
Tool Retrieval Module
packages/opencode/src/tool/retrieval.ts, packages/opencode/test/tool/retrieval.test.ts
Exports Retrieval namespace with CORE tool list, Tool and Options interfaces, and enabled() / select() functions. Selection logic is deterministic: always includes CORE tools, honors a forced-keep set for mid-trajectory references, scores remaining tools by lexical query relevance with stable tie-breaking, and fills up to topk. Returns all tools if the candidate list is small. Feature controlled by ALTIMATE_TOOL_RETRIEVAL environment flag. Tests cover core retention, lexical ranking, keep semantics, small-list pass-through, and flag behavior.
LLM Stream Integration
packages/opencode/src/session/llm.ts
Imports and conditionally applies retrieval when Retrieval.enabled() is true in LLM.stream. Derives a query string from the last user message (handling both string and array content shapes), calls Retrieval.select() with the query and tool list (preserving tools referenced in prior context), and removes non-selected tools from the active tool map except the invalid tool.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

A retrieval rabbit hops through the toolbox,
Picking the finest by lexical locks,
Keeps the essentials, forgets the rest,
Query by query, always the best! 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning The description lacks the required 'PINEAPPLE' marker at the top that the template mandates for AI-generated content. Add 'PINEAPPLE' at the very top of the description before any other content as required by the repository template.
Linked Issues check ❓ Inconclusive The PR partially addresses issue #857: tool retrieval is fully implemented with tests and wiring, but constrained decoding and critic gate features mentioned in #857 are absent (removed in follow-up commit per PR description). Clarify whether this PR intentionally addresses only the retrieval portion of #857, or if constrained decoding and critic gate should still be included; ideally split into separate PRs for each feature.
✅ Passed checks (2 passed)
Check name Status Explanation
Out of Scope Changes check ✅ Passed All code changes (retrieval.ts, retrieval.test.ts, and llm.ts modifications) are directly scoped to the tool retrieval feature and fall within the linked issue's objectives.
Title check ✅ Passed The title accurately summarizes the main change: adding per-turn tool retrieval to reduce tool-definition context by filtering to core and top-k tools.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/inference-stack

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

The `ALTIMATE_CRITIC_GATE` flag was a no-op — `tool/critic.ts` was never
imported into the execute path, so enabling the flag did nothing. Removing
it from this PR so every shipped flag is actually wired:

- `ALTIMATE_TOOL_RETRIEVAL` — wired in `session/llm.ts`, validated (-50% input tokens at equal resolve)
- `ALTIMATE_CONSTRAINED_TOOLCALLS` — wired in `session/llm.ts` (local providers)

The critic gate (pre-execution `Verifier` for side-effecting tools) moves to
a follow-up that wires it into the `session/prompt.ts` execute wrapper with an
integration test. Code preserved on `feat/critic-gate`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/opencode/src/tool/retrieval.ts (1)

55-76: ⚡ Quick win

topk budget is mostly consumed by CORE.

keep (CORE + caller keep) counts against topk, and CORE alone is 10 entries. With the default topk: 12, an enabled retrieval pass exposes only ~2 lexically-ranked non-core tools out of ~78 — which may starve the selection of task-relevant tools and undercut the feature's intent. Consider treating topk as a budget for retrieved tools beyond core/keep, or raising the default.

♻️ One option: budget retrieved tools separately from core/keep
-    for (const r of ranked) {
-      if (keep.size >= topk) break
-      keep.add(r.name)
-    }
+    const limit = keep.size + topk // topk additional tools beyond core + forced-keep
+    for (const r of ranked) {
+      if (keep.size >= limit) break
+      keep.add(r.name)
+    }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/tool/retrieval.ts` around lines 55 - 76, The current
select function lets CORE and caller-provided opts.keep consume the topk budget,
so CORE (10 items) leaves almost no room for lexically-ranked tools; adjust
select so topk (and minToolsToRetrieve behavior) applies only to additional
retrieved tools beyond CORE/keep: compute baseKeep from CORE and opts.keep, then
compute retrievedBudget = (opts.topk ?? 12) - baseKeep.size (clamped to >=0) or,
better, treat topk as the number of non-core tools directly (e.g., retrievedTopk
= opts.topk ?? 12) and only count items added from ranked into that
retrievedTopk; update the loop that iterates over ranked (and any minToRetrieve
logic) to stop after retrievedBudget/retrievedTopk is filled while still always
including all CORE/keep entries; reference function select, variables
topk/minToRetrieve, CORE, keep, rest, ranked, and score to locate and modify the
selection and stopping logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/tool/retrieval.ts`:
- Around line 17-20: Retrieval.CORE contains "ls" which doesn't match any
registered Tool id, so update the CORE array in retrieval.ts: either replace
"ls" with the actual registered tool id used when defining the listing tool
(e.g., the id passed to Tool.define for the filesystem/ls-like tool) or remove
"ls" from Retrieval.CORE; ensure the change aligns with the select() behavior
that only keeps CORE entries present in the candidate set so the tool id in
Retrieval.CORE must exactly match a Tool.define(...) id.

---

Nitpick comments:
In `@packages/opencode/src/tool/retrieval.ts`:
- Around line 55-76: The current select function lets CORE and caller-provided
opts.keep consume the topk budget, so CORE (10 items) leaves almost no room for
lexically-ranked tools; adjust select so topk (and minToolsToRetrieve behavior)
applies only to additional retrieved tools beyond CORE/keep: compute baseKeep
from CORE and opts.keep, then compute retrievedBudget = (opts.topk ?? 12) -
baseKeep.size (clamped to >=0) or, better, treat topk as the number of non-core
tools directly (e.g., retrievedTopk = opts.topk ?? 12) and only count items
added from ranked into that retrievedTopk; update the loop that iterates over
ranked (and any minToRetrieve logic) to stop after retrievedBudget/retrievedTopk
is filled while still always including all CORE/keep entries; reference function
select, variables topk/minToRetrieve, CORE, keep, rest, ranked, and score to
locate and modify the selection and stopping logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 09c66980-6898-48d1-8b44-a89331a61b52

📥 Commits

Reviewing files that changed from the base of the PR and between a490bd4 and e6b70c6.

📒 Files selected for processing (5)
  • packages/opencode/src/provider/constrained.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/tool/retrieval.ts
  • packages/opencode/test/provider/constrained.test.ts
  • packages/opencode/test/tool/retrieval.test.ts

Comment on lines +17 to +20
export const CORE = [
"bash", "read", "write", "edit", "glob", "grep", "ls",
"task", "todowrite", "skill",
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# List registered tool IDs to compare against CORE.
rg -nP "id:\s*['\"][a-z_]+['\"]" --type=ts -g '!**/*.test.ts' packages/opencode/src/tool || true
# Also look for how tool ids/names are declared
ast-grep --pattern 'Tool.define($$$)'
rg -nP '\b(todowrite|todo_write|todoread)\b' --type=ts -g '!**/*.test.ts'

Repository: AltimateAI/altimate-code

Length of output: 50381


🏁 Script executed:

#!/bin/bash
set -euo pipefail

CORE=(bash read write edit glob grep ls task todowrite skill)

# Show where each CORE id is registered via Tool.define("<id>", ...)
for id in "${CORE[@]}"; do
  echo "==== $id ===="
  rg -n "Tool\.define\\(\"${id//\//\\/}\"" packages/opencode/src/tool -S --type=ts || true
done

# Show the CORE constant and retrieval selection logic for context
echo "==== CORE constant ===="
rg -n "export const CORE" packages/opencode/src/tool/retrieval.ts
sed -n '1,120p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '1,120p'

echo "==== select() logic (if present) ===="
rg -n "function select|const select|select\\(" packages/opencode/src/tool/retrieval.ts
sed -n '120,260p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '120,260p'

Repository: AltimateAI/altimate-code

Length of output: 1149


🏁 Script executed:

#!/bin/bash
set -euo pipefail

CORE=(bash read write edit glob grep ls task todowrite skill)

# Show where each CORE id is registered via Tool.define("<id>", ...)
for id in "${CORE[@]}"; do
  echo "==== $id ===="
  rg -n "Tool\.define\\(\"${id//\//\\/}\"" packages/opencode/src/tool -S --type=ts || true
done

# Show the CORE constant and retrieval selection logic for context
echo "==== CORE constant ===="
rg -n "export const CORE" packages/opencode/src/tool/retrieval.ts
sed -n '1,120p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '1,120p'

echo "==== select() logic (if present) ===="
rg -n "function select|const select|select\\(" packages/opencode/src/tool/retrieval.ts
sed -n '120,260p' packages/opencode/src/tool/retrieval.ts | nl -ba | sed -n '120,260p'

Repository: AltimateAI/altimate-code

Length of output: 1149


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find the "ls" tool registration id
echo "==== registrations for ls ===="
rg -n "Tool\\.define\\(\"ls\"|Tool\\.define\\('ls'" packages/opencode/src/tool packages/opencode/src -S --type=ts || true

echo "==== any use of id: \"ls\" (Tool.define arg objects etc.) ===="
rg -n "(\"ls\")" packages/opencode/src/tool/retrieval.ts packages/opencode/src/tool -S --type=ts | head -n 200 || true

echo "==== retrieval.ts (CORE + select logic) ===="
cat -n packages/opencode/src/tool/retrieval.ts | sed -n '1,260p'

Repository: AltimateAI/altimate-code

Length of output: 3714


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "==== Tool.define(<id>, ...) ids in packages/opencode/src ===="
rg -n "Tool\.define\(\s*(['\"])([a-z_]+)\1" packages/opencode/src -S --type=ts --type=tsx || true

echo "==== Any tool registry mentions of \"ls\" in packages/opencode/src ===="
rg -n "\"ls\"" packages/opencode/src -S --type=ts --type=tsx || true

Repository: AltimateAI/altimate-code

Length of output: 480


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "==== Tool.define IDs (capturing first arg) — filter for ls ===="
rg -n --pcre2 "Tool\.define\(\s*(['\"])([a-z_]+)\1" packages/opencode/src -S -g'*.ts' -g'*.tsx' \
  | rg -n "\"ls\"|'ls'" || true

echo "==== Direct Tool.define(\"ls\" / 'ls') matches ===="
rg -n --pcre2 "Tool\.define\(\s*(['\"])ls\1" packages/opencode/src -S -g'*.ts' -g'*.tsx' || true

echo "==== Any tool-id string literal \"ls\" near tool code ===="
rg -n "\"ls\"" packages/opencode/src/tool -S -g'*.ts' -g'*.tsx' || true

Repository: AltimateAI/altimate-code

Length of output: 341


Fix Retrieval.CORE tool id mismatch for ls.

packages/opencode/src/tool/retrieval.ts includes "ls" in Retrieval.CORE, but there’s no Tool.define("ls", ...) (the only "ls" occurrence is inside CORE). Since select() force-keeps CORE entries only when they exist in the candidate set (if (all.has(n))), "ls" won’t be reliably “always-on” when retrieval is enabled—update "ls" to the actual registered tool id (or remove it).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/tool/retrieval.ts` around lines 17 - 20, Retrieval.CORE
contains "ls" which doesn't match any registered Tool id, so update the CORE
array in retrieval.ts: either replace "ls" with the actual registered tool id
used when defining the listing tool (e.g., the id passed to Tool.define for the
filesystem/ls-like tool) or remove "ls" from Retrieval.CORE; ensure the change
aligns with the select() behavior that only keeps CORE entries present in the
candidate set so the tool id in Retrieval.CORE must exactly match a
Tool.define(...) id.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 5 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/tool/retrieval.ts">

<violation number="1" location="packages/opencode/src/tool/retrieval.ts:18">
P1: Core retrieval list uses `"ls"`, but the actual tool name is `"list"`; this can cause the list tool to be incorrectly filtered out.</violation>

<violation number="2" location="packages/opencode/src/tool/retrieval.ts:44">
P2: Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

export namespace Retrieval {
/** Always-available agent essentials — never retrieved out. */
export const CORE = [
"bash", "read", "write", "edit", "glob", "grep", "ls",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Core retrieval list uses "ls", but the actual tool name is "list"; this can cause the list tool to be incorrectly filtered out.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/retrieval.ts, line 18:

<comment>Core retrieval list uses `"ls"`, but the actual tool name is `"list"`; this can cause the list tool to be incorrectly filtered out.</comment>

<file context>
@@ -0,0 +1,77 @@
+export namespace Retrieval {
+  /** Always-available agent essentials — never retrieved out. */
+  export const CORE = [
+    "bash", "read", "write", "edit", "glob", "grep", "ls",
+    "task", "todowrite", "skill",
+  ]
</file context>

Comment thread packages/opencode/src/provider/constrained.ts Outdated
const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? [])
const hay = (t.name + " " + (t.description ?? "")).toLowerCase()
let s = 0
for (const w of words) if (w.length > 3 && hay.includes(w)) s += 1
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/tool/retrieval.ts, line 44:

<comment>Scoring ignores 3-character keywords, which can miss relevant tools for common queries (e.g., sql/dbt).</comment>

<file context>
@@ -0,0 +1,77 @@
+    const words = new Set(query.toLowerCase().match(/[a-z_]+/g) ?? [])
+    const hay = (t.name + " " + (t.description ?? "")).toLowerCase()
+    let s = 0
+    for (const w of words) if (w.length > 3 && hay.includes(w)) s += 1
+    // small boost for a direct name mention
+    if (words.has(t.name.toLowerCase())) s += 3
</file context>

Constrained tool-call decoding is local-providers-only and has no validation
run behind it yet (the A/B that justifies this PR measured tool retrieval).
Removing it so the validated retrieval lever can land clean; constrained moves
to its own branch/PR pending a local vLLM guided-decoding run.

- remove `provider/constrained.ts` + its test
- remove the constrained wiring + import from `session/llm.ts` (retrieval stays)

#858 is now retrieval-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@anandgupta42 anandgupta42 changed the title feat: inference-time tool-call reliability (constrained decoding, retrieval, critic gate) feat: per-turn tool retrieval (trim tool-definition context flood) Jun 1, 2026
Copy link
Copy Markdown

@dev-punia-altimate dev-punia-altimate left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-Persona Review — Verdict: block

The PR contains a critical syntax error that will cause runtime failure, and lacks proper error handling for destructive database operations. While the performance optimization is well-researched and aligned with best practices, the code as written will break production migrations.

15/15 agents completed · 236s · 9 findings (1 critical, 2 high, 1 medium)

Critical

  • [code-reviewer] Incomplete string literal: 'workspace_id, endpoint_' is cut off mid-line, likely due to truncation in the diff. This will cause a SyntaxError when executed. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:358
    • 💡 Complete the column name after 'endpoint_' — likely 'endpoint_id' or similar — to form a valid tuple element in the leading columns list.

High

  • [tech-lead] Migration script directly executes raw SQL via SQLAlchemy text() without abstraction or validation layer, making it brittle and hard to test. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:150
    • 💡 Extract table rebuild logic into a reusable utility in app/utils/clickhouse_migrator.py with input validation and logging, then call from migration. This improves testability and maintainability.
  • [tech-lead] No explicit error handling around EXCHANGE TABLES or DROP operations; failure could leave orphaned _new tables or corrupt state. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:280
    • 💡 Wrap table exchange and drop in try/except blocks with rollback logic and explicit logging of state before/after critical operations.

Medium

  • [tech-lead] TABLE_PLANS dictionary contains 27 entries with inline string tuples; hard to validate and error-prone if column names change. → alembic_tenant_clickhouse/migrations/versions/2026_06_01_0030_databricks_reorder_clickhouse_sort_keys.py:55
    • 💡 Define a dataclass or named tuple for TablePlan and validate column names against system.tables during migration startup to catch typos early.

Multi-Persona Review · vllm:qwen3-next-80b (waves) + vllm-fallback (synth) ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

2 participants