Skip to content

feat: plan agent refinement, feature discovery#556

Merged
anandgupta42 merged 16 commits intomainfrom
feat/plan-agent-and-feature-discovery
Mar 29, 2026
Merged

feat: plan agent refinement, feature discovery#556
anandgupta42 merged 16 commits intomainfrom
feat/plan-agent-and-feature-discovery

Conversation

@anandgupta42
Copy link
Copy Markdown
Contributor

@anandgupta42 anandgupta42 commented Mar 29, 2026

Summary

Driven by telemetry analysis of altimate-code's first launch week, this PR addresses two P1 issues:

  • Plan agent completion rate at 64% — users abandon plans and retry 2-3 times because there's no way to refine without restarting. Plan telemetry was also entirely null (couldn't measure improvements).
  • Feature discovery drop-off — 53% of users connected a warehouse but only 19% found schema tools, 14% dbt, 5% lineage. Users didn't know what to do after connecting.

Closes #509 — Auto-generate plan before multi-step execution (two-step outline-first approach)
Partially addresses #355 — Plan mode UX improvements (refinement loop, approval detection)
Partially addresses #206 — Skill discovery (dbt auto-detection now recommends skills proactively)


Reviewer Guide

Critical paths to review carefully

  1. src/session/prompt.ts (lines 623-694) — Plan refinement detection. This runs on every step of the plan agent loop. Verify:

    • The agent.name === "plan" guard ensures non-plan agents are never affected
    • planHasWritten is only set via Filesystem.exists() (no false positives)
    • The revision cap at 5 injects a synthetic message but does NOT persist to DB
    • Phrase classification: rejection > refinement qualifier > approval > refine (default)
  2. src/session/prompt.ts (line 369)sessionAgentName fix. Was previously set only inside if (step === 1), now set unconditionally from lastUser.agent. Verify this doesn't break other agents.

  3. src/altimate/tools/warehouse-add.ts (lines 47-98) — Post-connect suggestions use Promise.race with 1.5s timeout. Verify the timeout doesn't cause unhandled promise rejections.

Safe to skim (low risk)

  • src/altimate/tools/post-connect-suggestions.ts — New file, no existing code modified
  • src/altimate/tools/sql-execute.ts, sql-analyze.ts, schema-inspect.ts, schema-index.ts — Each appends ~3 lines of suggestion text, all wrapped in try/catch
  • src/session/prompt/plan.txt — Additive prompt instructions, no deletions
  • src/altimate/telemetry/index.ts — Type additions only (new event types)
  • src/tool/skill.ts — One-line change: adds trigger field to existing telemetry call

How to test locally

cd packages/opencode

# Run all 246 tests (each file individually to avoid cross-file module loading)
for f in test/session/plan-layer-e2e.test.ts test/altimate/feature-discovery-e2e.test.ts test/altimate/performance-regression.test.ts test/altimate/plan-refinement.test.ts test/altimate/post-connect-suggestions.test.ts test/telemetry/plan-skill-telemetry.test.ts test/telemetry/telemetry.test.ts; do
  bun test "$f" --timeout 30000
done

# Typecheck
bun turbo typecheck

# Run from source to manually test plan agent
bun run --conditions=browser ./src/index.ts

Questions for reviewer

  • Is the 1.5s timeout for warehouse-add suggestions aggressive enough? Could lower to 500ms if latency is a concern.
  • Should the plan revision cap (5) be configurable, or is a hardcoded limit fine for now?
  • The refinementQualifiers list (" but ", " however ", " except ", etc.) — are there common phrases we're missing?

Plan Agent Improvements

  • Two-step plan approach: 3-5 bullet outline first → user confirms → expand to full plan
  • Plan refinement loop: users can edit/iterate instead of restarting (capped at 5 revisions)
  • Smart phrase classification: approval ("looks good"), rejection ("no", "stop"), and refinement ("yes, but change X") with qualifier overrides and word-boundary matching
  • Revision cap communication: when limit reached, synthetic message tells the LLM to inform the user
  • Bug fix: agent_outcome telemetry for plan sessions was emitting null for duration_ms, tool_calls, generations — root cause was sessionAgentName assignment too late in the loop

Feature Discovery & Progressive Disclosure

  • Post-warehouse-connect contextual suggestions (schema index, SQL analysis, lineage, PII audit)
  • Progressive tool suggestions: sql_executesql_analyzeschema_inspectlineage_check
  • Deduplication: each progressive hint shown at most once per session (no repetitive tips)
  • dbt project auto-detection → recommends /dbt-develop, /dbt-troubleshoot, /dbt-analyze skills
  • All suggestions wrapped in try/catch — never block core flow

Performance & UX

  • Warehouse-add suggestions use Promise.all + Promise.race with 1.5s timeout — slow schema/dbt checks silently skipped instead of blocking response
  • Progressive suggestion generation: < 0.05ms per call (verified via benchmarks)
  • Phrase detection: < 0.002ms per call (100k iterations in < 200ms)
  • Suggestion dedup via shownProgressiveSuggestions Set — zero allocation on repeated calls

Telemetry Instrumentation

  • New plan_revision event: tracks revision count and action (refine | approve | reject | cap_reached)
  • New feature_suggestion event: tracks which suggestions shown, type, warehouse
  • Added trigger field to skill_used event: user_command | llm_selected | auto_suggested
  • classifySkillTrigger() helper function
  • Fixed sessionAgentName so agent_outcome always has correct agent name

Test plan

Unit & Integration Tests (246 total, all passing)

  • telemetry.test.ts — 90 tests: full telemetry regression suite with ALL_EVENT_TYPES constant
  • plan-layer-e2e.test.ts — 68 tests: plan layer safety — phrase classification (42 edge cases including unicode, multiline, adversarial 70k input), non-plan agent safety (3), sessionAgentName fix (2), revision cap (4), adversarial (4), import safety (2)
  • feature-discovery-e2e.test.ts — 29 tests: full warehouse→suggestions flow with mocked Dispatcher, progressive chain, plan refinement session, telemetry events
  • post-connect-suggestions.test.ts — 20 tests: contextual suggestions, progressive disclosure, dedup, telemetry
  • plan-skill-telemetry.test.ts — 14 tests: trigger classification, event types, runtime Telemetry.track() verification
  • plan-refinement.test.ts — 13 tests: revision counter, cap, approval detection, telemetry
  • performance-regression.test.ts — 12 tests: suggestion gen < 50ms/1000x, lookup < 50ms/10kx, phrase detection < 200ms/100kx, output determinism

CI & Type Safety

  • Typecheck passes (bun turbo typecheck — 5/5 packages)
  • No new test failures (pre-existing in regression.test.ts — unrelated Telemetry.shutdown issue)

Post-Deploy Verification

  • Verify plan agent telemetry populates in App Insights after deploy
  • Monitor feature_suggestion events to measure warehouse→schema conversion
  • Track plan completion rate (target: 64% → 80%)
  • Verify suggestion dedup works in production (no repeated tips)

Target Metrics (4-week)

Metric Current Target
Plan completion rate 64% 80%
Plan abandon-then-retry rate ~50% < 20%
Warehouse → Schema conversion 36% 55%
Warehouse → dbt conversion 24% 40%
Skill trigger tracking 0% 100%

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Contextual post-connection suggestions tailored to warehouse type, schema/indexing status, DBT detection, and connection count.
    • Progressive-disclosure tips that surface related tools as you use the app.
    • Two‑step plan workflow: short outline + user confirmation, with tracked plan revisions and a revision cap.
  • Improvements

    • Telemetry expanded to classify skill triggers and record new events for plan revisions and feature suggestions.
  • Documentation

    • Added docs describing post-connection suggestions and updated Plan-mode behavior.
  • Tests

    • Extensive unit, integration, and E2E tests covering suggestions, progressive disclosure, plan revisions, and telemetry.

…entation

## Plan Agent Improvements
- Two-step plan approach: outline first, confirm, then expand
- Plan refinement loop: edit instead of restart (capped at 5 revisions)
- Approval detection for common phrases ("looks good", "proceed", "lgtm")
- Bug fix: `agent_outcome` telemetry for plan sessions emitting null for
  `duration_ms`, `tool_calls`, `generations` — `sessionAgentName` set too late

## Feature Discovery & Progressive Disclosure
- Post-warehouse-connect contextual suggestions (schema, SQL, lineage, PII)
- Progressive: `sql_execute` → `sql_analyze` → `schema_inspect` → `lineage_check`
- dbt auto-detection → recommends `/dbt-develop`, `/dbt-troubleshoot`
- All suggestions try/catch wrapped, never block core flow

## Telemetry
- New `plan_revision` event: revision count + action (refine/approve/reject)
- New `feature_suggestion` event: suggestions shown, type, warehouse
- `skill_used.trigger`: user_command | llm_selected | auto_suggested
- `classifySkillTrigger()` helper

## Tests
- 140 new tests across 4 files, all passing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 29, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a post‑connect suggestion engine with progressive‑disclosure tips integrated into multiple tools, extends telemetry with plan_revision and feature_suggestion events and skill‑trigger classification, implements plan‑revision detection/tracking and a two‑step plan prompt, and adds extensive unit, integration, and performance tests.

Changes

Cohort / File(s) Summary
Telemetry
packages/opencode/src/altimate/telemetry/index.ts
Extended Telemetry.Event union: added plan_revision and feature_suggestion, added trigger to skill_used, and exported Telemetry.classifySkillTrigger(...).
Post‑Connect Suggestions
packages/opencode/src/altimate/tools/post-connect-suggestions.ts
New PostConnectSuggestions namespace: SuggestionContext, getPostConnectSuggestions, getProgressiveSuggestion, dedup state (resetShownSuggestions), and telemetry‑safe trackSuggestions.
Tool integrations
packages/opencode/src/altimate/tools/...
schema-index.ts, schema-inspect.ts, sql-analyze.ts, sql-execute.ts, warehouse-add.ts, project-scan.ts
Tools now fetch/append progressive or post‑connect suggestions and call trackSuggestions(...) when shown; DBT detection in project-scan triggers DBT suggestions; suggestion/telemetry calls are guarded to avoid impacting primary flows.
Session / Plan flow
packages/opencode/src/session/prompt.ts, packages/opencode/src/session/prompt/plan.txt
Adds planRevisionCount/planHasWritten, early sessionAgentName set, plan revision classification (approve/refine/reject), emits plan_revision with incremental revision_number capped at 5 (cap_reached), and updates plan prompt to a two‑step outline/confirm workflow.
Skill telemetry usage
packages/opencode/src/tool/skill.ts
Skill-loading telemetry now includes trigger via Telemetry.classifySkillTrigger(ctx.extra).
Tests — suggestion, plan, telemetry, e2e, perf
packages/opencode/test/... (many new files)
Adds extensive tests: post‑connect suggestions, plan refinement unit/e2e, feature discovery e2e, performance/regression, telemetry unit tests (classifySkillTrigger and new event shapes), and large simulation suites.
Docs
docs/docs/... (configure/warehouses.md, data-engineering/agent-modes.md, getting-started.md, reference/telemetry.md)
Documentation for post-connection suggestions, two‑step plan workflow, and telemetry additions updated.
Dev dependency
packages/drivers/package.json
Added mongodb as a devDependencies entry.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Tool
    participant PostConnectSuggestions
    participant Telemetry

    User->>Tool: Execute tool (sql_execute/schema_index/...)
    Tool->>Tool: Generate primary output
    Tool->>PostConnectSuggestions: getProgressiveSuggestion(toolName)
    PostConnectSuggestions-->>Tool: Suggestion text or null

    alt Suggestion available
        Tool->>Tool: Append suggestion to output
        Tool->>PostConnectSuggestions: trackSuggestions({...})
        PostConnectSuggestions->>Telemetry: track(type="feature_suggestion", ...)
        Telemetry-->>PostConnectSuggestions: Ack (errors swallowed)
    end

    Tool-->>User: Return output (with or without suggestion)
Loading
sequenceDiagram
    actor User
    participant Session
    participant PlanAgent
    participant Filesystem
    participant Telemetry

    User->>Session: Send message
    Session->>PlanAgent: Route to plan agent (agent.name="plan")
    PlanAgent->>Filesystem: Check if plan file exists
    Filesystem-->>PlanAgent: exists / not-exists

    alt Plan not written
        PlanAgent->>PlanAgent: Generate outline (3-5 bullets)
        PlanAgent-->>User: Return outline for confirmation
    else Plan exists
        PlanAgent->>Filesystem: Read existing plan
        Filesystem-->>PlanAgent: Plan content
        User->>PlanAgent: Provide feedback
        PlanAgent->>PlanAgent: Classify action (approve/refine/reject)
        PlanAgent->>Telemetry: track(type="plan_revision", revision_number, action)
        PlanAgent-->>User: Return refined/updated plan
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mdesmet

Poem

🐰 I hop through code with eager cheer,
Outlines first, then details near.
Suggestions bloom where connections grow,
Telemetry notes each tiny show.
Tiny hops, big insights — carrot-perfect!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 47.37% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: plan agent refinement, feature discovery' clearly summarizes the main changes: plan agent improvements and feature discovery implementation.
Description check ✅ Passed The PR description is comprehensive and follows the template with Summary, Test Plan, and Checklist sections. It includes detailed reviewer guidance, implementation details, and testing instructions.
Linked Issues check ✅ Passed The PR addresses linked issue #509 by implementing plan generation before multi-step execution with a two-step outline approach and user approval flow. It partially addresses #355 and #206 with refinement loop and skill discovery features.
Out of Scope Changes check ✅ Passed All changes are directly aligned with the PR objectives: plan agent refinement (telemetry, prompt, refinement detection), feature discovery (post-connect suggestions, progressive disclosure), telemetry instrumentation, and comprehensive tests. The package.json MongoDB dependency change is isolated and non-blocking.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/plan-agent-and-feature-discovery

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@anandgupta42 anandgupta42 changed the title feat: plan agent refinement, feature discovery, and telemetry instrumentation feat: plan agent refinement, feature discovery Mar 29, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (8)
packages/opencode/src/altimate/tools/sql-analyze.ts (1)

32-36: Missing warehouseType in trackSuggestions call.

Other tools (sql-execute.ts, schema-index.ts) include warehouseType in the trackSuggestions payload, but sql-analyze omits it. This creates inconsistency in telemetry data.

Since sql_analyze doesn't have warehouse context in its parameters or result, consider either:

  1. Omitting warehouseType consistently when unavailable, or
  2. Passing a sentinel value like "unknown" for consistency
🔧 Suggested fix for consistency
         PostConnectSuggestions.trackSuggestions({
           suggestionType: "progressive_disclosure",
           suggestionsShown: ["schema_inspect"],
+          warehouseType: "unknown",
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/sql-analyze.ts` around lines 32 - 36,
The PostConnectSuggestions.trackSuggestions call in sql-analyze.ts is missing
the warehouseType field causing telemetry inconsistency; update the invocation
of PostConnectSuggestions.trackSuggestions (the call that currently sends
suggestionType and suggestionsShown) to include warehouseType — either add
warehouseType: "unknown" when no context exists or follow the existing project
convention by omitting warehouseType consistently; ensure you modify the call
site in sql-analyze.ts (the PostConnectSuggestions.trackSuggestions invocation)
to match the payload shape used by sql-execute.ts and schema-index.ts so
telemetry consumers see a consistent key.
packages/opencode/src/altimate/tools/schema-inspect.ts (1)

29-33: Missing warehouseType in trackSuggestions - but args.warehouse is available.

Unlike sql-analyze, this tool has access to args.warehouse (optional parameter), so it can include warehouseType for consistency with other tools.

🔧 Suggested fix
         PostConnectSuggestions.trackSuggestions({
           suggestionType: "progressive_disclosure",
           suggestionsShown: ["lineage_check"],
+          warehouseType: args.warehouse ?? "unknown",
         })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/schema-inspect.ts` around lines 29 - 33,
PostConnectSuggestions.trackSuggestions currently sends suggestionType and
suggestionsShown but omits warehouseType; update the call in schema-inspect (the
PostConnectSuggestions.trackSuggestions invocation) to include warehouseType:
args.warehouse when args.warehouse is present (i.e., pass warehouseType:
args.warehouse in the payload or conditionally include it), ensuring the payload
matches other tools that report warehouseType.
packages/opencode/test/telemetry/telemetry.test.ts (1)

330-373: Consider extracting the event types list to avoid duplication.

The same event type list is duplicated in both the "event-types" test (lines 200-243) and the "naming-convention" test (lines 330-373). If new event types are added, both lists need manual updates.

♻️ Suggested refactor to DRY up the lists
+// At the top of the describe block or file
+const ALL_EVENT_TYPES: Telemetry.Event["type"][] = [
+  "session_start",
+  "session_end",
+  // ... all 42 types
+]
+
 describe("telemetry.event-types", () => {
   test("all event types are valid", () => {
-    const eventTypes: Telemetry.Event["type"][] = [
-      // ... duplicated list
-    ]
-    expect(eventTypes.length).toBe(42)
+    expect(ALL_EVENT_TYPES.length).toBe(42)
   })
 })

 describe("telemetry.naming-convention", () => {
   test("all event types use snake_case", () => {
-    const types: Telemetry.Event["type"][] = [
-      // ... duplicated list
-    ]
-    for (const t of types) {
+    for (const t of ALL_EVENT_TYPES) {
       expect(t).toMatch(/^[a-z][a-z0-9_]*$/)
     }
   })
 })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/telemetry/telemetry.test.ts` around lines 330 - 373,
Extract the duplicated event type array currently defined as types in the
"naming-convention" test into a single shared constant and use it in both the
"event-types" and "naming-convention" tests; locate the array assigned to types
in telemetry.test.ts and move it to a top-level const (e.g.,
ALL_TELEMETRY_EVENT_TYPES) or a small test helper, then replace both in-test
literal arrays with references to that constant so additions only need to be
made in one place.
packages/opencode/src/altimate/tools/project-scan.ts (1)

509-525: Naming inconsistency between displayed commands and telemetry.

The output shows commands with hyphens (/dbt-develop, /dbt-troubleshoot, /dbt-analyze) but telemetry records them with underscores (dbt_develop, dbt_troubleshoot, dbt_analyze). This inconsistency may complicate correlation between user-facing suggestions and telemetry data.

Consider aligning the naming convention for easier analysis.

🔧 Option: Use consistent naming in telemetry
         PostConnectSuggestions.trackSuggestions({
           suggestionType: "dbt_detected",
-          suggestionsShown: ["dbt_develop", "dbt_troubleshoot", "dbt_analyze"],
+          suggestionsShown: ["dbt-develop", "dbt-troubleshoot", "dbt-analyze"],
         })

Also, the dynamic import could be replaced with a static import at the top of the file for consistency with other tools, though the current approach does work correctly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/project-scan.ts` around lines 509 - 525,
Telemetry names and displayed command names are inconsistent: the UI shows
hyphenated commands (`/dbt-develop`, `/dbt-troubleshoot`, `/dbt-analyze`) but
PostConnectSuggestions.trackSuggestions currently sends underscored names
(`dbt_develop`, etc.); update the call to
PostConnectSuggestions.trackSuggestions in project-scan.ts so suggestionsShown
uses the hyphenated identifiers (e.g. "dbt-develop") to match the lines.push
output, and optionally convert the dynamic import of
"./post-connect-suggestions" to a static import at the top to match the
module-loading style used elsewhere (keep the try/catch behavior if you keep
dynamic import).
packages/opencode/test/telemetry/plan-skill-telemetry.test.ts (2)

40-111: Type-only tests provide limited runtime verification.

Tests in this section (lines 41-111) only verify that the TypeScript types compile correctly—they don't actually call Telemetry.track or verify runtime behavior. Consider calling Telemetry.track(event) and asserting no errors are thrown to add runtime validation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/telemetry/plan-skill-telemetry.test.ts` around lines
40 - 111, The tests only assert TypeScript types; update each relevant test
(those constructing Telemetry.Event objects in this block) to also call
Telemetry.track(event) to exercise runtime behavior and assert it doesn't throw
or that the telemetry pipeline is invoked; specifically, import or reference the
Telemetry object used in tests and either call expect(() =>
Telemetry.track(event)).not.toThrow() or spy on Telemetry.track (e.g.,
jest.spyOn(Telemetry, "track")) and assert it was called with the constructed
event—apply this to the plan_revision and feature_suggestion tests that build
events.

1-3: Consider removing @ts-nocheck and fixing any type issues.

Using @ts-nocheck disables all TypeScript checking, which may hide genuine type mismatches between tests and the actual Telemetry.Event union. If there are specific type issues, prefer @ts-expect-error on individual lines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/telemetry/plan-skill-telemetry.test.ts` around lines 1
- 3, Remove the top-line "@ts-nocheck" from the test file and fix any resulting
TypeScript errors by aligning the test values to the Telemetry type definitions
(use the Telemetry import and match shapes against Telemetry.Event union
variants); if there are a few unavoidable mismatches, replace broad suppression
with targeted "@ts-expect-error" comments on the specific offending lines, and
re-run type checking (or bun/tsc) to confirm all Telemetry-related test objects
conform to the Telemetry.Event union.
packages/opencode/src/altimate/tools/warehouse-add.ts (1)

72-80: suggestionsShown array may diverge from actual suggestions.

The manually-built suggestionsShown array duplicates the logic in getPostConnectSuggestions. If the suggestion text changes (e.g., adding/removing tools), this telemetry could become inconsistent. Consider extracting the suggestion identifiers from the same source or having getPostConnectSuggestions return both the formatted string and the identifiers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/tools/warehouse-add.ts` around lines 72 - 80,
The telemetry builds a separate suggestionsShown array that duplicates
getPostConnectSuggestions logic and can drift; update the code to derive
suggestion identifiers from the canonical source instead: call
getPostConnectSuggestions (or change getPostConnectSuggestions to return an
array of {id, label}) and map its result to the identifier strings, then pass
that mapped array into PostConnectSuggestions.trackSuggestions (instead of the
manually-constructed suggestionsShown); reference the suggestionsShown variable,
getPostConnectSuggestions function, and PostConnectSuggestions.trackSuggestions
when implementing this change.
packages/opencode/test/altimate/plan-refinement.test.ts (1)

20-52: Source file content tests are inherently brittle.

These tests verify exact string presence in source files. While useful for ensuring documentation alignment, they will break on legitimate refactors (e.g., rewording instructions). Consider whether integration tests that verify actual behavior would be more maintainable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/plan-refinement.test.ts` around lines 20 -
52, The tests are brittle because they assert exact substrings in source files
(tests named "plan.txt includes two-step approach instructions", "plan.txt
includes feedback/refinement instructions", and "experimental plan mode inline
prompt includes two-step approach"); change them to validate intent rather than
verbatim text by either: 1) importing the canonical prompt constant or builder
function (e.g., the exported prompt value from prompt.ts or the function that
constructs plan prompts) and asserting the presence of semantic tokens with
case-insensitive regexes or fuzzy matches (e.g., /two-?step/i and
/present.*outline|3-5 bullet/i), or 2) convert to an integration-style test that
runs the plan agent entrypoint with a sample request and asserts the agent
returns an outline-first response pattern instead of string literals; update the
tests to use these softer assertions (replace expect(content).toContain calls)
so legitimate rewording won't break CI.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/opencode/src/altimate/tools/post-connect-suggestions.ts`:
- Around line 93-105: Telemetry.getContext().sessionId can be empty if
setContext wasn't called; before calling Telemetry.track in
post-connect-suggestions, read and validate the sessionId from
Telemetry.getContext() and, if it's empty, emit a non-fatal warning (e.g., via
processLogger.warn or Telemetry.warn) that context is missing and include a
placeholder value like "unknown-session" or omit session_id, then call
Telemetry.track only after normalizing the value; update the block around
Telemetry.track/Telemetry.getContext() to perform this check so events are
correlated or flagged when session context is absent.

In `@packages/opencode/src/altimate/tools/warehouse-add.ts`:
- Around line 63-70: The local variable ctx declared in the
PostConnectSuggestions block shadows the execute(args, ctx) parameter; rename
the local variable (e.g., suggestionCtx or postConnectCtx) and update its usage
in the PostConnectSuggestions.getPostConnectSuggestions call so the outer
execute parameter remains accessible; ensure the object type remains
PostConnectSuggestions.SuggestionContext and all references to the old local
name are replaced.

In `@packages/opencode/src/session/prompt.ts`:
- Around line 623-650: The telemetry type declares a third action "reject" but
the plan revision detection only emits "approve" or "refine"; fix by either
removing "reject" from the plan_revision action union in the telemetry type, or
add rejection detection here: in the plan handling block (agent.name === "plan")
expand detection on the last user message (msgs and MessageV2.TextPart) to
identify rejection phrases (e.g., "no", "don't", "stop", "reject", "not good",
"undo", "abort") and compute action = isReject ? "reject" : isApproval ?
"approve" : "refine", making sure reject takes priority over approval, then call
Telemetry.track({ type: "plan_revision", ..., action }) and keep existing
planHasWritten and planRevisionCount behavior; include a brief comment
describing intended semantics if you choose to keep "reject".

In `@packages/opencode/test/altimate/plan-refinement.test.ts`:
- Around line 118-131: The test "plan_revision telemetry includes required
fields" uses a fixed 300-character slice to extract the Telemetry.track block
which can miss fields; update the extraction in the test (variables promptTsPath
and trackBlock, and the search string 'type: "plan_revision"') to capture the
full Telemetry.track object instead of a fixed-length slice—for example, use a
regex that matches from 'type: "plan_revision"' to the closing brace of that
block or locate the start index and then find the matching closing
brace/parenthesis so the subsequent expects reliably validate timestamp,
session_id, revision_number, and action.

---

Nitpick comments:
In `@packages/opencode/src/altimate/tools/project-scan.ts`:
- Around line 509-525: Telemetry names and displayed command names are
inconsistent: the UI shows hyphenated commands (`/dbt-develop`,
`/dbt-troubleshoot`, `/dbt-analyze`) but PostConnectSuggestions.trackSuggestions
currently sends underscored names (`dbt_develop`, etc.); update the call to
PostConnectSuggestions.trackSuggestions in project-scan.ts so suggestionsShown
uses the hyphenated identifiers (e.g. "dbt-develop") to match the lines.push
output, and optionally convert the dynamic import of
"./post-connect-suggestions" to a static import at the top to match the
module-loading style used elsewhere (keep the try/catch behavior if you keep
dynamic import).

In `@packages/opencode/src/altimate/tools/schema-inspect.ts`:
- Around line 29-33: PostConnectSuggestions.trackSuggestions currently sends
suggestionType and suggestionsShown but omits warehouseType; update the call in
schema-inspect (the PostConnectSuggestions.trackSuggestions invocation) to
include warehouseType: args.warehouse when args.warehouse is present (i.e., pass
warehouseType: args.warehouse in the payload or conditionally include it),
ensuring the payload matches other tools that report warehouseType.

In `@packages/opencode/src/altimate/tools/sql-analyze.ts`:
- Around line 32-36: The PostConnectSuggestions.trackSuggestions call in
sql-analyze.ts is missing the warehouseType field causing telemetry
inconsistency; update the invocation of PostConnectSuggestions.trackSuggestions
(the call that currently sends suggestionType and suggestionsShown) to include
warehouseType — either add warehouseType: "unknown" when no context exists or
follow the existing project convention by omitting warehouseType consistently;
ensure you modify the call site in sql-analyze.ts (the
PostConnectSuggestions.trackSuggestions invocation) to match the payload shape
used by sql-execute.ts and schema-index.ts so telemetry consumers see a
consistent key.

In `@packages/opencode/src/altimate/tools/warehouse-add.ts`:
- Around line 72-80: The telemetry builds a separate suggestionsShown array that
duplicates getPostConnectSuggestions logic and can drift; update the code to
derive suggestion identifiers from the canonical source instead: call
getPostConnectSuggestions (or change getPostConnectSuggestions to return an
array of {id, label}) and map its result to the identifier strings, then pass
that mapped array into PostConnectSuggestions.trackSuggestions (instead of the
manually-constructed suggestionsShown); reference the suggestionsShown variable,
getPostConnectSuggestions function, and PostConnectSuggestions.trackSuggestions
when implementing this change.

In `@packages/opencode/test/altimate/plan-refinement.test.ts`:
- Around line 20-52: The tests are brittle because they assert exact substrings
in source files (tests named "plan.txt includes two-step approach instructions",
"plan.txt includes feedback/refinement instructions", and "experimental plan
mode inline prompt includes two-step approach"); change them to validate intent
rather than verbatim text by either: 1) importing the canonical prompt constant
or builder function (e.g., the exported prompt value from prompt.ts or the
function that constructs plan prompts) and asserting the presence of semantic
tokens with case-insensitive regexes or fuzzy matches (e.g., /two-?step/i and
/present.*outline|3-5 bullet/i), or 2) convert to an integration-style test that
runs the plan agent entrypoint with a sample request and asserts the agent
returns an outline-first response pattern instead of string literals; update the
tests to use these softer assertions (replace expect(content).toContain calls)
so legitimate rewording won't break CI.

In `@packages/opencode/test/telemetry/plan-skill-telemetry.test.ts`:
- Around line 40-111: The tests only assert TypeScript types; update each
relevant test (those constructing Telemetry.Event objects in this block) to also
call Telemetry.track(event) to exercise runtime behavior and assert it doesn't
throw or that the telemetry pipeline is invoked; specifically, import or
reference the Telemetry object used in tests and either call expect(() =>
Telemetry.track(event)).not.toThrow() or spy on Telemetry.track (e.g.,
jest.spyOn(Telemetry, "track")) and assert it was called with the constructed
event—apply this to the plan_revision and feature_suggestion tests that build
events.
- Around line 1-3: Remove the top-line "@ts-nocheck" from the test file and fix
any resulting TypeScript errors by aligning the test values to the Telemetry
type definitions (use the Telemetry import and match shapes against
Telemetry.Event union variants); if there are a few unavoidable mismatches,
replace broad suppression with targeted "@ts-expect-error" comments on the
specific offending lines, and re-run type checking (or bun/tsc) to confirm all
Telemetry-related test objects conform to the Telemetry.Event union.

In `@packages/opencode/test/telemetry/telemetry.test.ts`:
- Around line 330-373: Extract the duplicated event type array currently defined
as types in the "naming-convention" test into a single shared constant and use
it in both the "event-types" and "naming-convention" tests; locate the array
assigned to types in telemetry.test.ts and move it to a top-level const (e.g.,
ALL_TELEMETRY_EVENT_TYPES) or a small test helper, then replace both in-test
literal arrays with references to that constant so additions only need to be
made in one place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 2fca76f7-4cd4-47bd-84a4-ba475b49e23b

📥 Commits

Reviewing files that changed from the base of the PR and between 384335d and e5af62e.

📒 Files selected for processing (15)
  • packages/opencode/src/altimate/telemetry/index.ts
  • packages/opencode/src/altimate/tools/post-connect-suggestions.ts
  • packages/opencode/src/altimate/tools/project-scan.ts
  • packages/opencode/src/altimate/tools/schema-index.ts
  • packages/opencode/src/altimate/tools/schema-inspect.ts
  • packages/opencode/src/altimate/tools/sql-analyze.ts
  • packages/opencode/src/altimate/tools/sql-execute.ts
  • packages/opencode/src/altimate/tools/warehouse-add.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/prompt/plan.txt
  • packages/opencode/src/tool/skill.ts
  • packages/opencode/test/altimate/plan-refinement.test.ts
  • packages/opencode/test/altimate/post-connect-suggestions.test.ts
  • packages/opencode/test/telemetry/plan-skill-telemetry.test.ts
  • packages/opencode/test/telemetry/telemetry.test.ts

anandgupta42 and others added 4 commits March 28, 2026 19:17
- Validate `sessionId` before telemetry track in post-connect-suggestions
- Rename shadowed `ctx` to `suggestionCtx` in warehouse-add.ts
- Add rejection detection ("no", "stop", "reject", etc.) to plan revision
  action — telemetry now emits "reject" in addition to "approve"/"refine"
- Add `warehouseType` to `trackSuggestions` in sql-analyze and schema-inspect
  for telemetry payload consistency
- Use hyphenated skill names in telemetry (`dbt-develop` not `dbt_develop`)
  to match user-facing `/dbt-develop` command names
- Derive `suggestionsShown` from `suggestionCtx` to prevent drift
- DRY up event types list in telemetry.test.ts (`ALL_EVENT_TYPES` constant)
- Remove `@ts-nocheck` from plan-skill-telemetry.test.ts
- Add runtime `Telemetry.track()` verification to type-only tests
- Use semantic regex in plan-refinement tests instead of brittle string matches
- Fix trackBlock extraction with generous region window instead of fixed slice
- Remove duplicate regression tests (covered in telemetry.test.ts)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## E2E Integration Tests (41 tests)
- `feature-discovery-e2e.test.ts` (29 tests): Full warehouse-add → suggestions
  flow, progressive disclosure chain, plan refinement session, telemetry
  event validation with mocked Dispatcher
- `performance-regression.test.ts` (12 tests): 1000x suggestion generation
  < 50ms, 10000x progressive lookup < 50ms, 100k phrase detection < 200ms,
  output determinism verification

## UX Gap Fixes
- **Suggestion deduplication**: Progressive hints shown at most once per
  session per tool via `shownProgressiveSuggestions` Set. Running
  `sql_execute` 10 times no longer repeats the same tip.
- **Approval false positives**: "yes, but change X" now correctly classified
  as "refine" not "approve". Added `refinementQualifiers` (" but ",
  " however ", " except ", " change ", etc.) that override approval detection.
  "no" uses `\bno\b` word boundary to avoid matching "know", "notion", etc.
- **Revision cap communication**: When 5 revision cap is hit, a synthetic
  message informs the LLM to tell the user. Telemetry tracks "cap_reached".
- **Warehouse add latency**: Suggestion gathering now uses `Promise.all` +
  `Promise.race` with 1500ms timeout. Slow schema/dbt checks silently
  skipped instead of blocking the response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive user-perspective tests for the plan refinement changes:

## Phrase Classification (42 tests)
- 14 approval cases ("looks good", "proceed", "lgtm", etc.)
- 12 rejection cases ("no", "stop", "reject", etc.)
- 8 pure refinement cases (feedback without approval/rejection signals)
- 8 tricky edge cases:
  - "yes, but change X" → refine (qualifier overrides approval)
  - "no, I mean yes" → reject (rejection takes priority)
  - "I know this looks good" → approve ("know" ≠ "no" via word boundary)
  - Unicode input, multiline, special characters, empty string

## Non-Plan Agent Safety (3 tests)
- Plan variables initialized before loop
- Refinement block guarded by `agent.name === "plan"`
- Plan file detection guarded by agent check

## Session Agent Name Fix (2 tests)
- `sessionAgentName` set before early break conditions
- `agent_outcome` telemetry uses `sessionAgentName`

## Revision Cap (4 tests)
- Cap enforced at >= 5
- Synthetic message communicates limit to LLM
- Telemetry emits "cap_reached"
- Synthetic message doesn't persist to DB

## Adversarial (4 tests)
- 70k char input doesn't crash
- Unicode, special chars, multiline handled

## Import Safety (2 tests)
- post-connect-suggestions is self-contained (no heavy imports)
- Progressive suggestion dedup works correctly

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Merged origin/main which added:
- `has_followups` / `followup_count` fields to `skill_used` telemetry (#546)
- `isRealFailure` and `findings` telemetry in sql-analyze (#545)

Conflict resolution:
- telemetry/index.ts: kept both `trigger` (ours) and `has_followups`/`followup_count` (main)
- tool/skill.ts: kept followup logic (main) + trigger classification (ours)
- sql-analyze.ts: kept findings telemetry (main) + progressive suggestions (ours)
- Updated plan-skill-telemetry.test.ts to include new `has_followups`/`followup_count` fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@gitguardian
Copy link
Copy Markdown

gitguardian bot commented Mar 29, 2026

⚠️ GitGuardian has uncovered 4 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
29254674 Triggered Generic Password a5b4e44 packages/opencode/test/altimate/connections.test.ts View secret
29254673 Triggered Generic Password a5b4e44 packages/opencode/test/altimate/connections.test.ts View secret
29326785 Triggered Generic Password 734d173 packages/opencode/test/altimate/connections.test.ts View secret
29327588 Triggered Generic Password 11bb99d packages/opencode/test/altimate/connections.test.ts View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely. Learn here the best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

To avoid such incidents in the future consider


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opencode/src/tool/skill.ts (1)

150-166: ⚠️ Potential issue | 🟠 Major

Telemetry instrumentation for skill trigger classification is incomplete — ctx.extra.trigger is never populated by callers.

The telemetry code correctly calls Telemetry.classifySkillTrigger(ctx.extra), but the trigger field is never set when ctx.extra is constructed. All skill invocations will be classified as "llm_selected" (the default fallback) because:

  1. In prompt.ts line 978, ctx.extra is set to { model: input.model, bypassAgentCheck: input.bypassAgentCheck } — no trigger field
  2. classifySkillTrigger returns "llm_selected" when extra.trigger is not explicitly set (telemetry/index.ts line 622)

To enable accurate trigger classification for analytics, add trigger to ctx.extra at the point where tool contexts are created — populate it based on how the skill was invoked (e.g., "user_command" for /skill commands, "auto_suggested" for suggestions, "llm_selected" for LLM-initiated calls).

🧹 Nitpick comments (3)
packages/opencode/test/altimate/performance-regression.test.ts (1)

151-227: Static analysis RegExp warnings are false positives here.

The flagged new RegExp(\\b${w}\b`) patterns operate on hardcoded arrays (rejectionWords = ["no"]`), not user-controlled input. Since these tests mirror the production phrase detection logic for benchmarking purposes, and the input patterns are fixed and simple, there's no ReDoS risk.

However, note that pre-compiling the regex outside the loop would be more performant for the actual benchmarks:

💡 Optional optimization
  const rejectionWords = ["no"]
+ const rejectionWordRegexes = rejectionWords.map((w) => new RegExp(`\\b${w}\\b`))

  // Then in the loop:
- const isRejectionWord = rejectionWords.some((w) => new RegExp(`\\b${w}\\b`).test(testText))
+ const isRejectionWord = rejectionWordRegexes.some((re) => re.test(testText))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/performance-regression.test.ts` around lines
151 - 227, The static-analysis warning about using new RegExp(`\\b${w}\\b`) is a
false positive here because rejectionWords (e.g., rejectionWords = ["no"]) is a
hardcoded array with no user-controlled input; keep the current logic but to
address the warning and improve benchmark accuracy, precompile the word regexes
once (e.g., build an array of RegExp from rejectionWords before the loops) and
use those prebuilt regex objects inside the tests (references: rejectionWords,
new RegExp usage, and the loops that compute isRejectionWord).
packages/opencode/src/altimate/telemetry/index.ts (1)

616-624: Minor redundancy in classifySkillTrigger.

Line 621 explicitly checks for extra.trigger === "llm_selected" but the default return on line 622 already returns "llm_selected". The explicit check is unnecessary.

💡 Simplified version
 export function classifySkillTrigger(extra?: { [key: string]: any }): "user_command" | "llm_selected" | "auto_suggested" | "unknown" {
   if (!extra) return "llm_selected"
   if (extra.trigger === "user_command") return "user_command"
   if (extra.trigger === "auto_suggested") return "auto_suggested"
-  if (extra.trigger === "llm_selected") return "llm_selected"
   return "llm_selected"
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/src/altimate/telemetry/index.ts` around lines 616 - 624,
The classifySkillTrigger function contains a redundant explicit check for
extra.trigger === "llm_selected"; remove that branch and simplify the function
to check for user_command and auto_suggested, returning "llm_selected" as the
default when extra is present (and still returning "llm_selected" when extra is
falsy), keeping the function signature and behavior intact; update references to
extra.trigger in classifySkillTrigger accordingly.
packages/opencode/test/altimate/feature-discovery-e2e.test.ts (1)

400-520: Source-code-reading tests are fragile and may break on refactoring.

These tests read prompt.ts and telemetry/index.ts as strings and assert specific code patterns (e.g., expect(content).toContain("let planRevisionCount = 0")). While they verify the implementation exists, they:

  1. Will fail on innocuous refactors (renaming, reformatting, reordering)
  2. Don't verify the code actually works at runtime
  3. Duplicate similar tests in plan-layer-e2e.test.ts

Consider replacing some of these with behavioral tests that exercise the actual code paths, or consolidating the static checks into a single test file to reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/altimate/feature-discovery-e2e.test.ts` around lines
400 - 520, The tests in plan refinement e2e read prompt.ts as raw text (e.g.,
checks for "let planRevisionCount = 0", "planRevisionCount++", and agent name
guards) which is fragile; update these tests to be behavioral instead: import
the runtime module(s) that implement the plan logic (or the agent entrypoint
that uses prompt.ts) and write assertions that exercise the actual code paths
(simulate a plan agent run, trigger a refinement/approval/rejection, verify the
revision counter increments and the plan guard prevents double-detection), or if
the logic is not exported, refactor prompt logic into an exported function/class
(e.g., expose plan revision state and detectAction behavior) and test those
APIs; additionally, remove or consolidate duplicate static-string checks
currently duplicated in plan-layer-e2e.test.ts into a single static test file if
you must keep any source-text assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/opencode/src/altimate/telemetry/index.ts`:
- Around line 616-624: The classifySkillTrigger function contains a redundant
explicit check for extra.trigger === "llm_selected"; remove that branch and
simplify the function to check for user_command and auto_suggested, returning
"llm_selected" as the default when extra is present (and still returning
"llm_selected" when extra is falsy), keeping the function signature and behavior
intact; update references to extra.trigger in classifySkillTrigger accordingly.

In `@packages/opencode/test/altimate/feature-discovery-e2e.test.ts`:
- Around line 400-520: The tests in plan refinement e2e read prompt.ts as raw
text (e.g., checks for "let planRevisionCount = 0", "planRevisionCount++", and
agent name guards) which is fragile; update these tests to be behavioral
instead: import the runtime module(s) that implement the plan logic (or the
agent entrypoint that uses prompt.ts) and write assertions that exercise the
actual code paths (simulate a plan agent run, trigger a
refinement/approval/rejection, verify the revision counter increments and the
plan guard prevents double-detection), or if the logic is not exported, refactor
prompt logic into an exported function/class (e.g., expose plan revision state
and detectAction behavior) and test those APIs; additionally, remove or
consolidate duplicate static-string checks currently duplicated in
plan-layer-e2e.test.ts into a single static test file if you must keep any
source-text assertions.

In `@packages/opencode/test/altimate/performance-regression.test.ts`:
- Around line 151-227: The static-analysis warning about using new
RegExp(`\\b${w}\\b`) is a false positive here because rejectionWords (e.g.,
rejectionWords = ["no"]) is a hardcoded array with no user-controlled input;
keep the current logic but to address the warning and improve benchmark
accuracy, precompile the word regexes once (e.g., build an array of RegExp from
rejectionWords before the loops) and use those prebuilt regex objects inside the
tests (references: rejectionWords, new RegExp usage, and the loops that compute
isRejectionWord).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cd620261-1b68-4d91-b8cd-d8c80d4565ca

📥 Commits

Reviewing files that changed from the base of the PR and between e5af62e and a5b4e44.

📒 Files selected for processing (15)
  • packages/opencode/src/altimate/telemetry/index.ts
  • packages/opencode/src/altimate/tools/post-connect-suggestions.ts
  • packages/opencode/src/altimate/tools/project-scan.ts
  • packages/opencode/src/altimate/tools/schema-inspect.ts
  • packages/opencode/src/altimate/tools/sql-analyze.ts
  • packages/opencode/src/altimate/tools/sql-execute.ts
  • packages/opencode/src/altimate/tools/warehouse-add.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/tool/skill.ts
  • packages/opencode/test/altimate/feature-discovery-e2e.test.ts
  • packages/opencode/test/altimate/performance-regression.test.ts
  • packages/opencode/test/altimate/plan-refinement.test.ts
  • packages/opencode/test/session/plan-layer-e2e.test.ts
  • packages/opencode/test/telemetry/plan-skill-telemetry.test.ts
  • packages/opencode/test/telemetry/telemetry.test.ts
✅ Files skipped from review due to trivial changes (2)
  • packages/opencode/test/telemetry/telemetry.test.ts
  • packages/opencode/src/altimate/tools/sql-analyze.ts
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/opencode/src/altimate/tools/sql-execute.ts
  • packages/opencode/src/altimate/tools/schema-inspect.ts
  • packages/opencode/src/altimate/tools/warehouse-add.ts
  • packages/opencode/test/telemetry/plan-skill-telemetry.test.ts
  • packages/opencode/test/altimate/plan-refinement.test.ts
  • packages/opencode/src/altimate/tools/post-connect-suggestions.ts
  • packages/opencode/src/altimate/tools/project-scan.ts

anandgupta42 and others added 2 commits March 28, 2026 20:43
The mongodb driver (#482) was added to optionalDependencies only,
so `tsgo --noEmit` failed with TS2307 when checking the drivers
package. Adding it to devDependencies ensures types are available
during CI typecheck without making it a runtime requirement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
})
const isRejection = isRejectionPhrase || isRejectionWord
const isApproval = !isRejection && !hasRefinementQualifier && approvalPhrases.some((phrase) => userText.includes(phrase))
const action = isRejection ? "reject" : isApproval ? "approve" : "refine"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question
Not used this mode in altimate-code - But the rejection / refine makes it continue in plan mode. Approval allows it to exit to builder / whichever mode it was?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question — right now all three actions (approve/reject/refine) are telemetry-only. The plan agent continues in plan mode regardless of classification. The session doesn't auto-switch to builder on approval.

This is intentional for this PR — the classification gives us data to understand user intent (e.g., how often users approve vs iterate), which we'll use to design the auto-switch behavior in a follow-up (partially addresses #355).

In a future PR we can use action === "approve" to trigger an actual mode transition to builder with the plan as context.

})
}
} catch {
// Suggestions must never break the add flow
Copy link
Copy Markdown
Contributor

@suryaiyer95 suryaiyer95 Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion
Can add telemetry on these failures?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — added core_failure telemetry with error_class: "suggestion_failure" in the catch block. This way we can track how often suggestions fail and what errors occur, without breaking the warehouse add flow.

Pushed in the latest commit.

Copy link
Copy Markdown
Contributor

@suryaiyer95 suryaiyer95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments o/w looks good.

Is it possible to ask for feedback as a form instead of text matching?

  1. Approved
  2. Refine with feedback
  3. Decline with Feedback?

Since warehouse_add is going to be rare / infrequent. We could keep 1.5 seconds or increase it to slightly higher.

Per Surya's review — the catch block in warehouse-add now emits a
`core_failure` event with `error_class: "internal"` and
`input_signature: "post_connect_suggestions"` so we can monitor
how often post-connect suggestions fail and why.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM If we also could see the full plan as md file would also be great, that is easier to read than on CLI in IDE.

@sai-charan-c
Copy link
Copy Markdown

LGTM If we also could see the full plan as md file would also be great, that is easier to read than on CLI in IDE.

Agree with Michiel. For scenarios like when the plan is too complex the MD file would be a good approach

Exercises the real code paths with realistic inputs:

- 84 phrase classification scenarios: approval (20), rejection (24),
  refinement (16), qualifier overrides (12), word boundary (12)
- 13 warehouse suggestion configs: 8 warehouse types x indexed/dbt/multi
- 8 progressive disclosure chains: full progression, dedup, interleaved
- 6 revision cap simulations: mixed actions, all rejections, cap behavior
- 5 performance stress: 10k classifications < 500ms, determinism checks
- 10 adversarial: unicode lookalikes, 50KB, SQL injection, null bytes

All 125 pass in 255ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anandgupta42
Copy link
Copy Markdown
Contributor Author

Good point — the plan agent already writes to .opencode/plans/<timestamp>-<slug>.md (see Session.plan() in session/index.ts:340-345). The MD file is created when the plan is finalized.

What's new in this PR is the two-step outline approach: the agent first presents a brief 3-5 bullet outline in the chat, gets user confirmation, then writes the full detailed plan to the MD file. So you get both:

  1. Quick outline in CLI for fast feedback
  2. Full plan as a readable MD file for complex scenarios

If you want to open the MD file directly, you can find it at .opencode/plans/ in your project root. We could consider adding a /plan --open command to auto-open it in your editor — good follow-up item.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
packages/opencode/test/session/simulation-100-scenarios.test.ts (2)

606-616: Misleading test name: this isn't testing true concurrency.

JavaScript/Bun is single-threaded, so this test runs sequential interleaved calls rather than concurrent ones. The Set.clear() and Set.add() operations can't actually race in this environment. Consider renaming to something like "interleaved reset + read doesn't crash" to accurately describe what's being tested.

Also, expect(true).toBe(true) on line 615 is effectively a no-op. If the goal is just to verify no exception was thrown, you could remove the assertion entirely or add a comment explaining the intent.

Proposed rename
-  test("concurrent reset + read doesn't crash", () => {
-    // Simulate race condition
+  test("interleaved reset + read doesn't crash", () => {
+    // Sequential interleaved operations (not true concurrency in single-threaded JS)
     for (let i = 0; i < 100; i++) {
       PostConnectSuggestions.resetShownSuggestions()
       PostConnectSuggestions.getProgressiveSuggestion("sql_execute")
       PostConnectSuggestions.resetShownSuggestions()
       PostConnectSuggestions.getProgressiveSuggestion("sql_execute")
     }
-    // If we got here, no crash
-    expect(true).toBe(true)
+    // No assertion needed - test passes if no exception thrown
   })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/simulation-100-scenarios.test.ts` around lines
606 - 616, The test name is misleading because it does not run real concurrency;
rename the test from "concurrent reset + read doesn't crash" to something like
"interleaved reset + read doesn't crash" (locate the jest test definition in
simulation-100-scenarios.test.ts) and replace or remove the no-op assertion
expect(true).toBe(true); — either remove the assertion entirely and keep a
comment stating the test ensures no exception was thrown, or add a meaningful
assertion that verifies PostConnectSuggestions state/behavior after the
interleaved calls (use PostConnectSuggestions.resetShownSuggestions and
PostConnectSuggestions.getProgressiveSuggestion references to locate relevant
calls).

17-35: Consider importing the classifier from production code instead of reimplementing it.

The test reimplements classifyPlanAction logic from prompt.ts (lines 663-683). If the production implementation changes, these tests could pass while production behavior diverges. Consider either:

  1. Exporting classifyPlanAction from production and importing it here, or
  2. Adding a comment explaining why duplication is intentional (e.g., testing the algorithm in isolation)

Regarding the static analysis ReDoS warning on line 28: This is a false positive in this context since rejectionWords only contains ["no"], a static controlled value with no special regex characters. However, if the array were ever extended with user-controlled or complex patterns, this could become a concern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/simulation-100-scenarios.test.ts` around lines
17 - 35, The test reimplements the classifyPlanAction logic from production
which can drift; either export the production function and import it into this
test (replace the local classifyPlanAction with the imported function) or add a
brief comment above the local implementation stating duplication is intentional
and must stay in sync with the production classifyPlanAction; additionally, to
silence the static ReDoS concern in the rejectionWords-to-RegExp usage, replace
dynamic RegExp construction with a safe word-boundary check using a fixed,
escaped token or use a simple split/array match for "no" (i.e., avoid building
RegExp from potentially variable input) so the test remains safe if the word
list changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/opencode/test/session/simulation-100-scenarios.test.ts`:
- Around line 606-616: The test name is misleading because it does not run real
concurrency; rename the test from "concurrent reset + read doesn't crash" to
something like "interleaved reset + read doesn't crash" (locate the jest test
definition in simulation-100-scenarios.test.ts) and replace or remove the no-op
assertion expect(true).toBe(true); — either remove the assertion entirely and
keep a comment stating the test ensures no exception was thrown, or add a
meaningful assertion that verifies PostConnectSuggestions state/behavior after
the interleaved calls (use PostConnectSuggestions.resetShownSuggestions and
PostConnectSuggestions.getProgressiveSuggestion references to locate relevant
calls).
- Around line 17-35: The test reimplements the classifyPlanAction logic from
production which can drift; either export the production function and import it
into this test (replace the local classifyPlanAction with the imported function)
or add a brief comment above the local implementation stating duplication is
intentional and must stay in sync with the production classifyPlanAction;
additionally, to silence the static ReDoS concern in the
rejectionWords-to-RegExp usage, replace dynamic RegExp construction with a safe
word-boundary check using a fixed, escaped token or use a simple split/array
match for "no" (i.e., avoid building RegExp from potentially variable input) so
the test remains safe if the word list changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 17724b1a-10fb-469a-8b30-0ce8d77ed402

📥 Commits

Reviewing files that changed from the base of the PR and between 17f4a19 and 3b78d42.

📒 Files selected for processing (1)
  • packages/opencode/test/session/simulation-100-scenarios.test.ts

@sai-charan-c
Copy link
Copy Markdown

Plan refinement loop: users can edit/iterate instead of restarting (capped at 5 revisions)
The 5-revision cap — does it apply to major revisions only, or both major and minor plan revisions? Is the capped limits per day or session?

Spawns actual tool execute() functions (not mocked) with registered
Dispatcher handlers to simulate real user tool invocations:

Warehouse Add (18 scenarios):
- 8 warehouse types with post-connect suggestions
- Schema indexed/not-indexed variations
- Multi-warehouse data_diff suggestion
- Failure modes: add fails, throws, missing type
- Resilience: schema.cache_status fails, warehouse.list fails
- Timeout: slow dispatcher (3s) races against 1.5s timeout

SQL Execute (6 scenarios):
- First call gets suggestion, subsequent calls deduped
- 10 consecutive calls — only first has hint
- Failure and empty result handling
- Blocked query (DROP DATABASE) throws

SQL Analyze (4 scenarios):
- First call suggests schema_inspect, second deduped
- Parse error and analyzer failure handling

Schema Inspect (3 scenarios):
- First call suggests lineage_check, second deduped
- Failure handling

Schema Index (3 scenarios):
- Lists all capabilities on first call
- Dedup on second, failure handling

Full User Journeys (4 scenarios):
- Complete 5-tool chain: warehouse_add → schema_index → sql_execute → sql_analyze → schema_inspect
- 20 repeated queries with dedup verification
- Interleaved tool calls with independent dedup
- All dispatchers failing — warehouse add still succeeds

Performance (2 scenarios):
- Warehouse add < 500ms with fast dispatchers
- 50 consecutive sql_execute < 2s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anandgupta42
Copy link
Copy Markdown
Contributor Author

Good question — the 5-revision cap is per session, not per day. It counts every user message after the initial plan is written, regardless of whether it's a major restructuring or a minor tweak.

The counter (planRevisionCount) resets when a new session starts. So a user can:

  • Session 1: refine plan 5 times → cap reached → start new session
  • Session 2: get 5 fresh revisions

When the cap is hit, the LLM is told via a synthetic system message to inform the user and suggest either finalizing the plan or starting a new planning session. The cap_reached action is also tracked in telemetry so we can see how often users hit it and adjust the limit if needed.

If 5 turns out to be too restrictive based on telemetry data, we can make it configurable later.

@anandgupta42
Copy link
Copy Markdown
Contributor Author

CI Status

TypeScript CI fails with 112 test failures — all pre-existing, none from this PR.

The failures are cross-file Telemetry module loading conflicts (e.g. telemetry.computeInputSignature, telemetry.maskArgs, tool finding extraction patterns). These same 112 tests also fail on main (run 23701050319, run 23700842606).

Our test results in CI:

  • Pass count increased from 5583 → 5623 (+40 new tests from this PR)
  • Zero of our 413 tests appear in the failure list
  • Marker Guard: ✅ pass
  • check-compliance: ✅ pass
  • check-standards: ✅ pass

GitGuardian: False positive — flagged passwords in connections.test.ts are test fixtures (test-placeholder, 123 as any, undefined). These came from the merge with main, not our changes.

…events

- agent-modes.md: Expanded Plan section with two-step workflow (outline
  then expand), refinement loop (approve/refine/reject), 5-revision cap,
  and example conversation
- telemetry.md: Added plan_revision and feature_suggestion events,
  updated skill_used with trigger field
- warehouses.md: Added "Post-Connection Suggestions" section covering
  progressive disclosure chain and once-per-session dedup
- getting-started.md: Added feature suggestions mention after /discover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
packages/opencode/test/session/real-tool-simulation.test.ts (3)

248-255: Consider using Bun's native rejection assertion.

The expect(true).toBe(false) pattern for unreachable code works but is less idiomatic than using expect().rejects:

♻️ Suggested refactor using `.rejects.toThrow()`
-  test("S24: blocked query (DROP DATABASE) throws, no suggestion", async () => {
-    try {
-      await execSqlExecute("DROP DATABASE production")
-      expect(true).toBe(false) // Should not reach here
-    } catch (e: any) {
-      expect(e.message).toContain("blocked")
-    }
-  })
+  test("S24: blocked query (DROP DATABASE) throws, no suggestion", async () => {
+    await expect(execSqlExecute("DROP DATABASE production")).rejects.toThrow("blocked")
+  })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/real-tool-simulation.test.ts` around lines 248
- 255, Replace the manual try/catch with a native Jest/Bun rejection assertion:
instead of calling execSqlExecute("DROP DATABASE production") inside try/catch
and asserting expect(true).toBe(false) in the unreachable branch, use await
expect(execSqlExecute("DROP DATABASE production")).rejects.toThrow(/blocked/);
reference the test "S24: blocked query (DROP DATABASE) throws, no suggestion"
and the execSqlExecute function to locate the code to change.

292-303: Comment seems contradictory but test logic is correct.

The test title says "no suggestion" but lines 301-302 explain the actual behavior: parse errors returned in the payload (not thrown) still get the progressive suggestion on first call. The assertion on line 302 confirms this. Consider rewording the test name for clarity:

✏️ Suggested test name clarification
-  test("S27: sql_analyze with parse error — no suggestion", async () => {
+  test("S27: sql_analyze with parse error in payload — still gets first-call suggestion", async () => {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/real-tool-simulation.test.ts` around lines 292
- 303, The test title is misleading: change the test description string in the
test call so it reflects that parse errors returned in the payload still produce
a suggestion on the first call; update the "S27: sql_analyze with parse error —
no suggestion" test name to something like "S27: sql_analyze with parse error —
returns parse error but provides suggestion on first call" (the test block
surrounding Dispatcher.register(...), execSqlAnalyze("SELCT * FORM users") and
the expect assertions should remain unchanged).

572-572: Scenario count comment is slightly off.

The comment says "25 (warehouse)" but the warehouse section has ~18 numbered scenarios plus 8 parameterized type variations. This is a minor documentation nit.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opencode/test/session/real-tool-simulation.test.ts` at line 572, The
summary comment "Total scenarios: 25 (warehouse) + 15 (sql_execute) + 10
(sql_analyze) + 10 (schema_inspect) + 10 (schema_index) + 4 (journeys) + 2
(perf) ≈ 100+" is inaccurate for the warehouse block; update the comment string
"Total scenarios: 25 (warehouse)..." to use the correct warehouse count (18
numbered scenarios + 8 parameterized variations = 26) and recalc the total to
"26 (warehouse) + 15 (sql_execute) + 10 (sql_analyze) + 10 (schema_inspect) + 10
(schema_index) + 4 (journeys) + 2 (perf) = 77" (or similar accurate phrasing) so
the comment matches the actual scenario counts.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/docs/data-engineering/agent-modes.md`:
- Line 166: Add a language tag to the fenced code block that starts with the
lines "You: Plan a migration of our raw_events table from a view to an
incremental model" so markdownlint rule MD040 is satisfied; change the opening
fence from ``` to ```text (or another appropriate language like ```bash) so the
block is annotated, leaving the closing ``` as-is and not altering the block
contents.

---

Nitpick comments:
In `@packages/opencode/test/session/real-tool-simulation.test.ts`:
- Around line 248-255: Replace the manual try/catch with a native Jest/Bun
rejection assertion: instead of calling execSqlExecute("DROP DATABASE
production") inside try/catch and asserting expect(true).toBe(false) in the
unreachable branch, use await expect(execSqlExecute("DROP DATABASE
production")).rejects.toThrow(/blocked/); reference the test "S24: blocked query
(DROP DATABASE) throws, no suggestion" and the execSqlExecute function to locate
the code to change.
- Around line 292-303: The test title is misleading: change the test description
string in the test call so it reflects that parse errors returned in the payload
still produce a suggestion on the first call; update the "S27: sql_analyze with
parse error — no suggestion" test name to something like "S27: sql_analyze with
parse error — returns parse error but provides suggestion on first call" (the
test block surrounding Dispatcher.register(...), execSqlAnalyze("SELCT * FORM
users") and the expect assertions should remain unchanged).
- Line 572: The summary comment "Total scenarios: 25 (warehouse) + 15
(sql_execute) + 10 (sql_analyze) + 10 (schema_inspect) + 10 (schema_index) + 4
(journeys) + 2 (perf) ≈ 100+" is inaccurate for the warehouse block; update the
comment string "Total scenarios: 25 (warehouse)..." to use the correct warehouse
count (18 numbered scenarios + 8 parameterized variations = 26) and recalc the
total to "26 (warehouse) + 15 (sql_execute) + 10 (sql_analyze) + 10
(schema_inspect) + 10 (schema_index) + 4 (journeys) + 2 (perf) = 77" (or similar
accurate phrasing) so the comment matches the actual scenario counts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c8c8bbfc-ecb7-4eb6-948e-4cb3bbcce1b7

📥 Commits

Reviewing files that changed from the base of the PR and between 3b78d42 and 3f5fcb3.

📒 Files selected for processing (5)
  • docs/docs/configure/warehouses.md
  • docs/docs/data-engineering/agent-modes.md
  • docs/docs/getting-started.md
  • docs/docs/reference/telemetry.md
  • packages/opencode/test/session/real-tool-simulation.test.ts
✅ Files skipped from review due to trivial changes (2)
  • docs/docs/configure/warehouses.md
  • docs/docs/reference/telemetry.md


### Example conversation

```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a language tag to the fenced example block.

The code fence at Line 166 is missing a language identifier (MD040). Please annotate it (for example, ```text) to satisfy markdownlint consistently.

Suggested fix
-```
+```text
 You: Plan a migration of our raw_events table from a view to an incremental model
 ...
-```
+```
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 166-166: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/docs/data-engineering/agent-modes.md` at line 166, Add a language tag to
the fenced code block that starts with the lines "You: Plan a migration of our
raw_events table from a view to an incremental model" so markdownlint rule MD040
is satisfied; change the opening fence from ``` to ```text (or another
appropriate language like ```bash) so the block is annotated, leaving the
closing ``` as-is and not altering the block contents.

anandgupta42 and others added 4 commits March 28, 2026 22:34
…ution

Root cause of 112 CI failures: three test files used Bun's
mock.module() to replace the Telemetry module with a partial mock.
mock.module() is process-global in Bun — it persisted across ALL
test files, causing Telemetry functions (categorizeToolName,
classifyError, bucketCount, computeInputSignature, maskArgs, etc.)
to be undefined for any test file loaded after the mock.

Fix: replaced mock.module() with spyOn() + afterEach(mock.restore())
in post-connect-suggestions.test.ts, performance-regression.test.ts,
and feature-discovery-e2e.test.ts. spyOn is per-function and properly
cleaned up between tests.

Result: 5740 pass, 0 fail (was 112 fail) across 285 files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- post-connect-suggestions.test.ts: add resetShownSuggestions() in
  beforeEach to prevent cross-file dedup state from causing the
  "after sql_execute suggests sql_analyze" test to fail in CI
- connections.test.ts: replace "secret", "pw123", "ssh-pw",
  "access-token-123" with obviously fake test values to resolve
  GitGuardian false positives

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace remaining flagged values: "passphrase", "oauth-token",
"client-secret", "my_secret", "my-passphrase", dapi_secret with
obviously fake test-prefixed placeholders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anandgupta42 anandgupta42 merged commit c5e9cfa into main Mar 29, 2026
9 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Auto-generate plan before multi-step execution

4 participants