fix: defensive null guards in tool formatters and DuckDB concurrent access retry (#571)
Conversation
📝 Walkthrough

This PR adds null-safe fallbacks across eight Altimate tools to avoid literal "undefined" output, captures and surfaces error messages in tool metadata, hardens the DuckDB driver with lock detection and a READ_ONLY retry for file databases, and introduces a large simulation test suite plus an E2E simulation harness.
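The walkthrough's "null-safe fallbacks" all follow one shape: coalesce an optional count to the corresponding array length before rendering. A minimal sketch with a hypothetical `ManifestResult` type (illustrative, not the actual tool code):

```typescript
// Hypothetical result shape; the real tools define richer types.
interface ManifestResult {
  model_count?: number
  models?: { name: string }[]
}

// Prefer the explicit count, fall back to the array length, then to 0,
// so the title never renders a literal "undefined".
export function manifestTitle(result: ManifestResult): string {
  const modelCount = result.model_count ?? result.models?.length ?? 0
  return `Manifest: ${modelCount} models`
}
```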
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Driver as DuckDB Driver
    participant Engine as DuckDB Engine
    participant DB as Database File
    rect rgba(100,150,200,0.5)
        Note over Client,Engine: Initial connect attempt
        Client->>Driver: connect(dbPath)
        Driver->>Engine: create/open database (default mode)
        Engine->>DB: try acquire write lock
        alt Lock acquired
            Engine-->>Driver: success
            Driver-->>Client: connected
        else Locked / SQLITE_BUSY
            Engine-->>Driver: error ("SQLITE_BUSY" / "locked")
            Driver->>Driver: wrapDuckDBError -> mark DUCKDB_LOCKED
        end
    end
    rect rgba(200,150,100,0.5)
        Note over Driver,DB: Fallback for non-:memory: DB
        alt dbPath != ":memory:"
            Driver->>Engine: reconnect in READ_ONLY mode
            Engine->>DB: acquire read lock
            alt Read lock acquired
                Engine-->>Driver: success (read-only)
                Driver-->>Client: connected (READ_ONLY)
            else Still locked or other error
                Engine-->>Driver: error
                Driver-->>Client: throw standardized lock error
            end
        else :memory: database
            Driver-->>Client: rethrow original lock error
        end
    end
```
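The retry flow in the diagram can be sketched as a connect helper; `tryConnect` and `isLockError` are illustrative stand-ins for the driver internals referenced by the review, not the actual implementation:

```typescript
// Sketch of the lock-detection + READ_ONLY fallback flow shown above.
function isLockError(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err)
  return /locked|SQLITE_BUSY/i.test(msg)
}

async function connectWithFallback<T>(
  dbPath: string,
  tryConnect: (mode: "READ_WRITE" | "READ_ONLY") => Promise<T>,
): Promise<T> {
  try {
    return await tryConnect("READ_WRITE")
  } catch (err) {
    // :memory: databases cannot be shared, so rethrow the original error.
    if (!isLockError(err) || dbPath === ":memory:") throw err
    try {
      return await tryConnect("READ_ONLY")
    } catch (retryErr) {
      // Only surface the friendly lock message when the retry also hit a
      // lock; otherwise preserve the real failure.
      if (isLockError(retryErr)) {
        throw new Error(`Database ${dbPath} is locked by another process (read-write and read-only attempts both failed)`)
      }
      throw retryErr
    }
  }
}
```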
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
✅ GitGuardian: there are no secrets present in this pull request anymore.
Force-pushed from 4448833 to 71d2ab0
Actionable comments posted: 5
Caution: some comments are outside the diff and can't be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
packages/opencode/src/altimate/tools/dbt-manifest.ts (1)
17-24: ⚠️ Potential issue | 🟡 Minor: Use the returned array lengths before falling back to `0`.

If `dbt.manifest` omits `model_count` or `source_count` but still returns `models`/`sources`, the updated title and summary now say `0` while the body lists entries below. Reusing the array lengths keeps the header, metadata, and body consistent.

Suggested fix
```diff
 async execute(args, ctx) {
   try {
     const result = await Dispatcher.call("dbt.manifest", { path: args.path })
+    const modelCount = result.model_count ?? result.models?.length ?? 0
+    const sourceCount = result.source_count ?? result.sources?.length ?? 0
     return {
-      title: `Manifest: ${result.model_count ?? 0} models, ${result.source_count ?? 0} sources`,
+      title: `Manifest: ${modelCount} models, ${sourceCount} sources`,
       metadata: {
-        model_count: result.model_count,
-        source_count: result.source_count,
+        model_count: modelCount,
+        source_count: sourceCount,
         test_count: result.test_count,
         snapshot_count: result.snapshot_count,
         seed_count: result.seed_count,
       },
       output: formatManifest(result),
@@
 function formatManifest(result: DbtManifestResult): string {
   const lines: string[] = []
+  const models = result.models ?? []
+  const sources = result.sources ?? []
+  const modelCount = result.model_count ?? models.length
+  const sourceCount = result.source_count ?? sources.length
   lines.push("=== Project Summary ===")
-  lines.push(`Models: ${result.model_count ?? 0}`)
-  lines.push(`Sources: ${result.source_count ?? 0}`)
+  lines.push(`Models: ${modelCount}`)
+  lines.push(`Sources: ${sourceCount}`)
   lines.push(`Tests: ${result.test_count ?? 0}`)
   lines.push(`Snapshots: ${result.snapshot_count ?? 0}`)
   lines.push(`Seeds: ${result.seed_count ?? 0}`)
-
-  const models = result.models ?? []
-  const sources = result.sources ?? []
```

Also applies to: 42-49, 51-72
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/tools/dbt-manifest.ts` around lines 17 - 24, The title and metadata currently use result.model_count and result.source_count (and similarly test_count, snapshot_count, seed_count) which can be undefined even when result.models/result.sources/result.tests/result.snapshots/result.seeds arrays exist; update the title and metadata to use the array lengths as fallbacks (e.g., use (result.model_count ?? result.models?.length ?? 0) and (result.source_count ?? result.sources?.length ?? 0), and likewise for test_count/snapshot_count/seed_count using result.tests?.length, result.snapshots?.length, result.seeds?.length) so the header, metadata, and body remain consistent; apply this change wherever the code constructs the title string and the metadata object in dbt-manifest.ts.

packages/opencode/src/altimate/tools/sql-analyze.ts (1)
42-49: ⚠️ Potential issue | 🟡 Minor: Reuse the normalized count and confidence in the header and metadata.

`formatAnalysis()` already falls back to `issues.length`, but the title still falls back to `0` and metadata keeps the raw nullable values. When `issue_count` or `confidence` is omitted, the header/metadata can disagree with the body.

Suggested fix
```diff
 // The handler returns success=true when analysis completes (issues are
 // reported via issues/issue_count). Only treat it as a failure when
 // there's an actual error (e.g. parse failure).
 const isRealFailure = !!result.error
+const issueCount = result.issue_count ?? (result.issues ?? []).length
+const confidence = result.confidence ?? "unknown"
 // altimate_change start — sql quality findings for telemetry
 const findings: Telemetry.Finding[] = (result.issues ?? []).map((issue) => ({
   category: issue.rule ?? issue.type ?? "analysis_issue",
 }))
@@
 return {
-  title: `Analyze: ${result.error ? "ERROR" : `${result.issue_count ?? 0} issue${(result.issue_count ?? 0) !== 1 ? "s" : ""}`} [${result.confidence ?? "unknown"}]`,
+  title: `Analyze: ${result.error ? "ERROR" : `${issueCount} issue${issueCount !== 1 ? "s" : ""}`} [${confidence}]`,
   metadata: {
     success: !isRealFailure,
-    issueCount: result.issue_count,
-    confidence: result.confidence,
+    issueCount,
+    confidence,
     dialect: args.dialect,
     has_schema: hasSchema,
     ...(result.error && { error: result.error }),
     ...(findings.length > 0 && { findings }),
```

Also applies to: 78-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/src/altimate/tools/sql-analyze.ts` around lines 42 - 49, formatAnalysis() already normalizes missing values (e.g. falling back to issues.length), but the title and metadata still use raw nullable fields (result.issue_count and result.confidence) so they can disagree with the body; update the header construction and metadata to reuse the normalized values returned by formatAnalysis (e.g. compute normalizedIssueCount and normalizedConfidence from the formatAnalysis output or its normalized properties) and use those variables in the title string and metadata.issueCount/metadata.confidence (also apply the same change to the second occurrence around the block that builds the alternate title/metadata at the later lines).
🧹 Nitpick comments (4)
test/simulation/run-e2e-simulations.sh (2)
393-393: Unused `warehouses` array.

The `warehouses` array is declared but never referenced. Line 460 uses a different inline list for schema operations. Either remove this declaration or use it in the bulk loop.

Option 1: Remove unused variable
```diff
-local warehouses=(green_taxi_duckdb nyc_taxi_duckdb jaffle_shop_dev github_artifacts molecular_database mb_sales_db snowflake_test)
```

Option 2: Use the variable

```diff
-for wh in green_taxi_duckdb jaffle_shop_dev github_artifacts mb_sales_db; do
+for wh in "${warehouses[@]:0:4}"; do  # Use first 4 from warehouses array
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/simulation/run-e2e-simulations.sh` at line 393, The local variable "warehouses" is declared but never used; either remove the unused declaration or make the bulk schema-ops loop reference it. Fix option A: delete the line defining local warehouses=(...) to remove dead code. Fix option B: replace the inline list used in the schema operations loop with the array expansion "${warehouses[@]}" so the loop iterates the declared warehouses variable (update any loop header or usages that currently hard-code the list). Ensure quoting/expansion is correct for bash arrays.
510-512: Shellcheck SC2155: Declare and assign separately to preserve exit status.

While the `|| echo "0"` fallbacks mitigate most issues, separating declaration from assignment is cleaner.

Example fix for line 511

```diff
-local session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' "$RESULTS_DIR/results.csv" 2>/dev/null | sort -u)
+local session_ids
+session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' "$RESULTS_DIR/results.csv" 2>/dev/null | sort -u)
```

Apply the same pattern to lines 526, 534, 545, and 592.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/simulation/run-e2e-simulations.sh` around lines 510 - 512, The command-substitution assignments using local with fallback (e.g., local session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' "$RESULTS_DIR/results.csv" 2>/dev/null | sort -u)) should be split: declare the local variable first (local session_ids) and then assign it on the next line with the command substitution and fallback (session_ids=$(grep ... || echo "0")), preserving the original fallback behavior; apply this same declare-then-assign pattern to the other similar locals in the script (the other command-substitution locals referenced in the review) so each preserves the exit status and follows ShellCheck SC2155 guidance.

packages/opencode/test/altimate/simulation-suite.test.ts (2)
1353-1356: Remove dead increment/decrement code.

The `totalScenarios++` followed immediately by `totalScenarios--` is a no-op. The comment claims it "adjusts" for `recordResult` already counting, but the net effect is zero. This appears to be leftover debugging code.

Proposed fix

```diff
 const durationMs = performance.now() - start
 recordResult("error_handling", `${tool.name}_exception_${exception || "empty"}`, tool.name, status, durationMs, { error: errorMsg })
-totalScenarios++ // Adjust since recordResult already counted
-totalScenarios-- // Undo double-count
 expect(status).not.toBe("error")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/simulation-suite.test.ts` around lines 1353 - 1356, Remove the dead increment/decrement around totalScenarios in the error-handling test: the consecutive totalScenarios++ and totalScenarios-- are a no-op and should be deleted so the test only calls recordResult("error_handling", ...) and then asserts expect(status).not.toBe("error"); update the block around the recordResult call (referencing recordResult and totalScenarios) to eliminate those two lines and leave the correct counting behavior to recordResult.
629-647: Mock response structure may not match actual dispatcher output.

The mocks use `{ success: true, data: { columns, row_count } }` but `SchemaInspectTool` accesses `result.table` and `result.columns` directly without unwrapping `.data`. The defensive guards (`?? []`, `?? args.table`) make tests pass, but this validates the guards rather than normal operation.

Consider adding a few tests with the expected structure to verify happy-path behavior:

```typescript
// Example: happy-path mock matching expected dispatcher shape
mockResponse: {
  table: "users",
  schema_name: "public",
  columns: [{ name: "id", data_type: "BIGINT", nullable: false, primary_key: true }],
  row_count: 10000,
}
```

This isn't blocking since the current structure effectively stress-tests the defensive guards added in this PR.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/opencode/test/altimate/simulation-suite.test.ts` around lines 629 - 647, The test's mockResponse uses { success: true, data: ... } but SchemaInspectTool expects the dispatcher shape with top-level fields like table, schema_name, columns, and row_count; update the runToolScenario call in simulation-suite.test.ts to include at least one happy-path mockResponse matching that shape (e.g. top-level table, schema_name, columns: [{ name, data_type, nullable, primary_key }], row_count) so the tool exercises the normal unwrapped code path (leave the existing defensive-case mock as another test or adjust assertions to verify both shapes); reference the runToolScenario invocation and SchemaInspectTool behavior when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/drivers/src/duckdb.ts`:
- Around line 21-31: The lock-error wrapping implemented in wrapDuckDBError
isn't applied to the parameterized path used by queryWithParams (which backs
execute(..., binds), listTables(), and describeTable()), so those still surface
raw driver errors; update queryWithParams to catch errors and rethrow the result
of wrapDuckDBError(err) (or return the wrapped Error) before propagating,
ensuring all calls that use queryWithParams receive the actionable "Database ...
is locked" message; reference queryWithParams and wrapDuckDBError when making
this change.
- Around line 88-97: When handling a DUCKDB_LOCKED error and retrying with
tryConnect("READ_ONLY"), don't overwrite the original retry error
unconditionally; only throw the specialized "locked" message when the retry also
fails with a lock-related error. Preserve and rethrow the actual caught error
from the inner catch (or wrap it while preserving its message) so non-lock
failures from tryConnect are visible. Update the catch in the block that checks
err.message === "DUCKDB_LOCKED" (and references dbPath and tryConnect) to
inspect the inner error and either throw the current user-friendly locked
message only for lock cases or rethrow the inner error otherwise.
- Around line 55-82: The tryConnect promise currently resolves with the
duckdb.Database instance on a 2s timeout even if the constructor callback hasn't
confirmed success, causing later open errors to be ignored; update the
setTimeout fallback in tryConnect to reject with a distinct timeout error (e.g.
new Error("DUCKDB_OPEN_TIMEOUT")) instead of resolving, so the promise only
resolves when the duckdb.Database constructor callback reports success, and
optionally ensure any partial instance is closed/cleaned up if the driver
exposes a close/destroy API; reference tryConnect and the new duckdb.Database
constructor callback when making this change.
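The timeout fix in the last bullet can be sketched as a promise wrapper that only resolves from the open callback and rejects on timeout. `Opener` is a hypothetical stand-in for the callback-based `duckdb.Database` constructor, and the callback is assumed to fire asynchronously:

```typescript
type OpenCallback = (err: Error | null) => void
type Opener = (path: string, cb: OpenCallback) => { close?: () => void }

function tryConnect(
  path: string,
  open: Opener,
  timeoutMs = 2000,
): Promise<{ close?: () => void }> {
  return new Promise((resolve, reject) => {
    let settled = false
    let db: { close?: () => void } | undefined
    const timer = setTimeout(() => {
      if (settled) return
      settled = true
      db?.close?.() // clean up any half-open handle if the API allows it
      // Reject with a distinct error instead of optimistically resolving.
      reject(new Error("DUCKDB_OPEN_TIMEOUT"))
    }, timeoutMs)
    db = open(path, (err) => {
      if (settled) return
      settled = true
      clearTimeout(timer)
      if (err) reject(err)
      else resolve(db!)
    })
  })
}
```

With the old behavior (resolving the half-open handle on timeout), a late open error was silently ignored; rejecting makes the failure visible to the caller.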
In `@packages/opencode/src/altimate/tools/sql-analyze.ts`:
- Around line 38-40: The mapping that builds `findings` currently sets category
to `issue.rule ?? issue.type` which can be undefined; update the `findings`
construction (the const findings: Telemetry.Finding[] = ... mapping in
sql-analyze.ts) to provide a real string fallback when both `issue.rule` and
`issue.type` are missing (e.g., "unknown" or "unclassified_sql_issue") so every
Telemetry.Finding.category is always a non-empty string for the `sql_quality`
aggregation.
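The fallback this prompt asks for is a one-line change; a self-contained sketch, using the `"analysis_issue"` fallback string that appears in the review's suggested diff (the `Issue` type here is illustrative):

```typescript
interface Issue { rule?: string; type?: string }

// Every finding gets a concrete category so the sql_quality aggregation
// never groups entries under the string "undefined".
export function toFindingCategory(issue: Issue): string {
  return issue.rule ?? issue.type ?? "analysis_issue"
}
```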
In `@test/simulation/run-e2e-simulations.sh`:
- Around line 377-379: The run_parallel calls are being double-backgrounded
which prevents the main shell's PIDS array from being updated; remove the
trailing ampersands from the run_parallel invocations (leave run_parallel
"edge_concurrent_1" ..., etc. without a trailing &) so that run_parallel (and
the underlying run_sim) can background and push their PIDs into PIDS properly;
verify that run_parallel, run_sim and wait_all continue to reference the shared
PIDS array.
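Why the trailing `&` breaks PID tracking can be shown with a minimal repro: appending to an array from a backgrounded call mutates a subshell copy. This is a simplified `run_parallel`, not the actual script:

```shell
#!/usr/bin/env bash
# Minimal repro of the double-backgrounding pitfall.
PIDS=()

run_parallel() {
  sleep 0.1 &        # run_parallel backgrounds its own work...
  PIDS+=($!)         # ...and records the PID in the caller's shell
}

run_parallel         # correct: PIDS grows in the main shell
run_parallel &       # wrong: & forks a subshell; its PIDS+=() is lost
wait
echo "tracked pids: ${#PIDS[@]}"   # prints 1, not 2
```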
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: ace4aae7-d220-46ff-bbce-cd5142b6a28d
📒 Files selected for processing (11)
- packages/drivers/src/duckdb.ts
- packages/opencode/src/altimate/tools/altimate-core-check.ts
- packages/opencode/src/altimate/tools/altimate-core-rewrite.ts
- packages/opencode/src/altimate/tools/dbt-manifest.ts
- packages/opencode/src/altimate/tools/finops-analyze-credits.ts
- packages/opencode/src/altimate/tools/schema-inspect.ts
- packages/opencode/src/altimate/tools/sql-analyze.ts
- packages/opencode/src/altimate/tools/sql-translate.ts
- packages/opencode/src/altimate/tools/warehouse-list.ts
- packages/opencode/test/altimate/simulation-suite.test.ts
- test/simulation/run-e2e-simulations.sh
Force-pushed from 71d2ab0 to c6e80a3
♻️ Duplicate comments (1)
test/simulation/run-e2e-simulations.sh (1)
377-379: ⚠️ Potential issue | 🟠 Major: Double-backgrounding breaks PID tracking.

`run_parallel` already backgrounds `run_sim` internally (line 113) and captures the PID. Adding `&` after the `run_parallel` calls creates a subshell, causing `PIDS+=($!)` to modify a copy of the array; the main shell's `PIDS` won't see these PIDs, so `wait_all` won't wait for them.

Proposed fix
```diff
 # Concurrent warehouse access
-run_parallel "edge_concurrent_1" "execute against green_taxi_duckdb: SELECT COUNT(*) FROM green_taxi" &
-run_parallel "edge_concurrent_2" "execute against green_taxi_duckdb: SELECT AVG(total_amount) FROM green_taxi" &
-run_parallel "edge_concurrent_3" "execute against green_taxi_duckdb: SELECT MAX(trip_distance) FROM green_taxi" &
+run_parallel "edge_concurrent_1" "execute against green_taxi_duckdb: SELECT COUNT(*) FROM green_taxi"
+run_parallel "edge_concurrent_2" "execute against green_taxi_duckdb: SELECT AVG(total_amount) FROM green_taxi"
+run_parallel "edge_concurrent_3" "execute against green_taxi_duckdb: SELECT MAX(trip_distance) FROM green_taxi"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/simulation/run-e2e-simulations.sh` around lines 377 - 379, The three run_parallel invocations are being double-backgrounded (run_parallel already backgrounds run_sim and appends to PIDS), so the trailing ampersands create subshells and prevent the main PIDS array from receiving those PIDs, breaking wait_all; remove the trailing & characters from the run_parallel calls (the ones invoking run_parallel "edge_concurrent_1"/"edge_concurrent_2"/"edge_concurrent_3") so that run_parallel itself handles backgrounding and PID capture into PIDS, and audit any other call sites to ensure run_parallel is not externally backgrounded.
🧹 Nitpick comments (2)
test/simulation/run-e2e-simulations.sh (2)
393-393: Unused variable: `warehouses` array is defined but never referenced.

The array is declared but the bulk schema operations on lines 460-466 use a different inline list. Either use this variable in the loop or remove it.

Option A: Use the variable
Option A: Use the variable
```diff
 # Bulk schema operations against real databases
-for wh in green_taxi_duckdb jaffle_shop_dev github_artifacts mb_sales_db; do
+for wh in "${warehouses[@]}"; do
   run_parallel "bulk_schema_inspect_${wh}" "inspect all tables in ${wh}"
```

Option B: Remove unused declaration

```diff
-# Warehouses to test against
-local warehouses=(green_taxi_duckdb nyc_taxi_duckdb jaffle_shop_dev github_artifacts molecular_database mb_sales_db snowflake_test)
-
 # SQL patterns to analyze (each will be tested against each warehouse context)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/simulation/run-e2e-simulations.sh` at line 393, The declared array warehouses is never used; either remove the unused declaration or switch the bulk schema loop to reference this variable: replace the inline list used in the schema operations with the warehouses array (e.g., iterate over "${warehouses[@]}") so the array is actually consumed, or simply delete the local warehouses=(...) line to eliminate the unused variable; update the loop that performs bulk schema operations to use the warehouses identifier if choosing the first option.
511-511: Consider separating declaration from assignment to avoid masking errors.

While the current code handles empty results gracefully, combining `local` with command substitution masks the command's exit status. This is a minor shell best-practice issue.

Proposed fix

```diff
 # Collect all session IDs from this run
-local session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' "$RESULTS_DIR/results.csv" 2>/dev/null | sort -u)
+local session_ids
+session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' "$RESULTS_DIR/results.csv" 2>/dev/null | sort -u) || true
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/simulation/run-e2e-simulations.sh` at line 511, Split the declaration and command substitution to avoid masking the command exit status: declare the variable first with "local session_ids" (or "local -a session_ids" if expecting an array) and then assign it on the next line with "session_ids=$(grep -o 'ses_[a-zA-Z0-9]*' \"$RESULTS_DIR/results.csv\" 2>/dev/null | sort -u)"; optionally check the grep pipeline's exit status ($?) after the assignment and handle errors accordingly in run-e2e-simulations.sh where session_ids is used.
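The exit-status masking that SC2155 warns about is easy to demonstrate: `local var=$(cmd)` reports `local`'s status (0), while declare-then-assign preserves the command's real status. The `demo` function below is illustrative only:

```shell
#!/usr/bin/env bash
demo() {
  local masked=$(false)
  local combined=$?        # 0: local succeeded even though false failed

  local split
  split=$(false)
  local after_split=$?     # 1: the assignment's status is false's status

  echo "combined=$combined split=$after_split"
}
demo   # prints: combined=0 split=1
```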
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 9bca526d-c531-4991-88d1-f134ae0036b4
📒 Files selected for processing (11)
- packages/drivers/src/duckdb.ts
- packages/opencode/src/altimate/tools/altimate-core-check.ts
- packages/opencode/src/altimate/tools/altimate-core-rewrite.ts
- packages/opencode/src/altimate/tools/dbt-manifest.ts
- packages/opencode/src/altimate/tools/finops-analyze-credits.ts
- packages/opencode/src/altimate/tools/schema-inspect.ts
- packages/opencode/src/altimate/tools/sql-analyze.ts
- packages/opencode/src/altimate/tools/sql-translate.ts
- packages/opencode/src/altimate/tools/warehouse-list.ts
- packages/opencode/test/altimate/simulation-suite.test.ts
- test/simulation/run-e2e-simulations.sh
✅ Files skipped from review due to trivial changes (4)
- packages/opencode/src/altimate/tools/altimate-core-rewrite.ts
- packages/opencode/src/altimate/tools/altimate-core-check.ts
- packages/opencode/src/altimate/tools/sql-translate.ts
- packages/opencode/src/altimate/tools/dbt-manifest.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- packages/opencode/src/altimate/tools/warehouse-list.ts
- packages/opencode/src/altimate/tools/sql-analyze.ts
- packages/drivers/src/duckdb.ts
- packages/opencode/src/altimate/tools/schema-inspect.ts
- packages/opencode/test/altimate/simulation-suite.test.ts
Force-pushed from b7886b5 to 4342c2f
…ccess retry (#570)

- Add null/undefined guards across 8 tool formatters to prevent literal `undefined` in user-facing output (sql-analyze, schema-inspect, sql-translate, dbt-manifest, finops-analyze-credits, warehouse-list, altimate-core-check, altimate-core-rewrite)
- Add `error: msg` to catch block metadata in schema-inspect, dbt-manifest, warehouse-list so telemetry can classify exceptions
- DuckDB driver: auto-retry in `READ_ONLY` mode on `database is locked` errors, with clear actionable error message
- Add simulation suite (839 mock + 346 real E2E scenarios) covering 10 personas x 11 dialects x 14 use-case categories

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed from 4342c2f to 6157ecd
…ts (#564)

* feat: add implicit quality signal telemetry event

  Add `task_outcome_signal` event that maps agent outcomes to behavioral signals (accepted/error/abandoned/cancelled). Emitted alongside `agent_outcome` at session end with zero user cost — pure client-side computation from data already in memory.

  - New event type with `signal`, `tool_count`, `step_count`, `duration_ms`, `last_tool_category` fields
  - Exported `deriveQualitySignal()` for testable outcome→signal mapping
  - MCP tool detection via `mcp__` prefix for accurate categorization
  - 8 unit tests covering all signal derivations and event shape

  Closes AI-6028

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add task intent classification telemetry event

  Add `task_classified` event emitted at session start with keyword/regex classification of the first user message. Categories: debug_dbt, write_sql, optimize_query, build_model, analyze_lineage, explore_schema, migrate_sql, manage_warehouse, finops, general.

  - `classifyTaskIntent()` — pure regex matcher, zero LLM cost, <1ms
  - Includes warehouse type from fingerprint cache
  - Strong/weak confidence levels (1.0 vs 0.5)
  - 15 unit tests covering all intent categories + edge cases

  Closes AI-6029

* feat: emit aggregated tool chain outcome at session end

  Add `tool_chain_outcome` event that captures the ordered tool sequence, error count, recovery count, and final outcome at session end. Only emitted when tools were actually used (non-empty chain).

  - Tracks up to 50 tool names in execution order
  - Detects error→success recovery patterns for auto-fix insights
  - Aggregates existing per-tool-call data — near-zero additional cost
  - 3 unit tests for event shape and error/recovery tracking

  Closes AI-6030

* feat: link error-recovery pairs with hashed fingerprint

  Add `error_fingerprint` event emitted per unique error at session end. SHA256-hashes normalized error messages for anonymous grouping, links each error to its recovery tool (if the next tool succeeded).

  - `hashError()` — 16-char hex hash of masked error messages
  - Tracks error→recovery pairs during tool chain execution
  - Capped at 20 fingerprints per session to bound telemetry volume
  - 4 unit tests for hashing, event shape, and recovery tracking

  Closes AI-6031

* feat: emit SQL structure fingerprint using altimate-core

  Add `sql_fingerprint` event emitted after successful SQL execution via `sql_execute`. Uses `extractMetadata()` + `getStatementTypes()` from altimate-core NAPI — local parsing, no API calls, ~1-5ms.

  - Captures: statement types, categories, table/function count, subqueries, aggregation, window functions, AST node count
  - No table/column names or SQL content — PII-safe by design
  - Wrapped in try/catch so fingerprinting never breaks query execution
  - `computeSqlFingerprint()` exported from sql-classify for reuse
  - 6 unit tests including PII safety verification

  Closes AI-6032

* feat: expand environment_census with dbt project fingerprint

  Add optional dbt project metrics to the existing `environment_census` event: snapshot/seed count buckets, materialization distribution (table/view/incremental/ephemeral counts). Data already parsed at startup — just extracts more fields from the same manifest parse.

  - Backward compatible — new fields are optional
  - No extra file reads or API calls

  Closes AI-6033

* feat: emit schema complexity signal during warehouse introspection

  Add `schema_complexity` event emitted alongside `warehouse_introspection` after successful schema indexing. Uses data already computed during introspection — no extra warehouse queries.

  - Bucketed table/column/schema counts + avg columns per table
  - Division-by-zero guard for empty warehouses
  - Emitted inside existing try/catch — never breaks introspection

  Closes AI-6034

* docs: update telemetry reference with 7 new intelligence signals

  Add task_outcome_signal, task_classified, tool_chain_outcome, error_fingerprint, sql_fingerprint, schema_complexity to the event catalog. Update environment_census description for new dbt fields. Update naming convention section.

* test: add comprehensive integration tests for telemetry signals

  Add 38 integration tests that verify all 7 telemetry signals fire through real code paths with a spy on Telemetry.track():

  - Signal 1: quality signal derivation + error/abandoned/cancelled cases
  - Signal 2: intent classifier with 10 real DE prompts + PII safety
  - Signal 3: tool chain collection with error recovery state machine
  - Signal 4: error fingerprint hashing + consecutive error flush
  - Signal 5: SQL fingerprint via altimate-core (aggregation, CTE, DDL)
  - Signal 6: environment_census expansion + backward compatibility
  - Signal 7: schema complexity bucketing + zero-table edge case
  - Full E2E: complete session simulation with all 7 signals in order

  Also fixes regex patterns for natural-language flexibility:
  - dbt debug: allows words between "dbt" and error keywords
  - migrate: allows words between "to/from" and warehouse name

* test: add altimate-core failure isolation tests

  Verify computeSqlFingerprint resilience when the altimate-core NAPI:
  - throws (segfault, OOM) — returns null, never leaks the exception
  - returns undefined — uses safe defaults (empty arrays, 0 counts)
  - returns garbage data — handled gracefully via `??` fallbacks

  Also verifies that the sql-execute.ts code structure ensures fingerprinting runs AFTER the query result and is wrapped in an isolated try/catch. Tests crash-resistant SQL inputs (control chars, empty, incomplete, very wide queries) and deterministic output.

* fix: address stakeholder review findings

  Fixes from 5-stakeholder review (architect, privacy, perf, markers, tests):
  - Marker fix: remove nested altimate_change start/end, fold new variables into existing session telemetry tracking block
  - Performance: cap errorRecords at 200 entries (prevent unbounded growth)
  - Performance: slice intent classifier input to 2000 chars (bound regex)
  - Architecture: fix import path in sql-execute.ts (../telemetry, not ../../altimate/telemetry)

* chore: bump altimate-core to 0.2.6

  Picks up extractMetadata fixes:
  - Aggregate function names (COUNT, SUM, AVG, etc.) now in functions array
  - IN (SELECT ...) and EXISTS (SELECT ...) subquery detection
  - Any/All quantified comparison subquery detection (guarded)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
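A minimal sketch of the `hashError()` idea from the error-fingerprint commit. The normalization rules here (lowercasing, number masking, whitespace collapsing) are assumptions for illustration; the commit only specifies a 16-char hex SHA256 of the masked message:

```typescript
import { createHash } from "node:crypto";

// Sketch: hash a normalized error message so semantically identical errors
// group together without sending any raw message content.
function hashError(message: string): string {
  const normalized = message
    .toLowerCase()
    .replace(/\d+/g, "N") // mask numbers so "line 42" and "line 7" collide
    .replace(/\s+/g, " ")
    .trim();
  return createHash("sha256").update(normalized).digest("hex").slice(0, 16);
}
```

Because the output is a fixed-length hash of a masked string, the telemetry backend can count and correlate error classes without ever seeing the original text.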
- altimate-core NAPI binding: set `NODE_PATH` to global npm root so
`require('@altimateai/altimate-core')` resolves after `npm install -g`
- upstream branding: replace "opencode" with "altimate-code" in user-facing
`describe` strings (uninstall, tui, pr commands, config, server API docs)
- driver resolvability: set `NODE_PATH` in driver check loop and install
`duckdb` alongside the main package so at least one peer dep is present
- hardcoded CI paths: restrict grep to JS/JSON files only — compiled Bun
binaries embed build-machine paths in debug info which is unavoidable
- NAPI module exports: already had correct `NODE_PATH` in extended test;
root cause was the base test (fix 1) which is now resolved
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
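The `NODE_PATH` fix above can be illustrated with a short shell sketch. This is a simplified stand-in for the CI wiring, and the fallback directory is only an example:

```shell
# Make globally-installed packages (e.g. after `npm install -g`)
# resolvable by Node's require(): `npm root -g` prints the global
# node_modules directory, which NODE_PATH adds to the resolution paths.
NODE_PATH="$(npm root -g 2>/dev/null || echo /usr/local/lib/node_modules)"
export NODE_PATH
echo "NODE_PATH=$NODE_PATH"
```

Without this, `require('@altimateai/altimate-core')` fails after a global install because global packages are not on Node's default module resolution path.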
…siveness-and-duckdb-lock

# Conflicts:
#	.github/meta/commit.txt
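The `database is locked` auto-retry mentioned in the commit messages above can be sketched as follows. This is a hypothetical helper, not the actual driver code (which lives in `packages/drivers/src/duckdb.ts`); `open` is injected so the logic is testable without a DuckDB binary, with a string standing in for a real connection handle:

```typescript
// Sketch of the lock-detect-and-retry pattern, assuming the engine reports
// concurrent file access with a "database is locked" message.
type OpenFn = (path: string, readOnly: boolean) => Promise<string>;

async function connectWithRetry(open: OpenFn, path: string): Promise<string> {
  try {
    return await open(path, false); // normal read-write attempt
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    if (!/database is locked/i.test(msg)) throw err; // unrelated failure: rethrow
    try {
      return await open(path, true); // retry in READ_ONLY mode
    } catch {
      // Both attempts failed: surface a clear, actionable message.
      throw new Error(
        `DuckDB file "${path}" is locked by another process. ` +
          `Close the other session, or wait and retry.`,
      );
    }
  }
}
```

The read-only retry works because DuckDB allows multiple read-only connections to a file that a writer holds locked, so concurrent sessions degrade to read access instead of crashing.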
What does this PR do?
Fixes tool output formatting bugs and DuckDB concurrent access crashes found via 1,185 automated simulations (839 mock + 346 real E2E) across 10 personas, 11 dialects, and 14 use-case categories.
Three categories of fixes:
1. Null guard defensiveness (8 tools) — When native handlers return missing/optional fields, tool formatters displayed literal `undefined` in user-facing output (e.g., `[high] undefined: undefined` in safety analysis). Added `?? defaultValue` guards across all formatter string interpolations.
2. DuckDB concurrent access retry — DuckDB crashes with `database is locked` when multiple sessions access the same file. Added auto-retry in `READ_ONLY` mode on lock errors with a clear, actionable error message.
3. Missing error propagation — `schema-inspect`, `dbt-manifest`, and `warehouse-list` catch blocks didn't include `error: msg` in metadata, so telemetry couldn't classify exceptions.

Files changed:

- `packages/drivers/src/duckdb.ts` — read-only retry + improved error messages
- `packages/opencode/src/altimate/tools/sql-analyze.ts` — null guards on issues, confidence, severity
- `packages/opencode/src/altimate/tools/schema-inspect.ts` — null guards + error in catch
- `packages/opencode/src/altimate/tools/sql-translate.ts` — null guards on warnings
- `packages/opencode/src/altimate/tools/dbt-manifest.ts` — null guards + error in catch
- `packages/opencode/src/altimate/tools/finops-analyze-credits.ts` — null guards on credits/summary
- `packages/opencode/src/altimate/tools/warehouse-list.ts` — null guards + error in catch
- `packages/opencode/src/altimate/tools/altimate-core-check.ts` — null guards on lint/safety/PII formatters
- `packages/opencode/src/altimate/tools/altimate-core-rewrite.ts` — null guard on suggestion description
- `packages/opencode/test/altimate/simulation-suite.test.ts` — 839-scenario mock simulation suite
- `test/simulation/run-e2e-simulations.sh` — E2E simulation harness (7 phases)

Type of change
Issue for this PR
Closes #570
How did you verify your code works?
- `bun test test/altimate/simulation-suite.test.ts` — tests every tool with null/undefined/malformed responses across 10 personas × 11 dialects
- Ran `altimate-code` from source against 12 real databases (DuckDB + Snowflake) with LLM-driven sessions across 7 phases
- `bun turbo typecheck` passes
- `bun run script/upstream/analyze.ts --markers --base main --strict` passes

Checklist
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Tests