Merged (16 commits)
22 changes: 10 additions & 12 deletions .github/meta/commit.txt
@@ -1,17 +1,15 @@
-ci: add Verdaccio sanity suite to CI and release workflows
+feat: add task intent classification telemetry event
 
-Adds the Verdaccio-based sanity suite (real `npm install -g` flow)
-to both CI and release pipelines:
+Add `task_classified` event emitted at session start with keyword/regex
+classification of the first user message. Categories: debug_dbt, write_sql,
+optimize_query, build_model, analyze_lineage, explore_schema, migrate_sql,
+manage_warehouse, finops, general.
 
-**CI (`ci.yml`):**
-- New `sanity-verdaccio` job on push to main
-- Builds linux-x64 binary + dbt-tools, runs full Docker Compose suite
-- Independent of other jobs (doesn't block PRs)
+- `classifyTaskIntent()` — pure regex matcher, zero LLM cost, <1ms
+- Includes warehouse type from fingerprint cache
+- Strong/weak confidence levels (1.0 vs 0.5)
+- 15 unit tests covering all intent categories + edge cases
 
-**Release (`release.yml`):**
-- New `sanity-verdaccio` job between build and npm publish
-- Downloads linux-x64 artifact from build matrix
-- **Blocks `publish-npm`** — broken install flow prevents release
-- Dependency chain: build → sanity-verdaccio → publish-npm → github-release
+Closes AI-6029
 
 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
14 changes: 7 additions & 7 deletions bun.lock


13 changes: 12 additions & 1 deletion docs/docs/reference/telemetry.md
@@ -27,7 +27,7 @@ We collect the following categories of events:
| `doom_loop_detected` | A repeated tool call pattern is detected (tool name and count) |
| `compaction_triggered` | Context compaction runs (strategy and token counts) |
| `tool_outputs_pruned` | Tool outputs are pruned during compaction (count) |
-| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, feature flags, but no hostnames) |
+| `environment_census` | Environment snapshot on project scan (warehouse types, dbt presence, dbt materialization distribution, snapshot/seed counts, feature flags, but no hostnames or project names) |
| `context_utilization` | Context window usage per generation (token counts, utilization percentage, cache hit ratio) |
| `agent_outcome` | Agent session outcome (agent type, tool/generation counts, cost, outcome status) |
| `error_recovered` | Successful recovery from a transient error (error type, strategy, attempt count) |
@@ -39,6 +39,12 @@ We collect the following categories of events:
| `sql_execute_failure` | A SQL execution fails (warehouse type, query type, error message, PII-masked SQL — no raw values) |
| `core_failure` | An internal tool error occurs (tool name, category, error class, truncated error message, PII-safe input signature, and optionally masked arguments — no raw values or credentials) |
| `first_launch` | Fired once on first CLI run after installation. Contains version and is_upgrade flag. No PII. |
| `task_outcome_signal` | Behavioral quality signal at session end — accepted, error, abandoned, or cancelled. Includes tool count, step count, duration, and last tool category. No user content. |
| `task_classified` | Intent classification of the first user message using keyword matching — category (e.g. `debug_dbt`, `write_sql`, `optimize_query`), confidence score, and detected warehouse type. No user text is sent — only the classified category. |
| `tool_chain_outcome` | Aggregated tool execution sequence at session end — ordered tool names (capped at 50), error count, recovery count, final outcome, duration, and cost. No tool arguments or outputs. |
| `error_fingerprint` | Hashed error pattern for anonymous grouping — SHA-256 hash of masked error message, error class, tool name, and whether recovery succeeded. Raw error content is never sent. |
| `sql_fingerprint` | SQL structural shape via AST parsing — statement types, table count, function count, subquery/aggregation/window function presence, and AST node count. No table names, column names, or SQL content. |
| `schema_complexity` | Warehouse schema structural metrics from introspection — bucketed table, column, and schema counts plus average columns per table. No schema names or content. |

Each event includes a timestamp, anonymous session ID, CLI version, and an anonymous machine ID (a random UUID stored in `~/.altimate/machine-id`, generated once and never tied to any personal information).
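For concreteness, here is an illustrative sketch of what a `task_classified` event payload could look like, assembled from the field names documented above. The values are invented for the example; note that no user text appears anywhere in the payload.

```typescript
// Illustrative task_classified payload — field names from the docs above,
// values invented for this example. Only the classified category is sent,
// never the user's message text.
const taskClassified = {
  type: "task_classified",
  timestamp: Date.now(),
  session_id: "anon-session-uuid", // anonymous session ID
  intent: "optimize_query",
  confidence: 1.0, // strong keyword match
  warehouse_type: "snowflake",
}
```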

@@ -129,6 +135,11 @@ Event type names use **snake_case** with a `domain_action` pattern:
- `context_utilization`, `context_overflow_recovered` for context management events
- `agent_outcome` for agent session events
- `error_recovered` for error recovery events
- `task_outcome_signal`, `task_classified` for session quality signals
- `tool_chain_outcome` for tool execution chain aggregation
- `error_fingerprint` for anonymous error pattern grouping
- `sql_fingerprint` for SQL structural analysis
- `schema_complexity` for warehouse schema metrics

### Adding a New Event

2 changes: 1 addition & 1 deletion packages/opencode/package.json
@@ -78,7 +78,7 @@
"@ai-sdk/togetherai": "1.0.34",
"@ai-sdk/vercel": "1.0.33",
"@ai-sdk/xai": "2.0.51",
-"@altimateai/altimate-core": "0.2.5",
+"@altimateai/altimate-core": "0.2.6",
"@altimateai/drivers": "workspace:*",
"@aws-sdk/credential-providers": "3.993.0",
"@clack/prompts": "1.0.0-alpha.1",
14 changes: 14 additions & 0 deletions packages/opencode/src/altimate/native/schema/register.ts
@@ -46,6 +46,20 @@ register("schema.index", async (params: SchemaIndexParams): Promise<SchemaIndexR
duration_ms: Date.now() - startTime,
result_count: result.tables_indexed,
})
// altimate_change start — schema complexity signal from introspection results
Telemetry.track({
type: "schema_complexity",
timestamp: Date.now(),
session_id: Telemetry.getContext().sessionId,
warehouse_type: warehouseType,
table_count_bucket: Telemetry.bucketCount(result.tables_indexed),
column_count_bucket: Telemetry.bucketCount(result.columns_indexed),
schema_count_bucket: Telemetry.bucketCount(result.schemas_indexed),
avg_columns_per_table: result.tables_indexed > 0
? Math.round(result.columns_indexed / result.tables_indexed)
: 0,
})
// altimate_change end
} catch {}
return result
} catch (e) {
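The `Telemetry.bucketCount` helper used above is not shown in this diff. A plausible sketch of such a helper, with bucket boundaries that are purely an assumption and not the real implementation, would coarsen raw counts into ranges so exact schema sizes never leave the machine:

```typescript
// Hypothetical bucketing helper — boundaries are illustrative assumptions,
// not the actual Telemetry.bucketCount implementation. The idea: report a
// coarse range instead of an exact count.
function bucketCount(n: number): string {
  if (n <= 0) return "0"
  if (n <= 10) return "1-10"
  if (n <= 100) return "11-100"
  if (n <= 1000) return "101-1000"
  return "1000+"
}
```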
208 changes: 208 additions & 0 deletions packages/opencode/src/altimate/telemetry/index.ts
@@ -212,6 +212,13 @@ export namespace Telemetry {
dbt_model_count_bucket: string
dbt_source_count_bucket: string
dbt_test_count_bucket: string
// altimate_change start — dbt project fingerprint expansion
dbt_snapshot_count_bucket?: string
dbt_seed_count_bucket?: string
/** JSON-encoded Record<string, number> — count per materialization type */
dbt_materialization_dist?: string
dbt_macro_count_bucket?: string
// altimate_change end
connection_sources: string[]
mcp_server_count: number
skill_count: number
@@ -445,8 +452,209 @@
dialect?: string
duration_ms: number
}
// implicit quality signal for task outcome intelligence
| {
type: "task_outcome_signal"
timestamp: number
session_id: string
/** Behavioral signal derived from session outcome patterns */
signal: "accepted" | "error" | "abandoned" | "cancelled"
/** Total tool calls in this loop() invocation */
tool_count: number
/** Number of LLM generation steps in this loop() invocation */
step_count: number
/** Total session wall-clock duration in milliseconds */
duration_ms: number
/** Last tool category the agent used (or "none") */
last_tool_category: string
}
// task intent classification for understanding DE problem distribution
| {
type: "task_classified"
timestamp: number
session_id: string
/** Classified intent category */
intent:
| "debug_dbt"
| "write_sql"
| "optimize_query"
| "build_model"
| "analyze_lineage"
| "explore_schema"
| "migrate_sql"
| "manage_warehouse"
| "finops"
| "general"
/** Keyword match confidence: 1.0 for strong match, 0.5 for weak */
confidence: number
/** Detected warehouse type from fingerprint (or "unknown") */
warehouse_type: string
}
// schema complexity signal — structural metrics from warehouse introspection
| {
type: "schema_complexity"
timestamp: number
session_id: string
warehouse_type: string
/** Bucketed table count */
table_count_bucket: string
/** Bucketed total column count across all tables */
column_count_bucket: string
/** Bucketed schema count */
schema_count_bucket: string
/** Average columns per table (rounded to integer) */
avg_columns_per_table: number
}
// sql structure fingerprint — AST shape without content
| {
type: "sql_fingerprint"
timestamp: number
session_id: string
/** JSON-encoded statement types, e.g. ["SELECT"] */
statement_types: string
/** Broad categories, e.g. ["query"] */
categories: string
/** Number of tables referenced */
table_count: number
/** Number of functions used */
function_count: number
/** Whether the query has subqueries */
has_subqueries: boolean
/** Whether the query uses aggregation */
has_aggregation: boolean
/** Whether the query uses window functions */
has_window_functions: boolean
/** AST node count — proxy for complexity */
node_count: number
}
// error pattern fingerprint — hashed error grouping with recovery data
| {
type: "error_fingerprint"
timestamp: number
session_id: string
/** SHA256 hash of normalized (masked) error message for grouping */
error_hash: string
/** Classification from classifyError() */
error_class: string
/** Tool that produced the error */
tool_name: string
/** Tool category */
tool_category: string
/** Whether a subsequent tool call succeeded (error was recovered) */
recovery_successful: boolean
/** Tool that succeeded after the error (if recovered) */
recovery_tool: string
}
// tool chain effectiveness — aggregated tool sequence + outcome at session end
| {
type: "tool_chain_outcome"
timestamp: number
session_id: string
/** JSON-encoded ordered tool names (capped at 50) */
chain: string
/** Number of tools in the chain */
chain_length: number
/** Whether any tool call errored */
had_errors: boolean
/** Number of errors followed by successful tool calls */
error_recovery_count: number
/** Final session outcome */
final_outcome: string
/** Total session duration in ms */
total_duration_ms: number
/** Total LLM cost */
total_cost: number
}
// altimate_change end

/** SHA-256 hash a masked error message for anonymous grouping (truncated to the first 16 hex chars). */
export function hashError(maskedMessage: string): string {
return createHash("sha256").update(maskedMessage).digest("hex").slice(0, 16)
}

/** Classify user intent from the first message text.
* Pure regex/keyword matcher — zero LLM cost, <1ms. */
export function classifyTaskIntent(
text: string,
): { intent: string; confidence: number } {
const lower = text.slice(0, 2000).toLowerCase()

// Order matters: more specific patterns first
const patterns: Array<{ intent: string; strong: RegExp[]; weak: RegExp[] }> = [
{
intent: "debug_dbt",
strong: [/dbt\s+.*?(error|fail|bug|issue|broken|fix|debug|not\s+work)/],
weak: [/dbt\s+(run|build|test|compile|parse)/, /dbt_project/, /ref\s*\(/, /source\s*\(/],
},
{
intent: "build_model",
strong: [/(?:create|build|write|add|new)\s+.*?(?:dbt\s+)?model/, /(?:create|build)\s+.*?(?:staging|mart|dim|fact)/],
weak: [/\bmodel\b/, /materialization/, /incremental/],
},
{
intent: "optimize_query",
strong: [/optimiz|performance|slow\s+query|speed\s+up|make.*faster|too\s+slow|query\s+cost/],
weak: [/index|partition|cluster|explain\s+plan/],
},
{
intent: "write_sql",
strong: [/(?:write|create|build|generate)\s+(?:a\s+)?(?:sql|query)/, /(?:write|create)\s+(?:a\s+)?(?:select|insert|update|delete)/],
weak: [/\bsql\b/, /\bquery\b/, /\bjoin\b/, /\bwhere\b/],
},
{
intent: "analyze_lineage",
strong: [/lineage|upstream|downstream|dependency|depends\s+on|impact\s+analysis/],
weak: [/dag|graph|flow|trace/],
},
{
intent: "explore_schema",
strong: [/(?:show|list|describe|inspect|explore)\s+.*?(?:schema|tables?|columns?|database)/, /what\s+.*?(?:tables|columns|schemas)/],
weak: [/\bschema\b/, /\btable\b/, /\bcolumn\b/, /introspect/],
},
{
intent: "migrate_sql",
strong: [/migrat|convert.*(?:to|from)\s+.*?(?:snowflake|bigquery|postgres|redshift|databricks)/, /translate.*(?:sql|dialect)/],
weak: [/dialect|transpile|port\s+(?:to|from)/],
},
{
intent: "manage_warehouse",
strong: [/(?:connect|setup|configure|add|test)\s+.*?(?:warehouse|connection|database)/, /warehouse.*(?:config|setting)/],
weak: [/\bwarehouse\b/, /connection\s+string/, /\bcredentials\b/],
},
{
intent: "finops",
strong: [/cost|spend|bill|credits|usage|expensive\s+quer|warehouse\s+size/],
weak: [/resource|utilization|idle/],
},
]

for (const { intent, strong } of patterns) {
if (strong.some((r) => r.test(lower))) return { intent, confidence: 1.0 }
}
for (const { intent, weak } of patterns) {
if (weak.some((r) => r.test(lower))) return { intent, confidence: 0.5 }
}
return { intent: "general", confidence: 1.0 }
}
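The two-pass strong/weak design above can be illustrated in miniature. This condensed, runnable sketch trims the pattern table to two intents (the real table has nine); the inputs are invented for illustration:

```typescript
// Condensed sketch of the two-pass strong/weak matcher. Pattern table
// trimmed to two intents for illustration — not the full production table.
type Match = { intent: string; confidence: number }

const patterns = [
  { intent: "debug_dbt", strong: [/dbt\s+.*?(error|fail|broken)/], weak: [/dbt\s+(run|build|test)/] },
  { intent: "write_sql", strong: [/(?:write|create)\s+(?:a\s+)?(?:sql|query)/], weak: [/\bquery\b/] },
]

function classify(text: string): Match {
  const lower = text.slice(0, 2000).toLowerCase()
  // Pass 1: strong patterns win outright, in table order.
  for (const { intent, strong } of patterns) {
    if (strong.some((r) => r.test(lower))) return { intent, confidence: 1.0 }
  }
  // Pass 2: weak patterns are consulted only if nothing strong matched.
  for (const { intent, weak } of patterns) {
    if (weak.some((r) => r.test(lower))) return { intent, confidence: 0.5 }
  }
  return { intent: "general", confidence: 1.0 }
}
```

The two-pass structure is what lets one intent's weak pattern (`\bquery\b`) coexist with another intent's strong pattern that also mentions queries: all strong patterns are exhausted before any weak one is tried.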

/** Derive a quality signal from the agent outcome.
* Exported so tests can verify the derivation logic without
* duplicating the implementation. */
export function deriveQualitySignal(
outcome: "completed" | "abandoned" | "aborted" | "error",
): "accepted" | "error" | "abandoned" | "cancelled" {
switch (outcome) {
case "abandoned":
return "abandoned"
case "aborted":
return "cancelled"
case "error":
return "error"
case "completed":
return "accepted"
}
}
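The outcome-to-signal mapping above is total over the four agent outcomes, so the switch needs no default branch. A self-contained usage sketch (mapping reproduced from `deriveQualitySignal` above):

```typescript
type AgentOutcome = "completed" | "abandoned" | "aborted" | "error"
type QualitySignal = "accepted" | "error" | "abandoned" | "cancelled"

// Mapping reproduced from deriveQualitySignal so this sketch stands alone.
// The switch is exhaustive over AgentOutcome, so no default is needed.
function derive(outcome: AgentOutcome): QualitySignal {
  switch (outcome) {
    case "abandoned": return "abandoned"
    case "aborted": return "cancelled"
    case "error": return "error"
    case "completed": return "accepted"
  }
}
```

An aborted session surfaces as `"cancelled"`, a completed one as `"accepted"`.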

// altimate_change start — expanded error classification patterns for better triage
// Order matters: earlier patterns take priority. Use specific phrases, not
// single words, to avoid false positives (e.g., "connection refused" not "connection").