Skip to content

fix: improve log slow/error SQL with args and accurate streaming durations#306

Merged
wesm merged 5 commits into
wesm:mainfrom
jesserobbins:jesse/sql-log-timing-args
May 2, 2026
Merged

fix: improve log slow/error SQL with args and accurate streaming durations#306
wesm merged 5 commits into
wesm:mainfrom
jesserobbins:jesse/sql-log-timing-args

Conversation

@jesserobbins
Copy link
Copy Markdown
Contributor

@jesserobbins jesserobbins commented May 1, 2026

Motivation

(Deliberately not affected by & mergable with #304) This PR enhances #263 by sharpening the SQL logger so slow and failing queries actually tell you what happened. Three things change. First, streaming queries now record their duration at rows.Close instead of when db.Query returns, because the work happens while you read rows, not when you get the handle. Second, slow and errored queries log their bound args, and the statement truncation preserves the WHERE clause, so you can see what ran with what values without the diagnostic part getting cut off. Third, a few small correctness fixes: loggedRows.Close ordering, UTF-8-safe arg truncation, and logStmt inlined where it belongs.

Summary

  • Log streaming-query duration at rows.Close, not at db.Query return
  • Log args on slow/error queries; preserve the WHERE clause when truncating long statements
  • Fix loggedRows.Close ordering, make arg truncation UTF-8-safe, inline logStmt

The logged duration_ms for SELECT statements was the time it took
db.QueryContext to return the *sql.Rows handle — typically
sub-millisecond — not the wall-clock time spent inside rows.Next.
That made every streaming query look instantaneous in the log even
when its scan took hundreds of milliseconds, and made the slow-query
threshold useless for queries that stream large result sets or
materialize correlated subqueries lazily.

Wrap *sql.Rows in *loggedRows. Close emits a single "sql" log
line whose duration_ms reflects total elapsed time since Query
returned. Close is idempotent so the common defer-then-close
pattern only emits one line. Errors at Query are still logged
immediately because no rows handle is returned to scan.

chunkQuerier and the three func(*sql.Rows) callbacks in messages.go,
plus scanMessageRows in api.go and the *sql.Rows declaration in
ListSources, switch to *loggedRows. The wrapper embeds *sql.Rows
so every method outside Close is forwarded transparently.
Two changes that make the SQL log useful when something is wrong:

Args on WARN paths. logStmt and logStmtWith now take []any
instead of int and attach an "args" attr only on slow + error
lines. Without bound parameter values, a slow or failing query
that hits a templated statement is unreproducible — every line
looks identical except for the run-id. Args are not attached on
Info/Debug or --full-trace lines: those run for every successful
query, the args carry PII (subjects, addresses, IDs), and nargs
is sufficient at that volume. formatArgs renders strings with a
64-char cap, raw byte slices as <N bytes>, and other Go values
through %v.

Head+tail truncation. normalizeStmt previously cut at the first
N chars; for the dedup group-fetch query (and other long-tailed
SELECTs with discriminating WHERE clauses), this systematically
hid the part of the statement that distinguishes one logged
query from another. The new form keeps roughly the first 60%
and last 40% of the budget joined by " ... ", so WHERE,
GROUP BY, and ORDER BY clauses survive. Tiny budgets still fall
back to head-only truncation.
…ne logStmt

Three fixes from review job 429:

Check r.closed before calling r.Rows.Close so the embedded
Close is never invoked twice on repeated Close calls.

Cap string args at 64 runes rather than 64 bytes in formatArgs,
avoiding a mid-rune cut on multi-byte UTF-8 (CJK subjects, emoji).

Remove the logStmt passthrough; all call sites now call
logStmtWith directly.
@jesserobbins jesserobbins changed the title store: log slow/error SQL with args and accurate streaming durations fix: correctly log slow/error SQL with args and accurate streaming durations May 1, 2026
@jesserobbins jesserobbins marked this pull request as draft May 1, 2026 22:45
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (20a2bd3)

Summary verdict: Three medium-severity issues remain: sensitive SQL argument logging, missed row iteration errors, and unsafe UTF-8 truncation.

Medium

  • Sensitive bound SQL arguments may be logged

    • Location: internal/store/db_logger.go:343, internal/store/db_logger.go:347
    • Problem: Slow or failing SQL statements log full bound arguments via formatArgs(args). Truncation limits volume, but not sensitivity, so addresses, subjects, body excerpts, tokens, passwords, or other sensitive values could be persisted in logs.
    • Fix: Avoid logging raw argument values by default. Prefer arg count, types, string lengths, hashes, or an allowlist of known-safe parameters. Gate exact argument logging behind an explicit unsafe debug option with redaction.
  • Streaming row iteration errors can be logged as successful queries

    • Location: internal/store/db_logger.go:305
    • Problem: loggedRows.Close logs only the error returned by Rows.Close(). Errors surfaced through rows.Err() after rows.Next() can be missed, including context cancellation or driver iteration failures.
    • Fix: After closing, check r.Rows.Err() and use that error for logStmtWith when the close error is nil.
  • SQL truncation can split UTF-8 runes

    • Location: internal/store/db_logger.go:459-460
    • Problem: normalizeStmt truncates SQL using raw byte slicing (s[:headLen], s[len(s)-tailLen:]). Multi-byte UTF-8 characters can be split, producing invalid UTF-8 or garbage in logs.
    • Fix: Convert to a rune slice before slicing, or use the utf8 package to find safe byte boundaries.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

… stmt truncation

Three fixes addressing roborev mediums on PR wesm#306 (commit 20a2bd3):

- Replace formatArgs with formatArgsShape on slow/error log paths.
  Bound parameters now log as `args_shape=[int64 string(len=12) bytes(len=11) nil]`
  instead of raw values, so subjects, addresses, tokens, and body excerpts
  cannot land in logs. Type and length are enough to diagnose a slow or
  failing query (placeholder count, NULL bind in the wrong slot, an
  unexpectedly huge string) without persisting sensitive content.

- loggedRows.Close now consults Rows.Err() when Close itself returns nil,
  so context cancellation and driver-level scan failures discovered during
  rows.Next() get logged as "sql error" rather than as a successful query.

- normalizeStmt truncates by rune count, not byte count. SQL literals or
  comments containing multi-byte UTF-8 (café, 漢) no longer split mid-rune
  and emit invalid UTF-8 to logs. Fast-path threshold uses
  utf8.RuneCountInString; the head+tail and tiny-budget branches both
  slice through []rune.

Tests:
- TestLogStmt_SlowQueryIncludesArgsShape — asserts type/length present,
  raw values absent, legacy "args" attr absent.
- TestFormatArgsShape_RedactsValues — direct check across string,
  []byte, nil, numeric.
- TestLoggedRows_IterationErrorSurfacedOnClose — cancels context mid-scan
  and asserts "sql error" line is emitted on Close.
- TestNormalizeStmt_UTF8Safe — utf8.ValidString on both head+tail and
  tiny-budget paths.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 1, 2026

roborev: Combined Review (ae7346c)

Net verdict: one medium issue remains; security review found no issues.

Medium

  • internal/store/db_logger.go:302 - Successful streaming queries are logged only when loggedRows.Close is called. Since most callers defer Close, the recorded duration_ms can include unrelated work after row iteration completes, such as count queries or batchPopulate.

    Fix: Add an idempotent finalization path and call it from a loggedRows.Next override when Next returns false; have Close use the same path for early exits.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

…lose

duration_ms for a streaming query previously reflected wall-clock time
between db.Query return and the caller's deferred rows.Close. For the
common pattern

    rows, _ := db.Query(...)
    defer rows.Close()
    for rows.Next() { ... }
    // unrelated work: count queries, batchPopulate, post-processing

the recorded duration absorbed the unrelated post-iteration work, making
slow-query logs misattribute time.

Move the timing finalizer to the end of iteration. loggedRows now
overrides Next as well as Close. Both paths route through an idempotent
finalize() that emits the log line exactly once:

  - Next finalizes when the embedded Rows.Next() returns false
    (normal end-of-scan). The deferred Close still fires; finalize
    short-circuits on the second call.
  - Close finalizes for the early-exit path where the caller breaks
    out of the loop without exhausting rows.

Both paths consult Rows.Err() when no close error is present, so context
cancellation and driver-level scan failures continue to surface as
"sql error" lines.

Tests:
- TestLoggedRows_FinalizesAtEndOfScan — sleeps 50ms after the loop
  and asserts duration_ms is captured before the sleep and not
  re-emitted on Close.
- TestLoggedRows_EarlyExitFinalizesOnClose — breaks after one row
  and asserts the log line lands on Close, exactly once.
@roborev-ci
Copy link
Copy Markdown

roborev-ci Bot commented May 2, 2026

roborev: Combined Review (5d83641)

No Medium, High, or Critical findings were reported.

Both reviewers found no actionable issues at or above Medium severity. Low-severity observations were omitted per instructions.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@jesserobbins jesserobbins marked this pull request as ready for review May 2, 2026 00:54
@jesserobbins jesserobbins changed the title fix: correctly log slow/error SQL with args and accurate streaming durations fix: improve log slow/error SQL with args and accurate streaming durations May 2, 2026
@wesm
Copy link
Copy Markdown
Owner

wesm commented May 2, 2026

LGTM! Thank you

@wesm wesm merged commit 2c5c3e9 into wesm:main May 2, 2026
6 checks passed
@jesserobbins jesserobbins deleted the jesse/sql-log-timing-args branch May 3, 2026 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants