LLM-based optimizations for BatchCompleter#1293

Open: brandur wants to merge 1 commit into brandur-smoother-completion from brandur-completion-optimizations

brandur (Contributor) commented Jun 20, 2026

This one's just throwing Codex at BatchCompleter to see what
optimizations it can find in there, since this seems to be by far our
slowest spot in River, at least with respect to benchmarking.

It didn't do anything huge, but ended up putting in a variety of minor
optimizations:

  • Removed the separate setStateStartTimes map by storing StartTime
    in batchCompleterSetState.

  • Changed queued state storage from pointer values to map values,
    removing a wrapper allocation.

  • Collapsed backlog checking + enqueue into one lock acquisition on the
    common path.

  • Replaced sliceutil.Map with a direct event-building loop.

  • Used one batch completion timestamp instead of calling Now() once per
    completed row.

  • Stopped repeatedly assigning params.Schema inside the batch mapping
    loop.

Benchmarks seem to indicate a previous 12-13 us/op coming down to 10-11
us/op, and with one fewer allocation per op.
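
For anyone who wants to see where a "one fewer allocation per op" number can come from, here is a rough, standalone sketch that compares storing pointers versus plain struct values in a map. It is not the benchmark from this repository. The `state` type and both benchmark functions are made up for illustration, and it uses `testing.Benchmark` so it can run from a plain `main`.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// state is a hypothetical stand-in for a queued completion state.
type state struct {
	ID    int64
	Start time.Time
}

const numKeys = 1024

// benchPointerValues stores *state in the map, so each op heap-allocates
// a new state.
func benchPointerValues(b *testing.B) {
	m := make(map[int64]*state, numKeys)
	now := time.Now()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		k := int64(i % numKeys)
		m[k] = &state{ID: k, Start: now}
	}
}

// benchStructValues stores state by value, so overwriting an existing key
// does not need a separate allocation.
func benchStructValues(b *testing.B) {
	m := make(map[int64]state, numKeys)
	now := time.Now()
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		k := int64(i % numKeys)
		m[k] = state{ID: k, Start: now}
	}
}

func main() {
	ptr := testing.Benchmark(benchPointerValues)
	val := testing.Benchmark(benchStructValues)
	fmt.Printf("pointer values: %d ns/op, %d allocs/op\n", ptr.NsPerOp(), ptr.AllocsPerOp())
	fmt.Printf("struct values:  %d ns/op, %d allocs/op\n", val.NsPerOp(), val.AllocsPerOp())
}
```

Timings will vary by machine and run. The allocation count is the more stable signal of the two.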

I'm seeing a definite speedup when I run the full benchmark. Our bench
has some consistency problems (can show considerably different results
per run), but whereas before ~46k jobs/sec was about the best I ever saw
on my commodity MBP here, I'm now seeing up to ~56k jobs/sec, so at best
a 10k jobs/sec improvement:

    $ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 1_000_000
    bench: jobs worked [          0 ], inserted [    1000000 ], job/sec [        0.0 ] [0s]
    bench: jobs worked [     106472 ], inserted [          0 ], job/sec [    53236.0 ] [2s]
    bench: jobs worked [     108440 ], inserted [          0 ], job/sec [    54220.0 ] [2s]
    bench: jobs worked [     114035 ], inserted [          0 ], job/sec [    57017.5 ] [2s]
    bench: jobs worked [     107402 ], inserted [          0 ], job/sec [    53701.0 ] [2s]
    bench: jobs worked [     114433 ], inserted [          0 ], job/sec [    57216.5 ] [2s]
    bench: jobs worked [     105701 ], inserted [          0 ], job/sec [    52850.5 ] [2s]
    bench: jobs worked [     116051 ], inserted [          0 ], job/sec [    58025.5 ] [2s]
    bench: jobs worked [     108054 ], inserted [          0 ], job/sec [    54027.0 ] [2s]
    bench: jobs worked [     119412 ], inserted [          0 ], job/sec [    59706.0 ] [2s]
    bench: total jobs worked [    1000000 ], total jobs inserted [    1000000 ], overall job/sec [    55710.8 ], running 17.949838958s

The number should be even better on faster computers.

Nothing in the code gets any worse (and I think some of it is actually
an improvement?) so I think it's probably worthwhile to bring these in.

brandur force-pushed the brandur-smoother-completion branch from 168f657 to d177425 on June 20, 2026 16:07
brandur force-pushed the brandur-completion-optimizations branch 2 times, most recently from 1e5df34 to 4a982be on June 20, 2026 16:19
brandur force-pushed the brandur-smoother-completion branch 2 times, most recently from fb7ef2c to 4acc752 on June 20, 2026 19:17
brandur force-pushed the brandur-completion-optimizations branch from 4a982be to d165299 on June 20, 2026 19:48
brandur requested a review from bgentry on June 20, 2026 19:58
brandur force-pushed the brandur-smoother-completion branch from 4acc752 to 2bb559c on June 20, 2026 20:01
brandur force-pushed the brandur-completion-optimizations branch from d165299 to 9b268fa on June 20, 2026 20:03