Smoother completion: Earlier backpressure + possible early completion + fewer warnings by brandur · Pull Request #1292 · riverqueue/river

brandur · 2026-06-20T15:32:24Z

This one's largely aimed at trying to resolve a common problem that we
see during benchmarks, in which the BatchCompleter becomes completely
saturated and starts producing noisy warnings saying that it'll produce
backpressure while waiting for completions:

$ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 300_000
bench: jobs worked [          0 ], inserted [     300000 ], job/sec [        0.0 ] [0s]
Jun 20 07:33:22.258 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000
Jun 20 07:33:23.464 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000
bench: jobs worked [      74578 ], inserted [          0 ], job/sec [    37289.0 ] [2s]
Jun 20 07:33:24.480 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000
bench: jobs worked [      87308 ], inserted [          0 ], job/sec [    43654.0 ] [2s]
Jun 20 07:33:25.873 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000
Jun 20 07:33:27.060 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000
bench: jobs worked [     113346 ], inserted [          0 ], job/sec [    56673.0 ] [2s]
bench: jobs worked [      24768 ], inserted [          0 ], job/sec [    12384.0 ] [2s]
bench: total jobs worked [     300000 ], total jobs inserted [     300000 ], overall job/sec [    46803.8 ], running 6.409735541s

Here, we modify our approach so that instead of waiting until we're at a
backlog of 20k jobs in the completer, we start to apply backpressure at
10k jobs, and reserve the warnings for if we get to the 20k level, which
will be much more difficult to get to. This has the effect of producing
much fewer warnings as we're running a benchmark:

$ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 300_000
bench: jobs worked [          0 ], inserted [     300000 ], job/sec [        0.0 ] [0s]
bench: jobs worked [      79849 ], inserted [          0 ], job/sec [    39924.5 ] [2s]
bench: jobs worked [      96590 ], inserted [          0 ], job/sec [    48295.0 ] [2s]
bench: jobs worked [     105726 ], inserted [          0 ], job/sec [    52863.0 ] [2s]
bench: jobs worked [      17835 ], inserted [          0 ], job/sec [     8917.5 ] [2s]
bench: total jobs worked [     300000 ], total jobs inserted [     300000 ], overall job/sec [    47026.1 ], running 6.379439334s

In my tests, throughput is roughly the same. Before at 20k the completer
was the bottleneck, and it's still the bottleneck. We do put in one
minor optimization in which BatchCompleter will start completing as
soon as it's got a full batch rather than having to wait for a 50 ms
tick. This doesn't have a super significant effect on numbers though.

An argument against this change is that to some degree, we'd be sweeping
the slow completer under the rug. I'm receptive to that, but a couple
things:

It looks bad when the benchmark process regularly produces warnings as
it runs.
I think we can do better than warnings by putting in a more
comprehensive stats layer that tracks specific durations for each
phase of a job's life cycle (time to lock, time to work, time waiting
to complete, time to complete, etc.). This seems like a much better
alternative to the status quo.

brandur · 2026-06-20T15:33:53Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 168f657fb4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-20T15:39:43Z

+	}
+	c.setStateParamsMu.Unlock()
+
+	<-waitChan


Release backlog waiters when a batch fails

When backlog pressure is active and JobSetStateIfRunningMany fails after retries, handleBatch returns before the only close(c.waitOnBacklogChan) path. This newly added wait means the goroutine that enabled pressure has not enqueued its completion yet, so a transient DB error at the 10k threshold leaves it and any later callers blocked on the same channel indefinitely. Please either release the wait channel on error/requeue the batch, or let the initiating completion fall through as before.

Useful? React with 👍 / 👎.

Fixed and w/ regression test.

brandur · 2026-06-20T16:24:05Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d177425483

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-20T16:27:41Z

 		jobRows, err = completeSubBatch(params)
 		if err != nil {
+			c.requeueBatch(ctx, setStateBatch, setStateStartTimes)
 			return err


Avoid requeueing non-retryable completion errors

When JobSetStateIfRunningMany returns context.Canceled or riverdriver.ErrClosedPool while the batch completer is still running, withRetries returns immediately without any backoff because those errors are considered non-retryable. Requeueing the same batch here puts it straight back into setStateParams; for batches at or above the run loop's batchCompleterStartThreshold (100 jobs), Start immediately calls handleBatch again, creating a tight error loop until the service is stopped instead of giving up or backing off.

Useful? React with 👍 / 👎.

handleBatch will now distinguish between retryable and non-retryable (context.Canceled, riverdriver.ErrClosedPool) errors and give up on the batch in the case of the latter.

brandur · 2026-06-20T18:27:26Z

@codex review

brandur · 2026-06-20T19:30:33Z

@codex review

chatgpt-codex-connector · 2026-06-20T19:35:09Z

Codex Review: Didn't find any major issues. 👍

Reviewed commit: 4acc7526c2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… + fewer warnings This one's largely aimed at trying to resolve a common problem that we see during benchmarks, in which the `BatchCompleter` becomes completely saturated and starts producing noisy warnings saying that it'll produce backpressure while waiting for completions: $ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 300_000 bench: jobs worked [ 0 ], inserted [ 300000 ], job/sec [ 0.0 ] [0s] Jun 20 07:33:22.258 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000 Jun 20 07:33:23.464 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000 bench: jobs worked [ 74578 ], inserted [ 0 ], job/sec [ 37289.0 ] [2s] Jun 20 07:33:24.480 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000 bench: jobs worked [ 87308 ], inserted [ 0 ], job/sec [ 43654.0 ] [2s] Jun 20 07:33:25.873 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000 Jun 20 07:33:27.060 WRN jobcompleter.BatchCompleter: Hit maximum backlog; completions will wait until below threshold max_backlog=20000 bench: jobs worked [ 113346 ], inserted [ 0 ], job/sec [ 56673.0 ] [2s] bench: jobs worked [ 24768 ], inserted [ 0 ], job/sec [ 12384.0 ] [2s] bench: total jobs worked [ 300000 ], total jobs inserted [ 300000 ], overall job/sec [ 46803.8 ], running 6.409735541s Here, we modify our approach so that instead of waiting until we're at a backlog of 20k jobs in the completer, we start to apply backpressure at 10k jobs, and reserve the warnings for if we get to the 20k level, which will be much more difficult to get to. This has the effect of producing much fewer warnings as we're running a benchmark: $ go run ./cmd/river bench --database-url $DATABASE_URL --num-total-jobs 300_000 bench: jobs worked [ 0 ], inserted [ 300000 ], job/sec [ 0.0 ] [0s] bench: jobs worked [ 79849 ], inserted [ 0 ], job/sec [ 39924.5 ] [2s] bench: jobs worked [ 96590 ], inserted [ 0 ], job/sec [ 48295.0 ] [2s] bench: jobs worked [ 105726 ], inserted [ 0 ], job/sec [ 52863.0 ] [2s] bench: jobs worked [ 17835 ], inserted [ 0 ], job/sec [ 8917.5 ] [2s] bench: total jobs worked [ 300000 ], total jobs inserted [ 300000 ], overall job/sec [ 47026.1 ], running 6.379439334s In my tests, throughput is roughly the same. Before at 20k the completer was the bottleneck, and it's still the bottleneck. We do put in one minor optimization in which `BatchCompleter` will start completing as soon as it's got a full batch rather than having to wait for a 50 ms tick. This doesn't have a super significant effect on numbers though. An argument against this change is that to some degree, we'd be sweeping the slow completer under the rug. I'm receptive to that, but a couple things: * It looks bad when the benchmark process regularly produces warnings as it runs. * I think we can do better than warnings by putting in a more comprehensive stats layer that tracks specific durations for each phase of a job's life cycle (time to lock, time to work, time waiting to complete, time to complete, etc.). This seems like a much better alternative to the status quo.

chatgpt-codex-connector Bot reviewed Jun 20, 2026

View reviewed changes

brandur force-pushed the brandur-smoother-completion branch from 168f657 to d177425 Compare June 20, 2026 16:07

chatgpt-codex-connector Bot reviewed Jun 20, 2026

View reviewed changes

brandur force-pushed the brandur-smoother-completion branch from d177425 to fb7ef2c Compare June 20, 2026 18:26

brandur force-pushed the brandur-smoother-completion branch from fb7ef2c to 4acc752 Compare June 20, 2026 19:17

brandur requested a review from bgentry June 20, 2026 19:57

brandur force-pushed the brandur-smoother-completion branch from 4acc752 to 2bb559c Compare June 20, 2026 20:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Smoother completion: Earlier backpressure + possible early completion + fewer warnings#1292

Smoother completion: Earlier backpressure + possible early completion + fewer warnings#1292
brandur wants to merge 1 commit into
masterfrom
brandur-smoother-completion

brandur commented Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Uh oh!

brandur Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Uh oh!

brandur Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brandur commented Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

brandur Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

brandur Jun 20, 2026

Choose a reason for hiding this comment

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

brandur commented Jun 20, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant