Add `Config.HardStopTimeout` to perform a "hard stop" setting jobs errored by brandur · Pull Request #1289 · riverqueue/river

brandur · 2026-06-19T15:07:07Z

Here, add a new Config.HardStopTimeout on top of the existing
SoftStopTimeout whose job it is to recover badly behaving job as much
as possible before coming to a full stop. Currently, if a client is
stopping and is running jobs that don't respond to context cancellation,
those jobs end up getting left in a running state, which means that
they won't be recoverable again until they're rescued an hour later.

HardStopTimeout engages after soft stop, and has each producer perform
a "hard stop", which means to have it set any jobs still running to an
error state. Because they're errored, they'll get to run immediately the
next time a client starts up.

Ideally, users don't need to depend on this functionality since the
"correct" behavior would be to make sure that all jobs are able to
respond to context cancellation, so we make this new feature optional.

brandur · 2026-06-19T17:43:33Z

+
+		var setStateParams *riverdriver.JobSetStateIfRunningParams
+		if job.Attempt >= job.MaxAttempts {
+			setStateParams = riverdriver.JobSetStateDiscarded(job.ID, now, errData, nil)


This mirrors existing behavior where a soft stop will set an error and potentially send the job to discarded, but looking at this again, this existing behavior does seem potentially wrong.

…rored Here, add a new `Config.HardStopTimeout` on top of the existing `SoftStopTimeout` whose job it is to recover badly behaving job as much as possible before coming to a full stop. Currently, if a client is stopping and is running jobs that don't respond to context cancellation, those jobs end up getting left in a `running` state, which means that they won't be recoverable again until they're rescued an hour later. `HardStopTimeout` engages after soft stop, and has each producer perform a "hard stop", which means to have it set any jobs still running to an error state. Because they're errored, they'll get to run immediately the next time a client starts up. Ideally, users don't need to depend on this functionality since the "correct" behavior would be to make sure that all jobs are able to respond to context cancellation, so we make this new feature optional.

While working on #1289, I realized that jobs which are "soft stopped" via context cancellation are still prone to the same side effects as if they errored in any other way: * Their number of attempts is incremented. * They may be discarded if reaching max attempts. * They'll have to wait to be retried according to retry policy. This doesn't really seem right because these jobs didn't actually misbehave in any way, but were rather just slow-to-run jobs that couldn't finish cleanly inside the default stop allowance while a client was restarting or being deployed. The proper behavior should probably be more like a snooze. i.e. The soft timeout cancellation doesn't count and the jobs get a chance to be retried immediately. Here, make that change.

brandur marked this pull request as draft June 19, 2026 15:08

brandur force-pushed the brandur-hard-stop-timeout branch 2 times, most recently from 1234376 to c5151af Compare June 19, 2026 15:16

brandur commented Jun 19, 2026

View reviewed changes

brandur force-pushed the brandur-hard-stop-timeout branch from c5151af to 7342765 Compare June 19, 2026 17:44

brandur mentioned this pull request Jun 19, 2026

Make jobs cancelled due to a soft stop immediately available #1290

Open

brandur marked this pull request as ready for review June 20, 2026 03:46

brandur requested a review from bgentry June 20, 2026 03:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `Config.HardStopTimeout` to perform a "hard stop" setting jobs errored#1289

Add `Config.HardStopTimeout` to perform a "hard stop" setting jobs errored#1289
brandur wants to merge 1 commit into
masterfrom
brandur-hard-stop-timeout

brandur commented Jun 19, 2026

Uh oh!

brandur Jun 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brandur commented Jun 19, 2026

Uh oh!

brandur Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant