feat: Add async_backtrace instrumentation to store, fetch, and push #628

Merged
evanh merged 2 commits into main from evanh/feat/async-backtrace
May 8, 2026

@evanh evanh commented May 8, 2026

Wire up the async-backtrace crate so we can introspect the live async call graph when the broker hangs or behaves unexpectedly. The upkeep loop logs the current taskdump_tree snapshot at debug! every 30 seconds, gated on a new config flag.

Where #[framed] is applied

All async fns in src/store/adapters/postgres.rs (module-level pool helpers, PostgresActivationStore constructors, and every InflightActivationStore trait method), plus the async fns in src/push/mod.rs (WorkerClient::send, PushPool::start, PushPool::submit, push_task) and src/fetch/mod.rs (TaskPusher::submit_task, FetchPool::start).

Spawned workers wrapped with frame!()

PushPool::start and FetchPool::start spawn their actual loops via crate::tokio::spawn_pool. Each spawned future is a separate task root, so #[framed] on the parent does not propagate. The inner async move { ... } bodies are now wrapped with async_backtrace::frame!(...) so fetch and push workers show up as roots in the dump tree, with in-flight postgres calls hanging off them.

New config flag: log_async_backtrace

Config::log_async_backtrace: bool (default false) controls the periodic dump. When enabled, upkeep() emits debug!(backtrace = %tree, "async backtrace dump") with the output of async_backtrace::taskdump_tree(false) if at least 30 seconds have elapsed since the last dump. Passing false to taskdump_tree makes the call non-blocking: tasks that are currently being polled are not waited on, and the dump shows only their root frame, marked as polling. The flag is off by default because the tree can be large and noisy; operators opt in when diagnosing a hang.
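
The gating described above could look roughly like the following. This is a sketch, not the code in this PR: `BacktraceDumper` and `maybe_dump` are hypothetical names, the dump function is passed in as a closure so the snippet needs only the standard library, and starting the 30-second clock at construction time is an assumption.

```rust
use std::time::{Duration, Instant};

/// Interval between dumps, taken from the description above.
const DUMP_INTERVAL: Duration = Duration::from_secs(30);

/// Hypothetical helper holding the flag and the time of the last dump.
struct BacktraceDumper {
    enabled: bool,
    last_dump: Instant,
}

impl BacktraceDumper {
    /// `enabled` mirrors the config flag; `now` starts the interval clock
    /// (an assumption: the first dump is due one interval after startup).
    fn new(enabled: bool, now: Instant) -> Self {
        Self { enabled, last_dump: now }
    }

    /// Returns a dump only when the flag is on and the interval has elapsed.
    /// `take_dump` is not called otherwise, so a disabled flag costs nothing.
    fn maybe_dump(
        &mut self,
        now: Instant,
        take_dump: impl FnOnce() -> String,
    ) -> Option<String> {
        if !self.enabled {
            return None;
        }
        if now.saturating_duration_since(self.last_dump) < DUMP_INTERVAL {
            return None;
        }
        self.last_dump = now;
        Some(take_dump())
    }
}
```

In an upkeep loop this would be called once per iteration with `Instant::now()` and a closure such as `|| async_backtrace::taskdump_tree(false)`, logging the result at `debug!` when it is `Some`.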

Wire up the async-backtrace crate so we can introspect the live async
call graph when the broker hangs or misbehaves. All async fns in the
postgres store and the fetch/push pipelines are decorated with
#[framed], the spawned worker/fetch loops are wrapped with frame!()
(they're separate task roots that #[framed] on the parent does not
reach), and the upkeep loop dumps taskdump_tree(false) at debug! every
30 seconds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@evanh evanh marked this pull request as ready for review May 8, 2026 20:23
@evanh evanh requested a review from a team as a code owner May 8, 2026 20:23
Add `log_async_backtrace: bool` to Config (default false) and gate the
periodic taskdump_tree emission in the upkeep loop on it. The dump can
be expensive and noisy, so operators opt in when they need it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@evanh evanh merged commit 07cc2a5 into main May 8, 2026
32 of 33 checks passed
@evanh evanh deleted the evanh/feat/async-backtrace branch May 8, 2026 21:44