make foyer io_uring engine opt-in (default psync)#64

Merged
jaredLunde merged 1 commit into main from foyer-io-uring-opt-in
May 30, 2026
Conversation

@jaredLunde
Contributor

Problem

glidefs was burning 200% CPU on the homelab node — two foyer-uring-0 threads pinned at 99% each — while doing zero disk I/O. Kernel stacks were empty and wchan=0, i.e. the threads were spinning in userspace, not blocked.

Root cause: foyer-storage 0.22's io_uring engine (UringIoEngineShard::run) busy-polls its completion ring with no idle backoff. When both submission channels are empty and nothing is inflight, every loop iteration is try_recv (empty) → skip submit → drain empty completion queue → loop. Each engine thread pins a full core whenever its cache is idle, not just under load. Two foyer HybridCache instances (clean cache + pack-index cache) each spawn one such thread → 200%.
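The difference between a busy-polling loop and a parking loop can be illustrated with a minimal sketch. This is not foyer's code: `Request`, `busy_poll_idle`, and `parked_idle` are hypothetical names, and a plain `std::sync::mpsc` channel stands in for the engine's submission channels and completion ring. It only models the loop shape described above (poll, find nothing, loop again) against a blocking wait.

```rust
use std::sync::mpsc::{self, Receiver, TryRecvError};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for an I/O request; not a foyer type.
struct Request;

/// Busy-polling shape: every iteration polls without blocking, so an idle
/// channel makes the loop spin for the whole window.
/// Returns how many iterations found nothing to do.
fn busy_poll_idle(rx: &Receiver<Request>, window: Duration) -> u64 {
    let deadline = Instant::now() + window;
    let mut empty_polls = 0;
    while Instant::now() < deadline {
        match rx.try_recv() {
            Ok(_req) => { /* a real engine would submit the request here */ }
            Err(TryRecvError::Empty) => empty_polls += 1, // nothing to do, loop again at once
            Err(TryRecvError::Disconnected) => break,
        }
    }
    empty_polls
}

/// Parking shape: a blocking receive puts the thread to sleep until a
/// request arrives or the window ends.
/// Returns how many times the thread woke up.
fn parked_idle(rx: &Receiver<Request>, window: Duration) -> u64 {
    let deadline = Instant::now() + window;
    let mut wakeups = 0;
    loop {
        let remaining = deadline.saturating_duration_since(Instant::now());
        let result = rx.recv_timeout(remaining);
        wakeups += 1;
        if result.is_err() {
            // Timed out or all senders dropped: stop waiting.
            return wakeups;
        }
    }
}
```

Calling both functions on an idle channel whose sender is still alive, for example `mpsc::channel::<Request>()` with a 20 ms window, should show the busy loop iterating many times while the parked loop wakes once at the timeout. That gap is the idle-CPU cost in miniature.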

This was a regression from #62 (use foyer io_uring I/O engine on Linux), which measured throughput on cached reads and missed the idle-CPU cost. There was no config knob: engine selection was automatic.

The same engine also has an unfixed upstream use-after-free (foyer-rs/foyer#1286), present in the latest release (0.22.3), so falling back to psync sidesteps that too.

Fix

Default back to foyer's psync engine (parks when idle → 0% CPU) and gate io_uring behind a new [cache] io_uring TOML knob (default false). The ~25% latency / ~37% throughput win from io_uring only materializes under sustained cached-read pressure, so it can be enabled per-node when the cache is hot enough to justify the spinning cores.

  • config: add io_uring: Option<bool> + use_io_uring() helper (default false)
  • foyer_engine: build_preferring_uring takes a prefer_uring flag
  • thread the flag through FoyerCacheConfig + PackIndexCache::open_with_sizes
  • ephemeral CLI/registry caches (push, bless, oci) hardcode psync
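A rough sketch of how the opt-in knob might be shaped, based only on the names in the list above. The `io_uring: Option<bool>` field and `use_io_uring()` helper come from this PR; `CacheConfig`, `IoEngineKind`, `select_engine`, and the `uring_available` parameter are hypothetical and are not glidefs or foyer APIs. The actual `build_preferring_uring` signature is not shown in this PR, so treat the selection function as an assumption.

```rust
/// Hypothetical subset of the `[cache]` section. In TOML, opting in would
/// look like:
///
///   [cache]
///   io_uring = true
#[derive(Debug, Default, Clone)]
struct CacheConfig {
    /// `None` means the key was absent from the config file.
    io_uring: Option<bool>,
}

impl CacheConfig {
    /// An absent key is treated as `false`, so psync stays the default.
    fn use_io_uring(&self) -> bool {
        self.io_uring.unwrap_or(false)
    }
}

/// Hypothetical engine tag used only for this illustration.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum IoEngineKind {
    Psync,
    Uring,
}

/// Pick io_uring only when the caller asked for it AND the platform
/// supports it (assumption); everything else falls back to psync.
fn select_engine(prefer_uring: bool, uring_available: bool) -> IoEngineKind {
    if prefer_uring && uring_available {
        IoEngineKind::Uring
    } else {
        IoEngineKind::Psync
    }
}
```

Under this sketch, long-lived caches would pass `config.use_io_uring()` as `prefer_uring`, while ephemeral CLI/registry caches would pass a literal `false`.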

Verification

Built with --features ublk and deployed to the homelab node via live handoff (zero downtime, 60 ublk exports recovered). Post-deploy: zero foyer-uring threads, instantaneous CPU 0.0% while idle, health/ready=200, exports intact.

🤖 Generated with Claude Code

foyer-storage 0.22's io_uring engine (UringIoEngineShard::run) busy-polls
its completion ring with no idle backoff — each engine thread pins a full
core whenever the cache is idle, not just under load. On a homelab node
this showed up as glidefs burning 200% CPU (two foyer-uring-0 threads at
99%) while doing zero disk I/O. It also carries an unfixed upstream
use-after-free (foyer-rs/foyer#1286).

Switch the default back to foyer's psync engine (parks when idle, 0% CPU)
and gate io_uring behind a new `[cache] io_uring` TOML knob (default
false). The ~25% latency / ~37% throughput win from io_uring only
materializes under sustained cached-read pressure, so enable it per-node
when the cache is hot enough to justify the spinning cores.

- config: add `io_uring: Option<bool>` + `use_io_uring()` (default false)
- foyer_engine: `build_preferring_uring` takes a `prefer_uring` flag
- thread the flag through FoyerCacheConfig + PackIndexCache::open_with_sizes
- ephemeral CLI/registry caches (push, bless, oci) hardcode psync

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jaredLunde jaredLunde changed the title from "perf(cache): make foyer io_uring engine opt-in (default psync)" to "make foyer io_uring engine opt-in (default psync)" May 30, 2026
@jaredLunde jaredLunde merged commit eb4e99e into main May 30, 2026
24 checks passed
@jaredLunde jaredLunde deleted the foyer-io-uring-opt-in branch May 30, 2026 19:08