Skip to content

fix(storage): flush before acking in local write_text_if_absent#204

Merged
aaltshuler merged 1 commit into
mainfrom
fix/local-write-if-absent-flush
Jun 12, 2026
Merged

fix(storage): flush before acking in local write_text_if_absent#204
aaltshuler merged 1 commit into
mainfrom
fix/local-write-if-absent-flush

Conversation

@aaltshuler

Copy link
Copy Markdown
Collaborator

The "flaky" cluster test that failed CI twice (#199's first run, #202) was a real bug, not a flake.

Root cause: LocalStorageAdapter::write_text_if_absent (the create-only CAS used for the cluster's first state-write, the lock, and sidecars since #190) writes through tokio's async File with write_all but never flushes. tokio files buffer internally — the OS write happens in a background task after the function returns — so Ok(true) could be acknowledged with the file created but empty, and an immediate reader saw EOF while parsing. Invariant 6 (durable before acknowledgement) violated on exactly one path: the other local writes use tokio::fs::write, which flushes internally.

Fix: flush().await before Ok(true), with the same remove-on-failure cleanup. One real line.

Test: best-effort tight-loop regression (write_text_if_absent_is_read_consistent_immediately) — the window is timing-dependent and reproduces reliably only on loaded runners, so the two recorded CI failures are the red half of the red→green pair; the commit message says so explicitly rather than pretending local determinism.

Full workspace gate green (57 suites). Unblocks #202 (the version bump), whose only failure was this.

🤖 Generated with Claude Code

tokio's async File buffers writes internally: write_all only fills the
buffer, and the actual OS write happens in a background task after drop —
so write_text_if_absent could return Ok(true) with the file created but
still EMPTY, and an immediate reader saw EOF. Caught twice in CI as
'EOF while parsing a value' reading state.json right after cluster import
(the cluster's first state-write routes here since the storage port);
also an invariant-6 violation (acknowledged before the write reached the
OS). The other local write paths use tokio::fs::write, which flushes
internally — this was the one miss.

Fix: flush().await before Ok, with the same remove-on-failure cleanup as
the write itself. Regression test is a best-effort tight loop (the window
is timing-dependent; the two CI failures are the recorded red) asserting
read-after-ack never sees a short file.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aaltshuler has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@aaltshuler aaltshuler merged commit 13ceab3 into main Jun 12, 2026
8 checks passed
@aaltshuler aaltshuler deleted the fix/local-write-if-absent-flush branch June 12, 2026 11:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant