fix(core): retain buffered data in WriteGenerator on write failure by TennyZhuang · Pull Request #7400 · apache/opendal

TennyZhuang · 2026-04-17T07:38:25Z

Which issue does this PR close?

N/A - discovered via deterministic testing of error-handling paths.

Rationale

WriteGenerator's write() and close() methods called buffer.take() before attempting to write to the underlying writer. If the write failed, the buffered data was permanently lost, making retry impossible without data loss.

This violated the invariant maintained by other writer implementations (AppendWriter, BlockWriter, MultipartWriter, PositionWriter), which only clear state after a successful write.

Changes

Inexact mode write path: Clone the taken buffer before writing; restore it on failure.
Exact mode write path: Same clone-and-restore pattern.
close() method: Same clone-and-restore pattern in the flush loop.
Added 3 deterministic tests: FailOnceMockWriter that fails the first write call, then succeeds — verifying no data loss in inexact mode, exact mode, and close() paths.

Are there any user-facing changes?

Users who rely on retry (e.g., via RetryLayer) will no longer silently lose data when the underlying storage temporarily fails during a write flush.

This PR was generated with the assistance of AI (Claude).

dentiny

May I ask some dumb questions?

It seems to me that the "issue" reproduces if users call write again after a failed write. In production, why would users do that?
In terms of retry layer (which calls write repeatedly after failure), it seems to sit below the write generator. I'm curious if we could provide a reproducible example to show data loss with retry layer used together?

WriteGenerator's write() and close() methods called buffer.take() before attempting to write to the underlying writer. If the write failed, the buffered data was lost, making retry impossible without data loss. Fix by cloning the taken buffer before writing, and restoring it on failure. This matches the invariant maintained by other writer implementations (AppendWriter, BlockWriter, MultipartWriter, PositionWriter). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… duplication On write failure in inexact mode, only restore the previously buffered data (not the current bs). The caller will retry with the same bs, so including it in the restored buffer would cause data duplication. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…d data

TennyZhuang · 2026-06-05T21:52:48Z

@dentiny Good question. A few clarifications:

RetryWrapper has a limited retry budget. While RetryWrapper does clone the buffer on each internal retry attempt (r.write(bs.clone()).await), once all retries are exhausted, the error propagates back to WriteGenerator. The buffer passed to RetryWrapper is consumed (moved), so WriteGenerator's internal buffer (taken via buffer.take()) is gone.
Application-level retry is common. Even with RetryLayer configured, users commonly add their own retry logic at the application level. For example:
```
for _ in 0..max_retries {
    match writer.write(data).await {
        Ok(()) => break,
        Err(e) if e.is_temporary() => continue,
        Err(e) => return Err(e),
    }
}
```
In this pattern, if write(data) fails during a flush (because RetryWrapper exhausted its retries), the user retries with the same data. But the previously buffered data from an earlier write() call is no longer in the argument — it was already consumed from WriteGenerator's buffer.
Real-world evidence — RisingWave's foyer project (mentioned in the PR thread, foyer: error handling improvement risingwavelabs/risingwave#24673) hit this exact issue. Batch processing systems frequently retry failed write operations at the task level.

I've added a test test_inexact_write_failure_with_retry_style_retry on the branch that demonstrates this scenario step by step.

TennyZhuang requested a review from Xuanwo as a code owner April 17, 2026 07:38

dosubot Bot added size:L This PR changes 100-499 lines, ignoring generated files. releases-note/fix The PR fixes a bug or has a title that begins with "fix" labels Apr 17, 2026

dentiny reviewed Apr 20, 2026

View reviewed changes

hzxa21 mentioned this pull request May 29, 2026

foyer: error handling improvement risingwavelabs/risingwave#24673

Open

TennyZhuang and others added 2 commits June 6, 2026 04:42

TennyZhuang force-pushed the fix/write-generator-data-loss branch from 46bfec9 to 1c10532 Compare June 5, 2026 20:48

test: add test demonstrating RetryLayer-style retry preserves buffere…

3231588

…d data

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(core): retain buffered data in WriteGenerator on write failure#7400

fix(core): retain buffered data in WriteGenerator on write failure#7400
TennyZhuang wants to merge 3 commits into
mainfrom
fix/write-generator-data-loss

TennyZhuang commented Apr 17, 2026

Uh oh!

dentiny left a comment

Uh oh!

TennyZhuang commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

TennyZhuang commented Apr 17, 2026

Which issue does this PR close?

Rationale

Changes

Are there any user-facing changes?

This PR was generated with the assistance of AI (Claude).

Uh oh!

dentiny left a comment

Choose a reason for hiding this comment

Uh oh!

TennyZhuang commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants