store: Don't hold write lock while receiving LTX#450
store: Don't hold write lock while receiving LTX#450saleemrashid wants to merge 3 commits intosaleem/incremental-chksumfrom
Conversation
Holding the recovery lock for too long results in "locking protocol" errors from SQLite. This happens when receiving the LTX takes too long (e.g. the LTX is large or the connection to the primary is slow). Before we have the LTX, we only verify whether the database position matches the LTX pre-apply position. We can instead keep this as a preflight check (without the write lock), then verify again once we've received the LTX and acquired the write lock.
| // unless this is a snapshot, which will overwrite all data. | ||
| // unless this is a snapshot, which will overwrite all data. This is a preflight | ||
| // check so we can skip streaming an LTX we can't apply, but we'll need to verify | ||
| // again after we acquire the write lock. |
There was a problem hiding this comment.
There's a behavior change here now we're not acquiring the write lock: if a transaction is being applied concurrently, we might now observe the previous db.Pos() and return a position mismatch error here. In contrast, the existing code is serialized by the write lock, and would observe the post-apply db.Pos() of any transaction being applied concurrently.
I don't think this is an issue, but it is a behavior change, so I wanted to call it out.
b8dd162 to
ebc2c91
Compare
…mFrame
Moving the halt lock check after receiving the LTX increases the probability of an existing deadlock:
- AcquireRemoteHaltLock stores remoteHaltLock and calls WaitPosExact, i.e. it waits for processLTXStreamFrame to
process pending LTXs
- processLTXStreamFrame sees remoteHaltLock != nil (this is more likely now, since it receives the LTX first)
- It incorrectly assumes the remote lock is stale and tries to clear it, since we don't expect new LTXs from
the primary while we hold the halt lock
- UnsetRemoteHaltLock deadlocks in Recover, because processLTXStreamFrame already called AcquireWriteLock
- WaitPosExact deadlocks, because it's waiting for processLTXStreamFrame, which is deadlocked
Instead of trying to clear the remote lock locally, processLTXStreamFrame will perform recovery so the LTX can
be applied. If the remote lock was stale, we'll now only find out when committing a transaction, as the
primary will reject the remote commit. Deferring the error is an acceptable trade-off, as the halt lock is
intended to be short-lived.
| // We'll perform recovery so the LTX can be applied, but not clear the remote lock. | ||
| if haltLock := db.RemoteHaltLock(); haltLock != nil { | ||
| TraceLog.Printf("[ProcessLTXStreamFrame.Recover(%s)]: replica holds HALT lock but received LTX file, performing recovery", db.Name()) | ||
| if err := db.recover(ctx); err != nil { |
There was a problem hiding this comment.
@benbjohnson Since we haven't called UnsetRemoteHaltLock yet, which would have performed recovery, we have to perform recovery here to apply the LTX, right? And that should be safe whether the remote lock is stale, or if AcquireRemoteHaltLock is stuck in WaitPosExact (it'll just be a no-op, because we haven't done any writes yet)
Also: we probably shouldn't set remoteHaltLock until after WaitPosExact returns, since Writeable will return true prematurely (before we've caught up to the primary). But that would only happen if the user started writing before AcquireRemoteHaltLock returned success, and it'd result in the primary rejecting the remote commit. So I think it's fine to keep it as is (so we're not changing around too much).
0c601aa to
0982090
Compare
When "PRAGMA journal_mode = WAL" is used on the replica before the database file exists, SQLite will try to create the database file to update the header, which fails with "attempt to write a readonly database".
0982090 to
68dd622
Compare
|
I almost wonder if we should make the tests use an artificially slow connection between the primary and replica to better catch race conditions |
Holding the recovery lock for too long results in "locking protocol" errors from SQLite. This happens when receiving the LTX takes too long (e.g. the LTX is large or the connection to the primary is slow).
Before we have the LTX, we only verify whether the database position matches the LTX pre-apply position. We can instead keep this as a preflight check (without the write lock), then verify again once we've received the LTX and acquired the write lock.