Background
Sub-issue of #3011.
docker-agent can run multiple concurrent instances — CLI sessions (cmd/root/chat.go, run.go), gateway workers (pkg/gateway/, pkg/chatserver/), cron / API agents — all sharing the same SQLite memory database in pkg/memory/database/sqlite/. The current write path opens a transaction but does not guard against concurrent writers at the OS level or ensure readers always see a fully committed state.
Two failure modes:
- Torn reads: a reader sees a partially-written row during a slow write.
- Lost updates: two concurrent writers both read generation N and both decide to write. One silently overwrites the other's changes, and the drift guard (#TBD-C, drift detection) never fires because both writers read the same generation before either committed.
The fix is a two-layer approach: SQLite WAL + busy timeout for in-process safety, plus an fcntl/LockFileEx advisory lock for cross-process serialisation of read-modify-write cycles.
Proposed design
1. SQLite WAL mode + busy timeout
Enable Write-Ahead Logging and a generous busy timeout on every connection opened to the memory DB in pkg/memory/database/sqlite/:
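// journal_mode=WAL persists in the database file; busy_timeout applies only to the connection that sets it.
if _, err := db.Exec("PRAGMA journal_mode=WAL"); err != nil { return err }
if _, err := db.Exec("PRAGMA busy_timeout=5000"); err != nil { return err } // 5 s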
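WAL lets readers run concurrently with a single writer: readers do not block the writer, and the writer does not block readers. Writers still serialise against each other, so the busy timeout makes a second writer wait up to 5 s for the write lock instead of failing immediately with SQLITE_BUSY.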
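Because busy_timeout is a per-connection setting, one way to apply both pragmas to every pooled connection is to put them in the DSN rather than issue them once through db.Exec. The sketch below assumes the mattn/go-sqlite3 driver, whose DSN accepts _journal_mode and _busy_timeout; the issue does not say which driver pkg/memory/database/sqlite/ uses, and memoryDSN is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// memoryDSN builds a DSN whose query parameters are applied to each new
// connection. Assumption: mattn/go-sqlite3 parameter names; other drivers
// spell these options differently.
func memoryDSN(path string, busyTimeoutMs int) string {
	q := url.Values{}
	q.Set("_journal_mode", "WAL")
	q.Set("_busy_timeout", fmt.Sprint(busyTimeoutMs))
	// Encode sorts keys, so the output is deterministic.
	return "file:" + path + "?" + q.Encode()
}

func main() {
	// The resulting string would be passed to sql.Open("sqlite3", dsn).
	fmt.Println(memoryDSN("/tmp/memory.db", 5000))
}
```

If the driver turns out to be a different one, the same idea applies but the parameter names change, so treat the keys above as placeholders to verify against the driver's documentation.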
2. Advisory file lock for multi-process write serialisation
For the write paths that require read-modify-write atomicity (add, update, delete with drift-check), acquire an exclusive advisory lock on a companion .lock file before the read-generation / write cycle:
// pkg/memory/database/lock.go
type FileLock struct { … }
func (l *FileLock) Lock() error { … } // fcntl F_SETLKW on Linux/macOS; LockFileEx on Windows
func (l *FileLock) Unlock() error { … }
Lock path: <data_dir>/memory.lock.
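The lock file is never deleted (avoids TOCTOU); its existence is benign. If it were removed after use, one process could still hold a lock on the unlinked file while another creates and locks a fresh file at the same path, so both would believe they hold the lock.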
3. Atomic snapshot export (for drift backups)
When the drift-detection guard (sibling sub-issue C) exports a .bak file, it must write to a temp file in the same directory and rename it into place atomically. Reuse pkg/atomicfile/ if its API suits, otherwise:
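tmp, err := os.CreateTemp(dir, ".mem_backup_*.json.tmp")
if err != nil { return err }
defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after a successful rename
// … write JSON …
if err := tmp.Sync(); err != nil { tmp.Close(); return err } // flush data to disk before the rename
if err := tmp.Close(); err != nil { return err }
return os.Rename(tmp.Name(), finalPath) // atomic on POSIX; best-effort on Windows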
This prevents a concurrent reader from seeing a half-written backup.
4. Connection pool limits
Limit the SQLite connection pool to one writer connection and allow multiple reader connections. With database/sql this means two handles on the same database file: a writer *sql.DB capped with db.SetMaxOpenConns(1), and a separate read pool.
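A rough sketch of the writer/reader split is shown below. Pools and NewPools are hypothetical names, and stubConnector is only a stand-in so the example runs without a third-party SQLite driver; real code would obtain both handles from the actual driver.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// Pools holds a hypothetical writer/reader pair of handles on the same DB.
type Pools struct {
	Writer *sql.DB
	Reader *sql.DB
}

// NewPools caps the writer at one connection and the readers at maxReaders.
func NewPools(c driver.Connector, maxReaders int) *Pools {
	w := sql.OpenDB(c)
	w.SetMaxOpenConns(1) // all writes funnel through a single connection
	r := sql.OpenDB(c)
	r.SetMaxOpenConns(maxReaders)
	return &Pools{Writer: w, Reader: r}
}

// stubConnector stands in for a real driver connector (assumption: used
// here only so the sketch compiles and runs with the standard library).
type stubConnector struct{}

func (stubConnector) Connect(context.Context) (driver.Conn, error) {
	return nil, errors.New("stub: no real driver")
}

func (stubConnector) Driver() driver.Driver { return nil }

func main() {
	p := NewPools(stubConnector{}, 4)
	fmt.Println("writer max:", p.Writer.Stats().MaxOpenConnections)
	fmt.Println("reader max:", p.Reader.Stats().MaxOpenConnections)
}
```

Capping the writer at one connection serialises writes within a process only; cross-process writers still rely on the busy timeout and the advisory file lock.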
5. Cross-platform support
- Linux / macOS: fcntl(2) F_SETLKW (blocking exclusive lock).
- Windows: LockFileEx with LOCKFILE_EXCLUSIVE_LOCK.
- Fallback (neither available): proceed without OS-level lock but log a warning; SQLite WAL + busy timeout still provide best-effort safety.
The repo already uses build-tagged lock files in pkg/cache/ (lock_unix.go, lock_windows.go, lock_js.go) — follow the same pattern.
Implementation checklist
- pkg/memory/database/sqlite/db.go: set PRAGMA journal_mode=WAL and PRAGMA busy_timeout=5000 on connection open
- pkg/memory/database/lock_unix.go / lock_windows.go / lock_js.go: FileLock with Lock() / Unlock(); cross-platform (fcntl / LockFileEx / no-op fallback)
- pkg/tools/builtin/memory/: acquire FileLock before the read-generation → drift-check → write cycle in add_memory, update_memory, delete_memory; release in defer
- pkg/memory/database/backup.go: atomic temp-file + rename for snapshot export (used by sub-issue C drift guard); consider reusing pkg/atomicfile/
- db.SetMaxOpenConns(1) on the writer connection; separate read pool
- go test -race passes
- LockFileEx path compiles and passes basic lock/unlock round-trip test
Acceptance criteria
- busy_timeout=5000 prevents immediate SQLITE_BUSY errors under normal write contention
- .lock file approach serialises cross-process writers on Linux/macOS and Windows
- No half-written .bak file is visible to readers
- go test -race passes on pkg/memory/database/ and pkg/tools/builtin/memory/