feat: add LanceDB version snapshot auto-cleanup by njuboy11 · Pull Request #793 · CortexReach/memory-lancedb-pro

njuboy11 · 2026-05-10T07:27:44Z

Summary

LanceDB stores a new version snapshot on every write operation but never cleans them up. After a few days the database directory grows to 300+ MB despite only ~5-30 MB of actual data.

This PR adds a storageMaintenance.autoCleanup config block that periodically calls Table.optimize() to remove old version snapshots.

Config

{
  "storageMaintenance": {
    "autoCleanup": {
      "enabled": true,
      "intervalHours": 24,
      "retentionDays": 1
    }
  }
}

Changes

Add storageMaintenance.autoCleanup to plugin config schema (enabled, intervalHours, retentionDays)
Add MemoryStore.runMaintenance(retentionDays) → calls table.optimize({ cleanupOlderThan })
Add non-blocking setInterval timer in registerService.start (fires 5 min after startup, then every intervalHours)
Latest version is NEVER deleted by LanceDB

Closes #792

- Add storageMaintenance.autoCleanup config: - enabled: bool (default false) - intervalHours: int (default 24) - retentionDays: int (default 1) - Add MemoryStore.runMaintenance() to call table.optimize() - Add non-blocking setInterval timer for periodic cleanup - Add JSON schema for new config in openclaw.plugin.json - Closes CortexReach#792

TurboTheTurtle · 2026-05-11T08:32:48Z

Review findings:

P1: maintenanceTimer is scoped inside start() but used in stop()

maintenanceTimer is declared inside the start callback, but stop is a sibling callback and references that variable later (index.ts:4176, index.ts:4210). That variable is out of scope for stop, so this either fails type checking or throws when the service stops. Please hoist maintenanceTimer to the same scope as backupTimer so start and stop share the same timer handle.

P1: accidental /tmp/rerank.log write leaks recall queries

The auto-recall path now does a synchronous append to /tmp/rerank.log with the recall query text (index.ts:2596). That leaks user memory queries to a fixed temp file and adds sync filesystem I/O on a hot recall path. This looks like debug instrumentation and should be removed before merge.

P1: lock ENOENT falls back to lock-free writes

runWithFileLock() now catches ENOENT from proper-lockfile.lock() and proceeds with the write without acquiring any cross-process lock (src/store.ts:242-272). That breaks the serialization guarantee around LanceDB writes, and it is unrelated to the storage-cleanup feature. If lock acquisition fails, the write should fail or retry under the lock rather than entering the critical section lock-free.

rwmjhb

PR #793 Review: feat: add LanceDB version snapshot auto-cleanup

Verdict: RESOLVE-CONFLICTS-FIRST | Author: njuboy11 | Merge state: DIRTY

Pipeline short-circuited at the conflict gate after R0 verification. Deep review deferred until the branch rebases cleanly onto the base.

Problem Statement (R1)

LanceDB creates version snapshots on writes and the plugin does not currently clean them up, so active installations can see the database directory grow far beyond the live data size over days of use. The PR attempts to add configurable periodic cleanup via LanceDB table optimization.

Why This Stopped Here

GitHub reports mergeable=CONFLICTING (merge_state_status=DIRTY) for this PR. Reviewing the diff now would:

Give feedback against a branch the author must rewrite anyway
Produce findings that may be invalidated by the conflict resolution
Waste review cycles on code that cannot be merged as-is

Recommended Action

Author should:

Rebase this branch onto the latest base (or merge the base into this branch)
Resolve all merge conflicts
Push the rebased branch — the re-review will be picked up automatically

Reviewed at 2026-05-17T08:54:39Z | R0+R1 gate | Conflict gate

rwmjhb

Requesting changes. This PR has several P1 issues that need to be fixed before merge, independent of the current merge conflict.

maintenanceTimer appears to be scoped inside start() but referenced from sibling stop(). Please hoist it to shared plugin scope so both lifecycle callbacks use the same handle.
The auto-recall path writes recall query text to /tmp/rerank.log. That looks like debug instrumentation, leaks user memory queries to a fixed temp path, and adds sync I/O on a hot path. Please remove it.
runWithFileLock() appears to catch ENOENT from lock acquisition and continue into the write path without a cross-process lock. That weakens the LanceDB write serialization guarantee and is unrelated to storage cleanup. Please fail/retry under lock rather than performing lock-free writes.

After fixing, please rebase onto latest origin/master, preserve all current scripts/ci-test-manifest.mjs entries, and run:

npm run test:packaging-and-workflow
npm run test:core-regression

…d deadlock When 5+ concurrent write calls hit the same proper-lockfile lock simultaneously, all throw ENOENT due to graceful-fs race. The fixed 200ms retry delay causes all calls to retry synchronously, repeating the race until all exhausted -> lock-free fallback -> concurrent LanceDB writes deadlock (ECOMPROMISED). Fix: add random jitter (200-600ms) to scatter retry timing so the first caller acquires the lock while others wait in proper-lockfile's internal retry queue. Fixes the 5-minute gateway hang during auto-compaction under heavy memory write load.

…ore file lock Previously, runWithFileLock called proper-lockfile directly for every write. With 5+ concurrent writes (smart-extract, regex capture, auto-capture, inject, contextualize all firing at once), all hit proper-lockfile simultaneously → thundering-herd ENOENT on every call → fallback to lock-free → risk of deadlock under heavy load. Fix: wrap runWithFileLock with runSerializedUpdate, so only ONE caller attempts the file lock at a time. Others wait in the in-process Promise queue. File lock remains as cross-process safety net for multi-instance scenarios. Root cause: runSerializedUpdate already existed but was only used by patchMetadata; all other write paths (add, update, smartUpdate, etc.) called runWithFileLock directly, bypassing the queue. This eliminates the recurring ENOENT storm seen on every run completion.

…g API saturation) The queue fix eliminated ENOENT but serialized all writes through the queue. Since each write holds the lock during embedding API calls (1-3s each), 5 sequential writes saturate the embedding API key for 5-10s. Auto-recall's embedding call then queues up at MiniMax and times out at 5s. The jitter fix alone (commit 8b9ec9c) is sufficient to prevent the thundering-herd deadlock. ENOENT warnings are cosmetic noise; the fallback to lock-free is safe for LanceDB's MVCC writes. This reverts the queue wrapper added in f2ab00b while keeping the jitter retry fix.

…API saturation) Re-apply commit f2ab00b (reverted by a922205 due to perceived embedding API saturation). Analysis on 2026-05-20 confirmed that embedding calls already happen OUTSIDE runWithFileLock (callers pass pre-computed vectors), so the lock only gates fast LanceDB writes (~50ms each). The in-process Promise queue ensures only ONE caller hits proper-lockfile at a time, eliminating the 6-way thundering-herd ENOENT that caused deadlock under heavy write load (compaction + auto-capture + smart-extraction firing concurrently). Root cause: runSerializedUpdate already existed but was only used by patchMetadata; all other write paths (store, bulkStore, update, importEntry, smartUpdate, delete) called runWithFileLock directly, bypassing the queue. v1.1.0-beta.11

…k fix

…deadlock runWithFileLock now wraps with runSerializedUpdate (commit 5d953d6), but update() still had its own inner runSerializedUpdate wrapper. This created a nested deadlock: outer queue waits for inner → inner waits for outer → deadlock Remove the inner wrapper; the outer queue in runWithFileLock is sufficient.

- Remove inner runSerializedUpdate from update() to prevent double-queue deadlock - Remove debug logging, clean production code - Queue confirmed working: single caller observed, no concurrent ENOENT, recall no longer times out

Consolidates v1.0.0 (broken) + v1.0.1 (deadlock fix) into single v1.0.0. Changes: - In-process Promise queue serializes LanceDB writes - Fixes proper-lockfile ENOENT thundering herd → compaction deadlock - Fixes update() nested runSerializedUpdate deadlock - Clean production code, no debug logging

PR CortexReach#748 — realpath:false in lockfile.lock() to prevent ENOENT after proactive stale lock cleanup. proper-lockfile v4 calls realpath() by default; when the stale lock artifact is deleted, realpath() throws ENOENT. realpath:false bypasses this entirely. PR CortexReach#790 — drop late auto-recall results after timeout. If auto-recall times out (5000ms), any late-arriving results from embed/rerank are dropped at three checkpoints (post-retrieve, pre-metadata, pre-context) to prevent stale/incomplete data injection. 备份: /root/.openclaw/npm/memory-lancedb-pro.bak.0520

rwmjhb · 2026-05-23T09:08:25Z

PR #793 Review: feat: add LanceDB version snapshot auto-cleanup

Verdict: RESOLVE-CONFLICTS-FIRST | Author: njuboy11 | Merge state: DIRTY

Pipeline short-circuited at the conflict gate after R0 verification. Deep review deferred until the branch rebases cleanly onto the base.

Problem Statement (R1)

LanceDB write operations create version snapshots that the plugin never cleans up, causing active database directories to grow far beyond the live data size over days. The PR proposes configurable periodic LanceDB table optimization to reclaim old snapshots.

Why This Stopped Here

GitHub reports mergeable=CONFLICTING (merge_state_status=DIRTY) for this PR. Reviewing the diff now would:

Give feedback against a branch the author must rewrite anyway
Produce findings that may be invalidated by the conflict resolution
Waste review cycles on code that cannot be merged as-is

Recommended Action

Author should:

Rebase this branch onto the latest base (or merge the base into this branch)
Resolve all merge conflicts
Push the rebased branch — the re-review will be picked up automatically

Reviewed at 2026-05-23T08:59:36Z | R0+R1 gate | Conflict gate

rwmjhb reviewed May 17, 2026

View reviewed changes

rwmjhb requested changes May 17, 2026

View reviewed changes

ErGouzi added 13 commits May 19, 2026 02:07

chore: rename npm package to @njuboy11/memory-lancedb-pro

3e594f2

release: v1.0.0 - first stable release with queue-serialized file loc…

036e737

…k fix

chore: remove scope prefix to avoid org requirement

565ee31

debug: add queue diagnostic logging to runSerializedUpdate

d8f3bfb

release: v1.0.1 — merge upstream PRs CortexReach#748 CortexReach#790

42ae019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add LanceDB version snapshot auto-cleanup#793

feat: add LanceDB version snapshot auto-cleanup#793
njuboy11 wants to merge 14 commits into
CortexReach:masterfrom
njuboy11:feat/lancedb-storage-maintenance

njuboy11 commented May 10, 2026

Uh oh!

TurboTheTurtle commented May 11, 2026

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb left a comment

Uh oh!

rwmjhb commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

njuboy11 commented May 10, 2026

Summary

Config

Changes

Uh oh!

TurboTheTurtle commented May 11, 2026

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

PR #793 Review: feat: add LanceDB version snapshot auto-cleanup

Problem Statement (R1)

Why This Stopped Here

Recommended Action

Uh oh!

rwmjhb left a comment

Choose a reason for hiding this comment

Uh oh!

rwmjhb commented May 23, 2026

PR #793 Review: feat: add LanceDB version snapshot auto-cleanup

Problem Statement (R1)

Why This Stopped Here

Recommended Action

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants