feat: add force parameter to commit to skip blocking failed archives (fixes #2475) #2476
Open
njuboy11 wants to merge 1 commit into
Closes volcengine#2475

A single transient failure (VLM timeout, SSL error, etc.) during Phase 2 (memory extraction) writes a `.failed.json` marker that permanently blocks all future commits on that session. There is no auto-recovery, no retry, and no admin API to clear it.

This commit adds a `force: bool = False` parameter throughout the commit flow that skips the blocking-failed-archive check when True:

- session.py: `commit_async(force=False)` skips `_get_blocking_failed_archive_ref()`
- session.py: `_run_memory_extraction(force=False)` skips `_wait_for_previous_archive_done()`
- session_service.py: `commit_async(force=False)` passes the flag through
- sessions.py router: `CommitRequest.force` field, passed through

Default `force=False` preserves existing behavior. Clients that understand the failure mode can opt in to `force=True` to recover without manually deleting the `.failed.json` file.
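A minimal sketch of the Phase 1 gate described above, not the actual OpenViking code. The directory layout (`<root>/<archive>/.failed.json`), the `FailedPreconditionError` class, and the return value are assumptions made for illustration; only the method names and the `force` parameter come from the description.

```python
import asyncio
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger("session_sketch")


class FailedPreconditionError(Exception):
    """Hypothetical stand-in for the FAILED_PRECONDITION response."""


class Session:
    def __init__(self, root: Path) -> None:
        # Assumed layout: one sub-directory per archive under `root`.
        self.root = root

    def _get_blocking_failed_archive_ref(self) -> Optional[str]:
        # Return the name of the first archive that carries a failure marker.
        for archive in sorted(self.root.iterdir()):
            if (archive / ".failed.json").exists():
                return archive.name
        return None

    async def commit_async(self, force: bool = False) -> str:
        if force:
            # The check is skipped; the marker itself is left untouched.
            logger.warning("force=True: skipping failed-archive check")
        else:
            blocking = self._get_blocking_failed_archive_ref()
            if blocking is not None:
                raise FailedPreconditionError(
                    f"blocked by failed archive {blocking}"
                )
        return "committed"
```

With the default `force=False`, a session containing a marker raises as before; passing `force=True` lets the commit proceed while the marker stays on disk for debugging.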
I don't quite follow: why introduce a separate pending queue?
Problem
Closes #2475
A single transient failure (VLM timeout, SSL error, LLM parse error, etc.) during Phase 2 (memory extraction) writes a `.failed.json` marker that permanently blocks all future commits on that session. There is no auto-recovery, no retry mechanism, and no admin API to clear it.

Real-world impact
When OpenViking is used as an OpenClaw context engine plugin, the `afterTurn` hook auto-commits on token threshold. A single VLM 180s timeout locks the entire session forever: the user has no way to recover without manually deleting `.failed.json` files from the filesystem.

Failure chain
- `.failed.json` is written to the archive directory
- `POST /sessions/{id}/commit` returns `FAILED_PRECONDITION`

Additionally, the failure cascades: archive_002 fails → archive_003's `_wait_for_previous_archive_done()` sees 002's `.failed.json` → 003 also writes `.failed.json` → archive_004 blocked → entire session dead.

Solution
Add a `force: bool = False` parameter throughout the commit flow. When `force=True`, both the Phase 1 and Phase 2 blocking checks are skipped with a warning log.

Changes
- `session.py`, `commit_async(force=False)`: skip `_get_blocking_failed_archive_ref()` when `force=True`
- `session.py`, `_run_memory_extraction(force=False)`: skip `_wait_for_previous_archive_done()` when `force=True`
- `session.py`, `create_task` call: pass `force=force` to `_run_memory_extraction`
- `session_service.py`, `commit_async(force=False)`: pass through to session
- `sessions.py` router, `CommitRequest`: add `force: bool = False` field, pass through

Design decisions
- Default `force=False` preserves existing behavior: no breaking changes
- `force=True` only logs a warning and does not delete `.failed.json`: the marker stays for debugging
- Both phases need `force` handling because Phase 2 runs in a background `asyncio.create_task()` with its own `_wait_for_previous_archive_done()` check

Testing
Verified in production:

- `.failed.json` written → session locked
- With `force=True`: commit proceeds, Phase 2 runs, new memories extracted
- Without `force`: existing `FAILED_PRECONDITION` behavior unchanged
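
The cascade from the failure chain, and the reason Phase 2 needs its own `force` handling, can be illustrated with a small simulation. This is a sketch under stated assumptions: marker files are replaced by a hypothetical in-memory dict, the wait function returns immediately instead of polling, and extraction itself is assumed to succeed. The function names mirror those in the description but the bodies are illustrative only.

```python
import asyncio
import logging
from typing import Dict

logger = logging.getLogger("session_sketch")

# Hypothetical stand-in for per-archive marker files:
# archive index -> "done" or "failed".
Markers = Dict[int, str]


async def wait_for_previous_archive_done(markers: Markers, index: int) -> bool:
    """Return True if the previous archive finished, False if it failed."""
    prev = index - 1
    if prev not in markers:  # no previous archive, nothing to wait for
        return True
    return markers[prev] == "done"


async def run_memory_extraction(
    markers: Markers, index: int, force: bool = False
) -> None:
    if force:
        logger.warning("force=True: skipping previous-archive check for %d", index)
    elif not await wait_for_previous_archive_done(markers, index):
        markers[index] = "failed"  # cascade: this archive is marked failed too
        return
    markers[index] = "done"  # extraction is assumed to succeed here


async def demo(force: bool) -> Markers:
    markers: Markers = {2: "failed"}  # archive_002 hit a transient error
    for index in (3, 4):
        # Phase 2 runs as a background task, so it must receive `force` itself.
        task = asyncio.create_task(run_memory_extraction(markers, index, force))
        await task
    return markers
```

In this sketch, `asyncio.run(demo(False))` marks archives 3 and 4 as failed after archive 2, while `asyncio.run(demo(True))` lets both complete and leaves archive 2's failed state in place.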