feat: add force parameter to commit to skip blocking failed archives (fixes #2475) #2476

Open
njuboy11 wants to merge 1 commit into volcengine:main from njuboy11:feat/force-commit-skip-failed-archive

@njuboy11 njuboy11 commented Jun 6, 2026


Problem

Closes #2475

A single transient failure (VLM timeout, SSL error, LLM parse error, etc.) during Phase 2 (memory extraction) writes a .failed.json marker that permanently blocks all future commits on that session. There is no auto-recovery, no retry mechanism, and no admin API to clear it.

Real-world impact

When OpenViking is used as an OpenClaw context engine plugin, the afterTurn hook auto-commits on token threshold. A single VLM 180s timeout locks the entire session forever — the user has no way to recover without manually deleting .failed.json files from the filesystem.

Failure chain

  1. Session commits → Phase 1 (archive) succeeds
  2. Phase 2 (memory extraction) fails (VLM timeout, SSL error, etc.)
  3. .failed.json is written to the archive directory
  4. All subsequent POST /sessions/{id}/commit return FAILED_PRECONDITION
  5. Session is permanently locked

Additionally, the failure cascades: archive_002 fails → archive_003's _wait_for_previous_archive_done() sees 002's .failed.json → 003 also writes .failed.json → archive_004 blocked → entire session dead.
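The cascade can be reproduced in isolation. The following is a minimal sketch, not the project's code: `mark_failed`, `run_phase2`, and `commit_allowed` are hypothetical helpers, and the directory layout is an assumption. Only the `.failed.json` marker name and the archive names come from the description above.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple

FAILED_MARKER = ".failed.json"  # marker name from the PR description


def mark_failed(archive_dir: Path, reason: str) -> None:
    """Write the failure marker into an archive directory (hypothetical helper)."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    (archive_dir / FAILED_MARKER).write_text(json.dumps({"error": reason}))


def run_phase2(archive_dir: Path, previous: Optional[Path]) -> bool:
    """Hypothetical Phase 2: fail if the previous archive carries a marker."""
    if previous is not None and (previous / FAILED_MARKER).exists():
        mark_failed(archive_dir, f"previous archive {previous.name} failed")
        return False
    archive_dir.mkdir(parents=True, exist_ok=True)
    return True


def commit_allowed(root: Path) -> bool:
    """Hypothetical Phase 1 check: any marker in the session blocks the commit."""
    return not any(root.rglob(FAILED_MARKER))


def simulate(root: Path) -> Tuple[List[str], bool]:
    a2, a3 = root / "archive_002", root / "archive_003"
    mark_failed(a2, "VLM timeout")  # the single transient failure
    run_phase2(a3, previous=a2)     # archive_003 inherits the failure
    failed = sorted(p.parent.name for p in root.rglob(FAILED_MARKER))
    return failed, commit_allowed(root)


with tempfile.TemporaryDirectory() as tmp:
    FAILED, NEXT_COMMIT_ALLOWED = simulate(Path(tmp))
print(FAILED, NEXT_COMMIT_ALLOWED)
```

In this sketch one timeout on archive_002 leaves two markers behind, and the commit that would create archive_004 is rejected.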

Solution

Add a force: bool = False parameter throughout the commit flow. When force=True, both the Phase 1 and Phase 2 blocking checks are skipped with a warning log.

Changes

| File | Change |
| --- | --- |
| session.py | commit_async(force=False): skip _get_blocking_failed_archive_ref() when force=True |
| session.py | _run_memory_extraction(force=False): skip _wait_for_previous_archive_done() when force=True |
| session.py | create_task call: pass force=force to _run_memory_extraction |
| session_service.py | commit_async(force=False): pass through to session |
| sessions.py router | CommitRequest: add force: bool = False field, pass through |
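To illustrate the request-level change, here is a rough sketch of a request model whose `force` field defaults to `False`, so existing clients that omit the field keep the current behavior. It uses a plain dataclass as a stand-in; the real `CommitRequest` in the router may be defined with a different model library, and `parse_commit_request` is a hypothetical helper.

```python
import json
from dataclasses import dataclass


@dataclass
class CommitRequest:
    # Stand-in for the router's request model; only the field name and
    # default are taken from the PR description.
    force: bool = False


def parse_commit_request(body: str) -> CommitRequest:
    """Hypothetical parser: an empty or force-less body means force=False."""
    data = json.loads(body) if body.strip() else {}
    return CommitRequest(force=bool(data.get("force", False)))


print(parse_commit_request(""))                 # existing clients, no body
print(parse_commit_request('{"force": true}'))  # opt-in clients
```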

Design decisions

  • Default force=False preserves existing behavior — no breaking changes
  • force=True only logs a warning, does not delete .failed.json — the marker stays for debugging
  • Phase 1 and Phase 2 both need independent force handling because Phase 2 runs in a background asyncio.create_task() with its own _wait_for_previous_archive_done() check
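The design decisions above can be sketched as follows. This is an illustration under stated assumptions, not the actual `session.py`: the method names match the PR description, but their bodies, the `SessionSketch` class, and the `FailedPrecondition` exception are stand-ins.

```python
import asyncio
import logging

logger = logging.getLogger("session_sketch")


class FailedPrecondition(Exception):
    """Stand-in for the FAILED_PRECONDITION error returned by the API."""


class SessionSketch:
    def __init__(self, failed_archive=None):
        self.failed_archive = failed_archive  # e.g. "archive_002", or None
        self.extracted = False

    def _get_blocking_failed_archive_ref(self):
        return self.failed_archive

    async def _wait_for_previous_archive_done(self):
        if self.failed_archive is not None:
            raise FailedPrecondition(self.failed_archive)

    async def _run_memory_extraction(self, force=False):
        # Phase 2 has its own check, so it needs its own force handling.
        if force:
            logger.warning("force=True: skipping previous-archive wait")
        else:
            await self._wait_for_previous_archive_done()
        self.extracted = True

    async def commit_async(self, force=False):
        # Phase 1: the marker is left in place; force only bypasses the check.
        if force:
            logger.warning("force=True: skipping failed-archive check")
        else:
            blocking = self._get_blocking_failed_archive_ref()
            if blocking is not None:
                raise FailedPrecondition(blocking)
        task = asyncio.create_task(self._run_memory_extraction(force=force))
        await task  # awaited here only to keep the sketch deterministic
        return self.extracted


async def demo():
    try:
        await SessionSketch("archive_002").commit_async()
        default_result = "committed"
    except FailedPrecondition:
        default_result = "blocked"
    forced_result = await SessionSketch("archive_002").commit_async(force=True)
    return default_result, forced_result


RESULT = asyncio.run(demo())
print(RESULT)
```

If `force` were honored only in `commit_async`, the background task in this sketch would still raise in `_wait_for_previous_archive_done()`, which is why both phases take the flag.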

Testing

Verified in production:

  • VLM timeout → .failed.json written → session locked
  • With force=True: commit proceeds, Phase 2 runs, new memories extracted
  • Without force: existing FAILED_PRECONDITION behavior unchanged

Closes volcengine#2475

A single transient failure (VLM timeout, SSL error, etc.) during Phase 2
(memory extraction) writes a .failed.json marker that permanently blocks
all future commits on that session. There is no auto-recovery, no retry,
and no admin API to clear it.

This commit adds a `force: bool = False` parameter throughout the commit
flow that skips the blocking-failed-archive check when True:

- session.py: commit_async(force=False) — skip _get_blocking_failed_archive_ref()
- session.py: _run_memory_extraction(force=False) — skip _wait_for_previous_archive_done()
- session_service.py: commit_async(force=False) — pass through
- sessions.py router: CommitRequest.force field — pass through

Default force=False preserves existing behavior. Clients that understand
the failure mode can opt in to force=True to recover without manually
deleting .failed.json files.
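A client that opts in might try a normal commit first and retry with force only when the server reports FAILED_PRECONDITION. The sketch below assumes a hypothetical `post` transport and a hypothetical response shape (`{"error": ...}` / `{"status": ...}`); only the endpoint path, the `force` field, and the FAILED_PRECONDITION name come from this PR.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: takes a path and a JSON-like body, returns a reply.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def commit_with_recovery(post: Transport, session_id: str) -> Dict[str, Any]:
    """Commit normally; retry once with force=True if the session is locked."""
    path = f"/sessions/{session_id}/commit"
    reply = post(path, {"force": False})
    if reply.get("error") == "FAILED_PRECONDITION":  # assumed response shape
        reply = post(path, {"force": True})
    return reply


def fake_server() -> Tuple[Transport, List[Tuple[str, Dict[str, Any]]]]:
    """In-memory stand-in for a session locked by a .failed.json marker."""
    calls: List[Tuple[str, Dict[str, Any]]] = []

    def post(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
        calls.append((path, body))
        if body.get("force"):
            return {"status": "ok"}
        return {"error": "FAILED_PRECONDITION"}

    return post, calls


POST, CALLS = fake_server()
REPLY = commit_with_recovery(POST, "s1")
print(REPLY, len(CALLS))
```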

github-actions Bot commented Jun 6, 2026


PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis ✅

2475 - Fully compliant

Compliant requirements:

  • Added force parameter to session.py's commit_async and skipped the blocking check when force=True
  • Added force parameter to session.py's _run_memory_extraction and skipped the previous archive check when force=True
  • Added force parameter to session_service.py's commit_async and passed through to session
  • Added force field to sessions.py router's CommitRequest and passed through
⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🏅 Score: 90
🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected


github-actions Bot commented Jun 6, 2026


PR Code Suggestions ✨

No code suggestions found for the PR.

@CuSO41108

Contributor

I don't quite follow this. Why introduce a separate pending queue?


Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

feat: Add force parameter to commit to skip blocking failed archives

2 participants