Skip to content

fix(skillopt): retry findInvocation through Deeplake insert→read lag#247

Merged
kaghni merged 3 commits into
mainfrom
fix/skillopt-worker-invocation-retry
Jun 8, 2026
Merged

fix(skillopt): retry findInvocation through Deeplake insert→read lag#247
kaghni merged 3 commits into
mainfrom
fix/skillopt-worker-invocation-retry

Conversation

@kaghni
Copy link
Copy Markdown
Collaborator

@kaghni kaghni commented Jun 8, 2026

Problem

The skillopt worker fires on a user reaction and reads the skill-invocation window from the Deeplake sessions table via findInvocation. That row is written by a separate process (capture.js) and lands on a short insert→read visibility lag — this is expected Deeplake latency, not a defect. But a worker that fires on a fast reaction reads stale → invocation not found in session → silent no-op.

Reproduced live in released prod 0.7.80: worker queried at 04:28:43invocation not found while [capture] writing… was still streaming at 04:29:09–45. The existing K=3 window only retries if the user keeps typing — a single fast or final reaction silently loses the improvement.

The fix belongs in the worker: tolerate the lag rather than give up on the first read.

Fix

findInvocation is now polled with bounded linear backoff (5 attempts, 3s base, ~45s max) before giving up. It runs inside the already-detached worker, so the wait blocks nothing in the user's session.

  • Only a not-found (null) result is retried — a query error (e.g. 402) propagates immediately (fail-fast, no spinning).
  • Bounded: the row is near-certain to land (capture is reliable) but not guaranteed (capture may be disabled/errored), so on exhaustion we return null and the caller gives up gracefully with no publish.
  • Lock interaction checked: the per-skill worker lock TTL is 10 min ≫ 45s, so no second worker can steal the lock mid-retry.

findInvocation itself stays pure (single query); the retry lives in improveSkillIfFailed, injectable via invocationRetries / invocationBackoffMs / sleep for tests.

Tests (failing-first)

  • miss-then-hit → retries with linear backoff, then judges + publishes (was: no-op)
  • never-propagates (e.g. capture disabled) → bounded retries, gives up, no insert
  • query error (402) → fails fast, single query, no retry loop

All 70 skillopt tests pass; tsc clean (only pre-existing tree-sitter native-dep errors remain).

The worker fires on a user reaction and reads the skill-invocation window
from the Deeplake sessions table. That row is written by a SEPARATE process
(capture.js) and lands on a short insert→read visibility lag, so a fast
reaction reads stale → "invocation not found" → silent no-op. The K=3 window
only retried if the user kept typing; a single fast/final reaction lost the
improvement.

findInvocation now polls with bounded linear backoff (5 attempts, 3s base,
~45s max) inside the already-detached worker, so the wait blocks nothing.
Only a not-found (null) result retries — a query error (e.g. 402) fails fast.
Bounded so a genuinely-absent invocation (e.g. capture disabled) gives up
gracefully with no publish.

Tests: miss-then-hit retries then publishes; never-propagates gives up with
no insert; query-error fails fast without spinning.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 8, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 100.00% (🎯 90%) 47 / 47
🟢 Statements 95.38% (🎯 90%) 62 / 65
🔴 Functions 71.43% (🎯 90%) 5 / 7
🟢 Branches 95.74% (🎯 90%) 45 / 47
File Coverage — 1 file changed
File Stmts Branches Functions Lines
src/skillify/skillopt-improve.ts 🟢 95.4% 🟢 95.7% 🔴 71.4% 🟢 100.0%

Generated for commit d8a135f.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

The PR adds configurable polling with linear backoff to improveSkillIfFailed to handle transient stale visibility when invocation rows are written by separate processes. New ImproveOpts fields control retry count and backoff interval. A findInvocationWithRetry helper implements bounded retries on null results while propagating query errors immediately. Tests validate zero-retry, multi-retry with success, exhausted-retry, and error-fast paths.

Changes

Invocation Retry Polling

Layer / File(s) Summary
ImproveOpts contract and retry defaults
src/skillify/skillopt-improve.ts
ImproveOpts gains invocationRetries, invocationBackoffMs, and injectable sleep parameters. Default constants and realSleep timer are defined.
Retry polling and integration
src/skillify/skillopt-improve.ts
findInvocationWithRetry loops on null results with linear backoff, bounded by retries, and propagates query errors. improveSkillIfFailed calls the retry variant instead of single findInvocation.
Retry behavior test scenarios
tests/shared/skillopt-improve.test.ts
Updated test uses invocationRetries: 0. New tests validate propagation-lag retry-and-succeed, bounded-retry graceful termination, and error-propagation no-retry paths.

Sequence Diagram

sequenceDiagram
  participant improveSkillIfFailed as improveSkillIfFailed
  participant findInvocationWithRetry as findInvocationWithRetry
  participant findInvocation as findInvocation
  participant sleep as sleep (backoff)
  improveSkillIfFailed->>findInvocationWithRetry: call with opts
  loop until invocation found or retries exhausted
    findInvocationWithRetry->>findInvocation: query session
    alt invocation exists
      findInvocation-->>findInvocationWithRetry: return invocation
      findInvocationWithRetry-->>improveSkillIfFailed: return invocation
    else invocation is null
      findInvocationWithRetry->>sleep: linear backoff
      sleep-->>findInvocationWithRetry: timer complete
      Note over findInvocationWithRetry: increment attempt counter
    else query throws error
      findInvocation-->>findInvocationWithRetry: propagate error
      findInvocationWithRetry-->>improveSkillIfFailed: reject with error
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • efenocchi

Poem

🐰 A retry loop with patience true,
Linear backoff for the stale view,
Rows arrive after a quiet hop,
Poll, then judge — then publish once on top,
Errors skip the loop; we stop, not flop. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding retry logic with bounded backoff to handle Deeplake insert→read visibility lag in findInvocation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description comprehensively covers the problem, fix, testing strategy, and technical considerations (lock TTL, fail-fast behavior, graceful exhaustion).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/skillopt-worker-invocation-retry

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/shared/skillopt-improve.test.ts`:
- Around line 139-156: The test currently asserts
expect(sessionsCalls).toBeGreaterThanOrEqual(3) but the mock is designed to
succeed on the third call, so change that assertion to
expect(sessionsCalls).toBe(3) to precisely verify the retry behavior; update the
assertion in the "retries findInvocation when the row hasn't propagated yet
(Deeplake lag) → then judges + improves" test (the variable sessionsCalls and
the query mock are used there) to use .toBe(3).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 816903c1-6c10-4470-859b-779aeb6288a2

📥 Commits

Reviewing files that changed from the base of the PR and between 27222b5 and 7476389.

📒 Files selected for processing (2)
  • src/skillify/skillopt-improve.ts
  • tests/shared/skillopt-improve.test.ts

Comment thread tests/shared/skillopt-improve.test.ts
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 8, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 8, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

CodeRabbit flagged .toBeGreaterThanOrEqual(3) and suggested a precise equality.
The precise value is 4, not the 3 suggested — findInvocation polls 3× (miss,
miss, hit) and windowAroundInvocation issues one more /sessions/ query after the
invocation is found. .toBe(4) is both precise and correct.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 8, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 8, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…t a "bug"

The insert→read visibility lag is normal Deeplake behavior, not a defect — the
worker simply needs to tolerate it. Drop the "Bug #1" labeling from the comments;
the retry logic is unchanged.
@kaghni kaghni merged commit d250d3f into main Jun 8, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants