fix(skillopt): retry findInvocation through Deeplake insert→read lag by kaghni · Pull Request #247 · activeloopai/hivemind

kaghni · 2026-06-08T05:07:41Z

Problem

The skillopt worker fires on a user reaction and reads the skill-invocation window from the Deeplake sessions table via findInvocation. That row is written by a separate process (capture.js) and becomes visible only after a short insert→read lag; this is expected Deeplake latency, not a defect. But a worker that fires on a fast reaction reads stale data → invocation not found in session → silent no-op.

Reproduced live in released prod 0.7.80: worker queried at 04:28:43 → invocation not found while [capture] writing… was still streaming at 04:29:09–45. The existing K=3 window only retries if the user keeps typing — a single fast or final reaction silently loses the improvement.

The fix belongs in the worker: tolerate the lag rather than give up on the first read.

Fix

findInvocation is now polled with bounded linear backoff (5 attempts, 3s base, ~45s max) before giving up. It runs inside the already-detached worker, so the wait blocks nothing in the user's session.

Only a not-found (null) result is retried — a query error (e.g. 402) propagates immediately (fail-fast, no spinning).
Bounded: the row is near-certain to land (capture is reliable) but not guaranteed (capture may be disabled/errored), so on exhaustion we return null and the caller gives up gracefully with no publish.
Lock interaction checked: the per-skill worker lock TTL is 10 min ≫ 45s, so no second worker can steal the lock mid-retry.

findInvocation itself stays pure (single query); the retry lives in improveSkillIfFailed, injectable via invocationRetries / invocationBackoffMs / sleep for tests.
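
The polling described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `Invocation` and `RetryOpts` shapes, the `backoffSchedule` helper, and the defaults (5 retries after the first query, 3000 ms base) are assumptions chosen to match the "5 attempts, 3s base, ~45s max" figures. Whether "5 attempts" counts the first query is not stated here, so the sketch assumes five sleeps of 3s, 6s, 9s, 12s, and 15s.

```typescript
// Hypothetical shapes for illustration; the real types live in skillopt-improve.ts.
type Invocation = { id: string };
type Sleep = (ms: number) => Promise<void>;

interface RetryOpts {
  invocationRetries?: number;   // retries after the first query (assumed default: 5)
  invocationBackoffMs?: number; // linear backoff base in ms (assumed default: 3000)
  sleep?: Sleep;                // injectable so tests never wait on real timers
}

const realSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Delay before retry n (1-based) is n * baseMs.
function backoffSchedule(retries: number, baseMs: number): number[] {
  return Array.from({ length: retries }, (_, i) => (i + 1) * baseMs);
}

async function findInvocationWithRetry(
  findInvocation: () => Promise<Invocation | null>,
  opts: RetryOpts = {},
): Promise<Invocation | null> {
  const retries = opts.invocationRetries ?? 5;
  const baseMs = opts.invocationBackoffMs ?? 3000;
  const sleep = opts.sleep ?? realSleep;

  // No try/catch: a query error (e.g. 402) rejects immediately and is never retried.
  let found = await findInvocation();
  for (const delay of backoffSchedule(retries, baseMs)) {
    if (found !== null) break; // row became visible, stop polling
    await sleep(delay);
    found = await findInvocation();
  }
  return found; // null after exhaustion: the caller gives up without publishing
}
```

Under these assumed defaults the delays sum to 3 + 6 + 9 + 12 + 15 = 45 seconds, which is where the "~45s max" bound would come from and why it sits comfortably below the 10-minute lock TTL.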

Tests (failing-first)

miss-then-hit → retries with linear backoff, then judges + publishes (was: no-op)
never-propagates (e.g. capture disabled) → bounded retries, gives up, no insert
query error (402) → fails fast, single query, no retry loop
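
The three scenarios above can be exercised without real timers by scripting the query results and recording the requested sleeps. The sketch below is only an illustration of that testing pattern: `scripted`, `poll`, and `scenario` are hypothetical names, and `poll` is a compact stand-in for the retry helper rather than the PR's implementation. It assumes 5 retries and a 3000 ms base.

```typescript
type Invocation = { id: string };
type Step = Invocation | null | Error;

// Scripted stub: each call consumes the next step; the last step repeats forever.
function scripted(steps: Step[]) {
  let calls = 0;
  const query = async (): Promise<Invocation | null> => {
    const step = steps[Math.min(calls, steps.length - 1)] ?? null;
    calls += 1;
    if (step instanceof Error) throw step;
    return step;
  };
  return { query, calls: () => calls };
}

// Stand-in for the helper under test: retry only on null, linear backoff, bounded.
async function poll(
  query: () => Promise<Invocation | null>,
  retries: number,
  baseMs: number,
  sleep: (ms: number) => Promise<void>,
): Promise<Invocation | null> {
  for (let attempt = 0; ; attempt++) {
    const found = await query(); // a thrown error propagates: fail fast
    if (found !== null || attempt >= retries) return found;
    await sleep((attempt + 1) * baseMs);
  }
}

// Runs one scenario with a recording sleep instead of a real timer.
async function scenario(steps: Step[], retries = 5) {
  const stub = scripted(steps);
  const slept: number[] = [];
  const sleep = async (ms: number) => { slept.push(ms); };
  try {
    const result = await poll(stub.query, retries, 3000, sleep);
    return { result, calls: stub.calls(), slept, error: null as Error | null };
  } catch (e) {
    return { result: null, calls: stub.calls(), slept, error: e as Error };
  }
}
```

With this harness, `scenario([null, null, { id: "inv" }])` makes 3 queries and sleeps 3000 then 6000 ms; `scenario([null])` makes 6 queries and ends with `result: null`; `scenario([new Error("402")])` makes a single query and records no sleeps.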

All 70 skillopt tests pass; tsc clean (only pre-existing tree-sitter native-dep errors remain).

The worker fires on a user reaction and reads the skill-invocation window from the Deeplake sessions table. That row is written by a SEPARATE process (capture.js) and lands on a short insert→read visibility lag, so a fast reaction reads stale → "invocation not found" → silent no-op. The K=3 window only retried if the user kept typing; a single fast/final reaction lost the improvement. findInvocation now polls with bounded linear backoff (5 attempts, 3s base, ~45s max) inside the already-detached worker, so the wait blocks nothing. Only a not-found (null) result retries — a query error (e.g. 402) fails fast. Bounded so a genuinely-absent invocation (e.g. capture disabled) gives up gracefully with no publish. Tests: miss-then-hit retries then publishes; never-propagates gives up with no insert; query-error fails fast without spinning.

github-actions · 2026-06-08T05:09:25Z

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🟢 | Lines | 100.00% (🎯 90%) | 47 / 47 |
| 🟢 | Statements | 95.38% (🎯 90%) | 62 / 65 |
| 🔴 | Functions | 71.43% (🎯 90%) | 5 / 7 |
| 🟢 | Branches | 95.74% (🎯 90%) | 45 / 47 |

File Coverage — 1 file changed

| File | Stmts | Branches | Functions | Lines |
|---|---|---|---|---|
| `src/skillify/skillopt-improve.ts` | 🟢 95.4% | 🟢 95.7% | 🔴 71.4% | 🟢 100.0% |

Generated for commit d8a135f.

coderabbitai · 2026-06-08T05:09:25Z

Walkthrough

The PR adds configurable polling with linear backoff to improveSkillIfFailed to handle transient stale visibility when invocation rows are written by separate processes. New ImproveOpts fields control retry count and backoff interval. A findInvocationWithRetry helper implements bounded retries on null results while propagating query errors immediately. Tests validate zero-retry, multi-retry with success, exhausted-retry, and error-fast paths.

Changes

Invocation Retry Polling

| Layer / File(s) | Summary |
|---|---|
| ImproveOpts contract and retry defaults `src/skillify/skillopt-improve.ts` | `ImproveOpts` gains `invocationRetries`, `invocationBackoffMs`, and injectable `sleep` parameters. Default constants and `realSleep` timer are defined. |
| Retry polling and integration `src/skillify/skillopt-improve.ts` | `findInvocationWithRetry` loops on null results with linear backoff, bounded by retries, and propagates query errors. `improveSkillIfFailed` calls the retry variant instead of single `findInvocation`. |
| Retry behavior test scenarios `tests/shared/skillopt-improve.test.ts` | Updated test uses `invocationRetries: 0`. New tests validate propagation-lag retry-and-succeed, bounded-retry graceful termination, and error-propagation no-retry paths. |

Sequence Diagram

sequenceDiagram
  participant improveSkillIfFailed as improveSkillIfFailed
  participant findInvocationWithRetry as findInvocationWithRetry
  participant findInvocation as findInvocation
  participant sleep as sleep (backoff)
  improveSkillIfFailed->>findInvocationWithRetry: call with opts
  loop until invocation found or retries exhausted
    findInvocationWithRetry->>findInvocation: query session
    alt invocation exists
      findInvocation-->>findInvocationWithRetry: return invocation
      findInvocationWithRetry-->>improveSkillIfFailed: return invocation
    else invocation is null
      findInvocationWithRetry->>sleep: linear backoff
      sleep-->>findInvocationWithRetry: timer complete
      Note over findInvocationWithRetry: increment attempt counter
    else query throws error
      findInvocation-->>findInvocationWithRetry: propagate error
      findInvocationWithRetry-->>improveSkillIfFailed: reject with error
    end
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

efenocchi

Poem

🐰 A retry loop with patience true,
Linear backoff for the stale view,
Rows arrive after a quiet hop,
Poll, then judge — then publish once on top,
Errors skip the loop; we stop, not flop. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding retry logic with bounded backoff to handle Deeplake insert→read visibility lag in findInvocation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description comprehensively covers the problem, fix, testing strategy, and technical considerations (lock TTL, fail-fast behavior, graceful exhaustion). |


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/shared/skillopt-improve.test.ts`:
- Around line 139-156: The test currently asserts
expect(sessionsCalls).toBeGreaterThanOrEqual(3) but the mock is designed to
succeed on the third call, so change that assertion to
expect(sessionsCalls).toBe(3) to precisely verify the retry behavior; update the
assertion in the "retries findInvocation when the row hasn't propagated yet
(Deeplake lag) → then judges + improves" test (the variable sessionsCalls and
the query mock are used there) to use .toBe(3).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 816903c1-6c10-4470-859b-779aeb6288a2

📥 Commits

Reviewing files that changed from the base of the PR and between 27222b5 and 7476389.

📒 Files selected for processing (2)

src/skillify/skillopt-improve.ts
tests/shared/skillopt-improve.test.ts

kaghni · 2026-06-08T05:27:12Z

@coderabbitai review

coderabbitai · 2026-06-08T05:27:18Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

CodeRabbit flagged .toBeGreaterThanOrEqual(3) and suggested a precise equality. The precise value is 4, not the 3 suggested — findInvocation polls 3× (miss, miss, hit) and windowAroundInvocation issues one more /sessions/ query after the invocation is found. .toBe(4) is both precise and correct.
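
The count of 4 can be illustrated with a small call-counting mock. Everything in this sketch is a hypothetical stand-in (the `Row` shape, the placeholder SQL strings, and the simplified `findInvocation` and `windowAroundInvocation` bodies); only the counting pattern is the point, and the backoff sleeps are left out for brevity.

```typescript
// Hypothetical stand-ins: only the call-counting pattern matters here.
type Row = { path: string };

let sessionsCalls = 0;
const missesBeforeVisible = 2; // miss, miss, hit

// Counts every query that touches /sessions/, whichever function issues it.
async function query(sql: string): Promise<Row[]> {
  if (!sql.includes("/sessions/")) return [];
  sessionsCalls += 1;
  return sessionsCalls > missesBeforeVisible ? [{ path: "/sessions/s1" }] : [];
}

// Single lookup; returns null while the row is not yet visible.
async function findInvocation(): Promise<Row | null> {
  const rows = await query("invocation lookup under /sessions/ (placeholder SQL)");
  return rows[0] ?? null;
}

// One further /sessions/ query once the invocation is known.
async function windowAroundInvocation(inv: Row): Promise<Row[]> {
  return query(`window around ${inv.path} (placeholder SQL)`);
}

async function run(): Promise<number> {
  let inv = await findInvocation();
  while (inv === null) inv = await findInvocation(); // backoff omitted for brevity
  await windowAroundInvocation(inv);
  return sessionsCalls; // 3 polls + 1 window query
}
```

In this sketch `run()` resolves to 4: three polls from `findInvocation` and one more query from `windowAroundInvocation`, which is why an assertion on 3 would fail even though the mock succeeds on its third call.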

kaghni · 2026-06-08T05:29:25Z

@coderabbitai review

coderabbitai · 2026-06-08T05:29:29Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot reviewed Jun 8, 2026


Comment thread tests/shared/skillopt-improve.test.ts

docs(skillopt): reframe the Deeplake read-lag as expected latency, not a "bug"

2d00789

The insert→read visibility lag is normal Deeplake behavior, not a defect; the worker simply needs to tolerate it. Drop the "Bug #1" labeling from the comments; the retry logic is unchanged.

efenocchi approved these changes Jun 8, 2026


kaghni merged commit d250d3f into main Jun 8, 2026
10 checks passed

kaghni merged 3 commits into main from fix/skillopt-worker-invocation-retry
