fix(skillopt): retry findInvocation through Deeplake insert→read lag #247
The worker fires on a user reaction and reads the skill-invocation window from the Deeplake sessions table. That row is written by a SEPARATE process (capture.js) and only becomes readable after a short insert→read visibility lag, so a fast reaction reads stale data → "invocation not found" → silent no-op. The K=3 window only retried if the user kept typing; a single fast or final reaction lost the improvement. findInvocation now polls with bounded linear backoff (5 attempts, 3s base, ~45s max) inside the already-detached worker, so the wait blocks nothing. Only a not-found (null) result retries; a query error (e.g. 402) fails fast. The polling is bounded, so a genuinely absent invocation (e.g. capture disabled) gives up gracefully with no publish. Tests: miss-then-hit retries then publishes; never-propagates gives up with no insert; query-error fails fast without spinning.
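The "~45s max" ceiling is consistent with a linear schedule in which each wait grows by the 3s base. A minimal sketch of that arithmetic, assuming five waits of base × retry index (3, 6, 9, 12 and 15 s); whether the real code counts these as attempts or retries isn't shown here, and `linearBackoffDelays` is a hypothetical name:

```typescript
// Hypothetical helper: the wait before each retry under a linear backoff.
// Assumption: retry n (1-based) waits baseMs * n before querying again.
function linearBackoffDelays(retries: number, baseMs: number): number[] {
  const delays: number[] = [];
  for (let n = 1; n <= retries; n++) {
    delays.push(baseMs * n);
  }
  return delays;
}

const delays = linearBackoffDelays(5, 3000);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays); // [ 3000, 6000, 9000, 12000, 15000 ]
console.log(totalMs); // 45000
```

Equivalently, the worst-case wait is base × n(n+1)/2 = 3 s × 15 = 45 s, which keeps the upper bound easy to reason about when tuning either parameter.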
Coverage Report
Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage: 1 file changed
Generated for commit d8a135f.
📝 Walkthrough
The PR adds configurable polling with linear backoff to …
Changes: Invocation Retry Polling
Sequence Diagram
sequenceDiagram
    participant improveSkillIfFailed as improveSkillIfFailed
    participant findInvocationWithRetry as findInvocationWithRetry
    participant findInvocation as findInvocation
    participant sleep as sleep (backoff)
    improveSkillIfFailed->>findInvocationWithRetry: call with opts
    loop until invocation found or retries exhausted
        findInvocationWithRetry->>findInvocation: query session
        alt invocation exists
            findInvocation-->>findInvocationWithRetry: return invocation
            findInvocationWithRetry-->>improveSkillIfFailed: return invocation
        else invocation is null
            findInvocationWithRetry->>sleep: linear backoff
            sleep-->>findInvocationWithRetry: timer complete
            Note over findInvocationWithRetry: increment attempt counter
        else query throws error
            findInvocation-->>findInvocationWithRetry: propagate error
            findInvocationWithRetry-->>improveSkillIfFailed: reject with error
        end
    end
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/shared/skillopt-improve.test.ts`:
- Around lines 139-156: The test currently asserts `expect(sessionsCalls).toBeGreaterThanOrEqual(3)`, but the mock is designed to succeed on the third call, so change that assertion to `expect(sessionsCalls).toBe(3)` to verify the retry behavior precisely. Update the assertion in the "retries findInvocation when the row hasn't propagated yet (Deeplake lag) → then judges + improves" test (where the `sessionsCalls` variable and the query mock are used) to use `.toBe(3)`.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 816903c1-6c10-4470-859b-779aeb6288a2
📒 Files selected for processing (2)
src/skillify/skillopt-improve.ts
tests/shared/skillopt-improve.test.ts
@coderabbitai review
✅ Action performed
Review finished.
CodeRabbit flagged .toBeGreaterThanOrEqual(3) and suggested a precise equality. The precise value is 4, not the 3 suggested — findInvocation polls 3× (miss, miss, hit) and windowAroundInvocation issues one more /sessions/ query after the invocation is found. .toBe(4) is both precise and correct.
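The call count behind `.toBe(4)` can be modeled with a fake `/sessions/` query that counts its calls. This is a sketch under stated assumptions, not the PR's test code: each `findInvocation` attempt issues exactly one query, the row becomes visible on the third lookup, and `windowAroundInvocation` issues exactly one more. `countSessionsCalls` and its inner stand-ins are hypothetical.

```typescript
// Hypothetical stand-ins only: the real findInvocation / windowAroundInvocation
// in src/skillify/skillopt-improve.ts are not reproduced here. This models how
// many /sessions/ queries one worker run issues.
type Row = { id: string };

async function countSessionsCalls(visibleOnCall: number, retries = 5): Promise<number> {
  let sessionsCalls = 0;

  // Fake /sessions/ query: the invocation row is visible from call `visibleOnCall` on.
  const querySessions = async (): Promise<Row[]> => {
    sessionsCalls++;
    return sessionsCalls >= visibleOnCall ? [{ id: "inv-1" }] : [];
  };

  // One query per lookup; no rows means "not found yet".
  const findInvocation = async (): Promise<Row | null> => (await querySessions())[0] ?? null;

  // Once the invocation is found, one more /sessions/ read fetches its window.
  const windowAroundInvocation = async (_inv: Row): Promise<Row[]> => querySessions();

  let inv: Row | null = null;
  // Up to 1 + retries lookups; the backoff sleep is omitted (as with a no-op injected sleep).
  for (let attempt = 0; attempt <= retries && inv === null; attempt++) {
    inv = await findInvocation();
  }
  if (inv !== null) await windowAroundInvocation(inv);
  return sessionsCalls;
}

countSessionsCalls(3).then((n) => console.log(n)); // 4
```

Under the same assumptions a row visible on the first lookup gives 2 queries, and a row that never appears stops after 6 lookups (the sketch's assumed 5 retries) with no window query at all.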
@coderabbitai review
✅ Action performed
Review finished.
…t a "bug"
The insert→read visibility lag is normal Deeplake behavior, not a defect; the worker simply needs to tolerate it. Drop the "Bug #1" labeling from the comments; the retry logic is unchanged.
Problem
The skillopt worker fires on a user reaction and reads the skill-invocation window from the Deeplake `sessions` table via `findInvocation`. That row is written by a separate process (`capture.js`) and only becomes readable after a short insert→read visibility lag; this is expected Deeplake latency, not a defect. But a worker that fires on a fast reaction reads stale data → `invocation not found in session` → silent no-op.

Reproduced live in released prod 0.7.80: the worker queried at `04:28:43` → `invocation not found` while `[capture] writing…` was still streaming at `04:29:09–45`. The existing K=3 window only retries if the user keeps typing; a single fast or final reaction silently loses the improvement.

The fix belongs in the worker: tolerate the lag rather than give up on the first read.
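The failure mode can be pictured with a toy table whose inserts only become readable after a fixed lag. This is a hypothetical in-memory model, not Deeplake's API: `LaggyTable`, the timestamps and the 2000 ms lag are illustrative and say nothing about real Deeplake latency.

```typescript
// Hypothetical in-memory model of insert→read visibility lag; not Deeplake's API.
type SessionRow = { invocationId: string; visibleAt: number };

class LaggyTable {
  private rows: SessionRow[] = [];
  private lagMs: number;

  constructor(lagMs: number) {
    this.lagMs = lagMs;
  }

  // The writer (think: capture.js) inserts at nowMs; readers see it only at nowMs + lagMs.
  insert(invocationId: string, nowMs: number): void {
    this.rows.push({ invocationId, visibleAt: nowMs + this.lagMs });
  }

  // Returns null ("invocation not found") until the row's lag has elapsed.
  find(invocationId: string, nowMs: number): SessionRow | null {
    return this.rows.find((r) => r.invocationId === invocationId && r.visibleAt <= nowMs) ?? null;
  }
}

const table = new LaggyTable(2000);
table.insert("inv-1", 0); // writer commits at t = 0 ms
console.log(table.find("inv-1", 500)); // null: a fast reaction reads stale
console.log(table.find("inv-1", 3000)?.invocationId); // inv-1
```

A single read at the reaction's timestamp is all the pre-fix worker did, so any reaction landing inside the lag window was dropped; reading again later is enough to see the row.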
Fix
- `findInvocation` is now polled with bounded linear backoff (5 attempts, 3s base, ~45s max) before giving up. It runs inside the already-detached worker, so the wait blocks nothing in the user's session.
- Only a not-found (`null`) result retries; a query error (e.g. `402`) propagates immediately (fail-fast, no spinning).
- `findInvocation` itself stays pure (single query); the retry lives in `improveSkillIfFailed`, injectable via `invocationRetries` / `invocationBackoffMs` / `sleep` for tests.

Tests (failing-first)
All 70 skillopt tests pass; tsc clean (only pre-existing tree-sitter native-dep errors remain).
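The retry described under Fix can be sketched as a small generic helper. The option names mirror `invocationRetries` / `invocationBackoffMs` / `sleep` from the PR text, but `findWithRetry` is hypothetical and its semantics are assumptions: retries are counted after the first lookup, the n-th wait is base × n, and only a `null` result is retried while a thrown error (such as a 402) propagates unchanged.

```typescript
// Hypothetical sketch: bounded linear-backoff polling around a lookup that
// returns null while the row is not yet visible. Option names mirror the PR
// text; their exact semantics here are assumptions.
type Sleep = (ms: number) => Promise<void>;

interface RetryOpts {
  invocationRetries?: number; // extra lookups after the first one
  invocationBackoffMs?: number; // base delay; the n-th wait is base * n
  sleep?: Sleep; // injectable so tests don't actually wait
}

const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function findWithRetry<T>(
  lookup: () => Promise<T | null>,
  opts: RetryOpts = {},
): Promise<T | null> {
  const retries = opts.invocationRetries ?? 5;
  const baseMs = opts.invocationBackoffMs ?? 3000;
  const sleep = opts.sleep ?? realSleep;

  for (let retry = 0; ; retry++) {
    // Not wrapped in try/catch: a query error (e.g. a 402) rejects immediately.
    const found = await lookup();
    if (found !== null) return found;
    if (retry >= retries) return null; // bounded: give up, nothing gets published
    await sleep(baseMs * (retry + 1)); // linear backoff: base, 2*base, 3*base, ...
  }
}

// Usage: record waits instead of sleeping; the row appears on the 3rd lookup.
const waits: number[] = [];
let calls = 0;
findWithRetry(async () => (++calls >= 3 ? "inv-1" : null), {
  sleep: async (ms) => {
    waits.push(ms);
  },
}).then((inv) => console.log(inv, calls, waits)); // inv-1 3 [ 3000, 6000 ]
```

Injecting `sleep` keeps tests instant and lets them assert the backoff schedule directly, and keeping the lookup a single query (as the PR does with `findInvocation`) means the retry policy lives in one place.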