Skip to content

refactor(cloud-agent-next): use message-based queued execution#3383

Open
eshurakov wants to merge 5 commits into
mainfrom
eshurakov/diamond-storm
Open

refactor(cloud-agent-next): use message-based queued execution#3383
eshurakov wants to merge 5 commits into
mainfrom
eshurakov/diamond-storm

Conversation

@eshurakov
Copy link
Copy Markdown
Contributor

@eshurakov eshurakov commented May 21, 2026

Summary

  • Replace execution-ID-centered Cloud Agent Next coordination with message IDs as the durable user-turn identity, while preserving compatibility aliases only at legacy V2 boundaries.
  • Add durable queued-message admission, drain/retry/terminalization, settlement outbox, wrapper supervision, and wrapper bootstrap paths so cold, busy, and reconnecting sessions accept follow-up turns reliably.
  • Update Cloud Agent clients and chat UI to queue sends while busy, replay delivery state on reconnect, render queued/failed delivery badges, and keep callback completion boundaries safe.

Verification

  • local integration tests with fake gateway
  • manual testing in browser

Reviewer Notes

  • The core behavior change is message-based execution: messageId becomes the durable delivery/settlement key, with executionId retained only where legacy response compatibility requires it.
  • The highest-risk review seams are queued-message terminalization, callback outbox idempotency, wrapper reconnect/liveness supervision, and the thin legacy prepare/send adapters.
  • getSession intentionally keeps callback targets redacted; stored callback metadata remains internal for outbox delivery and may contain service-to-service headers.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 21, 2026

Code Review Summary

Status: 2 Issues Remaining | Recommendation: Can merge; carried-forward items are low-risk

Executive Summary

The three new commits (5da7f74, 63377ca, 4e77a88) resolve two of the three previous WARNINGs and the existing reviewer inline comment about associatedPr. The extractAssistantTextFromParts DRY warning and two suggestions remain unaddressed but carry no new risk.

Overview

Severity Count
CRITICAL 0
WARNING 1
SUGGESTION 2
Issue Details (click to expand)

WARNING (carried forward — unchanged)

File Line Issue
services/cloud-agent-next/src/persistence/CloudAgentSession.ts 152 extractAssistantTextFromParts is defined identically in message-settlement-outbox.ts:85 — DRY violation

SUGGESTION (carried forward — unchanged)

File Line Issue
services/cloud-agent-next/src/session/pending-messages.ts 280–282 countPendingSessionMessages does a full list() scan
services/cloud-agent-next/src/session-service.ts 1232–1236 Dead-code guard: !gitToken can never be true here — the GitLab block above always sets gitToken or throws
Resolved in this round
File Issue Status
services/cloud-agent-next/src/persistence/CloudAgentSession.ts:269–291 Fire-and-forget callbackQueue.send() — now correctly awaited ✅ Fixed
services/cloud-agent-next/src/session/session-message-queue.ts:422–423 Orphaned queued message state if enqueuePendingSessionMessageIntent threw — now enqueued first with repair path ✅ Fixed
apps/mobile/src/components/agents/mobile-session-manager.ts associatedPr always set to null — now propagates from fetched session ✅ Fixed
New changes reviewed (commits since 1532be0)
  • 5da7f74 — harden message settlement lifecycle: Fixes the two major WARNINGs from previous round. Adds repairMissingQueuedStateFromPendingMessage recovery path, MessageDeliveryPlanValidationError for BAD_REQUEST classification, callbackRetryAt persistence fix, releaseWrapperTerminalWaitForIdleBatchForWrapperRun per-run scoping, isAborted=false drain-abort reset in lifecycle.ts, and resetLifecycle() on warm rebind in server.ts. cloud.message.failed now clears the pending entry rather than recording a failed state. All changes look correct.
  • 63377ca / 4e77a88 — lint-clean fixes for test files: no issues.
Files Reviewed

New in this round:

  • apps/mobile/src/components/agents/mobile-session-manager.ts — fixed
  • apps/mobile/src/components/agents/mobile-session-manager.test.ts — no issues
  • apps/web/src/lib/cloud-agent-sdk/service-state.ts — no issues
  • apps/web/src/lib/cloud-agent-sdk/service-state.test.ts — no issues
  • apps/web/src/lib/cloud-agent-sdk/__fixtures__/message-delivery-exhausted.ts — no issues
  • services/cloud-agent-next/src/persistence/CloudAgentSession.ts — 2 WARNINGs fixed, 1 WARNING (DRY) carried forward
  • services/cloud-agent-next/src/session/session-message-queue.ts — WARNING fixed
  • services/cloud-agent-next/src/session/message-settlement-outbox.ts — no new issues
  • services/cloud-agent-next/src/session/pending-messages.ts — no new issues
  • services/cloud-agent-next/src/session/wrapper-supervisor.ts — no new issues
  • services/cloud-agent-next/src/session/wrapper-supervisor.test.ts — no issues
  • services/cloud-agent-next/test/integration/session/legacy-callback-enqueue.test.ts — no issues
  • services/cloud-agent-next/wrapper/src/lifecycle.ts — no issues
  • services/cloud-agent-next/wrapper/src/lifecycle.test.ts — no issues
  • services/cloud-agent-next/wrapper/src/server.ts — no issues
  • services/cloud-agent-next/wrapper/src/server.test.ts — no issues

Carried forward (unchanged):

  • services/cloud-agent-next/src/session/pending-messages.ts — 1 suggestion
  • services/cloud-agent-next/src/session-service.ts — 1 suggestion

Fix these issues in Kilo Cloud


Reviewed by claude-4.6-sonnet-20260217 · 1,983,115 tokens

Review guidance: REVIEW.md from base branch main

@eshurakov eshurakov force-pushed the eshurakov/diamond-storm branch from 9670177 to ab795d7 Compare May 21, 2026 14:22
@eshurakov eshurakov force-pushed the eshurakov/diamond-storm branch from bdeaa03 to e2dbe6e Compare May 22, 2026 07:26
prompt: rs?.prompt ?? null,
initialMessageId: rs?.initialMessageId ?? null,
associatedPr: sessionResult.associatedPr,
associatedPr: null,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we setting associatedPr to always be null?

label: 'Queued' | 'Failed to deliver';
tone: 'info' | 'error';
title?: string;
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like this should be:

export type DeliveryBadge =
  | {
      label: 'Queued';
      tone: 'info';
      title?: string;
    }
  | {
      label: 'Failed to deliver';
      tone: 'error';
      title?: string;
    };

Comment on lines +220 to +231
if (metadata.githubRepo) {
return {
type: 'github',
repo: metadata.githubRepo,
...omitUndefined({
platform: metadata.platform === 'github' ? 'github' : undefined,
githubInstallationId: metadata.githubInstallationId,
githubAppType: metadata.githubAppType,
upstreamBranch: metadata.upstreamBranch,
}),
};
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also still support the legacy githubToken attribute?

Comment on lines +248 to +257
const payload: CallbackJob['payload'] = {
sessionId,
cloudAgentSessionId: sessionId,
status,
errorMessage: error,
lastSeenBranch: metadata.repository?.upstreamBranch,
kiloSessionId: metadata.auth.kiloSessionId,
gateResult,
lastAssistantMessageText,
};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have zero clue about this one, but I figure that you will:

enqueueCallbackNotification builds the legacy callback payload without executionId (services/cloud-agent-next/src/persistence/CloudAgentSession.ts:248), but the security-analysis callback still requires it (apps/web/src/app/api/internal/security-analysis-callback/[findingId]/route.ts:42). Old execution-id-only terminal paths can complete but have callbacks rejected. Include executionId: messageId ??execution.executionId.

eshurakov added 4 commits May 22, 2026 11:53
Preserve callback and pending-message state through queue failures, keep legacy execution callback compatibility explicit during rollout, and tighten wrapper reconnect/lifecycle cleanup.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants