Skip to content

fix(kiloclaw): harden Composio onboarding#3400

Open
pandemicsyn wants to merge 6 commits into
mainfrom
florian/fix-composio-onboarding-flow
Open

fix(kiloclaw): harden Composio onboarding#3400
pandemicsyn wants to merge 6 commits into
mainfrom
florian/fix-composio-onboarding-flow

Conversation

@pandemicsyn
Copy link
Copy Markdown
Contributor

@pandemicsyn pandemicsyn commented May 21, 2026

Summary

  • Stabilize the managed Composio Google Calendar onboarding flow by making identity reservation idempotent without advisory locks, restoring the tools step, and keeping callback completion resilient to Composio lifecycle lag.
  • Harden callback completion: accept a caller-owned connected account even before Composio reports it ACTIVE, but treat worker patchSecrets failures as real infra failures with bounded retry and no connected-account DB write on failure.
  • Keep managed status grounded in durable state for the current sandbox: provisioned instances only report connected when the sandbox is marked managed and worker secrets are configured, while popup/manual-save paths use event-driven refetch instead of interval polling.
  • Extract popup coordination into a dedicated useComposioPopup state machine that preserves postMessage, localStorage, and BroadcastChannel callback channels, including late callback delivery after popup-close races.

Verification

N/A

Visual Changes

Before After
Composio tools step was temporarily bypassed while the flow was broken. Composio tools step is restored in onboarding.
Inbound email Continue could be changed independently of the parent loading contract. Inbound email again waits for instance status/address setup before Continue is enabled.
Popup coordination lived as refs/effects inside ClawOnboardingFlow. Popup behavior is unchanged visually, but is now driven by useComposioPopup.

Reviewer Notes

  • Conceptual callback split: Composio lifecycle lag is accepted after ownership verification; worker secret patch failures retry 3 times and then surface honestly.
  • google_calendar_connected_account_id is only persisted after worker secrets are patched for an existing sandbox; status also verifies the current sandbox source/secrets before reporting connected.
  • Composio onboarding status no longer polls every 15 seconds; popup callback, focus/visibility, and mutation invalidation paths are expected to drive refreshes.
  • Popup callback handling now accepts same-attempt success after an early popup_closed observation and reads the stored callback result on focus/visibility to recover missed browser events.
  • Focus review on lock-free identity reservation, callback race handling, current-sandbox status checks, and popup state-machine edges such as dedupe, popup close, confirmation timeout, late redirect races, and late callback delivery.
  • Existing database indexes are reused; this PR does not add migrations or change schema.

@pandemicsyn pandemicsyn changed the title fix(kiloclaw): stabilize Composio onboarding fix(kiloclaw): harden Composio onboarding May 21, 2026
@pandemicsyn pandemicsyn marked this pull request as ready for review May 21, 2026 19:40
@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented May 21, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Executive Summary

The lock-free identity reservation and popup state machine are well-designed; the main concern is a behavioral gap in getManagedComposioGoogleCalendarStatus where a provisioned sandbox without sandboxHasComposioSecrets silently reports disconnected even when the managed source is confirmed, which may cause misleading UI state after the callback completes.

Overview

Severity Count
CRITICAL 0
WARNING 2
SUGGESTION 1
Issue Details (click to expand)

WARNING

File Line Issue
apps/web/src/lib/kiloclaw/composio-onboarding.ts 367–368 Status check silently reports disconnected for managed sandbox without secrets
apps/web/src/lib/kiloclaw/composio-onboarding.ts 47–49 Non-KiloClawApiError transport errors are retried but never tested

SUGGESTION

File Line Issue
apps/web/src/app/(app)/claw/components/useComposioPopup.ts 48–116 document.write on popup is deprecated

Details

WARNING 1 — getManagedComposioGoogleCalendarStatus gap for managed sandbox without worker secrets

In composio-onboarding.ts:367–368:

if (params.instance && !params.sandboxHasComposioSecrets) {
  return { enabled: true, status: 'disconnected', ... };
}

This guard fires when a sandbox is provisioned (params.instance is set), the sandbox source is 'managed' (passed the earlier checks), the identity has a google_calendar_connected_account_id, but sandboxHasComposioSecrets is false. This scenario is the window between completeManagedComposioGoogleCalendarConnection persisting the connected-account ID and the worker reporting its secrets as configured. During this window the status UI shows disconnected instead of a transitional state — which could cause the popup confirmConnected() path to never fire since connectionStatus !== 'connected'. It is worth confirming that the refetchStatus → polling → status update chain is resilient to this window, and that the confirmation timeout (45 s) is long enough.

WARNING 2 — isRetryablePatchSecretsError treats all non-KiloClawApiError errors as retryable

In composio-onboarding.ts:47–49:

function isRetryablePatchSecretsError(error: unknown): boolean {
  if (error instanceof KiloClawApiError) return error.statusCode >= 500;
  return true; // covers network errors, TypeErrors, etc.
}

This is the intended design (retry transport errors), and the existing tests cover KiloClawApiError 5xx retry and 4xx terminal paths. However there is no test verifying that a plain Error / TypeError (e.g. DNS resolution failure, fetch abort) also respects the retry budget and logs status: 'transport_error'. Consider adding a test for this branch.

SUGGESTION — document.write is deprecated

In useComposioPopup.ts:48–116, writeComposioPopupLoadingPage uses popup.document.write(...). While this works today for same-origin about:blank popups, document.write is deprecated and may be blocked by a Content-Security-Policy: script-src that includes 'unsafe-inline' restrictions or future browser policy changes. An alternative is to assign popup.document.documentElement.innerHTML or use a blob: URL as the initial popup target.

Files Reviewed (14 files)
  • apps/web/src/app/(app)/claw/components/ClawOnboardingFlow.tsx
  • apps/web/src/app/(app)/claw/components/ConnectToolsStep.tsx
  • apps/web/src/app/(app)/claw/components/useComposioPopup.test.ts
  • apps/web/src/app/(app)/claw/components/useComposioPopup.ts
  • apps/web/src/app/api/integrations/composio/callback/route.test.ts
  • apps/web/src/app/api/integrations/composio/callback/route.ts
  • apps/web/src/hooks/useKiloClaw.ts
  • apps/web/src/hooks/useOrgKiloClaw.ts
  • apps/web/src/lib/kiloclaw/composio-identities.test.ts
  • apps/web/src/lib/kiloclaw/composio-identities.ts
  • apps/web/src/lib/kiloclaw/composio-onboarding.test.ts
  • apps/web/src/lib/kiloclaw/composio-onboarding.ts
  • apps/web/src/routers/kiloclaw-router.ts
  • apps/web/src/routers/organizations/organization-kiloclaw-router.ts

Fix these issues in Kilo Cloud


Reviewed by claude-sonnet-4.6 · 1,843,512 tokens

Review guidance: REVIEW.md from base branch main

return { enabled: true, status: 'error', connectedAccountId: null, sandboxConfigSource };
}

if (params.instance && !params.sandboxHasComposioSecrets) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moderate, Functional: With the 15s refetchInterval removed from getComposioOnboardingStatus, the awaiting-confirmation state in useComposioPopup relies on a single refetchStatus() at callback time plus focus/visibilitychange events. This guard reports disconnected whenever a provisioned, managed sandbox has not yet reported sandboxHasComposioSecrets. If that flag is not true on the one post-callback refetch and the user never blurs and refocuses the tab, the 45s confirmation timer expires and surfaces "Could not confirm Google Calendar" even though the connection succeeded.

The callback route writes patchSecrets, then the connected-account id, then composio_config_source='managed' before returning success, so the only remaining variable is whether sandboxHasComposioSecrets flips to true synchronously once the worker patchSecrets call returns. If the worker config is read through any cache or eventually-consistent path, please either keep a short poll while the popup is in awaiting-confirmation or have the state machine refetch on its own interval during that state.


return {
enabled: true,
status: 'connected',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low, Functional: getManagedComposioGoogleCalendarStatus no longer calls listComposioConnectedAccounts, so connected is now derived purely from durable state (composio_config_source, the stored google_calendar_connected_account_id, and sandboxHasComposioSecrets). This is reasonable and removes a network call from a hot path, but it means a Google Calendar connection revoked on Google's side keeps showing "Connected" in this surface. If revocation is expected to be handled elsewhere (worker, reconnect prompt), this is fine as scoped. Flagging it so the tradeoff is intentional rather than incidental.

const instance = organizationId
? await getActiveOrgInstance(user.id, organizationId)
: await getActiveInstance(user.id);
const verified = await completeManagedComposioGoogleCalendarConnection({
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low, Functional: Dropping withKiloclawProvisionContextLock here means the getComposioInstanceConfigSource(...) === 'manual' check inside completeManagedComposioGoogleCalendarConnection and its subsequent patchSecrets plus composio_config_source='managed' write are no longer atomic against provision/updateConfig, which still take that lock. The manual patchSecrets save path never took this lock even before, so the manual-versus-managed race is pre-existing and the durable source marker is the intended arbiter. Not a regression introduced here, just noting the lock no longer provides any mutual exclusion for this path.

return body.slice(0, 500);
}

function isRetryablePatchSecretsError(error: unknown): boolean {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low, Testing: isRetryablePatchSecretsError returns true for any non-KiloClawApiError, and logTerminalManagedComposioPatchSecretsError has a distinct status: 'transport_error' branch. The tests cover the KiloClawApiError 5xx-retry and 4xx-terminal paths but not a plain Error/TypeError (fetch abort, DNS failure). Add a test that rejects patchSecrets with a plain Error and asserts the retry budget is consumed and the log records status: 'transport_error', so a regression in that branch does not ship silently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants