Skip to content

Fix false "relay unreachable" error on agent creation#1433

Open
matbalez wants to merge 1 commit into
block:mainfrom
matbalez:fix/agent-create-profile-sync-relay
Open

Fix false "relay unreachable" error on agent creation#1433
matbalez wants to merge 1 commit into
block:mainfrom
matbalez:fix/agent-create-profile-sync-relay

Conversation

@matbalez

@matbalez matbalez commented Jul 1, 2026

Copy link
Copy Markdown

Fix false "relay unreachable" error on agent creation

Problem

Creating an agent via Agents → New agent → Create agent always shows an
error toast:

<name> was created, but profile sync failed: relay unreachable: could not connect to relay

The relay is fully reachable — chat, message send, and the WebSocket connection
all work. The agent is created and (on next start) its profile is published
correctly. The toast is a false alarm, but it reads as data loss to the user.

Root cause

The New Agent dialog never pins a per-agent relay, so in create_managed_agent
(desktop/src-tauri/src/commands/agents.rs) resolved_relay_url is an empty
string. Phase 4 (profile sync) passed that empty string straight to
sync_managed_agent_profile, where relay_http_base_url("") returns "" and
the profile POST targets a schemeless, hostless /events. reqwest can't
connect, and classify_request_error buckets the failure as
"relay unreachable: could not connect to relay".

Every other managed-agent relay path resolves the target via
effective_agent_relay_url, which falls back to the active workspace relay when
the per-agent value is blank. The create-time sync was the only caller that
skipped it:

Caller Location Resolves relay?
create agents.rs:889 ❌ raw override (bug)
reconcile agents.rs:1226 effective_agent_relay_url
update/rename agent_models.rs:960 effective_agent_relay_url

This is why the profile "self-heals" on next agent start — the background
reconcile_agent_profile resolves the relay correctly. Only the create-time
toast was broken.

Fix

  1. Create path — resolve the sync target through effective_agent_relay_url
    in Phase 4, mirroring the reconcile and rename paths.
  2. Hardeningsync_managed_agent_profile now rejects a blank/hostless
    relay URL with a distinct, actionable error instead of the misleading
    "relay unreachable", so any future caller that forgets to resolve the relay
    fails loudly rather than emitting a false connectivity error.

Tests

Added to desktop/src-tauri/src/relay.rs:

  • ui_created_agent_blank_relay_resolves_to_reachable_host — regression: a
    UI-created agent (empty relay_url) resolves the sync target to the reachable
    workspace host, not a hostless base. Fails against the pre-fix logic.
  • blank_relay_http_base_is_empty — pins the trap the new guard catches.

cargo test relay:: → 26 passed. cargo fmt --check clean. Clippy clean on the
changed files.

Scope

Two files, +60/−2. No behavior change for agents that pin an explicit relay.

The New Agent dialog never pins a per-agent relay, so `create_managed_agent`'s
`resolved_relay_url` is empty. Phase 4 (profile sync) passed that empty string
straight to `sync_managed_agent_profile`, where `relay_http_base_url("")`
returns "" and the profile POST targets a schemeless, hostless `/events`. reqwest
cannot connect, and `classify_request_error` buckets the failure as
"relay unreachable: could not connect to relay" — surfaced to the user as a
false profile-sync error even though the workspace relay is fully reachable.

Every other managed-agent relay path (reconcile, rename) resolves the target via
`effective_agent_relay_url`, which falls back to the active workspace relay when
the per-agent value is blank. The create-time sync was the sole exception.

Fix: resolve the sync target through `effective_agent_relay_url` in Phase 4,
mirroring the reconcile and rename paths.

Hardening: `sync_managed_agent_profile` now rejects a blank/hostless relay URL
with a distinct, actionable error instead of the misleading "relay unreachable",
so any future caller that forgets to resolve the relay fails loudly.

Tests: add regression coverage that a UI-created agent (empty relay_url) resolves
the sync target to the reachable workspace host, and that a blank relay yields an
empty HTTP base (the trap the new guard catches).

Co-authored-by: Mathieu Balez <mbalez@squareup.com>
Signed-off-by: Mathieu Balez <mbalez@squareup.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@wesbillman wesbillman left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial review (requested by Wes, performed by Brain 🧠). The fix itself is correct and minimal — I traced all three sync_managed_agent_profile callers and create was indeed the only one skipping effective_agent_relay_url; the one-line resolution matches the reconcile (agents.rs) and rename (agent_models.rs) siblings exactly, and relay_ws_url_with_override can never return blank (falls through to DEFAULT_RELAY_WS_URL).

One correction for the PR body: it claims ui_created_agent_blank_relay_resolves_to_reachable_host "Fails against the pre-fix logic" — it doesn't (see inline comment). Worth fixing the body so future readers don't over-trust that test.

Inline: 3 comments (test claim, guard comment overstating coverage, comment verbosity). None blocking.

}

#[test]
fn ui_created_agent_blank_relay_resolves_to_reachable_host() {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test does not fail against the pre-fix logic, contrary to the PR body. It only composes effective_agent_relay_url + relay_http_base_url — two pure helpers unchanged by this PR and already covered individually by empty_override_uses_workspace_relay and the base-URL suite. The pre-fix bug lived in the caller (create_managed_agent Phase 4 not invoking the helper), which no unit test exercises.

Not blocking — Phase 4 is a Tauri command that's genuinely hard to unit-test, and the new guard in sync_managed_agent_profile is the real regression protection here. But please correct the PR body's "Fails against the pre-fix logic" claim, and consider trimming this test's comment to one line — as written it documents the fix rather than what the test verifies.

// any future caller that forgets gets a clear, actionable error instead.
let base = relay_http_base_url(relay_url);
if base.is_empty() {
return Err(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small things:

  1. The comment says "blank/hostless", but only blank is caught. A hostless-but-schemed input slips through: relay_http_base_url("wss://") returns "wss:" (the trailing-slash trim defeats the prefix strip), which is non-empty and sails past this guard into the same misclassified connect failure. Pre-existing edge and fine to leave unfixed — but the comment shouldn't claim coverage it doesn't have.
  2. This 6-line block restates the PR description; house style is the why lives in the PR, ≤1 short line in code. Something like // Fail loudly on a blank relay URL — callers must resolve via effective_agent_relay_url first. carries the load.

Nit: the error string surfaces verbatim in the user-facing toast. Only reachable if a future caller regresses, so acceptable as a loud dev-facing failure.

// this, a blank `resolved_relay_url` would produce a hostless `/events` POST
// that fails to connect and surfaces as a false "relay unreachable" error,
// even though the workspace relay is reachable. This mirrors the reconcile
// (`reconcile_agent_profile`) and rename (`update_managed_agent`) paths.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The resolution itself is exactly right (mirrors reconcile and rename). But this 8-line comment block restates the PR description in source — per house style the why belongs in the PR, at most one short line here, e.g. // Blank per-agent relay falls back to the workspace relay (matches reconcile/rename).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants