feat: add crew bridge plane for delegated and cross-provider peers #136
zozo123 wants to merge 2 commits into
Add a reserved label `crew=<name>` and a `--crew` flag on `run` / `warmup`. `list`, `status`, and `release` accept `--crew` as a selector.

For Tailscale-capable providers, the CLI mints the auth key tagged `tag:cbx-crew-<owner>-<name>` in user context, so the broker never sees Tailscale credentials. Cloud-init writes `/etc/hosts.cbx` so peers are reachable as `<slug>.cbx` and `<role>.cbx`. The `.cbx` suffix avoids collision with the real `.box` ICANN gTLD.

When `TS_API_KEY` is exported, the CLI also self-bootstraps the concrete `tag:cbx-crew-*` rows on the operator tailnet on the first lease in each new crew: GET the policy with its ETag, merge the missing tagOwners and self-peering grant, then PUT it back with If-Match so concurrent edits fail fast. Doctor reports `auto-managed` in that mode and, without the key, falls back to a manual snippet hint.

The CLI honors `TS_CONTROL_URL` for the client login server and `TS_API_URL` / `CRABBOX_TS_API_URL` for the admin API base, so the same flow can point at a self-hosted control plane (Headscale, etc.). When the endpoint is not Tailscale-shaped (HTTP 404 or no ETag on the policy GET), the auto-bootstrap and doctor both skip and point at the manual policy snippet.

Non-Tailscale providers honor `--crew` as metadata; networking is rejected with a clear message and surfaced by doctor.
1a66a6d to 909af95
909af95 to 96bceda
Adds peer discovery across the full crew, regardless of provider:

- Managed-Linux peers (Tailscale plane): endpoint=tailnet IP
- SSH-lease peers: endpoint=ssh://host:port
- Delegated-with-URL peers (E2B, Modal, Cloudflare, Railway, Islo, Tensorlake): endpoint=per-sandbox public URL
- Blacksmith / no-adapter providers: surfaced as transport=none so doctor reports honestly

`crabbox crew peers --crew <name> --json` returns the unified listing. `crabbox doctor --crew <name>` includes the reachability matrix per transport pair so users see the asymmetry: tailnet->url works one-way, url->tailnet doesn't, ssh-pairs need operator-side bridging (see the SSH-mesh DRAFT PR).

Stacked on openclaw#129; merge after the foundation lands.
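The asymmetry in the reachability matrix can be illustrated with a small lookup over transport pairs. This is a hedged sketch rather than the PR's renderer: `Peer`, `Verdict`, `Reach`, and `Matrix` are hypothetical names, and two rules are assumptions not stated in the commit message, namely that any pair involving an `ssh` peer is a warning and that any pair involving a `none` peer is unreachable.

```go
package main

import "fmt"

// Peer carries only the fields this sketch needs; the real listing has more.
type Peer struct {
	Slug      string
	Transport string // "tailnet", "ssh", "url", or "none"
}

// Verdict is the outcome for one ordered (from, to) transport pair.
type Verdict string

const (
	OK   Verdict = "ok"
	Warn Verdict = "warn"
	No   Verdict = "no"
)

// Reach reports whether a peer on transport `from` can dial a peer on `to`.
func Reach(from, to string) Verdict {
	switch {
	case from == "none" || to == "none":
		return No // assumption: the provider owns connectivity
	case from == "ssh" || to == "ssh":
		return Warn // assumption: every SSH pair needs operator-side bridging
	case to == "url":
		return OK // a public URL is reachable over outbound HTTPS
	case from == "tailnet" && to == "tailnet":
		return OK // both peers sit on the Tailscale plane
	default:
		return No // url -> tailnet: tailnet peers have no public endpoint
	}
}

// Matrix returns one line per ordered pair of distinct peers.
func Matrix(peers []Peer) []string {
	var rows []string
	for _, a := range peers {
		for _, b := range peers {
			if a.Slug == b.Slug {
				continue
			}
			rows = append(rows, fmt.Sprintf("%s(%s) -> %s(%s): %s",
				a.Slug, a.Transport, b.Slug, b.Transport, Reach(a.Transport, b.Transport)))
		}
	}
	return rows
}

func main() {
	peers := []Peer{
		{Slug: "web", Transport: "tailnet"},
		{Slug: "api", Transport: "url"},
		{Slug: "db", Transport: "ssh"},
	}
	for _, row := range Matrix(peers) {
		fmt.Println(row)
	}
}
```

Keeping the verdict a function of the ordered pair, not the unordered pair, is what lets `tailnet -> url` and `url -> tailnet` disagree.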
96bceda to 8834570
Codex review: needs real behavior proof before merge.

Workflow note: Future ClawSweeper reviews update this same comment in place.
Summary

Reproducibility: not applicable as an issue reproduction because this is a new feature PR. For the review findings, source inspection gives a high-confidence path: the resolver filters […]
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Review details

Best possible solution: Land or otherwise settle the crew foundation first, rebase this branch on current main, then make the bridge resolver’s source of truth match every advertised provider class before merging the bridge plane.

Do we have a high-confidence way to reproduce the issue? Not applicable as an issue reproduction because this is a new feature PR. For the review findings, source inspection gives a high-confidence path: the resolver filters […]

Is this the best way to solve the issue? No, not as currently proposed. The unified peer view should either persist crew and endpoint metadata for every advertised provider at claim time or resolve from provider-owned labels/status, and the stale stacked branch needs a rebase before merge review can be trusted.
Overall correctness: patch is incorrect.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 35f1863259fa.
Closing in favor of #129, which now consolidates all three transport planes (Tailscale + Bridge + SSH-mesh) into a single PR. The Bridge plane work (Islo / E2B / Railway adapters, […]
TL;DR
PR #129 introduced crews and the Tailscale plane for managed-Linux providers. This PR makes `crabbox crew peers --crew <name>` return one row per crew member regardless of provider, with a `transport` hint and a canonical endpoint. The result: a single command lists tailnet peers, URL peers, SSH peers, and even providers that own their own connectivity, with a documented reachability matrix that is honest about the asymmetric pairs.

Cross-provider unified peer view
`crabbox crew peers --crew <name> --json` returns the documented `{ "members": [...] }` shape:

    {
      "members": [
        { "slug": "web", "provider": "hetzner", "transport": "tailnet", "endpoint": "100.64.1.3", "labels": {"role": "web"} },
        { "slug": "api", "provider": "islo", "transport": "url", "endpoint": "https://abc.share.islo.dev", "labels": {"role": "api"} },
        { "slug": "db", "provider": "runpod", "transport": "ssh", "endpoint": "ssh://1.2.3.4:22", "labels": {"role": "db"} },
        { "slug": "what", "provider": "blacksmith", "transport": "none", "endpoint": "", "labels": {"role": "isolated"}, "note": "blacksmith owns connectivity" }
      ]
    }

The `transport` field is one of `tailnet`, `url` (some providers report `BridgeState=unsupported`), `ssh` (endpoint `ssh://<host>:<port>`), `pending`, or `none`.

The resolver does not invoke a provider API for tailnet or SSH peers; it reads the endpoint straight off the local claim sidecar (new optional fields `tailscaleIPv4`, `tailscaleFQDN`, `sshHost`, `sshPort`, `bridgeURL`, `labels`). When those fields are empty for a managed-Linux or SSH-lease provider, the peer surfaces as `transport=pending` with an honest note rather than pretending the endpoint exists.

For URL peers the per-provider `BridgeProvider` adapter from #136's original commit is still invoked when no canonical endpoint is recorded yet, preserving the islo live-demo behavior and the `BridgeState=unsupported` signal that Modal/Cloudflare/Tensorlake emit today.

Doctor reachability matrix
`crabbox doctor --crew <name>` keeps the existing Tailscale ACL row check from #129 and, in the same invocation, prints the per-transport reachability matrix derived from the unified peer list.

The matrix is intentionally asymmetric: `tailnet -> url` works (outbound HTTPS), `url -> tailnet` does not (no public endpoint on tailnet peers), and SSH pairs are flagged WARN because Crabbox does not currently mesh SSH leases; operator-side bridging is required (see #137 for the SSH-mesh DRAFT).

Provider coverage matrix
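| Provider | `transport` | Notes |
|---|---|---|
| Managed Linux | `tailnet` | Empty surfaces as `pending`. |
| SSH lease | `ssh` | `ssh://host:port`; empty surfaces as `pending`. |
| Islo | `url` | `share` API, idempotent. |
| E2B | `url` | `https://<port>-<sandboxID>.<domain>`. |
| Railway | `url` | `LatestDeployment`. One URL per service. |
| Modal / Cloudflare / Tensorlake | `url` | `BridgeState=unsupported` so the gap is visible instead of silent. |
| Blacksmith | `none` | `note: "blacksmith owns connectivity"`. |

Tested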
- `go test -race ./...`: all packages green (39s for `internal/cli`).
- `internal/cli/crew_bridge_test.go`:
  - `TestCrewPeersIncludesManagedLinuxWithTailnetTransport`
  - `TestCrewPeersIncludesSSHLeaseWithSSHTransport`
  - `TestCrewPeersHandlesPendingTailscaleIP`
  - `TestCrewPeersHandlesBlacksmithAsNone`
  - `TestDoctorCrewReachabilityMatrixAsymmetric`
  - `TestRenderCrewReachabilityMatrixIncludesAsymmetricNotes`
- […] `XDG_STATE_HOME/crabbox/claims/*.json` for four providers (hetzner, islo, runpod, blacksmith) and confirming the JSON matches the documented shape and the doctor matrix matches the table above.

Live islo evidence from PR #136's first commit (still valid after this refactor): live-validated against two real islo.dev sandboxes. From the client sandbox, dialing the web peer's published share URL via `urllib.request.urlopen` returned HTTP 200 with the expected Python http.server body. Both sandboxes were cleaned up. No new live tests were run for this amend: the resolver path for managed-Linux and SSH-lease peers is exercised by the unit suite against mocked claim sidecars, and the URL-transport path is unchanged from the previously live-validated islo run.

Why not Tailscale-here for delegated providers
Delegated providers don't expose a VM Crabbox can install `tailscaled` on. The Tailscale plane stays the right answer for managed Linux providers; the bridge plane is the narrow HTTP-only primitive that delegated providers can actually provide.

Honest scope
- […] `crabbox crew peers` reports the SSH endpoint honestly so callers can dial it directly; the matrix flags SSH pairs as WARN. See #137 (feat: crew SSH-mesh, DRAFT, stacked on #129) for the operator-orchestrated SSH-mesh alternative.
- […] `BridgeState=unsupported` instead of silently emitting empty peer rows.
- […] `tailscaleIPv4` / `tailscaleFQDN` / `sshHost` / `sshPort` / `bridgeURL` / `labels` fields are optional and additive: existing claim sidecars without them surface as `pending` (with note) for managed-Linux and SSH-lease providers, which is the same as the previous behavior of "no peer row at all" but more visible.

Follow-ups
- […] `transport=pending` shrinks to "never" for normal usage.
- […] `unsupported` to `supported` without touching this PR).
- […] `crabbox doctor crew --bridge` flag that also runs `ProbeBridgePeers` against the URL-transport peers.
- […] `<slug>.cbx` alias rewriting once peer URLs are stable enough for clients to assume DNS.

Note on base
This branch is rooted on `main` and lands the bridge plane on top of the crew foundation introduced by #129. Until #129 lands, the PR shows two commits: the rebased crew commit (identical to #129's tip) and the bridge commit. Once #129 merges, this PR can be rebased and only the bridge commit will remain.